By Dr. Karl Ho, Research and Statistical Support Services Manager
The RSS office has recently expanded its arsenal by adding one more statistical package. Stata is a "general-purpose statistical package that does all of the textbook statistical analyses" (Kolenikov 2001). So, why Stata? Some may ask, considering the three general statistical packages already on board (namely, SAS, S-Plus and SPSS). Well, that is what I will be addressing in this article. But before I introduce the features of this software, let me briefly talk about our general strategy in acquiring software for the RSS office.
First, our primary goal at RSS is to support researchers on campus. So, we constantly solicit suggestions for new software that best supports the latest research procedures and data analyses. However, there must be enough general interest to justify a centrally supported package, i.e., a large enough group of users who will enjoy the benefits of the new acquisition.
Second, to stay on the cutting edge, we work hard to keep up to date on the development of methods and research-related software. Certain packages are already well developed and have strategies for updating procedures to catch up with the latest developments. But some specialized procedures can only be developed by users and researchers who write their own scripts. We aim to target software with an "open" policy that allows support from a network of advanced users who can supply add-on scripts.
Third, the software has to have an affordable option for student purchase. Given the decentralization of computers and new modes of distributed learning, we have to take into account an option that allows students to acquire the software for home use at a reasonable price.
Last but not least, the price of the software must be justifiable in the long run. We cannot support a package, regardless of how excellent it is, that raises maintenance prices two- or three-fold in a few years. So, this new package must be cost-effective.
After multiple requests from customers and careful consideration, we decided to acquire Stata, not merely for the reasons stated above, but also for its long list of features that support almost all statistical procedures and data analyses. Moreover, it provides ample room for user development. Regarding cost, the perpetual license price is much lower than what we are currently paying for other packages. Also, an affordable package is in place for faculty and student purchase (*).
So, what is Stata?
To give a not-so-general answer, I shortlist some key features that make Stata different from SPSS, SAS and S-Plus.
1. Command-based operation
Stata is designed primarily for researchers' day-to-day work. Procedures are run as commands typed at a dot prompt (.).
For example, the following command generates a crosstabulation of two variables:
table died drug
All output can easily be cut and pasted into other environments. The table produced by the above command is plain ASCII text, so it can be copied and pasted as is.
Unlike other packages, Stata holds all data in RAM (Random Access Memory). That makes execution of procedures much faster than in other software packages that spend time on hard disk access and the generation of objects. Users can also specify the amount of virtual memory to be used for large data sets and matrix operations. In my case, I allocate 8 MB of space for storage (by adding /k8000 at the command line), which is sufficient for medium to large data sets on most procedures. The number of observations and variables is limited only by the available virtual memory or hard drive space. Benchmarks for some commonly used procedures are available online.
This is the feature I like the most about Stata: its "open" architecture permits users to write their own procedures using simple Stata code. Stata is composed of two types of commands. The basic suite of commands (kernel commands) is responsible for primary operations and "factory-provided" procedures. In addition, users can write ASCII text file programs, called ado files, to run specialized procedures. These interpreted ado files can be shared with other users, who just need to download the ASCII files and place them in the appropriate directories. This design makes upgrading, updating and development fast and accessible for all parties involved, vendors and users alike. The built-in connectivity in Stata 7 even allows running upgrades or updates within the software via the Internet, provided the machine running Stata is online. As a result, although Stata is a product of a commercial firm, it virtually represents an "Open Source" effort by a multitude of users and supporters all over the world.
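To give a rough idea of what a user-written ado file might look like, here is a minimal sketch. The command name hello and its message are hypothetical and chosen purely for illustration; they are not part of Stata or of any shared package.

```stata
*! hello.ado: a hypothetical, minimal user-written command
program define hello
    * Interpret the code under Stata 7 rules, even in later releases
    version 7
    * The only thing this command does is print a message
    display "Hello from a user-written ado file"
end
```

Assuming the file is saved as hello.ado in one of the directories Stata searches for ado files (the adopath command lists them), typing hello at the prompt should run it like any built-in command.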
On top of the above features that make Stata special, I want to add that the software is really easy to learn and to get attached to. Once you start using it, you will stick with it for most of your analyses. In the following, I show how to get started with a simple regression procedure and get to know more about this package.
By default, Stata is composed of four windows, all of which can be adjusted in size and font to suit your personal preference. The largest is the output window, which sits above the command window used for command input. On the left-hand side are the review window, which records the commands entered, and the variable window. The review window greatly facilitates data analysis, which very often involves replicating and modifying previous procedures. For instance, the following simple command regresses price on three variables:
regress price mpg headroom rep78
Since regression analysis often involves re-specifying the model (i.e. changing the combination of independent variables), users will find Stata's design really convenient: just click the previous command in the review window (or hit the Page Up key) and change or add variables on the right-hand side.
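As a sketch of that workflow, the lines below rerun the regression with one more variable on the right-hand side. Two assumptions are mine: the variables come from the auto example dataset distributed with Stata, and it is loaded with sysuse, which is available in releases later than the Stata 7 discussed here. The added variable weight is just an illustrative choice.

```stata
* Load the example auto dataset (assumes a release that provides sysuse)
sysuse auto, clear

* Original model: price on three independent variables
regress price mpg headroom rep78

* Re-specified model: the same command recalled and edited to add weight
regress price mpg headroom rep78 weight
```

In an interactive session the second regress line would not be retyped; it would be recalled from the review window and edited.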
There is one downside to the software, nevertheless. Stata can only read ASCII data and its own binary data format, although a conversion program, Stat/Transfer, is available from the same company for converting to and from most formats. Another, not-so-smart way is to cut and paste the data from other packages such as Excel into Stata's data window.
Data management tools in Stata are all command-based. So, if you are not familiar with the syntax, it is recommended that you create and document all the variables and cases before importing the data into Stata. This will save you the time of figuring out how to create subsets and conditional variables within Stata.
That said, sources of help are plentiful. The company's Website (http://www.stata.com) provides a plethora of links and resources leading to downloadable files and answers to most questions. Stata also provides a low-cost course, NetCourse (http://www.stata.com/info/products/netcourse/), delivered over the web for those who are serious about learning Stata, from beginner to expert level. Plus, many class notes and user-developed manuals are readily available online from the Websites of most big statistics departments (e.g. StatLib at Carnegie Mellon, UCLA, Harvard-MIT Data Center).
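For readers who do want to try this inside Stata, the following is a minimal sketch of creating a conditional variable and a subset. It assumes the auto example dataset and the sysuse command from later Stata releases; the variable name highmpg and the cutoff of 25 are hypothetical and only for illustration.

```stata
* Load the example auto dataset (assumes a release that provides sysuse)
sysuse auto, clear

* Conditional variable: 1 if mpg is above 25, 0 otherwise,
* left missing wherever mpg itself is missing
generate byte highmpg = mpg > 25 if !missing(mpg)

* Subset: keep only the foreign cars and drop the rest from memory
keep if foreign == 1

* Inspect the result
summarize price mpg highmpg
```

Note that keep changes only the copy of the data held in memory; the file on disk is untouched unless the data are saved again.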
Below are some images produced by Stata:
For a list of statistical procedures, check Stata's posting at http://www.stata.com/info/capabilities/. Among all these, the following procedures led me to choose Stata over other packages:
Not enough? Check out Stata's Web site and the links from UCLA, CMU and Harvard-MIT. Spend a day or two learning Stata and you'll find it as interesting as it is rewarding.
(*) The full version for student and faculty purchase is priced at $99.