
RSS Matters
By Dr. Karl Ho, Research and Statistical Support Services Manager

The RSS office has recently expanded its arsenal
by adding one more statistical package. Stata is a
"general-purpose statistical package that does all
of the textbook statistical analyses" (Kolenikov,
2001). So why Stata, some may ask, considering the
three general statistical packages already on board
(namely, SAS, S-Plus and SPSS)? That is what I will be
addressing in this article. But before I introduce
the features of this software, let me briefly talk about
our general strategy in acquiring software for the RSS
office.
First, our primary goal at RSS is to
support researchers on campus. So, we constantly solicit
suggestions for new software that best supports the latest
research procedures and data analyses.
However, we need sufficient general interest to justify a
centrally supported package, i.e., a large
enough group of users who will enjoy the benefits of the
new acquisition.
Second, to stay on the cutting edge,
we work hard to keep up to date on the development of methods
and research-related software. Certain packages are
already well developed and have strategies for updating
procedures to keep pace with the latest developments. But some
specialized procedures can only be developed by users and
researchers who write their own scripts. We intend to
target software with an "open"
policy that allows support from a network of advanced
users who can supply add-on scripts.
Third, the software has to have an
affordable option for student purchase. Given the
decentralization of computers and new modes of
distributed learning, we have to take into account an
option that allows students to acquire the software for
home use at a reasonable price.
Last but not least, the benefits of the software
must justify its cost in the long run. We cannot support
a package, regardless of how excellent it is, whose
maintenance price rises two- or three-fold in a few years. So,
this new package must be cost-effective.
After multiple requests from customers
and careful consideration, we decided to acquire Stata,
not merely for the reasons stated above, but also for its long
list of features, which support almost all statistical
procedures and data analyses. Moreover, it provides ample
room for user development. Regarding cost, the perpetual
license price is much lower than what we are currently
paying for other packages. Also, an affordable package is
in place for faculty and student purchase (*).
So, what is Stata?
To give a more specific answer, here is a
short list of key features that make Stata different
from SPSS, SAS and S-Plus.
1. Command-based operation
Stata is designed primarily for researchers'
day-to-day work. Procedures are run as
commands typed at a dot prompt (.).

For example, the following command
generates a crosstabulation of two variables:
table died drug

All output can easily be copied and pasted into
other environments. The above table is in ASCII and
can be copied and pasted as follows:
----------------------------
1 if      |     Drug type
patient   |    (1=placebo)
died      |    1     2     3
----------+-----------------
        0 |    1     8     8
        1 |   19     6     6
----------------------------
Simple as the interface is, Stata is fast at generating output,
including graphics and iteration-intensive procedures such
as maximum likelihood estimation and logistic
regression. Stata can also accommodate
users who prefer menus. A free add-on
module, StataQuest, is available for download (http://www.stata.com/support/quest/),
giving teachers and students easy access to some
commonly used procedures.

2. Speed
Unlike other packages, Stata holds all data
in RAM (Random Access Memory). That makes execution
of procedures much faster than in software packages
that spend time on hard disk access and the generation of
objects. Users can also specify the amount of virtual memory
to be used for large data sets and matrix operations. In
my case, I allocate 8 MB of memory for data (by adding
/k8000 at the command line), which is sufficient for
medium to large data sets with most procedures. The number
of observations and variables is limited only
by the virtual memory or space available on the hard drive. Click
here
to see benchmarks for some commonly used
procedures.
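The memory allocation described above can also be requested from inside a session. The lines below are a minimal sketch, assuming Stata 7 through 11, where the set memory command is available; later releases manage memory automatically and ignore the setting. The 8m figure simply mirrors the /k8000 option mentioned above.

```stata
* Assumption: Stata 7 through 11. Newer releases allocate memory automatically.
* The data in memory must be cleared before the allocation can be changed.
clear
* Request roughly 8 MB for data, comparable to starting Stata with /k8000
set memory 8m
* Report how the allocated memory is currently being used
memory
```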
3. Modularity
This is the feature I like most about
Stata: its "open" architecture permits users to
write their own procedures in simple Stata code.
Stata is composed of two types of commands. The basic
suite of commands (kernel commands) is responsible for
primary operations and "factory-provided"
procedures. The second type consists of ASCII text file
programs, called ado files, which users can write to run
specialized procedures. These interpreted ado files can be
shared with other users, who just need to download the
ASCII files and place them in the appropriate directories.
This design makes upgrading, updating and development
fast and accessible for all parties involved, vendors and
users alike. The built-in connectivity in Stata 7 even
allows running upgrades or updates within the software
via the internet, provided the machine running Stata is
online. As a result, although Stata is a product
of a commercial firm, it in effect resembles an
"Open Source" effort by a multitude of users
and supporters all over the world.
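To make the idea of an ado file concrete, here is a minimal sketch of a user-written command. The command name cv and the file name cv.ado are hypothetical and chosen only for illustration; nothing here is shipped with Stata. It reports the coefficient of variation of one variable using results left behind by the built-in summarize command.

```stata
* cv.ado: hypothetical user-written command, for illustration only
program define cv
    version 7
    * Accept exactly one existing variable name
    syntax varname
    * summarize leaves the mean and standard deviation in r(mean) and r(sd)
    quietly summarize `varlist'
    display "Coefficient of variation of `varlist': " r(sd)/r(mean)
end
```

Saved as cv.ado in one of the directories listed by the adopath command, the program would then be available like any other command, for example by typing cv price with a data set containing a numeric variable named price.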
Beyond the features that make
Stata special, I want to add that the software
is really easy to learn and to become attached to. Once you
start using it, you will stick with it for most of your
analyses. In the following, I show how to get
started with a simple regression procedure and get to
know more about this package.
Using Stata
Stata is composed of four windows by
default. All these windows are adjustable in size and
font according to your personal preference. The biggest
window is the output window, which sits above the
command window for command input. On the left-hand side
are the review window, which records the commands entered, and
the variable window. The review window greatly facilitates data
analysis, which very often involves replicating and
modifying previous procedures. For instance, the
following simple command runs a regression of price on three
independent variables:
regress price mpg headroom rep78

Since regression modeling usually involves
respecification (i.e., modifying the model by changing
the combination of independent variables), users will find
Stata's design really convenient: just click the
previous command in the review window (or
hit the Page Up key) and change or add variables on
the right-hand side.
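The edit-and-rerun cycle described above might look like the following sketch. It assumes the variables come from the auto example data set that ships with Stata (the variable names in the command above match it, but the article does not name the data set) and a release that provides the sysuse command. The particular respecifications are illustrative only.

```stata
* Assumption: the auto example data set shipped with Stata
sysuse auto, clear
* The original model
regress price mpg headroom rep78
* Recall the previous command and add one more independent variable
regress price mpg headroom rep78 weight
* Or swap the combination of independent variables
regress price mpg weight foreign
```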
There is one downside to the software,
however. Stata can only read ASCII data and its own
binary data format, although a conversion package,
Stat/Transfer, is available from the same company for
converting to and from most formats. Another,
less elegant way is to copy and paste the data from other
packages such as Excel into Stata's data window.

Data management tools in Stata are all command-based.
So, if you are not familiar with the syntax, it is
recommended that you create and document all the variables and
cases before importing them into Stata. It will save you much
time otherwise spent figuring out how to create subsets and
conditional variables within Stata.
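For readers who do want to try it within Stata, the sketch below creates one conditional variable and one subset. It again assumes the auto example data set shipped with Stata; the variable name highmpg and the cutoff of 25 are hypothetical choices made only for illustration.

```stata
* Assumption: the auto example data set shipped with Stata
sysuse auto, clear
* Conditional variable: 1 if mpg exceeds 25, 0 otherwise, missing if mpg is missing
generate byte highmpg = (mpg > 25) if mpg < .
label variable highmpg "1 if mpg above 25"
* Subset: keep only the cars with a non-missing repair record
keep if rep78 < .
* Check the new variable in the reduced data set
tabulate highmpg
```

The keep command discards observations from the copy in memory only; the file on disk is unchanged unless the data are saved over it.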
That said, sources of help are plentiful. The company's
website (http://www.stata.com)
provides a plethora of links and resources leading to
downloadable files and answers to most questions. Stata
also provides a low-cost course, NetCourse (http://www.stata.com/info/products/netcourse/),
delivered over the web for those who are serious about
learning Stata from beginner to expert levels. Plus, many
class notes and user-developed manuals are readily
available online from the websites of most big statistics
departments and data centers (e.g., StatLib at
Carnegie Mellon, UCLA, Harvard-MIT
Data Center).
Below are some images produced by Stata:

For a list of statistical procedures, check Stata's
posting at http://www.stata.com/info/capabilities/.
Among all these, the following procedures made me choose
Stata over other packages:
- Pooled cross-sectional time series analyses
(panel studies), featuring robust models and
models with panel-corrected standard errors
- Discrete choice models or limited dependent
variable analysis, featuring all types of logit
and probit models (except multinomial probit at
the time of this writing)
- Simulation-based inference
- Post-modeling diagnostics in categorical data
analysis, featuring a whole suite of tools from J.
Scott Long of Indiana University
Not enough? Check out Stata's website and the links
from UCLA, CMU and Harvard-MIT. Spend a day or two
learning Stata and you'll find it as interesting as it is
rewarding.
Reference:
- Stanislav Kolenikov, 2001.
"Review of Stata 7," Journal of Applied Econometrics,
Vol. 16 (5), pp. 637-646. John Wiley & Sons,
Ltd., available at: http://giganda.komkon.org/~tacik/stata/in.progress/stata-review.pdf
(*) The full version for
student and faculty purchase is priced at $99.