
Link to the last RSS article here:
Two User Interfaces to the R Statistical System: Rcgi and R Commander - Ed.
The Ins(heet) and Outs(heet) of Importing and
Exporting non-Stata Formatted Data
By Patrick McLeod,
Research and Statistical Support Services Consultant
One common thread that most fields of social
science share is that our data is found or arrives in a bewildering
array of different formats: Delimited text, CSV (Comma Separated
Values), Excel spreadsheets, or whatever other statistical package
is in use in your field that you do not use for research. This
article will discuss how to import some of the more common file
formats into Stata (CSV, Excel, and fixed-format data) and argue for
Stata users (and all users of statistical software) to save data in
a platform independent, standardized format.
Copy and Paste
The easiest way to import Excel data into Stata
8 is to open Stata 8, then open the Data Editor (the button to the
left of the Data Browser button (the spreadsheet with the magnifying
glass on it)). Open your Excel spreadsheet and highlight what data
you wish to import into Stata. Right click (if you have a 2 button
mouse; otherwise select the “copy” function) and copy your
selection. Either click on the first cell in the empty Data Editor
or highlight a range of cells appropriate to your copied selection
and select “paste.”
Insheet
The most-invoked data importation command in my
Stata experience has been -insheet-. This command reads most
spreadsheet and database generated data into Stata. From the
description of –insheet-:
insheet reads into memory a disk dataset that
is not in Stata format. insheet is intended for reading files
created by spreadsheet or database programs. Regardless of the
creator of the file, insheet reads text (ASCII) files where there is
one observation per line and the values are separated by tabs or
commas. In addition, the first line of the file may contain
the variable names.
To save yourself some valuable time and, more
importantly, some sanity, I suggest saving your Excel files as *.csv
(Comma Separated Values) files before you attempt to import them
into Stata using –insheet-. If you attempt to import an Excel
spreadsheet into Stata, you will usually get the following result:
. insheet using E:\exampledata.xls
(1 var, 0 obs)
If you save the file “exampledata.xls” as
“exampledata.csv” before you issue the –insheet- command (“File” –
“Save As” – “Save As Type” – “CSV (Comma delimited) *.csv”), you
will get the following successful result:
. insheet using E:\exampledata.csv
(71 vars, 889 obs)
On a wider note, I maintain that working with
data in CSV format is the easiest way to manage data across
platforms. All major statistical packages easily handle importing
CSV formatted data and all major statistical packages easily handle
exporting data to CSV format. An example of how you would do this in
Stata follows (assuming we have the dataset above in memory and that
we are exporting it to the A: drive):
. outsheet using A:\exampledata.csv, comma
Issuing this command will export your current
Stata data to the file path as a CSV file that can be read into
Stata, read into another platform or worked with in Excel.
Infix
The –infix- command is used for reading
fixed-format ASCII text data into Stata. Data can either be read
into Stata by assignment within the –infix- statement:
. infix rate 1-4 speed 6-7 acc 9-11 using
E:\highway.dat
Fixed format data can also be read into Stata
using –infix- by specifying a dictionary file that defines the data
within the –infix- statement:
. infix using E:\highway.dct
As in the –insheet- command, the Stata user can
write whatever data he or she is working with in Stata to a
fixed-format file for output. To export to a raw data file the
command would be:
. outfile using E:\highway.dat
Exporting to raw data with a dictionary file (a
dictionary file in Stata format) would look like:
. outfile using E:\highwaydct, dictionary
Conclusions
Using conventional highlight-copy-and-paste and
the –insheet- and –infix- commands will solve most platform
independent data importation into Stata. For the importation of data
from proprietary platforms (such as SPSS, SAS, S-Plus, or EViews)
into Stata, the best solution is to save the data in the particular
platform in CSV format, or if you do not have access to the
platform, to use a program such as Stat/Transfer to convert the data
into CSV or Stata format.
I advocate for all data being in a standard,
easily readable and easily writeable format. In my experiences, that
is comma separated values format: All major statistical packages
easily read and write out to CSV format, and it is easily read and
written to in Excel, making it an ideal format for saving data for
replication purposes or for shipping data between researchers using
different platforms on the same project.
Export your data to CSV format when possible!
Your fellow researchers, current graduate students and future
graduate students will thank you for it!
Return to top
|