Skip Navigation Links
Link to the last RSS article here: Two User Interfaces to the R Statistical System: Rcgi and R Commander - Ed.
The Ins(heet) and Outs(heet) of Importing and Exporting non-Stata Formatted Data
By Patrick McLeod, Research and Statistical Support Services Consultant
One common thread that most fields of social science share is that our data is found or arrives in a bewildering array of different formats: Delimited text, CSV (Comma Separated Values), Excel spreadsheets, or whatever other statistical package is in use in your field that you do not use for research. This article will discuss how to import some of the more common file formats into Stata (CSV, Excel, and fixed-format data) and argue for Stata users (and all users of statistical software) to save data in a platform independent, standardized format.
Copy and Paste
The easiest way to import Excel data into Stata 8 is to open Stata 8, then open the Data Editor (the button to the left of the Data Browser button (the spreadsheet with the magnifying glass on it)). Open your Excel spreadsheet and highlight what data you wish to import into Stata. Right click (if you have a 2 button mouse; otherwise select the “copy” function) and copy your selection. Either click on the first cell in the empty Data Editor or highlight a range of cells appropriate to your copied selection and select “paste.”
The most-invoked data importation command in my Stata experience has been -insheet-. This command reads most spreadsheet and database generated data into Stata. From the description of –insheet-:
insheet reads into memory a disk dataset that is not in Stata format. insheet is intended for reading files created by spreadsheet or database programs. Regardless of the creator of the file, insheet reads text (ASCII) files where there is one observation per line and the values are separated by tabs or commas. In addition, the first line of the file may contain the variable names.
To save yourself some valuable time and, more importantly, some sanity, I suggest saving your Excel files as *.csv (Comma Separated Values) files before you attempt to import them into Stata using –insheet-. If you attempt to import an Excel spreadsheet into Stata, you will usually get the following result:
. insheet using E:\exampledata.xls
(1 var, 0 obs)
If you save the file “exampledata.xls” as “exampledata.csv” before you issue the –insheet- command (“File” – “Save As” – “Save As Type” – “CSV (Comma delimited) *.csv”), you will get the following successful result:
. insheet using E:\exampledata.csv
(71 vars, 889 obs)
On a wider note, I maintain that working with data in CSV format is the easiest way to manage data across platforms. All major statistical packages easily handle importing CSV formatted data and all major statistical packages easily handle exporting data to CSV format. An example of how you would do this in Stata follows (assuming we have the dataset above in memory and that we are exporting it to the A: drive):
. outsheet using A:\exampledata.csv, comma
Issuing this command will export your current Stata data to the file path as a CSV file that can be read into Stata, read into another platform or worked with in Excel.
The –infix- command is used for reading fixed-format ASCII text data into Stata. Data can either be read into Stata by assignment within the –infix- statement:
. infix rate 1-4 speed 6-7 acc 9-11 using E:\highway.dat
Fixed format data can also be read into Stata using –infix- by specifying a dictionary file that defines the data within the –infix- statement:
. infix using E:\highway.dct
As in the –insheet- command, the Stata user can write whatever data he or she is working with in Stata to a fixed-format file for output. To export to a raw data file the command would be:
. outfile using E:\highway.dat
Exporting to raw data with a dictionary file (a dictionary file in Stata format) would look like:
. outfile using E:\highwaydct, dictionary
Using conventional highlight-copy-and-paste and the –insheet- and –infix- commands will solve most platform independent data importation into Stata. For the importation of data from proprietary platforms (such as SPSS, SAS, S-Plus, or EViews) into Stata, the best solution is to save the data in the particular platform in CSV format, or if you do not have access to the platform, to use a program such as Stat/Transfer to convert the data into CSV or Stata format.
I advocate for all data being in a standard, easily readable and easily writeable format. In my experiences, that is comma separated values format: All major statistical packages easily read and write out to CSV format, and it is easily read and written to in Excel, making it an ideal format for saving data for replication purposes or for shipping data between researchers using different platforms on the same project.
Export your data to CSV format when possible! Your fellow researchers, current graduate students and future graduate students will thank you for it!