Benchmarks Online

Research and Statistical Support

SAS Corner

By Dr. Karl Ho, Research and Statistical Support Services Manager

Data Capability: Reading OSIRIS data in SAS

What if a pal from somewhere on planet Earth just handed you a mainframe tape carrying some old-time data sets created two decades ago? The label reads: OSIRIS data. You set off on a journey to infinity and beyond, though it will probably lead nowhere. Then SAS comes to the rescue and becomes your Buzz Lightyear. You realize, once again, that learning SAS really is worth your while.

The OSIRIS data format

The OSIRIS data format was a popular format on IBM mainframes in the old days, when SAS and SPSS were still in their budding stage. OSIRIS data come with special electronic codebooks and dictionary files readable by the OSIRIS statistical software. An OSIRIS dictionary is a file that contains the information needed to read the separate OSIRIS data file. In most cases these are "Type 1" dictionaries, which are in binary format and written in EBCDIC (Extended Binary Coded Decimal Interchange Code), the mainframe counterpart of the ASCII character encoding used on PCs. OSIRIS "Type 5" dictionaries are character-format files.
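To see why the character encoding matters, here is a small Python sketch (purely an illustration, not part of the SAS workflow) that shows the same short string as ASCII bytes and as EBCDIC bytes. It assumes the EBCDIC variant is code page 037 (Python's `cp037` codec, EBCDIC US/Canada); which code page a particular tape actually used is an assumption you would need to check.

```python
# Compare the byte values of the same text in ASCII and in EBCDIC.
# Assumption: cp037 (EBCDIC US/Canada) stands in for the mainframe code page.

sample = "OSIRIS 1"

ascii_bytes = sample.encode("ascii")
ebcdic_bytes = sample.encode("cp037")

print("ASCII :", ascii_bytes.hex(" "))   # 4f 53 49 52 49 53 20 31
print("EBCDIC:", ebcdic_bytes.hex(" "))  # d6 e2 c9 d9 c9 e2 40 f1

# Reading EBCDIC bytes as if they were a PC encoding yields gibberish,
# which is why a reader must know which encoding a file uses.
print("EBCDIC misread as Latin-1:", ebcdic_bytes.decode("latin-1"))

# Decoding with the matching code page recovers the original text.
print("EBCDIC decoded as cp037:  ", ebcdic_bytes.decode("cp037"))
```

Every character changes its byte value, so software expecting one encoding cannot read the other without a conversion step.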

SAS has built-in data engines that read OSIRIS (Type 1) data and other older data formats such as BMDP (a biostatistics program, currently owned by SPSS). There are two ways of reading OSIRIS data into SAS: the OSIRIS data engine and PROC CONVERT.

Converting an OSIRIS data file into SAS in the UNIX environment

In the following, I give an example of converting an OSIRIS data file into SAS in the UNIX environment. Once the data file is in SAS, you can convert it into other formats such as SPSS or Excel.

1. Convert the data to EBCDIC format

If the data file (e.g., ICPSR data) is in ASCII format, it needs to be converted to EBCDIC, even in the UNIX environment. A widely available UNIX program, converteb, can do the job:

converteb infile outfile

where infile is the original data file and outfile is the output EBCDIC data file.

2. Define an OSIRIS engine library to read in the data with the dictionary:

libname xxx osiris 'outfile' dict='odict.nnnn';

where xxx is the library name (libref) and odict.nnnn is the name of the dictionary file. Note that the library points to files, not to a directory.

3. Read in the data in a DATA step:

/* Read in the data and convert it into a SAS data set. */

data newdata;
 set xxx.outfile;
run;

4. Now the data set is in SAS format and can be used for processing or converted to other data formats. To do the latter, use PROC COPY to export the data set into a SAS transport file:

libname outlib xport '/data//temp.portable';
proc copy in=work out=outlib;
select newdata;
run;

5. At this point the new data file temp.portable has been created for transport to other platforms or applications. For SPSS users, a single command reads the data in:

get sas data='/data//temp.portable'.
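Before handing temp.portable to SPSS or another application, it can be worth confirming that step 4 really produced a transport file. The minimal Python sketch below assumes the file was written by the XPORT engine in the SAS Version 5 transport layout, whose first 80-byte record is a fixed library header; the helper name `looks_like_xport` is hypothetical.

```python
# Check whether a file starts with the SAS Version 5 transport (XPORT)
# library header record. Hypothetical helper, assuming the V5 XPORT layout.

XPORT_LIBRARY_HEADER_PREFIX = b"HEADER RECORD*******LIBRARY HEADER RECORD!!!!!!!"


def looks_like_xport(path: str) -> bool:
    """Return True if the first 80-byte record looks like an XPORT library header."""
    with open(path, "rb") as f:
        first_record = f.read(80)
    # A valid header record is a full 80 bytes and begins with the fixed text.
    return len(first_record) == 80 and first_record.startswith(
        XPORT_LIBRARY_HEADER_PREFIX
    )
```

For example, `looks_like_xport('/data//temp.portable')` should return True after a successful PROC COPY into an XPORT library.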

With PROC CONVERT, the syntax is simply:

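filename odata 'odata.nnnn';   /* EBCDIC OSIRIS data file */
filename odict 'odict.nnnn';   /* OSIRIS dictionary file */
proc convert osiris=odata dict=odict out=temp;
run;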

Again, the data need to be in EBCDIC format.
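Both routes expect EBCDIC input. If converteb is not installed, the conversion in step 1 can be approximated with the minimal Python sketch below. It is a hypothetical stand-in (`ascii_file_to_ebcdic` is not an existing tool), it assumes code page 037, and it re-encodes line terminators along with everything else; whether the OSIRIS engine on your system expects those terminators, or fixed-length records without them, is an assumption to verify against converteb's own output.

```python
# Hypothetical stand-in for `converteb infile outfile`: re-encode an
# ASCII text file into EBCDIC. Assumption: code page 037 (cp037).


def ascii_file_to_ebcdic(infile: str, outfile: str, codepage: str = "cp037") -> int:
    """Write an EBCDIC copy of an ASCII file and return the number of bytes written.

    Raises UnicodeDecodeError if the input contains non-ASCII bytes.
    Line terminators are re-encoded along with all other characters.
    """
    # newline="" keeps the original line endings instead of translating them.
    with open(infile, "r", encoding="ascii", newline="") as src:
        text = src.read()
    data = text.encode(codepage)  # one byte per character in cp037
    with open(outfile, "wb") as dst:
        dst.write(data)
    return len(data)
```

Calling `ascii_file_to_ebcdic('infile', 'outfile')` plays the same role as `converteb infile outfile` in step 1.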

Data capability is the name of the game.

Data capability is the main selling point of SAS, along with its portability and its availability on a large number of operating platforms. Worth the time and money? I think so, at least until we have a unified format for all data, which in my humble opinion lies in the ultimate remoteness beyond the universe. Until then, I am still a petite, happy SAS programmer.

P.S.

For those of you who are panicking over the expiration messages in SAS, lie back and relax. We will have the mainframe and UNIX licenses updated by the end of November. For Windows and Mac users, the license will last until February 1, 2002.

Happy Turkey Day.

Karl