The University of North Texas, along
with other universities throughout the world, belongs to a consortium
that offers an extensive archive of data sets to its members. The
consortium is called the Inter-university Consortium for Political and
Social Research (ICPSR). Typically referred to as "ICPSR datasets,"
the data files are often very large and usually of very high quality, are
based on professional and sophisticated sampling methods, and cover a
wide array of social, psychological, economic, political, business and
market, and other areas of interest. Many are longitudinal files that
lend themselves well to time series approaches. Various countries
throughout the world are represented as study populations, often
within the same study, which allows for foreign and
comparative national studies. Still others contain historical data for
historical research. Census data for various regions
are also available.
As a member of the Consortium, the University of North Texas (that is,
its faculty and students) may use any of the datasets in the ICPSR
archive at no additional cost beyond annual membership fees (i.e.,
there is no cost for individual users). Over time, as faculty and
students request and use ICPSR datasets, the UNT collection grows.
Hence, UNT already holds many ICPSR datasets that are available for
immediate use by faculty and students who know how to access them.
Access to our collection of ICPSR datasets requires a minimal knowledge
of Job Control Language (JCL), presented below. Datasets that are not
already in the UNT archive can be ordered (electronically) and are
typically accessible within 2 to 4 weeks. Contact someone in the
Statistical and Research Support group at Academic Computing Services,
565-2324, if you are interested in ordering a particular ICPSR study.
ICPSR publishes a comprehensive catalog of its
archival holdings annually. Typically, the first step in the process
of making use of ICPSR datasets is simply to look through the ICPSR
catalog and determine which dataset you are interested in using. This
catalog can be found in various locations around the UNT campus: many
individual faculty members and academic department offices have
copies, a copy is available on reserve at Willis Library, and the
Statistical and Research Support group has a copy.
The
second step is to find out whether or not UNT has the data file you
select in its archival holdings. The easiest way to do this is to
search an index file of current holdings. This file is a CMS file
called current icpsr d, located on the public d disk of
the Academic Mainframe. Use the following steps to search this file:
Doing this will result in one of two outcomes. One
possibility is that CMS will respond with the message DATA NOT FOUND.
In this case, UNT does not have the particular study in its archive
and you will need to contact Academic Computing to order the dataset
if you are still interested in it. Again, it usually takes from 2 to 4
weeks to access a dataset that must be ordered from ICPSR. The
second possible outcome from step 3 occurs if UNT does have the
particular study in its archive. In this case, after entering /####,
CMS will position the screen to the line where information concerning
that study is found.
For example, if you were
interested in the AIDS Supplement to the 1987 National Health
Interview Survey, you would have determined from the ICPSR catalog that
the ICPSR study number for that study is 9271. In this instance, in
step 3 you would enter /9271. CMS would respond by generating
the screen of information shown below.
Sample ICPSR Output on CMS
9271
National Health Interview Survey, 1987: AIDS Supplement
105595
9273
Annual Data on Nine Economic and Military Characteristics of 78 Nations, 48-83
105160
9275
General Social Survey Cumulative File, 1972-1989
105645
9286
International Crisis Behavior Project, 1918-1988
301929 300768 105160
9287
Offender Based Transaction Statistics (OBTS), 1987: Alaska, Ca., Del., Minn.,
105160
9300
World Tables of Economic and Social Indicators, 1950-1988
103652 2 3 105719 105246
9303
Detroit Area Study, 1981: A Study of the Family
105612
9304
There are three
lines of information for each ICPSR study. The first line contains only
the ICPSR study number. Hence, you can see that the first three lines
of the page of the above CMS file refer to ICPSR study number 9271.
Focusing our attention now on just these three lines, we have:
9271
National Health Interview Survey, 1987: AIDS Supplement
105595
Of course, the second line is
the ICPSR study name. The third line of information is the
Volume/Serial number of the UNT archive tape that contains the data
(including codebook(s), dictionary, SPSS or SAS code for reading data,
etc., depending on which supplementary files are available for the
particular study requested). In this case, we can see that files for
ICPSR study number 9271 can be found on the UNT tape with
Volume/Serial number 105595.
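The three-line layout described above can be sketched in Python. This is only an illustration and not part of the UNT procedure: it assumes the index has been copied out of CMS as plain text lines, and the names StudyEntry and parse_index are hypothetical.

```python
from typing import NamedTuple


class StudyEntry(NamedTuple):
    study_number: str
    title: str
    volsers: list[str]  # Volume/Serial number(s) listed on the third line


def parse_index(lines: list[str]) -> dict[str, StudyEntry]:
    """Group index lines in threes: study number, study name, Volume/Serial line."""
    cleaned = [line.strip() for line in lines if line.strip()]
    entries = {}
    # Step through complete three-line records; a trailing partial record
    # (for example, one cut off at the bottom of the screen) is ignored.
    for i in range(0, len(cleaned) - len(cleaned) % 3, 3):
        number, title, volser_line = cleaned[i:i + 3]
        entries[number] = StudyEntry(number, title, volser_line.split())
    return entries


sample = [
    "9271",
    "National Health Interview Survey, 1987: AIDS Supplement",
    "105595",
    "9273",
    "Annual Data on Nine Economic and Military Characteristics of 78 Nations, 48-83",
    "105160",
    "9304",  # incomplete record at the end of the screen
]
index = parse_index(sample)
print(index["9271"].volsers)
```

Looking up a study number that is not in the dictionary corresponds to the DATA NOT FOUND outcome described earlier.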
The next step in the process of
accessing this study data is to learn a little more about the
filenames and file numbers of the files associated with the study. In
particular, you will need to find the values of certain parameters so
that you can include the JCL lines shown in the table titled "JCL for
SPSS Job to Read ICPSR Data" in your SPSS program that reads the data.
The JCL
statements in the table (followed by the beginning of an SPSS program)
should be included at the beginning of the SPSS program (or SAS
program; for SAS, substitute EXEC SAS for EXEC SPSSX) that you SUBMIT to
MVS in order to read the data. In place of idnn you should type in
your CMS User-ID; also, type in your name instead of Your Name on
line one. In place of mvspw, put your MVS password. Note that your
MVS password was originally the same as your CMS logon password;
however, if you changed your logon password, your MVS password remains
the same as before. Be sure to use the correct password in this field!
The information that you must determine and supply in order
to run this program is that which is specific to the particular data
file you want to use. The information is included in the Data
Definition (DD) statement in that program code; it is the line
that begins with //DATAIN DD. The fields of information unique to this
study are the file name (DSN=), the type of tape (UNIT=), the
Volume/Serial number (VOL=SER=), and the file number (the first value
in LABEL=).
As
mentioned previously, the Volume/Serial number is determined from a
search of the CMS archive listing file, current icpsr d. You will use
this Volume/Serial number in a separate MVS program to gather the
remaining information (file name, file number, and tape format) needed
to write the above Data Definition statement. This latter MVS program
is commonly referred to as a "tape map" program. It describes the file
names and file numbers for each file contained on a UNT tape. (Note:
each Volume/Serial number identifies a unique UNT archive tape, and
each tape contains from one to many files. Hence, the file(s) you are
interested in may be just one file among many on a UNT archive tape.)
ICPSR Tapemap Program
//idnnMAP JOB (idnn,:30,1),your name,CLASS=B,PASSWORD=XXXXXX
/*ROUTE PRINT UNTVM1.idnn
/*ROUTE PUNCH UNTVM1.idnn
//TAPEMAP PROC VOL=IDUNNO
//MAPPIT EXEC PGM=TAPEMAPS
//SYSUT1 DD LABEL=(1,BLP,EXPDT=98000),
// VOL=SER=&VOL,DISP=SHR,UNIT=TAPE9
//STEPLIB DD DSN=SYS2.A000.MVS.UTILS.LOAD,DISP=SHR
//SYSPRINT DD SYSOUT=A
//SYSUDUMP DD SYSOUT=A
// PEND
//MAP EXEC TAPEMAP,VOL=105595
JCL for SPSS Job to Read ICPSR Data
//idnnSPSS JOB (idnn,:05,1),Your Name,CLASS=A,PASSWORD=mvspw
/*ROUTE PUNCH UNTVM1.idnn
/*ROUTE PRINT UNTVM1.idnn
// EXEC SPSSX
//DATAIN DD DSN=ICPSR.DA9271,UNIT=TAPE9,DISP=SHR,
// VOL=SER=105595,LABEL=(20,SL)
data list file = DATAIN /v1 1-2 v2 3-5...
An
example of the tape map program that you will need to run in order to
determine the specific information you need to access a dataset is
shown in the table titled "ICPSR Tapemap Program." A sample tape map
program is also located on the CMS public d disk; you may copy it to
your own CMS account, modify it for your particular use, and submit it
from your account. To copy this
program, called icpsr tapmap d, to your CMS account, issue the following
command from the CMS Ready; prompt:
copyfile icpsr tapmap d = = a
The tape map
program that you would SUBMIT for the current example (study number
9271 on Volume/Serial = 105595) would appear as shown in the "ICPSR
Tapemap Program" table.
In place of idnn, your name, and XXXXXX,
put your CMS User-ID, actual name, and MVS password, respectively. The
UNIT parameter refers to the kind of tape on which the data are stored in the
UNT archive: either reel (designated as TAPE9) or
cartridge (designated as TAPECR). You can tell which of these to use
by noting the value of the Volume/Serial number associated with the
tape. Volume/Serial numbers beginning with a 1 (that is, from 100000
to 199999) are reel format; for these you would use UNIT=TAPE9 in the
above tape map program. Volume/Serial numbers beginning with a 3 (that
is, from 300000 to 399999) are cartridge format; for these you would use
UNIT=TAPECR in the above tape map program. Since, in the present
example, the Volume/Serial number begins with a 1 (105595), we would
use UNIT=TAPE9. Finally, in the last line of the program, use the
actual Volume/Serial number that you found in your search of the
current icpsr d document on CMS.
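The rule for choosing the UNIT value can be written as a small Python helper. This is a sketch for illustration only; the function name unit_for_volser is hypothetical, and the treatment of numbers outside the two ranges given in the text (raising an error) is an assumption.

```python
def unit_for_volser(volser: str) -> str:
    """Return the UNIT value for a six-digit UNT Volume/Serial number."""
    volser = volser.strip()
    if len(volser) != 6 or not volser.isdigit():
        raise ValueError(f"expected a six-digit Volume/Serial number, got {volser!r}")
    if volser[0] == "1":
        return "TAPE9"   # 100000 to 199999: reel
    if volser[0] == "3":
        return "TAPECR"  # 300000 to 399999: cartridge
    # Assumption: the text describes no other ranges, so refuse to guess.
    raise ValueError(f"no tape type is described for Volume/Serial {volser}")


print(unit_for_volser("105595"))
```

For the running example (Volume/Serial 105595) the helper returns TAPE9, matching the choice made in the text.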
After you submit the
tape map program to MVS, the output is returned to your reader
list. All of the information relevant to your needs is located near
the end of this output. For the current example, the output of
interest looks like that in the table on page 13.
It helps to be able to interpret this
information. You will notice that on this particular UNT tape there
are a total of 22 files. All of the files appear to be various kinds
of ICPSR files; that is, each is associated with a particular ICPSR
study. Notice that only one file is associated with the study we are
interested in: it is file number 20, with the file name
ICPSR.DA9271. Information about the nature of each file is contained
in the file name itself. The letters preceding the ICPSR study numbers
in the file names essentially tell what kind of file each is. The
following key may be useful in interpreting these letters:
From the tape map output, we gather the file name (ICPSR.DA9271) and
the file number (20). Together with the type of tape and the
Volume/Serial number, this information is
included in the data definition (DD) statement of the SPSS or SAS
program that you submit to read the data. The table titled "Sample JCL
to Access File ICPSR.DA9271" is an
example of the JCL required to access this data (filename, file
number, type of tape, and Volume/Serial number are in bold).
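To show how the four values fit into the DD statement, here is a minimal Python sketch that formats the two-line //DATAIN statement in the same layout as the sample JCL in this article. The function name datain_dd is hypothetical, and the SL label type and DISP=SHR are simply copied from the example rather than derived from any rule.

```python
def datain_dd(dsn: str, volser: str, file_number: int, unit: str) -> list[str]:
    """Return the two-line DATAIN DD statement in the layout used in the text."""
    return [
        # First card: file name and type of tape.
        f"//DATAIN DD DSN={dsn},UNIT={unit},DISP=SHR,",
        # Continuation card: Volume/Serial number and file number on the tape.
        f"// VOL=SER={volser},LABEL=({file_number},SL)",
    ]


for line in datain_dd("ICPSR.DA9271", "105595", 20, "TAPE9"):
    print(line)
```

Calling it with the values for study 9271 reproduces the DD statement used in the sample SPSS job.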
In order to write an SPSS data list or SAS input statement to
read the data, you need to know which variables are
located in which columns in the raw data. This information is usually
found in a codebook associated with the particular study/dataset. For
many ICPSR studies, codebooks are available in hard copy format
at Willis Library. Some studies, however, do not come with
hard copy versions of the codebook; these usually have electronic
versions that can be downloaded from tape, stored in your CMS account,
and read within CMS (i.e., using browse) or sent from CMS to the
printer in ISB to generate a printout version. In order to download an
electronic version of an ICPSR codebook from MVS to your CMS account,
you need to know the same information about the codebook file as you
did for the actual data file. That is, you need to know the Volume/Serial
number, filename, file number, and the type of tape it is stored on in
the UNT tape archive. Codebook files are located on the same tapes as
the corresponding study data; hence, the Volume/Serial number is the same
as for the data file. The other information (filename, file number,
type of tape) is found in the same way as for the data file: it can be
found in the same tape map output that was generated for purposes of
finding information on the data file.
In the above example, there was no codebook associated with the
study. In cases such as this, check with the library to see if there
is a hard copy available. If, instead of the Health Interview Survey, we had been
interested in ICPSR study number 8352, the German Election Study,
1983, we would have found an electronic version of the codebook. In the
tape map output, we can see that file number 12, with filename
ICPSR.CB8352, is the codebook for the German Election data. Since it is
located on the same Volume/Serial tape (105595), it is also on reel
(hence, UNIT=TAPE9). With this information, it is possible to run an
IEBGENER program that will read this file (the codebook) off the MVS tape and
write it to your CMS account. The program to do this is shown in the
table titled "Sample JCL to Read and Write a Codebook File."
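The file-name convention used in these examples can be illustrated with a short Python sketch. It is an assumption-laden illustration, not the article's key: only the two prefixes that appear in the text (DA for the data file ICPSR.DA9271 and CB for the codebook ICPSR.CB8352) are recognized, and the names KNOWN_PREFIXES and describe_file are hypothetical.

```python
import re

# Only these two prefixes are confirmed by the examples in the text.
KNOWN_PREFIXES = {"DA": "data file", "CB": "codebook"}


def describe_file(dsn: str) -> tuple[str, str]:
    """Split a name like ICPSR.DA9271 into (study number, kind of file)."""
    match = re.fullmatch(r"ICPSR\.([A-Z]+)(\d+)", dsn.strip().upper())
    if match is None:
        raise ValueError(f"not an ICPSR file name: {dsn!r}")
    prefix, study_number = match.groups()
    # Prefixes not covered by the examples are reported as unknown, not guessed.
    return study_number, KNOWN_PREFIXES.get(prefix, "unknown")


print(describe_file("ICPSR.DA9271"))
print(describe_file("ICPSR.CB8352"))
```

Any other letters in a tape map listing should be looked up in the key rather than inferred from this sketch.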
October 3, 1995
Sample JCL to Access File ICPSR.DA9271
//idnnSPSS JOB (idnn,:05,1),Your Name,CLASS=A,PASSWORD=XXXXXX
/*ROUTE PUNCH UNTVM1.idnn
/*ROUTE PRINT UNTVM1.idnn
// EXEC SPSSX
//DATAIN DD DSN=ICPSR.DA9271,UNIT=TAPE9,DISP=SHR,
// VOL=SER=105595,LABEL=(20,SL)
data list file = DATAIN /v1 1-2 v2 3-5...
Information derived from the tape map
procedure is in bold; be sure to customize this file by supplying your
own CMS User-ID, your name, and your MVS password where appropriate.
After this job executes successfully, a file (i.e., the
codebook) will appear in your CMS reader, which you may then
receive into your CMS filelist. You can then read the codebook
directly or make a printout of it (using the mainframe printer) in order
to obtain the information necessary to write the data list (SPSS) or
input (SAS) statement to read the data. From there on out, it's your
research: you can do any statistical analysis of the data that you
wish.
Sample JCL to Read and Write a Codebook File
//idnnCDBK JOB (idnn,1,15),Your Name,CLASS=B,PASSWORD=XXXXXX
/*ROUTE PUNCH UNTVM1.idnn
/*ROUTE PRINT UNTVM1.idnn
// EXEC PGM=IEBGENER
//SYSPRINT DD SYSOUT=(A,,LP2X)
//SYSIN DD DUMMY
//SYSUT1 DD DSN=ICPSR.CB8352,UNIT=TAPE9,DISP=SHR,
// VOL=SER=105595,LABEL=(12,SL)
//SYSUT2 DD SYSOUT=A
/*
//