Examplonia [1] is a small
island country of approximately one million adult citizens. The
population is dispersed across three regions (western, central, and
eastern) and among 15 cities. The census bureau of Examplonia
maintains records of each adult citizen. You can find a description
of the 62 variables collected from Examplonia’s citizens below.
Instructors at the University of North
Texas (UNT) interested in obtaining a random sample of Examplonia’s
data can request one from this page's author. When requesting a sample,
please indicate the following: your first and last name (as listed
in the university directory); the course prefix, number, and title
of the class for which you are requesting the sample; the percentage
of (random) missing values you want the sample to contain (e.g., 0%,
5%, or 10%); and the sample size (i.e., the number of citizens/cases)
you would like the sample to contain. Requested sample data files will
be sent through UNT email as attachments. Each file is plain text
(your_name_date.txt) in comma-delimited format, with variable names
across the top row of data and, if any percentage of missing values is
requested, NA marking each missing value.
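As a rough illustration of the file format described above, the following Python sketch reads a comma-delimited sample with a header row and treats NA as missing. The inline text stands in for a real sample file; its values are invented, and the function names read_sample and missing_fraction are hypothetical helpers, not part of anything distributed with the data.

```python
import csv
import io

def read_sample(handle):
    """Read a comma-delimited sample with a header row; turn 'NA' into None."""
    reader = csv.DictReader(handle)
    rows = []
    for record in reader:
        rows.append({name: (None if value == "NA" else value)
                     for name, value in record.items()})
    return rows

def missing_fraction(rows):
    """Fraction of all cells (rows x columns) that are missing."""
    cells = [value for row in rows for value in row.values()]
    return sum(value is None for value in cells) / len(cells) if cells else 0.0

# Invented stand-in for the contents of a file such as your_name_date.txt
demo_text = "id,region,age,income\n1,I,34,41000\n2,III,NA,52000\n"
demo_rows = read_sample(io.StringIO(demo_text))
print(demo_rows[1]["age"])          # None
print(missing_fraction(demo_rows))  # 0.125
```

With a real sample, the same function could be given a file opened with open("your_name_date.txt", newline=""). All values are read as text here, so numeric columns would still need converting before analysis.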
Section 1: Demographic Variables
The first 7 columns of data are used to describe
the characteristics of the citizens.
Variable name: id
Each adult citizen's data is assigned
a sequential id number which simply identifies them among their
peers.
Variable name: region
Each citizen’s region of residence.
There are three regions in Examplonia (I, II, and III), which correspond
to west, central, and east, respectively.
Variable name: city.names
Each citizen’s city of residence;
there are 15 cities in Examplonia. The western region (I) contains
the cities: Seeatile, Portly, San Francis, Los Angelinas, and San
Dingo. The central region (II) contains the cities: Fargis, Omah,
Tilsa, Astin, and El Piaso. The eastern region (III) contains the
cities: Bahston, New Jork, Washlesston, Carlot, and Myami.
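The region and city listings above can be captured as a lookup table, which is handy for checking that a sampled row's city belongs to its stated region. This is only a sketch: it assumes that region is stored as the codes I, II, and III and that city.names uses exactly the spellings given above, neither of which is confirmed for the data files. The names CITIES_BY_REGION, REGION_OF_CITY, and city_matches_region are hypothetical.

```python
# Cities grouped by region code, as listed in the variable descriptions.
CITIES_BY_REGION = {
    "I": ["Seeatile", "Portly", "San Francis", "Los Angelinas", "San Dingo"],
    "II": ["Fargis", "Omah", "Tilsa", "Astin", "El Piaso"],
    "III": ["Bahston", "New Jork", "Washlesston", "Carlot", "Myami"],
}

# Reverse lookup: city name -> region code.
REGION_OF_CITY = {
    city: region
    for region, cities in CITIES_BY_REGION.items()
    for city in cities
}

def city_matches_region(row):
    """True when the row's city belongs to the row's region (assumed coding)."""
    return REGION_OF_CITY.get(row["city.names"]) == row["region"]

print(city_matches_region({"region": "II", "city.names": "Omah"}))   # True
print(city_matches_region({"region": "I", "city.names": "Myami"}))   # False
```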
Variable name: gender
Each citizen’s gender; male or
female.
Variable name: age
Each citizen’s age; number of years.
Variable name: education
The number of years of formal
education for each citizen.
Variable name: income
The annual income of each citizen.
Section 2: Engagement/Activity Survey (3 subscales/domains)
The next 14 columns of data represent survey
questions assessing the levels of cognitive, physical, and social
engagement of each citizen.
Variable names:
q1.cognitive.1, q2.cognitive.2, q3.cognitive.3, q4.cognitive.4
These four variables contain the
4-point Likert responses to questions assessing the cognitive
engagement of citizens. Response choices were: Strongly Agree,
Agree, Disagree, and Strongly Disagree. Strong agreement indicates
greater cognitive engagement or activity of intellectual abilities.
Variable names: q5.physical.1,
q6.physical.2, q7.physical.3, q8.physical.4, q9.physical.5
These five variables contain the
5-point Likert responses to questions assessing the physical
activity (engagement) of citizens. Response choices were:
Hyperactive, More Active, No Difference, Not Very Active, and
Lethargic. Hyperactivity represents greater (i.e. more frequent and
intense) physical activity.
Variable names: q10.social.1,
q11.social.2, q12.social.3, q13.social.4, q14.social.5
These five variables contain the 4-point
Likert responses to questions assessing the social engagement of
citizens. Response choices were: Strongly Agree, Agree,
Disagree, and Strongly Disagree. Strong agreement indicates more
social engagement, community activity, and person-to-person
activity.
Section 3: Personality and Socio-Political Values (6 subscales/domains)
The next 34 columns of data contain numeric
scores assessing a variety of personality characteristics and
citizen values or opinions.
Variable names: neuroticism,
extroversion, openness, conscientiousness, agreeableness
These five variables contain scores
similar to those produced by the NEO PI-R (Costa & McCrae, 1992), which reflect each citizen’s propensity to display certain behaviors
and thought patterns. Higher scores indicate more of the variable’s
personality trait.
Variable names: nuclear, coal,
nat.gas.electric, wind, solar
These five variables contain scores
which reflect each citizen’s opinion on the use of various energy
sources (domestic electricity production – for household,
commercial, and governmental use). Higher scores indicate more
favorable views toward the widespread use of a particular source of
energy.
Variable names: automobile,
bus, train, bicycle, walk
These five variables contain scores
which reflect each citizen’s opinion on the use of various
transportation methods. Issues of personal and mass transit were
addressed across a variety of distances (e.g., short trips and long
commutes). Higher scores represent more favorable views toward the
widespread use/utility of a particular variable’s transportation
method.
Variable names: gasoline,
nat.gas.car, hybrid, electric, other
These five variables contain scores
which reflect each citizen’s opinion on the use of various vehicle
types, specifically vehicle propulsion methods. Note that
diesel was included in the ‘gasoline’ variable, and steam,
hydrogen, and biofuels (e.g., ethanol) were included in the ‘other’
variable. Higher scores represent more favorable views toward the
widespread use/utility of a particular variable’s propulsion method.
Variable names:
animal.extinction, plant.extinction, severe.storms, ice.melt,
sea.rise
These five variables contain scores
which reflect each citizen’s opinion on the consequences of
human-induced climate change. Higher scores reflect greater anxiety about
the impact of a particular consequence.
Variable names: religion,
abortion, national.debt, unemployment, healthcare, public.edu,
campaign.finance, business.regulation
These eight variables contain scores
which indicate each citizen’s opinion on specific social or
political issues facing Examplonia. Higher scores represent strong
beliefs that a particular issue represents a threat to national
solidarity.
Variable name:
social.responsibility
This variable contains a score which
represents the level of obligation each citizen feels toward the
healthy maintenance of their country and its people. Higher scores
represent a greater sense of social responsibility.
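The Section 3 variable names listed above can be collected into a single structure, for example to confirm that a sample file contains all 34 columns. In this sketch the group labels (personality, energy, and so on) and the helper missing_columns are hypothetical; only the variable names themselves come from the descriptions above.

```python
# Section 3 variable names, grouped under hypothetical labels.
SECTION_3_GROUPS = {
    "personality": ["neuroticism", "extroversion", "openness",
                    "conscientiousness", "agreeableness"],
    "energy": ["nuclear", "coal", "nat.gas.electric", "wind", "solar"],
    "transportation": ["automobile", "bus", "train", "bicycle", "walk"],
    "vehicle": ["gasoline", "nat.gas.car", "hybrid", "electric", "other"],
    "climate": ["animal.extinction", "plant.extinction", "severe.storms",
                "ice.melt", "sea.rise"],
    "social_political": ["religion", "abortion", "national.debt",
                         "unemployment", "healthcare", "public.edu",
                         "campaign.finance", "business.regulation"],
    "social_responsibility": ["social.responsibility"],
}

# Flatten the groups into one list of column names.
section_3_columns = [name for names in SECTION_3_GROUPS.values() for name in names]

def missing_columns(header):
    """Section 3 names that do not appear in a file's header row."""
    present = set(header)
    return [name for name in section_3_columns if name not in present]

print(len(section_3_columns))  # 34
```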
Section 4: Health
The next 7 columns of data contain information
about the health of each citizen.
Variable name: tobacco.user
This variable simply reports whether
or not the citizen is a tobacco user: yes, no.
Variable name: blood.type
This variable contains the blood type
of each citizen: O positive, A positive, B positive, AB positive, O
negative, A negative, B negative, and AB negative.
Variable name: bmi
This variable contains the Body Mass
Index (BMI) of each citizen, that is, weight in kilograms divided by
height in meters squared. Higher numbers generally indicate more body
fat.
Variable name: sys.bp
This variable contains each citizen’s
Systolic Blood Pressure, averaged over readings taken at the beginning
and end of their most recent doctor’s visit. Higher numbers represent
higher blood pressure; extremely high or low numbers represent a
cardiovascular health risk.
Variable name: wbc.count
This variable contains each citizen’s
White Blood Cell (WBC) count, measured as the number of cells per
microliter (mcL). Higher numbers indicate higher concentrations of
WBC. Higher numbers generally indicate better health; extremely low
numbers can indicate a risk to immune system health.
Variable name: glucose
This variable contains each citizen’s
blood glucose level, measured in millimoles per liter (mmol/L).
Extremely low levels can indicate a risk of hypoglycemia; extremely
high levels can indicate a risk of hyperglycemia, and long-term
hyperglycemia can present a risk for developing diabetes.
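Because glucose is recorded in mmol/L, readers more used to mg/dL may want to convert it. The sketch below uses the molar mass of glucose (about 180.16 g/mol), so that 1 mmol/L corresponds to roughly 18.016 mg/dL. The function name is hypothetical, and the factor applies to glucose only, not to other blood measures.

```python
# 180.16 g/mol means 1 mmol/L = 180.16 mg/L = 18.016 mg/dL (glucose only).
GLUCOSE_MG_DL_PER_MMOL_L = 18.016

def glucose_mmol_to_mg_dl(mmol_per_l):
    """Convert a blood glucose value from mmol/L to mg/dL."""
    return mmol_per_l * GLUCOSE_MG_DL_PER_MMOL_L

print(round(glucose_mmol_to_mg_dl(5.5), 1))  # 99.1
```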
Variable name: ldl.cholesterol
This variable contains the blood
level of Low-Density Lipoprotein (LDL) cholesterol for each citizen,
measured in milligrams per deciliter (mg/dL). Extremely high levels
of LDL cholesterol are associated with a risk for cardiovascular
disease.
Footnotes:
[1] Examplonia is a fictional
country which allows some (somewhat) meaningful context for
statistical analysis examples. The population data for Examplonia
was generated to provide a statistical population from which random
samples could be drawn for the completion of example statistical
analysis problems. The current population data (March 15, 2012)
contains a variety of univariate and multivariate statistical models
and/or effects. The idea for creating Examplonia originated (for
this author) with Bethlehem (2009).
Other notes:
The population of Examplonia can
change over time: new variables can be added, and the nature of the
relationships between variables may be adjusted.
References
Bethlehem, J. (2009). Applied
Survey Methods: A Statistical Perspective. Hoboken, NJ: John
Wiley & Sons.
Costa, P. T., Jr., & McCrae, R. R. (1992). NEO PI-R professional
manual. Odessa, FL: Psychological Assessment Resources, Inc.