|
UIT
| ACUS |
Help Desk
|
Training |
About Us |
Publications
| RSS
Home
|
Back to the
Do it yourself
Introduction to R |
(1) Reading data into R directly from a URL.
Reading data into R from the web is very
easy: you simply give the URL where the file is located as the
file argument of the function. There are three common functions
for reading, or importing, data into R, regardless of where the
data is stored. Those functions are 'read.table' (for most text
files, which typically have the extension .txt), 'read.csv' (for
comma separated values files, which have the extension .csv),
and 'read.spss' (for SPSS data files, which have the file
extension .sav). Note, however, as was discussed in the previous
tutorial, that before importing SPSS data files (.sav) you must
first load the 'foreign' package with the 'library' function.
An example of reading a text (.txt) file into R and naming the data "example.3":
example.3 <- read.table("http://www.unt.edu/rss/class/Jon/R_SC/Module3/ExampleData3.txt",
header=TRUE, sep="", na.strings="NA", dec=".", strip.white=TRUE)
summary(example.3)
Notice, the function is 'read.table'. The
first argument to that function is the location of the file, in
quotations, giving the URL. The 'header' argument specifies
whether or not the file has a header of names as the first line
(top row). The 'sep' argument specifies what character separates
the data points on each line (e.g., a comma or a tab between
each of the columns of data); the value "" used above means any
white space (one or more spaces or tabs). The 'na.strings'
argument specifies which strings in the file should be read as
missing values, the R default being "NA". The 'dec' argument
specifies what character is used for a decimal point. The
'strip.white' argument specifies whether or not the function
will remove the white space from before and after unquoted
character fields; it is used only when 'sep' has been specified.
The 'read.table' function can be used with comma separated value
files, but the 'read.csv' function is more convenient (i.e. the
two functions are virtually the same except that 'read.csv'
already sets 'header=TRUE' and 'sep=","' by default, so those
arguments do not need to be supplied).
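To see how 'read.csv' relates to 'read.table', the following is a
minimal sketch that writes a small comma separated file to a
temporary location and reads it back both ways. The file contents
and the object names (tmp, via.table, via.csv) are made up for
illustration and are not part of the tutorial data; a URL in
quotations could be used in place of the temporary file path.

```r
# Write a small, made-up CSV file to a temporary location (illustration only).
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,score,group",
             "1,3.5,a",
             "2,NA,b",
             "3,4.25,a"), tmp)

# read.table must be told about the header row and the comma separator.
via.table <- read.table(tmp, header = TRUE, sep = ",",
                        na.strings = "NA", dec = ".")

# read.csv already defaults to header = TRUE and sep = ",".
via.csv <- read.csv(tmp)

# For a simple file like this one, both calls should give the same data frame.
all.equal(via.table, via.csv)
summary(via.csv)
```

Because the separator is set explicitly in both calls, the "NA" in
the second row is read as a missing value rather than as text.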
An example of reading an SPSS (.sav) file
into R and naming the data "example.1":
library(foreign)
example.1 <- read.spss("http://www.unt.edu/rss/class/Jon/R_SC/Module3/ExampleData1.sav",
use.value.labels=TRUE, max.value.labels=Inf, to.data.frame=TRUE)
summary(example.1)
Notice above, the 'foreign' package must be
loaded first; perhaps SPSS data is considered foreign to R...
At any rate, you'll notice the first argument to the 'read.spss'
function specifies the location (URL) of the file being
retrieved. The 'use.value.labels' argument specifies whether the
values (FALSE) or value labels (TRUE) will be displayed for
variables that have value labels (i.e. grouping variables; e.g.,
gender; values = 1 or 2, value labels = "Male" or "Female").
Note, if this argument is set to FALSE, the variable will be
considered numeric, and when this argument is set to TRUE, the
variable will be considered a factor. The 'max.value.labels'
argument sets the largest number of unique values a labeled
variable may have and still be converted to a factor; with Inf,
as above, there is no limit. The 'to.data.frame' argument
specifies whether or not the data will be returned as a data
frame; if FALSE, then the data will simply be returned as a list
of variables.
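The difference between values and value labels can be sketched
without an SPSS file. The example below does not call 'read.spss';
it only mimics, with made-up codes and the hypothetical object
names gender.codes and gender, what happens to a labeled variable
when it is left numeric versus converted to a factor.

```r
# Made-up numeric codes such as might be stored in an SPSS file (illustration only).
gender.codes <- c(1, 2, 2, 1, 2)

# Left as stored, the variable is numeric
# (comparable to use.value.labels = FALSE).
is.numeric(gender.codes)

# Attaching the labels to the codes gives a factor
# (comparable to use.value.labels = TRUE).
gender <- factor(gender.codes, levels = c(1, 2),
                 labels = c("Male", "Female"))
is.factor(gender)
levels(gender)

# Count the cases in each level of the factor.
table(gender)
```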
The 'summary' function, used in both examples
above, simply provides the minimum value, 1st quartile value,
median, mean, 3rd quartile value, maximum value, and the number
of missing values for each numeric variable in the data. When a
variable is a factor (e.g., gender), summary returns the number
of cases/rows in each level of the factor (e.g. "Male" = 103,
"Female" = 121). The summary function is perhaps one of the most
often used functions in R and certainly the most frequently used
function on this web site. It can be applied to vectors (numeric
or factor), matrices, data frames, lists, and fitted model
objects (e.g., a regression model, a factor analysis, etc.). As
its name implies, it simply provides a summary of whatever
object is passed to it, and the output (or returned value)
varies widely depending on the object on which it is run.
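As a small illustration of how the output of 'summary' depends on
the object it is given, the sketch below applies it to a numeric
vector, a factor, and a data frame. The data and the object names
(scores, gender, num.summary, fac.summary) are invented for this
example and are not the tutorial data sets.

```r
# Made-up data with one missing score (illustration only).
scores <- c(10, 12, NA, 15, 19)
gender <- factor(c("Male", "Female", "Female", "Male", "Female"))

# Numeric vector: minimum, quartiles, median, mean, maximum,
# and the number of missing values (NA's).
num.summary <- summary(scores)
num.summary

# Factor: number of cases in each level.
fac.summary <- summary(gender)
fac.summary

# Data frame: one summary per column, numeric or factor as appropriate.
summary(data.frame(scores, gender))
```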
In future tutorial notes, we will be using
the R Console and script files; but remember, all scripts can be
copied and pasted into the R Console. The script files can also
be downloaded and then opened with the R Console or in R
Commander using ‘File’, ‘Open script
file…’ in the Console or Rcmdr top task bar.
When reading the script files, you'll
notice the common convention of using # to start a comment line
(which is not working code), while lines without # are working
code.
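A short, made-up script fragment showing the comment convention
(the object name x is only for illustration):

```r
# This whole line is a comment; R ignores everything after the # symbol.
x <- c(2, 4, 6)   # a comment can also follow working code on the same line
mean(x)           # working code: computes the mean of x
```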
Last updated:
08/22/12 by
Jon Starkweather.