Creation date: 11/18/2000
Authored by: Karl Ho
The All-encompassing SAS 8 (1/2)
Last year, when I wrote an evaluation note on SAS 7 (a transitional release between SAS' first Windows generation, 6.x, and the current version, SAS 8), I fell short of full coverage because of the sheer size of SAS, with its numerous modules and procedures. Another reason was that SAS 7 was still a developer's release (a "post-beta" beta version). A year later, setting myself the same task for version 8, I have to admit that I still cannot give a satisfactory report: even introducing only the features new in version 8 requires splitting the evaluation into two articles.
The new SAS not only shows a higher level of stability under the Microsoft Windows operating systems (it is geared for Windows 2000)*, it also introduces a wave of new functionality that gives the software a facelift from its previous mainframe-adapted look. Many Windows users may still pass over SAS in favor of GUI-based packages such as SPSS or Statistica, since SAS is known for its syntax-based operation. With three new add-on modules (SAS/Analyst, SAS/LAB, SAS/INSIGHT) plus the 3-D graphics procedure PROC G3D, however, I would declare that SAS is now fully gooey (GUI). For instance, with Analyst (Solutions --> Analysis --> Analyst), users can simply import data in various formats and start analyzing them in a spreadsheet-like explorer interface. A wide variety of procedures are ready to use in Analyst, including bivariate analyses (e.g., t-tests, correlations, ANOVA) and multivariate analyses (GLM, regression, power analysis, principal components, and survival models). Users can also easily select samples from an existing data set and create charts by pointing and clicking.
However, the comparative advantage of SAS still lies in its research and development, as exemplified by the new data analysis procedures. Below I briefly introduce the procedures new in release 8.1, with some sample output.
Survey Sampling and Analysis
When starting a survey, particularly a large-scale or national one, researchers are concerned with how to draw samples from the population, and whether and how weights should be applied to under-represented groups (e.g., certain socioeconomic groups in some geographic areas) or over-represented groups (e.g., the upper-middle class among email recipients). SAS 8 introduces a new series of procedures that enables survey researchers to select their samples using different designs:
PROC SURVEYSELECT selects samples by a variety of methods, ranging from simple random sampling to complex multistage designs.
With two more new procedures, SURVEYMEANS and SURVEYREG, researchers can estimate population means, variances, confidence limits, and other descriptive statistics together with their sampling errors, and fit regression models, taking into account the sampling design and weighting scheme used in the sample selection process. (sample output)
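To make the role of design weights concrete, here is a minimal Python sketch (not SAS code) of stratified simple random sampling, one of the designs SURVEYSELECT supports, followed by a weighted estimate of the population mean. The function names, the record layout, and the data are hypothetical. Each sampled unit gets the weight N_h/n_h, the inverse of its selection probability within stratum h. Variance estimation, which SURVEYMEANS also handles, is left out.

```python
import random
from collections import defaultdict


def stratified_srs(population, strata_key, sizes, seed=None):
    """Draw a simple random sample without replacement within each stratum.

    population : list of records (dicts)
    strata_key : name of the field that identifies the stratum (hypothetical layout)
    sizes      : dict mapping stratum -> number of units to draw (n_h)
    Returns a list of (record, weight) pairs with weight = N_h / n_h.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for rec in population:
        by_stratum[rec[strata_key]].append(rec)
    sample = []
    for h, units in by_stratum.items():
        n_h = sizes[h]
        weight = len(units) / n_h  # inverse of the selection probability n_h / N_h
        for rec in rng.sample(units, n_h):
            sample.append((rec, weight))
    return sample


def weighted_mean(sample, field):
    """Design-weighted (ratio) estimate of the population mean of `field`."""
    total_w = sum(w for _, w in sample)
    return sum(w * rec[field] for rec, w in sample) / total_w


if __name__ == "__main__":
    # Hypothetical population: stratum "B" is small but heavily oversampled.
    population = ([{"stratum": "A", "income": 10} for _ in range(90)]
                  + [{"stratum": "B", "income": 50} for _ in range(10)])
    sample = stratified_srs(population, "stratum", {"A": 5, "B": 5}, seed=1)
    unweighted = sum(rec["income"] for rec, _ in sample) / len(sample)
    print(unweighted)                       # 30.0, pulled toward stratum B
    print(weighted_mean(sample, "income"))  # 14.0, the true population mean
```

Stratum B is sampled at half its size while stratum A is sampled at only 1 in 18, so the unweighted mean overstates B's influence; the weights restore each stratum to its population share.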
Nonparametric Regression and Smoothing
In the new version 8.1, SAS incorporates one of the latest techniques for fitting nonlinear models: nonparametric regression. It comes with a suite of nonparametric techniques, including kernel density estimation and loess smoothing. PROC KDE computes nonparametric density estimates by kernel density estimation and saves the estimates for subsequent plotting and analysis. PROC LOESS (local regression) and PROC TPSPLINE (thin-plate smoothing splines) provide various smoothing methods for exploratory data analysis and for fitting nonparametric or semiparametric models.
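The idea behind kernel density estimation fits in a few lines. The Python sketch below (not the KDE procedure itself) places a standard normal kernel on every observation and averages them: f(x) = 1/(n h) * sum_i K((x - x_i)/h). The bandwidth rule 1.06 * sd * n^(-1/5) is an assumed normal-reference choice for illustration, and the function names and data are hypothetical.

```python
import math


def gaussian_kde(data, bandwidth):
    """Return a function that estimates the density of `data` at a point x.

    f_hat(x) = 1 / (n * h) * sum_i K((x - x_i) / h), with K the standard normal density.
    """
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density


def normal_reference_bandwidth(data):
    """Assumed rule-of-thumb bandwidth 1.06 * sd * n^(-1/5); needs at least 2 points."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return 1.06 * sd * n ** (-0.2)


if __name__ == "__main__":
    sample = [1.2, 1.9, 2.1, 2.4, 3.3, 3.8, 4.0, 5.1]  # hypothetical observations
    f = gaussian_kde(sample, normal_reference_bandwidth(sample))
    # Evaluate on a grid, as one would save estimates for later plotting.
    for i in range(13):
        x = i * 0.5
        print(f"{x:4.1f}  {f(x):.4f}")
```

The bandwidth controls the trade-off between a bumpy estimate (small h) and an oversmoothed one (large h).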
Spatial Prediction: Variogram and 2-dimensional Kriging
(Spatial analyses in geology, petroleum exploration, mining, and water pollution analysis)
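PROC VARIOGRAM and PROC KRIGE2D predict values at unsampled locations from two-dimensional data, based on spatial continuity: VARIOGRAM computes the empirical semivariogram that describes how similarity between measurements decays with distance, and KRIGE2D uses a fitted semivariogram model to perform ordinary kriging.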
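The first step, the empirical semivariogram, can be sketched in Python with the classical (Matheron) estimator: for each distance class h, gamma(h) is the sum of squared differences (z_i - z_j)^2 over the N(h) pairs in that class, divided by 2 N(h). This is an illustration, not the VARIOGRAM procedure's code; the binning convention (classes centered on multiples of the lag width) and the names are assumptions. Fitting a semivariogram model and the kriging step itself are not shown.

```python
import math
from collections import defaultdict


def empirical_semivariogram(points, lag_width, max_lag):
    """Classical (Matheron) semivariogram estimator.

    points    : list of (x, y, z) tuples, z being the measured value
    lag_width : width of each distance class
    max_lag   : pairs farther apart than this are ignored
    Returns {lag_class_center: (gamma, number_of_pairs)}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(points)):
        xi, yi, zi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, zj = points[j]
            d = math.hypot(xi - xj, yi - yj)
            if d > max_lag:
                continue
            # Assumed binning: class k holds distances in [(k - 0.5)w, (k + 0.5)w).
            k = int(d / lag_width + 0.5)
            sums[k] += (zi - zj) ** 2
            counts[k] += 1
    return {k * lag_width: (sums[k] / (2 * counts[k]), counts[k]) for k in sorted(counts)}
```

A semivariogram that rises with distance signals spatial continuity: nearby locations are more alike than distant ones, which is what kriging exploits when it weights neighboring observations.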
Qualitative and Limited Dependent Variable Models
Researchers are very often faced with dependent variables that are not continuous. These discrete variables (sometimes called categorical choices) include the choice of a political party or presidential candidate and the decision to take a bus or a train. One of the best-known examples is the subject that Daniel L. McFadden, one of the 2000 Nobel laureates in economics, has been studying since 1974: commuters' choice of transportation mode (**). Multinomial logit and probit models estimate the probability of each outcome of such a limited dependent variable, for example a commuter's choice between taking a bus and driving a car. A new procedure in SAS/ETS estimates this family of discrete choice models. PROC QLIM can analyze not only the regular binary (two-choice) probit and logit models but also:
multinomial logit (more than two categories)
endogenous switching regression
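The probability model behind the multinomial (conditional) logit is short enough to sketch. In the Python example below, each alternative j has a systematic utility V_j, linear in its attributes, and P(j) = exp(V_j) / sum_k exp(V_k); with two alternatives this reduces to the binary logit. The coefficients, attribute names, and function names are hypothetical, and estimating the coefficients by maximum likelihood, which is what a procedure such as QLIM does, is not shown.

```python
import math


def logit_choice_probabilities(utilities):
    """Multinomial (conditional) logit: P(j) = exp(V_j) / sum_k exp(V_k).

    utilities : dict mapping each alternative to its systematic utility V_j.
    The largest utility is subtracted first for numerical stability; this
    leaves the probabilities unchanged.
    """
    v_max = max(utilities.values())
    expv = {alt: math.exp(v - v_max) for alt, v in utilities.items()}
    total = sum(expv.values())
    return {alt: e / total for alt, e in expv.items()}


def utility(beta, attributes):
    """Linear-in-parameters utility V_j = sum_k beta_k * x_jk."""
    return sum(beta[name] * value for name, value in attributes.items())


if __name__ == "__main__":
    # Hypothetical coefficients: utility falls with travel time (minutes) and cost (dollars).
    beta = {"time": -0.05, "cost": -0.4}
    modes = {
        "bus": {"time": 40, "cost": 1.5},
        "car": {"time": 25, "cost": 4.0},
    }
    v = {mode: utility(beta, attrs) for mode, attrs in modes.items()}
    print(logit_choice_probabilities(v))
```

Only utility differences matter: adding the same constant to every V_j leaves the choice probabilities unchanged.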
Other new tests and features include:
Exact Logistic Regression (sample output)
Exact tests: computing exact p-values directly, or estimating them by Monte Carlo simulation (10,000 samples).
Numerically Precise Regression (PROC ORTHOREG***): The new procedure produces more numerically accurate estimates than other regression procedures (e.g., REG, GLM) when the data are ill-conditioned or badly scaled.
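The difference between direct exact p-values and their Monte Carlo estimates can be illustrated with a simple two-sample permutation test of a difference in means, a statistic chosen here only for illustration (this is not SAS's EXACT statement, and the function names are hypothetical). The exact p-value enumerates every relabeling of the pooled data; the Monte Carlo version draws random relabelings instead, 10,000 by default in this sketch.

```python
import random
from itertools import combinations


def mean_difference(values, n_x):
    """Absolute difference between the mean of the first n_x values and the mean of the rest."""
    a, b = values[:n_x], values[n_x:]
    return abs(sum(a) / len(a) - sum(b) / len(b))


def exact_p_value(x, y):
    """Exact permutation p-value: enumerate every way to assign n_x pooled values to group x."""
    pooled = list(x) + list(y)
    n_x = len(x)
    observed = mean_difference(pooled, n_x)
    hits = total = 0
    for idx in combinations(range(len(pooled)), n_x):
        chosen = set(idx)
        relabeled = ([pooled[i] for i in idx]
                     + [pooled[i] for i in range(len(pooled)) if i not in chosen])
        total += 1
        if mean_difference(relabeled, n_x) >= observed - 1e-12:  # tolerance for rounding
            hits += 1
    return hits / total


def monte_carlo_p_value(x, y, n_samples=10000, seed=None):
    """Monte Carlo estimate of the same p-value from n_samples random relabelings."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n_x = len(x)
    observed = mean_difference(pooled, n_x)
    hits = 0
    for _ in range(n_samples):
        rng.shuffle(pooled)
        if mean_difference(pooled, n_x) >= observed - 1e-12:
            hits += 1
    return hits / n_samples
```

For two groups of three there are only 20 relabelings, but two groups of 20 already have more than 10^11, which is why a Monte Carlo estimate with a fixed number of samples becomes the practical choice.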
In the next article, I will introduce the following new features:
Partial Least Squares
Multiple Imputation for Missing Data
* I should mention that SAS for UNIX (version 8) delivers at least as much as its Windows counterpart. Given limited space, I focus only on the latter.
** McFadden, D. 1974. "The Measurement of Urban Travel Demand." Journal of Public Economics 3: 303-28. The other 2000 laureate, econometrician James J. Heckman, is known for his sample selection bias model, also called the Heckman model.
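*** Orthogonal regression minimizes the perpendicular distances from the (X, Y) points to the regression line, treating X and Y symmetrically. Despite its name, PROC ORTHOREG fits ordinary least squares; it uses orthogonal transformations to compute the estimates accurately.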
An, Anthony, and Donna Watts. 1998. "New SAS Procedures for Analysis of Sample Survey Data." SUGI Proceedings.
"What's New in Data Analysis." SAS Research and Development community web site. http://www.sas.com/rnd/app/da/danew.html
Last updated: 01/18/06 by Karl Ho