
RSS Matters

The All-encompassing SAS 8 (1/2)

By Dr. Karl Ho, Research and Statistical Support Services Manager

Last year, when I wrote an evaluation note on SAS 7 (a transitional release between SAS' first Windows generation, 6.x, and the current version, SAS 8), I fell short of giving full coverage because of SAS' enormity: it is composed of numerous modules and procedures. Another reason was that SAS 7 was still a developer's release (a "post-beta" beta version). One year later, having decided to evaluate the new SAS version 8, I have to say I am still shy of giving a satisfactory report. The best I can do is split the evaluation into two articles, just to introduce the features that are new in version 8.

The New SAS

The new SAS not only demonstrates a higher level of stability under the MS Windows operating system (it is geared for Windows 2000)*, it also introduces a wave of new functions and features that give the software a facelift from its previous mainframe-adapted look. Windows users may still hesitate to choose SAS over other GUI-based packages such as SPSS or Statistica, since SAS is known for its syntax-based operation. With the three new add-on modules (SAS/Analyst, SAS/LAB, SAS/INSIGHT) plus the 3-D graphics procedure PROC G3D, I would declare that SAS is now fully gooey (GUI). For instance, with Analyst (Solutions --> Analysis --> Analyst), users can simply import data in various formats and start analyzing in a spreadsheet-like, explorer-style interface. A wide variety of procedures are ready to use in Analyst, including bivariate analyses (e.g. t-tests, correlations, ANOVA) and multivariate analyses (GLM, regression, power analysis, principal components and survival models). Users can also easily select samples from an existing data set and create charts by pointing and clicking.

However, the comparative advantage of SAS still lies in its advances in research and development, as exemplified by the new data analysis procedures. Below I briefly introduce the procedures new to release 8.1, with some sample outputs.

  1. Survey Sampling 

    When starting a survey, particularly a large-scale or national survey, researchers are concerned with how to draw samples from the population, and with whether and how weighting should be applied to under-represented groups (e.g. certain socioeconomic status groups in some geographic areas) or over-represented groups (e.g. the upper-middle class among email recipients). SAS 8 introduces a new series of procedures that enable survey researchers to select their samples using different designs:

    • simple random

    • stratified 

    • clustering

    • unequal weighting

    PROC SURVEYSELECT selects samples via a variety of methods ranging from simple random sampling to complex multi-stage designs. With two more new procedures, SURVEYMEANS and SURVEYREG, researchers can easily estimate sample and population means, variances, confidence limits and other descriptive statistics, as well as sampling errors and regression models, taking into account the sampling design and weighting scheme introduced in the sample selection process. (sample output)
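    To make the idea of design-based weighting concrete, here is a minimal sketch in Python (not SAS syntax, and not how the SAS procedures are implemented). It draws a simple random sample within each stratum and weights each sampled unit by the number of population units it represents. The function names, the two-stratum sampling frame and the sample sizes are all hypothetical.

```python
import random

def stratified_sample(frame, stratum_of, n_per_stratum, seed=0):
    """Draw a simple random sample within each stratum and attach design weights."""
    rng = random.Random(seed)
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for name, units in strata.items():
        n = min(n_per_stratum[name], len(units))  # assumes n >= 1 per stratum
        for unit in rng.sample(units, n):
            # Design weight = number of population units each sampled unit represents.
            sample.append((unit, len(units) / n))
    return sample

def weighted_mean(sample, value_of):
    """Estimate the population mean using the design weights."""
    total_weight = sum(w for _, w in sample)
    return sum(w * value_of(u) for u, w in sample) / total_weight

# Hypothetical frame: 900 urban and 100 rural households with different incomes.
frame = [("urban", 50.0)] * 900 + [("rural", 30.0)] * 100
sample = stratified_sample(frame, lambda u: u[0], {"urban": 20, "rural": 20})
estimate = weighted_mean(sample, lambda u: u[1])
unweighted = sum(u[1] for u, _ in sample) / len(sample)
```

    Because the rural stratum is deliberately over-sampled here, the unweighted sample mean is pulled toward the rural value, while the weighted estimate recovers the population mean of this toy frame.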

  2. Nonparametric Modeling

    The newest version, 8.1, incorporates one of the latest techniques for modeling non-linear relationships: nonparametric regression. It encompasses a suite of nonparametric techniques, including kernel density estimation and loess smoothing. PROC KDE computes nonparametric estimates using the method of kernel density estimation and saves the estimates for subsequent plotting and analysis. PROC LOESS and PROC TPSPLINE provide various smoothing methods for exploratory data analysis and for fitting nonparametric or semiparametric models.
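    For readers unfamiliar with kernel density estimation, the following Python sketch (an illustration of the general technique, not of PROC KDE itself) places a Gaussian kernel on every observation and averages them. The data values and the bandwidth of 0.5 are made up for the example; in practice the bandwidth choice matters a great deal.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum of Gaussian bumps centred on each observation.
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density

# Hypothetical observations with two clusters.
data = [1.2, 1.9, 2.1, 2.4, 3.8, 4.0, 4.3]
f = gaussian_kde(data, bandwidth=0.5)

# Evaluate on a grid from -5 to 10 and check that the estimate integrates to about 1.
step = 0.05
grid = [-5.0 + i * step for i in range(301)]
area = sum(f(x) for x in grid) * step
```

    The estimate is a proper density (its area is approximately one) and is higher near the clusters of observations than in the empty regions.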

    Sample output:



  3. Spatial Prediction

    Variogram estimation and two-dimensional kriging are used for spatial analyses in geology, petroleum exploration, mining, and water pollution analysis. PROC VARIOGRAM and PROC KRIGE2D implement the spatial prediction of values at unsampled locations using two-dimensional data, based on spatial continuity.
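    As a rough illustration of what "spatial continuity" means, the sketch below computes the classical empirical semivariogram in Python: for each distance class, half the average squared difference between pairs of measurements. This is the textbook estimator and is only assumed to resemble what a variogram procedure reports; the function name, the binning rule and the four sample points are hypothetical.

```python
import math
from collections import defaultdict

def empirical_semivariogram(points, lag_width):
    """points: list of (x, y, z). Returns {lag_bin: semivariance}.

    Bin k collects pairs whose separation h satisfies
    k * lag_width <= h < (k + 1) * lag_width.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(points)):
        xi, yi, zi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, zj = points[j]
            h = math.hypot(xi - xj, yi - yj)
            k = int(h // lag_width)
            sums[k] += (zi - zj) ** 2
            counts[k] += 1
    # gamma(h) = sum of squared differences / (2 * number of pairs)
    return {k: sums[k] / (2 * counts[k]) for k in sorted(counts)}

# Hypothetical measurements along a transect where the value rises with x.
points = [(0, 0, 0.0), (1, 0, 1.0), (2, 0, 2.0), (3, 0, 3.0)]
gamma = empirical_semivariogram(points, lag_width=1.0)
```

    In this toy data the semivariance grows with distance, which is the pattern kriging exploits: nearby locations are more alike than distant ones, so they receive more weight in the prediction.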

    Sample plots:






  4. Qualitative and Limited Dependent Variable Models

    Researchers are very often faced with dependent variables that are not continuous. These discrete variables (sometimes called categorical choices) include the choice of political party or presidential candidate and the decision to take a bus or a train. One of the most renowned examples is what the 2000 Nobel Prize laureate Daniel L. McFadden has been studying since 1974: commuters' choice of transportation mode(**). Multinomial logit and probit models estimate the probability of each outcome of the limited dependent variable, such as a commuter's choice between taking a bus and driving a car. A new procedure in SAS/ETS is introduced to estimate this family of discrete choice models. PROC QLIM can analyze not only the regular binary (two-choice) probit and logit models, but also:

    • ordinal probit

    • nested logit

    • multinomial logit (more than two categories)

    • tobit 

    • endogenous switching regression

    • simultaneous equations
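    To show what a multinomial logit model actually computes, here is a small Python sketch of the choice probabilities, P(j) = exp(V_j) / sum over k of exp(V_k), for the commuter example. It illustrates the formula only and is not PROC QLIM syntax. The coefficients, travel times and costs are invented for the illustration, not estimated from data.

```python
import math

def choice_probabilities(utilities):
    """Multinomial logit: P(j) = exp(V_j) / sum_k exp(V_k)."""
    v_max = max(utilities.values())  # subtract the maximum for numerical stability
    exp_v = {alt: math.exp(v - v_max) for alt, v in utilities.items()}
    total = sum(exp_v.values())
    return {alt: e / total for alt, e in exp_v.items()}

# Hypothetical coefficients: utility falls with travel time and with cost.
beta_time, beta_cost = -0.05, -0.4
modes = {"bus": (40, 1.5), "train": (30, 2.5), "car": (20, 4.0)}  # (minutes, dollars)
utilities = {mode: beta_time * t + beta_cost * c for mode, (t, c) in modes.items()}
probs = choice_probabilities(utilities)
```

    With these made-up numbers the train has the highest utility and therefore the highest predicted share, while bus and car tie. Estimation works in the other direction: the software searches for the coefficients that make the observed choices most likely.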

  5. Other New tests/features include:

  • Exact Logistic Regression (sample output)

  • Exact tests: generating direct exact p-values, or using Monte Carlo simulation (10000 samples) to estimate exact p-values.

  • Numerically Precise Regression (PROC ORTHOREG***): The new procedure produces more numerically accurate estimates than other regression procedures (e.g. REG, GLM) when the data are ill-conditioned or badly scaled.
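  The Monte Carlo approach to exact p-values mentioned above can be sketched in Python as a permutation test: shuffle the group labels many times and count how often the reshuffled statistic is at least as extreme as the observed one. This is a generic illustration, not the algorithm SAS uses; the statistic (difference in means), the function name and the two small samples are assumptions for the example.

```python
import random

def monte_carlo_p_value(group_a, group_b, n_samples=10000, seed=0):
    """Estimate a two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_samples):
        rng.shuffle(pooled)  # random reassignment of group labels
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            hits += 1
    return hits / n_samples

# Hypothetical measurements from two small groups.
group_a = [12.1, 11.8, 12.6, 12.9, 12.4]
group_b = [10.2, 10.9, 10.5, 11.1, 10.7]
p_estimate = monte_carlo_p_value(group_a, group_b)
```

  Here every value in the first group exceeds every value in the second, so only 2 of the 252 possible splits are as extreme as the observed one; the Monte Carlo estimate approximates that exact p-value of 2/252 without enumerating all splits, which is what makes the approach practical for larger samples.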

Next?

In the next article, I will introduce the following new features:

  1. Partial Least Squares

  2. IML workshop

  3. Multiple Imputation for Missing Data

  4. Distribution analysis

  5. Robust regression


* I should have mentioned SAS for UNIX (version 8) delivers at least as much as its Windows version.  Given the limit in space, I only focus on the latter.

** McFadden, D. 1974. "The Measurement of Urban Travel Demand" Journal of Public Economics, 3:303-28. The other 2000 laureate, econometrician James Heckman, is known for the selection bias model, also called the Heckman model.

*** Despite its name, PROC ORTHOREG does not perform orthogonal regression, which minimizes the perpendicular distance between the X/Y points taken together and the regression line; PROC ORTHOREG uses least squares.

Reference

An, Anthony and Donna Watts. 1998. "New SAS Procedures for Analysis of Sample Survey Data" SUGI Proceedings

What's New in Data Analysis on SAS Research and Development communities Web (http://www.sas.com/rnd/app/da/danew.html)