

Creation date: 02/16/2000
Authored by: Karl Ho and Craig Henderson

RSS Matters

Why SPSS 10.0?

by Karl Ho and Craig Henderson

SPSS has reached double digits.  New version?  Again?  Yes: in a period of less than six months late last year, the company rolled out two versions of its flagship software.  Although the release is mostly a strategic marketing move, the software still merits evaluation from an application and support standpoint.  The new version does introduce innovations that facilitate research and data analysis in practice.

The major new feature that SPSS introduces in the new version is the so-called "Distributed Analysis Architecture".  The new architecture is an integral part of the distributed computing concept, in which computers connected to a network share jobs and workloads.  Formerly, SPSS had multiple modes under different licensing arrangements.  The University of North Texas distributes the workstation version of SPSS, which works on individual, standalone machines.  A networked client/server version (not available at UNT) was in place but was not as popular as the one we currently support.  In version 10.0, the company brings these two versions closer together and moves on to distributed computing.  An SPSS server houses the database responsible for storing and feeding data, and the SPSS workstation receives data and does the computing.  The rationale behind this division of labor is to allow the sharing of large databases and to avoid duplicate storage on individual PCs, so that users can focus on data analysis and reporting.

The advantages of distributed computing are multiple.  In a nutshell, it optimizes hardware usage through job sharing and makes more efficient use of the networked server and workstations.  It also distributes workloads across multiple nodes, although performance then depends to a large extent on network throughput.

The new architecture in SPSS 10.0 will benefit researchers who work frequently with huge databases such as US Census data, which can easily approach the 2 gigabyte threshold for MS Windows systems.  It also benefits those who share data often.  No longer do they need to put the sizeable data on a local hard disk and crash SPSS every time the data are crunched.  However, distributed processing requires an SPSS server license, which we currently do not support.  Despite that, we are equipped with the UNIX version of SPSS, which handles sizeable data.

Below are some additional new features that UNT researchers can enjoy in SPSS 10.0.  

Multiple sessions

The new version allows multiple SPSS sessions to run simultaneously on the same desktop computer.  In our experiments, we have been able to open five SPSS sessions with medium-sized data sets, which together take up about 40 to 50 percent of CPU resources.  Running more than five sessions is not recommended, but SPSS manages two or three simultaneous sessions well.

Data Access

The latest version of SPSS is more versatile and efficient in data access.  For instance, it can directly read SAS system files up to version 6.12 (in the past, it imported only SAS transport files) and Excel worksheets, including Excel 2000, without the need for ODBC drivers.  The new GET DATA command complements the existing DATA LIST command by using virtual active files stored in cache.  This new feature partially eliminates the need to duplicate data files in temporary disk space for certain data procedures (e.g. reading in data, merging data files).

New Statistical Procedures

A new non-menu procedure, Polytomous Logit Universal Models (PLUM), which fits regression models to ordinal dependent variables, has been added to the Advanced Models module.  Other new procedures include CATPCA and PROXSCAL, both in the Categories module.  The former simultaneously quantifies categorical variables and reduces the dimensionality of the data via Principal Components Analysis.  The latter performs proximity scaling in Multidimensional Scaling.
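For readers unfamiliar with regression on an ordinal dependent variable, the following is a minimal sketch in Python of the cumulative logit (proportional odds) model, one common form of ordinal regression.  It is not SPSS syntax and is not taken from PLUM itself; the function name, thresholds, and coefficient are hypothetical values chosen only for illustration.  The model assumes that P(Y <= j) = 1 / (1 + exp(-(theta_j - b'x))), with thresholds theta_j in increasing order.

```python
import math


def ordinal_logit_probs(thresholds, beta, x):
    """Category probabilities under a cumulative logit model.

    thresholds: increasing cut points theta_1 < ... < theta_{J-1}
    beta, x:    coefficient and predictor vectors of equal length
    Returns a list of J probabilities, one per ordered category.
    """
    # Linear predictor b'x
    eta = sum(b * v for b, v in zip(beta, x))
    # Cumulative probabilities P(Y <= j) for j = 1 .. J-1
    cumulative = [1.0 / (1.0 + math.exp(-(t - eta))) for t in thresholds]
    cumulative.append(1.0)  # P(Y <= J) is always 1
    # Successive differences give the individual category probabilities
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


# Hypothetical three-category outcome (e.g. low / medium / high)
thresholds = [-1.0, 1.0]   # assumed cut points
beta = [0.8]               # assumed slope for a single predictor

probs_low = ordinal_logit_probs(thresholds, beta, [0.0])
probs_high = ordinal_logit_probs(thresholds, beta, [2.0])
print(probs_low)
print(probs_high)
```

With a positive coefficient, a larger predictor value shifts probability toward the higher categories, which is the pattern an ordinal regression is designed to capture.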

What's next?

In terms of development, SPSS has been on the double in the past two years, during which the software received three major upgrades.  Two major trends can be observed.  First, more new procedures have been added and data access capability has been enhanced.  However, the company has also extended its flagship software, plus its other acquired products, to a wider audience, mostly business users.  Clearly, more new developments are geared toward market research and business studies.  Compared to other statistical packages, SPSS has explicitly become more of a business tool than a research tool.  Second, while the open source movement helps promote participation of users and programmers in software development, SPSS has become more proprietary and "cathedral" (see Mullet and Mullet 1999).  We can see user groups blooming around other statistical software such as Stata, SAS and S-Plus, leading to a proliferation of user-developed programs and macros available for sharing and collaborative development, not to mention a completely open source statistical product, R.  SPSS, however, remains "unmoved" by this movement.  For instance, there are over 400 downloadable ado files (Stata macros) developed by users and archived at one of the user-supported web sites (*).  SAS and Mathsoft (developer of S-Plus) also provide archives of macros or sample programs.  The official SPSS web site, by contrast, houses only 11 macros, two of which were contributed by users back in 1994.

In conclusion, it appears that SPSS's change in development strategy has paralleled the dropping of its acronym: the name is now simply SPSS.  This seems to indicate a desire to reach a larger audience.  While SPSS has historically marketed itself to the academic community, it appears that its future development will continue to be geared primarily toward business applications.  This is not to say that these developments are unhelpful to researchers in other fields.  Distributed processing, for example, is an innovation that can benefit anyone using large datasets.  However, in our opinion, the statistical developments lag behind those of products such as SAS and S-Plus.  The biggest advantage of SPSS continues to be its ease of use and relatively short learning curve, which is particularly valuable for teaching purposes.  We will continue to keep you posted with updates and developments.

* Boston College Department of Economics Stata Archive (http://ideas.uqam.ca/ideas/data/bocbocode.html)

Reference:

Mullet, Kevin and Dianna Mullet. 1999. Doing it in the Open. Benchmarks On-line. Volume 2 - Number 6,  June 1999  (http://www.unt.edu/benchmarks/archives/1999/june99/open.htm)


Last updated: 01/18/06 by Karl Ho