Benchmarks Online


Research and Statistical Support - University of North Texas

RSS Matters

Link to the last RSS article here: Creating Maps With GIS Data in SAS 9.1.3, Part 2. - Ed.

Free != Cheap: Open Source and/or Free Alternatives in Statistical Analysis

By Dr. Mike Clark, Research and Statistical Support Services Consultant

I'll admit to being a relative newbie to the technicalities of the open source scene, though this was mostly out of ignorance rather than any reluctance, on principle, to use things that were open source and/or free. Over the years I have found myself using more and more of the work of those whose generosity has allowed others to partake of the fruits of their labor. Whether it was a stand-alone, single-function stat program, a media player, or a PDF converter, if something performed an activity I needed and was freely and readily available, common sense dictated using it rather than paying for something that did the same thing. This has held up to the point where I now look for open source and/or free alternatives in the early stages of any software need, and use them even when the more costly proprietary software is available, as it is here on campus. I do not do this to the point of zealotry, where I would suffer a noticeably inferior product just because it is free or abides by open source principles. However, even at home I can watch an episode of classic Star Trek, type up a paper, browse the web, and perform advanced statistics using zero-cost programs and websites. Such are the times in which we live.

Focus on R

The focus here, however, is on statistics and in particular R, which was featured on the web version of the New York Times on January 7th. R is open source, and not only free as in speech but free as in beer, i.e. the full statistical package is available at no cost. You can download it and the 1600+ packages from its website and have analytical capabilities unmatched by heavy hitters such as SPSS and SAS. If it can't do something, you can alter the functions within it to do what you want. What's more, you can subscribe to the help and developer lists and gain insights and assistance from top statisticians and researchers from around the world (again, freely). And to top things off, one can always keep up with the times through daily updates and new versions as they come out. Is it perfect? No software is or ever will be, but it does what's needed and more, and for typical academic applications it does so just as well as the proprietary packages, and often more easily. In my experience R has yet to prove unable to do anything that our clients typically want, and we often use it to make their projects much better than they otherwise would have been, or simply more efficient. Perhaps even more telling regarding its (and Python's) capabilities is that SPSS has an add-on that allows one to use R code (and, with version 17.0, R graphics) within the SPSS syntax editor. If R is good enough for SPSS to use to make up for its many weaknesses, it should be good enough for anyone who would consider using SPSS. It is not as easy to learn, but I'm not sure that learning it takes any more time than the long hours put into workarounds for more popular programs that lack its functionality, and if this former menu-clicker can shift to it as a primary stat package, anyone can.

R is not alone, however, in the statistical/mathematical computing arena. SciPy and NumPy offer further choices in scientific computing via the Python programming language and already work well with R. OpenStat is a general statistical package that includes advanced techniques. There are open source alternatives geared toward the look and feel of specific proprietary scientific/statistical software packages, such as Octave for MATLAB and PSPP (still in its early stages) for SPSS. Getting more specialized, one comes across things like gretl, primarily geared toward econometrics, Tetrad for causal modeling, OpenBUGS for Bayesian analysis, and even more specific tools such as G*Power, which does sample size/power estimation. The gist is that there is something out there for most academic statistical needs that will cost only the time needed to learn it, and that may work not only as well as the proprietary option but even better.

Free = Bad?

Unfortunately, there is a common misperception in the academic world, among students and even those with Ph.D.s, that if software is free it is not as good. This is very odd considering how many things we regularly receive for free in daily life: love from family members, meals provided by friends, books and more from libraries, countless commonly used applications for computers and web browsing, additional services when certain other conditions are met, and so on. It is odder still given how long the open source movement has been around. If a (reliable) friend offers a ride to the airport, have you ever heard of someone responding that they'd rather take a cab because it's obviously better based on price? The vast majority of our computing relies on freely available programs and languages (e.g. C), and there are open source or free versions of many software and internet utilities that millions of people take advantage of daily: Firefox, OpenOffice, virus scanners, file compression utilities, and email accounts and clients. Linux features prominently in the server market, and most web servers run the free and open source Apache web server software. It is strange to assume that the satisfaction people have with the functionality of those programs could not also be found in academic research computing, and high-quality products are in fact available.

Everything of course has some cost, whether monetary or simply the time and effort of those involved in providing the product or service. But in the end the distinction to be made is one of quality, not monetary cost. When choosing statistical software one only needs to ask -- Does it do what you want, and does it do it well? The first part of that question concerns bare-minimum functionality; the second concerns quality. Simply providing a basic statistical or statistically related function (e.g. standard deviation, bar/line graphs, ordinary least squares regression) is not enough given modern computing capabilities and what is available nowadays. Just remember that there are open source and otherwise free alternatives that are advanced in their offerings and often flexible enough to tailor to your research needs, whatever they may be.

 


Originally published January 2009 -- Please note that information published in Benchmarks Online is likely to degrade over time, especially links to various websites. To make sure you have the most current information on a specific topic, it may be best to search the UNT website - http://www.unt.edu . You can also search Benchmarks Online - http://www.unt.edu/benchmarks/archives/back.htm as well as consult the UNT Helpdesk - http://www.unt.edu/helpdesk/ . Questions and comments should be directed to benchmarks@unt.edu

