
Research and Statistical Support


SAS Corner

By Dr. Karl Ho, Research and Statistical Support Services Manager

SAS Speaks XML

What does the development of the Web's latest language have to do with SAS programming? That's right: SAS finally speaks XML. This article briefly discusses how SAS taps into this latest Web trend and moves toward the next generation of data delivery.

The Extensible Markup Language (XML) has been described as the next-generation ASCII, one that will be omnipresent in Web and local applications. It provides a "framework for delivering structured data ... within and between applications and organizations"*.

Tracing its roots, XML is an offspring of the more general Standard Generalized Markup Language (SGML), first developed in the 1980s. HTML, which comes from the same root, can be called a cousin of XML. Unlike HTML, which is primarily concerned with presenting information on the Web, XML is designed to "carry" data in a way that multiple applications on different platforms can recognize.

XML can take advantage of the hierarchical, or nested, nature of data and deliver it in a flexible, tagged format. Data contents can be nested in a hierarchy of tags that follows the data structure (library, data sets, columns/fields, etc.). While HTML is limited to predefined formatting tags, XML lets programmers define their own tags to describe data.

For example, a data set of student records can be organized under a single root element, LIBRARY, in which each STUDENTS element records one case made up of the variables ID, NAME, ADDRESS, CITY and STATE. The hierarchy can also go further: ADDRESS, say, could itself consist of more variables such as street and street number.
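The nested layout just described can be sketched with Python's standard xml.etree.ElementTree module. This is an illustration only. The student values are made up, and STREETNUMBER and STREET are assumed tag spellings for the street-level variables mentioned above.

```python
import xml.etree.ElementTree as ET

# Hypothetical student records (made-up values for illustration)
students = [
    {"ID": "1001", "NAME": "Jane Doe", "STREETNUMBER": "12",
     "STREET": "Main St", "CITY": "Denton", "STATE": "TX"},
    {"ID": "1002", "NAME": "John Roe", "STREETNUMBER": "450",
     "STREET": "Oak Ave", "CITY": "Austin", "STATE": "TX"},
]

def build_library(records):
    """Nest every record under one LIBRARY root, one STUDENTS element per case."""
    library = ET.Element("LIBRARY")
    for rec in records:
        student = ET.SubElement(library, "STUDENTS")
        ET.SubElement(student, "ID").text = rec["ID"]
        ET.SubElement(student, "NAME").text = rec["NAME"]
        # ADDRESS is a container element: a further level of the hierarchy
        address = ET.SubElement(student, "ADDRESS")
        ET.SubElement(address, "STREETNUMBER").text = rec["STREETNUMBER"]
        ET.SubElement(address, "STREET").text = rec["STREET"]
        ET.SubElement(student, "CITY").text = rec["CITY"]
        ET.SubElement(student, "STATE").text = rec["STATE"]
    return library

root = build_library(students)
print(ET.tostring(root, encoding="unicode"))
```

Because the tags carry the variable names, any XML-aware application can recover the structure without being told the layout in advance.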

Beginning with Release 8.0, SAS has incorporated several methods of generating and importing XML documents. Programmers can now work with XML documents through the DATA step, the Output Delivery System (ODS) or the XML LIBNAME engine.

SAS Data Step

One can simply convert a SAS data set into XML using SAS syntax.  The following sample program implements this simple task:

/* Define the indentation literal used in the PUT statements below */
%let tab=" ";

filename outxml "c:\temp\simplecustomer.xml";

data _null_;
   file outxml;
   set sampdata.cust10 nobs=lst;
   length gender $6 marital_status $11;

   /* Escape characters such as & and < so the XML stays well-formed */
   name=htmlencode(name);
   addr1=htmlencode(addr1);
   addr2=htmlencode(addr2);

   /* Recode the numeric flags into readable labels */
   if sex=0 then gender='Female';
   else gender='Male';
   if married=0 then marital_status='Not Married';
   else marital_status='Married';

   /* Write the XML declaration and open the root element once */
   if _n_=1 then do;
      put '<?xml version="1.0" ?>';
      put '<customer-data>';
   end;

   /* One contact-information element per observation */
   put '<contact-information>';
   put &tab '<cust-id>' custnum '</cust-id>';
   put &tab '<name>' name '</name>';
   put &tab '<gender>' gender '</gender>';
   put &tab '<age>' age '</age>';
   put &tab '<income>' income '</income>';
   put &tab '<status>' marital_status '</status>';
   put &tab &tab '<address>';
   put &tab &tab '<street ORDER="1">' addr1 '</street>';
   put &tab &tab '<street ORDER="2">' addr2 '</street>';
   put &tab &tab '<city>' city '</city>';
   put &tab &tab '<state>' state '</state>';
   put &tab &tab '<zip-code>' zip '</zip-code>';
   put &tab &tab '<region>' region '</region>';
   put &tab &tab '</address>';
   put '</contact-information>';

   /* Close the root element after the last observation */
   if _n_=lst then put '</customer-data>';
run;
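Because the DATA step writes XML as plain text with PUT statements, it must escape special characters itself. That is the job of the HTMLENCODE calls, since a bare & or < inside a value would make the file ill-formed. The Python sketch below mirrors the same record-by-record logic to show the effect. It is an illustration, not a SAS feature. The dictionary keys, the sample record and the trimmed field list are hypothetical, and xml.sax.saxutils.escape stands in for HTMLENCODE.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def customer_xml(rows):
    """Write one <contact-information> element per row, as the DATA step does.

    Hypothetical sketch: only a subset of the DATA step's fields is written,
    and the dictionary keys are assumed stand-ins for the SAS variables.
    """
    lines = ['<?xml version="1.0" ?>', "<customer-data>"]
    for row in rows:
        # Same recodes as the DATA step's IF/ELSE statements
        gender = "Female" if row["sex"] == 0 else "Male"
        status = "Not Married" if row["married"] == 0 else "Married"
        lines.append("<contact-information>")
        lines.append(f"  <cust-id>{row['custnum']}</cust-id>")
        # escape() plays the role of HTMLENCODE: & < > become entities
        lines.append(f"  <name>{escape(row['name'])}</name>")
        lines.append(f"  <gender>{gender}</gender>")
        lines.append(f"  <status>{status}</status>")
        lines.append("    <address>")
        for order, key in enumerate(("addr1", "addr2"), start=1):
            lines.append(f'    <street ORDER="{order}">{escape(row[key])}</street>')
        lines.append(f"    <city>{escape(row['city'])}</city>")
        lines.append("    </address>")
        lines.append("</contact-information>")
    lines.append("</customer-data>")
    return "\n".join(lines)

# Usage with one made-up observation containing characters that need escaping
sample = [{"custnum": 101, "name": "Smith & Sons", "sex": 1, "married": 0,
           "addr1": "12 Main St <Rear>", "addr2": "Suite 4", "city": "Denton"}]
doc = ET.fromstring(customer_xml(sample))  # parses only because of escape()
print(doc.find("contact-information/name").text)  # Smith & Sons
```

Dropping the escape() calls would leave a raw & in the output, and the fromstring call would then raise a parse error.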

The resulting file may not be pleasing to the eye, but it renders the data in a self-contained, self-describing form that other applications can use. Because the organization and data structure are embedded in the file, it is easy to add or update records and to transport the data to other applications or environments. Hence the metaphor: XML is to data applications what ASCII is to word processors.
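To illustrate the self-describing point, here is a minimal Python sketch. It reads a file laid out like the DATA step output, appends a new record, and recovers every field by name from the tags alone, with no separate layout or syntax file. The sample content and the flatten helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# A trimmed, made-up file in the same layout as the DATA step output
SAMPLE = """<?xml version="1.0" ?>
<customer-data>
<contact-information>
  <cust-id>101</cust-id>
  <name>Pat Example</name>
  <address>
    <city>Denton</city>
    <state>TX</state>
  </address>
</contact-information>
</customer-data>"""

def flatten(record, prefix=""):
    """Turn one record into {path: value}; field names come from the tags."""
    out = {}
    for child in record:
        path = f"{prefix}{child.tag}"
        if len(child):                      # nested element: recurse into it
            out.update(flatten(child, path + "/"))
        else:
            out[path] = (child.text or "").strip()
    return out

root = ET.fromstring(SAMPLE)

# Append a new record without consulting any external layout description
new = ET.SubElement(root, "contact-information")
ET.SubElement(new, "cust-id").text = "102"
ET.SubElement(new, "name").text = "Lee Sample"

records = [flatten(r) for r in root.findall("contact-information")]
for rec in records:
    print(rec)
```

The second record simply lacks the address fields. A reader that goes by tag names tolerates this, whereas a fixed-column raw file would need its syntax file updated.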

Output Delivery System (ODS)

ODS can generate "richer" XML files that carry metadata ("data about data," that is, information describing the data and how it can be used). The following example sends the output of two GLM procedures to the XML destination. The resulting file can be transported for browsing or for further computation by other applications.

data plants;
   input type $ @;
   do block=1 to 3;
      input stemleng @;
      output;
      end;
   cards;
clarion  32.7 32.3 31.5
clinton  32.1 29.7 29.1
knox     35.7 35.9 33.1
o'neill  36.0 34.2 31.2
compost  31.8 28.0 29.2
wabash   38.2 37.8 31.9
Webster  32.5 31.1 29.7
;

data mileage;
   input mph mpg @@;
   cards;
20 15.4 30 20.2 40 25.7 50 26.2 50 26.6 50 27.4 55  . 60 24.8
;

/* Choose XML destination in ODS */
ods xml file="c:\temp\glmex.xml";

proc glm data=plants;
   class type block;
   model stemleng=type block;   run;

proc glm order=data data=plants;
   class type block;
   model stemleng=type block / solution;
   means type / waller regwq;

*-type-order------------clrn-cltn-knox-onel-cpst-wbsh-wstr;
   contrast 'compost vs others'  type -1 -1 -1 -1  6 -1 -1;
   contrast 'river soils vs.non' type -1 -1 -1 -1  0  5 -1,
                                 type -1  4 -1 -1  0  0 -1;
   contrast 'glacial vs drift'   type -1  0  1  1  0  0 -1;
   contrast 'clarion vs Webster' type -1  0  0  0  0  0  1;
   contrast 'knox vs oneill'     type  0  0  1 -1  0  0  0;   run;

quit;
  
/* Close ODS destination */
ods xml close;


XML LIBNAME engine

The new, experimental engine can read and write XML documents in several formats, including Oracle, OIMDBM (Open Information Model Database Schema Model) and bare HTML. The syntax is:

  LIBNAME libref XML 'external-file' <XML-engine-options>;

Here the libref points to a single XML file, 'external-file', rather than to a directory as it would for a conventional SAS library.
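Conceptually, an engine of this kind maps an XML document onto rows and columns. The Python sketch below is a hypothetical illustration of that mapping, not SAS's implementation. It assumes a flat, two-level layout like the LIBRARY/STUDENTS example: a root element, one repeating element per observation, and one child element per variable. It does not handle deeper nesting such as a structured ADDRESS.

```python
import xml.etree.ElementTree as ET

def read_tables(xml_text):
    """Return {table_name: (columns, rows)} from a flat, two-level XML document.

    Each child of the root is one observation; its tag names the table and
    its own children are the variables. Missing variables become None.
    """
    root = ET.fromstring(xml_text)
    grouped = {}
    for obs in root:
        record = {var.tag: var.text for var in obs}
        grouped.setdefault(obs.tag, []).append(record)

    tables = {}
    for name, records in grouped.items():
        columns = []
        for record in records:          # column order = order of first appearance
            for tag in record:
                if tag not in columns:
                    columns.append(tag)
        rows = [[record.get(col) for col in columns] for record in records]
        tables[name] = (columns, rows)
    return tables

# Usage: a small, made-up document in the LIBRARY/STUDENTS layout
DOC = """<LIBRARY>
  <STUDENTS><ID>1</ID><NAME>Ann</NAME><STATE>TX</STATE></STUDENTS>
  <STUDENTS><ID>2</ID><NAME>Bo</NAME></STUDENTS>
</LIBRARY>"""
columns, rows = read_tables(DOC)["STUDENTS"]
print(columns)  # ['ID', 'NAME', 'STATE']
print(rows)     # [['1', 'Ann', 'TX'], ['2', 'Bo', None]]
```

All values come back as text. A real engine would also have to assign types and lengths to the columns, which this sketch does not attempt.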

Another sample illustrates the marriage of XML with a Java applet in generating a dynamic organizational chart (viewable only in Internet Explorer 5 for Windows).

Incorporating XML into SAS reminds me of the mainframe programs that carried their data embedded in the program itself. Those programs were considered clumsy and prone to typos and file corruption, and the method could not accommodate sizeable data. In the old days, when storage space was a major concern, this practice was considered inefficient and unreliable. Programmers later adopted the practice of separating program and data into different files, with routines that read in an external data file. Keeping the data in one file is more secure, while keeping the program syntax in a separate file is more flexible. On top of that, the same raw data file can be read with different programs or languages. ICPSR, for instance, delivers data sets in this manner: one raw data file, one SPSS syntax file and one SAS syntax file.

This flexibility, however, came at the cost of separating data from syntax: data and metadata have to be delivered in multiple files. If one of the files, particularly the syntax file, is missing or corrupted, the raw data are rendered useless.

XML is certainly a more evolved method that delivers data in a self-contained format. The data file itself carries the metadata, so every piece of information travels in one single file that different applications in various environments can read. Space-intensive as it may be, a single XML file can carry much more "real, rich" data than a raw data file or data in any other format.

XML is the way to go. Before long, a new data-language standard will emerge for all statistical and data applications. Although various data formats will prevail for a while, data crunchers will hopefully soon enjoy as much unity as they now face diversity.


* Quoted from Nelson, Greg Barnes. 2000. "XML and SAS: An Advanced Tutorial." SUGI 25 Proceedings, Advanced Tutorials, Paper 13-25. http://www2.sas.com/proceedings/sugi25/25/aa/25p013.pdf

References:

Kent, Paul. 2001. "SAS Takes Advantage of XML." SUGI 26 presentation (PowerPoint slides).

Nelson, Greg Barnes. 2000. "XML and SAS: An Advanced Tutorial." SUGI 25 Proceedings, Advanced Tutorials, Paper 13-25. http://www2.sas.com/proceedings/sugi25/25/aa/25p013.pdf