Research and Statistical Support

MODULE 8

Restructure Data

The Restructure function is useful when data are in Long format and are needed in Wide format, or vice versa. Long format refers to data in which each observation or participant has multiple rows. Wide format refers to data in which each observation or participant has only one row. This tutorial focuses on using the Restructure function to change data from Long format to Wide format. Longitudinal research is an example of the type of research which often produces data files in Long format.
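To make the two layouts concrete, here is a minimal sketch in Python (standard library only) of what a long-to-wide conversion does conceptually. It is not how SPSS implements Restructure; the function name long_to_wide, the column names (id, time, score), and the four rows of data are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical long-format data: one row per participant per time of measure.
long_rows = [
    {"id": 1, "time": 1, "score": 5.0},
    {"id": 1, "time": 2, "score": 6.5},
    {"id": 2, "time": 1, "score": 4.0},
    {"id": 2, "time": 2, "score": 7.0},
]

def long_to_wide(rows, id_key, index_key, value_key):
    """Collapse long rows into one row (dict) per identifier value."""
    wide = defaultdict(dict)
    for row in rows:
        # Each repeated measurement becomes its own column, e.g. "score.1".
        wide[row[id_key]][f"{value_key}.{row[index_key]}"] = row[value_key]
    # One output row per identifier, sorted by identifier.
    return [{id_key: key, **wide[key]} for key in sorted(wide)]

wide_rows = long_to_wide(long_rows, "id", "time", "score")
for row in wide_rows:
    print(row)
```

Each participant had two rows going in and has one row coming out, with the repeated scores spread across columns.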

For the duration of this tutorial we will be using the LongExample.sav file, which contains 10 participants measured on 3 outcome variables under 10 different conditions. This data was generated using this script in R.

Begin by importing the above data file (LongExample.sav) and briefly examine the data.

Notice that we have a 'code' variable which simply assigns a sequential number to each row of data. Then, there is a 'participant.id' variable in which each number represents a participant (or case, or observation). Next, we have a series of categorical variables which each identify a condition of our study. The first two such variables, 'x1.numbers' and 'x1.letters', can be thought of as identifying the 10 different times of measurement, using numbers and letters respectively as identifiers. The next two variables ('x2.numbers' & 'x2.letters') can be thought of as representing 5 different conditions of our study (e.g. 5 different therapy types, 5 different treatment drugs, 5 different locations, etc.). The next two variables ('x3.numbers' & 'x3.letters') can be thought of as representing conditions of our study in the same way as the x2 variables, but with only 2 levels. The final three variables (x4, x5, and x6) are our interval / ratio outcome variables.

So, currently there are 100 rows of data: 10 participants, each measured at 10 different times. At each time of measure, a participant was exposed to a unique set of x2 and x3 conditions and measured with three instruments.
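The row count follows directly from the design, and a quick check of rows per participant is a useful habit before restructuring any long file. The sketch below, in Python, rebuilds only the identifier columns of the layout described above; it does not read LongExample.sav, the outcome values are omitted, and the assumption that the 'code' numbers run participant by participant is ours, not something the file is known to guarantee.

```python
from collections import Counter
from itertools import product

# Skeleton of the long layout: 10 participants x 10 times of measure.
# Only identifier columns are built; no outcome scores are invented.
skeleton = [
    {"participant.id": pid, "x1.numbers": time}
    for pid, time in product(range(1, 11), range(1, 11))
]

# Sequential row number, playing the role of the 'code' variable
# (assumed ordering: all rows of participant 1, then participant 2, ...).
for code, row in enumerate(skeleton, start=1):
    row["code"] = code

# Count how many rows each participant contributes.
rows_per_participant = Counter(row["participant.id"] for row in skeleton)

print(len(skeleton))                       # total number of rows
print(set(rows_per_participant.values()))  # distinct rows-per-participant counts
```

If the set printed on the last line contains more than one value, some participants have missing or duplicated rows, which is worth knowing before moving to Wide format.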

Our goal here is to use the Restructure function to transform the data file from its current Long format (each participant has 10 rows) to Wide format (each participant has one row) while still retaining all the information contained in the original data file.

Start by clicking on Data in the menu bar. Next, click on Restructure...

Next, select the "Restructure selected cases into variables" option, which is emphasized with a red ellipse here. Then click the Next > button.

Step 2: highlight / select the participant.id variable and use the top arrow button to move it to the Identifier Variable(s): box. Then click the Next > button.

Step 3: we do not need to change the Sort option. We can allow SPSS to sort the data by the identifier, which will list the participants sequentially from 1 to 10 as rows in our new data file. Click the Next > button to continue.

Step 4: select "Group by index (for example: w1 h1, w2 h2, w3 h3)", which is emphasized here with a red rectangle. This option keeps the outcome scores for each unique combination of x1, x2, and x3 conditions together, separate from the outcome scores associated with every other combination. Click the Next > button to continue.
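The effect of the grouping option is only on the order of the new columns. The following Python sketch reproduces the two orderings using the w and h example from the wizard's own label; the alternative ordering is shown for contrast under the assumption that it corresponds to the wizard's other grouping choice, and the variable names by_variable and by_index are ours.

```python
variables = ["w", "h"]    # the two example variables from the wizard label
index_values = [1, 2, 3]  # three repeated measurements

# Grouped by original variable: all of w first, then all of h.
by_variable = [f"{v}{i}" for v in variables for i in index_values]

# Grouped by index: w and h for measurement 1, then measurement 2, ...
by_index = [f"{v}{i}" for i in index_values for v in variables]

print(by_variable)
print(by_index)
```

Grouping by index keeps everything recorded at one time of measure in adjacent columns, which is why it suits this data file.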

Lastly, select the "Paste the syntax generated by the wizard into a syntax window" option, which is emphasized here with a red ellipse. After selecting the paste option, click the Finish button and a warning box will appear to let you know the sets (data sets) will still be available after restructuring has taken place.

Click the OK button and a syntax window will open with the generated syntax in it.

In the syntax window, highlight all the text and then click on the Run Selection button to run the syntax. Below right, the word Selection has appeared because the cursor is being held over the Run Selection button.

Once the function runs, the output window will open (if not already open). It will contain some brief output showing which variables were generated (really, which variables were transposed), along with a Processing Statistics table which displays the number of cases in and out, the number of variables in and out, and the number of index variables.

The new data file should resemble what is below.

Each row in the new data file corresponds to a single participant because we used the participant.id variable as our identifier variable. The participant.id variable is the only column which was not changed or re-named in the restructuring. The code variable from the original data has now been broken down into ten columns and serves as a marker in the new data file, marking each of the 10 segments or chunks of data. By segment or chunk, we mean each time of measure. The x1.numbers and x1.letters variables also serve this function, identifying each of the ten times of measure. Each segment contains the unique combination of x1, x2, x3 identifiers and a unique score on x4, x5, x6 for each participant. Scrolling to the right (in the data window of SPSS) you'll notice each time of measure, or chunk, identified by the sequential numbers and letters (chunks from left to right, 1 to 10 & A to J) in the columns associated with the x1 variables. You will also notice in each chunk a unique identifier for each of the x2 (1 to 5 & A to E) and x3 (1 to 2 & A to B) variables. Each chunk also contains scores (for each participant) on the three outcome measures (x4, x5, x6). Having the data in this format allows us to run repeated measures analysis and/or compute total scores for each or any unique combination of conditions.
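As a rough check on the layout just described, the Python sketch below lists the columns one would expect in the Wide file: the identifier, followed by ten chunks of ten columns each. The ".1" to ".10" suffixes are an assumption about how the new columns are named; compare against the names your own restructured file actually shows.

```python
identifier = "participant.id"

# The ten variables that get spread across the ten times of measure.
spread_variables = [
    "code", "x1.numbers", "x1.letters", "x2.numbers", "x2.letters",
    "x3.numbers", "x3.letters", "x4", "x5", "x6",
]
times = range(1, 11)

# Grouped by index: every variable for time 1, then every variable for time 2, ...
# The ".<time>" suffix is an assumed naming pattern, not confirmed output.
wide_columns = [identifier] + [
    f"{name}.{t}" for t in times for name in spread_variables
]

print(len(wide_columns))   # 1 identifier + 10 chunks x 10 variables
print(wide_columns[:11])   # the identifier plus the first chunk
```

Under these assumptions the Wide file has 101 columns and 10 rows, which carries the same information as the 100 rows and 11 columns of the Long file.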

As is the case with all of the tutorials on this web site, this tutorial should not be considered an exhaustive review of the topic covered: restructuring data from Long to Wide format. Restructuring data from Wide to Long can be done by using similar steps; simply choose "Restructure selected variables into cases" at the initial Restructure Data Wizard dialog and follow the steps of the wizard.
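For readers who want to see the reverse direction conceptually, here is a minimal Python sketch of a wide-to-long conversion. It is an illustration only, not the wizard's procedure: the function name wide_to_long, the column names, and the two rows of data are hypothetical, and it assumes every repeated column is named as a stem, a dot, and a numeric index (for example score.2).

```python
def wide_to_long(rows, id_key):
    """Split each wide row into one long row per index value."""
    long_rows = []
    for row in rows:
        by_index = {}
        for name, value in row.items():
            if name == id_key:
                continue
            # "score.2" -> stem "score", index "2" (split at the last dot).
            stem, _, index = name.rpartition(".")
            by_index.setdefault(int(index), {})[stem] = value
        for index in sorted(by_index):
            long_rows.append({id_key: row[id_key], "index": index, **by_index[index]})
    return long_rows

# Hypothetical wide-format data: one row per participant.
wide_rows = [
    {"id": 1, "score.1": 5.0, "score.2": 6.5},
    {"id": 2, "score.1": 4.0, "score.2": 7.0},
]

long_rows = wide_to_long(wide_rows, "id")
for row in long_rows:
    print(row)
```

Two wide rows become four long rows, one per participant per index value.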