Research and Statistical Support

MODULE 9.1

Bootstrapped confidence interval for the Independent t-test

This module shows how to use bootstrap resampling to obtain a bootstrap-t (studentized) confidence interval for the difference between two independent group means, which is the quantity tested by the independent t-test. Use File, Import Data... to import the Example Data 1 file with the Import Wizard, choosing SPSS File (*.sav) as the source and example1 as the member name, as was done previously.

Let's start by getting a look at the data and variables of interest.

```
PROC PRINT DATA = example1;
RUN;

PROC MEANS DATA = example1;
CLASS candy;
VAR recall1;
RUN;
```

1. Next, we can conduct the t-test.

PROC TTEST tests the difference between the means of two independent groups. Notice that the output reports t-values both with equal variances assumed (the Pooled method) and with equal variances not assumed (the Satterthwaite method).

Run the independent groups t-test.

```
PROC TTEST DATA=example1;
CLASS candy;
VAR recall1;
RUN;
```
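The two rows of t-values correspond to two standard formulas for a difference between independent means. The Python sketch below computes both; the helper name `t_statistics` and the scores are hypothetical (not the example1 data), and it is an illustration of the formulas rather than a reproduction of PROC TTEST's output.

```python
import math
from statistics import mean, variance


def t_statistics(g1, g2):
    """Return (t, df) with equal variances assumed and (t, df) with them not assumed."""
    n1, n2 = len(g1), len(g2)
    diff = mean(g1) - mean(g2)
    v1, v2 = variance(g1), variance(g2)  # sample variances (n - 1 denominator)

    # Equal variances assumed: pool the two variances
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = diff / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    df_pooled = n1 + n2 - 2

    # Equal variances not assumed: Welch t with Satterthwaite degrees of freedom
    a, b = v1 / n1, v2 / n2
    t_welch = diff / math.sqrt(a + b)
    df_welch = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return (t_pooled, df_pooled), (t_welch, df_welch)


# Hypothetical scores for two groups of equal size
pooled, welch = t_statistics([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print("pooled (t, df):", pooled)
print("welch (t, df):", welch)
```

When the two groups are the same size, the two t-values are identical; the degrees of freedom still differ unless the sample variances are equal as well.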

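2. Next, build the macro that runs the bootstrap resampling with 1,000 resamples (yes, this takes some time to type!) and computes the bootstrap standard error of the mean difference. The code after the macro uses the same resamples to compute the bootstrap-t confidence interval. References: (1) (2)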

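```
%MACRO bootse (b);
/* Split the data into one data set per candy group */
DATA orig1 (WHERE = (candy = 1))
     orig2 (WHERE = (candy = 2));
  SET example1;
RUN;

/* Draw the requested number of resamples, with replacement, within each group.
   Each group has its own NOBS= variable (nobs1, nobs2) so that groups
   of different sizes are each resampled at their own size. */
DATA boot;
  %DO t = 1 %TO 2;
    DO sample = 1 TO &b;
      DO i = 1 TO nobs&t;
        /* CEIL returns a row number from 1 to nobs&t, each equally likely */
        pt = CEIL(RANUNI(&t) * nobs&t);
        SET orig&t NOBS = nobs&t POINT = pt;
        OUTPUT;
      END;
    END;
  %END;
  STOP; /* POINT= never reaches end of file, so stop explicitly */
RUN;

/* Mean of recall1 for every resample and group */
PROC MEANS DATA = boot NOPRINT NWAY;
  CLASS sample candy;
  VAR recall1;
  OUTPUT OUT = x MEAN = mean;
RUN;

/* Difference between the two group means in each resample */
DATA diffmean;
  MERGE x (WHERE = (candy = 1) RENAME = (mean = mean1))
        x (WHERE = (candy = 2) RENAME = (mean = mean2));
  BY sample;
  diffmean = mean1 - mean2;
RUN;

/* Bootstrap standard error: SD of the resampled differences */
PROC MEANS DATA = diffmean STD;
  VAR diffmean;
  OUTPUT OUT = bootse STD = bootse;
RUN;
%MEND;

%bootse (1000);
```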
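The macro picks each resampled row with POINT= and a pointer built from a RANUNI draw u between 0 and 1. The Python sketch below tallies which row numbers two pointer rules produce for a hypothetical group of 5 rows; `pointer_counts` is a hypothetical helper, and Python's `random` stands in for RANUNI. Rounding u × N can yield 0, which is not a valid row number, and gives the last row only half the chance of the others. Taking the ceiling gives every row from 1 to N the same chance, which is what sampling with replacement requires.

```python
import math
import random
from collections import Counter


def pointer_counts(n, draws, rule, seed=1):
    """Tally the row numbers a pointer rule produces from many uniform draws."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(draws):
        u = 1.0 - rng.random()  # uniform on (0, 1], so u * n is never exactly 0
        counts[rule(u * n)] += 1
    return counts


n = 5  # hypothetical group size
by_round = pointer_counts(n, 100_000, round)
by_ceil = pointer_counts(n, 100_000, math.ceil)
print(sorted(by_round.items()))  # row 0 appears; rows 0 and 5 each get about half the share of rows 1-4
print(sorted(by_ceil.items()))   # rows 1-5 only, each about 20% of the draws
```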

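```
/* Stack the original data (sample = 0) on top of the bootstrap resamples */
DATA bootorig;
  SET example1 (IN = a)
      boot;
  IF a THEN sample = 0;
RUN;

/* Mean, variance, and n for every sample and group */
PROC MEANS DATA = bootorig NOPRINT NWAY;
  CLASS sample candy;
  VAR recall1;
  OUTPUT OUT = x MEAN = mean VAR = var N = n;
RUN;

/* Studentize each resampled difference */
DATA diff_z;
  MERGE x (WHERE = (candy = 1) RENAME = (mean = mean1 var = var1 n = n1))
        x (WHERE = (candy = 2) RENAME = (mean = mean2 var = var2 n = n2));
  BY sample;

  diffmean = mean1 - mean2;
  /* Standard error of a difference between two independent means */
  diffse = SQRT(var1 / n1 + var2 / n2);

  /* Carry the original-sample difference (sample 0 comes first) onto every row */
  RETAIN origdiff;
  IF sample = 0 THEN origdiff = diffmean;

  diff_z = (diffmean - origdiff) / diffse;
RUN;
```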
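For reference, the quantities behind the diff_z step and the final interval can be written as follows. Here $\hat{\delta}$ is the original-sample difference (origdiff), $\hat{\delta}^{*}_{b}$ and $\widehat{\mathrm{se}}^{*}_{b}$ are the difference and its standard error in resample $b$ (diffmean and diffse), $s^{*2}_{1b}$ and $s^{*2}_{2b}$ are the resampled group variances, and $\widehat{\mathrm{se}}_{\mathrm{boot}}$ is the bootstrap standard error from the macro (bootse). The notation is ours, and the standard error shown is the usual one for a difference between two independent means.

$$
\hat{\delta} = \bar{x}_1 - \bar{x}_2, \qquad
\hat{\delta}^{*}_{b} = \bar{x}^{*}_{1b} - \bar{x}^{*}_{2b}, \qquad
\widehat{\mathrm{se}}^{*}_{b} = \sqrt{\frac{s^{*2}_{1b}}{n_1} + \frac{s^{*2}_{2b}}{n_2}}, \qquad
z^{*}_{b} = \frac{\hat{\delta}^{*}_{b} - \hat{\delta}}{\widehat{\mathrm{se}}^{*}_{b}}
$$

$$
\left[\, \hat{\delta} - z^{*}_{(0.975)}\,\widehat{\mathrm{se}}_{\mathrm{boot}},\;\;
\hat{\delta} - z^{*}_{(0.025)}\,\widehat{\mathrm{se}}_{\mathrm{boot}} \,\right]
$$

In the code, t_lo plays the role of $z^{*}_{(0.975)}$ and t_hi the role of $z^{*}_{(0.025)}$, so the upper quantile produces the lower limit and vice versa.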

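```
/* Order the studentized differences */
PROC SORT DATA = diff_z;
  BY diff_z;
RUN;

/* Pick the 97.5th and 2.5th percentiles of the resampled values.
   Sample 0 is excluded; the row numbers 975 and 25 assume 1000 resamples. */
DATA t_vals;
  SET diff_z (WHERE = (sample > 0)) END = eof;

  RETAIN t_lo t_hi;
  IF _N_ = 975 THEN t_lo = diff_z; /* upper quantile, used for the lower limit */
  IF _N_ = 25 THEN t_hi = diff_z;  /* lower quantile, used for the upper limit */

  IF eof THEN OUTPUT;
RUN;

/* Bootstrap-t limits: original difference minus quantile times bootstrap SE */
DATA ci_t;
  MERGE diff_z (WHERE = (sample = 0))
        bootse (KEEP = bootse)
        t_vals (KEEP = t_:);

  conf_lo = origdiff - (t_lo * bootse);
  conf_hi = origdiff - (t_hi * bootse);

  KEEP origdiff bootse t_lo t_hi conf_lo conf_hi;
RUN;
```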
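The whole SAS pipeline above (resample within each group, studentize each resampled difference, take the 2.5th and 97.5th percentiles, and scale them by the bootstrap standard error) can be sketched compactly in Python. The function names and the recall scores are hypothetical, and the percentile positions follow the 25th/975th-of-1000 convention used in the SAS code; treat this as an illustration of the technique, not a drop-in replacement.

```python
import math
import random
from statistics import mean, stdev, variance


def diff_and_se(g1, g2):
    """Difference in means and its standard error sqrt(s1^2/n1 + s2^2/n2)."""
    d = mean(g1) - mean(g2)
    se = math.sqrt(variance(g1) / len(g1) + variance(g2) / len(g2))
    return d, se


def bootstrap_t_ci(g1, g2, b=1000, seed=1):
    """Bootstrap-t 95% interval for mean(g1) - mean(g2), resampling within each group."""
    rng = random.Random(seed)
    orig_diff, _ = diff_and_se(g1, g2)
    diffs, z = [], []
    for _ in range(b):
        # Resample each group with replacement, keeping the original group sizes
        r1 = rng.choices(g1, k=len(g1))
        r2 = rng.choices(g2, k=len(g2))
        d, se = diff_and_se(r1, r2)
        diffs.append(d)
        if se > 0:  # skip the rare resample in which both groups are constant
            z.append((d - orig_diff) / se)
    boot_se = stdev(diffs)  # SD of the resampled differences, like bootse in SAS
    z.sort()
    k = len(z)
    t_hi = z[max(1, round(0.025 * k)) - 1]  # 25th smallest when k = 1000
    t_lo = z[round(0.975 * k) - 1]          # 975th smallest when k = 1000
    # Upper quantile gives the lower limit and vice versa, as in the ci_t step
    return orig_diff - t_lo * boot_se, orig_diff - t_hi * boot_se


# Hypothetical recall scores (not the module's example1 data)
group1 = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
group2 = [8, 9, 10, 7, 9, 11, 8, 10, 9, 8]
lo, hi = bootstrap_t_ci(group1, group2)
print(f"95% bootstrap-t CI for the mean difference: ({lo:.2f}, {hi:.2f})")
```

Fixing the seed makes the interval reproducible from run to run, much as the fixed RANUNI seeds do in the SAS code.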

3. Finally, print the confidence interval limits.

```
PROC PRINT DATA = ci_t;
RUN;
```

4. With all due respect to the SAS Institute, that's a ridiculous amount of code compared with what R needs to do essentially the same thing. See the Do It Yourself Introduction to R course, specifically Module 5, which covers t and F tests. The comments and code below were adapted from that module.

```
### Robust t-test.
# First, create an object (called 'x1' here) holding the Recall1 scores for each Candy group.
x1 <- split(Recall1, Candy)