Missing Form Check for a Longitudinal Data Study

Note: This program was developed on a 32-bit Windows system.

* ---------------------------------------------------------------------------- ;
* Our master file is based on our study enrollment form. Our initital cohort   ;
* consists of 200 participants. This program will identify which subjects are  ;
* missing subsequent forms. To properly assess this, we must also identify     ;
* which subjects have passed away since study enrollment and thus exclude them ;
* from the reports to follow.                                                  ;
* ---------------------------------------------------------------------------- ;
* assign library;
libname febstat 'S:FEBSTATDatasets';

* ---------------------------------------------------------------------------- ;
* set global macro variables including where to write your output file.        ;
* ---------------------------------------------------------------------------- ;
%let lib=FEBSTAT;                   %put lib=&lib.;
%let masterFile=T1H01;              %put masterFile=&masterFile.;
%let deceasedSrcFile=TRG03;         %put deceasedSrcFile=&deceasedSrcFile.;
%let pdfDest=S:FEBSTATprogramst1hreportssubjectsMissingBaselineForms.pdf;
* ---------------------------------------------------------------------------- ;
* check to see when the dataset containing information on deceased subjects    ;
* was created (from a database pull)                                           ;
* ---------------------------------------------------------------------------- ;
PROC SQL noprint;
       select datepart(crdate) format=mmddyy10. into:createDate
     from dictionary.tables
       where libname="&lib." and memname="&deceasedSrcFile";
QUIT;
%put createDate=&createDate.;
* ---------------------------------------------------------------------------- ;
* set options including macro and format locations                             ;
* ---------------------------------------------------------------------------- ;
options nocenter nonumber nodate;

options mautosource sasautos='S:FEBSTATprogramsmacros';
options symbolgen mprint;

options fmtsearch=(febstat);
* ---------------------------------------------------------------------------- ;
* Flag deceased subjects                                                       ; 
* ---------------------------------------------------------------------------- ;
data passed; 
     set &lib..&deceasedSrcFile. (keep=iecnum idnum death 
                                  rename=(death=subjectDeceased));
     if subjectDeceased=1;
run;

proc sort data=&lib..&masterFile. out=masterFile;
     by idnum;
run;

data masterFile; set &lib..&masterfile.(keep=iecnum idnum);
     in&masterFile.=1;
run;

%macro merge(dset);
proc format; 
     value yes
     1='Yes'
     .='No';
run;

proc sort data=&lib..&dset.; 
     by idnum; 
run; 

data &dset.; set &lib..&dset (keep=iecnum idnum);
     in&dset.=1;
run;

data &dset.; merge masterFile(in=a) &dset(in=b) passed (in=c);
     by idnum; 
if a=1 and b^=1 and c^=1;
format in&dset. &dset. subjectDeceased yes.;
run;

*ods listing close;
ODS ESCAPECHAR='^';
ods proclabel="Missing &dset.";
ods pdf body="&pdfDest."
    style = sasweb
    startpage=yes;
footnote justify=right 'Page ^{thispage} of ^{lastpage}';

proc print data=&dset.;
     var iecnum idnum in&masterFile. in&dset. subjectDeceased ;
title Subject Present in &masterFile. but not &dset.;
title2 As of &createDate..;
run;
%mend merge;
* ---------------------------------------------------------------------------- ;
* check for subjects missing from the following datasets                       ;
* ---------------------------------------------------------------------------- ;
%merge(T1H02)
%merge(T1H03)
%merge(T1H04)
%merge(T1H050)
%merge(T1H051)
%merge(T1H052)
%merge(T1H053)
%merge(T1H054)
%merge(T1H055)
%merge(T1H056)
%merge(T1H057)
%merge(T1H058)
%merge(T1H059a)
%merge(T1H059b)
%merge(T1H06)
%merge(T1H07)

ods pdf close;
ods listing;
* ---------------------------------------------------------------------------- ;
* Clear results window and delete all datasets from the work library           ;
* (Optional and not necessary when running in batch mode)                      ;
* ---------------------------------------------------------------------------- ;
dm 'odsresults; clear';;
proc datasets library=work kill;
quit;

Generate a Create Date Macro Value For a Specific SAS Dataset

Note: This program was developed to run on a 32-bit Windows system.

PROC SQL noprint;
     select datepart(crdate) format=mmddyy10. into:createDate
     from dictionary.tables
     where libname="LIB" and memname="DATASET";
QUIT;

%put createDate=&createDate.;

Sample Log Output:

SYMBOLGEN:  Macro variable CREATEDATE resolves to 04/08/2011
2
3  %put createDate=&createDate.;
createDate=04/08/2011

Generate Character Frequency Reports for Data Cleaning

An email I sent to a colleague describing how to use the program below:

Attached is a program which help you look for inconsistencies in how the character data was entered (i.e. Not applicable vs N/A vs Not Applicable) and so on. Another thing that it will do is that it will let you know how many records are missing values if you wanted to do a global N/A or something of that nature.

The program will work on any SAS dataset. It automatically runs a frequency of values on every character variable (string) in a given dataset. Some of the long more narrative text fields you’ll likely just want to ignore. (When it comes time for analysis someone will have a lot of fun reading those long explanations and coming up with ways to code them. Did you ever have the pleasure of studying content analysis or text mining?) :)

The program creates an individual report per dataset but I bound them into a single PDF to make it easier to send to you.

Please let me know if you have any questions. The most important things to remember are:

  • DO NOT EDIT THE MACRO IF YOU ARE NOT 100% SURE WHAT TO DO
  • Keep an original copy of the program to go back to on the off chance that a stray keystroke breaks something.

All you need to do to get this program to work is to assign a library (i.e. point SAS to the datasets) and tell SAS what directory to dump your PDFs into. For convenience that is set with a global macro variable at the top. (I try to set parameters above the bulk of the code to help maintain the integrity of the actual program.)

The last section of the program executes the macro against a given dataset. You can comment out a single line with an asterisk at the start of the line or separate out a block using /* and */.

Note: This program was developed on a 32-bit Windows system.

*------------------------------------------------------------;
* Set options and parameters                                 ;
*------------------------------------------------------------;
options mprint symbolgen;
options nonumber nodate nocenter nosource spool; 
*------------------------------------------------------------;
* Path to output                                             ;
*------------------------------------------------------------;
%let pdfDest=S:FEBSTATReportsMisc Reports;
*------------------------------------------------------------;
* Set library                                                ;
*------------------------------------------------------------;
libname febstat "S:FEBSTATDatasets";
*------------------------------------------------------------;
* The macro (essentially a subroutine) - DO NOT MODIFY       ; 
* The "dset" variable is where you assign the dataset you    ;
* wish to process.                                           ;
*------------------------------------------------------------;
%macro charFreq(dset=);
dm 'odsresults; cancel'; 
proc datasets library = work nolist;
    modify &dset.;
      attrib _all_ label='';
quit;

proc contents data=&dset. varnum out=foo 
     (where=(type=2 and name not in('IDNUM','SUBINIT'))
      keep=name type varnum) noprint;
run;

proc sort data=foo;
     by varnum;
run;

data _null_; 
    retain N 0;
    set foo(keep=name) end=last;
    call symput("name"||trim(left(_N_)),compress(NAME));
    if last then call symput('N',trim(left(_N_)));        
run;

ods listing close;
ods pdf body="&pdfDest.&dset._charFreq.pdf" 
        style=sasweb
        startpage=yes;
    %do printLoop = 1 %to &N.; 
        ods proclabel="&&name&printLoop";
        proc freq data=&dset.;
             tables &&name&printLoop / list out=&&name&printLoop nocum nopercent;
	         title &&name&printLoop (Unique Values);
        run;

        proc datasets lib=work nolist;
             delete &&name&printLoop;
        quit;
   %end; 
ods pdf close;
ods listing;
title;
%mend charFreq;
*------------------------------------------------------------;
* Execute program;
*------------------------------------------------------------;
%charFreq(dset=LIBNAME.FOO);

SAS Macros To Create .CSV Output Using Proc Export

This program exports the ID and site information for Capture-Recapture Analysis of subjects in a longitudinal data study using R. The dynamic .CSV export macro loops through the numeric values of a given variable in a dataset using that value to both subset the data and modify the output file’s name.

Note: This program was developed on a 32-bit Windows system.

libname ks "S:nfoxKansasDatasetsPERK";
libname kansas "S:nfoxKansasDatasets"; *formats are saved here;
options fmtsearch=(kansas);
* ------------------------------------------------------------------------------- ;
* Create Macros for CSV file export ;
* ------------------------------------------------------------------------------- ;
%macro exportCSV(dset=,where=,outfile=,drop=gender VAR2);
DATA FOO(drop=&drop.); 
     SET &dset. (where=(&where.));
RUN;
PROC EXPORT DATA=FOO 
            OUTFILE=&outfile. 
            DBMS=CSV REPLACE;
     PUTNAMES=YES;
RUN;
%mend exportCSV;

%macro exportDynamicCSV;
%do exportLoop = 1 %to 4;
%exportCSV(where=ageGroup2=&exportLoop.,
           outfile="S:nfoxKansasProgramscapturecaptureKS-ageGroup&exportLoop..csv");
%end;
%mend exportDynamicCSV;
* ------------------------------------------------------------------------------- ;
* Recode site-related data for Capture Analysis ;
* ------------------------------------------------------------------------------- ;
data ks.ids; 
     retain idnum stf_id1 site1 stf_id2 recap1 stf_id3 recap2 stf_id4 recap3
                  stf_id5 recap4 stf_id6 recap5;
     length site1 recap1-recap5 3;
     set ks.stf (keep=idnum stf_id1-stf_id6 stf_14 stf_15 stf_4 fd_7 ageGroup gender);

     year=scan(stf_15,-1);

     if fd_7=1 and (stf_14=0 or (stf_14=1 and year >= 2008)) and stf_4
         in('66757','67301','67333','67335','67337','67340','67344','67347',
        '67351','67363','67364','66753','67330','67332','67335','67336',
        '67337','67341','67342','67351','67354','67356','67357');

     site1=scan(stf_ID1,1,'-');
     array recap {5} recap1-recap5;
     array sites {5} stf_ID2-stf_ID6;
     do i = 1 to 5;
	    if sites{i} ne '-999' then do;
	       recap{i}=scan(sites{i},1,'-');
	       if site1=recap{i} then recap{i} = -999;
	    end;
	    if sites{i} eq '-999' then do;
	       recap{i}=sites{i};
	    end;
   	    if recap2 ne -999 then do;
	       if recap2 = site1 then recap2=-999;
               if recap2 = recap1 then recap2=-999;
	    end;
 	    if recap3 ne -999 then do;
	       if recap3 = site1 then recap3=-999;
               if recap3 = recap1 then recap3=-999;
	       if recap3 = recap2 then recap3=-999;
	    end;
	    if recap4 ne -999 then do;
	       if recap4 = site1 then recap4=-999;
               if recap4 = recap1 then recap4=-999;
	       if recap4 = recap2 then recap4=-999;
	       if recap4 = recap3 then recap4=-999;
   	    end;
            if recap5 ne -999 then do;
	       if recap5 = site1 then recap5=-999;
               if recap5 = recap1 then recap5=-999;
	       if recap5 = recap2 then recap5=-999;
	       if recap5 = recap3 then recap5=-999;
	       if recap5 = recap4 then recap5=-999;
	    end;
     end;
	 IF AGEGROUP IN(1,2) THEN AGEGROUP2=1;
	 IF AGEGROUP IN(3,4) THEN AGEGROUP2=2;
	 IF AGEGROUP IN(5,6) THEN AGEGROUP2=3;
	 IF AGEGROUP >=7  THEN AGEGROUP2=4;
drop i stf_id1-stf_id6 stf_14 stf_15 stf_4 fd_7 year ageGroup; 
run;
* ------------------------------------------------------------------------------- ;
* Call Macros and export CSV files ;
* ------------------------------------------------------------------------------- ;
%exportCSV(where=1,outfile="S:nfoxKansasProgramscapturecaptureKS.csv")
%exportCSV(where=gender=1,outfile="S:nfoxKansasProgramscapturecaptureKS-malePop.csv")
%exportCSV(where=gender=2,outfile="S:nfoxKansasProgramscapturecaptureKS-fmPop.csv")
%exportDynamicCSV

Developing Shell Scripts to Run SAS using Cygwin

An article describing how to set up SAS to run in Cygwin shell scripts hosted at Data Savant Consulting:

Linking to SAS

Unix has got to find SAS, somehow, if you have SAS on your PC. You can add the directory with the sas executable (sas.exe) to the end of the PATH like this:

PATH=$PATH:/cygdrive/c/shellscripts:/cygdrive/c/PROGRA~1/SASINS~1/SAS/v8

Note that because Unix will not accept spaces in directory names, you have to use the pure DOS notation for some of the directories. You can see that I have done this for “Program Files” and “SAS Institute”.

Running SAS in batch

Some of the shell scripts on this site run SAS in batch. It is supposed to be done discretely so that the user is not aware that they are running sas. In order to suppress the pop-up windows you get with PC sas then you can disable them using the following options:

sas -nosplash -nologo -icon

When you call sas in one of your scripts then set these options to suppress the windows. I do not know of a shorthand way of doing this. There is no point setting up an alias in your .bashrc as this will not get exported to any sub-processes like you get for a shell script.

The second problem running SAS in batch is that it will have is locating the program files. Your PC sas will not understand the Unix directory structure. It will expect a DOS file name. The scripts I write that call SAS, generate program code in my home directory and then invokes SAS to run the code. I have set up a HOMEW variable in my .bashrc and exported it so that I can use it in shell scripts.

HOMEW=C:cygwinhomeDefault
export HOMEW

In the shell script I then use $HOMEW as both the destination of the log and to indicate where the program code is stored.

# Run the SAS code
sas -nosplash -nologo -icon -log "$HOMEW" -sysin "$HOMEWcontents.sas"

If you are developing shell scripts that call SAS on your PC and they are intended eventually for a Unix platform then you have to change $HOMEW to $HOME and change the backslash to a forward slash in the program location. You may have to remove the “-nosplash -nologo -icon” options as well.

Print Loop Macro

Note: This program was developed on a 32-bit Windows system.

Output Data Cleaning Reports by Table Name and ID

%macro printLoop(dset=,pdfDest=,rpt=);
data _null_; 
     retain N 0;
     set &dset. (keep=idnum iecnum) end=last;
     call symput("iecnum"||trim(left(_N_)),compress(iecnum));
     call symput("idnum"||trim(left(_N_)),compress(idnum));
     if last then call symput('N',trim(left(_N_)));        
run;
%do printLoop = 1 %to &N.;
data foo; set &dset.(where=(idnum="&&idnum&printLoop"));
run;
ods pdf body  = "&pdfDest.&d._&&idnum&printLoop...pdf"
        style = sasweb
        bookmarkgen=yes
        startpage=yes
        author="Columbia University Medical Center";
%inc &rpt.;
ods pdf close;
%end;
%mend printLoop;

Download SAS Program

To Output Data Cleaning Reports by Site, Table Name, and ID replace the PDF statement in the program with this statement below:

ods pdf body  ="&pdfDest.&&iecnum&printLoop&d._&&idnum&printLoop...pdf"

Program to Check for Variable Length Discrepancies Among Datasets

Note: This program was developed on a 32-bit Windows system.

libname archive 'S:pathtosaslib';
proc datasets lib=archive;
     contents data=_all_
     out=work.foo(keep=memname name length
     where=(name in('IDNUM','SUBINIT')));
quit;
proc sort data=foo;
     by name;
run;
ods listing;
ods pdf file="S:pathtooutput.pdf"
        style=sasweb;
proc report data=foo nowd;
     by name;
     column memname name length;
     define memname / display;
     define name / display ;
     define length / display ;
title 'Variable Length Discrepancies Among Datasets';
title2 'March 25th, 2011 Data Release';
run;
ods pdf close;

Download SAS Program
View Output

Create .CSV File Documenting SAS Dataset Creation Dates

Note: This program was developed on a 32-bit Windows system.

Generating Paths Based on Dictionary.Tables variables memname & crdate:

%let basearchivepath=S:pathtodatasets;
%let pathLabel=New Cohort SAS Datasets;
%put base=&basearchivepath. label=&pathLabel.;

libname febstat 'S:pathtoSASlib';

PROC SQL noprint;
   create table foo as
   select memname, crdate
   from   dictionary.tables
   where  libname="FEBSTAT"
   order by crdate;
QUIT;

data foo;
     set foo;
length Path $255;
label createDate = 'Create Date'
      memname = 'Dataset (Member) Name'
      Path = 'Path to Data';
createDate=datepart(crdate);
fmtDate=put(createDate,yymmdd10.);
path="&basearchivepath."||compress(fmtDate) ||""||"&pathLabel.";
format createDate yymmdd10.;
drop crdate fmtDate;
run;

proc sort data=foo;
     by memname createDate;
run;

ods listing close;
ods csv file="pathtooutput.csv";
 proc print data=foo noobs label;
      var memname createDate Path;
 run;
ods csv close ;
ods listing;

Download SAS Program
View Output

Missing Data Summary Macro

Note: This program was developed on a 32-bit Windows system.

%macro missingDataSummary(dset=);
ods listing close;
ods pdf body="&pdfDest.&dset._missing.pdf";

proc contents data=&dset. varnum out=foo(keep=name) noprint;
run;

data _null_; 
     retain N 0;
     set foo end=last;
     call symput("name"||trim(left(_N_)),compress(NAME));
     if last then call symput('N',trim(left(_N_)));        
run;

%let obs = '';

data foo; set &dset.;
format _numeric_ missing. _character_ $mischar.;
run;

data miss; 
     set foo;
     %do printLoop = 1 %to &N.; 
         if missing(&&name&printLoop) then output;
	 keep &&name&printLoop;
     %end; *close printLoop; 
keep iecnum idnum;
run;

proc sort data=miss nodup;
     by idnum;
run;

proc sql noprint; 
     select count(*) into: obs
     from WORK.miss;
quit;

%if &obs > 0 %then %do; 
proc contents data=miss varnum out=miss2(keep=name) noprint;
run;

data _null_; 
     retain N 0;
     set miss2 end=last;
     call symput("name"||trim(left(_N_)),compress(NAME));
     if last then call symput('N',trim(left(_N_)));        
run;

     %do printLoop = 1 %to &N.; 
         ods proclabel="&&name&printLoop";
         proc print data=miss noobs;
              where missing(&&name&printLoop);
              var iecnum idnum &&name&printLoop;
 	          title Missing &dset Data: &&name&printLoop; 
         run;
      %end; 
%end;
ods pdf close;
ods listing;
title;
%mend missingDataSummary;

Download SAS Program

View the Contents of a Format Library

Note: This program was developed on a 32-bit Windows system.

* ---------------------------------------------------------------------------- ;
* View the contents of a format library                                        ;
* ---------------------------------------------------------------------------- ;
proc catalog catalog = febstat.formats;
contents;
run;
quit;
* ---------------------------------------------------------------------------- ;
* Examine the values of specific format in the library                         ;
* ---------------------------------------------------------------------------- ;
proc format library = febstat.formats;
select gender;
run;
quit;
* ---------------------------------------------------------------------------- ;
*To print out a saved FORMAT library, use the FMTLIB option as indicated       ;
* ---------------------------------------------------------------------------- ;
libname library;
proc format library=library fmtlib;
run;