SAS Topics: Installing SAS V9.x
Contents of This Page
- SAS Operators
- Lists of Variables
- SAS Functions
- Code Examples
Extensive definitions and explanations of the rules of the SAS language are given in the SAS Language Reference, in Chapter 4. This handout discusses a few of these rules that are helpful when doing transforms and recodes in the data step.
SAS Operators
Arithmetic Operators:
SAS arithmetic operators indicate that an arithmetic operation is performed. The arithmetic operators are shown below:
Symbol   Definition        Example
**       Exponentiation    y=x**2   z=x**y
*        Multiplication    z=x*y
/        Division          z=x/y
+        Addition          z=x+y
-        Subtraction       z=x-y
Note that an asterisk (*) must always be used to indicate multiplication, e.g. y=2*x, not y=2x or y=2(x). If either operand of an arithmetic operator is missing, the result is missing.
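As a minimal sketch of the missing-value rule (the dataset name arith and the variable names are made up for illustration), the following data step applies each arithmetic operator to one complete case and to one case where y is missing:

```sas
data arith;
   input x y;
   power   = x**2;   /* exponentiation: uses x only */
   product = x*y;    /* multiplication: the asterisk is required */
   ratio   = x/y;    /* division */
   total   = x+y;    /* addition */
   diff    = x-y;    /* subtraction */
   cards;
3 4
3 .
;
proc print data=arith;
run;
```

In the second case power is 9, because x is present, while product, ratio, total and diff are all missing, because y is missing.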
Logical (Boolean) Operators:
Logical or Boolean operators are used in expressions to link sequences of comparisons. The table below lists the logical operators and their mnemonic equivalents.
Symbol   Mnemonic Equivalent
&        AND
|        OR
~        NOT*
*Note that the symbol for NOT depends on the terminal you are using. It is probably safer to use the mnemonic equivalent, rather than the symbol. The NOT operator can be used as shown below:
not(name='SMITH')
is equivalent to
name ne 'SMITH'
An example of a SAS expression using a logical operator would be the following:
if age < 25 and sex = 'F' then select=1;
It is possible to use parentheses to help clarify the logical expression. Be sure that each left parenthesis is followed by a matching right parenthesis.
if (age < 25) and (sex = 'F') then select=1;
Comparison Operators:
The following comparison operators can be written as symbols or with their mnemonic equivalents. They can be used in the SAS data step as part of an if...then; statement or an if...then...do; statement. Comparison operators may also be used with a WHERE statement in a Proc, to select cases that will be processed by the procedure. The operators and their mnemonic equivalents are shown below.
Symbol   Mnemonic   Definition
<        lt         Less than
<=       le         Less than or equal to
>        gt         Greater than
>=       ge         Greater than or equal to
=        eq         Equal to
~=       ne         Not equal to
Note that if the symbol is used, it is not necessary to have blank spaces around it, but if the mnemonic is used, it must be set off by spaces:
if x<y then group=1;
is equivalent to:
if x < y then group = 1;
is equivalent to:
if x lt y then group = 1;
The mnemonics may be given in upper or lower case, or a mixture of cases.
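The use of comparison operators in a WHERE statement can be sketched as follows. The dataset survey and its values are hypothetical and serve only to illustrate the selection.

```sas
data survey;
   input age sex $;
   cards;
22 F
31 F
19 M
24 F
;
/* Only cases with age less than 25 and sex equal to 'F' are processed */
proc print data=survey;
   where age lt 25 and sex eq 'F';
run;
```

Only the first and fourth cases satisfy both comparisons, so only those two are printed. The dataset itself is not changed by the WHERE statement.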
Lists of Variables:
Lists of variables can be given in several ways in SAS. A list of variables may be given by simply separating the variables by blanks:
age sex height weight
If the variables in a list all have the same initial part (root) and the last part of the variable name is an integer, then you can use a numbered range list. The numbers must be consecutive and ascending. Note that the variables do NOT have to be consecutive in the SAS dataset.
x1-x5
is equivalent to
x1 x2 x3 x4 x5
and
quest1-quest3
is equivalent to
quest1 quest2 quest3
A name range list includes all variables from the first to the last, inclusive, in the order in which they are stored in the SAS dataset. The variables you want must therefore be consecutive in the dataset.
age -- weight
(includes all variables from age to weight)
age-numeric-weight
(includes all numeric variables from age to weight)
age-character-weight
(includes all character variables from age to weight)
Special SAS name lists include
_NUMERIC_
(all numeric variables in the dataset)
_CHARACTER_
(all character variables in the dataset)
_ALL_
(all variables in the dataset)
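A short sketch of the three kinds of lists in use is given below. The dataset lists and its variables are invented for illustration; the order of the variables on the input statement determines what the name range list picks up.

```sas
data lists;
   input id age height weight name $ x1 x2 x3;
   cards;
1 34 170 65 Ann 2 4 6
2 41 182 80 Bob 1 . 5
;
proc means data=lists;
   var x1-x3;          /* numbered range list: x1 x2 x3 */
run;
proc print data=lists;
   var age--weight;    /* name range list: age height weight */
run;
proc print data=lists;
   var _character_;    /* special name list: all character variables (name) */
run;
```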
SAS Functions:
SAS functions return a value from an argument or a series of arguments. For example, the log function returns the natural log of its argument. The argument(s) to a function are enclosed in parentheses immediately following the function name; if a function requires more than one argument, the arguments are separated by commas. Arguments may be variable names, constants, or SAS expressions (e.g. other SAS functions or mathematical expressions).
There are arithmetic, array, truncation, mathematical, trigonometric, probability, quantile, sample statistics, random number, financial, character, date and time, state and ZIP code, and special functions. For a complete list of SAS functions by category, see pp 53-57 of the SAS Language Reference. Detailed descriptions of SAS functions are in Chapter 11 (p. 521 ff.) of the SAS Language Reference. SAS functions are used in DATA step programming statements, and can also be used with certain statistical procedures.
Selected SAS Functions:
Most of these functions operate on one argument; round also takes a second argument, the rounding unit. Note that if the argument is illegal (such as trying to take the square root of a negative number), SAS will return a missing value, and print an error message in the log. This will not prevent the program from executing, however.
Function Name   Definition                                      Example
abs             Absolute value                                  y=abs(x)
int             Integer part of the argument (like truncation)  y=int(x)
log             Natural log                                     y=log(x)
log10           Log base 10                                     y=log10(x)
round           Rounds to the nearest specified level           y=round(x,.01)
In the round example, x is rounded to the nearest hundredth.
SAS Statistical Functions:
Statistical functions operate on at least 2 arguments. They give the result for the non-missing values of the arguments. For statistical functions, the arguments can be listed separated by commas, or lists of variables may be used, if the keyword "of" is included in the parentheses. Note that these statistical functions give sample statistics within a case. So, for example, if you had 3 variables in your file named wt1, wt2 and wt3 that represented 3 measurements of weight that were made on each individual in the study, you could use the statistical functions to get the mean of all the weights for that individual. If you wished to summarize values across cases, then you would use Proc Means.
Function Name   Definition                                          Example
mean            Mean of the nonmissing values                       y=mean(x1,x2,x3)
min             Minimum of the nonmissing values                    y=min(of x1-x3)
max             Maximum of the nonmissing values                    y=max(of hem1-hem5)
n               The number of nonmissing values                     y=n(of age--weight)
nmiss           The number of missing values                        y=nmiss(of wt1-wt3)
std             Standard deviation of nonmissing values             y=std(5,6,7,9)
stderr          Standard error of the mean of nonmissing values     y=stderr(of x1-x20)
Note that if the mean function were used to calculate the mean weight for an individual across 3 measurements of weight, SAS would return the average of however many values of weight were nonmissing. For a case that had all 3 weights, SAS would give the average of the 3, but for a case with only 1 nonmissing weight, SAS would return the mean of the one value, which would be simply equal to the value itself.
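The difference between the mean function and an average built from arithmetic operators can be sketched as below. The dataset weights and the variable names mean_fn, mean_ari and mean_all are hypothetical.

```sas
data weights;
   input wt1 wt2 wt3;
   mean_fn  = mean(of wt1-wt3);      /* mean of the nonmissing weights */
   mean_ari = (wt1 + wt2 + wt3)/3;   /* missing if any weight is missing */
   /* keep the mean only for cases with all 3 weights present */
   if n(of wt1-wt3) = 3 then mean_all = mean(of wt1-wt3);
   cards;
150 152 154
150 . .
;
proc print data=weights;
run;
```

For the first case all three results are 152. For the second case mean_fn is 150 (the single nonmissing value), while mean_ari and mean_all are missing.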
Code Examples
Example of Using SAS Statistical Functions in the Data Step:
The following example uses some of the SAS statistical functions to calculate sample statistics within a case.
/*******************************************************************
This command file for SAS demonstrates how to use SAS statistical
functions. These functions must be used with at least 2 numeric
arguments. They all operate on the nonmissing values. Note that
lists of variables may be used, if the keyword OF is included
before the variable list.
The name of this file is: STATFUNC.SAS
********************************************************************/
options linesize=72 pagesize=58;
title;
data test;
input q1 q2 q3 agemom agedad;
newvar1=n(q1,q2,q3);
newvar2=nmiss(q1,q2,q3);
newvar3=sum(q1,q2,q3);
newvar4=mean(of q1 - q3);
newvar5=std(of q1 - q3);
newvar6=stderr(of q1 - q3);
newvar7=max(of agemom -- agedad);
newvar8=min(of agemom -- agedad);
if n(q1,q2,q3) ge 2 then newvar9=mean(of q1 - q3);
label q1 = 'Question 1';
label q2 = 'Question 2';
label q3 = 'Question 3';
label agemom = 'mother''s age';
label agedad = 'father''s age';
label newvar1 = 'number of nonmissing values for q1,q2,q3';
label newvar2 = 'number of missing values for q1,q2,q3';
label newvar3 = 'sum of nonmissing values of q1,q2,q3';
label newvar4 = 'mean of nonmissing values of q1,q2,q3';
label newvar5 = 'standard deviation of q1,q2,q3';
label newvar6 = 'standard error of mean of q1,q2,q3';
label newvar7 = 'maximum of values of agemom and agedad';
label newvar8 = 'minimum of values of agemom and agedad';
label newvar9 = 'mean of q1,q2,q3 if 2 or more nonmissing';
cards;
2 2 . 35 37
1 1 2 22 .
2 1 3 34 38
. . 2 28 26
1 2 2 29 30
1 2 3 26 29
2 1 1 27 27
. 2 2 . .
. . . 33 36
;
proc print data=test label;
run;
proc freq data=test;
tables q2*q3;
run;
The output from this program is shown below:
                                                          number of
                                                         nonmissing
      Question  Question  Question  mother's  father's   values for
OBS       1         2         3        age       age      q1,q2,q3

 1        2         2         .        35        37          2
 2        1         1         2        22         .          3
 3        2         1         3        34        38          3
 4        .         .         2        28        26          1
 5        1         2         2        29        30          3
 6        1         2         3        26        29          3
 7        2         1         1        27        27          3
 8        .         2         2         .         .          2
 9        .         .         .        33        36          0

      number of      sum of       mean of                     standard
       missing     nonmissing    nonmissing     standard      error of
      values for   values of     values of    deviation of    mean of
OBS    q1,q2,q3     q1,q2,q3      q1,q2,q3      q1,q2,q3      q1,q2,q3

 1        1            4          2.00000       0.00000       0.00000
 2        0            4          1.33333       0.57735       0.33333
 3        0            6          2.00000       1.00000       0.57735
 4        2            2          2.00000        .             .
 5        0            5          1.66667       0.57735       0.33333
 6        0            6          2.00000       1.00000       0.57735
 7        0            4          1.33333       0.57735       0.33333
 8        1            4          2.00000       0.00000       0.00000
 9        3            .           .             .             .

      maximum of    minimum of      mean of
      values of     values of     q1,q2,q3 if
      agemom and    agemom and     2 or more
OBS     agedad        agedad       nonmissing

 1        37            35          2.00000
 2        22            22          1.33333
 3        38            34          2.00000
 4        28            26           .
 5        30            29          1.66667
 6        29            26          2.00000
 7        27            27          1.33333
 8         .             .          2.00000
 9        36            33           .

TABLE OF Q2 BY Q3

Q2(Question 2)     Q3(Question 3)

Frequency|
Percent  |
Row Pct  |
Col Pct  |       1|       2|       3|  Total
---------+--------+--------+--------+
       1 |      1 |      1 |      1 |      3
         |  16.67 |  16.67 |  16.67 |  50.00
         |  33.33 |  33.33 |  33.33 |
         | 100.00 |  33.33 |  50.00 |
---------+--------+--------+--------+
       2 |      0 |      2 |      1 |      3
         |   0.00 |  33.33 |  16.67 |  50.00
         |   0.00 |  66.67 |  33.33 |
         |   0.00 |  66.67 |  50.00 |
---------+--------+--------+--------+
Total           1        3        2        6
            16.67    50.00    33.33   100.00

Frequency Missing = 3
A sample SAS program using some of the Math functions is shown below:
/*******************************************************************
This command file for SAS demonstrates how to use some SAS math
functions, and arithmetic operators. Note that when either argument
is missing for the arithmetic operator, the result will be missing.
This differs from the result for the stat functions, which operate
on all nonmissing values.
The name of this file is: MATHFUNC.SAS
See SAS Language Reference, Version 6, p 122 for math operators,
pp 521-616 for functions.
********************************************************************/
The output from this run is shown below:
OBS     X        Y    ABSX     SQRTX     LOG10Y       LNY    INT_Y   ROUNDY

 1      4    5.230      4    2.00000    0.71850    1.65441      5      5.2
 2    -15   22.000     15     .         1.34242    3.09104     22     22.0
 3      .   18.510      .     .         1.26741    2.91831     18     18.5
 4     -1    3.000      1     .         0.47712    1.09861      3      3.0
 5      6    0.000      6    2.44949     .          .           0      0.0
 6      5    5.035      5    2.23607    0.70200    1.61641      5      5.0

OBS       MULT     DIVIDE         EXPON      TOT1       DIFF      TOT2

 1      20.920    0.76482       1408.55     9.230     -1.230     9.230
 2    -330.000   -0.68182    7.48183E25     7.000    -37.000     7.000
 3        .         .              .         .          .       18.510
 4      -3.000   -0.33333         -1.00     2.000     -4.000     2.000
 5       0.000     .               1.00     6.000      6.000     6.000
 6      25.175    0.99305       3306.08    10.035     -0.035    10.035

Variable  N           Mean        Std Dev        Minimum        Maximum
-------------------------------------------------------------------
ABSX      5      6.2000000      5.2630789      1.0000000     15.0000000
SQRTX     3      2.2285192      0.2248399      2.0000000      2.4494897
LOG10Y    5      0.9014903      0.3813419      0.4771213      1.3424227
LNY       5      2.0757581      0.8780721      1.0986123      3.0910425
INT_Y     6      8.8333333      8.9312187              0     22.0000000
ROUNDY    6      8.9500000      9.0185919              0     22.0000000
MULT      5    -57.3810000    152.9031811   -330.0000000     25.1750000
DIVIDE    4      0.1856789      0.8183669     -0.6818182      0.9930487
EXPON     5   1.4963655E25    3.345975E25     -1.0000000   7.4818276E25
TOT1      5      6.8530000      3.1652836      2.0000000     10.0350000
DIFF      5     -7.2530000     17.0255990    -37.0000000      6.0000000
TOT2      6      8.7958333      5.5374023      2.0000000     18.5100000
-------------------------------------------------------------------
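The statements of MATHFUNC.SAS are not reproduced on this page. As a hedged reconstruction, the sketch below would create variables with the names and values that appear in the output. The dataset name math, the rounding unit of .1, the var list on Proc Means, and the exact form of each statement are assumptions inferred from the output, not the original file.

```sas
options linesize=72 pagesize=58;
title;
data math;
   input x y;
   absx   = abs(x);        /* absolute value */
   sqrtx  = sqrt(x);       /* missing when x is negative or missing */
   log10y = log10(y);      /* missing when y is 0 */
   lny    = log(y);        /* natural log */
   int_y  = int(y);        /* integer part */
   roundy = round(y,.1);   /* assumed rounding unit: nearest tenth */
   mult   = x*y;
   divide = x/y;           /* missing when y is 0 */
   expon  = x**y;
   tot1   = x+y;           /* operator: missing if x or y is missing */
   diff   = x-y;
   tot2   = sum(x,y);      /* function: sum of the nonmissing values */
   cards;
4 5.23
-15 22
. 18.51
-1 3
6 0
5 5.035
;
proc print data=math;
run;
proc means data=math;
   var absx--tot2;
run;
```

The third case shows the contrast described in the comment header: tot1 is missing because x is missing, while tot2 is 18.51, the sum of the one nonmissing value.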
Example using conditionals and formatting
This next command file shows how to do some recodes in SAS that involve conditional if ...then statements. It also shows how to use formats to set up values for both numeric and character variables. Note that the formats may be eliminated in the proc print by giving a format statement for the variables desired and then specifying a null format.
/*******************************************************************
This command file for SAS demonstrates how to do some recodes in a
SAS data step, using comparison operators.
The name of this file is: RECODE.SAS
See SAS Language Reference, Version 6, p 123 for a list of
comparison operators.
********************************************************************/
options linesize=95 pagesize=58;
title;
proc format;
value agefmt 1='Under 6'
2='6 to 9'
3='10 and Older' ;
value $testfmt 'A'='Group A'
'B'='Group B';
data recode;
length school $ 9;
input sex $ 1-3 age 4-6 testgrp $ 8-9 school $ 11-19 ;
if age = 99 then age = . ;
if school = 'NA' then school = ' ' ;
if testgrp = 'NA' then testgrp = ' ' ;
if age ne . then do;
if age lt 6 then agegrp = 1;
if age ge 6 and age lt 10 then agegrp = 2;
if age ge 10 then agegrp = 3;
end;
if agegrp=1 and testgrp='A' then agetest=1;
if agegrp=1 and testgrp='B' then agetest=2;
if agegrp=2 and testgrp='A' then agetest=3;
if agegrp=2 and testgrp='B' then agetest=4;
if agegrp=3 and testgrp='A' then agetest=5;
if agegrp=3 and testgrp='B' then agetest=6;
if school in ('Moore','Bachman') then region = 'East';
else if school = 'White' then region = 'West';
if agegrp in (1,2) then agecat=1;
else if agegrp = 3 then agecat=2;
format agegrp agefmt. testgrp $testfmt. ;
cards;
F    5 B  Moore
F   99 B  NA
M    7 A  Bachman
M   11 B  White
M    5 A  Bachman
M   10 A  Bachman
F    6 B  Moore
F    9 NA White
F    8 B  Moore
M    4 A  Bachman
M    9 B  White
F   10 A  White
;
proc print data=recode;
title 'PRINTOUT OF DATA WITH FORMATS ASSIGNED TO TESTGRP AND AGEGRP';
run;
proc print data=recode;
format agegrp testgrp ;
title 'PRINTOUT OF DATA AS IT WAS ORIGINALLY READ INTO SAS';
run;
proc freq data=recode;
tables school region sex agegrp agecat testgrp agegrp*testgrp
agetest;
title 'FREQUENCY TABLES';
run;
The output from this program is shown below:
PRINTOUT OF DATA WITH FORMATS ASSIGNED TO TESTGRP AND AGEGRP

OBS   SCHOOL    SEX   AGE   TESTGRP   AGEGRP         AGETEST   REGION   AGECAT

  1   Moore     F       5   Group B   Under 6           2      East        1
  2             F       .   Group B   .                 .                  .
  3   Bachman   M       7   Group A   6 to 9            3      East        1
  4   White     M      11   Group B   10 and Older      6      West        2
  5   Bachman   M       5   Group A   Under 6           1      East        1
  6   Bachman   M      10   Group A   10 and Older      5      East        2
  7   Moore     F       6   Group B   6 to 9            4      East        1
  8   White     F       9             6 to 9            .      West        1
  9   Moore     F       8   Group B   6 to 9            4      East        1
 10   Bachman   M       4   Group A   Under 6           1      East        1
 11   White     M       9   Group B   6 to 9            4      West        1
 12   White     F      10   Group A   10 and Older      5      West        2

PRINTOUT OF DATA AS IT WAS ORIGINALLY READ INTO SAS

OBS   SCHOOL    SEX   AGE   TESTGRP   AGEGRP   AGETEST   REGION   AGECAT

  1   Moore     F       5   B            1        2      East        1
  2             F       .   B            .        .                  .
  3   Bachman   M       7   A            2        3      East        1
  4   White     M      11   B            3        6      West        2
  5   Bachman   M       5   A            1        1      East        1
  6   Bachman   M      10   A            3        5      East        2
  7   Moore     F       6   B            2        4      East        1
  8   White     F       9                2        .      West        1
  9   Moore     F       8   B            2        4      East        1
 10   Bachman   M       4   A            1        1      East        1
 11   White     M       9   B            2        4      West        1
 12   White     F      10   A            3        5      West        2

FREQUENCY TABLES

                                  Cumulative  Cumulative
SCHOOL      Frequency   Percent   Frequency    Percent
-----------------------------------------------------
Bachman         4         36.4         4         36.4
Moore           3         27.3         7         63.6
White           4         36.4        11        100.0

Frequency Missing = 1

                                 Cumulative  Cumulative
REGION     Frequency   Percent   Frequency    Percent
----------------------------------------------------
East           7         63.6         7         63.6
West           4         36.4        11        100.0

Frequency Missing = 1

                              Cumulative  Cumulative
SEX     Frequency   Percent   Frequency    Percent
-------------------------------------------------
F           6         50.0         6         50.0
M           6         50.0        12        100.0

                                       Cumulative  Cumulative
AGEGRP           Frequency   Percent   Frequency    Percent
----------------------------------------------------------
Under 6              3         27.3         3         27.3
6 to 9               5         45.5         8         72.7
10 and Older         3         27.3        11        100.0

Frequency Missing = 1

                                 Cumulative  Cumulative
AGECAT     Frequency   Percent   Frequency    Percent
----------------------------------------------------
     1         8         72.7         8         72.7
     2         3         27.3        11        100.0

Frequency Missing = 1

FREQUENCY TABLES

                                  Cumulative  Cumulative
TESTGRP     Frequency   Percent   Frequency    Percent
-----------------------------------------------------
Group A         5         45.5         5         45.5
Group B         6         54.5        11        100.0

Frequency Missing = 1

TABLE OF AGEGRP BY TESTGRP

AGEGRP        TESTGRP

Frequency    |
Percent      |
Row Pct      |
Col Pct      |Group A |Group B |  Total
-------------+--------+--------+
Under 6      |      2 |      1 |      3
             |  20.00 |  10.00 |  30.00
             |  66.67 |  33.33 |
             |  40.00 |  20.00 |
-------------+--------+--------+
6 to 9       |      1 |      3 |      4
             |  10.00 |  30.00 |  40.00
             |  25.00 |  75.00 |
             |  20.00 |  60.00 |
-------------+--------+--------+
10 and Older |      2 |      1 |      3
             |  20.00 |  10.00 |  30.00
             |  66.67 |  33.33 |
             |  40.00 |  20.00 |
-------------+--------+--------+
Total               5        5       10
                50.00    50.00   100.00

Frequency Missing = 2

FREQUENCY TABLES                      13
                                      21:29 Wednesday, May 17, 1995

                                  Cumulative  Cumulative
AGETEST     Frequency   Percent   Frequency    Percent
-----------------------------------------------------
      1         2         20.0         2         20.0
      2         1         10.0         3         30.0
      3         1         10.0         4         40.0
      4         3         30.0         7         70.0
      5         2         20.0         9         90.0
      6         1         10.0        10        100.0

Frequency Missing = 2
An example using dates
The following program uses the SAS date function MDY to calculate the age at interview. Recall that SAS stores a date as the number of days from January 1, 1960 to that date. If the date is before January 1, 1960, it will have a negative value; if it is after, it will have a positive value. Dates can be displayed using a date format, such as mmddyy8., or simply as a numeric value (with no format). The printout from this program shows both methods.
/*******************************************************************
This command file for SAS demonstrates how to use dates in a SAS
data step.
The name of this file is: DATES.SAS
See SAS Language Reference, Version 6, pp 128-131 for information
on dates and times in SAS.
********************************************************************/
options linesize=72 pagesize=58;
title;
data dates;
length name $12;
input name $ b_mon b_day b_yr int_mon int_day int_yr;
if b_day = . then b_day = 15;
if int_day = . then int_day = 15;
birdate = mdy(b_mon,b_day,b_yr);
intdate = mdy(int_mon,int_day,int_yr);
intage = int((intdate-birdate)/365);
cards;
Roger 12 12 84 9 3 94
Samantha 1 20 85 9 15 94
Henry 10 6 83 10 2 94
William 4 17 82 10 5 94
Petra 6 . 83 9 14 94
;
proc print data=dates;
title 'printing dates as number of days since Jan 1, 1960';
run;
proc print data=dates;
format birdate mmddyy8. intdate mmddyy8.;
title 'printing dates using date formats';
run;
The output from this program is shown below:
printing dates as number of days since Jan 1, 1960

OBS   NAME       B_MON   B_DAY   B_YR   INT_MON   INT_DAY   INT_YR   BIRDATE   INTDATE   INTAGE

 1    Roger        12      12     84       9         3        94       9112     12664       9
 2    Samantha      1      20     85       9        15        94       9151     12676       9
 3    Henry        10       6     83      10         2        94       8679     12693      10
 4    William       4      17     82      10         5        94       8142     12696      12
 5    Petra         6      15     83       9        14        94       8566     12675      11

printing dates using date formats

OBS   NAME       B_MON   B_DAY   B_YR   INT_MON   INT_DAY   INT_YR    BIRDATE    INTDATE   INTAGE

 1    Roger        12      12     84       9         3        94     12/12/84   09/03/94      9
 2    Samantha      1      20     85       9        15        94     01/20/85   09/15/94      9
 3    Henry        10       6     83      10         2        94     10/06/83   10/02/94     10
 4    William       4      17     82      10         5        94     04/17/82   10/05/94     12
 5    Petra         6      15     83       9        14        94     06/15/83   09/14/94     11
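The way SAS stores dates relative to January 1, 1960 can be checked with a small sketch like the one below. The variable names d0, dneg and dpos are made up for illustration, and four-digit years are used so that the result does not depend on how two-digit years are interpreted.

```sas
data _null_;
   d0   = mdy(1,1,1960);     /* the reference date, stored as 0 */
   dneg = mdy(12,31,1959);   /* one day earlier, stored as -1 */
   dpos = mdy(1,2,1960);     /* one day later, stored as 1 */
   put d0= dneg= dpos=;                /* unformatted numeric values */
   put d0= mmddyy8. dneg= mmddyy8.;    /* same values with a date format */
run;
```

The first put statement writes the numeric values to the log, and the second writes two of the same values as dates, which parallels the two printouts above.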