Contents of This Page
- SAS Operators
- Arithmetical Operators
- Logical (Boolean) Operators
- Comparison Operators
- Lists of Variables
- SAS Functions
- Selected SAS Functions
- SAS Statistical Functions
- Code Examples
- Example Using SAS Statistical Functions in the Data Step
- Sample SAS program using some of the math functions
- Example using conditionals and formatting
- Example using dates
Extensive definitions and explanations of the rules of
the SAS language are given in the SAS Language Reference,
in Chapter 4. This handout discusses a few of these rules
that are helpful when doing transforms and recodes in the
data step.
SAS Operators
Arithmetic Operators:
SAS arithmetic operators indicate that an arithmetic
operation is performed. The arithmetic operators are
shown below:
Symbol  Definition       Example
**      Exponentiation   y=x**2
                         z=x**y
*       Multiplication   z=x*y
/       Division         z=x/y
+       Addition         z=x+y
-       Subtraction      z=x-y
Note that an asterisk (*) must always be used to
indicate multiplication, e.g. y=2*x, not y=2x or y=2(x).
If either operand of an arithmetic operator is missing,
the result is missing.
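The missing-value rule above can be checked with a short data step. This is only a minimal sketch: the dataset name arith and the variable names are made up for illustration.

```sas
/* Illustrative sketch: dataset and variable names are hypothetical */
data arith;
   input x y;
   prod  = x * y;    /* missing if x or y is missing */
   ratio = x / y;    /* missing if x or y is missing */
   power = x ** 2;   /* missing only if x is missing */
   total = x + y;    /* missing if x or y is missing */
   cards;
3 4
. 4
5 .
;
run;

proc print data=arith;
run;
```

Only the first observation has all four results nonmissing. In the third observation, power is still 25 because it depends on x alone.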
Logical (Boolean) Operators:
Logical or Boolean operators are used in expressions
to link sequences of comparisons. The table below lists
the logical operators and their mnemonic equivalents.
Symbol  Mnemonic Equivalent
&       AND
|       OR
~       NOT*
*Note that the symbol for NOT depends on the terminal
you are using. It is probably safer to use the mnemonic
equivalent, rather than the symbol. The NOT operator can
be used as shown below:
not(name='SMITH')
is equivalent to
name ne 'SMITH'
An example of a SAS expression using a logical
operator would be the following:
if age < 25 and sex = 'F' then select=1;
It is possible to use parentheses to help clarify the
logical expression. Be sure that each left parenthesis is
followed by a matching right parenthesis.
if (age < 25) and (sex = 'F') then select=1;
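One detail worth keeping in mind with expressions like the one above: SAS treats a numeric missing value as smaller than any number, so age < 25 is true when age is missing. The sketch below uses made-up data and hypothetical variable names (select1, select2) to show the difference an explicit check for missing makes.

```sas
/* Illustrative sketch: dataset and variable names are hypothetical */
data pick;
   input age sex $;
   select1 = 0;
   select2 = 0;
   /* also true when age is missing, because missing compares
      as less than every number */
   if age < 25 and sex = 'F' then select1 = 1;
   /* same rule, with missing ages excluded explicitly */
   if age ne . and age < 25 and sex = 'F' then select2 = 1;
   cards;
22 F
30 F
.  F
19 M
;
run;

proc print data=pick;
run;
```

The third observation (missing age) gets select1=1 but select2=0; the other observations get the same value for both variables.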
Comparison Operators:
The following comparison operators can be written as
symbols or with their mnemonic equivalents. They can be
used in the SAS data step as part of an if...then;
statement or an if...then do; group. Comparison
operators may also be used in a WHERE statement in a
Proc, to select the cases that will be processed by the
procedure. The operators and their mnemonic equivalents
are shown below.
Symbol  Mnemonic  Definition
<       lt        Less than
<=      le        Less than or equal to
>       gt        Greater than
>=      ge        Greater than or equal to
=       eq        Equal to
~=      ne        Not equal to
Note that if the symbol is used, it is not necessary
to have blank spaces around it, but if the mnemonic is
used, it must be set off by spaces:
if x<y then group=1;
is equivalent to:
if x < y then group = 1;
is equivalent to:
if x lt y then group = 1;
The mnemonics may be given in upper or lower case, or
a mixture of cases.
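As a sketch of the WHERE usage mentioned above, the following steps print only the selected cases. The dataset people and its values are invented for the example.

```sas
/* Illustrative sketch: dataset and values are hypothetical */
data people;
   input name $ age;
   cards;
Ann 34
Bob 17
Cy 52
;
run;

/* only observations with age of 18 or more are printed */
proc print data=people;
   where age ge 18;
run;

/* the symbol form selects the same observations */
proc print data=people;
   where age >= 18;
run;
```

Both proc print steps list Ann and Cy. The WHERE statement restricts the cases the procedure reads without changing the dataset itself.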
Lists of Variables:
Lists of variables can be given in several ways in
SAS. A list of variables may be given by simply
separating the variables by blanks:
age sex height weight
If the variables in a list all have the same initial
part (root) and the last part of the variable name is an
integer, then you can use a numbered range list. The
numbers must be consecutive and ascending. Note that the
variables do NOT have to be consecutive in the SAS
dataset.
x1-x5
is equivalent to
x1 x2 x3 x4 x5
and
quest1-quest3
is equivalent to
quest1 quest2 quest3
A name range includes variables from the first to the
last inclusive. The variables in the list must be
consecutive in the SAS dataset.
age -- weight
(includes all variables from age to weight)
age-numeric-weight
(includes all numeric variables from age to weight)
age-character-weight
(includes all character variables from age to weight)
Special SAS name lists include
_NUMERIC_
(all numeric variables in the dataset)
_CHARACTER_
(all character variables in the dataset)
_ALL_
(all variables in the dataset)
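The list types described above might be used as follows. This is a sketch with an invented dataset (scores); it relies on age, height and weight being stored next to each other, as the name range requires.

```sas
/* Illustrative sketch: dataset and values are hypothetical */
data scores;
   input id age height weight x1-x5;   /* numbered range list in INPUT */
   xmean = mean(of x1-x5);             /* numbered range list in a function */
   cards;
1 25 170 65 3 4 5 4 3
2 31 182 80 2 . 4 5 5
;
run;

proc print data=scores;
   var age--weight;                    /* name range: age, height, weight */
run;

proc means data=scores;
   var _numeric_;                      /* every numeric variable */
run;
```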
SAS Functions:
There are many SAS functions that have different uses.
SAS functions return a value from an argument, or series
of arguments. For example, the log function returns the
natural log of the argument. If a function requires more
than one argument, the arguments are separated by commas.
The argument(s) to a function are contained in
parentheses immediately following the function name. The
argument(s) to a function may be either variable names,
or constants, or SAS expressions (e.g. other SAS
functions or mathematical expressions). There are
arithmetic, array, truncation, mathematical,
trigonometric, probability, quantile, sample statistics,
random number, financial, character, date and time, state
and ZIP code, and special functions. For a complete list
of SAS functions by category, see pp 53-57 of the SAS
Language Reference. Detailed descriptions of SAS
functions are in Chapter 11 (p. 521 ff.) of the SAS
Language Reference. SAS functions are used as part of the
DATA step programming statements, and can be used with
certain Statistical procedures.
Selected SAS Functions:
These functions operate on a single value (round also
takes a second argument giving the rounding unit). Note
that if the argument is illegal (such as trying to take
the square root of a negative number), SAS will return a
missing value and print a note in the log. This will not
prevent the program from executing, however.
Function Name  Definition                              Example
abs            Absolute value                          y=abs(x)
int            Integer part of the argument            y=int(x)
               (like truncation)
log            Natural log                             y=log(x)
log10          Log base 10                             y=log10(x)
round          Rounds to the nearest specified level   y=round(x,.01)
               (here, x is rounded to the nearest
               hundredth)
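A short data step can show how these functions behave, including what happens with an illegal argument. The sketch below uses an invented dataset (funcs) and adds sqrt, which is mentioned in the text above but not listed in the table.

```sas
/* Illustrative sketch: dataset and variable names are hypothetical */
data funcs;
   input x;
   absx  = abs(x);
   intx  = int(x);        /* drops the fractional part: int(-6.27) is -6 */
   rootx = sqrt(x);       /* missing when x is negative */
   lnx   = log(x);        /* missing when x is zero or negative */
   rndx  = round(x, .1);  /* rounds to the nearest tenth */
   cards;
6.27
-6.27
0
;
run;

proc print data=funcs;
run;
```

The data step still runs to completion; the observations with illegal arguments simply carry missing values for rootx or lnx, and the log reports the invalid arguments.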
SAS Statistical Functions:
Statistical functions operate on at least 2 arguments.
They give the result for the non-missing values of the
arguments. For statistical functions, the arguments can
be listed separated by commas, or lists of variables may
be used, if the keyword "of" is included in the
parentheses. Note that these statistical functions give
sample statistics within a case. So, for example,
if you had 3 variables in your file named Wt1, WT2 and
Wt3 that represented 3 measurements of weight that were
made on each individual in the study, you could use the
statistical functions to get the mean of all the weights
for that individual. If you wished to summarize values across
cases, then you would use Proc Means.
Function Name  Definition                                Example
mean           Mean of the nonmissing values             y=mean(x1,x2,x3)
min            Minimum of the nonmissing values          y=min(of x1-x3)
max            Maximum of the nonmissing values          y=max(of hem1-hem5)
n              The number of nonmissing values           y=n(of age--weight)
nmiss          The number of missing values              y=nmiss(of wt1-wt3)
std            Standard deviation of the nonmissing      y=std(5,6,7,9)
               values
stderr         Standard error of the mean of the         y=stderr(of x1-x20)
               nonmissing values
Note that if the mean function were used to calculate
the mean weight for an individual across 3 measurements
of weight, SAS would return the average of however many
values of weight were nonmissing. For a case that had all
3 weights, SAS would give the average of the 3, but for a
case with only 1 nonmissing weight, SAS would return the
mean of the one value, which would be simply equal to the
value itself.
Code Examples
Example of Using SAS Statistical Functions
in the Data Step:
The following example uses some of the SAS statistical
functions to calculate sample statistics within a case.
/*******************************************************************
This command file for SAS demonstrates how to use SAS statistical
functions. These functions must be used with at least 2 numeric
arguments. They all operate on the nonmissing values. Note that lists
of variables may be used, if the keyword OF is included before the
variable list.
The name of this file is: STATFUNC.SAS
********************************************************************/
options linesize=72 pagesize=58;
title;
data test;
input q1 q2 q3 agemom agedad;
newvar1=n(q1,q2,q3);
newvar2=nmiss(q1,q2,q3);
newvar3=sum(q1,q2,q3);
newvar4=mean(of q1 - q3);
newvar5=std(of q1 - q3);
newvar6=stderr(of q1 - q3);
newvar7=max(of agemom -- agedad);
newvar8=min(of agemom -- agedad);
if n(q1,q2,q3) ge 2 then newvar9=mean(of q1 - q3);
label q1 = 'Question 1';
label q2 = 'Question 2';
label q3 = 'Question 3';
label agemom = 'mother''s age';
label agedad = 'father''s age';
label newvar1 = 'number of nonmissing values for q1,q2,q3';
label newvar2 = 'number of missing values for q1,q2,q3';
label newvar3 = 'sum of nonmissing values of q1,q2,q3';
label newvar4 = 'mean of nonmissing values of q1,q2,q3';
label newvar5 = 'standard deviation of q1,q2,q3';
label newvar6 = 'standard error of mean of q1,q2,q3';
label newvar7 = 'maximum of values of agemom and agedad';
label newvar8 = 'minimum of values of agemom and agedad';
label newvar9 = 'mean of q1,q2,q3 if 2 or more nonmissing';
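cards;
2 2 . 35 37
1 1 2 22 .
2 1 3 34 38
. . 2 28 26
1 2 2 29 30
1 2 3 26 29
2 1 1 27 27
. 2 2 . .
. . . 33 36
;
/* Data lines 5-9 and the PROC steps are reconstructed from the output listing */
proc print data=test label;
run;
proc freq data=test;
tables q2*q3;
run;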
The output from this program is shown below:
                                                        number of
                                                       nonmissing
     Question  Question  Question  mother's  father's  values for
OBS         1         2         3       age       age    q1,q2,q3
  1         2         2         .        35        37           2
  2         1         1         2        22         .           3
  3         2         1         3        34        38           3
  4         .         .         2        28        26           1
  5         1         2         2        29        30           3
  6         1         2         3        26        29           3
  7         2         1         1        27        27           3
  8         .         2         2         .         .           2
  9         .         .         .        33        36           0

        number of        sum of       mean of                    standard
          missing    nonmissing    nonmissing      standard      error of
       values for     values of     values of  deviation of       mean of
OBS      q1,q2,q3      q1,q2,q3      q1,q2,q3      q1,q2,q3      q1,q2,q3
  1             1             4       2.00000       0.00000       0.00000
  2             0             4       1.33333       0.57735       0.33333
  3             0             6       2.00000       1.00000       0.57735
  4             2             2       2.00000             .             .
  5             0             5       1.66667       0.57735       0.33333
  6             0             6       2.00000       1.00000       0.57735
  7             0             4       1.33333       0.57735       0.33333
  8             1             4       2.00000       0.00000       0.00000
  9             3             .             .             .             .

       maximum of    minimum of       mean of
        values of     values of   q1,q2,q3 if
       agemom and    agemom and     2 or more
OBS        agedad        agedad    nonmissing
  1            37            35       2.00000
  2            22            22       1.33333
  3            38            34       2.00000
  4            28            26             .
  5            30            29       1.66667
  6            29            26       2.00000
  7            27            27       1.33333
  8             .             .       2.00000
  9            36            33             .
TABLE OF Q2 BY Q3
Q2(Question 2) Q3(Question 3)
Frequency|
Percent |
Row Pct |
Col Pct | 1| 2| 3| Total
---------+--------+--------+--------+
1 | 1 | 1 | 1 | 3
| 16.67 | 16.67 | 16.67 | 50.00
| 33.33 | 33.33 | 33.33 |
| 100.00 | 33.33 | 50.00 |
---------+--------+--------+--------+
2 | 0 | 2 | 1 | 3
| 0.00 | 33.33 | 16.67 | 50.00
| 0.00 | 66.67 | 33.33 |
| 0.00 | 66.67 | 50.00 |
---------+--------+--------+--------+
Total 1 3 2 6
16.67 50.00 33.33 100.00
Frequency Missing = 3
A sample SAS program using some of the Math
functions is shown below:
/*******************************************************************
This command file for SAS demonstrates how to use some SAS math
functions, and arithmetic operators. Note that when either argument
is missing for the arithmetic operator, the result will be
missing. This differs from the result for the stat functions, which
operate on all nonmissing values.
The name of this file is: MATHFUNC.SAS
See SAS Language Reference, Version 6, p 122 for math operators,
pp 521-616 for functions.
********************************************************************/
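The body of MATHFUNC.SAS is not reproduced in this copy of the handout. Judging from the variable names and values in the output that follows, it may have looked roughly like the sketch below. The dataset name math, the order of the statements, the rounding unit, and the PROC steps are all assumptions inferred from that output, not the original listing.

```sas
/* Reconstruction sketch only: inferred from the output, not the original program */
data math;
   input x y;
   absx   = abs(x);
   sqrtx  = sqrt(x);        /* missing for negative x */
   log10y = log10(y);       /* missing for y = 0 */
   lny    = log(y);         /* missing for y = 0 */
   int_y  = int(y);
   roundy = round(y,.1);
   mult   = x*y;
   divide = x/y;            /* missing when y = 0 */
   expon  = x**y;
   tot1   = x+y;            /* missing if x or y is missing */
   diff   = x-y;
   tot2   = sum(x,y);       /* sum of the nonmissing values */
   cards;
4 5.23
-15 22
. 18.51
-1 3
6 0
5 5.035
;
run;

proc print data=math;
run;

proc means data=math;
   var absx -- tot2;
run;
```

The contrast between tot1 and tot2 is the point the header comment makes: for the third case, where x is missing, the + operator gives a missing result while the sum function returns the one nonmissing value.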
The output from this run is shown below:
OBS    X       Y  ABSX    SQRTX   LOG10Y      LNY  INT_Y  ROUNDY
  1    4   5.230     4  2.00000  0.71850  1.65441      5     5.2
  2  -15  22.000    15        .  1.34242  3.09104     22    22.0
  3    .  18.510     .        .  1.26741  2.91831     18    18.5
  4   -1   3.000     1        .  0.47712  1.09861      3     3.0
  5    6   0.000     6  2.44949        .        .      0     0.0
  6    5   5.035     5  2.23607  0.70200  1.61641      5     5.0

OBS      MULT    DIVIDE       EXPON    TOT1     DIFF    TOT2
  1    20.920   0.76482     1408.55   9.230   -1.230   9.230
  2  -330.000  -0.68182  7.48183E25   7.000  -37.000   7.000
  3         .         .           .       .        .  18.510
  4    -3.000  -0.33333       -1.00   2.000   -4.000   2.000
  5     0.000         .        1.00   6.000    6.000   6.000
  6    25.175   0.99305     3306.08  10.035   -0.035  10.035

Variable  N           Mean       Std Dev       Minimum       Maximum
-------------------------------------------------------------------
ABSX      5      6.2000000     5.2630789     1.0000000    15.0000000
SQRTX     3      2.2285192     0.2248399     2.0000000     2.4494897
LOG10Y    5      0.9014903     0.3813419     0.4771213     1.3424227
LNY       5      2.0757581     0.8780721     1.0986123     3.0910425
INT_Y     6      8.8333333     8.9312187             0    22.0000000
ROUNDY    6      8.9500000     9.0185919             0    22.0000000
MULT      5    -57.3810000   152.9031811  -330.0000000    25.1750000
DIVIDE    4      0.1856789     0.8183669    -0.6818182     0.9930487
EXPON     5   1.4963655E25   3.345975E25    -1.0000000  7.4818276E25
TOT1      5      6.8530000     3.1652836     2.0000000    10.0350000
DIFF      5     -7.2530000    17.0255990   -37.0000000     6.0000000
TOT2      6      8.7958333     5.5374023     2.0000000    18.5100000
-------------------------------------------------------------------
Example using conditionals and formatting
This next command file shows how to do some recodes in
SAS that involve conditional if...then statements. It
also shows how to use formats to label the values of both
numeric and character variables. Note that a format can
be removed in a proc print by giving a format statement
that lists the variables and then specifies no format (a
null format).
/*******************************************************************
This command file for SAS demonstrates how to do some recodes in a
SAS data step, using comparison operators.
The name of this file is: RECODE.SAS
See SAS Language Reference, Version 6, p 123 for a list of
comparison operators.
********************************************************************/
options linesize=95 pagesize=58;
title;
proc format;
value agefmt 1='Under 6'
2='6 to 9'
3='10 and Older' ;
value $testfmt 'A'='Group A'
'B'='Group B';
data recode;
length school $ 9;
input sex $ 1-3 age 4-6 testgrp $ 8-9 school $ 11-19 ;
if age = 99 then age = . ;
if school = 'NA' then school = ' ' ;
if testgrp = 'NA' then testgrp = ' ' ;
if age ne . then do;
if age lt 6 then agegrp = 1;
if age ge 6 and age lt 10 then agegrp = 2;
if age ge 10 then agegrp = 3;
end;
if agegrp=1 and testgrp='A' then agetest=1;
if agegrp=1 and testgrp='B' then agetest=2;
if agegrp=2 and testgrp='A' then agetest=3;
if agegrp=2 and testgrp='B' then agetest=4;
if agegrp=3 and testgrp='A' then agetest=5;
if agegrp=3 and testgrp='B' then agetest=6;
if school in ('Moore','Bachman') then region = 'East';
else if school = 'White' then region = 'West';
if agegrp in (1,2) then agecat=1;
else if agegrp = 3 then agecat=2;
format agegrp agefmt. testgrp $testfmt. ;
cards;
F    5 B  Moore
F   99 B  NA
M    7 A  Bachman
M   11 B  White
M    5 A  Bachman
M   10 A  Bachman
F    6 B  Moore
F    9 NA White
F    8 B  Moore
M    4 A  Bachman
M    9 B  White
F   10 A  White
;
proc print data=recode;
title 'PRINTOUT OF DATA WITH FORMATS ASSIGNED TO TESTGRP AND AGEGRP';
run;
proc print data=recode;
format agegrp testgrp ;
title 'PRINTOUT OF DATA AS IT WAS ORIGINALLY READ INTO SAS';
run;
proc freq data=recode;
tables school region sex agegrp agecat testgrp agegrp*testgrp
agetest;
title 'FREQUENCY TABLES';
run;
The output from this program is shown below:
PRINTOUT OF DATA WITH FORMATS ASSIGNED TO TESTGRP AND AGEGRP
OBS  SCHOOL   SEX  AGE  TESTGRP  AGEGRP        AGETEST  REGION  AGECAT
  1  Moore    F      5  Group B  Under 6             2  East         1
  2           F      .  Group B  .                   .               .
  3  Bachman  M      7  Group A  6 to 9              3  East         1
  4  White    M     11  Group B  10 and Older        6  West         2
  5  Bachman  M      5  Group A  Under 6             1  East         1
  6  Bachman  M     10  Group A  10 and Older        5  East         2
  7  Moore    F      6  Group B  6 to 9              4  East         1
  8  White    F      9           6 to 9              .  West         1
  9  Moore    F      8  Group B  6 to 9              4  East         1
 10  Bachman  M      4  Group A  Under 6             1  East         1
 11  White    M      9  Group B  6 to 9              4  West         1
 12  White    F     10  Group A  10 and Older        5  West         2
PRINTOUT OF DATA AS IT WAS ORIGINALLY READ INTO SAS
OBS  SCHOOL   SEX  AGE  TESTGRP  AGEGRP  AGETEST  REGION  AGECAT
  1  Moore    F      5  B             1        2  East         1
  2           F      .  B             .        .               .
  3  Bachman  M      7  A             2        3  East         1
  4  White    M     11  B             3        6  West         2
  5  Bachman  M      5  A             1        1  East         1
  6  Bachman  M     10  A             3        5  East         2
  7  Moore    F      6  B             2        4  East         1
  8  White    F      9                2        .  West         1
  9  Moore    F      8  B             2        4  East         1
 10  Bachman  M      4  A             1        1  East         1
 11  White    M      9  B             2        4  West         1
 12  White    F     10  A             3        5  West         2
FREQUENCY TABLES
Cumulative Cumulative
SCHOOL Frequency Percent Frequency Percent
-----------------------------------------------------
Bachman 4 36.4 4 36.4
Moore 3 27.3 7 63.6
White 4 36.4 11 100.0
Frequency Missing = 1
Cumulative Cumulative
REGION Frequency Percent Frequency Percent
----------------------------------------------------
East 7 63.6 7 63.6
West 4 36.4 11 100.0
Frequency Missing = 1
Cumulative Cumulative
SEX Frequency Percent Frequency Percent
-------------------------------------------------
F 6 50.0 6 50.0
M 6 50.0 12 100.0
Cumulative Cumulative
AGEGRP Frequency Percent Frequency Percent
----------------------------------------------------------
Under 6 3 27.3 3 27.3
6 to 9 5 45.5 8 72.7
10 and Older 3 27.3 11 100.0
Frequency Missing = 1
Cumulative Cumulative
AGECAT Frequency Percent Frequency Percent
----------------------------------------------------
1 8 72.7 8 72.7
2 3 27.3 11 100.0
Frequency Missing = 1
FREQUENCY TABLES
Cumulative Cumulative
TESTGRP Frequency Percent Frequency Percent
-----------------------------------------------------
Group A 5 45.5 5 45.5
Group B 6 54.5 11 100.0
Frequency Missing = 1
TABLE OF AGEGRP BY TESTGRP
AGEGRP TESTGRP
Frequency |
Percent |
Row Pct |
Col Pct |Group A |Group B | Total
-------------+--------+--------+
Under 6 | 2 | 1 | 3
| 20.00 | 10.00 | 30.00
| 66.67 | 33.33 |
| 40.00 | 20.00 |
-------------+--------+--------+
6 to 9 | 1 | 3 | 4
| 10.00 | 30.00 | 40.00
| 25.00 | 75.00 |
| 20.00 | 60.00 |
-------------+--------+--------+
10 and Older | 2 | 1 | 3
| 20.00 | 10.00 | 30.00
| 66.67 | 33.33 |
| 40.00 | 20.00 |
-------------+--------+--------+
Total 5 5 10
50.00 50.00 100.00
Frequency Missing = 2
FREQUENCY TABLES                                        13
21:29 Wednesday, May 17, 1995
Cumulative Cumulative
AGETEST Frequency Percent Frequency Percent
-----------------------------------------------------
1 2 20.0 2 20.0
2 1 10.0 3 30.0
3 1 10.0 4 40.0
4 3 30.0 7 70.0
5 2 20.0 9 90.0
6 1 10.0 10 100.0
Frequency Missing = 2
An example using dates
The following program uses the SAS date function
MDY to calculate the age at interview. Recall that SAS
stores a date as the number of days from January 1,
1960 to that date. A date before January 1, 1960 has a
negative value; a date after it has a positive value.
Dates can be displayed using a date format, such as
mmddyy8., or simply as a numeric value (with no format).
The printout from this program shows both methods.
/*******************************************************************
This command file for SAS demonstrates how to use dates in a SAS
data step.
The name of this file is: DATES.SAS
See SAS Language Reference, Version 6, pp 128-131 for information
on dates and times in SAS.
********************************************************************/
options linesize=72 pagesize=58;
title;
data dates;
length name $12;
input name $ b_mon b_day b_yr int_mon int_day int_yr;
if b_day = . then b_day = 15;
if int_day = . then int_day = 15;
birdate = mdy(b_mon,b_day,b_yr);
intdate = mdy(int_mon,int_day,int_yr);
intage = int((intdate-birdate)/365);
cards;
Roger 12 12 84 9 3 94
Samantha 1 20 85 9 15 94
Henry 10 6 83 10 2 94
William 4 17 82 10 5 94
Petra 6 . 83 9 14 94
;
proc print data=dates;
title 'printing dates as number of days since Jan 1, 1960';
run;
proc print data=dates;
format birdate mmddyy8. intdate mmddyy8.;
title 'printing dates using date formats';
run;
The output from this program is shown below:
printing dates as number of days since Jan 1, 1960
OBS  NAME      B_MON  B_DAY  B_YR  INT_MON  INT_DAY  INT_YR  BIRDATE  INTDATE  INTAGE
  1  Roger        12     12    84        9        3      94     9112    12664       9
  2  Samantha      1     20    85        9       15      94     9151    12676       9
  3  Henry        10      6    83       10        2      94     8679    12693      10
  4  William       4     17    82       10        5      94     8142    12696      12
  5  Petra         6     15    83        9       14      94     8566    12675      11
printing dates using date formats
OBS  NAME      B_MON  B_DAY  B_YR  INT_MON  INT_DAY  INT_YR   BIRDATE   INTDATE  INTAGE
  1  Roger        12     12    84        9        3      94  12/12/84  09/03/94       9
  2  Samantha      1     20    85        9       15      94  01/20/85  09/15/94       9
  3  Henry        10      6    83       10        2      94  10/06/83  10/02/94      10
  4  William       4     17    82       10        5      94  04/17/82  10/05/94      12
  5  Petra         6     15    83        9       14      94  06/15/83  09/14/94      11
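Dividing by 365 ignores leap days, so the computed age can be one year too high when an interview falls just before a birthday. A commonly used alternative counts month boundaries with INTCK and subtracts one month if the day of the birthday has not yet been reached. The sketch below is one way to write it, not the handout's method; the dataset name ages, the variables months and intage2, and the second data line are made up, and four-digit years are used so the result does not depend on the YEARCUTOFF option.

```sas
/* Illustrative sketch: dataset, new variables, and the second case are hypothetical */
data ages;
   input name $ b_mon b_day b_yr int_mon int_day int_yr;
   birdate = mdy(b_mon, b_day, b_yr);
   intdate = mdy(int_mon, int_day, int_yr);
   /* age as computed in the program above */
   intage  = int((intdate - birdate)/365);
   /* completed months between the two dates, then whole years */
   months  = intck('month', birdate, intdate)
             - (day(intdate) < day(birdate));
   intage2 = floor(months/12);
   format birdate intdate mmddyy10.;
   cards;
Roger 12 12 1984 9 3 1994
Ann 3 1 1960 2 27 2000
;
run;

proc print data=ages;
run;
```

For Roger both methods give 9. For the invented second case, born March 1, 1960 and interviewed February 27, 2000, the division by 365 gives 40 while the month-based calculation gives 39, which is the age actually reached.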