The Center For Statistical Consultation and Research
3550 Rackham Building
University of Michigan
Ann Arbor, MI 48109-1070
cscar@umich.edu
SAS Topics: SAS Language and Functions for Simple Data Transforms and Recodes
Contents of This Page
  1. SAS Operators

    1. Arithmetical Operators

    2. Logical (Boolean) Operators

    3. Comparison Operators

  2. Lists of Variables

  3. SAS Functions

    1. Selected SAS Functions

    2. SAS Statistical Functions

  4. Code Examples

    1. Example Using SAS Statistical Functions in the Data Step

    2. Sample SAS program using some of the math functions

    3. Example using conditionals and formatting

    4. Example using dates


Extensive definitions and explanations of the rules of the SAS language are given in the SAS Language Reference, in Chapter 4. This handout discusses a few of these rules that are helpful when doing transforms and recodes in the data step.


SAS Operators

Arithmetic Operators:

SAS arithmetic operators indicate that an arithmetic operation is performed. The arithmetic operators are shown below:


Symbol      Definition        Example
**          Exponentiation    y=x**2
                              z=x**y
*           Multiplication    z=x*y
/           Division          z=x/y
+           Addition          z=x+y
-           Subtraction       z=x-y

Note that an asterisk (*) must always be used to indicate multiplication, e.g. y=2*x, not y=2x or y=2(x). If either operand of an arithmetic operator is missing, the result is missing.
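
As a small illustration of the missing-value rule, the sketch below contrasts the + operator with the sum function, which is used in the code examples further down. The dataset name arith and the variables x, y, twice, total and total2 are made up for this example.

```sas
/* Hypothetical example: arithmetic operators and missing values */
data arith;
  input x y;
  twice  = 2*x;        /* the asterisk is required for multiplication */
  total  = x + y;      /* missing if either x or y is missing         */
  total2 = sum(x,y);   /* sum of the nonmissing values only           */
  cards;
3 4
. 5
;

proc print data=arith;
run;
```

For the second data line, where x is missing, twice and total are missing, while total2 is 5.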

Logical (Boolean) Operators:

Logical or Boolean operators are used in expressions to link sequences of comparisons. The table below lists the logical operators and their mnemonic equivalents.


Symbol			Mnemonic Equivalent
&			AND

|			OR

~			NOT*

*Note that the symbol for NOT depends on the terminal you are using. It is probably safer to use the mnemonic equivalent, rather than the symbol. The NOT operator can be used as shown below:

not(name='SMITH')

is equivalent to

name ne 'SMITH'

An example of a SAS expression using a logical operator would be the following:

if age < 25 and sex = 'F' then select=1;

It is possible to use parentheses to help clarify the logical expression. Be sure that each left parenthesis is followed by a matching right parenthesis.

if (age < 25) and (sex = 'F') then select=1;

Comparison Operators:

The following comparison operators can be written as symbols or with their mnemonic equivalents. They can be used in the SAS data step as part of an if ... then; statement or an if ... then do; statement. Comparison operators may also be used with a WHERE statement in a Proc, to select the cases that will be processed by the procedure. The operators and their mnemonic equivalents are shown below.


Symbol    Mnemonic    Definition
<         lt          Less than
<=        le          Less than or equal to
>         gt          Greater than
>=        ge          Greater than or equal to
=         eq          Equal to
~=        ne          Not equal to

Note that if the symbol is used, it is not necessary to have blank spaces around it, but if the mnemonic is used, it must be set off by spaces:

if x<y then group=1;

is equivalent to:

if x < y then group = 1;

is equivalent to:

if x lt y then group = 1;

The mnemonics may be given in upper or lower case, or a mixture of cases.
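
To illustrate the WHERE statement mentioned above, here is a minimal sketch that uses the mnemonic operators to select cases inside a procedure. The dataset people and its variables are hypothetical and exist only for this example.

```sas
/* Hypothetical example: selecting cases in a Proc with WHERE */
data people;
  input name $ age sex $;
  cards;
SMITH 23 F
JONES 31 M
BROWN 19 F
;

proc print data=people;
  where age lt 25 and sex eq 'F';   /* only cases meeting both conditions are printed */
run;
```

The dataset itself is not changed; the WHERE statement only restricts the cases that this one procedure processes.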


Lists of Variables:

Lists of variables can be given in several ways in SAS. A list of variables may be given by simply separating the variables by blanks:

age sex height weight

If the variables in a list all have the same initial part (root) and the last part of the variable name is an integer, then you can use a numbered range list. The numbers must be consecutive and ascending. Note that the variables do NOT have to be consecutive in the SAS dataset.

x1-x5

is equivalent to

x1 x2 x3 x4 x5

and

quest1-quest3

is equivalent to

quest1 quest2 quest3

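A name range list includes every variable from the first named variable to the last, inclusive, in the order in which the variables are stored in the SAS dataset. The result therefore depends on the position of the variables in the dataset, not on their names.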

age -- weight

(includes all variables from age to weight)

age-numeric-weight

(includes all numeric variables from age to weight)

age-character-weight

(includes all character variables from age to weight)

Special SAS name lists include:

_NUMERIC_

(all numeric variables in the dataset)

_CHARACTER_

(all character variables in the dataset)

_ALL_

(all variables in the dataset)
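
The sketch below shows how these kinds of lists might be used, both inside a function (with the keyword "of") and in a VAR statement. The dataset scores and the variables xmean and nbody are assumptions made for this example.

```sas
/* Hypothetical example: numbered range, name range and special lists */
data scores;
  input id x1 x2 x3 age height weight;
  xmean = mean(of x1-x3);        /* numbered range list               */
  nbody = n(of age--weight);     /* name range list: age height weight */
  cards;
1 4 5 6 30 170 65
2 3 . 5 41 . 80
;

proc print data=scores;
  var id x1-x3 xmean nbody;
run;

proc means data=scores;
  var _numeric_;                 /* all numeric variables in the dataset */
run;
```

Because age, height and weight were read in that order, age--weight picks up all three; if the variables were stored in a different order, the name range would select a different set.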


SAS Functions:

SAS functions return a value from an argument, or a series of arguments. For example, the log function returns the natural log of its argument. The argument(s) to a function are contained in parentheses immediately following the function name; if a function requires more than one argument, the arguments are separated by commas. The argument(s) may be variable names, constants, or SAS expressions (e.g. other SAS functions or mathematical expressions).

There are arithmetic, array, truncation, mathematical, trigonometric, probability, quantile, sample statistics, random number, financial, character, date and time, state and ZIP code, and special functions. For a complete list of SAS functions by category, see pp 53-57 of the SAS Language Reference. Detailed descriptions of SAS functions are in Chapter 11 (p. 521 ff.) of the SAS Language Reference.

SAS functions are used as part of the DATA step programming statements, and can be used with certain statistical procedures.

Selected SAS Functions:

Most of these functions operate on one argument; round also takes a second argument that gives the rounding unit. Note that if the argument is illegal (such as trying to take the square root of a negative number), SAS will return a missing value and print a message in the log. This will not prevent the program from executing, however.


Function Name   Definition                                  Example
abs             Absolute value                              y=abs(x)
int             Integer (takes the integer part of the      y=int(x)
                argument, like truncation)
log             Natural log                                 y=log(x)
log10           Log base 10                                 y=log10(x)
round           Rounds to the nearest specified level;      y=round(x,.01)
                here, it rounds x to the nearest hundredth

SAS Statistical Functions:

Statistical functions operate on at least 2 arguments. They give the result for the nonmissing values of the arguments. The arguments can be listed separated by commas, or lists of variables may be used if the keyword "of" is included in the parentheses. Note that these statistical functions give sample statistics within a case. So, for example, if you had 3 variables in your file named wt1, wt2 and wt3 that represented 3 measurements of weight made on each individual in the study, you could use the statistical functions to get the mean of all the weights for that individual. If you wished to summarize values across cases, then you would use Proc Means.


Function Name   Definition                                  Example
mean            Mean of the nonmissing values               y=mean(x1,x2,x3)
min             Minimum of the nonmissing values            y=min(of x1-x3)
max             Maximum of the nonmissing values            y=max(of hem1-hem5)
n               The number of nonmissing values             y=n(of age--weight)
nmiss           The number of missing values                y=nmiss(of wt1-wt3)
std             Standard deviation of nonmissing values     y=std(5,6,7,9)
stderr          Standard error of the mean of nonmissing    y=stderr(of x1-x20)
                values

Note that if the mean function were used to calculate the mean weight for an individual across 3 measurements of weight, SAS would return the average of however many values of weight were nonmissing. For a case that had all 3 weights, SAS would give the average of the 3, but for a case with only 1 nonmissing weight, SAS would return the mean of the one value, which would be simply equal to the value itself.


Code Examples

Example of Using SAS Statistical Functions in the Data Step:

The following example uses some of the SAS statistical functions to calculate sample statistics within a case.


/*******************************************************************
This command file for SAS demonstrates how to use SAS statistical
functions. These functions must be used with at least 2 numeric
arguments. They all operate on the nonmissing values. Note that lists
of variables may be used, if the keyword OF is included before the
variable list.

The name of this file is:  STATFUNC.SAS
********************************************************************/

options linesize=72 pagesize=58;
title;

data test;

input q1 q2 q3 agemom agedad;

newvar1=n(q1,q2,q3);
newvar2=nmiss(q1,q2,q3);
newvar3=sum(q1,q2,q3);
newvar4=mean(of q1 - q3);
newvar5=std(of q1 - q3);
newvar6=stderr(of q1 - q3);
newvar7=max(of agemom -- agedad);
newvar8=min(of agemom -- agedad);

if n(q1,q2,q3) ge 2 then newvar9=mean(of q1 - q3);


label q1 = 'Question 1';
label q2 = 'Question 2';
label q3 = 'Question 3';
label agemom = 'mother''s age';
label agedad = 'father''s age';
label newvar1 = 'number of nonmissing values for q1,q2,q3';
label newvar2 = 'number of missing values for q1,q2,q3';
label newvar3 = 'sum of nonmissing values of q1,q2,q3';
label newvar4 = 'mean of nonmissing values of q1,q2,q3';
label newvar5 = 'standard deviation of q1,q2,q3';
label newvar6 = 'standard error of mean of q1,q2,q3';
label newvar7 = 'maximum of values of agemom and agedad';
label newvar8 = 'minimum of values of agemom and agedad';
label newvar9 = 'mean of q1,q2,q3 if 2 or more nonmissing';


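cards;
2 2 . 35 37
1 1 2 22  .
2 1 3 34 38
. . 2 28 26
1 2 2 29 30
1 2 3 26 29
2 1 1 27 27
. 2 2  .  .
. . . 33 36
;

proc print data=test label;
run;

proc freq data=test;
  tables q2*q3;
run;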


The output from this program is shown below:

                                                              number of
                                                              nonmissing
       Question   Question   Question   mother's   father's   values for
OBS       1          2          3         age        age      q1,q2,q3
 1        2          2          .         35         37           2
 2        1          1          2         22          .           3
 3        2          1          3         34         38           3
 4        .          .          2         28         26           1
 5        1          2          2         29         30           3
 6        1          2          3         26         29           3
 7        2          1          1         27         27           3
 8        .          2          2          .          .           2
 9        .          .          .         33         36           0

       number of     sum of       mean of                   standard
        missing    nonmissing   nonmissing     standard     error of
      values for    values of    values of   deviation of    mean of
OBS    q1,q2,q3     q1,q2,q3     q1,q2,q3      q1,q2,q3     q1,q2,q3
 1         1            4         2.00000       0.00000      0.00000
 2         0            4         1.33333       0.57735      0.33333
 3         0            6         2.00000       1.00000      0.57735
 4         2            2         2.00000        .            .
 5         0            5         1.66667       0.57735      0.33333
 6         0            6         2.00000       1.00000      0.57735
 7         0            4         1.33333       0.57735      0.33333
 8         1            4         2.00000       0.00000      0.00000
 9         3            .          .             .            .

      maximum of    minimum of      mean of
       values of     values of    q1,q2,q3 if
      agemom and    agemom and     2 or more
OBS     agedad        agedad       nonmissing
 1        37            35          2.00000
 2        22            22          1.33333
 3        38            34          2.00000
 4        28            26           .
 5        30            29          1.66667
 6        29            26          2.00000
 7        27            27          1.33333
 8         .             .          2.00000
 9        36            33           .

TABLE OF Q2 BY Q3

Q2(Question 2)     Q3(Question 3)

Frequency|
Percent  |
Row Pct  |
Col Pct  |       1|       2|       3|  Total
---------+--------+--------+--------+
        1|      1 |      1 |      1 |      3
         |  16.67 |  16.67 |  16.67 |  50.00
         |  33.33 |  33.33 |  33.33 |
         | 100.00 |  33.33 |  50.00 |
---------+--------+--------+--------+
        2|      0 |      2 |      1 |      3
         |   0.00 |  33.33 |  16.67 |  50.00
         |   0.00 |  66.67 |  33.33 |
         |   0.00 |  66.67 |  50.00 |
---------+--------+--------+--------+
Total           1        3        2        6
            16.67    50.00    33.33   100.00

Frequency Missing = 3

A sample SAS program using some of the Math functions is shown below:

/*******************************************************************
This command file for SAS demonstrates how to use some SAS math
functions, and arithmetic operators. Note that when either argument
is missing for the arithmetic operator, the result will be
missing. This differs from the result for the stat functions, which
operate on all nonmissing values.

The name of this file is:  MATHFUNC.SAS
See SAS Language Reference, Version 6, p 122 for math operators,
pp 521-616 for functions.
********************************************************************/
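
The body of MATHFUNC.SAS is not included on this page. The sketch below is a reconstruction inferred from the variable names and values in the output that follows; it is not the original file. The dataset name math, the rounding unit of .1, and the VAR list in Proc Means are assumptions chosen to be consistent with that output.

```sas
/* Reconstructed sketch, not the original program */
data math;
  input x y;
  absx   = abs(x);
  sqrtx  = sqrt(x);       /* missing when x is negative            */
  log10y = log10(y);      /* missing when y is zero                */
  lny    = log(y);
  int_y  = int(y);
  roundy = round(y,.1);   /* assumed rounding unit                 */
  mult   = x*y;
  divide = x/y;           /* missing when y is zero                */
  expon  = x**y;
  tot1   = x+y;           /* missing if x or y is missing          */
  diff   = x-y;
  tot2   = sum(x,y);      /* sum of the nonmissing values          */
  cards;
4 5.23
-15 22
. 18.51
-1 3
6 0
5 5.035
;

proc print data=math;
run;

proc means data=math;
  var absx--tot2;
run;
```

The third data line shows the contrast described in the header comment: tot1 is missing because x is missing, while tot2 equals the one nonmissing value.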

The output from this run is shown below:

OBS     X        Y   ABSX     SQRTX    LOG10Y       LNY   INT_Y   ROUNDY

  1     4    5.230      4   2.00000   0.71850   1.65441       5      5.2
  2   -15   22.000     15         .   1.34242   3.09104      22     22.0
  3     .   18.510      .         .   1.26741   2.91831      18     18.5
  4    -1    3.000      1         .   0.47712   1.09861       3      3.0
  5     6    0.000      6   2.44949         .         .       0      0.0
  6     5    5.035      5   2.23607   0.70200   1.61641       5      5.0

OBS      MULT    DIVIDE       EXPON     TOT1     DIFF     TOT2

  1    20.920   0.76482     1408.55    9.230   -1.230    9.230
  2  -330.000  -0.68182  7.48183E25    7.000  -37.000    7.000
  3         .         .           .        .        .   18.510
  4    -3.000  -0.33333       -1.00    2.000   -4.000    2.000
  5     0.000         .        1.00    6.000    6.000    6.000
  6    25.175   0.99305     3306.08   10.035   -0.035   10.035

Variable   N            Mean         Std Dev         Minimum         Maximum
----------------------------------------------------------------------------
ABSX       5       6.2000000       5.2630789       1.0000000      15.0000000
SQRTX      3       2.2285192       0.2248399       2.0000000       2.4494897
LOG10Y     5       0.9014903       0.3813419       0.4771213       1.3424227
LNY        5       2.0757581       0.8780721       1.0986123       3.0910425
INT_Y      6       8.8333333       8.9312187               0      22.0000000
ROUNDY     6       8.9500000       9.0185919               0      22.0000000
MULT       5     -57.3810000     152.9031811    -330.0000000      25.1750000
DIVIDE     4       0.1856789       0.8183669      -0.6818182       0.9930487
EXPON      5    1.4963655E25     3.345975E25      -1.0000000    7.4818276E25
TOT1       5       6.8530000       3.1652836       2.0000000      10.0350000
DIFF       5      -7.2530000      17.0255990     -37.0000000       6.0000000
TOT2       6       8.7958333       5.5374023       2.0000000      18.5100000
----------------------------------------------------------------------------

Example using conditionals and formatting

This next command file shows how to do some recodes in SAS that involve conditional if ... then statements. It also shows how to use formats to set up value labels for both numeric and character variables. Note that the formats can be turned off in the proc print by giving a format statement that lists the desired variables without any format name (a null format).

/*******************************************************************
  This command file for SAS demonstrates how to do some recodes in a
  SAS data step, using comparison operators.

  The name of this file is:  RECODE.SAS
  See SAS Language Reference, Version 6, p 123 for a list of
  comparison operators.
********************************************************************/

options linesize=95 pagesize=58;
title;

proc format;

  value agefmt  1='Under 6'
				2='6 to 9'
				3='10 and Older' ;

  value $testfmt 'A'='Group A'
				 'B'='Group B';


data recode;

  length school $ 9;
  input sex $ 1-3 age  4-6 testgrp $  8-9 school $ 11-19 ;

  if age = 99 then age = . ;

  if school = 'NA' then school = ' ' ;

  if testgrp = 'NA' then testgrp = ' ' ;

  if age ne . then do;
	 if age lt 6 then agegrp = 1;
	 if age ge 6  and age lt 10 then agegrp = 2;
	 if age ge 10 then agegrp = 3;
  end;

  if agegrp=1 and testgrp='A' then agetest=1;
  if agegrp=1 and testgrp='B' then agetest=2;
  if agegrp=2 and testgrp='A' then agetest=3;
  if agegrp=2 and testgrp='B' then agetest=4;
  if agegrp=3 and testgrp='A' then agetest=5;
  if agegrp=3 and testgrp='B' then agetest=6;

  if school in ('Moore','Bachman') then region = 'East';
  else if school = 'White' then region = 'West';

  if agegrp in (1,2) then agecat=1;
  else if agegrp = 3 then agecat=2;

  format agegrp agefmt. testgrp $testfmt. ;

  cards;
F  5   B  Moore
F  99  B  NA
M  7   A  Bachman
M  11  B  White
M  5   A  Bachman
M  10  A  Bachman
F  6   B  Moore
F  9   NA White
F  8   B  Moore
M  4   A  Bachman
M  9   B  White
F  10  A  White
;

proc print data=recode;
  title 'PRINTOUT OF DATA WITH FORMATS ASSIGNED TO TESTGRP AND AGEGRP';
run;

proc print data=recode;
  format agegrp testgrp ;
  title 'PRINTOUT OF DATA AS IT WAS ORIGINALLY READ INTO SAS';
run;

proc freq data=recode;
  tables school region sex agegrp agecat testgrp agegrp*testgrp agetest;
  title 'FREQUENCY TABLES';
run;

The output from this program is shown below:

  PRINTOUT OF DATA WITH FORMATS ASSIGNED TO TESTGRP AND AGEGRP


OBS  SCHOOL   SEX  AGE  TESTGRP     AGEGRP     AGETEST  REGION  AGECAT

  1  Moore     F     5  Group B  Under 6          2      East      1
  2            F     .  Group B             .     .                .
  3  Bachman   M     7  Group A  6 to 9           3      East      1
  4  White     M    11  Group B  10 and Older     6      West      2
  5  Bachman   M     5  Group A  Under 6          1      East      1
  6  Bachman   M    10  Group A  10 and Older     5      East      2
  7  Moore     F     6  Group B  6 to 9           4      East      1
  8  White     F     9           6 to 9           .      West      1
  9  Moore     F     8  Group B  6 to 9           4      East      1
 10  Bachman   M     4  Group A  Under 6          1      East      1
 11  White     M     9  Group B  6 to 9           4      West      1
 12  White     F    10  Group A  10 and Older     5      West      2

	  PRINTOUT OF DATA AS IT WAS ORIGINALLY READ INTO SAS


OBS   SCHOOL    SEX   AGE   TESTGRP   AGEGRP   AGETEST   REGION   AGECAT

  1   Moore      F      5      B         1        2       East       1
  2              F      .      B         .        .                  .
  3   Bachman    M      7      A         2        3       East       1
  4   White      M     11      B         3        6       West       2
  5   Bachman    M      5      A         1        1       East       1
  6   Bachman    M     10      A         3        5       East       2
  7   Moore      F      6      B         2        4       East       1
  8   White      F      9                2        .       West       1
  9   Moore      F      8      B         2        4       East       1
 10   Bachman    M      4      A         1        1       East       1
 11   White      M      9      B         2        4       West       1
 12   White      F     10      A         3        5       West       2

						FREQUENCY TABLES

									Cumulative  Cumulative
	 SCHOOL    Frequency   Percent   Frequency    Percent
	 -----------------------------------------------------
	 Bachman          4      36.4           4       36.4
	 Moore            3      27.3           7       63.6
	 White            4      36.4          11      100.0

					 Frequency Missing = 1




									Cumulative  Cumulative
	  REGION   Frequency   Percent   Frequency    Percent
	  ----------------------------------------------------
	  East            7      63.6           7       63.6
	  West            4      36.4          11      100.0

					 Frequency Missing = 1




								  Cumulative  Cumulative
	   SEX   Frequency   Percent   Frequency    Percent
	   -------------------------------------------------
	   F            6      50.0           6       50.0
	   M            6      50.0          12      100.0




									   Cumulative  Cumulative
		 AGEGRP   Frequency   Percent   Frequency    Percent
   ----------------------------------------------------------
   Under 6               3      27.3           3       27.3
   6 to 9                5      45.5           8       72.7
   10 and Older          3      27.3          11      100.0

					 Frequency Missing = 1




									Cumulative  Cumulative
	  AGECAT   Frequency   Percent   Frequency    Percent
	  ----------------------------------------------------
		   1          8      72.7           8       72.7
		   2          3      27.3          11      100.0

					 Frequency Missing = 1


						FREQUENCY TABLES

									Cumulative  Cumulative
	 TESTGRP   Frequency   Percent   Frequency    Percent
	 -----------------------------------------------------
	 Group A          5      45.5           5       45.5
	 Group B          6      54.5          11      100.0

					 Frequency Missing = 1


				   TABLE OF AGEGRP BY TESTGRP

			AGEGRP        TESTGRP

			Frequency    |
			Percent      |
			Row Pct      |
			Col Pct      |Group A |Group B |  Total
			-------------+--------+--------+
			Under 6      |      2 |      1 |      3
						 |  20.00 |  10.00 |  30.00
						 |  66.67 |  33.33 |
						 |  40.00 |  20.00 |
			-------------+--------+--------+
			6 to 9       |      1 |      3 |      4
						 |  10.00 |  30.00 |  40.00
						 |  25.00 |  75.00 |
						 |  20.00 |  60.00 |
			-------------+--------+--------+
			10 and Older |      2 |      1 |      3
						 |  20.00 |  10.00 |  30.00
						 |  66.67 |  33.33 |
						 |  40.00 |  20.00 |
			-------------+--------+--------+
			Total               5        5       10
							50.00    50.00   100.00

			Frequency Missing = 2



						FREQUENCY TABLES
						21:29 Wednesday, May 17, 1995

									Cumulative  Cumulative
	 AGETEST   Frequency   Percent   Frequency    Percent
	 -----------------------------------------------------
		   1          2      20.0           2       20.0
		   2          1      10.0           3       30.0
		   3          1      10.0           4       40.0
		   4          3      30.0           7       70.0
		   5          2      20.0           9       90.0
		   6          1      10.0          10      100.0

					 Frequency Missing = 2

An example using dates

The following program uses the SAS date function MDY to build SAS dates from month, day, and year values, and then calculates the age at interview. Recall that SAS stores a date as the number of days from January 1, 1960 to that date. If the date is before January 1, 1960, it will have a negative value; if it is after, it will have a positive value. Dates can be displayed using a date format, such as mmddyy8., or simply as a numeric value (with no format). The printout from this program shows both methods.

/*******************************************************************
  This command file for SAS demonstrates how to use dates in a SAS
  data step.

  The name of this file is:  DATES.SAS
  See SAS Language Reference, Version 6, pp 128-131 for information
  on dates and times in SAS.
********************************************************************/

options linesize=72 pagesize=58;
title;

data dates;

  length name $12;
  input name $ b_mon b_day b_yr int_mon int_day int_yr;

  if b_day = . then b_day = 15;
  if int_day = . then int_day = 15;

  birdate = mdy(b_mon,b_day,b_yr);
  intdate = mdy(int_mon,int_day,int_yr);
  intage = int((intdate-birdate)/365);

  cards;

  Roger    12 12 84  9 3  94
  Samantha 1  20 85  9 15 94
  Henry    10 6  83  10 2 94
  William  4  17 82  10 5 94
  Petra    6  .  83  9 14 94
  ;

proc print data=dates;
  title 'printing dates as number of days since Jan 1, 1960';

run;
proc print data=dates;
  format birdate mmddyy8. intdate mmddyy8.;
  title 'printing dates using date formats';
run;

The output from this program is shown below:

printing dates as number of days since Jan 1, 1960


OBS  NAME        B_MON  B_DAY  B_YR  INT_MON  INT_DAY  INT_YR  BIRDATE  INTDATE  INTAGE

  1  Roger          12     12    84        9        3      94     9112    12664       9
  2  Samantha        1     20    85        9       15      94     9151    12676       9
  3  Henry          10      6    83       10        2      94     8679    12693      10
  4  William         4     17    82       10        5      94     8142    12696      12
  5  Petra           6     15    83        9       14      94     8566    12675      11


printing dates using date formats

OBS  NAME        B_MON  B_DAY  B_YR  INT_MON  INT_DAY  INT_YR   BIRDATE   INTDATE  INTAGE

  1  Roger          12     12    84        9        3      94  12/12/84  09/03/94       9
  2  Samantha        1     20    85        9       15      94  01/20/85  09/15/94       9
  3  Henry          10      6    83       10        2      94  10/06/83  10/02/94      10
  4  William         4     17    82       10        5      94  04/17/82  10/05/94      12
  5  Petra           6     15    83        9       14      94  06/15/83  09/14/94      11
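
The program above computes age by dividing the number of days by 365, which ignores leap days. The sketch below shows two alternatives side by side for a single pair of dates. The dataset name ages and the variable names are made up for this example, and the 'AGE' basis of the YRDIF function is assumed to be available (it is documented for SAS 9.3 and later, not for Version 6).

```sas
/* Hypothetical example: three ways to compute age in whole years */
data ages;
  birdate  = mdy(12,12,1984);
  intdate  = mdy(9,3,1994);
  age365   = int((intdate - birdate)/365);          /* method used in DATES.SAS        */
  age36525 = int((intdate - birdate)/365.25);       /* approximate leap-year allowance */
  ageyrdif = int(yrdif(birdate, intdate, 'AGE'));   /* assumes SAS 9.3 or later        */
  format birdate intdate mmddyy10.;
run;

proc print data=ages;
run;
```

The three methods usually agree, but dividing by 365 can overstate the age by a year when the interview falls just before a birthday, since the leap days in the interval add up.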
 
Copyright © 1998 - 2001 The Regents of the University of Michigan, Ann Arbor