STUDY POPULATION The study population for the 2000 Pre- and Post-Election Study is defined to include all United States citizens of voting age on or before the 2000 Election Day. Eligible citizens must have resided in housing units in the forty-eight coterminous states. This definition excludes persons living in Alaska or Hawaii and requires eligible persons to have been both a United States citizen and eighteen years of age on or before the 7th of November 2000. >> DUAL FRAME SAMPLE DESIGN The 2000 NES is a dual frame sample with both an area sample and an RDD component. The RDD frame provides coverage of telephone households while the area sample provides full coverage of all U.S. households including those without telephones. Each of these sample designs will be described in the following sections. The 2000 NES data set contains 1006 area sample cases and 801 telephone sample cases. >> FTF SAMPLE DESIGN - MULTI-STAGE AREA PROBABILITY The area sample is based on a multi-stage area probability sample selected from the Survey Research Center's (SRC) 1990 National Sample design. Identification of the 2000 NES sample respondents was conducted using a four stage sampling process--a primary stage sampling of U.S. Metropolitan Statistical Areas (MSAs) or New England County Metropolitan Areas (NECMAs) and non-MSA counties, followed by a second stage sampling of area segments, a third stage sampling of housing units within sampled area segments and concluding with the random selection of a single respondent from selected housing units. A detailed documentation of the 1990 SRC National Sample, from which the 2000 NES sample was drawn, is provided in the SRC publication titled 1990 SRC National Sample: Design and Development. The 2000 NES sample design called for an entirely new cross-section sample to be drawn from the 1990 SRC National Sample; no panel component was included in 2000. The 1990 SRC National Sample is a multi-stage area probability sample. 
The 2000 NES sample was drawn from both the 1990 SRC National Sample strata (MSA PSUs) and the 1980 SRC National Sample strata (non-MSA PSUs). The modification of the 1990 design in which the 1980 strata definitions were used for the non-MSA counties fully represents the non-MSA domain of the 48 contiguous states. This modification was made for cost and interviewing efficiency reasons related to the availability of interviewers in these areas who work on some of SRC's large panel studies. The following sections will focus on the 1990 SRC National Sample design. Selection Stages for the 2000 NES FTF Sample: 1990 SRC National Sample ------------------------------------------------------------------ Primary Stage Selection The selection of primary stage sampling units (PSUs) for the 1990 SRC National Sample, which depending on the sample stratum are either MSAs, New England County Metropolitan Areas (NECMAs), single counties, independent cities, county equivalents or groupings of small counties, is based on the county-level 1990 Census Reports of Population and Housing (1). Primary stage units were assigned to 108 explicit strata based on MSA/NECMA or non- MSA/NECMA status, PSU size, Census Region and geographic location within region. Twenty-eight of the 108 strata contain only a single self- representing PSU, each of which is included with certainty in the primary stage of sample selection. The remaining 80 nonself-representing strata contain more than one PSU. From each of these nonself-representing strata, one PSU was sampled with probability proportionate to its size (PPS) measured in 1990 occupied housing units. The full 1990 SRC National Sample of 108 primary stage selections was designed to be optimal for surveys roughly three to five times the size of the 2000 NES. 
To permit the flexibility needed for optimal design of smaller survey samples, the primary stage of the SRC National Sample can be readily partitioned into smaller subsamples of PSUs such as a one-half sample or a three-quarter sample partition. Each of the partitions represents a stratified subselection from the full 108 PSU design. The 2000 NES sample of 44 PSUs is a stratified random subsample of PSUs from the "A" half-sample partition of the 1990 SRC National Sample. Because of the small size of this NES sample, both the number of PSUs (selected primary areas) and the secondary stage units (area segments) in the National half-sample were reduced by subselection for the 2000 NES sample design. The 18 self- representing areas in the 1990 SRC National half-sample were all retained for the 2000 NES sample (8 of these remained self-representing in the 2000 NES and 10 represent not only their own MSA but their "pair" among the twenty additional self-representing primary areas of the full 1990 SRC National Sample design). Nineteen of the 26 nonself-representing half-sample MSAs and 7 of the 14 half-sample non-MSAs were retained by the subselection for the 2000 NES sample (or 26 of 40 NSR PSUs). Table 1 identifies the 44 PSUs in the 2000 NES sample by MSA status and Region and also indicates the number of area segments used for the 2000 NES sample (see next section on second stage selection). Table 1: PSU Name and Number of Area Segments in the 2000 NES Sample Showing 1990 SRC National-Sample Stratum and MSA Status. 
==============================================================================
National     National Sample PSU Name                      # of 2000 NES
Sample PSU                                                 Segments
==============================================================================
Eight Largest Self-representing PSUs
------------------------------------
120          New York, NY MSA                              12
190          Los Angeles-Long Beach, CA MSA                12
130          Chicago, IL MSA                                9
121          Philadelphia, PA-NJ MSA                        7
131          Detroit, MI MSA                                6
150          Washington DC-MD-VA MSA                        6
110          Boston, MA NECMA                               6
171          Dallas and Ft Worth, TX CMSA                   6

Ten Remaining Largest MSA PSUs
------------------------------
170          Houston, TX MSA                                6
191          Seattle-Tacoma, WA CMSA                        6
141          St Louis, MO-IL MSA                            6
152          Baltimore, MD MSA                              6
122          Nassau-Suffolk, NY MSA                         6
194          Anaheim-Santa Ana, CA MSA                      6
132          Cleveland, OH MSA                              6
154          Miami-Hialeah, FL MSA                          5 (2)
181          Denver, CO MSA                                 6
196          San Francisco, CA MSA                          6

Nonself-representing MSAs: Northeast
-------------------------------------
211          New Haven-Waterbury-Meriden, CT NECMA          6
213          Manchester-Nashua NH NECMA                     6
220          Buffalo, NY MSA                                6
226          Atlantic City, NJ MSA                          6

Nonself-representing MSAs: Midwest
-----------------------------------
230          Milwaukee, WI MSA                              6
434          Saginaw, MI MSA                                6
239          Steubenville-Wheeling, OH (3)                  6
240          Des Moines, IA MSA                             6

Nonself-representing MSAs: South
---------------------------------
250          Richmond-Petersburg, VA MSA                    6
255          Columbus, GA-AL MSA                            6
257          Jacksonville, FL MSA                           6
258          Lakeland, FL MSA                               6
260          Knoxville TN MSA                               6
262          Birmingham, AL MSA                             6
273          Waco, TX MSA                                   6
274          McAllen-Edinburg-Mission, TX MSA               6

Nonself-representing MSAs: West
--------------------------------
280          Salt Lake City-Ogden etc, UT MSA               6
292          Fresno, CA MSA                                 6
293          Eugene-Springfield, OR MSA                     6

Nonself-representing Non-MSAs: Northeast
-----------------------------------------
464          Gardner, MA                                    6

Nonself-representing Non-MSAs: Midwest
--------------------------------------
466          Decatur County, IN                             6
470          Mower County, MN                               6

Nonself-representing Non-MSAs: South
-------------------------------------
474          DeSoto Parish, LA                              6
477          Chicot County, AR                              6
480          Montgomery County, VA                          6

Nonself-representing Non-MSAs: West
------------------------------------
482          ElDorado County, CA                            6

Total Number of Segments                                  279

(1) Office of Management and Budget (OMB) June 1990 definitions of MSAs, NECMAs, counties, parishes, independent cities. These, of course, differ in some respects from the primary stage unit (PSU) definitions used in the 1980 SRC National Sample so will not be strictly comparable to the 1996 NES Panel PSUs--particularly in New England where MSAs were used as PSUs in the 1980 National Sample and NECMAs were used as PSUs in the 1990 National Sample.

(2) One selected segment (023) was in a former trailer park that had no housing units to be listed in January 1996. All had been destroyed in 1992 by Hurricane Andrew and there were no plans to rebuild.

(3) In the 1990 SRC National Sample, U.S. Census Region boundaries were maintained for purposes of stratification at the primary stage of selection. Since some MSA definitions cross Region boundaries, such MSAs were split and the MSA counties recombined in ways that maintained the Region boundary. This PSU actually contains the Ohio counties from both the Steubenville-Weirton, OH-WV MSA (Jefferson County, OH) and the Wheeling, WV-OH MSA (Belmont County, OH) and, although it is made up of MSA counties, it is not a cohesive MSA by OMB 1990 definition.

Second Stage Selection Area Segments

The second stage of the 1990 SRC National Sample, used for the 2000 NES sample, was selected directly from computerized files that were extracted for the selected PSUs from the 1990 U.S. Census summary file series STF1-B. These files (on CD-ROM) contain the 1990 Census total population and housing unit (HU) data at the census block level. The designated second-stage sampling units (SSUs), termed "area segments", are comprised of census blocks in both the metropolitan (MSA) primary areas and in the rural areas of non-MSA primary areas.
Each SSU block or block combination was assigned a measure of size equal to the total 1990 occupied housing unit count for the area. SSU block(s) were assigned a minimum measure of 72 1990 total HUs per MSA SSU and a minimum measure of 48 total HUs per non-MSA SSU. Second stage sampling of area segments was performed with probabilities proportionate to the assigned measures of size (PPS). For the 2000 NES sample the number of area segments used in each PSU varies. In the self-representing (SR) PSUs the number of area segments varies in proportion to the size of the primary stage unit, from a high of 12 area segments in the self-representing New York and Los Angeles MSA PSUs, to a low of 6 area segments in the smaller self-representing PSUs such as Cleveland, Miami-Hialeah or Nassau-Suffolk MSAs. All nonself-representing (NSR) PSUs were represented by 6 area segments each. A total of 279 NES area segments were selected as shown in Table 1. Third Stage Selection Housing Units For each area segment selected in the second sampling stage, a listing had been made of all housing units located within the physical boundaries of the segment. For segments with a very large number of expected housing units, all housing units in a subselected part of the segment were listed. The final equal probability sample of housing units for the 2000 NES sample was systematically selected from the housing unit listings for the sampled area segments. The 2000 NES sample design was selected from the 1990 SRC National Sample to yield an equal probability sample of 2269 listed housing units. This total included 1972 housing units for the main sample and three reserve replicates of 99 cases each. Table 2 below shows the assumptions that were used to determine the number of sample housing units. The overall probability of selection for 2000 NES cross-section sample of households was f=0.00002116 or 0.2116 in 10,000. 
The equal probability sample of households was achieved for the 2000 NES sample by using the standard multi-stage sampling technique of setting the sampling rate for selecting housing units within area segments to be inversely proportional to the PPS probabilities used to select the PSU and area segment (Kish, 1965).

Fourth Stage Selection - Respondent Selection

Within each sampled 2000 NES occupied housing unit, the SRC interviewer prepared a complete listing of all eligible household members. Using an objective procedure described by Kish (1949) a single respondent was then selected at random to be interviewed. Regardless of circumstances, no substitutions were permitted for the designated respondent.

>> AREA SAMPLE DESIGN ASSUMPTIONS, SPECIFICATIONS AND OUTCOMES

The 2000 National Election Study sought a total of 1000 in-person interviews. It was estimated that this would require a NES sample draw of 1972 housing units. This assumed an occupancy/growth rate of 0.83, an eligibility rate of 0.94 and a response rate of 0.65. These assumptions were based on the 1998 NES field experience. The overall 2000 NES area sample design specifications, assumptions and outcomes are set out in Table 2, below. A sample of 2269 listed housing units was actually selected for the 2000 NES study. This allowed for three reserve replicates of 99 cases each. There was no panel component in 2000.

A comparison of the 2000 NES sample outcome figures to the design specifications and assumptions in Table 2 shows that the actual occupancy, eligibility, and response rates were very close to the expected rates. The actual response rate for the Post-Election FTF sample was 0.86, which was slightly higher than the assumed rate of 0.85.

Table 2: 2000 NES Area Sample Pre and Post-Election Design Specifications and Assumptions Compared to Sample Outcome.
==============================================================================
                    2000 NES      2000 NES      2000 NES      2000 NES
                    Pre-Election  Pre-Election  Post-Election Post-Election
                    Design        Sample        Design        Sample
                    Specification Outcome       Specification Outcome
==============================================================================
Completed           1000          1006          847           693
Interviews
Response Rate       0.65          0.64          0.85          0.86
Eligible Sample     1538          1564          1000          805 (4)
Households
Eligibility Rate    0.94          0.95
Occupied            1634          1639
Households
Occupancy/          0.83          0.82
Growth Rate
Total Sample        1972          1986
Lines

(4) Initial sample lines (FTF and Phone) are different from the Pre-Election completed interviews because of the switch in mode for randomly selected sample cases.

>> 2000 NES RDD (RANDOM DIGIT DIAL) SAMPLE

The RDD telephone component of the 2000 NES is a stratified equal probability sample of telephone numbers. The sample is not clustered. The telephone numbers were selected from a commercial listed one hundred series sampling frame consisting of every possible phone number that can be generated by appending the 2-digit numbers 00 - 99 to the set of hundred banks that have at least two listed household telephone numbers. Hundred banks are the first eight digits of a phone number - area code, exchange, and the next two digits. Each hundred bank defines a set of 100 possible phone numbers. Directory listings are used to define the set of listed hundred series. However, both listed and unlisted telephone numbers can be selected from the sampling frame. A small amount of noncoverage of telephone numbers results from household numbers that are in hundred banks with 0 or 1 listed residential numbers. These telephone households as well as non-telephone households are covered by the area sample component.

An initial sample of 8500 telephone numbers was selected from the listed frame for the coterminous 48 states. These numbers were pre-screened by the vendor to remove most business and non-working phone numbers.
After pre-screening, 5760 or 67.8% of the 8500 telephone numbers were returned as potentially working residential numbers. The potentially working phone numbers were matched against a file of directory listings to append address information so that Congressional Districts could be assigned. Before sample selection, the telephone numbers were stratified by the competitiveness of the Congressional race (5 levels), whether or not the race was open, and by Census Division. A half sample was systematically selected from the stratified file. An initial sample of 2349 cases was selected from the random half sample and the remaining telephone numbers were assigned to 5 reserve replicates of 106-107 numbers each. The reserve replicates were available for use in case the working rate or response rate were lower than expected. >> 2000 NES RDD SAMPLE DESIGN ASSUMPTIONS, SPECIFICATIONS AND OUTCOMES The 2000 National Election Study sought a total of 861 telephone interviews. It was estimated that this would require a NES sample draw of 2349 telephone numbers assuming a working rate (after pre-screening) of 0.65, an eligibility rate of 0.94, and a response rate of 0.60. The eligibility rate was based on the 1998 NES experience. Working rate and response rate assumptions were based on the Survey Research Center's recent experience with RDD samples. The overall 2000 NES RDD sample design specifications, assumptions and outcomes are set out in Table 3, below. A comparison of the 2000 NES RDD sample design specifications and assumptions to the outcome figures in Table 3 indicates that, although the actual eligibility rate was higher than assumed, both the working rate and response rates were lower than specified in the sample design assumptions. This resulted in fewer interviews being taken in the Pre-Election study. The actual response rate for the Post-Election telephone sample was 0.86, which was higher than the assumed rate of 0.75. 
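The planning arithmetic behind the RDD draw can be sketched in a few lines of Python. This is only an illustration of how the stated rates chain together; the function names are hypothetical and do not come from the NES documentation. The inputs (2349 sample lines, working rate 0.65, eligibility rate 0.94, response rate 0.60, target of 861 interviews) are the design figures quoted above.

```python
import math


def expected_yield(sample_lines, working_rate, eligibility_rate, response_rate):
    """Chain the design rates to get expected counts at each step.

    Returns (working numbers, eligible households, completed interviews).
    """
    working = sample_lines * working_rate
    eligible = working * eligibility_rate
    completes = eligible * response_rate
    return working, eligible, completes


def lines_needed(target_interviews, working_rate, eligibility_rate, response_rate):
    """Smallest number of sample lines expected to reach the interview target."""
    overall_rate = working_rate * eligibility_rate * response_rate
    return math.ceil(target_interviews / overall_rate)


# Design assumptions quoted in the text for the 2000 NES RDD component.
working, eligible, completes = expected_yield(2349, 0.65, 0.94, 0.60)
print(round(working), round(eligible), round(completes))  # 1527 1435 861
print(lines_needed(861, 0.65, 0.94, 0.60))                # 2349
```

The rounded results agree with the design specification column of Table 3 (1527, 1435 and 861), which suggests the design figures were obtained by this kind of multiplication.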
Table 3: 2000 NES Telephone Sample Design Specifications and Assumptions Compared to Sample Outcome.
==============================================================================
                    2000 NES      2000 NES      2000 NES      2000 NES
                    Pre-Election  Pre-Election  Post-Election Post-Election
                    Design        Sample        Design        Sample
                    Specification Outcome       Specification Outcome
==============================================================================
Completed           861           801           645           862
Interviews
Response Rate       0.60          0.56          0.75          0.86
Eligible Sample     1435          1418          861           1002 (5)
Households
Eligibility Rate    0.94          0.96
Occupied            1527          1475
Households
Working Rate        0.65          0.63
Total Sample        2349          2349
Lines

(5) Initial sample lines (FTF and Phone) are different from the Pre-Election completed interviews because of the switch in mode for randomly selected sample cases.

>> 2000 NES POST-ELECTION STUDY SAMPLE OUTCOMES

Of the 1807 respondents interviewed in the Pre-Election Study, 1555 completed Post-Election interviews for an overall response rate of 0.86. FTF interviews were attempted with 805 of the 1006 persons interviewed FTF in the Pre-Election study and 693 FTF interviews were obtained for a FTF response rate of 0.86. Approximately 200 FTF cases were transferred to telephone interviewing for the Post-Election study in order to reduce field costs. This was accomplished through a systematic random sample of approximately 20 percent of the area segments. Telephone interviews were attempted with 1002 respondents in the Post-Election study (201 who were interviewed FTF and 801 who were interviewed by telephone in the Pre-Election study). A total of 862 telephone interviews were obtained for a response rate of 0.86.

>> 2000 NES DATA - WEIGHTED ANALYSIS

The 2000 NES data set includes a person-level analysis weight, which incorporates sampling, nonresponse and post-stratification factors. Analysts interested in developing their own nonresponse or stratification adjustment factors must request access to the necessary sample control data from the NES Board.
>> 2000 NES ANALYSIS WEIGHTS - CONSTRUCTION Household Selection Weight Component ------------------------------------ The joint household selection weight is the same for both the RDD and the area sample. This weight is an inflation factor equal to 34195.298. It is equal to the inverse of the joint probability of selection, which is the sum of the RDD and the area sample probabilities minus their product. It was not possible from the data available to reliably identify the area sample respondents who did not have telephone service. The 2000 CPS March Supplement estimates that 5.5% of U.S. households do not have telephone service. The household selection weight component therefore slightly underestimates respondents who live in households that cannot be reached through the RDD sample frame. Person-Level Sample Selection Weight Component ---------------------------------------------- The dual frame sample design for the 2000 NES results in a probability sample of U.S. households. Within sample households a single adult respondent is chosen at random to be interviewed. Since the number of eligible adults varies from one household to another, the random selection of a single adult introduces inequality into respondents' selection probabilities. In analysis, a respondent selection weight should be used to compensate for these unequal selection probabilities. The person-level selection weight is the product of the joint household selection weight and the within household selection weight. The within household selection weight is equal to the number of eligible persons in the household and is capped at 3. The use of the respondent selection weight is strongly encouraged, despite past evaluations that have shown these weights to have little significant impact on the values of NES estimates of descriptive statistics. 
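The two selection weight components described above can be sketched as follows. This is a minimal illustration, not the code used to build the NES weights. The function names are hypothetical, and the RDD selection probability used in the first call is an assumed placeholder, since the text does not report it. The household weight of 34195.298, the area sample probability of 0.00002116 and the cap of 3 eligible persons are taken from the text.

```python
def joint_household_weight(p_area, p_rdd):
    """Inverse of the probability of selection by either frame.

    The joint probability is the sum of the two frame probabilities
    minus their product, as stated in the text.
    """
    p_joint = p_area + p_rdd - p_area * p_rdd
    return 1.0 / p_joint


def person_selection_weight(household_weight, n_eligible, cap=3):
    """Household weight times the number of eligible adults, capped at `cap`."""
    if n_eligible < 1:
        raise ValueError("a sampled household needs at least one eligible adult")
    return household_weight * min(n_eligible, cap)


P_AREA = 0.00002116            # area sample probability reported in the text
P_RDD_ASSUMED = 0.000008       # ASSUMPTION: placeholder, not a published value
HOUSEHOLD_WEIGHT = 34195.298   # joint household weight reported in the text

# Joint weight under the assumed RDD probability (illustrative only).
print(joint_household_weight(P_AREA, P_RDD_ASSUMED))

# Person-level weights for households with 1 to 5 eligible adults.
for n in range(1, 6):
    print(n, person_selection_weight(HOUSEHOLD_WEIGHT, n))
```

Because the within-household factor is capped at 3, households with four or more eligible adults receive the same person-level weight as households with three.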
Nonresponse Adjusted Selection Weight
-------------------------------------
The base weight equals the product of the joint selection weight and the household level nonresponse adjustment factors. Nonresponse adjustment factors were constructed at the household level separately for the area sample and the RDD sample. Nonresponse adjustment cells for the 2000 NES sample were formed by crossing MSA status by the four Census regions (Northeast, Midwest, South, and West). A nonresponse adjustment factor equal to the inverse of the response rate in each cell was applied to the interview cases. Tables 4 and 5 show the response rates and nonresponse adjustment factors for the area and RDD samples.

Table 4. Computation of Nonresponse Adjustment Weights -- 2000 NES Area Sample.
==============================================================================
PSU Type      Census Region      Response Rate      Nonresponse
                                 (%)                Adjustment Factor
==============================================================================
MSAs          Northeast          55.28              1.809
              Midwest            62.86              1.591
              South              61.87              1.616
              West               67.82              1.474
Non MSAs      Northeast          61.54              1.625
              Midwest            65.71              1.522
              South              79.55              1.257
              West               83.33              1.200

Table 5. Computation of Nonresponse Adjustment Weights -- 2000 NES RDD Sample.
==============================================================================
PSU Type      Census Region      Response Rate      Nonresponse
                                 (%)                Adjustment Factor
==============================================================================
MSAs          Northeast          43.94              2.276
              Midwest            62.08              1.611
              South              58.72              1.703
              West               53.56              1.867
Non MSAs      Northeast          50.00              2.000
              Midwest            67.90              1.473
              South              62.70              1.595
              West               67.86              1.474

Post-stratification factor
--------------------------
The 2000 NES weights are post-stratified to 2000 CPS March Supplement proportions for six (6) ages by four (4) education categories. Table 6 shows the weighted estimates and proportions for the 24 cells for the 2000 CPS and the 2000 NES.
The post-stratification adjustment is computed by dividing the CPS weighted total by the 2000 NES total weighted by the nonresponse adjusted selection weight. The final two columns show the NES weighted totals using the final post-stratified analysis weight and the resulting percents, which match the CPS percents. Final Analysis Weights ---------------------- The final analysis weight (FINAL_WT) is the product of the household level non-response adjustment factor, the number of eligible persons, and a person- level post-stratification factor. The final analysis weight for the 2000 NES sample (FINAL_WT) is scaled to sum to 1807, the total number of respondents. This weight is trimmed at the 1st and 99th percentiles and then re-scaled to match the 2000 CPS proportions for the 24 age by education cells. Post-Election Attrition Weight ------------------------------ The 1555 Post-Election cases were post-stratified to 2000 CPS March Supplement proportions for six (6) ages by four (4) education categories (the same categories used for post-stratifying the Pre-Election cases). The post- stratification compensates for differential non-response by age group and education level. Response rates for the Post-Election Study ranged from a high of 100 percent for persons 70 or older with a college degree or higher to a low of 76 percent for persons age 30 - 39 who did not graduate from high school. The panel attrition weight for the Post-Election Study, POST_WT, is the product of the Pre-Election FINAL_WT and the post-stratification factor formed by dividing the CPS proportion by the weighted NES proportion for each of the 24 age by education cells. The weight is scaled to sum to the number of cases, 1555. Table 6: 2000 NES Sample Weight: Post-stratification Factors. 
===================================================================================
Age    Education                  n  2000 CPS   2000    Prelim  Post-    Final   NES
Group  Level                           Est in    CPS   NES wtd  strat  NES wtd   wtd
                                     000s (6)      %    Est in Adjust        n     %
                                                          000s         centered
===================================================================================
18-29  <High School Graduation   22   6,411.4  3.438   2,490.3  2.574   62.08  3.44
       High School Graduate      88  12,223.7  6.555   9,628.2  1.270  118.53  6.56
       Some College             103  14,524.8  7.789  11,424.0  1.271  140.81  7.79
       College Graduate          68   6,666.9  3.575   6,990.0  0.954   64.73  3.58
30-39  <High School Graduation   21   3,242.8  1.739   1,780.1  1.822   31.48  1.74
       High School Graduate     108  12,543.8  6.727  10,873.1  1.154  121.56  6.73
       Some College             121  10,759.0  5.769  11,727.6  0.917  104.32  5.77
       College Graduate         146  10,786.4  5.784  14,122.3  0.764  104.36  5.78
40-49  <High School Graduation   22   3,478.8  1.865   2,277.5  1.527   33.74  1.87
       High School Graduate     101  13,087.2  7.018   9,899.0  1.322  126.84  7.02
       Some College             129  11,548.5  6.193  13,551.0  0.852  111.85  6.19
       College Graduate         137  11,327.1  6.074  14,505.2  0.781  109.74  6.07
50-59  <High School Graduation   23   3,300.1  1.770   2,192.9  1.505   32.04  1.77
       High School Graduate      93   9,364.1  5.022   9,558.1  0.980   90.70  5.02
       Some College              96   7,449.2  3.995  10,185.6  0.731   72.12  3.99
       College Graduate         110   7,984.6  4.282  11,542.5  0.716   77.40  4.28
60-69  <High School Graduation   35   4,136.4  2.218   3,429.9  1.206   40.20  2.22
       High School Graduate      61   7,201.9  3.862   6,060.7  1.188   69.77  3.86
       Some College              49   3,886.6  2.084   4,280.8  0.908   37.58  2.08
       College Graduate          49   3,880.8  2.081   4,688.9  0.828   37.53  2.08
70 +   <High School Graduation   58   7,298.9  3.914   5,033.8  1.450   70.63  3.91
       High School Graduate      73   7,994.7  4.287   6,327.7  1.263   77.51  4.29
       Some College              48   4,073.3  2.184   3,811.1  1.069   39.41  2.18
       College Graduate          46   3,303.4  1.771   4,071.8  0.811   32.07  1.77
Totals                         1807 186,470.0  100.0 180,100.0         1807.0 100.0

(6) Because U.S. citizenship is required for NES eligibility, the CPS counts used for stratification include only U.S. citizens.
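Both adjustments described in this section reduce to simple ratios, which the following Python sketch illustrates. It is an illustration under stated assumptions rather than the procedure used to produce the released weights: the function names are hypothetical, and only a few cells from Tables 4 and 6 are reproduced as inputs.

```python
def nonresponse_factor(response_rate_pct):
    """Inverse of the cell response rate (rate given in percent)."""
    return 100.0 / response_rate_pct


def poststrat_factor(cps_total, nes_weighted_total):
    """CPS weighted total divided by the NES weighted total for one cell."""
    return cps_total / nes_weighted_total


# (cell label, response rate %, published factor) from Table 4.
AREA_CELLS = [
    ("MSA / Northeast", 55.28, 1.809),
    ("Non-MSA / West", 83.33, 1.200),
]

# (cell label, CPS est in 000s, prelim NES wtd est in 000s, published factor)
# from Table 6.
POSTSTRAT_CELLS = [
    ("18-29 / High School Graduate", 12223.7, 9628.2, 1.270),
    ("30-39 / College Graduate", 10786.4, 14122.3, 0.764),
    ("70+ / College Graduate", 3303.4, 4071.8, 0.811),
]

for label, rate, published in AREA_CELLS:
    print(label, round(nonresponse_factor(rate), 3), published)

for label, cps, nes, published in POSTSTRAT_CELLS:
    print(label, round(poststrat_factor(cps, nes), 3), published)
```

For the cells shown, the recomputed factors agree with the published values to within rounding. Some other cells in Table 6 differ slightly in the third decimal when recomputed from the rounded totals printed in the table, which is expected if the published factors were computed from unrounded totals.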
>> 2000 NES PROCEDURES FOR SAMPLING ERROR ESTIMATION

The 2000 NES sample design is based on a stratified multi-stage area probability sample of United States households. Although smaller in scale, the NES sample design is very similar in its basic structure to the multi-stage designs used for major federal survey programs such as the Health Interview Survey (HIS) or the Current Population Survey (CPS). The survey literature refers to the NES, HIS and CPS samples as complex designs, a loosely-used term meant to denote the fact that the sample incorporates special design features such as stratification, clustering and differential selection probabilities (i.e., weighting) that analysts must consider in computing sampling errors for sample estimates of descriptive statistics and model parameters. This section of the 2000 NES sample design description focuses on sampling error estimation and construction of confidence intervals for survey estimates of descriptive statistics such as means, proportions, ratios, and coefficients for linear and logistic linear regression models.

Standard analysis software systems such as SAS and SPSS assume simple random sampling (SRS) or, equivalently, independence of observations in computing standard errors for sample estimates. In general, the SRS assumption results in underestimation of variances of survey estimates of descriptive statistics and model parameters. Confidence intervals based on computed variances that assume independence of observations will be biased (generally too narrow) and design-based inferences will be affected accordingly.

Sampling Error Computation Methods and Programs
-----------------------------------------------
Over the past 50 years, advances in survey sampling theory have guided the development of a number of methods for correctly estimating variances from complex sample data sets.
A number of sampling error programs which implement these complex sample variance estimation methods are available to NES data analysts. The two most common approaches to the estimation of sampling error for complex sample data are through the use of a Taylor Series Linearization of the estimator (and corresponding approximation to its variance) or through the use of resampling variance estimation procedures such as Balanced Repeated Replication (BRR) or Jackknife Repeated Replication (JRR). New Bootstrap methods for variance estimation can also be included among the resampling approaches. See Rao and Wu (1988). 1. Taylor series linearization method: When survey data are collected using a complex sample design with unequal size clusters, most statistics of interest will not be simple linear functions of the observed data. The linearization approach applies Taylor's method to derive an approximate form of the estimator that is linear in statistics for which variances and covariances can be directly and easily estimated (Woodruff, 1971). SUDAAN and Stata are two commercially available statistical software packages that include procedures that apply the Taylor series method to estimation and inference for complex sample data. SUDAAN (Shah et al., 1996) is a commercially available software system developed and marketed by the Research Triangle Institute of Research Triangle Park, North Carolina (USA). SUDAAN was developed as a stand-alone software system with capabilities for the more important methods for descriptive and multivariate analysis of survey data, including: estimation and inference for means, proportions and rates (PROC DESCRIPT and PROC RATIO); contingency table analysis (PROC CROSSTAB); linear regression (PROC REGRESS); logistic regression (PROC LOGISTIC); log-linear models (PROC CATAN); and survival analysis (PROC SURVIVAL). SUDAAN V7.0 and earlier versions were designed to read directly from ASCII and SAS system data sets. 
The latest versions of SUDAAN permit procedures to be called directly from the SAS system. Information on SUDAAN is available at the following web site address: http://www.rti.org.

Stata (StataCorp, 1997) is a more recent commercial entry to the available software for analysis of complex sample survey data and has a growing body of research users. Stata includes special versions of its standard analysis routines that are designed for the analysis of complex sample survey data. Special survey analysis programs are available for descriptive estimation of means (SVYMEAN), ratios (SVYRATIO), proportions (SVYPROP) and population totals (SVYTOTAL). Stata programs for multivariate analysis of survey data currently include linear regression (SVYREG), logistic regression (SVYLOGIT) and probit regression (SVYPROBT). Information on the Stata analysis software system can be found on the Web at: http://www.stata.com.

2. Resampling methods: BRR, JRR and the bootstrap comprise a second class of nonparametric methods for conducting estimation and inference from complex sample data. As suggested by the generic label for this class of methods, BRR, JRR and the bootstrap utilize replicated subsampling of the sample database to develop sampling variance estimates for linear and nonlinear statistics. WesVar PC (Brick et al., 1996) is a publicly available software system for personal computers that employs replicated variance estimation methods to conduct the more common types of statistical analysis of complex sample survey data. WesVar PC was developed by Westat, Inc. and is distributed along with documentation free of charge to researchers from Westat's Web site: http://www.westat.com/wesvarpc/. WesVar PC includes a Windows-based application generator that enables the analyst to select the form of data input (SAS data file, SPSS for Windows data base, dBase file, ASCII data set) and the computation method (BRR or JRR methods).
Analysis programs contained in WesVar PC provide the capability for basic descriptive (means, proportions, totals, cross tabulations) and regression (linear, logistic) analysis of complex sample survey data. WesVar Complex Samples 3.0 is the latest version of WesVar PC that is licensed and distributed by SPSS. Information on the latest developments can be obtained at http://www.spss.com.

These new and updated software packages include an expanded set of user friendly, well-documented analysis procedures. Difficulties with sample design specification, data preparation, and data input in the earlier generations of survey analysis software created a barrier to use by analysts who were not survey design specialists. The new software enables the user to input data and output results in a variety of common formats, and the latest versions accommodate direct input of data files from the major analysis software systems. Readers who are interested in a more detailed comparison of these and other survey analysis software alternatives are referred to Cohen (1997).

Sampling Error Computation Models
---------------------------------

Regardless of whether linearization or a resampling approach is used, estimation of variances for complex sample survey estimates requires the specification of a sampling error computation model. NES data analysts who are interested in performing sampling error computations should be aware that the estimation programs identified in the preceding section assume a specific sampling error computation model and will require special sampling error codes. Individual records in the analysis data set must be assigned sampling error codes that identify to the programs the complex structure of the sample (stratification, clustering) and are compatible with the computation algorithms of the various programs.
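To illustrate how a sampling error computation model is used, the following Python sketch applies the Taylor series (linearization) approach to a weighted proportion under an "ultimate cluster" model. It is a minimal illustration, not the algorithm of SUDAAN, Stata or WesVar PC. The function name `linearized_se` and the record layout (stratum code, unit code within stratum, case weight, 0/1 outcome) are assumptions made for this example only.

```python
from collections import defaultdict
from math import sqrt


def linearized_se(records):
    """Weighted proportion and its linearized standard error.

    records: iterable of (stratum, unit, weight, y) tuples, where `unit`
    identifies the sampling error computation unit within the stratum and
    y is 1 or 0. (Hypothetical layout, chosen for illustration.)
    """
    records = list(records)
    total_w = sum(w for _, _, w, _ in records)
    p = sum(w * y for _, _, w, y in records) / total_w

    # Linearized value of each case, summed to unit totals within strata.
    unit_totals = defaultdict(lambda: defaultdict(float))
    for stratum, unit, w, y in records:
        unit_totals[stratum][unit] += w * (y - p) / total_w

    variance = 0.0
    for stratum, units in unit_totals.items():
        n_h = len(units)
        if n_h < 2:
            raise ValueError(f"stratum {stratum!r} needs at least two units")
        mean = sum(units.values()) / n_h
        # Between-unit variation within the stratum.
        variance += n_h / (n_h - 1) * sum((t - mean) ** 2 for t in units.values())
    return p, sqrt(variance)


# Toy data: two strata, two units per stratum, equal weights.
toy = [
    (1, 1, 1.0, 1), (1, 1, 1.0, 1), (1, 2, 1.0, 0), (1, 2, 1.0, 0),
    (2, 1, 1.0, 1), (2, 1, 1.0, 0), (2, 2, 1.0, 1), (2, 2, 1.0, 0),
]
p_hat, se_hat = linearized_se(toy)
```

With exactly two units per stratum, each stratum's contribution reduces to the squared difference between the two unit totals, which is the paired selection form of the estimator.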
To facilitate the computation of sampling error for statistics based on 2000 NES data, design-specific sampling error codes will be routinely included in all public-use versions of the data set. Although minor recoding may be required to conform to the input requirements of the individual programs, the sampling error codes that are provided should enable analysts to conduct either Taylor Series or Replicated estimation of sampling errors for survey statistics.

Table 7 defines the sampling error coding system for 2000 NES sample cases. Two sampling error code variables are defined for each case based on the sample design primary stage unit (PSU) and area segment in which the sample household is located.

Sampling Error Stratum Code (Variable 000097). The Sampling Error Computation Stratum Code is the variable that defines the sampling error computation strata for all sampling error analysis of the NES data. Each self-representing (SR) design stratum is represented by one sampling error computation stratum. Pairs of similar nonself-representing (NSR) primary stage design strata are "collapsed" (Kalton, 1977) to create NSR sampling error computation strata. Since there was an odd number of both nonself-representing MSA strata and non-MSA strata used in the 2000 NES, and since it was felt that a nonself-representing MSA PSU should not be paired with a non-MSA PSU, one of each of these PSUs stands alone within its Sampling Error Stratum Code.

Under the 1990 SRC National Sample design, controlled selection and a "one-per-stratum" PSU allocation are used to select the primary stage of the 2000 NES national sample. The purpose in using controlled selection and the "one-per-stratum" sample allocation is to reduce the between-PSU component of sampling variation relative to a "two-per-stratum" primary stage design.
Despite the expected improvement in sample precision, a drawback of the "one-per-stratum" design is that two or more sample selection strata must be collapsed or combined to form a sampling error computation stratum. Variances are then estimated under the assumption that a multiple PSU per stratum design was actually used for primary stage selection. The expected consequence of collapsing design strata into sampling error computation strata is the overestimation of the true sampling error; that is, the sampling error computation model defined by the codes contained in Table 7 will yield estimates of sampling errors which in expectation will be slightly greater than the true sampling error of the statistic of interest.

SECU - Stratum-specific Sampling Error Computation Unit code (Variable 000097) is a half sample code for analysis of sampling error using the BRR method or approximate "two-per-stratum" Taylor Series method (Kish and Hess, 1959). Within the SR sampling error strata, the SECU half sample units are created by dividing sample cases into random halves, SECU=1 and SECU=2. The assignment of cases to half-samples is designed to preserve the stratification and second stage clustering properties of the sample within an SR stratum. Sample cases are assigned to SECU half samples based on the area segment in which they were selected. For this assignment, sample cases were placed in original stratification order (area segment number order) and, beginning with a random start, entire area segment clusters were systematically assigned to either SECU=1 or SECU=2.

In the general case of nonself-representing (NSR) strata, the half sample units are defined according to the PSU to which the respondent was assigned at sample selection (with the exception of the two unpaired NSR strata mentioned above). That is, the half samples for each NSR sampling error computation stratum bear a one-to-one correspondence to the sample design NSR PSUs.
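The systematic assignment of area segments to half samples within an SR stratum can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure actually run by SRC: the function name `assign_secu` is hypothetical, segments are assumed to be identified by integers, and "systematic" is read here as strict alternation in segment number order after a random choice of the starting half.

```python
import random


def assign_secu(segment_ids, rng=None):
    """Map each area segment to SECU 1 or 2 by alternating in segment order.

    segment_ids: iterable of segment numbers within one SR stratum.
    rng: optional random.Random instance (an assumption, to allow seeding).
    """
    rng = rng or random.Random()
    ordered = sorted(set(segment_ids))   # original stratification order
    start = rng.choice((1, 2))           # random start
    # Whole segments alternate between the two half samples, so every case
    # in a segment receives the same SECU code.
    return {seg: 1 + (start - 1 + i) % 2 for i, seg in enumerate(ordered)}


# Segment numbers taken from stratum 01 of Table 7, in arbitrary input order.
segments = [15, 7, 31, 23, 47, 39, 63, 55, 79, 71, 99, 87]
secu_of_segment = assign_secu(segments, random.Random(2000))
```

Because whole segments are assigned, the second stage clustering is kept intact within each half sample, and alternation in segment order spreads both halves across the stratum.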
The particular sample coding provided on the NES public use data set is consistent with the "ultimate cluster" approach to complex sample variance estimation (Kish, 1965; Kalton, 1977). Individual stratum, PSU and segment code variables may be needed by NES analysts interested in components of variance analysis or estimation of hierarchical models in which PSU-level and neighborhood-level effects are explicitly estimated.

Table 7 shows the area sample sampling error stratum and SECU codes to be used for the paired selection model for sampling error computations for any 2000 NES analyses. Strata 01 through 26 reflect the half sample 1990 National Sample design used for the 2000 NES area sample. It can be seen from this table that the three-digit 2000 SE code consists of the two-digit SE Stratum code followed by the one-digit SECU code.

The RDD sample cases are assigned to Strata 27 through 66. The RDD sample is a stratified unclustered design. In order to reflect the stratification of the RDD frame, the sample was sorted by area code within metropolitan status within Census Division prior to the assignment of sampling error stratum and SECU codes. The sorted file was then divided into groups of 20 adjacent cases to form the strata. Within each stratum, cases were assigned alternately to each of the pair of SECUs, 10 cases per SECU. This assignment of sampling error stratum and SECU codes allows for design effects to be estimated for the complete NES data set as well as separately for the RDD and area sample components.

Table 7: 2000 NES Election Study Sampling Error Codes.
==============================================================================
SE       SECU  SE Code  PSU   Segment #s                          Total Rs
Stratum
==============================================================================
01       1     011      120   015, 031, 047, 063, 079, 099        11
         2     012      120   007, 023, 039, 055, 071, 087        11
02       1     021      190   007, 023, 039, 055, 071, 087        11
         2     022      190   016, 031, 047, 063, 079, 095        13
03       1     031      130   011, 028, 044, 060                   8
         2     032      130   004, 020, 036, 052, 068             15
04       1     041      121   002, 018, 034, 050                  10
         2     042      121   010, 026, 042                        6
05       1     051      131   016, 032, 047                       11
         2     052      131   008, 024, 040                       10
06       1     061      150   007, 023, 039                       11
         2     062      150   015, 031, 047                        8
07       1     071      171   010, 026, 042                        6
         2     072      171   002, 018, 034                        7
08       1     081      110   004, 020, 036                        6
         2     082      110   012, 028, 044                        5
09       1     091      170   011, 027, 031, 039                  17
         2     092      154   003, 007, 011, 015, 019             13
                        170   007, 019
10       1     101      122   008, 012, 015, 024, 028, 032        18
         2     102      152   004, 012, 016, 020, 028, 032        13
11       1     111      141   004, 008, 016, 020, 024, 032        12
         2     112      132   001, 005, 009, 013, 017, 021        18
12       1     121      191   001, 005, 009, 017, 021, 025        27
         2     122      181   001, 005, 009, 013, 017, 021        20
13       1     131      194   004, 008, 016, 020, 024, 032        17
         2     132      196   002, 006, 010, 014, 018, 022        15
14       1     141      220   001, 005, 009, 013, 017, 021        40
         2     142      226   002, 006, 010, 014, 018, 022        24
15       1     151      211   004, 007, 011, 015, 020, 023         9
         2     152      213   004, 008, 012, 016, 020, 024        17
16       1     161      230   002, 006, 010, 014, 018, 022        45
         2     162      434   002, 304, 306, 008, 010, 011        23
17       1     171      239   001, 005, 009, 013, 017, 021        14
         2     172      240   002, 006, 010, 014, 018, 022        20
18       1     181      262   002, 006, 010, 014, 018, 022        48
         2     182      255   004, 008, 012, 016, 020, 024        17
19       1     191      257   004, 008, 012, 016, 020, 024        23
         2     192      258   002, 006, 010, 014, 018, 022        15
20       1     201      273   003, 007, 011, 015, 019, 023        18
         2     202      274   002, 006, 010, 014, 018, 022        14
21       1     211      260   003, 007, 011, 015, 019, 023        14
         2     212      250   003, 007, 011, 015, 019, 023        21
22       1     221      292   001, 005, 009, 013, 017, 022        20
         2     222      293   003, 007, 011, 015, 019, 023        20
23       1     231      464   303, 305, 306, 309, 311, 312        32
         2     232      480   301, 302, 303, 305, 306, 307        39
24       1     241      466   301, 302, 304, 305, 306, 308        26
         2     242      470   301, 302, 303, 305, 306, 307        43
25       1     251      474   302, 303, 304, 306, 307, 308        40
         2     252      477   302, 303, 304, 306, 307, 308        26
26       1     261      280   002, 006, 010, 014, 018, 022        34
         2     262      482   301, 303, 304, 305, 307, 308        45

Total:                                                          1006

Generalized Sampling Error Results for the 2000 NES
---------------------------------------------------

To assist NES analysts, the PC SUDAAN program was used to compute sampling errors for a wide-ranging example set of proportions estimated from the 2000 NES election Survey data set. Sampling errors were computed for the complete NES data set as well as separately for the area sample and RDD sample components. For each estimate, sampling errors were computed for the total sample and for fifteen demographic and political affiliation subclasses of the 2000 NES sample. The results of these sampling error computations were then summarized and translated into the general usage sampling error tables provided in Tables 8 - 10.

The mean value of deft, the square root of the design effect, was found to be 1.098 for the combined sample, 1.076 for the area sample component, and 1.049 for the RDD sample component. The design effects were primarily due to weighting effects (Kish, 1965) and did not vary significantly by subclass size. Therefore the generalized variance tables are produced by multiplying the simple random sampling standard error for each proportion and sample size by the average deft for the set of sampling error computations.

Incorporating the pattern of "design effects" observed in the extensive set of example computations, Tables 8 - 10 provide approximate standard errors for percentage estimates based on the 2000 NES. To use the tables, examine the column heading to find the percentage value which best approximates the value of the estimated percentage that is of interest.
Next, locate the approximate sample size base (denominator for the proportion) in the left-hand row margin of the table. To find the approximate standard error of a percentage estimate, simply cross-reference the appropriate column (percentage) and row (sample size base). Note: the tabulated values represent approximately one standard error for the percentage estimate. To construct an approximate confidence interval, the analyst should apply the appropriate critical point from the "z" distribution (e.g., z=1.96 for a two-sided 95% confidence interval half-width). Furthermore, the approximate standard errors in the table apply only to single point estimates of percentages, not to the difference between two percentage estimates.

The generalized variance results presented in Tables 8 - 10 are a useful tool for initial, cursory examination of the NES survey results. For more in-depth analysis and reporting of critical estimates, analysts are encouraged to compute exact estimates of standard errors using the appropriate choice of a sampling error program and computation model.

Table 8: Generalized Variance Table. 2000 NES election Survey - Combined Sample.
APPROXIMATE STANDARD ERRORS FOR PERCENTAGES
==============================================================================
               For percentage estimates near:
Sample n       50%      40%      30%      20%      10%
                        or 60%   or 70%   or 80%   or 90%
==============================================================================
 100           5.49     5.38     5.03     4.39     3.29
 200           3.88     3.80     3.56     3.10     2.33
 300           3.17     3.10     2.90     2.54     1.90
 400           2.74     2.69     2.52     2.20     1.65
 500           2.45     2.40     2.25     1.96     1.47
 600           2.24     2.20     2.05     1.79     1.34
 700           2.07     2.03     1.90     1.66     1.24
 800           1.94     1.90     1.78     1.55     1.16
 900           1.83     1.79     1.68     1.46     1.10
1000           1.74     1.70     1.59     1.39     1.04
1100           1.66     1.62     1.52     1.32     0.99
1200           1.58     1.55     1.45     1.27     0.95
1300           1.52     1.49     1.40     1.22     0.91
1400           1.47     1.44     1.34     1.17     0.88
1500           1.42     1.39     1.30     1.13     0.85
1600           1.37     1.34     1.26     1.10     0.82
1700           1.33     1.30     1.22     1.06     0.80
1800           1.29     1.27     1.19     1.04     0.78

Table 9: Generalized Variance Table. 2000 NES election Survey - Area Sample.

APPROXIMATE STANDARD ERRORS FOR PERCENTAGES
==============================================================================
               For percentage estimates near:
Sample n       50%      40%      30%      20%      10%
                        or 60%   or 70%   or 80%   or 90%
==============================================================================
 100           5.38     5.27     4.93     4.30     3.23
 200           3.80     3.73     3.48     3.04     2.28
 300           3.10     3.04     2.85     2.48     1.86
 400           2.69     2.63     2.46     2.15     1.61
 500           2.40     2.36     2.20     1.92     1.44
 600           2.20     2.15     2.01     1.76     1.32
 700           2.03     1.99     1.86     1.63     1.22
 800           1.90     1.86     1.74     1.52     1.14
 900           1.79     1.76     1.64     1.43     1.07
1000           1.70     1.67     1.56     1.36     1.02

Table 10: Generalized Variance Table. 2000 NES election Survey - RDD Sample.
APPROXIMATE STANDARD ERRORS FOR PERCENTAGES
==============================================================================
               For percentage estimates near:
Sample n       50%      40%      30%      20%      10%
                        or 60%   or 70%   or 80%   or 90%
==============================================================================
 100           5.24     5.14     4.80     4.19     3.14
 200           3.71     3.63     3.40     2.96     2.22
 300           3.03     2.96     2.77     2.42     1.82
 400           2.62     2.57     2.40     2.10     1.57
 500           2.34     2.30     2.15     1.88     1.41
 600           2.14     2.10     1.96     1.71     1.28
 700           1.98     1.94     1.82     1.58     1.19
 800           1.85     1.82     1.70     1.48     1.11

References

Alegria, M., Kessler, R., Bijl, R., Lin, E., Heeringa, S.G., Takeuchi, D.T., Kolody, B. (2000). To appear in The Unmet Need for Treatment. Proceedings of a Symposium of the World Psychiatric Association, Sydney, Australia, October, 1997.

Binder, D.A. (1983), "On the variances of asymptotically normal estimators from complex surveys," International Statistical Review, Vol. 51, pp. 279-292.

Brick, J.M., Broene, P., James, P., & Severynse, J. (1996). "A User's Guide to WesVar PC." Rockville, MD: Westat, Inc.

Cochran, W.G. (1977). Sampling Techniques. New York: John Wiley & Sons.

Cohen, S.B. (1997). "An evaluation of alternative PC-based software packages developed for the analysis of complex survey data," The American Statistician, Vol. 51, No. 3, pp. 285-292.

Goldstein, H. (1987). Multi-level Models in Educational and Social Research. London: Oxford University Press.

Kalton, G. (1977), "Practical methods for estimating survey sampling errors," Bulletin of the International Statistical Institute, Vol. 47, 3, pp. 495-514.

Kish, L. (1949). "A procedure for objective respondent selection within the household," Journal of the American Statistical Association, Vol. 44, pp. 380-387.

Kish, L. (1965), Survey Sampling. New York: John Wiley & Sons, Inc.

Kish, L., & Frankel, M.R. (1974), "Inference from complex samples," Journal of the Royal Statistical Society, B, Vol. 36, pp. 1-37.

Kish, L., Groves, R.M., & Krotki, K.P. (1975). "Sampling errors for fertility surveys." Occasional Paper No. 17. Voorburg, Netherlands: World Fertility Survey, International Statistical Institute.

Kish, L., & Hess, I. (1959), "On variances of ratios and their differences in multi-stage samples," Journal of the American Statistical Association, 54, pp. 416-446.

LePage, R., & Billard, L. (1992), Exploring the Limits of Bootstrap. New York: John Wiley & Sons, Inc.

Mahalanobis, P.C. (1946), "Recent experiments in statistical sampling at the Indian Statistical Institute," Journal of the Royal Statistical Society, Vol. 109, pp. 325-378.

McCullagh, P.M. & Nelder, J.A. (1989). Generalized Linear Models, 2nd Edition. Chapman and Hall. London.

Rao, J.N.K., & Wu, C.F.J. (1988), "Resampling inference with complex sample data," Journal of the American Statistical Association, 83, pp. 231-239.

Rosenstone, Steven J., Kinder, Donald R., Miller, Warren E., & the National Election Studies 1994 Sample Design: Technical Memoranda, 1994 Election Study pp. 882-905 in Rosenstone, Steven J., Kinder, Donald R., Miller, Warren E., & the National Election Studies, AMERICAN NATIONAL ELECTION STUDY, 1994: ELECTION SURVEY (ENHANCED WITH 1992 AND 1993 DATA) (Computer file). Conducted by University of Michigan Center for Political Studies. 2nd ICPSR ed. Ann Arbor MI: University of Michigan, Center for Political Studies, and Inter-university Consortium for Political and Social Research (producer), 1995. Ann Arbor MI: Inter-university Consortium for Political and Social Research (distributor), 1995.

Rust, K. (1985). "Variance estimation for complex estimators in sample surveys," Journal of Official Statistics, Vol. 1, No. 4.

SAS Institute, Inc. (1990). SAS/STAT User's Guide, Version 6, Fourth Ed., Vol. 2. Cary, NC: SAS Institute, Inc.

Shah, B.V., Barnwell, B.G., Biegler, G.S. (1996). SUDAAN User's Manual: Software for Statistical Analysis of Correlated Data. Research Triangle Park, NC: Research Triangle Institute.

Skinner, C.J., Holt, D., & Smith, T.M.F. (1989). Analysis of Complex Surveys. New York: John Wiley & Sons.

SPSS, Inc. (1993). SPSS for Windows: BASE System User's Guide, Release 6.0. Chicago, IL: SPSS Inc.

Stata Corp. (1997). Stata Statistical Software: Release 5.0. College Station, TX: Stata Corporation.

Wolter, K.M. (1985). Introduction to Variance Estimation. New York: Springer-Verlag.

Woodruff, R.S. (1971), "A simple method for approximating the variance of a complicated estimate," Journal of the American Statistical Association, Vol. 66, pp. 411-414.

Yamaguchi, K. (1991). Event History Analysis. Applied Social Research Methods Series, Vol. 28. Newbury Park, CA/London: Sage Publications.

Office of Management and Budget (OMB) June 1990 definitions of MSAs, NECMAs, counties, parishes, independent cities.
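
Worked illustration for Tables 8 - 10. The generalized variance tables are described in the text as the simple random sampling standard error of a percentage multiplied by the average deft (1.098 combined, 1.076 area sample, 1.049 RDD sample). The Python sketch below applies that rule for a percentage and sample size base that fall between the tabulated rows and columns. The function names `approx_se` and `approx_ci` are hypothetical, and because the published deft values are rounded, results may differ from the tables in the last digit.

```python
from math import sqrt


def approx_se(pct, n, deft):
    """Approximate standard error, in percentage points.

    pct: estimated percentage (0-100); n: sample size base;
    deft: average square root of the design effect, taken from the text.
    """
    srs_se = sqrt(pct * (100.0 - pct) / n)   # simple random sampling SE
    return deft * srs_se


def approx_ci(pct, n, deft, z=1.96):
    """Approximate two-sided confidence interval (default 95%)."""
    half_width = z * approx_se(pct, n, deft)
    return pct - half_width, pct + half_width


# Combined sample, estimate near 50% on a base of 100 cases.
se_combined = approx_se(50, 100, 1.098)
ci_low, ci_high = approx_ci(50, 100, 1.098)
```

As the text notes, these values apply to single percentage estimates only, not to the difference between two percentages.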