Introduction-The data presented in this Statistical Abstract came from many sources. The sources include not only federal statistical bureaus and other organizations that collect and issue statistics as their principal activity, but also governmental administrative and regulatory agencies, private research bodies, trade associations, insurance companies, health associations, and private organizations such as the National Education Association and philanthropic foundations. Consequently, the data vary considerably as to reference periods, definitions of terms and, for ongoing series, the number and frequency of time periods for which data are available.
The statistics presented were obtained and tabulated by various means. Some statistics are based on complete enumerations or censuses, while others are based on samples. Some information is extracted from records kept for administrative or regulatory purposes (school enrollment, hospital records, securities registration, financial accounts, social security records, income tax returns, etc.), while other information is obtained explicitly for statistical purposes through interviews or by mail. The estimation procedures used vary from highly sophisticated scientific techniques, to crude ‘‘informed guesses.’’
Each set of data relates to a group of individuals or units of interest referred to as the target universe or target population, or simply as the universe or population. Prior to data collection the target universe should be clearly defined. For example, if data are to be collected for the universe of households in the United States, it is necessary to define a ‘‘household.’’ The target universe may not be completely tractable. Cost and other considerations may restrict data collection to a survey universe based on some available list, and such a list may be out of date. This list is called a survey frame or sampling frame.
The data in many tables are based on data obtained for all population units, a census, or on data obtained for only a portion, or sample, of the population units. When the data presented are based on a sample, the sample is usually a scientifically selected probability sample. This is a sample selected from a list or sampling frame in such a way that every possible sample has a known chance of selection and usually each unit selected can be assigned a number, greater than zero and less than or equal to one, representing its likelihood or probability of selection.
For large-scale sample surveys, the probability sample of units is often selected as a multistage sample. The first stage of a multistage sample is the selection of a probability sample of large groups of population members, referred to as primary sampling units (PSUs). For example, in a national multistage household sample, PSUs are often counties or groups of counties. The second stage of a multistage sample is the selection, within each PSU selected at the first stage, of smaller groups of population units, referred to as secondary sampling units. In subsequent stages of selection, smaller and smaller nested groups are chosen until the ultimate sample of population units is obtained. To qualify a multistage sample as a probability sample, all stages of sampling must be carried out using probability sampling methods.
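The two-stage case of the selection described above can be sketched in Python. This is only an illustration under stated assumptions, not any agency's procedure: the function name `two_stage_sample`, the toy frame of counties and households, and the use of simple random sampling at both stages are all hypothetical choices made for the example.

```python
import random

def two_stage_sample(frame, n_psus, n_per_psu, seed=0):
    """Select PSUs at random, then units at random within each selected PSU.

    frame: dict mapping a PSU name to the list of units it contains.
    Returns a list of (psu, unit, selection_probability) tuples.
    """
    rng = random.Random(seed)
    psus = rng.sample(sorted(frame), n_psus)  # first stage: sample of PSUs
    sample = []
    for psu in psus:
        units = frame[psu]
        k = min(n_per_psu, len(units))
        # Overall probability = P(PSU selected) * P(unit selected | PSU selected)
        prob = (n_psus / len(frame)) * (k / len(units))
        for unit in rng.sample(units, k):  # second stage: sample within the PSU
            sample.append((psu, unit, prob))
    return sample

# Hypothetical frame: 4 counties (PSUs) with 5 households each
frame = {f"county_{c}": [f"hh_{c}_{i}" for i in range(5)] for c in "ABCD"}
for row in two_stage_sample(frame, n_psus=2, n_per_psu=2):
    print(row)
```

Because both stages use probability sampling, every selected household carries a known selection probability greater than zero and no greater than one, which is what qualifies the result as a probability sample.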
Prior to selection at each stage of a multistage (or a single stage) sample, a list of the sampling units or sampling frame for that stage must be obtained. For example, for the first stage of selection of a national household sample, a list of the counties and county groups that form the PSUs must be obtained. For the final stage of selection, lists of households, and sometimes persons within the households, have to be compiled in the field. For surveys of economic entities and for the economic censuses, the Census Bureau generally uses a frame constructed from the Bureau’s Business Register. The Business Register contains all establishments with payroll in the United States, including small single establishment firms as well as large multiestablishment firms.
Wherever the quantities in a table refer to an entire universe, but are constructed from data collected in a sample survey, the table quantities are referred to as sample estimates. In constructing a sample estimate, an attempt is made to come as close as is feasible to the corresponding universe quantity that would be obtained from a complete census of the universe. Estimates based on a sample will, however, generally differ from the hypothetical census figures. Two classifications of errors are associated with estimates based on sample surveys: (1) sampling error, the error arising from the use of a sample, rather than a census, to estimate population quantities, and (2) nonsampling error, those errors arising from nonsampling sources. As discussed below, the magnitude of the sampling error for an estimate can usually be estimated from the sample data. However, the magnitude of the nonsampling error for an estimate can rarely be estimated. Consequently, actual error in an estimate exceeds the error that can be estimated.
The particular sample used in a survey is only one of a large number of possible samples of the same size which could have been selected using the same sampling procedure. Estimates derived from the different samples would, in general, differ from each other. The standard error (SE) is a measure of the variation among the estimates derived from all possible samples. The standard error is the most commonly used measure of the sampling error of an estimate. Valid estimates of the standard errors of survey estimates can usually be calculated from the data collected in a probability sample. For convenience, the standard error is sometimes expressed as a percent of the estimate and is called the relative standard error or coefficient of variation (CV). For example, an estimate of 200 units with an estimated standard error of 10 units has an estimated CV of 5 percent.
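The CV arithmetic in the example above can be written out as a short Python sketch. The function name `coefficient_of_variation` is a hypothetical label chosen for this illustration.

```python
def coefficient_of_variation(estimate, standard_error):
    """Return the relative standard error (CV) as a percent of the estimate."""
    if estimate == 0:
        raise ValueError("CV is undefined for an estimate of zero")
    return 100.0 * standard_error / abs(estimate)

# The example from the text: estimate of 200 units, standard error of 10 units
print(coefficient_of_variation(200, 10))  # 5.0
```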
A sample estimate and an estimate of its standard error or CV can be used to construct interval estimates that include, with a prescribed level of confidence, the average of the estimates derived from all possible samples. To illustrate, if all possible samples were selected under essentially the same general conditions, and using the same sample design, and if an estimate and its estimated standard error were calculated from each sample, then: (1) Approximately 68 percent of the intervals from one standard error below the estimate to one standard error above the estimate would include the average estimate derived from all possible samples; (2) approximately 90 percent of the intervals from 1.6 standard errors below the estimate to 1.6 standard errors above the estimate would include the average estimate derived from all possible samples; and (3) approximately 95 percent of the intervals from two standard errors below the estimate to two standard errors above the estimate would include the average estimate derived from all possible samples.
Thus, for a particular sample, one can say with the appropriate level of confidence (e.g., 90 percent or 95 percent) that the average of all possible samples is included in the constructed interval. Example of a confidence interval: An estimate is 200 units with a standard error of 10 units. An approximately 90-percent confidence interval (plus or minus 1.6 standard errors) is from 184 to 216.
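The interval in this example can be reproduced with a few lines of Python. The multipliers (1, 1.6, and 2 standard errors) are the approximate values given in the text; the names `MULTIPLIERS` and `confidence_interval` are hypothetical and used only for this sketch.

```python
# Approximate multipliers stated in the text, keyed by confidence level (percent)
MULTIPLIERS = {68: 1.0, 90: 1.6, 95: 2.0}

def confidence_interval(estimate, standard_error, level=90):
    """Return (lower, upper) bounds: estimate plus or minus k standard errors."""
    half_width = MULTIPLIERS[level] * standard_error
    return estimate - half_width, estimate + half_width

# The example from the text: estimate of 200 units, standard error of 10 units
low, high = confidence_interval(200, 10, level=90)
print(low, high)  # 184.0 216.0
```

Calling the same function with `level=95` widens the interval to two standard errors on either side of the estimate.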
All surveys and censuses are subject to nonsampling errors. Nonsampling errors are of two kinds: random and nonrandom. Random nonsampling errors arise because of the varying interpretation of questions (by respondents or interviewers) and varying actions of coders, keyers, and other processors. Some randomness is also introduced when respondents must estimate. Nonrandom nonsampling errors result from total nonresponse (no usable data obtained for a sampled unit), partial or item nonresponse (only a portion of a response may be usable), inability or unwillingness on the part of respondents to provide correct information, difficulty interpreting questions, mistakes in recording or keying data, errors of collection or processing, and coverage problems (overcoverage and undercoverage of the target universe). Random nonresponse errors usually, but not always, result in an understatement of sampling errors and thus an overstatement of the precision of survey estimates. Estimating the magnitude of nonsampling errors would require special experiments or access to independent data and, consequently, the magnitudes are seldom available.
Nearly all types of nonsampling errors that affect surveys also occur in complete censuses. Since surveys can be conducted on a smaller scale than censuses, nonsampling errors can presumably be controlled more tightly. Relatively more funds and effort can perhaps be expended toward eliciting responses, detecting and correcting response error, and reducing processing errors. As a result, survey results can sometimes be more accurate than census results.
To compensate for suspected nonrandom errors, adjustments of the sample estimates are often made. For example, adjustments are frequently made for nonresponse, both total and partial. Adjustments made for either type of nonresponse are often referred to as imputations. Imputation for total nonresponse is usually made by substituting for the questionnaire responses of the nonrespondents the ‘‘average’’ questionnaire responses of the respondents. These imputations usually are made separately within various groups of sample members, formed by attempting to place respondents and nonrespondents together that have ‘‘similar’’ design or ancillary characteristics. Imputation for item nonresponse is usually made by substituting for a missing item the response to that item of a respondent having characteristics that are ‘‘similar’’ to those of the nonrespondent.
For an estimate calculated from a sample survey, the total error in the estimate is composed of the sampling error, which can usually be estimated from the sample, and the nonsampling error, which usually cannot be estimated from the sample. The total error present in a population quantity obtained from a complete census is composed of only nonsampling errors. Ideally, estimates of the total error associated with data given in the Statistical Abstract tables should be given. However, because of the unavailability of estimates of nonsampling errors, only estimates of the levels of sampling errors, in terms of estimated standard errors or coefficients of variation, are available. To obtain estimates of the estimated standard errors from the sample of interest, obtain a copy of the referenced report, which appears at the end of each table.
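The idea of substituting the ‘‘average’’ response of respondents within groups of ‘‘similar’’ sample members can be sketched as follows. This is a simplified illustration, not the procedure of any particular survey: the function name `impute_group_means`, the record layout, and the choice of a single grouping variable are assumptions made for the example.

```python
from collections import defaultdict

def impute_group_means(records, group_key, value_key):
    """Replace missing values (None) with the mean of respondents in the same group.

    Returns new records, each with an added "imputed" flag.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        if rec[value_key] is not None:
            totals[rec[group_key]] += rec[value_key]
            counts[rec[group_key]] += 1

    result = []
    for rec in records:
        rec = dict(rec)  # copy so the caller's data is left unchanged
        missing = rec[value_key] is None
        if missing:
            group = rec[group_key]
            if counts[group] == 0:
                raise ValueError(f"no respondents in group {group!r}")
            rec[value_key] = totals[group] / counts[group]
        rec["imputed"] = missing
        result.append(rec)
    return result

# Hypothetical sample: "acres" is missing for two nonrespondents
sample = [
    {"region": "A", "acres": 10.0},
    {"region": "A", "acres": 20.0},
    {"region": "A", "acres": None},
    {"region": "B", "acres": 40.0},
    {"region": "B", "acres": None},
]
for rec in impute_group_means(sample, "region", "acres"):
    print(rec)
```

Keeping an explicit `imputed` flag on each record makes it possible to report what share of an estimate rests on imputed values, as several of the survey descriptions below do.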
Source of Additional Material: The Federal Committee on Statistical Methodology (FCSM) is an interagency committee dedicated to improving the quality of federal statistics. <http://fcsm.ssd.census.gov>
Principal data bases-Beginning below are brief descriptions of 46 of the sample surveys and censuses that provide a substantial portion of the data contained in this Abstract.
U.S. DEPARTMENT OF AGRICULTURE, National Agricultural Statistics Service
Basic Area Frame Sample
Universe, Frequency, and Types of Data: June agricultural survey collects data on planted acreage and livestock inventories. The survey also serves to measure list incompleteness and is subsampled for multiple frame surveys.
Type of Data Collection Operation: Stratified probability sample of about 11,000 land area units of about 1 sq. mile (range from 0.1 sq. mile in cities to several sq. miles in open grazing areas). Sample includes 42,000 parcels of agricultural land. About 20 percent of the sample replaced annually.
Data Collection and Imputation Procedures: Data collection is by personal enumeration. Imputation is based on enumerator observation or data reported by respondents having similar agricultural characteristics.
Estimates of Sampling Error: Estimated CVs range from 1 percent to 2 percent for regional estimates to 3 percent to 6 percent for state estimates of major crop acres and livestock inventories.
Other (nonsampling) Errors: Minimized through rigid quality controls on the collection process and careful review of all reported data.
Sources of Additional Material: U.S. Department of Agriculture, National Agricultural Statistics Service: The Fact Finders of Agriculture, September 1994.
Multiple Frame Surveys
Universe, Frequency, and Types of Data: Surveys of U.S. farm operators to obtain data on major livestock inventories, selected crop acreage and production, grain stocks, and farm labor characteristics; farm economic data and chemical use data.
Type of Data Collection Operation: Primary frame is obtained from general or special purpose lists, supplemented by a probability sample of land areas used to estimate for list incompleteness.
Data Collection and Imputation Procedures: Mail, telephone, or personal interviews used for initial data collection. Mail nonrespondent followup by phone and personal interviews. Imputation based on average of respondents.
Estimates of Sampling Error: Estimated CV for number of hired farm workers is about 3 percent. Estimated CVs range from 1 percent to 2 percent for regional estimates to 3 percent to 6 percent for state estimates of livestock inventories and crop acreage.
Other (nonsampling) Errors: In addition to above, replicated sampling procedures used to monitor effects of changes in survey procedures.
Sources of Additional Material: U.S. Department of Agriculture, National Agricultural Statistics Service: The Fact Finders of Agriculture, September 1994.
Objective Yield Surveys
Universe, Frequency, and Types of Data: Surveys for data on corn, cotton, potatoes, soybeans, and wheat to forecast and estimate yields.
Type of Data Collection Operation: Random location of plots in probability sample. Corn, cotton, soybeans, spring wheat, and durum wheat selected in June from Basic Area Frame Sample (see above). Winter wheat and potatoes selected from March and June multiple frame surveys, respectively.
Data Collection and Imputation Procedures: Enumerators count and measure plant characteristics in sample fields. Production measured from plots at harvest. Harvest loss measured from post harvest gleanings.
Estimates of Sampling Error: CVs for national estimates of production are about 2-3 percent.
Other (nonsampling) Errors: In addition to above, replicated sampling procedures used to monitor effects of changes in survey procedures.
Sources of Additional Material: U.S. Department of Agriculture, National Agricultural Statistics Service: The Fact Finders of Agriculture, September 1994.
U.S. Census Bureau.
County Business Patterns
Universe, Frequency, and Types of Data: Annual tabulation of basic data items extracted from the Business Register, a file of all known single and multiestablishment companies maintained and updated by the Census Bureau. Data include number of establishments, number of employees, first quarter and annual payrolls, and number of establishments by employment size class. Data are excluded for self-employed persons, domestic service workers, railroad employees, agricultural production workers, and most government employees.
Type of Data Collection Operation: The annual Company Organization Survey provides individual establishment data for multiestablishment companies. Data for single establishment companies are obtained from various Census Bureau programs, such as the Annual Survey of Manufactures and Current Business Surveys, as well as from administrative records of the Internal Revenue Service and the Social Security Administration.
Estimates of Sampling Error: Not applicable.
Other (nonsampling) Errors: Response rates of greater than 85 percent for the 2000 Company Organization Survey.
1997 Economic Census- Manufacturing Sector
Universe, Frequency, and Types of Data: Conducted every 5 years to obtain information on labor, materials, capital input and output characteristics, plant location, and legal form of organization for all plants in the United States with one or more paid employees. Universe was 366,000 manufacturing establishments in 1997.
Type of Data Collection Operation: Complete enumeration of data items obtained from 200,000 firms. Administrative records from Internal Revenue Service (IRS) and Social Security Administration (SSA) are used for 166,000 smaller single-location firms, which were determined by various cutoffs based on size and industry.
Data Collection and Imputation Procedures: Four mail and telephone followups for larger nonrespondents. Data for small single-location firms (generally those with fewer than 10 employees) not mailed census questionnaires were estimated from administrative records of IRS and SSA. Data for nonrespondents were imputed from related responses or administrative records from IRS and SSA. Approximately 9 percent of total value of shipments was represented by fully imputed records in 1997.
Estimates of Sampling Error: Not applicable.
Other (nonsampling) Errors: Based on evaluation studies, estimates of nonsampling errors for 1972 were about 1.3 percent for estimated total payroll; 2 percent for total employment; and 1 percent for value of shipments. Estimates for later years are not available.
Sources of Additional Material: U.S. Census Bureau, 1997 Economic Census - Manufacturing Sector, Industry Series, Geographic Area Series, Subject Series, and Summary Series.
Foreign Trade-Export Statistics
Universe, Frequency, and Types of Data: The export declarations collected by U.S. Bureau of Customs and Border Protection are processed each month to obtain data on the movement of U.S. merchandise exports to foreign countries. Data obtained include value, quantity, and shipping weight of exports by commodity, country of destination, district of exportation, and mode of transportation.
Type of Data Collection Operation: Shippers Export Declarations (paper and electronic) are generally required to be filed for the exportation of merchandise valued over $2,500. U.S. Bureau of Customs and Border Protection officials collect and transmit the documents to the Census Bureau on a flow basis for data compilation. Data for shipments valued under $2,501 are estimated, based on established percentages of individual country totals.
Data Collection and Imputation Procedures: Statistical copies of Shippers Export Declarations are received on a daily basis from ports throughout the country and subjected to a monthly processing cycle. They are fully processed to the extent they reflect items valued over $2,500. Estimates for shipments valued at $2,500 or less are made, based on established percentages of individual country totals.
Estimates of Sampling Error: Not applicable.
Other (nonsampling) Errors: The goods data are a complete enumeration of documents collected by the U.S. Bureau of Customs and Border Protection and are not subject to sampling errors. Quality assurance procedures are performed at every stage of collection, processing, and tabulation; however, the data are still subject to several types of nonsampling errors. The most significant of these include reporting errors, undocumented shipments, timeliness, data capture errors, and errors in the estimation of low-valued transactions.
Sources of Additional Material: U.S. Census Bureau, U.S. International Trade in Goods and Services, FT 900, U.S. Imports of Merchandise, and U.S. Exports of Merchandise. <http://www.census.gov/foreign-trade/guide/sec2.html>
Foreign Trade-Import Statistics
Universe, Frequency, and Types of Data: The import entry documents collected by U.S. Bureau of Customs and Border Protection are processed each month to obtain data on the movement of merchandise imported into the United States. Data obtained include value, quantity, and shipping weight by commodity, country of origin, district of entry, and mode of transportation.
Type of Data Collection Operation: Import entry documents, either paper or electronic, are required to be filed for the importation of goods into the United States valued over $2,000 or for articles which must be reported on formal entries. U.S. Bureau of Customs and Border Protection officials collect and transmit statistical copies of the documents to the Census Bureau on a flow basis for data compilation. Estimates for shipments valued under $2,001 and not reported on formal entries are based on estimated established percentages for individual country totals.
Data Collection and Imputation Procedures: Statistical copies of import entry documents, received on a daily basis from ports of entry throughout the country, are subjected to a monthly processing cycle. They are fully processed to the extent they reflect items valued at $2,001 and over or items which must be reported on formal entries.
Estimates of Sampling Error: Not applicable.
Other (nonsampling) Errors: The goods data are a complete enumeration of documents collected by the U.S. Bureau of Customs and Border Protection and are not subject to sampling errors. Quality assurance procedures are performed at every stage of collection, processing, and tabulation; however, the data are still subject to several types of nonsampling errors. The most significant of these include reporting errors, undocumented shipments, timeliness, data capture errors, and errors in the estimation of low-valued transactions.
Sources of Additional Material: U.S. Census Bureau, U.S. International Trade in Goods and Services, FT 900, U.S. Imports of Merchandise, and U.S. Exports of Merchandise. <http://www.census.gov/foreign-trade/guide/sec2.html>
Census of Governments
Universe, Frequency, and Types of Data: Survey of all governmental units in the United States conducted every 5 years to obtain data on government revenue, expenditures, debt, assets, employment and employee retirement systems, property values, public school systems, and number, size, and structure of governments.
Type of Data Collection Operation: Complete census. List of units derived through classification of government units recently authorized in each state and identification, counting, and classification of existing local governments and public school systems.
Data Collection and Imputation Procedures: Data collected through field and office compilation of financial data from official records and reports for states and large local governments; mail canvass of selected data items, like state tax revenue and employee retirement systems; and collection of local government statistics through central collection arrangements with state governments.
Estimates of Sampling Error: Not applicable.
Other (nonsampling) Errors: Some nonsampling errors may arise due to possible inaccuracies in classification, response, and processing.
Sources of Additional Material: Publications <http://www.census.gov/prod/www/abs/govern.html>: U.S. Census Bureau, Public Employment in 1992, GE 92, No. 1, Governmental Finances in 1991-1992, GF 92, No. 5, and Census of Governments, 1997 and 2002, various reports. Web site references: Census of Governments <http://www.census.gov/govs/www/cog2002.html> and <http://www.census.gov/govs/www/cog.html>. Employment-state and local site: <http://www.census.gov/govs/www/apes.html>. Finance-state and local site: <http://www.census.gov/govs/www/financegen.html>.
Annual Surveys of State and Local Government
Universe, Frequency, and Types of Data: Sample survey conducted annually to obtain data on revenue, expenditure, debt, and employment of state and local governments. Universe is all governmental units in the United States (about 87,500).
Type of Data Collection Operation: Sample survey includes all state governments, county governments with 100,000+ population, municipalities with 75,000+ population, townships with 50,000+ population, all school districts with 10,000+ enrollment in March 2000, and other governments meeting certain criteria; probability sample for remaining units.
Data Collection and Imputation Procedures: Field and office compilation of data from official records and reports for states and large local governments; central collection of local governmental financial data through cooperative agreements with a number of state governments; mail canvass of other units with mail and telephone follow-ups of nonrespondents. Data for nonresponses are imputed from previous year data or obtained from secondary sources, if available.
Estimates of Sampling Error: State and local government totals are generally subject to sampling variability of less than 3 percent.
Other (nonsampling) Errors: Nonresponse rate is less than 15 percent for local governments. Other possible errors may result from undetected inaccuracies in classification, response, and processing.
Sources of Additional Material: Publications <http://www.census.gov/prod/www/abs/govern.html>: U.S. Census Bureau, Public Employment in 1992, GE 92, No. 1, Governmental Finances in 1991-1992, GF 92, No. 5, and Census of Governments, 1997 and 2002, various reports. Web site references: Census of Governments <http://www.census.gov/gov/govs/www/cog202.html> and <http://www.census.gov/govs/www/cog.html>. Employment-state and local site: <http://www.census.gov/govs/www/apes.html>. Finance-state and local site: <http://www.census.gov/govs/www/financegen.html> and <http://www.census.gov/govs/www/statetechdoc.html>
American Housing Survey
Universe, Frequency, and Types of Data: Conducted nationally in the fall in odd numbered years to obtain data on the approximately 116 million occupied or vacant housing units in the United States (group quarters are excluded). Data include characteristics of occupied housing units, vacant units, new housing and mobile home units, financial characteristics, recent mover households, housing and neighborhood quality indicators, and energy characteristics.
Type of Data Collection Operation: The national sample was a multistage probability sample with about 53,000 units eligible for interview in 2001. Sample units, selected within 394 PSUs, were surveyed over a 4-month period.
Data Collection and Imputation Procedures: For 2001, the survey was conducted by personal interviews. The interviewers obtained the information from the occupants or, if the unit was vacant, from informed persons such as landlords, rental agents, or knowledgeable neighbors.
Estimates of Sampling Error: For the national sample, illustrations of the SE of the estimates are provided in Appendix D of the 2001 report. As an example, the estimated CV is about 0.2 percent for the estimated percentage of owner-occupied units with two persons.
Other (nonsampling) Errors: Response rate was about 93 percent. Nonsampling errors may result from incorrect or incomplete responses, errors in coding and recording, and processing errors. For the 2001 national sample, approximately 1.9 percent of the total housing inventory was not adequately represented by the AHS sample.
Sources of Additional Material: U.S. Census Bureau, Current Housing Reports, Series H150 and H170, American Housing Survey. <http://www.census.gov/hhes/www/ahs.html>.
Monthly Survey of Construction
Universe, Frequency, and Types of Data: Survey conducted monthly of newly constructed housing units (excluding mobile homes). Data are collected on the start, completion, and sale of housing. (Annual figures are aggregates of monthly estimates.)
Type of Data Collection Operation: Probability sample of housing units obtained from building permits selected from 19,000 places. For nonpermit places, multistage probability sample of new housing units selected in 169 PSUs. In those areas, all roads are canvassed in selected enumeration districts.
Data Collection and Imputation Procedures: Data are obtained by telephone inquiry and field visit.
Estimates of Sampling Error: Estimated CV of 3 percent to 4 percent for estimates of national totals, but may be higher than 20 percent for estimated totals of more detailed characteristics, such as housing units in multiunit structures.
Other (nonsampling) Errors: Response rate is over 90 percent for most items. Nonsampling errors are attributed to definitional problems, differences in interpretation of questions, incorrect reporting, inability to obtain information about all cases in the sample, and processing errors.
Sources of Additional Material: All data are available on the Internet at <http://www.census.gov/const/www/newsresconstindex.html>. Further documentation of the survey is also available at that site.
Value of Construction Put in Place
Universe, Frequency, and Types of Data: Survey conducted monthly on total value of all construction put in place in the current month, both public and private projects. Construction values include costs of materials and labor, contractors’ profits, overhead costs, cost of architectural and engineering work, and miscellaneous project costs. (Annual figures are aggregates of monthly estimates.)
Type of Data Collection Operation: Varies by type of activity: Total cost of private one-family houses started each month is distributed into value put in place using fixed patterns of monthly construction progress; using a multistage probability sample, data for private multifamily housing are obtained by mail from owners of multiunit projects. Data for residential additions and alterations are obtained in a quarterly survey measuring expenditures; monthly estimates are interpolated from quarterly data. Estimates of value of private nonresidential construction, and state and local government construction are obtained by mail from owners (or agents) for a probability sample of projects. Estimates of farm nonresidential construction expenditures are based on U.S. Department of Agriculture annual estimates of construction; public utility estimates are obtained from reports submitted to federal regulatory agencies and from private utility companies; estimates of federal construction are based on monthly data supplied by federal agencies.
Data Collection and Imputation Procedures: See ‘‘Type of Data Collection Operation.’’ Imputation accounts for approximately 25 percent of estimated value of construction each month.
Estimates of Sampling Error: CV estimates for private nonresidential construction range from 3 percent for estimated value of industrial buildings to 9 percent for religious buildings. CV is approximately 2 percent for total new private nonresidential buildings.
Other (nonsampling) Errors: For directly measured data series based on samples, some nonsampling errors may arise from processing errors, imputations, and misunderstanding of questions. Indirect data series are dependent on the validity of the underlying assumptions and procedures.
Sources of Additional Material: U.S. Census Bureau, Construction Reports, Series C30, Value of Construction Put in Place.
Annual Survey of Manufactures
Universe, Frequency, and Types of Data: The Annual Survey of Manufactures (ASM) is conducted annually, except for years ending in 2 and 7, for all manufacturing establishments having one or more paid employees. The purpose of the ASM is to provide key intercensal measures of manufacturing activity, products, and location for the public and private sectors. The ASM provides statistics on employment, payroll, worker hours, payroll supplements, cost of materials, value added by manufacturing, capital expenditures, inventories, and energy consumption. It also provides estimates of value of shipments for 1,800 classes of manufactured products.
Type of Data Collection Operation: The ASM includes approximately 55,000 establishments selected from the census universe of 366,000 manufacturing establishments. Some 25,000 large establishments are selected with certainty, and some 30,000 other establishments are selected with probability proportional to a composite measure of establishment size. The survey is updated from two sources: Internal Revenue Service administrative records are used to include new single-unit manufacturers, and the Company Organization Survey identifies new establishments of multiunit firms.
Data Collection and Imputation Procedures: Survey is conducted by mail with phone and mail follow-ups of nonrespondents. Imputation (for all nonresponse items) is based on previous year reports, or for new establishments in survey, on industry averages.
Estimates of Sampling Error: Estimated standard errors for number of employees, new expenditure, and for value added totals are given in annual publications. For U.S. level industry statistics, most estimated standard errors are 2 percent or less, but vary considerably for detailed characteristics.
Other (nonsampling) Errors: Response rate is about 85 percent. Nonsampling errors include those due to collection, reporting, and transcription errors, many of which are corrected through computer and clerical checks.
Sources of Additional Material: U.S. Census Bureau, Annual Survey of Manufactures, and Technical Paper 24. <http://www.census.gov/econ/www/mancen.html>.
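A design that takes the largest units with certainty and samples the rest with probability proportional to size can be sketched as follows. This is a minimal illustration, not the ASM's actual selection procedure: the `sizes` list stands in for the composite size measure, and the function name and data are hypothetical. It assumes positive sizes and a sample size smaller than the number of units.

```python
def pps_inclusion_probabilities(sizes, n):
    """Inclusion probabilities for a size-proportional sample of n units.

    Units whose proportional share would reach or exceed 1 are taken with
    certainty; the remaining sample slots are spread over the other units
    in proportion to size. The loop repeats because removing certainty
    units can push further units over the threshold.
    """
    certain = set()
    while True:
        remaining = n - len(certain)
        total = sum(s for i, s in enumerate(sizes) if i not in certain)
        new = {i for i, s in enumerate(sizes)
               if i not in certain and remaining * s / total >= 1.0}
        if not new:
            break
        certain |= new
    return [1.0 if i in certain else remaining * s / total
            for i, s in enumerate(sizes)]


# Hypothetical size measures for six establishments; select three.
sizes = [500, 300, 40, 30, 20, 10]
probs = pps_inclusion_probabilities(sizes, 3)
weights = [1.0 / p for p in probs]  # design weights used in estimation
print(probs)
print(weights)
```

Certainty units carry a weight of 1 and represent only themselves; each sampled unit represents 1/probability units of the universe.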
Census of Population
Universe, Frequency, and Types of Data: Complete count of U.S. population conducted every 10 years since 1790. Data obtained on number and characteristics of people in the U.S.
Type of Data Collection Operation: In 1980, 1990, and 2000, complete census for some items: age, sex, race, and relationship to householder. In 1980, approximately 19 percent of the housing units were included in the sample; in 1990 and 2000, approximately 17 percent.
Data Collection and Imputation Procedures: In 1980, 1990, and 2000, mail questionnaires were used extensively with personal interviews in the remainder. Extensive telephone and personal followup for nonrespondents was done in the censuses. Imputations were made for missing characteristics.
Estimates of Sampling Error: Sampling errors for data are estimated for all items collected by sample and vary by characteristic and geographic area. The CVs for national and state estimates are generally very small.
Other (nonsampling) Errors: Since 1950, evaluation programs have been conducted to provide information on the magnitude of some sources of nonsampling errors such as response bias and undercoverage in each census. Results from the evaluation program for the 1990 census indicate that the estimated net undercoverage amounted to about 1.5 percent of the total resident population. For Census 2000, the evaluation program indicates a net overcount of 0.5 percent of the resident population.
Sources of Additional Material: U.S. Census Bureau, The Coverage of Population in the 1980 Census, PHC80-E4; Content Reinterview Study: Accuracy of Data for Selected Population and Housing Characteristics as Measured by Reinterview, PHC80-E2; 1980 Census of Population, Vol. 1, (PC80-1), Appendixes B, C, and D; 1990 Census of Population and Housing, Content Reinterview Study, CPH-E-1; 1990 Census of Population and Housing, Effectiveness of Quality Assurance, CPH-E-2; 1990 Census of Population and Housing, Programs to Improve Coverage, CPH-E-3. For 2000 census see <http://www.census.gov/pred/www>.
Current Population Survey (CPS)
Universe, Frequency, and Types of Data: Nationwide monthly sample survey of civilian noninstitutional population, 15 years old or over, to obtain data on employment, unemployment, and a number of other characteristics.
Type of Data Collection Operation: Multistage probability sample of about 50,000 households in 754 PSUs in 1996, expanded to about 60,000 households in July 2001. Oversampling in some states and the largest MSAs to improve reliability for those areas of employment data on annual average basis. A continual sample rotation system is used. Households are in sample 4 months, out for 8 months, and in for 4 more. Month-to-month overlap is 75 percent; year-to-year overlap is 50 percent.
Data Collection and Imputation Procedures: For first and fifth months that a household is in sample, personal interviews; other months, approximately 85 percent of the data collected by phone. Imputation is done for both item and total nonresponse. Adjustment for total nonresponse is done by a predefined cluster of units, by MSA size and residence; for item nonresponse imputation varies by subject matter.
Estimates of Sampling Error: Estimated CVs on national annual averages for labor force, total employment, and nonagricultural employment, 0.2 percent; for total unemployment and agricultural employment, 1.0 percent to 2.5 percent. The estimated CVs for family income and poverty rate for all persons in 1986 are 0.5 percent and 1.5 percent, respectively. CVs for subnational areas, such as states, would be larger and would vary by area.
Other (nonsampling) Errors: Estimates of response bias on unemployment are not available, but estimates of unemployment are usually 5 percent to 9 percent lower than estimates from reinterviews. Six to 7.0 percent of sample households are unavailable for interviews.
Sources of Additional Material: U.S. Census Bureau and Bureau of Labor Statistics, Current Population Survey: Design and Methodology, (Tech. Paper 63), available on the Internet at <http://www.census.gov/prod/2002pubs/tp63rv.pdf>; Source and Accuracy of Estimates for Poverty in the United States, available on the Internet at <http://www.census.gov/hhes/poverty/poverty02/pov02src.pdf>; and Bureau of Labor Statistics, Employment and Earnings, monthly, Explanatory Notes and Estimates of Error, Household Data, and BLS Handbook of Methods, Chapter 1, available on the Internet at <http://www.bls.gov/opub/hom/homch1a.htm>.
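The overlap figures quoted for the CPS follow directly from the 4-8-4 rotation pattern (4 months in sample, 8 out, 4 in). The sketch below checks them under the simplifying assumption that one equally sized rotation group enters the sample each month; the function names are hypothetical.

```python
def months_in_sample(entry_month):
    """Months a household is interviewed under a 4-8-4 rotation."""
    first = range(entry_month, entry_month + 4)
    second = range(entry_month + 12, entry_month + 16)
    return set(first) | set(second)


def groups_in_sample(month):
    """Entry months of the rotation groups interviewed in a given month."""
    return {entry for entry in range(month - 15, month + 1)
            if month in months_in_sample(entry)}


def overlap(month_a, month_b):
    """Share of month_a's rotation groups that are also in month_b."""
    a, b = groups_in_sample(month_a), groups_in_sample(month_b)
    return len(a & b) / len(a)


print(overlap(100, 101))  # month-to-month: 6 of 8 groups in common
print(overlap(100, 112))  # year-to-year: 4 of 8 groups in common
```

Eight rotation groups are in sample in any month; six of them return the next month and four return in the same month a year later, which gives the 75 percent and 50 percent overlaps.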
Surveys of Minority- and Women-Owned Business Enterprises (SMOBE/SWOBE)
Universe, Frequency, and Types of Data: The surveys provide basic economic data on businesses owned by Blacks, Hispanics, Asians, Pacific Islanders, Alaska Natives, American Indians, and Women. All firms operating during 1997, except those classified as agricultural, are represented. The lists of all firms (or sample frames) are compiled from a combination of business tax returns and data collected on other economic census reports. The published data include the number of firms, gross receipts, number of paid employees, and annual payroll. The data are presented by geographic area, industry, size of firm, and legal form of organization of firm.
Type of Data Collection Operation: The surveys are based on a stratified probability sample of approximately 2.5 million firms from a universe of approximately 20.8 million firms. There were approximately 5.3 million firms with paid employees and 15.5 million firms with no paid employees. The data are based on the entire firm rather than on individual locations of a firm.
Data Collection and Imputation Procedures: Data were collected through a mailout/mailback operation. Compensation for missing data is addressed through reweighting, edit correction, and standard statistical imputation methods.
Estimates of Sampling Error: Variability in the estimates is due to the sample selection and estimation for items collected by SMOBE/SWOBE. CVs are applicable to only published cells in which sample cases are tabulated. The CVs for number of firms and receipts at the national level range from 1 to 4 percent.
Other (nonsampling) Errors: Nonsampling errors are attributed to many sources: inability to obtain information for all cases in the universe, adjustments to the weights of respondents to compensate for nonrespondents, imputation for missing data, data errors and biases, mistakes in recording or keying data, errors in collection or processing, and coverage problems.
Sources of Additional Material: U.S. Census Bureau, Guide to the 1997 Economic Census and Related Statistics.
1997 Economic Census (Geographic Area Series and Subject Series Reports) (for NAICS sectors 22, 42, 44-45, 48-49, and 51-81)
Universe, Frequency, and Types of Data: Conducted every 5 years to obtain data on number of establishments, number of employees, total payroll size, total sales, and other industry specific statistics. In 1997, the universe was all employer and nonemployer establishments primarily engaged in wholesale, retail, utilities, finance & insurance, real estate, transportation & warehousing, and other service industries.
Type of Data Collection Operation: All large employer firms were surveyed (i.e., all employer firms above the payroll size cutoff established to separate large from small employers) plus a 5 percent to 25 percent sample of the small employer firms. Firms with no employees were not required to file a census return.
Data Collection and Imputation Procedures: Mail questionnaires were used with both mail and telephone follow-ups for nonrespondents. Data for nonrespondents and for small employer firms not mailed a questionnaire were obtained from administrative records of the IRS and Social Security Administration or imputed. Nonemployer data were obtained exclusively from IRS 1997 income tax returns.
Estimates of Sampling Error: Not applicable for basic data such as sales, revenue, payroll, etc.
Other (nonsampling) Errors: Trade area level unit response rates in 1997 ranged from 85 percent to 99 percent. Item response rates ranged from 60 percent to 90 percent with lower rates for the more detailed questions. Nonsampling errors may occur during the collection, reporting, and keying of data, and industry misclassification.
Sources of Additional Material: U.S. Census Bureau, 1997 Economic Census: Geographic Area Series and Subject Series Reports (by NAICS sector), Appendix C, and <www.census.gov/con97.html>.
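A design that surveys every firm above a size cutoff and only a sample of the smaller firms can be illustrated as below, for an item collected only on the questionnaire. This is a simplified sketch under stated assumptions: the firm records, cutoff, sampling rate, and function name are hypothetical, the small-firm sample is a simple random sample, and each sampled small firm is weighted by the inverse of the realized sampling fraction.

```python
import random


def estimate_total(firms, cutoff, small_rate, seed=0):
    """Estimate a total from a take-all stratum plus a sampled stratum.

    firms: list of (payroll, value) tuples (hypothetical data).
    Firms with payroll above the cutoff are all included with weight 1;
    the rest are sampled without replacement and weighted by
    N_small / n_small.
    """
    rng = random.Random(seed)
    large = [f for f in firms if f[0] > cutoff]
    small = [f for f in firms if f[0] <= cutoff]
    n_small = max(1, round(small_rate * len(small))) if small else 0
    sampled = rng.sample(small, n_small)
    weight = len(small) / n_small if n_small else 0.0
    return sum(v for _, v in large) + weight * sum(v for _, v in sampled)


# Two large firms and twenty identical small firms; sample 25 percent of the small ones.
firms = [(1000, 5000.0), (900, 4000.0)] + [(10, 50.0)] * 20
print(estimate_total(firms, cutoff=100, small_rate=0.25))
```

Because the large firms are enumerated completely, only the small-firm stratum contributes sampling variability to the estimate.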
Service Annual Survey
Universe, Frequency, and Types of Data: The U.S. Census Bureau conducts the Service Annual Survey to provide national estimates of revenues, expenses, and e-commerce revenues for taxable and tax-exempt firms classified in selected service industries. Estimates are summarized by industry classification based on the 1997 North American Industry Classification System (NAICS). Industries covered by the Service Annual Survey include all or part of the following NAICS sectors: Transportation and Warehousing (NAICS 48-49); Information (NAICS 51); Finance and Insurance (NAICS 52); Real Estate and Rental and Leasing (NAICS 53); Professional, Scientific, and Technical Services (NAICS 54); Administrative and Support and Waste Management and Remediation Services (NAICS 56); Health Care and Social Assistance (NAICS 62); Arts, Entertainment, and Recreation (NAICS 71); and Other Services, except Public Administration (NAICS 81). Data items collected include total revenue; revenue from e-commerce transactions; and, for selected industries, revenue from detailed service products, total expenses, expenses by major type, revenue from exported services, and inventories. Questionnaires are mailed in January and request annual data for the prior year. Estimates are published approximately 12 months after the initial survey mailing.
Type of Data Collection Operation: The Service Annual Survey estimates are developed using data from a probability sample and administrative records. Service Annual Survey questionnaires are mailed to a probability sample that is periodically reselected from a universe of firms located in the United States and having paid employees. The sample includes firms of all sizes and covers both taxable firms and firms exempt from federal income taxes. Updates to the sample are made on a quarterly basis to account for new businesses. Firms without paid employees or nonemployers are included in the estimates through imputation and/or administrative records data provided by other federal agencies. Links to additional information about confidentiality protection, sampling error, nonsampling error, sample design, definitions, and copies of the questionnaires may be found on the Internet at <http://www.census.gov/econ/www/servmenu.html>.
Estimates of Sampling Error: Coefficients of variation for the 2001 Service Annual Survey estimates range from 0.7 percent to 2.1 percent for total revenue estimates computed at the NAICS sector (2-digit NAICS code) level. Sampling errors for more detailed industries are shown in the corresponding publications. Links to additional information regarding sampling error may be found at: <http://www.census.gov/svsd/www/cv.html>.
Other (nonsampling) Errors: Data are imputed for unit nonresponse, item nonresponse, and for reported data that fail edits. The percent of imputed data for total revenue for the 2001 Service Annual Survey is approximately 12 percent.
Sources of Additional Material: U.S. Census Bureau, Current Business Reports, Service Annual Survey, Census Bureau Web site: <http://www.census.gov/econ/www/servmenu.html>.
Wholesale Trade Survey
Universe, Frequency, and Types of Data: Provides monthly estimates of wholesale sales and end of month inventories.
Type of Data Collection Operation: Probability sample of all firms from a list frame and, additionally for retail and service, an area frame. The list frame is the Bureau's Standard Statistical Establishment List (SSEL), updated quarterly for recent birth Employer Identification (EI) Numbers issued by the Internal Revenue Service and assigned a kind of business code by the Social Security Administration. The largest firms are included monthly.
Data Collection and Imputation Procedures: Data are collected by mail questionnaire with telephone followups for nonrespondents. Imputation made for each nonresponse item and each item failing edit checks.
Estimates of Sampling Error: For the 2001 monthly surveys, median CVs are about 0.6 percent for estimated total retail sales, 1.3 percent for wholesale sales, and 1.6 percent for wholesale inventories. For dollar volume of receipts, CVs from the Service Annual Survey vary by kind of business and range from 1.5 percent to 15.0 percent. Sampling errors are shown in monthly publications.
Other (nonsampling) Errors: Imputation rates are about 18 percent to 23 percent for monthly retail sales, 30 percent for wholesale sales, about 32 percent for monthly wholesale inventories, and 14 percent for the Service Annual Survey.
Sources of Additional Material: U.S. Census Bureau, Current Business Reports, Monthly Retail Trade, Monthly Wholesale Trade, and Service Annual Survey.
Monthly Retail Trade and Food Service Survey
Universe, Frequency, and Types of Data: Provides monthly estimates of retail and food service sales by kind of business and end of month inventories of retail stores.
Type of Data Collection Operation: Probability sample of all firms from a list frame. The list frame is the Bureau's Standard Statistical Establishment List (SSEL), updated quarterly for recent birth Employer Identification (EI) Numbers issued by the Internal Revenue Service and assigned a kind of business code by the Social Security Administration. The largest firms are included monthly; a sample of others is also included every month.
Data Collection and Imputation Procedures: Data are collected by mail questionnaire with telephone followups for nonrespondents. Imputation made for each nonresponse item and each item failing edit checks.
Estimates of Sampling Error: For the 2003 monthly surveys, CVs are about 0.5 percent for estimated total retail sales and 0.99 percent for estimated total retail inventories. Sampling errors are shown in monthly publications.
Other (nonsampling) Errors: Imputation rates are about 20 percent for monthly retail and food service sales, and 28 percent for monthly retail inventories.
Sources of Additional Material: U.S. Census Bureau, Current Business Reports, Monthly Retail Trade.
Nonemployer Statistics
Universe, Frequency, and Types of Data: Nonemployer statistics are an annual tabulation of economic data by industry for active businesses without paid employees that are subject to federal income tax. Data showing the number of establishments and receipts by industry are available for the U.S., states, counties, and metropolitan areas. Most types of businesses covered by the Census Bureau's economic statistics programs are included in the nonemployer statistics. Tax-exempt and agricultural production businesses are excluded from nonemployer statistics.
Type of Data Collection Operation: The universe of nonemployer establishments is created annually as a byproduct of the Census Bureau's Business Register processing for employer establishments. If a business is active but without paid employees, then it becomes part of the potential nonemployer universe. Industry classification and receipts are available for each potential nonemployer business. These data are obtained primarily from the annual business income tax returns of the Internal Revenue Service (IRS). The potential nonemployer universe undergoes a series of complex processing, editing, and analytical review procedures at the Census Bureau to distinguish nonemployers from employers, and to correct and complete data items used in creating the data tables.
Estimates of Sampling Error: Not applicable.
Other (nonsampling) Errors: The data are subject to nonsampling errors, such as errors of self-classification by industry on tax forms, as well as errors of response, keying, nonreporting, and coverage.
Sources of Additional Material: U.S. Census Bureau, Nonemployer Statistics: 2000 (Introduction; Coverage and Methodology). See also <http://www.census.gov/epcd/nonemployer/view/cov&meth.htm>.
U.S. DEPARTMENT OF EDUCATION, National Center for Education Statistics
Higher Education General Information Survey (HEGIS), Degrees and Other Formal Awards Conferred. Beginning 1986, Integrated Postsecondary Education Data Survey (IPEDS)
Universe, Frequency, and Types of Data: Annual survey of all institutions and branches listed in the Education Directory, Colleges and Universities to obtain data on earned degrees and other formal awards, conferred by field of study, level of degree, sex, and by racial/ethnic characteristics (every other year prior to 1989, then annually).
Type of Data Collection Operation: Complete census.
Data Collection and Imputation Procedures: Survey package is usually mailed in the spring with surveys due at varying dates in the summer and fall; mail and phone followup procedures for nonrespondents. Missing data are imputed by using data of similar institutions.
Estimates of Sampling Error: Not applicable.
Other (nonsampling) Errors: For 2000-2001, approximately 92.3 percent response rate for degree-granting institutions.
Sources of Additional Material: <http://www.nces.ed.gov/ipeds>
Higher Education General Information Survey (HEGIS), Fall Enrollment in Institutions of Higher Education; beginning 1986, Integrated Postsecondary Education Data Survey (IPEDS), Completions Fall Enrollment
Universe, Frequency, and Types of Data: Annual survey of all institutions and branches listed in the Directory to obtain data on total enrollment by sex, level of enrollment, type of program, racial/ethnic characteristics (every other year prior to 1989, then annually) and attendance status of student, and on first-time students.
Type of Data Collection Operation: Complete census.
Data Collection and Imputation Procedures: Survey package is usually mailed in the spring with surveys due at varying dates in the summer and fall; mail and phone followup procedures for nonrespondents. Missing data are imputed by using data of similar institutions.
Estimates of Sampling Error: Not applicable.
Other (nonsampling) Errors: For degree-granting institutions, approximately 97.2 percent response rate in fall 2000.
Sources of Additional Material: U.S. Department of Education, National Center for Education Statistics, Fall Enrollment in Higher Education, annual. <http://www.nces.ed.gov/ipeds>
Higher Education General Information Survey (HEGIS), Financial Statistics of Institutions of Higher Education; beginning 1986, Integrated Postsecondary Education Data Survey (IPEDS), Finance
Universe, Frequency, and Types of Data: Annual survey of all institutions and branches listed in the Education Directory, Colleges and Universities to obtain data on financial status and operations, including current funds revenues, current funds expenditures, and physical plant assets.
Type of Data Collection Operation: Complete census.
Data Collection and Imputation Procedures: Survey package is usually mailed in the spring with surveys due at varying dates in the summer and fall; mail and phone followup procedures for nonrespondents. Missing data are imputed by using data of similar institutions.
Estimates of Sampling Error: Not applicable.
Other (nonsampling) Errors: For 2000, 96.7 percent response rate for degree-granting institutions.
Sources of Additional Material: U.S. Department of Education, National Center for Education Statistics, Financial Statistics. <http://www.nces.ed.gov/ipeds>
National Postsecondary Student Aid Study
Universe, Frequency, and Types of Data: NPSAS is a comprehensive nationwide study conducted every 3 to 4 years by NCES. It was first administered in 1986-87. The purpose of the study is to produce reliable national estimates of how students and their families pay for postsecondary education. Information has been gathered on more than 55,000 students in each study cycle.
Type of Data Collection Operation: The design for the NPSAS sample involves selecting a nationally representative sample of postsecondary education institutions and students within those institutions. NPSAS data come from multiple sources, including institutional records, government databases, and student telephone interviews.
Data Collection and Imputation Procedures: NPSAS:2000 involved a multistage effort to collect information related to student aid. An initial NPSAS:2000 data collection stage collected electronic student aid report information directly from the U.S. Department of Education Central Processing System for federal financial aid applications. The second stage involved abstracting information from the student's records at the school from which he/she was sampled, using a computer-assisted data entry system. In the third stage, interviews were conducted with sampled students, primarily using a computer-assisted telephone interviewing (CATI) procedure. Computer-assisted personal interviewing procedures, using field interviewers, were also used for the first time on an NPSAS study, to help reduce the level of nonresponse to CATI. Over the course of data collection, some data were obtained from the Department of Education's National Student Loan Data System (NSLDS), the ACT, and the Educational Testing Service. The additional data sources provided a way to check or confirm information obtained from student records or the interview and include other data. After the editing process (which included logical imputations), the remaining missing values for 23 analysis variables were imputed statistically. Most of the variables were imputed using a weighted hot deck procedure.
Estimates of Sampling Error: Estimates vary by characteristic and variable.
Other (nonsampling) Errors: There is nearly complete coverage of the institutions in the target population. Student coverage, however, is dependent upon enrollment lists provided by the institutions. For the 1999-2000 NPSAS, the overall weighted study response rate was 89 percent. There is also error due to unit and item nonresponse, as well as measurement error. Possible sources of measurement error include: incorrect reporting, variation in question delivery or interpretation, and mistakes in data entry.
Sources of Additional Material: U.S. Department of Education, National Center for Education Statistics. National Postsecondary Student Aid Study, 1999-2000 (NPSAS:2000), Methodology Report, NCES 2002-152, by John A. Riccobono, Melissa B. Cominole, Peter H. Siegel, Timothy J. Gabel, Michael W. Link, and Lutz K. Berkner. Project officer, Andrew G. Malizio. Washington, DC: 2001. U.S. Department of Education. National Center for Education Statistics. NCES Handbook of Survey Methods, NCES 2003-603, by Lori Thurgood, Elizabeth Walter, George Carter, Susan Henn, Gary Huang, Daniel Nooter, Wray Smith, R. William Cash, and Sameena Salvucci. Project Officers, Marilyn Seastrom, Tai Phan, and Michael Cohen. Washington, DC: 2003.
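The idea behind a weighted hot deck can be sketched in a few lines: a record with a missing value borrows the value of a responding "donor" in the same imputation class, with donors drawn in proportion to their survey weights. This is a simplified stand-in and not the NPSAS procedure itself; the field names, the class definition, and the function name are hypothetical.

```python
import random
from collections import defaultdict


def weighted_hot_deck(records, key, target, weight, seed=0):
    """Fill missing `target` values from donors in the same imputation class.

    records: list of dicts; a missing value is represented by None.
    key: field defining the imputation class (hypothetical choice).
    Donors are drawn with probability proportional to their weight.
    Returns a new list; the input records are not modified.
    """
    rng = random.Random(seed)
    donors = defaultdict(list)
    for r in records:
        if r[target] is not None:
            donors[r[key]].append(r)
    result = []
    for r in records:
        r = dict(r)  # copy so the caller's data stay untouched
        pool = donors.get(r[key])
        if r[target] is None and pool:
            donor = rng.choices(pool, weights=[d[weight] for d in pool], k=1)[0]
            r[target] = donor[target]
            r["imputed"] = True
        result.append(r)
    return result


# Hypothetical student records: aid amount missing for two students.
students = [
    {"sector": "public", "aid": 1000, "wt": 2.0},
    {"sector": "public", "aid": 3000, "wt": 1.0},
    {"sector": "public", "aid": None, "wt": 1.5},
    {"sector": "private", "aid": 9000, "wt": 1.0},
    {"sector": "private", "aid": None, "wt": 1.0},
]
for row in weighted_hot_deck(students, "sector", "aid", "wt"):
    print(row)
```

Records with no donor in their class are left missing, which makes any gap in the imputation classes visible.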
National Household Education Surveys Program
Universe, Frequency, and Types of Data: The National Household Education Surveys Program (NHES) is a system of telephone surveys of the noninstitutionalized civilian population of the United States. Surveys in NHES have varying universes of interest depending on the particular survey. Specific topics covered by each survey are at the NHES Web site <http://nces.ed.gov/nhes>. A list of the surveys fielded as part of NHES, each universe, and the years they were fielded is provided below.
(1) Adult Education: Interviews were conducted with a representative sample of civilian, noninstitutionalized persons age 16 and older who were not enrolled in grade 12 or below (1991, 1995, 1999, 2001, 2003).
(2) Before- and After-School Programs and Activities: Interviews were conducted with parents of a representative sample of students in grades K-8 (1999, 2001).
(3) Civic Involvement: Interviews were conducted with a representative sample of parents, youth, and adults (1996, 1999).
(4) Early Childhood Program Participation: Interviews were conducted with parents of a representative sample of children from birth through grade 3, with the specific age groups varying by survey year (1991, 1995, 1999, 2001).
(5) Household and Library Use: Interviews were conducted with a representative sample of U.S. households (1996).
(6) Parent and Family Involvement in Education: Interviews were conducted with parents of a representative sample of children age three through grade 12 or in grades K-12, depending on the survey year (1996, 1999, 2003).
(7) School Readiness: Interviews were conducted with parents of a representative sample of 3- to 7-year-old children (1993, 1999).
(8) School Safety and Discipline: Interviews were conducted with a representative sample of students in grades 6-12, their parents, and the parents of a representative sample of students in grades 3-5 (1993).
Type of Data Collection Operation: NHES uses telephone interviews to collect data.
Data Collection and Imputation Procedures: Telephone numbers are selected using random digit-dialing techniques. Approximately 45,000 to 64,000 households are contacted in order to identify persons eligible for the surveys. Data are collected using computer-assisted telephone interviewing (CATI) procedures. Missing data are imputed using hot-deck imputation procedures.
Estimates of Sampling Error: Unweighted sample sizes range between 2,500 and 21,000. The average root design effects of the surveys in NHES range from 1.1 to 4.5.
Other (nonsampling) Errors: Because of unit nonresponse and because the samples are drawn from households with telephones instead of all households, nonresponse and/or coverage bias may exist for some estimates. However, both sources of potential bias are adjusted for in the weighting process. Both potential sources of bias in the NHES collections have been studied, and no significant bias has been detected.
Sources of Additional Material: Please see the NHES Web site at <http://nces.ed.gov/nhes>.
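A root design effect is the ratio of the standard error under the actual sample design to the standard error a simple random sample of the same size would give. The sketch below shows how such a factor is commonly applied to a proportion; the proportion and function names are hypothetical, and the sample size and factor are simply the low ends of the ranges quoted above.

```python
import math


def srs_standard_error(p, n):
    """Standard error of a proportion under simple random sampling."""
    return math.sqrt(p * (1.0 - p) / n)


def design_adjusted(p, n, root_deff):
    """Return (adjusted standard error, effective sample size)."""
    se = root_deff * srs_standard_error(p, n)
    n_eff = n / root_deff ** 2
    return se, n_eff


# Hypothetical proportion of 0.5 from 2,500 interviews, root design effect 1.1.
se, n_eff = design_adjusted(0.5, 2500, 1.1)
print(f"adjusted SE = {se:.4f}, effective sample size = {n_eff:.0f}")
```

The effective sample size shrinks with the square of the root design effect, so a factor of 4.5 leaves roughly one-twentieth of the nominal sample size.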
U.S. ENERGY INFORMATION ADMINISTRATION
Residential Energy Consumption Survey
Universe, Frequency, and Types of Data: Quadrennial survey of households and their fuel suppliers. Data are obtained on energy-related household characteristics, housing unit characteristics, use of fuels, and energy consumption and expenditures by fuel type.
Type of Data Collection Operation: Probability sample in 116 PSUs. The 1997 survey resulted in 5,900 completed interviews. The 2001 survey resulted in 5,318 completed interviews. For responding units, fuel consumption and expenditure data obtained from fuel suppliers to those households.
Data Collection and Imputation Procedures: Personal interviews. Extensive followup of nonrespondents including mail questionnaires for some households. Adjustments for nonrespondents were made in weighting for respondents. Most item nonresponses were imputed.
Estimates of Sampling Error: Estimated CVs for household averages (1997 survey): for consumption, 1.3 percent; for expenditures, 1.0 percent; for various fuels, values ranged from 2.0 percent for electricity to 7.0 percent for LPG.
Other (nonsampling) Errors: Household response rate of 81.0 percent for the 1997 survey and 76.7 percent for the 2001 survey. Nonconsumption data were mostly imputed for mail respondents (2.5 percent of eligible units in 1997 and 3.9 percent in 2001). Usable responses from fuel suppliers for various fuels ranged from 80.7 percent for electricity to 56.6 percent for fuel oil for the 1997 survey.
Sources of Additional Material: U.S. Energy Information Administration, A Look at Residential Energy Consumption in 1997. The Web page for the Residential Energy Consumption Survey is at: <http://www.eia.doe.gov/emeu/recs>.
U.S. NATIONAL CENTER FOR HEALTH STATISTICS (NCHS)
National Vital Statistics System
Universe, Frequency, and Types of Data: Annual data on births and deaths in the United States.
Type of Data Collection Operation: Mortality data based on complete file of death records, except 1972, based on 50 percent sample. Natality statistics 1951-71, based on 50 percent sample of birth certificates, except a 20 percent to 50 percent sample in 1967, received by NCHS. Beginning 1972, data from some states received through Vital Statistics Cooperative Program (VSCP) and complete file used; data from other states based on 50 percent sample. Beginning 1986, all reporting areas participated in the VSCP.
Data Collection and Imputation Procedures: Reports based on records from registration offices of all states, District of Columbia, New York City, Puerto Rico, Virgin Islands, Guam, American Samoa, and Northern Marianas.
Estimates of Sampling Error: For recent years, CVs for births are small due to large portion of total file in sample (except for very small estimated totals).
Other (nonsampling) Errors: Data on births and deaths believed to be at least 99 percent complete.
Sources of Additional Material: U.S. National Center for Health Statistics, Vital Statistics of the United States, Vol. I and Vol. II, annual, and National Vital Statistics Report. NCHS Web site at <http://www.cdc.gov/nchs/nvss.htm>.
National Health Interview Survey (NHIS)
Universe, Frequency, and Types of Data: Continuous data collection covering the civilian noninstitutional population to obtain information on personal and demographic characteristics, illnesses, injuries, impairments, and other health topics.
Type of Data Collection Operation: Multistage probability sample of 49,000 households (in 198 PSUs) from 1985 to 1994; 43,000 households (358 design PSUs) from 1995 on, selected in groups of about four adjacent households.
Data Collection and Imputation Procedures: Some missing data items (e.g., race, ethnicity) are imputed using a hot deck imputation value. Unit nonresponse is compensated for by an adjustment to the survey weights.
Estimates of Sampling Error: Estimates with standard errors (SE) in parentheses: for 1999, the rate of medically attended injury episodes in the past 12 months due to falling was 37.66 (1.94) for females and 29.69 (1.82) for males per 1,000 population; the 1999 rate of injury episodes during the past 12 months inside the home was 20.38 (1.08) per 1,000 population.
Other (nonsampling) Errors: The response rate was 93.8 percent in 1996. In 1999, the total household response rate was 87.6 percent, with a final family response rate of 86.1 percent and a final sample adult response rate of 69.6 percent. In 2000, the total household response rate was 88.9 percent, with a final family response rate of 87.3 percent and a final sample adult response rate of 72.1 percent. In 2001, the household final response rate was 88.9 percent, with a family/person final response rate of 87.6 percent and a final sample adult response rate of 73.8 percent. (Note: The NHIS sample redesign was conducted in 1995, and the NHIS questionnaire was redesigned in 1997.)
Sources of Additional Material: U.S. National Center for Health Statistics, Summary Health Statistics for U.S. Children: National Health Interview Survey, 1999, Vital and Health Statistics, Series 10 #203; U.S. National Center for Health Statistics, Summary Health Statistics for the U.S. Population: National Health Interview Survey, 1999, Vital and Health Statistics, Series 10 #211; U.S. National Center for Health Statistics, Summary Health Statistics for U.S. Adults: National Health Interview Survey, 1999, Vital and Health Statistics, Series 10 #212; U.S. National Center for Health Statistics, Design and Estimation for the National Health Interview Survey, 1995-2004, Vital and Health Statistics, Series 2 #130; U.S. National Center for Health Statistics, Summary Health Statistics Technical Report: National Health Interview Survey, 1997-2003, Vital and Health Statistics, Series 2 #134 (in preparation).
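Published standard errors such as those quoted for the 1999 injury rates can be used to judge whether two estimates differ by more than sampling variability would explain. The sketch below applies the common normal-approximation test to the female and male rates for falls, assuming the two estimates are independent (an assumption of this illustration, not a statement about the NHIS design); the function name is hypothetical.

```python
import math


def difference_z(est_a, se_a, est_b, se_b):
    """z statistic and standard error for the difference of two independent estimates."""
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return (est_a - est_b) / se_diff, se_diff


# Rates per 1,000 population with standard errors, as quoted in the entry above.
z, se_diff = difference_z(37.66, 1.94, 29.69, 1.82)
print(f"difference = {37.66 - 29.69:.2f}, SE = {se_diff:.2f}, z = {z:.2f}")
```

A z value beyond about 1.96 in absolute size is conventionally read as a difference that is statistically significant at the 5 percent level.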
.U.S. BUREAU OF JUSTICE STATISTICS (BJS) National Crime Victimization Survey
Universe, Frequency, and Types of Data: Monthly survey of individuals and households in the United States to obtain data on criminal victimization of those units for compilation of annual estimates.
Type of Data Collection Operation: National probability sample survey of about 50,000 interviewed households in 376 PSUs selected from a list of addresses from the 1980 census, supplemented by new construction permits and an area sample where permits are not required.
Data Collection and Imputation Procedures: Interviews are conducted every 6 months for 3 years for each household in the sample; 8,300 households are interviewed monthly. Personal interviews are used in the first interview; the intervening interviews are conducted by telephone whenever possible.
Estimates of Sampling Error: CVs averaged over the period 1998-2001 are: 3.7 percent for personal crimes (includes all crimes of violence plus purse snatching crimes), 3.8 percent for crimes of violence; 12.1 percent for estimate of rape/sexual assault counts; 7.9 percent for robbery counts; 4.1 percent for assault counts; 11.2 percent for purse snatching (it refers to purse snatching and pocket picking); 2.5 percent for property crimes; 3.8 percent for burglary counts; 2.7 percent for theft (of property); and 5.2 percent for motor vehicle theft counts.
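A coefficient of variation (CV) expresses the standard error as a percentage of the estimate, so a CV can be converted back to a standard error once an estimate is in hand. The sketch below shows that arithmetic; the robbery count used in it is a made-up figure for illustration, not a published estimate.

```python
def standard_error_from_cv(estimate, cv_percent):
    """Convert a CV given in percent into a standard error
    in the same units as the estimate."""
    return estimate * cv_percent / 100.0


# Hypothetical robbery count (NOT a published figure), with the
# 7.9 percent CV quoted above for robbery counts.
hypothetical_robberies = 1_000_000
se = standard_error_from_cv(hypothetical_robberies, 7.9)
print(se)
```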
Other (nonsampling) Errors: Respondent recall errors which may include reporting incidents for other than the reference period, interviewer coding and processing errors, and possible mistaken reporting or classifying of events. Adjustment is made for a household noninterview rate of about 7 percent and for a within-household noninterview rate of 10 percent.
Sources of Additional Material: U.S. Bureau of Justice Statistics, Criminal Victimization in the United States, annual.
U.S. FEDERAL BUREAU OF INVESTIGATION Uniform Crime Reporting (UCR) Program
Universe, Frequency, and Types of Data: Monthly reports on the number of criminal offenses that become known to law enforcement agencies. Data are collected on crimes cleared by arrest, by age, sex, and race of offender, and on assaults on law enforcement officers.
Type of Data Collection Operation: Crime statistics are based on reports of crime data submitted either directly to the FBI by contributing law enforcement agencies or through cooperating state UCR programs.
Data Collection and Imputation Procedures: States with UCR programs collect data directly from individual law enforcement agencies and forward reports, prepared in accordance with UCR standards, to the FBI. Accuracy and consistency edits are performed by the FBI.
Estimates of Sampling Error: Not applicable.
Other (nonsampling) Errors: Coverage of 90 percent of the population (92 percent in MSAs, 79 percent in ‘‘other cities’’, and 79 percent in rural areas) by UCR program, though varying number of agencies report.
Sources of Additional Material: U.S. Federal Bureau of Investigation, Crime in the United States, annual; Hate Crime Statistics, annual; Law Enforcement Officers Killed & Assaulted, annual; <http://www.fbi.gov/ucr.htm>.
U.S. BUREAU OF LABOR STATISTICS U.S. Census Bureau, Current Population Survey (CPS)
Universe, Frequency, and Types of Data: Nationwide monthly sample survey of civilian noninstitutional population, 15 years old or over, to obtain data on employment, unemployment, and a number of other characteristics.
Type of Data Collection Operation: Multistage probability sample of about 50,000 households in 754 PSUs in 1996, expanded to about 60,000 households in July 2001. Oversampling in some states and the largest MSAs to improve reliability for those areas of employment data on annual average basis. A continual sample rotation system is used. Households are in sample 4 months, out for 8 months, and in for 4 more. Month-to-month overlap is 75 percent; year-to-year overlap is 50 percent.
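The overlap percentages follow directly from the 4-8-4 rotation pattern. The sketch below is a simplified model that assumes one equal-sized rotation group enters the sample each month; the names are illustrative and it ignores households that cannot be reinterviewed.

```python
# Months since entry at which a rotation group is in sample:
# months 0-3 (first 4 months), then out for 8, then months 12-15.
IN_SAMPLE_OFFSETS = set(range(0, 4)) | set(range(12, 16))


def groups_in_sample(month):
    """Entry months of the rotation groups interviewed in `month`."""
    return {month - offset for offset in IN_SAMPLE_OFFSETS}


def overlap(month, lag):
    """Share of the groups in sample in `month` that are also
    in sample `lag` months later."""
    now = groups_in_sample(month)
    later = groups_in_sample(month + lag)
    return len(now & later) / len(now)


print(overlap(100, 1))    # month-to-month
print(overlap(100, 12))   # year-to-year
```

With eight groups in sample in any month, six carry over to the next month (0.75) and four are back in sample a year later (0.5), matching the figures quoted above.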
Data Collection and Imputation Procedures: For the first and fifth months that a household is in sample, personal interviews; in other months, approximately 85 percent of the data are collected by phone. Imputation is done for both item and total nonresponse. Adjustment for total nonresponse is done by a predefined cluster of units, by MSA size and residence; for item nonresponse, imputation varies by subject matter.
Estimates of Sampling Error: Estimated CVs on national annual averages for labor force, total employment, and nonagricultural employment, 0.2 percent; for total unemployment and agricultural employment, 1.0 percent to 2.5 percent. The estimated CVs for family income and poverty rate for all persons in 1986 are 0.5 percent and 1.5 percent, respectively. CVs for subnational areas, such as states, would be larger and would vary by area.
Other (nonsampling) Errors: Estimates of response bias on unemployment are not available, but estimates of unemployment are usually 5 percent to 9 percent lower than estimates from reinterviews. About 7.5 percent of sample households unavailable for interviews.
Sources of Additional Material: U.S. Census Bureau and Bureau of Labor Statistics, Current Population Survey: Design and Methodology (Tech. Paper 63), available on the Internet at <http://www.census.gov/prod/2002pubs/tp63rv.pdf>; Source and Accuracy of Estimates for Poverty in the United States, available on the Internet at <http://www.census.gov/hhes/poverty/poverty02/pov02src.pdf>; and Bureau of Labor Statistics, Employment and Earnings, monthly, Explanatory Notes and Estimates of Error, Household Data, and BLS Handbook of Methods, Chapter 1, available on the Internet at <http://www.bls.gov/opub/hom/homch1a.htm>.
Consumer Price Index (CPI)
Universe, Frequency, and Types of Data: Monthly survey of price changes of all types of consumer goods and services purchased by urban wage earners and clerical workers prior to 1978, and urban consumers thereafter. Both indexes continue to be published.
Type of Data Collection Operation: Prior to 1978, and since 1998, sample of various consumer items in 87 urban areas; from 1978-1997, in 85 PSUs, except from January 1987 through March 1988, when 91 areas were sampled.
Data Collection and Imputation Procedures: Prices of consumer items are obtained from about 50,000 housing units and 23,000 other reporters in 87 areas. Prices of food, fuel, and a few other items are obtained monthly; prices of most other commodities and services are collected every month in the three largest geographic areas and every other month in others.
Estimates of Sampling Error: Estimates of standard errors are available.
Other (nonsampling) Errors: Errors result from inaccurate reporting, difficulties in defining concepts and their operational implementation, and introduction of product quality changes and new products.
Sources of Additional Material: U.S. Bureau of Labor Statistics, Internet site <http://stats.bls.gov/cpi> and BLS Handbook of Methods, Chapter 17, Bulletin 2490.
Producer Price Index (PPI)
Universe, Frequency, and Types of Data: Monthly survey of producing companies to determine price changes of all commodities produced in the United States for sale in commercial transactions. Data on agriculture, forestry, fishing, manufacturing, mining, gas, electricity, public utilities, and a few services.
Type of Data Collection Operation: Probability sample of approximately 30,000 establishments that result in about 100,000 price quotations per month.
Data Collection and Imputation Procedures: Data are collected by mail and facsimile. If transaction prices are not supplied, list prices are used. Some prices are obtained from trade publications, organized exchanges, and government agencies. To calculate the index, price changes are multiplied by their relative weights taken from 1997 shipment values from the census of manufactures.
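The weighting step can be pictured as a weighted average of price relatives (current price divided by base-period price). The sketch below is a simplified fixed-weight calculation written for illustration; the function name and the figures are hypothetical, and the published index involves further detail described in the BLS Handbook of Methods.

```python
def weighted_price_index(price_relatives, weights, base=100.0):
    """Weighted average of price relatives, scaled so that
    no price change at all gives an index equal to `base`."""
    if len(price_relatives) != len(weights):
        raise ValueError("need exactly one weight per price relative")
    total_weight = sum(weights)
    weighted_sum = sum(r * w for r, w in zip(price_relatives, weights))
    return base * weighted_sum / total_weight


# Hypothetical example: one commodity up 10 percent with three times the
# shipment value of a second commodity whose price is unchanged.
index = weighted_price_index([1.10, 1.00], [3.0, 1.0])
print(round(index, 1))
```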
Estimates of Sampling Error: Not applicable.
Other (nonsampling) Errors: Not available at present.
Sources of Additional Material: U.S. Bureau of Labor Statistics, BLS Handbook of Methods, Chapter 14, Bulletin 2490. U.S. Bureau of Labor Statistics Internet site <http://stats.bls.gov/ppi>.
Current Employment Statistics (CES) Program
Universe, Frequency, and Types of Data: Monthly survey drawn from a sampling frame of over 8 million Unemployment Insurance tax accounts in order to obtain data by industry on employment, hours, and earnings.
Type of Data Collection Operation: In 2003, the CES sample included about 160,000 businesses and government agencies, which represent approximately 400,000 individual worksites.
Data Collection and Imputation Procedures: Each month, the state agencies cooperating with BLS, as well as BLS Data Collection Centers, collect data through various automated collection modes and mail. BLS-Washington staff prepares national estimates of employment, hours, and earnings, while states use the data to develop state and area estimates.
Estimates of Sampling Errors: The relative standard error for total nonfarm employment is 0.2 percent.
Other (nonsampling) Errors: Estimates of employment are adjusted annually to reflect the complete universe. The average adjustment is 0.3 percent over the last decade, ranging from zero to 0.7 percent.
Sources of Additional Material: U.S. Bureau of Labor Statistics, Employment and Earnings, monthly, Explanatory Notes and Estimates of Errors, Tables 2-A through 2-F.
National Compensation Survey
Universe, Frequency, and Types of Data: Nationwide sample survey of establishments of all employment size classes, stratified by geographic area, in private industry and state and local government. Data collected include wages and salaries, and employer costs of employee benefits. Data produced include percent changes in the cost of employment cited in the Employment Cost Index (ECI) and costs per hour worked for individual benefits cited in the Employer Costs for Employee Compensation (ECEC). The survey provides data by ownership (Private industry and state and local government), industry sector, major industry divisions, major occupational groups, bargaining status, metropolitan area status, and census region. ECEC also provides data by establishment size class.
Type of Data Collection Operation: Probability proportionate to size sample of establishments. The sample is replaced on a continual basis. Establishments are in the survey for approximately 5 years, with some establishments replaced each quarter.
Data Collection and Imputation Procedures: For the initial visit, data are primarily collected in a personal visit to the establishment. Quarterly updates are obtained primarily by mail, fax, and telephone. Imputation is done for individual benefits.
Estimates of Sampling Error: Because standard errors vary from quarter to quarter, the ECI uses a 5-year moving average of standard errors to evaluate published series. These standard errors are available at <http://www.bls.gov/ncs/ect/home.htm>.
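For quarterly data, a 5-year moving average spans 20 consecutive quarters. The sketch below assumes a simple arithmetic mean over a trailing window; the function name is illustrative, and the exact averaging rule used for the published series is not stated here.

```python
def moving_average(values, window):
    """Trailing moving average: one result for each position that has
    `window` values available, including the current one."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# For quarterly standard errors, a 5-year window is 20 quarters:
# smoothed = moving_average(quarterly_standard_errors, window=20)
```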
Other (nonsampling) Errors: Nonsampling errors have a number of potential sources. The primary sources are (1) survey nonresponse and (2) data collection and processing errors. Nonsampling errors are not measured. Procedures have been implemented for reducing nonsampling errors, primarily through quality assurance programs. These programs include the use of data collection reinterviews, observed interviews, computer edits of the data, and systematic professional review of the reports on which the data are recorded. The programs also serve as a training device to provide feedback on errors to the field economists, or data collectors. They also provide information on the sources of error, which can be remedied by improved collection instructions or computer processing edits. Extensive training of field economists is also conducted to maintain high standards in data collection.
Sources of Additional Material: Bureau of Labor Statistics, BLS Handbook of Methods, Chapter 8 (Bulletin 2490) and <http://www.bls.gov/ncs>.
Consumer Expenditure Survey (CES)
Universe, Frequency, and Types of Data: Consists of two continuous components: a quarterly interview survey and a weekly diary or recordkeeping survey. They are nationwide surveys that collect data on consumer expenditures, income, characteristics, and assets and liabilities. Samples are national probability samples of households that are representative of the civilian noninstitutional population. The surveys have been ongoing since 1980.
Type of Data Collection Operation: The Interview Survey is a panel rotation survey. Each panel is interviewed for five quarters and then dropped from the survey. About 7,500 consumer units are interviewed each quarter. The Diary Survey sample is new each year and consists of about 7,500 consumer units. Data are collected on an ongoing basis in 105 PSUs since 1996.
Data Collection and Imputation Procedures: For the Interview Survey, data are collected by personal interview, with each consumer unit interviewed once per quarter for five consecutive quarters. It is designed to collect information that respondents can recall for 3 months or longer, such as large or recurring expenditures. For the Diary Survey, respondents record all their expenditures in a self-reporting diary for two consecutive 1-week periods. It is designed to pick up items difficult to recall over a long period, such as detailed food expenditures. Missing or invalid attributes or expenditures are imputed. Income, assets, and liabilities are not imputed. The U.S. Census Bureau collects the data for the Bureau of Labor Statistics.
Estimates of Sampling Error: Standard error tables are available since 2000.
Other (nonsampling) Errors: Includes incorrect information given by respondents, data processing errors, interviewer errors, and so on. They occur regardless of whether data are collected from a sample or from the entire population.
Sources of Additional Material: Bureau of Labor Statistics, Internet site <http://www.bls.gov/cex> and BLS Handbook of Methods, Chapter 16, Bulletin 2490.
Board of Governors of the Federal Reserve System, Survey of Consumer Finances.
Universe, Frequency, and Types of Data: Periodic sample survey of families. In this survey a given household is divided into a primary economic unit and other economic units. The primary economic unit, which may be a single individual, is generally chosen as the unit that contains the person who either holds the title to the home or is the first person listed on the lease. The primary unit is used as the reference family. The survey collects detailed data on the composition of family balance sheets, the terms of loans, and relationships with financial institutions. It also gathers information on the employment history and pension rights of the survey respondent and the spouse or partner of the respondent.
Type of Data Collection Operation: The survey employs a two-part strategy for sampling families. Some families were selected by standard multistage area probability sampling methods applied to all 50 states. The remaining families in the survey were selected using statistical records derived from tax returns, under the strict rules governing confidentiality and the rights of potential respondents to refuse participation.
Data Collection and Imputation Procedures: NORC at the University of Chicago has collected data for the survey since 1992. Since 1995, the survey has used computer-assisted personal interviewing. Adjustments for nonresponse are made through multiple imputation of unanswered questions and through weighting adjustments based on data used in the sample design for families that refused participation.
Estimates of Sampling Error: Because of the complex design of the survey, the estimation of potential sampling errors is not straightforward. A replicate-based procedure is available.
Other (nonsampling) Errors: The survey aims to complete 4,500 interviews, with about two-thirds of that number deriving from the area-probability sample. The response rate is typically about 70 percent for the area-probability sample and about 35 percent over all strata in the tax-data sample. Proper training and monitoring of interviewers, careful design of questionnaires, and systematic editing of the resulting data were used to control inaccurate survey responses.
Sources of Additional Material: Board of Governors of the Federal Reserve System, Recent Changes in U.S. Family Finances: Evidence from the 1998 and 2001 Survey of Consumer Finances, Federal Reserve Bulletin, January 2003.
U.S. INTERNAL REVENUE SERVICE Statistics of Income, Individual Income Tax Returns
Universe, Frequency, and Types of Data: Annual study of unaudited individual income tax returns, Forms 1040, 1040A, and 1040EZ, filed by U.S. citizens and residents. Data provided on various financial characteristics by size of adjusted gross income, marital status, and by taxable and nontaxable returns. Data by state, based on 100 percent file, also include returns from 1040NR, filed by nonresident aliens plus certain self employment tax returns.
Type of Data Collection Operation: Annual 2000 stratified probability sample of approximately 196,000 returns broken into sample strata based on the larger of total income or total loss amounts as well as the size of business plus farm receipts. Sampling rates for sample strata varied from 0.05 percent to 100 percent.
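In a stratified probability sample, each sampled return is usually weighted by the inverse of its stratum's sampling rate, so a return drawn at 0.05 percent stands for about 2,000 returns while a return drawn at 100 percent stands only for itself. The sketch below shows that basic inverse-rate weighting; the function names and figures are hypothetical, and the weights actually used may include further adjustments.

```python
def sampling_weight(rate_percent):
    """Number of population returns represented by one sampled return
    drawn at the given sampling rate (in percent)."""
    if not 0 < rate_percent <= 100:
        raise ValueError("sampling rate must be in (0, 100] percent")
    return 100.0 / rate_percent


def estimate_total(sampled_returns):
    """Estimate a population total from (amount, rate_percent) pairs,
    one pair per sampled return."""
    return sum(amount * sampling_weight(rate) for amount, rate in sampled_returns)


print(sampling_weight(0.05))   # lowest rate quoted above
print(sampling_weight(100))    # certainty stratum
```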
Data Collection and Imputation Procedures: Computer selection of sample of tax return records. Data adjusted during editing for incorrect, missing, or inconsistent entries to ensure consistency with other entries on return.
Estimates of Sampling Error: Estimated CVs for tax year 2000: Adjusted gross income less deficit 0.11 percent; salaries and wages 0.21 percent; and tax exempt interest received 1.65 percent. (State data not subject to sampling error.)
Other (nonsampling) Errors: Processing errors and errors arising from the use of tolerance checks for the data.
Sources of Additional Material: U.S. Internal Revenue Service, Statistics of Income, Individual Income Tax Returns, annual. For background information for individual tax statistics: <http://www.irs.gov/taxstats/article/0,,id=96571,00.html>
Statistics of Income, Sole Proprietorship Returns and Statistics of Income Bulletin
Universe, Frequency, and Types of Data: Annual study of unaudited income tax returns of nonfarm sole proprietorships, Form 1040 with business schedules. Data provided on various financial characteristics by industry.
Type of Data Collection Operation: Stratified probability sample of approximately 56,000 sole proprietorships for tax year 2000. The sample is classified based on presence or absence of certain business schedules; the larger of total income or loss; and size of business plus farm receipts. Sampling rates vary from 0.05 percent to 100 percent.
Data Collection and Imputation Procedures: Computer selection of sample of tax return records. Data adjusted during editing for incorrect, missing, or inconsistent entries to ensure consistency with other entries on return.
Estimates of Sampling Error: Estimated CVs for tax year 2000 are available. For sole proprietorships: business receipts, 0.69 percent; net income (less loss), 0.94 percent; depreciation, 1.40 percent.
Other (nonsampling) Errors: Processing errors and errors arising from the use of tolerance checks for the data.
Sources of Additional Material: U.S. Internal Revenue Service, Statistics of Income, Sole Proprietorship Returns (for years through 1980) and Statistics of Income Bulletin, Vol. 22, No. 1 (summer 2002).
Statistics of Income, Partnership Returns and Statistics of Income Bulletin
Universe, Frequency, and Types of Data: Annual study of unaudited income tax returns of partnerships, Form 1065. Data provided on various financial characteristics by industry.
Type of Data Collection Operation: Stratified probability sample of approximately 36,000 partnership returns from a population of 2.2 million filed during calendar year 2000. The sample is classified based on combinations of gross receipts, net income or loss, and total assets, and on industry. Sampling rates vary from 0.09 percent to 100 percent.
Data Collection and Imputation Procedures: Computer selection of sample of tax return records. Data are adjusted during editing for incorrect, missing, or inconsistent entries to ensure consistency with other entries on return. Data not available due to regulations are not imputed.
Estimates of Sampling Error: Estimated CVs for tax year 2000 (latest available): For number of partnerships, 0.3 percent; business receipts, 0.4 percent; net income, 0.5 percent; net loss, 1.9 percent.
Other (nonsampling) Errors: Processing errors and errors arising from the use of tolerance checks for the data.
Sources of Additional Material: U.S. Internal Revenue Service, Statistics of Income, Partnership Returns and Statistics of Income Bulletin, Vol. 22, No. 2 (fall 2002).
Corporation Income Tax Returns
Universe, Frequency, and Types of Data: Annual study of unaudited corporation income tax returns, Forms 1120, 1120-A, 1120-F, 1120-L, 1120-PC, 1120-REIT, 1120-RIC, and 1120S, filed by corporations or businesses legally defined as corporations. Data provided on various financial characteristics by industry and size of total assets, and business receipts.
Type of Data Collection Operation: Stratified probability sample of approximately 145,500 returns for tax year 2000, allocated to sample classes which are based on type of return, size of total assets, size of net income or deficit, and selected business activity. Sampling rates for sample classes varied from 0.25 percent to 100 percent.
Data Collection and Imputation Procedures: Computer selection of sample of tax return records. Data adjusted during editing for incorrect, missing, or inconsistent entries to ensure consistency with other entries on return and to comply with statistical definitions.
Estimates of Sampling Error: Estimated CVs for tax year 2000: Returns with assets over $250 million are self-representing. For other returns grouped by assets, CVs ranged from 0.04 percent to 2.88 percent; for amount of net income, the CV is 0.14 percent.
Other (nonsampling) Errors: Nonsampling errors include coverage errors, processing errors, and response errors.
Sources of Additional Material: U.S. Internal Revenue Service, Statistics of Income, Corporation Income Tax Returns, annual. For background information for corporation tax statistics: <http://www.irs.gov/taxstats/article/0,,id=96246,00.html>
U.S. SOCIAL SECURITY ADMINISTRATION Benefit Data
Universe, Frequency, and Types of Data: All persons receiving monthly benefits under Title II of Social Security Act. Data on number and amount of benefits paid by type and state.
Type of Data Collection Operation: Data based on administrative records. Data based on 100 percent files, as well as 10 percent and 1 percent sample files.
Data Collection and Imputation Procedures: Records used consist of actions pursuant to applications, updated by subsequent post-entitlement actions.
Estimates of Sampling Error: Varies by size of estimate and sample file size.
Other (nonsampling) Errors: Processing errors, which are believed to be small.
Sources of Additional Material: U.S. Social Security Administration, Annual Statistical Supplement to the Social Security Bulletin.
Supplemental Security Income (SSI) Program
Universe, Frequency, and Types of Data: All eligible aged, blind, or disabled persons receiving SSI benefit payments under SSI program. Data include number of persons receiving federally administered SSI, amounts paid, and state administered supplementation.
Type of Data Collection Operation: Data based on administrative records.
Data Collection and Imputation Procedures: Data adjusted to reflect returned checks and overpayment refunds. For federally administered payments, actual adjusted amounts are used.
Estimates of Sampling Error: Not applicable.
Other (nonsampling) Errors: Processing errors, which are believed to be small.
Sources of Additional Material: U.S. Social Security Administration, Annual Statistical Supplement to the Social Security Bulletin.
National Highway Traffic Safety Administration (NHTSA) Fatality Analysis Reporting System (FARS)
Universe, Frequency, and Types of Data: Census of all motor vehicle traffic crashes involving at least one person killed as a result of the crash. The crash must be reported to the state and the death of the involved person must be within thirty days of the crash date. These fatal crashes occur throughout the United States (includes the District of Columbia), Puerto Rico, Virgin Islands, and American Pacific Territories.
Type of Data Collection Operation: Each state provides an analyst(s) who extracts data from the official documents and enters it into a standardized database.
Data Collection and Imputation Procedures: Detailed data describing the characteristics of the fatal crash and the vehicles and persons involved are obtained from police crash reports, driver and vehicle registration records, autopsy reports, highway department, etc. Computerized edit checks monitor the accuracy and completeness of the data. The FARS incorporates a sophisticated mathematical multiple imputation model to impute missing blood alcohol concentration (BAC) in the database for drivers and pedestrians only.
Estimates of Sampling Error: Since this is census data, there are no sampling errors.
Other (nonsampling) Errors: Data on fatal motor vehicle traffic crashes are more than 97 percent complete.
Sources of Additional Material: The FARS Coding and Validation Manual, ANSI D16.1, Manual on Classification of Motor Vehicle Traffic Accidents (sixth edition).