GUIDELINES FOR STATISTICAL ANALYSIS
OF OCCUPATIONAL EXPOSURE DATA
FINAL
by
IT Environmental Programs, Inc.
11499 Chester Road
Cincinnati, Ohio 45246-0100
and
ICF Kaiser Incorporated
9300 Lee Highway
Fairfax, Virginia 22031-1207
Contract No. 68-D2-0064
Work Assignment No. 006
for
OFFICE OF POLLUTION PREVENTION AND TOXICS
U.S. ENVIRONMENTAL PROTECTION AGENCY
401 M STREET, S.W.
WASHINGTON, D.C. 20460
August 1994
DISCLAIMER
This report was developed as an in-house working document and the procedures and methods
presented are subject to change. Any policy issues discussed in the document have not been subjected
to agency review and do not necessarily reflect official agency policy. Mention of trade names or
products does not constitute endorsement or recommendation for use.
CONTENTS
FIGURES v
TABLES vi
ACKNOWLEDGMENT vii
INTRODUCTION 1
A. Types of Occupational Exposure Monitoring Data 1
B. Types of Occupational Exposure Assessments 2
C. Variability in Occupational Exposure Data 3
D. Organization of This Report 4
STEP 1: IDENTIFY USER NEEDS 9
STEP 2: COLLECT DATA 15
A. Obtaining Data From NIOSH 15
B. Obtaining Data From OSHA 16
C. Other Sources of Data 17
STEP 3: DEFINE DATA NEEDS 19
STEP 4: IDENTIFY PARAMETERS AFFECTING EXPOSURE 21
STEP 5: IDENTIFY UNCERTAINTIES, ASSUMPTIONS, AND BIASES 26
A. Uncertainties 26
B. Assumptions 27
C. Biases 28
STEP 6: CREATE PRELIMINARY EXPOSURE DATA MATRIX 30
STEP 7: CHECK FOR CONSISTENCY AND REASONABLENESS 33
A. Grouping of Like Types of Data 33
B. Conversion to Consistent Concentration Units 34
C. Conversion to Consistent Exposure Periods 34
D. Identification of Assumptions 36
E. Checks for Consistency and Reasonableness 36
STEP 8: COLLECT ADDITIONAL MISSING INFORMATION 38
STEP 9: ESTIMATE ADDITIONAL MISSING INFORMATION 39
STEP 10: REVISE EXPOSURE MATRIX AND IDENTIFY DATA BY TYPE 41
STEP 11: ASSESS ABILITY TO MEET USER NEEDS 43
STEP 12: TREAT TYPE 3 DATA 44
STEP 13: TREAT NONDETECTED VALUES 47
STEP 14: SEPARATE INTO TYPE 1 DATA AND TYPE 2 DATA 50
STEP 15: DEFINE GROUPS FOR ANALYSIS 51
A. Identify Initial Grouping 53
B. Log-Transform the Data 56
C. Graphical Examination of the Data: Check for Outliers 56
D. Analysis of Variance 58
E. Redefining Groups 67
STEP 16: TREATMENT OF TYPE 2 DATA 68
A. Considering Addition of Type 2 Data 68
B. Adding Type 2 Data 68
C. Summary of Remaining Type 2 Data 68
STEP 17: CALCULATE DESCRIPTIVE STATISTICS FOR EACH GROUP 72
STEP 18: TREAT UNCERTAINTIES, ASSUMPTIONS, AND BIASES 76
A. Sensitivity Analysis 76
B. Confidence Intervals 77
C. Quantification of Bias 78
D. Weighting Factors to Mitigate Bias 78
STEP 19: PRESENT RESULTS 81
A. Characterization of Exposure 81
B. Presentation of Descriptive Statistics 82
C. Presentation of Assumptions and Uncertainties 83
D. Present Original Data 90
REFERENCES R-1
GLOSSARY OF TERMS G-1
APPENDIX A A-1
SPREADSHEET MATRIX FOR TYPE 1 EXAMPLE DATA SET
FULL SHIFT PERSONAL SAMPLES
APPENDIX B B-1
BACKGROUND INFORMATION ON STATISTICAL METHODOLOGY
APPENDIX C C-1
LISTING OF COMPUTER SOFTWARE FOR
VARIOUS STATISTICAL ANALYSES
FIGURES
Number Page
1 Flow Diagram for Creation of Preliminary Exposure Matrix 6
2 Flow Diagram for Creation of a Completed Exposure Matrix 7
3 Flow Diagram for the Statistical Analysis of Type 1 and Type 2 Data 8
4 Statement of Needs 11
5 Flow Diagram for Step 15 (Define Groups for Analysis) 52
6 Box-and-Whisker Plot for Monomer Industry Categories 59
7 SAS Output for All Monomer Industry Categories Combined 61
8 SAS Output for Test of Company, Process Type, and Control Type in Monomer Industry 64
9 SAS Output for Test of Process Type and Control Type in Monomer Industry 65
10 SAS Output for Test of Company, Process Type, and Control Type in Polymer Industry 66
11 Flow Diagram for Step 16 (Treatment of Type 2 Data) 70
12 Box-and-Whisker Plots for Monomer Industry Groups 86
13 Example Bar Graph for Polymer Industry Groups:
Means and Maxima Compared to 3 Target Levels 87
14 Example Format for Presentation of Assumptions and Uncertainties 89
TABLES
Number Page
1 Example Preliminary Exposure Data Matrix - Full Shift Personal Samples 31
2 Type 2 Data Used in Statistical Analysis 71
3 Descriptive Statistics for Groups in Example Data Set 74
4 Descriptive Statistics Presentation, Example Data Set 84
ACKNOWLEDGMENT
Many individuals and organizations have been helpful in developing this report; for these
contributions the project management extends its sincere gratitude.
Mr. Paul Quillen and Ms. Breeda Reilly were the EPA Project Officers and Ms. Cathy
Fehrenbacher was the EPA Work Assignment Manager. Mr. Thomas Corwin, IT Environmental
Programs, Inc., was the Project Director and Mr. Edwin Pfetzing the Project Manager. Mr. Robert
Goodman, IT Environmental Programs, Inc., assisted in the preparation of the report. Ms. Nora Zirps
was the ICF Project Manager. Dr. Erwin Hearne and Mr. Bruce Allen, K.S. Crump Division of ICF
Kaiser, developed the statistical methodology for the report. Extensive review of and comment on the
guidelines were provided by Drs. Rick Hornung, Larry Elliott, Steve Ahrenholz, David Utterback, and
Thurman Wenzel, NIOSH; and Elizabeth Margosches and Gary Grindstaff, EPA.
Peer review was provided by Dr. Neil C. Hawkins, Dow Chemical Company; Mr. Keith A.
Motley, OSHA; Dr. Stephen M. Rappaport, University of North Carolina; Col. James C. Rock, U.S.
Air Force; and Dr. Steve Selvin, University of California, Berkeley.
INTRODUCTION
The purpose of these guidelines is to establish a consistent approach to handling the wide variety
of occupational exposure data available for preparing occupational exposure assessments in support of risk
assessments. They provide guidance in the characterization of broad ranges of job groups with similar
exposures, the calculation of descriptive statistics (where appropriate), and the treatment of uncertainties,
assumptions, and biases in the data. They are designed to be used by engineers in the Office of Pollution
Prevention and Toxics (OPPT), with some assistance from industrial hygienists and statisticians. The
procedures described provide a systematic methodology for performing an occupational exposure
assessment based upon the types of data which are most commonly available for such analyses. Methods
used by OPPT's Chemical Engineering Branch (CEB) to prepare assessments of occupational exposure
and environmental release are presented in the CEB Engineering Manual (IT, 91). These guidelines are
a supplement to the CEB Engineering Manual intended for use with recently collected data. It should
be noted that these guidelines are not intended to provide recommendations for performing additional
monitoring of exposure or for determining compliance with regulatory standards. If this is the goal, the
reader should consult other references such as Hawkins (91) and Patty (81).
A. Types of Occupational Exposure Monitoring Data
Monitoring data usually consist of area samples, personal inhalation samples, or dermal samples.
Area samples are collected to represent the airborne concentration of a chemical in a specific location at
a facility. Personal samples are collected to represent a worker's inhalation exposure during a specified
time period; for example, peak, ceiling, short-term, and full-shift samples. Peak or ceiling samples are
typically collected instantaneously through continuous monitoring or for 15 minutes or less. Short-term
samples are collected over a designated period, typically less than 2 hours. Full-shift samples are
collected to represent a worker's inhalation exposure over an entire work shift and may be composed of
a single sample or consecutive short-term samples. Dermal samples are collected to represent a worker's
dermal exposure to a given chemical over a portion of the body which has been in contact with the
chemical. Exposure data collected for each type of exposure should be separated and statistical analyses
conducted separately.
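To illustrate the arithmetic for full-shift samples composed of consecutive short-term samples, the sketch below computes an 8-hour time-weighted average (TWA). The durations and concentrations are hypothetical, and the common convention of assuming zero exposure for unsampled time is only one possible treatment.

    # Sketch: computing an 8-hour TWA from consecutive short-term samples.
    # Durations (minutes) and concentrations (ppm) are hypothetical.
    samples = [
        (120, 4.2),  # (duration in minutes, concentration in ppm)
        (180, 1.5),
        (120, 6.8),
    ]

    sampled_minutes = sum(d for d, _ in samples)
    conc_time = sum(d * c for d, c in samples)

    twa_sampled = conc_time / sampled_minutes  # TWA over the sampled period
    twa_8hr = conc_time / 480.0                # assumes zero exposure when unsampled

    print(f"TWA over {sampled_minutes} sampled minutes: {twa_sampled:.2f} ppm")
    print(f"8-hour TWA: {twa_8hr:.2f} ppm")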
Biological monitoring may also be used to determine an employee's overall exposure to a given
chemical by measuring the appropriate determinant in biological specimens collected from exposed
workers at the specified time. While biological monitoring provides information complementary to air
monitoring, interpretation of data can be difficult due to variability in the physiological and health status
of the individual, exposure sources, individual life style, analytical errors, etc. If biological monitoring
data are available, this fact should be noted in the exposure assessment. This report does not address
biological monitoring but focuses on air monitoring data collected to assess inhalation exposure.
For the purposes of this report, three broad categories of occupational exposure data are
considered:
• Type 1 data consist of measurements for which all important variables are known. The
data consist of studies that contain individual measurements and include all backup and
ancillary information (e.g., analytical method, limit of detection, sampling duration, type
of sample taken, job tasks, etc.).
• Type 2 data consist of measurements where important variables are not known but for
which assumptions can be made for their estimation. The data consist of individual
monitoring measurements, but backup and ancillary information are inconsistent.
• Type 3 data consist of measurement summaries, anecdotal data, or other data for which
the important variables are not known and cannot be estimated. Individual monitoring
measurements are typically not available.
These categories were developed for use with these guidelines; judgment is used in determining
the type(s) of data available. Examples and additional information on the categories are provided
beginning with Step 10.
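A minimal sketch of how individual records might be screened into these three categories is shown below; the field names and the rule for "estimable" variables are illustrative assumptions, not definitions from the guidelines.

    # Sketch: assigning a monitoring record to Type 1, 2, or 3.
    # Field names and the list of important variables are hypothetical.
    IMPORTANT_FIELDS = ["analytical_method", "limit_of_detection",
                        "sampling_duration", "sample_type", "job_tasks"]

    def classify(record):
        if not record.get("individual_measurement", False):
            return "Type 3"  # only summaries or anecdotal data
        missing = [f for f in IMPORTANT_FIELDS if record.get(f) is None]
        if not missing:
            return "Type 1"  # all important variables are known
        if record.get("estimable", False):
            return "Type 2"  # missing variables can be estimated by assumption
        return "Type 3"

    record = {"individual_measurement": True, "analytical_method": "charcoal tube",
              "limit_of_detection": 0.1, "sampling_duration": 480,
              "sample_type": "personal", "job_tasks": None, "estimable": True}
    print(classify(record))  # -> Type 2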
Once one is satisfied that the data have been properly collected for the objective of the study, the
primary determinant of the confidence one can place in the analysis is the sample size. Every effort
should therefore be made to collect and analyze every available piece of data. Because the size of the
data set being analyzed has a large effect on the confidence that can be placed in the analysis, the
methodology set forth in these guidelines allows the combination of similar data sets based on statistical
tests. The traditional categorization of data by the industrial hygienist or engineer is supplemented by
statistical analysis of the categorization; the goal is identification of groups of data that are as large as
possible and describable by standard statistical distributions (lognormal and normal).
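The statistical comparison alluded to here is developed in Step 15 using analysis of variance. The sketch below shows the idea on two candidate groups, applying a one-way ANOVA to log-transformed values; the concentrations are hypothetical, and the sketch assumes SciPy is available.

    # Sketch: can two candidate groups be combined? One-way ANOVA on
    # log-transformed data (exposures assumed approximately lognormal).
    import math
    from scipy import stats

    group_a = [0.8, 1.2, 2.5, 1.9, 0.6]  # hypothetical concentrations, ppm
    group_b = [1.1, 0.9, 2.0, 1.4, 1.7]

    f_stat, p_value = stats.f_oneway([math.log(x) for x in group_a],
                                     [math.log(x) for x in group_b])
    if p_value > 0.05:
        print(f"p = {p_value:.2f}: no significant difference; combining may be defensible")
    else:
        print(f"p = {p_value:.2f}: groups differ; keep them separate")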
B. Types of Occupational Exposure Assessments
There are various types of exposure assessments performed by OPPT's CEB. The main
distinction between them is the level of effort expended in collecting data. Regardless of what type of
data are obtained, however, the CEB engineer should review the level of detail required in the exposure
assessment and try to provide the best and most complete analysis of the available data.
The following are examples of the program areas and types of exposure assessments performed
by CEB:
• New Chemicals Program. An initial screening assessment is performed with a goal to
determine the high end and central tendency exposures, generally using available
information and information submitted in the Premanufacture Notification (PMN). In
reality, these estimates are more likely to be bounding estimates (i.e., overestimates) of
exposure, due to lack of information. If there are concerns for worker exposure, the initial
assessment is refined as the case progresses through the review process. However, due
to lack of data on these new chemicals which have not yet been commercialized, this
often involves the use of modeling or surrogate data, rather than analysis of actual data
on exposure to the substance of concern.
• Chemical Testing. A preliminary exposure assessment is completed to determine the
bounds of potential occupational exposure for chemical testing candidates. This exposure
assessment is refined as the case progresses and additional information is gathered. Since
these are "existing" chemicals, there may be some exposure data available on the specific
substance. These chemicals may be referred to CEB through the Interagency Testing
Committee (ITC).
• Existing Chemicals. An exposure assessment may be an initial screening which is used
to help determine if further work is needed on the case. If so, a more detailed exposure
assessment, including the range of potential exposure, measures of central tendency, uncer-
tainty, etc., is completed for the population(s) of concern. A risk assessment is
performed; if risk management action will be taken the exposure assessment may be
revised to include additional information or to cover additional uses, etc. For some cases
monitoring studies will be conducted to determine workplace exposure levels. An
evaluation of controls may also be needed.
C. Variability in Occupational Exposure Data
It is rare to find studies of occupational exposure based on a statistical approach to providing
representative information for an individual facility; it is even less likely to find such a study that repre-
sents a particular industry subsector or group of facilities. While random sampling (i.e., monitoring
exposure to a group of workers in a random fashion) is preferred, "worst-case sampling" (i.e., monitoring
the individual with the highest exposure) during a 1- to 3-day sampling campaign is common industrial
hygiene practice for compliance with regulatory standards. However, sampling programs are being used
that promote exposure monitoring and periodic surveillance (Damiano, 89; Hawkins, 91).
Even in statistically-selected, well-done studies, there may be high variability in the
characterization of worker exposure. Measurements at a plant made over a period of no more than a few
days may be all that are available to characterize exposures over an entire year or a period of years.
Seasonal variability, interday and intraday variability, and changes in the process or worker activities can
cause the exposure to vary from that measured on a single day. Temperature changes can affect
evaporation rates, and seasonal changes in natural ventilation affect exposure. Sampling methods and
time periods can also vary. Seldom can all these variables be measured and accounted for. However,
if important variables are identified and quantified, it is hoped the influence of less important variables
on the overall measure of central tendency will be minimized. Variables that may not be obvious may
also affect variability among plants in the same industry category. Variables such as the age of the plant,
the age of the control equipment, whether the plant is in a volatile organic compound (VOC)
nonattainment area, and operation and maintenance (O&M) practices at the plant should be investigated.
When analyzing sample data, it is important to understand the sources of variation in exposure
sample results that combine to create the observed variability (Patty, 81). The size of the variation may
be a function of both the exposure levels and the measurement method. Both random and systematic
errors should be considered.
Random variations in workplace exposure levels can result in intraday variations, interday
variations, or variations in exposures of different workers within a job group or occupational category
(Patty, 81). Variability in the measurement procedure can be caused by random changes in pump flow
rate, collection efficiency, or desorption efficiency. It is important to realize that random variation in
real workplace exposure levels will usually exceed measurement procedure variation by a substantial
amount, often by factors of 10 or 20 (Patty, 81; Nicas, 91).
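A short numerical sketch of this point, using hypothetical variances of log exposure: when workplace variability dominates, removing measurement error barely changes the overall geometric standard deviation (GSD).

    # Sketch: environmental and measurement variability combine approximately
    # additively as variances on the log scale (assumed independent).
    import math

    var_env = 0.75   # hypothetical variance of log exposure (workplace)
    var_meas = 0.05  # hypothetical variance from the measurement procedure

    gsd_without = math.exp(math.sqrt(var_env))
    gsd_with = math.exp(math.sqrt(var_env + var_meas))
    print(f"GSD, workplace variability only: {gsd_without:.2f}")
    print(f"GSD, including measurement error: {gsd_with:.2f}")
    # The two values differ little, which is why workplace variability
    # usually dominates the observed spread.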
Systematic variations in the determinant variables affecting workplace exposure levels will lead
to systematic shifts in the exposure results. Variability in worker exposure levels reflects changes in
worker job operations during a work shift or over several days, production process changes, or control
system changes. Systematic errors in the measurement procedure can result from mistakes in pump
calibration, use of sampling devices at temperatures or altitudes substantially different from calibration
conditions, physical or chemical interferences, sample degradation during storage, internal laboratory
errors, and interlaboratory errors (Patty, 81). These errors may be identified and their effects minimized
with the use of quality assurance programs (EPA, 92). Specific variables (parameters) that can affect
occupational exposure measurements are more fully discussed in Step 4.
It is also important to ascertain the objectives of the monitoring study to identify potential biases
in the data. For example, if the objective was to sample only well-controlled facilities, then the results
would probably not represent the exposure in the industry as a whole. If the monitoring resulted from
worker complaints, then exposures may not represent typical exposures. If the monitoring was conducted
to evaluate engineering controls or as a preliminary screening of exposure, the results may not represent
actual employee exposure. It is important that all potential variables be identified and evaluated.
D. Organization of This Report
Following the introduction is a 19-step procedure for statistical analysis of occupational exposure
data. Figures 1 to 3 present flow diagrams outlining these procedures. Each numbered step in these
figures is explained separately. Steps 1 through 6 are presented in Figure 1 and give the actions
necessary to prepare a preliminary exposure matrix. Steps 7 through 14 are presented in Figure 2 and
give the actions necessary to prepare a completed exposure matrix from the preliminary exposure matrix
including preparation of a non-statistical report on Type 3 data. Steps 15 through 19 are presented in
Figure 3 and relate to the statistical analysis of Type 1 and Type 2 data and the presentation of the results. An
example is used throughout the 19 steps to better explain the techniques used in the guidelines. The data
used in the example are based on real data, but have been altered where necessary to emphasize particular
points in the guidelines.
These guidelines present rather sophisticated approaches for statistical analysis of occupational
exposure data. Nonstatisticians may require training or the assistance of a statistician in order to properly
understand and use the guidelines. The development of software as a companion to the guidelines could
be useful in guiding the user through the analyses and in incorporating more complex calculations for
certain nondefault procedures discussed in Appendix B.
A bibliography of references pertinent to occupational exposure analysis is also provided.
Appendix A presents a spreadsheet matrix for the example data set. Appendix B presents background
information on the methodology available to statistically analyze the data. Appendix C presents a listing
of currently available computer software for the statistical analyses.
Figure 1. Flow Diagram for Creation of Preliminary Exposure Matrix. [Diagram not reproduced; the data sources shown include NIOSH, OSHA, EPA, other federal agencies, state agencies, trade associations, unions, journal articles, and firms in the industry.]
Figure 2. Flow Diagram for Creation of a Completed Exposure Matrix. [Diagram not reproduced; it traces Steps 7 through 14 from the preliminary exposure matrix and the uncertainty/assumption list to completed Type 1 and Type 2 exposure matrices and a non-statistical report on Type 3 data.]
Figure 3. Flow Diagram for the Statistical Analysis of Type 1 and Type 2 Data. [Diagram not reproduced; it traces Steps 15 through 19: define groups for analysis, treatment of Type 2 data, calculation of descriptive statistics for each group, treatment of assumptions and uncertainties, and presentation of results, with the Type 2 and Type 3 data summaries as inputs.]
STEP 1: IDENTIFY USER NEEDS
The first step in an exposure assessment is to identify the needs of those using the information,
usually in some form of risk assessment activity. The user is typically the project manager for the
chemical under review. This step initially identifies the data requirements of the assessment so that
resources can be used most effectively to collect pertinent data.
The level of detail required in an exposure assessment depends on the scope of the risk assessment
activity it supports (EPA, 87). If the purpose of the analysis is merely to screen a new chemical for
potential problems, a much less rigorous bounding estimate of exposure will often be prepared. These
analyses are useful in developing statements that exposures are "not greater than" the estimated value.
However, to support a detailed risk assessment, an in-depth presentation of potential exposures must be
prepared. It is also necessary to know if the end user is interested in a particular demographic group,
route of exposure, frequency and duration of exposure, industry, exposure period, or other variable. For
example, if the chemical is of concern because of possible reproductive effects for women of childbearing
age, then every effort should be made to gather information on the exposure of this demographic group.
Information needs also depend on the specific health hazards identified for the chemical. Some of the
information needs that may be identified include:
Mean, standard deviation
Geometric mean, geometric standard deviation
Range of exposures, confidence intervals
Duration of exposure (hr/day and days/yr)
8-hour time-weighted average (TWA)
Peak exposures
Time period (e.g., a particular year, such as 1989)
Cumulative exposure over time, lifetime average daily exposure (for possible use in risk
assessment)
Probability of excursions or exposure during upsets or emergency release
Uncertainties associated with the data and assumptions used in analyzing the data
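Several of these descriptors can be computed directly from a set of measurements. The sketch below uses only the Python standard library; the concentrations are hypothetical full-shift values.

    # Sketch: descriptive statistics commonly requested by users.
    import math
    import statistics

    data = [0.4, 1.1, 2.3, 0.8, 5.6, 1.7, 0.9, 3.2]  # hypothetical ppm values

    mean = statistics.mean(data)
    sd = statistics.stdev(data)             # sample standard deviation
    logs = [math.log(x) for x in data]
    gm = math.exp(statistics.mean(logs))    # geometric mean
    gsd = math.exp(statistics.stdev(logs))  # geometric standard deviation

    print(f"Arithmetic mean {mean:.2f} ppm, SD {sd:.2f} ppm")
    print(f"Geometric mean {gm:.2f} ppm, GSD {gsd:.2f}")
    print(f"Range {min(data):.2f} to {max(data):.2f} ppm")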
The objectives of the exposure assessment must be defined using information obtained from the
"user," typically the project manager for die chemical under review. To assist in this process project
managers should be contacted initially to discuss the data requirements of the assessment and asked to
complete a "statement of customer needs" form for exposure assessments which are not typical new
chemical-type assessments. When this form (shown in Figure 4) is returned, it will be of value in Step 3
to more completely define user needs.
Since health effects data are often gathered to prepare the hazard assessment in parallel to the
occupational exposure assessment, good lines of communication with the project manager and those
preparing the hazard assessment will facilitate information exchange regarding potentially changing
assessment needs. For example, as new health effects are defined, the exposure data classification or
level of detection required of the analytical methods used may need to be changed. If
chronic health effects are identified, generally long-term exposures are of interest, while peak or short-
term exposures are of interest for acute health effects. Timely communication will minimize the changes
that need to be made as well as the need for further data collection.
EXAMPLE
The example shown below will be used throughout this report to illustrate how the statistical
analysis proceeds.
The example chemical is a colorless gas whose primary use is polymerization to make
various elastomers. Recent chronic oncology studies indicate that the chemical is carcinogenic in
mice. The present OSHA Permissible Exposure Limit (PEL) is 1,000 ppm as an 8-hour TWA, but
the American Conference of Governmental Industrial Hygienists (ACGIH) recommended a revised
Threshold Limit Value (TLV) of 10 ppm as an 8-hour TWA.
The project manager identified two general needs for the exposure assessment. First, the
exposure assessment was needed to do a preliminary risk assessment for all worker exposures to
the chemical. Second, it was needed as a baseline to estimate the technological feasibility and cost
of reducing worker exposure to target levels of 10 ppm, 1 ppm, and 0.1 ppm. An example
statement of needs form for the example chemical is shown in Figure 4.
Figure 4. Statement of Needs
Statement of Customer Needs for
CEB Engineering Assessments
Requester: Sally Jones, Project Manager    Date of Request: 2/20/94
The purpose of this form is to gather information on customer needs to be used
in developing a CEB engineering assessment. Please note that all identified needs
may not be met due to data limitations, resource constraints, etc. Where there are
multiple customers of CEB assessments, it is suggested that the form be
completed by the individual who will be using the specific type of information
provided by CEB.
Return completed form to: John Smith, CEB Engineer    Phone: 260-1234
Section 1. General Information
A. Please indicate the origin of the case and chemical/use cluster, etc. (e.g. RM 2 analysis for
hydrazine): RM2 analysis for example chemical.
B. What are the purpose and goals of the CEB assessment and the project? Develop assessment of
occupational exposure to the example chemical.
C. What are the approximate completion dates for the CEB assessment and for the project? CEB
assessment is due April 4, 1994.
D. Please identify the health effects of concern (e.g. carcinogenicity, neurotoxicity, liver effects,
reproductive effects, sensitization, etc.): Carcinogenicity
E. Please identify the environmental effects of concern: NA
F. Please identify any specific data, sources, references, or personal contacts you would like CEB to
research: NIOSH and OSHA data.
G. When do you need to have an estimate of CEB extramural resources (if any) for this project? NA
Section 2. Occupational Exposure Assessment  □ Not Needed
A. CEB will estimate number of workers exposed for each industry segment of interest. Identify any
special population characteristics of interest (e.g. gender, etc.): Total number of workers potentially
exposed, and population potentially exposed during monomer and polymer production.
B. Identify specific industry segment(s) of interest (e.g. manufacture, processing and end uses; only
spray coating application end uses, etc.): Monomer and polymer production.
C. Indicate (✓) which types of exposure are of interest:
✓ Inhalation exposure  □ Dermal exposure
□ Other (e.g. ingestion):
D. Identify which worker activities are of interest (e.g. the assessment need only address textile dye
weighers): All worker activities associated with monomer and polymer production.
E. Indicate (✓) the preferred characterization for duration and frequency of exposure:
□ Short-term exposure (e.g. peak exposure, maximum 15-minute exposure, etc.), for acute health
effects. Identify specific requirements:
✓ Long-term exposure (e.g. annual average exposure, lifetime average daily dose, etc.), for chronic
health effects. Identify specific requirements: annual average exposure and lifetime average
daily dose.
✓ Frequency of exposure (days/yr)
✓ Cumulative exposure over time (e.g. days, months, years): days, months, and years are of
interest
□ Other:
G. CEB will attempt to provide a measure of central tendency, and a high end Potential Dose Rate
(PDR), identify assumptions made, and characterize uncertainty, as data and methodologies allow.
Identify any specific needs (e.g. specific statistical descriptors, etc): Statistical descriptors of geometric
mean, arithmetic mean, geometric standard deviation, arithmetic standard deviation, the distribution of
the data and a graphic presentation of the data are preferred.
H. Please identify any other special needs for the occupational exposure assessment: Estimate of the
technical feasibility of controlling exposure to 10 ppm, 1 ppm and 0.1 ppm.
Section 3. Process Information  ✓ Not Needed
A. Are there specific industrial segments (e.g. manufacture, processing into a coating, end use as a
paint in automotive application) you would like process information for?
1.
2.
3.
4.
B. Please specify the information you would like CEB to provide:
□ Number of sites  □ Days/yr
□ Throughput (kg/site-day)  □ Process Description
□ Flow Diagram  □ Other (please specify)
Section 4. Environmental Release Assessment  ✓ Not Needed
A. CEB will provide estimates of environmental release (i.e. kg/site-day or kg/yr) for manufacture,
processing and end use operations. Indicate any specific industry segments of interest or special data
needs:
B. Indicate (✓) which types of releases are of interest, and indicate any special needs:
□ Water releases  □ Air releases
□ Landfill releases  □ Incineration releases
□ Other:
Special Needs:
C. CEB will attempt to provide descriptors for release assessments, identify assumptions made, and
characterize uncertainty, as data and methodologies allow. Identify any specific needs:
Section 5. Pollution Prevention Assessment (PPA)/Occupational Exposure Reduction
Assessment (OERA)  ✓ Not Needed
Are there specific industrial segments you would like CEB to provide an assessment of pollution
prevention opportunities and/or occupational exposure reduction for?
□ PPA  □ OERA  □ Both
1.
2.
3.
Section 6. Other Information Needs  ✓ Not Needed
Please identify other information, analysis or data needed, and the rationale for requiring the
information:
Customer Contact (e.g. Project Manager):
(Name)  (Division/Branch)  (Telephone)  (Date)
STEP 2: COLLECT DATA
Once the data requirements of the assessment are preliminarily identified, the next step is to
collect the monitoring data that will be used in the analysis. It is important to obtain information on all
variables relating to the measured values, such as the collection method, number of workers exposed,
duration of the sampling, etc. Step 4 contains a listing of parameters that may affect exposure. The
more data that are identified and collected, the better the analysis will be. Therefore, it is important to
ascertain at the beginning of the project that all possible sources of data have been checked.
Typical sources of exposure monitoring data include the National Institute for Occupational Safety
and Health (NIOSH), the Occupational Safety and Health Administration (OSHA), the Environmental
Protection Agency (EPA), other federal agencies or departments, state agencies, trade associations,
unions, journal articles, and individual companies in the industry.
A. Obtaining Data From NIOSH
For existing chemicals that have been studied by NIOSH, Health Hazard Evaluations (HHEs) and
Industry Wide Surveys (IWSs) usually represent the largest body of complete and extremely well
documented data. NIOSH reports usually include most of the information necessary to fully classify data.
In cases where the chemical of interest was not the primary reason for the NIOSH report, but rather only
measured as a secondary chemical, information may have to be filled in by direct contact with the
inspector. In addition, it may also be necessary to confirm the presence of the chemical in all areas
monitored if a large quantity of nondetected values are recorded. Since HHEs are generally done in
response to a complaint regarding a specific chemical, the data may not be random in selection. IWSs
tend to be well selected to represent an industry, but may be biased if only well controlled facilities were
monitored. NIOSH Control Technology Assessment reports are developed to identify and evaluate
appropriate control measures and may be biased toward facilities that are well-controlled. Contact with
NIOSH can usually identify any potential biases. NIOSH tends to take many samples per visit as
contrasted with OSHA which typically only takes a few measurements.
In general, NIOSH inspectors are easy to locate and will have worked on more than one of the
surveys, so that multiple information can be gathered from each contact. Where contact cannot be made,
it is usually acceptable to assume that the NIOSH collection and analytical method recommended at the
time was used to collect the data. NIOSH may also have unpublished data or studies that are in progress;
contact with NIOSH personnel who have been or are working on the chemical can thus result in
additional unpublished monitoring data. The best source of NIOSH reports is the NIOSHTIC data base,
which is available through DIALOG or on computer disk. In addition, the NIOSH Publications Catalog
can be manually reviewed to identify useful reports. It may also be useful to obtain up-to-date published
and unpublished information available on microfiche and hardcopy from NIOSH. Data may be obtained
from:
U.S. Department of Health and Human Services
National Institute for Occupational Safety and Health
Robert A. Taft Laboratories
4676 Columbia Parkway
Cincinnati, Ohio 45226
(800) 35-NIOSH
B. Obtaining Data From OSHA
The largest number of measurements for an existing chemical is generally located through
accessing the OSHA National Health Sampling Results by Inspection (OSHA report: OHR 2.6). These
data can be obtained by written request to:
U.S. Department of Labor
Occupational Safety and Health Administration
Director, Office of Management Data Systems
Room N3661
200 Constitution Ave., N.W.
Washington, D.C. 20210
(202) 219-7008
Information provided for each facility includes company name and address, SIC code, inspector code,
OSHA office, date and reason for visit, job title, exposure value, number of similarly exposed workers
at the time of the inspection, and type of exposure (peak/8-hour TWA, personal/area). No information
is provided on controls, type of process, monitoring method, concentration of chemical in process, or
demographics of the exposed workers. The sampling and analytical method and limit of detection may
not be available. Where the sampling and analytical method cannot be ascertained, it is usually
acceptable to assume that the method used is that specified by OSHA in the OSHA Technical Manual at
the time the survey was conducted (OSHA, 90). The methods specified in this publication are in most cases
from either the NIOSH Manual of Analytical Methods (NIOSH, 84) or the OSHA Manual of Analytical
Methods (OSHA, unpublished). Unlike NIOSH, OSHA usually collects only one or two samples per
chemical during each inspection. In many cases, the job title or SIC may uniquely define the use of the
chemical (e.g., degreaser operator or SIC 7216, Dry Cleaning Plants), but most data require that some
assumptions be made for categorization. In addition, the data may include large quantities of nondetects
and SIC codes may be inconsistently applied. If time and budget permit, it is best to contact the OSHA
inspector. Because the inspector at the local OSHA office must be called and few summaries are from
the same inspector, this process can be time consuming. Also, inspectors may be difficult to locate, files
may be stored away, or the inspector may not remember details of the facility. Many states (23 to date)
operate their own OSHA State Programs which must be "at least as effective as" the federal program.
However, these State plans have historically not had data in this OSHA data base. OSHA's Publication
Catalog can also be reviewed, and up-to-date information (including NIOSH studies) may also be
available from:
OSHA Technical Data Center
Department of Labor
200 Constitution Avenue, N.W.
Room H-2625
Washington, D.C. 20210
(202) 219-7500
C. Other Sources of Data
Monitoring data may also be available from previous and ongoing EPA studies. Previous reports
done by OPPT (formerly OTS) may contain occupational exposure data. Usually the data will have been
summarized and the primary data will have to be obtained separately. It is important to obtain primary
data to avoid the duplication of data from other sources. Information submitted under Sections 4, 8(a),
and 8(d) of TSCA may be useful in preparing the exposure assessment. Non-confidential information
submitted under TSCA may be obtained through the TSCA Non-Confidential Information Center at
(202) 260-7099. The Office of Air Quality Planning and Standards (OAQPS) may have collected some
exposure data through the use of Section 114 letters. Information about OAQPS Section 114 letters can
be obtained by contacting the Emissions Standards Division at (919) 541-5571.
Other federal agencies or departments may have collected exposure data. For example, the Army
and Air Force have monitoring data on workers in a wide variety of job categories. These data may be
obtained by contacting the following departments:
Army: Assistant Secretary of the Army
(Installations, Logistics and Environment)
Attn: SAILE(ESOH)
1110 Army Pentagon
Washington, D.C. 20310-0110
(703) 614-8464
Air Force: HQ AFMOA SGPA (BEES)
170 Luke Avenue
Bolling AFB
Washington, D.C. 20332-5113
(202) 767-1731
MSHA: Mine Safety and Health Administration
Metal/Nonmetal, Division of Health
4015 Wilson Blvd.
Arlington, VA 22203-1984
(703) 235-8307
Mine Safety and Health Administration
Coal, Division of Health
4015 Wilson Blvd.
Arlington, VA 22203-1984
(703) 235-1358
State environmental and occupational safety agencies concerned with both environmental
protection and worker health may have monitoring data. This is especially true if there is a concentration
of the industry under study in a state.
Trade associations often collect and evaluate monitoring data from their members. In many cases
the association may not allow access to the primary data and will provide only summaries of the data,
thus limiting its usefulness. Even if the data cannot be incorporated in the direct analysis, however, it
can be used for comparison with the results of other analyses. An extensive listing of trade associations
is contained in the Encyclopedia of Associations (Koek, 88).
Unions often are the driving force behind the investigation of a particular chemical. In such cases
they may have obtained exposure measurements from companies with which they have contracts. Direct
contact with the union in question is the best method to obtain these data.
Data may also be identified from journal articles. On-line data bases that can be useful to identify
exposure data include BIOSIS, CA Search, EMBASE, Enviroline, Medline, NIOSHTIC, NTIS, and
Pollution Abstracts. These sources almost never present the primary data and the necessary ancillary
information, so the author will usually have to be contacted if primary data are necessary.
Finally, if plant visits are being conducted or plants are being contacted to provide information
for the study, they may also be asked to voluntarily provide monitoring data. Such contacts are of course
limited by Office of Management and Budget (OMB) oversight under the provisions of the Paperwork
Reduction Act. Plants may also be surveyed in the form of OMB approved questionnaires or telephone
surveys.
EXAMPLE
For the example chemical, worker exposure data were obtained from NIOSH, OSHA, a
previous contractor report for EPA, and the union representing workers at several facilities. The
data were generally not primary monitoring results but only summaries of the data giving means
and number of samples for ranges (i.e., Type 3 data). The user needs identified in Step 1,
however, called for the types of results only available by analysis of Type 1 data. Therefore, new
monitoring data had to be collected for the industry. The available and new data form the basis for
the analyses shown in the example in the following steps.
STEP 3: DEFINE DATA NEEDS
By the time the initial data collection has been finished, the completed "statement of needs for
occupational exposure assessment" form (Figure 4) should have been received from the project manager.
This form and any other information provided should be used to formally define the data needs of the
assessment. A preliminary determination should be made by the CEB engineer as to whether the existing
data are "in the ball park" or if significant changes in data collection resources or expectations of the
project manager are needed. A more detailed assessment of whether the user needs can be met will be
made in Step 11.
If it is apparent that the exposure data are inadequate to meet the needs set forth in the statement
of needs form, then the CEB engineer should inform the project manager that expectations should be
modified to match the existing data or outline approaches and resource implications to meet those needs.
It is important to be responsive to requests for specific statistics in the assessment. For instance,
it is typical for exposure data to be summarized by calculating the geometric mean. Exposures tend to
follow a lognormal distribution and the geometric mean is the value that represents the most "middle"
value in such a distribution. However, if the concern of the end user is with total dose rather than with
typical exposure levels, the arithmetic mean may be a more appropriate measure of central tendency, and
should be provided with the assessment.
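The difference between the two descriptors can be made concrete with a lognormal example; the log-scale parameters below are hypothetical. For a lognormal distribution the arithmetic mean exceeds the geometric mean by the factor exp(sigma^2/2).

    # Sketch: geometric vs. arithmetic mean of a lognormal exposure distribution.
    import math

    mu, sigma = 0.0, 1.0               # hypothetical log-scale parameters
    gm = math.exp(mu)                  # geometric mean: the "middle" value
    am = math.exp(mu + sigma**2 / 2)   # arithmetic mean: relevant to total dose

    print(f"Geometric mean: {gm:.2f} ppm")
    print(f"Arithmetic mean: {am:.2f} ppm")
    # With sigma = 1, the arithmetic mean is about 65 percent larger than
    # the geometric mean, so the choice of descriptor matters.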
EXAMPLE
For the example chemical, several key issues were identified in the information supplied
by the end users:
• Exposure of workers in the industry was of more interest than exposure of the general
population.
• Worker exposure in the monomer industry was of more interest than worker exposure
in the polymerization process. Worker exposure in handling of the finished polymer
was of least interest.
• EPA was considering risk management options under TSCA. Since exposure may be
limited to workers, a referral to OSHA was also possible. OSHA had no ongoing
activities for the chemical at this time.
• Only inhalation exposure was of interest at this time.
• Only long-term exposure was of interest at this time.
• Specific descriptive statistics were requested.
Because the only data available were of Type 3, it was therefore necessary to conduct a
monitoring program to obtain sufficient Type 1 data to conduct the types of analyses necessary to
meet these needs.
STEP 4: IDENTIFY PARAMETERS AFFECTING EXPOSURE
Prior to statistical analysis, monitoring results must be classified into categories containing
sufficient and reliable data so that meaningful analyses can be conducted (EPA, 87). The classification
and organization of occupational exposure monitoring data are extremely important to the analysis and
to the usefulness of the data for the end user. The classification and organization processes can be seen
as the result of a compromise between two competing goals.
The first goal is to completely define the data set. If this were the single goal, the only data
included would be those for which all parameters that can influence worker exposure were known, thus
allowing definition of categories based on differences induced by all of these variables. For example,
each category could be uniquely defined by process type, job title, worker activities, ambient control type
(e.g., carbon adsorber), occupational control type (e.g., local exhaust ventilation), collection method,
concentration of chemical in the process, demographics of the exposed worker, date the sample was
taken, and any other parameter that could affect exposure or risk. The categories so defined would yield
groups of exposure measurements (or groups of individual workers) expected to have the same or a
similar exposure profile. Stated another way, the first goal is to define subsets of the data such that data
within each subset are measuring the same thing, i.e., the subsets define homogeneous categories.
Categories that are defined based on too few categorizing variables may lump together data that are not
homogeneous.
The second goal, however, is to get categories with sufficient numbers of observations to allow
meaningful statistical analyses. The power of any statistical analysis is greatly affected by sample size;
large uncertainty can result when data sets are too small. The ability to make generalizations
(extrapolations) is also limited when sample sizes are small. The number of observations within
categories is inversely related to the number of categories (which is directly related to the number of
parameters used to define the categories). Sample size is also reduced if observations have to be excluded
from consideration because the values of variables potentially affecting those observations are missing
or unknown.
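The trade-off can be sketched with simple arithmetic (all numbers hypothetical): the number of cells grows multiplicatively with the classifying parameters, thinning the data available in each cell.

    # Sketch: categories multiply, samples per category shrink.
    levels = {"process_type": 3, "job_title": 4, "control_type": 2}
    n_samples = 60  # hypothetical total number of measurements

    n_cells = 1
    for n in levels.values():
        n_cells *= n

    print(f"{n_cells} possible categories")                             # 24
    print(f"{n_samples / n_cells:.1f} samples per category on average")  # 2.5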
The approach to balancing these two conflicting goals presented here has an industrial hygiene
(qualitative) component and a statistical component. The industrial hygiene component is described in
Step 4. The statistical component, described in Step 15, verifies the results of the industrial hygiene-
based component and suggests possible re-categorization.
Thus, Step 4 consists of the critical process of identifying those parameters that are important in
influencing worker exposure to the chemical under study. These exposure parameters will be used to
define the categories (subsets or subpopulations) into which the exposure data will be classified.
CEB often develops categories of individuals with the same or similar exposure by first
identifying the industrial process or unit operation during which exposure to the substance occurs, then
identifying specific work activities or tasks associated with exposure, and identifying (or estimating) those
workers associated with the activity or task, incorporating other information as appropriate. If monitoring
data are available and job descriptions or job titles are given for the data, the engineer will need to
evaluate whether the job description or job title can be directly linked to a specific work activity or task.
There are cases where the job title or description does reflect the work activity, but the converse is also
true where job titles or job descriptions may be broader than the activities linked directly to the
monitoring (Hawkins, 91).
If the job title is associated with a specific work activity, the engineer may determine that creating
categories by industrial process/unit operation/job title/work activity/control type/etc. is appropriate. If
the job title or description is not associated with a certain task or work activity, the engineer should try
to obtain information on work activities associated with a personnel job title or description. If
appropriate, an alternative is to make assumptions about the activities associated with the job title, based
on knowledge of the process, professional judgment, etc. These assumptions should be fully documented
and evaluated with other assumptions made during the assessment (see Step 5). It should also be noted
that the identification of important exposure parameters is often refined as additional information is
gathered during the exposure assessment.
Occupational control type is a variable that may affect worker exposure and which should often
be considered when defining a classification scheme for exposure data.
The categories should also be designed with user needs in mind. This may include consideration
of parameters that relate to risk assessment and regulatory considerations. All potential parameters will
be used to create the preliminary exposure data matrix in Step 6.
A distinction may sometimes be made between exposure parameters that can be considered
"explanatory" as opposed to those that are merely "blocking" factors. For example, it may be the case
that exposures differ from one company to another, across plants, or with time. Although a statistical
analysis may determine that plant-to-plant differences are significant, the factor, plant, does not "explain"
why the exposures are different. Plant is not an explanatory parameter; it is what can be referred to as
a blocking factor; the plant-to-plant differences may be present because of differences in occupational or
ambient controls or other unknown factors that are directly related to exposure concentrations. Blocking
factors are merely parameters within which exposures are expected to be similar. The factors that
contribute to plant-to-plant differences, for example, may not be known or identified, and so it may
sometimes be the case that such blocking variables need to be retained to account for differences in
exposure levels. Nevertheless, the engineer is encouraged to identify explanatory parameters for the
purposes of categorization. Retention of some blocking variables may be suggested, but their importance
(as well as the importance of the proposed explanatory variables) will be tested statistically in Step 15.
The engineer should also consider the relative importance of the exposure factors considered for
the classification. Based on his or her knowledge of the industry and the processes entailing exposure,
he or she may be able to suggest that a small set of explanatory (and, perhaps, blocking) factors will be
the most important for determining exposure. Parameters identified by the end user as important should
be considered for the categorization, although, as discussed in Step 11, the expectations of the user may
have to be modified in accordance with the availability of pertinent data. Job title, work practices,
occupational controls, and production levels are typical examples of important parameters. One purpose
of ranking the variables is to prioritize collection of additional information in these areas where necessary
(see Steps 8 and 9).
Ideally, for risk assessment purposes, the exposure profiles for each exposed subpopulation
defined by the parameters identified in this step should include the size of the group, the make-up of the
group (age, sex, etc.), the source of the chemical, exposure pathways, the frequency and the intensity
of exposure by each route (dermal, inhalation, etc.), the duration of exposure, and the form of the
chemical when exposure occurs. Assumptions and uncertainties associated with each scenario and profile
should be recorded and clearly discussed in the results presentation (EPA, 87).
The following parameters are presented as guidance to the CEB engineer as typical variables that
can affect exposure and may be important in determining categories of similarly exposed individuals.
They are presented in general order of their typical importance, but the actual importance of the
parameter must be determined by the CEB engineer for the specific chemical and use.
• Type of sample - Sample type such as personal, area, ceiling, peak, etc. should be defined. In
general, different sample types are not combined.
• Process type - Process should be defined by all characteristics that are likely to affect exposure.
Examples include machine type (e.g., open-top vs. conveyorized degreaser), age of equipment, usage
rate, and product (e.g., printing on paper vs. plastic).
• Job title - Job title is usually given with the monitoring data and may require combination of similar
job descriptions (e.g., printer, letterpress operator, and press operator could be combined into a single
category).
• Worker activities - Within a given job title, activities performed by the workers may vary in a
significant way that can directly affect exposure.
• Worker location - The approximate location of the worker with respect to the source of the exposure
is an important factor.
• Occupational control type (workplace practices) - Controls such as local exhaust ventilation (LEV)
or general ventilation directly affect measured exposure. Other controls such as respirators do not
generally affect measured exposure but do affect actual worker exposure.
• Exposure period - The time period the worker is exposed to the chemical in a workday directly
affects exposure. Frequency and duration of exposure are also important factors.
• Production levels - Exposure can relate directly to the volume of production at the facility.
• Operating frequency and duration - Total exposure relates directly to these variables.
• Concentration of chemicals in the process - The concentration of the chemical can directly affect
the exposure of the workers. Such information is seldom available, however.
• Sampling strategy - The duration of the sampling and the sampling strategy can affect the accuracy
of the measurements in characterizing the exposure.
• Ambient control type - Although such controls are installed primarily to reduce release of the
chemical to the ambient air (e.g., refrigerated condenser, carbon adsorber, or baghouse), they may also
increase or decrease occupational exposure.
• Company and location - Variables such as local regulations, differences between large and small
companies, and regional differences in processes can affect worker exposure.
• Date of measurement - The date the measurement was taken can be indicative of the measurement
method, the controls in use, and the effect of natural ventilation or other factors.
• Sample collection method - Different collection methods, sampling times, or validated range of the
analytical techniques can affect the accuracy of the measurement and the detection limit.
• Source of data - Analysis by source of the data can help to identify potential biases in the data.
Biases that are not evident in the review of data in Step 5 may be identified in Step 16.
• Demographics of the exposed worker - If health effects data show that a particular demographic
group is susceptible (e.g., women of childbearing age), then whenever possible data should be
categorized using this information. While this is not typically needed in an exposure assessment, it
may be needed for a later health risk assessment.
• Industry - While four-digit SIC is preferable to two-digit SIC, OPPT assessments often focus on
individual companies and/or facilities.
• Other - Depending on the process, controls implemented primarily for other substances may also
reduce exposure to the substance of concern (e.g., LEV at the raw material transfer operation).
EXAMPLE
For the example data set, the following were identified as potentially important parameters:
Sample type
Job title
Process type
Occupational control
Company
Sample collection method
Industry
While data were collected for other parameters discussed in this section, emphasis was
placed on verifying information on these seven parameters. Note that the "blocking" variables,
company and industry, have been retained. Industry, in particular, was retained because the end user
had specified that the monomer industry needed to be considered separately from the polymer
industry.
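One way to organize records around the seven parameters emphasized in this example is sketched below; the record layout and field values are illustrative assumptions, not a format prescribed by the guidelines.

    # Sketch: a record type for the classifying parameters in the example.
    from dataclasses import dataclass

    @dataclass
    class ExposureRecord:
        sample_type: str           # e.g., "personal full-shift"
        job_title: str
        process_type: str
        occupational_control: str  # e.g., "LEV"
        company: str
        collection_method: str
        industry: str              # "monomer" or "polymer" in the example
        concentration_ppm: float

    rec = ExposureRecord("personal full-shift", "process operator", "batch",
                         "LEV", "Company A", "charcoal tube", "monomer", 1.8)
    # Grouping keys for later analysis can be built from any subset of fields:
    print((rec.industry, rec.process_type, rec.occupational_control))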
STEP 5: IDENTIFY UNCERTAINTIES, ASSUMPTIONS, AND BIASES
Uncertainties and assumptions are identified and recorded to allow their clear recognition by the
end user. This step initiates that process. All data should be examined for any characteristics that may
represent a nonrandom selection process or a systematic error (bias) in sampling or analysis. It may be
helpful to review the list of important parameters to assist in identifying uncertainties, assumptions, and
biases. All important uncertainties, assumptions, and biases are identified, and for purposes of grouping
like exposure, these should be as specific as possible. In preparing the risk assessment, more general
information on uncertainties, assumptions, and biases may be acceptable. Uncertainties, assumptions,
and biases will be evaluated in Step 18 to determine any influence on estimates of worker exposure in
one or more groups. Steps 5 and 18 are extremely important but may be difficult to execute.
A. Uncertainties
Examples of problems that give rise to typical uncertainties in the input and output of an exposure
analysis include:
• Data manipulation errors either by the persons collecting the monitoring data or during
the analysis.
• The inherent uncertainty in a small data set (e.g., day-to-day and worker-to-worker
variability are not accounted for).
• Uncertainties regarding differences in chemical concentration, throughput, or other
process related variables.
• Use of an unknown monitoring or analysis method.
• Assumptions made from secondary sources that were applied to the primary data.
• Uncertainties of values below the detection limit.
• Possible interference of other chemicals with a specific test method.
• Uncertainty regarding missing or incomplete information needed to fully define the
exposure.
• The use of generic or surrogate data when site-specific data are not available.
• Errors in professional judgment.
In evaluating and reporting uncertainty associated with measurements, the three most important
categories of errors are sampling errors, laboratory analysis errors, and data manipulation errors (EPA,
92). There are two kinds of sampling errors: systematic errors (often referred to as biases) that result
from the sampling process, and random errors that result from the variability of both the population and
the sampling process. While random error cannot be eliminated, its effects can be minimized by using
sampling strategies and by having sufficiently large data sets. Systematic errors can result from faulty
calibration of critical components such as flow meters, thermometers, pressure sensors, sieves, or other
sampling devices.
Other systematic errors can result from contamination, losses, interactions with containers,
deteriorations, or displacement of phase or chemical equilibria (EPA, 92).
Generally, laboratory errors are smaller than sampling errors. Calibration is a major source of
systematic error in analysis. Other sources of error include chemical operations such as sample
dissolution, concentration, extraction, and reactions (EPA, 92).
Data manipulation errors include errors of calculation, errors of transposition, errors of
transmission, use of wrong units, use of improper conversion factors, spatial or temporal averaging
information loss, and misassociation errors that confuse samples and numerical results.
B. Assumptions
Throughout the analysis, assumptions must be made about the data. Many assumptions are made
in response to uncertainties identified in the data. These assumptions must be clearly listed and their
effect on the results quantified if possible. Examples of typical assumptions that are made during
exposure analysis include:
• That plants and workers were randomly selected and that they represent the industry as
a whole. (It should be noted that this is almost never true; if it is known not to be true,
this assumption should not be made.)
• That the controls in place when the data were collected represent typically maintained
controls.
• That the value selected for use for a nondetected measurement accurately represents the
actual exposure at those facilities.
• That estimates of ancillary information garnered from other sources also represent the
facilities in the monitoring data set.
• That job activities performed during the exposure period represent typical activities for
that job category.
• That estimates of the duration of tasks used to convert data to 8-hour TWA values are
accurate.
C. Biases
Bias is a systematic error inherent in a method or caused by some feature of the measurement
system (EPA, 92). Systematic errors in sample selection, sampling errors, laboratory analysis or data
manipulation can cause the results to be biased. If the facilities and workers were not randomly selected
and the selection process documented, then the data may also contain biases. Common features that may
introduce bias include:
• Systematic sampling, laboratory, or data manipulation errors that have been identified.
• Selection of only "well-controlled" plants such as a NIOSH industry-wide survey
conducted to identify good control technology.
• Selection of only large facilities.
• Large disparity between the number of samples at different facilities (e.g., OSHA vs.
NIOSH data) could lead to bias, depending on how the data are weighted and whether
there are underlying sampling biases.
• Data that represent only OSHA complaint visits.
• When sampling for compliance with a ceiling limit, sampling workers with the highest
potential for exposure.
• Selection of only plants that are members of a trade association.
• Selection of only companies that voluntarily supplied monitoring data.
• Averaging of a measurement representing many workers with a measurement representing
few workers.
• Use of sampling or analytical methods at concentrations for which they are not validated.
• Sampling strategy bias towards compliance sampling.
D. Development of Uncertainty/Assumptions List
In order to record and retain uncertainties, assumptions, and biases identified in the course of an
occupational exposure assessment, a listing of the uncertainties and assumptions made at various steps
will be maintained. This list is initiated in this step and will initially contain uncertainties/assumptions
associated with the data collection and classification. For example, in Step 4, some assumptions may
have been required to relate job titles to specific activities. Moreover, there may have been uncertainties
about the exposure profiles (number of workers, demographics of workers, source of chemical, etc.) for
some of the groups defined by the important exposure parameters. These assumptions and uncertainties
will be recorded in the uncertainty/assumption list.
In the course of following the guidelines defined in this document, other assumptions and
uncertainties will be identified. All of them will be recorded on the uncertainty/assumption list for use
in Step 18 (Treatment of Uncertainties, Assumptions, and Biases) and for presentation to the end-user
with the quantitative results.
EXAMPLE
For the example chemical, a very detailed protocol and quality assurance plan were
developed to select the facilities at which monitoring data would be collected. This protocol is
more detailed than is typical but serves as an example of considerations that should be included to
obtain a sample that is as representative as possible of the sample universe.
For manufacture of the example chemical monomer, the sample universe consisted of ten
companies at 12 different plant locations. A walk-through survey was conducted at ten plants
representing a 100 percent sample of the ten producers. The walk-through survey was used to
gather information that was used .to select a smaller sample set at which to conduct in-depth
surveys. Monitoring data were collected at these in-depth surveys.
The purpose of the survey site selection strategy was to obtain a representative subset of
monomer plants from which to characterize exposures by job title and work environment. To
achieve this, the ten monomer production plants were divided into distinct subpopulations (strata)
representing differences in the workplace environment.
The strata were based on the presence or absence of three specific types of engineering
controls, the mode of transportation (pipeline, rail car, tank truck, marine vessel) of the feed stock
and product, and the existence of other production processes or final products at the plant. A single
plant within each stratum was selected based on a scoring system that quantified the relative
representativeness of each site. Four plants emerged as best representing the diversity of work
environments seen in the example chemical monomer industry. In-depth surveys, including the
collection of monitoring data, were conducted at these four facilities.
In the example data set, a serious potential bias in the analytical method for the chemical
was identified. Potential interferences from C4 chemicals made the measurements taken using
previous methods suspect. Ways were investigated to mitigate the bias, but finally it was decided
to exclude all data taken using the older analytical methods.
STEP 6: CREATE PRELIMINARY EXPOSURE DATA MATRIX
All data should be entered into a usable matrix using a personal computer for analysis. Software
packages (spreadsheets, databases, etc.) are available with storage and retrieval capabilities that facilitate
data analysis calculations. The matrix should be designed to be compatible with statistical programs that
are likely to be used in the data analysis. Many statistical analysis packages have their own data matrix
handling tools which provide a suitable, and in some cases preferable, alternative for data management.
All parameters that were identified as having a potential impact on exposure, were requested by the end
user, or were collected as ancillary information should be entered in the matrix. The use of a matrix will
allow identification of missing information for some observations.
Inclusion of company name, plant location, and source of data in the data matrix is important
because it provides a recordkeeping approach to allow easy referral of data back to the particular plant
or study to obtain additional data. All potential variables should be entered into the data matrix and the
field left blank when no data are found. Every effort should be made to fill in blanks in the matrix for
all variables identified as important. An extra field or two should be included in the matrix for
calculations such as converting to consistent units (Step 7). Also included would be any calculations made
using assumptions such as the conversion of the TWA for the sampled time to an 8-hour TWA.
The exposure data matrix will be completed to the extent possible in Steps 7 through 9 by filling
in missing information (where appropriate) and converting to consistent units. The revised exposure data
matrix (Step 10) will serve to classify the data available and to assess the ability to meet the users' needs
(Step 11). If possible, the data in the matrix will be used in the statistical analyses starting with Step 15.
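As a concrete illustration, the following is a minimal sketch (in Python with the pandas library; any spreadsheet or database would serve equally well) of how such a preliminary matrix might be organized. All column names and values here are hypothetical placeholders, not data from the example analysis.

import numpy as np
import pandas as pd

# Hypothetical preliminary exposure data matrix; blank (None/NaN) fields
# mark missing information to be filled in Steps 8 and 9.
matrix = pd.DataFrame(
    [
        ["M1", "Monomer", "Control room", "Process technician", 1, 455, 0.25, "NIOSH/EPA"],
        ["M2", "Monomer", "Loading area", "Process technician", 2, 442, 1.44, "NIOSH/EPA"],
        ["P1", "Polymer", "Packaging", None, 1, 420, np.nan, "OSHA"],
    ],
    columns=["plant", "industry", "process_type", "job_title",
             "control_type", "duration_min", "conc_ppm", "source"],
)

# An extra field reserved for later calculations (Step 7), e.g., an 8-hour TWA.
matrix["twa_8hr_ppm"] = np.nan

# A matrix makes missing important information easy to locate for follow-up.
print(matrix[matrix[["job_title", "conc_ppm"]].isna().any(axis=1)])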
EXAMPLE
Table 1 presents a partial example of the data matrix used in the example analysis. The full
data set used in the analysis is presented in Appendix A. Only data on the important variables are
presented in Table 1; however, data on all variables are included on the computer spreadsheet.
Table 1 (continued)
[Table 1 lists, for each full-shift personal sample in the monomer industry: plant ID (M1 through M4); initial category (IDO); industry category (Monomer 4 through Monomer 11); process type (e.g., control room; loading area - railcar; loading area/semi-tractor trailer); job title (process technician); control type (1 or 2); sample duration (min); the 8-hour TWA (ppm), with nondetected values shown as "≤" the limit of detection; and a control description (e.g., general room ventilation; slip-tube, magnetic, or rotameter gauge). The individual rows are not reproduced here; the full data set is presented in Appendix A.]
NOTE: Source of data: NIOSH/EPA. Laboratory analysis limit of detection ranged from 2 to 11 µg/sample, depending on the day of the analysis.
(a) IDO = Initial categories.
STEP 7: CHECK FOR CONSISTENCY AND REASONABLENESS
Once the data have been loaded into the spreadsheet, the next step is to check them for
consistency and reasonableness. It is recommended that, first, all the exposure measurements be
converted to consistent units. This step describes some of the considerations related to conversion of
units and the types of checks that can be made subsequently to verify that the results are reasonable.
For conversion of units, typically a standardized procedure consisting of grouping similar types
of data, conversion to consistent concentration units, and conversion to consistent exposure periods can
be used. For some data, however, all the information necessary to do the conversions is not known (e.g.,
actual exposure time period). In many of these cases, assumptions can be made that will allow use of
the data in the analysis. All such assumptions should be recorded in the uncertainty/assumption list.
The general approach for conversion of data into consistent units is the following:
• Grouping of like types of data (e.g., 15 minute, long term, area, personal),
• Conversion to consistent concentration units (e.g., mg/m3 or ppm),
• Conversion to consistent exposure periods when defensible (e.g., 8-hour TWA), and
• Estimation of missing information.
A. Grouping of Like Types of Data
It is extremely important that different types of samples not be averaged. For example, area
samples generally do not represent personal exposure, and 15-minute peak and ceiling sampling should
not be adjusted to represent full shift exposure. Specific data groupings that usually form like data sets
and, as a general rule, should never be pooled into a single data set include:
• Area samples
• Personal samples
• Short term exposure estimates
• Long term exposure estimates
EXAMPLE
In the example data set only personal TWA samples will be used.
B. Conversion to Consistent Concentration Units
The end user should be consulted for guidance on preferable reporting units early in the project.
Occupational exposure monitoring data are typically reported in either ppm or mg/m3. NIOSH reports
and journal articles report the occupational exposure values in either ppm or mg/m3, while OSHA
Inspection Summary Reports almost always report occupational exposure values in ppm. Before
conducting statistical analysis on different data sets, all measurements need to be converted into similar
units. Values in ppm can be converted to mg/ra3 by the following equation:
mg/m3 = ppm x (MW / 24.45) x (P / 760) x (298 / (T + 273))
where:
P = barometric pressure (mm Hg) of air sampled;
T = workplace temperature (°C) of air sampled;
24.45 = molar volume (liter/g-mole) at 25°C and 760 mm Hg;
MW = molecular weight (g/g-mole);
760 = standard pressure (mm Hg); and
298 = standard temperature (°K).
EXAMPLE
Consider a case in which a chemical concentration is reported to be 5 ppm at a pressure of
760 mm Hg and 25°C. The molecular weight of the example chemical is 54.1 g/g-mole. The
occupational exposure can then be converted from ppm by the following equation:
mg/m3 = 5 ppm x (54.1 / 24.45) x (760 / 760) x (298 / (25 + 273))

mg/m3 = 5 ppm x 2.213
Therefore, for the example chemical, a concentration of 5 ppm is equivalent to a concentration
of 11.1 mg/m3.
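The conversion can be implemented directly from the equation above. The following Python function is an illustrative sketch; the function and parameter names are not from this report.

def ppm_to_mg_per_m3(ppm, mw, pressure_mm_hg=760.0, temp_c=25.0):
    """Convert a vapor concentration from ppm to mg/m3.

    mw is the molecular weight (g/g-mole); 24.45 liter/g-mole is the molar
    volume at 25 degrees C and 760 mm Hg, as in the equation above.
    """
    return ppm * (mw / 24.45) * (pressure_mm_hg / 760.0) * (298.0 / (temp_c + 273.0))

# Reproduces the worked example: 5 ppm of a chemical with MW = 54.1 g/g-mole.
print(round(ppm_to_mg_per_m3(5.0, 54.1), 1))  # 11.1 mg/m3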
C. Conversion to Consistent Exposure Periods
NIOSH and OSHA exposure limits for chemicals are often based on 8-hour TWAs; therefore,
occupational exposure monitoring data are often converted into 8-hour TWAs in order to compare worker
exposures to these regulatory or recommended limits. Monitoring data collected from OSHA are
typically reported as 8-hour TWAs because they are sampled for compliance with an 8-hour TWA
Permissible Exposure Limit (PEL). OSHA TWA measurements may utilize a zero exposure for the
unsampled portion of the 8-hour day when calculating the TWA. It may be useful to determine whether
the sample represents an actual 8-hour sample or an 8-hour TWA. Some NIOSH reports and journal
articles present data collected for less than an 8-hour time period. The measurement samples are literally
only representative of the exposure period actually sampled. However, professional judgment or reliable
knowledge may sometimes be used to extrapolate data collected for shorter time periods to an 8-hour
TWA (Patty, 81). Where the exposure during the shorter period is representative of the exposure during
the entire work period and the length of the work period is known, exposure values can be converted into
8-hour TWAs based on the shorter exposure duration.
Based upon the job description in the NIOSH report or journal article, an estimate of the number
of worker hours per day related to each job category may be estimated. This should be done with caution
as many times the sampling time was dictated by the analytical method or other cause not related to
exposure and is not representative of the entire day. If the measurement sample is judged to be
representative of the exposure period and the exposure period is less than 8 hours, then an exposure value
not already reported as an 8-hour TWA can be adjusted to an 8-hour TWA as follows:
8-hour TWA = exposure value x (exposed hours per day) / 8
This approach is only valid when it can be assumed that there was no exposure during the remainder of the
workday. This is a key assumption that should not be made without good information indicating that this
is indeed the case.
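As an illustration, the adjustment can be sketched as a small Python helper (the names are hypothetical); note that it encodes the zero-exposure assumption discussed above.

def eight_hour_twa(exposure_value, exposed_hours_per_day):
    """Adjust a value representative of a shorter exposure period to an
    8-hour TWA, assuming zero exposure during the rest of the workday."""
    if not 0 < exposed_hours_per_day <= 8:
        raise ValueError("exposed hours per day must be in (0, 8]")
    return exposure_value * exposed_hours_per_day / 8.0

# A 6-hour exposure at 2.0 ppm becomes a 1.5 ppm 8-hour TWA.
print(eight_hour_twa(2.0, 6))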
Peak and ceiling measurements should never be converted to 8-hour TWA exposures. These
measurements are best taken in a nonrandom fashion. That is, all available knowledge relating to the
area, individual, and process being sampled is utilized to obtain samples during periods of maximum
expected exposure (Patty, 81). Therefore these measurements by design are not representative of the
longer work period. They are representative only of the time period over which they are taken, which
usually corresponds to an applicable standard for peak or ceiling exposure.
EXAMPLE
While most samples were taken to represent 8-hour TWA exposures, some were not.
Information gathered during the plant visit was used to estimate the exposure period for those
measurements that did not represent 8-hour TWAs.
D. Identification of Assumptions
Many times the conversion of data to consistent units involves the need to make assumptions
about the process or the worker activities. For example, the conversion from mg/m3 to ppm requires
knowledge of the workplace temperature. If this is not given in the report, an engineering judgment must
be made as to the typical temperatures in the work area. Other data may indicate that the sample time
was 2 hours but not indicate if the job was performed for 2 hours or 8 hours per day. Again, engineering
judgment of typical practices in that industry may have to be used to estimate the exposure period.
Since such assumptions can have large influences on the exposure value, all assumptions should
be recorded in the uncertainty/assumption list and presented with the results of the analysis. Where
assumptions have been made in such calculations, ranges of possible values can be estimated for later
sensitivity analysis. For example, an assumption for one worker can be made based on data from other
workers with the same potential for exposure. If the data for the other workers indicated a period of
exposure ranging from 2 hours to 8 hours, then it is possible that the exposure period of this worker
could range from 2 to 8 hours as well. Exposure values for these extreme times can be calculated and
the results tested for sensitivity to the assumption (see Step 18). All data where assumptions need to be
made for important parameters should be classified as Type 2 data.
Typical default values that can be assumed where there is no information to the contrary are:
• Where the monitoring method is unknown, the predominant method used for that
agency/company during the appropriate time period may be assumed to have been used.
• Where there is no information to the contrary, ambient temperature and pressure (298°K, 760
mm Hg) may be assumed.
Where assumptions cannot be made because of lack of knowledge of the process or job activity,
then these data should be classified as Type 3 or incomplete data. Classification as Type 3 results in
values being excluded from the analysis.
EXAMPLE
Because EPA and NIOSH collected the data used in the analysis specifically for the
analysis, no information needed to be estimated.
E. Checks for Consistency and Reasonableness
Data manipulation errors are caused by calculation errors, errors of transposition, errors of
transmission, use of wrong units, use of improper conversion factors, spatial or temporal averaging
information loss, and misassociation errors that confuse samples and numerical results (EPA, 92). Some
of these errors can be identified by comparison with known standards. While most chemicals will not
have all of the following parameters, comparison with those that do will help to flag possible data
manipulation errors:
• Immediately Dangerous to Life or Health (IDLH)
• Analytical limit of detection
• Lower or Upper Explosive Limits (LEL, UEL)
• Applicable standards (OSHA PEL, ACGIH TLV, NIOSH REL, STEL, ceiling, etc.)
Data that appear to be outside of typical limits such as these may be outliers and should be
rechecked for the accuracy of the value. The use of incorrect units for the data is one of the biggest
causes for such errors, and verification of the value and units can usually substantiate the data.
Additional tests for outliers are discussed in Step 15.
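Such comparisons lend themselves to a simple automated screen. The sketch below (Python; the limit values and function name are placeholders to be replaced with chemical-specific figures) flags measurements that warrant a units or accuracy recheck.

def flag_suspect_values(values_ppm, pel=1000.0, idlh=20000.0, lod=0.0054):
    """Return (index, reason) pairs for measurements outside typical limits."""
    flags = []
    for i, value in enumerate(values_ppm):
        if value > idlh:
            flags.append((i, "exceeds IDLH - recheck value and units"))
        elif value > pel:
            flags.append((i, "exceeds PEL - verify measurement"))
        elif 0 < value < lod:
            flags.append((i, "below detection limit - verify reporting"))
    return flags

print(flag_suspect_values([0.25, 123.57, 25000.0, 0.001]))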
EXAMPLE
For the example data set, the monitored levels were far below any regulatory limits
(IDLH = 20,000 ppm; OSHA PEL = 1,000 ppm) and the limit of detection of the new analytical
method was very low (0.0054 ppm). A verification of the units, experience with other situations,
and confidence that the disparity between the PEL and the measured units reflects a real situation,
not an error in units, suggested that the monitored levels were reasonable.
STEP 8: COLLECT ADDITIONAL MISSING INFORMATION
The purpose of this step is to fill data gaps in the matrix through the collection of additional
information. Data points that lack specific information in the source document for parameters that are
judged important may be difficult to classify during analysis. However, this missing information may
be available by direct contact with the inspector identified in the report. Obtaining missing information
may be as simple as properly classifying a process type or job description, or as difficult as identifying
the controls in use when the measurements were taken.
For NIOSH and OSHA reports, the name or identification number of the inspector and the office
location is usually present on the report. Where feasible, direct contact with this person by telephone is
usually the best method to gather the data. Some inspectors will request that a letter be sent requesting
release of the information under the Freedom of Information Act. For data from a trade association or
from one agency office where extraction of the primary data or ancillary information may be time
consuming, a written request or a trip to the location may be necessary. It is important to remember that
collection of all missing important variables can change a Type 2 measurement to a Type 1 measurement.
EXAMPLE
For the example data set, the problem with the sensitivity and selectivity of the test method
was so severe that all new data using a new test method were necessary. For most chemicals, this
would not be the case and the collection of additional information on important variables for the
existing data helps to increase the size of the Type 1 data sets.
STEP 9: ESTIMATE ADDITIONAL MISSING INFORMATION
Data gaps in the exposure matrix (i.e., missing ancillary information) can also be filled by
estimating missing information when appropriate. If data gaps in the matrix are in areas critical to the
accuracy of the assessment, the scope of the assessment may need to be narrowed, or further data
collection may be necessary. If data gaps are not critical and if it is not feasible to contact the inspector
or otherwise gather additional information, it may be appropriate to fill data gaps by making assumptions,
using surrogate data, or using professional judgment, etc. Caution should be used when making
assumptions or using other approaches to estimate missing data as this may increase the uncertainty
associated with the assessment and/or cause outliers in the data set. If an assumption is made for an
important variable, the data can only be used as Type 2 data. The use of assumptions, surrogate data,
professional judgment, or combinations of these methods must be clearly documented and the rationale
for each assumption or judgment given (via notations on the uncertainty/assumption list).
In the absence of data, CEB uses these methods to develop screening level estimates of exposure.
These screening level estimates generally err on the conservative side (i.e., overestimate exposure) and
are used to determine whether potential exposures are of no concern and can be eliminated from further
consideration. If the estimates are of concern, additional data and information are gathered and the
estimates are refined if possible. Due to the uncertainty associated with these estimates, the assessment
must be well characterized and used with caution.
If surrogate data are used, the differences between the surrogate and the substance of concern
must be small, and the scenarios for which exposure is estimated must be very similar or the same. If
conservative assumptions are used, the resulting exposures should be expressed appropriately using an
appropriate exposure descriptor. It is important to be aware of and explain how many assumptions are
used; their influence on the final conclusions of the assessment will be evaluated in later steps. The
mathematical product of several conservative assumptions is more conservative than one assumption alone
and can result in estimates that are unrealistically conservative bounding estimates (EPA, 92; IT, 91).
The following present typical kinds of assumptions, use of surrogate data or information, or
professional judgments that may be made, as appropriate.
• Process type - Other variables such as process temperature, drying time, etc., could be used
with professional engineering judgment to make an estimate of the process type.
• Occupational control type - Company practices and engineering controls in place could be used
as surrogate information to estimate what was being used during the time the sample was
taken. This assumes the current process and controls are the same or very similar to those
used when the sample was taken.
• Production levels - The average or range of production levels for the facility or industry could
be used as a surrogate to estimate the production level when actual figures are not available.
This assumes the production levels are the same or very similar.
• Concentration of the chemical in the process - The average or range of concentrations in other
processes could be used as a surrogate for estimating the concentration in the process,
assuming the processes and concentrations are very similar or the same.
EXAMPLE
The example data set was collected by NIOSH and EPA and all important parameters were
identified and data collected. Therefore, a hypothetical example will be used to illustrate the
process.
In the hypothetical example the age of the equipment was identified as an important variable
for two reasons. First, newer process equipment tends to contain dual mechanical seals and has
been shown to reduce fugitive release of the chemical, while older equipment does not. Second,
in this industry newer facilities are often better maintained than older facilities.
Because the monitoring measurement in question was taken by OSHA, the OSHA inspector
listed on the inspection summary was called. The inspector no longer worked for OSHA and the
person contacted at the local office could find nothing in the file for that facility to indicate the age
of the equipment. It was discovered that the facility was an older plant. An attempt to directly
contact the facility where the monitoring data were collected indicated that the facility had closed
about a year earlier.
Because older facilities that are about to be closed generally have older equipment and tend
to be poorly maintained, it was assumed that this measurement represented data from a facility using
older equipment. This assumption is based on professional engineering judgment and knowledge
of the industry. The assumption and rationale would be documented within the assessment and
presented with the results.
STEP 10: REVISE EXPOSURE MATRIX AND IDENTIFY DATA BY TYPE
The exposure data matrix should be updated to reflect any changes entailed by the checks of
consistency and reasonableness, and to display the concentration measurements in consistent units. In
addition, the exposure matrix can be modified to reflect the results of collecting additional information
or estimating the values of ancillary data. At this point, the revised exposure matrix (in conjunction with
the uncertainty/assumption list which details the treatment of uncertain values and lists all assumptions
that have been made) should be indicative of the modifications that have occurred in the first round of
updating the data. As indicated in the next step, additional rounds may be conducted.
Using the revised exposure matrix as the basis for classification, the data are categorized as
Type 1, Type 2, or Type 3 data. Recall that the categorization of worker exposure data into the three
distinct types is based on the following considerations:
• Type 1 data consist of measurements for which all important parameters are available. Typical
sources of Type 1 data include statistically valid studies, and NIOSH and OSHA data for
which all important parameters can be determined.
• Type 2 data consist of measurements where the important variables are not available but for
which assumptions can be made to estimate them. For example, if the limit of detection is not
known because the monitoring method is not stated, OSHA or NIOSH measurements may be
assumed to have been taken using the recommended method for the time period. Typical
sources of Type 2 data include NIOSH and OSHA reports which contain incomplete
information and for which the inspector cannot be located or cannot provide the missing
information. Other typical sources include journal articles, state agencies, and other federal
agencies or departments.
• Type 3 data consist of measurement summaries, anecdotal data, estimation techniques, or other
data for which the important variables are not known and cannot be estimated. A typical
example is a data summary provided by a trade association. The association will not allow
access to the primary data, and many questions remain unanswered on how the data were
collected and tabulated.
The engineer will need to use professional judgement in classifying the data, but all data should
be classified as either Type 1, Type 2, or Type 3. If it is questionable which type best describes the data,
the data should be classified as the lower type. If new information is found that allows raising the data to a higher
type, this should be done at that time.
When all data have been classified, it may be helpful to separate out the Type 3 data. A separate
Type 3 exposure matrix may be created. The Type 3 data will not be subject to any statistical analyses,
whereas Type 1 and, perhaps, Type 2 data will be analyzed. If the user needs can be met, the Type 3
data will be treated as described in Step 12.
EXAMPLE
In the example data set, all Type 3 data were excluded from the analysis due to potential
bias in the monitoring methods used. For the sake of the example, some of the excluded data will
be used in Step 12 to show how Type 3 data should be treated.
STEP 11: ASSESS ABILITY TO MEET USER NEEDS
The ability to meet the needs of the project manager is dependent on both the quantity and quality
of the data collected. User needs were preliminarily defined in Step 1 and formally defined in Step 3.
The purpose of this step is to formally determine if the assembled data are sufficient to meet the project
manager's needs defined in Step 3.
If there are insufficient data to meet the needs identified in Step 3, the project managers should
be informed that their expectations should be modified to match the existing data or that additional resources
are needed to obtain the desired quality of data. If no decision can be reached, it may be appropriate to
stop work until a decision is made so that resources are not wasted on work that will not meet the
specified needs.
The most likely case is that most of the user needs can be met but that some requests will be
difficult to fulfill. These potential difficulties should be identified in writing and sent to the project
manager. The project manager can then reassess how important each need is and estimate how much
additional effort, if any, should be expended to gather the necessary data.
If the CEB engineer is satisfied that the data are sufficient to meet the end user needs, proceed
to Step 12. It may be determined that those needs can be met even if Type 3 data are all that are
available. Typically, however, Type 1 or Type 2 data will be required. To obtain such data, additional
rounds of data collection, or further estimates of ancillary information may be approved. If no additional
information can be obtained, then the exposure assessment should proceed to Step 19, Presentation of
Results, at which point a summary of the available data can be completed, detailing data deficiencies with
respect to the end user needs.
EXAMPLE
For the example data set, the need to develop a new analytical method to account for a
potential bias in the existing method as well as the need to collect new data caused a delay in the
completion of the exposure assessment. The end users were notified of this delay; they approved
the data collection and analyses based on the new data.
STEP 12: TREAT TYPE 3 DATA
If it is determined that user needs can be met, the next step is to use nonstatistical methods to
present Type 3 data and to give alternate ways to generate additional Type 3 exposure estimates for
comparison with existing estimates. When a comprehensive assessment is not needed and all of the
individual monitoring data are Type 3 (i.e., many of the important variables are not known and cannot
be estimated), no statistical analysis of the data should be done. Although descriptive statistics could be
calculated for some Type 3 data sets, such analyses may mislead the end users into a false sense of
confidence in these data. The preferred method is to describe the data qualitatively in the report,
including its deficiencies, and any conclusions that can be drawn. A median and range may also be given
for each data set. Each Type 3 data set should be presented separately. Preferred data sets should be
identified and reasons given for the preference. In addition, any uncertainties, assumptions, and biases
should be clearly identified, using the uncertainty/assumption list initiated in Step 5.
When only summary data, anecdotal data, or no monitoring data are found for a chemical, and
a comprehensive exposure assessment is needed, the resolution depends to a large extent on the end use
of the assessment. There are two primary options when there are insufficient data to perform the
analysis:
• Collect monitoring data (i.e., conduct a survey for segments for which no data are currently
available; conduct a monitoring study, etc.)
• Use other nonmonitoring methods
When there are insufficient data, the best method is to collect the required monitoring data. This
alternative may not be viable as it can be extremely expensive and the time constraints on the analysis
may not allow this option. As a result, it is often necessary to use other nonmonitoring methods. These
include:
• Modeling of the exposure
• Use of a surrogate chemical or job type
• Comparison with a workplace standard
• Professional judgement
Modeling of the worker exposure can be used to estimate exposure where no monitoring data are
available. There will almost never be sufficient data available to validate a model, because real-time release,
air movement, and multiple receptor monitoring data are necessary. However, a previously
validated model can sometimes be used for other chemicals within the stated constraints of the model. For
indoor exposures, such models typically require the estimation of a release rate, room size, ventilation
rate, and exposure duration. When using models, the results should always be tested for reasonableness
against any available monitoring data or calculations based on surrogate monitoring data. One advantage
of the model approach is that sensitivity analysis can be conducted to identify those factors that cause
large uncertainties in predicted exposures. A sensitivity analysis simply involves running the model using
a range of input variables and measuring how the results change as the input variables are changed.
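As a sketch of this idea, the following Python fragment runs a one-at-a-time sensitivity analysis on a generic well-mixed-room screening model (C = G/Q); the model choice and all input ranges are illustrative assumptions, not recommendations from this report.

def room_concentration_mg_m3(release_rate_mg_min, ventilation_m3_min):
    """Steady-state concentration in an idealized well-mixed room."""
    return release_rate_mg_min / ventilation_m3_min

baseline = {"release_rate_mg_min": 50.0, "ventilation_m3_min": 30.0}
ranges = {"release_rate_mg_min": (10.0, 200.0), "ventilation_m3_min": (5.0, 60.0)}

# Vary one input at a time across its plausible range, holding the rest fixed.
for name, (low, high) in ranges.items():
    results = []
    for value in (low, high):
        inputs = dict(baseline, **{name: value})
        results.append(room_concentration_mg_m3(**inputs))
    print(f"{name}: {min(results):.2f} to {max(results):.2f} mg/m3")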
The use of monitoring data for a similar chemical as a surrogate is another approach when no
monitoring data are available for the chemical of interest. A rough exposure estimate can be made by
adjusting the surrogate monitoring data for the differences in vapor pressure, molecular weight, and
concentration of the chemical in the process. The degree of uncertainty in the approach depends on the
similarities between the chemical and its uses and the surrogate and its uses, and how well the worker
activities are understood in both situations. This approach is particularly useful in the analysis of new
chemicals where little or no actual exposure to the chemical has occurred (IT, 91).
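One simple way such an adjustment is sometimes sketched is proportional scaling by relative volatility and in-process concentration. The Python fragment below is purely illustrative; the scaling rule and all numbers are assumptions, and any real adjustment should be reviewed by an industrial hygienist.

def rough_exposure_estimate(surrogate_ppm, vp_chemical, vp_surrogate,
                            frac_chemical, frac_surrogate):
    """Scale a surrogate's measured concentration (ppm) by the ratio of
    vapor pressures and of in-process concentrations (an assumed rule)."""
    return surrogate_ppm * (vp_chemical / vp_surrogate) * (frac_chemical / frac_surrogate)

# Hypothetical: surrogate measured at 2.0 ppm; chemical of interest has half
# the vapor pressure but twice the in-process concentration.
print(rough_exposure_estimate(2.0, 50.0, 100.0, 0.4, 0.2))  # 2.0 ppm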
A final approach that can be used in the absence of monitoring data is to use professional
judgment to develop a plausible exposure scenario based on knowledge of the operation, or assume
compliance with the OSHA PEL for the substance. When professional judgment is used to develop an
exposure scenario, no exposure descriptor is used, and the uncertainty associated with the assessment is
high. This type of assessment is characterized as a "what-if" scenario and the uncertainty associated with
the assessment must be carefully and fully communicated to the user. When assuming compliance with
the OSHA PEL, a search of the OSHA Computerized Information System (OCIS) database should be
conducted to check the assumption of compliance. The assessment should be characterized as a "what-if"
scenario if the assumption of compliance cannot be supported based on monitoring data or other
documentation. Engineers must be extremely careful to properly characterize the type of assessment
presented if compliance with the OSHA PEL is assumed. There are currently different OSHA PELs for
different industries, such as construction, agriculture, etc. Currently, OSHA does not inspect facilities
with fewer than 11 employees. If this approach is used and if compliance data or other data have been
evaluated, the workplace standard should be identified with an appropriate exposure descriptor. The
uncertainty of these methods is high, but when properly used and presented, these estimates are
acceptable for screening level assessments.
The outcome of this step is a nonstatistical report that qualitatively describes the data, including
its deficiencies and any conclusions that can be drawn. If there are Type 1 or Type 2 data, then proceed
to Step 13. If not, then the nonstatistical report will be the primary result of the exposure assessment and
it can be presented as described in Step 19.
EXAMPLE
For the example chemical, some Type 3 data were available. The following gives examples
of how such data should be described:
• Six companies completed studies to determine exposure to the chemical. Although
attempts were made to obtain the original monitoring data, only summary results were
made available.
• Although the data cannot be compared directly across several companies, the areas of
higher exposure appear to be 1) the monomer transfer and storage area, 2) the reactor
area, 3) the recovery area, and 4) the lab area.
• One source states that release and exposure to the chemical in the solution polymerization
process are very similar to those in the emulsion process.
• If monitoring summaries examined in the analysis are representative of levels at polymer
plants, they imply that additional controls would not be required at typical polymer plants
to limit exposure to 10 ppm.
STEP 13: TREAT NONDETECTED VALUES
Measurements that are recorded as nondetected are assigned a numerical value so that they can
be used to calculate descriptive statistics which characterize the data set. Care should be taken to ensure
that the chemical reported as nondetected was actually being used at the time. Otherwise the descriptive
statistic that is calculated will be biased by inclusion of a site where the chemical was never used. The
first task in the treatment of nondetected values is to gather information on the analytical method. If a
quality assurance plan was developed for the study, it may also contain useful information and should be
reviewed. The NIOSH Manual of Analytical Methods provides information on NIOSH analytical methods
(NIOSH, 84). That manual identifies the analytical method used for each chemical for which NIOSH has
developed an analytical method. The OSHA Technical Manual (OSHA, 90) and OSHA Chemical
Information File (OSHA, 85) provide information on current OSHA methods. Information to gather
regarding the analytical method includes:
• Issue date,
• Applicability,
• Interferences,
• Other methods,
• Accuracy,
• Range and precision,
• Estimated limit of detection (mg/sample),
• Maximum sample volume (liter), and
• Evaluation of method.
If the issue date for the analytical method is after the date the sample was collected, the engineer should
determine what other analytical methods are used for this chemical.
The second task in the treatment of nondetected values is the calculation of a representative value
(Crump, 78; Hornung, 90). The limit of detection for these data must first be determined. There are
two ways in which a limit of detection may be reported:
• The limit of detection of analytical equipment such as a gas chromatograph (GC, GC/MS, etc.),
which is normally expressed in mg per sample, and
• The sampling limit of detection in measuring workplace air concentrations, which is normally
expressed in mg/m3 or ppm.
The sampling limit of detection accounts for both the analytical limit of detection and the sample air
volume and is the value needed for calculational purposes. In many cases, however, this value is not
reported directly. The sampling limit of detection will often vary from sample to sample if different
volumes of air are collected.
If the analytical method is not reported, the prevalent analytical method used at the time of the
study should be assumed and this assumption recorded on the uncertainty/assumption list. If the sample
volume is not reported, the maximum sample volume recommended in the analytical method could be
used for calculational purposes, and this assumption recorded as well.
An analytical limit of detection is normally specified in a published sampling and analytical
method, and a sampling limit of detection can be calculated if the sample volume is known or can be
assumed. The following equation is used:
Sampling limit of detection (mg/m3) = Analytical limit of detection (mg) x 1000 (liters/m3) / Air volume sampled (liters)
EXAMPLE
For the example chemical, consider a case in which a 25.0-liter air sample has been
analyzed for the example chemical using NIOSH Method 1024, which has a reported analytical limit
of detection of 0.0003 mg per sample. The sampling limit of detection is therefore:
Sampling limit of detection = 0.0003 mg x 1000 liters/m3 / 25.0 liters = 0.012 mg/m3
A reported or calculated sampling limit of detection should not be directly substituted for those
values reported as nondetectable because, by definition, such values are below the detection limit. A
value lower than the sampling limit of detection must therefore be substituted for these values. As
described by Hornung and Reed (Hornung, 90), the preferred method for calculating this value depends
upon the degree to which the data are skewed and the proportion of the data that is below detection limits.
The two methods are:
1) If the geometric standard deviation of the monitoring data set is less than 3.0,
nondetectable values should be replaced by the limit of detection divided by the square
root of two (L/√2).
2) If the data are highly skewed, with a geometric standard deviation of 3.0 or greater,
nondetectable values should be replaced by half the detection limit (L/2).
If 50% or more of the monitoring data are nondetectable, substitution of any value for these data
will result in biased estimates of the geometric mean and the geometric standard deviation (Hornung, 90).
If it is necessary to calculate statistics using data sets with such a large proportion of nondetectable data,
the potential biases introduced by these calculations should be described when presenting the results of
the analyses. It should be noted that there are other methods to address values reported as below the limit
of detection (Aitchison, 57; Cohen, 61; EPA, 92; Waters, 90).
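The calculations above can be collected in a short Python sketch; the function names are illustrative, and the substitution rule follows Hornung and Reed as described.

import math

def sampling_lod_mg_m3(analytical_lod_mg, air_volume_liters):
    """Sampling limit of detection from the analytical LOD and air volume."""
    return analytical_lod_mg * 1000.0 / air_volume_liters

def nondetect_substitute(lod, geometric_std_dev):
    """L/sqrt(2) when the GSD is below 3.0; L/2 for highly skewed data."""
    return lod / math.sqrt(2.0) if geometric_std_dev < 3.0 else lod / 2.0

lod = sampling_lod_mg_m3(0.0003, 25.0)   # 0.012 mg/m3, as in the example above
print(nondetect_substitute(lod, 2.1))    # modestly skewed data: L/sqrt(2)
print(nondetect_substitute(lod, 3.5))    # highly skewed data: L/2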
EXAMPLE
Preliminary examination of the data, categorized by the important exposure parameters
(Step 4), indicated that geometric standard deviations tended to be at or above 3.0. Therefore, half
the detection limit was used for all example calculations to represent nondetected values. That
choice was recorded on the list of uncertainties and assumptions. The impact of choosing L/2 on
the analyses will be examined in Step 18.
STEP 14: SEPARATE INTO TYPE 1 DATA AND TYPE 2 DATA
In Step 10, data in the exposure matrix were classified as either Type 1, Type 2 or Type 3 data.
Type 1 data consist of measurements for which values of all important parameters are known. The data
consist of studies that contain individual measurements, and include all backup and ancillary information.
Type 2 data consist of measurements where values of important parameters are not known but for which
assumptions can be made to estimate these variables. The data consist of individual monitoring
measurements, but backup and ancillary information is highly variable. No Type 3 data (summaries,
anecdotal, etc.) should be in the matrix. All such data should have been excluded in Step 10.
The data should now be sorted by the Type 1/Type 2 classification and separate matrices formed
for each type of data. Type 2 data will only be used for statistical analysis when there are insufficient
Type 1 data to perform the analysis. The products of this step are two separate matrices that will be used
in the statistical analysis.
If only minimal Type 1 and Type 2 data exist, that together are still not sufficient for statistical
analysis, all data are treated as Type 3 data and the analysis returns to Step 12. In this case a qualitative
report that describes the data, including its deficiencies and any conclusions that can be drawn is
prepared, as described in Step 12.
EXAMPLE
For the example data set, all newly collected data were of Type 1. Some previously
collected data were Type 2 data and these will be considered as necessary (Step 16).
STEP 15: DEFINE GROUPS FOR ANALYSIS
The purpose of this step is to identify the groups that will be the basic units for the calculation
of descriptive statistics. Each group is intended to include measurements representing samples from a
single distribution of concentrations; the descriptive statistics computed for that group pertain to that one
distribution. The principal output of the application of these guidelines will be the group-specific
descriptive statistics.
•
The groups that result from this step are those that are determined to have as large a sample size
as possible given the characteristics of and differences in exposures (e.g., those caused by effects of the
parameters identified by the engineer or industrial hygienist in Step 4). Stated another way, the groups
will be as large as possible while minimizing variation within the groups relative to variation between the
groups. Statistical approaches are described to perform the necessary calculations. The initial grouping
that is an input for the statistical calculations is based on the important exposure parameters identified by
the engineer in Step 4. Combinations of the original categories may result in the definition of new
groupings that will be subject to statistical description. Figure 5 presents a flow diagram defining the
subtasks involved in the definition of the groups.
A. Identify Initial Grouping
For a given data set, the initial categories are determined by the important parameters identified
by the CEB engineer or industrial hygienist in Step 4. The initial categories are defined by the
combinations of all the important parameters. Note that if there are many important parameters, there
could be very many initial categories, which would tend to reduce the number of observations within any
given category. The engineer is encouraged to try to reduce the number of important parameters
considered. This may be accomplished, as discussed in Step 4, by eliminating from consideration as
many variables regarded only as "blocking" factors as possible. It is to be hoped that truly explanatory
variables can be found that account for much of the difference observed across blocks.
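To illustrate, initial categories can be formed mechanically as the combinations of the important parameters. The Python/pandas fragment below is a sketch with hypothetical column names and values, not data from the example analysis.

import pandas as pd

data = pd.DataFrame({
    "industry":     ["Monomer", "Monomer", "Monomer", "Polymer"],
    "company":      ["M1", "M1", "M2", "P1"],
    "process_type": ["Lab", "Lab", "Control room", "Packaging"],
    "control_type": [4, 4, 1, 1],
    "conc_ppm":     [0.10, 0.25, 0.02, 1.40],
})

# Each unique combination of the important parameters is one initial category.
groups = data.groupby(["industry", "company", "process_type", "control_type"])
print(groups["conc_ppm"].size())   # sample size N within each category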
EXAMPLE
The data were collected for both the manufacture (monomer industry) and use (polymer
industry) of the chemical. In the monomer industry, there were 209 measurements from four
plants (M1, M2, M3, M4). In the polymer industry, there were 578 measurements from five
plants (P1, P2, P3, P4, P5). The total data set consisted of 787 measurements: 516 full-shift
personal samples, 37 short-term samples, and 232 area samples. For the example calculations,
only the 516 full-shift personal samples were used. Values reported as nondetected were treated
as described in Step 13, and the value of L/2 was used in all calculations. The value of L for
each nondetected measurement was determined individually based on the sample volume and the
reported analytical limit of detection. These data are presented in Appendix A. The variables
deemed most important by the industrial hygienist/engineer were sample type, sample collection
method, industry, company, process type, job title, and occupational control type. Consideration
of sample type and sample collection method resulted in retention of only the full-shift personal
samples collected by the newer method. Industry, typically considered a blocking variable, was
retained because of the end user request to consider the monomer and polymer industries
separately (see Step 3).
Examination of the 516 full-shift personal sample data points (Appendix A) showed that,
after consideration of industry, company, process type, and occupational control, little or no
additional information was provided by job title. That is, there tended to be only a single job
title for any given process type. Thus, job title was not considered for the definition of die initial
groups. On the basis of the remaining parameters, 58 initial groups were identified, wim sample
sizes as indicated (by industry, company, process type, and occupational control):
Monomer:
M1, Control room, control 1: N=3
M1, Lab, control 4: N=6
M1, Process area, control 2: N=5
M2, Control room, control 1: N=3
M2, Lab, control 3: N=9
M2, Loading area, control 1: N=3
M2, Process area, control 1: N=6
M3, Control room, control 1: N=2
M3, Lab, control 6: N=7
M3, Loading area, control 2: N=6
M3, Process area, control 2: N=4
M4, Control room, control 1: N=2
M4, Lab, control 5: N=7
M4, Lab, control 2: N=3
M4, Loading area, control 2: N=2
M4, Process area, control 2: N=12
M4, Tank farm, control 1: N=5
Polymer:
P1, Crumbing and drying, control 1: N=9
P1, Lab, control 1: N=10
P1, Maintenance, control 1: N=34
P1, Packaging, control 1: N=30
P1, Polymerization or reaction, control 1: N=6
P1, Process area, control 1: N=5
P1, Purification, control 1: N=6
P1, Solutions and coagulation, control 1: N=9
P1, Tank farm, control 1: N=5
P1, Warehouse, control 1: N=2
P2, Control room, control 1: N=6
P2, Crumbing and drying, control 1: N=7
P2, Lab, control 1: N=14
P2, Maintenance, control 1: N=9
P2, Packaging, control 1: N=6
P2, Polymerization or reaction, control 1: N=29
P2, Solutions and coagulation, control 1: N=5
P2, Tank farm, control 1: N=3
P3, Lab, control 1: N=3
P3, Maintenance, control 1: N=4
P3, Polymerization or reaction, control 1: N=18
P3, Solutions and coagulation, control 1: N=4
P3, Tank farm, control 2: N=9
P3, Unloading area, control 1: N=2
P4, Crumbing and drying, control 1: N=13
P4, Lab, control 1: N=17
P4, Maintenance, control 1: N=7
P4, Packaging, control 1: N=20
P4, Polymerization or reaction, control 2: N=7
P4, Solutions and coagulation, control 1: N=3
P4, Tank farm, control 1: N=8
P4, Warehouse, control 1: N=11
P5, Crumbing and drying, control 1: N=6
P5, Lab, control 1: N=8
P5, Maintenance, control 1: N=16
P5, Packaging, control 1: N=23
P5, Polymerization or reaction, control 2: N=2
P5, Purification, control 2: N=12
P5, Solutions and coagulation, control 1: N=12
P5, Tank farm, control 1: N=6
P5, Warehouse, control 1: N=7
In the above list, the control types are as listed in Appendix A. The initial categories are
identified by number in Appendix A.
B. Log-Transform the Data
The tests of the grouping and the importance of the identified exposure parameters are conducted
on the log-transformed concentration values. This is done because it is typically assumed that
concentration data can be described by a log-normal distribution. If the concentrations are log-normally
distributed, the effect of log-transforming the data is to create normally distributed values. One
assumption underlying analysis of variance (ANOVA) methods (see subtask D below) is that the errors
are normally distributed. Thus, under the general assumption of log-normally distributed concentrations
and using a log-transformation of the concentrations, an assumption of the ANOVAs discussed below is
satisfied.
We have not proposed here to test the assumption that the concentrations are log-normally
distributed. This is considered appropriate in light of the theoretical rationale for suspecting that
atmospheric concentration data follow a log-normal distribution and the extensive empirical evidence that
a log-normal distribution can describe observed patterns of concentrations of various compounds (see
Rappaport, 91, for a brief review). Moreover, ANOVA is robust with respect to departures from the
assumption of normality. That is, ANOVA can still be reasonably expected to give the correct
interpretation of the data even if the data deviate somewhat from a normal distribution. Nevertheless,
testing the assumption of log-normally distributed concentrations can be considered an option, and
Appendix B presents information related to the testing of data to see if it is normal or log-normal. If the
engineer suspects that the concentration data should not be considered to be log-normal, he or she can
apply the tests described in that appendix or consult a statistician for additional support. If departures
from log-normally distributed concentrations are detected, a notation should be added to the list of
uncertainties and assumptions.
The data points are transformed into natural (base e) log values as described by Equation 1.

xt = ln(x)     Equation 1

where:
xt = a log-transformed data point
x = a data point (as originally observed)
ln = the natural logarithmic function
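In practice the transformation is a single vectorized operation; for example, in Python with numpy (the concentration values below are illustrative):

import numpy as np

conc_ppm = np.array([0.09, 0.37, 1.53, 0.25])
log_conc = np.log(conc_ppm)   # Equation 1: natural (base e) logarithm
print(log_conc)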
C. Graphical Examination of the Data: Check for Outliers
Before the ANOVA(s) are performed to test the importance of the exposure parameters, the log-
transformed data should be examined once more to determine if some errors have been introduced. This
examination will focus on the pattern of observed values, rather than individual observations as in Step 7,
to determine if there are any values that appear "unusual." The unusual observations can be considered
to be the outliers, those observations that do not appear to fit in with the rest of the data. "Box-and-
whisker" plots can be used to identify outliers.
Box-and-whisker plots can be created for each of the initial categories. If there are relatively few
observations per category, less than 6 to 10 typically, such plots may not be very informative. One can
also combine some of the initial categories and examine box-and-whisker plots for such combinations.
Caution should be exercised when such combinations are considered, because it is not clear at this stage
of the analysis which categories ought to be combined. Combination of categories with quite different
mean values, for example, may lead to a bi-modal distribution that will be relatively uninformative with
respect to identification of outliers.
Outliers can be identified from a box-and-whisker plot as the individual observations that are
displayed beyond the limits of the whiskers. More information about the box and the whiskers of such
a plot is presented in Appendix B. Any outliers so identified should not be dropped from analysis.
Rather, those data points should be examined to determine if they have been entered or calculated
incorrectly. Sources of error include, but are not necessarily limited to, misclassification (an observation
was recorded as belonging to one group when in actuality it belongs in another group), transcription (an
incorrect value was transcribed from the lab sheets into the computer data base), or calculation
errors (e.g., when units were converted).
If errors are detected, then they should be corrected and the graphical examination of the data
re-evaluated. If no errors are detected, then the data points should be retained and considered in the
ANOVA.
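As an illustration only (this sketch is not part of the original guidance), the whisker check can be
scripted. The data values and the use of the common 1.5 x IQR whisker rule are assumptions made
for the example; plotting software may draw whiskers differently.

    # Sketch: flag box-and-whisker outliers in log-transformed data
    # using the common 1.5 x IQR whisker rule (an assumption here).
    import numpy as np

    def whisker_outliers(values_ppm):
        logs = np.log(values_ppm)                    # Equation 1 transform
        q1, q3 = np.percentile(logs, [25, 75])       # edges of the box
        iqr = q3 - q1
        low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # whisker limits
        return [float(v) for v, x in zip(values_ppm, logs) if x < low or x > high]

    # Hypothetical category: five routine values and one suspect value.
    print(whisker_outliers(np.array([0.10, 0.12, 0.11, 0.13, 0.10, 5.0])))  # [5.0]

Any value returned would then be traced back to the lab sheets and data base entries as described
above, not simply discarded.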
57
-------
EXAMPLE
Figure 6 shows the box-and-whisker plots for the initial categories from the monomer
industry. The numbers of observations within each category are small, so not much can be
determined with respect to outliers. However, in the category consisting of process area
concentrations at company M1, there is one relatively low value; and in the tank farm at company
M4, there is a high value. The former is an observation below the detection limit (below 0.18
ppm, set equal to 0.09 ppm for these analyses) which appears low relative to the 4 detected
concentrations of 0.37 ppm or above at the M1 process area. The latter is a concentration of 1.53
ppm, which, compared to the other 4 concentrations from the M4 tank farm (all of which were less
than 0.31 ppm and included 3 non-detects), looks suspicious.
Figure 7 shows the SAS output for all of the monomer industry initial categories combined.
The box-and-whisker plot from that output shows two outliers, both on the high side. Investigation
of those observations revealed that they were from the lab at M4 (with control type 2) and from the
loading area at M4. These points were not detected in Figure 6 because the initial categories in
which they were classified had few observations (3 and 2, respectively).
When these outliers were investigated, it was determined that they did not result from data
manipulation errors. Furthermore, they did not appear to be the result of atypical situations (e.g.,
a spill) at the plants involved. Because there was no evidence that they were unusual or erroneous,
these concentrations were retained for the subsequent analyses.
The concentrations in the polymer industry initial categories were similarly examined.
Again, no evidence of erroneous or atypical data was discovered and all data points were retained
for analysis.
D. Analysis of Variance
ANOVA techniques are the recommended basis for revising the initial grouping. Such techniques
are applied to determine if the observed concentrations within some of the initial groups are similar
enough to warrant combination of those groups. This approach is based on determinations of whether
or not the exposure parameters suggested by the engineer as potentially important actually discriminate
between exposure levels, i.e., whether or not those parameters are statistically significant with respect
to concentration differences.
The application of ANOVA may not be straightforward in many real cases. Difficulties can arise
if there are several factors being considered, if confounding or aliasing of the effects of those factors is
possible, or if there are correlations among the observations (e.g., if there is nesting of the effects of one
factor within another factor). The ANOVA approach described here is relatively easy; suggested
interpretations of standard statistical output are provided. However, it is recommended that the engineer
consult a statistician to help interpret problematic cases and to suggest supplemental analyses that may
resolve the problems.
58
-------
[Two pages of SAS Univariate Procedure schematic plots (box-and-whisker plots of
log(concentration) for each initial category, labeled by control type, process type, and
company M1 through M4) are not reproduced here.]

Figure 6: Box-and-Whisker Plot for Monomer Industry Categories
-------
[SAS Univariate Procedure output (moments, quantiles, and extremes of log(concentration) for
all 85 monomer industry observations combined, with a box-and-whisker plot showing two high
outliers) is not reproduced here.]

Figure 7: SAS Output for All Monomer Industry Categories Combined
-------
An ANOVA of only the main effects (the important exposure parameters identified in Step 4) is
recommended. That is, for the purpose of identifying which factors to retain for the definition of the
final categories, examination of the contributions of the factors themselves and not their interactions
should be sufficient. The presence of an interaction means that the effect of one factor is not the same
across all the values of some other factor or factors. While such interactions may exist, it may be
difficult to evaluate them if there are relatively few distinct combinations of factors for which we have
observations. A statistician should be consulted to determine the effect of ignoring interaction terms in
any particular case that appears to be problematic.
Although, in Step 4, the engineer was encouraged to identify explanatory exposure variables
(e.g., control type, job title, etc.) as opposed to blocking variables (e.g., company or industry), inclusion
of some blocking variables in the ANOVA can help to avoid potential difficulties. The inclusion of
blocking variables in ANOVAs is typically recommended so as to account for sources of variability that
are not otherwise accounted for by the explanatory variables, especially when there are known or
suspected differences across the units that are being observed that cannot be controlled. Blocking by
company, for example, can make the test of control type more sensitive, if there are company-to-company
differences that cannot otherwise be factored out. Moreover, problems of correlation (e.g., observations
obtained at one date being more closely related to one another than they are to observations from another
date, even if the observations came from the same plant and process type) might be minimized by
blocking, especially blocking by calendar time if the concentration measurements have been collected over
a relatively long period of time. Blocking may not be the ideal solution (nested ANOVAs might be
considered—see Appendix B), but a simple main effects ANOVA with suitable blocking factors may be
sufficient for the purposes of determining which factors to retain for group definition. Again,
consultation with a statistician is recommended. Moreover, if large block-to-block differences are
observed, the engineer may find it useful to determine if there are some explanatory variables that might
account for those differences.
For each of the factors in the ANOVA, whether it is an explanatory or a blocking variable, the
result of interest will be the F-test that compares the variability in concentrations accounted for by that
factor to the "error" variability. The error variability (assessed by the mean squared error) measures the
inherent randomness of observations within groups. When differences in means across the groups defined
by the factor under consideration are large relative to the within-group variability, then the F-test of that
factor returns a significant result. This suggests that the factor is indeed important and should be
retained for defining exposure groups. A significant result can be defined as an F-test with an associated
p-value less than 0.05. The determination of significance is dependent on sample size, so it may be
appropriate to adjust the 0.05 cut-point as a function of sample size. For small sample sizes, a larger p-
value might be warranted; for larger sample sizes, a smaller p-value could be used. A statistician should
be consulted if such adjustments are considered.
It is recommended that the partial sums of squares be used for the F-tests of significance. These
sums of squares (called Type III sums of squares in the SAS output) are considered by many statisticians
to be the most desirable. Such sums of squares are not sensitive to the order of the factors in the model.
The sums of squares for one factor account for the effects of all other factors. Moreover, they are not
functions of the numbers of observations per group. All these features make the partial sums of squares
62
-------
appropriate for the purposes of determining how to refine the initial grouping by ignoring some of the
exposure parameters.
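For illustration, a main-effects ANOVA with partial (Type III) sums of squares can also be produced
outside of SAS. The sketch below is not the original analysis; the input file name and the column
names (company, process, control, conc_ppm) are hypothetical.

    # Sketch: main-effects ANOVA with partial (Type III) sums of squares.
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    df = pd.read_csv("exposure_matrix.csv")       # hypothetical input file
    df["logconc"] = np.log(df["conc_ppm"])        # Equation 1 transform

    # Sum-to-zero contrasts so that Type III sums of squares are meaningful.
    model = ols("logconc ~ C(company, Sum) + C(process, Sum) + C(control, Sum)",
                data=df).fit()
    print(sm.stats.anova_lm(model, typ=3))        # F value and Pr > F per factor

Factors whose p-values fall below the chosen cut-point (0.05 above) would be retained for defining
exposure groups.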
EXAMPLE
The SAS output for the ANOVA of the monomer industry groups is displayed in
Figure 8. The last column of the output shows the p-values associated with the 3 factors being
considered: company, process type, and control type. All three of the p-values exceed 0.05 for
the partial sum of squares, suggesting that none of them significantly account for observed
differences in concentrations. Rather than removing all of the factors from consideration,
however, it was decided to first remove only the blocking variable, company, to see what effect
this would have on the other two factors.
Figure 9 shows the ANOVA results when only process type and control type are
considered. In this case, both of those factors contribute strongly to observed differences in
exposure. The lack of significance for those factors when company was included illustrates a
difficulty that can be encountered when there are relatively few observations and factors with
many values: there is confounding (overlap) of the effects and the significance of one or more
of them may be masked. Because we were not interested in company per se and were willing to
remove it from consideration, the importance of process type and control type could be
revealed. Both factors are retained for redefining groups in the monomer industry.
For the polymer industry groups (Figure 10), the company blocking variable and the
process type parameter were highly significant but the control type was not. This suggests that
control type can be ignored in the polymer industry. Apparently, the differences between the
controlled and uncontrolled work areas did not result in significant differences in exposure,
when the other factors of company and process type were considered. The fact that company
was a significant factor suggests that other differences between companies, in addition to control
technologies, are contributing to different exposure levels. At this point in time, the relevant
differences among companies have not been identified, so company is retained as a factor used
to define exposure categories.
63
-------
General Linear Models Procedure

Class Level Information

Class     Levels  Values
COMPANY        4  M1 M2 M3 M4
PROCESS        5  Control room  Lab  Loading  Process area  Tank farm
CONTROL        6  1 2 3 4 5 6

Number of observations in data set = 85

Dependent Variable: LOGCONC   Log(concentration)

Source            DF  Sum of Squares   Mean Square  F Value  Pr > F
Model             12    117.30527971    9.77543998     3.60  0.0003
Error             72    195.28116513    2.71223840
Corrected Total   84    312.58644484

R-Square       C.V.    Root MSE  LOGCONC Mean
0.375273  -183.4455  1.64688749   -0.89775311

Source    DF  Type III SS  Mean Square  F Value  Pr > F
COMPANY    3  19.11674454   6.37224818     2.35  0.0796
PROCESS    4  22.57606702   5.64401676     2.08  0.0922
CONTROL    5  18.94416819   3.78883364     1.40  0.2357

Figure 8: SAS Output for Test of Company, Process Type, and Control Type in Monomer Industry
-------
General Linear Models Procedure

Class Level Information

Class     Levels  Values
PROCESS        5  Control room  Lab  Loading  Process area  Tank farm
CONTROL        6  1 2 3 4 5 6

Number of observations in data set = 85

Dependent Variable: LOGCONC   Log(concentration)

Source            DF  Sum of Squares   Mean Square  F Value  Pr > F
Model              9     98.18853517   10.90983724     3.82  0.0005
Error             75    214.39790968    2.85863880
Corrected Total   84    312.58644484

R-Square       C.V.    Root MSE  LOGCONC Mean
0.314116  -188.3314  1.69075096   -0.89775311

Source    DF  Type III SS  Mean Square  F Value  Pr > F
PROCESS    4  49.48315238  12.37078810     4.33  0.0033
CONTROL    5  57.55457757  11.51091551     4.03  0.0027

Figure 9: SAS Output for Test of Process Type and Control Type in Monomer Industry
-------
General Linear Models Procedure

Class Level Information

Class     Levels  Values
COMPANY        5  P1 P2 P3 P4 P5
PROCESS       12  Control room  Crumbing and dry  Laboratory  Maintenance  Packaging
                  Polymerization o  Process area  Purification  Solutions and co
                  Tank farm  Unloading area  Warehouse
CONTROL        2  1 2

Number of observations in data set = 431

Dependent Variable: LOGCONC   Log(concentration)

Source            DF  Sum of Squares   Mean Square  F Value  Pr > F
Model             16   1279.36507409   79.96031713    47.59  0.0001
Error            414    695.61144185    1.68022087
Corrected Total  430   1974.97651593

R-Square       C.V.    Root MSE  LOGCONC Mean
0.647787  -47.86522  1.29623334   -2.70809000

Source    DF   Type III SS   Mean Square  F Value  Pr > F
COMPANY    4  487.37213944  121.84303486    72.52  0.0001
PROCESS   11  686.59871410   62.41806492    37.15  0.0001
CONTROL    1    0.96805416    0.96805416     0.58  0.4483

Figure 10: SAS Output for Test of Company, Process Type, and Control Type in Polymer Industry
-------
E. Redefining Groups
Based on the results of the ANOVA(s), it may be possible to ignore one or more of the factors
that were originally considered for importance. The regrouping is accomplished by simply dropping the
non-significant factors (essentially pooling some groups).
EXAMPLE
For the monomer industry groups, ignoring the company parameter and reclassifying results
in a drop to 11 groups, from the initial 17. Unfortunately, for the polymer industry group,
elimination of control type from the definition of the groups does not reduce the number of groups
for which descriptive statistics are required. Each of the initial groups could have been completely
defined by company and process type alone (i.e., no process type within a company had more than
one control type in place). Thus, the 41 initial polymer industry groups are retained for calculation
of descriptive statistics in Step 17.
The groups that are carried through to Step 16 are listed here:
Monomer process area, control 1, N=6
Monomer process area, control 2, N=21
Monomer control room, control 1, N=10
Monomer loading area, control 1, N=3
Monomer loading area, control 2, N=8
Monomer lab, control 2, N=3
Monomer lab, control 3, N=9
Monomer lab, control 4, N=6
Monomer lab, control 5, N=7
Monomer lab, control 6, N=7
Monomer tank farm, control 1, N=5
P1, Crumbing and drying, N=9
P1, Lab, N=10
P1, Maintenance, N=34
P1, Packaging, N=30
P1, Polymerization or reaction, N=6
P1, Process area, N=5
P1, Purification, N=6
P1, Solutions and coagulation, N=9
P1, Tank farm, N=5
P1, Warehouse, N=2
P2, Control room, N=6
P2, Crumbing and drying, N=7
P2, Lab, N=14
P2, Maintenance, N=9
P2, Packaging, N=6
P2, Polymerization or reaction, N=29
P2, Solutions and coagulation, N=5
P2, Tank farm, N=3
P3, Lab, N=3
P3, Maintenance, N=4
P3, Polymerization or reaction, N=18
P3, Solutions and coagulation, N=4
P3, Tank farm, N=9
P3, Unloading area, N=2
P4, Crumbing and drying, N=13
P4, Lab, N=17
P4, Maintenance, N=7
P4, Packaging, N=20
P4, Polymerization or reaction, N=7
P4, Solutions and coagulation, N=3
P4, Tank farm, N=8
P4, Warehouse, N=11
P5, Crumbing and drying, N=6
P5, Lab, N=8
P5, Maintenance, N=16
P5, Packaging, N=23
P5, Polymerization or reaction, N=20
P5, Purification, N=12
P5, Solutions and coagulation, N=12
P5, Tank farm, N=6
P5, Warehouse, N=7
67
-------
STEP 16: TREATMENT OF TYPE 2 DATA
Categories with insufficient Type 1 data are identified and may be supplemented with Type 2 data
(Figure 11). Type 2 data should only be added for those categories that require it, and Type 3 data
should never be added.
A sample size of 6 is a common minimum cited in the literature (Patty, 81; Hawkins, 91) for
calculation of simple descriptive statistics. The addition of Type 2 data is considered only for groups
having fewer than six samples.
A summary of the Type 2 data not used in the statistical analysis will be prepared, similar to the
summary of Type 3 data completed in Step 12.
A. Considering Addition of Type 2 Data
There will be a "trade off" that must be carefully considered when faced with a group with small
sample size. The addition of data points will tend to improve the estimation of the descriptive
statistics desired, all else being equal. However, when Type 2 data are all that are available for boosting
sample sizes, all things are not equal. The Type 2 data are not as good as the Type 1 data considered
heretofore, typically because the Type 2 data lack information about some important parameter or because
some substantial uncertainty is associated with the measurements. In some instances or for some
categories, the addition of such Type 2 data may not be desirable, even when sample sizes are low,
because the additional uncertainty is considered to outweigh the benefits of increased sample size. It may
be the case that a sample size of 5, for example, is preferable to adding one or more Type 2 data points
because the information that was missing from the Type 2 data, and the assumptions made in order to
use the Type 2 data, may have a substantial impact on the applications intended by the end user. The
decision, therefore, must consider the end user needs and how sample size and assumptions relate to those
needs.
B. Adding Type 2 Data
When Type 2 data are added to the data set, a record of that addition and the associated
assumptions must be added to the ongoing list of uncertainties and assumptions. The impact of the
assumptions and uncertainties will be assessed in Step 18.
C. Summary of Remaining Type 2 Data
Whatever Type 2 data have not been included for statistical analysis should be summarized. The
summary may be similar in nature to the summary of the Type 3 data (Step 12), but a slightly more
quantitative report may be possible for some Type 2 data. This report on the Type 2 data can be used
68
-------
or referred to in the presentation of results, as a supplement to the statistical information based on the
Type 1 data (supplemented as needed by Type 2 data).
EXAMPLE
In the example data set, 12 groups resulting from processing in Step 15 had Type 1 data
sample sizes less than 6. For one of those groups, monomer loading area with control 1 (N=3),
additional Type 2 data were located (Table 2). These data were considered to be Type 2 data
because of known biases in the measurement procedure and assumptions that were made about the
correction factor to apply to adjust for that bias. Nevertheless, it was possible to estimate values
for the samples, as shown in Table 2. The eleven Type 2 values were added to the Type 1 data
of this group, because the two sets of values appeared to be generally consistent and the effect of
uncertainty about the Type 2 values was considered to be offset by the advantage of increasing
sample size for this group. The inclusion of these data is noted on the list of uncertainties and
assumptions.
No other Type 2 data were available to boost sample sizes for the other eleven groups with
small sample size. These groups will be treated appropriately in subsequent steps.
69
-------
[Flow diagram not reproduced. Starting from the groupings from Step 15, it asks "Are there
groups with fewer than 6 Type 1 observations?" and, if so, "Is it appropriate to add Type 2
data to small groups?"; additions of Type 2 data update the exposure matrix and the
uncertainty/assumptions list.]

Figure 11. Flow Diagram for Step 16 (Treatment of Type 2 Data).
-------
Table 2. Type 2 Data Used in Statistical Analysis

Plant                                          Control   Sample          8-hr TWA
ID    Industry  Process type  Job title        type (a)  duration (min)  (ppm)     Control description
A1    Monomer   Loading area  Process technician  1      415             0.50      Magnetic gauge
A1    Monomer   Loading area  Process technician  1      428             0.30      Magnetic gauge
A1    Monomer   Loading area  Process technician  1      427             0.10      Magnetic gauge
A1    Monomer   Loading area  Process technician  1      474             0.90      Magnetic gauge
A2    Monomer   Loading area  Process technician  1      260             2.80      Magnetic gauge
A2    Monomer   Loading area  Process technician  1      442             3.10      Magnetic gauge
A2    Monomer   Loading area  Process technician  1      443             0.80      Magnetic gauge
A3    Monomer   Loading area  Process technician  1      459             7.50      Magnetic gauge
A3    Monomer   Loading area  Process technician  1      484             0.60      Magnetic gauge
A3    Monomer   Loading area  Process technician  1      474             2.40      Magnetic gauge
A3    Monomer   Loading area  Process technician  1      446             1.70      Magnetic gauge

(a) Control Type 1 is "controlled," as in Table 1.
71
-------
STEP 17. CALCULATE DESCRIPTIVE STATISTICS FOR EACH GROUP
For each group defined in the previous steps, means and standard deviations, as well as geometric
means and geometric standard deviations will be estimated. Because no tests have been conducted to
determine the nature of the distributions of concentrations within the groups, relatively simple and
consistent estimators of those parameters are recommended. This step describes the calculations
necessary for estimating the descriptive statistics.
The sample mean and sample standard deviation are consistent estimators of the mean and
standard deviation, respectively. In the case of normality, they are also unbiased estimators. The sample
mean is given by Equation 2.
£ *> Equation 2
MEAN - J^— ^
n
where:
MEAN = sample mean
x, =• a data point
n =» number of data points
The sample standard deviation is the square root of VAR, SD = (VAR)^0.5, where VAR is given
by Equation 3.

    VAR = Σ (x_i - MEAN)^2 / (n - 1)              Equation 3

where:
    VAR  = sample variance
    MEAN = sample mean
    x_i  = a data point
    n    = number of data points.
72
-------
The geometric mean and geometric standard deviation can be estimated from the log-transformed
data. Equations 4 and 5 present those estimates:

    GM  = exp {LMEAN}                             Equation 4

    GSD = exp {(LVAR)^0.5}                        Equation 5

where LMEAN and LVAR are the sample mean and sample variance of the log-transformed data points:

    LMEAN = (Σ x_Li) / n                          Equation 6

    LVAR  = Σ (x_Li - LMEAN)^2 / (n - 1)          Equation 7

and
    x_Li = a log-transformed data point
    n    = number of data points
    exp  = the antilog function
It may also be useful to calculate standard errors for the estimators of the means. The standard
error is related to the variability of the estimator of the mean. That estimator is estimating the true mean
of the distribution of observations, but because it is only an estimator, there is some uncertainty
concerning the value of the true mean. That uncertainty is characterized by the standard error.
The standard error for the sample mean, SE, is given by Equation 8.

    SE = SD / (n)^0.5                             Equation 8

where:
    SD = standard deviation estimate
    n  = number of observations.
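A brief sketch tying Equations 1 through 8 together may be helpful; the six concentration values
below are hypothetical, chosen only for illustration.

    # Sketch: the descriptive statistics of Equations 2-8 for one group.
    import numpy as np

    x = np.array([0.37, 0.52, 1.10, 0.44, 0.95, 2.10])   # ppm, hypothetical
    n = len(x)

    mean = x.sum() / n                            # Equation 2
    var = ((x - mean) ** 2).sum() / (n - 1)       # Equation 3
    sd = var ** 0.5                               # SD = (VAR)^0.5

    xl = np.log(x)                                # Equation 1
    lmean = xl.sum() / n                          # Equation 6
    lvar = ((xl - lmean) ** 2).sum() / (n - 1)    # Equation 7
    gm = np.exp(lmean)                            # Equation 4
    gsd = np.exp(lvar ** 0.5)                     # Equation 5

    se = sd / n ** 0.5                            # Equation 8
    print(mean, sd, gm, gsd, se)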
EXAMPLE
Table 3 displays the descriptive statistics calculated for the groups retained from Step 15.
That table provides the statistics for any group with a sample size of at least 6. For the groups with
sample sizes less than six, median values are all that are provided.
73
-------
Table 3: Descriptive Statistics for Groups in Example Data Set

                                                 Descriptive Statistics
                                     No. of     Mean  Std. Dev.  Geom. Mean  Geom. Std.
Group                               Samples    (ppm)      (ppm)       (ppm)        Dev.
Monomer Control Room, Control 1          10    0.448      0.724       0.236       3.106
Monomer Lab, Control 2                    3   2.610*         --          --          --
Monomer Lab, Control 3                    9    0.524      0.629       0.335       2.572
Monomer Lab, Control 4                    6    0.298      0.357       0.191       2.569
Monomer Lab, Control 5                    7    3.087      2.256       2.492       1.924
Monomer Lab, Control 6                    7    0.350      0.304       0.264       2.116
Monomer Loading, Control 1               14    1.709      1.913       1.139       2.463
Monomer Loading, Control 2                8   17.010     43.110       6.243       4.120
Monomer Process Area, Control 1           6    1.312      1.131       0.994       2.107
Monomer Process Area, Control 2          21    0.918      1.054       0.603       2.502
Monomer Tank Farm, Control 1              5   0.160*         --          --          --
P1, Crumbing and drying                   9    0.043      0.019       0.040       1.515
P1, Lab                                  10    2.909      3.348       1.908       2.505
P1, Maintenance                          34    0.857      2.310       0.298       4.277
P1, Packaging                            30    0.039      0.031       0.031       2.003
P1, Polymerization or reaction            6    0.696      1.100       0.372       3.062
P1, Process area                          6    0.118      0.122       0.082       2.346
P1, Purification                          6    4.357      2.312       3.849       1.646
P1, Solutions and coagulation             9    0.027      0.008       0.026       1.343
P1, Tank farm                             5   0.440*         --          --          --
P1, Warehouse                             2   0.020*         --          --          --
P2, Control room                          6    0.028      0.030       0.019       2.382
P2, Crumbing and drying                   7    0.032      0.013       0.030       1.485
P2, Lab                                  14    0.636      1.267       0.285       3.547
P2, Maintenance                           9    0.030      0.009       0.029       1.341
P2, Packaging                             6    0.033      0.006       0.032       1.201
P2, Polymerization or reaction           29    0.077      0.144       0.036       3.417
P2, Solutions and coagulation             5   0.030*         --          --          --
P2, Tank farm                             3   0.360*         --          --          --
P3, Lab                                   3   0.020*         --          --          --
P3, Maintenance                           4   0.020*         --          --          --
P3, Polymerization or reaction           18    0.057      0.068       0.036       2.583
P3, Solutions and coagulation             4   0.020*         --          --          --
P3, Tank farm                             8    0.112      0.231       0.049       3.626
P3, Unloading area                        2  14.600*         --          --          --
P4, Crumbing and drying                  13    0.016      0.020       0.010       2.682
P4, Lab                                  17    0.184      0.275       0.102       2.955
P4, Maintenance                           7    0.004      0.004       0.003       2.140
P4, Packaging                            20    0.006      0.006       0.004       2.374
P4, Polymerization or reaction            7    0.003      0.001       0.003       1.180
P4, Solutions and coagulation             3   0.003*         --          --          --
P4, Tank farm                             8    2.366      4.203       1.161       3.299
P4, Warehouse                            11    0.004      0.002       0.004       1.627
P5, Crumbing and drying                   6    0.055      0.031       0.048       1.697
P5, Lab                                   8    3.972      3.035       3.156       1.970

* Values marked by asterisks are medians for groups with less than 6 observations.
-------
Table 3 (continued): Descriptive Statistics for Groups in Example Data Set

                                                 Descriptive Statistics
                                     No. of     Mean  Std. Dev.  Geom. Mean  Geom. Std.
Group                               Samples    (ppm)      (ppm)       (ppm)        Dev.
P5, Maintenance                          16    1.200      1.253       0.830       2.360
P5, Packaging                            23    0.058      0.034       0.050       1.730
P5, Polymerization or reaction           20    0.740      0.886       0.474       2.568
P5, Purification                         12    9.523      6.727       7.778       1.889
P5, Solutions and coagulation            12    0.082      0.047       0.071       1.709
P5, Tank farm                             6    3.020      1.750       2.613       1.713
P5, Warehouse                             7    0.045      0.015       0.043       1.382

* Values marked by asterisks are medians for groups with less than 6 observations.
-------
STEP 18: TREAT UNCERTAINTIES, ASSUMPTIONS, AND BIASES
In the course of completing some previous steps, uncertainties, assumptions, and biases will have
been compiled in an ongoing list. The listing of uncertainties, assumptions, and biases will be treated
in this step to provide important information to the end user. Evaluating uncertainty, assumptions, and
biases provides a sense of the integrity of the results, whether significant gaps exist in the available data
or information upon which the assessment is based and whether decisions made on the basis of the data
will be tenuous. In addition, an uncertainty analysis provides information to better focus resources
needed to refine the assessment and improve (reduce) the uncertainty (EPA, 92).
This step describes procedures for the treatment of data limitations imposed by uncertainties,
assumptions, and biases. To the extent possible, those procedures will be quantitative; sensitivity analyses
and confidence limit calculations are examples of quantitative approaches. The EPA Exposure
Assessment Guidelines (EPA, 92) and Hornung (Hornung, 91) contain additional methods for quantifying
uncertainty. In many cases, however, treatment may be qualitative, when quantification is not possible.
Because this step is vital to a risk assessment and the management decisions associated with it, and
because it may be difficult to execute, even a qualitative discussion of uncertainty will be extremely
important.
A. Sensitivity Analysis
Sensitivity analysis can be used to test the effect of uncertainty or assumptions on the results, over
the expected range of the uncertain or assumed values. The sensitivity analysis involves fixing the value
for one variable at a credible lower bound while the other variables remain at their "best-estimate" values,
and then computing the results. Then a credible upper bound value for the one variable is used while
the other variables remain at their "best-estimate" values, and again the results are computed. Both sets
of results are evaluated, over all uncertainties and assumptions (i.e., those relating to values of the
observations used in the calculations), to determine which variables have the greatest impact on the
assessment of exposure. Such analyses may also help focus resources for further refinement of the
assessment. Since a sensitivity analysis does not provide any information on the likelihood of the
variables assuming any particular values in their ranges of values, the analysis is most useful in screening-
level assessments.
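A minimal sketch of such a sensitivity analysis, here for the nondetect assumption treated in the
example later in this step; the detection limit and the detected values are hypothetical.

    # Sketch: recompute a group mean with nondetects set to L/4, L/2,
    # and L/sqrt(2).  All values are hypothetical.
    import numpy as np

    detected = np.array([0.37, 0.52, 1.10, 0.95])   # ppm, hypothetical
    L, n_nd = 0.18, 2                               # detection limit; number of nondetects

    for label, nd in [("L/4", L / 4), ("L/2", L / 2), ("L/sqrt(2)", L / 2 ** 0.5)]:
        sample = np.concatenate([detected, np.full(n_nd, nd)])
        print(label, round(sample.mean(), 3))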
An approach known as Monte Carlo simulation can be used to quantitatively combine the
contributions of various uncertainties. If ranges and/or distributions for the uncertain parameters can be
specified, then values from those distributions can be sampled repeatedly, with exposure descriptive
statistics recalculated with each repetition, to develop a "picture" of the distribution of descriptive statistic
values. Monte Carlo simulation is a computer-intensive approach that can handle complex systems and
combinations of many parameters. The user should consult a statistician if Monte Carlo approaches are
to be considered.
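As a sketch of the idea only (the uniform distribution for nondetects, and all data values, are
assumptions made for illustration):

    # Sketch: Monte Carlo "picture" of the mean when each nondetect is
    # drawn uniformly between 0 and the detection limit L.
    import numpy as np

    rng = np.random.default_rng(0)
    detected = np.array([0.37, 0.52, 1.10, 0.95])   # ppm, hypothetical
    L, n_nd, reps = 0.18, 2, 10000

    means = np.empty(reps)
    for i in range(reps):
        nd = rng.uniform(0.0, L, size=n_nd)         # sampled nondetect values
        means[i] = np.concatenate([detected, nd]).mean()

    print(np.percentile(means, [5, 50, 95]))        # distribution of the mean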
76
-------
Where limited data exist, such as for a new chemical, comparison with similar chemicals
(surrogates) or modeling may be used to estimate concentrations for the chemical of interest
(Step 12 describes methods for treatment of Type 3 data). A sensitivity analysis can address uncertainty
in the following manner: the model is run using a range of expected values for model parameters as in
Monte Carlo simulation discussed above; changes in the estimated concentrations for different input
parameter values are a function of the sensitivity to the model parameters and of the degree of uncertainty
associated with the parameter values. A more complete evaluation of uncertainty due to modeling would
be to consider alternative models and ranges for their input parameter values.
B. Confidence Intervals
Confidence intervals can be calculated to quantify the uncertainty associated with estimates of
summary statistics. In particular, one is often interested in the uncertainty concerning the mean exposure.
As discussed in Step 17, the standard error of the mean characterizes the variability of the estimate of
the mean and is the basis for confidence limit calculations for the mean. Confidence limits address
uncertainty associated with sampling error, not other sources of uncertainty.
For a normal distribution, a 90% confidence interval for the mean extends from 1.645 standard
errors below the estimator of the mean to 1.645 standard errors above the mean estimator. A
95% confidence interval is ± 1.96 standard errors, and a 99% confidence interval is ± 2.58 standard
errors around the estimator of the mean. The values 1.645, 1.96, and 2.58 are the multipliers of the
standard errors that are used to derive confidence intervals corresponding to three levels of confidence
(90%, 95%, and 99%, respectively). In practice, one does not know what the true standard error is any
more than one knows what the true mean is. To account for this added level of uncertainty, the values
for the multipliers of the standard error are increased, the degree of the increase depending on the sample
size.
A particularly common situation for confidence limit calculation is for a normal distribution mean.
In that case, multipliers for the standard error can be found in a table of T distribution percentiles. Those
percentiles depend on the sample size. For example, for a normal distribution with an
estimated mean of 5 ppm, a standard deviation of 1.5, and a sample size of 25, the resulting 95%
confidence interval for the mean ranges from 5 - 2.064(1.5/5) to 5 + 2.064(1.5/5), i.e., from 4.4
to 5.6 ppm. In that calculation, (1.5/5) is the standard error estimate from Equation 8 (see Step 17),
with 5 being the square root of the sample size, and
2.064 is the 97.5th percentile of the T distribution with 24 degrees of freedom (the estimates of standard
deviation and standard error have degrees of freedom equal to the sample size minus one). The use of
the 97.5th percentile results in 2.5% probability above and 2.5% probability below the confidence
interval, i.e., a 95% confidence interval.
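The same interval can be checked with any statistics package; a sketch in Python, using the figures
from the worked example above:

    # Sketch: the 95% confidence interval worked out above
    # (mean 5 ppm, SD 1.5, n = 25).
    from scipy import stats

    mean, sd, n = 5.0, 1.5, 25
    se = sd / n ** 0.5                        # standard error, Equation 8
    t = stats.t.ppf(0.975, df=n - 1)          # 2.064 for 24 degrees of freedom
    print(mean - t * se, mean + t * se)       # approximately 4.4 to 5.6 ppm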
Even though we have not tested the groups defined in Step 15 to see if they are normal or not,
the calculations outlined above should hold approximately, since the sample mean is approximately
normal no matter what the distribution of the underlying observations may be. The adequacy of the
approximation depends on the sample size and on the extent to which the standard deviation estimate
divided by the square root of the sample size approximates the standard error of the mean. Appendix B
77
-------
presents additional material on confidence limits, especially as related to means of lognormal
distributions.
C. Quantification of Bias
If the data were not statistically sampled, the results may be biased. This bias is separate from
and should not be confused with bias in the data measurement which can be defined as a systematic error
inherent in a method or caused by some feature of the measurement system (EPA, 92). Statistical bias
is caused by the sample population not being representative of the population under study. It should be
noted that data collected from other agencies and published sources are almost never randomly selected,
although a particular bias may be difficult to identify. Despite the difficulty, it is extremely important
to identify potential biases and clearly present them in the results presentation. Furthermore, if random
sampling was carried out only in a subpopulation, the summary statistics may apply only to that
subpopulation and may not be representative of a larger group. There are no quantitative methods to
extend the sample results beyond the bounds of the subpopulation.
Bias can also occur because of inappropriate selection of sample location, sample time, or workers
to be sampled. For example, measurements of peak exposures are intended to measure the period of
highest exposure for that job category. Therefore, if a time period that does not represent maximum
exposure, or an individual in a job category that would not represent peak exposure, is measured, then
this selection would cause the measurements not to be representative of peak exposure.
Quantification of biases is always difficult and may be beyond the scope of the exposure
assessment. If quantification is not possible, biases should be qualitatively described in the results
presentation. One method of quantification is to segregate the potentially biased data and compare the
exposures with the remaining data sets. Where a large quantity of data is available, this may allow
quantification of the bias. Where only limited data are available, such comparisons may not yield
dependable results.
Another method is to try to quantify the bias through use of other information. For example, if
the data are biased because the plants are "well controlled," then information gathered from other sources
or estimated from the monitoring data may be used to estimate the control efficiency and the distribution
of controls in the industry. This, in turn, can be used to quantify the bias. Likewise, if only large
facilities were surveyed and other data indicate differences in control between large and small facilities,
the effect on exposure estimates may be estimable.
D. Weighting Factors to Mitigate Bias
The most common way to mitigate known quantifiable biases is through the use of a weighting
factor. Weighting factors are used to adjust the influence of different pieces of data so that each reflects
its weight in the population being judged as a whole. For example, when determining an annual exposure, values
may be weighted by the number of days annually that a worker is exposed. Weighting can also be used
78
-------
to calculate averages within a job category or other subpopulation. Weighting should always be clearly
explained so that the user is aware that the descriptive statistics are based on weighted data. Weighting
factors used to mitigate bias should be clearly presented.
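A minimal sketch of the annual-exposure weighting described above; the TWA values and the
exposure days are hypothetical.

    # Sketch: mean 8-hour TWA weighted by days of exposure per year.
    import numpy as np

    twa = np.array([0.50, 1.20, 0.80])    # ppm, hypothetical
    days = np.array([250, 50, 120])       # exposure days per year, hypothetical

    print((twa * days).sum() / days.sum())    # weighted mean, ppm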
EXAMPLE

Sensitivity Analysis

For the monomer process area/control 2 group and for the P4/polymerization and reaction
group, sensitivity analyses were performed to quantify the effect of the assumption that undetected
measurements were equal to half the detection limit, L, on the calculation of the descriptive
statistics. A lower bound for the value to be used for nondetects was set at L/4. The upper bound
was set at L/√2, another common choice for the value of a nondetect. For the monomer process
area/control 2 group, the resulting descriptive statistic estimates were as follows:

                      Nondetected value:
    Statistic      L/4       L/2       L/√2
    MEAN           0.91      0.92      0.92
    SD             1.06      1.05      1.05
    GM             0.59      0.61      0.61
    GSD            2.5       2.5       2.5

The estimates of the means and standard deviations for this group were very insensitive to the
values for the nondetects. Only five of the 21 observations were below detection limits.

For the P4/polymerization group, the results were as follows:

                      Nondetected value:
    Statistic      L/4        L/2        L/√2
    MEAN           0.0016     0.0033     0.0046
    SD             0.00027    0.00054    0.0007
    GM             0.0016     0.0032     0.0046
    GSD            1.2        1.2        1.2
79
-------
The change of the mean for the P4/polymerization group was considerably greater than that
observed in the monomer process area/control 2 group, ranging from 52% below to 39% above the
initial estimate of the mean. All seven of the P4/polymerization group observations were below
detection limits. Clearly, the sensitivity of the results, in this case to the assumed values of
nondetects, can vary from group to group.
Quantification and presentation of the results of sensitivity analyses and their variations
across groups will be useful for subsequent risk assessment/risk management decisions. The results
of such sensitivity analysis can be used by the risk assessor/risk manager to determine if his or her
actions and decisions could be subject to change as a result of uncertainty concerning relatively low
concentrations (those below the limit of detection). If they are subject to change, the implications
of those changes can be determined or the decisions re-evaluated.
Other Means of Mitigating Bias
The collection of the data used in this example analysis provides an example of the
identification of a bias in the collection method and how the bias was mitigated by using a different
method. The potential for bias exists if the collection or analytical method has not been validated
over the entire range of exposures. NIOSH Method S-91 for the example chemical illustrates this
(NIOSH, 84). This method was developed to meet compliance monitoring needs associated with
the OSHA standard at the time of 1,000 ppm (2,000 mg/m³). The method was validated over a
range of concentrations from 481 to 2,237 ppm (1,065 to 4,950 mg/m³). Because of new animal
test data indicating toxicity at much lower concentrations, and the fact that industry was controlling
exposures to much lower levels, the existing method had to be reviewed. It was found that the S-91
method poorly separated the example chemical from other C4 hydrocarbons. This and other
possible interferences probably systematically overestimated the example chemical content of the
samples at lower concentrations.
In the case of the example chemical, a new extraction method was developed that improved
the sensitivity and selectivity of the method and new measurements were taken. Where sufficient
time or resources are not available, correction factors may be developed and the overestimate at
lower concentrations adjusted by these factors. Any such adjustments should be clearly identified
in the data and the results. The correction factor values are themselves subject to uncertainty and
should be included in the list of uncertainties/assumptions for presentation to the end user.
80
-------
STEP 19: PRESENT RESULTS
Because the results of the analysis may need to be used by engineers, economists, and other
decision-makers who are not statisticians, presentation techniques will to a large extent determine their
usefulness. To properly use the results of the analysis, the end user must know the purpose, scope, level
of detail and approach used in the assessment. In addition, key assumptions used, the overall quality of
the assessment (including uncertainties in the results), and the interpretation of data and results are as
important as estimates of exposure. The results must also be presented in a form that corresponds to the
modeling or other needs of the end user. Finally, it is important that the original data values and all
important variables be presented in an appendix to the report. This step describes four aspects of results
presentation:
A) Characterization of exposure (narrative explanation)
B) Presentation of descriptive statistics
C) Presentation of assumptions and uncertainties
D) Presentation of original data
A. Characterization of Exposure
The characterization of exposure is the overall narrative which consists of discussion, analysis
and conclusions that summarize and explain the exposure assessment. It provides a statement of the
purpose of the assessment, the scope, level of detail, and approach used in the assessment. It presents
the estimates of exposure by route of exposure (e.g., inhalation, dermal) for the population,
subpopulation, or individuals, in accordance with the needs identified by the user. It should also include
an overall evaluation of the quality of the assessment, and a discussion of the degree of confidence the
engineer has in the estimates of exposure and the conclusions drawn. The data and results should be
presented in keeping with the terms defined in the EPA Exposure Assessment Guidelines (EPA, 92) for
bounding estimates, reasonable worst case estimates, worst case estimate, maximally exposed individual,
maximum exposure range, etc.
The engineer should include a discussion of whether the scope and level of detail were sufficient
to meet the needs of the user. If user needs were not met, it is preferable to identify the tasks or
mechanisms (monitoring, collecting additional information, etc.) that will be needed in order to fully meet
the needs of the user, and how this lack of data or information impacts the assessment. A general
discussion of research or additional data to improve the assessment is also quite useful; data gaps should
be identified in order to focus further efforts to reduce uncertainty. An appendix may be a suitable place
for this discussion.
81
-------
The methods used to quantify exposure (e.g., models, use of surrogate data, use of monitoring
data) should be clearly identified in the exposure characterization. A discussion of the strengths and
weaknesses of the methods and of the data used should be included.
When Type 2 and Type 3 data were available but not used for the quantitative characterization
of exposure, summaries of the information available from the Type 2 and Type 3 data bases should be
included. Recall that the summaries of the Type 2 data may be more quantitative in nature and may
provide some numerical estimates. The numerical estimates and qualitative appraisals of the Type 2 and
Type 3 data can be compared with the summary statistics from Type 1 data (if available) to suggest
discrepancies or potential differences. If the Type 2 and/or Type 3 results suggest exposures that appear
to be different from the results of analyzing the Type 1 data, potential explanations for the differences
should be provided.
The end user will sometimes request a characterization of exposure for the entire population (e.g.,
all workers in a given industry). The identification of subpopulations defined by the important exposure
parameters entails that descriptive statistics per se probably should not be derived for the entire
population, say by combining the descriptive statistics for each category (although, see Appendix B for
some issues related to such combinations). The best overall summary may be the presentation of the
descriptive statistics for each category, perhaps in graphical format. Such a presentation preserves much
more information than a formal, quantitative combination of means, for example, over all the categories.
In conjunction with a prose description of the variety of circumstances (e.g., of the many
combinations of factors that affect exposure level), such tabular and graphical representations should
convey the information necessary for risk assessment and risk management decisions. Semi-quantitative
summaries (e.g., presentation of the range of mean exposure levels) may also be useful.
B. Presentation of Descriptive Statistics
The results should be presented in accordance with the needs of the end user as defined in Step 3.
The end user should have identified the required descriptive statistics and presentation methods.
Where sufficient data are present, the plotting of the data on an appropriate scale in addition to
the accompanying descriptive statistics is usually the best presentation method. Where box-and-whisker
plots were used to identify outliers, these plots can be presented in an appendix. It is also useful to
present a characterization of the data by the percentage of nondetected values and percentage of values
above the detection limit, etc.
There may be some Type 1 data groups that had few observations and for which descriptive
statistics were not calculated. These groups must be verbally summarized and the indications of the
degree of exposure suggested by these groups compared and contrasted to the quantitative estimates for
the other Type 1 data groups. This comparison and contrast is similar to that provided for the Type 2
and Type 3 data sets. Qualitative and semi-quantitative results from the data not used to derive
quantitative estimates must be compared, to the degree possible, with the quantitative results. Possible
explanations for apparent discrepancies should be provided.
82
-------
EXAMPLE
Table 4 presents summary information for all of the groups considered in the example.
Although most users wish to receive the data in tabular form, some may wish to have
graphic presentations also provided. Figure 12 provides a box-and-whisker plot of the data for the
monomer industry groups. Figure 13 provides an example of a bar graph for several of the groups,
comparing mean and maximum concentrations with several target levels.
C. Presentation of Assumptions and Uncertainties
A figure summarizing and clearly presenting all assumptions and uncertainties (treated in Step 18)
should be accompanied by a more complete explanation in the text. Wherever possible, the effect of
those assumptions and uncertainties on the results of the analysis will be quantified (see Step 18). Figure
14 presents an example of how this information may be presented; it may be considered to be the product
of the cumulative listing of assumptions and uncertainties produced from the various steps of the exposure
assessment.
The first column of Figure 14 presents a description of the uncertainty. The uncertainties
range from the length of the work day to the actual concentration when non-detected values are recorded.
The second column presents the associated assumption if one was made. The third column presents an
estimate of the range of possible values for the assumed value. Finally, column 4 presents an estimate
of the effect of the assumption on the results. Some of the effects presented in the last column may have
to be group-specific.
83
-------
Table 4: Descriptive Statistics Presentation, Example Data Set

(Columns: number of exposed workers; number of samples; minimum, maximum, and median (a);
mean; standard error (b); standard deviation; geometric mean; geometric standard deviation;
number and percent of non-detects. Concentrations in ppm. ? = not legible in source.)

Group                            Workers   N     Min      Max     Median    Mean      SE      SD      GM     GSD   ND  %ND
Monomer Control Room, Control 1       70  10   0.020    1.870    0.048    0.448    0.229   0.724   0.236  3.106    6   60
Monomer Lab, Control 2                 ?   3   0.420  373.540    2.610      --       --      --      --     --     0    0
Monomer Lab, Control 3                25   9   0.000    1.960    0.340    0.524    0.210   0.629   0.335  2.572    3   33
Monomer Lab, Control 4                40   6   0.050    0.870    0.110    0.298    0.146   0.357   0.191  2.569    0    0
Monomer Lab, Control 5                45   7   0.560    6.310    2.550    3.087    0.853   2.256   2.492  1.924    0    0
Monomer Lab, Control 6                61   7   0.040    0.890    0.280    0.350    0.115   0.304   0.264  2.116    1   14
Monomer Loading, Control 1            90  14   0.100    7.500    1.100    1.709    0.511   1.913   1.139  2.463    0    0
Monomer Loading, Control 2           106   8   0.000  123.570    1.430   17.010   15.242  43.110   6.243  4.120    2   25
Monomer Process Area, Control 1      111   6   0.270    2.980    0.960    1.312    0.462   1.131   0.994  2.107    1   17
Monomer Process Area, Control 2       95  21   0.070    4.190    0.550    0.918    0.230   1.054   0.603  2.502    5   24
Monomer Tank Farm, Control 1           ?   5   0.040    1.530    0.155      --       --      --      --     --     3   60
P1, Crumbing and drying              166   9   0.014    0.071    0.040    0.043    0.006   0.019   0.040  1.515    0    0
P1, Lab                               50  10   0.014    8.330    1.210    2.909    1.059   3.348   1.908  2.505    0    0
P1, Maintenance                      110  34   0.014   11.020    0.100    0.857    0.396   2.310   0.298  4.277    0    0
P1, Packaging                         30  30   0.012    0.154    0.028    0.039    0.006   0.031   0.031  2.003    0    0
P1, Polymerization or reaction       100   6   0.035    2.710    0.060    0.696    0.449   1.100   0.372  3.062    0    0
P1, Process area                       ?   6   0.006    0.304    0.075    0.118    0.050   0.122   0.082  2.346    1   17
P1, Purification                      66   6   1.330    6.950    5.020    4.357    0.944   2.312   3.849  1.646    0    0
P1, Solutions and coagulation        260   9   0.019    0.046    0.025    0.027    0.003   0.008   0.026  1.343    0    0
P1, Tank farm                         59   5   0.113    0.962    0.436      --       --      --      --     --     0    0
P1, Warehouse                         10   2   0.014    0.020    0.017      --       --      --      --     --     0    0
P2, Control room                      19   6   0.006    0.070    0.016    0.028    0.012   0.030   0.019  2.382    2   33
P2, Crumbing and drying               40   7   0.018    0.052    0.027    0.032    0.005   0.013   0.030  1.485    0    0
P2, Lab                                ?  14   0.029    4.120    0.044    0.636    0.339   1.267   0.285  3.547    0    0
P2, Maintenance                       94   9   0.021    0.048    0.026    0.030    0.003   0.009   0.029  1.341    0    0
P2, Packaging                         25   6   0.022    0.030    0.034    0.033    0.002   0.006   0.032  1.201    0    0
P2, Polymerization or reaction       105  29   0.000    0.780    0.033    0.077    0.027   0.144   0.036  3.417    2    7
P2, Solutions and coagulation        650   5   0.015    0.030    0.028      --       --      --      --     --     0    0
P2, Tank farm                         59   3   0.123    0.436    0.362      --       --      --      --     --     0    0
P3, Lab                               45   3   0.009    0.429    0.016      --       --      --      --     --     1   33
P3, Maintenance                       74   4   0.011    0.026    0.020      --       --      --      --     --     0    0
P3, Polymerization or reaction       100  18   0.006    0.250    0.032    0.057    0.016   0.068   0.036  2.583    2   11
P3, Solutions and coagulation        460   4   0.006    0.164    0.019      --       --      --      --     --     1   25
P3, Tank farm                         41   8   0.009    0.682    0.034    0.112    0.082   0.231   0.049  3.626    0    0
P3, Unloading area                    45   2   0.770   28.510   14.640      --       --      --      --     --     0    0
P4, Crumbing and drying               24  13   0.005    0.081    0.013    0.016    0.006   0.020   0.010  2.662    4   31
P4, Lab                               60  17   0.006    0.943    0.069    0.184    0.067   0.275   0.102  2.955    3   18
P4, Maintenance                       61   7   0.006    0.013    0.003    0.004    0.001   0.004   0.003  2.140    6   86
P4, Packaging                        400  20   0.006    0.026    0.003    0.006    0.001   0.006   0.004  2.374   16   80
P4, Polymerization or reaction       204   7   0.006    0.008    0.003    0.003    0.000   0.001   0.003  1.180    7  100
P4, Solutions and coagulation        515   3   0.005    0.008    0.003      --       --      --      --     --     3  100
P4, Tank farm                         45   8   0.006   12.030    0.392    2.366    1.406   4.203   1.161  3.299    1   12
P4, Warehouse                         56  11   0.005    0.010    0.003    0.004    0.001   0.002   0.004  1.627   10   91
P5, Crumbing and drying               39   6   0.033    0.116    0.043    0.055    0.013   0.031   0.048  1.697    0    0
P5, Lab                               36   8   0.100    8.870    4.580    3.972    1.073   3.035   3.156  1.970    0    0
P5, Maintenance                       80  16   0.072    3.090    0.655    1.200    0.313   1.253   0.830  2.360    0    0
-------
Table 4 (continued): Descriptive Statistics Presentation, Example Data Set

Group                            Workers   N     Min      Max     Median    Mean      SE      SD      GM     GSD   ND  %ND
P5, Packaging                         44  23   0.014    0.144    0.042    0.058    0.007   0.034   0.050  1.730    1    4
P5, Polymerization or reaction        52  20   0.035    2.800    0.400    0.740    0.198   0.886   0.474  2.568    0    0
P5, Purification                      90  12   2.770   24.140    7.580    9.523    1.942   6.727   7.778  1.889    0    0
P5, Solutions and coagulation        555  12   0.006    0.169    0.090    0.082    0.014   0.047   0.071  1.709    1    8
P5, Tank farm                         41   6   1.070    6.010    2.760    3.020    0.714   1.750   2.613  1.713    0    0
P5, Warehouse                         30   7   0.033    0.068    0.039    0.045    0.006   0.015   0.043  1.382    0    0

(a) The minimum, maximum, and median are provided as additional descriptive statistics.
(b) Standard error measures precision of the mean.
-------
[Box-and-whisker plots of concentration (ppm, log scale) for the monomer industry groups are
not reproduced here; asterisks mark group means and plus signs mark extreme values.]

Figure 12. Box-and-Whisker Plot, Monomer Industry Groups
-------
[Bar graph comparing mean and maximum concentrations for several polymer industry groups
with three target levels; not reproduced here.]

Figure 13. Example Bar Graph for Polymer Industry Groups: Means and Maxima Compared to 3 Target Levels
-------
Uncertainty: For job category A the length of the work day is not known for 30% of the monitoring data.
Associated assumption: Length of work day assumed to be 6 hours.
Reasonable possible variance of assumption: Reasonable range is 5 to 7 hours.
Effect on results: Maximum 6% change in descriptive statistic for job category A (sensitivity analysis).

Uncertainty: Actual exposure not known for values recorded as nondetected (5% of values).
Associated assumption: A value of L/√2 was assumed. L = 1 ppm. ND = 0.71 ppm.
Reasonable possible variance of assumption: A value of L/2 could better represent actual exposure.
Effect on results: Maximum 2% change in overall descriptive statistic (sensitivity analysis).

Uncertainty: NIOSH indicates that data for industry B represent "well controlled" facilities.
Associated assumption: None made.
Reasonable possible variance of assumption: NIOSH personnel roughly estimated that exposures at well controlled facilities can be 20% lower than the industry average.
Effect on results: Descriptive statistics for industry B may underestimate exposure by up to 20% (NIOSH estimate).

Uncertainty: Plants in the industry C data set were not randomly selected; rather, all available data were used.
Associated assumption: The data set for industry C represents the industry as a whole.
Reasonable possible variance of assumption: Not quantifiable.
Effect on results: Unknown.

Uncertainty: For job category D only OSHA compliance data were used.
Associated assumption: None made.
Reasonable possible variance of assumption: Not quantifiable.
Effect on results: Facilities where OSHA complaints are made may have higher exposure than the industry as a whole (engineering judgment).

(etc.)

Figure 14. Example Format for Presentation of Assumptions and Uncertainties.
-------
EXAMPLE
No biases were identified for the Type 1 data. The only assumptions used for the Type 1
data sets were:
• The use of L/2 for the value of nondetected measurements in the calculation of descriptive statistics
• Estimated durations of tasks provided by the companies where the monitoring was done
were used to convert some values to 8-hour TWAs.
For the Type 2 data, the following bias was identified:
• Some Type 2 data were taken using the old analytical method, which may overestimate
concentrations due to interference by other C4 chemicals.
The bias associated with the Type 2 data may explain discrepancies between the Type 1 and Type 2
analysis results.
D. Present Original Data
Even though every attempt should be made to satisfy user needs, poor communication or changing
requirements may dictate changes even after the exposure assessment is finalized. Therefore, presentation
of all original data used in the calculations and all important variables associated with the data will allow
additional statistics to be calculated by the end user when required.
EXAMPLE
Appendix A presents the 516 full shift personal samples that were used in the example
calculations in this report.
90
-------
REFERENCES AND BIBLIOGRAPHY
(Aitchison, 57)    Aitchison, J. and J.A.C. Brown. The Lognormal Distribution. Cambridge
                   University Press. London. 1957.

(Armstrong, 92)    Armstrong, Ben C. Confidence Intervals for Arithmetic Means of Lognormally
                   Distributed Exposures. American Industrial Hygiene Association Journal
                   53:481-485. 1992.

(Attfield, 92)     Attfield, M.D. and P. Hewett. Exact Expressions for the Bias and Variance of
                   Estimators of the Mean of a Lognormal Distribution. American Industrial
                   Hygiene Association Journal 53:432-435. 1992.

(Bickel, 77)       Bickel, P.J. and K.A. Doksum. Mathematical Statistics. Holden-Day, Inc. San
                   Francisco, CA. 1977.

(BMDP/PC)          BMDP Statistical Software, Inc., 1440 Sepulveda Blvd., Suite 316, Los
                   Angeles, CA 90025. 213/479-7799.

(Box, 78)          Box, George E.P., William G. Hunter, and J. Stuart Hunter. Statistics for
                   Experimenters. John Wiley & Sons. New York, New York. 1978.

(Buringh, 91)      Buringh, Eltjo and Roel Lanting. Exposure Variability in the Workplace: Its
                   Implications for the Assessment of Compliance. American Industrial Hygiene
                   Association Journal 52:6-13. 1991.

(CMA, 86)          Chemical Manufacturers Association. Papers Presented at the Workshop on
                   Strategies for Measuring Exposure. December 9 and 10, 1986.

(Cochran, 63)      Cochran, William G. Sampling Techniques. John Wiley and Sons, Inc. New
                   York, New York. 1963.

(Cohen, 61)        Cohen, A.C. Tables for Maximum Likelihood Estimates: Singly Truncated and
                   Singly Censored Samples. Technometrics 3:535. 1961.

(Cohen, 78)        Cohen, Clifford, et al. Statistical Analysis of Radionuclide Levels in Food
                   Commodities. Prepared for U.S. Food and Drug Administration, Washington,
                   D.C. September 15, 1978.

(Conover, 80)      Conover, W.J. Practical Nonparametric Statistics. 2nd ed. John Wiley & Sons.
                   New York, New York. 1980.

(Corn, 79)         Corn, Morton and Nurtan A. Esmen. Workplace Exposure Zones for
                   Classification of Employee Exposures to Physical and Chemical Agents.
                   American Industrial Hygiene Association Journal 40:47-57. 1979.
-------
(Cox, 81) Cox, David C. and Paul Baybutt. Methods for Uncertainty Analysis: A Comparative Survey. Risk Analysis 1:251-258. 1981.
(Crump, 78) Crump, Kenny S. Estimation of Mean Pesticide Concentrations When Observations are Detected Below the Quantification Limit. Prepared for the Food and Drug Administration. Washington, D.C. April 9, 1978.
(Damiano, 86) Damiano, Joe. The Alcoa Sampling and Evaluation Guidelines. Presented at the Workshop on Strategies for Measuring Exposure (CMA). December 9 and 10, 1986.
(Damiano, 89) Damiano, Joe. A Guideline for Managing the Industrial Hygiene Sampling Function. American Industrial Hygiene Association Journal. July 1989.
(Daniel, 78) Daniel, Wayne S. Applied Nonparametric Statistics. Houghton Mifflin Company. Boston, Massachusetts. 1978.
(Devore, 82) Devore, Jay L. Probability and Statistics for Engineering and the Sciences. Table A.3, Standard Normal Curve Areas, p. 620. Brooks/Cole Publishing Company. Monterey, CA. 1982.
(Dixon, 83) Dixon, S.W., et al. (E.I. du Pont de Nemours). Management of Air Sampling Results. Presented at the American Industrial Hygiene Conference, Philadelphia, Pennsylvania. May 25, 1983.
(Eisenhart, 68) Eisenhart, Churchill. Expression of the Uncertainties of Final Results. Science. pp. 1201-1204. June 1968.
(EPA, 78) Environmental Protection Agency. Source Assessment: Analysis of Uncertainty - Principles and Application. EPA/600/13. Industrial Environmental Research Laboratory, Research Triangle Park, North Carolina. August 1978.
(EPA, 87) Environmental Protection Agency. The Risk Assessment Guidelines of 1986. EPA/600/8-87/045. Washington, D.C. August 1987.
(EPA, 92) Environmental Protection Agency. Exposure Assessment Guidelines. EPA/600/Z-92/001. Washington, D.C. 1992.
(Esmen, 77) Esmen, Nurtan A. and Yehia Y. Hammad. Log-Normality of Environmental Sampling Data. Journal of Environmental Science and Health A12(1 & 2), pp. 29-41. 1977.
(Hansen, 83) Hansen, Morris H., et al. An Evaluation of Model-Dependent and Probability-Sampling Inferences in Sample Surveys. Journal of the American Statistical Association, pp. 776-793. December 1983.
-------
(Hawkins, 92) Hawkins, Neil C., Michael A. Jayjock, and Jeremiah Lynch. A Rationale and Framework for Establishing the Quality of Human Exposure Assessments. American Industrial Hygiene Association Journal 53:34-41. 1992.
(Hawkins, 91) Hawkins, Neil C., et al. A Strategy for Occupational Exposure Assessments. American Industrial Hygiene Association. Akron, Ohio. 1991.
(Hoaglin, 83) Hoaglin, David D., et al. Understanding Robust and Exploratory Data Analysis. John Wiley and Sons, Inc. New York, New York. 1983.
(Hornung, 90) Hornung, Richard W. and Lawrence D. Reed. Estimation of Average Concentration in the Presence of Nondetectable Values. Applied Occupational and Environmental Hygiene 5(1):46-51. 1990.
(Hornung, 91) Hornung, Richard W. Statistical Evaluation of Exposure Strategies. Applied Occupational and Environmental Hygiene 6(6):516-520. 1991.
(IT, 91) IT Environmental Programs, Inc. Preparation of Engineering Assessments, Volume I: CEB Engineering Manual. U.S. Environmental Protection Agency/Office of Toxic Substances. Washington, D.C. Contract No. 68-D8-0112. February 1991.
(Jackson, 85) Jackson, R.A. and A. Behar. Noise Exposure - Sample Size and Confidence Limit Calculation. American Industrial Hygiene Association Journal 46:387-390. 1985.
(Johnson, 70) Johnson, Norman L. and Samuel Kotz. Continuous Univariate Distributions-1. John Wiley & Sons. New York, New York. 1970.
(Karch, 88) Karch, Nathan J. Testimony of Nathan J. Karch, Ph.D. on the Quantitative Risk Assessments Included in the Proposed Rule on Air Contaminants of the Occupational Safety and Health Administration. July 28, 1988.
(Koek, 88) Koek, Kara E., et al. Encyclopedia of Associations 1989, 3 vol. Gale Research, Inc. Detroit, Michigan. 1988.
(Koizumi, 80) Koizumi, Akio, et al. Evaluation of the Time Weighted Average of Air Contaminants with Special References to Concentration Fluctuation and Biological Half Time. American Industrial Hygiene Association Journal, pp. 693-699. October 1980.
(Lee, Undated) Lee, Shin Tao, et al. A Calculation and Auto Selection of Simple Statistical Analyses for Industrial Hygiene Data. NIOSH, Cincinnati, Ohio. Undated.
(Lemasters, 85) Lemasters, Grace K., Arch Carson, and Steven J. Samuels. Occupational Styrene Exposure for Twelve Product Categories in the Reinforced-Plastics Industry. American Industrial Hygiene Association Journal 46:434-441. 1985.
-------
(Lilliefors, 67) Lilliefors, H.W. On the Kolmogorov-Smirnov Test for Normality with Mean and Variance Unknown. Journal of the American Statistical Association 62:399-402. 1967.
(Massey, 51) Massey, Frank J., Jr. The Kolmogorov-Smirnov Test for Goodness of Fit. Journal of the American Statistical Association, pp. 68-78. 1951.
(McBride, 91) McBride, Judith B. A Study of Spatial Correlation in a Workplace Exposure Zone. Presented at the American Industrial Hygiene Association Conference and Exposition, Salt Lake City, Utah. May 23, 1991.
(Nicas, 91) Nicas, Mark, Barton P. Simmons, and Robert C. Spear. Environmental Versus Analytical Variability in Exposure Measurements. American Industrial Hygiene Association Journal 52:553-557. 1991.
(NIOSH, 77) National Institute for Occupational Safety and Health. Occupational Exposure Sampling Strategy Manual. DHEW (NIOSH) Publication No. 77-173. Cincinnati, Ohio. 1977.
(NIOSH, 84) National Institute for Occupational Safety and Health. NIOSH Manual of Analytical Methods. Third Edition. Cincinnati, Ohio. 1984.
(Oldham, 65) Oldham, P.D. On Estimating the Arithmetic Means of Lognormally-Distributed Populations. Biometrics 21:235-239. 1965.
(Olsen, 91) Olsen, Erik, Bjarne Laursen, and Peter S. Vinzents. Bias and Random Errors in Historical Data of Exposure to Organic Solvents. American Industrial Hygiene Association Journal 52:204-211. 1991.
(OSHA, 90) Occupational Safety and Health Administration. OSHA Technical Manual. Issued by OSHA February 5, 1990.
(OSHA, 85) OSHA Chemical Information File. OSHA Directorate of Technical Support. June 13, 1985.
(OSHA, Unpublished) OSHA Manual of Analytical Methods (Unpublished).
(Patty, 81) Patty, Frank A. Patty's Industrial Hygiene and Toxicology, 3rd Edition. Volumes 1 through 3: General Principles, Statistical Design and Data Analysis Requirements. John Wiley & Sons. New York, New York. 1981.
(Powell, 88) Powell, R.W. A Method for Calculating a Mean of a Lognormal Distribution of Exposures. Exxon Research and Engineering Company. Florham Park, New Jersey. 1988.
(Preat, 87) Preat, Bernard. Application of Geostatistical Methods for Estimation of the Dispersion Variance of Occupational Exposures. American Industrial Hygiene Association Journal 48:877-884. 1987.
-------
(Rappaport, 91) Rappaport, S.M. Assessment of Long-term Exposures to Toxic Substances in Air. Annals of Occupational Hygiene 35:61-121. 1991.
(Rappaport, 87) Rappaport, S.M. and S. Selvin. A Method for Evaluating the Mean Exposure from a Lognormal Distribution. American Industrial Hygiene Association Journal 48:374-379. 1987.
(Re, 85) Re, M. Microcomputer Programs for the Evaluation of Predictable Long-Term Exposure. American Industrial Hygiene Association Journal 46:369-372. 1985.
(Rock, 82) Rock, James C. A Comparison Between OSHA-Compliance Criteria and Action-Level Decision Criteria. American Industrial Hygiene Association Journal 43:297-313. 1982.
(Samuels, 85) Samuels, Steven J., Grace K. Lemasters, and Arch Carson. Statistical Methods for Describing Occupational Exposure Measurements. American Industrial Hygiene Association Journal 46:427-433. 1985.
(Searle, 92) Searle, Shayle R., George Casella, and Charles E. McCulloch. Variance Components. John Wiley & Sons. New York, New York. 1992.
(Schneider, 91) Schneider, Thomas, Ib Olsen, Ole Jorgensen, and Bjarne Laursen. Evaluation of Exposure Information. Applied Occupational and Environmental Hygiene 6:475-481. 1991.
(Selvin, 91) Selvin, S. Review of draft version of "Guidelines for Statistical Analysis of Occupational Exposure Data." Submitted to EPA Chemical Engineering Branch. November 1991.
(Selvin, 89) Selvin, S. and S.M. Rappaport. A Note on the Estimation of the Mean Value from a Lognormal Distribution. American Industrial Hygiene Association Journal 50:627-630. 1989.
(Selvin, 87) Selvin, S., et al. A Note on the Assessment of Exposure Using One-Sided Tolerance Limits. American Industrial Hygiene Association Journal 48:89-93. 1987.
(Sokal, 81) Sokal, Robert R. and James Rohlf. Biometry. W.H. Freeman and Company. New York, New York. 1981.
(SPSS/PC) SPSS, Inc., 444 N. Michigan Ave., Suite 3000, Chicago, IL 60611. 312/329-2400.
(Stoline, 91) Stoline, Michael R. An Examination of the Lognormal and Box and Cox Family of Transformations in Fitting Environmental Data. Environmetrics 2:85-106. 1991.
-------
(SYSTAT) Systat, Inc., 1800 Sherman Ave., Evanston, IL 60201. 312/864-5670.
(Tait, 92) Tait, Keith. The Workplace Exposure Assessment Expert System (WORKSPERT). American Industrial Hygiene Association Journal 53:84-98. 1992.
(Tuggle, 81) Tuggle, R.M. The NIOSH Decision Scheme. American Industrial Hygiene Association Journal 42:493-498. 1981.
(Tuggle, 82) Tuggle, R.M. Assessment of Occupational Exposure Using One-Sided Tolerance Limits. American Industrial Hygiene Association Journal 43:338-346. 1982.
(Waters, 91) Waters, Martha A., Steve Selvin, and Stephen M. Rappaport. A Measure of Goodness-of-Fit for the Lognormal Model Applied to Occupational Exposures. American Industrial Hygiene Association Journal 52:493-502. 1991.
(Waters, 90) Waters, Martha A. Some Statistical Considerations in Chemical Exposure Assessment. Ph.D. dissertation, University of California at Berkeley, California. 1990.
(Whitmore, 85) Whitmore, Roy W. Methodology for Characterization of Uncertainty in Exposure Assessments. U.S. EPA, Office of Health and Environmental Assessment. EPA/600/8-85/009. August 1985.
(Woodruff, 71) Woodruff, Ralph S. A Simple Method for Approximating the Variance of a Complicated Estimate. Journal of the American Statistical Association, pp. 411-414. June 1971.
-------
GLOSSARY OF TERMS
Accuracy - the measure of the correctness of the data, as given by the difference between the measured
value and the true value.
Sample Mean - the sum of all the measurements in the data set divided by the number of measurements
in the data set.
Bias - a systematic error inherent in a method or caused by some feature of the measurement system.
Bimodal Distribution - a probability density function with two relative maxima.
Bounding Estimate - an estimate of exposure that is higher than the exposure of the individual in the
population with the highest exposure. Bounding estimates are useful in constructing statements
such as "... exposure is not greater than" the estimated value.
Confidence Interval - a range of values that contains the true value of a parameter in a distribution a
predetermined proportion of the time if the process of determining the value is repeated a number
of times.
Descriptive Statistics - statistics that describe conditions and events in terms of the observed data; use is
made of tables, graphs, ratios, and typical parameters such as location statistics (e.g., arithmetic
mean) and dispersion statistics (e.g., variance).
Frequency Histogram - a graphical representation of a frequency distribution, typically using bars to
exhibit the frequency or relative frequency of occurrence of each value or group of values in a
data set.
Geometric Mean - the nth root of the product of n values.
High End Estimate - a plausible estimate of individual exposure for those persons at the upper end of an
exposure distribution, conceptually above the 90th percentile, but not higher than the individual
in the population with the highest exposure.
Homogeneous Categories - groups or categories with the same or similar modifying attributes.
Limit of Detection - the minimum concentration of an analyte that, in a given matrix and with a specific
method, has a 99% probability of being identified, qualitatively or quantitatively measured, and
reported to be greater than zero.
Log-normal Distribution - a probability distribution restricted to positive real values. If the random
variable Y has a log-normal distribution and X = log Y, then X has a normal distribution.
Maximally Exposed Individual (MEI) - a semiquantitative term referring to the extreme uppermost
portion of the distribution of exposures. For consistency, this term should refer to the portion
of the individual exposure distribution that conceptually falls above the 98th percentile of the
distribution, but is not higher than the individual with the highest exposure.
-------
Maximum Likelihood Estimate - an estimate based on finding the values of parameters that give the
maximum value of the likelihood function. The likelihood function is the probability of observing
the data, as a function of the parameters defining a distribution. The maximum likelihood
approach is applicable whenever the underlying distribution of the data is known or assumed.
It is a common statistical estimation procedure.
Median - the value in a measurement data set such that half the measured values are greater and half are
less.
Nonparametric Statistical Methods - methods that do not assume a functional form with identifiable
parameters for the statistical distribution of interest (distribution-free methods).
Normal Distribution - a symmetric probability distribution whose maximum height is at the mean,
applicable to positive and negative real numbers. The normal distribution is the common "bell-
shaped" curve. Also called a Gaussian distribution.
Precision - a measure of the reproducibility of a measured value under a given set of conditions.
Probability Sampling - sampling method in which each population element has a known and nonzero
probability of being selected. Basic probability sampling methods include simple random
sampling, stratified sampling, and cluster sampling.
Quantification Limit - the concentration of analyte in a specific matrix for which the probability of
producing analytical values above the method detection limit is 99%.
Random Sampling - the selection of a sample of size n in such a way that each possible sample of size
n has the same chance of being selected.
Reasonable Worst Case - a semiquantitative term referring to the lower portion of the high end of the
exposure distribution. For consistency, it should refer to a range that can conceptually be
described as above the 90th percentile in the distribution, but below about the 98th percentile.
Representativeness - the degree to which a sample is, or samples are, characteristic of the whole medium,
exposure, or dose for which the samples are being used to make inferences.
Sample - a small part of something designed to show the nature or quality of the whole. Exposure-
related measurements may be samples of exposures of a small subset of a population for a short
time, for the purpose of inferring the nature and quality of the parameters important to evaluating
exposure.
Sample Cumulative Distribution Function - a function that estimates the theoretical cumulative distribution
function of a population. If a sample of n independent values is available, the value of the
sample cumulative distribution at x is the proportion of the sample values that are less than or
equal to x.
Standard Deviation - a measure of the variability of the values in a sample or a population. The positive
square root of the variance of the distribution.
-------
Statistical Inference - the process of using knowledge about samples to make statements about the
population.
Statistical Significance - an inference that the probability of an observed pattern (with respect to the data
being measured or the comparison being made) is so low that it is highly unlikely to have
occurred by chance alone (within the constraints of the hypothesis being tested). The inference
is that the hypothesis being tested is probably not true; that hypothesis is rejected in favor of a
stated alternative hypothesis.
Statistically Selected Sample - a sample chosen based on a statistically valid sampling plan.
Stratified Random Sample - a sample obtained by separating the population elements into nonoverlapping
groups called strata, and then selecting a simple random sample for each stratum.
Theoretical Cumulative Distribution Function - a function that uniquely defines the probability
distribution of a random variable, x. The function specifies the probability that the random
variable assumes a value less than or equal to x.
Worst Case - a semiquantitative term referring to the maximum possible exposure that can conceivably
occur, whether or not this exposure actually occurs or is observed in a specific population.
-------
APPENDIX A
SPREADSHEET MATRIX FOR TYPE 1
EXAMPLE DATA SET
FULL SHIFT
PERSONAL SAMPLES
-------
APPENDIX A
The data set presented in Appendix A represents 516 full-shift personal samples grouped into
58 initial categories. In addition to these data, 37 short-term samples and 232 area samples were
collected. Since these data were not used in the example analysis, they were set aside and are not
presented in this appendix.
-------
Table A-1. Spreadsheet Matrix for Type 1 Example Data Set - Full Shift Personal Samples
[The spreadsheet itself is not legible in this copy. For each of the 516 full-shift personal samples it lists the plant ID, industry, process type, job title, control type, sample duration (min), 8-hr TWA (ppm), and a description of the engineering controls (e.g., make-up air percentages and hood face velocities for laboratory ventilation). Footnotes identify the source of the data as a NIOSH/EPA field study, give the laboratory analysis limit of detection, and note that samples from the same plant and process type were grouped.]
-------
APPENDIX B
BACKGROUND INFORMATION ON STATISTICAL METHODOLOGY
-------
APPENDIX B
BACKGROUND INFORMATION ON STATISTICAL METHODOLOGY
This appendix presents background information for statistical methods used in these guidelines,
as well as others that may be useful in the context of occupational exposure monitoring. Some of the
topics include log-normal distributions, analysis of variance, data transformations, tests of distributions,
cluster analysis, outliers, and confidence intervals. The engineer may wish to become familiar with these
methods and the statistical assumptions associated with each method. References such as Massey, 51,
for the K-S test; Cochran, 63; Daniel, 78; Conover, 80; etc. should be obtained and consulted as needed.
EPA statisticians should also be consulted, as required.
Box-and-Whisker Plot
Box-and-whisker plots are useful for the graphical identification of possible outliers. The box
plot presents a clear depiction of outliers relative to the bulk of the data set. The box
portion of a box plot extends from the 25th percentile to the 75th percentile of the observed data (i.e.,
25% of the observations are at or below the bottom of the box and 25% are above the top of the box).
That range is called the interquartile range. The whiskers extending from the box cover at most 1.5 times
the interquartile range; any points beyond that range are presented individually. This allows
clear identification of outliers.
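The fences that define which points fall outside the whiskers can be computed directly. A minimal sketch in Python (data values hypothetical):

    import numpy as np

    def box_plot_fences(values):
        # box edges are the 25th and 75th percentiles; whiskers reach at most
        # 1.5 interquartile ranges beyond them
        q1, q3 = np.percentile(values, [25, 75])
        iqr = q3 - q1
        return q1 - 1.5 * iqr, q3 + 1.5 * iqr

    data = np.array([0.05, 0.12, 0.17, 0.25, 0.36, 0.62, 0.76, 3.97])
    low, high = box_plot_fences(data)
    outliers = data[(data < low) | (data > high)]  # plotted as individual points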
Analysis of Variance
Analysis of variance is the basis for many statistical techniques. It is applicable to normally
distributed data (observations for which the errors are assumed to be normally distributed), especially in
the context of testing for significance of possible explanatory variables.
Nested analysis of variance is a particular form of analysis of variance that addresses the issues
associated with hierarchical (nested) data structures. In such structures, the variations induced by one
variable are nested within (vary around) means that are dependent on the value of another variable, and
which may also vary. Box (78) presents a useful discussion of nested designs and their analysis. Samuels
(85) discusses the nested structure for occupational exposure data.
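As a sketch of the simplest (non-nested) case, a one-way analysis of variance can be run on log-transformed concentrations with standard software; the groups and values below are hypothetical, and variance components for a nested design require specialized routines (see Appendix C):

    import numpy as np
    from scipy import stats

    # hypothetical 8-hr TWA concentrations (ppm) for three candidate groups,
    # analyzed on the log scale as the guidelines recommend
    groups = [np.log([0.12, 0.25, 0.43, 0.28, 1.04]),
              np.log([0.56, 0.76, 1.73, 4.32]),
              np.log([0.05, 0.07, 0.15, 0.17])]
    f_stat, p_value = stats.f_oneway(*groups)  # small p suggests group means differ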
Tests of Distributions
The guidelines assume a log-normal distribution, but there are three common approaches to
quantitatively testing groups of data to determine if they can be described by certain distributions: the
Shapiro-Wilk statistic, the Kolmogorov-Smirnov approach, and the ratio statistic.
-------
The Shapiro-Wilk statistic involves covariances between the order statistics of a standard normal
distribution. It is similar to a test that examines the correlation (squared) between the observed order
statistics and hypothetical order statistics. Order statistics are simply the observations (or hypothetical
values) arranged in ascending order: the first order statistic is the smallest value, the second order statistic
is the next smallest, etc. Simulation studies have suggested that the Shapiro-Wilk statistic is more
powerful than the Kolmogorov-Smirnov test. Note that it can be applied only for testing for normality.
Bickel (77) gives a short discussion and references to material on the Shapiro-Wilk statistic.
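Most statistical packages provide this test directly. A minimal sketch in Python using SciPy (data values hypothetical), applied to log-transformed concentrations so that rejection indicates a poor log-normal fit:

    import numpy as np
    from scipy import stats

    x = np.log([0.05, 0.12, 0.17, 0.25, 0.36, 0.62, 1.04])  # hypothetical data
    w_stat, p_value = stats.shapiro(x)
    # a small p_value rejects normality of the logs, i.e., rejects log-normality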
The ratio test was proposed in Waters (91) as a procedure for testing for log-normality. It makes
use of two estimates of the mean of a log-normal distribution. In fact, the ratio that gives this test its
name is the ratio of those two estimates and is very easy to calculate. Its application requires the
estimation of the coefficient of variation (related to the geometric standard deviation) and use of tables
derived in Waters (91). Those tables are not complete for large values of the coefficient of variation.
Waters (91) compared the ratio test favorably to the Shapiro-Wilk and Kolmogorov-Smirnov approaches.
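For illustration only, the sketch below computes the ratio of two common estimates of the mean: the arithmetic mean and the mean implied by a fitted log-normal distribution. Whether this is exactly the estimator pairing used by Waters (91), and the critical values for judging the ratio, must be taken from that paper:

    import numpy as np

    x = np.array([0.05, 0.12, 0.17, 0.25, 0.36, 0.62, 1.04])  # hypothetical data
    y = np.log(x)
    arithmetic_mean = x.mean()
    lognormal_mean = np.exp(y.mean() + y.var(ddof=1) / 2)  # mean implied by log-normal fit
    ratio = lognormal_mean / arithmetic_mean  # near 1 when the log-normal model fits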
The Kolmogorov-Smirnov (K-S) approach is a widely used technique. The particular application
presented here is for testing for normality, and has been called the Lilliefors test. K-S approaches are
applicable more generally for testing for a variety of distributions.
The calculations needed to apply the Lilliefors test are discussed in some detail here. The
procedure consists of the following: 1) deriving the sample cumulative distribution function for the
observed data; 2) calculating the sample mean of the data (which may be concentrations if testing for
normality or log-transformed concentrations if testing for log-normality); 3) calculating the sample
standard deviation of the data; 4) standardizing the data; 5) determining the theoretical cumulative
distribution; 6) identifying the value for passing the K-S test (the critical value); 7) calculating the
maximum difference between the theoretical cumulative distribution and the sample cumulative
distribution (the test statistic); and 8) determining if the data pass the test.
1. Derive the Sample Cumulative Distribution Function
The monitoring results for a group are arranged in ascending order, lowest value first and the
highest value last. Next, the values for the sample cumulative distribution function are calculated on the
sorted data. The cumulative distribution function for each data point is equal to the proportion of values
less than or equal to the given point, as presented in Equation B1.
SCDi = i / n     (Equation B1)
where:
SCDi = the sample cumulative distribution function value for observation i
n = the number of data points.
2. Calculate the Sample Mean of the Data
The sample mean of the data is calculated using Equation 9 (for the concentrations) or Equation 2
(for transformed data) from Step 19.
-------
3. Calculate the Sample Standard Deviation of the Data
The sample standard deviation of the data is calculated using Equation 10 (for the
concentrations) or Equation 3 (for transformed data) from Step 19.
4. Standardize the Data
The purpose of this step is to standardize the data to the standard normal distribution curve. The
equation for standardizing the data is presented in Equation B2.
zi = (yi - SM) / SSD     (Equation B2)
where:
zi = a standardized data point
SSD = the sample standard deviation of the data from 3 above
SM = the sample mean of the data from 2 above
yi = a data point (either a concentration or transformed concentration)
Subtracting SM shifts the mean to zero, and then dividing by SSD scales the variable so that the
standard deviation is 1 rather than SSD.
5. Determine the Theoretical Cumulative Distribution
This step consists of calculating the values corresponding to a theoretical (normal) cumulative
distribution function for the standardized data. The distribution may be calculated manually
using a standard normal table or determined by one of several statistical software packages (see
Appendix C). A standard normal table may be found in many statistical texts, including Bickel (77).
6. Identify the Value for Passing the K-S Test
Table B1 presents critical values for the Lilliefors test (Conover, 80).
The critical values depend on the sample size and the level of statistical significance required.
For sample sizes between the values on Table B1, the value for the next highest sample size can be used.
7. Calculate the Differences Between the Values of the Theoretical Cumulative Distribution
and the Sample Cumulative Distribution
This step consists of subtracting the values of the theoretical cumulative distribution function from
the values of the sample cumulative distribution function and taking the absolute value, for each of the
data points. The goal is to identify the maximum vertical difference between the sample and theoretical
cumulative distribution functions. Since the sample cumulative distribution function is constant for values
between the data points, the differences examined should include those between the value of the sample
cumulative distribution function at a particular data point value and (1) the value of the theoretical
cumulative distribution function at that data point value and (2) the value of the theoretical cumulative
distribution function at the next data point value.
8. Determine If the Data Pass the Lilliefors Test
If none of the absolute values of the differences between the theoretical cumulative distribution
and the sample cumulative distribution exceeds the critical value identified in 6 above, then it may be
concluded that the data can be described by a normal distribution. If one or more of the absolute
differences exceed the critical value, the normal distribution is not appropriate.
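The eight steps can be collected into a short routine. A minimal sketch in Python; the resulting statistic is compared to the appropriate critical value from Table B1:

    import math

    def lilliefors_statistic(values):
        """Steps 1-7 of the Lilliefors calculation: returns the test statistic D."""
        x = sorted(values)                  # step 1: ascending order; SCDi = i/n
        n = len(x)
        sm = sum(x) / n                     # step 2: sample mean
        ssd = math.sqrt(sum((v - sm) ** 2 for v in x) / (n - 1))  # step 3: std dev
        d = 0.0
        for i, v in enumerate(x, start=1):
            z = (v - sm) / ssd              # step 4: standardize (Equation B2)
            tcd = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # step 5: normal CDF
            # step 7: the sample CDF jumps at each point, so compare both i/n
            # and (i-1)/n against the theoretical value
            d = max(d, abs(tcd - i / n), abs(tcd - (i - 1) / n))
        return d

    # step 8: normality is not rejected if D is below the Table B1 critical value
    # for the chosen sample size and significance level (step 6)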
TABLE B1. CRITICAL VALUES FOR LILLIEFORS TEST (Conover, 80)

                         Level of significance
Sample size      0.20      0.15      0.10      0.05      0.01
4                0.300     0.319     0.352     0.381     0.417
5                0.285     0.299     0.315     0.337     0.405
6                0.265     0.277     0.294     0.319     0.364
7                0.247     0.258     0.276     0.300     0.348
8                0.233     0.244     0.261     0.285     0.331
9                0.223     0.233     0.249     0.271     0.311
10               0.215     0.224     0.239     0.258     0.294
11               0.206     0.217     0.230     0.249     0.284
12               0.199     0.212     0.223     0.242     0.275
13               0.190     0.202     0.214     0.234     0.268
14               0.183     0.194     0.207     0.227     0.261
15               0.177     0.187     0.201     0.220     0.257
16               0.173     0.182     0.195     0.213     0.250
17               0.169     0.177     0.189     0.206     0.245
18               0.166     0.173     0.184     0.200     0.239
19               0.163     0.169     0.179     0.195     0.235
20               0.160     0.166     0.174     0.190     0.231
25               0.142     0.147     0.158     0.173     0.200
30               0.131     0.136     0.144     0.161     0.187
Over 30          0.736/√n  0.768/√n  0.805/√n  0.886/√n  1.031/√n
-------
Data Transformations
The guidelines consider a simple data transformation, the log transformation. That transformation
is just one of a family called the Box and Cox transformations. Such transformations are often considered
prior to analysis of data in order to make the data more normal and to make the variances in different
groups more similar, both of which are desirable for most analysis of variance approaches, for example.
The reader is referred to Stoline (91) for a discussion of the Box and Cox family of transformations
applied to environmental data. Samuels (85) also considers transformations other than the log
transformation for occupational exposure data. The guidelines do not recommend transformations other
than the log transformation because of the computations involved, because the properties (e.g., mean and
standard deviation) of the log-normal distribution are well known whereas the interpretation and
calculation of descriptive statistics based on other transformations is not straightforward, and because the
log-normal distribution is an accepted distribution for concentration data.
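For readers who do wish to examine other members of the family, a minimal sketch using SciPy's Box-Cox routine (data values hypothetical; the data must be positive); an estimated lambda near zero supports the log transformation:

    import numpy as np
    from scipy import stats

    x = np.array([0.05, 0.12, 0.17, 0.25, 0.36, 0.62, 1.04])  # hypothetical data
    transformed, lam = stats.boxcox(x)  # lam is the maximum-likelihood lambda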
Log-normal Distribution
The log-normal distribution has been studied and applied to concentration data for many years
(Aitchison, 57; Johnson, 70). The estimation of the mean of the log-normal distribution is discussed in
detail in Attfield (92). Note that the formula for MLEA in Attfield (92) is incorrect as stated: multiply
the formula given by exp(f) to get the corrected value for MLEA. Confidence limits for the mean of a
log-normal distribution are presented in Armstrong (92). Samuels (85) shows how confidence intervals
for the concentration data means can be derived from standard deviations and standard errors associated
with transformed data.
The calculation of confidence intervals is an important means of presenting the degree of certainty
about the estimate of any particular parameter. It is important to note that a confidence interval for a
mean, for example, must be based on the variance associated with that estimate, not on the variance
associated with the individual observations in the population. Thus, the standard error of the mean
(which is the square root of the variance of the mean estimator) should be used to define a confidence
interval for the mean.
Confidence intervals for means also depend on the data structure and the distribution of the data.
Although asymptotically (as the sample size gets very large) a mean will be normally distributed, no
matter what the underlying distribution of the observations may be, for relatively small sample sizes the
normal approximation may be poor. Thus, confidence intervals for a log-normal mean, for example,
have been specifically defined (Armstrong, 92). Standard errors and therefore confidence intervals can
be defined for transformed concentrations and converted back to the original scale (Samuels, 85).
Standard errors that take into account nested data structures can also be computed (Samuels, 85) and used
to define confidence intervals.
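A minimal sketch of the back-transformation approach (data values hypothetical); note that the back-transformed interval applies to the geometric mean, and exact intervals for the arithmetic mean of a log-normal distribution should follow Armstrong (92):

    import numpy as np
    from scipy import stats

    x = np.array([0.05, 0.12, 0.17, 0.25, 0.36, 0.62, 1.04])  # hypothetical data
    y = np.log(x)
    n = len(y)
    se = y.std(ddof=1) / np.sqrt(n)        # standard error of the log-scale mean
    t = stats.t.ppf(0.975, df=n - 1)       # two-sided 95% t multiplier
    ci_geometric_mean = (np.exp(y.mean() - t * se), np.exp(y.mean() + t * se))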
-------
Techniques to Combine Groups
One of the final quantitative steps in the guidelines is to obtain statistics for combinations of
groups. As is discussed in the text, this should only be attempted when appropriate. The only techniques
identified as appropriate are from stratified sampling theory. These techniques can be considered because
they allow for estimation of means and standard deviations across groups with widely different population
sizes. The properties of these estimates are not known for nonrandomly sampled data. This fact should
be stated if such estimates are used.
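A minimal sketch of the stratified estimator of a combined mean and its standard error (group sizes and statistics hypothetical; the finite population correction is omitted):

    import math

    # (population size N_h, sample size n_h, mean, std dev) per group -- hypothetical
    strata = [(120, 12, 0.35, 0.20), (40, 8, 1.10, 0.60), (15, 5, 0.08, 0.05)]
    N = sum(Nh for Nh, nh, m, s in strata)
    combined_mean = sum(Nh / N * m for Nh, nh, m, s in strata)       # weighted by size
    se = math.sqrt(sum((Nh / N) ** 2 * s ** 2 / nh for Nh, nh, m, s in strata))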
Cluster Analysis
Another approach to defining groups for statistical analysis is based on a procedure known as
cluster analysis. That approach examines characteristics of the measurements within groups (clusters)
and determines when two groups are similar enough to be combined. The cluster analysis approach is
described here in some detail.
Cluster analysis is an iterative procedure by which clusters are combined. Combination proceeds
in order of similarity: the most similar groups are combined first, then the next most similar, etc. Each
group of measurements (e.g., a set of observations sharing the same values for all the important exposure
parameters identified by the engineer or industrial hygienist) starts out as a single cluster; when two
groups are combined, the combined group replaces the two groups that were combined, for the purposes
of comparison with other groups and additional combination.
In order to conduct a cluster analysis, some measure of similarity is required. The simplest
measure, and one that can easily be used for routine application to occupational exposure data, is based
on the mean values of the measurements within groups: two groups are considered similar when the
difference between their mean values is small. This clustering method is referred to as the unweighted
pair-group method using arithmetic averages (UPGMA).
The advantage of this method of clustering is that it does not require the specification or
assumption of an underlying distribution for the measurements within the groups. A disadvantage is that
this method only compares the mean values within groups and does not consider other descriptors of the
within-group measurements, such as variation. Some other methods for defining the similarity of groups
are discussed and compared with the UPGMA method in the SAS manual. In some applications those
other methods may be more appropriate than the simple UPGMA procedure. Consultation with a
statistician is recommended in those cases, and may even be required when the UPGMA method is all
that is desired.
A cluster analysis can proceed until all the groups are combined into one cluster. Output from
a computer package will specify which clusters are combined at each step and the similarity (difference
in means for the UPGMA method) of the clusters combined at each step. The engineer can examine the
output and determine at what point the clustering is sufficient, where "sufficient" clustering is based on
consideration of sample sizes attained, on the similarity of the clusters that are combined, or on a
combination of those two factors.
The goal of this procedure is to increase sample sizes and define uniform groups. It is
inappropriate to combine groups that are quite dissimilar just to get big sample sizes. Thus, some
decision by the engineer, in consultation with the statistician, must be made about the weight to be given
to the conflicting pressures of those two considerations (sample size vs. uniformity). It is recommended
that the engineer and statistician decide on a "stopping rule" prior to the running of the cluster analysis.
The stopping rule will specify the largest measure of similarity (largest difference in means for the
UPGMA method) that will be considered acceptable for combination to occur. The knowledge of the
engineer and the statistician is required to select a stopping rule, as there is currently no statistical test
or probabilistic measure that can tell the user when the clustering of groups is inappropriate. An
examination of the initial groups, their means, and the overall mean for all groups may provide some
indication of a stopping rule to consider.
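A minimal sketch of the iteration with a pre-agreed stopping rule (group names, means, and threshold hypothetical):

    # candidate groups and their mean log-concentrations
    group_means = {"A": 0.12, "B": 0.15, "C": 0.55, "D": 0.60, "E": 3.2}
    STOP = 0.10  # largest acceptable difference in means (the stopping rule)

    while len(group_means) > 1:
        # find the most similar pair (smallest difference in means)
        pairs = [(abs(group_means[a] - group_means[b]), a, b)
                 for a in group_means for b in group_means if a < b]
        diff, a, b = min(pairs)
        if diff > STOP:
            break  # stopping rule reached; no further combination
        merged = f"{a}+{b}"
        # pair-group average of the two cluster means
        group_means[merged] = (group_means[a] + group_means[b]) / 2.0
        del group_means[a]
        del group_means[b]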
One drawback to the cluster technique is that it can combine groups which do not belong together
from an engineering perspective. A priori selection of appropriate and inappropriate groupings of data
based on engineering judgment can be used to prevent inappropriate clustering of the data. The
ANOVA technique discussed in Step 15 does not have this problem. However, the ANOVA technique
is most appropriate for data from designed, controlled experiments.
-------
APPENDIX C
LISTING OF COMPUTER SOFTWARE FOR
VARIOUS STATISTICAL ANALYSES
-------
APPENDIX C
LISTING OF COMPUTER SOFTWARE FOR
VARIOUS STATISTICAL ANALYSES
Box-and-Whisker Plot
There are many software packages available on the PC for this technique. These include CSS,
NWA Statpak, Solo, SPSS/PC Plus, Statgraphics, Statpac Gold, Systat/Sygraph, SAS, and BMDP.
Analysis of Variance
Analysis of variance is a standard statistical tool available in the software packages CSS, NWA
Statpak, Solo, SPSS/PC Plus, Statgraphics, Statpac Gold, Systat/Sygraph, SAS, and BMDP. Not all of
these packages can provide the results needed to obtain variance components for a nested analysis of
variance. SAS has a special procedure, PROC NESTED, which does just that.
Distribution Tests
The Shapiro-Wilk test is provided as an option in the SAS procedure PROC UNIVARIATE.
Many statistical software packages have K-S type test procedures for the PC: CSS, NWA
Statpak, SPSS/PC Plus, Statgraphics, Statpac Gold, and Systat/Sygraph. For these packages, the user
compares a normal distribution to a set of data.
Theoretical Cumulative Distribution
The many software packages available for computing the standard normal theoretical cumulative
distribution function include CSS, NWA Statpak, Solo, SPSS/PC Plus, Statgraphics, Statpac Gold,
Systat/Sygraph, SAS, and BMDP.
------- |