GUIDELINES FOR STATISTICAL ANALYSIS
     OF OCCUPATIONAL EXPOSURE DATA

                   FINAL
                     by

         IT Environmental Programs, Inc.
              11499 Chester Road
           Cincinnati, Ohio 45246-0100

                    and

             ICF Kaiser Incorporated
              9300 Lee Highway
           Fairfax, Virginia 22031-1207
            Contract No. 68-D2-0064
            Work Assignment No. 006
                    for
OFFICE OF POLLUTION PREVENTION AND TOXICS
  U.S. ENVIRONMENTAL PROTECTION AGENCY
             401 M STREET, S.W.
          WASHINGTON, D.C. 20460
                 August 1994

-------
                                       DISCLAIMER

       This report was developed as an in-house working document and the procedures and methods
presented are subject to change.  Any policy issues discussed in the document have not been subjected
to agency review and do not necessarily reflect official agency policy.  Mention of trade names or
products does not constitute endorsement or recommendation for use.

-------
                                 CONTENTS


FIGURES	   v

TABLES	   vi

ACKNOWLEDGMENT	vii
INTRODUCTION	   1
      A.    Types of Occupational Exposure Monitoring Data	   1
      B.    Types of Occupational Exposure Assessments	   2
      C.    Variability in Occupational Exposure Data	   3
      D.    Organization of This Report  	   4

STEP 1:  IDENTIFY USER NEEDS	   9

STEP 2:  COLLECT DATA 	  15
      A.    Obtaining Data From NIOSH	  15
      B.    Obtaining Data From OSHA	  16
      C.    Other Sources of Data	  17

STEP 3:  DEFINE DATA NEEDS 	  19

STEP 4:  IDENTIFY PARAMETERS AFFECTING EXPOSURE	  21

STEP 5:  IDENTIFY UNCERTAINTIES, ASSUMPTIONS, AND BIASES	  26
      A.    Uncertainties	  26
      B.    Assumptions	  27
      C.    Biases	  28

STEP 6:  CREATE PRELIMINARY EXPOSURE DATA MATRIX 	  30

STEP 7:  CHECK FOR CONSISTENCY AND REASONABLENESS  	  33
      A.    Grouping of Like Types of Data	  33
      B.    Conversion to Consistent Concentration Units  	  34
      C.    Conversion to Consistent Exposure Periods	  34
      D.    Identification of Assumptions	  36
      E.    Checks for Consistency and Reasonableness	  36

STEP 8:  COLLECT ADDITIONAL MISSING INFORMATION .  	  38

STEP 9:  ESTIMATE ADDITIONAL MISSING INFORMATION 	  39
                                     ii

-------
STEP 10:  REVISE EXPOSURE MATRIX AND IDENTIFY DATA BY TYPE	  41

STEP 11:  ASSESS ABILITY TO MEET USER NEEDS	  43

STEP 12:  TREAT TYPE 3 DATA	  44

STEP 13:  TREAT NONDETECTED VALUES	  47

STEP 14:  SEPARATE INTO TYPE 1 DATA AND TYPE 2 DATA  	  50

STEP 15:  DEFINE GROUPS FOR ANALYSIS 	  51
      A.     Identify Initial Grouping	  53
      B.     Log-Transform the Data	  56
      C.     Graphical Examination of the Data: Check for Outliers	  56
      D.     Analysis of Variance	  58
      E.     Redefining Groups  	  67

STEP 16:  TREATMENT OF TYPE 2 DATA	  68
      A.     Considering Addition of Type 2 Data	  68
      B.     Adding Type 2 Data 	  68
      C.     Summary of Remaining Type 2 Data	  68

STEP 17:  CALCULATE DESCRIPTIVE STATISTICS FOR EACH GROUP	  72

STEP 18:  TREAT UNCERTAINTIES, ASSUMPTIONS, AND BIASES  	  76
      A.     Sensitivity Analysis	  76
      B.     Confidence Intervals 	  77
      C.     Quantification of Bias  	  78
      D.     Weighting Factors to Mitigate Bias 	  78

STEP 19:  PRESENT RESULTS  	  81
      A.     Characterization of Exposure	  81
      B.     Presentation of Descriptive Statistics  	  82
      C.     Presentation of Assumptions and Uncertainties	  83
      D.     Present Original Data	  90

REFERENCES	   R-1

GLOSSARY OF TERMS 	   G-1

APPENDIX A	   A-1
      SPREADSHEET MATRIX FOR TYPE 1 EXAMPLE DATA SET
      FULL SHIFT PERSONAL SAMPLES
                                     iii

-------
APPENDIX B	   B-1
     BACKGROUND INFORMATION ON STATISTICAL METHODOLOGY

APPENDIX C	  C-1
     LISTING OF COMPUTER SOFTWARE FOR
     VARIOUS STATISTICAL ANALYSES
                                  iv

-------
                                       FIGURES


Number                                                                             Page

1      Flow Diagram for Creation of Preliminary Exposure Matrix  	6

2      Flow Diagram for Creation of a Completed Exposure Matrix	7

3      Flow Diagram for the Statistical Analysis of Type 1 and Type 2 Data	8

4      Statement of Needs	  11

5      Flow Diagram for Step 15 (Define Groups for Analysis)	  52

6      Box-and-Whisker Plot for Monomer Industry Categories	59

7      SAS Output for All Monomer Industry Categories Combined	61

8      SAS Output for Test of Company, Process Type, and Control Type in Monomer Industry  64

9      SAS Output for Test of Process Type and Control Type in Monomer Industry  	65

10     SAS Output for Test of Company, Process Type, and Control Type in Polymer Industry .  66

11     Flow Diagram for Step 16 (Treatment of Type 2 Data)	70

12     Box-and-Whisker Plots for Monomer Industry Groups  	  86

13     Example Bar Graph for Polymer Industry Groups:
       Means and Maxima Compared to 3 Target Levels	  87

14     Example Format for Presentation of Assumptions and Uncertainties	89

-------
                                         TABLES

Number                                                                             Page
1      Example Preliminary Exposure Data Matrix - Full Shift Personal Samples	  31

2      Type 2 Data Used in Statistical Analysis	  71

3      Descriptive Statistics for Groups in Example Data Set 	  74

4      Descriptive Statistics Presentation, Example Data Set	  84
                                             vi

-------
                                  ACKNOWLEDGMENT
       Many individuals and organizations  have been helpful in developing this report; for these
contributions the project management extends its sincere gratitude.

       Mr.  Paul  Quillen and  Ms. Breeda  Reilly were the EPA Project Officers and  Ms. Cathy
Fehrenbacher was the EPA Work Assignment Manager.  Mr. Thomas Corwin,  IT  Environmental
Programs, Inc., was the Project Director and Mr. Edwin Pfetzing the Project Manager.   Mr. Robert
Goodman, IT Environmental Programs, Inc.,  assisted in the preparation of the report. Ms. Nora Zirps
was the ICF Project Manager.  Dr. Erwin Hearne and Mr. Bruce Allen, K.S. Crump Division of ICF
Kaiser, developed the statistical methodology for the report.  Extensive review of and comment on the
guidelines were provided by Drs. Rick Hornung, Larry Elliott, Steve Ahrenholz, David Utterback, and
Thurman Wenzel, NIOSH; and Elizabeth Margosches and Gary Grindstaff, EPA.

       Peer review was provided by Dr. Neil C. Hawkins, Dow Chemical Company; Mr. Keith A.
Motley, OSHA; Dr. Stephen M. Rappaport,  University of North Carolina; Col. James C.  Rock, U.S.
Air Force; and Dr. Steve Selvin, University of California, Berkeley.
                                             vii

-------
                                      INTRODUCTION
       The purpose of these guidelines is to establish a consistent approach to handling the wide variety
of occupational exposure data available for preparing occupational exposure assessments in support of risk
assessments.  They provide guidance in the characterization of broad ranges of job groups with similar
exposures, calculation of descriptive statistics (where appropriate), and treatment of uncertainties,
assumptions, and biases in the data.  They are designed to be used by engineers in the Office of Pollution
Prevention and Toxics (OPPT), with some assistance from industrial hygienists and statisticians.  The
procedures described provide a systematic methodology for performing an occupational exposure
assessment based upon the types of data most commonly available for such analyses.  Methods used by
OPPT's Chemical Engineering Branch (CEB) to prepare assessments of occupational exposure and
environmental release are presented in the CEB Engineering Manual (IT, 91).  These guidelines are
a supplement to the CEB Engineering Manual intended for use with recently collected data.  It should
be noted that these guidelines are not intended to provide recommendations for performing additional
monitoring of exposure or for determining compliance with regulatory standards.  If this is the goal, the
reader should consult other references such as Hawkins (91) and Patty (81).
A.     Types of Occupational Exposure Monitoring Data

       Monitoring data usually consist of area samples, personal inhalation samples or dermal samples.
Area samples are collected to represent the airborne concentration of a chemical in a specific location at
a facility. Personal samples are collected to represent a worker's inhalation exposure during a specified
time period; for example, peak, ceiling, short-term, and full-shift samples.  Peak or ceiling samples are
typically collected  instantaneously through continuous monitoring or for 15 minutes or less.  Short-term
samples  are collected over a designated period, typically less than  2 hours.  Full-shift samples are
collected to represent a worker's  inhalation exposure over an entire work shift and may be composed of
a single sample or consecutive short-term samples.  Dermal samples are collected to represent a worker's
dermal exposure to a given chemical over a portion of the body which has been in contact with the
chemical. Exposure data collected for each type of exposure should be separated and statistical analyses
conducted separately.
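The composition of a full-shift result from consecutive short-term samples is a time-weighted average; the following is a minimal sketch, where the function name and the sample values are illustrative, not taken from the report.

```python
# Hypothetical sketch: a full-shift 8-hour TWA computed from consecutive
# short-term samples.  Each sample is (concentration in ppm, duration in
# minutes); the values below are invented for illustration.
def full_shift_twa(samples, shift_minutes=480):
    # Sum of concentration x time, divided by the full shift length.
    # Unsampled time is implicitly treated as zero exposure here -- a
    # simplifying assumption for the sketch, not a recommendation.
    return sum(conc * dur for conc, dur in samples) / shift_minutes

samples = [(12.0, 120), (8.0, 240), (20.0, 60)]  # 420 of 480 minutes sampled
twa = full_shift_twa(samples)  # 9.5 ppm
```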

       Biological  monitoring may also be used to determine an employee's overall exposure to a given
chemical by  measuring the appropriate determinant in biological specimens collected from exposed
workers  at the specified time. While biological monitoring provides  information complementary to air
monitoring, interpretation of data can be difficult due to variability in the physiological and health status
of the individual, exposure sources, individual life style, analytical errors, etc. If biological monitoring

                                               1

-------
data are available, this fact should be noted in the exposure assessment.  This report does not address
biological monitoring but focuses on air monitoring data collected to assess inhalation exposure.

       For the purposes of this  report, three broad categories of occupational exposure  data are
considered:

       •      Type 1 data consist of measurements for which all important variables are known. The
               data consist of studies that contain individual measurements and include all backup and
               ancillary information (e.g., analytical method, limit of detection, sampling duration, type
               of sample taken, job tasks, etc.).

       •      Type 2 data consist of measurements where important variables are not known but for
               which assumptions can be made for  their estimation.   The data consist of individual
               monitoring measurements, but backup and ancillary information are inconsistent.

       •      Type 3 data consist of measurement summaries, anecdotal data, or other data for which
               the important variables are not known and cannot be estimated.  Individual monitoring
               measurements are typically not available.

       These categories were developed for use with these guidelines; judgment is used in determining
the type(s) of data available.  Examples and additional information on the categories are provided
beginning with  Step 10.

       Once satisfied  that the data have been properly  collected for the objective of the study, the
primary determinant of the confidence one can place in the analysis is the sample size.  Every effort
should therefore be made to collect and analyze every available piece of data.  Because the size of the
data set being  analyzed has a large effect on the confidence  that can  be placed in the analysis, the
methodology set forth in these guidelines allows the combination of similar data sets based on statistical
tests. The traditional categorization of data by the industrial hygienist or engineer is supplemented by
statistical analysis of the categorization; the goal is identification of groups of data that are as large as
possible and describable by standard statistical distributions (lognormal and normal).
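The log-transform implied by the lognormal distribution can be sketched with the Python standard library; this is an illustrative summary, not the report's prescribed procedure, and the sample values are contrived.

```python
import math
import statistics

# Illustrative sketch: log-transform exposure values and summarize.  If the
# data are approximately lognormal, the log-transformed values are
# approximately normal, so standard normal-theory methods (e.g., the ANOVA
# used later for group comparisons) can be applied to the logs.
def lognormal_summary(values):
    logs = [math.log(v) for v in values]
    mu = statistics.mean(logs)       # mean of the logs
    sigma = statistics.stdev(logs)   # sample SD of the logs
    return {"geometric_mean": math.exp(mu), "geometric_sd": math.exp(sigma)}

summary = lognormal_summary([1.0, 10.0, 100.0])
# For this contrived data set, geometric mean = 10.0 and geometric SD = 10.0.
```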
B.      Types of Occupational Exposure Assessments

        There are various types of exposure assessments performed by OPPT's CEB.  The main
distinction between them is the level of effort expended in collecting data.  Regardless of what type of
data are obtained, however, the CEB engineer should review the level of detail required in the exposure
assessment and try to provide the best and most complete analysis of the available data.

        The following are examples of the program areas and types of exposure assessments performed
by CEB:

-------
       •      New Chemicals Program.  An initial screening assessment is performed with a goal to
               determine the high end and central tendency exposures, generally  using available
               information and information submitted in the Premanufacture Notification (PMN).  In
               reality, these estimates are more likely to be bounding (e.g., overestimates of) exposure,
               due to lack of information.  If there are concerns for worker exposure, the initial
               assessment is  refined as the case progresses through the review process. However, due
               to lack of data on these new chemicals which have not yet been commercialized, this
               often involves the use of modeling or surrogate data, rather than analysis of actual data
               on exposure to the substance of concern.

       •      Chemical Testing.   A preliminary exposure  assessment is completed to determine the
               bounds of potential occupational exposure for chemical testing candidates. This exposure
               assessment is refined as the case progresses and additional information is gathered. Since
               these are "existing" chemicals, there may be some exposure data available on the specific
               substance. These chemicals may be referred to CEB through the Interagency Testing
                Committee (ITC).

       •      Existing Chemicals. An exposure assessment may be an initial screening which is used
               to help determine if further work is needed on the case. If so, a more detailed exposure
               assessment including the range of potential exposure, measure of central tendency, uncer-
               tainty,, etc. is completed for  the population(s) of concern.   A risk assessment is
               performed;  if risk  management action will be taken the exposure assessment may be
               revised to include additional information or to cover additional uses, etc. For some cases
               monitoring studies will be conducted to determine workplace exposure levels.  An
               evaluation of  controls may also be needed.
C.     Variability in Occupational Exposure Data

       It is rare to find studies of occupational exposure based on a statistical approach to providing
representative information for an individual facility; it is even less likely to find such a study that repre-
sents a particular industry subsector or group of facilities.  While random sampling (i.e.,  monitoring
exposure to a group of workers in a random fashion) is preferred, "worst-case sampling" (i.e., monitoring
the individual with the highest exposure) during a 1- to 3-day sampling campaign is common industrial
hygiene practice for compliance with regulatory standards. However, sampling programs are being used
that promote exposure monitoring and periodic surveillance (Damiano, 89; Hawkins, 91).

       Even  in statistically-selected, well-done studies, there  may be  high  variability  in  the
characterization of worker exposure. Measurements at a plant made over a period of no more than a few
days may be all that are available to characterize exposures over an entire year or a period of years.
Seasonal variability, interday and intraday variability, and changes in the process or worker activities can
cause the exposure to vary from that measured on a single  day.  Temperature changes can affect
evaporation rates,  and seasonal changes in natural ventilation affect exposure.  Sampling methods and
time periods can also vary.  Seldom can all these variables be measured and accounted for.  However,

-------
 if important variables are identified and quantified, it is hoped the influence of less important variables
 on the overall measure of central tendency will be minimized. Variables that may not be obvious may
 also affect variability among plants  in the same industry category.  Variables such as the age of the plant,
 the age of the control equipment, whether the plant is in a volatile organic compound (VOC)
 nonattainment area, and operation and maintenance (O&M) practices at the plant should be investigated.

        When analyzing sample data, it is important to understand the sources of variation in exposure
 sample results that combine to create the observed variability (Patty, 81).  The size of the variation may
 be a function of both the exposure levels and the measurement method.  Both random and systematic
 errors should be considered.

        Random variations in workplace exposure  levels can result in intraday variations, interday
 variations, or variations in exposures of different workers within a job group or occupational category
 (Patty, 81).  Variability in the measurement procedure can be caused by  random changes in pump flow
 rate, collection efficiency, or desorption efficiency.  It is important to realize that random variation in
 real workplace exposure levels will usually exceed  measurement procedure variation by a substantial
 amount, often by factors of 10 or 20  (Patty,  81; Nicas, 91).
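Under the common simplifying assumption that the environmental and measurement components are independent and lognormal, their variances add on the log scale, which makes the point numerically; the GSD values below are invented for illustration.

```python
import math

# Illustrative only: combine an environmental geometric standard deviation
# (GSD) with a measurement-procedure GSD, assuming independent lognormal
# components whose log-variances add.
def combined_gsd(gsd_env, gsd_meas):
    log_var = math.log(gsd_env) ** 2 + math.log(gsd_meas) ** 2
    return math.exp(math.sqrt(log_var))

# An environmental GSD of 2.5 combined with a modest measurement GSD of 1.1
# yields about 2.51 -- the measurement contribution is almost invisible.
total = combined_gsd(2.5, 1.1)
```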

        Systematic variations  in the determinant variables affecting workplace exposure levels will lead
 to  systematic shifts in the exposure results.   Variability in worker exposure levels reflects changes in
 worker job operations during a work shift or over several days, production process changes, or control
 system changes.  Systematic errors in the measurement procedure can result from mistakes in pump
 calibration, use of sampling devices at temperatures or altitudes substantially different from calibration
 conditions, physical or chemical interferences, sample degradation during storage, internal laboratory
 errors, and interlaboratory errors (Patty, 81).  These errors may be identified and their effects minimized
 with the use of quality assurance programs (EPA, 92).  Specific variables (parameters) that can affect
 occupational exposure measurements  are more fully discussed in Step 4.

        It is also important to ascertain the objectives of the monitoring study to identify potential biases
 in the data.  For example, if the objective was to sample only well-controlled facilities, then the  results
 would probably not represent the exposure in the industry as a whole. If the monitoring resulted from
 worker complaints, then exposures  may not represent typical exposures. If the monitoring was conducted
 to evaluate engineering controls or  as a preliminary screening of exposure, the results may not represent
 actual employee exposure.  It is important that all potential variables be identified and evaluated.
 D.     Organization of This Report

        Following the introduction is a 19-step procedure for statistical analysis of occupational exposure
 data.  Figures 1 to  3  present flow diagrams outlining these procedures.  Each numbered step  in these
 figures is explained separately.  Steps 1 through 6 are presented in Figure 1 and  give the actions
 necessary to prepare a preliminary exposure matrix. Steps 7 through  14 are presented in Figure 2 and
 give the actions necessary to prepare a completed exposure matrix from the preliminary exposure matrix
 including preparation of a non-statistical report on Type 3 data.  Steps 15 through 19 are presented in

-------
Figure 3 and relate to the statistical analysis of Type 1 and Type 2 data and the presentation of the results. An
example is used throughout the 19 steps to better explain the techniques used in the guidelines. The data
used in the example are based on real data, but have been altered where necessary to emphasize particular
points in the guidelines.

        These guidelines present rather sophisticated approaches for  statistical analysis of occupational
exposure data.  Nonstatisticians may require training or the assistance of a statistician in order to properly
understand and use the guidelines.  The development of software as a companion to the guidelines could
be useful in guiding the user through the analyses and in incorporating more complex calculations for
certain nondefault procedures discussed in Appendix B.

        A bibliography  of references pertinent to occupational  exposure analysis is  also provided.
Appendix A presents a spreadsheet matrix for the example data set.  Appendix B presents background
information on the methodology available to statistically analyze the data. Appendix C presents a listing
of currently available  computer software for the statistical analyses.

-------
[Figure 1 is a flow diagram; the graphic did not survive text conversion. Recoverable elements: data
sources (NIOSH, OSHA, EPA, other federal agencies, state agencies, unions, journal articles) feed the
steps that produce the preliminary exposure matrix.]

   Figure 1. Flow Diagram for Creation of Preliminary Exposure Matrix.

-------
[Figure 2 is a flow diagram; the graphic did not survive text conversion. Recoverable elements: the
preliminary exposure matrix is checked for consistency and reasonableness using the definition of data
needs from Step 3; the exposure matrix is revised and data identified by type; ability to meet user needs
is assessed; data are separated into Type 1 and Type 2; nondetected values are treated; and the outputs
are completed Type 1 and Type 2 exposure matrices, an uncertainty/assumption list, and a non-statistical
report on Type 3 data.]

                         Figure 2. Flow Diagram for Creation of a Completed Exposure Matrix.

-------
[Figure 3 is a flow diagram; the graphic did not survive text conversion. Recoverable elements: the
completed Type 1 exposure matrix, the Type 2 data summary, and the Type 3 data summary feed a
sequence of steps: define groups for analysis, treatment of Type 2 data, calculate descriptive statistics
for each group, treat assumptions and uncertainties, and present results.]

Figure 3. Flow Diagram for the Statistical Analysis of Type 1 and Type 2 Data.

-------
                              STEP 1: IDENTIFY USER NEEDS
       The first step in an exposure assessment is to identify the needs of those using the information,
usually in some form of risk assessment activity.  The user  is typically the project manager for the
chemical under review.  This step initially identifies the data requirements of the assessment so that
resources can be used most effectively to collect pertinent data.

       The level of detail required in an exposure assessment depends on the scope of the risk assessment
activity it supports (EPA, 87).  If the purpose of the analysis is merely to screen a new chemical for
potential problems, a much less rigorous bounding estimate of exposure will often be prepared.  These
analyses are useful in developing statements that exposures are "not greater than" the estimated value.
However, to support a detailed risk assessment, an in-depth presentation of potential exposures must be
prepared.  It is also necessary to know if the end user is interested in a particular demographic group,
route of exposure, frequency and duration of exposure, industry, exposure period, or other variable.  For
example, if the chemical is of concern because of possible reproductive effects for women of childbearing
age, then every effort should be made to gather information on the exposure of this demographic group.
Information needs also depend on the specific health hazards identified for the chemical.  Some of the
information needs that may be identified include:

               Mean, standard deviation
               Geometric mean, geometric standard deviation
               Range of exposures, confidence intervals
               Duration of exposure (hr/day and days/yr)
               8-hour time-weighted average (TWA)
               Peak exposures
               Time period (e.g., a particular year, such as 1989)
               Cumulative exposure over time, lifetime average daily exposure (for possible use in risk
               assessment)
               Probability of excursions or exposure during upsets or emergency release
               Uncertainties associated with the data and assumptions used in analyzing the data
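One item above, the lifetime average daily exposure, can be sketched as a simple time-fraction calculation; the function, its parameter names, and the default 70-year lifetime are illustrative assumptions, not the report's method.

```python
# Hypothetical sketch: lifetime average daily exposure as the full-shift TWA
# scaled by the fraction of lifetime hours spent exposed.  All parameter
# names and the 70-year default lifetime are illustrative assumptions.
def lifetime_average_daily_exposure(twa_ppm, hours_per_day, days_per_year,
                                    years_exposed, lifetime_years=70):
    exposed_hours = hours_per_day * days_per_year * years_exposed
    lifetime_hours = 24 * 365 * lifetime_years
    return twa_ppm * exposed_hours / lifetime_hours

# 10 ppm TWA, 8 h/day, 250 days/yr, over a 40-year working lifetime:
lade = lifetime_average_daily_exposure(10.0, 8, 250, 40)  # about 1.3 ppm
```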

       The objectives of the exposure assessment must be defined using information obtained from the
"user," typically the project manager for the chemical under review.  To assist in this process, project
managers should be contacted initially to discuss the data requirements of the assessment and asked to
complete a "statement of customer needs" form for exposure assessments which are not typical new
chemical-type assessments.  When this form (shown in Figure 4) is returned, it will be of value in Step 3
to more completely define user needs.

       Since health effects data are often gathered to prepare the hazard assessment in parallel with the
occupational exposure assessment, good lines of communication with the project manager and those
preparing the hazard assessment will facilitate information exchange regarding potentially changing
assessment needs.  For example, as new health effects are defined, the exposure data classification or
level of detection required of the analytical methods used may need to be changed: if chronic health
effects are identified, long-term exposures are generally of interest, while peak or short-term exposures
are of interest for acute health effects.  Timely communication will minimize the changes that need to
be made as well as the need for further data collection.
                                         EXAMPLE

          The example shown below will be used throughout this report to illustrate how the statistical
   analysis proceeds.

           The example chemical is a colorless gas whose primary use is polymerization to make
    various elastomers.  Recent chronic oncology studies indicate that the chemical is carcinogenic in
    mice.  The present OSHA Permissible Exposure Limit (PEL) is 1,000 ppm as an 8-hour TWA, but
    the American Conference of Governmental Industrial Hygienists (ACGIH) recommended a revised
    Threshold Limit Value (TLV) of 10 ppm as an 8-hour TWA.

          The project manager identified two general needs for the exposure assessment.  First, the
   exposure  assessment was needed to do a preliminary risk assessment for all worker exposures to
   the chemical.  Second, it was needed as a baseline to estimate the technological feasibility and cost
   of reducing worker exposure to target levels of 10 ppm, 1 ppm, and 0.1  ppm.  An  example
   statement of needs form for the example chemical is shown in Figure 4.
                                              10

-------
                            Figure 4. Statement of Needs
                    Statement of Customer Needs for
                       CEB Engineering Assessments

   Requester:  Sally Jones, Project Manager	Date of Request: 2/20/94

   The purpose of this form is to gather information on customer needs to be used
   in developing a CEB engineering assessment. Please note that all identified needs
   may not be met due to data limitations, resource constraints, etc.  Where there are
   multiple customers of CEB assessments, it is suggested that the form be
   completed by the individual who will be using the specific type of information
   provided by CEB.
   Return completed form to:  John Smith, CEB Engineer    Phone: 260-1234
Section 1.  General Information

A.  Please indicate the origin of the case and chemical/use cluster, etc. (e.g. RM2 analysis for
hydrazine):  RM2 analysis for example chemical.
B.  What are the purpose and goals of the CEB assessment and the project?  Develop assessment of
occupational exposure to the example chemical.
C.  What are the approximate completion dates for the CEB assessment and for the project?  CEB
assessment is due April 4, 1994.

D.  Please identify the health effects of concern (e.g. carcinogenicity, neurotoxicity, liver effects,
reproductive effects, sensitization, etc.):  Carcinogenicity
E.  Please identify the environmental effects of concern:  NA
F.  Please identify any specific data, sources, references, or personal contacts you would like CEB to
research:  NIOSH and OSHA data.
                                       11

-------
G.  When do you need to have an estimate of CEB extramural resources (if any) for this project?  NA


Section 2.  Occupational Exposure Assessment                        [ ] Not Needed

A.  CEB will estimate number of workers exposed for each industry segment of interest. Identify any
special population characteristics of interest (e.g. gender, etc.):  Total number of workers potentially
exposed, and population potentially exposed during monomer and polymer production.
B.  Identify specific industry segment(s) of interest (e.g. manufacture, processing and end uses; only
spray coating application end uses, etc.):  Monomer and polymer production.
C.  Indicate with [X] which types of exposure are of interest:
[X]    Inhalation exposure                 [ ]    Dermal exposure
[ ]    Other (e.g. ingestion): _______________
D.  Identify which worker activities are of interest (e.g. the assessment need only address textile dye
weighers):  All worker activities associated with monomer and polymer production.
E.  Indicate with [X] the preferred characterization for duration and frequency of exposure:
[ ]    Short-term exposure (e.g. peak exposure, maximum 15-minute exposure, etc.), for acute health
       effects. Identify specific requirements: _______________
[X]    Long-term exposure (e.g. annual average exposure, lifetime average daily dose, etc.), for chronic
       health effects.  Identify specific requirements:  annual average exposure and lifetime average
       daily dose.
[X]    Frequency of exposure (days/yr)
[X]    Cumulative exposure over time (e.g. days, months, years):  days, months, and years are of
       interest.
[ ]    Other: _______________
G.  CEB will attempt to provide a measure of central tendency, and a high end Potential Dose Rate
(PDR), identify assumptions made, and characterize uncertainty, as data and methodologies allow.
Identify any specific needs (e.g. specific statistical descriptors, etc.):  Statistical descriptors of geometric
mean, arithmetic mean, geometric standard deviation, arithmetic standard deviation, the distribution of
the data, and a graphic presentation of the data are preferred.
H.  Please identify any other special needs for the occupational exposure assessment:  Estimate of the
technical feasibility of controlling exposure to 10 ppm, 1 ppm, and 0.1 ppm.
                                              12

-------
Section 3.  Process Information                                     [X] Not Needed

A.  Are there specific industrial segments (e.g. manufacture, processing into a coating, end use as a
paint in automotive application) you would like process information for?
1.
2.
3.
4.
B.  Please specify the information you would like CEB to provide:
[ ]    Number of sites                    [ ]    Days/yr
[ ]    Throughput (kg/site-day)           [ ]    Process Description
[ ]    Flow Diagram                      [ ]    Other (please specify)
Section 4.  Environmental Release Assessment                        [X] Not Needed

A.  CEB will provide estimates of environmental release (i.e. kg/site-day or kg/yr) for manufacture,
processing and end use operations.  Indicate any specific industry segments of interest or special data
needs: _______________
B.  Indicate with [X] which types of releases are of interest, and indicate any special needs:
[ ]    Water releases              [ ]    Air releases
[ ]    Landfill releases           [ ]    Incineration releases
[ ]    Other: _______________
Special Needs: _______________
C.  CEB will attempt to provide descriptors for release assessments, identify assumptions made, and
characterize uncertainty, as data and methodologies allow.  Identify any specific needs: _______________
                                               13

-------
Section 5.  Pollution Prevention Assessment (PPA)/Occupational Exposure Reduction
Assessment (OERA)                                                   [X] Not Needed

Are there specific industrial segments you would like CEB to provide an assessment of pollution
prevention opportunities and/or occupational exposure reduction for?

             [ ] PPA                  [ ] OERA                [ ] Both
1.
2.
3.
Section 6.  Other Information Needs                                 [X] Not Needed

Please identify other information, analysis or data needed, and the rationale for requiring the
information: _______________
Customer Contact (e.g. Project Manager):

_____________________________________________________________________________
(Name)             (Division/Branch)           (Telephone)           (Date)
                                          14

-------
                                 STEP 2:  COLLECT DATA
       Once the data requirements of the assessment are preliminarily identified, the next step is to
collect the monitoring data that will be used in the analysis. It is important to obtain information on all
variables relating to the measured values, such as the collection method, number of workers exposed,
duration of the sampling, etc.  Step 4 contains a listing of parameters that may affect exposure.  The
more data that are identified and collected, the better the analysis will be.  Therefore, it is important to
ascertain at the beginning of the project that all possible sources of data have been checked.

       Typical sources of exposure monitoring data include the National Institute for Occupational Safety
and Health (NIOSH), the Occupational Safety and Health Administration (OSHA), the Environmental
Protection Agency (EPA),  other federal agencies  or departments, state agencies, trade associations,
unions, journal articles, and individual companies in the industry.
A.     Obtaining Data From NIOSH

       For existing chemicals that have been studied by NIOSH, Health Hazard Evaluations (HHEs) and
Industry  Wide  Surveys (IWSs) usually  represent the largest body of complete and extremely well
documented data. NIOSH reports usually include most of the information necessary to fully classify data.
In cases where the chemical of interest was not the primary reason for the NIOSH report, but rather was
measured only as a secondary chemical, information may have to be filled in by direct contact with the
inspector.  In addition, it may also be necessary to confirm the presence of the chemical in all areas
monitored if a large quantity of nondetected values is recorded.  Since HHEs are generally done in
response to a complaint regarding a specific chemical, the data may not be random in selection.  IWSs
tend to be well selected to represent an industry, but may be biased if only well controlled facilities were
monitored.  NIOSH Control Technology Assessment reports are developed to identify and evaluate
appropriate control measures and may be biased toward facilities that are well-controlled.  Contact with
NIOSH can usually identify any potential biases.  NIOSH tends to take many samples per visit, as
contrasted with OSHA, which typically only takes a few measurements.
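When a data set contains a large fraction of nondetects, it helps to quantify the problem before any analysis. The following is a minimal sketch with hypothetical measurements and an assumed detection limit; the LOD/sqrt(2) substitution shown is one common industrial hygiene convention, not a method prescribed by this guidance.

```python
import math

# Hypothetical 8-hour TWA results (ppm); None marks a nondetect that was
# reported only as "below the limit of detection".
results = [12.0, None, 3.4, None, None, 7.8, 1.2, None]
lod = 0.5  # assumed limit of detection, ppm

n_total = len(results)
n_nd = sum(1 for r in results if r is None)
fraction_nd = n_nd / n_total

# Substitute LOD/sqrt(2) for each nondetect (LOD/2 is another common choice).
substituted = [r if r is not None else lod / math.sqrt(2) for r in results]

print(f"{n_nd}/{n_total} nondetects ({fraction_nd:.0%})")
print(f"mean with LOD/sqrt(2) substitution: {sum(substituted) / n_total:.2f} ppm")
```

A high nondetect fraction, as here, is a signal to confirm with the inspector that the chemical was actually present in the areas monitored.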

       In general, NIOSH inspectors are easy to locate and will have worked on more than one of the
surveys, so that multiple information can be gathered from each contact. Where contact cannot be made,
it is usually acceptable to assume that the NIOSH collection and analytical method recommended at the
time was used to collect the data.  NIOSH may also have unpublished data or studies that are in progress;
contact with NIOSH  personnel who have been or are working on the  chemical can  thus  result in
additional unpublished monitoring data. The best source of NIOSH reports is the NIOSHTIC data base,
which is available through DIALOG or on computer disk. In addition, the NIOSH Publications Catalog
can be manually reviewed to identify useful reports.  It may also be useful to obtain up-to-date published
and unpublished information available on microfiche and hardcopy from NIOSH.  Data may be obtained
from:
                                              15

-------
              U.S. Department of Health and Human Services
              National Institute for Occupational Safety and Health
              Robert A. Taft Laboratories
              4676 Columbia Parkway
              Cincinnati, Ohio 45226
              (800) 35-NIOSH
B.     Obtaining Data From OSHA

       The largest number of measurements for an existing chemical is generally located through
accessing the OSHA National Health Sampling Results by Inspection (OSHA report: OHR 2.6). These
data can be obtained by written request to:

              U.S. Department of Labor
              Occupational Safety and Health Administration
              Director, Office of Management Data Systems
              Room N3661
              200 Constitution Ave., N.W.
              Washington, D.C.  20210
              (202) 219-7008

Information provided for each facility includes company name and address, SIC code, inspector code,
OSHA office, date and reason for visit, job title, exposure value, number of similarly exposed workers
at the time of the inspection, and type of exposure (peak/8-hour TWA, personal/area).  No information
is provided on controls, type of process, monitoring method, concentration of chemical in process, or
demographics of the exposed workers. The sampling and analytical method and limit of detection  may
not be available.  Where the sampling and analytical method cannot be ascertained, it is usually
acceptable to assume that the method used is that specified by OSHA in the OSHA Technical Manual at
the time the survey was conducted (OSHA, 90).  The methods specified in this publication are in most cases
from either the NIOSH Manual of Analytical Methods (NIOSH, 84) or the OSHA Manual of Analytical
Methods (OSHA, unpublished).  Unlike NIOSH, OSHA usually collects only one or two samples per
chemical during each inspection.  In many cases, the job title or SIC may uniquely define the use of the
chemical (e.g., degreaser operator or SIC 7216, Dry Cleaning Plants), but most data require that some
assumptions be made for categorization.  In addition, the data may include large quantities of nondetects
and SIC codes may be inconsistently applied. If time and budget permit, it is best to contact the OSHA
inspector.  Because the inspector at the local OSHA office must be called and few summaries are from
the same inspector, this process can be time consuming.  Also, inspectors may be difficult to locate, files
may be stored away, or the inspector may not remember details of the facility.  Many states (23 to date)
operate their own OSHA State Programs which must be "at least as effective as" the federal  program.
However, these State plans have historically not had data in  this OSHA data base. OSHA's Publication
Catalog can also be reviewed, and  up-to-date  information (including  NIOSH studies) may also be
available from:
                                             16

-------
              OSHA Technical Data Center
              Department of Labor
              200 Constitution Avenue, N.W.
              Room H-2625
              Washington, D.C.  20210
              (202) 219-7500
C.     Other Sources of Data

       Monitoring data may also be available from previous and ongoing EPA studies. Previous reports
done by OPPT (formerly OTS) may contain occupational exposure data. Usually the data will have been
summarized and the primary data will have to be obtained separately.  It is important to obtain primary
data to avoid the duplication of data from other sources. Information submitted under Sections 4, 8(a),
and 8(d) of TSCA may be useful in preparing the exposure assessment.  Non-confidential information
submitted under TSCA may be obtained through the TSCA Non-Confidential Information Center at
(202) 260-7099. The Office of Air Quality Planning and Standards (OAQPS) may have collected some
exposure data through the use of Section 114 letters.  Information about OAQPS Section 114 letters can
be obtained by contacting the Emissions Standards Division at (919) 541-5571.

       Other federal agencies or departments may have collected exposure data.  For example, the Army
and Air Force have monitoring data on workers in a wide variety of job categories. These data may be
obtained by contacting  the following departments:

       Army:               Assistant Secretary of the Army
                            (Installations, Logistics and Environment)
                             Attn:  SAILE (ESOH)
                             110 Army Pentagon
                            Washington, D.C.  20310-0110
                            (703) 614-8464

       Air Force:            HQ AFMOA SGPA (BEES)
                            170 Luke Avenue
                             Bolling AFB
                            Washington, D.C.  20332-5113
                            (202) 767-1731

       MSHA:              Mine Safety and Health Administration
                            Metal/Nonmetal, Division of Health
                            4015 Wilson Blvd.
                            Arlington, VA 22203-1984
                             (703) 235-8307
                                            17

-------
                             Mine Safety and Health Administration
                             Coal, Division of Health
                             4015 Wilson Blvd.
                             Arlington, VA 22203-1984
                             (703) 235-1358

       State environmental and occupational  safety  agencies concerned  with both  environmental
protection and worker health may have monitoring data.  This is especially true if there is a concentration
of the industry under study in a state.

       Trade associations often collect and evaluate monitoring data from their members.  In many cases
the association may not allow access to the primary data and will provide only summaries of the data,
thus limiting its usefulness.  Even if the data cannot be incorporated in the direct analysis, however, it
can be used for comparison with the results of other analyses.  An extensive listing of trade associations
is contained in the Encyclopedia of Associations (Koek, 88).

       Unions often are the driving force behind the investigation of a particular chemical. In such cases
they may have obtained exposure measurements from companies with which they have contracts.  Direct
contact with the union in question is the best method to obtain these data.

       Data may also be identified from journal articles.  On-line data bases that can be useful to identify
exposure data include  BIOSIS, CA Search, EMBASE, Enviroline, Medline, NIOSHTIC, NTIS, and
Pollution Abstracts.  These sources almost never present the primary data and the necessary ancillary
information, so the author will usually have to be contacted if primary data are necessary.

       Finally, if plant visits are being conducted or plants are being contacted to provide information
for the study, they may also be asked to voluntarily provide monitoring data. Such contacts are of course
limited by Office of Management and Budget (OMB) oversight under the provisions of the Paperwork
Reduction Act. Plants  may also be surveyed in the form of OMB approved questionnaires or telephone
surveys.
                                          EXAMPLE

          For the example chemical, worker exposure data were obtained from NIOSH, OSHA, a
   previous contractor report for EPA, and the union representing workers at several facilities.  The
   data were generally not primary monitoring results  but only summaries of the data giving means
   and number of samples for ranges (i.e.,  Type 3  data).   The user needs identified in Step 1,
   however, called for the types of results only available by analysis of Type 1 data.  Therefore, new
   monitoring data had to be collected for the industry.  The available and new data form the basis for
   the analyses shown in the example in the following  steps.
                                               18

-------
                               STEP 3:  DEFINE DATA NEEDS
       By the time the initial data collection has been finished, the completed "statement of needs for
occupational exposure assessment" form (Figure 4) should have been received from the project manager.
This form and any other information provided should be used to formally define the data needs of the
assessment. A preliminary determination should be made by the CEB engineer as to whether the existing
data are "in the ball park" or if significant changes in data collection resources  or expectations of the
project manager are needed.  A more detailed assessment of whether the user needs can be met will be
made in Step 11.

       If it is apparent that the exposure data are inadequate to meet the needs set forth in the statement
of needs form, then the CEB engineer should inform the project manager that expectations should be
modified to match the existing data or outline approaches and resource implications to  meet those needs.

       It is important to be responsive to requests for specific statistics in the assessment.  For instance,
it is typical for exposure data to be summarized by calculating the geometric mean.  Exposures tend to
follow a lognormal distribution, and the geometric mean is the value that represents the most "middle"
value in such a distribution.  However, if the concern of the end user is with total dose rather than with
typical exposure levels, the arithmetic mean may be a more appropriate measure of central tendency and
should be provided with the assessment.
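The difference between the two measures is easy to demonstrate. The sketch below uses hypothetical, roughly lognormal measurements; the values and variable names are illustrative only. A few high readings pull the arithmetic mean well above the geometric mean.

```python
import math

# Hypothetical 8-hour TWA exposure measurements (ppm), right-skewed as
# lognormal exposure data typically are.
exposures = [0.5, 0.8, 1.0, 1.2, 1.5, 2.0, 3.0, 9.0]
n = len(exposures)

arithmetic_mean = sum(exposures) / n

# Geometric mean: exponential of the mean of the log-transformed values.
log_values = [math.log(x) for x in exposures]
mean_log = sum(log_values) / n
geometric_mean = math.exp(mean_log)

# Geometric standard deviation (GSD): exponential of the sample standard
# deviation of the log-transformed values.
var_log = sum((v - mean_log) ** 2 for v in log_values) / (n - 1)
gsd = math.exp(math.sqrt(var_log))

print(f"arithmetic mean = {arithmetic_mean:.2f} ppm")
print(f"geometric mean  = {geometric_mean:.2f} ppm")
print(f"GSD             = {gsd:.2f}")
```

Here the arithmetic mean (about 2.4 ppm) exceeds the geometric mean (about 1.6 ppm), which is why the choice of descriptor should follow from the end user's stated needs.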
                                              19

-------
                                       EXAMPLE

       For the example chemical, several key issues were identified in the information supplied
by the end users:

       • Exposure of workers in the industry was of more interest than exposure of the general
          population.

       • Worker exposure in the monomer industry was of more interest than worker exposure
          in the polymerization process.  Worker exposure in handling of the finished polymer
          was of least interest.

       • EPA was considering risk management options under TSCA. Since exposure may be
          limited to workers, a referral to  OSHA was also possible.  OSHA had no ongoing
          activities for the chemical  at this time.

       • Only inhalation exposure was of interest at this time.

       • Only long-term exposure was of interest at this time.

       • Specific descriptive statistics were requested.

       Because the only data available were of Type 3, it was therefore necessary to conduct a
monitoring program to obtain sufficient Type 1 data to conduct the types of analyses necessary to
meet these needs.
                                            20

-------
                STEP 4: IDENTIFY PARAMETERS AFFECTING EXPOSURE


       Prior to statistical analysis,  monitoring results must be classified into categories containing
sufficient and reliable data so that meaningful analyses can be conducted (EPA, 87).  The classification
and organization of occupational exposure monitoring data are extremely important to the analysis and
to the usefulness of the data for the end user. The classification and organization processes can be seen
as the result of a compromise between two competing goals.
                                                                          •
       The first goal is to completely define the data set.  If this were the single goal, the only data
included would be those for which all parameters that can influence worker exposure were known, thus
allowing definition of categories based on differences induced by all of these  variables.  For example,
each category could be uniquely defined by process type, job title, worker activities, ambient control type
(e.g., carbon adsorber), occupational control type (e.g., local exhaust ventilation), collection method,
concentration of chemical in the process, demographics of  the exposed worker, date the sample was
taken, and any other parameter that could affect exposure or risk. The categories so defined would yield
groups of exposure measurements (or groups of individual workers) expected to have the same or a
similar exposure profile.  Stated another way, the first goal is to define subsets of the data such that data
within each subset are measuring the same thing,  i.e.,  the subsets define homogeneous categories.
Categories that are defined based on too few categorizing variables may lump together data that are not
homogeneous.

       The second goal, however, is to get categories with sufficient numbers of observations to allow
meaningful statistical analyses.  The power of any statistical  analysis  is greatly affected by sample size;
large uncertainty  can result when data sets  are too small.   The ability  to  make  generalizations
(extrapolations) is also limited  when sample sizes  are small.   The number of observations within
categories is inversely related to the number of categories (which is directly  related  to the number of
parameters used to define the categories).  Sample size is also reduced if observations have to be excluded
from consideration because the values of variables potentially affecting those  observations are missing
or unknown.
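The tradeoff between category detail and cell size can be seen with a few hypothetical records; the parameter values below are invented for illustration. Each additional categorizing variable multiplies the number of cells and shrinks the observations available within each.

```python
from collections import Counter

# Hypothetical monitoring records: (process, job_title, control) per sample.
records = [
    ("monomer", "operator", "LEV"), ("monomer", "operator", "LEV"),
    ("monomer", "operator", "none"), ("monomer", "loader", "LEV"),
    ("polymer", "operator", "LEV"), ("polymer", "operator", "none"),
    ("polymer", "loader", "none"), ("polymer", "loader", "none"),
]

# Categorizing on one parameter leaves usable cell sizes...
by_process = Counter(r[0] for r in records)

# ...but categorizing on all three parameters spreads the same eight
# observations across many more cells, some with a single observation.
by_all = Counter(records)

print("process only:", dict(by_process))
print("all three parameters:", dict(by_all))
print("smallest cell size:", min(by_all.values()))
```

In a real assessment this kind of tally helps decide which parameters can be dropped (or collapsed) so that each remaining category retains enough observations for meaningful statistics.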

       The approach to balancing these two conflicting goals presented here has an industrial hygiene
(qualitative) component and a statistical component.  The industrial hygiene component is described in
Step 4.  The statistical component, described in Step 15, verifies the results of the industrial hygiene-
based component and suggests possible re-categorization.

       Thus, Step 4 consists of the critical process of identifying those parameters that are important in
influencing worker exposure to the chemical under study. These exposure parameters will be used to
define the categories (subsets or subpopulations) into which the exposure data will be classified.

       CEB often develops categories  of individuals  with the same or similar exposure by first
identifying the industrial process or unit operation during which exposure to the substance occurs, then
identifying specific work activities or tasks associated with exposure, and identifying (or estimating) those
workers associated with the activity or task, incorporating other information as appropriate.  If monitoring

                                              21

-------
data are available and job descriptions or job titles are given for the data, the engineer will need to
evaluate whether the job description or job title can be directly linked to a specific work activity or task.
There are cases where the job title or description does reflect the work activity, but the converse is also
true, where job titles or job descriptions may be broader than the activities linked directly to the
monitoring (Hawkins, 91).

       If the job title is associated with a specific work activity, the engineer may determine that creating
categories by industrial process/unit operation/job title/work activity/control type/etc. is appropriate.  If
the job title or description is not associated with a certain task or work activity, the engineer should try
to obtain information on work activities associated with a personnel job title or description.  If
appropriate, an alternative is to make assumptions about the activities associated with the job title, based
on knowledge of the process, professional judgment, etc.  These assumptions should be fully documented
and evaluated with other assumptions made during the assessment (see Step 5).  It should also be noted
that the identification of important exposure parameters is often refined as additional information is
gathered during the exposure assessment.

       Occupational control type is a variable that may affect worker exposure and which should often
be considered when defining a classification scheme for exposure data.

       The categories should also be designed with user needs in mind.  This may include consideration
of parameters that relate to risk assessment and regulatory considerations.  All potential parameters will
be used to create the preliminary exposure data matrix in Step 6.

       A distinction may sometimes be made between exposure parameters that can be considered
"explanatory" as opposed to those that are merely "blocking" factors.  For example, it may be the case
that exposures differ from one company to another, across plants, or with time.  Although a statistical
analysis may determine that plant-to-plant differences are significant, the factor, plant, does not "explain"
why the exposures are different.  Plant is not an explanatory parameter; it is what can be referred to as
a blocking factor; the plant-to-plant differences may be present because of differences in occupational or
ambient controls or other unknown factors that are directly related to exposure concentrations.  Blocking
factors are merely parameters within which exposures are expected to be similar.  The factors that
contribute to plant-to-plant differences, for example, may not be known or identified, and so it may
sometimes be the case that such blocking variables need to be retained to account for differences in
exposure levels.  Nevertheless, the engineer is encouraged to identify explanatory parameters for the
purposes of categorization.  Retention of some blocking variables may be suggested, but their importance
(as well as the importance of the proposed explanatory variables) will be tested statistically in Step 15.
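The statistical test of a blocking factor such as plant is deferred to Step 15, but the underlying idea can be sketched with a one-way analysis of variance; the plant data below are hypothetical and the F statistic is computed by hand rather than with a statistics library.

```python
# Hypothetical exposure measurements (ppm) grouped by plant.
plants = {
    "plant_A": [1.1, 1.4, 1.2, 1.5],
    "plant_B": [2.8, 3.1, 2.9, 3.3],
    "plant_C": [1.0, 1.3, 1.1, 1.2],
}

groups = list(plants.values())
k = len(groups)                      # number of plants (groups)
n = sum(len(g) for g in groups)      # total observations
grand_mean = sum(sum(g) for g in groups) / n

# Between-group sum of squares: how far each plant mean sits from the
# grand mean, weighted by group size.
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

# Within-group sum of squares: scatter of observations around their own
# plant mean.
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

# One-way ANOVA F statistic: large values indicate that plant-to-plant
# differences dominate within-plant variation.
f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
print(f"F = {f_stat:.1f}")
```

A significant F statistic would justify retaining plant as a blocking variable even though it does not, by itself, explain why the exposures differ.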

       The engineer should also consider the relative importance of the exposure factors considered for
the classification.  Based on his or her knowledge of the industry and the processes entailing exposure,
he or she may be able to suggest that a small set of explanatory (and, perhaps, blocking) factors will be
the most important for determining exposure.  Parameters identified by the end user as important should
be considered for the categorization, although, as discussed in Step 11, the expectations of the user may
have to be modified in accordance with the availability of pertinent data.  Job title, work practices,
occupational controls, and production levels are typical examples of important parameters.  One purpose

                                               22

-------
of ranking the variables is to prioritize collection of additional information in these areas where necessary
(see Steps 8 and 9).

       Ideally, for risk assessment purposes, the exposure profiles for each exposed subpopulation
defined by the parameters identified in this step should include the size of the group, the make-up of the
group (age, sex, etc.), the source of the chemical, exposure pathways, the frequency and the intensity
of exposure by each route (dermal, inhalation, etc.), the duration of exposure, and the form of the
chemical when exposure occurs.  Assumptions and uncertainties associated with each scenario and profile
should be recorded and clearly discussed in the results presentation (EPA, 87).

        The following parameters are presented as guidance to the CEB engineer as typical variables that
can affect exposure and  may be important in determining categories of similarly exposed individuals.
They  are presented  in general order of their typical importance, but the actual  importance of the
parameter must be determined by the CEB engineer for the specific chemical and use.
        •  Type of sample - Sample type such as personal, area, ceiling, peak, etc. should be
           defined.  In general, different sample types are not combined.

        •  Process type - Process should be defined by all characteristics that are likely to
           affect exposure.  Examples include machine type (e.g., open-top vs. conveyorized
           degreaser), age of equipment, usage rate, and product (e.g., printing on paper vs.
           plastic).

        •  Job title - Job title is usually given with the monitoring data and may require
           combination of similar job descriptions (e.g., printer, letterpress operator, and
           press operator could be combined into a single category).

        •  Worker activities - Within a given job title, activities performed by the workers
           may vary in a significant way that can directly affect exposure.

        •  Worker location - The approximate location of the worker with respect to the source
           of the exposure is an important factor.

        •  Occupational control type (workplace practices) - Controls such as local exhaust
           ventilation (LEV) or general ventilation directly affect measured exposure.  Other
           controls such as respirators do not generally affect measured exposure but do affect
           actual worker exposure.

        •  Exposure period - The time period the worker is exposed to the chemical in a workday
           directly affects exposure.  Frequency and duration of exposure are also important
           factors.
                                               23

-------
        •  Production levels - Exposure can relate directly to the volume of production at the
           facility.

        •  Operating frequency and duration - Total exposure relates directly to these
           variables.

        •  Concentration of chemicals in the process - The concentration of the chemical can
           directly affect the exposure of the workers.  Such information is seldom available,
           however.

        •  Sampling strategy - The duration of the sampling and the sampling strategy can
           affect the accuracy of the measurements in characterizing the exposure.

        •  Ambient control type - Although such controls are installed primarily to reduce
           release of the chemical to the ambient air (e.g., refrigerated condenser, carbon
           adsorber, or baghouse), they may also increase or decrease occupational exposure.

        •  Company and location - Variables such as local regulations, differences between
           large and small companies, and regional differences in processes can affect worker
           exposure.

        •  Date of measurement - The date the measurement was taken can be indicative of the
           measurement method, the controls in use, and the effect of natural ventilation or
           other factors.

        •  Sample collection method - Different collection methods, sampling times, validated
           range of the method, or analytical techniques can affect the accuracy of the
           measurement and the detection limit.

        •  Source of data - Analysis by source of the data can help to identify potential
           biases in the data.  Biases that are not evident in the review of data in Step 5
           may be identified in Step 16.

        •  Demographics of the exposed worker - If health effects data show that a particular
           demographic group is susceptible (e.g., women of childbearing age), then whenever
           possible data should be categorized using this information.  While this is not
           typically needed in an exposure assessment, it may be needed for a later health
           risk assessment.

        •  Industry - While four-digit SIC is preferable to two-digit SIC, OPPT assessments
           often focus on individual companies and/or facilities.
                                        24

-------
        •  Other - Depending on the process, controls implemented primarily for other
           substances may also reduce exposure to the substance of concern (e.g., LEV at the
           raw material transfer operation).
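One way to apply the parameters chosen in this step is to build a category key from their values and group measurements under it. The sketch below uses hypothetical records and field names; None marks a parameter whose value is unknown for a given sample.

```python
# Hypothetical monitoring records carrying the categorizing parameters and
# the measured concentration (ppm).
records = [
    {"sample_type": "personal", "process": "monomer", "job_title": "operator",
     "control": "LEV", "ppm": 2.1},
    {"sample_type": "personal", "process": "monomer", "job_title": "operator",
     "control": "LEV", "ppm": 1.8},
    {"sample_type": "area", "process": "monomer", "job_title": "operator",
     "control": "LEV", "ppm": 0.9},
    {"sample_type": "personal", "process": "polymer", "job_title": "loader",
     "control": None, "ppm": 4.2},
]

# Build a category key from the chosen parameters; note that different
# sample types fall into separate categories, consistent with the guidance
# above that different sample types are generally not combined.
key_fields = ("sample_type", "process", "job_title", "control")
categories = {}
for rec in records:
    key = tuple(rec[f] for f in key_fields)
    categories.setdefault(key, []).append(rec["ppm"])

for key, values in categories.items():
    print(key, "->", values)
```

The resulting groups of measurements form the cells of the preliminary exposure data matrix assembled in Step 6.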
                                       EXAMPLE

       For the example data set, the following were identified as potentially important parameters:

                      Sample type
                      Job title
                      Process type
                      Occupational control
                      Company
                      Sample collection method
                      Industry

       While data were collected for other parameters discussed in this section, emphasis was
placed on verifying information on these seven parameters.  Note that the "blocking" variables,
company and industry, have been retained.  Industry, in particular, was retained because the end user
had specified that the monomer industry needed to be considered separately from the polymer
industry.
                                            25

-------
           STEP 5:  IDENTIFY UNCERTAINTIES, ASSUMPTIONS,  AND BIASES
       Uncertainties and assumptions are identified and recorded to allow their clear recognition by the
end user.  This step initiates that process. All data should be examined for any characteristics that may
represent a nonrandom selection process or a systematic error (bias) in sampling or analysis.  It may be
helpful to  review the list of important parameters to assist in identifying uncertainties, assumptions, and
biases. All important uncertainties, assumptions, and biases are identified, and for purposes of grouping
like exposure, these should be as specific as possible.  In preparing the risk assessment, more general
information on uncertainties, assumptions, and biases may be  acceptable.  Uncertainties, assumptions,
and biases will be evaluated in Step 18 to determine any influence on estimates of worker exposure in
one or more groups. Steps 5 and  18 are extremely  important but may be difficult to execute.
A.     Uncertainties

       Examples of problems that give rise to typical uncertainties in the input and output of an exposure
analysis include:

       •      Data manipulation errors either by the persons collecting the monitoring data or during
               the analysis.

       •      The inherent uncertainty  in a small data set (e.g., day-to-day and worker-to-worker
               variability are not accounted for).

       •      Uncertainties regarding differences  in  chemical  concentration, throughput, or other
               process related variables.

       •      Use of an unknown monitoring or analysis method.

       •      Assumptions made from secondary sources that were applied to the primary data.

       •      Uncertainties of values below the detection limit.

        •      Possible interference of other chemicals with a specific test method.

        •      Uncertainty regarding missing or incomplete information needed to  fully  define the
               exposure.

        •      The use of generic or surrogate data when site-specific data are not available.

        •      Errors in professional judgment.
                                              26

-------
        In evaluating and reporting uncertainty associated with measurements, the three most important
categories of errors are sampling errors, laboratory analysis errors, and data manipulation errors (EPA,
92). There are two kinds of sampling errors:  systematic errors (often referred to as biases) that result
from the sampling process, and random errors that result from the variability of both the population and
the sampling process.  While random error cannot be eliminated,  its effects can be minimized by using
sampling strategies and by having sufficiently large data sets.  Systematic errors can result from faulty
calibration of critical components such as flow meters, thermometers, pressure sensors, sieves, or other
sampling devices.
        Other systematic errors  can result from contamination,  losses,  interactions with containers,
deteriorations, or displacement of phase or chemical equilibria (EPA, 92).

        Generally, laboratory errors are smaller than sampling errors.  Calibration is a major source of
systematic error in  analysis.  Other sources of error include chemical operations such as sample
dissolution, concentration, extraction, and reactions (EPA, 92).

        Data manipulation  errors  include  errors of  calculation, errors  of transposition, errors of
transmission, use  of  wrong  units,  use  of improper conversion factors, spatial or temporal averaging
information loss, and misassociation errors that confuse samples and numerical results.
B.      Assumptions

        Throughout the analysis, assumptions must be made about the data.  Many assumptions are made
in response to uncertainties identified in the data.  These assumptions must be clearly listed and their
effect on the results quantified if possible.  Examples of typical assumptions that are made during
exposure analysis include:

        •      That plants and workers were randomly selected and that they represent the industry as
               a whole.  (It should be noted that this is almost never true;  if it is known not to  be true,
               this assumption should not be made.)

        •      That the controls in place when the data were collected represent typically maintained
               controls.

        •      That the value selected for use for a nondetected measurement accurately represents the
               actual exposure at those facilities.

        •      That estimates of ancillary information garnered  from other sources also represent the
               facilities in the monitoring data set.

        •      That job activities performed during the exposure period represent typical activities for
               that job category.
                                                27

-------
        •      That estimates of the duration of tasks used to convert data to 8-hour TWA values are
               accurate.
C.     Biases

       Bias is a systematic error inherent in a method or caused by some feature of the measurement
system (EPA, 92).  Systematic errors in sample selection, sampling, laboratory analysis, or data
manipulation can cause the results to be biased.  If the facilities and workers were not randomly selected
and the selection process documented, then the data may also contain biases.  Common features that may
introduce bias include:

       •      Systematic sampling, laboratory, or data manipulation errors that have been identified.

       •      Selection  of  only "well-controlled"  plants such as a NIOSH industry-wide survey
               conducted to  identify good control technology.

       •      Selection of only large facilities.

        •      Large disparity between the number of samples at different facilities (e.g., OSHA vs.
               NIOSH data) could lead to  bias, depending on how  the data are weighted and whether
               there are underlying sampling biases.

       •      Data that represent only OSHA complaint visits.

        •      When sampling for compliance with a ceiling limit, sampling workers with the highest
               potential for exposure.

       •      Selection of only plants that are members of a trade association.

       •      Selection of only companies that voluntarily supplied monitoring data.

       •      Averaging of a measurement representing many workers with a measurement representing
               few workers.

        •      Use of sampling or analytical methods at concentrations for which they are not validated.

        •      A sampling strategy biased toward compliance sampling.
 D.     Development of Uncertainty/Assumptions List

        In order to record and retain uncertainties, assumptions, and biases identified in the course of an
 occupational exposure assessment, a listing of the uncertainties and assumptions made at various steps


                                               28

-------
will be maintained.  This list is initiated in this step and will initially contain uncertainties/assumptions
associated with the data collection and classification.  For example, in Step 4, some assumptions may
have been required to relate job titles to specific activities. Moreover, there may have been uncertainties
about the exposure profiles (number of workers, demographics of workers, source of chemical, etc.) for
some of the groups defined by the important exposure parameters. These assumptions and uncertainties
will be recorded in the uncertainty/assumption list.

       In the course of following  the  guidelines defined in this document, other assumptions and
uncertainties will be identified.  All of them will be recorded on the uncertainty/assumption list for use
in Step 18 (Treatment of Uncertainties, Assumptions,  and Biases) and for presentation to the end-user
with the quantitative results.
                                           EXAMPLE

          For the example  chemical,  a very detailed  protocol and quality assurance plan  were
   developed to select the facilities at which monitoring data would be collected.  This protocol is
   more detailed than is typical but serves as an example of considerations that should be included to
   obtain a sample that is as representative as possible of the sample universe.

           For manufacture of the example chemical monomer, the sample universe consisted of ten
   companies  at 12 different plant locations.   A walk-through survey was conducted at ten plants
   representing a  100 percent sample of the ten producers.  The walk-through survey was used to
   gather  information that was  used .to select a smaller sample set at  which to conduct in-depth
   surveys.  Monitoring data were collected at these in-depth surveys.

          The purpose of the survey site selection strategy was to  obtain a representative subset of
   monomer plants from  which to characterize exposures by job title and  work environment.   To
   achieve this, the ten monomer production plants were divided into distinct subpopulations  (strata)
    representing differences in the workplace environment.

          The strata were based on the presence or absence of three specific types of engineering
    controls, the mode of transportation (pipeline, rail car, tank truck, marine vessel) of the feedstock
   and product, and the existence of other production processes or final products at the plant. A single
    plant within each stratum was selected based on a scoring system that quantified the relative
    representativeness of each site.  Four plants emerged as best representing the diversity of work
    environments seen in the example chemical monomer industry.  In-depth surveys, including the
   collection of monitoring data, were conducted at these four facilities.

           In the example data set, a serious potential bias in the analytical method for the chemical
    was identified.  Potential interferences from C4 chemicals made the measurements taken using
    previous methods suspect.  Ways were investigated to mitigate the bias, but finally it was decided
    to exclude all data taken using the older analytical methods.
                                                29

-------
               STEP 6: CREATE PRELIMINARY EXPOSURE DATA MATRIX
       All data should be entered into a usable matrix using a personal computer for analysis. Software
packages (spreadsheets, databases, etc.) are available with storage and retrieval capabilities that facilitate
data analysis calculations. The matrix should be designed to be compatible with statistical programs that
are likely to be used in the data analysis.  Many statistical analysis packages have their own data matrix
handling tools which provide a suitable, and  in some cases preferable, alternative for data management.
All parameters that were identified as having a potential impact on exposure, were requested by  the end
user, or were collected as ancillary information should be entered in the matrix. The use of a matrix will
allow identification of missing information for some observations.

       Inclusion of company name, plant location, and source of data in the data matrix is important
because it provides a recordkeeping approach to allow easy referral of data back to the particular plant
or study to obtain additional data. All potential variables should be entered into the data matrix  and the
field left blank when no data are found. Every effort should be made to fill in blanks in the matrix for
all variables identified  as important.  An extra field or two should be included in the matrix for calcu-
lations such as converting to consistent units (Step 7).  Also included would be any calculations made
using assumptions such as the conversion of the TWA for the sampled time to an 8-hour TWA.
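The layout described above can be sketched as a small matrix; the field names and values below are hypothetical, and any spreadsheet, database, or statistical package would serve equally well. Python's csv module is used here only for illustration.

```python
import csv
import io

# Hypothetical field names: identifying fields (company, plant, source),
# every parameter judged important, and an extra calculated field for the
# unit conversions made in Step 7.
FIELDS = ["company", "plant_location", "data_source", "job_title",
          "process_type", "control_type", "duration_min",
          "concentration", "units", "twa_8hr_ppm"]

rows = [
    # A record with full information (only the calculated field is blank).
    {"company": "Plant A", "plant_location": "City, ST", "data_source": "NIOSH",
     "job_title": "Process technician", "process_type": "Control room",
     "control_type": "1", "duration_min": "455", "concentration": "0.02",
     "units": "ppm", "twa_8hr_ppm": ""},
    # A record with missing ancillary information, left blank for now;
    # Steps 8 and 9 attempt to fill such gaps.
    {"company": "Plant B", "plant_location": "", "data_source": "OSHA",
     "job_title": "Loader", "process_type": "", "control_type": "",
     "duration_min": "420", "concentration": "1.2", "units": "ppm",
     "twa_8hr_ppm": ""},
]

# Writing the matrix to a delimited format lets any statistical package
# read it later.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)

# The matrix makes missing information easy to identify per observation.
missing = {r["company"]: [f for f in FIELDS if r[f] == ""] for r in rows}
```

The blank-field convention above is what allows the completeness checks of Steps 7 through 9 to be automated rather than done by inspection.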

       The exposure data matrix will be completed to the extent possible in Steps 7 through 9 by filling
in missing information (where appropriate) and converting to consistent units. The revised exposure data
matrix (Step 10) will serve to classify the data available and to assess the ability to meet the users' needs
(Step 11).  If possible,  the data in the matrix  will be used in the statistical analyses starting with Step 15.
                                          EXAMPLE

           Table 1 presents a partial example of the data matrix used in the example analysis.  The full
    data set used in the analysis is presented in Appendix A.  Only data on the important variables are
    presented in Table 1; however, data on all variables are included on the computer spreadsheet.
                                               30

-------
                                  Table 1 (continued)

   [The body of Table 1 is not legible in this copy.  Its columns are:
   Plant ID, Industry, Process Type, Job Title, Control Type, Duration
   (min), 8-hr TWA (ppm), and Control Description.  Process types include
   control room and loading areas (railcar and semi-tractor trailer);
   control descriptions include general room ventilation and slip-tube,
   magnetic, and rotameter gauges.  Source of data: NIOSH/EPA; the
   laboratory analysis limit of detection varied with the day of the
   analysis.]
-------
              STEP 7: CHECK FOR CONSISTENCY AND REASONABLENESS


        Once the data have been loaded into the spreadsheet, the next step is to check them for
consistency and reasonableness.  It is recommended that, first, all the exposure measurements be
converted to consistent units.  This step describes some of the considerations related to conversion of
units and the types of checks that can be made subsequently to verify that the results are reasonable.

       For  conversion of units, typically a standardized procedure  consisting of grouping similar types
of data, conversion to consistent concentration units, and conversion to consistent exposure periods can
be used.  For some data, however, all the information necessary to do the conversions is not known (e.g.,
actual exposure time period). In many of these cases, assumptions can be made that will allow use of
the data in the analysis. All such assumptions should be recorded in the uncertainty/assumption list.

       The general approach for conversion of data into consistent units is the following:

       • Grouping of like types of data (e.g.,  15 minute, long term, area, personal),

        • Conversion to consistent concentration units (e.g., mg/m3 or ppm),

       • Conversion to consistent exposure periods when defensible (e.g., 8-hour TWA), and

       • Estimation of missing information.


A.     Grouping of Like Types of Data

       It is extremely important that different types of samples not be averaged. For example,  area
samples generally do not represent personal exposure, and 15-minute peak and ceiling sampling should
not be adjusted to represent full shift exposure. Specific data groupings that usually form like data sets
and, as a general rule, should never be pooled into a single data set include:

       • Area samples
       • Personal samples
       • Short term exposure estimates
        • Long term exposure estimates
                                          EXAMPLE

          In the example data set only personal TWA samples will be used.
                                              33

-------
B.     Conversion to Consistent Concentration Units

       The end user should be consulted for guidance on preferable reporting units early in the project.
Occupational exposure monitoring data are typically reported in either ppm or mg/m3. NIOSH reports
and journal articles report the occupational exposure values  in either ppm or mg/m3, while OSHA
Inspection Summary  Reports almost always report occupational  exposure  values in ppm.   Before
conducting statistical analysis on different data sets, all measurements need to be converted into similar
units.  Values in ppm can be converted to mg/m3 by the following equation:

                  mg/m3  =  ppm  x  (MW/24.45)  x  (P/760)  x  (298/(T + 273))

       where:

       P       =      barometric pressure (mm Hg) of air sampled;
       T       =      workplace temperature (°C) of air sampled;
       24.45   =      molar volume (liter/g-mole) at 25°C and 760 mm Hg;
       MW    =      molecular weight (g/g-mole);
       760     =      standard pressure (mm Hg); and
       298     =      standard temperature (°K).
                                        EXAMPLE

       Consider a case in which a chemical concentration is reported to be 5 ppm at a pressure of
    760 mm Hg and 25°C.  The molecular weight of the example chemical is 54.1 g/g-mole. The
    occupational exposure can then be converted from ppm by the following equation:

                  mg/m3  =  5 ppm  x  (54.1/24.45)  x  (760/760)  x  (298/(25 + 273))

                  mg/m3  =  5 ppm  x  2.213

    Therefore, for the example chemical, a concentration of 5 ppm is equivalent to a concentration
    of 11.1 mg/m3.
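The conversion above can be expressed as a short function; this is a minimal sketch of the equation in this section, with the function name chosen for illustration.

```python
def ppm_to_mg_m3(ppm, mw, pressure_mm_hg=760.0, temp_c=25.0):
    """Convert a vapor concentration from ppm to mg/m3.

    24.45 L/g-mole is the molar volume at 25 C and 760 mm Hg; the last
    two factors correct to the sampled pressure and temperature.
    """
    return (ppm * (mw / 24.45)
                * (pressure_mm_hg / 760.0)
                * (298.0 / (temp_c + 273.0)))

# The worked example: 5 ppm of a chemical with MW 54.1 g/g-mole, sampled
# at 760 mm Hg and 25 C, gives approximately 11.1 mg/m3.
result = ppm_to_mg_m3(5.0, 54.1)
```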
C.     Conversion to Consistent Exposure Periods

       NIOSH and OSHA exposure limits for chemicals are often based on 8-hour TWAs; therefore,
occupational exposure monitoring data are often converted into 8-hour TWAs in order to compare worker
exposures to these regulatory or recommended limits.   Monitoring data collected from OSHA are

                                            34

-------
typically reported as 8-hour TWAs because they are sampled for compliance with an 8-hour TWA
Permissible Exposure Limit (PEL).  OSHA TWA measurements may utilize a zero exposure for the
unsampled portion of the 8-hour day when calculating the TWA.  It may be useful to determine whether
the sample represents an actual  8-hour sample or an 8-hour TWA.  Some NIOSH reports and journal
articles present data collected for less than an 8-hour time period.  The measurement samples are literally
only representative of the exposure period actually sampled. However, professional judgment or reliable
knowledge may sometimes be used to extrapolate data collected for shorter time periods to an  8-hour
TWA (Patty, 81). Where the exposure during the shorter period is representative of the exposure during
the entire work period and the length of the work period is known, exposure values can be converted into
8-hour TWAs based on the shorter exposure duration.

        Based upon the job description in the NIOSH report or journal article, the number of worker
hours per day related to each job category may be estimated.  This should be done with caution
as many times the sampling time  was dictated by the analytical method or other cause not related to
exposure and is not representative of the entire day.  If the measurement sample is judged to be
representative of the exposure period and the exposure period is less than 8 hours, then an exposure value
not already reported as an 8-hour TWA can be adjusted to an 8-hour TWA as follows:

                    8-hour TWA  =  exposure value  x  (exposed hours per day / 8)


This approach is only valid when you can assume that there was no exposure during the remainder of the
workday. This is a key assumption that should not be made  without good information indicating that this
is indeed the case.
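The adjustment can be sketched as follows; the function name is illustrative, and the zero-exposure assumption noted above is restated in the code.

```python
def eight_hour_twa(measured_value, exposed_hours_per_day):
    """Adjust an exposure value measured over a shorter, representative
    period to an 8-hour TWA.

    Valid only under the key assumption that there was no exposure
    during the remainder of the workday.
    """
    return measured_value * exposed_hours_per_day / 8.0

# A 2-hour task measured at 4.0 ppm, with zero exposure for the rest of
# the shift, corresponds to an 8-hour TWA of 1.0 ppm.
twa = eight_hour_twa(4.0, 2.0)
```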

       Peak and ceiling measurements should never be converted to 8-hour TWA exposures.  These
measurements are best taken in a nonrandom fashion.  That is, all available knowledge relating to the
area, individual, and process being sampled is utilized to obtain samples during periods of maximum
expected exposure (Patty, 81).  Therefore these  measurements by design are  not representative of the
longer work period.  They are representative only of the time period over which they are taken, which
usually corresponds  to an applicable standard for peak or ceiling exposure.
                                         EXAMPLE

          While most samples were taken to represent 8-hour TWA exposures,  some were not.
   Information gathered during the plant visit was used to estimate the  exposure  period for those
   measurements that did not represent 8-hour TWAs.
                                              35

-------
D.     Estimation of Missing Information

        Many times the conversion of data to consistent units involves the need to make assumptions
about the process or the worker activities.  For example, the conversion from mg/m3 to ppm requires
knowledge of the workplace temperature.  If this is not given in the report, an engineering judgment must
be made as to the typical temperatures in the work area.  Other data may indicate that the sample time
was 2 hours but not indicate if the job was performed for 2 hours or 8 hours per day.  Again, engineering
judgment of typical practices in that industry may have to be used to estimate the exposure period.
       Since such assumptions can have large influences on the exposure value, all assumptions should
be recorded in the uncertainty/assumption list and presented with the results of the analysis.  Where
assumptions have been made in such calculations, ranges of possible values can be estimated for later
sensitivity analysis. For example, an assumption for one worker can be made based on data from other
workers  with the same potential for exposure.  If the data for  the other workers indicated a period of
exposure ranging from 2 hours to 8 hours, then it is possible  that the exposure period of this worker
could range from 2 to 8 hours  as well.  Exposure values for these extreme times can be calculated and
the results tested for sensitivity to the assumption (see Step 18).  All data where assumptions need to be
made for important parameters should be classified as Type 2 data.
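The range calculation described above can be sketched as follows, with hypothetical numbers: if other workers with the same exposure potential worked 2 to 8 hours, the 8-hour TWA for a worker whose exposure period is unknown can be bounded for the sensitivity testing in Step 18.

```python
def twa_bounds(measured_value, hours_low, hours_high):
    """Bound the 8-hour TWA under the extreme assumptions for an
    unknown exposure period (zero exposure assumed otherwise)."""
    return (measured_value * hours_low / 8.0,
            measured_value * hours_high / 8.0)

# Hypothetical: a task-period measurement of 2.0 ppm, with an exposure
# period somewhere between 2 and 8 hours per day.
low, high = twa_bounds(2.0, 2.0, 8.0)   # 0.5 ppm to 2.0 ppm
```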

        Typical default values that can be assumed where there is no information to the contrary are:

        • Where the monitoring method is unknown, the predominant method used for that
          agency/company during the appropriate time period may be assumed to have been used.

        • Where there is no information to the contrary, ambient temperature and pressure (298 K, 760
          mm Hg) may be assumed.

        Where assumptions cannot be made because of lack of knowledge of the process or job activity,
then these data should be classified as Type 3 or incomplete data.  Classification as Type 3 results in
values being excluded from the analysis.
                                          EXAMPLE

           Because EPA and NIOSH collected the data used in the analysis specifically for this
    purpose, no information needed to be estimated.
 E.      Checks for Consistency and Reasonableness

        Data manipulation errors are caused by calculation errors, errors of transposition,  errors of
 transmission, use of wrong units, use of improper conversion factors,  spatial or temporal averaging
information loss, and misassociation errors that confuse samples and numerical results (EPA, 92).  Some

                                              36

-------
of these errors can be identified by comparison with known standards.  While most chemicals will not
have all of the following parameters, comparison with those that do will help to flag possible data
manipulation errors:

        •  Immediately Dangerous to Life or Health (IDLH)

       •  Analytical limit of detection

       •  Lower or Upper Explosive Limits (LEL, UEL)

        •  Applicable standards (OSHA PEL, ACGIH TLV, NIOSH REL, STEL, ceiling, etc.)

       Data that  appear to be outside of typical limits such as these may be outliers and should be
rechecked for the accuracy of the value. The use of incorrect units for the data is one of the biggest
causes for such errors, and verification of the value and units can usually substantiate the data.
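A simple screening check against such reference values might look like the sketch below. The limit values are those of the example chemical in this section; for a real assessment they would come from IDLH listings, the analytical method, and the applicable standards.

```python
# Reference values for the example chemical in this section.
LIMITS = {"detection_limit": 0.0054, "pel": 1000.0, "idlh": 20000.0}

def flag_suspect(value_ppm, limits=LIMITS):
    """Flag measurements outside typical limits so the value and its
    units can be rechecked for data manipulation errors."""
    flags = []
    if value_ppm < limits["detection_limit"]:
        flags.append("below detection limit")
    if value_ppm > limits["pel"]:
        flags.append("exceeds PEL; verify units")
    if value_ppm > limits["idlh"]:
        flags.append("exceeds IDLH; verify units")
    return flags

# A value of 0.25 ppm raises no flags; 1,500 ppm would be rechecked.
```

Flagged values are not necessarily wrong; as the text notes, they are candidate outliers whose value and units should be verified before the tests of Step 15.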

       Additional tests for outliers are discussed in Step 15.
                                         EXAMPLE

           For the example data set, the monitored levels were far below any regulatory limits
    (IDLH = 20,000 ppm; OSHA PEL = 1,000 ppm) and the limit of detection of the new analytical
   method was very low (0.0054 ppm).  A verification of the units, experience with other situations,
   and confidence that the disparity between the PEL and the measured units reflects a real situation,
   not an error in units, suggested that the monitored levels were reasonable.
                                              37

-------
                STEP 8:  COLLECT ADDITIONAL MISSING INFORMATION
       The purpose of this step is to fill data gaps in the matrix through the collection of additional
information.  Data points that lack specific information in the source document for parameters that are
judged important may be difficult to classify during analysis.  However, this missing information may
be available by direct contact with the inspector identified in the report. Obtaining missing information
may be as simple as properly classifying a process type or job description, or as difficult as identifying
the controls in use when the measurements  were taken.

       For NIOSH and OSHA reports, the  name or identification number of the inspector and the office
location is usually present on the report. Where feasible, direct contact with this person by telephone is
usually the best method to gather the data.  Some inspectors will request that a letter be sent requesting
release of the information under the Freedom of Information Act. For data from a trade association or
from one agency office where extraction of the primary data or ancillary information may be time
consuming, a written request or a trip to the location may be necessary. It is important to remember that
collection of all missing important variables  can change a Type 2 measurement to a Type 1 measurement.
                                         EXAMPLE

           For the example data set, the problem with the sensitivity and selectivity of the test method
    was so severe that all new data using a new test method were necessary.  For most chemicals, this
    would not be the case, and the collection of additional information on important variables for the
    existing data helps to increase the size of the Type 1 data sets.
                                              38

-------
                STEP 9: ESTIMATE ADDITIONAL MISSING INFORMATION


        Data gaps in the exposure matrix  (i.e., missing ancillary information) can also be filled by
estimating missing information when appropriate. If data gaps in the matrix are in areas critical to the
accuracy of the assessment, the scope of the assessment may need to be narrowed, or further data
collection may be necessary.  If data gaps are not critical and if it is not feasible to contact the inspector
or otherwise gather additional information, it may be appropriate to fill data gaps by making assumptions,
using surrogate data, or using professional judgment, etc.   Caution should be used when making
assumptions or using other approaches  to estimate  missing data as this may increase the uncertainty
associated with the assessment and/or cause outliers in the data set.  If an assumption is made for an
important variable, the data can only be used as Type 2 data. The use of assumptions, surrogate data,
professional judgment, or combinations of these methods must be clearly documented and the rationale
for each assumption or judgment given (via notations on the uncertainty/assumption list).

        In the absence of data, CEB uses these methods to develop screening level estimates of exposure.
These screening level estimates generally err on the  conservative side (i.e., overestimate exposure) and
are used to determine whether potential exposures are of no concern and can be eliminated from further
consideration.  If the estimates are of concern, additional data and information are gathered and the
estimates are refined if possible. Due to the uncertainty associated with these estimates, the assessment
must  be well characterized and used with caution.

        If surrogate data are used, the differences between the surrogate and the substance of concern
must  be small,  and the scenarios for which  exposure is estimated must be very similar or the same. If
conservative assumptions are used, the resulting exposures should be expressed appropriately using an
appropriate exposure descriptor. It is important to be aware of and explain how many assumptions are
used; their influence on the final conclusions of the assessment will be evaluated in later steps.  The
mathematical product of several conservative assumptions is more conservative than one assumption alone
and can result in estimates that are unrealistically conservative bounding estimates (EPA, 92; IT, 91).
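The compounding effect can be illustrated with a hypothetical calculation: if each of three conservative inputs overstates its true typical value by a factor of 2, their product overstates exposure by a factor of 8.

```python
# Hypothetical conservatism factors for three inputs (e.g., exposure
# period, concentration, and frequency, each set at twice the typical value).
factors = [2.0, 2.0, 2.0]

overall = 1.0
for f in factors:
    overall *= f   # combined overstatement grows multiplicatively
```

This is why a bounding estimate built from several conservative assumptions can be far more conservative than any single assumption alone.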

        The following are typical kinds of assumptions, uses of surrogate data or information, or
professional judgments that may be made, as appropriate:

        • Process type - Other variables such as process temperature, drying time, etc., could be used
          with professional engineering judgment to make an estimate of the process type.

        • Occupational control type - Company practices and engineering controls in place could be used
          as surrogate information to estimate what was being used during the time the sample was
          taken.  This assumes the current process and controls are the same or very similar to those
          used when the sample was taken.

        • Production levels - The average or range of production levels for the facility or industry could
          be used as a surrogate to estimate the production level when actual figures are not available.
          This assumes  the production levels are the same or very similar.

                                              39

-------
       • Concentration of the chemical in the process - The average or range of concentrations in other
       processes could be used  as  a surrogate  for estimating the concentration in the process,
       assuming the processes and concentrations are very similar or the same.
                                       EXAMPLE

       The example data set was collected by NIOSH and EPA and all important parameters were
identified and data collected.  Therefore, a hypothetical example  will be used to illustrate the
process.

       In the hypothetical example the age of the equipment was identified as an important variable
for two reasons.  First, newer process equipment tends to contain dual mechanical seals and has
been shown to reduce fugitive release of the chemical, while older equipment does not. Second,
in this industry newer facilities are often better maintained than older facilities.

       Because the monitoring measurement in question was taken by OSHA,  the OSHA inspector
listed on  the inspection summary was called.  The inspector no longer worked for OSHA and the
person contacted at the local office could find nothing in the file for that facility to indicate the age
of the equipment.  It was discovered that the facility was an older plant. An attempt to directly
contact the facility where the monitoring data were collected  indicated that the facility was closed
about a year ago.

       Because older facilities that are about to be closed generally have older equipment and tend
to be poorly maintained, it was assumed that this measurement represented data from a facility using
older equipment.  This assumption is based on professional engineering judgment and knowledge
of the industry. The assumption and rationale would be documented within the assessment  and
presented with the results.
                                            40

-------
          STEP 10:  REVISE EXPOSURE MATRIX AND IDENTIFY DATA BY TYPE


       The exposure data matrix should be updated to reflect any changes entailed by the checks of
consistency and reasonableness, and to display the concentration measurements in consistent units.  In
addition, the exposure matrix can be modified to reflect the results of collecting additional information
or estimating the values of ancillary data.  At this point, the revised exposure matrix (in conjunction with
the uncertainty/assumption list which details the treatment of uncertain values and lists all assumptions
that have been made) should be indicative of the  modifications that have occurred in the first round of
updating the data.  As indicated in the next step, additional rounds may be conducted.

       Using the revised exposure matrix  as the basis for classification, the data are categorized as
Type 1, Type 2, or Type 3 data.  Recall that the categorization of worker exposure data into the three
distinct types is based on the following considerations:

       • Type 1 data consist of measurements for which all important parameters are available. Typical
          sources of Type  1 data include statistically valid studies, and NIOSH and OSHA data for
          which all important parameters can be determined.

       • Type 2 data consist of measurements where the important variables are not available but for
          which assumptions can be made to estimate them. For example, if the limit of detection is not
          known because the monitoring method  is not stated, OSHA or NIOSH measurements may be
          assumed to have been taken using the recommended method for the time period.  Typical
          sources  of Type 2 data include NIOSH and  OSHA  reports which contain incomplete
          information and  for which  the inspector  cannot be located or cannot provide the missing
          information.  Other typical sources include journal articles, state agencies, and other federal
          agencies or departments.

       • Type 3 data consist of measurement summaries, anecdotal data, estimation techniques, or other
          data for which the important variables are not known and cannot be estimated.  A typical
          example is a data summary provided by a trade association. The association will not allow
          access to the primary data, and many questions remain unanswered on how the data were
          collected and tabulated.

       The engineer will need to use professional judgment in classifying the data, but all data should
be classified as either Type 1, Type 2, or Type 3.  If it is questionable which type best describes the data,
the data should be classified as the lower type. If new information is later found that allows raising the
data to a higher type, this should be done at that time.

       When all data have been classified, it may be helpful to separate out the Type 3 data. A separate
Type 3 exposure matrix may be created. The Type 3 data will not be subject to any statistical analyses,
whereas Type 1 and, perhaps, Type 2 data will be analyzed.  If the user needs can be met, the Type 3
data will be treated as described in Step 12.
                                              41

-------
                                      EXAMPLE

       In the example data set, all Type 3 data were excluded from the analysis due to potential
bias in the monitoring methods used. For the sake of the example, some of the excluded data will
be used in Step 12 to show how Type 3 data should be treated.
                                            42

-------
                     STEP 11: ASSESS ABILITY TO MEET USER NEEDS
        The ability to meet the needs of the project manager is dependent on both the quantity and quality
of the data collected. User needs were preliminarily defined in Step 1 and formally defined in Step 3.
The purpose of this step is to formally determine if the assembled data are sufficient to meet the project
manager's needs defined in Step 3.

        If there are insufficient data to meet the needs identified in Step 3, the project managers should
be informed that either their expectations must be modified to match the existing data or additional resources
are needed to obtain the desired quality of data. If no decision can be reached, it may be appropriate to
stop work until a decision is made so  that resources  are not wasted on work that will not  meet the
specified needs.

        The most likely case is that most of the user needs can be met but that some requests will be
difficult to fulfill.  These potential difficulties should be identified in writing and sent to the project
manager.  The project manager can then reassess how important each  need is and estimate how much
additional effort, if any, should be expended to gather the necessary data.

        If the CEB  engineer is satisfied  that the data are sufficient to meet the end user needs, proceed
to Step 12.  It may be determined that those needs can be met even if Type 3 data  are all  that are
available. Typically, however, Type 1 or Type 2 data will be required. To obtain such data, additional
rounds of data collection, or further estimates of ancillary information may be approved.  If no additional
information can be obtained, then the exposure assessment should proceed to Step 19, Presentation of
Results, at which point a summary of the available data can be completed, detailing data deficiencies with
respect to the  end user needs.
                                          EXAMPLE

          For the example data set, the need to develop a new analytical method to account for a
   potential bias in the existing method as well as the need to collect new data caused a delay in the
   completion of the exposure assessment.  The end users were notified of this delay; they approved
   the data collection and analyses based on the new data.
                                               43

-------
                               STEP 12: TREAT TYPE 3 DATA
       If it is determined that user needs can be met, the next step is to use nonstatistical methods to
present Type 3 data and to give alternate ways to generate additional Type 3  exposure estimates for
comparison with existing estimates.   When a comprehensive assessment is  not needed  and all of the
individual monitoring data are Type 3 (i.e.,  many of the important variables  are not known and cannot
be estimated), no statistical analysis of the data should be done.  Although descriptive statistics could be
calculated for some Type 3 data sets, such analyses may mislead the end users into a false sense of
confidence in these data.  The  preferred method is to describe the data qualitatively  in the report,
including its deficiencies, and any conclusions that can be drawn. A median and range may also be given
for each  data set.  Each Type 3  data set should be presented separately.  Preferred data  sets should be
identified and reasons given for the preference. In addition, any uncertainties, assumptions, and biases
should be clearly identified, using the uncertainty/assumption list initiated in Step 5.

       When only summary data, anecdotal data, or no monitoring data are found  for a  chemical, and
a comprehensive exposure assessment is needed, the resolution depends to a large extent on the end use
of the assessment.  There are two primary options when there  are insufficient data to perform  the
analysis:

       • Collect  monitoring data (i.e., conduct a survey for segments for which no data are currently
          available; conduct a monitoring study, etc.)

       • Use other nonmonitoring methods

       When there are insufficient data, the best method is to collect the required monitoring data. This
alternative may not be viable as it can be extremely expensive and the time constraints on the analysis
may not allow this option.  As a result, it is often necessary to use other nonmonitoring methods.  These
include:

        • Modeling of the exposure

        • Use of a surrogate chemical or job type

        • Comparison with a workplace standard

        • Professional judgement

        Modeling of the worker exposure can be used to estimate exposure where no monitoring data are
available.  There will almost never be sufficient data available to validate a model, because real-time
release, air movement, and multiple receptor monitoring data are necessary.  However, a model previously
validated for one chemical can sometimes be used for other chemicals within the stated constraints of the model.  For
indoor exposures, such  models typically require the estimation of a release rate, room size,  ventilation
rate, and exposure duration.  When using models, the results should always be tested for reasonableness

                                               44

-------
against any available monitoring data or calculations based on surrogate monitoring data.  One advantage
of the model approach is that sensitivity analysis can be conducted to identify those factors that cause
large uncertainties in predicted exposures. A sensitivity analysis simply involves running the model using
a range of input variables and measuring how the results change as the input variables are changed.
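As an illustration, a one-at-a-time sensitivity analysis on a simple steady-state, well-mixed box model of indoor air concentration might be sketched as follows. The model form, parameter names, baseline values, and ranges are all illustrative assumptions for this sketch, not values from any assessment.

```python
def box_model_conc(release_rate_mg_min, room_volume_m3, air_changes_per_hr):
    """Steady-state, well-mixed box model: concentration (mg/m3) is the
    release rate divided by the ventilation rate."""
    ventilation_m3_min = room_volume_m3 * air_changes_per_hr / 60.0
    return release_rate_mg_min / ventilation_m3_min

# Baseline inputs and plausible ranges (all values hypothetical).
baseline = {"release_rate_mg_min": 10.0,
            "room_volume_m3": 300.0,
            "air_changes_per_hr": 2.0}
ranges = {"release_rate_mg_min": (5.0, 20.0),
          "room_volume_m3": (150.0, 600.0),
          "air_changes_per_hr": (0.5, 6.0)}

# One-at-a-time sensitivity: vary each input over its range while holding
# the others at baseline, and report the spread in predicted concentration.
for name, (low, high) in ranges.items():
    concs = [box_model_conc(**dict(baseline, **{name: value}))
             for value in (low, high)]
    print(f"{name}: predictions span a factor of {max(concs) / min(concs):.1f}")
```

Inputs whose ranges produce the largest spread in predicted concentration are the ones contributing the most uncertainty and deserve the most data-gathering effort.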

        The use of monitoring data for a similar chemical as a surrogate is another approach when no
monitoring data are available for the chemical of interest.  A rough exposure estimate can be made by
adjusting the surrogate monitoring data for the differences in vapor pressure, molecular weight, and
concentration of the chemical in the process.  The degree of uncertainty in the approach depends on the
similarities between the chemical and its uses and the surrogate and its uses, and how well the worker
activities are understood in both situations. This approach is particularly useful in the analysis of new
chemicals where little or no actual exposure to the chemical has occurred (IT, 91).
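One rough adjustment of this kind might be sketched as follows: the surrogate's measured air concentration is scaled by simple ratios of vapor pressure, molecular weight, and in-process concentration. The multiplicative form and the example values are illustrative assumptions only; the appropriate adjustment must be judged case by case by the engineer.

```python
def adjust_surrogate(surrogate_conc_mg_m3,
                     vp_chem, vp_surr,        # vapor pressures, same units
                     mw_chem, mw_surr,        # molecular weights (g/mol)
                     frac_chem, frac_surr):   # weight fraction in process
    """Scale a surrogate air concentration by simple ratios of vapor
    pressure, molecular weight, and in-process concentration."""
    return (surrogate_conc_mg_m3
            * (vp_chem / vp_surr)
            * (mw_chem / mw_surr)
            * (frac_chem / frac_surr))

# Hypothetical example: surrogate measured at 2.0 mg/m3; the chemical of
# interest has half the vapor pressure, a similar molecular weight, and
# the same weight fraction in the process.
estimate = adjust_surrogate(2.0, 50.0, 100.0, 104.0, 100.0, 0.3, 0.3)
print(f"rough exposure estimate: {estimate:.2f} mg/m3")
```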

        A final approach that can be used in the absence of monitoring data is to use professional
judgment  to develop a plausible exposure scenario based on knowledge  of the operation, or assume
compliance with the OSHA PEL for the substance.  When professional judgment is used to develop an
exposure scenario, no exposure descriptor is used, and the uncertainty associated with the assessment is
high. This type of assessment is characterized as a "what-if" scenario and the uncertainty associated with
the assessment must be carefully and fully communicated to the user.  When assuming compliance with
the OSHA PEL.  a search of the OSHA Computerized Information System (OCIS) database should be
conducted to check the assumption of compliance. The assessment should be characterized as a "what-if"
scenario if the assumption of compliance cannot be supported based on monitoring data or other
documentation.  Engineers must be extremely careful to properly characterize the type of assessment
presented if compliance with the OSHA PEL is assumed. There are currently different OSHA PELs for
different industries, such as construction, agriculture, etc.  Currently, OSHA does not inspect facilities
with fewer than 11 employees.  If this approach is used and if compliance data or other data have been
evaluated, the workplace standard should be identified with an appropriate exposure descriptor.  The
uncertainty  of these methods is  high,  but when  properly used and presented,  these estimates are
acceptable for screening level assessments.

        The outcome of this step is a nonstatistical report that qualitatively describes the data, including
its deficiencies and any conclusions that can be drawn.  If there are Type 1  or Type 2 data, then proceed
to Step 13.  If not, then the nonstatistical report will be the primary result of the exposure assessment and
it can be presented as described in Step 19.
                                               45

-------
                                       EXAMPLE

       For the example chemical, some Type 3 data were available. The following gives examples
of how such data should be described:

       • Six companies completed  studies to determine exposure to the chemical.   Although
          attempts were made to obtain the original monitoring data, only summary results were
          made available.
       • Although the data cannot be compared directly across several companies, the areas of
          higher exposure appear to  be 1) the monomer transfer and storage area,  2) the reactor
          area, 3) the recovery area,  and 4) the lab area.

        • One source states that release and exposure to the chemical in the solution polymerization
          process are very similar to those in  the emulsion process.

       • If monitoring summaries examined in the analysis are representative of levels at polymer
          plants, they imply that additional controls would not be required at typical polymer plants
          to  limit exposure to 10 ppm.
                                            46

-------
                        STEP 13: TREAT NONDETECTED VALUES
       Measurements that are recorded as nondetected are assigned a numerical value so that they can
be used to calculate descriptive statistics which characterize the data set. Care should be taken to ensure
that the chemical reported as nondetected was actually being used at the time.  Otherwise the descriptive
statistic that is calculated will be biased by inclusion of a site where the chemical was never used. The
first task in the treatment of nondetected values is to gather information on the analytical method.  If a
quality assurance plan was developed for the study, it may also contain useful information and should be
reviewed. The NIOSH Manual of Analytical Methods provides information on NIOSH analytical methods
(NIOSH, 84). That manual identifies the analytical method used for each chemical for which NIOSH has
developed an analytical method.  The  OSHA Technical Manual (OSHA,  90) and OSHA Chemical
Information File (OSHA, 85) provide information on current OSHA methods.  Information to gather
regarding the analytical method includes:

           Issue date,
           Applicability,
           Interferences,
           Other methods,
           Accuracy,
           Range and precision,
           Estimated limit of detection (mg/sample),
           Maximum sample volume (liters), and
           Evaluation of method.

If the issue date for the analytical method is after the date the sample was collected, the engineer should
determine what other analytical methods are used for this chemical.

        The second task in the treatment of nondetected values is the calculation of a representative value
(Crump,  78; Hornung, 90).  The limit of detection for these data must first be determined. There are
two ways in which a limit of detection may be reported:

        • The limit of detection of analytical equipment such as a gas chromatograph (GC, GC/MS, etc.),
          which is normally expressed in mg per sample, and

       • The sampling limit of detection in measuring workplace air concentrations, which is normally
          expressed in mg/m3 or ppm.

The sampling limit of detection accounts for both the analytical limit of detection and the sample air
volume and is the value needed for calculational purposes.  In many cases, however, this value is not
reported  directly. The sampling limit of detection will often vary from sample to sample if different
volumes  of air are collected.
                                             47

-------
        If the analytical method is not reported, the prevalent analytical method used at the time of the
study should be assumed and this assumption recorded on the uncertainty/assumption list.  If the sample
volume is not reported, the maximum sample volume recommended in the analytical method could be
used for calculational purposes, and this assumption recorded as well.

        An analytical limit of detection is normally  specified  in a published sampling and analytical
method, and a sampling limit of detection can be calculated if the sample volume is known or can be
assumed.  The following equation is used:

    Sampling limit of detection (mg/m3)  =  Analytical limit of detection (mg) x 1000 (liters/m3)
                                                      Air volume sampled (liters)
                                          EXAMPLE

           For the example chemical, consider a case in which a 25.0-liter air sample has been
   analyzed using NIOSH Method 1024, which has a reported analytical limit
   of detection of 0.0003 mg per sample.  The sampling limit of detection is therefore:

             Sampling limit  =  0.0003 mg x 1000 liters/m3  =  0.012 mg/m3
             of detection            25.0 liters
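The calculation in the example above can be expressed as a short function; the function name is ours, but the arithmetic is exactly the equation given in this step.

```python
def sampling_lod_mg_m3(analytical_lod_mg, air_volume_liters):
    """Convert an analytical limit of detection (mg per sample) to a
    sampling limit of detection (mg/m3), given the air volume sampled.
    The factor of 1000 converts liters to cubic meters."""
    return analytical_lod_mg * 1000.0 / air_volume_liters

# The worked example above: 0.0003 mg per sample, 25.0-liter air sample.
print(f"{sampling_lod_mg_m3(0.0003, 25.0):.3f} mg/m3")   # 0.012 mg/m3
```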
        A reported or calculated sampling limit of detection should not be directly substituted for those
values reported as nondetectable because, by definition, such values are below the detection limit.  A
value lower than the sampling limit of detection must therefore be substituted for these values.  As
described by Hornung and Reed (Hornung, 90), the preferred method for calculating this value depends
upon the degree to which the data are skewed and the proportion of the data that is below detection limits.
The two methods are:

        1)     If the geometric standard deviation of the monitoring data set is less than 3.0,
               nondetectable values should be replaced by the limit of detection divided by the square
               root of two (L/√2).

        2)     If the data are highly skewed, with a geometric standard deviation of 3.0 or greater,
               nondetectable values should be replaced by half the detection limit (L/2).

        If 50% or more of the monitoring data are nondetectable, substitution of any value for these data
will result in biased estimates of the geometric mean and the geometric standard deviation (Hornung, 90).
If it is necessary to  calculate statistics using data sets with such a large proportion of nondetectable data,
the potential biases introduced by these calculations should be described when presenting the results of
the analyses. It should be noted that there are other methods for treating values reported as below the
limit of detection (Aitchison, 57; Cohen, 61; EPA, 92; Waters, 90).
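The Hornung and Reed substitution rule above can be sketched as follows. In this sketch the geometric standard deviation is estimated from the detected values only, which is an assumption of the sketch rather than a prescription of the guidance; the data values are hypothetical.

```python
import math
import statistics

def substitute_nondetects(values, detection_limits):
    """Replace nondetects (None entries) per Hornung and Reed: use
    L/sqrt(2) when the geometric standard deviation (GSD) of the detected
    values is below 3.0, and L/2 when the GSD is 3.0 or greater."""
    detected = [v for v in values if v is not None]
    gsd = math.exp(statistics.stdev(math.log(v) for v in detected))
    divisor = math.sqrt(2.0) if gsd < 3.0 else 2.0
    return [v if v is not None else lod / divisor
            for v, lod in zip(values, detection_limits)]

# Hypothetical measurements (mg/m3); None marks a nondetected sample,
# with a sampling limit of detection of 0.10 mg/m3 for each sample.
concs = [0.5, 1.2, None, 0.8]
lods = [0.10, 0.10, 0.10, 0.10]
print(substitute_nondetects(concs, lods))
```

For this small data set the GSD of the detected values is well below 3.0, so the nondetect is replaced by L/√2.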

                                               48

-------
                                       EXAMPLE

       Preliminary examination of the data, categorized by the important exposure parameters
(Step 4) indicated that geometric standard deviations tended to be at or above 3.0. Therefore, half
the detection limit was used for all example calculations to represent nondetected values.  That
choice was recorded on the list of uncertainties and assumptions.  The impact of choosing L/2 on
the analyses will be examined in Step 18.
                                             49

-------
                STEP 14: SEPARATE INTO TYPE 1 DATA AND TYPE 2 DATA
       In Step 10, data in the exposure matrix were classified as either Type 1, Type 2 or Type 3 data.
Type 1 data consist of measurements for which values of all important parameters are known. The data
consist of studies that contain individual measurements, and include all backup and ancillary information.
Type 2 data consist of measurements where values of important parameters are not known but for which
assumptions can  be made to  estimate  these variables.  The data consist of individual monitoring
measurements,  but backup and ancillary information is  highly variable.   No Type 3 data (summaries,
anecdotal, etc.) should be in the matrix.  All such data should have been excluded in Step 10.

        The data should now be sorted by the Type 1/Type 2 classification and separate matrices formed
for each type of data.  Type 2 data will only be used for statistical analysis when there are insufficient
Type 1 data to perform the analysis. The products of this step are two separate matrices that will be used
in the statistical analysis.

        If only minimal Type 1 and Type 2 data exist, and together they are still not sufficient for statistical
analysis, all data are treated as Type 3 data and the analysis returns to Step 12. In this case a qualitative
report that describes  the data, including  its  deficiencies and any  conclusions that can be drawn is
prepared, as described in Step  12.
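Sorting a classified exposure matrix into separate Type 1 and Type 2 matrices might be sketched as follows; the record layout and the "type" field are illustrative assumptions, not a prescribed data format.

```python
# Each record is one monitoring measurement with its Step 10 classification.
matrix = [
    {"type": 1, "plant": "M1", "conc_mg_m3": 0.8},
    {"type": 2, "plant": "M2", "conc_mg_m3": 1.9},
    {"type": 1, "plant": "P1", "conc_mg_m3": 0.4},
]

# Type 3 data should already have been excluded from the matrix in Step 10.
assert all(record["type"] in (1, 2) for record in matrix)

# The products of this step: two separate matrices for statistical analysis.
type1_matrix = [r for r in matrix if r["type"] == 1]
type2_matrix = [r for r in matrix if r["type"] == 2]
print(len(type1_matrix), "Type 1 records;", len(type2_matrix), "Type 2 records")
```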
                                          EXAMPLE

          For the example data set, all newly collected data were of Type  1.  Some previously
   collected data were Type 2 data and these will be considered as necessary (Step 16).
                                               50

-------
                         STEP 15:  DEFINE GROUPS FOR ANALYSIS
        The purpose of this step is to identify the groups that will be the basic units for the calculation
of descriptive statistics.  Each group is intended to include measurements representing  samples from a
single distribution of concentrations; the descriptive statistics computed for that group pertain to that one
distribution.   The principal output  of the  application of these  guidelines will  be the group-specific
descriptive statistics.
        The groups that result from this step are those that are determined to have as large a sample size
as possible given the characteristics of and differences in exposures (e.g., those caused by effects of the
parameters identified by the engineer or industrial hygienist in Step 4).  Stated another way, the groups
will be as large as possible while minimizing variation within the groups relative to variation between the
groups.  Statistical approaches are described to perform the necessary calculations. The  initial grouping
that is an input for the statistical calculations is based on the important exposure parameters identified by
the engineer in Step 4.  Combinations of the original categories may result in the definition of new
groupings that will be subject to statistical description.  Figure 5 presents  a flow diagram defining the
subtasks involved in the definition of the groups.
                                                51

-------
A.      Identify the Initial Grouping

       For a given data set, the initial categories are determined by the important parameters identified
by the CEB  engineer or industrial hygienist in Step 4.  The initial categories  are defined by the
combinations of all the important parameters.  Note that if there are many important parameters, there
could be very many initial categories, which would tend to reduce the number of observations within any
given category.  The engineer is encouraged to try to reduce the number of important parameters
considered. This may be accomplished, as discussed in Step 4, by eliminating from consideration as
many variables regarded only as "blocking" factors as possible. It is to be hoped that truly explanatory
variables can be found that account for much of the difference observed across blocks.
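Forming the initial categories as combinations of the important parameters might be sketched as follows; the field names and values are illustrative, not drawn from the example data set.

```python
from collections import defaultdict

# Hypothetical monitoring records with their important parameters.
measurements = [
    {"industry": "monomer", "company": "M1", "process": "Lab",
     "control": 4, "conc_mg_m3": 0.8},
    {"industry": "monomer", "company": "M1", "process": "Lab",
     "control": 4, "conc_mg_m3": 1.1},
    {"industry": "polymer", "company": "P1", "process": "Packaging",
     "control": 1, "conc_mg_m3": 2.4},
]

# One initial category per observed combination of the important parameters.
grouping_keys = ("industry", "company", "process", "control")
groups = defaultdict(list)
for m in measurements:
    key = tuple(m[k] for k in grouping_keys)
    groups[key].append(m["conc_mg_m3"])

for key, concs in sorted(groups.items()):
    print(key, "N =", len(concs))
```

Note how quickly the number of categories grows with the number of parameters: each added parameter multiplies the possible combinations, which is why reducing the parameter list is encouraged.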
                                               53

-------
                                     EXAMPLE

       The data were collected for both the manufacture (monomer industry) and use (polymer
industry) of the chemical.  In the monomer industry, there were 209 measurements from four
plants (Ml, M2, M3, M4).  In the polymer industry, there were 578 measurements from five
plants (PI, P2, P3, P4, P5).  The total data set consisted of 787 measurements:  516 full-shift
personal samples,  37 short-term samples, and 232 area samples. For the example calculations,
only the 516 full-shift personal samples were used.  Values reported as nondetected were treated
as described in Step 13, and the value of L/2 was used in all calculations. The value of L for
each nondetected measurement was determined individually based on the sample volume and the
reported analytical limit of detection.  These data are presented in Appendix A. The variables
deemed most important by the industrial hygienist/engineer were sample type, sample collection
method, industry, company, process type, job title, and occupational control type. Consideration
of sample type and sample collection method resulted in retention of only the full-shift personal
samples collected by the newer method. Industry, typically considered a blocking variable, was
retained because of the end user request to consider the monomer and polymer industries
separately (see Step 3).

        Examination of the 516 full-shift personal sample data points (Appendix A) showed that,
after consideration of industry, company, process type, and occupational control, little or no
additional information was  provided by job  title.  That is, there tended to be only a single job
title for any given process type.  Thus, job title was not considered for the definition of the initial
groups. On the basis of the remaining parameters, 58 initial groups were identified, with sample
sizes as indicated (by industry, company, process type, and occupational control):

Monomer:
       M1, Control room, control 1:  N=3
       M1, Lab, control 4: N=6
       M1, Process area, control 2: N=5
       M2, Control room, control 1:  N=3
       M2, Lab, control 3: N=9
       M2, Loading area, control 1: N=3
       M2, Process area, control 1: N=6
       M3, Control room, control 1:  N=2
       M3, Lab, control 6: N=7
       M3, Loading area, control 2: N=6
       M3, Process area, control 2: N=4
       M4, Control room, control 1:  N=2
       M4, Lab, control 5: N=7
       M4, Lab, control 2: N=3
       M4, Loading area, control 2: N=2
       M4, Process area, control 2: N=12
       M4, Tank farm, control 1: N=5
                                        54

-------
Polymer:
       P1, Crumbing and drying, control 1: N=9
       P1, Lab, control 1: N=10
       P1, Maintenance, control 1: N=34
       P1, Packaging, control 1: N=30
       P1, Polymerization or reaction, control 1: N=6
       P1, Process area, control 1: N=5
       P1, Purification, control 1: N=6
       P1, Solutions and coagulation, control 1: N=9
       P1, Tank farm, control 1: N=5
       P1, Warehouse, control 1: N=2
       P2, Control room, control 1: N=6
       P2, Crumbing and drying, control 1: N=7
       P2, Lab, control 1: N=14
       P2, Maintenance, control 1: N=9
       P2, Packaging, control 1: N=6
       P2, Polymerization or reaction, control 1: N=29
       P2, Solutions and coagulation, control 1: N=5
       P2, Tank farm, control 1: N=3
       P3, Lab, control 1: N=3
       P3, Maintenance, control 1: N=4
       P3, Polymerization or reaction, control 1: N=18
       P3, Solutions and coagulation, control 1: N=4
       P3, Tank farm, control 2: N=9
       P3, Unloading area, control 1: N=2
       P4, Crumbing and drying, control 1: N=13
       P4, Lab, control 1: N=17
       P4, Maintenance, control 1: N=7
       P4, Packaging, control 1: N=20
       P4, Polymerization or reaction, control 2: N=7
       P4, Solutions and coagulation, control 1: N=3
       P4, Tank farm, control 1: N=8
       P4, Warehouse, control 1: N=11
       P5, Crumbing and drying, control 1: N=6
       P5, Lab, control 1: N=8
       P5, Maintenance, control 1: N=16
       P5, Packaging, control 1: N=23
       P5, Polymerization or reaction, control 2: N=2
       P5, Purification, control 2: N=12
       P5, Solutions and coagulation, control 1: N=12
       P5, Tank farm, control 1: N=6
       P5, Warehouse, control 1: N=7

In the above list, the control types are as listed in Appendix A.  The initial categories are
identified by number in Appendix A.
                                     55

-------
 B.      Log-Transform the Data

         The tests of the grouping and the importance of the identified exposure parameters are conducted
 on  the  log-transformed  concentration values.  This is  done because  it is  typically assumed that
 concentration data can be described by a log-normal distribution. If the concentrations are log-normally
 distributed, the effect  of log-transforming  the data  is to create  normally distributed values.   One
 assumption underlying analysis of variance (ANOVA) methods (see subtask D below) is that the errors
 are normally distributed.  Thus, under the general assumption of log-normally distributed concentrations
 and using a log-transformation of the concentrations, an assumption of the ANOVAs discussed below is
 satisfied.

         We have  not proposed here to test the assumption that the concentrations are log-normally
 distributed.   This is considered  appropriate in light of the theoretical  rationale  for suspecting that
 atmospheric concentration data follow a log-normal distribution and the extensive empirical evidence that
 a log-normal distribution can describe observed patterns of concentrations of various compounds (see
 Rappaport, 91, for a brief review).  Moreover, ANOVA is robust with respect to departures from the
 assumption of normality.  That  is, ANOVA can still  be  reasonably  expected to give the correct
 interpretation of the data even if the data deviate somewhat from a normal distribution.  Nevertheless,
 testing the assumption of log-normally distributed concentrations can be considered an option, and
 Appendix B presents information related to the testing of data to see if it is normal or log-normal.  If the
 engineer suspects that the concentration data should not be considered to be log-normal, he or she can
• apply the tests described  in that appendix or consult a statistician for additional support. If departures
 from  log-normally distributed concentrations are detected, a notation should  be added to the  list of
 uncertainties and assumptions.

         The data points are transformed into natural (base e) log values as described by Equation 1.

                        x_i' = ln(x_i)                                                 Equation 1

           where:

           x_i'  =     a log-transformed data point
           x_i   =     a data point (as originally observed)
           ln    =     the natural logarithmic function
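As an illustration of Equation 1, a minimal sketch in Python with hypothetical concentration values; non-detects are assumed to have already been set to one-half the detection limit, as in the example data set:

```python
import math

# Hypothetical 8-hr TWA concentrations (ppm); the first value is a
# non-detect already set to one-half the 0.18 ppm detection limit.
concentrations = [0.09, 0.37, 0.50, 1.53, 0.31]

# Equation 1: natural-log (base e) transform of each observation.
log_concentrations = [math.log(x) for x in concentrations]

# If the original data are log-normal, the transformed values should
# be approximately normally distributed.
print([round(v, 3) for v in log_concentrations])
```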
 C.      Graphical Examination of the Data: Check for Outliers

         Before the ANOVA(s) are performed to test the importance of the exposure parameters, the log-
 transformed data should be examined once more to determine if some errors have been introduced.  This
 examination will focus on the pattern of observed values, rather than individual observations as in Step 7,
 to determine if there are any values that appear "unusual."  The unusual observations can be considered
 to be the outliers, those observations that do not appear to fit in with the rest of the data.  "Box-and-
 whisker"  plots can be used to identify outliers.

                                                56

-------
        Box-and-whisker plots can be created for each of the initial categories.  If there are relatively few
observations per category, less than 6 to 10 typically, such plots may not be very informative.  One can
also combine some of the initial categories and examine box-and-whisker plots for such combinations.
Caution should be exercised when such combinations are considered, because it is not clear at this stage
of the analysis  which categories ought to be combined.  Combination of categories with quite different
mean values, for example, may lead to a bi-modal distribution that will be relatively uninformative with
respect to identification of outliers.

        Outliers can be identified from a box-and-whisker plot as the individual observations that are
displayed beyond the limits of the whiskers.  More information about the box and the whiskers of such
a plot is presented in Appendix  B.  Any  outliers so identified should not be  dropped from  analysis.
Rather, those data points should be examined to determine  if they have been  entered or calculated
incorrectly.  Sources of error include, but are not necessarily limited to, misclassification (an observation
was recorded as belonging to one group when in actuality it belongs in another group), transcription (an
incorrect value was transcribed from the lab sheets into the computer data base), or calculation
errors (e.g., when units were converted).

        If errors are detected, then they  should be corrected and the graphical  examination of the data
re-evaluated.  If no errors are detected,  then the data points should be retained and considered in the
ANOVA.
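A sketch of the whisker rule commonly used in box-and-whisker plots, flagging points beyond 1.5 times the interquartile range past the quartiles.  The exact quartile and whisker conventions vary across statistical packages (SAS uses its own defaults), so this is illustrative only:

```python
def quartiles(values):
    """Return (Q1, Q3) using the median-of-halves convention.
    Note: statistical packages may use slightly different
    quartile definitions."""
    s = sorted(values)
    n = len(s)
    lower, upper = s[: n // 2], s[(n + 1) // 2 :]
    def median(v):
        m = len(v)
        mid = m // 2
        return v[mid] if m % 2 else (v[mid - 1] + v[mid]) / 2
    return median(lower), median(upper)

def whisker_outliers(values, k=1.5):
    """Flag points beyond Q1 - k*IQR or Q3 + k*IQR (the whisker fences)."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < lo or x > hi]

# Hypothetical log-transformed concentrations for one category.
data = [-0.7, -0.9, -1.1, -0.8, -1.0, -0.6, 2.4]
print(whisker_outliers(data))  # the high value stands apart
```

Any value flagged this way would then be checked for data-manipulation errors, as described above, rather than dropped.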
                                                57

-------
                                          EXAMPLE

          Figure 6 shows the box-and-whisker plots for the  initial categories from the monomer
  industry.   The  numbers of observations within each category are  small, so not much can be
  determined with respect to outliers.   However,  in the category  consisting of  process  area
  concentrations at company Ml, there is one relatively low value; and in the tank farm at company
  M4, there is a high value.  The former is an observation below the detection limit (below 0.18
  ppm, set  equal to  0.09 ppm  for these analyses) which appears low relative to the 4 detected
  concentrations of 0.37 ppm or above at the Ml process area.  The latter is a concentration of 1.53
  ppm, which, compared to the other 4 concentrations from the M4 tank farm (all of which were less
  than 0.31 ppm and included 3 non-detects), looks suspicious.

          Figure 7 shows the SAS output for all of the monomer industry initial categories combined.
  The box-and-whisker plot from that output shows two outliers, both on the high side.  Investigation
  of those observations revealed that they were from the lab at M4 (with control type 2) and from the
  loading area at M4. These points were not detected in Figure 6 because the initial categories in
  which they were classified had few observations (3 and 2, respectively).

          When these outliers were investigated, it was determined that they did not result from data
  manipulation errors. Furthermore, they did not appear to be the result of atypical situations (e.g.,
  a spill) at the plants involved.  Because there was no evidence that they were unusual or erroneous,
  these concentrations were retained for the subsequent analyses.

          The  concentrations in the polymer industry initial categories were similarly examined.
  Again, no evidence of erroneous or atypical data was discovered and all data points were retained
  for analysis.
D.     Analysis of Variance

       ANOVA techniques are the recommended basis for revising the initial grouping. Such techniques
are applied to determine if the observed concentrations within some of the initial groups are similar
enough to warrant combination of those groups.  This approach is based on determinations of whether
or not the exposure parameters suggested by the  engineer as potentially important actually discriminate
between exposure levels, i.e., whether or not those parameters are statistically significant with respect
to concentration differences.

       The application of ANOVA may not be straightforward in many real cases.  Difficulties can arise
if there are several factors being considered, if confounding or aliasing of the effects of those factors is
possible, or if there are correlations among the observations (e.g., if there is nesting of the effects of one
factor within another factor).   The ANOVA approach described here  is relatively easy; suggested
interpretations of standard statistical output are provided. However, it is recommended that the engineer
consult a statistician to help interpret problematic cases and to suggest supplemental analyses that may
resolve the problems.


                                               58

-------
[SAS Univariate Procedure schematic (box-and-whisker) plots of LOGCONC, Log(concentration),
for the monomer industry initial categories at companies M1 and M2, grouped by control type,
process area (control room, lab, loading, process area), and company.]

                                 Figure 6:  Box-and-Whisker Plot for Monomer Industry Categories

-------
[SAS Univariate Procedure schematic (box-and-whisker) plots of LOGCONC, Log(concentration),
for the monomer industry initial categories at companies M3 and M4, grouped by control type,
process area (control room, lab, loading, process area, tank farm), and company.]

                               Figure 6:  Box-and-Whisker Plot for Monomer Industry Categories

-------
[SAS Univariate Procedure output for all monomer industry initial categories combined: moments,
quantiles, and extremes for LOGCONC, with a box-and-whisker plot.]

                                 Figure 7:  SAS Output for Combined Monomer Industry Categories
-------
        An ANOVA of only the main effects (the important exposure parameters identified in Step 4) is
recommended.  That is, for the purpose of identifying which factors to retain for the definition of the
final categories, examination of the contributions of the factors themselves and not their  interactions
should be sufficient.  The presence of an interaction means that the effect of one factor is not the same
across all the values of some other factor or factors.  While such interactions may exist, it may be
difficult to evaluate them if there are relatively few distinct combinations of factors for which we have
observations.  A statistician should be consulted to determine the effect of ignoring interaction terms in
any particular case that appears to be problematic.
        Although, in Step 4, the engineer was encouraged to identify explanatory exposure variables
(e.g., control type, job title, etc.) as opposed to blocking variables (e.g., company or industry), inclusion
of some blocking variables in the ANOVA can help to avoid potential difficulties.  The inclusion of
blocking variables in ANOVAs is typically recommended so as to account for sources of variability that
are not otherwise accounted for by  the explanatory variables,  especially when there are known or
suspected differences across the units that are  being observed that  can not be controlled.  Blocking by
company, for example, can make the test of control type more sensitive, if there are company-to-company
differences that can not otherwise be factored out.  Moreover, problems of correlation (e.g., observations
obtained at one date being more closely related to one another than they are to observations from another
date,  even if the observations  came  from the same plant and process type) might be minimized by
blocking, especially blocking by calendar time if the concentration measurements have been collected over
a relatively long  period of time.  Blocking may  not be the ideal solution (nested ANOVAs might be
considered—see Appendix B), but a simple main effects ANOVA with suitable blocking factors may be
sufficient for  the purposes of determining which factors to retain  for  group definition.   Again,
consultation with a  statistician is recommended.   Moreover, if large block-to-block differences are
observed, the engineer may find it useful to determine if there are some explanatory variables that might
account for those differences.

        For each  of the factors  in the ANOVA, whether it is an explanatory or a blocking variable, the
result of interest will be the F-test that compares the variability in concentrations accounted for by that
factor to the "error" variability.  The error variability (assessed by the mean squared error) measures the
inherent randomness of observations within groups.  When differences in means across the groups defined
by the factor under consideration are large relative to the within-group variability, then the F-test of that
factor returns a significant result.  This suggests that that factor  is indeed important and should be
retained for defining exposure groups. A significant result can be defined as an F-test with an associated
p-value less than 0.05.  The determination of significance is dependent on sample size,  so it may be
appropriate to adjust the 0.05 cut-point as a function of sample size. For small sample size, a larger p-
value might be warranted; for larger sample sizes, a smaller p-value could be used. A statistician should
be consulted if such adjustments are considered.
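As a sketch of the F-test logic described above, the following computes the F statistic for a single factor (the between-group mean square over the mean squared error) on hypothetical log-transformed data.  A multi-factor ANOVA with Type III sums of squares would, in practice, be run in a statistical package such as SAS:

```python
def one_way_f(groups):
    """F statistic for a one-way ANOVA.

    groups: list of lists of (log-transformed) observations, one list
    per level of the factor.  Returns (F, df_between, df_within); the
    p-value would be read from an F table or a statistics package.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: group means vs. the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    # Within-group ("error") sum of squares.
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)  # the mean squared error
    return ms_between / ms_within, k - 1, n - k

# Hypothetical log-concentrations for two control types: large
# between-group differences relative to within-group scatter give a
# large F, suggesting the factor should be retained.
f, df1, df2 = one_way_f([[-1.2, -0.8, -1.0], [0.9, 1.3, 1.1]])
```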

        It is recommended that the partial sums of squares be used for the F-tests of significance.  These
sums of squares (called Type III sums of squares in the SAS output) are considered by many statisticians
to be the most desirable.  Such sums of squares are not sensitive to the order of the factors in the model.
The sums of squares for one factor account for the effects of all other factors.  Moreover, they are not
functions of the numbers of observations per group.  All these features make the partial sums of squares

                                               62

-------
appropriate for the purposes of determining how to refine the initial grouping by ignoring some of the
exposure parameters.
                                          EXAMPLE

          The  SAS output for the ANOVA of the  monomer industry groups  is displayed in
   Figure 8.  The last column of the output shows the p-values associated with the 3 factors being
   considered: company,  process type, and control type.  All three of the p-values exceed 0.05 for
   the  partial sum  of squares, suggesting that none of them significantly  account  for observed
   differences  in concentrations.  Rather  than removing all  of the factors  from consideration,
   however, it was decided to first remove only the blocking variable, company, to see what effect
   this would have on the other two factors.

          Figure 9 shows  the ANOVA results  when only process type  and  control type are
   considered.  In this case, both of those factors contribute  strongly to observed differences in
   exposure.  The lack of significance for those factors when company was included illustrates a
   difficulty that can be  encountered when there  are relatively few observations and factors with
   many values: there is  confounding (overlap) of the effects and the significance of one or more
   of them may be masked.  Because we were not interested in company per se and were willing to
   remove  it from  consideration,  the importance  of process type  and control  type could  be
   revealed. Both factors are retained for redefining groups in the monomer industry.

          For the polymer  industry groups (Figure  10),  the company blocking variable and the
   process type parameter were highly significant but the control type was not.  This suggests that
   control type can be ignored in  the polymer  industry.   Apparently, the differences between the
   controlled and uncontrolled work  areas  did not  result in significant  differences  in exposure,
   when the other factors of company and process type were considered. The fact that company
   was a significant factor suggests that other differences between companies, in addition to control
   technologies, are contributing to different exposure  levels.   At this point in time, the relevant
   differences among companies have not been  identified,  so company is  retained as  a factor used
   to define exposure categories.
                                              63

-------
General Linear Models Procedure
Class Level Information

Class     Levels    Values

COMPANY        4    M1 M2 M3 M4

PROCESS        5    Control room  Lab  Loading  Process area  Tank farm

CONTROL        6    1 2 3 4 5 6


Number of observations in data set = 85


Dependent Variable: LOGCONC   Log(concentration)

Source             DF    Sum of Squares     Mean Square    F Value    Pr > F
Model              12      117.30527971      9.77543998       3.60    0.0003
Error              72      195.28116513      2.71223840
Corrected Total    84      312.58644484

         R-Square         C.V.       Root MSE    LOGCONC Mean
         0.375273    -183.4455     1.64688749     -0.89775311

Source             DF       Type III SS     Mean Square    F Value    Pr > F
COMPANY             3       19.11674454      6.37224818       2.35    0.0796
PROCESS             4       22.57606702      5.64401676       2.08    0.0922
CONTROL             5       18.94416819      3.78883364       1.40    0.2357

                Figure 8:  SAS Output for Test of Company, Process Type, and Control Type in Monomer Industry

-------
General Linear Models Procedure
Class Level Information

Class     Levels    Values

PROCESS        5    Control room  Lab  Loading  Process area  Tank farm

CONTROL        6    1 2 3 4 5 6


Number of observations in data set = 85


Dependent Variable: LOGCONC   Log(concentration)

Source             DF    Sum of Squares     Mean Square    F Value    Pr > F
Model               9       98.18853517     10.90983724       3.82    0.0005
Error              75      214.39790968      2.85863880
Corrected Total    84      312.58644484

         R-Square         C.V.       Root MSE    LOGCONC Mean
         0.314116    -188.3314     1.69075096     -0.89775311

Source             DF       Type III SS     Mean Square    F Value    Pr > F
PROCESS             4       49.48315238     12.37078810       4.33    0.0033
CONTROL             5       57.55457757     11.51091551       4.03    0.0027

                     Figure 9:  SAS Output for Test of Process Type and Control Type in Monomer Industry

-------
General Linear Models Procedure
Class Level Information

Class     Levels    Values

COMPANY        5    P1 P2 P3 P4 P5

PROCESS       12    Control room  Crumbing and dry  Laboratory  Maintenance  Packaging  Polymerization o
                    Process area  Purification  Solutions and co  Tank farm  Unloading area  Warehouse

CONTROL        2    1 2


Number of observations in data set = 431


Dependent Variable: LOGCONC   Log(concentration)

Source             DF    Sum of Squares     Mean Square    F Value    Pr > F
Model              16     1279.36507409     79.96031713      47.59    0.0001
Error             414      695.61144185      1.68022087
Corrected Total   430     1974.97651593

         R-Square         C.V.       Root MSE    LOGCONC Mean
         0.647787    -47.86522     1.29623334     -2.70809000

Source             DF       Type III SS     Mean Square    F Value    Pr > F
COMPANY             4      487.37213944    121.84303486      72.52    0.0001
PROCESS            11      686.59871410     62.41806492      37.15    0.0001
CONTROL             1        0.96805416      0.96805416       0.58    0.4483

                        Figure 10:  SAS Output for Test of Company, Process Type, and Control Type in Polymer Industry

-------
E.     Redefining Groups
       Based on the results of the ANOVA(s), it may be possible to ignore one or more of the factors
that were originally considered for importance.  The regrouping is accomplished by simply dropping the
non-significant factors (essentially pooling some groups).
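The pooling operation can be sketched as follows (hypothetical category keys and observations): dropping the non-significant factor merges groups whose keys differ only in that factor.

```python
from collections import defaultdict

# Hypothetical initial categories keyed by (company, process, control),
# each with its list of observed concentrations.
initial = {
    ("M1", "Lab", 2): [0.4, 0.5],
    ("M2", "Lab", 2): [0.6],
    ("M1", "Process", 1): [1.2, 0.9],
    ("M2", "Process", 1): [1.1],
}

# Suppose the ANOVA found company non-significant: drop it from the
# key and pool the observations of the merged groups.
pooled = defaultdict(list)
for (company, process, control), obs in initial.items():
    pooled[(process, control)].extend(obs)

print(sorted(pooled))  # [('Lab', 2), ('Process', 1)]
```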
                                       EXAMPLE
                                                                    *
         For the monomer industry groups, ignoring the company parameter and reclassifying results
   in a drop to  11 groups,  from the initial 17.   Unfortunately, for the polymer  industry group,
   elimination of control type from the definition of the groups does not reduce the number of groups
   for which descriptive statistics are required. Each of the initial groups could have been completely
   defined by company and process type alone (i.e., no process type within a company had more than
   one control type in place). Thus, the 41 initial polymer industry groups are retained for calculation
   of descriptive statistics in Step 17.

         The groups that are carried through to Step  16 are listed here:
   Monomer process area, control 1, N=6
   Monomer process area, control 2, N=21
   Monomer control room, control 1, N=10
   Monomer loading area, control 1, N=3
   Monomer loading area, control 2, N=8
   Monomer lab, control 2, N=3
   Monomer lab, control 3, N=9
   Monomer lab, control 4, N=6
   Monomer lab, control 5, N=7
   Monomer lab, control 6, N=7
   Monomer tank farm, control 1, N=5
   P1, Crumbing and drying, N=9
   P1, Lab, N=10
   P1, Maintenance, N=34
   P1, Packaging, N=30
   P1, Polymerization or reaction, N=6
   P1, Process area, N=5
   P1, Purification, N=6
   P1, Solutions and coagulation, N=9
   P1, Tank farm, N=5
   P1, Warehouse, N=2
   P2, Control room, N=6
   P2, Crumbing and drying, N=7
   P2, Lab, N=14
   P2, Maintenance, N=9
   P2, Packaging, N=6
   P2, Polymerization or reaction, N=29
   P2, Solutions and coagulation, N=5
   P2, Tank farm, N=3
   P3, Lab, N=3
   P3, Maintenance, N=4
   P3, Polymerization or reaction, N=18
   P3, Solutions and coagulation, N=4
   P3, Tank farm, N=9
   P3, Unloading area, N=2
   P4, Crumbing and drying, N=13
   P4, Lab, N=17
   P4, Maintenance, N=7
   P4, Packaging, N=20
   P4, Polymerization or reaction, N=7
   P4, Solutions and coagulation, N=3
   P4, Tank farm, N=8
   P4, Warehouse, N=11
   P5, Crumbing and drying, N=6
   P5, Lab, N=8
   P5, Maintenance, N=16
   P5, Packaging, N=23
   P5, Polymerization or reaction, N=20
   P5, Purification, N=12
   P5, Solutions and coagulation, N=12
   P5, Tank farm, N=6
   P5, Warehouse, N=7
                                           67

-------
                         STEP 16: TREATMENT OF TYPE 2 DATA
       Categories with insufficient Type 1 data are identified and may be supplemented with Type 2 data
(Figure 11).  Type 2 data should only be added for those categories that require it, and Type 3 data
should never be added.

       A sample size of 6 is a common minimum cited in the literature (Patty, 81; Hawkins, 91) for
calculation of simple descriptive statistics.  The addition of Type 2 data is considered only for groups
having fewer than six samples.

       A summary of the Type 2 data not used in the statistical analysis will be prepared, similar to the
summary of Type 3 data completed in Step 12.
A.     Considering Addition of Type 2 Data

       There will be a "trade off" that must be carefully considered when faced with a group with small
sample size.  The addition of data points will tend to improve the estimation of the descriptive
statistics desired, all else being equal.  However, when Type 2 data are all that are available for boosting
sample sizes, all things are not equal.  The Type 2 data are not as good as the Type 1 data considered
heretofore, typically because the Type 2 data lack information about some important parameter or because
some substantial uncertainty is associated with the  measurements.   In some instances or for some
categories, the addition of such Type 2 data may not be desirable, even  when sample sizes are low,
because the additional uncertainty is considered to outweigh the benefits of increased sample size.  It may
be the case that a sample size of 5, for example, is preferable to adding one or more Type 2 data points
because the information that  was missing from the Type 2 data, and the assumptions made in order to
use the Type 2 data, may have a substantial impact on the applications intended by the end user.  The
decision, therefore, must consider the end user needs and how sample size and assumptions relate to those
needs.
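The screening rule above (consider Type 2 data only for groups with fewer than six Type 1 observations) can be sketched as follows, with hypothetical group names and counts:

```python
# Hypothetical Type 1 sample counts per group.
type1_counts = {
    "Monomer loading area, control 1": 3,
    "Monomer lab, control 3": 9,
    "P1, Warehouse": 2,
}

MIN_N = 6  # common minimum cited for simple descriptive statistics

# Groups for which adding Type 2 data should be *considered*; the
# actual decision weighs the extra uncertainty against sample size.
candidates = [g for g, n in type1_counts.items() if n < MIN_N]
print(candidates)
```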
B.      Adding Type 2 Data

        When Type 2 data are added to the data set,  a record of that addition and the associated
assumptions must be added to the ongoing list of uncertainties and assumptions.  The impact of the
assumptions and uncertainties will be assessed in Step 18.
C.      ^UFniPVY fff RflPliniflg TVPC 2 Data

        Whatever Type 2 data have not been included for statistical analysis should be summarized. The
summary  may be similar in nature to the summary of the Type 3 data (Step 12), but a slightly more
quantitative report may be possible for some Type 2 data.  This report on the Type 2 data can be used
                                              68

-------
or referred to in the presentation of results, as a supplement to the statistical information based on the
Type 1 data (supplemented as needed by Type 2 data).
                                          EXAMPLE

          In the example data set, 12 groups resulting from processing in Step 15 had Type 1 data
   sample sizes less than 6.  For one of those groups, monomer loading area with control 1 (N=3),
   additional Type 2 data  were located (Table 2).  These data were considered to be Type 2 data
   because of known biases in the measurement procedure and assumptions that were made about the
   correction factor to apply to adjust for that bias.  Nevertheless, it was possible to estimate values
   for the samples, as shown in Table 2.  The eleven Type 2 values were added to the Type 1 data
   of this group, because the two sets of values appeared to be generally consistent and the effect of
    uncertainty about the Type 2 values was considered  to be offset by the advantage of increasing
   sample size for this group.  The inclusion of these data is noted on the list of uncertainties and
   assumptions.

          No other Type 2 data were available to boost sample sizes for the other eleven groups with
   small sample size.  These groups will be treated appropriately in subsequent steps.
                                              69

-------
[Flow diagram: the groupings from Step 15 are checked against the question "Are there groups with
fewer than 6 Type 1 observations?" and, if so, "Is it appropriate to add Type 2 data to small
groups?"; additions are recorded on the uncertainty/assumptions list and entered into the exposure
matrix.]

Figure 11.  Flow Diagram for Step 16 (Treatment of Type 2 Data).

-------
                         Table 2. Type 2 Data Used in Statistical Analysis

 Plant                Process     Job title    Control    Sample duration   8-hr TWA   Control
 ID     Industry      type                     type (a)        (min)          (ppm)    description

 A1     Monomer       Loading     Process         1             415           0.50     Magnetic gauge
                      area        technician
 A1     Monomer       Loading     Process         1             428           0.30     Magnetic gauge
                      area        technician
 A1     Monomer       Loading     Process         1             427           0.10     Magnetic gauge
                      area        technician
 A1     Monomer       Loading     Process         1             474           0.90     Magnetic gauge
                      area        technician
 A2     Monomer       Loading     Process         1             260           2.80     Magnetic gauge
                      area        technician
 A2     Monomer       Loading     Process         1             442           3.10     Magnetic gauge
                      area        technician
 A2     Monomer       Loading     Process         1             443           0.80     Magnetic gauge
                      area        technician
 A3     Monomer       Loading     Process         1             459           7.50     Magnetic gauge
                      area        technician
 A3     Monomer       Loading     Process         1             484           0.60     Magnetic gauge
                      area        technician
 A3     Monomer       Loading     Process         1             474           2.40     Magnetic gauge
                      area        technician
 A3     Monomer       Loading     Process         1             446           1.70     Magnetic gauge
                      area        technician

(a) Control Type 1 is "controlled," as in Table 1.
                                               71

-------
         STEP 17. CALCULATE DESCRIPTIVE STATISTICS FOR EACH GROUP
       For each group defined in the previous steps, means and standard deviations, as well as geometric
means and geometric standard deviations will be estimated.  Because no tests have been conducted to
determine the nature of the distributions of concentrations within the groups, relatively simple and
consistent estimators of those parameters are  recommended.   This  step describes the calculations
necessary for estimating the descriptive statistics.
                                                                      *
       The sample mean and sample standard deviation are consistent estimators of the mean and
standard deviation, respectively.  In the case of normality, they are also unbiased estimators.  The sample
mean is given by Equation 2.

                       MEAN = ( Σ x_i ) / n                                     Equation 2

          where:

           MEAN  =  sample mean
           x_i   =  a data point
           n     =  number of data points

       The sample standard deviation is the square root of VAR, SD = (VAR)^0.5, where VAR is given
by Equation 3.


                       VAR = ( Σ (x_i - MEAN)^2 ) / (n - 1)                     Equation 3

               where:

               VAR   =  sample variance
               MEAN  =  sample mean
               x_i   =  a data point
               n     =  number of data points.
                                             72

-------
       The geometric mean and geometric standard deviation can be estimated from the log-transformed
data.  Equations 4 and 5 present those estimates:

                       GM = exp {LMEAN}                                         Equation 4

                       GSD = exp {LVAR^0.5}                                     Equation 5

       where

                       LMEAN = ( Σ x_i' ) / n                                   Equation 6

                       LVAR = ( Σ (x_i' - LMEAN)^2 ) / (n - 1)                  Equation 7

       and
               x_i'  =  a log-transformed data point
               n     =  number of data points
               exp   =  the antilog function


       It may also be useful to calculate standard errors for the estimators of the means.  The standard
error is related to the variability of the estimator of the mean.  That estimator is estimating the true mean
of the distribution of observations, but because it is only an estimator,  there is some  uncertainty
concerning the value of the true mean.  That uncertainty is characterized by the standard error.

       The derivation of a standard error for the sample mean, SE, is given by Equation 8.

                       SE = SD / n^0.5                                          Equation 8

       where

               SD    =  standard deviation estimate
               n     =  number of observations.
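Equations 2 through 8 can be sketched together for a hypothetical group of concentrations:

```python
import math

def descriptive_stats(data):
    """Mean, SD, GM, GSD, and SE per Equations 2 through 8."""
    n = len(data)
    mean = sum(data) / n                                  # Equation 2
    var = sum((x - mean) ** 2 for x in data) / (n - 1)    # Equation 3
    sd = var ** 0.5
    logs = [math.log(x) for x in data]                    # Equation 1
    lmean = sum(logs) / n                                 # Equation 6
    lvar = sum((v - lmean) ** 2 for v in logs) / (n - 1)  # Equation 7
    gm = math.exp(lmean)                                  # Equation 4
    gsd = math.exp(lvar ** 0.5)                           # Equation 5
    se = sd / n ** 0.5                                    # Equation 8
    return mean, sd, gm, gsd, se

# Hypothetical concentrations (ppm) for one group of six samples.
mean, sd, gm, gsd, se = descriptive_stats([0.5, 0.3, 0.1, 0.9, 2.8, 3.1])
```

Note that for log-normally distributed data the geometric mean will not exceed the arithmetic mean.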
                                         EXAMPLE

           Table 3 displays the descriptive statistics calculated for the groups retained from Step 15.
    That table provides the statistics for any group with sample size of at least 6.  For the groups with
    sample sizes less than six, median values are all that are provided.
                                              73

-------
                   Table 3: Descriptive Statistics for Groups in Example Data Set

                                                    Descriptive Statistics
                                         No. of    Mean      Std. Dev.  Geom. Mean  Geom. Std.
             Group                       Samples   (ppm)     (ppm)      (ppm)       Dev.

Monomer Control Room, Control 1            10      0.448      0.724      0.236      3.106
Monomer Lab, Control 2                      3      2.610 *      --         --         --
Monomer Lab, Control 3                      9      0.524      0.629      0.335      2.572
Monomer Lab, Control 4                      6      0.298      0.357      0.191      2.569
Monomer Lab, Control 5                      7      3.087      2.256      2.492      1.924
Monomer Lab, Control 6                      7      0.350      0.304      0.264      2.116
Monomer Loading, Control 1                 14      1.709      1.913      1.139      2.463
Monomer Loading, Control 2                  8     17.010     43.110      6.243      4.120
Monomer Process Area, Control 1             6      1.312      1.131      0.994      2.107
Monomer Process Area, Control 2            21      0.918      1.054      0.603      2.502
Monomer Tank Farm, Control 1                5      0.160 *      --         --         --
P1, Crumbing and drying                     9      0.043      0.019      0.040      1.515
P1, Lab                                    10      2.909      3.348      1.908      2.505
P1, Maintenance                            34      0.857      2.310      0.298      4.277
P1, Packaging                              30      0.039      0.031      0.031      2.003
P1, Polymerization or reaction              6      0.696      1.100      0.372      3.062
P1, Process area                            6      0.118      0.122      0.082      2.346
P1, Purification                            6      4.357      2.312      3.849      1.646
P1, Solutions and coagulation               9      0.027      0.008      0.026      1.343
P1, Tank farm                               5      0.440 *      --         --         --
P1, Warehouse                               2      0.020 *      --         --         --
P2, Control room                            6      0.028      0.030      0.019      2.382
P2, Crumbing and drying                     7      0.032      0.013      0.030      1.485
P2, Lab                                    14      0.636      1.267      0.285      3.547
P2, Maintenance                             9      0.030      0.009      0.029      1.341
P2, Packaging                               6      0.033      0.006      0.032      1.201
P2, Polymerization or reaction             29      0.077      0.144      0.036      3.417
P2, Solutions and coagulation               5      0.030 *      --         --         --
P2, Tank farm                               3      0.360 *      --         --         --
P3, Lab                                     3      0.020 *      --         --         --
P3, Maintenance                             4      0.020 *      --         --         --
P3, Polymerization or reaction             18      0.057      0.068      0.036      2.583
P3, Solutions and coagulation               4      0.020 *      --         --         --
P3, Tank farm                               8      0.112      0.231      0.049      3.626
P3, Unloading area                          2     14.600 *      --         --         --
P4, Crumbing and drying                    13      0.016      0.020      0.010      2.682
P4, Lab                                    17      0.184      0.275      0.102      2.955
P4, Maintenance                             7      0.004      0.004      0.003      2.140
P4, Packaging                              20      0.006      0.006      0.004      2.374
P4, Polymerization or reaction              7      0.003      0.001      0.003      1.180
P4, Solutions and coagulation               3      0.003 *      --         --         --
P4, Tank farm                               8      2.366      4.203      1.161      3.299
P4, Warehouse                              11      0.004      0.002      0.004      1.627
P5, Crumbing and drying                     6      0.055      0.031      0.048      1.697
P5, Lab                                     8      3.972      3.035      3.156      1.970

  * Values marked by asterisks are medians for groups with less than 6 observations.

-------
                   Table 3: Descriptive Statistics for Groups in Example Data Set (continued)

                                                    Descriptive Statistics
                                         No. of    Mean      Std. Dev.  Geom. Mean  Geom. Std.
             Group                       Samples   (ppm)     (ppm)      (ppm)       Dev.

P5, Maintenance                            16      1.200      1.253      0.830      2.360
P5, Packaging                              23      0.058      0.034      0.050      1.730
P5, Polymerization or reaction             20      0.740      0.886      0.474      2.568
P5, Purification                           12      9.523      6.727      7.778      1.889
P5, Solutions and coagulation              12      0.082      0.047      0.071      1.709
P5, Tank farm                               6      3.020      1.750      2.613      1.713
P5, Warehouse                               7      0.045      0.015      0.043      1.382

  * Values marked by asterisks are medians for groups with less than 6 observations.

-------
             STEP 18:  TREAT UNCERTAINTIES, ASSUMPTIONS, AND BIASES
        In the course of completing some previous steps, uncertainties, assumptions, and biases will have
been compiled in an ongoing list.  That listing of uncertainties, assumptions, and biases will be treated
in this step to provide important information to the end user.  Evaluating uncertainties, assumptions, and
biases provides a sense of the integrity of the results, whether significant gaps exist in the available data
or information upon which the assessment is based, and whether decisions made on the basis of the data
will be tenuous.  In addition, an uncertainty analysis provides information to better focus resources
needed to refine the assessment and improve (reduce) the uncertainty (EPA, 92).

        This step describes procedures for the treatment of data limitations imposed by uncertainties,
assumptions, and biases.  To the extent possible, those procedures will be quantitative; sensitivity analyses
and confidence limit calculations are examples of quantitative approaches.  The EPA Exposure
Assessment Guidelines (EPA, 92) and Hornung (Hornung, 91) contain additional methods for quantifying
uncertainty.  In many cases, however, when quantification is not possible, treatment may be qualitative.
Because this step is vital to a risk assessment and the management decisions associated with it, and
because it may be difficult to execute, even a qualitative discussion of uncertainty will be extremely
important.
A.     Sensitivity Analysis

       Sensitivity analysis can be used to test the effect of uncertainty or assumptions on the results, over
the expected range of the uncertain or assumed values.  The sensitivity analysis involves fixing the value
for one variable at a credible lower bound while the other variables remain at their "best-estimate" values,
and then computing the results.  Then a credible upper bound value for the one variable is used while
the other variables remain at their "best-estimate" values, and again the results are computed.  Both sets
of results are evaluated, over all uncertainties and assumptions (i.e., those relating to values of the
observations used in the calculations), to determine which variables have the greatest impact on the
assessment of exposure.  Such analyses may also help focus resources for further refinement of the
assessment.  Since a sensitivity analysis does not provide any information on the likelihood of the
variables assuming any particular values in their ranges of values, the analysis is most useful in screening-
level assessments.
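       The one-at-a-time procedure described above can be sketched as follows; the TWA model and its parameter names are hypothetical stand-ins for whatever exposure calculation is being tested:

```python
def sensitivity(model, best_estimates, bounds):
    """For each uncertain variable, recompute the result with that variable
    at its credible lower and upper bounds while every other variable stays
    at its best-estimate value."""
    results = {}
    for name, (low, high) in bounds.items():
        results[name] = (model(**{**best_estimates, name: low}),
                         model(**{**best_estimates, name: high}))
    return results

# Hypothetical 8-hour TWA model: concentration (ppm) scaled by exposed hours.
twa = lambda conc, hours: conc * hours / 8.0

effects = sensitivity(twa,
                      {"conc": 1.0, "hours": 6.0},   # best estimates
                      {"hours": (5.0, 7.0)})         # credible bounds
# effects["hours"] == (0.625, 0.875)
```

       The spread between the two results for each variable indicates which uncertainties matter most to the assessment.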

        An approach known as Monte Carlo simulation can be used to quantitatively combine the
contributions of various uncertainties.  If ranges and/or distributions for the uncertain parameters can be
specified, then values from those distributions can be sampled repeatedly, with exposure descriptive
statistics recalculated with each repetition, to develop a "picture" of the distribution of descriptive statistic
values.  Monte Carlo simulation is a computer-intensive approach that can handle complex systems and
combinations of many parameters.  The user should consult a statistician if Monte Carlo approaches are
to be considered.
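       As a minimal sketch of the Monte Carlo idea, suppose the only uncertain inputs in a small group are two nondetects whose true values are taken to lie anywhere between L/4 and L/sqrt(2); all values here are hypothetical:

```python
import random
import statistics

def monte_carlo_means(observations, n_iterations=2000, seed=7):
    """Each observation is a function that draws one plausible value; redraw
    all of them repeatedly and recompute the mean to build up a distribution
    of the descriptive statistic."""
    random.seed(seed)
    means = sorted(
        statistics.fmean(draw() for draw in observations)
        for _ in range(n_iterations)
    )
    return {"p05": means[int(0.05 * n_iterations)],
            "p50": means[n_iterations // 2],
            "p95": means[int(0.95 * n_iterations)]}

L = 1.0  # hypothetical detection limit, ppm
observations = [lambda: 2.1, lambda: 3.4,                  # detected values
                lambda: random.uniform(L / 4, L / 2 ** 0.5),
                lambda: random.uniform(L / 4, L / 2 ** 0.5)]
summary = monte_carlo_means(observations)
```

       The resulting percentiles of the simulated means characterize how much the nondetect uncertainty propagates into the summary statistic.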
                                               76

-------
        Where limited data exist, such  as  for a  new chemical, comparison with similar chemicals
(surrogates) or the use of modeling may  be used to estimate concentrations for the chemical of interest
(Step 12 describes methods for treatment  of Type 3 data). A sensitivity analysis can address uncertainty
in the following manner: the model is run using a range of expected values for model parameters as in
Monte Carlo simulation  discussed above; changes  in the estimated concentrations for different input
parameter values are a function of the sensitivity to the model parameters and of the degree of uncertainty
associated with the parameter values. A more complete evaluation of uncertainty due to modeling would
be to consider alternative models and ranges for their input parameter values.
B.      Confidence Intervals

        Confidence intervals can be calculated to quantify the uncertainty associated with estimates of
summary statistics.  In particular, one is often interested in the uncertainty concerning the mean exposure.
As discussed in Step 17, the standard error of the mean characterizes the variability of the estimate of
the mean  and is the basis for confidence limit calculations for the mean.  Confidence limits address
uncertainty associated with sampling error, not other sources of uncertainty.

        For a normal distribution, a 90% confidence interval for the mean extends from 1.645 standard
errors  below  the estimator of the mean to  1.645  standard errors above the mean estimator.  A
95% confidence interval is ±  1.96 standard errors,  and a 99% confidence interval  is ± 2.58 standard
errors around the estimator of the mean.  The values 1.645, 1.96, and 2.58 are the multipliers of the
standard errors that are used to derive confidence intervals corresponding to three levels of confidence
(90%, 95%, and 99%, respectively). In practice, one does not know what the true standard error is any
more than one knows what the true mean is. To account for this added level of uncertainty, the values
for the multipliers of the standard error are increased, the degree of the increase depending on the sample
size.

        A particularly common situation for confidence limit calculation is for a normal distribution mean.
In that case, multipliers for the standard error can be found in a table of T distribution percentiles.  Those
percentiles depend on the sample size.  For example, for a normal distribution with an
estimated mean of 5 ppm, a standard deviation of 1.5, and a sample size of 25, the resulting 95%
confidence interval for the mean ranges from (5 - 2.064 x (1.5/5)) to (5 + 2.064 x (1.5/5)), i.e., from 4.4
to 5.6 ppm.  In that calculation, (1.5/5) is the standard error estimate from Equation 8 (see Step 17) and
2.064 is the 97.5th percentile of the T distribution with 24 degrees of freedom (the estimates of standard
deviation and standard error have degrees of freedom equal to the sample size minus one).  The use of
the 97.5th percentile results in 2.5% probability above and 2.5% probability below the confidence
interval, i.e., a 95% confidence interval.
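        The worked example can be verified directly; the 2.064 multiplier is supplied by hand because it would ordinarily be read from a table of T distribution percentiles (97.5th percentile, 24 degrees of freedom):

```python
import math

def t_confidence_interval(mean, sd, n, t_multiplier):
    """Two-sided confidence interval for a mean: mean +/- t * SD / sqrt(n)."""
    se = sd / math.sqrt(n)          # standard error, Equation 8
    return mean - t_multiplier * se, mean + t_multiplier * se

low, high = t_confidence_interval(5.0, 1.5, 25, 2.064)
# low, high are approximately 4.38 and 5.62, i.e., roughly 4.4 to 5.6 ppm
```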

        Even though we have not tested the groups defined in Step  15 to see if they are normal or not,
the calculations outlined above should  hold approximately, since  the  sample mean is approximately
normal  no matter what the distribution of the underlying observations  may be.  The adequacy of the
approximation depends on the sample size and on the  extent  to which  the standard deviation estimate
divided by the square root of the sample size approximates the standard error of the  mean.  Appendix B

                                               77

-------
presents additional material on confidence limits, especially as related to means of lognormal
distributions.
C.      Quantification of Bias

        If the data were not statistically sampled, the results may be biased.  This bias is separate from
and should not be confused with bias in the data measurement which can be defined as a systematic error
inherent in a method or caused by some feature of the measurement system (EPA, 92). Statistical bias
is caused by the sample population not being representative of the population under study.  It should be
noted that data collected from other agencies and published sources are almost never  randomly selected,
although a particular bias may be difficult to identify.  Despite the difficulty, it is extremely important
to identify potential biases and clearly present them in the results presentation.  Furthermore, if random
sampling was carried out only in a subpopulation, the summary statistics may apply only to that
subpopulation and may not be representative of a larger group.  There are no quantitative methods to
extend the sample results beyond the bounds of the subpopulation.

        Bias can also occur because of inappropriate selection of sample location, sample time, or workers
to be sampled.  For example, measurements of peak exposures are intended to measure the period of
highest exposure for that job category.  Therefore, if a time period that does not represent maximum
exposure or an individual in a job category that would not represent peak exposure is measured, then
this selection would cause the measurements not to be representative of peak exposure.

        Quantification of biases is always difficult  and may be  beyond the scope of the exposure
assessment.  If  quantification is not possible,  biases should  be qualitatively described  in the results
presentation. One method of quantification is to segregate the potentially biased data and compare the
exposures with the remaining data  sets.  Where a large quantity of data is available, this may allow
quantification of the  bias.  Where only limited data are  available, such  comparisons may  not yield
dependable results.

        Another method is to try to quantify the bias through use of other information.  For example, if
the data are biased because the plants are "well controlled," then information gathered from other sources
or estimated from the  monitoring data may be used to estimate the control efficiency and the distribution
of controls in the industry. This,  in turn, can be used to quantify the bias.  Likewise,  if only large
facilities were surveyed and other data indicate differences  in control between large and small facilities,
the effect  on exposure estimates may be estimable.
D.      Weighting Factors to Mitigate Bias

        The most common way to mitigate known quantifiable biases is through the use of a weighting
factor.  Weighting factors are used to adjust the influence of different pieces of data to equal their weight
in the population being judged as a whole.  For example,  when determining an annual exposure, values
may be weighted by the number of days annually that a worker is exposed.  Weighting can also be used

                                               78

-------
to calculate averages within a job category or other subpopulation.  Weighting should always be clearly
explained so that the user is aware that the descriptive statistics are based on weighted data.  Weighting
factors used to mitigate bias should be clearly presented.
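        A weighted mean of this kind can be sketched as follows; the exposure values and annual day counts are hypothetical:

```python
def weighted_mean(values, weights):
    """Mean in which each value's influence is proportional to its weight,
    for example the number of days per year a worker is exposed."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical: 2.0 ppm during 200 exposed days, 0.5 ppm during 50.
annual = weighted_mean([2.0, 0.5], [200, 50])   # 1.7 ppm
```

        Presenting both the weights and the weighted results lets the end user see exactly how the adjustment was made.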
                                          EXAMPLE

   Sensitivity Analysis

          For the monomer process area/control 2 group and for the P4/polymerization and reaction
   group, sensitivity analyses were performed to quantify the effect of the assumption that undetected
   measurements were equal to half the detection limit, L, on the calculation of the descriptive
   statistics.  A lower bound for the value to be used for nondetects was set at L/4.  The upper bound
   was set at L/√2, another common choice for the value of a nondetect.  For the monomer process
   area/control 2 group, the resulting descriptive statistic estimates were as follows:

                              Nondetected value:
              Statistic       L/4        L/2        L/√2
              MEAN            0.91       0.92       0.92
              SD              1.06       1.05       1.05
              GM              0.59       0.61       0.61
              GSD             2.5        2.5        2.5

          The estimates of the means and standard deviations for this group were very insensitive to
   the values for the nondetects.  Only five of the 21 observations were below detection limits.

          For the P4/polymerization group, the results were as follows:

                              Nondetected value:
              Statistic       L/4        L/2        L/√2
              MEAN            0.0016     0.0033     0.0046
              SD              0.00027    0.00054    0.0007
              GM              0.0016     0.0032     0.0046
              GSD             1.2        1.2        1.2

                                                79

-------
       The change of the mean for the P4/polymerization group was considerably greater than that
observed in the monomer process area/control 2 group, ranging from 52% below to 39% above the
initial estimate of the mean.  All seven of the P4/polymerization group observations were below
detection limits.  Clearly,  the sensitivity of the results,  in this case  to the assumed values  of
nondetects, can vary from group to group.

       Quantification and  presentation of the results of sensitivity analyses and their variations
across groups will be useful for subsequent risk assessment/risk management decisions. The results
of such sensitivity analysis can be used by the risk assessor/risk manager to determine if his or her
actions and decisions could be subject to change as a result of uncertainty concerning relatively low
concentrations (those  below the limit of detection). If they are subject to change, the implications
of those changes can be determined or the decisions re-evaluated.
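       The substitution exercise in the example above can be reproduced for any group as follows; the detected values, nondetect count, and detection limit shown are hypothetical:

```python
import math

def stats_with_nondetects(detected, n_nondetects, limit, fraction):
    """Recompute the mean and geometric mean with each nondetect replaced by
    limit * fraction; fractions 1/4, 1/2, and 1/sqrt(2) are the common
    substitution choices discussed in the text."""
    data = detected + [limit * fraction] * n_nondetects
    mean = sum(data) / len(data)
    gm = math.exp(sum(math.log(x) for x in data) / len(data))
    return mean, gm

# Hypothetical group: three detected values and two nondetects, L = 0.01 ppm
for fraction in (0.25, 0.5, 1 / math.sqrt(2)):
    mean, gm = stats_with_nondetects([0.020, 0.050, 0.030], 2, 0.01, fraction)
```

       Comparing the statistics across the three substitution fractions shows directly how sensitive each group is to the nondetect assumption.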
Other Means of Mitigating Bias

       The collection of the data used in this  example analysis  provides an example of the
identification of a bias in the collection method and how the bias was mitigated by using a different
method.  The potential for bias exists if the collection or analytical method has not been validated
over the entire range of exposures.  NIOSH Method S-91 for the example chemical illustrates this
(NIOSH, 84).  This method was developed to meet compliance monitoring needs associated with
the OSHA standard at the time of 1,000 ppm (2,000 mg/m3).  The method was validated over a
range of concentrations from 481 to 2,237 ppm (1,065 to 4,950 mg/m3).  Because of new animal
test data indicating toxicity at much lower concentrations, and the fact that industry was controlling
exposures to much lower levels, the existing method had to be reviewed. It was found that the S-91
method poorly separated  the example chemical  from other C4 hydrocarbons.   This and other
possible interferences probably systematically overestimated the example chemical content of the
samples at  lower concentrations.

       In the case of the example chemical, a new extraction method was developed  that improved
the sensitivity and selectivity of the method and new measurements were taken.  Where sufficient
time or resources are not available, correction factors may be  developed and the overestimate at
lower concentrations adjusted by these factors.  Any such adjustments should be clearly identified
in the data and  the results.  The correction factor  values are themselves subject to uncertainty and
should be included in the list of uncertainties/assumptions for presentation to the end user.
                                            80

-------
                                STEP 19:  PRESENT RESULTS


        Because the results of the analysis may need to be used by engineers, economists, and other
decision-makers who are not statisticians, presentation techniques will to a large extent determine their
usefulness. To properly use the results of the analysis, the end user must know the purpose, scope, level
of detail and approach used in the assessment.  In addition, key assumptions used, the overall quality of
the assessment (including uncertainties  in the results), and the interpretation of data and results are as
important as estimates of exposure.  The results must also be presented in a form that corresponds to the
modeling or other needs of the end user.  Finally, it is important that the original data values and all
important variables be presented in an appendix to the report. This step describes four aspects of results
presentation:

               A)   Characterization of exposure (narrative explanation)

               B)   Presentation of descriptive statistics

               C)   Presentation of assumptions and uncertainties

               D)   Presentation of original data


A.      Characterization of Exposure

        The characterization of exposure is the overall narrative which consists of discussion, analysis
and conclusions that summarize  and explain the exposure assessment.  It provides a statement of the
purpose of the assessment, the scope, level of detail, and approach used in the assessment.  It presents
the estimates of exposure by route of exposure (e.g., inhalation, dermal) for the population,
subpopulation, or individuals, in accordance with the needs identified by the user.  It should also include
an overall evaluation of the quality of the assessment, and  a discussion of the degree of confidence the
engineer has in the  estimates  of  exposure and the conclusions drawn. The data and  results should be
presented in keeping with the  terms defined in the  EPA Exposure Assessment Guidelines (EPA, 92) for
bounding estimates, reasonable worst case estimates, worst case estimate, maximally exposed individual,
maximum exposure range, etc.

        The engineer should include a discussion of whether the scope and level of detail were sufficient
to meet the needs of the user.  If user needs were not met, it is preferable to identify the tasks or
mechanisms (monitoring, collecting additional information, etc.) that will be needed in order to fully meet
the  needs of the user,  and how this lack of data or information impacts the assessment.  A general
discussion of research or additional data to improve the assessment is also quite useful; data gaps should
be identified in order to focus  further efforts to reduce uncertainty. An appendix may be a suitable place
for this discussion.
                                                81

-------
       The methods used to quantify exposure (e.g.,  models, use of surrogate data, use of monitoring
data) should be clearly  identified in the exposure characterization.  A discussion of the strengths and
weaknesses of the methods and of the data used should be included.

       When Type 2 and Type 3 data were available but not used for the quantitative characterization
of exposure, summaries of the information available from the Type 2  and Type 3 data bases should be
included.  Recall that the summaries of the  Type 2 data may be more quantitative in nature and may
provide some numerical estimates. The numerical estimates and qualitative appraisals of the Type 2 and
Type 3 data can  be compared with the summary statistics from Type 1 data (if available) to suggest
discrepancies or potential differences. If the Type 2 and/or Type 3 results suggest exposures that appear
to be different from the results of analyzing  the Type 1 data, potential explanations for the differences
should be provided.

        The end user will sometimes request a characterization of exposure for the entire population (e.g.,
all workers in a given industry).  The identification of subpopulations defined by the important exposure
parameters entails that descriptive  statistics per se probably  should not  be derived for the entire
population, say by combining the descriptive statistics for each category (although, see Appendix B for
some issues related to such combinations).   The best  overall summary may be  the presentation of the
descriptive statistics for each category, perhaps in graphical format.  Such a presentation preserves much
more information than a formal, quantitative combination of means, for example, over all the categories.
In conjunction with a prose description of the numerical  variety of circumstances (e.g., of the many
combinations of factors that  affect  exposure level), such tabular and  graphical representations should
convey the information necessary  for risk assessment and risk management decisions. Semi-quantitative
summaries (e.g.,  presentation of the range of mean exposure levels) may also be useful.

B.     Presentation of Descriptive Statistics

       The results should be presented in accordance with the needs of the end user as defined in Step 3.
The end user should have identified the required descriptive statistics and presentation methods.

       Where sufficient data are present, the plotting of the data on an appropriate scale in addition to
the accompanying descriptive statistics is usually the best presentation method.  Where box-and-whisker
plots were used to identify outliers, these plots can be presented in an appendix.  It is also useful to
present a characterization of the data by the  percentage of nondetected values and percentage of values
above  the detection limit, etc.
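       The percentage-of-nondetects characterization can be computed as in the sketch below; the below-limit test is a simplification, since real data sets usually flag nondetects explicitly:

```python
def detection_summary(values, detection_limit):
    """Summarize a data group by its count and percentage of values reported
    below the detection limit."""
    n = len(values)
    nondetects = sum(1 for v in values if v < detection_limit)
    return {"n": n, "nondetects": nondetects,
            "percent": round(100 * nondetects / n)}

# 21 measurements, of which 5 fall below a 0.1 ppm detection limit
summary = detection_summary([0.5] * 16 + [0.05] * 5, 0.1)
# summary == {"n": 21, "nondetects": 5, "percent": 24}
```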

       There may be some Type 1 data groups  that had few observations and for which descriptive
statistics were not calculated.  These groups must be verbally  summarized and the indications of the
degree of exposure suggested by these groups compared and contrasted to the quantitative estimates for
the other Type 1 data groups.  This comparison and contrast is similar to that provided for the Type 2
and Type 3  data sets.  Qualitative and semi-quantitative  results from the  data not  used to derive
quantitative estimates must be compared, to the degree possible, with the quantitative results.  Possible
explanations for apparent discrepancies should be provided.
                                               82

-------
                                          EXAMPLE

          Table 4 presents summary information for all of the groups considered in the example.

          Although most users wish to receive the data  in tabular form, some may wish to have
  graphic presentations also provided.  Figure 12 provides a box-and-whisker plot of the data for the
   monomer industry groups.  Figure 13 provides an example of a bar graph for several of the groups,
  comparing mean and maximum concentrations with several target levels.
C.     Presentation of Assumptions and Uncertainties

       A figure summarizing and clearly presenting all assumptions and uncertainties (treated in Step 18)
should be accompanied by a more complete explanation in the text.   Wherever possible, the effect of
those assumptions and uncertainties on the results of the analysis will be quantified (see Step 18).  Figure
14 presents an example of how this information may be presented; it may be considered to be the product
of the cumulative listing of assumptions and uncertainties produced from the various steps of the exposure
assessment.
        The first column of Figure 14 presents a description of the uncertainty.  The uncertainties
range from the length of the work day to the actual concentration when non-detected values are recorded.
The second column presents the associated assumption if one was made.  The third column presents an
estimate of the range of possible values for the assumed value.  Finally, column 4 presents an estimate
of the effect of the assumption on the results.  Some of the effects presented in the last column may have
to be group-specific.
                                              83

-------
Table 4:  Descriptive Statistics Presentation, Example Data Set

                                    No. of                                     Descriptive Statistics                                   Non-Detects
                                    Exposed  No. of   Minimum   Maximum   Median    Mean     SE       Std. Dev. Geom. Mean Geom. Std.
Group                               Workers  Samples  (ppm)(a)  (ppm)(a)  (ppm)(a)  (ppm)    (ppm)(b) (ppm)     (ppm)      Dev.        No.  Percent

Monomer Control Room, Control 1       70       10      0.020     1.870     0.048    0.448    0.229    0.724     0.236      3.106        6     60
Monomer Lab, Control 2                n         3      0.420   373.540     2.610      --       --       --        --         --         0      0
Monomer Lab, Control 3                25        9      0.000     1.960     0.340    0.524    0.210    0.629     0.335      2.572        3     33
Monomer Lab, Control 4                40        6      0.050     0.870     0.110    0.298    0.146    0.357     0.191      2.569        0      0
Monomer Lab, Control 5                45        7      0.560     6.310     2.550    3.087    0.853    2.256     2.492      1.924        0      0
Monomer Lab, Control 6                61        7      0.040     0.890     0.280    0.350    0.115    0.304     0.264      2.116        1     14
Monomer Loading, Control 1            90       14      0.100     7.500     1.100    1.709    0.511    1.913     1.139      2.463        0      0
Monomer Loading, Control 2            106       8      0.000   123.570     1.430   17.010   15.242   43.110     6.243      4.120        2     25
Monomer Process Area, Control 1       111       6      0.270     2.980     0.960    1.312    0.462    1.131     0.994      2.107        1     17
Monomer Process Area, Control 2       95       21      0.070     4.190     0.550    0.918    0.230    1.054     0.603      2.502        5     24
Monomer Tank Farm, Control 1          OS        5      0.040     1.530     0.155      --       --       --        --         --         3     60
P1, Crumbing and drying               166       9      0.014     0.071     0.040    0.043    0.006    0.019     0.040      1.515        0      0
P1, Lab                               50       10      0.014     8.330     1.210    2.909    1.059    3.348     1.908      2.505        0      0
P1, Maintenance                       110      34      0.014    11.020     0.100    0.857    0.396    2.310     0.298      4.277        0      0
P1, Packaging                         30       30      0.012     0.154     0.028    0.039    0.006    0.031     0.031      2.003        0      0
P1, Polymerization or reaction        100       6      0.035     2.710     0.060    0.696    0.449    1.100     0.372      3.062        0      0
P1, Process area                      00        6      0.006     0.304     0.075    0.118    0.050    0.122     0.082      2.346        1     17
P1, Purification                      66        6      1.330     6.950     5.020    4.357    0.944    2.312     3.849      1.646        0      0
P1, Solutions and coagulation         260       9      0.019     0.046     0.025    0.027    0.003    0.008     0.026      1.343        0      0
P1, Tank farm                         59        5      0.113     0.962     0.436      --       --       --        --         --         0      0
P1, Warehouse                         10        2      0.014     0.020     0.017      --       --       --        --         --         0      0
P2, Control room                      19        6      0.006     0.070     0.016    0.028    0.012    0.030     0.019      2.382        2     33
P2, Crumbing and drying               40        7      0.018     0.052     0.027    0.032    0.005    0.013     0.030      1.485        0      0
P2, Lab                               63       14      0.029     4.120     0.044    0.636    0.339    1.267     0.285      3.547        0      0
P2, Maintenance                       94        9      0.021     0.048     0.026    0.030    0.003    0.009     0.029      1.341        0      0
P2, Packaging                         25        6      0.022     0.030     0.034    0.033    0.002    0.006     0.032      1.201        0      0
P2, Polymerization or reaction        105      29      0.000     0.780     0.033    0.077    0.027    0.144     0.036      3.417        2      7
P2, Solutions and coagulation         650       5      0.015     0.030     0.028      --       --       --        --         --         0      0
P2, Tank farm                         59        3      0.123     0.436     0.362      --       --       --        --         --         0      0
P3, Lab                               45        3      0.009     0.429     0.016      --       --       --        --         --         1     33
P3, Maintenance                       74        4      0.011     0.026     0.020      --       --       --        --         --         0      0
P3, Polymerization or reaction        100      18      0.006     0.250     0.032    0.057    0.016    0.068     0.036      2.583        2     11
P3, Solutions and coagulation         460       4      0.006     0.164     0.019      --       --       --        --         --         1     25
P3, Tank farm                         41        8      0.009     0.682     0.034    0.112    0.082    0.231     0.049      3.626        0      0
P3, Unloading area                    45        2      0.770    28.510    14.640      --       --       --        --         --         0      0
P4, Crumbing and drying               24       13      0.005     0.081     0.013    0.016    0.006    0.020     0.010      2.682        4     31
P4, Lab                               60       17      0.006     0.943     0.069    0.104    0.067    0.275     0.102      2.955        3     18
P4, Maintenance                       61        7      0.006     0.013     0.003    0.004    0.001    0.004     0.003      2.140        6     86
P4, Packaging                         400      20      0.006     0.026     0.003    0.006    0.001    0.006     0.004      2.374       16     80
P4, Polymerization or reaction        204       7      0.006     0.008     0.003    0.003    0.000    0.001     0.003      1.180        7    100
P4, Solutions and coagulation         515       3      0.005     0.008     0.003      --       --       --        --         --         3    100
P4, Tank farm                         45        8      0.006    12.030     0.392    2.366    1.406    4.203     1.161      3.299        1     12
P4, Warehouse                         56       11      0.005     0.010     0.003    0.004    0.001    0.002     0.004      1.627       10     91
P5, Crumbing and drying               39        6      0.033     0.116     0.043    0.055    0.013    0.031     0.048      1.697        0      0
P5, Lab                               36        8      0.100     8.870     4.580    3.972    1.073    3.035     3.156      1.970        0      0
P5, Maintenance                       ao       16      0.072     3.090     0.655    1.200    0.313    1.253     0.830      2.360        0      0

-------
Table 4:  Descriptive Statistics Presentation, Example Data Set (continued)

                                    No. of                                     Descriptive Statistics                                   Non-Detects
                                    Exposed  No. of   Minimum   Maximum   Median    Mean     SE       Std. Dev. Geom. Mean Geom. Std.
Group                               Workers  Samples  (ppm)(a)  (ppm)(a)  (ppm)(a)  (ppm)    (ppm)(b) (ppm)     (ppm)      Dev.        No.  Percent

P5, Packaging                         44       23      0.014     0.144     0.042    0.058    0.007    0.034     0.050      1.730        1      4
P5, Polymerization or reaction        52       20      0.035     2.800     0.400    0.740    0.198    0.886     0.474      2.568        0      0
P5, Purification                      90       12      2.770    24.140     7.580    9.523    1.942    6.727     7.778      1.889        0      0
P5, Solutions and coagulation         555      12      0.006     0.169     0.090    0.082    0.014    0.047     0.071      1.709        1      8
P5, Tank farm                         41        6      1.070     6.010     2.760    3.020    0.714    1.750     2.613      1.713        0      0
P5, Warehouse                         30        7      0.033     0.068     0.039    0.045    0.006    0.015     0.043      1.382        0      0

(a) The minimum, maximum, and median are provided as additional descriptive statistics.
(b) Standard error measures precision of the mean.
-------
[Box-and-whisker plots of exposure concentrations (ppm) for each monomer industry group; the
concentration scale runs from 0 to 6 ppm with extreme values reaching approximately 300 ppm.]

                     * Mean    +  Extremes
Figure 12.  Box-and-Whisker Plot: Monomer Industry Groups

-------
[Bar graph comparing mean and maximum concentrations (ppm) for several polymer industry groups
against three target levels.]

 Figure 13. Example Bar Graph for Polymer Industry Groups: Means and Maxima Compared to 3 Target Levels

-------
Uncertainty                        Associated assumption            Reasonable possible variance      Effect on results
                                                                    of assumption
--------------------------------   ------------------------------   -------------------------------   -------------------------------
For job category A the length      Length of work day assumed to    Reasonable range is 5 to 7        Maximum 6% change in
of work day is not known for       be 6 hours.                      hours.                            descriptive statistic for job
30% of the monitoring data.                                                                           category A (sensitivity
                                                                                                      analysis).

Actual exposure not known for      A value of L/√2 was assumed.     A value of L/2 could better       Maximum 2% change in overall
values recorded as nondetected     L = 1 ppm.  ND = 0.71 ppm.       represent actual exposure.        descriptive statistic
(5% of values).                                                                                       (sensitivity analysis).

NIOSH indicates that data for      None made.                       NIOSH personnel roughly           Descriptive statistics for
industry B represents "well                                         estimated that exposures at       industry B may underestimate
controlled" facilities.                                             well controlled facilities can    exposure by up to 20% (NIOSH
                                                                    be 20% lower than the industry    estimate).
                                                                    average.

Plants in the industry C data      The data set for industry C      Not quantifiable.                 Unknown.
set were not randomly selected     represents the industry as a
but rather all available data      whole.
was used.

For job category D only OSHA       None made.                       Not quantifiable.                 Facilities where OSHA
compliance data were used.                                                                            complaints are made may have
                                                                                                      higher exposure than the
                                                                                                      industry as a whole
                                                                                                      (engineering judgment).

etc.                               etc.                             etc.                              etc.

Figure 14. Example Format for Presentation of Assumptions and Uncertainties.

-------
                                          EXAMPLE

          No biases were identified for the Type 1 data.  The only assumptions used for the Type 1
   data sets were:

          •  The use of L/2 for the value of nondetected results in the calculation of descriptive statistics

          •  Estimated durations of tasks, provided by the companies where the monitoring was done,
             were used to convert some values to 8-hour TWAs.

          For the Type 2 data, the following bias was identified:

          •  Some Type 2 data were taken using the old analytical method, which may overestimate
             concentrations due to interference by other C« chemicals.

   The bias associated with the Type 2 data may explain discrepancies between the Type 1 and Type 2
   analysis results.
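The sensitivity of the descriptive statistics to the nondetect substitution can be checked directly. The sketch below recomputes a mean under both the L/√2 and L/2 conventions mentioned in this report; the sample values and the 1 ppm detection limit are invented for illustration, not taken from the example data sets.

```python
import math

# Hypothetical measurements in ppm; None marks a nondetected value.
# The detection limit L of 1 ppm mirrors the Figure 14 example.
L = 1.0
samples = [0.4, 2.1, None, 0.9, 3.5, None, 1.2, 0.7]

def mean_with_nd(values, nd_value):
    """Mean after substituting nd_value for each nondetect."""
    filled = [v if v is not None else nd_value for v in values]
    return sum(filled) / len(filled)

mean_sqrt2 = mean_with_nd(samples, L / math.sqrt(2))  # ND = 0.71 ppm
mean_half = mean_with_nd(samples, L / 2)              # ND = 0.50 ppm

# The relative difference bounds the sensitivity of the statistic
# to the substitution convention.
rel_change = abs(mean_sqrt2 - mean_half) / mean_sqrt2
```

When, as here, only a small fraction of values are nondetects, the two conventions change the mean by only a few percent, which is the kind of result a sensitivity analysis like the one in Figure 14 would report.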
D.     Present Original Data

       Even though every attempt should be made to satisfy user needs, poor communication or changing
requirements may dictate changes even after the exposure assessment is finalized.  Therefore, presentation
of all original data used in the calculations, and of all important variables associated with the data, will
allow additional statistics to be calculated by the end user when required.
                                          EXAMPLE

          Appendix A presents the 516 full-shift personal samples that were used in the example
   calculations in this report.
                                              90

-------
                           REFERENCES AND BIBLIOGRAPHY
(Aitchison, 57)


(Armstrong, 92)



(Attfield, 92)



(Bickel, 77)


(BMDP/PC)


(Box, 78)


(Buringh, 91)



(CMA, 86)


(Cochran, 63)


(Cohen, 61)


(Cohen, 78)



(Conover, 80)


(Corn, 79)
Aitchison, J. and J.A.C. Brown.  The Lognormal Distribution. Cambridge
University Press.  London.  1957.

Armstrong, Ben C.  Confidence Intervals for Arithmetic Means of Lognormally
Distributed Exposures.   American Industrial  Hygiene  Association Journal
53:481-485.  1992.

Attfield, M.D. and P. Hewett.  Exact Expressions for the Bias and Variance of
Estimators of the Mean of a Lognormal Distribution.  American  Industrial
Hygiene Association Journal. 53:432-435. 1992.

Bickel, P.J. and K.A. Doksum.  Mathematical Statistics.  Holden-Day, Inc.  San
Francisco, CA.  1977.

BMDP Statistical  Software, Inc.,  1440 Sepulveda Blvd.,  Suite 316,  Los
Angeles, CA 90025.  213/479-7799.

Box, George E.P., William G. Hunter, and J. Stuart Hunter.  Statistics for
Experimenters.  John Wiley & Sons.  New York, New York.  1978.

Buringh, Eltjo and Rob Lanting.  Exposure Variability in the Workplace: Its
Implications for the Assessment of Compliance.  American Industrial Hygiene
Association Journal 52:6-13. 1991.

Chemical Manufacturers Association.   Papers Presented at the Workshop on
Strategies for Measuring Exposure.  December 9 and 10, 1986.

Cochran, William G.  Sampling Techniques.  John Wiley and Sons, Inc.  New
York, New York.  1963.

Cohen, A.C.  Tables for Maximum Likelihood Estimates: Singly Truncated and
Singly Censored Samples.  Technometrics 3:535.  1961.

Cohen, Clifford, et al.  Statistical Analysis of Radionuclide Levels in Food
Commodities, prepared for U.S. Food and Drug Administration, Washington,
D.C.  September 15, 1978.

Conover, W.J.  Practical Nonparametric Statistics.  2nd ed.  John Wiley & Sons.
New York, New York.  1980.

Corn,  Morton and  Nurtan A. Esmen.  Workplace Exposure Zones  for
Classification of Employee Exposures to Physical  and  Chemical Agents.
American Industrial Hygiene Association Journal. 40:47-57.  1979.
                                           R-l

-------
(Cox, 81)


(Crump, 78)



(Damiano, 86)



(Damiano, 89)


(Daniel, 78)


(Devore, 82)



(Dixon, 83)



(Eisenhart, 68)


(EPA,  78)



(EPA,  87)


(EPA,  92)


(Esmen, 77)



(Hansen, 83)
Cox,  David C. and Paul Baybutt.   Methods  for Uncertainty  Analysis:   A
Comparative Survey.  Risk Analysis.  1:251-258.  1981.

Crump, Kenny S.   Estimation of  Mean  Pesticide  Concentrations  When
Observations are Detected Below the  Quantification  Limit.  Prepared for the
Food and Drug Administration.  Washington, D.C.  April 9, 1978.

Damiano, Joe.  The Alcoa Sampling and Evaluation Guidelines, presented at the
Workshop on Strategies for Measuring Exposure (CMA).  December 9 and 10,
1986.

Damiano,  Joe.  A Guideline for Managing the Industrial Hygiene Sampling
Function.  American Industrial Hygiene Association Journal. July 1989.
Daniel, Wayne W.  Applied Nonparametric Statistics.  Houghton Mifflin
Company.  Boston, Massachusetts.  1978.
Devore, Jay L.  Probability and Statistics for Engineering and the Sciences.
Table A.3 Standard Normal Curve Areas,  p. 620.  Brooks/Cole Publishing
Company, Monterey, CA.  1982.

Dixon, S.W., et al.  E.I. du Pont de Nemours.  Management of Air Sampling
Results, presented at the American Industrial Hygiene Conference, Philadelphia,
Pennsylvania.  May 25, 1983.

Eisenhart, Churchill.  Expression of the Uncertainties of Final Results. Science.
pp 1201-1204. June 1968.

Environmental Protection Agency.  Source Assessment: Analysis of Uncertainty
- Principles and Application EPA/600/13.  Industrial Environmental Research
Laboratory, Research Triangle Park, North Carolina. August 1978.

Environmental Protection Agency.  The Risk Assessment Guidelines of 1986.
EPA7600/8-87/045.  Washington, D.C. August 1987.

Environmental  Protection  Agency.    Exposure  Assessment  Guidelines.
EPA/600/Z-92/001.  Washington,  D.C.  1992.

Esmen, Nurtan A. and Yehia Y. Hammad.  Log-Normality of Environmental
Sampling Data. Journal of Environmental Science and Health, A12 (1 & 2), pp.
29-41. 1977.

Hansen, Morris H., et al. An Evaluation of Model-Dependent and Probability-
Sampling Inferences in Sample Surveys.  Journal of the American Statistical
Association,  pp. 776-793. December 1983.
                                            R-2

-------
(Hawkins, 92)



(Hawkins, 91)


(Hoaglin, 83)


(Hornung, 90)



(Hornung, 91)


(IT, 91)
(Jackson, 85)



(Johnson, 70)


(Karch, 88)



(Koek, 88)


(Koizumi, 80)
(Lee, Undated)


(Lemasters, 85)
Hawkins, Neil C, MichaeJ A. Jayjock, and Jeremiah Lynch. A Rationale and
Framework for Establishing the  Quality of Human Exposure Assessments.
American Industrial Hygiene Association Journal 53:34-41.  1992.

Hawkins, Neil C., et al. A Strategy for Occupational Exposure Assessments.
American Industrial Hygiene Association.  Akron, Ohio.  1991.

Hoaglin, David  D.,  et al.  ' Understanding Robust and  Exploratory  Data
Analysis. John WUey and Sons, Inc.  New York, New York.  1983.

Hornung,  Richard  W., and Lawrence D.  Reed.    Estimation of Average
Concentration in the Presence of Noodetectable Values. Applied Occup Environ
Hyg 5(1):46-51.  1990.

Hornung, Richard W. Statistical Evaluation of Exposure Strategies. Applied
Occup Environ Hyg 6<6):516-520.  1991.

IT Environmental Programs, Inc.   Preparation of Engineering Assessments,
Volume I:    CEB Engineering  Manual.    U.S. Environmental  Protection
Agency/Office of Toxic Substances.  Washington, D.C.  Contract No. 68-D8-
0112. February, 1991.

Jackson, R.A., and A. Behar. Noise Exposure - Sample Size and Confidence
Limit Calculation.    American  Industrial  Hygiene  Association Journal.
46:387-390.  1985.

Johnson Norman L. and Samuel Kotz.  Continuous Unrvariate Distributions-1.
John WUey & Sons.   New York, New York.  1970.

Karch, Nathan J. Testimony of Nathan J. Karen, Ph.D. on the Quantitative
Risk Assessments Included in the Proposed Rule on Air Contaminants of the
Occupational Safety and Health Administration. July 28, 1988.

Koek, Kara  E., at al.  Encyclopedia of Associations 1989,  3 vol.   Gale
Research, Inc. Detroit, Michigan.  1988.

Koizumi, Akio,  et al.   Evaluation  of the Tune Weighted Average of Air
Contaminants with  Special  References  to  Concentration Fluctuation and
Biological Half Tune.  American  Industrial Hygiene Association Journal,  pp
693-699. October 1980.

Lee, Shin Tao, et al.  A Calculation and Auto Selection of Simple Statistical
Analyses for Industrial Hygiene Data, NIOSH,  Cincinnati, Ohio. Undated.

Lemasten,  Grace K.,  Arch Carson, and  Steven J. Samuels.  Occupational
Styrene Exposure for Twelve Product Categories in the Reinforced-Plastics
Industry. American Industrial Association Journal 46:434-441.  1985.
                                             R-3

-------
(Lilliefors, 67)



(Massey, 51)


(McBride, 91)



(Nicas, 91)



(NIOSH, 77)



(NIOSH, 84)


(Oldham, 65)


(Olsen, 91)



(OSHA, 90)


(OSHA, 85)


(OSHA, Unpublished)

(Patty, 81)



(Powell, 88)



(Preat, 87)
Lilliefors, H.W.  On the Kolmogorov-Smirnov test for normality with mean and
variance unknown.  Journal of the American Statistical Association 62:399-402.
1967.

Massey, Frank J.,  Jr.  The Kolmogorov-Smirnov Test for Goodness of Fit.
Journal of the American Statistical Association,  pp. 68-78.  1951.

McBride, Judith B.  A Study of Spatial Correlation in a Workplace Exposure
Zone.  Presented at American Industrial Hygiene Association Conference and
Exposition, Salt Lake City, Utah.  May 23, 1991.

Nicas, Mark, Barton P. Simmons, and Robert C. Spear.  Environmental versus
Analytical Variability in Exposure Measurements.  American Industrial Hygiene
Association Journal 52:553-557.  1991.

National Institute for Occupational Safety and Health. Occupational Exposure
Sampling  Strategy  Manual.    DHEW  (NIOSH) Publication No.  77-173.
Cincinnati, Ohio.  1977.

National  Institute for Occupational  Safety and Health.   NIOSH Manual of
Analytical Methods. Third Edition.  Cincinnati, Ohio 1984.

Oldham, P.D.  On Estimating the Arithmetic Means of Lognormally-Distributed
Populations.  Biometrics.  21:235-239.  1965.

Olsen, Erik, Bjarne Laursen, and Peter S. Vinzents. Bias and Random Errors
in Historical Data  of Exposure to  Organic Solvents.   American  Industrial
Hygiene Association Journal 52:204-211.  1991.

Occupational Safety and Health Administration.  OSHA Technical  Manual.
Issued by OSHA  February 5,  1990.

OSHA Chemical  Information File. OSHA Directorate of Technical  Support.
June 13, 1985.

OSHA Manual of Analytical Methods (Unpublished).

Patty, Frank A.   Patty's Industrial Hygiene and Toxicology, 3rd  Edition.
Volumes 1 through 3 General Principles, Statistical Design and Data Analysis
Requirements.  John Wiley & Sons.  New York, New York, 1981.

Powell, R.W.  A Method for Calculating a Mean of a Lognormal Distribution
of Exposures. Exxon Research and Engineering Company. Florham Park, New
Jersey.  1988.

Preat, Bernard.   Application  of Geostatistical Methods for Estimation of the
Dispersion Variance of Occupational Exposures.  American Industrial Hygiene
Association Journal. 48:877-884.  1987.
                                            R-4

-------
(Rappaport, 91)


(Rappaport, 87)



(Re,  85)



(Rock, 82)



(Samuels, 85)



(Searle, 92)


(Schneider, 91)



(Selvin, 91)



(Selvin, 89)



(Selvin, 87)



(Sokal, 81)


(SPSS/PC)


(Stoline,  91)
Rappaport, S.M.  Assessment of Long-term Exposures to Toxic Substances in
Air.  Annals of Occupational Hygiene 35:61-121.  1991.

Rappaport, S.M. and S. Selvin.  A Method for Evaluating the Mean Exposure
from a Lognormal Distribution.  American Industrial Hygiene Association
Journal.  48:374-379.  1987.

Re, M.  Microcomputer Programs for the Evaluation of Predictable Long-Tenn
Exposure.  American  Industrial Hygiene Association Journal.  46:369-372.
1985.

Rock, James C.   A Comparison Between OSHA-Compliance Criteria  and
Action-Level Decision Criteria.  American Industrial Hygiene Association
Journal.  43:297-313.  1982.

Samuels, Steven J., Grace K. Lemasters, and Arch Carson. Statistical Methods
for Describing Occupational Exposure Measurements.  American Industrial
Hygiene Association Journal.  46:427-433. 1985.

Searle,  Shayle R., George Casella,  and Charles E. McCulloch.  Variance
Components.  John Wiley & Sons. New York, New York.  1992.

Schneider, Thomas, Ib Olsen, Ole Jorgensen, and Bjarne Laursen.  Evaluation
of Exposure Information. Applied Occupational  and Environmental Hygiene
6:475-481. 1991.

Selvin, S.  Review of draft version of "Guidelines for Statistical Analysis of
Occupational Exposure Data." Submitted to EPA Chemical Engineering Branch.
November, 1991.

Selvin, S. and S.M. Rappaport.  A Note on the Estimation of the Mean Value
from a  Lognormal  Distribution.  American  Industrial Hygiene  Association
Journal. 50:627-630.  1989.

Selvin,  S., et al.   A Note on the Assessment of Exposure Using One-Sided
Tolerance Limits.  American Industrial Hygiene Association Journal. 48:89-93.
1987.

Sokal, Robert R. and James Rohlf.  Biometry.  W. H.  Freeman and Company.
New York, New York.  1981.

SPSS, Inc., 444 N. Michigan Ave., Suite 3000, Chicago, IL 60611.  312/329-
2400.

Stoline, Michael R.  An Examination of the Lognormal and  Box and Cox
Family of Transformations in Fitting Environmental Data.  Environmetrics 2:85-
106.  1991.
                                             R-5

-------
(SYSTAT)

(Tait, 92)



(Tuggle, 81)


(Tuggle, 82)



(Waters, 91)



(Waters, 90)



(Whitmore, 85)



(Woodruff, 71)
Systat, Inc., 1800 Sherman Ave., Evanston, IL 60201.  312/864-5670.

Tait, Keith.  The Workplace Exposure Assessment Expert System
(WORKSPERT).  American Industrial Hygiene Association Journal 53:84-98.
1992.

Tuggle, R.  M.  The NIOSH Decision Scheme.  American Industrial  Hygiene
Association Journal. 42:493-498.  1981.

Tuggle, R.  M.   Assessment of Occupational Exposure  Using One-Sided
Tolerance  Limits.    American  Industrial  Hygiene  Association   Journal.
43:338-346. 1982.

Waters, Martha  A., Steve Selvin,  Stephen  M. Rappaport.  A Measure  of
Goodness-of-Fit for the Lognormal Model Applied to Occupational Exposures.
American Industrial Hygiene Association Journal 52:493-502.  1991.

Waters, Martha  A.   Some Statistical Considerations in  Chemical Exposure
Assessment.   Ph.D.  dissertation, University  of  California  at  Berkeley,
California.  1990.

Whitmore,  Roy  W.   Methodology for  Characterization of Uncertainty  in
Exposure Assessments.   U.S.  EPA, Office of Health and Environmental
Assessment, EPA/600/8-85/009.   August  1985.

Woodruff, Ralph S.  A Simple Method for Approximating the Variance of a
Complicated Estimate.  Journal of the American Statistical Association, pp.
411-414. June 1971.
                                             R-6

-------
                                   GLOSSARY OF TERMS


Accuracy - a measure of the correctness of the data, as given by the difference between the measured
        value and the true value.

Sample Mean - the sum of all the measurements in the data set divided by the number of measurements
       in the data set.

Bias - a systematic error inherent in a method or caused by some feature of the measurement system.

Bimodal Distribution - a probability density function with two relative maxima.

Bounding Estimate - an estimate of exposure that is higher than the exposure of the individual in the
        population with the highest exposure.  Bounding estimates are useful in constructing statements
        such as "... exposure is not greater than" the estimated value.

Confidence Interval - a range of values that contains the true value of a parameter in a distribution a
       predetermined proportion of time if the process of determining the value is repeated a number
       of times.

Descriptive Statistics - statistics that describe conditions and events in terms of the observed data; use is
        made of tables, graphs, ratios, and typical parameters such as location statistics (e.g., arithmetic
        mean) and dispersion statistics (e.g., variance).

Frequency Histogram - a graphical representation of a frequency distribution, typically using bars to
        exhibit the frequency or relative frequency of occurrence of each value or group of values in a
        data set.

Geometric Mean - the nth root of the product of n values.
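
As a numerical illustration of this definition, a minimal Python sketch (the sample values below are invented):

```python
import math

def geometric_mean(values):
    """n-th root of the product of n positive values,
    computed via logarithms to avoid overflow."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# The geometric mean of 1, 2, and 4 is the cube root of 8, i.e. 2.
gm = geometric_mean([1.0, 2.0, 4.0])
```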

High End Estimate - a plausible estimate of individual exposure for those persons at the upper end of an
        exposure distribution, conceptually above the 90th percentile, but not higher than the individual
        in the population with the highest exposure.

Homogeneous Categories - groups or categories with the same or similar modifying attributes.

Limit of Detection - the minimum concentration of an analyte that, in a given matrix and with a specific
        method, has a 99% probability of being identified, qualitatively or quantitatively measured, and
        reported to be greater than zero.

Log-normal Distribution - a probability distribution restricted to positive real values.  If the random
        variable Y has a log-normal distribution and X = log Y, then X has a normal distribution.

Maximally Exposed Individual (MEI) - a semiquantitative term referring to the extreme uppermost
        portion of the distribution of exposures.  For consistency, this term should refer to the portion
        of the individual exposure distribution that conceptually falls above the 98th percentile of the
        distribution, but is not higher than the individual with the highest exposure.

                                             G-i

-------
Maximum Likelihood Estimate - an estimate based on finding the values of parameters that give the
        maximum value of the likelihood function.  The likelihood function is the probability of observing
        the data, as a function of the parameters defining a distribution.  The maximum likelihood
        approach is applicable whenever the underlying distribution of the data is known or assumed.
        It is a common statistical estimation procedure.

Median - the value in a measurement data set such that half the measured values are greater and half are
        less.

Nonparametric Statistical Methods - methods that do not assume a functional form with identifiable
       parameters for  the statistical distribution of interest (distribution-free methods).

Normal Distribution -  a symmetric probability distribution  whose maximum height is  at the mean,
       applicable to positive and negative real numbers. The normal distribution is the common "bell-
       shaped" curve.  Also called a Gaussian distribution.

Precision - a measure of the reproducibility of a measured value under a given set of conditions.

Probability Sampling - sampling method in which each population element has a known and nonzero
       probability of  being selected.   Basic probability sampling methods  include simple  random
       sampling, stratified  sampling, and cluster sampling.

Quantification Limit -  the concentration of analyte  in a specific matrix for which the probability of
       producing analytical values above the method detection limit is  99%.

Random  Sampling - the selection of a sample of size n in such a way that each possible sample of size
       n has the same chance of being selected.

Reasonable Worst Case - a semiquantitative term referring to the lower portion of the high end of the
        exposure distribution.  For consistency, it should refer to a range that can conceptually be
        described as above the 90th percentile in the distribution, but below about the 98th percentile.

Representativeness - the degree to which a sample is, or samples are, characteristic of the whole medium,
       exposure, or dose for which the samples are being used to make inferences.

Sample - a  small part  of something designed to show the nature or quality of the whole.  Exposure-
       related measurements may be samples of exposures of a small subset of a population for a short
        time, for the purpose of inferring the nature and quality of the parameters important to evaluating
       exposure.

Sample Cumulative Distribution Function - a function that estimates the theoretical cumulative distribution
       function of a population.  If  a sample of n independent values  is available, the value of the
        sample cumulative distribution at x is the proportion of the sample values that are less than or
       equal to x.
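
As a concrete illustration of this definition, a minimal Python sketch (the data values are invented):

```python
def sample_cdf(values, x):
    """Sample cumulative distribution function evaluated at x:
    the proportion of sample values less than or equal to x."""
    return sum(1 for v in values if v <= x) / len(values)

data = [0.2, 0.5, 0.5, 1.1, 2.4]   # illustrative measurements
f = sample_cdf(data, 0.5)          # three of five values are <= 0.5
```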

Standard Deviation - a  measure of the variability of the values in a sample or a population. The positive
        square root of the variance of the distribution.
                                              G-2

-------
Statistical Inference - the process of using knowledge about samples to make statements about the
        population.

Statistical Significance - an inference that the probability of an observed pattern (with respect to the data
        being measured or the comparison being made) is so low that it is highly unlikely to have
        occurred by chance alone (within the constraints of the hypothesis being tested).  The inference
        is that the hypothesis being tested is probably not true; that hypothesis is rejected in favor of a
        stated alternative hypothesis.

Statistically Selected Sample - a sample chosen based on a statistically valid sampling plan.

Stratified Random Sample - a sample obtained by separating the population elements into nonoverlapping
        groups called strata, and then selecting a simple random sample for each stratum.

Theoretical  Cumulative  Distribution  Function - a  function that  uniquely  defines  the probability
        distribution of a random  variable, x.  The function specifies  the probability that the random
        variable assumes a value less than or  equal to x.

Worst Case - a semiquantitative term referring to the maximum possible exposure that can conceivably
        occur, whether or not this exposure actually occurs or is observed in a specific population.
                                              G-3

-------
         APPENDIX A

SPREADSHEET MATRIX FOR TYPE 1
      EXAMPLE DATA SET
         FULL SHIFT
      PERSONAL SAMPLES

-------
                                         APPENDIX A
       The data set presented in Appendix A represents 516 full-shift personal samples grouped into
58 initial categories.  In addition to these data, 37 short-term samples and 232 area samples were
collected.  Since these  data were not used in the example analysis they were set aside and are  not
presented in this appendix.
                                               A-l

-------
                       Table A-1.  Spreadsheet Matrix for Type 1 Example Data Set - Full Shift Personal Samples

[Table A-1 spans several pages.  For each of the 516 full-shift personal samples it lists the following columns: Plant ID, Industry, Process type, Job title, Control type (b), Sample duration (min), 8-hr TWA (ppm), and Control description (e.g., make-up air percentages and minimum face velocities of laboratory hoods).  The tabular entries in this reproduction are too badly garbled to be realigned and are omitted here.]
            PIOMM type
    CoMmlfMMi
    Practwi
    PncM*i
    ftaMMI
    PMC*MMM
         KibUUe


Provcu uckaiciac

PIOMU uclwicia*
UUiUMOfMttlor
UUiliM of mior

Caniml
V.OOS
0.022
<4>.006
0.00*
O.J04
0.056
00.006
0.2)0
0.02)
0494
(a) IOO - Ukul cateapoM
(b)Tk*febV>wM«aMllMc
   6) 40 BJMCMI (Mki-tip aii M Iks Ufcontoty.
    : SOMM «fdaU-N|OSH/EPAfi«td Italy. IjUutataiy aaalysis limit ofdelcclioa

       Stajplti IRMI Ika HMM a4aal aad pm«csi> type wcw t.uttc>.k
-------
                   APPENDIX B



BACKGROUND INFORMATION ON STATISTICAL METHODOLOGY

-------
                                         APPENDIX B

            BACKGROUND INFORMATION ON STATISTICAL METHODOLOGY
       This appendix presents background information for statistical methods used in these guidelines,
as well as others that may be useful in the context of occupational exposure monitoring. Some of the
topics include log-normal distributions, analysis of variance, data transformations, tests of distributions,
cluster analysis, outliers, and confidence intervals.  The engineer may wish to become familiar with these
methods and the statistical assumptions associated with each method.  References such as Massey, 51,
for the K-S test; Cochran, 63; Daniel, 78; Conover, 80; etc. should be obtained and consulted, as needed.
EPA statisticians should also be consulted, as required.
Box-and-Whisker Plot

       Box-and-whisker plots are useful for the graphical identification of possible outliers.  The box
plot presents a clear depiction of outliers, compared to the majority of the whole data set.  The box
portion of a box plot extends from the 25th percentile to the 75th percentile of the observed data (i.e.,
25% of the observations are at or below the bottom of the box and 25% are above the top of the box).
That range is called the interquartile range.  The whiskers extending from the box cover only 1.5 times
the interquartile range.  Any points outside 1.5 times the range are presented individually.  This allows
clear identification of outliers.
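The 1.5-times-interquartile-range rule described above can also be computed directly.  The sketch below is illustrative only: the function name is our own and the concentration values are hypothetical.

```python
import numpy as np

def boxplot_outliers(x):
    """Flag points outside 1.5 times the interquartile range,
    the same rule a box-and-whisker plot uses for its whiskers."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])   # 25th and 75th percentiles
    iqr = q3 - q1                         # interquartile range
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return x[(x < lower) | (x > upper)]

# Hypothetical 8-hr TWA readings (ppm); the 0.210 reading is flagged
samples = [0.012, 0.015, 0.011, 0.016, 0.014, 0.013, 0.210]
print(boxplot_outliers(samples))
```

Points inside the fences are drawn as the box and whiskers; only the flagged points are plotted individually.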
Analysis of Variance

        Analysis of variance is the basis for many statistical techniques.  It is applicable to normally
distributed data (observations for which the errors are assumed to be normally distributed), especially in
the context of testing for significance of possible explanatory variables.

        Nested analysis of variance is a particular form of analysis of variance that addresses the issues
associated with hierarchical (nested) data structures.  In such structures, the variations induced by one
variable are nested within (vary around) means that are dependent on the value of another variable, and
which may also vary.  Box (78) presents a nice discussion of nested designs and their analysis.  Samuels
(85) discusses the nested structure for occupational exposure data.
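As a minimal illustration of testing the significance of a possible explanatory variable, the sketch below applies a one-way analysis of variance to hypothetical log-transformed concentrations split by a single candidate variable.  (A nested analysis requires variance-component methods, such as the SAS procedure noted in Appendix C, and is not shown here.)

```python
from scipy.stats import f_oneway

# Hypothetical log-transformed concentrations grouped by a candidate
# explanatory variable (e.g., an engineering control in place or not)
group_a = [-4.2, -4.0, -4.5, -3.9, -4.1]
group_b = [-2.9, -3.1, -2.7, -3.3, -3.0]

# One-way ANOVA F-test for a difference between the group means
f_stat, p_value = f_oneway(group_a, group_b)
if p_value < 0.05:
    print("the candidate variable appears to explain exposure variation")
```

A small p-value indicates the between-group variation is large relative to the within-group variation.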
Tests of Distributions

        The guidelines assume a log-normal distribution, but there are three common approaches to
quantitatively testing groups of data to determine if they can be described by certain distributions: the
Shapiro-Wilk statistic, the Kolmogorov-Smirnov approach, and the ratio statistic.
                                              B-1

-------
        The Shapiro-Wilk statistic involves covariances between the order statistics of a standard normal
distribution.  It is similar to a test that examines the correlation (squared) between the observed order
statistics and hypothetical order statistics.  Order statistics are simply the observations (or hypothetical
values) arranged in ascending order: the first order statistic is the smallest value, the second order statistic
is the next smallest, etc.  Simulation studies have suggested  that the Shapiro-Wilk statistic is more
powerful than the Kolmogorov-Smirnov test.  Note that it can be applied only for testing for normality.
Bickel (77) gives a short discussion and  references to material on the Shapiro-Wilk statistic.
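In modern software the Shapiro-Wilk statistic is available directly; the sketch below, with hypothetical TWA values, tests log-normality by applying the test to the log-transformed data (since the test itself checks only normality).

```python
import math
from scipy.stats import shapiro

# Hypothetical 8-hr TWA concentrations (ppm)
twa = [0.006, 0.012, 0.017, 0.036, 0.044, 0.051, 0.076, 0.112]

# Test for log-normality: Shapiro-Wilk on the log-transformed values
stat, p = shapiro([math.log(c) for c in twa])
# A p-value above the chosen significance level gives no evidence
# against log-normality
```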

       The ratio test was proposed in Waters (91) as a procedure for testing for log-normality. It makes
use of two estimates of the mean of a log-normal distribution.  In fact, the ratio that gives this test its
name is the ratio of those two estimates and is very easy to  calculate.  Its  application requires the
estimation of the coefficient of variation (related to the geometric standard deviation) and use of tables
derived in Waters (91).  Those tables are not complete for large values of the coefficient of variation.
Waters (91) compared the ratio test favorably to the Shapiro-Wilk and Kolmogorov-Smirnov approach.

       The Kolmogorov-Smirnov (K-S) approach is a widely used technique. The particular application
presented here is for testing for normality, and has been called the Lilliefors test.  K-S approaches are
applicable more  generally for testing for a variety of distributions.

        The calculations needed to apply the Lilliefors test are discussed in some detail here.  The
procedure consists of the following:  1) deriving the sample cumulative distribution function for the
observed data; 2) calculating the sample mean of the data (which may be concentrations if testing for
normality or log-transformed concentrations if testing for log-normality); 3) calculating the sample
standard deviation of the data; 4) standardizing the data; 5) determining the theoretical cumulative
distribution; 6) identifying the value for passing the K-S test (the critical value); 7) calculating the
maximum difference between the theoretical cumulative distribution and the sample cumulative
distribution (the test statistic); and 8) determining if the data pass the test.

        1.  Derive the Sample  Cumulative Distribution Function

       The monitoring results for a group are arranged in ascending order, lowest value first and the
highest value last.  Next, the values for the sample cumulative distribution function are calculated on the
sorted data.  The cumulative distribution function for each data point is equal to the proportion of values
less than or equal to the given point, as presented in Equation B1.

                       SCDi  =  i / n                                                 Equation B1

              where:

              SCDi   =     the sample cumulative distribution function value for observation i
              n      =     number of data points.
       2.  Calculate the Sample Mean of the Data

       The sample mean of the data is calculated using Equation 9 (for the concentrations) or Equation 2
(for transformed data) from Step 19.
                                              B-2

-------
       3.  Calculate the Sample Standard Deviation of the Data

       The sample standard deviation of the data is calculated using Equation 10 (for the
concentrations) or Equation 3 (for transformed data) from Step 19.
       4.  Standardize the Data

       The purpose of this step is to standardize the data to the standard normal distribution curve.  The
equation for standardizing the transformed data is presented in Equation B2.

                         zi  =  (yi - SM) / SSD                                       Equation B2

               where:

               zi     =     a standardized data point
               SSD    =     the sample standard deviation of the data from 3 above
               SM     =     the sample mean of the data from 2 above
               yi     =     a data point (either a concentration or transformed concentration)

       Subtracting SM shifts the mean to zero, and then dividing by SSD scales the variable so that the
standard deviation is 1 rather than SSD.

       5.  Determine the Theoretical Cumulative Distribution

       This step consists of calculating the values corresponding to a theoretical (normal) cumulative
distribution function for the standardized transformed data.  The distribution may be calculated manually
using a standard normal table or determined by one of several statistical software packages (see
Appendix C).  A standard normal table may be found in many statistical texts, including Bickel (77).

       6.  Identify the Value for Passing the K-S Test

       Table B1 presents critical values for the Lilliefors test (Conover, 80).

       The critical values depend on the sample size and the level of statistical significance required.
For sample sizes between the values on Table B1, the value for the next highest sample size can be used.

       7.  Calculate the Differences Between the Values of the Theoretical Cumulative Distribution
           and the Sample Cumulative Distribution
        This step consists of subtracting the values of the theoretical cumulative distribution function from
the values of the sample cumulative distribution function and taking the absolute value, for each of the
data points.  The goal is to identify the maximum vertical difference between the sample and theoretical
cumulative distribution functions.  Since the sample cumulative distribution function is constant for values
between the data points, the differences examined should include those between the value of the sample
cumulative distribution function at a particular data point value and (1) the value of the theoretical
                                              B-3

-------
cumulative distribution function at that data point value and (2) the value of the theoretical cumulative
distribution function at the next data point value.
       8. Determine If the Data Pass the Lilliefors Test

       If none of the absolute values of the differences between the theoretical cumulative distribution
and the sample cumulative distribution exceed  the critical value identified in 6 above,  then it may be
concluded that the data can be described by a normal distribution.  If one or  more of the absolute
differences exceed the critical value, the normal distribution is not appropriate.
            TABLE B1.  CRITICAL VALUES FOR LILLIEFORS TEST (Conover, 80)

                                 Level of significance
     Sample size      0.20      0.15      0.10      0.05      0.01
          4          0.300     0.319     0.352     0.381     0.417
          5          0.285     0.299     0.315     0.337     0.405
          6          0.265     0.277     0.294     0.319     0.364
          7          0.247     0.258     0.276     0.300     0.348
          8          0.233     0.244     0.261     0.285     0.331
          9          0.223     0.233     0.249     0.271     0.311
         10          0.215     0.224     0.239     0.258     0.294
         11          0.206     0.217     0.230     0.249     0.284
         12          0.199     0.212     0.223     0.242     0.275
         13          0.190     0.202     0.214     0.234     0.268
         14          0.183     0.194     0.207     0.227     0.261
         15          0.177     0.187     0.201     0.220     0.257
         16          0.173     0.182     0.195     0.213     0.250
         17          0.169     0.177     0.189     0.206     0.245
         18          0.166     0.173     0.184     0.200     0.239
         19          0.163     0.169     0.179     0.195     0.235
         20          0.160     0.166     0.174     0.190     0.231
         25          0.142     0.147     0.158     0.173     0.200
         30          0.131     0.136     0.144     0.161     0.187
      Over 30      0.736/√n  0.768/√n  0.805/√n  0.886/√n  1.031/√n
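The eight steps of the Lilliefors procedure can be sketched in a few lines of code.  The sketch below is our own illustration (the data are hypothetical log-transformed concentrations); it follows the step numbering above.

```python
import math

def normal_cdf(z):
    """Theoretical standard normal cumulative distribution (step 5)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lilliefors_statistic(data):
    """Steps 1-7: the maximum vertical difference between the sample
    and theoretical cumulative distribution functions."""
    x = sorted(data)                                   # step 1: ascending order
    n = len(x)
    sm = sum(x) / n                                    # step 2: sample mean
    ssd = math.sqrt(sum((v - sm) ** 2 for v in x) / (n - 1))  # step 3
    d = 0.0
    for i, v in enumerate(x, start=1):
        z = (v - sm) / ssd                             # step 4: Equation B2
        tcd = normal_cdf(z)                            # step 5
        # Step 7: the sample CDF (Equation B1) is a step function, so
        # compare the theoretical value with the sample CDF just
        # before and at each data point
        d = max(d, abs(i / n - tcd), abs((i - 1) / n - tcd))
    return d

# Hypothetical log-transformed concentrations (n = 10)
logs = [-5.1, -4.4, -4.4, -4.1, -3.9, -3.6, -3.3, -3.1, -2.6, -2.2]
d = lilliefors_statistic(logs)
print(d <= 0.258)  # step 8: compare with Table B1 (n = 10, 0.05 level)
```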
                                              B-4

-------
Data Transformations

       The guidelines consider a simple data transformation, the log transformation.  That transformation
is just one of a family called the Box and Cox transformations.  Such transformations are often considered
prior to analysis of data in order to make the data more normal and to make the variances in different
groups more similar, both of which are desirable for most analysis of variance approaches, for example.
The reader is referred to Stoline (91) for a discussion of the Box and Cox family of transformations
applied to environmental data.  Samuels (85) also considers transformations other than the log
transformation for occupational exposure data.  The guidelines do not recommend transformations other
than the log transformation because of the computations involved, because the properties (e.g., mean and
standard deviation) of the log-normal distribution are well known whereas the interpretation and
calculation of descriptive statistics based on other transformations is not straightforward, and because the
log-normal distribution is an accepted distribution for concentration data.
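The log transformation is the lambda = 0 member of the Box and Cox family, which modern software exposes directly.  The sketch below uses hypothetical concentrations to show both the fixed log case and a data-driven choice of lambda.

```python
import numpy as np
from scipy import stats

# Hypothetical concentrations (ppm); Box-Cox requires positive values
conc = np.array([0.006, 0.012, 0.017, 0.036, 0.044, 0.076, 0.112, 0.304])

# lmbda = 0 in the Box-Cox family is exactly the log transformation
logged = stats.boxcox(conc, lmbda=0)

# With lmbda unspecified, the data select the transformation parameter
fitted, lam = stats.boxcox(conc)
```

As the guidelines note, interpreting statistics computed from transformations other than the log is not straightforward, so the fitted-lambda form is mainly a diagnostic.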
Log-normal Distribution

       The log-normal distribution has been studied and applied to concentration data for many years
(Aitchison, 57; Johnson, 70).  The estimation of the mean of the log-normal distribution is discussed in
detail in Attfield (92).  Note that the formula for the MLE in Attfield (92) is incorrect as stated: multiply
the formula given by exp(f) to get the corrected value for the MLE.  Confidence limits for the mean of a
log-normal distribution are presented in Armstrong (92). Samuels (85) shows how confidence intervals
for the concentration data means can be derived from standard deviations and standard errors associated
with transformed data.
Confidence Intervals

       The calculation of confidence intervals is an important means of presenting the degree of certainty
about the estimates of any particular parameter.  It is important to note that a confidence interval for a
mean, for example, must be based on the variance associated with that estimate, not with the variance
associated with the individual observations in the population.  Thus, the standard error of the mean
(which is the square root of the variance of the mean estimator) should be used to define a confidence
interval for the mean.

       Confidence intervals for means also depend on the data structure and the distribution of the data.
Although asymptotically (as the sample size gets very large) a mean will be normally distributed, no
matter what the underlying distribution of the observations may be, for relatively small sample sizes the
normal approximation may be poor.  Thus, confidence intervals for a log-normal mean, for example,
have been specifically defined (Armstrong, 92).  Standard errors and therefore confidence intervals can
be defined for transformed concentrations and converted back to the original scale (Samuels, 85).
Standard errors that take into account nested data structures can also be computed (Samuels, 85) and used
to define confidence intervals.
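As a simple illustration of converting a log-scale interval back to the original scale, the sketch below (our own, with hypothetical data) forms a t-interval on the log-transformed concentrations and exponentiates it.  Note that this yields a confidence interval for the geometric mean; an interval for the arithmetic mean of a log-normal requires the methods in Armstrong (92), which are not reproduced here.

```python
import math
from scipy import stats

def geometric_mean_ci(conc, alpha=0.05):
    """t-interval on the log scale, converted back to the original
    scale; a confidence interval for the geometric mean."""
    y = [math.log(c) for c in conc]
    n = len(y)
    sm = sum(y) / n                                   # log-scale mean
    ssd = math.sqrt(sum((v - sm) ** 2 for v in y) / (n - 1))
    se = ssd / math.sqrt(n)                           # standard error of the mean
    t = stats.t.ppf(1 - alpha / 2, n - 1)
    return math.exp(sm - t * se), math.exp(sm + t * se)

# Hypothetical concentrations (ppm)
low, high = geometric_mean_ci([0.012, 0.017, 0.036, 0.044, 0.076, 0.112])
```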
                                             B-5

-------
Techniques to Combine Groups

       One of the final quantitative steps in the guidelines is to obtain statistics for combinations of
groups. As is discussed in the text, this should only be attempted when appropriate. The only techniques
identified as appropriate are from stratified sampling theory. These techniques can be considered because
they allow for estimation of means and standard deviations across groups with widely different population
sizes.  The properties of these estimates are not known for nonrandom sampled data.  This fact should
be stated if such estimates are used.
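The stratified-sampling estimate of an overall mean weights each group's mean by its population size.  The sketch below is illustrative; the group names and worker counts are hypothetical.

```python
# Stratified estimate of an overall mean: each group's mean exposure is
# weighted by its population size (e.g., number of workers in the group)
groups = {
    "process technician": {"mean": 0.045, "workers": 120},
    "laboratory tech":    {"mean": 0.012, "workers": 30},
}

total = sum(g["workers"] for g in groups.values())
overall = sum(g["mean"] * g["workers"] / total for g in groups.values())
print(round(overall, 4))  # → 0.0384
```

As noted above, the properties of this estimate are not known for nonrandom sampled data, and that caveat should accompany any reported value.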
Cluster Analysis

       Another approach to defining groups for statistical analysis is based on a procedure known as
cluster analysis. That approach examines characteristics of the measurements within groups (clusters)
and determines when two groups are similar enough to be combined.  The cluster analysis approach is
described here in some detail.

       Cluster analysis is an iterative procedure by which clusters are combined. Combination proceeds
in order of similarity: the most similar groups are combined first, then the next most similar, etc.  Each
group of measurements (e.g., a set of observations sharing the same values for all the important exposure
parameters identified by the engineer or industrial hygienist) starts out as a single cluster; when two
groups are combined, the combined group replaces the two groups that were combined, for the purposes
of comparison with other groups and additional combination.

       In order to conduct a cluster analysis, some measure of  similarity is  required.  The simplest
measure, and one that can easily be used for routine application to occupational exposure data, is based
on the mean values of the measurements within groups: two groups  are considered  similar when the
difference between their mean values is small. This clustering method is referred to as the unweighted
pair-group method using arithmetic averages (UPGMA).
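UPGMA corresponds to "average" linkage in standard clustering software.  The sketch below (hypothetical group means, and a stopping threshold chosen purely for illustration) clusters exposure groups on their mean values alone, mirroring the simple similarity measure just described.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical mean concentrations (ppm) for six exposure groups
means = np.array([0.012, 0.014, 0.045, 0.048, 0.110, 0.015]).reshape(-1, 1)

# UPGMA = average linkage; on these 1-D means the distance between two
# points is simply the difference in means
tree = linkage(means, method="average")

# Stopping rule: stop combining once the difference in means exceeds 0.02
clusters = fcluster(tree, t=0.02, criterion="distance")
print(clusters)  # the 0.012/0.014/0.015 groups merge, as do 0.045/0.048
```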

       The advantage  of this method of clustering  is that it does  not require the specification or
assumption of an underlying distribution for the measurements within the groups.  A disadvantage is that
this method only compares the mean values within groups and does not consider other descriptors of the
within-group measurements, such as variation.  Some other methods for defining the similarity of groups
are discussed and compared with the UPGMA method in the SAS manual.  In some applications those
other methods may be more appropriate than the simple UPGMA procedure.  Consultation with a
statistician is recommended in those cases, and may even be required when the UPGMA method is all
that is desired.

       A cluster analysis can proceed until  all the groups are combined into one cluster.  Output from
a computer package will specify which clusters are combined at each step and the similarity (difference
in means for the UPGMA method) of the clusters combined at each step. The engineer can examine the
output and determine at what point the clustering is sufficient, where "sufficient" clustering is based on
consideration of sample sizes  attained, on  the  similarity of the  clusters that are combined, or  on a
combination of those two factors.

       The goal  of this procedure is to  increase sample sizes and define  uniform  groups.   It is
inappropriate to combine groups that are quite dissimilar, just to get big sample sizes.   Thus, some

                                             B-6

-------
decision by the engineer, in consultation with the statistician, must be made about the weight to be given
to the conflicting pressures of those two considerations (sample size vs uniformity).  It is recommended
that the engineer and statistician decide on a "stopping rule"  prior to the running of the cluster analysis.
The  stopping rule will specify the largest measure of similarity (largest difference in means for the
UPGMA method) that will be considered acceptable for combination to occur.  The knowledge of the
engineer and the statistician is required to select a stopping  rule, as there is currently no statistical test
or probabilistic measure that  can  tell  the user when the clustering of  groups is inappropriate.   An
examination of the initial groups, their means, and the overall mean for all groups may provide some
indication of a stopping rule to consider.

       One drawback to the cluster technique is that it can combine groups which do not belong together
from an engineering perspective.  A priori selection of appropriate and inappropriate groupings of data
based  on  engineering judgement can be used to prevent inappropriate clustering of the data.  The
ANOVA technique discussed in Step 15 does not have this problem. However, the ANOVA technique
is most appropriate for data from designed, controlled experiments.
                                               B-7

-------
           APPENDIX C

LISTING OF COMPUTER SOFTWARE FOR
   VARIOUS STATISTICAL ANALYSES

-------
                                      APPENDIX C

                       LISTING OF COMPUTER SOFTWARE FOR
                           VARIOUS STATISTICAL ANALYSES
Box-and-Whisker Plot

       There are many software packages available on the PC for this technique. These include CSS,
NWA Statpak, Solo, SPSS/PC Plus, Statgraphics, Statpac Gold, Systat/Sygraph, SAS, and BMDP.
Analysis of Variance

       Analysis of variance is a standard statistical tool available in the software packages CSS, NWA
Statpak, Solo, SPSS/PC Plus, Statgraphics, Statpac Gold, Systat/Sygraph, SAS, and BMDP.  Not all of
these packages can provide the results needed to obtain variance components for a nested analysis of
variance.  SAS has a special procedure, PROC NESTED, which does just that.


Distribution Tests

       The Shapiro-Wilk test is provided as an option in the SAS procedure PROC UNIVARIATE.

       Many statistical software packages have K-S type test procedures for the PC: CSS, NWA
Statpak, SPSS/PC Plus, Statgraphics, Statpac Gold, and Systat/Sygraph.  For these packages, the user
compares a normal distribution to a set of data.

Theoretical Cumulative Distribution

       The many software packages available for computing the standard normal theoretical cumulative
distribution function include CSS, NWA Statpak, Solo, SPSS/PC Plus, Statgraphics, Statpac Gold,
Systat/Sygraph, SAS, and BMDP.
                                           C-1

-------