Update of the Adult Lead Methodology’s Default Baseline Blood Lead Concentration and Geometric Standard Deviation Parameters

UNITED STATES ENVIRONMENTAL PROTECTION AGENCY
WASHINGTON, D.C. 20460
JUN 26 2009
OFFICE OF SOLID WASTE AND EMERGENCY RESPONSE
OSWER 9200.2-82
MEMORANDUM
SUBJECT: Transmittal of Update of the Adult Lead Methodology's Default Baseline Blood
Lead Concentration and Geometric Standard Deviation Parameters
FROM: James E. Woolford, Director
Office of Superfund Remediation and Technology Innovation
TO: Superfund National Policy Managers, Regions 1-10
Regional Risk Leads, Regions 1-10
Purpose
The purpose of this memorandum is to transmit the document, Update of the Adult Lead
Methodology's Default Baseline Blood Lead Concentration and Geometric Standard Deviation
Parameters. This guidance document provides the technical basis for updating the default
baseline blood lead concentration and default geometric standard deviation input parameters of
the Adult Lead Methodology. This document is primarily intended for Regional risk assessors.
Background
The Adult Lead Methodology (ALM) is used to assess lead risks from the soil at non-
residential Superfund sites. The baseline blood lead concentration input parameter of the ALM
represents the geometric mean blood lead concentration in women of child-bearing age and the
geometric standard deviation (GSD) input parameter is a measure of the inter-individual
variability in these concentrations.
Default values for these input parameters were originally derived from an analysis of
blood lead data for U.S. women 17-45 years of age, from Phase 1 (1988 to 1991) of the Third
National Health and Nutrition Examination Survey (NHANES III) as well as consideration of
available site-specific data on blood lead concentrations and GSDs. EPA prepared updated
estimates for these two parameters in 2002, using data from Phase 1 and 2 (1988 to 1994) of
NHANES III. The purpose of this report is to provide updated estimates for these parameters
using data from the NHANES surveys that were conducted from 1999-2004.
Implementation
This document provides updated values for the default blood lead concentration and the
geometric standard deviation input parameters of the Adult Lead Methodology. However, recent
scientific evidence has demonstrated adverse health effects at blood lead concentrations below
10 µg/dL down to 5 µg/dL, and possibly below. OSRTI is developing a new soil lead policy to
address this new information. Until that soil lead policy is finalized, regional risk assessors and
managers should consult with the Lead Committee of the Technical Review Workgroup for
Metals and Asbestos (TRW) before applying these updated values for risk assessment.
If you have any questions, please contact me or have your staff contact Aaron Yeow at
yeow.aaron@epa.gov or Michael Beringer at beringer.michael@epa.gov.
cc: Mathy Stanislaus, OSWER
Barry N. Breen, OSWER
Renee Wynn, OPM
Brigid Lowery, CPA
Debbie Dietrich, OEM
Matt Hale, ORCR
David Lloyd, OBLR
John Reeder, FFRRO
Carolyn Hoskinson, OUST
Elliott Gilberg, OSRE
David Kling, FFEO
OSRTI Managers
John Michaud, OGC
Wendy Lubbe, Superfund Lead Region Coordinator, Region 7
NARPM Co-Chairs

-------
OSWER 9200.2-82
June 2009
Update of the Adult Lead Methodology's
Default Baseline Blood Lead Concentration
and Geometric Standard Deviation Parameters
Prepared by the
Lead Committee of the
Technical Review Workgroup for Metals and Asbestos
Office of Superfund Remediation and Technology Innovation
United States Environmental Protection Agency

-------
Members of the Lead Committee of the
Technical Review Workgroup for Metals and Asbestos
Region 1: Mary Ballew
Region 2: Mark Maddaloni, Julie McPherson
Region 3: Dawn Ioven, Linda Watson
Region 4: Kevin Koporec
Region 5: Andrew Podowski
Region 6: Ghassan Khoury
Region 7: Michael Beringer (co-chair)
Region 8: Helen Dawson
Region 10: Marc Stifelman
OSRTI: Aaron Yeow (co-chair)
ORD NCEA-Cincinnati: Harlal Choudhury
ORD NCEA-RTP: Andrew Rooney
Associate Member: Scott Everett (Utah DEQ)
Advisors: Karen Hogan (ORD NCEA), Jim Luey (Region 8), Paul White (ORD NCEA), Larry Zaragoza (OSRTI)
Technical assistance provided by Syracuse Research Corporation (SRC)

-------
1.0 Introduction
In 1996 the Technical Review Workgroup for Lead (TRW) recommended the use of the Adult Lead
Methodology (ALM) (U.S. EPA, 1996) for assessing risks to adults from exposures to lead in soil at
non-residential Superfund sites. The background blood lead concentration (PbBadult,0) parameter in the
ALM represents the geometric mean blood lead concentration (PbB) (µg/dL) in women of child-bearing age,
in the absence of exposures at the site being assessed. The geometric standard deviation parameter
(GSDi,adult) is a measure of the inter-individual variability in blood lead concentrations in a population
whose members are exposed to the same non-residential environmental lead levels. Default values for
both PbBadult,0 and GSDi,adult were originally derived from an analysis of blood lead data for U.S. women
17-45 years of age, from Phase 1 (1988 to 1991) of the Third National Health and Nutrition Examination
Survey (NHANES III)¹ as well as consideration of available site-specific data on PbBs and GSDs (U.S.
EPA, 1996). The TRW prepared updated estimates for these two parameters in 2002, using data from
Phase 1 and 2 (1988 to 1994) of NHANES III (U.S. EPA, 2002). The purpose of this report is to provide
updated estimates for PbBadult,0 and GSDi,adult using data from the NHANES surveys that were conducted
from 1999-2004. Although the Centers for Disease Control and Prevention (CDC) releases data from the
continuous NHANES in 2-year increments, CDC recommends using four or more years of data when
estimating parameters for demographic sub-domains (CDC, 2005a).
This document provides the technical basis for updating the PbBadult,0 (GM) and GSDi,adult (GSD)
parameters and details on how the updated estimates for the parameters were calculated. The intended
audience for this document is risk assessors who are familiar with using the ALM. For background and
further detail on the use of the ALM in Superfund lead risk assessment, please refer to U.S. EPA (1996)
and the TRW Lead Committee website (http://www.epa.gov/superfund/lead).
2.0 Technical Approach
Information on PbB for non-institutionalized U.S. women 17-45 years of age was extracted from the
NHANES database (CDC, 2005b). Data from three 2-year cycles of the continuous NHANES (1999-2004)
were used in this analysis in accordance with CDC recommendations (CDC, 2005a). Results
reported at less than the detection limit of 0.3 µg/dL were analyzed by a variety of methods, including
substitution of 1/2 the detection limit (which is consistent with the 2002 analysis [U.S. EPA, 2002] and
EPA guidance [U.S. EPA, 1998]), and other statistical methods such as the Cohen maximum likelihood
estimation method (Cohen, 1959), the Tobit maximum likelihood estimation method (Tobin, 1958),
regression on order statistics (Newman et al., 1989) and the Kaplan-Meier method (Kaplan and Meier,
1958)². Further discussion of these methods is presented in the Appendix. Estimates of the PbBadult,0
and GSDi,adult were calculated using SAS® software, Version 9.1 of the SAS System for Microsoft
Windows³ and the sample weights recommended by CDC (2005a).
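The simple-substitution estimate described above can be sketched in a few lines. This is an illustrative Python sketch, not the SAS code used for the analysis: the function name, the use of `None` to mark a non-detect, the example records, and the choice of a weighted variance without a degrees-of-freedom correction are all assumptions made for the example.

```python
import math

def weighted_gm_gsd(concentrations, weights, detection_limit=0.3):
    """Return (GM, GSD) from weighted data; None marks a non-detect.

    Non-detects are replaced by 1/2 the detection limit before the
    log transform. The variance is the weight-normalized second moment
    in log space (an assumption; no degrees-of-freedom correction).
    """
    substitute = detection_limit / 2.0  # e.g. 0.15 ug/dL for DL = 0.3
    logs = [math.log(substitute if c is None else c) for c in concentrations]
    total_weight = sum(weights)
    mean_log = sum(w * x for w, x in zip(weights, logs)) / total_weight
    var_log = sum(w * (x - mean_log) ** 2
                  for w, x in zip(weights, logs)) / total_weight
    # GM and GSD are the anti-logs of the log-space mean and SD
    return math.exp(mean_log), math.exp(math.sqrt(var_log))

# Hypothetical records: PbB in ug/dL (None = below detection limit)
# paired with made-up sample weights.
pbb = [0.8, 1.4, None, 2.1, 0.9]
wts = [12000.0, 8500.0, 15000.0, 9000.0, 11000.0]
gm, gsd = weighted_gm_gsd(pbb, wts)
print(f"GM = {gm:.2f} ug/dL, GSD = {gsd:.2f}")
```

Because the sample weights enter only as relative weights, rescaling all of them by a constant leaves both estimates unchanged.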
¹ For more information see
http://www.cdc.gov/nchs/about/major/nhanes/nhanes2005-2006/faqs05_06.htm#question%2010 or EPA's
National Center for Environmental Assessment's Handbook for Use of Data from the National Health and
Nutrition Examination Surveys (NHANES): A Goldmine of Data for Environmental Health Analyses.
² This method is equivalent to the Kaplan-Meier approach in EPA ProUCL Software.
³ SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS
Institute Inc. in the USA and other countries. Although a newer version is available, it is unlikely the results would
be affected.
Standard errors for the estimates of the GM were estimated using the Taylor linearization method in
SUDAAN⁴ (Version 9.0, SAS-callable version). SUDAAN is designed to compute statistics (e.g., means
and percentiles) and their standard errors for data that are from complex sample surveys such as the
NHANES. SUDAAN does not calculate estimates of population variance, such as the GSD. To the best
of our knowledge, a Taylor linearization approach for estimating the standard error is not available for the
GSD; therefore, standard errors for the GSD were estimated using a SAS macro⁵ that implements a
jackknife method. Parameter estimates used the sample weights provided in the NHANES demographic
data files (CDC, 2005a). Standard errors for the GSD were estimated using the sample weights and the
masked variance units (i.e., pseudo-strata and pseudo-primary sampling units, which are also provided in
the NHANES demographic files). The sample weights account for the unequal probabilities of selection
of survey participants and for the non-response of some participants, and they are adjusted to population
controls. The masked variance units account for the multistage sampling design and are necessary to
estimate accurate standard errors for parameter estimates.
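A jackknife standard error for the GSD can be sketched as follows. The sketch assumes the common delete-one-PSU form, in which each pseudo-PSU is dropped in turn and the remaining PSUs in the same pseudo-stratum are reweighted. Whether the SAS macro used exactly this form is not stated here, so the variance formula, the function names, and the choice to jackknife the GSD itself (rather than its logarithm) are assumptions, and the data are made up.

```python
import math
from collections import defaultdict

def weighted_gsd(logs, weights):
    """Anti-log of the weighted standard deviation of log concentrations."""
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, logs)) / total
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, logs)) / total
    return math.exp(math.sqrt(var))

def jackknife_se_gsd(concentrations, weights, strata, psus):
    """Return (GSD, jackknife SE) using delete-one-PSU replicates."""
    logs = [math.log(c) for c in concentrations]
    full = weighted_gsd(logs, weights)
    psus_by_stratum = defaultdict(set)
    for h, j in zip(strata, psus):
        psus_by_stratum[h].add(j)
    variance = 0.0
    for h, members in psus_by_stratum.items():
        n_h = len(members)
        if n_h < 2:
            continue  # a single-PSU stratum contributes no variance here
        for dropped in members:
            rep_weights = []
            for w, s, p in zip(weights, strata, psus):
                if s != h:
                    rep_weights.append(w)            # other strata unchanged
                elif p == dropped:
                    rep_weights.append(0.0)          # drop this PSU
                else:
                    rep_weights.append(w * n_h / (n_h - 1))  # reweight the rest
            replicate = weighted_gsd(logs, rep_weights)
            variance += (n_h - 1) / n_h * (replicate - full) ** 2
    return full, math.sqrt(variance)

# Hypothetical data: two pseudo-strata with two pseudo-PSUs each.
pbb = [0.7, 1.2, 0.9, 2.4, 1.1, 0.6, 1.8, 1.0]
wts = [9000.0, 12000.0, 8000.0, 15000.0, 11000.0, 7000.0, 10000.0, 13000.0]
stratum = [1, 1, 1, 1, 2, 2, 2, 2]
psu = [1, 1, 2, 2, 1, 1, 2, 2]
gsd, se = jackknife_se_gsd(pbb, wts, stratum, psu)
print(f"GSD = {gsd:.2f}, jackknife SE = {se:.3f}")
```

The stratum and PSU arguments play the role of the masked variance units in the NHANES demographic files; the point estimate itself uses only the sample weights.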
3.0 Results
The 1999-2004 NHANES data provided 4,589 blood lead measurements for non-institutionalized U.S.
women 17-45 years of age. GM PbB and GSD were derived from the data using a variety of methods.
Table 1 presents the estimates for the GM PbB and GSD calculated by the various methods.
Table 1. Results of various methods for determining geometric mean baseline blood lead
concentration (GM) and geometric standard deviation (GSD) for U.S. women age 17-45
years for NHANES (1999-2004)

Method                                                GM (µg/dL)   GSD
----------------------------------------------------  ----------   ---
Assigning 1/2 detection limit (0.15) to non-detects   1.0          1.8
Cohen MLE (a)                                         1.0          1.8
Tobit MLE (b)                                         1.0          1.8
ROS (c)                                               1.0          1.7
Helsel Kaplan-Meier (d)                               1.0          1.8

(a) Cohen: Maximum likelihood estimation method of Cohen (1959).
(b) Tobit: Maximum likelihood estimation method by Tobit regression (Tobin, 1958).
(c) ROS: Regression on Order Statistics method (Newman et al., 1989).
(d) Helsel Kaplan-Meier: Kaplan-Meier estimate calculated with Helsel's KMStats.xls spreadsheet with Efron's
bias correction (Kaplan and Meier, 1958).
⁴ SUDAAN® is a registered trademark of the Research Triangle Institute. © 2005 Research Triangle Institute. All
rights reserved. Although a newer version is available, it is unlikely the results would be affected.
⁵ The SAS macro implements the 'leave one out' jackknife method (e.g., Research Triangle Institute. 2004.
SUDAAN Language Manual, Release 9.0. Research Triangle Park, NC).
The GM PbB is estimated to be 1.0 µg/dL, and the GSD is estimated to be 1.8. Table 2 presents the
updated estimates as well as the estimates from the previous analyses.
Table 2. Geometric mean baseline blood lead concentration (GM, µg/dL) and GSD
estimates of U.S. women, 17-45 years of age, for NHANES III and NHANES (1999-2004)

NHANES Data                               n       GM (µg/dL)   GSD
----------------------------------------  ------  ----------   -------
1988-1991 NHANES III - EPA ALM (1996)     -       1.7-2.2      1.8-2.1
1988-1994 NHANES III - EPA 2002 Update    5,016   1.5          2.1
1999-2004 NHANES - Current Update         4,589   1.0          1.8
4.0 Uncertainty
There are two main sources of uncertainty in this analysis: the unknown PbB concentrations for data
reported as non-detects and the potential bias in the estimates, particularly the GSD, that can occur when
a few observations have an undue influence on the estimate due to large sample weights.
As blood lead levels continue to decline in the U.S. population, the number of non-detects in the
NHANES data has the potential to become an important source of uncertainty in estimates of PbB and
GSD. However, the detection limit for measuring lead concentration in blood was lowered from
1.0 µg/dL for the 1988-1994 NHANES III to 0.3 µg/dL for the 1999-2004 NHANES. The lower
detection limit removes a considerable source of uncertainty that was present in previous estimates of the
GM (U.S. EPA, 2002), as the rate of non-detects in the NHANES 1999-2004 data (~2%) is much lower
than the rate of non-detects in the 1988-1994 NHANES III data (~21%). Nonetheless, the potential effect
of the non-detects on the robustness of the estimates was explored using a variety of methods, ranging
from simple substitution to more complex statistical methods (see Appendix for additional information).
An unbiased estimate for the GM can be made using any subset of the PbB concentrations by using the
sample weights included in the NHANES database. However, weighted estimates of population
variability, such as the GSD, have the potential to be unduly influenced by observations that receive large
sample weights. This source of uncertainty in the estimate of the GSD is partially addressed by
estimating the GSD using a regression approach that is described in the Appendix.
The uncertainty analysis yielded similar estimates of the GM and GSD regardless of the
method used to treat the non-detects (see Table 1). This increases confidence in the estimates and
indicates that the non-detects and sample weights do not have a substantial effect on the estimates of the
GM and GSD.

-------
5.0 Recommendations
Consistent with the 2002 report (U.S. EPA, 2002), estimates of the PbBadult,0 and GSDi,adult are provided for
the population of non-institutionalized U.S. women 17-45 years of age. Unlike the earlier analysis, the
TRW recommends using a single national estimate instead of the regional or ethnic alternatives.
Feedback from Regional risk assessors indicates that the regional and ethnic information is not useful
because populations move between regions and exposure is not typically ethnically homogeneous. Based
on this analysis of the NHANES 1999-2004 data, the updated values for the PbBadult,0 and GSDi,adult
parameters, 1.0 µg/dL and 1.8, respectively, are recommended for all applications of the ALM where
current and future use scenarios are assessed (see Table 3). These estimates have been shown to be robust
to the two sources of uncertainty addressed in the Uncertainty section and the Appendix.
Table 3. Recommended baseline blood lead concentration (µg/dL) and GSD estimates of
U.S. women, 17-45 years of age, from 1999-2004 NHANES data

NHANES Data         n       GM (µg/dL)   GSD
------------------  ------  ----------   ---
1999-2004 NHANES    4,589   1.0          1.8

-------
6.0 References
Blum, M. 1958. Recursion formulas for growing memory digital filters. IRE Transactions on
Information Theory. 4(1): 24-30. Available from:
http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?tp=&arnumber=1057439&isnumber=22768.
Centers for Disease Control and Prevention (CDC). 2005a. National Center for Health Statistics
(NCHS). Analytical and Reporting Guidelines. Hyattsville, MD: U.S. Department of Health and
Human Services, Centers for Disease Control and Prevention. Available from:
http://www.cdc.gov/nchs/about/major/nhanes/nhanes2003-2004/analytical_guidelines.htm.
Centers for Disease Control and Prevention (CDC). 2005b. National Center for Health Statistics
(NCHS). National Health and Nutrition Examination Survey Data. Hyattsville, MD: U.S.
Department of Health and Human Services, Centers for Disease Control and Prevention.
Available from: http://www.cdc.gov/nchs/nhanes/about_nhanes.htm.
Centers for Disease Control and Prevention (CDC). 2005c. NHANES 2003-2004 Demographic
and Weighting Variable List, December. Available from:
http://www.cdc.gov/nchs/data/nhanes/nhanes_03_04/vardemo_c.pdf.
Cohen, A.C. 1959. Simplified estimators for the normal distribution when samples are singly
censored or truncated. Technometrics. 1(3): 217-237.
Gilliom, R.J. and Helsel, D.R. 1986. Estimation of distributional parameters for censored trace
level water quality data. 1. Estimation techniques. Water Resources Research. 22(2): 135-146.
Goovaerts, P. 1997. Accounting for local uncertainty in environmental decision-making
processes. In: Geostatistics Wollongong. Baafi, E.Y. and Schofield, N.A. (eds.) 2: 929-940.
Haas, C.N. and Scheff, P. A. 1990. Estimation of averages in truncated samples. Environ. Sci.
Technol. 24(6): 912-919.
Helsel, D.R. 2005. Nondetects and data analysis: statistics for censored environmental data. John
Wiley & Sons, Inc.: Hoboken, NJ.
Kaplan, E.L. and Meier, P. 1958. Nonparametric estimation from incomplete observations. Journal of the
American Statistical Association. 53: 457-481.
Newman, M.C., Dixon, P.M., Looney, B.B., and Pinder, J.E. 1989. Estimating mean and
variance for environmental samples with below detection limit observations. Water Resources
Bulletin. 25(4): 905-916.
Newman, M.C., Greene, K.D., and Dixon, P.M. 1995. UnCensor, Version 4.0. User's Manual.
Savannah River Ecology Laboratory. Aiken, S.C.
Shumway, R.H., Azari, R.S., and Kayhanian, M. 2002. Statistical approaches to estimating mean
water quality concentrations with detection limits. Environ. Sci. Technol. 36(15): 3345-3353.
Tobin, J. 1958. Estimation of relationships for limited dependent variables. Econometrica. 26:
24-36.
U.S. Environmental Protection Agency (EPA). 1989. Risk Assessment Guidance for Superfund
Volume I Human Health Evaluation Manual (Part A). EPA 540-1-89-002. Available from:
http://www.epa.gov/oswer/riskassessment/ragsa/pdf/rags-vol1-pta_complete.pdf.
U.S. Environmental Protection Agency (EPA). 1996. Recommendations of the Technical Review
Workgroup for Lead for an Interim Approach to Assessing Risks Associated with Adult
Exposures to Lead in Soil. Available from:
http://www.epa.gov/superfund/programs/lead/prods.htm.
U.S. Environmental Protection Agency (EPA). 1998. Guidance for Data Quality Assessment:
Practical Methods for Data Analysis. U.S. EPA ORD, EPA/600/R-96/084 (EPA QA/G-9).
U.S. Environmental Protection Agency (EPA). 2002. Blood Lead Concentrations of U.S. Adult
Females: Summary Statistics from Phases 1 and 2 of the National Health and Nutrition
Examination Survey (NHANES III). OSWER #9285.7-52. March. Available from:
http://epa.gov/superfund/lead/products/nhanes.pdf

-------
Appendix: Uncertainty Analysis
This Appendix provides additional details on the methods that were used to assess the two sources of
uncertainty that could have an effect on the reliability of the estimates of the GM and GSD:
• The unknown PbB concentrations for the data reported as non-detects; and
• The potential bias in the estimates, particularly the GSD, which can occur when a few
observations have an undue influence on the estimate due to large sample weights.
Several methods were used to assess the robustness of the estimates to the two sources of uncertainty
described above, ranging from simple to more complex statistical methods. A search of the literature on
more advanced statistical methods (i.e., rather than simple substitution) for estimating the mean and
standard deviation with data that include non-detects found maximum likelihood estimation (MLE) and
regression on order statistics (ROS) methods to be the most often recommended. In addition to these
parametric methods, Helsel (2005) recommends the use of the non-parametric Kaplan-Meier method for
data that are multiply censored (e.g., more than one detection limit) and a robust version of the ROS
method. The use of more than one method builds confidence in the estimates when the methods produce
similar estimates.
The Kaplan-Meier method has a small positive bias when the smallest value is censored, which is the case
with the NHANES 1999-2004 PbB data (Helsel, 2005). Efron's bias correction was applied to address
this bias. The Kaplan-Meier method is not well suited for this analysis because it is designed for data
with multiple censoring levels. The 1999-2004 NHANES data have only one detection limit, and applying the
Kaplan-Meier method to singly censored data is equivalent to replacing the non-detects with the detection
limit (Helsel, 2005). Nonetheless, this method was included in the analysis at the suggestion of several
Regional Risk Assessors.
Helsel's robust regression on order statistics (ROS) method was not used in this analysis because the
advantage of this method over the common form of the ROS method is to reduce the bias that is
introduced when estimates of the arithmetic mean and standard deviation in log-space are required in the
original measurement scale. Estimation of the geometric mean and geometric standard deviation are not
susceptible to this bias because, by definition, the geometric mean and geometric standard deviation equal
the anti-log of the mean and standard deviation of the log-transformed data.
Based on the literature search, the MLE method of Cohen (1959) was selected because of the large
sample size available for this analysis (n = 4,589) and the good fit of the normal distribution to the
log-transformed data (Figure A-1). Furthermore, MLE methods are more precise (provide narrow confidence
intervals relative to other methods) and the Cohen method could be implemented with the complex survey
data of NHANES using commercially available software (SAS). While MLE methods are not unbiased,
the bias is a practical concern only with small sample sizes: Cohen recommends his method when
sample sizes are 10 or "... slightly larger..." (Cohen, 1959); Newman et al. (1995) suggest the bias is
important when the sample size is 20 or less; and Helsel (2005) recommends MLE methods for sample
sizes of 50 or more. Simulation tests have shown the Cohen MLE method produces estimates with lower
bias and higher precision than other methods for large sample sizes (Haas and Scheff, 1990; Newman et
al., 1989; Shumway et al., 2002).
The formulas that are used to calculate the MLEs for the mean and standard deviation of the log-
transformed data are shown below (Equations 1 and 2, respectively).
Where:
$\varepsilon = (DL - \mu)/\sigma$
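For orientation, one standard way to write the likelihood equations for a normal sample that is singly left-censored at DL is sketched below. This is textbook notation offered as an assumption, not necessarily the form in which Equations 1 and 2 were originally typeset: n is the total sample size, k is the number of non-detects, $\bar{x}$ and $s^2$ are the mean and variance (divisor n - k) of the detected log-transformed values, and f and F are the standard normal density and cumulative distribution functions.

```latex
\hat{\mu} = \bar{x} - \hat{\sigma}\,\frac{k}{n-k}\,\frac{f(\varepsilon)}{F(\varepsilon)},
\qquad
\hat{\sigma}^2 = \frac{s^2 + (\bar{x} - \hat{\mu})^2}
                      {1 + \varepsilon\,\frac{k}{n-k}\,\frac{f(\varepsilon)}{F(\varepsilon)}},
\qquad
\varepsilon = \frac{DL - \hat{\mu}}{\hat{\sigma}}
```

Because $\varepsilon$ depends on both unknowns, the two equations have to be solved jointly rather than one after the other.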
Figure A-1. Probability plot for blood lead concentration for U.S. women, 17-45 years old, using
only observations that are equal to or above the level of detection (0.3 µg/dL). The vertical axis is
the natural log of blood lead concentration in µg/dL; the horizontal axis is the normal score (z). The
fitted regression line is y = 0.0473 + 0.5348z (R² = 0.9703). The vertical dotted line shows where the
99th percentile intersects with the straight line. A lognormal distribution provides a reasonably good
fit to the estimated distribution, up to approximately the 99th percentile. This plot also serves as a
graphical description of the Newman et al. (1989) regression on order statistics method.
These equations are typically solved iteratively (e.g., Newman et al., 1995; Shumway et al., 2002), by
assuming $\varepsilon$ can be estimated by $F^{-1}(k/n)$ (the inverse of the cumulative normal
distribution function evaluated at k/n, the fraction of non-detects) (Newman et al., 1995), or by using
another form of Equations 1 and 2 with tabled values for an auxiliary estimation function ($\theta$)
provided by Cohen (1959) (Equations 3 and 4).
$\hat{\mu} = \bar{x} - \theta(\bar{x} - DL)$    Equation 3
$\hat{\sigma}^2 = s^2 + \theta(\bar{x} - DL)^2$    Equation 4
The auxiliary function ($\theta$) considers the frequency of non-detects and the difference between the
detection limit (DL) and sample average (in log units).
The MLE estimates of the GM and GSD, computed by taking the anti-log of the estimate of the
population mean (Equation 3) and the anti-log of the square root of the estimate of the population
variance (Equation 4), are 1.0 µg/dL and 1.8, respectively. These results are the same (to 0.1) as the
results obtained by the other methods, which increases the confidence in the estimates of the GM and
GSD previously obtained (Table 1). These results show that the estimates obtained with any of the
methods are robust to the uncertainty around the actual values of the non-detects.
The MLE method described above is approximate. The method assumes the data are independent (e.g.,
equally weighted), which is not the case with the NHANES PbB data. The estimates computed using the
MLE method as described above did not fully account for the NHANES sample weights. The sample
weights were used to compute the sample mean and variance ($\bar{x}$ and $s^2$, respectively); however, the
theory behind the adjustments to the sample mean and variance that remove the bias in the estimates is
based on the assumption that the blood lead data are independent normal variables (i.e., all sample
weights = 1/sample size). Given the low percentage of non-detects in the PbB data (~2%), the
adjustments have little effect on the estimates of the GM and GSD. A version of the MLE could be
developed that fully accounts for the NHANES weights; however, based on the above results, we expect
that for this data set estimates of the GM and GSD would be the same (to two significant digits), so we
did not pursue this additional level of complexity.
Helsel (2005) recommends an iterative solution to the MLE over the use of the auxiliary function. An
iterative solution to MLEs was computed using Tobit regression (Tobin, 1958). An advantage of the
Tobit regression method is that it can be used with data that include non-detects with more than one
limit of detection (LOD). As with the Kaplan-Meier method, this advantage is not useful for this dataset.
Nonetheless, it was included at the suggestion of several Regional Risk Assessors. The model was
estimated using SAS Proc LifeReg, which employs a Newton-Raphson algorithm to calculate the
maximum of the likelihood function. The GM and GSD were estimated as the mean and slope,
respectively, of the estimated regression model. The GM and GSD estimated by the Tobit regression
method (1.0 µg/dL and 1.8, respectively) are the same (to 0.1) as those estimated by the Cohen method
and similar to the other methods (Table 1).
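An iterative maximum likelihood solution for singly censored data can be illustrated with a short sketch. The code below is not the SAS Proc LifeReg procedure and does not use Newton-Raphson; it is a simple fixed-point iteration on the likelihood equations of a left-censored normal sample, unweighted, applied to simulated data. The function name, starting values, stopping rule, and simulated sample are all assumptions made for the example.

```python
import math
import random
from statistics import NormalDist

def censored_normal_mle(detected_logs, n_nondetect, log_dl,
                        tol=1e-10, max_iter=200):
    """MLE of (mu, sigma) for a normal sample left-censored at log_dl.

    Iterates on the likelihood equations
        mu      = xbar - sigma * r
        sigma^2 = (s2 + (xbar - mu)^2) / (1 + eps * r)
    with eps = (log_dl - mu) / sigma and r = k/(n-k) * f(eps)/F(eps).
    Intended for low censoring rates, where the iteration settles quickly.
    """
    std_normal = NormalDist()
    m = len(detected_logs)                      # n - k detected values
    xbar = sum(detected_logs) / m
    s2 = sum((x - xbar) ** 2 for x in detected_logs) / m
    ratio = n_nondetect / m                     # k / (n - k)
    mu, sigma = xbar, math.sqrt(s2)             # start from detected-only moments
    for _ in range(max_iter):
        eps = (log_dl - mu) / sigma
        r = (ratio * std_normal.pdf(eps) / std_normal.cdf(eps)
             if n_nondetect else 0.0)
        new_mu = xbar - sigma * r
        new_sigma = math.sqrt((s2 + (xbar - new_mu) ** 2) / (1.0 + eps * r))
        converged = abs(new_mu - mu) < tol and abs(new_sigma - sigma) < tol
        mu, sigma = new_mu, new_sigma
        if converged:
            break
    return mu, sigma

# Simulated (not NHANES) lognormal sample with GM 1.0 and GSD 1.8,
# censored at a detection limit of 0.3.
random.seed(1)
dl = 0.3
sample = [random.gauss(math.log(1.0), math.log(1.8)) for _ in range(5000)]
detected = [x for x in sample if x >= math.log(dl)]
k = len(sample) - len(detected)
mu_hat, sigma_hat = censored_normal_mle(detected, k, math.log(dl))
print(f"non-detects: {k}, GM = {math.exp(mu_hat):.2f}, "
      f"GSD = {math.exp(sigma_hat):.2f}")
```

With roughly 2% of the simulated values below the detection limit, the adjustment to the detected-only mean and standard deviation is small, which mirrors the observation above that the adjustments have little effect at this censoring rate.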
To address the uncertainty in the parameter estimates due to the sample weights, the regression on order
statistics (ROS) method recommended by Newman et al. (1989) was selected because it does not entail
squaring the sample weight to compute the estimate of the GSD. Cohen (1959) and Helsel (2005) also
recommend the ROS method for small sample sizes, although Helsel's recommendation is based on the
assumption that estimates of the mean and standard deviation must be back-transformed to the original
measurement scale, which is not an issue in this analysis.
The ROS method assumes the data are independent and from a normal distribution, but this method has
been shown to be robust to departures from normality (Gilliom and Helsel, 1986; Helsel, 2005). This
method was modified to fully account for the NHANES sampling weights. Under the assumption of
independent observations, the ROS method regresses the log-transformed data on the normal scores. The
normal scores are calculated as shown in Equation 5.⁶ This equation was modified to include the sample
weights, resulting in Equation 6. Equation 5 is a special case of Equation 6 where the weights are all
equal (and therefore cancel out).
⁶ The formula for calculating normal scores is from Goovaerts (1997). The differences between the normal scores
obtained using Equation 5 and more frequently used formulas (e.g., Blum [1958]) are trivial for large sample sizes.
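$Z_{(i)} = F^{-1}\left(\frac{r_i}{n} - \frac{0.5}{n}\right)$    Equation 5
$Z_{(i)} = F^{-1}\left(\sum_{j=1}^{r_i} w_{(j)} - 0.5\,w_i\right)$    Equation 6
Where:
$F^{-1}$ = inverse of the cumulative normal distribution function
$r_i$ = rank of the ith observation
$w_i$ = sample weight for the ith observation (adjusted so the weights sum to 1); $w_{(j)}$ is the weight of the observation with rank j
$n$ = number of observations in the sample
$Z_{(i)}$ = normal score of the ith observation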
In the ROS method, the log-transformed data above the detection limit are regressed on their
sample-weighted normal scores (Figure A-1). The mean and standard deviation are estimated by the intercept
and slope of the regression equation, respectively. The GM and GSD are computed by taking the anti-log
of the estimates of the mean (intercept) and standard deviation (slope). The estimate of the GM by the
ROS method is 1.0 µg/dL, the same estimate derived by the other methods (Table 1). The GSD estimated
by the ROS method, 1.7, is approximately 0.1 less than the estimates obtained by the other methods (see
Table 1), indicating there is no substantial bias in the estimate caused by large sample weights.
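The weighted ROS calculation can be sketched as follows. The sketch assumes that non-detects occupy the lowest ranks, that each normal score is the inverse normal of the cumulative weight minus half the observation's own weight, and that the detected log concentrations are fit to their normal scores by ordinary (unweighted) least squares. The function name, the use of `None` for non-detects, and the simulated data are assumptions, and this is not the code used for the analysis.

```python
import math
import random
from statistics import NormalDist

def weighted_ros_gm_gsd(concentrations, weights):
    """Return (GM, GSD) by regression on order statistics; None = non-detect."""
    total = sum(weights)
    # Rank order: non-detects first (all lie below the DL), then detects ascending.
    order = sorted(range(len(concentrations)),
                   key=lambda i: (concentrations[i] is not None,
                                  concentrations[i] or 0.0))
    std_normal = NormalDist()
    z, y = [], []
    cumulative = 0.0
    for i in order:
        w = weights[i] / total                 # weights rescaled to sum to 1
        cumulative += w
        if concentrations[i] is not None:      # only detects enter the regression
            z.append(std_normal.inv_cdf(cumulative - 0.5 * w))
            y.append(math.log(concentrations[i]))
    # Ordinary least squares of log concentration on normal score
    n = len(z)
    z_mean, y_mean = sum(z) / n, sum(y) / n
    slope = (sum((a - z_mean) * (b - y_mean) for a, b in zip(z, y))
             / sum((a - z_mean) ** 2 for a in z))
    intercept = y_mean - slope * z_mean
    return math.exp(intercept), math.exp(slope)

# Simulated (not NHANES) lognormal sample, GM 1.0 and GSD 1.8, DL 0.3,
# with equal weights so that the weighted scores reduce to (rank - 0.5)/n.
random.seed(2)
values = [math.exp(random.gauss(0.0, math.log(1.8))) for _ in range(5000)]
pbb = [v if v >= 0.3 else None for v in values]
gm, gsd = weighted_ros_gm_gsd(pbb, [1.0] * len(pbb))
print(f"GM = {gm:.2f}, GSD = {gsd:.2f}")

# Anti-logs of the intercept and slope of the fitted line reported with
# Figure A-1 (y = 0.0473 + 0.5348z):
print(round(math.exp(0.0473), 1), round(math.exp(0.5348), 1))  # prints 1.0 1.7
```

The last line reproduces the ROS entries in Table 1 directly from the regression line shown with Figure A-1.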
The consistency in the estimates strongly indicates the estimates recommended in Table 3 are robust to
the uncertainty of the actual values of the non-detects and indicates the potential for bias caused by a few
large sample weights is not a concern.