DRAFT TECHNICAL REPORT
on
MODELS FOR NATIONAL LEAD LABORATORY ACCREDITATION PROGRAM (NLLAP) EXPANSION

May 27, 1999

Prepared by
Steven M. Bortnick, Abi Katz-Stein, Peter A. Chace, and Ann M. Herberholt
BATTELLE
505 King Avenue
Columbus, Ohio 43201-2693

for
John Scalera, Work Assignment Manager
Office of Pollution Prevention and Toxics
U.S. Environmental Protection Agency
Washington, D.C. 20460

EPA Contract No. 68-W-99-033
Work Assignment 1-4

DISCLAIMER

The material in this document has not been subject to Agency technical and policy review. Views expressed by the authors are their own and do not necessarily reflect those of the U.S. Environmental Protection Agency. Mention of trade names, products, or services does not convey, and should not be interpreted as conveying, official EPA approval, endorsement, or recommendation. Do not quote or cite this document.

This report is copied on recycled paper.

TABLE OF CONTENTS

EXECUTIVE SUMMARY
1.0 INTRODUCTION
    1.1 BACKGROUND
    1.2 SCOPE OF REPORT
    1.3 OBJECTIVE OF REPORT
    1.4 ORGANIZATION OF REPORT
2.0 DEFINITIVE LABORATORY/ANALYSIS
    2.1 DEFINE AND SPECIFY REQUIREMENTS FOR DEFINITIVE CLASSIFICATION
    2.2 ANALYSIS OF ELPAT DATA FOR NLLAP RECOGNIZED LABORATORIES
    2.3 PRESENT AND DEFINE A DECISION TREE FOR 95 PERCENT CONFIDENCE
3.0 SEMI-QUANTITATIVE LABORATORY/ANALYSIS
    3.1 CHARACTERIZE XRF PRECISION
        3.1.1 XRF Performance from Characteristic Sheets
        3.1.2 EPA Field Study on XRF Measurement Precision
        3.1.3 Field Investigation of On-Site Techniques (Portable XRF)
        3.1.4 Using XRF Technology for Soil Analysis
    3.2 CHARACTERIZING ULTRASONIC EXTRACTION/ANODIC STRIPPING VOLTAMMETRY (UE/ASV) PRECISION
        3.2.1 EPA Evaluation of the PaceScan 2000
        3.2.2 Interlaboratory Evaluation of UE/ASV Lead Measurements on Paint, Dust, and Soil
        3.2.3 Field Investigation of On-Site Techniques (UE/ASV)
        3.2.4 Laboratory Evaluation of the PaceScan 2000
        3.2.5 Other Studies Considered
    3.3 PRESENT AND DEFINE A DECISION TREE FOR 95 PERCENT CONFIDENCE
4.0 QUALITATIVE LABORATORY/ANALYSIS
    4.1 USING QUALITATIVE ANALYSES AS NEGATIVE SCREENS
    4.2 USING QUALITATIVE ANALYSES AS POSITIVE SCREENS
    4.3 FIELD TEST RESULTS FROM EVALUATING CHEMICAL TEST KITS
    5.5 ALTERNATIVES TO THE GRAY-ZONES PROVIDED IN THIS REPORT
        5.5.1 An Alternative Approach that Avoids Gray-Zone Calculations
        5.5.2 An Alternative Gray-Zone Calculation
    5.6 CONCLUSION
6.0 REFERENCES
APPENDIX A: GLOSSARY
APPENDIX B: DETAILS FOR DATA ANALYSES CONDUCTED

List of Tables

Table 2.1. Gray-zone comparison between definitive laboratory requirements and NLLAP recognized laboratories
Table 3.1. Gray-zones (mg/cm²) for various XRF instruments when measuring lead levels on painted surfaces, based on PCS data
Table 3.2. Gray-zones defined as 1.0 ± two times the precision in mg/cm², where 0 is the lowest limit. Results are based on the above described field test data
Table 3.3. PaceScan 2000 results based on data from (EPA 600/R-95/093), April 1996
Table 3.4. Results based on data from interlaboratory evaluation of UE/ASV
Table 3.5. Results based on data from analysis of TCLP Extracts (PaceScan 2000)
Table B.1. Regression parameters for an NLLAP accredited laboratory's precision as a function of the true lead level
Table B.2. Descriptive statistics for paint, dust, and soil sample means for full and reduced data sets
Table B.3. Estimated regression parameters for the response component and the SD component

List of Figures

Figure 2.1. Decision tree for making statements with 95 percent confidence using a Definitive Laboratory
Figure 3.1. Decision tree for making statements with 95 percent confidence using a Semi-Quantitative Laboratory
Figure 4.1. Hypothetical operating characteristic (OC) curve of a chemical test kit analyzing lead in paint (demonstrating qualitative analysis performance considered appropriate as a negative screen)
Figure 4.2. Hypothetical operating characteristic (OC) curve of a chemical test kit analyzing lead in paint (demonstrating qualitative analysis performance considered appropriate as a positive screen)
Figure 4.3. Decision tree for making statements with 95 percent confidence, using both a negative and positive screen
Figure B.1. Relationship between overall standard errors and means for NLLAP recognized laboratory analyses of paint, dust, and soil

EXECUTIVE SUMMARY

This report presents options and issues associated with the expansion and redesign of the current National Lead Laboratory Accreditation Program (NLLAP) to cover laboratories and lead testing firms generating data for the evaluation of potential lead hazards from paint chips, dust, and soils. A field-decision performance based model for providing 95 percent confidence in decision making is developed for laboratories and testing firms engaged in at least one of three types of analysis: definitive, semi-quantitative, and qualitative.
Definitive analyses are reflective of the NLLAP's current laboratory quality system requirements (LQSR). Semi-quantitative analyses are defined as quantitative analyses that produce data which do not meet the performance requirements for a definitive analysis but are still of sufficient quality to support a decision at the lead level of concern. Qualitative analyses do not provide quantitative information but are still capable of determining the presence or absence of lead. Every analyzing instrument has associated measurement error. When a recorded lead level is near the action level, the uncertainty associated with measurement creates a "gray-zone" in decision making. This gray-zone is a band around the action level where the true concentration of lead cannot be judged, with a required amount of certainty, to be above or below the action level, due to imprecision of the measuring instrument. A definitive laboratory is quantitatively defined as a laboratory having a confirmed gray-zone that is no larger than plus or minus 20 percent of the action level of concern. Analyses of ELPAT data for NLLAP recognized laboratories from rounds 14-21 show that on average, these laboratories are able to perform within the prescribed standard of the action levels plus or minus 20 percent for paint chips, dust, and soil. In other words, the performance of NLLAP recognized laboratories meets the requirement for definitive laboratories. The field-decision performance based model of this report recommends that results lying within the gray-zone of a definitive laboratory be classified as positive for lead above the action level. This conclusion is conservative and protects those at risk for exposure to lead. Analyses of available data for field-portable X-ray fluorescence (XRF) instruments and ultrasonic extraction/anodic stripping voltammetry (UE/ASV) instruments used to detect lead were performed. 
Results show that laboratories using XRF instruments tend to perform at the semi-quantitative level near the action level of concern. Laboratories using UE/ASV instruments gave results near the action level of concern that could sometimes be classified as definitive and sometimes as semi-quantitative. The field-decision performance based model of this report recommends that semi-quantitative laboratory results that fall within the laboratory's gray-zone be sent to a definitive laboratory for a more accurate confirmatory analysis. Any results outside the gray-zone indicating the presence or absence of lead are accepted as accurate with 95 percent confidence.

Laboratories or testing firms using chemical test kits that indicate only the presence or absence of lead are classified as qualitative laboratories. Currently available test kits are useful as either a negative screen (i.e., whether the amount of lead is below the action level) or a positive screen (i.e., whether the amount of lead is above the action level), but not both. The field-decision performance based model of this report recommends classifying negative results from negative screens as below the action level, and positive results from positive screens as above the action level. The samples associated with other qualitative results must be sent to a definitive laboratory for a more accurate confirmatory analysis. A decision rule is given for cases when both negative and positive screens are available. If both screens agree, the result may be accepted as accurate with 95 percent confidence. If the screens reach opposite conclusions, a decision cannot be reached with 95 percent confidence, and it is recommended that the sample be sent to a definitive laboratory for further analysis.

In summary, a field-decision performance based model is recommended for providing 95 percent confidence in decision making.
The form of the model will depend on the type of analysis being performed: definitive, semi-quantitative, or qualitative.

1.0 INTRODUCTION

1.1 BACKGROUND

In the FY92 appropriations bill, Congress identified the Environmental Protection Agency (EPA) as the federal agency responsible for establishing an accreditation program for laboratories participating in the analysis of lead in paint chips, soils, and dust wipes, as part of a national home lead-based paint abatement and control program. The Office of Pollution Prevention and Toxics (OPPT) has established the National Lead Laboratory Accreditation Program (NLLAP) to help assure parties utilizing the services of laboratories recognized by NLLAP that the laboratories are capable of adequately performing lead analysis.

NLLAP recognition of laboratories analyzing lead in paint chips, soils, and dust wipes has two requirements: (1) successful participation in proficiency testing using real world matrices, and (2) laboratory accreditation including on-site assessment of laboratory operations. The Environmental Lead Proficiency Analytical Testing (ELPAT) program is designed to administer this proficiency testing and assessment program. Design of the NLLAP is based on the recommendations of a Federal Interagency Taskforce on Lead-Based Paint, a group of 17 federal agencies involved with lead issues, that recognition should be based upon both proficiency testing and laboratory accreditation.

Currently the NLLAP applies to laboratories performing analysis on collected samples using quantitative methods. Laboratories or testing firms which perform analysis directly on the area in question (on site, in-situ) or use methodologies which produce data of less accuracy (semi-quantitative or qualitative) than required by the current NLLAP accreditation program are not presently covered.
This report presents options and issues associated with expanding and redesigning the current NLLAP to cover all laboratories and testing firms generating data for the evaluation of potential lead-poisoning hazards from paint chips, soils, and dust.

1.2 SCOPE OF REPORT

This report develops a field-decision performance based model for NLLAP to address laboratories and lead testing firms that are engaged in at least one of three types of analysis: definitive, semi-quantitative, and qualitative. The model is expressed as a decision tree, with a separate tree being proposed for each type of analysis. Definitive analyses are able to meet strict requirements for accuracy in measurements at the lead action level of concern. Semi-quantitative analyses are defined as quantitative analyses that produce data which do not meet the performance requirements for a definitive analysis, but are still of sufficient quality to support a decision on the presence of hazardous levels of lead in paint chips, dust, and soil with 95 percent confidence. Qualitative analyses do not provide quantitative information about lead levels but are still useful in determining the presence or absence of lead at levels above or below an action level. See Section 1.3 for more complete definitions.

In addition, the performances of three lead testing technologies are evaluated, based upon performance specifications found in literature and past studies. These technologies are portable X-ray fluorescence (XRF), ultrasonic extraction/anodic stripping voltammetry (UE/ASV), and chemical test kits. The suitability of these instruments for evaluating the presence or absence of lead above an action level is of interest. Specifically, the types of analyses that might be conducted using such technologies are explored.
For example, the results of previous studies are considered as evidence about whether or not a laboratory utilizing a given brand of portable XRF instrument can be considered a definitive laboratory (i.e., whether definitive-type precision is demonstrated).

1.3 OBJECTIVE OF REPORT

The report is designed to address the following issues:

• Evaluate the lead analysis capability of laboratories and/or testing firms using field portable XRF instruments, UE/ASV instruments, and chemical test kits.
• Identify and define the different types of laboratories or testing firms to be covered under the expanded NLLAP.
• Construct lead analysis decision models for the different types of laboratories or testing firms to be covered under an expanded NLLAP.

1.4 ORGANIZATION OF REPORT

Section 2.0 defines the requirements for a laboratory or testing firm to be categorized as producing definitive data. An analysis of recent ELPAT data for NLLAP accredited laboratories is provided to evaluate the level of performance of those laboratories. The goal of this analysis is to establish that the current definition of a definitive laboratory (see Appendix A) is reasonable. A decision rule for evaluating the presence or absence of lead above an action level using definitive analysis is presented.

Section 3.0 defines requirements for a laboratory or testing firm to be categorized as producing semi-quantitative data. The precision of laboratories or testing firms using XRF and UE/ASV technology at lead action levels is evaluated and the requirements for these analyses to be classified as semi-quantitative are discussed. The findings for XRF and UE/ASV technology are located in this section because such technologies, as noted later, often perform at the semi-quantitative level. A decision rule for evaluating the presence or absence of lead above an action level using semi-quantitative data is presented.
Section 4.0 defines requirements for a laboratory to be categorized as producing qualitative data. The precision of chemical spot-test kits is evaluated. Chemical test kits are presented in this section because this technology produces qualitative data. A decision rule for evaluating the presence or absence of lead above an action level using qualitative data is presented.

Section 5.0 discusses other important issues related to the expansion of NLLAP, not necessarily covered by the results in this document. The purpose of this section is to raise pertinent unresolved issues in order that they may be properly addressed in the future.

Section 6.0 includes references to the studies evaluated in this report, and other material that is referenced in this report.

A glossary of key terms used in this report is found in Appendix A. Appendix B expands on some of the more detailed statistical issues covered in this report, including evaluating ELPAT data for NLLAP recognized laboratories, the performance of analyses using XRF technology, and the performance of analyses using UE/ASV technology.

2.0 DEFINITIVE LABORATORY/ANALYSIS

This section presents concepts related to analyzing paint, dust, and soil for lead hazard under Definitive Laboratory conditions. Definitions for Definitive Laboratory, Definitive Laboratory gray-zone, and examples are provided.

Every analyzing instrument has some associated measurement error. When a lead level is near the action level, the uncertainty associated with measuring this lead level creates a "gray-zone" in decision making near the action level. For example, if the paint lead level is substantially above (or below) the action level of 1.0 mg/cm², then the instrument's uncertainty has minimal impact on the conclusion that the lead level is above (or below) the action level, and it is safe to make that conclusion with a reasonable amount of certainty.
If, however, a paint sample is very near 1.0 mg/cm², the instrument's imprecision will impact the ability to make the correct conclusion with the desired confidence (e.g., 95 percent confidence), hence creating a "gray-zone."

Laboratories are defined as possessing an inherent gray-zone associated with the instrument or analytical method they use, along with other factors that might affect accuracy. A Definitive Laboratory is defined as a laboratory having a confirmed gray-zone that is no larger than plus or minus 20 percent of an action level with 95 percent confidence. Such a gray-zone is considered to be sufficiently small.

2.1 DEFINE AND SPECIFY REQUIREMENTS FOR DEFINITIVE CLASSIFICATION

Here, the term "gray-zone" is defined as an approximate 95 percent confidence interval for what the laboratory will observe if the true lead amount is at the action level. Thus an observation within the gray-zone fails to provide sufficient information to allow a conclusion to be made that the true lead amount in the sample is above or below the action level.

The definitive requirement that this gray-zone falls within ± 20 percent of the action level roughly translates to a 10 percent coefficient of variation requirement at the action level, to be demonstrated by the laboratory. With the mean and variance of the measurement at the action level represented by $\mu_A$ and $\sigma_A^2$, respectively, and assuming an unbiased methodology and normality in the measurements, the previous statement can be seen as follows:

$$\text{95\% C.I.} \;\Rightarrow\; \mu_A \pm 2\sigma_A,$$

$$\text{20\% rule} \;\Rightarrow\; \frac{2\sigma_A}{\mu_A} \le 0.20,$$

so $\sigma_A/\mu_A$, the coefficient of variation at the action level, must be $\le 0.10$, or 10 percent.
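The relationship between the gray-zone and the coefficient of variation can be checked numerically. The following is a minimal sketch, assuming an unbiased method, normally distributed measurements, and the approximate 95 percent multiplier of 2 used in the derivation above. The function names `gray_zone` and `meets_definitive_requirement` are illustrative only and are not part of any NLLAP specification.

```python
def gray_zone(action_level, sd, k=2.0):
    """Approximate 95 percent interval: action level +/- k standard deviations."""
    half_width = k * sd
    return (action_level - half_width, action_level + half_width)


def meets_definitive_requirement(action_level, sd, tolerance=0.20, k=2.0):
    """True when the gray-zone half-width is no larger than tolerance * action level.

    With k = 2 and tolerance = 0.20 this is the same as requiring a
    coefficient of variation (sd / action_level) of at most 0.10.
    """
    return k * sd <= tolerance * action_level


if __name__ == "__main__":
    paint_action_level = 1.0  # mg/cm2, the paint action level used in the text
    for cv in (0.05, 0.30):   # illustrative coefficients of variation (assumed values)
        sd = cv * paint_action_level
        low, high = gray_zone(paint_action_level, sd)
        verdict = meets_definitive_requirement(paint_action_level, sd)
        print(f"CV={cv:.2f}: gray-zone=({low:.2f}, {high:.2f}), definitive={verdict}")
```

A coefficient of variation of 5 percent gives a gray-zone inside the ± 20 percent band, while 30 percent gives one well outside it.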
When action levels correspond to the proposed §403 hazard standards, the following list translates the above-defined Definitive Laboratory requirement to gray-zone requirements for analysis of paint, dust, and soil samples:

• 1.0 ± 0.2 mg/cm² for paint (or 0.5% ± 0.1% lead by weight) (under the statutory definitions for lead-based paint)
• 50 ± 10 µg/ft² for dust on floors
• 250 ± 50 µg/ft² for dust on window sills
• 800 ± 160 µg/ft² for dust on window wells (under the interim §403 guidance)
• 2,000 ± 400 ppm for soil.

2.2 ANALYSIS OF ELPAT DATA FOR NLLAP RECOGNIZED LABORATORIES

ELPAT data from rounds 14 to 21 for NLLAP recognized laboratories were considered for analysis in order to determine whether such laboratories, currently considered definitive in performance, do in fact achieve such precision. For each medium (paint, dust, and soils), each lab analyzed four samples per round, yielding a total of 32 analyses per lab over the eight rounds. The overall set of observed mean values was treated as the set of true values, and the corresponding standard errors were treated as a function of the truth. Ordinary least squares regression was used to determine the parameters defining the approximate linear relationship between the lead level and the precision of NLLAP recognized laboratories. Details about the statistical modeling used in this analysis, as well as summary statistics for available data, are provided in Appendix B, Section 1. Note that the parameters presented in Table B.1 are used to calculate a "precision" value at the action levels of concern. This precision is then used to calculate the gray-zone, which equals the action level plus or minus 2 times the precision.

The theoretical Definitive Laboratory gray-zones and the NLLAP observed gray-zones for paint, dust, and soils are presented below in Table 2.1.
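The calculation described above (regress precision on lead level, evaluate the fitted line at the action level, then form the gray-zone) can be sketched as follows. This is only an illustration under stated assumptions: the data points are made-up values chosen to lie on a straight line, not ELPAT results, and the function names are hypothetical. The model and fitted parameters actually used in this report are those given in Appendix B.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def gray_zone_at(action_level, intercept, slope):
    """Gray-zone = action level +/- 2 * (fitted precision at the action level)."""
    precision = intercept + slope * action_level
    return (action_level - 2.0 * precision, action_level + 2.0 * precision)


if __name__ == "__main__":
    # Hypothetical sample means (treated as true lead levels, mg/cm2) and the
    # corresponding standard errors. These are NOT values from the ELPAT data.
    sample_means = [0.5, 1.0, 2.0, 4.0]
    standard_errors = [0.035, 0.060, 0.110, 0.210]

    intercept, slope = fit_line(sample_means, standard_errors)
    low, high = gray_zone_at(1.0, intercept, slope)
    print(f"intercept={intercept:.3f}, slope={slope:.3f}")
    print(f"gray-zone at 1.0 mg/cm2: ({low:.2f}, {high:.2f})")
```

The resulting interval is then compared with the ± 20 percent band around the action level, which is the comparison summarized in Table 2.1.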
Table 2.1 Gray-zone comparison between definitive laboratory requirements and NLLAP recognized laboratories

| Medium | Action Level (± 20%) | Definitive Gray-Zone | NLLAP Observed Gray-Zone |
|---|---|---|---|
| Paint (mg/cm²) | 1.0 ± 0.2 | [0.80, 1.20] | [0.87, 1.13] |
| Dust, floor (µg/ft²) | 50 ± 10 | [40.0, 60.0] | [36.13, 63.87]¹ |
| Dust, window sills (µg/ft²) | 250 ± 50 | [200, 300] | [210, 290] |
| Dust, window wells (µg/ft²) | 800 ± 160 | [640, 960] | [688, 910] |
| Soil (ppm) | 2000 ± 400 | [1600, 2400] | [1789, 2211] |

¹ Based on data mostly above 50 µg/ft² (see Figure B.1 of Appendix B). Result should be interpreted with caution.

The results show that on average, the NLLAP recognized laboratories are able to perform within the prescribed standard of the action level plus or minus 20 percent. The definitive-type performance of the NLLAP recognized laboratories follows from the fact that the NLLAP observed gray-zones all are narrower than the Definitive gray-zone requirements in Table 2.1, except for floor dust.

2.3 PRESENT AND DEFINE A DECISION TREE FOR 95 PERCENT CONFIDENCE

Figure 2.1 below presents a decision tree for making a decision with approximate 95 percent confidence that a true lead level is above or below an action level, when using a Definitive Laboratory to perform the analysis.

Sample Analyzed by Definitive Laboratory
• Observed Measurement Below Definitive Laboratory gray-zone: Conclude Lead is Below Action Level
• Observed Measurement Within Definitive Laboratory gray-zone: Conclude Lead is Above Action Level (conservative approach)
• Observed Measurement Above Definitive Laboratory gray-zone: Conclude Lead is Above Action Level

Figure 2.1 Decision tree for making statements with 95 percent confidence using a Definitive Laboratory.

Notice that even though a laboratory may be considered definitive, it still has an associated amount of imprecision, as recognized by its gray-zone. As such, results in the gray-zone of a Definitive Laboratory cannot be classified with 95 percent confidence as either above or below the action level.
However, if some decision must be made, and since a gray-zone result does not provide clear evidence that the true lead amount is below the action level, then the decision tree is conservative by concluding that lead is above the action level in such cases. The net result of this choice is that an increased frequency of false positive classifications will occur, associated with lead levels that are truly below an action level but classified otherwise due to instrument imprecision. However, such a misclassification is preferred to a false negative classification. That is, the conservative approach of increasing false positive classifications protects those at risk for lead exposure.

3.0 SEMI-QUANTITATIVE LABORATORY/ANALYSIS

A Semi-Quantitative Laboratory is a laboratory performing a quantitative analysis, but whose methods provide a gray-zone (i.e., 95 percent confidence interval) that is wider than plus or minus 20 percent of the action level. The decision tree to be developed below recommends that any sample whose lead measurement falls within a Semi-Quantitative Laboratory gray-zone be sent to a Definitive Laboratory for more accurate analysis. The results from the Definitive Laboratory then will be used to make the final conclusion with respect to the action level.

As an example, consider a laboratory using an LPA-1 portable XRF as its measurement technology. According to the performance characteristic sheet (PCS) of the LPA-1, with a reading measured at 1.0 mg/cm² for lead in paint, the substrates brick, concrete, drywall, plaster, and wood have zero bias and a precision of 0.3 mg/cm². Therefore, LPA-1 instruments have a gray-zone of 1.0 ± 2 × 0.3, or (0.4, 1.6) mg/cm², for the listed substrates. Since the LPA-1 gray-zone is wider than the Definitive Laboratory gray-zone of (0.8, 1.2) mg/cm², the PCS numbers suggest that a laboratory using the LPA-1 XRF analyzer would be classified as a Semi-Quantitative Laboratory for analyzing lead in paint.
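The LPA-1 arithmetic and the routing rules described so far can be written out as a short sketch. It assumes zero bias and a gray-zone of the action level ± 2 times the precision, as in the example above. The function names, the returned strings, and the choice to treat a reading exactly on a gray-zone boundary as inside the gray-zone are assumptions made for illustration, not definitions from this report.

```python
def gray_zone(action_level, precision):
    """Gray-zone = action level +/- 2 * precision (zero bias assumed)."""
    return (action_level - 2.0 * precision, action_level + 2.0 * precision)


def laboratory_type(zone, action_level, tolerance=0.20):
    """Definitive if the gray-zone lies within +/- tolerance of the action level."""
    low, high = zone
    definitive_low = action_level * (1.0 - tolerance)
    definitive_high = action_level * (1.0 + tolerance)
    if low >= definitive_low and high <= definitive_high:
        return "definitive"
    return "semi-quantitative"


def decide(measurement, zone, lab_type):
    """Apply the gray-zone rule for a definitive or semi-quantitative laboratory."""
    low, high = zone
    if measurement < low:
        return "below action level"
    if measurement > high:
        return "above action level"
    # Inside the gray-zone (boundaries included by assumption).
    if lab_type == "definitive":
        return "above action level (conservative)"
    return "send to definitive laboratory"


if __name__ == "__main__":
    lpa1_zone = gray_zone(1.0, 0.3)  # precision from the LPA-1 example, mg/cm2
    lab = laboratory_type(lpa1_zone, 1.0)
    print(lab)
    for reading in (0.3, 1.0, 1.7):  # illustrative readings, mg/cm2
        print(reading, "->", decide(reading, lpa1_zone, lab))
```

Under these assumptions, only the reading of 1.0 mg/cm² falls inside the LPA-1 gray-zone and would be referred to a Definitive Laboratory.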
As the LPA-1 example suggests, the available data for portable XRF and UE/ASV technologies are expected to show that laboratories using such techniques often will be classified as Semi-Quantitative Laboratories. Therefore, findings for portable XRF and UE/ASV technologies are presented in this section. However, it is not necessarily the case that such technologies always lack the precision to be considered definitive in nature, as the discussion below will show.

3.1 CHARACTERIZE XRF PRECISION

This section considers the performance of portable XRF technologies. In Section 3.1.1, the gray-zones as determined from PCS information are presented. A 1993 field study, described in more detail in Section 3.1.2, used portable XRF instruments in the field, following manufacturer instructions. An additional field study performed by the National Institute for Occupational Safety and Health (NIOSH) to evaluate three lead-based paint detection technologies is discussed in Section 3.1.3. Finally, analysis of lead in soil by portable XRF is considered in Section 3.1.4.

3.1.1 XRF Precision from Performance Characteristic Sheets

This section presents estimated gray-zones for various makes and models of portable XRF instruments, according to data provided by XRF Performance Characteristic Sheets (PCSs). First, some background is provided.

In (EPA 747-R-95-008), bias and precision estimates for XRF instruments are obtained from field testing data. Since estimates of bias and precision are based on analysis of field samples, the report considers two fundamental issues:

1. The lead levels within the field samples were more distributed toward lower values, with fewer samples occurring as the level increases.
2. Lead levels are themselves estimated by laboratory analysis of paint samples using inductively coupled plasma-atomic emission spectroscopy (ICP-AES), which itself has some measurement error.
These two factors make it impossible to directly observe the bias and precision of XRF results under field conditions, or at pre-specified lead levels. In order to estimate XRF bias and precision, the report makes several assumptions, which are listed below:

• A linear regression relationship exists between the mean XRF measurements and the true lead level, i.e., XRF = a + b*(true lead level) + error.
• The magnitude of the regression error varies linearly with the true lead level. The model used is Error = c + d*(true lead level).
• It was assumed that the distribution of true lead levels is lognormal, and that ICP-AES measures the natural logarithm of the true lead level with a known measurement error. This assumption affects how the parameters (a, b, c, and d) are estimated.

For further details on deriving the estimates to be used in a PCS, see (EPA 747-R-95-008).

An XRF PCS is instrument-model specific and is created to provide testing guidance and detailed performance information. This information includes the specification of conclusive and inconclusive XRF results. The PCS also provides calibration check values to be used in conjunction with the NIST Standard Reference Material paint films and a procedure for evaluating XRF testing.

Table 3.1 presents manufacturer, make, and model information for eight XRF instruments, along with gray-zones based on the precision and bias reported in each PCS. Performance is stratified by substrate. The results of Table 3.1 provide evidence that a laboratory utilizing a portable XRF instrument for analysis would probably be classified as a Semi-Quantitative Laboratory. That is, the observed gray-zones in Table 3.1 are wider than the definitive requirement of (0.8, 1.2) mg/cm² for lead in paint.

Table 3.1 Gray-zones (mg/cm²) for various XRF instruments when measuring lead levels on painted surfaces, based on PCS data

| Manufacturer | Make and Model | Measured at | Brick | Concrete | Drywall | Metal | Plaster | Wood |
|---|---|---|---|---|---|---|---|---|
| TN Technologies | Pb Analyzer 9292 | Normal reading time at 15 seconds | (0.2, 1.8) | (0.2, 1.8) | (0.2, 1.8) | (0.2, 1.8) | (0.2, 1.8) | (0.2, 1.8) |
| Scitec Corporation | MAP-3 | 15-second reading | (0.0, 3.0) | (0.0, 3.0) | (0.2, 1.8) | (0.0, 2.2) | (0.0, 2.8) | (0.0, 2.4) |
| Scitec Corporation | MAP-3 | 60-second reading | (0.0, 2.4) | (0.0, 2.4) | (0.4, 1.6) | (0.2, 1.8) | (0.0, 2.6) | (0.2, 1.8) |
| Warrington, Inc. | Microlead I revision 4 | Normal reading time at 15 seconds | (0.0, 2.2) | (0.0, 2.4) | (0.4, 1.6) | (0.0, 2.2) | (0.0, 2.4) | (0.0, 2.4) |
| Princeton Gamma-Tech, Inc. | XK-3 | Normal reading time at 15 seconds | (0.0, 2.2) | (0.0, 2.4) | (0.2, 1.8) | (0.0, 3.0) | (0.0, 2.2) | (0.0, 2.4) |
| Niton Corporation | XL-309, 701-A, 703-A Spectrum Analyzers | Variable time mode, software version 5.1 | (0.4, 1.6) | (0.4, 1.6) | (0.4, 1.6) | (0.4, 1.6) | (0.4, 1.6) | (0.4, 1.6) |
| Radiation Monitoring Devices | LPA-1 sold prior to/serviced before June 26, 1995 | 20-second reading | (0.4, 1.6) | (0.4, 1.6) | (0.4, 1.6) | (0.4, 1.6) | (0.4, 1.6) | (0.4, 1.6) |
| Radiation Monitoring Devices | LPA-1 sold prior to/serviced before June 26, 1995 | 30-second reading | (0.4, 1.6) | (0.4, 1.6) | (0.4, 1.6) | (0.4, 1.6) | (0.4, 1.6) | (0.4, 1.6) |
| Radiation Monitoring Devices | LPA-1 sold prior to/serviced before June 26, 1995 | Quick Mode | (0.2, 1.8) | (0.2, 1.8) | (0.2, 1.8) | (0.2, 1.8) | (0.2, 1.8) | (0.2, 1.8) |
| Radiation Monitoring Devices | LPA-1 sold or serviced after June 26, 1995 | 20-second reading | (0.4, 1.6) | (0.4, 1.6) | (0.4, 1.6) | (0.4, 1.6) | (0.4, 1.6) | (0.4, 1.6) |
| Radiation Monitoring Devices | LPA-1 sold or serviced after June 26, 1995 | Quick Mode | (0.2, 1.8) | (0.2, 1.8) | (0.2, 1.8) | (0.2, 1.8) | (0.2, 1.8) | (0.2, 1.8) |
| Scitec Corporation | MAP-4 | Screen Mode | (0.2, 1.8) | (0.2, 1.8) | (0.0, 2.2) | (0.4, 1.6) | (0.2, 1.8) | (0.0, 2.2) |
| Scitec Corporation | MAP-4 | Test Mode | (0.4, 1.6) | (0.4, 1.6) | (0.0, 2.2) | (0.6, 1.4) | (0.4, 1.6) | (0.0, 2.2) |
| Advanced Detectors, Inc. | LeadStar w/ software v 4.1 to 4.30 | Fixed Mode | (0.4, 1.6) | (0.4, 1.6) | (0.4, 1.6) | (0.4, 1.6) | (0.4, 1.6) | (0.4, 1.6) |
| Advanced Detectors, Inc. | LeadStar w/ software versions less than 4.1 | Fixed Mode | (0.4, 1.6) | (0.4, 1.6) | (0.4, 1.6) | (0.4, 1.6) | (0.4, 1.6) | (0.4, 1.6) |

Note: Gray-zones are defined as 1.0 mg/cm² ± two times the documented precision. The definitive gray-zone of comparison is (0.8, 1.2) mg/cm².

3.1.2 EPA Field Study on XRF Measurement Precision

In 1993, a study was conducted by the U.S.
EPA (EPA 747/R-95/002b) to collect information necessary to establish federal guidelines on testing for lead in paint. The overall study objective was "to collect information about field measurement methodologies sufficient to allow EPA and HUD to establish guidance and protocols for lead hazard identification and evaluation." Included in this report is a statistical model that was used to describe the relationship between XRF measurements and the lead level as analyzed by ICP-AES (see Appendix B, Section 2, for more detail). ICP is a method commonly used in laboratories to analyze lead in paint and is one of the techniques recommended for confirmation testing.

For each of the six field portable devices tested in this study (Lead Analyzer, MAP-3, Microlead I, X-MET 880, XK-3, and XL), instrument gray-zones were calculated using a precision estimate derived from the statistical model parameters provided in the report. Since the Lead Analyzer and MAP-3 instruments could be operated by using either K-shell or L-shell X-rays, results were recorded once for each energy level.

Table 3.2 presents the gray-zone values for XRF instruments based on this field study. As in the previous section, the results provide evidence that a laboratory using a portable XRF instrument for analysis would likely be classified as a Semi-Quantitative Laboratory.

Table 3.2 Gray-zones defined as 1.0 ± two times the precision in mg/cm², where 0 is the lowest limit. Results are based on the above described field test data.

| Instrument | Energy Level | Brick | Concrete | Drywall | Metal | Plaster | Wood |
|---|---|---|---|---|---|---|---|
| Lead Analyzer | K-shell | (0.527, 1.473) | (0.26, 1.74) | (0.29, 1.71) | (0.185, 1.815) | (0.527, 1.473) | (0.135, 1.865) |
| Lead Analyzer | L-shell | (0.925, 1.075) | (0.799, 1.201) | (0.659, 1.341) | (0.641, 1.359) | (0.724, 1.276) | (0.63, 1.37) |
| MAP-3 | K-shell | (0.00, 2.864) | (0.00, 2.989) | *N/A or (0.249, 1.751) | (0.00, 2.094) | (0.00, 2.733) | (0.00, 2.33) |
| MAP-3 | L-shell | (0.518, 1.482) | (0.671, 1.329) | (0.506, 1.494) | (0.086, 1.914) | (0.697, 1.303) | (0.449, 1.551) |
| Microlead I | K-shell | N/A | (0.00, 2.439) | *N/A or (0.322, 1.678) | (0.00, 2.368) | (0.00, 2.238) | (0.00, 2.83) |
| X-MET 880 | L-shell | N/A | (0.859, 1.141) | (0.73, 1.27) | (0.523, 1.477) | (0.839, 1.161) | (0.422, 1.578) |
| XK-3 | K-shell | *N/A or (0.00, 2.198) | *N/A or (0.00, 2.271) | (0.00, 2.124) | (0.00, 3.116) | (0.00, 2.266) | (0.00, 2.373) |
| XL | L-shell | (0.631, 1.369) | (0.514, 1.486) | (0.527, 1.473) | (0.22, 1.78) | (0.671, 1.329) | (0.371, 1.629) |

* Calculated with d = 0; could be entered as N/A or as the gray-zone given (see Appendix B).
N/A = Not applicable (insufficient data).
Note: The definitive gray-zone of comparison is (0.8, 1.2) mg/cm².

3.1.3 Field Investigation of On-Site Techniques (Portable XRF)

Three field technologies for detecting lead levels in paint were evaluated in (Ashley, et al., 1998-2), one of which was XRF. This was a field study, conducted by NIOSH on buildings erected from the late 1800s to the 1960s on the campus of Florida A&M University in Tallahassee. The XRF field instrument used was the TN Spectrace 2000. Confirmatory analyses of split paint samples were carried out by an accredited laboratory using atomic absorption spectrometry (AAS), and the intent of the study was primarily to determine the level of bias in on-site measurements.

A total of 175 paint measurements were taken on various substrates (plaster, metal, wood, and brick), and the XRF test results were compared to AAS results by linear regression. The mean-squared error for the regression was given, as well as the slope estimate, intercept estimate, and r-square value. The action level for paint of 0.5 percent lead by weight was lower than the average paint concentration observed, and variability tended to increase with concentration.
Because of this, it is assumed that the reported standard error, which is the root mean-squared error of the regression line, is probably larger than the error associated with lead levels close to the action level. The reported standard error is 0.054 percent, resulting in a gray-zone of (0.392, 0.608) percent around the action level and a coefficient of variation of 0.108. Since the definitive gray-zone requirement in this case is (0.4, 0.6) percent, the observed performance using this XRF technology does not quite meet definitive-type standards. It should be noted that the estimate of standard error given here is not completely accurate at the action level, although it is reasonable to assume the true standard error is lower. From this perspective, the result is conservative. Also, results from the XRF instrument appear to be biased high: on average, XRF readings were higher than the reference concentrations determined by AAS. Of course, if the bias is well understood, it can be corrected for.

3.1.4 Using XRF Technology for Soil Analysis

The on-site capabilities of XRF instrumentation for measuring lead in soil were investigated in (EPA 600-R-97-145). Two instruments were used in this analysis: the Spectrace TN Pb Analyzer and the Spectrace TN 9000 Analyzer. Two sites were selected to perform this on-site analysis: one in Maryland and a second in Iowa. Heavy industrial activity had taken place at both sites. Samples collected on-site were split and evaluated by a reference laboratory using ICP-AES to provide a reference concentration. Soil samples were classified into groups according to their reference lead concentrations. Then ten replicate measurements for lead were made on the soil samples using the same instrument by the same individual. The results here are, therefore, estimates of repeatability of measurements, and not reproducibility.
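The gray-zone arithmetic used throughout this section can be sketched in a few lines of Python. This is an illustrative sketch only and is not taken from any of the cited studies: the function names gray_zone and is_definitive, and the parameters k, floor, and tolerance, are assumptions introduced here. The sketch takes the gray-zone as the action level plus or minus two times the precision, with zero as the lowest limit, and treats performance as definitive-type when the gray-zone lies within 20 percent of the action level.

```python
def gray_zone(action_level, precision, k=2.0, floor=0.0):
    """Return (lower, upper) = action_level -/+ k * precision.

    The lower limit is clipped at `floor` (0 by default), mirroring the
    convention that 0 is the lowest possible gray-zone limit.
    """
    half_width = k * precision
    lower = max(floor, action_level - half_width)
    upper = action_level + half_width
    return lower, upper


def is_definitive(action_level, precision, tolerance=0.20, k=2.0):
    """True when the gray-zone half-width is within `tolerance` of the action level."""
    return k * precision <= tolerance * action_level


# Values quoted in Section 3.1.3: action level 0.5 percent lead by weight,
# reported standard error 0.054 percent.
lower, upper = gray_zone(0.5, 0.054)
cv = 0.054 / 0.5  # coefficient of variation at the action level
print(round(lower, 3), round(upper, 3), round(cv, 3))
print(is_definitive(0.5, 0.054))
```

With the Section 3.1.3 values, the half-width is 0.108 percent, which exceeds the 0.1 percent allowed by the 20 percent criterion, so the check reports that definitive-type performance is not quite met.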
The lead concentrations of soils were reported only as falling into one of four categories: (1) near the minimum detection level, (2) 50-500 µg lead/gram soil, (3) 500-1,000 µg/g, and (4) >1,000 µg/g. Twenty soil samples were found to contain reference concentrations of lead that were >1,000 µg/g, which is the category containing the 2,000 µg/g action level for soils. Results here were reported as relative standard deviations (RSD), which is the ratio of the standard deviation to the category mean, expressed as a percentage. The TN Pb Analyzer was reported as having an RSD of 2.52 percent for lead, which would give a gray-zone of (1,900, 2,100) µg/g around the action level, as compared to the definitive requirement of (1,600, 2,400) µg/g. The TN 9000 Analyzer was reported as having an RSD of 3.68 percent for lead, which yields a gray-zone of (1,853, 2,147) µg/g around the action level. In both cases, the performances appear to be of definitive-type quality for analysis of lead in soil. However, it should be noted that the only source of variability estimated in this study was associated with repeated measurements of the same sample by the same instrument and individual, and any additional variation due to reproducibility is not taken into account.

3.2 CHARACTERIZING ULTRASONIC EXTRACTION/ANODIC STRIPPING VOLTAMMETRY (UE/ASV) PRECISION

This section considers the performance of Ultrasonic Extraction/Anodic Stripping Voltammetry (UE/ASV) technologies. Overall, UE/ASV demonstrates a level of precision and accuracy that is potentially compatible with definitive-type analysis. However, as seen below, this does not always appear to be the case.

3.2.1 EPA Evaluation of the PaceScan 2000

A study was performed by the U.S. EPA (EPA 600/R-95/093) to evaluate the performance of solution-based technologies for measuring lead in environmental media. One of the instruments tested was the PaceScan 2000, which uses UE/ASV technology.
Accuracy and precision of the instrument were determined by taking measurements of lead in characterized paints, bulk dusts, and soils, designated as Research Triangle Institute (RTI) core materials. These RTI core materials include materials from the following sources:

• NIST Standard Reference Materials (NIST SRMs) - reference samples prepared and certified by the National Institute of Standards and Technology.
• RTI Method Evaluation Materials (RTI MEMs) - samples with lead concentrations determined from an EPA/RTI round-robin study (Williams, et al., 1993) done by using hotplate or microwave extraction, with measurement by AAS or ICP-AES.
• ELPAT Materials - reference laboratory samples with mean concentrations determined by a number of reference laboratories selected by NIOSH using a range of extraction methods with measurement by AAS or ICP-AES.

For the purpose of analysis, it was assumed that the measured mean lead concentrations of these reference materials were in fact the true lead concentration. Three repeated measurements were made on a number of paint, bulk dust, and soil samples. It was also assumed that the instrument would have similar characteristics for bulk dust and dust wipes, as bulk dust was analyzed in the study.

Because the data were presented in full and are of high quality, a statistical analysis was performed on the data to identify precision at the specified action levels. Details of this analysis are provided in Appendix B. The results presented in Table 3.3 indicate that definitive analysis criteria are satisfied for paint and soil testing, but only semi-quantitative analysis criteria were met for bulk dust testing.

The PaceScan 2000 has both a high and a low setting for paint: The low range is 0.0025-1.5 percent lead by weight, and the high range is 0.02-10 percent. The lead detection range for dust and soil samples is 0.0025-1.5 percent. The instrument also has extended ranges that were not considered.
The report also concluded that "The PaceScan 2000 instrument provided applicability to multimedia analysis, was easily operated, and appeared to have promise for field application".

Table 3.3 PaceScan 2000 results based on data from (EPA 600/R-95/093), April 1996

Medium        Action Level    Precision       Percent Error  Gray-Zone    Definitive Zone  Within 20%
Paint - Low   0.5%            0.032%          6.4%           0.436-0.564  0.400-0.600      Y
Paint - High  0.5%            0.042%          8.3%           0.417-0.583  0.400-0.600      Y
Bulk Dust     50 µg/sample    6.2 µg/sample   12.4%          38-62        40-60            N
Bulk Dust     250 µg/sample   27 µg/sample    10.8%          196-304      200-300          N
Bulk Dust     800 µg/sample   86 µg/sample    10.7%          628-972      640-960          N
Soil          2000 µg/g       130 µg/g        6.5%           1740-2260    1600-2400        Y

Dust samples are assumed to represent a 1 ft2 area.

3.2.2 Interlaboratory Evaluation of UE/ASV Lead Measurements on Paint, Dust, and Soil

An interlaboratory evaluation of the UE/ASV procedure, which obtained estimates of both the repeatability and the reproducibility of measurements, was conducted by NIOSH and reported in (Ashley, et al., 1998-1). The UE/ASV technology evaluated was the Palintest 5000, a later version of the PaceScan 2000. Paint, soil, and dust wipe performance evaluation materials (PEMs) prepared by the Research Triangle Institute (RTI) were used in the study. These samples were collected at various commercial and residential sites in several states, and then dried, ground, sieved, and homogenized prior to initial laboratory characterizations for lead content using UE in accordance with ASTM PS87 and ASV in accordance with ASTM PS88. Dusts were spiked onto wipes. As a reference analytical method, ICP-AES with microwave digestion was used to characterize the samples. Measurements performed on the RTI reference samples were chosen to bracket action levels for the different matrices. Two measurements were taken at each of ten laboratories, at three different lead concentration levels per medium.
Due to a procedural error at one of the laboratories, the paper contains results from only nine laboratories.

In order to address questions about precision of UE/ASV measurements at action levels, a few general assumptions were made about the data. These assumptions are detailed in Appendix B. Results are shown in Table 3.4, which again demonstrates a mixture of definitive and semi-quantitative performance for analyses using UE/ASV technology.

Table 3.4 Results based on data from interlaboratory evaluation of UE/ASV (Palintest 5000)

Medium         Action Level    Precision       Percent Error  Gray-Zone    Definitive Zone  Within 20%
Paint          0.5%            0.070%          14.0%          0.360-0.640  0.400-0.600      N
Dust - floors  50 µg/sample    6 µg/sample     12.4%          38-62        40-60            N
Dust - sills   250 µg/sample   24 µg/sample    9.6%           202-298      200-300          Y
Dust - wells   800 µg/sample   53 µg/sample    6.7%           694-906      640-960          Y
Soil           2000 µg/g       136 µg/g        6.8%           1728-2272    1600-2400        Y

3.2.3 Field Investigation of On-Site Techniques (UE/ASV)

Three field lead detection technologies for detecting lead levels in paint, including UE/ASV, were evaluated in the NIOSH study presented in Section 3.1.3 (Ashley, et al., 1998-2). The UE/ASV field instrument used was the PaceScan 3000. Confirmatory analyses of split paint samples were carried out by an accredited laboratory using AAS, and the intent of the study was primarily to determine the level of bias in on-site measurements. A total of 71 analyses were taken on paint test samples from various media (plaster, metal, wood, and brick), and the UE/ASV test results were compared to AAS results by linear regression. The mean-squared error for the regression was given, as well as the slope estimate, intercept estimate, and r-square value. The action level for paint of 0.5 percent was lower than the average paint lead concentration investigated, and variability tends to increase with concentration.
Because of this, it is assumed that the reported standard error, which is the root mean-squared error of the regression line, is probably larger than the error at the action level. The reported standard error is 0.0390 percent, resulting in a gray-zone of 0.4220-0.5780 percent around the action level and a coefficient of variation of 0.078. This gives a gray-zone that has all values within 20 percent of the action level, indicating definitive-type performance. It should be noted that the estimate of standard error given here is not completely accurate at the action level, although it is reasonable to assume the true standard error is lower. From this perspective, the result is conservative. Also, two outliers were deleted before analysis of the data. However, these outliers occurred at levels much higher than the action level and did not significantly impact estimation of precision.

3.2.4 Laboratory Evaluation of the PaceScan 2000

The effectiveness of the PaceScan 2000 in performing analyses of real-world waste toxicity characteristic leaching procedure (TCLP) samples was evaluated in (White and Clapp, 1998). TCLP samples are samples taken from remediation sites and are not classified as paint, dust, or soil samples, but the dust setting was used to evaluate the lead concentration of the samples.

ICP-AES analysis of the extract was used as the reference. The comparative analysis was done on the leachate, that is, the solution that was extracted. Eighteen TCLP samples were analyzed using both ICP-AES and the PaceScan 2000, with sample aliquots acidified to 2 percent and 4 percent concentrated nitric acid solutions. The study noted that there was no significant difference between the 2 and 4 percent acidification results and also no significant difference between either of these results and the ICP-AES reference readings.
A spike recovery study was performed using spikes of known amounts of lead followed by repeated analysis of the spike using the PaceScan 2000. The coefficient of variation of these measurements never exceeded 3 percent. This study showed that the dust setting of the PaceScan 2000 can give results that could be qualified as a definitive-type analysis under controlled laboratory conditions.

A repeatability study was also done by performing ten parallel analyses of three TCLP sample extracts which had concentrations near lead action levels for dust. The reference concentrations from the ICP-AES analysis were 80, 550, and 650 µg/sample. Using linear extrapolation to estimate variance of measurements at dust action levels is difficult for this study, as the observed standard deviation at the 550 µg/sample level was actually larger than the observed standard deviation for 650 µg/sample. To estimate the standard deviation at 50 µg/sample, the observed standard deviation at 80 µg/sample was used. For the 250 and 800 µg/sample action levels, standard deviations were estimated by a linear regression equation. Results are summarized in Table 3.5 and suggest definitive-type performance.

Table 3.5 Results based on data from analysis of TCLP Extracts (PaceScan 2000)

Medium         Action Level    Precision       Percent Error  Gray-Zone  Definitive Zone  Within 20%
Dust - floors  50 µg/sample    2.4 µg/sample   4.8%           45-55      40-60            Y
Dust - sills   250 µg/sample   17 µg/sample    6.8%           216-284    200-300          Y
Dust - wells   800 µg/sample   54 µg/sample    6.7%           694-908    640-960          Y

3.2.5 Other Studies Considered

One study (Ashley, 1995) investigated UE/ASV technology but performed analysis on air filter samples. The study concluded that UE/ASV technology gave definitive-type results in the neighborhood of action levels for dust wipes, which is considered a comparable analysis.
Finally, manufacturer claims for the Palintest 5000 are that the scanner has a coefficient of variation of less than or equal to 7 percent at the action levels for paint, dust wipes, and soil. If appropriate, these results, along with the results in the previous sections, indicate that UE/ASV technology may possess the potential for being used in a definitive-type analysis.

3.3 PRESENT AND DEFINE A DECISION TREE FOR 95 PERCENT CONFIDENCE

Recall that the gray-zone for an instrument (action level, plus or minus two times the known precision for that instrument) represents an approximate 95 percent confidence interval for what the instrument will observe if the true lead amount is at the action level. Thus, an observation within the gray-zone fails to provide strong enough evidence regarding whether the true lead amount is above or below the action level. Recall that a Definitive Laboratory has a confirmed gray-zone at least as narrow as plus or minus 20 percent of the action level, while a Semi-Quantitative Laboratory is a laboratory whose methods provide a gray-zone that is wider than plus or minus 20 percent of the action level.

Figure 3.1 below provides a decision tree for providing 95 percent confidence in making decisions relative to an action level when using a Semi-Quantitative Laboratory to perform the analysis. The tree indicates that initial gray-zone results should be sent to a Definitive Laboratory for confirmation. The idea is that the imprecision associated with a Semi-Quantitative Laboratory will produce a large number of gray-zone results. Therefore, in order to avoid a high rate of false positive or false negative classifications due to making a decision based on gray-zone results, such results require the more precise analysis of a Definitive Laboratory.

The following scenarios correspond to a Semi-Quantitative Laboratory using portable XRF technology with performance similar to that given on the LPA-1 PCS.
These scenarios are meant to illustrate the use of the decision tree given by Figure 3.1.

• The observed paint lead measurement is 0.2 mg/cm2. The observed value is below the lower limit of the Semi-Quantitative gray-zone (0.4, 1.6). The appropriate conclusion is that the lead loading is below the action level of 1.0 mg/cm2. This conclusion is made with 95 percent confidence.
• The observed paint lead measurement is 2.6 mg/cm2. The observed value is above the upper limit of the Semi-Quantitative gray-zone (0.4, 1.6). The appropriate conclusion is that the lead loading is above the action level of 1.0 mg/cm2. This conclusion is made with 95 percent confidence.
• The observed paint lead measurement is 0.8 mg/cm2. The observed value is within the Semi-Quantitative gray-zone (0.4, 1.6). The appropriate action is to send the sample to a Definitive Laboratory (gray-zone within (0.8, 1.2)). The Definitive Laboratory will analyze the sample. The appropriate conclusion is made based on the results from the Definitive Laboratory.

In the last scenario presented above, observe that the possibility exists, due to Definitive Laboratory imprecision, that the subsequent result will lie within a gray-zone as well. As such, if a decision must be made at this point, then it is not necessarily made with 95 percent confidence. Essentially, once an initial gray-zone result is obtained by the Semi-Quantitative Laboratory, the decision tree in Figure 3.1 defaults to the decision tree in Figure 2.1.

Identify Semi-Quantitative Laboratory Gray-Zone
    Obtain Measurement
        Observed Measurement Below Semi-Quantitative Laboratory Gray-Zone
            Conclude Lead is Below Action Level
        Observed Measurement Within Semi-Quantitative Laboratory Gray-Zone
            Send Sample to Definitive Laboratory
            Conclude Based on Results from Definitive Laboratory (See Figure 2.1)
        Observed Measurement Above Semi-Quantitative Laboratory Gray-Zone
            Conclude Lead is Above Action Level

Figure 3.1 Decision tree for making statements with 95 percent confidence using a Semi-Quantitative Laboratory.

4.0 QUALITATIVE LABORATORY/ANALYSIS

This section discusses the use of qualitative measures for lead in paint, dust, or soil. Specifically, chemical test kits that provide only an indication of presence or absence of lead are considered. The issue of whether such qualitative analyses can be used for the purpose of making a decision, at 95 percent confidence, with respect to a true lead level as compared to an action level is investigated. The general finding is that qualitative analyses, such as that performed in the application of a chemical test kit, may be appropriate for making a decision, at 95 percent confidence, in a single direction but probably not in two directions.

Specifically, Section 4.1 discusses using chemical test kits as a negative screen (i.e., concluding that the amount of lead is below the action level). Section 4.2 discusses using chemical test kits as a positive screen (i.e., concluding that the amount of lead is above the action level). Section 4.3 discusses some field results for different chemical test kits. Finally, Section 4.4 presents a decision-tree model for combining information from the two types of qualitative measures discussed in Sections 4.1 and 4.2.

4.1 USING QUALITATIVE ANALYSES AS NEGATIVE SCREENS

Figure 4.1 demonstrates hypothetical performance of a chemical test kit for the analysis of lead in paint. This figure is an operating characteristic (OC) curve which plots the probability of a positive indication for lead in paint as a function of the true amount of lead in paint. The vertical dashed line corresponds to the action level of 1.0 mg/cm2 for lead in paint. The horizontal dashed line corresponds to a 95 percent probability of obtaining a positive indication for lead in paint.
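To make the reading of such an OC curve concrete, the short Python sketch below checks a curve against the 95 percent line at and above the action level. It is a hedged illustration, not a model of any real test kit: the logistic shape of oc_curve, its midpoint and steepness values, and the helper name qualifies_as_negative_screen are all assumptions chosen here for demonstration.

```python
import math


def oc_curve(true_lead, midpoint=0.4, steepness=6.0):
    """Hypothetical logistic OC curve: probability of a positive indication
    as a function of the true lead level (mg/cm2). Shape is assumed."""
    return 1.0 / (1.0 + math.exp(-steepness * (true_lead - midpoint)))


def qualifies_as_negative_screen(curve, action_level, max_level=2.5,
                                 step=0.01, required=0.95):
    """Check, on a grid from the action level up to max_level, that the
    probability of a positive indication never drops below `required`."""
    n_steps = int(round((max_level - action_level) / step))
    return all(curve(action_level + i * step) >= required
               for i in range(n_steps + 1))


action_level = 1.0  # mg/cm2, action level for lead in paint
print(qualifies_as_negative_screen(oc_curve, action_level))
# Probability of a positive indication at a true level below the action level:
print(round(oc_curve(0.75), 3))
```

Under these assumed parameters the curve stays above the 95 percent line from the action level upward, while still giving a positive indication well over 5 percent of the time at 0.75 mg/cm2, which is below the action level.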
The hypothetical results demonstrated in Figure 4.1 represent the type of qualitative performance that would be appropriate for making decisions in the direction of a negative screen. The figure shows that if the true lead level is at or above the action level of 1.0 mg/cm2, then at least 95 percent of the time a positive indication will be obtained:

Probability {Positive indication given true lead level ≥ 1.0 mg/cm2} ≥ 0.95.

This implies that the test has high sensitivity. Equivalently, the likelihood of a false negative (i.e., having a negative indication when the true lead level is at or above the action level) is no more than 5 percent:

Probability {Negative indication given true lead level ≥ 1.0 mg/cm2} ≤ 0.05.

Such performance provides 95 percent confidence that negative indications, where made, are correct indications. Therefore, a chemical test kit exhibiting this type of performance could be used as a negative screen.

[Figure 4.1 plot not reproduced. Horizontal axis: True Lead Level (milligrams per centimeter squared), 0.00 to 2.50.]

Figure 4.1 Hypothetical operating characteristic (OC) curve of a chemical test kit analyzing lead in paint (demonstrating qualitative analysis performance considered appropriate as a negative screen).

In contrast to the above discussion, Figure 4.1 also highlights the fact that a chemical test kit exhibiting the displayed performance would not necessarily be appropriate for making decisions based on positive indications. The high sensitivity (i.e., likelihood of a positive result) of such an instrument at lower lead levels would produce far too many false positive results. Specifically, for lead levels between 0.5 and 1.0 mg/cm2, this test kit would provide an incorrect positive indication more than 5 percent of the time. Such results are considered false positives given the action level of 1.0 mg/cm2.
For this reason, a more appropriate course of action given a positive indication by this test kit would be to send a sample to a definitive laboratory in order to obtain a more accurate quantitative result.

In summary, a qualitative measurement technology such as a chemical test kit may have potential for assessing lead in paint, dust, or soil with 95 percent confidence. Unfortunately, the qualitative nature of the analysis being performed appears to limit the decisions that can be drawn from the results obtained. However, a straightforward decision tree that provides an overall protection of 95 percent confidence against incorrect conclusions still can be formed. In the case of a qualitative analysis used as a negative screen, the decision tree providing 95 percent confidence against error would be as follows:

• Perform the analysis and obtain a result.
  (a) If the result is a negative indication for lead, conclude with 95 percent confidence that the true lead level is below the action level.
  (b) If the result is a positive indication for lead, send a sample to a definitive laboratory for quantitative confirmation with 95 percent confidence.

4.2 USING QUALITATIVE ANALYSES AS POSITIVE SCREENS

Figure 4.2 presents an alternative OC curve, where the horizontal dashed line corresponds to a 5 percent probability of obtaining a positive indication for lead in paint. The hypothetical results demonstrated in Figure 4.2 represent the type of qualitative performance that would be appropriate for making decisions in the direction of a positive screen. In other words, a decision could be made, with 95 percent confidence, as to whether the amount of lead is above the action level. The figure shows that if the true lead level is below the action level of 1.0 mg/cm2, then at least 95 percent of the time a negative indication will be obtained:

Probability {Negative indication given true lead level < 1.0 mg/cm2} ≥ 0.95.
This implies that the test has high specificity. Equivalently, the likelihood of a false positive (i.e., having a positive indication when the true lead level is below the action level) is no more than 5 percent:

Probability {Positive indication given true lead level < 1.0 mg/cm2} ≤ 0.05.

Such performance provides 95 percent confidence that positive indications, where made, are correct indications. Therefore, a chemical test kit exhibiting this type of performance could be used as a positive screen.

[Figure 4.2 plot not reproduced. Horizontal axis: True Lead Level (milligrams per centimeter squared), 0.00 to 2.50.]

Figure 4.2 Hypothetical operating characteristic (OC) curve of a chemical test kit analyzing lead in paint (demonstrating qualitative analysis performance considered appropriate as a positive screen).

Figure 4.2 also highlights the fact that a chemical test kit exhibiting the displayed performance would not necessarily be appropriate for making decisions based on negative indications. The high specificity (i.e., likelihood of a negative result) of such an instrument at lead levels near 1.0 mg/cm2 would produce far too many false negative results. Specifically, for lead levels between 1.0 and 2.0 mg/cm2, this test kit would provide an incorrect negative indication more than 5 percent of the time. Such results are considered false negatives given the action level of 1.0 mg/cm2. For this reason, a more appropriate course of action given a negative indication by this test kit would be to send a sample to a definitive laboratory in order to obtain a more accurate quantitative result.

As seen in Section 4.1, the qualitative nature of the analysis being performed appears to limit the decisions that can be drawn from the results obtained. Again, however, a straightforward decision tree that provides an overall protection of 95 percent confidence against incorrect conclusions can be formed.
In the case of a qualitative analysis used as a positive screen, the decision tree providing 95 percent confidence against error would be as follows:

• Perform the analysis and obtain a result.
  (a) If the result is a positive indication for lead, conclude with 95 percent confidence that the true lead level is above the action level.
  (b) If the result is a negative indication for lead, send a sample to a definitive laboratory for quantitative confirmation with 95 percent confidence.

4.3 FIELD TEST RESULTS FROM EVALUATING CHEMICAL TEST KITS

This section discusses two different studies that considered the performance of chemical test kits. The results of these two field studies are given below.

4.3.1 EPA Field Study on Chemical Test Kits

The EPA study (EPA 747/R-95/002b) discussed in Section 3.1.2 also evaluated chemical test kits. This study concluded that chemical test kits should not be used in lead paint testing as none of the test kits demonstrated sufficiently low rates of false positive as well as false negative classifications. However, such a requirement is probably not realistic for qualitative measures such as chemical test kits. The evaluated test kits with low false positive rates tended to have high false negative rates, and vice versa. The only way a chemical test kit can achieve low rates of both false positive and false negative classifications (e.g., less than 5 percent for each) is for its OC-curve to remain near zero for all lead amounts below 1.0 mg/cm2, have a very steep slope at 1.0 mg/cm2, and have a value near 1 for all lead amounts above 1.0 mg/cm2. Since such performance is highly unlikely in practice, the way in which such methods are used might need to be reconsidered, as is suggested in the previous two sections.

Compared to an action level of 1.0 mg/cm2 for paint, two test kits (LeadCheck and State Sodium Sulfide) had false negative rates of 6 percent and 1 percent, respectively.
These rates are near the level required for 95 percent confidence using a negative screen. All other evaluated kits had higher false negative rates. Note that compared to an action level of 0.5 percent lead by weight, each test kit's false negative rate increased. However, the demonstrated overall false negative rates of 6 percent and 1 percent suggest that such chemical test kit technology might be appropriate as a negative screen. In conjunction with a decision tree, such a technology could be used to provide 95 percent confidence in the final statement made regarding a lead amount compared to an action level.

Compared to an action level of 1.0 mg/cm2 for paint, one test kit (Lead Alert: Sanding) had a false positive rate of 9 percent, relatively close to the level required for 95 percent confidence using a positive screen. All other evaluated kits had higher false positive rates. Note that compared to an action level of 0.5 percent, the "Lead Alert: Sanding" test kit's false positive rate increased to 10 percent and the "Lead Alert: Coring" test kit's false positive rate was 11 percent. While some came close, none of the evaluated test kits appears to have demonstrated a false positive rate low enough to be used in a qualitative analysis decision tree for providing 95 percent confidence. For a detailed discussion and treatment of the chemical test kits mentioned in this section, refer to the field test report.

4.3.2 Field Investigation of On-Site Techniques (Chemical Test Kits)

In (Ashley, et al., 1998-2), the performance of rhodizonate-based chemical test kits for testing lead concentration in paint was examined. Results from on-site analysis using test kits were compared to a reference concentration determined by AAS. OC curves of test kit response were given in the report, although they cannot be reproduced here as the data used in constructing them were not provided.
The OC-curves show that these types of test kits are appropriate for negative screening; that is, they protect against negative indications when the true lead concentration is greater than the action level. In particular, the curves indicate that, approximately 95 percent of the time, analyses using this technology will identify lead when the true lead level is at or above the action level.

Some information on false positive and false negative rates was provided in the report. Specifically, three false negative readings out of 66 samples with lead levels above 0.5 percent were recorded, for a false negative rate of 4.5 percent. Further, four false positive readings out of 105 samples where reference lead concentrations were below 0.06 percent were recorded, for a false positive rate of 3.8 percent. However, these rates may not reflect performance specific to lead levels very near the action level of concern. Instead, they reflect an overall performance averaged across concentrations either above the action level or below it. For example, many of the 66 samples with lead levels above 0.5 percent actually may have contained lead concentrations well above 0.5 percent, in which case the technology's performance would be expected to be superior to what it would be near the action level. Thus, an overall false negative rate of 4.5 percent may not represent actual performance near the action level of 0.5 percent.

4.4 PRESENT AND DEFINE A DECISION TREE FOR 95 PERCENT CONFIDENCE, USING BOTH A NEGATIVE SCREEN AND A POSITIVE SCREEN

The discussion in Sections 4.1 and 4.2 above regarding qualitative measures used as negative and/or positive screens can be combined to form a model for making decisions with approximately 95 percent confidence. The use of such a decision tree would require access to both a negative screen qualitative measure and a positive screen qualitative measure.
In the absence of one or the other, the appropriate decision trees provided in the conclusions to Sections 4.1 and 4.2 could be employed instead. The following two-way table diagrams the potential conclusions from an analysis for lead in paint, dust, or soil:

                         Analysis Conclusion for Lead
True Lead Level          Above Action Level     Below Action Level
Above Action Level       Correct Conclusion     False Negative
Below Action Level       False Positive         Correct Conclusion

A qualitative measure acting as a negative screen provides protection against false negative conclusions. A qualitative measure acting as a positive screen provides protection against false positive conclusions. However, even when armed with both a negative and positive screen, depending on the analysis results, a decision cannot always be made with 95 percent confidence. The following results can occur:

1. If both the negative and positive screen yield a negative indication for lead, then conclude with 95 percent confidence that the true lead level is below the action level.
2. If both the positive and negative screen yield a positive indication for lead, then conclude with 95 percent confidence that the true lead level is above the action level.
3. If the negative screen is positive AND the positive screen is negative, a decision regarding the true lead level as compared to the action level cannot be made with 95 percent confidence.
4. If the negative screen is negative AND the positive screen is positive, a rather conflicting result has been observed and the efficacy of one or both screens is in question. A decision regarding the true lead level as compared to the action level cannot be made with 95 percent confidence.

The fourth scenario should almost never occur in practice but is included for completeness.
That is, a positive screen is much more likely to produce negative results; therefore, any time a negative screen yields a negative indication for lead, then almost certainly the positive screen will produce the same result. Similarly, a negative screen is much more likely to produce positive results; therefore, any time a positive screen yields a positive indication for lead, then almost certainly the negative screen will produce the same result.

The key to the limitation of qualitative measures in decision making is scenario three above. Under this scenario, neither screen has provided sufficient evidence for a conclusion in one direction or the other regarding the action level. Therefore, such results would require some sort of definitive confirmation in order to make a decision with 95 percent confidence. However, this third scenario does provide some information. Such results suggest the presence of lead at a level that is probably not extremely far above the action level. In other words, with no lead present, the negative screen will yield a negative result 95 percent of the time. With a great deal of lead present, the positive screen will yield a positive result 95 percent of the time. Lead levels somewhere in between will tend to produce the result described in scenario three.

From the above discussion, the decision tree in Figure 4.3 provides approximate 95 percent confidence that a correct decision is being made when using qualitative measures for lead analysis.

Obtain Result from Negative Screen
    Negative: Conclude Lead is Below Action Level with 95% Confidence
    Positive: Obtain Result from Positive Screen
        Positive: Conclude Lead is Above Action Level with 95% Confidence
        Negative: Send Sample to Definitive Laboratory for Quantitative Confirmation

Figure 4.3 Decision tree for making statements with 95 percent confidence, using both a negative and positive screen.
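The branching in Figure 4.3 can be sketched in a few lines of Python. This is a minimal illustration only: the names `Decision` and `figure_4_3` are hypothetical and do not come from the report, and the positive screen is passed as a callable so that, as in the figure, it is run only when the negative screen gives a positive indication.

```python
from enum import Enum


class Decision(Enum):
    """Possible outcomes of the two-screen decision tree (hypothetical names)."""
    BELOW = "lead below action level (approx. 95% confidence)"
    ABOVE = "lead above action level (approx. 95% confidence)"
    CONFIRM = "send sample to definitive laboratory"


def figure_4_3(negative_screen_positive, run_positive_screen):
    """Follow Figure 4.3: negative screen first, positive screen only if needed.

    negative_screen_positive: True if the negative screen indicated lead.
    run_positive_screen: zero-argument callable returning True if the
        positive screen indicates lead; called only when required.
    """
    if not negative_screen_positive:
        # A negative result on the negative screen is conclusive.
        return Decision.BELOW
    if run_positive_screen():
        # Both screens positive: conclusive in the other direction.
        return Decision.ABOVE
    # Negative screen positive, positive screen negative: scenario three.
    return Decision.CONFIRM
```

Passing the second screen as a callable reflects the cost argument made for the asymmetric designs: a second analysis is performed only when the first does not settle the question.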
Notice that the above decision tree is simply a combination of the decision trees given at the conclusions of Sections 4.1 and 4.2. The advantage of the above decision tree, assuming qualitative measures providing both negative and positive screens are available, is that it provides for the possibility of drawing a conclusion with 95 percent confidence in either direction of the action level, above or below. Of course, there is still the distinct possibility that a conclusion with 95 percent confidence cannot be made, in which case more definitive information is required.

Finally, observe that the decision tree given in Figure 4.3 is not unique. First, the roles of the negative and positive screens could have been switched, producing a tree in which the positive screen is conducted first. Under this design, an initial positive result or an initial negative result followed by another negative result leads to a conclusion, and an initial negative result followed by a positive result requires more information for a decision to be made. Alternatively, the decision tree could have been designed symmetrically with both screens being performed at the initial step. In this case, two positive results or two negative results lead to a conclusion, while one negative result coupled with one positive result requires more information for a decision to be made. The advantage of the design in Figure 4.3 and the first one described in this paragraph is a reduction in implementation cost, since the second screen is performed only when the first does not lead to a conclusion. Unlike these two asymmetric designs, a symmetric design that employs both screens at the initial step will always perform at least two analyses.

5.0 ADDITIONAL ISSUES AND CONCERNS

The purpose of this section is to identify and briefly discuss further issues that may require addressing during the process of expanding the NLLAP to cover the analytical technologies discussed in this report. The suggestions in this section are not meant to be rules.
Instead, the goal of this section is to raise important issues that need to be addressed if NLLAP is to be expanded. Therefore, the provided suggestions are intended to initiate meaningful dialogue on these topics. Some issues arise due to the different types of entities that might be included in the scope of the NLLAP. Other issues come about due to the differing capabilities of various measurement technologies. But first, an alternative decision tree is offered, and its pros and cons are discussed.

5.1 AN ALTERNATIVE DECISION TREE FORMAT

One of the key aspects of the decision trees presented in the previous sections is the rule of defaulting to a definitive laboratory (testing firm) when an observed measurement lies within the gray-zone of a semi-quantitative or qualitative laboratory (testing firm). Such a rule treats definitive laboratories (testing firms) differently than their semi-quantitative and qualitative counterparts. The purpose of such a rule is to provide 95 percent confidence for the ultimate decision that will be made, even in those cases when an initial result cannot do so. The idea is that the lack of precision associated with semi-quantitative and qualitative types of analyses will produce many instances in which an appropriate decision cannot be made with 95 percent confidence. Sending a sample to a definitive laboratory (testing firm) provides a safety net from which such a decision still might be made.

One possible alternative decision tree format would be to remove the rule of defaulting to a definitive laboratory (testing firm) when an observed measurement is not conclusively above or below an action level. With this approach, every laboratory or testing firm (definitive, semi-quantitative or qualitative) is viewed from the same perspective. That is, a definitive-type performer is not treated as a safety net from which semi-quantitative or qualitative laboratories (testing firms) can obtain a more reliable result.
Like the decision tree formats offered in the previous sections, under this alternative approach, each laboratory or testing firm has its own established performance characteristics (i.e., precision, bias and accuracy). However, the alternative format would indicate that a decision regarding an action level is made based solely on the results of the initial analysis, regardless of the type of laboratory or testing firm conducting the analysis. Thus, a result above a gray-zone is classified positive for lead with 95 percent confidence, a result below a gray-zone is classified negative for lead with 95 percent confidence, and a decision needs to be made when the result is in the gray-zone. One possibility for a gray-zone result would be to default to the conservative classification of positive for lead. This rule would be similar to the decision trees of this report for those cases when the initial result and the subsequent definitive result are both in gray-zones.

The advantage of this alternative decision tree format is its simplicity and reduction in time. Increased simplicity follows from the fact that the laboratory or testing firm performing the analysis does not have to determine when to send a sample to a definitive laboratory (testing firm). Furthermore, decisions can always be made based on initial results. The reduction in time follows from not having to wait for the definitive results.

This alternative decision tree format offers a reduction in laboratory analysis cost. However, such a cost reduction is likely trivial compared to the increased costs associated with unnecessary corrective measures that are dictated by the high rate of false positive results. That is, many times a sample that produces a gray-zone result in truth will contain a lead amount that is below the action level.
Concluding the sample is positive for lead can lead to costly corrective measures that would not have been taken if a definitive analysis could have led to the proper conclusion. Always defaulting to conservatively classifying gray-zone results as positive for lead will produce a higher rate of false positives. Similarly, always defaulting to classifying gray-zone results as negative for lead would produce a higher rate of false negatives, which would be unacceptable from a public health and safety standpoint. Higher imprecision (e.g., semi-quantitative and qualitative analyses) leads to wider gray-zones, and subsequently more instances of some default decision needing to be made without an associated confidence level of 95 percent. Defaulting to a definitive laboratory in such instances increases the likelihood that the final decision is made with 95 percent confidence.

As a final note, a hybrid approach to the decision tree format presented in this report and the alternative format discussed in this section might be worth consideration as well. When gray-zone results are observed, this hybrid approach would default to obtaining additional definitive results unless the party of burden (e.g., the property owner) was willing to classify the result as positive for lead and take subsequent corrective action. This way, overall time and cost are reduced by not always requiring a further definitive analysis. Furthermore, the burden of increased false positives is incurred by choice. In summary, the property owner can either pay for the additional definitive analysis or conclude the gray-zone result is positive for lead and pay for any required corrective measures.
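The three ways of handling a gray-zone result discussed in this section can be compared side by side in a short Python sketch. Everything here is illustrative: the function name `classify`, the policy labels, and the returned strings are hypothetical, and the gray-zone limits are supplied by the caller rather than taken from any table in this report.

```python
def classify(result, gray_low, gray_high, policy="definitive",
             owner_accepts_positive=False):
    """Classify a quantitative lead result against a gray-zone.

    policy (hypothetical labels):
      "definitive"   - gray-zone results go to a definitive laboratory
                       (the format used by the decision trees in this report)
      "conservative" - gray-zone results default to positive for lead
                       (the alternative format)
      "hybrid"       - default to a definitive laboratory unless the party
                       of burden chooses to treat the result as positive
    """
    if result > gray_high:
        return "positive"            # above the gray-zone
    if result < gray_low:
        return "negative"            # below the gray-zone
    # The result lies inside the gray-zone (limits included).
    if policy == "definitive":
        return "confirm"
    if policy == "conservative":
        return "positive-by-default"
    if policy == "hybrid":
        return "positive-by-default" if owner_accepts_positive else "confirm"
    raise ValueError(f"unknown policy: {policy!r}")
```

Only the gray-zone branch differs between the policies; results outside the gray-zone are classified identically, which is why the trade-off discussed above depends on how wide the gray-zone is.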
5.2 LABORATORY VERSUS TESTING FIRM DISTINCTION

The NLLAP Laboratory Quality System Requirements (LSQR) identifies minimum requirements for use by accreditation organizations to evaluate laboratories that perform quantitative analytical testing of paint chip, dust, and/or soil samples collected for lead analysis. These requirements include maintaining a data quality system, as well as guidelines on personnel, equipment, sampling methodology, and reporting. Any laboratory that meets the described requirements for lead analysis is described as accredited and is recognized for its ability to produce data of a recognized level of quality.

A testing firm is defined as an organization that may not meet the technical definition of a laboratory, but is still capable of producing data of a definitive-type quality. To become a definitive testing firm, the testing firm should be able to meet all of the requirements of the NLLAP LSQR, with a few possible exceptions. For example, one requirement of a laboratory is that a technical manager must be on staff, with a college degree in chemistry or a related science as well as three years of non-academic laboratory experience. The technical manager function must be held by a laboratory employee and not contracted out. This requirement may be too costly in the case of smaller testing firms. One possibility is to make the technical manager position for testing firms a part-time position, where one technical manager works for several testing firms in a region. The job responsibility of the technical manager would not change, but this would allow the existence of smaller testing firms who perform straightforward lead screening analyses. That is, the burden of employing a full-time professional technical manager would be lifted. On a final note, another possibility would be to contract out the technical manager duties to an outside organization.
5.3 DIFFICULTIES IN ASSESSING PERFORMANCE

In some cases, for example portable XRF technology, laboratory control samples may not be available or relevant. One approach to this problem is to take split samples for screening type analyses. For example, five to ten percent of the analyses performed might require taking two samples from the same location, with one analysis done locally and the other sent to a definitive laboratory for verification. This is feasible for soil, but a potential problem arises for paint analyses. It is not always practical for paint chips to be removed from a residential unit and sent to a laboratory, as the owners of the unit may object due to the destructive nature of the sampling. Another approach is to include a standard reference material (SRM) with an on-site laboratory or testing firm, which can be done with portable XRF technology, and has the advantage of being non-destructive. However, it is not clear whether current SRMs applied to a technology like portable XRF can provide an appropriate characterization of precision and performance.

Finally, from a different perspective, characterizing the performance of a qualitative technology is not the same as that for a quantitative technology. For quantitative techniques, a standard error at an appropriate action level needs to be determined in order to calculate a gray-zone (i.e., 95 percent confidence interval). With qualitative techniques, an indication of positive or negative is provided. For such output, an appropriate tool for assessing performance is the operating characteristic (OC) curve. OC-curves give the probability of a positive indication, as a function of the true lead amount. For a detailed discussion on this issue and a methodology for deriving OC-curves, see (Koyak, et al., 1998).

5.4 REPEATED SAMPLING TO IMPROVE PRECISION

Depending on the technology employed, on-site lead evaluation presents the opportunity to re-sample when faced with an initial gray-zone result.
In this approach, more samples would be collected and analyzed when an initial result does not provide an answer with 95 percent confidence. For example, portable XRF technology offers the capability to obtain multiple measurements in a non-destructive fashion. Similarly, chemical test kits might offer this capability with only minimal impact on the surface being tested. However, if the sample error associated with, say, an XRF measurement is due mostly to the surface being tested, with little error contributed by the instrument's measurement imprecision, then repeated measures will provide little benefit. That is, reducing the instrument measurement error through averaging repeated measures might not reduce potentially more significant components of error, such as error due to multiple layers of paint, error due to an uneven surface, etc. Given that this issue is beyond the scope of this report, performing repeated sampling under a semi-quantitative or qualitative analysis was not included in the decision trees provided in this report.

5.5 ALTERNATIVES TO THE GRAY-ZONES PROVIDED IN THIS REPORT

This section discusses two alternatives to dealing with gray-zones when making decisions based on analytical measurement. The first alternative abandons the concept of gray-zones entirely. The second uses a different method to calculate gray-zones.

5.5.1 An Alternative Approach that Avoids Gray-Zone Calculations

One alternative to using a gray-zone is to accept the instrument reading at face value. For example, if an instrument gray-zone is 0.8 to 1.2 mg/cm2 and a recorded value of 0.9 mg/cm2 is observed, then conclude that lead is not present above the action level, rather than sending the sample to a definitive laboratory for analysis. There are both advantages and disadvantages to this approach. The advantages are that this method saves time and resources. A decision can be reached quickly and cheaply.
The disadvantage is that the probability of an incorrect decision being made will increase. The instrument gray-zone is designed to hold the probability of a false positive reading for unleaded components, or a false negative reading for leaded components, to less than or equal to 5 percent. If values inside the gray-zone are taken at face value, this protection against an incorrect decision is removed.

5.5.2 An Alternative Gray-Zone Calculation

A different method of calculating gray-zones is presented in (EPA 747/R-95/008). This method defines the boundaries of a gray-zone by deriving lower and upper threshold values (XL, XU). These threshold values are determined by a probability model, which assumes that lead levels are log-normally distributed. First, the probability that an XRF reading is below XL while the true lead level is above the action level, divided by the probability of a true lead level being above the action level, is computed. This is the probability of a false negative reading. Then the probability of a false positive reading is computed by a similar method (an XRF reading being above XU while the true lead level is below the action level). Finally, (XL, XU) are chosen to make these two probabilities both less than 5 percent. Gray-zones calculated this way have the following features:

• The 5 percent targets for false positive and false negative classifications are an average across ranges of lead levels less than and greater than the action level, respectively. The probability of a false classification at a fixed lead level slightly above or below the action level is greater than 5 percent.
• The gray-zone is derived with the assumption that true lead levels have a log-normal distribution.
• The calculated gray-zones will be asymmetric, with the portion of the gray-zone above the action level typically being larger than the portion below the action level.
This is a function of the log-normal distribution assumption.

The advantage of this method of calculating gray-zones is that it results in zones that are smaller than what is achieved by using the methods in this report. Hence, laboratories performing semi-quantitative analyses would have to rely on definitive laboratories for confirmation less often in practice. The disadvantages of this method are its complexity and the fact that it is not as conservative as the method presented in this report.

5.6 CONCLUSION

This report develops a field-decision, performance-based model for expanding NLLAP. This model takes the form of decision trees that provide 95 percent confidence in statements made regarding a lead amount relative to an action level. The provided decision trees vary according to the classification of laboratory or testing firm that is conducting the analysis. Ultimately, due to imprecision as represented by the concept of a gray-zone, exact 95 percent protection cannot be provided. However, in most cases, the decision trees provided in this report give near 95 percent protection against making incorrect decisions. For those cases when final results are inconclusive (i.e., in the gray-zone), it is recommended that a conservative conclusion be made, namely that the lead in the sample is above the action level. This approach protects those potentially at risk from a lead hazard. Finally, this report raises several issues critical to the expansion of NLLAP, which are yet to be resolved and require further careful consideration.

6.0 REFERENCES

(Ashley, et al., 1998-1) Ashley, K., Song, R., Esche, C., Schlecht, P., Baron, P., and Wise, T., "Ultrasonic Extraction and Portable Anodic Stripping Voltammetric Measurement of Lead in Paint, Dust Wipes, Soil, and Air: An Interlaboratory Evaluation", U.S.
Department of Health and Human Services, Centers for Disease Control and Prevention, National Institute for Occupational Safety and Health, Cincinnati, Ohio, 1998.

(Ashley, et al., 1998-2) Ashley, K., Hunter, M., Tail, L., Dozier, J., Seaman, J., and Berry, P., "Field Investigation of On-Site Techniques for the Measurement of Lead in Paint Films", U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, National Institute for Occupational Safety and Health, Cincinnati, Ohio, 1998.

(Ashley, 1995) Ashley, K., "Ultrasonic Extraction and Field-Portable Anodic Stripping Voltammetry of Lead from Environmental Samples", Electroanalysis, Vol. 7, No. 12, pp. 1189-1190, 1995.

(Koyak, et al., 1998) Koyak, R., Schmehl, R., Cox, D., DeWalt, F., Haugen, M., Schwemberger, J., and Scalera, J., "Statistical Models for the Evaluation of Portable Lead Measurement Technologies - Part I: Chemical Test Kits", Journal of Agricultural, Biological, and Environmental Statistics, Vol. 3, No. 4, pp. 451-465, 1998.

(White and Clapp, 1996) White, K., and Clapp, A., "Use of a field-portable anodic stripping voltammeter to determine the lead concentration of TCLP extracts", American Environmental Laboratory, October 1996.

(EPA 747/R-95/002b) U.S. EPA, "A Field Test of Lead-Based Paint Testing Technologies: Technical Report", EPA Report No. 747/R-95/002b, U.S. Environmental Protection Agency (EPA) Office of Pollution Prevention and Toxics, Washington, D.C. 20460, 1995.

(EPA 600/R-95/093) U.S. EPA, "Evaluation of the Performance of Reflectance and Electrochemical Technologies for the Measurement of Lead in Characterized Paints, Bulk Dusts, and Soils", EPA Report No. 600/R-95/093, U.S. Environmental Protection Agency (EPA) National Exposure Research Laboratory, Research Triangle Park, North Carolina, 1996.

(EPA 747/R-95/008) U.S. EPA, "Methodology for XRF Performance Characteristics Sheets", EPA Report No. 747/R-95/008, U.S.
Environmental Protection Agency (EPA) Office of Pollution Prevention and Toxics, Washington, D.C., 1996.

(EPA LSQR Rev. 2.0) U.S. EPA, "National Lead Laboratory Accreditation Program: Laboratory Quality System Requirements (LSQR) Revision 2.0", U.S. Environmental Protection Agency (EPA) Office of Pollution Prevention and Toxics, Washington, D.C. 20460, 1996.

(EPA 600/R-97/145) U.S. EPA, "Environmental Technology Verification Report - Field Portable X-ray Fluorescence", EPA Report No. 600/R-97/145, U.S. Environmental Protection Agency (EPA) Office of Research and Development, Washington, D.C., 1998.

(ASTM E456-96) ASTM Standard Terminology E456-96, "Terminology Relating to Quality and Statistics."

(ASTM E1187-96a) ASTM Standard Terminology E1187-96a, "Terminology Relating to Laboratory Accreditation."

(ASTM E1605-95a) ASTM Standard Terminology E1605-95a, "Terminology Relating to Abatement of Hazards from Lead-Based Paint in Buildings and Related Structures."

APPENDIX A: GLOSSARY

There are numerous concepts that are referred to throughout this report. In order to understand the issues being presented, it is critical that such concepts are well defined and understood. As such, several key definitions are given below.

Accuracy: The closeness of agreement between a test result and an accepted reference value.

Action Level: A threshold lead level to which observed measurements are to be compared (e.g., the federal threshold of 1.0 mg/cm2 for lead in paint).

Analysis, Definitive: A quantitative analysis for the amount of lead in paint, dust, or soil with an associated accuracy satisfying the following constraint: The 95 percent coverage probability (confidence interval) of the analysis lies within ± 20 percent of the action level of concern.

Analysis, Semi-quantitative: A quantitative analysis for the amount of lead in paint, dust, or soil that does not satisfy the definitive analysis constraint.
Analysis, Qualitative: An analysis that does not provide quantitative information regarding the amount of lead in paint, dust, or soil, but does provide an indication regarding a presence or absence of lead above or below a specified concentration (e.g., a chemical test kit).

Bias: The difference between the expectation of the test result and an accepted reference value that is caused by systematic error.

Decision tree: A flow diagram providing guidance for making decisions, with a required level of confidence, regarding measured lead levels as compared to an action level.

Laboratory Accreditation: Formal recognition that a testing laboratory is competent to carry out specific tests or specific types of tests.

Laboratory or Testing Firm, Definitive: A laboratory or testing firm performing quantitative analyses that meets the defined definitive analysis criterion.

Laboratory or Testing Firm, Semi-quantitative: A laboratory or testing firm performing quantitative analyses that are not definitive in accuracy.

Laboratory or Testing Firm, Qualitative: A laboratory or testing firm performing qualitative analyses.

Note that laboratories might be of fixed-site, mobile facility, or field operation in nature (see NLLAP LSQR 2.0 for definitions). The distinction between laboratory and testing firm in this document is the requirement of a laboratory to employ a technical manager. No such requirement is made of testing firms.

Negative Screen: A qualitative analysis technique sensitive enough to provide an acceptable amount of protection (i.e., 95 percent confidence) against false negative errors.

Positive Screen: A qualitative analysis technique sensitive enough to provide an acceptable amount of protection (i.e., 95 percent confidence) against false positive errors.

Precision: Degree of mutual agreement between individual test results obtained under stipulated conditions.
Proficiency Testing: Determination of laboratory testing performance by means of interlaboratory test comparisons.

Repeatability: Precision under the following conditions: independent test results are obtained with the same method on identical test materials in the same laboratory by the same operator using the same equipment within short periods of time.

Reproducibility: Precision under the following conditions: test results are obtained with the same method on identical test items in different laboratories with different operators using different equipment.

APPENDIX B: DETAILS FOR DATA ANALYSES CONDUCTED

The purpose of this appendix is to provide further detail regarding the statistical modeling and estimation of precision corresponding to different technologies and laboratories (testing firms). Included are details about statistical models that were used to determine estimates of precision at different action levels for paint, dust wipes, and soil.

B.1 ASSESSING THE PRECISION OF NLLAP RECOGNIZED LABORATORIES

ELPAT data for rounds 14 through 21 (N=32 means), as described in Section 2.1 of the main body of the report, were used to assess laboratory precision for NLLAP recognized laboratories. Means were treated as "true" values and the standard errors were assumed to be a linear function of the truth. Ordinary least squares regression was used to fit the following functional relationship: $\sigma = \beta_0 + \beta_1 \mu$, where $\sigma$ represents precision and $\mu$ represents the true lead level. The results of analyses for paint, dust, and soil are given in Table B.1 below.

Notice that the results for paint correspond to N=28 pairs of means and standard deviations, instead of N=32. Four observations with means well beyond the action level for paint of 1.0 mg/cm2 were removed to more closely satisfy the assumed linear relationship between the ELPAT means and standard errors.
Removal of these outliers was justified since these data did not represent NLLAP accredited laboratory precision near the action level. Furthermore, a more accurate portrayal of the linear relationship near the action level was obtained upon their removal. Satisfactory linear relationships were observed for dust and soil means with the removal of only one outlying data point in each data set.

Table B.1 Regression parameters for an NLLAP accredited laboratory's precision as a function of the true lead level. Model: $\sigma = \beta_0 + \beta_1 \mu$

Medium            $\beta_0$ (se)    95% CI for $\beta_0$    $\beta_1$ (se)    95% CI for $\beta_1$
Paint (n = 28)    0.002 (0.003)     [-0.001, 0.004]         0.063 (0.001)     [0.062, 0.064]
Dust (n = 31)     3.80 (1.218)      [1.413, 6.186]          0.066 (0.002)     [0.062, 0.070]
Soil (n = 31)     3.64 (1.180)      [1.322, 5.948]          0.052 (0.001)     [0.050, 0.054]

Descriptive statistics for the full and reduced data sets are provided in Table B.2. This table shows that the range of available data covers the action levels of concern, so use of the linear regression model to estimate standard deviations of measurement at the action levels is a valid approach.

Table B.2 Descriptive statistics for paint, dust, and soil sample means for full and reduced data sets.

Statistic    Full Paint    Reduced Paint    Full Dust    Reduced Dust    Full Soil    Reduced Soil
N            32            28               32           31              32           31
Mean         1.813         1.33             465.6        452.0           889          814
Minimum      0.0306        0.0306           29           29              34.8         34.8
Maximum      8.8           8.1              1498.5       1498.5          3190         2788

The parameter estimates in Table B.1 can be used to calculate average gray-zone values representing NLLAP recognized laboratories. The model formula is used with the action level to calculate a precision value. Precision equals $\beta_0$ plus $\beta_1$ times the action level. The gray-zone is further calculated as the action level, plus or minus 1.96 times the precision.
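The gray-zone calculation just described can be written out as a short Python sketch. The intercept and slope values are the Table B.1 estimates; the dictionary and function names are hypothetical, and the only action level used in the example is the 1.0 mg/cm2 paint threshold quoted in this appendix.

```python
# Table B.1 estimates as (intercept, slope) for each medium.
TABLE_B1 = {
    "paint": (0.002, 0.063),
    "dust": (3.80, 0.066),
    "soil": (3.64, 0.052),
}


def gray_zone(medium, action_level, z=1.96):
    """Return (precision, lower limit, upper limit) at the action level.

    precision  = intercept + slope * action_level
    gray-zone  = action_level +/- z * precision
    """
    intercept, slope = TABLE_B1[medium]
    precision = intercept + slope * action_level
    half_width = z * precision
    return precision, action_level - half_width, action_level + half_width


precision, low, high = gray_zone("paint", 1.0)
print(f"precision = {precision:.3f}, gray-zone = [{low:.4f}, {high:.4f}]")
```

For paint at 1.0 mg/cm2 this gives a precision of 0.065 and a gray-zone of roughly 0.873 to 1.127 mg/cm2, a half-width of about 13 percent of the action level. Values for dust and soil depend on the action levels chosen, which the caller must supply.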
Table 2.1 of the main report presents a comparison between the definitive laboratory gray-zone requirement and the estimated gray-zone for NLLAP laboratories based on the ELPAT rounds 14 through 21 data. Based on Table 2.1, the definitive-type performance of the NLLAP recognized laboratories follows from the fact that the NLLAP recognized gray-zones are all narrower than the definitive gray-zone requirements, except for floor dust.

Figure B.1 below shows the ELPAT data corresponding to analyses of paint, dust, and soil. The vertical axis corresponds to the overall standard error associated with an analysis conducted by an NLLAP recognized laboratory. The horizontal axis contains the associated mean. The graphs shown indicate that an assumption of a linear relationship between laboratory standard errors and mean lead concentration indeed is reasonable.

[Three scatter plots: Paint STD vs. Paint Mean (Iteration Three), Dust STD vs. Dust Mean (Iteration Two), and Soil STD vs. Soil Mean (Iteration Two). Horizontal axes: mean of paint samples, mean of dust samples, and mean of soil samples, respectively.]

Figure B.1. Relationship between overall standard errors and means for NLLAP recognized laboratory analyses of paint, dust, and soil.

B.2 ASSESSING THE PRECISION OF PORTABLE XRF TECHNOLOGY

In 1993, a study was conducted by the U.S. Environmental Protection Agency (EPA) and the U.S. Department of Housing and Urban Development (HUD) to collect information necessary in order to establish federal guidelines on testing for lead in paint (EPA 747/R-95/002b). (After the passage of Title X, Section 1017 of the Residential Lead-Based Paint Hazard Reduction Act of 1992, it became clear there was not enough information from existing studies to implement Title X.) X-ray fluorescence (XRF) instruments were one of two field technologies analyzed.
In March and April of 1993 a pilot study was conducted in Louisville, Kentucky. From July through October of 1993, a full study was conducted in Denver, Colorado and Philadelphia, Pennsylvania. "A Field Test of Lead-Based Paint Testing Technologies: Technical Report" was released in May 1995 and provides a complete technical report of this study. The overall study objective was "to collect information about field measurement methodologies sufficient to allow EPA and HUD to establish guidance and protocols for lead hazard identification and evaluation."

Included in "A Field Test of Lead-Based Paint Testing Technologies: Technical Report" is the statistical model that was used to describe the relationship between XRF measurements and the lead level as analyzed by inductively coupled plasma-atomic emission spectrometry (ICP) (see Section 6.4.2). ICP is a method commonly used in laboratories to analyze lead in paint and is one of the techniques recommended for confirmation testing in the HUD Guidelines (United States Department of Housing and Urban Development (1990), "Lead-Based Paint: Interim Guidelines for Hazard Identification and Abatement in Public and Indian Housing," Office of Public and Indian Housing, Washington, D.C. 20410). This was one of many motivations for using ICP in the model (see Sections 3.3.1.1 and 6.4.2).

The statistical model is made up of two parts: a response component and a standard deviation (SD) component. The response component of the model mathematically describes the mean XRF reading at a particular level of lead. The linear function that is used to estimate the XRF response is described by the equation a + b(Pb) where:

• Pb represents the lead level in mg/cm2 (as measured by ICP),
• a is the intercept and is compared with 0.0 to determine if the XRF readings are unbiased in the absence of lead, and
• b is the slope and is compared with 1.0 to determine if the instrument responded proportionately to changes in the lead level.
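The response component and the two comparisons in the list above can be illustrated with a minimal Python sketch. The intercept and slope are the Lead Analyzer K-shell brick estimates quoted in this appendix; the standard errors are illustrative values only, the function names are hypothetical, and dividing the difference by a standard error is just one rough way to read "compared with", not necessarily the test used in the cited technical report.

```python
def predicted_xrf_mean(pb, a, b):
    """Response component: mean XRF reading at ICP lead level pb (mg/cm2)."""
    return a + b * pb


def rough_z(estimate, std_error, reference):
    """Distance of an estimate from a reference value, in standard errors.

    An assumed, approximate comparison; not the report's stated procedure.
    """
    return (estimate - reference) / std_error


a, b = 0.084, 0.703          # Lead Analyzer K-shell, brick
se_a, se_b = 0.023, 0.055    # illustrative standard errors (assumption)

mean_at_082 = predicted_xrf_mean(0.82, a, b)   # about 0.66 mg/cm2
z_intercept = rough_z(a, se_a, 0.0)            # intercept versus 0.0
z_slope = rough_z(b, se_b, 1.0)                # slope versus 1.0
```

Under these assumed standard errors the intercept sits a few standard errors above 0.0 and the slope several standard errors below 1.0, which is the kind of departure the two comparisons are meant to flag.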
For example, a Lead Analyzer K-shell device was used to analyze paint chip samples from brick. A linear model was fitted to the data with the parameters a = 0.084 and b = 0.703. Therefore, when the lead level of a paint chip sample is analyzed as 0.82 mg/cm² by ICP, the mean lead level for the XRF instrument is predicted to be 0.084 + 0.703(0.82) = 0.66 mg/cm².

The SD component of the model describes the variation in XRF readings at a particular level of lead. The non-linear function that is used to estimate the SD of the reading is described by the equation

$[c + d(\mathrm{Pb})^2]^{1/2}$

where:
• Pb represents the lead level in mg/cm² (as measured by ICP),
• c is the variance of XRF readings at a lead level of 0.0 mg/cm², and
• d is a measure of homogeneity of variance and is compared with 0.0 to determine whether variability remains the same as the lead level increases.

For example, a model was fitted to the Lead Analyzer K-shell brick data with parameters c = 0.030 and d = 0.026. Therefore, when the lead level of a paint chip sample is analyzed as 0.82 mg/cm², the standard deviation for the XRF instrument is predicted to be [0.030 + 0.026(0.82)²]^(1/2) = 0.22 mg/cm².

Table B.3 contains a listing of the four model parameters a, b, c, and d for each of the six field portable devices tested in this study (Lead Analyzer, MAP-3, Microlead I, X-MET 880, XK-3, and XL). Since the Lead Analyzer and MAP-3 instruments could be operated by using either K-shell or L-shell X-rays, results were recorded twice for each of these instruments. Cells marked "--" indicate samples for which fitting a model was not deemed appropriate.

TABLE B.3.
Estimated regression parameters for the response component and the SD component

                                     SAMPLE   MODEL PARAMETERS (standard errors in parentheses)
DEVICE                 SUBSTRATE     SIZE     a                  b                c**                d
Lead Analyzer K-shell  Brick (B)      93      0.084 (0.023)      0.703 (0.055)    0.030 (0.006)      0.026 (0.017)
                       Concrete (C)  218      0.017 (0.012)      0.972 (0.054)    0.013 (0.002)      0.124 (0.032)
                       Drywall (D)   111     -0.018 (0.009)      1.196 (0.115)    0.006 (0.001)      0.120 (0.081)
                       Metal (M)     188      0.063 (0.021)      0.958 (0.055)    0.034 (0.006)      0.132 (0.035)
                       Plaster (P)   218      0.030 (0.014)      0.861 (0.045)    0.019 (0.002)      0.037 (0.016)
                       Wood (W)      351      0.013 (0.007)      1.266 (0.044)    0.007 (0.001)      0.180 (0.033)
Lead Analyzer L-shell  Brick (B)      92      0.038 (0.007)      0.036 (0.006)    0.001 (0.001)      0.0004 (0.0002)
                       Concrete (C)  217      0.009 (0.001)      0.152 (0.012)    0.0001 (0.0002)    0.010 (0.002)
                       Drywall (D)   113     -0.006 (0.001)      0.302 (0.029)    0.0000 (0.0000)    0.029 (0.008)
                       Metal (M)     145      0.013 (0.002)      0.196 (0.023)    0.0002 (0.00006)   0.032 (0.007)
                       Plaster (P)   213      0.002 (0.001)      0.201 (0.014)    0.0001 (0.0000)    0.019 (0.003)
                       Wood (W)      337     -0.019 (0.001)      0.279 (0.016)    0.0002 (0.0000)    0.034 (0.005)
MAP-3 K-shell          Brick (B)     185     -0.599 (0.079)      0.797 (0.045)    0.857 (0.103)      0.012 (0.014)
                       Concrete (C)  436     -0.661 (0.072)      1.212 (0.123)    0.807 (0.085)      0.182 (0.094)
                       Drywall (D)   222      0.014 (0.040)      0.863 (0.209)    0.141 (0.018)      -0-
                       Metal (M)     374      0.328 (0.039)      1.098 (0.071)    0.140 (0.019)      0.159 (0.049)
                       Plaster (P)   443     -0.684 (0.065)      1.137 (0.102)    0.657 (0.069)      0.094 (0.067)
                       Wood (W)      689     -0.052 (0.036)      1.410 (0.063)    0.239 (0.025)      0.203 (0.051)
MAP-3 L-shell          Brick (B)     183      0.012 (0.029)      0.109 (0.016)    0.055 (0.010)      0.003 (0.002)
                       Concrete (C)  433     -0.141 (0.008)      0.201 (0.025)    0.008 (0.001)      0.019 (0.007)
                       Drywall (D)   224     -0.115 (0.005)      0.498 (0.060)    0.002 (0.0003)     0.059 (0.021)
                       Metal (M)     371      0.044 (0.037)      0.269 (0.055)    0.133 (0.018)      0.076 (0.024)
                       Plaster (P)   426     -0.123 (0.010)      0.169 (0.029)    0.008 (0.001)      0.015 (0.007)
                       Wood (W)      663     -0.079 (0.008)      0.425 (0.030)    0.008 (0.001)      0.068 (0.014)
Microlead I            Brick (B)     143      --                 --               --                 --
                       Concrete (C)  218      0.283 (0.051)      1.094 (0.106)    0.375 (0.041)      0.143 (0.071)
                       Drywall (D)   162      0.023 (0.031)      1.194 (0.175)    0.115 (0.013)      -0-
                       Metal (M)     186      0.351 (0.060)      1.100 (0.075)    0.380 (0.053)      0.088 (0.050)
                       Plaster (P)   415      0.010 (0.049)      1.068 (0.086)    0.265 (0.034)      0.118 (0.049)
                       Wood (W)      348      0.001 (0.045)      1.424 (0.087)    0.389 (0.040)      0.448 (0.105)
X-MET 880              Brick (B)      72      --                 --               --                 --
                       Concrete (C)  197      0.045 (0.003)      0.064 (0.013)    0.001 (0.0001)     0.004 (0.002)
                       Drywall (D)   111      0.038 (0.002)      0.223 (0.031)    0.0002 (0.0003)    0.018 (0.006)
                       Metal (M)     175      0.112 (0.017)      0.120 (0.032)    0.020 (0.003)      0.037 (0.008)
                       Plaster (P)   210      0.048 (0.003)      0.072 (0.013)    0.0005 (0.0001)    0.006 (0.002)
                       Wood (W)      334      0.042 (0.003)      0.259 (0.025)    0.0006 (0.0001)    0.083 (0.014)
XK-3                   Brick (B)     143      0.861 (0.064)      1.016 (0.251)    0.359 (0.043)      -0-
                       Concrete (C)  191      1.083 (0.063)      1.668 (0.227)    0.404 (0.043)      -0-
                       Drywall (D)   112     -0.327 (0.040)      1.234 (0.254)    0.127 (0.019)      0.189 (0.363)
                       Metal (M)     185      0.451 (0.058)      1.405 (0.140)    0.267 (0.037)      0.852 (0.192)
                       Plaster (P)   215      0.535 (0.049)      1.035 (0.112)    0.298 (0.035)      0.103 (0.098)
                       Wood (W)      342     -0.065 (0.035)      1.418 (0.073)    0.236 (0.024)      0.235 (0.064)
XL                     Brick (B)      90      0.109 (0.016)*     0.183 (0.033)    0.016 (0.003)      0.018 (0.008)
                       Concrete (C)  213      0.066 (0.009)*     0.391 (0.035)    0.008 (0.001)      0.051 (0.013)
                       Drywall (D)   113      0.082 (0.019)      0.289 (0.109)    0.029 (0.004)      0.027 (0.035)
                       Metal (M)     187      0.074 (0.017)*     0.546 (0.050)    0.020 (0.003)      0.132 (0.025)
                       Plaster (P)   209      0.081 (0.011)*     0.405 (0.037)    0.010 (0.001)      0.017 (0.007)
                       Wood (W)      191      0.049 (0.017)*     0.546 (0.037)    0.008 (0.002)      0.091 (0.01)

* Nonparametric estimates reported. Standard error estimates obtained by bootstrapping.
** Estimates based on sample summary statistics for ICP < 0.1 mg/cm².

B.3 ASSESSING THE PRECISION OF UE/ASV TECHNOLOGY

The purpose of this section is to provide further detail regarding the statistical modeling of lead concentration readings taken by instruments using UE/ASV technology. Included are details about statistical models that were used to determine estimates of precision at different action levels for paint, dust, and soil.
B.3.1 EPA Evaluation of the PaceScan 2000

As described in Section 3.2.1, the EPA evaluation (EPA 600/R-95/093) analyzed UE/ASV performance at various concentrations of paint, bulk dust, and soil by taking repeated measurements of lead levels from reference samples of known concentration (RTI Core Materials). For paint, 6 samples were analyzed using the PaceScan 2000 low range, and 7 samples were analyzed using the instrument's high range (5 of these samples overlapped and were used for both the low and high ranges). Seven samples each were used for bulk dust and soil analysis. Because the original data are presented in that report, it is possible to regress the measured concentration on the true sample concentration and obtain an estimate of the variance of measurement at lead action levels. Lead concentrations are reported in units of µg/gram sample in this study, which can be converted to percent lead using the relationship 1 percent lead = 10,000 µg/g. A graph showing the relationship of measured concentration to true concentration for paint samples is shown below:

[Figure: Regression, Reference Concentration vs. Observed Reading. Horizontal axis: Reference Concentration. Source: EPA 600/R-95/093, Paint Samples - PaceScan 2000, Low Setting.]

The action level for paint occurs at 0.5 percent lead, which is equivalent to 5,000 µg/g.

Some problems arise when trying to fit a regression line to these data. The first is that the variance is not uniform: the variance of observed sample readings is larger for high concentrations than for low concentrations. The problem of non-constant variance was addressed by a log transformation of the data, regressing log observed concentration on log true concentration. The second issue is that the data do not show a linear relationship. It was found that including a quadratic term in the model was appropriate.
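A model that is quadratic in the log of the reference concentration, with normally distributed errors on the log scale, implies log-normally distributed readings on the original scale. The sketch below is an illustration rather than the analysis code used for this report. It shows how a fit of that kind could be turned into a mean and standard deviation at a chosen reference concentration using the standard log-normal moment formulas. The function names and the coefficient values b0, b1, b2 are hypothetical placeholders, because the fitted coefficients are not listed in this appendix; only the variance 0.00450 is taken from the MSE reported for the low-setting paint model.

```python
import math


def log_scale_mean(t, b0, b1, b2):
    """Mean of log(reading) at reference concentration t (quadratic in log t)."""
    x = math.log(t)
    return b0 + b1 * x + b2 * x * x


def lognormal_mean_sd(mu, sigma2):
    """Mean and SD of Y when log(Y) is normal with mean mu and variance sigma2."""
    mean = math.exp(mu + sigma2 / 2.0)
    variance = (math.exp(sigma2) - 1.0) * math.exp(2.0 * mu + sigma2)
    return mean, math.sqrt(variance)


# HYPOTHETICAL coefficients: a log reading that tracks log(T) exactly.
b0, b1, b2 = 0.0, 1.0, 0.0
sigma2 = 0.00450  # MSE reported for the log-scale paint model (low setting)

mu = log_scale_mean(5000.0, b0, b1, b2)  # paint action level, ug/g
mean, sd = lognormal_mean_sd(mu, sigma2)
print(f"mean = {mean:.0f} ug/g, SD = {sd:.0f} ug/g, relative SD = {sd / mean:.3f}")
```

Because the coefficients are placeholders, the SD printed here is not expected to match the value reported for the PaceScan 2000. The relative SD, however, depends only on the log-scale variance and is about 6.7 percent for a variance of 0.00450.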
A graph showing the relationship between log true concentration and log observed concentration is shown below:

[Figure: Log Reference Concentration vs. Log Observed Reading. Horizontal axis: Log Reference Concentration. Source: EPA 600/R-95/093, Paint Samples - PaceScan 2000, Low Setting.]

This model provided a good fit for the data (R² = 0.9950) and was used for further analysis. Mean squared error (MSE) is equal to 0.00450. It is possible to use a transformation to estimate the standard error at the action level for paint of 5,000 µg/gram sample. This transformation converts the mean and variance estimated on the log scale into a variance on the original scale of the log-normally distributed readings. Letting Y = observed concentration and T = true concentration, the above regression model assumes that

$\log(Y \mid T) \sim N(\mu, \sigma^2)$, where $\mu = \beta_0 + \beta_1 \log T + \beta_2 (\log T)^2$,

and $\sigma^2$ is estimated by the MSE. To estimate the variance of Y|T instead of log(Y|T), the log-normal to normal transformation can be used:

$\mathrm{Var}(Y \mid T) = [\exp(\sigma^2) - 1]\,\exp(2\mu + \sigma^2)$

Applying this transformation gives an estimated standard error of 390 µg/g at the action level of 5,000 µg/g. Similar analyses were done for paint using the high setting, as well as for dust and soil.

B.3.2 Using Linear Extrapolation to Estimate Standard Deviations at the Action Levels of Concern

As described in Section 3.2.2, an interlaboratory evaluation studied the reproducibility and reliability of UE/ASV measurements by comparing measurements made at 9 different laboratories (Ashley, et al., 1998-1). Three different samples of known lead concentration were provided to each lab, and two measurements were made on each sample per lab, for a total of 18 measurements on any given sample. For paint, one of the samples provided had a lead concentration at the paint action level, so the reported standard error of the measurements for this sample was used. For dust wipes, samples were not provided at the same concentration as the action levels.
(The samples had concentrations of roughly 100, 200, and 900 µg/wipe.) To estimate standard deviations of measurement at the applicable action levels, a linear projection from the known standard deviations at the sampled concentrations was used. Standard error tends to increase linearly with lead concentration, as illustrated in the graph below. Although only three data points are shown here, this linear relationship is typical of all studies considered.

[Figure: Reference Concentration vs. Standard Deviation. Horizontal axis: Reference Concentration (100 to 900); vertical axis: Standard Deviation.]

Standard error at the 50, 250, and 800 µg/wipe action levels was determined by using the regression line pictured above.

For soil samples, readings were taken at 500 and 3,000 µg/gram sample, whereas the action level of interest is 2,000 µg/gram. Standard error of readings at the action level was estimated using the same linear projection method as was used for dust wipes. A similar linear interpolation procedure was used to estimate standard deviations at action levels in Section 3.2.4.

-------

INTRODUCTION

Programmatic need
• Expand and redesign the current NLLAP.

Study objectives
• Evaluate lead analysis capability of laboratories/testing firms using portable XRF, UE/ASV, and chemical test kits.
• Construct decision models for different categories of laboratories/testing firms to be covered under an expanded NLLAP.

Battelle NLLAP Expansion

-------

THE BOTTOM LINE

The motivation in the approach to NLLAP expansion is to develop a system from which correct decisions (regarding environmental samples and measurements) are made 95% of the time.

To this end, define a gray-zone as an action level of concern ± 2 × laboratory precision.

A gray-zone is to be interpreted as the area around an action level for which a decision cannot be made with 95% confidence.
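The gray-zone definition above can be sketched in a few lines. This is an illustrative sketch, not a procedure prescribed by the report. The function names are hypothetical, the lower limit is floored at 0 as in the XRF gray-zone tables of the main report, and treating a measurement that falls exactly on a boundary as "within" is an assumption. The first example uses the 390 µg/g standard error estimated in Section B.3.1 at the 5,000 µg/g paint action level; the second uses the SD component of the XRF model from Section B.2 with the Lead Analyzer K-shell brick values c = 0.030 and d = 0.026 at 1.0 mg/cm².

```python
import math


def gray_zone(action_level, precision):
    """Return (lower, upper) = action level -/+ 2 x precision, floored at 0."""
    half_width = 2.0 * precision
    return max(0.0, action_level - half_width), action_level + half_width


def position(measurement, action_level, precision):
    """Locate a measurement relative to the gray-zone: below, within, or above."""
    lower, upper = gray_zone(action_level, precision)
    if measurement < lower:
        return "below"
    if measurement > upper:
        return "above"
    return "within"  # boundary values are counted as within (assumption)


# UE/ASV paint example: action level 5,000 ug/g, standard error 390 ug/g.
paint_zone = gray_zone(5000.0, 390.0)

# XRF example: SD = sqrt(c + d * Pb^2) at Pb = 1.0 mg/cm^2.
xrf_sd = math.sqrt(0.030 + 0.026 * 1.0 ** 2)
xrf_zone = gray_zone(1.0, xrf_sd)

print(paint_zone)
print(tuple(round(v, 2) for v in xrf_zone))
print(position(4000.0, 5000.0, 390.0), position(5500.0, 5000.0, 390.0))
```

A wider gray-zone means more measurements fall in the region where no decision can be made with 95% confidence, which is what separates semi-quantitative from definitive performance.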
-------

THREE DISTINCT ANALYSES

Definitive: A quantitative analysis with relative precision < 10% at the action level of concern.

Semi-Quantitative: A quantitative analysis that does not meet the definitive requirement.

Qualitative: An analysis that does not provide quantitative information, only a qualitative indication for lead (e.g., use of a chemical test kit).

-------

SOME FINDINGS - QUANTITATIVE

As expected, NLLAP-recognized laboratories, on average, perform at the definitive level (ELPAT rounds 14-21 data).

Analyses employing UE/ASV technology demonstrate both definitive and semi-quantitative levels of performance (EPA evaluation of PaceScan 2000, NIOSH/Ashley studies).

Analyses employing portable XRF technology most often perform at the semi-quantitative level (EPA field study, NIOSH/Ashley study, PCSs).

-------

SOME FINDINGS - QUALITATIVE

First, define two types of qualitative analysis
• Negative screen: An analysis sensitive enough to provide 95% protection against false negative errors.
• Positive screen: An analysis specific enough to provide 95% protection against false positive errors.

Then, it was observed that (EPA field study)
• Analyses using chemical test kits tended to demonstrate the sensitivity of a negative screen.
• Analyses using chemical test kits typically lack the specificity necessary for use as a positive screen.

-------

A MODEL FOR FIELD DECISIONS

Consider decision trees for drawing conclusions in the field.

In order to achieve the stated goal of 95% confidence in decision making, decision trees
• are performance-based and therefore
• depend on the type of analysis employed.
-------

DEFINITIVE DECISION MAKING

Sample Analyzed by Definitive Laboratory
• Observed Measurement Below Definitive Laboratory gray-zone: Conclude Lead is Below Action Level with 95% Confidence
• Observed Measurement Within Definitive Laboratory gray-zone: Conclude Lead is Above Action Level (conservative approach)
• Observed Measurement Above Definitive Laboratory gray-zone: Conclude Lead is Above Action Level with 95% Confidence

-------

SEMI-QUANTITATIVE DECISION MAKING

Sample Analyzed by Semi-Quantitative Laboratory
• Observed Measurement Below Semi-Quantitative Laboratory gray-zone: Conclude Lead is Below Action Level with 95% Confidence
• Observed Measurement Within Semi-Quantitative Laboratory gray-zone: Send Sample to Definitive Laboratory for Confirmation
• Observed Measurement Above Semi-Quantitative Laboratory gray-zone: Conclude Lead is Above Action Level with 95% Confidence

-------

QUALITATIVE DECISION MAKING

Sample Analyzed by Qualitative Laboratory Using a Negative Screen
• Positive Result: Send Sample to Definitive Laboratory for Quantitative Confirmation
• Negative Result: Conclude Lead is Below Action Level with 95% Confidence

-------

SOME OPEN ISSUES

• Whether or not to default to a definitive laboratory for decision making
• Laboratory vs. testing firm distinction
• Difficulties in assessing performance (e.g., use of split samples or standard reference materials for XRF technology)
• Repeated sampling to improve precision (e.g., the average of multiple non-destructive XRF shots)
• Ignoring/avoiding gray-zone calculations
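The three decision trees presented earlier (definitive, semi-quantitative, and qualitative using a negative screen) can be written as a small lookup table, which makes the dependence on the type of analysis explicit. This is a minimal sketch under stated assumptions: the dictionary keys, outcome labels, and function name are hypothetical, and the outcome of a quantitative analysis is assumed to have already been located as below, within, or above that laboratory's gray-zone.

```python
# Decision trees keyed by type of analysis, then by observed outcome.
# Keys and labels are illustrative; the conclusions follow the three trees.
DECISION_TREES = {
    "definitive": {
        "below": "conclude lead is below action level with 95% confidence",
        "within": "conclude lead is above action level (conservative approach)",
        "above": "conclude lead is above action level with 95% confidence",
    },
    "semi-quantitative": {
        "below": "conclude lead is below action level with 95% confidence",
        "within": "send sample to definitive laboratory for confirmation",
        "above": "conclude lead is above action level with 95% confidence",
    },
    "qualitative-negative-screen": {
        "positive": "send sample to definitive laboratory for quantitative confirmation",
        "negative": "conclude lead is below action level with 95% confidence",
    },
}


def decide(analysis_type, outcome):
    """Return the field decision for an analysis type and observed outcome."""
    try:
        return DECISION_TREES[analysis_type][outcome]
    except KeyError:
        raise ValueError(
            f"no decision defined for {analysis_type!r} with outcome {outcome!r}"
        ) from None


print(decide("semi-quantitative", "within"))
print(decide("qualitative-negative-screen", "negative"))
```

Only the definitive tree resolves every outcome in the field; the other two refer some samples to a definitive laboratory, which relates to the first open issue listed above.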