United States Environmental Protection Agency
Atmospheric Research and Exposure Assessment Laboratory
Research Triangle Park NC 27711

Research and Development    EPA/600/S3-89/015    Aug. 1989

EPA Project Summary

Precision and Accuracy Assessments for State and Local Air Monitoring Networks - 1987

Jack C. Suggs

Precision and accuracy data obtained from state and local agencies during 1987 are analyzed. Pooled site variances and average biases, which are quantities relevant to both precision and accuracy determinations, are statistically compared within and between states to assess the overall effectiveness and consistency in the application of various quality assurance programs. Individual site results are evaluated for consistent performance throughout the year. Reporting organizations, states, and regions that demonstrate consistent precision and accuracy data as the result of effectively administered quality assurance programs are identified. This information is intended as a guide for identifying problem areas, for taking corrective action to improve the effectiveness of quality assurance programs, and for supporting more knowledgeable decisions concerning attainment status with regard to ambient air quality standards. An approach to dealing with accuracy data for individual sites is presented, and an alternative sampling design for generating precision and accuracy data is discussed.

This Project Summary was developed by EPA's Atmospheric Research and Exposure Assessment Laboratory, Research Triangle Park, NC, to announce key findings of the research project that is fully documented in a separate report of the same title (see Project Report ordering information at back).

Introduction

In accordance with revisions to Appendix A, 40 CFR Part 58, promulgated March 19, 1986, site-specific precision and accuracy data were submitted as actual test results for the first full year beginning January 1987. The availability of individual site data and the opportunity to assess the performance of specific instruments were cited as a way to improve the usefulness of the data quality estimates associated with the NAMS/SLAMS monitoring network. The regulations did not, however, specify how this would be accomplished, except that EPA would now be responsible for calculating the pooled precision and accuracy probability limits formerly calculated and reported by the reporting organizations. The objectives of this report are to analyze and interpret individual site data as they pertain to:

1. Identifying extreme measurement errors.
2. Evaluating the effectiveness of SLAMS quality assurance programs.
3. Validating models used to describe precision and accuracy data.
4. Improving decisions concerning attainment of air quality standards as they relate to specific instruments.

The goal is to provide an overall assessment of various quality assurance programs at the reporting organization, state, and regional levels. Routine calculations are provided only to verify assumptions required to satisfy the objectives. Otherwise, routine information is available through various programs that have access to PARS data files. Manual SO2 and NO2 data were not included in this report because so few data were available.
Also, because site codes for 1987 were converted to FIPS codes for this report and a complete cross-reference file was not available, site-to-site comparisons between the PARS data and the National Performance Audit Program data were not possible.

Data Analysis

The primary aim of this report is to assess the overall effectiveness of quality assurance programs administered by the state and local agencies. Accomplishing this within the scope of current guidelines for calculating precision and accuracy estimates requires verification of certain basic underlying assumptions either implied or explicitly stated in the models presented in Section 5 of the amendments to 40 CFR Part 58. These models use weighted site averages and pooled within-site variances to calculate quarterly probability limits for percent differences of precision data from a given reporting organization. Accuracy calculations treat the different instruments (analyzers) for a given quarter and reporting organization as having been drawn from a homogeneous group of instruments with similar statistical properties. The 95 percent probability limits assume that the percent differences are normally distributed.

If it is assumed that individual percent differences were taken from the same normal population for a given pollutant (and level, in the case of accuracy data), then annual probability limits can be calculated on a nationwide basis, as shown in Tables 1 and 2. The pooled within-site standard deviation for precision data over all sites and quarters is equivalent to the root-mean-square error obtained by applying an analysis-of-variance type linear model to each pollutant. Comparisons of site means within reporting organizations, within and between states and regions, and across quarters using F-tests require the same basic assumption: that all site variances are homogeneous. The limits given in Tables 1 and 2 would be valid if the assumption of homogeneity of variance were correct and there were no significant analysis-of-variance F-tests. However, this is an extreme case and one that is highly unlikely to occur. There may, however, be some reporting organizations, states, or even regions whose sites have equal bias as compared to within-site differences. In dealing with quality control data, samples should be placed in subgroups that are as homogeneous as possible, so that if differences are present they show up as differences between subgroups rather than as differences between numbers within a group. This principle is the basic premise of this investigation, which examines the effectiveness of various quality assurance programs.
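For reference, the sketch below spells out the probability limit calculation described above for one reporting organization and quarter: a weighted average of site percent differences, a pooled within-site standard deviation, and limits of the form average ± 1.96 × pooled standard deviation. It is a minimal illustration of the pooled-variance form used in this report, not the Appendix A procedure verbatim; the function name and the dictionary data layout are assumptions made for the example.

```python
import numpy as np

def precision_limits(site_diffs):
    """Weighted average, pooled within-site std. dev., and 95% probability
    limits for one reporting organization and quarter.

    site_diffs maps a site id to the percent differences of its biweekly
    precision checks, d = 100 * (indicated - known) / known.
    """
    n = np.array([len(d) for d in site_diffs.values()], dtype=float)
    means = np.array([np.mean(d) for d in site_diffs.values()])
    variances = np.array([np.var(d, ddof=1) for d in site_diffs.values()])

    weighted_avg = np.sum(n * means) / np.sum(n)                  # bias estimate
    pooled_sd = np.sqrt(np.sum((n - 1) * variances) / np.sum(n - 1))

    return weighted_avg, pooled_sd, (weighted_avg - 1.96 * pooled_sd,
                                     weighted_avg + 1.96 * pooled_sd)

# Illustrative (made-up) data: three sites with 6-7 biweekly checks each.
checks = {
    "site_a": [1.2, -0.5, 0.8, 2.1, -1.0, 0.4],
    "site_b": [-2.3, -1.1, 0.0, -0.7, -1.9, -0.4, -1.5],
    "site_c": [0.3, 1.6, -0.2, 0.9, 0.1, -0.8],
}
print(precision_limits(checks))
```

The national rows of Table 1 can be checked the same way; for example, the CO limits follow from 0.09 ± 1.96 × 3.56.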
For precision data, the biweekly samples for a quarter determine the basic subgroup. Variation within this subgroup is used to detect differences between sites, between quarters for a given site, or between groups of sites. With regard to the structure of higher-level groupings, the basic experimental design for characterizing precision data can be considered a completely nested design with respect to location parameters (i.e., sites within reporting organizations, within states, within regions), with repeated sampling (i.e., biweekly measurements) and quarters providing replication. Under the present model of 40 CFR Part 58, accuracy data do not provide a way of comparing individual sites. Rather, site differences within a reporting organization provide the within-subgroup variation used to derive quarterly probability limits for each audit level. In the context of experimental design, accuracy data also follow a nested, hierarchical structure with respect to location parameters, and quarters provide replication, but there is no repeated biweekly sampling. If analysis-of-variance techniques had been applicable to the PARS data, between-site variances pooled across reporting organizations and quarters would provide the experimental error for making within- and between-state and region comparisons at each separate level.

The assessment of precision and accuracy data involves examining the variance of biweekly samples (the between-site variance for accuracy data) and the bias (weighted averages) as the quantities relevant to both precision and accuracy probability limit calculations. Bias is a systematic error that can often be adjusted through the use of calibration procedures. The variance is a measure of precision (or rather imprecision) that is indicative of random, uncontrolled errors, which are more serious. Due to extreme imbalance in the data and the need to verify the basic assumption of homogeneity of variance, analysis-of-variance techniques were not used to compare biases between groups based on location and time of year.

The approach used to analyze the precision and accuracy data in this report began with the basic subgroup of sites (accuracy data) and biweekly samples (precision data) and worked up the hierarchical structure and across quarters. This provided an independent assessment of the validity of the assumptions of homogeneity of variance and equality of means as they apply to calculating probability limits for each group. The assumption of normality was accepted, since all statistical tests used in this report rely to some extent on this assumption, and since it is difficult to verify on the basis of 6 or 7 biweekly samples per site per quarter for precision data. For accuracy data, there was usually only one site per quarter available per reporting organization.

The basic statistical tools used in the analysis are Bartlett's chi-square test for homogeneity of variance and Welch's F-test for equality of means. Bartlett's test is well suited to precision data with unequal numbers of biweekly measurements from site to site and to accuracy data with unequal numbers of sites across reporting organizations. It is, however, sensitive to departures from normality as well as to unequal variances. Welch's F-test is equivalent to an analysis-of-variance F-test when within-subgroup variances are homogeneous, but it does not require that variances be homogeneous. The effectiveness of individual quality assurance programs should not be judged only by the width of the probability limits for various groups, but also by the validity of the assumptions used in calculating the probability limits, as determined by statistical tests. As a rule, a comparison was not rejected unless it was significant at the α = 0.01 level; only then was there considered to be justification for claiming a real difference between biases or between variances. This level was chosen as an extra measure of precaution, to prevent comparisons from being erroneously rejected as significant because of possible lack of normality in the data. Since Welch's F-test does not require that variances be homogeneous, the two tests are independent. A valid probability limit is an indication that the measurement system(s) is in control and producing uniform results with respect to the group of sites to which the probability limit applies.
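The sketch below shows how the two tests could be applied to the percent differences from the sites of one subgroup. Bartlett's test is taken from SciPy; Welch's F-test is written out from its standard formula rather than assumed to exist as a library call. The synthetic data, the helper name, and the α = 0.01 decision rule mirror the description above but are illustrative assumptions only.

```python
import numpy as np
from scipy import stats

def welch_anova(groups):
    """Welch's F-test for equality of group means without assuming
    homogeneous variances. groups is a list of 1-D arrays."""
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    m = np.array([np.mean(g) for g in groups])
    v = np.array([np.var(g, ddof=1) for g in groups])

    w = n / v                                   # precision weights
    grand = np.sum(w * m) / np.sum(w)
    a = np.sum(w * (m - grand) ** 2) / (k - 1)
    tmp = np.sum((1 - w / np.sum(w)) ** 2 / (n - 1))
    b = 1 + 2 * (k - 2) * tmp / (k ** 2 - 1)
    f_stat = a / b
    df1, df2 = k - 1, (k ** 2 - 1) / (3 * tmp)
    return f_stat, stats.f.sf(f_stat, df1, df2)

# Synthetic percent differences for three sites in one reporting organization.
rng = np.random.default_rng(1)
sites = [rng.normal(0.0, 4.0, 7), rng.normal(1.0, 5.0, 6), rng.normal(-0.5, 4.0, 7)]

chi2_stat, p_bartlett = stats.bartlett(*sites)   # homogeneity of within-site variances
f_stat, p_welch = welch_anova(sites)             # equality of site means

# A real difference is claimed only when a test is significant at alpha = 0.01.
print(p_bartlett < 0.01, p_welch < 0.01)
```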
The width of the limits can be judged against the PARS goals for 95 percent probability target limits of ±15% (all precision checks and flow rate audits) and ±20% (accuracy audits).

Table 1. National Probability Limits for 1987 PARS Precision Data

Pollutant      N     Weighted   Pooled      Lower 95%     Upper 95%
                     average    std. dev.   prob. limit   prob. limit
CO          13347      0.09       3.56         -6.9           7.0
NO2          6410     -0.31       5.15        -10.4           9.7
O3          17018     -0.71       4.65         -9.8           8.4
Pb           1247     -1.63      10.82        -22.8          19.5
PM-10        1936      1.28       6.85        -12.1          14.7
SO2         24908     -0.87       4.31         -9.3           7.5
TSP         14909      0.07       5.26        -10.2          10.3

Table 2. National Probability Limits for 1987 PARS Accuracy Data

Pollutant  Level      N     Weighted   Pooled      Lower 95%     Upper 95%
                            average    std. dev.   prob. limit   prob. limit
CO           1      1004      0.14       6.64        -12.8          13.1
CO           2       888      0.34       3.40         -6.3           7.0
CO           3       877     -0.01       3.17         -6.2           6.2
CO           4         6      0.00       2.00         -3.9           3.9
NO2          1       440      0.76       9.53        -17.9          19.4
NO2          2       387      0.09       5.42        -10.5          10.7
NO2          3       380     -0.21       4.67         -9.3           8.9
O3           1      1336     -0.81       6.04        -12.6          11.0
O3           2      1238     -0.73       4.96        -10.4           9.0
O3           3      1226     -0.75       3.88         -8.3           6.8
O3           4        82     -2.31       2.22         -6.6           2.0
Pb           1       600     -0.42       5.29        -10.8           9.9
Pb           2       572     -1.03       3.79         -8.4           6.3
Pb flow      2        55      4.58       4.85         -4.9          14.1
PM-10        2       445      0.10       4.74         -9.1           9.4
SO2          1      1353     -0.22       5.65        -11.3          10.8
SO2          2      1183     -0.24       5.15        -10.3           9.8
SO2          3      1188     -0.35       4.82         -9.8           9.0
SO2          4       105      1.39       4.18         -6.8           9.6
TSP          -      3963      0.11       3.34         -6.4           6.6

Precision Results

The analysis of precision data was begun at the reporting organization level since, according to Section 3 of 40 CFR Part 58, probability limits at this level are derived from pooled estimates of variances and weighted means of percent differences from stations (sites, instruments, etc.) that are expected to be reasonably homogeneous as a result of common factors. Reporting organizations having homogeneous within-site variances and equal site means were identified on a quarterly basis. From this group, states with homogeneous variances and equal means across reporting organizations were identified. Table 3 lists those states, by pollutant, that demonstrated effective quality control practices because they produced uniform results on a quarterly basis.

Table 3. States with Homogeneous Variance and Equal Means Across Reporting Organizations (CO, NO2, O3, PM-10, SO2, TSP)

State   Qtr.    N    Weighted   Pooled      Lower 95%     Upper 95%
                     average    std. dev.   prob. limit   prob. limit
WV       1     31     -1.39       7.12        -15.3          12.5
IN       1     28     -0.76       3.63         -7.9           6.3
NM       3     15     -5.13       3.54        -12.0           1.8
MO       4     33      4.03       5.14         -6.0          14.1
NE       1     20      0.85       1.53         -2.1           3.8
NE       2     20      1.41       1.96         -2.4           5.2
AZ       2     42     -0.23       2.33         -4.8           4.3
PA       1     43      0.46       5.06         -9.4          10.3
PA       2     38     -0.44       6.20        -12.6          11.7
PA       3     40     -2.48       7.16        -16.5          11.5
WV       4     31      0.00       5.97        -11.6          11.7
FL       4     13      7.65       6.78         -5.6          20.9
OH       2     33     -2.40       4.83        -11.8           7.0
OH       3     38      2.90       5.24         -7.3          13.1
OK       1     29      0.38       3.63         -6.7           7.5
MO       4     26     -0.53       4.36         -9.0           8.0
PA       4     31     -0.54       4.49         -9.3           8.2
AL       1     10      0.06       4.33         -8.4           8.5
AL       4     24     -2.71       6.47        -15.4           9.9
NC       4     20     -1.82       3.76         -9.1           5.5
TN       1     22      1.03       2.69         -4.2           6.3
TN       2     27      1.30       3.20         -4.9           7.5
TN       3     28      1.26       2.15         -2.9           5.4
TN       4     28     -0.36       3.13         -6.5           5.7
IN       3     71     -0.50       3.38         -7.1           6.1
OH       4     26     -0.49       2.50         -5.4           4.4
IA       1     27      0.37       2.21         -3.9           4.7
IA       4     28      0.31       3.35         -6.2           6.8
AZ       4     60     -1.12       3.10         -7.2           4.9
PA       3     45     -1.30       3.63         -8.4           5.8
ME       3     44     -2.45       3.59         -9.4           4.5
ME       4     44     -1.24       2.94         -7.0           4.5
VA       3     69     -1.66       4.07         -9.6           6.3
KY       2     49     -4.42       4.76        -13.7           4.9
OK       2     39     -0.38       2.76         -5.8           5.0
IA       2     54     -2.55       4.85        -12.0           6.9
AL       1     90     -1.23       6.99        -14.9          12.4
NC       2     57     -0.44       5.48        -11.1          10.3
AR       1     34     -3.86       5.01        -13.6           5.9
MO       3     21     -1.51       1.92         -5.2           2.2
AZ       1     39      1.11       6.08        -10.8          13.0

As shown in Table 1, probability limits for precision data do not appear valid on a nationwide scale under the assumptions of homogeneity of variance and equal means. However, probability limits at this level do apply to the examination of trends of ambient air quality data, where acceptable probability limits for reporting organizations may be as wide as ±100% and homogeneity of variance is not a primary consideration. Use of the PARS data for purposes other than trends assessment may require that data be examined in smaller groups, especially when the validity of assumptions concerning the uniformity of the data is important.

Although the requirements for precision and accuracy data were not established for the specific purposes of setting standards or determining attainment status, the extent to which an ambient concentration is nonattainment due to measurement error cannot be ignored. When attention is focused on individual site data, more emphasis is placed on attainment of short-term standards relative to the performance or adequacy of specific methods or types of monitoring instruments. Although precision (and accuracy) data may not be directly applicable to determining attainment of long-term standards, because of averaging times and spatial differences, the overriding requirement is that they be used in conjunction with the interpretation of air quality data. In this regard, site-specific bias and imprecision (variance) provide the basic information upon which to draw inference or simply to calculate probability limits. Due to space limitations, it is impossible to list each site that demonstrated uniform results across the year. It is worth noting that approximately 56 percent of the 2528 sites reporting data maintained an effective quality control program, i.e., one that produced uniform results for the entire year.

Accuracy Results

The basic calculations for computing probability limits for accuracy data are the arithmetic mean and the between-site standard deviation for each reporting organization at each audit level on a quarterly basis. Minimum requirements for these calculations are given in 40 CFR Part 58. Pooling of variances across quarters is not required. However, in the interest of assessing the overall effectiveness of various quality assurance programs, it was necessary to expand the basic calculations to derive annual results and to make comparisons within and between states. In these cases, the homogeneity of between-site variances and equal quarterly averages within reporting organizations had to hold for the probability limits to be statistically valid. Table 4 provides the probability limits for regions that had uniform results for 1987 at a given audit level. As with precision data, satisfying the assumptions of homogeneous variance and equal means is not necessary for the study of trends, so Table 2 may be adequate for this purpose. The validity of these assumptions is necessary for assessing the effectiveness of quality assurance programs. However, a program that is effective at one audit level and not at another is not completely effective.
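A minimal sketch of the basic accuracy calculation described above: for one reporting organization, one quarter, and one audit level, the percent differences of indicated versus designated (audit) concentrations across sites give the arithmetic mean, the between-site standard deviation, and probability limits of the form mean ± 1.96 × standard deviation, consistent with the limits in Table 2. The function name, data layout, and example numbers are assumptions for illustration.

```python
import numpy as np

def accuracy_limits(audit_conc, indicated_conc):
    """95% probability limits for accuracy audits at one audit level
    within a reporting organization (one audit per site per quarter)."""
    audit_conc = np.asarray(audit_conc, dtype=float)      # designated values
    indicated_conc = np.asarray(indicated_conc, dtype=float)

    # Percent difference for each site's audit.
    d = 100.0 * (indicated_conc - audit_conc) / audit_conc

    mean_d = d.mean()                 # bias estimate for the organization
    sd_d = d.std(ddof=1)              # between-site standard deviation
    return mean_d, sd_d, (mean_d - 1.96 * sd_d, mean_d + 1.96 * sd_d)

# Example with made-up level-2 audits for five sites:
print(accuracy_limits([80, 81, 79, 80, 82], [78.4, 82.1, 79.5, 77.9, 83.0]))
```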
Under the condition of homogeneity of variance, the average of the percent differences across levels can be considered an estimate of the bias in the slope of a linear regression line through the origin, as compared to an ideal slope of 1.00. This model would apply to data with two or more levels, such as gases and analytical lead results. The basic statistical assumption of homogeneity of variance required for the validity of using this model on accuracy data can be tested using Bartlett's chi-square statistic. Table 5 lists probability limits and slope estimates for regions that demonstrated an effective quality control program with respect to homogeneity of variance across levels when Bartlett's statistic was used.

Analysis of Individual Site Data

Using the models customarily employed for precision and accuracy data, there is currently no provision for calculating probability limits for accuracy data at individual sites. However, if the condition of homogeneity of variance can be assumed to hold across levels, then the average and variance of the percent differences (across levels) provide estimates for calculating probability limits for an individual site. In this case, the average is an estimate of the bias in the slope of a regression line through the origin, as compared to an ideal slope of 1.00. For calculating the variance, there are at most only one, two, or three degrees of freedom, depending on the number of levels audited. An alternative model that would provide a more objective way of estimating the slope, and consequently the bias, is to regress the indicated value onto the designated value. Until more research can be conducted in this area, the risk must be accepted that substantial departures from the necessary assumptions will invalidate the estimates derived from this latter approach.
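The two slope estimates discussed above can be written out directly. The sketch below contrasts the slope implied by the average percent difference across audit levels with the least-squares slope of the indicated values regressed on the designated values through the origin; the function name and the example concentrations are illustrative assumptions.

```python
import numpy as np

def site_slope_estimates(designated, indicated):
    """Two estimates of the calibration slope for a single site audited
    at several levels (ideal slope = 1.00)."""
    x = np.asarray(designated, dtype=float)   # known audit concentrations
    y = np.asarray(indicated, dtype=float)    # analyzer responses

    # 1) Slope implied by the average percent difference across levels.
    d = 100.0 * (y - x) / x
    slope_from_avg = 1.0 + d.mean() / 100.0

    # 2) Least-squares slope of indicated on designated, forced through the origin.
    slope_regression = np.sum(x * y) / np.sum(x * x)

    return slope_from_avg, slope_regression

# Example: a site audited at three levels (made-up concentrations).
print(site_slope_estimates([0.07, 0.20, 0.40], [0.068, 0.199, 0.405]))
```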
Conclusions

The availability of individual site data is invaluable in providing more detailed information concerning the performance of site-specific methods and in supporting more informed attainment decisions pertaining to a specific site. In addition, having individual site data affords an opportunity to use statistical models to assess the overall effectiveness of specific quality assurance programs. Evaluations based solely on probability limits are inadequate for these purposes. It is evident that some statistical measure of internal comparability must be used in order to detect uniformly reliable precision and accuracy data. This is important for identifying states or reporting organizations that may have difficulty in consistently administering an effective quality assurance program.

The results presented in this report are by no means conclusive. The reporting organizations, states, and regions listed as demonstrating consistently effective quality assurance programs were judged on the basis of the uniformity of results, where there was no justification for claiming a real difference between biases or between the variances of the data. Precautions were taken to reduce the risk of committing errors in this assessment. In fact, it was in the verification of the basic assumptions required to validly calculate probability limits that the results relating to the effectiveness of various quality assurance programs were derived. The outcome of some comparisons may be due to lack of normality in the data. Although normality is not a direct indicator of the effectiveness of a quality assurance program, it is important for statistically testing the homogeneity of variance and the equality of biases as a way of assessing the uniformity of the data. It is hoped that this evaluation will provide Regional QA Coordinators with information to assist in their review of operations and quality control practices across the states in their region.

Recommendations

In the interest of providing precision and accuracy data that are more easily analyzed and interpreted, and that are more economical in both cost and time, some consideration should be given to improving the efficiency of the PARS precision and accuracy sampling design. Recommendations are presented in this section that should prove beneficial in these regards to the PARS and similar data bases.

With the exception of flow rate audits, audits should be performed on all instruments at two nominal levels on a biweekly basis. In effect, there would be only one set of data rather than two (one for precision checks and one for accuracy audits). This would allow a regression approach to be used in analyzing and interpreting the data. To make use of the percent difference calculation for automated analyzers, the basic requirements would be that the regression line be a straight line through the origin and that the error about the line be proportional to audit level (i.e., homogeneous variance in percent difference across levels). In this case, levels could be placed near the extremes, providing a minimum-variance estimate of the slope using the percent difference calculation. An optional third level could be audited at a midpoint once per quarter as a test for curvature, if needed. However, the use of two levels is best if the estimate of the slope is of primary importance. Probability limits would still be used as an indicator of the distribution of the bias between the average of ratios and the ideal slope of 1.00. Precision estimates would be calculated from the variance of biweekly samples at each level.

Collocated data from manual instruments would still be gathered on a biweekly schedule. This is basically a regression situation as it exists, since ambient levels are paired to calculate the percent difference. However, the formula currently used for manual methods is approximately equal to the difference between the logs of the designated and indicated values. Therefore, if a lognormal distribution is assumed for the distribution of ambient particulate data (which is a common assumption), the percent difference is distributed as a normal variable around a mean of zero when there is no bias. Even in the case of accuracy data, averaging percent differences as an estimate of the slope of a regression line through zero is equivalent to taking logs of the indicated measurements in order to stabilize the variance across levels when using standard regression techniques.
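The approximate equivalence noted above can be checked numerically. The sketch below assumes the collocated percent difference is computed with the average of the paired measurements in the denominator (an assumption about the exact formula, which this summary does not spell out) and compares it with the difference of the natural logs; for paired values that are reasonably close, the two agree to first order.

```python
import numpy as np

# Collocated pair: designated (reference) and indicated (duplicate) values
# for four made-up sampling days.
x = np.array([48.0, 55.0, 61.0, 70.0])
y = np.array([50.0, 53.0, 63.0, 69.0])

# Percent difference with the pair average in the denominator
# (assumed form of the manual-method formula).
d = 100.0 * (y - x) / ((x + y) / 2.0)

# Difference of logs, expressed in percent.
log_diff = 100.0 * (np.log(y) - np.log(x))

print(np.round(d, 2))         # percent differences
print(np.round(log_diff, 2))  # nearly identical when y is close to x
```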
Using this approach, the terms "accuracy" and "precision" refer to the bias and the variance, respectively, of all quality assurance data, and not to two separate sets of data. This is the usual statistical interpretation of these terms. Flow rate audits should no longer be classified as accuracy audits for the manual methods, but simply as checks on the flow rate, since accuracy (bias) and precision (variance) are both relevant quantities for flow rate audits. The regression approach would provide a single model for assessing all precision and accuracy data on a site-specific basis. Currently, there is no model for examining accuracy data on a site-specific basis. As the sampling design now exists, sites are treated as random for calculating probability limits. For precision data, biweekly measurements provide the random variation for calculating within-site variances, but at only one audit level. In quality control work, it is the behavior of the specific instruments (sites) at different concentration levels that is of interest, not the behavior of a random sample of instruments from some larger population of possible instruments. This is a weakness in the SLAMS PARS system's design that should be resolved.

Table 4. Regions with Homogeneous Variance Across States and Quarters, by Level (CO, NO2, O3, Pb, PM-10, SO2)

Region  Level      N     Weighted   Pooled      Lower 95%     Upper 95%
                         average    std. dev.   prob. limit   prob. limit
  3       2       29      -3.33       4.14        -11.4           4.7
  4       2       98       0.12       2.23         -4.2           4.5
  5       1       50      -3.09       7.15        -17.1          10.9
  5       2       32       0.01       3.12         -6.1           6.1
  6       1       45       1.61       5.88         -9.9          13.1
  6       2       56       0.15       4.74         -9.1           9.4
  6       3       26      -0.98       5.07        -10.9           8.9
  3       1       37       4.76       7.28         -9.5          19.0
  3       2       50       2.88       4.82         -6.5          12.3
  3       3       61       1.90       4.23         -6.3          10.2
  5       2       43      -1.03       5.16        -11.1           9.0
  5       3       42      -1.14       5.49        -11.9           9.6
  6       1       31       0.05      11.46        -22.4          22.5
  6       2       31      -1.03       4.34         -9.5           7.4
  6       3       18      -0.57       3.66         -7.6           6.6
  3       2       95      -0.65       3.20         -6.9           5.6
  3       3      747      -0.55       3.03         -6.5           5.4
  5       1       66      -2.87       4.93        -12.4           6.8
  4       1       37      -2.13       4.31        -10.5           6.3
  5       2       48      -0.09       3.73         -7.4           7.2
  7       2       20      -3.03       2.74         -8.4           2.3
  4       1      754      -0.43       5.17        -10.5           9.7
  5       1       80      -4.36       6.98        -18.0           9.3
  5       2      277       2.07       4.97         -7.7          11.7
  5       3      206       0.52       4.97         -9.7          10.7
  6       1       43      -1.63       7.87        -17.0          13.8
  6       2       32      -1.90       6.28        -14.2          10.3
  6       3       22      -2.18       6.74        -15.4          11.0
  7       2       30      -0.59       6.28        -12.9          11.7

Table 5. Regions with Homogeneous Variance Across States, Quarters, and Levels (CO, NO2, O3, SO2)

Region      N     Weighted   Slope   Pooled      Lower 95%     Upper 95%
                  average            std. dev.   prob. limit   prob. limit
  5        82      -1.87      0.98     5.7         -13.1           9.3
  6       127       0.43      1.00     5.2          -9.8          10.6
  3       142       2.87      1.02     5.2          -7.4          13.1
  5        85      -1.09      0.98     5.3         -11.5           9.3
  6        80      -0.49      0.99     7.9         -16.1          15.1
  3       242      -0.59      0.99     3.7          -6.6           5.4
  5       497       0.36      1.00     5.7          -9.8          10.5
  6        97      -1.85      0.98     7.7         -15.9          12.2

The EPA author, Jack C. Suggs, is with the Atmospheric Research and Exposure Assessment Laboratory, Research Triangle Park, NC 27711.

The complete report, entitled "Precision and Accuracy Assessments for State and Local Air Monitoring Networks - 1987" (Order No. PB 89-755 246/AS; Cost: $21.95, subject to change), will be available only from:

National Technical Information Service
5285 Port Royal Road
Springfield, VA 22161
Telephone: 703-487-4650

The EPA author can be contacted at:

Atmospheric Research and Exposure Assessment Laboratory
U.S. Environmental Protection Agency
Research Triangle Park, NC 27711

United States Environmental Protection Agency
Center for Environmental Research Information
Cincinnati OH 45268

Official Business
Penalty for Private Use $300

EPA/600/S3-89/015