United States Environmental Protection Agency
Atmospheric Research and Exposure Assessment Laboratory
Research Triangle Park, NC 27711
Research and Development
EPA/600/S3-90/008 May 1990

Project Summary

Precision and Accuracy Assessments for State and Local Air Monitoring Networks: 1988

Luther Smith and Jack Wu

Precision and accuracy data obtained from state and local agencies during 1988 are analyzed. Average biases, pooled standard deviations, and 95% probability limits of percent differences are presented for both accuracy and precision data. The results of a site-by-site linear regression are reported for the accuracy data. Reporting organizations, states, and regions which demonstrate consistent precision and accuracy data as the result of effectively administered quality assurance programs are identified. The effectiveness of the quality assurance programs on a national level is gauged by use of percentiles for the percent differences. The PARS and NPAP data sets for 1988 are analyzed and compared. This information is intended as a guide for identifying problem areas within the quality assurance programs which may need added attention and for allowing more knowledgeable decisions concerning attainment status regarding ambient air quality standards.

This Project Summary was developed by EPA's Atmospheric Research and Exposure Assessment Laboratory, Research Triangle Park, NC, to announce key findings of the research project that is fully documented in a separate report of the same title (see Project Report ordering information at back).

Introduction

Revisions to Appendix A, 40 CFR Part 58, promulgated March 19, 1986, required site-specific precision and accuracy data to be submitted as actual test results beginning in January 1987. This report analyzes these data for the period January 1988 to December 1988.
The availability of individual site data and the opportunity to assess the performance of specific instruments was cited as a way to improve the usefulness of the data quality estimates associated with the NAMS/SLAMS monitoring network. The regulations did not, however, specify how this would be accomplished except that EPA would now be responsible for calculating the pooled precision and accuracy probability limits formerly calculated and reported by the reporting organizations. The objectives of this report are to analyze and interpret individual Precision and Accuracy Reporting System (PARS) site data as they pertain to:

1. Identifying extreme measurement errors.
2. Evaluating the effectiveness of SLAMS quality assurance programs.
3. Validating models used to describe precision and accuracy data.
4. Improving decisions concerning attainment of air quality standards as they relate to specific instruments.

The goal is to provide an overall assessment of various quality assurance programs at the reporting organization, state, and regional levels. Unless otherwise noted, region as used in this report refers to EPA regions. The National Performance Audit Program (NPAP) also collects accuracy data, and this data set was analyzed as well. At those sites which had measurements from both networks, the NPAP and PARS accuracy data were compared. Manual SO2 and NO2 data were not included in this report because so little data are available.

Data Analysis

A goal of this report is to assess the overall effectiveness of quality assurance programs administered by the states and local agencies. The use of 95% probability limits for this purpose rests upon the assumptions which were required for their calculation. The data has a nested structure (i.e., sites within reporting organizations within states within regions) and is replicated by quarters; for the precision data, the biweekly measurements provide repeated sampling.
The analysis of the 1988 data proceeded by combining subgroups within the hierarchical structure and across quarters (and, for accuracy data, levels), beginning with sites as the basic subgroup for the accuracy data and biweekly samples for the precision data. To test for homogeneity of dispersion, a test known as Lev1:med was used; it has generally been found to perform well with regard to size and power in comparison to other methods. Briefly, Lev1:med works as follows: (1) within each subgroup form the quantities Z_ij = |x_ij - x_j,med|, where x_ij is the ith observation (in the case here, percent difference) in the jth subgroup and x_j,med is the corresponding median value; then (2) test for homogeneity by doing a standard one-way ANOVA F-test on the Z_ij's. Strictly speaking, Lev1:med is not a test of equality of variances but of dispersion or spread in general. An advantage to using Lev1:med is that one is allowed to postpone the assumption of normality until the probability limits are actually computed. The significance level used in this report was 5%. One problem with Lev1:med in the above form is that it is too conservative for small (n < 8), odd sample sizes. This results from values of zero for Z_ij distorting the F test statistic. To adjust for this problem, Z_ij was calculated as above except when x_ij = x_j,med; in this instance, x_j,med was replaced by the median of the subgroup with x_ij removed. While this does not completely cure the problem, it does provide a measure of relief while still retaining all the original data.

Precision Results

The precision data analysis was begun by aggregating sites to the reporting organization level by quarter and then across quarters. This was repeated through the state and regional levels.
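The Lev1:med procedure (elsewhere known as the Brown-Forsythe version of Levene's test) can be sketched in a few lines. The sketch below is illustrative only: the function name and sample values are hypothetical, but it implements the steps described above, including the small-sample adjustment of recomputing the median with the tied observation removed.

```python
import numpy as np
from scipy import stats

def lev1med(groups):
    """Lev1:med (Brown-Forsythe) test for homogeneity of dispersion.

    Forms Z_ij = |x_ij - x_j,med| within each subgroup, then runs a
    one-way ANOVA F-test on the Z values.  When an observation equals
    its subgroup median, the median is recomputed with that observation
    removed (the small-sample adjustment described in the report).
    """
    z_groups = []
    for g in groups:
        g = np.asarray(g, dtype=float)
        med = np.median(g)
        z = []
        for i, x in enumerate(g):
            if x == med:
                # recompute the median without this observation
                rest = np.delete(g, i)
                m = np.median(rest) if rest.size else med
            else:
                m = med
            z.append(abs(x - m))
        z_groups.append(z)
    return stats.f_oneway(*z_groups)

# hypothetical percent-difference subgroups (illustrative values only)
g1 = [1.2, -0.5, 3.1, 0.8, -2.0]
g2 = [10.5, -8.2, 12.0, -11.3, 9.9]
f, p = lev1med([g1, g2])
```

With subgroups this small the test has little power, which mirrors the report's caution about small, odd sample sizes.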
For those reporting organizations, states, and regions which consistently executed the quality assurance program for a given quarter, the results were quite good in 1988; the vast majority of average percent differences were less than, and a great many were well below, 10% in magnitude. For those reporting organizations, states, and regions which consistently executed the quality assurance program throughout the year, the performance of the program was also good. Table 1 provides the percentages of the reporting organizations, states, and regions which consistently executed the PARS quality assurance program. It indicates that as the level of aggregation increased, either geographically or temporally, the percentage of cases where homogeneity occurred declined; that is, as one combined either larger geographic areas or blocks of time, the difficulty of consistently executing the quality assurance program increased.

Accuracy Results

The accuracy data analysis was done in a manner analogous to that for the precision data. The nature of the data necessitated one difference. Generally, the accuracy data was very sparse within quarters for the PARS data (often only one site in a reporting organization); for the NPAP data, usually only one value was available for the year. The accuracy measurements are made at one to four different levels. Therefore, accuracy data was aggregated to the reporting organization, state, and regional scales across quarters; the analyses were performed by audit level separately and also across audit levels. For those reporting organizations, states, and regions which consistently executed the PARS quality assurance program, the results were very good. The overwhelming majority had average percent differences less than 10% in magnitude (with many well below). This was true for individual levels and when aggregating data across levels.
Table 2 gives the percentages of the reporting organizations, states, and regions which consistently executed the PARS program. As with the PARS data, where the NPAP program was consistently executed, the percent differences were not generally large. However, the percentages of groupings which maintained consistency varied between PARS and NPAP (Tables 2 and 3). A better way to compare the NPAP and PARS programs is to use collocated data; this is discussed later. The accuracy data was also analyzed with a regression approach. Linear regression was attempted as a means of assessing the overall performance of the accuracy program by examining the network on a site-by-site basis. Unfortunately, the regressions at the individual sites suffered from a lack of data; often only 3 data points were available. The limited amount of data reduced the power of the regressions and prevented adequate checking of the basic regression assumptions (i.e., normally distributed error terms with homogeneous variances). Therefore, the results of the regression analysis should not be viewed as firm conclusions, but as general indicators of where future efforts might best be directed. The measured value was regressed on the audit (target) value on a site-by-site basis, and the resulting regression line was compared to the "ideal" line, which has a slope of one and passes through the origin. The comparison to the ideal line was done by making the joint hypothesis test that the intercept estimated by the regression was zero and that the estimated slope was one. It may be useful to consider the interpretation of these estimated parameters. While theoretically the intercept is the value that would be measured if no pollutant were present, the estimated intercept is best viewed here as an indicator of the general bias of the measurement process over the range of values established by the audit levels.
(This is because audit levels are necessarily set at positive values, and thus no data are available about the measurement process at a pollutant level of zero.)

Table 1. Percentage of Cases with Homogeneous Dispersion for PARS Precision Data

Key: N = Total number possible; S = Number meeting homogeneity criterion; % = Percentage to nearest whole percent

A. Reporting organizations across sites by quarter

Pollutant    N     S    %
CO          274   167   61
NO2         155   118   76
O3          293   196   67
Pb           17    16   94
PM10         95    77   81
SO2         287   195   68
TSP         297   250   84

B. Reporting organizations across sites and quarters

Pollutant    N     S    %
CO           80    34   43
NO2          48    32   67
O3           93    62   67
Pb           16    12   75
PM10         47    37   79
SO2          87    49   56
TSP         103    76   74

C. States across reporting organizations by quarter

Pollutant    N     S    %
CO           78    35   45
NO2          47    28   60
O3           71    43   61
Pb           20    16   80
PM10         68    51   75
SO2          66    28   42
TSP          76    40   53

D. States across reporting organizations and quarters

Pollutant    N     S    %
CO           23     4   17
NO2          13     4   31
O3           21    10   48
Pb            8     5   63
PM10         20     9   45
SO2          18     6   33
TSP          22     4   18

The slope indicates how the accuracy measurement depends on audit level. A slope of zero would indicate that a sampler reports numbers essentially independently of what the true pollutant levels are; a slope between zero and one indicates that the sampler does not increase its reported value fast enough as the pollutant level increases, while a slope greater than one indicates that reported values increase too rapidly; a slope of one says that as the pollutant level changes, the machine responds with exactly the same change in its reported value. The joint hypothesis tests were all conducted at the 5% level; the hypothesis was rejected considerably more often than would be expected by chance. (Note: the joint hypothesis may be rejected because either or both estimated parameters may be too far from its "ideal" value.)
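The joint hypothesis test described above can be sketched as an extra-sum-of-squares F-test: compare the residual sum of squares of the fitted line (full model) with that of the fixed ideal line y = x (restricted model). This is a minimal illustration with made-up audit and measured values, not the report's actual computation.

```python
import numpy as np
from scipy import stats

def joint_ideal_line_test(audit, measured):
    """F-test of the joint hypothesis: intercept = 0 and slope = 1.

    The restricted model is the ideal line y = x (no free parameters);
    the full model is the ordinary least-squares fit of measured on
    audit value.  The F statistic compares the two residual sums of
    squares with 2 and n - 2 degrees of freedom.
    """
    x = np.asarray(audit, dtype=float)
    y = np.asarray(measured, dtype=float)
    n = len(x)
    # full model: OLS fit of measured value on audit (target) value
    slope, intercept = np.polyfit(x, y, 1)
    sse_full = np.sum((y - (intercept + slope * x)) ** 2)
    # restricted model: the ideal line with slope one through the origin
    sse_restr = np.sum((y - x) ** 2)
    f = ((sse_restr - sse_full) / 2.0) / (sse_full / (n - 2))
    p = stats.f.sf(f, 2, n - 2)
    return intercept, slope, f, p

# hypothetical audit levels and measured responses near the ideal line
intercept, slope, f, p = joint_ideal_line_test([10, 20, 30, 40],
                                               [12, 19, 31, 41])
# a clearly biased instrument (slope well below one)
_, _, _, p_biased = joint_ideal_line_test([10, 20, 30, 40],
                                          [5, 10, 13, 17])
```

With only four points the test has low power, which echoes the report's caveat that site-level regressions often rested on very few observations.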
However, in judging the effectiveness of the quality assurance program, the estimates for the intercept and slope are more relevant since they indicate the degree to which the parameters depart from their ideal values. In general there did not appear to be a large overall bias (i.e., intercept estimate) in the accuracy measurements for CO, NO2, O3, or SO2 in the PARS network. Similarly, the NPAP data did not show a large bias for CO or Pb. Of more concern were the intercept estimates for Pb in the PARS results. For this pollutant, intercepts were estimated which were quite large in magnitude, both positive and negative, in several cases. The slope estimates were generally within 10% to 20% of their ideal value of 1. In summary, then, the regression results indicated that for CO, NO2, O3, and SO2, accuracy audits generally conformed to the desired results. For Pb, there may have been some bias in the accuracy audit results at certain sites in the PARS network. (Note: Accuracy audits for TSP and PM-10 were only done at one level, and regression was therefore inappropriate in these cases.)

National Results

If the assumption is made that the percent differences were taken from the same normal population for a given pollutant (and level), then annual probability limits could be calculated on a national basis, as shown in Tables 4 and 5. However, the earlier results displayed in Tables 1, 2, and 3 indicate that variance (or dispersion) is not uniform across the country.

Table 1. (cont'd) Percentage of Cases with Homogeneous Dispersion for PARS Precision Data

Key: N = Total number possible; S = Number meeting homogeneity criterion; % = Percentage to nearest whole percent

E. Regions across states by quarter

Pollutant    N     S    %
CO           40     4   10
NO2          28     8   29
O3           39    10   26
Pb           22    11   50
PM10         35    20   57
SO2          36    12   33
TSP          39    25   64

F. Regions across states and quarters

Pollutant    N     S    %
CO           10     1   10
NO2           7     0    0
O3           10     1   10
Pb            8     3   38
PM10          9     1   11
SO2           9     2   22
TSP          10     3   30
In addition, examination of data plots led to the conclusion that the assumption of normality is probably incorrect. An alternative method of examining the data which makes no assumptions about the underlying statistical distribution(s) is provided in Tables 6 and 7. (The nth percentile is that value such that n% of the data were less than or equal to it. Note that the 5th and 95th percentiles bracket the middle 90% of the data.) While the normality assumption may not be appropriate, the distributions of the percent differences for both precision and accuracy data generally appeared to be unimodal and symmetric, and thus the means and medians are quite similar (Tables 6 and 7). For the precision data, the ranges established by the 95% probability limits (Table 4) and the 5th and 95th percentiles (Table 6) are very similar for each pollutant. However, this is not quite the case for the accuracy data (Tables 5 and 7). The accuracy data show these exceptions: the percentile spread is wider than the probability limits for NO2 and narrower for O3 at level 4 in the PARS data, and is narrower for lead and TSP in the NPAP data. These observations (except O3 at level 4 in PARS and Pb and TSP in NPAP for the accuracy data) indicate that the violations of the assumptions necessary for calculating the probability limits lead to an underestimate of the amount of data in the tails (i.e., farther reaches) of the distributions. Table 6 shows that the middle 90% of the precision percent differences occurred roughly within the range of (-10%, 10%) for CO, NO2, O3, and SO2. For Pb the range was roughly (-20%, 20%), and the spreads for PM-10 and TSP were broader than for the gases but less than for lead. Table 7 indicates that the percent differences for the accuracy data tended to be larger at the lowest audit level. (Note the larger widths between the 5th and 95th percentiles at these lowest levels.)
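The two summaries being compared, normal-theory 95% probability limits (mean ± 1.96 standard deviations) and the distribution-free 5th/95th percentiles, can be sketched as follows. The data here are simulated, not the report's; for exactly normal data the probability limits (which cover the middle 95%) come out slightly wider than the 5th-to-95th percentile range (which covers the middle 90%).

```python
import numpy as np

def precision_summaries(pct_diff):
    """Return the normal-theory 95% probability limits and the
    empirical 5th/95th percentiles of a set of percent differences."""
    d = np.asarray(pct_diff, dtype=float)
    mean, sd = d.mean(), d.std(ddof=1)
    limits = (mean - 1.96 * sd, mean + 1.96 * sd)
    pctiles = tuple(np.percentile(d, [5, 95]))
    return limits, pctiles

# simulated, roughly normal percent differences (illustrative only)
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=5.0, size=10000)
limits, pctiles = precision_summaries(sample)
```

When the real distribution is heavier-tailed than normal, the percentile spread can exceed the probability limits, which is the pattern the report flags for NO2 in the PARS accuracy data.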
The widest spread of the middle 90% of the accuracy percent differences is the range from -12% to 23% (for the NPAP PM-10 data). Thus, on a national scale, the precision and accuracy quality assurance programs seem to be operating well, in general.

Comparison of the 1988 PARS and NPAP Data

Some sites had data from both the NPAP and PARS programs. Using this collocated data, the two networks were compared. The quantities examined were the accuracy percent differences for CO, Pb flow rate, PM-10, and TSP. Generally there was only one NPAP observation at a site (or, for CO, a level). There were usually two or more PARS observations at a site (or level), but in some cases there was also only one PARS value. The NPAP observations were examined to see whether the NPAP value was above, below, or within the range of the PARS values. This was done on a site-by-site (and, for CO, level-by-level) basis. In cases where only one value was available from each program, binomial tests indicated that the NPAP values were evenly split between the above and below categories for level 1 of CO and for PM-10; however, there were significantly more above occurrences for levels 2 and 3 of CO and for TSP. The significance level used was 5%. Where more than one PARS value was available, the data were examined to see whether (in each individual case) the NPAP value was so far removed from the PARS values as to be considered to have come from a different distribution. Dixon's r10 outlier test was used. The null hypothesis was that for each individual case the PARS and NPAP values came from a single normal distribution. First, the ratio of the difference between the NPAP value (when it was the highest or lowest value) and the closest PARS value to the range of all values was formed. A p-value was then either calculated from a formula or

Table 2.
Percentage of Cases with Homogeneous Dispersion for PARS Accuracy Data

Key: N = Total number possible; S = Number meeting homogeneity criterion; % = Percentage to nearest whole percent

A. Reporting organizations across sites and quarters by level

Pollutant    N     S    %
CO          150    90   60
NO2          83    48   58
O3          166   105   63
Pb           29    24   83
Pb-flow       5     3   60
PM10         54    38   70
SO2         157    94   60
TSP          84    52   62

B. Reporting organizations across sites, quarters, and levels

Pollutant    N     S    %
CO           78    63   81
NO2          46    40   87
O3           92    69   75
Pb           17    14   82
Pb-flow       5     3   60
PM10         54    38   70
SO2          88    62   70
TSP          84    52   62

C. States across reporting organizations and quarters by level

Pollutant    N     S    %
CO           61    46   75
NO2          37    33   89
O3           61    54   89
Pb           22    12   55
Pb-flow       1     1  100
PM10         20    15   75
SO2          55    47   85
TSP          20    15   75

D. States across reporting organizations, quarters, and levels

Pollutant    N     S    %
CO           22    15   68
NO2          14    10   71
O3           21     5   24
Pb           11     4   36
Pb-flow       2     2  100
PM10         20    15   75
SO2          19    11   58
TSP          20    15   75

linearly interpolated from a table. (Cases where all PARS values were the same were excluded since, in such a case, the test automatically would declare the NPAP value an outlier, no matter how close it was to the PARS values. There were very few such cases.) Table 8 shows the number of times a significant result was obtained from this procedure at the 5% level. This table indicates that generally NPAP percent differences are in agreement with PARS percent differences for accuracy measurements on a case-by-case basis. These results for the individual sites or levels were expanded to a network basis by using Fisher's method of combining tests. Combining the individual test results allows a comparison of the overall NPAP and PARS networks based on collocated data. Fisher's method has good statistical power and may detect differences which were hidden because the individual tests were based on so few data points. Basically, Fisher's method transforms the p-values from the individual tests and adds them to obtain a chi-square distributed test statistic.
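A minimal sketch of the two procedures just described, Dixon's r10 ratio for a single suspect NPAP value against collocated PARS values, and Fisher's method for combining the resulting p-values, is given below. The r10 critical values and p-value formula are omitted; only the ratio and the chi-square combination are shown, with hypothetical numbers.

```python
import math
from scipy import stats

def dixon_r10(pars_values, npap_value):
    """Dixon's r10 ratio: the gap between the extreme value and its
    nearest neighbor, divided by the range of the combined sample.
    Returns 0.0 when the NPAP value lies inside the PARS range (it is
    then not a candidate outlier).  Cases where all values are equal
    were excluded in the report and are not handled here."""
    vals = sorted(pars_values + [npap_value])
    rng = vals[-1] - vals[0]
    if npap_value == vals[0]:
        return (vals[1] - vals[0]) / rng
    if npap_value == vals[-1]:
        return (vals[-1] - vals[-2]) / rng
    return 0.0

def fisher_combine(p_values):
    """Fisher's method: -2 * sum(ln p_i) follows a chi-square
    distribution with 2k degrees of freedom under the null."""
    stat = -2.0 * sum(math.log(p) for p in p_values)
    return stat, stats.chi2.sf(stat, 2 * len(p_values))

# hypothetical collocated percent differences at one site
r = dixon_r10([1.0, 2.0, 3.0], 10.0)   # NPAP value is the maximum
# combining unremarkable individual p-values vs. strongly small ones
stat, p_all = fisher_combine([0.5, 0.5, 0.5, 0.5])
_, p_sig = fisher_combine([0.01, 0.01, 0.01, 0.01])
```

The combination step is why Fisher's method can flag a network-wide difference (as it did for PM-10) even when no single collocated comparison is individually convincing.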
The results from combining the individual tests were that only for PM-10 were the NPAP values different from the PARS values. The difference was significant at the 5% level, but not at the 1% level. The fact that CO values were available from three levels permitted a different method of comparing the two networks. At each site, the percent difference values from both networks together were linearly regressed on level, and the regression diagnostic Cook's D was calculated for each point. Cook's D basically measures how strongly a single data point affects the estimated regression parameters. Of the 197 CO regressions performed, only 13 NPAP data points had Cook's D statistics exceeding the 50th percentile value of the appropriate F distribution. Thus, based on this examination of regression diagnostics, it does not appear that the PARS and NPAP CO percent differences are substantially different. In summary, then, there do not seem to be large differences between the NPAP and PARS data sets. One possible exception to this is the case of PM-10. Overall, NPAP PM-10 percent differences appeared to be higher than PARS values at the 5% significance level; however, under the more stringent criterion of a 1% significance level, there is not a significant difference between the two networks.

Table 2. (cont'd) Percentage of Cases with Homogeneous Dispersion for PARS Accuracy Data

Key: N = Total number possible; S = Number meeting homogeneity criterion; % = Percentage to nearest whole percent

E. Regions across states and quarters by level

Pollutant    N     S    %
CO           30    12   40
NO2          18    16   89
O3           29    14   48
Pb           14     6   43
Pb-flow       3     3  100
PM10          9     4   44
SO2          30    23   77
TSP           8     3   38

F.
Regions across states, quarters, and levels

Pollutant    N     S    %
CO           10     1   10
NO2           6     5   83
O3           10     4   40
Pb            7     4   57
Pb-flow       3     3  100
PM10          9     4   44
SO2           9     2   22
TSP           8     3   38

Summary

For both precision and accuracy data, in those cases where the quality assurance program (PARS or NPAP) was consistently executed across different strata of the network (e.g., geographic region, quarter, or audit level), the performance was generally good. Only rarely did weighted average percent differences exceed 10% in magnitude when data were combined across strata, and often the average level was well below 10% in size. Regression applied on a site-by-site basis to the accuracy data indicated that for CO (PARS and NPAP), NO2, O3, and SO2, accuracy audits generally were not biased to any large degree over the range of pollutant levels established by the audit levels, and generally instruments responded well as pollutant levels varied. For Pb, there may have been some bias in the accuracy audit results at certain PARS sites. However, the nature of the data limited the utility of the regressions, and these results should be viewed only as rough indicators of the state of the accuracy quality assurance program. On a national basis, the middle 90% of the PARS precision percent differences occurred roughly within the range of (-10%, 10%) for CO, NO2, O3, and SO2. For Pb the range was roughly (-20%, 20%), and the spreads for PM-10 and TSP were broader than for the gases but less than for lead. Nationally, the percent differences for the PARS and NPAP accuracy data tended to be larger at the lowest audit level. The widest spread of the middle 90% of the accuracy percent differences is the range from -12% to 23% (for PM-10 in the NPAP data). Thus, on a national scale, the precision and accuracy quality assurance programs seem to be operating well, in general.
Based on analyses of collocated data, there do not seem to be large differences between the NPAP and PARS data sets. One possible exception to this is the case of PM-10. Taken together, the results above indicate that the quality assurance programs for accuracy and precision generally seem to be operating well, though there may be pockets which could be improved.

Table 3. Percentage of Cases with Homogeneous Dispersion for NPAP Data

Key: N = Total number possible; S = Number meeting homogeneity criterion; % = Percentage to nearest whole percent

A. Reporting organizations across sites and levels

Pollutant    N     S    %
CO           59    53   90
Pb            0     0    0
Pb-flow       0     0    0
PM10          0     0    0
TSP           0     0    0

B. States across reporting organizations by level

Pollutant    N     S    %
CO           44    24   55
Pb           22     3   14
Pb-flow       0     0    0
PM10          5     2   40
TSP          19    15   79

C. States across reporting organizations and levels

Pollutant    N     S    %
CO           20    16   80
Pb           11    11  100
Pb-flow       0     0    0
PM10          5     2   40
TSP          19    15   79

D. Regions across states by level

Pollutant    N     S    %
CO           27    22   81
Pb           27    12   44
Pb-flow       1     0    0
PM10          9     8   89
TSP           9     5   56

E. Regions across states and levels

Pollutant    N     S    %
CO           10     6   60
Pb           10     7   70
Pb-flow       1     0    0
PM10          9     8   89
TSP           9     5   56

Table 4. National Probability Limits for PARS Precision Data

Pollutant     N      Mean (%)  Std. Dev.  Lower 95%    Upper 95%
                                          prob. limit  prob. limit
CO          14143     -0.03      3.42       -6.74         6.68
NO2          6294     -0.67      5.41      -11.27         9.93
O3          16980     -0.84      4.14       -8.95         7.28
Pb           1108     -0.15     13.28      -26.19        25.88
PM-10        4634      0.84      8.73      -16.26        17.94
SO2         18087     -1.09      4.28       -9.47         7.29
TSP         11839      0.09      7.76      -15.13        15.31

Table 5-A. National Probability Limits for PARS Accuracy Data

Pollutant  Level    N     Mean (%)  Std. Dev.  Lower 95%    Upper 95%
                                               prob. limit  prob. limit
CO           1    1006     -0.03      5.51      -10.82        10.77
CO           2    1050      0.38      2.79       -5.09         5.84
CO           3     895      0.24      2.97       -5.57         6.05
CO           4       7      1.97      0.14        1.69         2.24
NO2          1     473      0.40      4.83       -9.07         9.87
NO2          2     458      0.01      3.66       -7.16         7.18
NO2          3     420     -0.47      4.34       -8.99         8.04
NO2          4      15      0.58      1.66       -2.68         3.83
O3           1    1364     -0.82      5.57      -11.74        10.09
O3           2    1349     -0.58      3.64       -7.71         6.55
O3           3    1161     -0.75      4.58       -9.73         8.24
O3           4     114     -2.64      8.75      -19.78        14.50
Pb           1    1000      0.11      5.07       -9.83        10.05
Pb           2     856     -0.99      4.59       -9.97         8.00
Pb-flow      1       3     -2.09      1.41       -4.85         0.67
Pb-flow      2     107      0.57      3.25       -5.80         6.95
PM-10        2    1226     -0.05      3.05       -6.04         5.94
SO2          1    1235     -0.14      5.38      -10.69        10.41
SO2          2    1211     -0.10      4.91       -9.73         9.53
SO2          3    1003     -0.44      5.13      -10.50         9.62
SO2          4     101     -0.62      4.27       -8.99         7.75
TSP          2    2814      0.56      3.22       -5.75         6.87

Table 5-B. National Probability Limits for NPAP Data

Pollutant  Level    N     Mean (%)  Std. Dev.  Lower 95%    Upper 95%
                                               prob. limit  prob. limit
CO           1     232     -0.62      5.65      -11.69        10.45
CO           2     229      1.85      3.81       -5.63         9.33
CO           3     231      2.18      3.54       -4.75         9.11
Pb           1     113     -2.50     10.68      -23.43        18.43
Pb           2     115      3.96     10.98      -17.56        25.49
Pb           3     114     -2.52     10.10      -22.31        17.27
Pb-flow      2      23      4.46      7.09       -9.44        18.36
PM-10        2     133      1.18     10.12      -18.65        21.02
TSP          2     511      2.29      9.45      -16.22        20.81

Table 6. National Percent Difference Percentiles for PARS Precision Data

Pollutant     N      Mean  Median  5th Percentile  95th Percentile
CO          14143      0      0         -7               8
NO2          6294     -1      0        -13              10
O3          16980     -1      0         -9               7
Pb           1108      0      0        -22              23
PM-10        4634      1      0        -13              18
SO2         18087     -1     -1        -10               8
TSP         11839      0      0        -13              13

Table 7-A.
National Percent Difference Percentiles for PARS Accuracy Data

Pollutant  Level    N     Mean  Median  5th Percentile  95th Percentile
CO           1    1006      0      0        -10              12
CO           2    1050      0      0         -5               7
CO           3     895      0      0         -5               6
CO           4       7      2      2          0               3
NO2          1     473      0      0        -14              15
NO2          2     458      0      0        -10              10
NO2          3     420      0      0        -10               8
NO2          4      15      1     -1         -4               7
O3           1    1364     -1      0        -11               9
O3           2    1349     -1     -1         -8               6
O3           3    1161     -1      0         -8               6
O3           4     114     -3     -2         -9               2
Pb           1    1000      0      0        -10              11
Pb           2     856     -1      0         -8               6
Pb-flow      1       3     -2     -3         -4               1
Pb-flow      2     107      1      0         -5               6
PM-10        2    1226      0      0         -6               6
SO2          1    1235      0      0        -11              11
SO2          2    1211      0      0         -9               9
SO2          3    1003      0     -1        -10               9
SO2          4     101     -1      0         -8               9
TSP          2    2814      1      0         -5               6

Table 7-B. National Percent Difference Percentiles for NPAP Data

Pollutant  Level    N     Mean  Median  5th Percentile  95th Percentile
CO           1     232     -1      0         -8               8
CO           2     229      2      2         -3               7
CO           3     231      2      2         -2               7
Pb           1     113     -3     -1        -18               7
Pb           2     115      4      5        -13              16
Pb           3     114     -3     -2        -12               7
Pb-flow      2      23      4      2         -4              24
PM-10        2     133      1      0        -12              23
TSP          2     511      2      2        -10              15

Table 8. Number of Outlier Tests Significant at the 5% Level

Pollutant    Number of tests  No. significant    %
CO level 1        137                1            1
CO level 2        136                9            7
CO level 3        118                5            4
Pb-flow             4                1           25
PM-10              59                7           12
TSP               217               16            7

Luther Smith and Jack Wu are with NSI-ES Technology Services Corporation, Research Triangle Park, NC 27709. Jack C. Suggs is the EPA Project Officer (see below). The complete report, entitled "Precision and Accuracy Assessments for State and Local Air Monitoring Networks," (Order No. PB 90-183 401/AS; Cost: $23.00, subject to change) will be available only from:

National Technical Information Service
5285 Port Royal Road
Springfield, VA 22161
Telephone: 703-487-4650

The EPA Project Officer can be contacted at:

Atmospheric Research and Exposure Assessment Laboratory
U.S. Environmental Protection Agency
Research Triangle Park, NC 27711