United States Environmental Protection Agency
Office of Air Quality Planning and Standards
Research Triangle Park, North Carolina 27711

EPA-450/2-78-037
OAQPS No. 1.2-092
July 1978

Guideline Series

SCREENING PROCEDURES FOR AMBIENT AIR QUALITY DATA

U.S. ENVIRONMENTAL PROTECTION AGENCY
Office of Air, Noise, and Radiation
Office of Air Quality Planning and Standards
Research Triangle Park, North Carolina 27711

OAQPS GUIDELINE SERIES

The guideline series of reports is being issued by the Office of Air Quality Planning and Standards (OAQPS) to provide information to state and local air pollution control agencies; for example, to provide guidance on the acquisition and processing of air quality data and on the planning and analysis requisite for the maintenance of air quality. Reports published in this series will be available - as supplies permit - from the Library Services Office (MD-35), U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711; or, for a nominal fee, from the National Technical Information Service, 5285 Port Royal Road, Springfield, Virginia 22161.

Publication No. EPA-450/2-78-037
(OAQPS No. 1.2-092)

1. INTRODUCTION

This guideline discusses screening procedures to identify possible outliers in ambient air quality data sets. The Standing Air Monitoring Work Group (SAMWG) has emphasized the need for ensuring data quality as an integral part of an air monitoring program.1 The purpose of this document is to present data screening techniques to be applied to ambient air quality data by the Regions (or States) before the data are entered into SAROAD. Although the primary emphasis is on computerized techniques, the summary briefly discusses which procedures are feasible to implement manually. These screening techniques have proven to be effective in identifying "atypical" concentrations, which often are found to have been miscoded or otherwise invalid. The meaning of the word "atypical" will become more apparent in the actual discussions of these procedures, but on an intuitive level it describes an event with very low probability and, therefore, one that is unlikely to occur.

The purpose of these screening procedures is to identify specific data values that warrant further investigation. The fact that a particular data value is flagged by these tests does not necessarily mean that the value is incorrect. Therefore, such values should not be deleted from the data set until they have been checked and found to actually be erroneous. The screening procedures discussed in this guideline are primarily intended to examine the internal consistency of a particular data set. For this reason, they are not designed to detect subtle errors that may result from incorrect calibration or a variety of other factors that can produce incorrect values that superficially appear consistent. That is, perhaps, the easiest place to contrast these screening procedures with an overall quality assurance program. A quality assurance program usually examines all phases of the monitoring effort, from data collection to the data set that is finally produced. Such an effort is much more comprehensive than the techniques presented here and is discussed in more detail elsewhere.2 Thus, the techniques presented here may be considered as one part of the overall quality assurance program.
However, they have been shown to be a cost-effective means of eliminating the more obvious errors and thereby improving data quality.

In selecting screening procedures for this guideline, emphasis has been given to those techniques that have actually been used to examine air quality data sets. Although some other approaches are briefly discussed, the intended purpose of this document is to present techniques that have been used successfully rather than to merely propose possible approaches that may some day prove useful.

This document is organized so that this introduction is followed by a brief discussion of the background of the problem, then a section presenting the screening procedures, and finally a conclusion and a series of appendices. In addition to a summary of the recommendations, the conclusion contrasts the initial step of identifying a possible outlier with the final step of actually deleting the value, and also discusses the proper place for these tests in the overall data handling scheme. The appendices consist of articles discussing the application of these tests to air quality data and computer programs to perform the tests. This structure was chosen so that the screening procedures could be presented at various levels of detail. The discussion in the main body of the document is intended to give a general overview and an intuitive understanding of what each test is designed to do. The appendices provide more detail and would be of interest to those concerned with the actual implementation of these screening procedures. Those readers interested in more details on the underlying statistical theory will find the appropriate articles included in the references.

2. BACKGROUND

It is a truism to say that data quality is important. Virtually no one will argue that data quality is not important, but the key question is "how important?" Obviously, the degree of data quality required depends upon the intended use of the data. This is why air pollution data sets present some interesting practical problems.

One use of air quality data is to assess compliance with legal standards such as the National Ambient Air Quality Standards (NAAQS). The form of these standards frequently reinforces the need for data quality. For example, the NAAQS for total suspended particulate, sulfur dioxide, carbon monoxide, and oxidant all specify upper limit concentrations that are not to be exceeded more than once per year. In such cases, it is the second highest value for the year that becomes the decision-making value. With this application in mind, the need for data quality is obvious.

Another factor that must be considered in air monitoring programs is the volume of data involved. Continuous instruments can produce as many as 8760 hourly observations for the year. Intermittent monitoring schedules for 24-hour data routinely produce 60 or so values per year. When these numbers are accumulated for several pollutants for an entire network, State, or for the Nation, the total number of data values quickly becomes cumbersome. For example, it is estimated that EPA's National Aerometric Data Bank is currently expanding at the rate of 20 million values per year. Therefore, maintaining a data bank for air pollution measurements involves the basic conflict of having to routinely process large volumes of data and yet at the same time ensure an almost zero defect level of data quality.
Because of the nature of the standards, many users may only be interested in the two highest values at each site for each pollutant. It should be noted that two values from a data set of 8760 observations constitute 0.023 percent of the data. This means that the user's perception of data quality may be entirely different from the [...]

One point worth noting in the discussion of these statistical tests concerns the validity of the underlying assumptions. As a general rule, these types of tests assume that the observations are independent. To some extent, this may be approximately correct in the case of every-sixth-day sampling, but obviously there are seasonal and diurnal patterns associated with air quality levels that make this assumption questionable in general. This problem could be approached by the use of time series models to minimize the autocorrelation (interdependence of successive values), but from a practical viewpoint, the tests discussed here have been shown to work reasonably well. In a sense, the viewpoint taken here is to use the simplest test that has been successfully demonstrated and have that fact substantiate the claim that the underlying assumptions are "approximately satisfied."

3.1 Twenty-Four Hour Data Tests

There are several statistical tests that may be used to screen 24-hour air quality data sets. Tests attributed to Dixon, Grubbs, and Shewhart have been considered for identifying suspect air quality values.2-4,7 Conceptually, all these tests yield a probability statement that provides a measure of the internal consistency of the data set. The Dixon and Shewhart test procedures have been applied to air quality data sets.

The Dixon test may be conveniently used to examine one month's worth of 24-hour data. Basically, this test is used to examine the relative spread within the data set and is quite easy to compute. For example, if there were five values in the month, it is only necessary to rank the data from smallest to largest. Then the difference between the highest and second highest values is divided by the difference between the highest and lowest values. This ratio gives a fraction ranging from zero to one. A graphical presentation of this test is given in Figure 1 for two data sets that have four points in common, but the second data set contains a value of 420 µg/m3 instead of the 42 µg/m3 in the first data set, i.e., a possible transcription error.

[Figure 1]

The computed ratio in the first case is .33, which is acceptable, while the second ratio is .73, which would be flagged at the 5 percent level as a possible outlier. The closer this ratio is to one, the more likely it is that the high value is an outlier rather than a correct value. Tables are available to determine the probability associated with this computed ratio.3,9 The Grubbs test is conceptually similar, although the ratio used is the difference between the highest value and the mean divided by the standard deviation. This requires slightly more computation, but again tabulated values for the associated probabilities are available.10
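To make the arithmetic concrete, the following minimal FORTRAN sketch computes the Dixon ratio for a hypothetical month of five 24-hour values. The data values and the 5 percent critical value (0.642, the commonly tabulated value for five observations) are illustrative assumptions rather than figures taken from this guideline.

      PROGRAM DIXON
C     Minimal sketch of the Dixon ratio test for one month of
C     24-hour data (five hypothetical values, in ug/m3).
      REAL X(5), RATIO, TMP
      INTEGER I, J, N
      DATA X /31., 18., 42., 25., 34./
      N = 5
C     Rank the data from smallest to largest (simple exchange sort).
      DO 20 I = 1, N - 1
         DO 10 J = I + 1, N
            IF (X(J) .LT. X(I)) THEN
               TMP = X(I)
               X(I) = X(J)
               X(J) = TMP
            END IF
   10    CONTINUE
   20 CONTINUE
C     Difference between the highest and second highest values
C     divided by the difference between the highest and lowest.
      RATIO = (X(N) - X(N-1)) / (X(N) - X(1))
      PRINT *, 'DIXON RATIO = ', RATIO
C     Assumed 5 percent critical value for five observations.
      IF (RATIO .GT. 0.642) PRINT *, 'POSSIBLE OUTLIER FLAGGED'
      END

With these assumed values the ratio is .33 and the month passes; replacing the 42 with a miscoded 420 raises the ratio to about .96, and the month would be flagged.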
One characteristic of these types of tests is of particular interest in terms of their possible use with air quality data. These tests implicitly assume that at least one value in the data set is correct. If all of the values in Figure 1 were multiplied by 10, the computed ratios would remain unchanged. The key point is that these tests merely check for internal consistency and, consequently, it is possible to have a data set that is entirely wrong and yet internally consistent. Initially, it may appear perfectly reasonable to expect that at least one value in the data set will be correct. However, in evaluating these tests it became apparent that the data handling schemes involved can occasionally produce an entire month of data that is incorrectly coded and, therefore, improperly scaled. With this in mind, it becomes apparent that it is not sufficient to check for internal consistency; some type of comparison must also be made to ensure that the values fall within a reasonable range.

This can be accomplished by the use of the Shewhart test.7,12 This test compares the monthly mean and range with those from the past few months. Again, tabulated values are available to determine the associated probabilities.12 However, the main point is that the test is basically a two-fold screening procedure. If a monthly range differs appreciably from past monthly ranges, then it suggests an outlier within the month. On the other hand, if the monthly mean differs appreciably from past monthly means, then a scaling problem is likely.

[...] is subjective, but this should be viewed in terms of the purpose of these tests. The results of these tests are not sufficient grounds to eliminate data values, but only serve to identify values that require further examination. Viewed in this perspective, these cut-offs are satisfactory.

Table 1. SELECTED QUALITY CONTROL TESTS
Typical Cut-Off Values for Pattern Tests on Hourly Values
(concentration in µg/m3 unless otherwise noted)

Pollutant                  Data             Maximum    Adjacent   Spike        Consecutive
                           Stratification   Hour Test  Hour Test  Test         4-hr Test
Ozone/Total Oxidant        Summer-day       1000       300        200 (300%)    500
  (µg/m3)                  Summer-night      750       200        100 (300%)    500
                           Winter-day        500       250        200 (300%)    500
                           Winter-night      300       200        100 (300%)    500
Carbon Monoxide (mg/m3)    Rush traffic       75        25         20 (500%)     40
                             hours
                           Non-rush           50        25         20 (500%)     40
                             traffic hours
Sulfur Dioxide (µg/m3)     None              800*      200*       200 (500%)*  1000*
Nitrogen Dioxide (µg/m3)   None             1200       500        200 (300%)   1000

* Higher values may be used for sites near strong point sources.

[...] data for a given period of time, such as a month, quarter, or year. Suspect values would be associated with large gaps in the frequency distribution. The length of the gap and the number of values above the gap afford a convenient means of detecting possible errors. With this simplification of the problem, it becomes possible to develop a probabilistic framework for this problem.6 Figure 3 displays a histogram of actual carbon monoxide data for one month. As indicated, there is one hourly value equal to 30 mg/m3, but no other values above 12 mg/m3. It is relatively easy to compute the probability associated with such a gap by assuming that the data may be approximated by an exponential distribution. This type of approximation has been examined and appears to be adequate for the upper tail of the distribution, i.e., the higher concentration ranges. The actual formula for the probability of this gap is quite simple, and as would be expected, the probability of this particular gap occurring is quite small (.0006).
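The following minimal FORTRAN sketch shows the kind of computation involved, assuming the exponential-tail form: the scale parameter THETA is estimated from the 50th and 95th percentiles (for an exponential tail, the 95th minus the 50th percentile equals THETA times the natural log of 10, which is one of the two estimates used by the program in Appendix A), and the probability of a gap of the observed size is exp(-gap/THETA). The percentile values are illustrative assumptions; only the 30 and 12 mg/m3 figures come from the text.

      PROGRAM GAPTST
C     Minimal sketch of the gap probability computation under an
C     assumed exponential upper tail.
      REAL P50, P95, THETA, GAP, PROB
C     Assumed monthly percentiles for carbon monoxide (mg/m3).
      P50 = 4.0
      P95 = 9.6
C     For an exponential tail, P95 - P50 = THETA * LN(10).
      THETA = (P95 - P50) / ALOG(10.0)
C     Highest hour 30 mg/m3, second highest 12 mg/m3 (from the text).
      GAP = 30.0 - 12.0
      PROB = EXP(-GAP / THETA)
      PRINT *, 'GAP PROBABILITY = ', PROB
      IF (PROB .LT. 0.01) PRINT *, 'SUSPECT VALUE FLAGGED'
      END

With these assumed percentiles, THETA is about 2.4 mg/m3 and the printed probability is about .0006, matching the figure quoted above.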
In fact, the value of 30 mg/m3 was merely a keypunch error, and the correct value was 3.0 mg/m3.

It should be noted that the gap test is designed to identify unusually high values. Errors that produce unusually low values will not necessarily be detected. A possible option is to also employ the previously discussed pattern test, which will flag unusually low values if they result in a departure from the typical pattern. Both tests are fairly efficient, and on EPA's UNIVAC-1110 computer the computerized versions of these tests can process 25,000 hourly values for approximately $1.00.

4. CONCLUSION

For twenty-four hour data, the Shewhart test is a convenient means of identifying possible errors. As discussed in the previous section, this test checks not only internal consistency within a month, but also consistency with adjacent months. This second check necessitates an added file of historical information, but experience suggests that this extra step is warranted. For hourly [...] used to detect changes in the standard deviation at a site. Time series models and the use of associated data, such as meteorological variables, would be expected to increase sensitivity and possibly result in even better data quality. However, it remains to be seen if these more elaborate approaches are cost-effective when processing vast quantities of data from locations throughout the Nation.

An important consideration is the proper placement of these procedures in the overall data handling scheme. As a general rule, the tests should be applied as close to the data collection step as possible. This will minimize the time lag before the potential outlier is identified and thereby make it easier to check the value in question and still ensure that the data are submitted to EPA in a timely fashion. Procedures for handling data anomalies and suspect data identified in EPA's National Aerometric Data Bank are discussed in the AEROS User's Manual.14 However, the main thrust of a data screening program is to detect and correct any such errors before the data are submitted to EPA.

As a final comment, it should be noted that once a value is flagged as a possible anomaly, it cannot be arbitrarily dropped from the data set. It must first be verified that the data point actually is incorrect. The fact that the data point is statistically unusual does not necessarily mean that it did not occur. There are a variety of factors that should be examined to determine whether the data point should be deleted. In general, the data screening tests presented here would detect only very gross errors. For example, calibration errors can produce data sets that are internally consistent and consequently would pass these tests. The data sets flagged by these tests will usually contain a few values that are much higher than the rest of the data. In many cases these will obviously be the result of a transcription or coding error. Simple, but effective, steps in examining these flagged values include comparisons of adjacent hourly values at the same site, comparisons with other pollutant or meteorological data for the site in question, and comparisons with data for the same pollutant recorded at other nearby monitoring sites for the same time period.
REFERENCES

[...]

10. Grubbs, F. E. and G. Beck. Extension of Sample Sizes and Percentage Points for Significance Tests of Outlying Observations. Technometrics, Vol. 14, No. 4, November 1972, pp. 847-854.

11. Shewhart, W. A. Economic Control of Quality of Manufactured Product. D. Van Nostrand Company, Inc., Princeton, N.J., 1931, p. 229.

12. Grant, E. L. Statistical Quality Control. McGraw-Hill Book Co., New York, 1964, pp. 122-128.

13. Curran, T. C. and N. H. Frank. Assessing the Validity of the Lognormal Model When Predicting Maximum Air Pollutant Concentrations. Presented at the 68th Annual Meeting of the Air Pollution Control Association, Boston, Massachusetts, 1975.

14. AEROS Manual Series Volume II: AEROS User's Manual. U.S. Environmental Protection Agency, Office of Air and Waste Management, Office of Air Quality Planning and Standards, Research Triangle Park, North Carolina. EPA-450/2-76-029 (OAQPS No. 1.2-039), December 1976.

APPENDIX A - Gap Test for Hourly Data

This appendix contains additional information on the gap test for hourly data. The following material is included:

(1) A copy of the paper, "Quality Control for Hourly Air Pollution Data," which explains the details of the test,
(2) A brief description of the computer program for this test, and
(3) A listing of the FORTRAN computer program.

QUALITY CONTROL FOR HOURLY AIR POLLUTION DATA

Thomas C. Curran, Mathematical Statistician
William F. Hunt, Jr., Chief, Data Analysis Section
Robert B. Faoro, Mathematical Statistician

U.S. Environmental Protection Agency
Office of Air Quality Planning and Standards
Monitoring and Data Analysis Division
Research Triangle Park, North Carolina 27711

Presented at the 31st Annual Technical Conference of the American Society for Quality Control, Philadelphia, Pennsylvania, May 16-18, 1977

[...]

Description of Gap Test Computer Program

Overview

This FORTRAN program may be used to read SAROAD format raw data cards and screen hourly data for the criteria pollutants according to the gap test. Each monthly data set is screened for gaps and also for the number of hourly values exceeding a user-supplied upper limit (SMAX()). This latter feature is incorporated into the program to protect against an entire month, or portion of a month, being too high due to incorrect scaling. The user-supplied upper limit is a concentration that should not be exceeded more than one time in a thousand. The program counts the number of values above this limit and uses the Poisson approximation to compute an associated probability. The gap test is calculated by fitting two different exponential distributions to the data. One estimate is obtained from the 50th and 95th percentiles of the data, while the other uses the 50th percentile of the data and the specified upper limit as the 99.9th percentile. These two different estimates are employed to protect against different types of errors. Output may be obtained for each monthly data set, or PCUT() may be varied to suppress printing of acceptable data. The program contains certain editing features to prevent arrays from being over-subscripted. Summary results of the processing are printed at the end of each run.
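As a sketch of the upper-limit feature just described, the fragment below counts the hourly values above the limit and computes the Poisson probability of observing that many or more, given that the limit should be exceeded no more than once per thousand values. The month size and the count above the limit are illustrative assumptions; the actual program's logic and variable names may differ.

      PROGRAM UPRLIM
C     Minimal sketch of the Poisson check on the number of hourly
C     values above the user-supplied upper limit.
      REAL XLAM, PROB, TERM
      INTEGER N, NOVER, K
C     Assume one month of hourly data and 3 values above the limit.
      N = 720
      NOVER = 3
C     Expected count above a once-in-a-thousand concentration.
      XLAM = 0.001 * REAL(N)
C     P(NOVER or more) = 1 - sum of Poisson terms for 0 to NOVER-1.
      TERM = EXP(-XLAM)
      PROB = 1.0 - TERM
      DO 10 K = 1, NOVER - 1
         TERM = TERM * XLAM / REAL(K)
         PROB = PROB - TERM
   10 CONTINUE
      PRINT *, 'PROB OF', NOVER, ' OR MORE ABOVE LIMIT =', PROB
      END

For a 720-hour month the expected count is 0.72, so three or more exceedances carries a probability of roughly .04 and would warrant a closer look at the month's scaling.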
Program Input

SAROAD raw data cards (cards for non-hourly data are ignored)

Program Execution

On EPA's UNIVAC 1110, the following runstream will execute the program.

    @ASG,A TRRP*ADSS.
    @XQT TRRP*ADSS.GAP
    @ADD (your data file - cards)

Program Statistics

On EPA's UNIVAC 1110, this program will process 25,000 hourly values, or 2000 cards, in approximately 30 seconds at a cost of $1.

[...]

APPENDIX B - Pattern Test for Hourly Data

This appendix contains additional information on the pattern tests for hourly data. The following material is included:

(1) A copy of the paper, "Automated Screening of Hourly Air Quality Data,"
(2) A brief description of the computer program for these tests, and
(3) A listing of the FORTRAN computer program.

AUTOMATED SCREENING OF HOURLY AIR QUALITY DATA

Robert B. Faoro, Mathematical Statistician
Thomas C. Curran, Mathematical Statistician
William F. Hunt, Jr., Chief, Data Analysis Section

U.S. Environmental Protection Agency
Office of Air Quality Planning and Standards
Monitoring and Data Analysis Division
Research Triangle Park, North Carolina 27711

1978 ASQC Technical Conference Transactions - Chicago

INTRODUCTION

Over the past several years a number of different automated methods to screen air quality data for errors have been proposed. Basically these techniques were developed to detect the more obvious data errors resulting primarily from keypunch or transcription mistakes, or from periodically malfunctioning instruments. More subtle errors from inadequate calibration procedures or similar problems resulting in measurement bias will not be detected by these procedures. The goal of these techniques was to ensure a high quality data product for the higher concentration levels, because in many cases these higher values determine an area's status with respect to the various ambient air quality standards and the amount of emission controls needed. For example, the second highest hourly observation out of a possible 8760 hours in a year is used to determine compliance for carbon monoxide and ozone. Other pollutants have the second highest day as the decision-making statistic. Pollutants having annual mean standards, such as total suspended particulate, sulfur dioxide, and nitrogen dioxide, would require that more attention be given to the complete annual data set.

Basically the techniques which have been developed can be classified by their application into two main categories: 24-hour (intermittent systematic sampling) and hourly data (continuous sampling). Procedures for screening 24-hour data will not be discussed in this paper. They have been described in previous papers.2-4 A guideline document6 has been prepared describing the complete air quality data screening package together with summary documentation of both tests described in this paper. The purpose of this paper is to evaluate two different schemes for screening hourly air quality data. These two procedures will be referred to as the typical pattern test and the monthly gap test.
These screening procedures were developed to be both simple and yet effective discriminators between "good" and "bad" data. Another requirement was that these tests could be done efficiently by a computer. Being simple and computer-efficient was most important because of the sheer magnitude of data requiring screening. At the present time, for example, there are over 2000 continuous monitoring sites located throughout the country that submit data to the National Aerometric Data Bank (NADB) located in Research Triangle Park, North Carolina. If each of these sites collected a complete year of data (8760 hours), the total annual data submission to the data bank from these sites would be over 17 million measurements. Being effective discriminators of "good" and "bad" data is of course important, since it would be time consuming and costly to flag "good" data and, of course, disastrous to miss flagging "bad" data.

DESCRIPTION OF SCREENING TESTS

Although air quality is difficult to predict, generally it behaves within certain natural bounds and exhibits fairly regular geographical, seasonal, weekly, and diurnal concentration patterns depending upon emission and meteorological factors. The screening tests discussed here attempt to discover inconsistencies in the data that warrant further scrutiny. For example, the pollutant ozone, which is formed when hydrocarbon and oxides of nitrogen emissions, predominantly from motor vehicles, are irradiated by sunlight, generally exhibits lower concentrations during the nighttime hours and during the winter months. Nitrogen dioxide does not show as distinct a seasonal pattern as ozone, but still has a well defined diurnal pattern. Generally, nitrogen dioxide exhibits a distinct morning peak (8-10 a.m.) resulting from the oxidation of nitric oxide emissions from motor vehicles during the morning commuter rush. Pollutant concentration patterns usually behave fairly regularly and do not exhibit, except when under the influence of a strong local source, extreme hour-to-hour variation patterns. Likewise, high (low) pollutant concentrations usually result from a gradual increase (decrease) in concentrations rather than a sudden rise (fall). Table 1 shows six typical days of nitrogen dioxide (NO2) hourly concentrations from a site in Los Angeles, California. Note that the hours immediately following the morning rush hour are typically the highest for this pollutant and that the concentrations show gradual changes from one hour to another.

These data screening procedures look for different types of inconsistencies in the data. The pattern test looks for extremely high concentrations never or very rarely exceeded in the past and other types of unusual pollutant behavior. O'Reagan5 discusses some very interesting screening concepts along these same lines. The gap test looks for breaks in the monthly frequency distribution of the hourly pollutant observations. An example for ozone of a significant break in the three highest observations in a month would be:

    HIGHEST HOUR    2nd HIGHEST HOUR    3rd HIGHEST HOUR
    929 µg/m3       929 µg/m3           374 µg/m3

A brief description of the two screening procedures will be presented before they are applied to actual air quality data. The typical pattern tests are not statistical tests in that probabilistic statements cannot be made about a rejected data point.
They instead represent simple and practical ways to check for obvious errors in the data. Basically these tests can be classified into two main categories:

- tests which look for unusual pollutant behavior, such as exceeding some extremely high concentration, either never before exceeded or exceeded only very rarely based on past "good" data, and
- a test which looks for unusually high values in the day with respect to the other values in the day.

More specifically, the tests look for the following types of errors:

- hourly values exceeding an upper limit empirically derived from prescreened historical data (Max Hour),
- differences in adjacent hourly values exceeding an empirically derived upper limit difference (Adjacent Hour),
- a single value being much different than the other values in the day, using a modification of the Dixon Ratio Test,
- differences and percent differences between the middle value and its adjacent values in a 3-hour interval exceeding certain pre-derived limits (Spike), and
- averages of four or more consecutive hours exceeding some pre-derived concentration limit (Consecutive Hour).

Table 2 gives typical upper limit check values used in the various pattern tests in EPA's Region V, consisting of the states of Illinois, Indiana, Michigan, Minnesota, Ohio, and Wisconsin. One of the main drawbacks of these kinds of tests is that ideally these limit values would reflect a particular site, or a group of sites, having common air pollution characteristics. It is impossible to have individual limits for each and every site. Therefore, some discrimination is sacrificed by merely having a given set of parameters for all sites. Of course, if you are only interested in screening data from a small number of sites, it may indeed be feasible to have site-specific parameters. The pattern test outputs each day that contains at least one hour that violates a particular test and gives the tests which were violated.
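As an illustration of the first two checks, the following minimal FORTRAN sketch applies the maximum-hour and adjacent-hour tests to one hypothetical day of summer ozone data, using the summer-day limits of 1000 and 300 µg/m3 from Table 1 of the main guideline text. The day of data is an assumption, constructed so that a 900 µg/m3 hour passes the maximum-hour check but fails the adjacent-hour check.

      PROGRAM PATTRN
C     Minimal sketch of the Max Hour and Adjacent Hour checks for
C     one day (24 hypothetical hourly ozone values, ug/m3).
      REAL H(24), XMAX, ADJ
      INTEGER I
      DATA H /13*40., 900., 80., 9*60./
C     Assumed summer-day ozone limits (Table 1 of the main text).
      XMAX = 1000.
      ADJ = 300.
      DO 10 I = 1, 24
         IF (H(I) .GT. XMAX) PRINT *, 'MAX HOUR VIOLATED AT HOUR', I
   10 CONTINUE
      DO 20 I = 2, 24
         IF (ABS(H(I) - H(I-1)) .GT. ADJ)
     1      PRINT *, 'ADJACENT HOUR VIOLATED AT HOUR', I
   20 CONTINUE
      END

Here the 900 µg/m3 value is below the maximum-hour cut-off, yet it differs from its neighbors by far more than 300 µg/m3, so the day would still be flagged; this is the sense in which the individual checks back each other up.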
The frequency distribution gap test was developed to provide an even simpler means of screening hourly data. The two main advantages of this approach were that the results could be put in a probabilistic framework and that the test could be applied universally to all data without modification. (In order for the pattern test to be optimally effective, the limit checks would need to be varied on a site-by-site basis.) The theory behind the gap test is that unusually high values could be detected by examining the frequency distribution of the hourly data for a given period of time, such as a month, quarter, or year. The test will be employed on a monthly basis in these applications. The length of the gap and the number of values above the gap afford a convenient means of detecting possible errors. The exponential distribution was used to describe the upper tail of the hourly pollutant concentrations and thereby provided the underlying theory for detecting significant gaps in the frequency distribution of the hourly data. An example of the days flagged by the gap test can be found in Table 4 of this paper. A detailed description of the test and its application to some actual air pollution data has been discussed previously.1

DATA BASE

Two sets of actual hourly air quality data taken from the NADB were screened using both techniques. The two sets represent the pollutants nitrogen dioxide (NO2) and ozone (O3) from about 40 randomly selected sites located throughout the country for the year 1976. There were approximately 100,000 hourly values for each pollutant. It took less than 1 minute of computer time, costing about $2.00 on the UNIVAC-1110 system, for each data set to do both screening procedures.

RESULTS

Overall, the two tests rejected basically the same data from both pollutant data sets. Of the 23 specific instances of rejected data, 18 were rejected by both tests, while in the remaining 5 instances one but not both tests rejected. The instances where the tests substantiated each other are almost without question true data anomalies, while the rest of the cases are more doubtful anomalies. Table 3 gives examples of some of the data which were flagged by both tests. These data represent days with either a single hourly anomaly or, in some cases, multiple data errors. All told, 87 days (0.8%) out of over 10,000 days of data were rejected by the pattern test, while 21 months (5.0%) of data out of 409 site-months screened were rejected by the gap test.

Table 4 gives several examples of days flagged by the pattern test and months flagged by the gap test where the two procedures did not flag the same data. All of the days flagged by the pattern test, with the exception of the Los Angeles day (June 23rd), probably contain errors. The specific hours identified as in error are underlined. The reason that the gap test did not flag these data is that in each of these cases the errors represent hourly concentrations which were not unusual for the month, and therefore no significant gap in the monthly frequency distribution of observations occurred. These types of data errors represent typical values for the month as a whole, but they were unusual when compared with the data values recorded around the data value in question. Both examples of data flagged by the gap test will require further examination. The San Diego NO2 data for August is unusual, however, because of the missing data immediately following the specific data in question.

CONCLUSION

Based on a limited, yet representative, set of continuous hourly NO2 and O3 data, it has been shown that the pattern and gap screening tests mimic each other very well in terms of the data rejected. There were only minor discrepancies between the two tests. What is even more important is that both tests rejected data which in most cases contained real errors. This was particularly true when both tests rejected the same data. The overall rejection rate was quite low for both tests. Although all of the hourly data passing the tests were not reviewed, the data that were reviewed did not reveal any obvious data errors missed by the tests. It is recommended that the gap test be used as the initial means of screening large hourly data sets, because its printed output is much less than the pattern test generates, particularly, of course, in the case where a lot of data is in error. There is also a slight savings in the amount of computer time for the gap test. The pattern test then can be used as a backup to substantiate the results of the gap test or to provide more specific output about the days which contain errors.
It is recommended that these procedures be used by the agency collecting the data instead of being used at the Regional or National (NADB) level. The problems of verification and correction of flagged data can be handled more efficiently and effectively nearest the source of the data. Presently, the States of Minnesota, Ohio, and Wisconsin are using these procedures on a regular basis.

[...]

REFERENCES

1. Curran, Thomas C., W. F. Hunt, and R. B. Faoro. Quality Control for Hourly Air Pollution Data. Presented at the 31st Annual Technical Conference of the American Society for Quality Control, Philadelphia, Pennsylvania, May 16-18, 1977.

2. Hunt, W. F., and T. C. Curran. An Application of Statistical Quality Control Procedures to Determine Progress in Achieving the 1975 National Ambient Air Quality Standards. Transactions of the 28th Annual ASQC Conference, Boston, Massachusetts, May 1974.

3. Hunt, W. F., T. C. Curran, N. H. Frank, and R. B. Faoro. Use of Statistical Quality Control Procedures in Achieving and Maintaining Clean Air. Transactions of the Joint European Organization for Quality Control/International Academy for Quality Conference, Venice Lido, Italy, September 1975.

4. Hunt, W. F., R. B. Faoro, and S. K. Goranson. A Comparison of the Dixon Ratio Test and Shewhart Control Chart Test Applied to the National Aerometric Data Bank. Presented at the 30th Annual Conference of the American Society for Quality Control, Toronto, Ontario, Canada, June 1976.

5. O'Reagan, Robert T. Practical Techniques for Computer Editing of Magnitude Data. Unpublished paper, Department of Commerce, Bureau of the Census, Washington, D.C. 20223, 1972.

6. Curran, T. C. Guidelines for Screening Ambient Air Quality Data. U.S. Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, North Carolina 27711. (In preparation.)

DESCRIPTION OF PATTERN TEST COMPUTER PROGRAM

This FORTRAN program consists of a main program and five subprograms to screen hourly air quality data for unexpected departures from typical patterns. The typical pattern tests are not statistical tests in that probabilistic statements cannot be made about a rejected data point. They instead represent simple and practical ways of checking for various possible, and in most cases obvious, errors in the data. The tests specifically look for the following types of errors:

- hourly values exceeding an empirically derived upper limit,
- differences in adjacent hourly values exceeding an empirically derived upper limit difference,
- a value in a day being much different than the other values in the day, using a modification of the Dixon Ratio Test,
- differences and percent differences between the middle value and its adjacent values in a 3-hour interval exceeding certain pre-derived limits (a minimal sketch of this spike check follows this description), and
- consecutive values of four or more hours exceeding some pre-derived concentration limit.

The main program reads the standard hourly SAROAD card format, calls the subprograms, and outputs to the printer the results of the screening procedure. Listings of the main program and the subprograms are included following this discussion. The input cards must be ordered by date (year, month, and day) within each site and pollutant-method combination. Any number of site and pollutant-method combinations can be run back to back without any means of separation.
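The spike check mentioned in the list above can be sketched as follows. The limits (200 µg/m3 and 500 percent, taken from the sulfur dioxide entries of Table 1 in the main guideline text) and the day of data are illustrative assumptions, the reading of the check (middle hour exceeding both neighbors on both an absolute and a percentage basis) is one plausible interpretation, and the actual subprogram may differ in detail.

      PROGRAM SPIKE
C     Minimal sketch of the spike check: the middle value of each
C     3-hour interval is compared with both neighbors on an
C     absolute and a percentage basis (hypothetical SO2 day, ug/m3).
      REAL H(24), DLIM, PLIM, D1, D2
      INTEGER I
      DATA H /10*40., 400., 13*40./
      DLIM = 200.
      PLIM = 500.
      DO 10 I = 2, 23
         D1 = H(I) - H(I-1)
         D2 = H(I) - H(I+1)
C        Flag when the middle hour exceeds both neighbors by more
C        than DLIM and by more than PLIM percent.
         IF (D1 .GT. DLIM .AND. D2 .GT. DLIM .AND.
     1       D1 .GT. PLIM / 100. * H(I-1) .AND.
     2       D2 .GT. PLIM / 100. * H(I+1))
     3      PRINT *, 'SPIKE FLAGGED AT HOUR', I
   10 CONTINUE
      END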
An end-of-file (@EOF) indicator, or other end-of-file indicator on tape, is used to signal the end of the input data set. The screening checks are performed in the subprograms. There is a separate subprogram for each of the pollutants considered: carbon monoxide, sulfur dioxide, nitrogen dioxide, and photochemical oxidants. The fifth subprogram is used for checking the data sequence of the input data cards. An example of the printed output is shown in the table enclosed. The output consists of the site code; the pollutant-method code; the year, month, and day; the hourly values for the day in question; and the test or tests which the data violated. Also, following the completion of a site and pollutant-method combination, a line is printed showing the number of days screened.

Program Input

SAROAD raw data cards

Program Execution

On EPA's UNIVAC 1110, the following runstream will execute the program.

    @ASG,A TRRP*ADSS.
    @XQT TRRP*ADSS.PATTERN
    @ADD (your data file - cards)
    @FIN

Program Statistics

On EPA's UNIVAC, this program, like the gap test, will process about 25,000 hourly values, or 2,000 cards, in approximately 30 seconds at a cost of about $1.00.

[...]

APPENDIX C - Shewhart Test

This appendix contains additional information on the Shewhart Test for 24-hour data. The following material is included:

(1) A copy of the paper, "The Shewhart Control Chart Test - A Recommended Procedure for Screening 24-Hour Air Pollution Measurements,"
(2) A brief description of the computer program for the Shewhart Test, and
(3) A listing of the COBOL computer program.

THE SHEWHART CONTROL CHART TEST - A RECOMMENDED PROCEDURE FOR SCREENING 24-HOUR AIR POLLUTION MEASUREMENTS

Introduction

At the present time there are over 8,000 air monitoring sites operated throughout the United States by the Federal, state, and local governments.1 These sites collect approximately 20,000,000 ambient air pollution values annually, which are sent to the U.S. Environmental Protection Agency's (EPA) National Aerometric Data Bank (NADB) in Durham, North Carolina. The data are primarily collected to measure the success of emission control plans in achieving the National Ambient Air Quality Standards (NAAQS). As one might expect with data sets this large, anomalous measurements slip through the existing editing and validation procedures. Because of the importance that is attached to violations of the NAAQS, a quality control test to ensure the validity of the measurement of both short- and long-term concentrations is extremely important.

A series of quality control tests have been examined2-4 to check ambient air quality data for anomalies, such as keypunch, transcription, and measurement errors. The Shewhart Control Chart Test5 has been selected to screen 24-hour air pollution measurements.
This paper discusses its application to three major pollutants--total suspended particulate (TSP), sulfur dioxide (SO2), and nitrogen dioxide (NO2). The Shewhart Test is applied to data from monitoring instruments which generate one measurement per 24-hour period and are operated on a systematic sampling schedule of approximately once every 6 days. In the cases of SO2 and NO2, there are also continuous monitoring instruments, which monitor the pollutants constantly; but our discussion here is concerned only with 24-hour data. The application of the test results in flagged data which need to be verified as either valid or invalid.

A computer software program, the Air Data Screening System, has been written in the computer languages COBOL and FORTRAN. This program incorporates the Shewhart Control Chart Test. It has been successfully applied to data collected in EPA's Region V, which encompasses the states of Illinois, Indiana, Michigan, Minnesota, Ohio, and Wisconsin. In terms of population it is the largest of EPA's regions, and there is extensive monitoring of the above pollutants. The purpose of the Region V evaluation is to determine whether the data flagged by the Shewhart test are valid or invalid and to identify, if possible, the source of the error. This paper will discuss the flow of data from the state and local governments; the data-editing process; the basic characteristics of the data; the application and evaluation of the Shewhart Test; and the computer software program, the Air Data Screening System (ADSS); it will conclude with our recommendations.

Data Flow

Most ambient air quality data are collected by state and local air pollution control agencies and are forwarded via EPA's Regional Offices to the NADB. A considerable amount of data is forwarded--approximately 20 million air quality measurements a year. The data are sent quarterly in a standard format6 that specifies the site location; the year, month, and day of sampling; and the measurement itself (24-hour or 1-hour value) in micrograms or milligrams per cubic meter (µg/m3 or mg/m3) or parts per million (ppm). A corresponding site file contains descriptive information on the sampling-site environment. EPA edits the submitted data, checking for consistency with acceptable monitoring methods and other identifying parameters. In the data-editing program, air quality data with extremely high values are flagged. Data that do not pass these checks or that have values exceeding certain predetermined limits are returned to the originating agency via the Regional Office for correction and resubmittal.

Unfortunately, with data sets this large, there are still anomalous measurements that slip through the existing editing and validation procedures. Therefore, there is a need for a simple, cost-effective statistical test that can be applied to the air quality data by which to detect, primarily, obvious transcription, keypunch, and measurement errors. Statistical tests do not eliminate, however, the need for more intensive quality assurance at the local level. For example, inadequate calibration procedures or similar problems that result in measurement bias will not be detected by our statistical procedures, which are intended primarily for macroanalysis.

Basic Characteristics of TSP, SO2, and NO2 Data

Basic characteristics of the TSP, SO2, and NO2 data were considered in selecting the quality control test being used.
To begin with, the test was applied to data which were obtained from monitoring instruments that generate one measurement per 24-hour period.7 For such monitoring methods, EPA recommends that a systematic sampling procedure of once every 6 days, or 61 samples per year, be used at a minimum to collect the data.8 Such a sampling procedure generates data which, for our purposes, may be considered as approximately independent. In examining the distributional properties of the data, past research has shown that ambient TSP concentrations are approximately lognormally distributed.9,10 This is sometimes true for SO2 and NO2 also, but is not always the case.

In selecting the quality control tests, the averaging times which correspond to the NAAQS are important. The values of interest are the peak concentrations (24-hour average measurements) for TSP and SO2, and the annual means for TSP, SO2, and NO2.

The final data characteristic of importance is the seasonality of the pollutants. As an example, in some areas of the country, TSP and SO2 measurements are highest in the winter months and lowest in the summer months. Therefore, the factor of seasonality had to be considered in the selection of the quality control test to minimize this as a possible source of error.

Shewhart Control Chart Test

The Shewhart Control Chart Test can be used to examine both shifts in monthly averages as well as shifts in the monthly range. From the former it can detect possible multiple errors and from the latter, single anomalous values. In this test the data can be divided up into what Shewhart called rational subgroups.11 In a manufacturing process the subgroups would most likely relate to the order of production. Ambient air quality measurements can be viewed in the same way because they are collected by a monitoring instrument over time. A month of data was selected as the rational subgroup because the air quality data are recorded by the state and local agencies on a monthly basis in a standard format.6 The monthly subgroup generally consists of five measurements based on EPA's recommended sampling schedule8 of 61 observations per year, which also is the common subgroup size found in industrial use. Using a subgroup size of five, it can be assumed that the distribution of the monthly means is nearly normal, even though the samples are taken from a non-normal universe.

The test was applied to the 1974 Region V data on a moving 4-month basis; that is, the average and range of values in the month in question were compared with the overall averages of the three previous monthly averages and monthly ranges. The moving 4-month comparison was used to minimize the effect of the seasonality of the pollutants. The formulas for calculating the upper and lower control limits, UCL and LCL, respectively, are as follows.

For the monthly range:

    UCL(R) = D4 * R-bar, and
    LCL(R) = D3 * R-bar.

For the monthly mean:

    UCL(x-bar) = x-double-bar + A2 * R-bar, and
    LCL(x-bar) = x-double-bar - A2 * R-bar,

where R = the monthly range; R-bar = the average of the three previous monthly ranges; x-bar = the monthly average in question; x-double-bar = the average of the three previous monthly averages; and D3, D4, and A2 are factors for determining from R-bar the 3-sigma control limits for x-bar and R. (See Table C on page 562 of reference 5.)
An examina- tion was made of those data in which the flagged monthly mean or range exceeded one of the pollutant-specific NAAQS. For TSP and S02, appropriate cutoffs were thought to be 260 yg/m3 and 365 Mg/m3, which are their respec- tive primary short-term 24-hour standards. In the case of N02, the annual primary NAAQS of 100 pg/m3 was used because N02 has no short-term primary standard. Although their choice was somewhat arbitrary, the NAAQS were used as cutoffs because their violation results in re-examination of the overall adequacy of local air pollution control measures in effect. Thus, high values must be verified because they can result in significant impact on the original control strategy designed to achieve the NAAQS. Table I indicates the number of Region V sites reporting TSP, S02, and N02 data which were flagged by the Shewhart Control Test. The number of flagged sites which were found to have one or more erroneous 24-hour measure- ments based upon later evaluation is also given. Of the 855 sites in Region V measuring TSP in 1974, 38 were flagged by the Shewhart Control Test. The flagged sites reported at least one monthly mean and/or range equal to or greater than 260 yg/m3. Of these 38 sites, 31 ------- C-5 Tnhle 1. Shewhart Control Chart Tests ns applied to sites in Region V monitoring TSP, SC>2, and N(>2 in 1974. Pollutant TSP Q High value in question (yg/m3) Total sites, no. >_ 260 855 >_ 365 366 >_ 100 302 Shewhart test Flagged sites, no. 38 4 36 Flagged sites, no. with errors 31 3 16 Percent with actual errors 81.2 75.0 44.4 aThe high value in question is the monthly mean or range. The National Ambient Air Quality Standards (NAAQS) were used as high value cutoffs: 260 yg/m3 and 365 yg/m3 are the 24-hour primary NAAQS for the TSP and S02, respectively, while 100 yg/m3 is the annual primary NAAQS for NO™. ------- C-6 were found to have multiple transcription or keypunch errors. In the case of S02, 4 of the 366 sites were flagged by the Shewhart Test. The monthly mean and ranges in question were equal to or greater than 365 yg/m3. Of the four sites flagged, one was found to have multiple transcription errors; two sites had single transcription errors and one site was correct. Finally, of the 302 sites measuring N02, transcription and keypunch errors were found at 16 of the 36 sites flagged by the Shewhart Test. An example of a site flagged was one that measured TSP for 11 months in 1974. The monthly mean (x), ranges (R), and subgroup sizes (n) are indicated below by month: Jan Feb Mar Apr May June July Aug Sept Oct Nov Dec x - 67 60 56 70 56 66 73 59 591 82 41 R - 74 25 71 44 102 37 64 68 595 68 30 n04555 3 555 534 The Shewhart Control Chart Test was applied on a moving 4-month basis. When the monthly average and range for October became the values in question, they were compared with the overall averages of the July, August, and Septem- ber averages and ranges. The test results are shown in Figure 1 for both the monthly mean and range. In both cases the air quality data are "out of control" for the month of October, with both the October average and range way above their respective upper control limits. The problem was later identified as a multiple transcription error in which all numbers for the month of October were off by a factor of 10. In many cases, these outliers are obvious, but due to the large volume of data, a screening procedure is essential to identify the suspect data. 
Air Data Screening System

Based on the success of the test results of the Shewhart Control Chart Test, an effort is now underway to assist the states in EPA Region V in implementing the Air Data Screening System on their respective data banks. The Air Data Screening System, when completed, will consist of two computer programs covering both one-hour continuous air quality measurements and 24-hour air quality measurements. In this paper we are only addressing the screening of 24-hour data. This requires the use of a control data set based on past data and the generation of data sets created from incoming data (Figure 2). It is necessary to build a three-month control file prior to applying the Shewhart Test. The computer program generates monthly control information for any site-pollutant combination. The Shewhart Test, of course, is not applied until the fourth month's measurements are available. These data are controlled by the previous three months' data. Data that pass the Shewhart Test update the control data set, while the suspect data are printed with summary totals. The system thus creates a new control file with each update cycle. This program will be operated in EPA's Regional Offices and is available to state and local air pollution control agencies.

Figure 2. Flow chart for Air Data Screening System for incoming 24-hour measurements: the control file and incoming air quality data feed the Shewhart control program, which produces an updated control file (input for the next run) and a listing of flagged data.

Recommendations

Based upon the results of our Region V evaluation, we recommend that air pollution control agencies consider using the Shewhart Control Chart Test on incoming 24-hour air quality measurements. It has the advantage that it can simultaneously examine shifts in both the monthly mean and range and can be presented graphically. We have prepared the computer software, the Air Data Screening System, which makes use of the Shewhart Test, and we will make it available to any interested state or local air pollution control agency. Future papers will discuss appropriate quality control procedures for continuous one-hour data, and such procedures will be incorporated into the Air Data Screening System.

Acknowledgements

The authors wish to express their appreciation to the state air pollution control agencies in Region V for their help in the evaluation of the tests, to Mrs. Joan Bivins and Mr. Willie Tigs for their clerical support, and to Dr. Thomas Curran and Mr. William Cox for their many helpful comments on earlier drafts of the paper.

References

1. Monitoring and Air Quality Trends Report, 1974. U.S. Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, N.C. Publication No. EPA-450/1-76-001, February 1976.

2. Hunt, W. F., Jr., and T. C. Curran. An Application of Statistical Quality Control Procedures to Determine Progress in Achieving the 1975 National Ambient Air Quality Standards. Transactions of the 28th Annual ASQC Conference, Boston, Massachusetts, May 1974.

3. Hunt, W. F., Jr., T. C. Curran, N. H. Frank, and R. B. Faoro. Use of Statistical Quality Control Procedures in Achieving and Maintaining Clean Air. Transactions of the Joint European Organization for Quality Control/International Academy for Quality Conference, Venice Lido, Italy, September 1975.

4. Hunt, W. F., Jr., R. B. Faoro, and S. K. Goranson. A Comparison of the Dixon Ratio Test and Shewhart Control Chart Test Applied to the National Aerometric Data Bank. Transactions of the 30th Annual ASQC Conference, Toronto, Ontario, Canada, June 1976.

5. Grant, E. L. Statistical Quality Control. McGraw-Hill Book Co., New York, 1964, pp. 122-128.

6. SAROAD Users Manual. U.S. Environmental Protection Agency, Research Triangle Park, N.C. Publication No. APTD-0663, July 1971.

7. Hoffman, A. J., T. C. Curran, T. B. McMullen, W. M. Cox, and W. F. Hunt, Jr. EPA's Role in Ambient Air Quality Monitoring. Science, 190(4211):243-248, October 1975.

8. Title 40 - Protection of Environment. Requirements for Preparation, Adoption, and Submittal of Implementation Plans. Federal Register, 36(158):15490, August 14, 1971.

9. Larsen, R. I. A Mathematical Model for Relating Air Quality Measurements to Air Quality Standards. U.S. Environmental Protection Agency, Research Triangle Park, N.C. Publication No. AP-89, 1971.

10. Hunt, W. F., Jr. The Precision Associated with the Sampling Frequency of Lognormally Distributed Air Pollutant Measurements. J. Air Poll. Control Assoc., 22(9):687, 1972.

11. Shewhart, W. A. Economic Control of Quality of Manufactured Product. D. Van Nostrand Company, Inc., Princeton, N.J., 1931, p. 299.

[...]
Goranson. A Comparison of the Dixon Ratio Test and Shewhart Control Chart Test Applied to the National Aerometric Data Bank. Transactions of the 30th Annual ASQC Conference, Toronto, Ontario, Canada. June 1976. 5. Grant, E. L. Statistical Quality Control. McGraw Hill Book Co., New York, 1964, p. 122-128. 6. Saroad Users Manual. U.S. Environmental Protection Agency, Research Triangle Park, N.C. Publication No. APTD-0663. July 1971. 7. Hoffman, A. J., T. C. Curran, T. B. McMullen, W. M. Cox, and W. F. Hunt, Jr. EPA's Role in Ambient Air Quality Monitoring. Science. JL90(4211):243-248, October 1975. 8. Title 40 - Protection of Environment. Requirements for Preparation, Adoption, and Submittal of Implementation Plans. Federal Register. J16/158): 15490, August 14, 1971. 9. Larsen, R. I. A Mathematical Model for Relating Air Quality Measurement to Air Quality Standards. U. S. Environmental Protection Agency, Research Triangle Park, N.C. Publication No. AP-89. 1971. 10. Hunt, W. F., Jr. The Precision Associated with the Sampling Frequency of. Lognonnally Distributed Air Pollutant Measurements. J. Air Poll. Control Assoc. J22J9):687, 1972. 11. Shewhart, W. A. Economic Control of Quality of Manufactured Product. D. Van Nostrand Company, Inc., Princeton, N.J., 1931, p. 299. -------An error occurred while trying to OCR this image. -------An error occurred while trying to OCR this image. -------An error occurred while trying to OCR this image. -------An error occurred while trying to OCR this image. -------An error occurred while trying to OCR this image. -------An error occurred while trying to OCR this image. -------An error occurred while trying to OCR this image. -------An error occurred while trying to OCR this image. -------An error occurred while trying to OCR this image. ------- TECHNICAL REPORT DATA (Please read Instructions on the reverse before completing) REPORT NO. EPA-450/2-78-037 j RECIPIENT'S ACCESSIOr>*NO. TITLE ANDSUBTITLE 5 REPORT DATE July. 1978 6. PERFORMING ORGANIZATION CODE AUTHOR(S) Thomas C. Curran 8 PERFORMING ORGANIZATION REPORT NO. PERFORMING ORGANIZATION NAME AND ADDRESS U.S. Environmental Protection Agency Office of Air and Waste Management Office gfTAir Quality,Planning and Standards Research Triangle Park, North Carolina 27/11 10. PROGRAM ELEMENT NO. 11. CONTRACT/GRANT NO. 2. SPONSORING AGENCY NAME AND ADDRESS 13. TYPE OF REPORT AND PERIOD COVERED Final 14 SPONSORING AGENCY CODE 200/04 5.SUPPLEMENTARY NOTES Special mention should be made of the contributions of Jon Clark, William F. Hunt, Jr., Robert B. Faoro and William M. Cox. 6. ABSTRACT This guideline discusses screening procedures to identify possible outliers in ambient air quality data sets. Although the primary emphasis is on computerized techniques the summary briefly discusses which procedures are feasible to implement manually. The screening procedures discussed in this guideline are primarily intended to examine the internal consistency of a particular data set. Appendices are included consisting of articles discussing the application of these tests to air quality data and computer programs to perform the tests. 17. KEY WORDS AND DOCUMENT ANALYSIS DESCRIPTORS b IDENTIFIERS/OPEN ENDED TERMS Data Screening Quality Control Outliers Shewhart Test c. COSATI Field/Group 13. DISTRIBUTION STATEMENT Release Unlimited 19. SECURITY CLASS (This Report/ Unclassified 21. NO. OF 77 20 SECURITY CLASS (This page) Unclassified 22. PRIC EPA Form 2220-1 (9-73) -------An error occurred while trying to OCR this image. ------- |