United States Environmental Protection Agency
Atmospheric Research and Exposure Assessment Laboratory
Research Triangle Park NC 27711

Research and Development
EPA/600/S3-89/031 Sept. 1989

Project Summary

Materials Aerometric Database for Use in Developing Materials Damage Functions

Ruen-Tai Tang, P. Michael Barlow, and Paul Waldruff

Meteorological and air quality data acquired at field exposure sites have been accumulated into the Materials Aerometric Database (MAD). Task Group VII of the National Acid Precipitation Assessment Program (NAPAP) will use the MAD to develop damage functions for materials exposed at the sites; these functions then will be used in preparing NAPAP integrated assessment reports to Congress. The MAD data cover as many as six and a half years at five materials exposure sites in the eastern United States. Conservative techniques based on secondary-site data, regression predictions, and other information have been applied to the MAD to enhance the quality and usability of the database. Both the enhanced version and the original version of the MAD have been given to Task Group VII.

This Project Summary was developed by EPA's Atmospheric Research and Exposure Assessment Laboratory, Research Triangle Park, NC, to announce key findings of the research project that is fully documented in a separate report of the same title (see Project Report ordering information at back).

Introduction

The EPA's Atmospheric Research and Exposure Assessment Laboratory (AREAL) has undertaken the task of maintaining the Materials Aerometric Database (MAD), which consists of air quality and meteorological data from five test sites to be used in developing damage functions for the National Acid Precipitation Assessment Program (NAPAP) Materials Assessment Program.
The research objectives for this project are as follows:

(1) To accumulate and organize an aerometric database (MAD) containing air quality data and meteorological measurements made at five primary field sites.
    • Develop a uniform format for the MAD data.
    • Provide validated tapes of the MAD data to the principal investigators within the Materials and Cultural Effects Task Group (Task Group VII).
    • Acquire quality assurance/quality control (QA/QC) data from the site operators.
    • Monitor the independent systems and performance audits of the sites, conducted by Research Triangle Institute.

(2) To enhance the database and allow its use in continuous-damage models by making reasonable predictions for missing data points (infilling).
    • Acquire secondary-source data for infilling missing primary-source data.
    • Provide data tapes of the enhanced air quality and meteorological data to the principal investigators within Task Group VII.

Technical Approach

Figure 1. Locations of the five materials exposure test sites (marked with solid squares).

Five materials exposure sites were chosen for continuously recording air quality, meteorology, particle loadings and chemistry, and rain chemistry measurements (Figure 1):
• Adirondack Ecological Center, Newcomb, NY
• Bell Communications Research Center, Chester, NJ
• County Services Building, Steubenville, OH
• West End Library, Washington, DC
• Research Triangle Institute, Research Triangle Park, NC

The following variables were measured in order to quantitatively evaluate the deposition of acids on the materials samples:

Meteorological Variables
• Wind speed average (WSA)
• Wind direction average (WDA)
• Wind direction vector (WDV)
• Temperature (TP)
• Dew point (DP)
• Relative humidity (RH)
• Precipitation (PR)
• Solar radiation (SR)

Air Quality Variables
• Sulfur dioxide (SO2)
• Ozone (O3)
• Oxides of nitrogen (NOX)
• Nitric oxide (NO)
• Nitrogen dioxide (NO2)

The data were supplied to the EPA in a variety of site-dependent formats. The format used by the Research Triangle Park site was chosen as the standard data storage format (Figure 2), and software was developed to convert all other sites' data to this format.

Contents:
286  043  03   26   15F  3    18   13   320  33   -12  -91  550  0    0    319  9999
286  043  04   24   13   3    16   15   318  33   -17  -95  553  0    0    318  9999
YR   DAY  HR   SO2  NO2  NO   NOX  O3   WDA  WSA  TP   DP   RH   PR   SR   WDV  WSV

Additional information:
• Column 1 contains the site ID (1 = DC, 2 = NC, 3 = NJ, 4 = NY, 5 = OH).
• Columns 5-7 contain the Julian date.
• Column 21 contains an example information flag (discussed below); all flags are non-numerical characters other than the minus sign.
• WSV stands for wind speed vector, a variable reported by only the New York site.

Figure 2. Format for storage of all MAD data (based on the format for the RTP, NC, site).

During the project, some of the data were lost due to equipment shortages and failures, power outages, etc. A number of secondary sources were found to replace the missing data. These sites were located near the primary sites, had similar micrometeorology, and are listed in Table 1 with the types of variables recorded at each site.
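The record layout in Figure 2 can be illustrated with a short reader. The sketch below is not the conversion software described above; it assumes that fields are separated by whitespace (the exact column positions of the original fixed-width format are not reproduced here), that the first token combines the one-digit site ID with a two-digit year, and that an information flag, when present, is a single non-digit character attached to the end of a value. The names `parse_record`, `FIELDS`, and `SITES` are hypothetical, and no unit scaling or missing-value handling is attempted.

```python
import re

# Field labels in the order shown under the sample records in Figure 2.
FIELDS = ["SO2", "NO2", "NO", "NOX", "O3", "WDA", "WSA",
          "TP", "DP", "RH", "PR", "SR", "WDV", "WSV"]

# Site IDs as listed in the notes to Figure 2.
SITES = {1: "DC", 2: "NC", 3: "NJ", 4: "NY", 5: "OH"}

# Assumption: a value is an optionally signed integer followed by at most
# one flag character that is not a digit.
VALUE = re.compile(r"^(-?\d+)(\D?)$")


def parse_record(line):
    """Parse one whitespace-separated MAD record into a dictionary.

    Each measured variable maps to a (raw_integer, flag) tuple; the flag is
    a blank for values that carry no flag character.
    """
    tokens = line.split()
    if len(tokens) != 3 + len(FIELDS):
        raise ValueError("expected %d fields, got %d"
                         % (3 + len(FIELDS), len(tokens)))
    site_year, day, hour = tokens[:3]
    record = {
        "site": SITES[int(site_year[0])],  # first character is the site ID
        "year": int(site_year[1:]),        # assumed two-digit year
        "day": int(day),                   # Julian day of year
        "hour": int(hour),
    }
    for name, token in zip(FIELDS, tokens[3:]):
        match = VALUE.match(token)
        if match is None:
            raise ValueError("unreadable value %r for %s" % (token, name))
        record[name] = (int(match.group(1)), match.group(2) or " ")
    return record


if __name__ == "__main__":
    sample = "286 043 03 26 15F 3 18 13 320 33 -12 -91 550 0 0 319 9999"
    rec = parse_record(sample)
    print(rec["site"], rec["year"], rec["day"], rec["hour"], rec["NO2"])
```

Keeping the flag next to each raw value mirrors the way the MAD attaches an information flag to every data point, so a downstream user can always tell measured values from modified ones.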
Each site performed ongoing QA on its own systems and data, based on the format outlined in 40 CFR 58, Appendix A. Also, an independent audit team from the Research Triangle Institute (RTI) has conducted annual or semiannual performance and system audits at the sites since 1985.

As the raw data were received, we performed preliminary statistical analysis and quality assurance checks, identified data problems, and contacted the site operators about them. Wherever possible, these problems were corrected and problem data were replaced by the site.

Using secondary-source data (if available) to infill primary-source data is the most reliable method of infilling, as long as the correlation between them is good and any bias can be removed. We used secondary-source substitution whenever possible, before using alternative forms of infilling. For missing meteorological data, we used only secondary-source substitution, because infilling this type of data using predictions from any form of calculations could corrupt the database. For missing air quality data, however, we employed three predictive infilling methods when no acceptable secondary-source data were available:
• linear interpolation using good data on either side of the gap;
• regression using a long-term least-squares prediction; and
• daytime or nighttime averages, each compiled from a month's worth of data.

We performed a preliminary survey of the NC data to evaluate the occurrences of missing data. For a given year and variable, we recorded the gap length for each instance of missing data and then developed histograms of this information. In most cases, the data displayed a peak at one hour and then dropped off quickly after two or three hours, as shown in the example in Table 2.

We decided to take the most conservative approach to infilling the data. For one-hour and two-hour gaps, we used interpolation.
After these were infilled, the remaining gaps of three or more hours were filled using a regression prediction, if the R² for the regression equation was greater than 0.50. We then infilled any gaps still remaining with the appropriate daytime or nighttime monthly average.

Listed below is a summary of the steps we followed in processing each subset of the database to produce the final enhanced database:

Step 1. For both meteorological and air quality data: Replace all missing data, data below detectable limits, or data above reasonable limits with acceptable secondary-source data, if the correlation between primary- and secondary-source data is high (r² > 0.95). Use the following mathematical replacement:

    $x_{prim}(t) = x_{sec}(t) + (\text{the difference in their yearly averages})$

Step 2. For air quality data only: For one-hour or two-hour gaps remaining after Step 1, infill with a smoothed value interpolated from the points before and after the gap. For gaps three or more hours long, use regression to replace the data if R² for the regression equation is greater than 0.50; otherwise, replace missing data with daytime or nighttime monthly averages (where daytime includes hourly data from 7 A.M. to 6 P.M. and nighttime includes data from 7 P.M. to 6 A.M.).

Step 3. Apply the following special corrections as needed:
• Set solar radiation to zero at night, if not already zero.
• For air quality data only, smooth infilled data into measured data to avoid abrupt slope discontinuities. Use a five-time-step smoothing scheme for infilling done with multiple regressions, secondary-source data, and monthly averages; do not use for infilling done with one- and two-hour interpolations.
• Maintain the NOX, NO, and NO2 balance using: Conc(NOX) = Conc(NO) + Conc(NO2)

An information flag accompanies every data point in the MAD. The flag is a blank for original, untouched data. If the data were modified or infilled, this flag is set to a code describing the infilling method.
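The decision order in Step 2 can be sketched in a few lines of Python. This is an illustration only, not the project's software: it assumes an hourly series in which missing values are `None`, takes the regression predictions and the day/night monthly averages as precomputed inputs, and uses hypothetical flag codes ("I", "R", "A"), since the actual MAD flag codes are not listed here. Short gaps at the very start or end of a series, which have no neighbor on one side, are treated like long gaps; that choice is also an assumption. The secondary-source substitution of Step 1 and the smoothing and NOX balance of Step 3 are not covered.

```python
def find_gaps(values):
    """Return (start_index, length) for every run of missing (None) values."""
    gaps = []
    start = None
    for i, value in enumerate(values):
        if value is None:
            if start is None:
                start = i
        elif start is not None:
            gaps.append((start, i - start))
            start = None
    if start is not None:
        gaps.append((start, len(values) - start))
    return gaps


def infill(values, regression_pred, r2, period_avg):
    """Fill gaps in an hourly air quality series following the Step 2 order.

    values          -- measurements, with None marking missing hours
    regression_pred -- precomputed regression prediction for every hour
    r2              -- R-squared of the regression equation
    period_avg      -- precomputed daytime/nighttime monthly average per hour
    Returns (filled_values, flags); a blank flag marks untouched data.
    """
    filled = list(values)
    flags = [" "] * len(values)
    for start, length in find_gaps(values):
        end = start + length  # index of the first good value after the gap
        has_neighbors = start > 0 and end < len(values)
        if length <= 2 and has_neighbors:
            # One- or two-hour gap: linear interpolation between neighbors.
            left, right = values[start - 1], values[end]
            for k in range(length):
                filled[start + k] = left + (right - left) * (k + 1) / (length + 1)
                flags[start + k] = "I"  # hypothetical code: interpolated
        elif r2 > 0.50:
            # Longer gap with a usable regression equation.
            for i in range(start, end):
                filled[i] = regression_pred[i]
                flags[i] = "R"  # hypothetical code: regression
        else:
            # Fall back to the daytime or nighttime monthly average.
            for i in range(start, end):
                filled[i] = period_avg[i]
                flags[i] = "A"  # hypothetical code: monthly average
    return filled, flags


if __name__ == "__main__":
    series = [1.0, None, 3.0, None, None, None, 5.0]
    print(find_gaps(series))
    print(infill(series, [9.0] * 7, 0.8, [2.0] * 7))
```

Returning the flags alongside the filled values keeps each point traceable to the method that produced it, in the spirit of the information flag described above. The output of `find_gaps` is also the raw material for a gap-length histogram of the kind summarized in Table 2.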
Results

Detailed descriptions of the raw data statistics and the site operations for all sites are presented in an April 1988 EPA internal report, Monitoring and Operations at Materials Effects Sites (R. T. Tang, P. M. Barlow, and J. W. Spence). Table 3 presents raw-data summary statistics for one site (RTP, NC).

Table 1. Secondary MAD Data Sources

Primary Site                 Corresponding Secondary Site(s)                                  Variables Measured
Newcomb, NY                  State University of New York site 1 km away from primary site   Meteorology
Chester, NJ                  Bell Core Lab, Chester, NJ                                       Meteorology
Steubenville, OH             Harvard School of Public Health Study site in Steubenville, OH  Meteorology, air quality
                             NOVAA, Mingo Junction, OH                                        Meteorology
Washington, DC               Washington National Airport                                      Meteorology
Research Triangle Park, NC   Raleigh-Durham Airport                                           Meteorology
                             USEPA, Research Triangle Park, NC                                Meteorology, air quality

Table 2. Missing-Data Gap-Length Frequencies for 1984 RTP, NC Air Quality Data

                   Number of Gaps in a Variable's Data
Gap Length (h)      O3     SO2      NO     NOX     NO2
1                   31      19      12      16      13
2                   10      15       5       4       5
3                    2       6       3       3       4
4-6                  4      15       4       4       4
7-12                 5       6       1       1       0
13-24                4       4       1       1       2
>24                  6       8       2       2       2

We processed the raw meteorological and air quality data from each site using the steps discussed above, and then performed statistical analyses on the enhanced MAD; the procedures and results are given in Enhancement of Materials Aerometric Database (R. T. Tang, P. M. Barlow, and J. W. Spence), a July 1988 EPA draft internal report. Table 4 presents summary statistics for the infilled data for the RTP, NC, site.

Discussion

The MAD is now available for use in predicting damage functions. The raw and enhanced MAD data tapes contain the data for all sites over part or all of the 1982-1988 period. The missing air quality data have been infilled using the algorithm discussed above. However, it was not possible to find secondary sources for all of the sites, so there are gaps in the meteorological data.
The use of modeled meteorological data in the development of the damage functions could seriously bias the predicted data values and the statistics that describe them.

Also, a bias has already been found in the data from two sites. For data values below the minimum detectable limit (MDL), the DC site has been reporting the MDL and the NJ site has reported one-half the MDL. This is acceptable for some EPA uses, but we are currently trying to acquire the unmodified data.

Conclusions

We have developed an enhanced database that will provide materials assessment investigators with a comprehensive data set of meteorological and air quality data collected during materials test exposures. Two tapes, one containing the raw data rewritten in a uniform format and the second containing the enhanced database, have been provided to the principal investigators within Task Group VII. The MAD provides essential data for developing damage functions to be used in estimating current materials damage due to acid rain, predicting future damage, and aiding the development of control scenarios. NAPAP will use this information to develop reports to Congress.

Table 3. Summary Statistics for the Raw MAD for the RTP, NC, Site

Year   Statistic        O3     SO2      NO     NOX     NO2     WSA    TEMP   DEWPT      RH      PR      SR
1982*  Mean (ppm)    0.023   0.001   0.009   0.020   0.011    1.25   16.74   11.04   71.22    0.12   13.12
       Std. Dev.     0.019   0.003   0.022   0.024   0.007    1.11    7.96    8.50   17.19    0.97   19.41
       Min.          0.000   0.000   0.000   0.000   0.000    0.00    0.00  -13.50   24.00     0.0     0.0
       Max.          0.091   0.021   0.225   0.250   0.046    6.80   31.80    2.60   99.40    33.0   76.20
       % Missing       3.3     3.2     1.1     2.2     2.3    31.0     0.4     6.5     6.8     0.0     4.4
1983   Mean (ppm)    0.029   0.003   0.008   0.021   0.013    1.63   14.64    6.86   65.07    0.13   14.03
       Std. Dev.     0.023   0.004   0.019   0.023   0.009    1.30   10.07    9.43   20.22    0.85   21.03
       Min.          0.000   0.000   0.000   0.000   0.000    0.00  -16.00  -26.10   16.20     0.0     0.0
       Max.          0.132   0.048   0.295   0.312   0.073   10.00   38.70   23.00   98.60    22.0    78.0
       % Missing       6.5     4.6     3.3     3.3     3.5     0.1     0.9    16.4    17.3     0.0     2.4
1984   Mean (ppm)    0.025   0.004   0.010   0.022   0.012    1.65   15.16    7.79   65.96    0.14   11.38
       Std. Dev.     0.021   0.004   0.024   0.029   0.010    1.20    9.31    9.45   78.85    1.01   17.57
       Min.          0.000   0.000   0.000   0.000   0.000    0.00  -12.70  -17.50   13.40     0.0     0.0
       Max.          0.118   0.040   0.351   0.372   0.065    8.10   35.00   23.10  100.00    32.1    76.0
       % Missing       1.2    42.5     2.8     3.8     3.5     0.1     0.1    10.5    10.5     2.5     0.1
1985   Mean (ppm)    0.025   0.002   0.008   0.022   0.013    1.48   15.29    8.02   61.00    0.12   13.67
       Std. Dev.     0.020   0.002   0.018   0.026   0.010    1.20    9.64   10.58   19.46    0.98   20.27
       Min.          0.000   0.000   0.000   0.000   0.000    0.00  -21.10  -33.50   11.90     0.0     0.0
       Max.          0.119   0.026   0.268   0.307   0.108   10.00   34.80   22.70   98.40    36.1    78.0
       % Missing       3.0    71.2    24.5    16.4    31.2     0.0     0.8    35.3    35.3     1.1     1.6
1986   Mean (ppm)    0.025   0.002   0.010   0.023   0.015    1.44   15.53    5.18   57.05    0.11   13.34
       Std. Dev.     0.022   0.005   0.022   0.025   0.009    1.11    9.77   10.70   20.46    1.15   20.12
       Min.          0.000   0.000   0.000   0.000   0.000    0.00  -12.70  -26.30   10.60     0.0     0.0
       Max.          0.123   0.080   0.410   0.436   0.169    8.10   37.70   22.80   95.30    41.3    76.0
       % Missing      10.3    42.5    42.4    16.4    49.1     0.3     0.3    46.3    46.3     0.0     0.3
1987   Mean (ppm)    0.027   0.003   0.011   0.027   0.014    1.40   14.91    7.98   66.01    0.12   13.27
       Std. Dev.     0.022   0.004   0.028   0.034   0.009    1.23    9.88    9.88   20.06    1.11   19.88
       Min.          0.000   0.000   0.000   0.000   0.000    0.00   -8.90  -16.20   10.70     0.0     0.0
       Max.          0.112   0.054   0.375   0.391   0.059    8.20   37.70   24.40  100.00    64.2    80.0
       % Missing      10.2    13.3    41.6    31.5    41.3     0.8     0.1     1.5     1.2     0.1     1.7

* July through December data only.

Table 4.
Summary Statistics for the Enhanced MAD for the RTP, NC, Site

Year   Statistic        O3     SO2      NO     NOX     NO2     WSA    TEMP  DEW PT      RH      PR      SR
1982*  Mean (ppm)    0.023   0.001   0.009   0.020   0.011    1.28   16.77   12.39   77.57    0.12   12.72
       Std. Dev.     0.019   0.003   0.022   0.024   0.007    0.98    7.97    8.37   18.45    0.97   19.18
       Min.          0.000   0.000   0.000   0.000   0.000    0.00    0.00  -12.20   24.60    0.00    0.00
       Max.          0.091   0.021   0.225   0.250   0.091    6.80   31.80   23.90  100.00    33.0   76.20
       % Missing       0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.6
1983   Mean (ppm)    0.029   0.003   0.008   0.021   0.013    1.63   14.50    8.21   69.12    0.13   13.72
       Std. Dev.     0.023   0.004   0.019   0.023   0.008    1.30   10.12    9.84   20.36    0.85   20.89
       Min.          0.000   0.000   0.000   0.000   0.000    0.00  -16.00  -27.20   20.70    0.00    0.00
       Max.          0.143   0.048   0.295   0.312   0.073   10.00   38.70   24.40  100.00    22.0    78.0
       % Missing       0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
1984   Mean (ppm)    0.025   0.004   0.010   0.022   0.012    1.65   15.17    8.87   68.65    0.14   11.36
       Std. Dev.     0.021   0.003   0.024   0.029   0.010    1.20    9.31    9.87   19.34    1.01   17.56
       Min.          0.000   0.000   0.000   0.000   0.000    0.00  -12.70  -17.20   19.30    0.00    0.00
       Max.          0.118   0.040   0.351   0.372   0.119    8.10   35.00   24.40  100.00   32.10   76.00
       % Missing       0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     2.5     0.0
1985   Mean (ppm)    0.025   0.003   0.007   0.022   0.014    1.48   15.18    7.63   63.84    0.12   13.46
       Std. Dev.     0.020   0.002   0.016   0.024   0.014    1.20    9.69   10.54   20.36    0.98   20.18
       Min.          0.000   0.000   0.000   0.000   0.000    0.00  -21.10  -33.30    13.0    0.00    0.00
       Max.          0.119   0.026   0.268   0.307   0.268   10.00   34.80   22.80  100.00   36.10   78.00
       % Missing       0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     1.1     0.0
1986   Mean (ppm)    0.025   0.002   0.008   0.022   0.015    1.44   15.51    9.31   70.28    0.11   13.31
       Std. Dev.     0.021   0.004   0.017   0.023   0.013    1.11    9.76   10.18   21.89    1.15   20.10
       Min.          0.000   0.000   0.000   0.000   0.000     0.0  -12.70  -22.80   15.70    0.00    0.00
       Max.          0.123   0.080   0.410   0.436   0.278    8.10   37.70   26.10  100.00   41.30   76.00
       % Missing       0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
1987   Mean (ppm)    0.026   0.003   0.010   0.026   0.016    1.40   14.91    8.63   69.63    0.12   13.27
       Std. Dev.     0.021   0.004   0.022   0.029   0.014    1.23    9.88   10.35   21.80    1.11   19.88
       Min.          0.000   0.000   0.000   0.000   0.000    0.00   -8.90  -20.60   10.40    0.00    0.00
       Max.          0.112   0.054   0.375   0.391   0.276    8.20   37.70   25.00  100.00   64.20   80.00
       % Missing       0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.1     1.7

* July through December data only.

Ruen-Tai Tang, P. Michael Barlow, and Paul Waldruff are with Computer Sciences Corporation, Research Triangle Park, NC 27709.
F. H. Haynie is the EPA Project Officer (see below).
The complete report, entitled "Materials Aerometric Database for Use in Developing Materials Damage Functions," (Order No. PB 89-181 259/AS; Cost: $13.95, subject to change) will be available only from:
    National Technical Information Service
    5285 Port Royal Road
    Springfield, VA 22161
    Telephone: 703-487-4650
The EPA Project Officer can be contacted at:
    Atmospheric Research and Exposure Assessment Laboratory
    U.S. Environmental Protection Agency
    Research Triangle Park, NC 27711