United States
                   Environmental Protection
                   Agency

Atmospheric Research and
Exposure Assessment Laboratory
Research Triangle Park NC 27711
                   Research and Development
EPA/600/S3-89/031 Sept. 1989
EPA           Project Summary
                    Materials Aerometric
                    Database for  Use  in  Developing
                    Materials Damage  Functions

                    Ruen-Tai Tang, P. Michael Barlow, and Paul Waldruff
                    Meteorological and air quality data
                   acquired at field exposure sites have
                   been accumulated into the Materials
                   Aerometric Database (MAD). Task
                   Group  VII of the National Acid
                   Precipitation  Assessment Program
                   (NAPAP) will use the MAD to develop
                   damage functions for  materials ex-
                   posed at the sites; these functions
                   then will be used in preparing NAPAP
                   integrated  assessment reports to
                   Congress. The MAD data cover as
                   many as six and a half years at five
                   materials exposure  sites  in  the
                   eastern United States. Conservative
                   techniques based on  secondary-site
                   data, regression predictions,  and
                   other information have been applied
                   to the MAD to enhance the quality
                   and usability of the database. Both the
                   enhanced version of the MAD and the
                   original MAD have been given to Task
                   Group VII.
                    This Project Summary was devel-
                   oped by EPA's Atmospheric Research
                   and Exposure  Assessment Laboratory,
                   Research Triangle  Park, NC, to  an-
                   nounce key findings of the research
                   project that is fully documented in a
                   separate report of the same title (see
                   Project Report ordering information at
                   back).

                   Introduction
                    The EPA's Atmospheric Research and
                   Exposure  Assessment Laboratory
                   (AREAL) has  undertaken the task of
                   maintaining  the Materials  Aerometric
                   Database (MAD), which consists of air
                   quality and  meteorological data from five
                   test sites to be used in developing dam-
                   age  functions for the  National  Acid
Precipitation Assessment Program
(NAPAP) Materials Assessment Program.
The research objectives for this project
are as follows:
  (1) To accumulate and organize an
    aerometric  database  (MAD)  con-
    taining air quality data  and meteoro-
    logical measurements  made at five
    primary field sites.
    • Develop a uniform format for the
     MAD data.
    • Provide validated tapes of the
     MAD data to the principal investi-
     gators within  the Materials  and
     Cultural Effects Task Group (Task
     Group VII).
    • Acquire quality assurance/quality
     control (QA/QC) data from the site
     operators.
    • Monitor the independent systems
     and  performance audits of the
     sites, conducted by Research Tri-
     angle Institute.
  (2) To enhance  the database and  allow
    its use in  continuous-damage
    models by making reasonable pre-
    dictions for missing  data points
    (infilling).
    • Acquire secondary-source data for
     infilling missing primary-source
     data.
    • Provide data tapes of the  en-
     hanced air quality and meteoro-
     logical data to the principal inves-
     tigators within Task Group VII.

Technical Approach
  Five materials exposure sites  were
chosen  for  continuously  recording air
quality,  meteorology, particle  loadings
and  chemistry, and rain  chemistry meas-
urements (Figure  1):

Figure 1. Locations of the five materials exposure test sites (marked with solid squares).

  • Adirondack Ecological Center, New-
    comb, NY
  • Bell Communications  Research Cen-
    ter, Chester, NJ
  • County Services Building,  Steuben-
    ville, OH
  • West End Library, Washington, DC
  • Research Triangle  Institute, Re-
    search Triangle Park,  NC

  The following variables were measured
in  order  to  quantitatively evaluate  the
deposition  of acids on  the materials
samples:

Meteorological Variables
    Wind speed average     (WSA)
    Wind direction average   (WDA)
    Wind direction vector    (WDV)
    Temperature            (TP)
    Dew point              (DP)
    Relative humidity        (RH)
    Precipitation            (PR)
    Solar radiation          (SR)
Air Quality Variables
    Sulfur dioxide           (SO2)
    Ozone                    (O3)
    Oxides of nitrogen       (NOX)
    Nitric oxide             (NO)
    Nitrogen dioxide         (NO2)
  The data were supplied to the EPA in a
variety of site-dependent formats. The
format used by the Research Triangle
Park site was chosen as the standard
data storage format (Figure 2), and

Col. no.:          1         2         3         4         5         6         7         8
          12345678901234567890123456789012345678901234567890123456789012345678901234567890
          286 043 03   26    15F    3   18   13  320    33  -12   -91    550     0    0 319  9999
          286 043 04   24    13    3   16   15  318    33  -17   -95   553     0    0 318  9999
Contents: YR  DAY  HR  SO2  NO2  NO  NOX   O3 WDA WSA  TP  DP    RH    PR   SR WDV WSV

          Additional information:

             Column 1 contains the site ID (1 = DC, 2 = NC, 3 = NJ, 4 = NY, 5 = OH).
             Columns 5-7 contain the Julian date.
             Column 21 contains an example information flag (discussed below); all are non-numerical
              characters, except the minus sign.
             WSV stands for wind speed vector, a variable reported by only the New York site.

          Figure 2.    Format for storage of all MAD data (based on the format for the RTP, NC, site).
 software was developed to convert all
 other sites' data to this format.
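
   As an illustration of the record layout in Figure 2, the following Python
 sketch reads one hourly record. It is not the conversion software described
 above. It assumes that fields can be separated on whitespace (the real layout
 is fixed-width, and the exact column positions are not given here), that
 two-digit years fall in the 1900s, and that a flag is a single non-numeric
 character that trails a value. The names parse_record, FIELDS, and SITES are
 hypothetical.

```python
import re

# Variable order taken from the "Contents" row of Figure 2.
FIELDS = ["SO2", "NO2", "NO", "NOX", "O3", "WDA", "WSA",
          "TP", "DP", "RH", "PR", "SR", "WDV", "WSV"]

# Site IDs as listed under Figure 2.
SITES = {"1": "DC", "2": "NC", "3": "NJ", "4": "NY", "5": "OH"}

# Assumption: a value is an optional minus sign plus digits, optionally
# followed by one non-numeric flag character.
TOKEN = re.compile(r"^(-?\d+)(\D?)$")


def parse_record(line):
    """Parse one hourly record into a dict (whitespace-delimited sketch)."""
    tokens = line.split()
    if len(tokens) != 3 + len(FIELDS):
        raise ValueError("unexpected number of fields: %d" % len(tokens))
    site_year, day, hour = tokens[:3]
    record = {
        "site": SITES[site_year[0]],
        "year": 1900 + int(site_year[1:]),  # assumption: 1900s data
        "julian_day": int(day),
        "hour": int(hour),
        "values": {},
        "flags": {},
    }
    for name, token in zip(FIELDS, tokens[3:]):
        match = TOKEN.match(token)
        if match is None:
            raise ValueError("cannot parse %r for %s" % (token, name))
        record["values"][name] = int(match.group(1))
        # A blank flag marks original, untouched data.
        record["flags"][name] = match.group(2) or " "
    return record


sample = ("286 043 03   26    15F    3   18   13  320    33  -12   -91"
          "    550     0    0 319  9999")
rec = parse_record(sample)
print(rec["site"], rec["year"], rec["julian_day"], rec["hour"])
print(rec["values"]["NO2"], repr(rec["flags"]["NO2"]))
```

   A production reader would slice fixed columns instead of splitting on
 whitespace, so that a blank field cannot shift the remaining values.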
   During the project, some of the  data
 have  been  lost  due  to  equipment
 shortages and  failures,  power outages,
 etc. A  number of secondary sources
 were found to replace the missing data.
 These sites  were  located near the pri-
 mary sites with similar micrometeorology
 and are listed in Table 1  with the types of
 variables recorded at each site.
   Each site performed ongoing QA on its
 own systems and data, based on the for-
 mat outlined  in 40 CFR  58, Appendix A.
 Also, an independent audit team from the
 Research Triangle Institute  (RTI)  has
 conducted annual  or semiannual perfor-
 mance  and  system  audits at the sites
 since 1985.
   As the  raw data were  received,  we
 performed preliminary statistical analysis
 and quality assurance checks, identified
 data  problems, and  contacted the  site
 operators  about them. Wherever  possi-
 ble, these problems were  corrected  and
 problem data were replaced by the site.
   Using secondary-source data (if avail-
 able) to infill  primary-source data is the
 most reliable  method of  infilling, as  long
 as the correlation between them is good
 and any bias can be removed. We used
 secondary-source substitution  when-
 ever  possible, before using alternative
 forms of infilling.  For missing meteoro-
 logical data,  we used only  secondary-
 source substitution, because infilling  this
 type of  data using predictions from  any
 form  of calculations  could corrupt  the
database.  For missing air quality data,
however, we  employed  three predictive
infilling  methods  when  no  acceptable
secondary-source data  were  available:
linear interpolation  using good data on
either side of the gap; regression using a
long-term least-squares  prediction;
                 and daytime or nighttime averages each
                 compiled from a month's worth of data.
                   We performed  a preliminary survey of
                 the NC data to evaluate  the occurrences
                 of missing  data. For a  given year and
                 variable, we recorded the gap length for
                 each instance of missing data and then
                 developed histograms of this information.
                 In most cases, the data displayed a peak
                 at one hour and then dropped off quickly
                 after two or three hours,  as shown in the
                 example in Table 2.
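
                   The gap-length survey can be sketched in a few lines of
                 Python. This is an illustration only, not the code used for
                 the survey: it assumes missing hours are represented as None,
                 bins the gap lengths the way Table 2 does, and uses the
                 hypothetical names gap_lengths, bin_label, and gap_histogram.

```python
from collections import Counter

# Bins follow Table 2: 1, 2, 3, 4-6, 7-12, 13-24, and >24 hours.
BINS = [(1, 1, "1"), (2, 2, "2"), (3, 3, "3"), (4, 6, "4-6"),
        (7, 12, "7-12"), (13, 24, "13-24"), (25, None, ">24")]


def gap_lengths(series):
    """Return the length of every run of consecutive None values."""
    lengths, run = [], 0
    for value in series:
        if value is None:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:
        lengths.append(run)
    return lengths


def bin_label(length):
    """Map a gap length in hours to its Table 2 bin label."""
    for low, high, label in BINS:
        if length >= low and (high is None or length <= high):
            return label
    raise ValueError("gap length must be a positive integer")


def gap_histogram(series):
    """Count gaps per bin for one variable's hourly series."""
    counts = Counter(bin_label(n) for n in gap_lengths(series))
    return {label: counts.get(label, 0) for _, _, label in BINS}


# Hypothetical hourly series with gaps of 1, 2, and 5 hours.
hourly = [12, None, 14, 15, None, None, 18,
          None, None, None, None, None, 20]
print(gap_histogram(hourly))
```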
                   We decided to take the most conserv-
                 ative approach to infilling the data. For
                 one-hour  and two-hour  gaps, we  used
                 interpolation. After these were infilled, the
                 remaining gaps of three or more  hours
                 were infilled using a regression prediction,
                 if the R² for the regression equation
                 was greater than 0.50. We then infilled
                 any gaps still  remaining with the  ap-
                 propriate  daytime or nighttime  monthly
                 average. Listed below is a  summary of
                 the steps we followed in processing each
                 subset of the database  to produce the
                 final enhanced database:

                 Step 1.  For both meteorological
                 and air quality data:

                   Replace all missing data, data below
                 detectable limits, or data above reasonable
                 limits with acceptable secondary-source
                 data, if the correlation between
                 primary- and secondary-source data is
                 high (r² > 0.95). Use the following
                 mathematical replacement:

                    x_prim(t) = x_sec(t) + (the difference
                              in their yearly averages)
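
                   A minimal Python sketch of this substitution follows. It is
                 an illustration under stated assumptions, not the project's
                 code: missing or rejected primary values are represented as
                 None, the averages are taken over the hours where both sites
                 reported, the offset is taken as the primary mean minus the
                 secondary mean, and the flag code 'S' is hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def substitute(primary, secondary, threshold=0.95):
    """Fill None entries in primary from secondary plus a mean offset.

    Returns (filled, flags). Nothing is filled unless the two sites
    correlate with r-squared above the threshold.
    """
    unchanged = (list(primary), [" "] * len(primary))
    pairs = [(p, s) for p, s in zip(primary, secondary)
             if p is not None and s is not None]
    if len(pairs) < 2:
        return unchanged
    prim = [p for p, _ in pairs]
    sec = [s for _, s in pairs]
    if r_squared(prim, sec) <= threshold:
        return unchanged
    # Assumption: offset = primary average minus secondary average.
    offset = mean(prim) - mean(sec)
    filled, flags = [], []
    for p, s in zip(primary, secondary):
        if p is None and s is not None:
            filled.append(s + offset)
            flags.append("S")  # hypothetical code for substituted data
        else:
            filled.append(p)
            flags.append(" ")
    return filled, flags


# Hypothetical temperatures: the secondary site reads 1 degree low.
primary = [10.0, 12.0, None, 16.0, 18.0]
secondary = [9.0, 11.0, 13.0, 15.0, 17.0]
filled, flags = substitute(primary, secondary)
print(filled)
print(flags)
```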

                 Step 2. For air quality data  only:
                   For  one-hour  or two-hour  gaps  re-
                 maining  after  Step 1,  infill   with a
                 smoothed  value interpolated  from  the
 points before and after the gap. For gaps
 three or more hours long, use regression
 to replace the data if R² for the regression
 equation is greater than 0.50; otherwise,
 replace missing data with daytime
 or nighttime monthly averages (where
 daytime includes hourly data from 7 A.M.
 to 6 P.M. and nighttime includes data from
 7 P.M. to 6 A.M.).
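
   The Step 2 decision logic can be sketched as follows. This is a hedged
 illustration, not the project's implementation. It assumes missing hours
 are None, hours of the day are labeled 0-23, the regression predictor is
 supplied by the caller as a function of the series index together with its
 R², and a short gap at the very start or end of a series (with no value
 on one side) falls through to the regression or average branch. The flag
 codes 'I', 'R', and 'A' are hypothetical.

```python
def find_gaps(series):
    """Return (start, length) for each run of consecutive None values."""
    gaps, start = [], None
    for i, value in enumerate(series):
        if value is None and start is None:
            start = i
        elif value is not None and start is not None:
            gaps.append((start, i - start))
            start = None
    if start is not None:
        gaps.append((start, len(series) - start))
    return gaps


def is_daytime(hour):
    # Assumption: hours are labeled 0-23; 7 A.M. to 6 P.M. is daytime.
    return 7 <= hour <= 18


def infill(series, hours, day_mean, night_mean, regression=None, r2=0.0):
    """Infill None values in an air quality series; returns (filled, flags)."""
    filled = list(series)
    flags = [" "] * len(series)
    for start, length in find_gaps(series):
        end = start + length  # index of the first measured value after the gap
        bracketed = start > 0 and end < len(series)
        for k in range(length):
            i = start + k
            if length <= 2 and bracketed:
                # Linear interpolation between the neighboring good values.
                before, after = series[start - 1], series[end]
                filled[i] = before + (after - before) * (k + 1) / (length + 1)
                flags[i] = "I"
            elif regression is not None and r2 > 0.50:
                filled[i] = regression(i)
                flags[i] = "R"
            else:
                filled[i] = day_mean if is_daytime(hours[i]) else night_mean
                flags[i] = "A"
    return filled, flags


# Hypothetical series with a one-hour gap and a three-hour gap.
series = [10.0, None, 14.0, None, None, None, 20.0]
hours = [5, 6, 7, 8, 9, 10, 11]
filled, flags = infill(series, hours, day_mean=15.0, night_mean=8.0)
print(filled)
print(flags)
```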

 Step  3. Apply the following
 special corrections as  needed:

  • Set solar radiation  to zero at night, if
     not already zero.
  • For air quality data only, smooth infilled
     data into measured data to
     avoid abrupt slope discontinuities.
     Use a five-time-step smoothing
     scheme for infilling done with multiple
     regressions, secondary-source
     data, and monthly averages; do not
     use for infilling done with one- and
     two-hour interpolations.
  • Maintain the NOX, NO, and NO2
     balance using:

    Conc(NOx) = Conc(NO) + Conc(NO2)
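
  The Step 3 corrections can be illustrated with the Python sketch below. It
is a sketch under assumptions, not the project's code. The summary does not
define the five-time-step smoothing scheme, so a centered five-point running
mean is swapped in here as a stand-in. The caller supplies which hours count
as night. The balance function only derives a single missing member of the
NO, NO2, NOX triple and leaves complete triples alone. Flag codes are
hypothetical (' ' measured, 'I' interpolated, anything else infilled).

```python
def zero_night_solar(sr, is_night):
    """Set solar radiation to zero wherever is_night is True."""
    return [0 if night else value for value, night in zip(sr, is_night)]


def balance_nox(no, no2, nox):
    """Derive the one missing member using NOX = NO + NO2."""
    missing = [v is None for v in (no, no2, nox)].count(True)
    if missing != 1:
        return no, no2, nox  # nothing to derive, or too little information
    if nox is None:
        return no, no2, no + no2
    if no2 is None:
        return no, nox - no, nox
    return nox - no2, no2, nox


def smooth_infilled(values, flags, skip=(" ", "I")):
    """Replace infilled points by a centered five-point running mean.

    Measured points and one- or two-hour interpolations are left alone.
    The window is truncated at the ends of the series.
    """
    smoothed = list(values)
    n = len(values)
    for i, flag in enumerate(flags):
        if flag in skip:
            continue
        window = values[max(0, i - 2):min(n, i + 3)]
        smoothed[i] = sum(window) / len(window)
    return smoothed


print(zero_night_solar([0, 3, 120, 5], [True, True, False, True]))
print(balance_nox(3, 15, None))
print(smooth_infilled([10.0, 10.0, 20.0, 20.0, 10.0, 10.0],
                      [" ", " ", "A", "A", " ", " "]))
```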

  An information flag accompanies every
data point in the MAD. The flag is a blank
for  original,  untouched data. If the data
were modified or infilled, this flag is set to
a code describing the infilling method.

Results

  Detailed descriptions  of the raw data
statistics  and the site  operations for all
sites are presented  in an April 1988 EPA
internal  report,   Monitoring  and
Operations  at  Materials Effects Sites
(R. T. Tang, P. M. Barlow, and J. W.
Spence).  Table 3  presents raw-data
summary  statistics  for  one site (RTP,
NC).

Table 1.     Secondary MAD Data Sources

Primary Site                  Corresponding Secondary Site(s)               Variables Measured
Newcomb, NY                   State University of New York site 1 km        Meteorology
                              away from primary site
Chester, NJ                   Bell Core Lab, Chester, NJ                    Meteorology
Steubenville, OH              Harvard School of Public Health Study site    Meteorology, air quality
                              in Steubenville, OH
                              NOVAA, Mingo Junction, OH                     Meteorology
Washington, DC                Washington National Airport                   Meteorology
Research Triangle Park, NC    Raleigh-Durham Airport                        Meteorology
                              USEPA, Research Triangle Park, NC             Meteorology, air quality

Table 2.    Missing-Data Gap-Length Frequencies for 1984 RTP, NC Air Quality Data

 Gap Length     Number of Gaps in a Variable's Data
    (h)          O3    SO2     NO    NOX    NO2
    1            31     19     12     16     13
    2            10     15      5      4      5
    3             2      6      3      3      4
    4-6           4     15      4      4      4
    7-12          5      6      1      1      0
    13-24         4      4      1      1      2
    >24           6      8      2      2      2

  We processed the raw meteorological
and air quality data from each site  using
the steps discussed above, and then per-
formed  statistical analyses on  the en-
hanced MAD; the procedures and results
are given in  Enhancement of Materials
Aerometric Database (R.  T. Tang,  P. M.
Barlow, and J. W.  Spence), a July 1988
EPA draft  internal  report. Table 4 pre-
sents summary statistics for the infilled
data  for the RTP, NC, site.

Discussion
  The MAD is now available for use  in
predicting damage  functions.  The raw
and  enhanced MAD  data  tapes contain
the data for all sites over part or all of the
1982-1988  period.  The missing air qual-
ity data have been infilled using the algo-
rithm discussed  above. However,  it was
not possible  to find  secondary sources
for all of the  sites, so there are gaps  in
the  meteorological data. The use  of
modeled meteorological  data  in the
development  of  the  damage functions
could seriously bias the predicted  data
values  and the  statistics  that describe
them.
  Also, a bias has already been found in
the data from two sites. For data values
below the minimum detectable limit
(MDL), the DC site has been reporting
the MDL and the NJ site has reported
one-half the MDL. This is acceptable for
some EPA uses, but we are currently
trying to acquire the unmodified data.
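
  The effect of the two reporting conventions can be seen in a small Python
sketch. All numbers here are made up for illustration, including the MDL
value; none come from the MAD. The function name report is hypothetical.

```python
MDL = 0.002  # hypothetical minimum detectable limit, ppm

# Hypothetical true hourly concentrations (ppm); two fall below the MDL.
true_values = [0.0005, 0.001, 0.004, 0.006]


def report(values, mdl, substitute):
    """Replace every value below the MDL with the given substitute."""
    return [v if v >= mdl else substitute for v in values]


def mean(values):
    return sum(values) / len(values)


as_mdl = report(true_values, MDL, MDL)       # convention used at the DC site
as_half = report(true_values, MDL, MDL / 2)  # convention used at the NJ site

print(mean(true_values), mean(as_mdl), mean(as_half))
```

  In this made-up sample both conventions raise the mean above the true
value, and they raise it by different amounts, so statistics from sites that
follow different conventions are not directly comparable.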

Conclusions
  We have developed an enhanced
database that will provide materials
assessment investigators with a comprehensive
data set of meteorological
and air quality data collected during
materials test exposures. Two tapes, one
containing the raw data rewritten in a
uniform format and the second containing
the enhanced database, have been provided
to the principal investigators within
Task Group VII. The MAD provides
essential data for developing damage
functions to be used in estimating current
materials damage due to acid rain,
predicting future damage, and aiding in
the development of control scenarios.
NAPAP will use this information to develop
reports to Congress.

Table 3.     Summary Statistics for the Raw MAD for the RTP, NC, Site

                                                        Variable
Year   Statistic        O3     SO2      NO     NOX     NO2     WSA    TEMP   DEWPT      RH      PR      SR
1982*  Mean (ppm)    0.023   0.001   0.009   0.020   0.011    1.25   16.74   11.04   71.22    0.12   13.12
       Std. Dev.     0.019   0.003   0.022   0.024   0.007    1.11    7.96    8.50   17.19    0.97   19.41
       Min.          0.000   0.000   0.000   0.000   0.000    0.00    0.00  -13.50   24.00     0.0     0.0
       Max.          0.091   0.021   0.225   0.250   0.046    6.80   31.80    2.60   99.40    33.0   76.20
       % Missing       3.3     3.2     1.1     2.2     2.3    31.0     0.4     6.5     6.8     0.0     4.4
1983   Mean (ppm)    0.029   0.003   0.008   0.021   0.013    1.63   14.64    6.86   65.07    0.13   14.03
       Std. Dev.     0.023   0.004   0.019   0.023   0.009    1.30   10.07    9.43   20.22    0.85   21.03
       Min.          0.000   0.000   0.000   0.000   0.000    0.00  -16.00  -26.10   16.20     0.0     0.0
       Max.          0.132   0.048   0.295   0.312   0.073   10.00   38.70   23.00   98.60    22.0    78.0
       % Missing       6.5     4.6     3.3     3.3     3.5     0.1     0.9    16.4    17.3     0.0     2.4
1984   Mean (ppm)    0.025   0.004   0.010   0.022   0.012    1.65   15.16    7.79   65.96    0.14   11.38
       Std. Dev.     0.021   0.004   0.024   0.029   0.010    1.20    9.31    9.45   78.85    1.01   17.57
       Min.          0.000   0.000   0.000   0.000   0.000    0.00  -12.70  -17.50   13.40     0.0     0.0
       Max.          0.118   0.040   0.351   0.372   0.065    8.10   35.00   23.10  100.00    32.1    76.0
       % Missing       1.2    42.5     2.8     3.8     3.5     0.1     0.1    10.5    10.5     2.5     0.1
1985   Mean (ppm)    0.025   0.002   0.008   0.022   0.013    1.48   15.29    8.02   61.00    0.12   13.67
       Std. Dev.     0.020   0.002   0.018   0.026   0.010    1.20    9.64   10.58   19.46    0.98   20.27
       Min.          0.000   0.000   0.000   0.000   0.000    0.00  -21.10  -33.50   11.90     0.0     0.0
       Max.          0.119   0.026   0.268   0.307   0.108   10.00   34.80   22.70   98.40    36.1    78.0
       % Missing       3.0    71.2    24.5    16.4    31.2     0.0     0.8    35.3    35.3     1.1     1.6
1986   Mean (ppm)    0.025   0.002   0.010   0.023   0.015    1.44   15.53    5.18   57.05    0.11   13.34
       Std. Dev.     0.022   0.005   0.022   0.025   0.009    1.11    9.77   10.70   20.46    1.15   20.12
       Min.          0.000   0.000   0.000   0.000   0.000    0.00  -12.70  -26.30   10.60     0.0     0.0
       Max.          0.123   0.080   0.410   0.436   0.169    8.10   37.70   22.80   95.30    41.3    76.0
       % Missing      10.3    42.5    42.4    16.4    49.1     0.3     0.3    46.3    46.3     0.0     0.3
1987   Mean (ppm)    0.027   0.003   0.011   0.027   0.014    1.40   14.91    7.98   66.01    0.12   13.27
       Std. Dev.     0.022   0.004   0.028   0.034   0.009    1.23    9.88    9.88   20.06    1.11   19.88
       Min.          0.000   0.000   0.000   0.000   0.000    0.00   -8.90  -16.20   10.70     0.0     0.0
       Max.          0.112   0.054   0.375   0.391   0.059    8.20   37.70   24.40  100.00    64.2    80.0
       % Missing      10.2    13.3    41.6    31.5    41.3     0.8     0.1     1.5     1.2     0.1     1.7

* July through December data only.

Table 4.      Summary Statistics for the Enhanced MAD for the RTP, NC, Site

                                                        Variable
Year   Statistic        O3     SO2      NO     NOX     NO2     WSA    TEMP   DEWPT      RH      PR      SR
1982*  Mean (ppm)    0.023   0.001   0.009   0.020   0.011    1.28   16.77   12.39   77.57    0.12   12.72
       Std. Dev.     0.019   0.003   0.022   0.024   0.007    0.98    7.97    8.37   18.45    0.97   19.18
       Min.          0.000   0.000   0.000   0.000   0.000    0.00    0.00  -12.20   24.60    0.00    0.00
       Max.          0.091   0.021   0.225   0.250   0.091    6.80   31.80   23.90  100.00    33.0   76.20
       % Missing       0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.6
1983   Mean (ppm)    0.029   0.003   0.008   0.021   0.013    1.63   14.50    8.21   69.12    0.13   13.72
       Std. Dev.     0.023   0.004   0.019   0.023   0.008    1.30   10.12    9.84   20.36    0.85   20.89
       Min.          0.000   0.000   0.000   0.000   0.000    0.00  -16.00  -27.20   20.70    0.00    0.00
       Max.          0.143   0.048   0.295   0.312   0.073   10.00   38.70   24.40  100.00    22.0    78.0
       % Missing       0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
1984   Mean (ppm)    0.025   0.004   0.010   0.022   0.012    1.65   15.17    8.87   68.65    0.14   11.36
       Std. Dev.     0.021   0.003   0.024   0.029   0.010    1.20    9.31    9.87   19.34    1.01   17.56
       Min.          0.000   0.000   0.000   0.000   0.000    0.00  -12.70  -17.20   19.30    0.00    0.00
       Max.          0.118   0.040   0.351   0.372   0.119    8.10   35.00   24.40  100.00   32.10   76.00
       % Missing       0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     2.5     0.0
1985   Mean (ppm)    0.025   0.003   0.007   0.022   0.014    1.48   15.18    7.63   63.84    0.12   13.46
       Std. Dev.     0.020   0.002   0.016   0.024   0.014    1.20    9.69   10.54   20.36    0.98   20.18
       Min.          0.000   0.000   0.000   0.000   0.000    0.00  -21.10  -33.30    13.0    0.00    0.00
       Max.          0.119   0.026   0.268   0.307   0.268   10.00   34.80   22.80  100.00   36.10   78.00
       % Missing       0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     1.1     0.0
1986   Mean (ppm)    0.025   0.002   0.008   0.022   0.015    1.44   15.51    9.31   70.28    0.11   13.31
       Std. Dev.     0.021   0.004   0.017   0.023   0.013    1.11    9.76   10.18   21.89    1.15   20.10
       Min.          0.000   0.000   0.000   0.000   0.000     0.0  -12.70  -22.80   15.70    0.00    0.00
       Max.          0.123   0.080   0.410   0.436   0.278    8.10   37.70   26.10  100.00   41.30   76.00
       % Missing       0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
1987   Mean (ppm)    0.026   0.003   0.010   0.026   0.016    1.40   14.91    8.63   69.63    0.12   13.27
       Std. Dev.     0.021   0.004   0.022   0.029   0.014    1.23    9.88   10.35   21.80    1.11   19.88
       Min.          0.000   0.000   0.000   0.000   0.000    0.00   -8.90  -20.60   10.40    0.00    0.00
       Max.          0.112   0.054   0.375   0.391   0.276    8.20   37.70   25.00  100.00   64.20   80.00
       % Missing       0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.1     1.7

* July through December data only.

   Ruen-Tai Tang, P. Michael Barlow, and Paul Waldruff are with Computer
   Sciences Corporation, Research Triangle Park, NC 27709.
   F. H. Haynie is the EPA Project Officer (see below).
   The complete report,  entitled  "Materials Aerometric Database for Use in
        Developing Materials Damage Functions," (Order No. PB 89-181 259/AS;
        Cost: $13.95, subject to change) will be available only from:
            National Technical Information Service
            5285 Port Royal Road
            Springfield, VA 22161
            Telephone: 703-487-4650
   The EPA Project  Officer can be contacted at:
            Atmospheric Research and Exposure Assessment Laboratory
            U.S. Environmental Protection Agency
            Research Triangle Park, NC 27711