GUIDELINE SERIES
              OAQPS NO.   1.2-015
         GUIDELINES FOR THE EVALUATION
             OF AIR QUALITY DATA
U.S. ENVIRONMENTAL PROTECTION AGENCY
   Office of Air Quality Planning and Standards

      Research Triangle Park, North Carolina

-------
                                       45OR74O05
              GUIDELINE SERIES

              OAQPS NO. 1.2-015
       GUIDELINES FOR THE EVALUATION OF
              AIR QUALITY DATA
   U. S. ENVIRONMENTAL  PROTECTION AGENCY
OFFICE OF AIR QUALITY PLANNING AND STANDARDS
   MONITORING AND DATA ANALYSIS DIVISION
RESEARCH TRIANGLE PARK, NORTH CAROLINA  27711

-------
                    TABLE OF CONTENTS
                                                    PAGE


PREFACE                                               i

1.  INTRODUCTION                                      1

2.  BASIC CONVENTIONS FOR HANDLING AIR QUALITY DATA   2

      2.1.  Significant Figures                       3
      2.2.  Minimum Detectable Limit                  3

3.  CHARACTERISTIC PATTERNS OF AIR QUALITY DATA       5

      3.1.  Seasonal Patterns                         7
      3.2.  Diurnal Patterns                          7
      3.3.  Frequency Distribution                   10

4.  SUMMARIZING AIR QUALITY DATA                     10

      4.1.  Indicating Typical Values                13
      4.2.  Indicating Maximum Values                15
      4.3.  Indicators of Spread                     17

5.  MAKING INFERENCES FROM AIR QUALITY DATA          17

      5.1.  Inferences About a Particular Site       19
      5.2.  Inferences About a Region                22

6.  SOME STATISTICAL TESTS                           24

      6.1.  Student's T-test                         26
      6.2.  Non-Parametric Quantile Test             28

7.  BASIC MEANS OF OBTAINING AIR QUALITY DATA        29

-------
                LIST OF TABLES AND FIGURES
                                                   PAGE

TABLE 1  Suggested Reporting Accuracy                 4
         For Raw Data
TABLE 2  Minimum Detectable Limits for Selected
         Measurement Techniques                      6

TABLE 3  Number of Hours Above Oxidant Standard     11
         By Month and Time of Day  (1971 Data)

TABLE 4  Maximum and Second High Values (Phila.)    16
         for Various Sampling Schemes.

TABLE 5  Geometric Means, Medians, and 90th         18
         Percentile Values  For Table 4

TABLE 6  Summary Criteria for Continuous            21
         Measurements

TABLE 7  Probability of Selecting Two or More       23
         Days When Site is Above Standard

TABLE 8  NADB Output for Common Questions on        31
         Air Quality
FIGURE 1  Graphs of Monthly Averages for Various     8
          Pollutants at a Particular Site

FIGURE 2  Graphs of Seasonal Patterns  for Various    9
          Pollutants at a Particular Site

FIGURE 3  Frequency Distribution - TSP  (Phila.)     12

-------
                           PREFACE

     The Monitoring  and  Data Analysis Division of the Office
of Air Quality Planning  and Standards has prepared this
guideline  entitled "Guidelines  for the Evaluation of Air
Quality Data"  for use by the Regional Offices of the Environ-
mental Protection Agency.  The  purpose of the report is to
provide guidance  information on current air quality data
evaluation techniques.   Adherence to the guidance presented
in the report  will,  hopefully,  ensure mutually compatible
ambient air quality  data evaluation by all States and Regions.
Further, any risks involved in  policy decisions concerning
National Ambient  Air Quality Standards should be minimized.
This report will  serve on an interim basis until more
specific and detailed guidance  on this subject is developed.

-------
1.   INTRODUCTION
         The purpose of this guideline document is to present
    the basic elements of air quality data analysis that are
    essential in preparing reports describing the air quality
    status of a given region.  With this aim in mind, emphasis has
    been placed upon describing both the conventions and the
    methodology to be employed, with minimum discussion of the
    associated statistical theory.  Much of the material that
    is presented has been treated before but, for the sake of
    completeness, is reiterated in this document with appropriate
    references indicated.
         Since the phrase "air quality data" covers a variety
    of possible data sets, it is convenient to indicate the
    exact nature of this phrase as used in this paper.  For present
    purposes, the term "air quality data" refers to a set of
    observations for a particular pollutant having the following
    properties:
              1.  All measurements were made at the same site.
              2.  Uniform methodology was employed.
              3.  All measurements have the same averaging time.
         It should be noted that the statistical treatments described
    here for such a data set constitute a minimum effort.  There
    are a variety of more sophisticated techniques available that
    could be used to extract more information from the data.  In
    general, the degree of effort devoted to data analysis should

-------
    be consistent with the value associated with the data.  This
    can be viewed in financial terms as cost of data analysis
    versus cost of data collection, or cost of data analysis versus
    potential cost of control strategies, etc.  In most cases,
    the extent of the data analysis phase is determined by a
    subjective judgment of what is appropriate.  It should be noted
    that no matter how extensive the data analysis effort is, the
    end result can be no better than the original data.  This
    point is particularly important because throughout the following
    discussions no analysis is made concerning the errors inherent
    in the measurement method.  Therefore, it is essential that
    the air quality data analyst be aware of the shortcomings in
    the data and that conclusions that are "statistically
    significant" be carefully evaluated to determine if they are
    "really significant."
2.   BASIC CONVENTIONS FOR HANDLING AIR QUALITY DATA
         Before discussing the analysis of air quality data, it
    is essential that certain basic conventions be presented for
    handling the raw data.  These conventions are introduced to
    prevent the air quality summaries from appearing to be more
    accurate than the data warrants.  These conventions have been
    discussed previously (Nehls and Akland, 1973) and are repeated
    here since they are the procedures presently employed by EPA
    in maintaining the National Aerometric Data Bank.

-------
           The two topics treated in this section both relate to
      the relative precision of the raw data with respect to the
      methodology employed in obtaining the measurement.  The first
      topic concerns the number of significant figures that should
      be reported while the second deals with values that are below
      the minimum detectable limit.
2.1.  Significant Figures
           The number of significant figures that are meaningful
      for a particular air quality measurement is limited by the
      methodology employed.  To use more significant figures than
      is warranted by the sensitivity of the analytical procedure
      adds no real information and can often be misleading.
      Table 1 presents the suggested reporting accuracy for raw data
      for various pollutants.  While the conventions apply to the
      raw data, it is also useful to specify the accuracy of geometric
      and annual means.  For simplicity, the general convention is
      that all means be reported to one more significant digit than
      the raw data.
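The reporting convention can be illustrated with a minimal sketch. It assumes that "one more significant digit" is applied as one additional decimal place beyond the Table 1 entry for the raw data; the function name, the dictionary, and the sample values are hypothetical and are not part of the guideline.

```python
# Decimal places for raw data in ug/m3, copied from Table 1 (subset only).
RAW_DECIMALS = {"Sulfur Dioxide": 0, "Sulfates": 1}


def report_mean(values, pollutant):
    """Arithmetic mean rounded to one more decimal place than the raw data.

    Assumption: "one more significant digit" is read as one extra decimal
    place relative to the raw-data reporting accuracy.
    """
    decimals = RAW_DECIMALS[pollutant] + 1
    mean = sum(values) / len(values)
    return round(mean, decimals)


# Hypothetical raw SO2 readings, reported to 0 decimal places (ug/m3).
so2_raw = [12, 15, 9, 21, 17, 15]
so2_mean = report_mean(so2_raw, "Sulfur Dioxide")
print(so2_mean)
```

Here the raw values carry no decimal places, so the mean is carried to one decimal place and no further.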
2.2.  Minimum Detectable Limit
           Some reported pollutant measurements are below the limit
      of detection for the analytical procedure.  In such cases,
      the reported number should be viewed as representing a range
      from zero to the minimum detectable.  However, in order to
      use such data in computing annual summary statistics such as

-------
 TABLE 1 - SUGGESTED REPORTING ACCURACY FOR RAW DATA

                                     Number of Decimal Places
 Pollutant                             ug/m3        ppm
 Suspended Particulate Matter            0
 Benzene Soluble Organic Matter          1
 Sulfates                                1
 Nitrates                                1
 Ammonium                                1
 Sulfur Dioxide                          0           2
 Nitrogen Dioxide                        0           2
 Nitric Oxide                            0           2
 Carbon Monoxide                         1           0
 Total Oxidants                          0           2
 Total Hydrocarbons                      1           1
 Ozone                                   0           3
 Methane                                 1           1

-------
    geometric means, it is convenient to have a convention
    indicating what value should be substituted for a measurement
    below the minimum detectable.  As a general rule, each value
    below the minimum detectable is replaced by a value
    approximately equal to one-half the minimum detectable.  Table 2
    indicates selected minimum detectable limits used by the
    National Aerometric Data Bank (NADB) for various analytical
    methods.  A complete listing may be obtained from the National
    Air Data Branch, EPA, Research Triangle Park, N. C. 27711.
    The mid-point substitution was selected after examining the
    statistical distribution of the data (Nehls and Akland, 1973).
    It should be noted that in comparing data over several years,
    a standard minimum detectable should be used unless it has
    changed by an order of magnitude.
         In preparing summary statistics, if more than 25% of the
    observations are less than the minimum detectable, no statistics
    are computed from the data.
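The two rules above (mid-point substitution and the 25% cutoff) can be sketched as follows. This is an illustration only: the function name, the strict "less than" comparison, and the sample readings are assumptions, and the 5.0 ug/m3 limit is the sulfur dioxide entry from Table 2.

```python
def substitute_below_mdl(values, mdl, max_fraction_below=0.25):
    """Replace values below the minimum detectable limit (MDL) with MDL/2.

    Returns None when more than `max_fraction_below` of the observations
    are below the MDL, in which case no statistics should be computed.
    """
    if not values:
        return None
    n_below = sum(1 for v in values if v < mdl)
    if n_below / len(values) > max_fraction_below:
        return None
    return [mdl / 2.0 if v < mdl else v for v in values]


# Hypothetical SO2 readings in ug/m3; one of eight is below the 5.0 limit.
so2 = [3.0, 12.0, 8.0, 20.0, 15.0, 9.0, 30.0, 11.0]
adjusted = substitute_below_mdl(so2, mdl=5.0)
print(adjusted)

# Half of these readings are below the limit, so no statistics are computed.
too_sparse = substitute_below_mdl([1.0, 2.0, 8.0, 9.0], mdl=5.0)
print(too_sparse)
```

Returning None rather than an empty list keeps the "no statistics" case distinct from a data set that simply has no observations below the limit.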
3.  CHARACTERISTIC PATTERNS OF AIR QUALITY DATA
         Before summarizing any data, some thought should be given
    to the characteristics of the raw data.  This is particularly
    true of air quality data for which strong seasonal and diurnal
    patterns may affect the interpretation of the data.  For
    example, the maximum hourly oxidant value for a year based on
    4,000 observations could have completely different meanings,
    depending upon whether the observations were made primarily
    during the winter or the summer.  This section presents

-------
                                       TABLE 2

            MINIMUM DETECTABLE LIMITS FOR SELECTED MEASUREMENT TECHNIQUES

                        Collection                                            Minimum
Pollutant               Method         Analysis Method            Units      Detectable
Suspended Particulate   Hi-Vol         Gravimetric                ug/m3        1.0
Nitrate                 Hi-Vol         Reduction-Diazo Coupling   ug/m3        0.05
Sulfate                 Hi-Vol         Colorimetric               ug/m3        0.5
Carbon Monoxide         Instrumental   Nondispersive Infra-Red    mg/m3        0.575
Sulfur Dioxide          Gas Bubbler    West-Gaeke Sulfamic Acid   ug/m3        5.0
Total Oxidants          Instrumental   Colorimetric Neutral KI    ug/m3       19.6

-------
      examples of some of these patterns.  The analysis of these
      patterns can frequently be an end in itself since they
      provide insight into the behavior of the pollutant.  An awareness
      of these patterns also provides a means for screening the data
      for anomalous values.  It should be noted that while the
      following discussion is general in nature, the characteristic
      pattern at a given site is a function of local factors such
      as emissions and meteorology, and as a consequence the
      characteristic pattern may be specific to that site or locality.
3.1.  Seasonal Patterns
           Figure 1 displays graphs of monthly averages for various
      pollutants at a particular site.  Superimposed on these graphs
      is a smooth curve selected to emphasize the long term trend in
      the data.  Figure 2 displays smoothed curves illustrating the
      seasonal patterns in the data.  The intensity of the seasonal
      pattern for a particular pollutant may vary from site to site
      within an area depending upon factors such as proximity to point
      sources.  A knowledge of the seasonality of a pollutant can
      provide useful information for interpreting the data since it
      suggests the season in which maximum concentrations would be
      expected.
3.2.  Diurnal Patterns
           In addition to seasonal patterns, some pollutants also have
      pronounced diurnal patterns.  These patterns may be due to
      factors such as solar radiation, traffic density, etc. which
      influence pollution levels.

-------
FIGURE 1 - GRAPHS OF MONTHLY AVERAGES FOR VARIOUS POLLUTANTS AT A PARTICULAR SITE
           (Legible panel titles: Nitric Oxide, instrumental colorimetric;
           Suspended Particulate, Hi-Vol gravimetric; Total Oxidants.
           Concentrations in ug/cu meter at 25 C.)

-------
FIGURE 2 - GRAPHS OF SEASONAL PATTERNS FOR VARIOUS POLLUTANTS AT A PARTICULAR SITE
           (Legible panel titles: Sulfur Dioxide, instrumental conductimetric;
           Nitrogen Dioxide, instrumental colorimetric; Total Oxidants,
           instrumental colorimetric neutral KI; Oxides of Nitrogen,
           instrumental colorimetric; Total Hydrocarbons, instrumental
           flame ionization; Nitric Oxide.  Concentrations in ug/cu meter
           at 25 C.)

-------
           Table 3 summarizes the 1971 oxidant data for the Downtown
      Los Angeles site operated by the Los Angeles Air Pollution Control
      District.  The number of times that the national oxidant
      standard was exceeded is presented by month and hour of the day.
      The marginal totals indicate both the diurnal pattern and
      seasonal pattern.
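A tabulation of this kind can be sketched as follows. The function name, the record layout (timestamp, value), the sample readings, and the 0.08 ppm threshold are all illustrative assumptions; the threshold is passed as a parameter so that whichever standard applies can be supplied.

```python
from collections import Counter
from datetime import datetime


def exceedance_counts(hourly, standard):
    """Count the hours above `standard`, by month and by hour of day."""
    by_month, by_hour = Counter(), Counter()
    for timestamp, value in hourly:
        if value > standard:
            by_month[timestamp.month] += 1
            by_hour[timestamp.hour] += 1
    return by_month, by_hour


# Hypothetical hourly oxidant readings in ppm.
readings = [
    (datetime(1971, 7, 14, 11), 0.12),
    (datetime(1971, 7, 14, 12), 0.15),
    (datetime(1971, 7, 15, 12), 0.10),
    (datetime(1971, 1, 20, 12), 0.03),
    (datetime(1971, 9, 2, 13), 0.09),
]
by_month, by_hour = exceedance_counts(readings, standard=0.08)
print(dict(by_month))  # marginal totals by month (seasonal pattern)
print(dict(by_hour))   # marginal totals by hour (diurnal pattern)
```

The two counters correspond to the "total by month" and "total by hour" margins of a table such as Table 3.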

3.3.  Frequency Distributions
           One characteristic pattern of air quality data that is
      particularly important becomes apparent after examining some
      frequency distributions.  Many quantities are assumed to have
      a symmetric distribution about the average, such as the normal
      distribution.  Figure 3 shows the frequency distribution for
      total suspended particulate data from Philadelphia.  It is
      apparent that this distribution is not symmetric.  However,
      Figure 4 shows the frequency distribution for the logs of
      this same data.  The distribution is more symmetric and may
      be approximated by a normal curve.  Data having this property
      is said to be log-normally distributed and this is a common
      assumption regarding air quality data (Larsen, 1971).
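The effect of the log transformation on symmetry can be checked numerically. The sketch below uses synthetic data drawn from a log-normal distribution (not the Philadelphia measurements) and a simple moment-based skewness coefficient; the parameters, the seed, and the function name are assumptions chosen only for illustration.

```python
import math
import random


def skewness(values):
    """Moment coefficient of skewness; near zero for a symmetric sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


random.seed(3)
# Synthetic concentrations: log-normal with a median of 100.
tsp = [random.lognormvariate(math.log(100.0), 0.5) for _ in range(2000)]

raw_skew = skewness(tsp)
log_skew = skewness([math.log(v) for v in tsp])
print(f"skewness of raw values: {raw_skew:.2f}")
print(f"skewness of logs:       {log_skew:.2f}")
```

A strongly positive skewness for the raw values that falls toward zero after taking logs is consistent with the log-normal assumption, although it does not by itself establish it.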

  4.  SUMMARIZING AIR QUALITY DATA
           In preparing a summary of air quality data, one of the
      most important steps is to determine the purpose of the
      summary.  The usual use of these summaries is to indicate
      typical levels and peak levels.  This section discusses
      some of the basic statistics that can be used for this purpose.

-------
TABLE 3    NUMBER OF HOURS ABOVE OXIDANT STANDARD
           BY MONTH AND TIME OF DAY   (1971 DATA)
               DOWNTOWN LOS ANGELES

           M-9                                      4-11   TOTAL BY
MONTH       AM    10    11     N     1     2     3    PM     MONTH
JAN                            1     2     2     3              8
FEB                      1     4     4     4     3             16
MAR          1     1     1     3     3     2     1             12
APR                4     6     8     8     7     7     4       44
MAY                      3     4     4     3     1     1       16
JUN          3     9     9    12    12    11     6     3       65
JUL          2    13    19    18    15    11     4     1       83
AUG          2     8    17    16    16     7     3     1       70
SEPT         3     6    10    10    10     6     1             46
OCT                2     7     5     9     6     2             31
NOV                            1     1                          2
DEC                                                             0
TOTAL BY
HOUR        11    43    73    82    84    59    31    10      393

NOTE:  Hourly totals within the combined columns are 1 and 10 (two
       morning hours) and 7 and 3 (two evening hours).

-------
       FIGURE 3  - FREQUENCY DISTRIBUTION - TSP  (PHILADELPHIA-1969)

       FIGURE 4  - FREQUENCY DISTRIBUTION - LOG OF TSP DATA  (PHILADELPHIA-1969)
-------
      The first two subsections discuss the treatment of typical
      and peak values.  The third discusses the range of the data.
4.1.   Indicating Typical Values
           This section discusses the arithmetic mean, the median,
      and the geometric mean as indicators of typical values.  The
      arithmetic mean and the median are frequently used in air
      pollution studies because of certain properties of the log-normal
      distribution.  In choosing the appropriate statistic, the purpose
      of the summary must be considered.  While all three may indicate
      typical values, if the purpose of the summary is to compare
      the data to the National Ambient Air Quality Standards, then
      the standard suggests the appropriate statistic.  A commonly
      used statistic to indicate typical values is the mode.  The
      mode is the value that occurs most frequently.  The use of the
      mode is not discussed here since it is frequently of little
      value in summarizing air quality data.  For example, the mode
      for oxidant could be near the minimum detectable due to low
      values throughout the night.
           Arithmetic Mean
                Given a set of n observations, say $X_1, X_2, \ldots, X_n$,
      the arithmetic mean is simply
                $$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i .$$
                When the term "average" is used the arithmetic mean
      is usually what is meant.

-------
           Median
                The median is the middle value of the data.  That
      is, if the data is ranked in order of magnitude so that
      $X_1 \le X_2 \le \ldots \le X_n$, then the median is $X_{(n+1)/2}$
      if n is odd, and $(X_{n/2} + X_{n/2+1})/2$ if n is even.
                The median is a convenient statistic that is not
      influenced as much as the arithmetic mean by changes in the
      extremely high or low values of the distribution.



           Geometric Mean
                Given a set of n observations, say $X_1, X_2, \ldots, X_n$,
      the geometric mean is
                $$g = (X_1 \cdot X_2 \cdots X_n)^{1/n} .$$
                Since this probably is the least intuitive of the
      statistics presented, it is worthwhile to discuss it in
      more detail.
                If a distribution is symmetric, such as the normal
      distribution, then the expected values of the arithmetic mean
      and median are identical.  However, for a log-normally
      distributed variable, it is the expected value of the geometric
      mean that approximates the expected value of the median.
      Therefore, since some air pollutants have a distribution that
      is approximately log-normal, the geometric mean became used as
      a convenient method of summarizing the data and, for total
      suspended particulate, the annual standards are expressed as
      geometric means.

-------
                As an alternate computational formula, it should
      be noted that
                $$\log g = \frac{1}{n} \sum_{i=1}^{n} \log X_i \quad \text{or} \quad g = \exp\left[ \frac{1}{n} \sum_{i=1}^{n} \ln X_i \right] .$$
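The three indicators of typical values can be compared on a small data set. The sketch below is illustrative only: the values are hypothetical, the function names are not from the guideline, and the geometric mean is computed both from the product definition and from the log form to show that the two agree.

```python
import math
import statistics


def geometric_mean_product(values):
    """Geometric mean as the n-th root of the product of the values."""
    product = 1.0
    for v in values:
        product *= v
    return product ** (1.0 / len(values))


def geometric_mean_logs(values):
    """Geometric mean as exp of the average natural log of the values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical 24-hour particulate values with one high reading.
tsp = [40.0, 55.0, 62.0, 70.0, 85.0, 110.0, 240.0]

arithmetic = statistics.mean(tsp)
median = statistics.median(tsp)
geometric = geometric_mean_logs(tsp)
print(arithmetic, median, geometric)
```

For long records the log form is generally the safer computation, since the running product can overflow or underflow; both forms require strictly positive values.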
4.2.  Indicating Maximum Values
           As in the previous section, the purpose of the summary
      is a critical factor in determining the appropriate statistic.
      Maximum values may be indicated by listing the maximum and/or
      the second highest value.  The second highest value is important
      since compliance with the short-term air quality standards is
      determined by this value.  However, there are other statistics
      that are useful for indicating maximum values.  The principal
      difficulty with using the second highest value is that it does
      not allow for differences in sample sizes.  For example, if
      two monitoring devices are side by side and one monitors every
      day of the year while the other monitors only every sixth day,
      it would be expected that the second high value for the every
      day device would be higher than that for the every sixth day
      device even though both monitored the same air.  Table 4
      illustrates how the second high value may vary depending upon
      different sampling frequencies, based upon total suspended
      particulate data from a Philadelphia site that sampled daily.
           To allow for this dependence upon sample size, various
      percentiles are sometimes used to indicate maximum values.
      For example, the 99th percentile might be used for hourly
      data while the 90th might be appropriate for daily measurements.

-------
TABLE 4
MAXIMUM AND SECOND HIGH VALUES (PHILADELPHIA-1969)
           FOR VARIOUS SAMPLING SCHEMES

Sampling Schedule      Observations     Maximum     Second Highest
Everyday                   365            325            244
Every Sixth Day             61            219            215
       "                    61            195            171
       "                    61            244            238
       "                    61            215            211
       "                    61            325            234
       "                    60            239            205
[Schedule label illegible; 24 or 25 observations per sample]
                                          205            176
                                          325            207
                                          239            191
                                          219            196
                                          234            165
                                          201            198
                                          215            211
                                          195            183
                                          183            173
                                          195            169
                                          160            154
                                          244            139
                                          215            201
                                          179            171
                                          238        [illegible]

-------
      By using a percentile value, allowance is made for varying
      sampling frequencies from site to site and year to year.
      Table 5 indicates the 90th percentile for the sampling
      schedules used in Table 4.
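
      The effect of the sampling schedule on the observed maximum can
      be explored with a short sketch.  The daily values below are
      synthetic log-normal numbers generated for illustration only;
      they are not the Philadelphia record, and the helper names
      every_kth_day and percentile_90 are assumptions of this sketch.

```python
import random
import statistics


def every_kth_day(daily_values, k):
    """Return the k possible systematic subsamples, one per starting day."""
    return [daily_values[start::k] for start in range(k)]


def percentile_90(values):
    # statistics.quantiles with n=10 returns the nine deciles;
    # the last one is the 90th percentile.
    return statistics.quantiles(values, n=10)[-1]


random.seed(0)  # synthetic data only, chosen for reproducibility
daily = [random.lognormvariate(4.6, 0.45) for _ in range(365)]

for k in (6, 15):
    subsamples = every_kth_day(daily, k)
    maxima = [max(s) for s in subsamples]
    p90s = [percentile_90(s) for s in subsamples]
    print(
        f"every {k}th day: maximum ranges {min(maxima):.0f}-{max(maxima):.0f}, "
        f"90th percentile ranges {min(p90s):.0f}-{max(p90s):.0f}"
    )
```

      Each systematic subsample can only equal or fall below the
      everyday maximum, and exactly one starting day captures it.
      Printing the two ranges side by side lets the stability of the
      maximum and of the 90th percentile be compared across schedules.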
4.3.  Indicators of Spread
           In addition to an indication of typical and peak values,
      it is also desirable to have a measure of how variable the
      data is.  Did it fluctuate widely or were all values fairly
      uniform?  The customary statistics for this purpose are either
      the arithmetic standard deviation or the geometric standard
      deviation.  Ranges or percentiles could also be used depending
      upon the desired use of the summary but they are not discussed.
      The basic formulas for the arithmetic and the geometric
      standard deviations are given below.
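           Let $x_1, x_2, \ldots, x_n$ be a set of n observations.
      Then the arithmetic standard deviation is

      $$ \left[ \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 \right]^{1/2}, \quad \text{where } \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, $$

      and the geometric standard deviation is

      $$ \exp\left\{ \left[ \frac{1}{n} \sum_{i=1}^{n} (\ln x_i - \ln g)^2 \right]^{1/2} \right\}, $$

      where g is the geometric mean.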
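
      As a minimal sketch, the two formulas can be evaluated as
      follows.  The function names and the three sample values are
      illustrative assumptions, and the divisor n follows the formulas
      as given above.

```python
import math


def arithmetic_sd(values):
    """Arithmetic standard deviation with divisor n."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


def geometric_mean(values):
    """Geometric mean g: the antilog of the mean natural logarithm."""
    return math.exp(sum(math.log(x) for x in values) / len(values))


def geometric_sd(values):
    """Geometric standard deviation: antilog of the SD of the logarithms."""
    n = len(values)
    log_g = math.log(geometric_mean(values))
    mean_square = sum((math.log(x) - log_g) ** 2 for x in values) / n
    return math.exp(math.sqrt(mean_square))


sample = [50.0, 100.0, 200.0]  # hypothetical concentrations
print(arithmetic_sd(sample), geometric_mean(sample), geometric_sd(sample))
```

      The geometric standard deviation is a dimensionless ratio, so it
      is not expressed in concentration units as the arithmetic
      standard deviation is.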
  5.   MAKING INFERENCES FROM AIR QUALITY DATA
           Once the air quality data has been summarized, it is in a
      convenient form to be examined so that conclusions can be made
      regarding air quality.  At this point the data is either

-------
TABLE 5      GEOMETRIC MEANS, MEDIANS, AND 90TH PERCENTILE VALUES
                     FOR SAMPLING DATA OF TABLE 4

Sampling Schedule      Observations   Geometric Mean    Median    90th Percentile
Everyday                   365            102.5            97          171
Every Sixth Day             61             99.3           105          162
        "                   61             95.2            93          155
        "                   61            113.6           113          188
        "                   61            107.2           101          177
        "                   61            106.4           105          171
        "                   60         [illegible]         94          153
Every Fifteenth Day         25            100.2           111          175
        "                   25            114.6           121          178
        "                   25            125.0           130      [illegible]
        "                   25            104.9            95      [illegible]
        "                   25            100.3           105      [illegible]
        "                   24             99.8            90          190
        "                   24            104.4            98          177
        "                   24            102.4            99      [illegible]
        "                   24             92.1            25      [illegible]
        "                   24            100.8       [illegible]      162
        "                   24             92.0       [illegible]      140
        "                   24            104.6            97          186
        "                   24            107.2           109          173
        "                   24             94.1            54          162
        "                   24         [illegible]         98          165

-------
      extremely useful or extremely dangerous depending upon the
      quality of the summary.  This section discusses these inferences
      to illustrate the potential dangers that can result from
      inadequate summaries.  For convenience, the discussion is divided
      into two parts.  The first deals with inferences about a
      particular site while  the second deals with  inferences about
      a region.
5.1.  Inferences About a Particular Site
           This section discusses inferences that can be made about
      a given site from one year's data for a particular pollutant.
      Since any conclusions based upon the data can be no better
      than the data itself, the most important part of the summary
      is to decide if the data gives adequate annual coverage.  This
      relates directly to the previous discussion of characteristic
      patterns.  If an annual average is to be computed from the
      data, then it is essential that all portions of the year be
      represented equally.  An examination of the seasonality that
      exists for certain pollutants shows why this is essential.
      As a convenient rule, it may be assumed that if each calendar
      quarter contains at least 20% of the total observations then
      the sample is adequately balanced.  If this  is not the case,
      then a more appropriate way to determine the annual average is to
      use a weighted mean calculated as follows:
           (1)  determine the average for each quarter and
           (2)  compute the  average of these four  quarterly averages.
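
      The balance rule and the weighted mean can be sketched as
      follows.  The function names, the (month, value) input layout
      and the example numbers are assumptions made for illustration.

```python
from collections import defaultdict


def quarter_of(month):
    """Map a calendar month (1-12) to its calendar quarter (1-4)."""
    return (month - 1) // 3 + 1


def annual_mean(observations, min_share=0.20):
    """Annual mean from (month, value) pairs for one site and year.

    If every quarter holds at least min_share of the observations the
    simple mean is used; otherwise the four quarterly averages are
    averaged, as in steps (1) and (2) above.
    """
    by_quarter = defaultdict(list)
    for month, value in observations:
        by_quarter[quarter_of(month)].append(value)

    total = len(observations)
    counts = [len(by_quarter[q]) for q in (1, 2, 3, 4)]
    if min(counts) == 0:
        raise ValueError("a quarter has no observations")

    if all(c >= min_share * total for c in counts):
        return sum(value for _, value in observations) / total

    quarterly = [sum(by_quarter[q]) / len(by_quarter[q]) for q in (1, 2, 3, 4)]
    return sum(quarterly) / 4


# Hypothetical year with most samples taken in the first quarter.
unbalanced = [(1, 200.0)] * 7 + [(4, 100.0), (7, 100.0), (10, 100.0)]
print(annual_mean(unbalanced))
```

      In this hypothetical case the simple mean would be 170, whereas
      the average of the quarterly averages is 125, which shows how
      far an unbalanced sample can shift the result.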
           While the previous constraint applies to the seasonal balance
      of the sample, it is also essential to have a restriction on
      the minimum number of observations that are required to compute
      an annual mean.  Such constraints are employed in the National
      Aerometric Data Bank system (Nehls and Akland, 1973) and to
      maintain uniformity, they are repeated here.  For continuous

-------
measurements at least 75% of the total possible observations
should be present before summary statistics are calculated.
The exact requirements are given in Table 6.  For intermittent
sampling data, there must be at least five observations
per quarter and if one month has no observations the remaining
two months in that quarter must both have at least two
observations.  While these conventions are used in general, it is
of course possible to modify them for certain applications.
For the most part the general intention of these restrictions
is to ensure that the observations are sufficiently representative
of the entire year to calculate an annual mean.  For
peak value statistics such as the number of times a certain
value is exceeded the constraint is not essential in showing
violations.  For example, two hourly oxidant values in excess
of the standard are sufficient to show non-compliance even if
there were no other observations that year.  Nevertheless, to
assess the extent of the problem, data sufficient to meet the
requirements for determining a mean would be advantageous,
although for seasonal pollutants it could suffice to summarize
only particular quarters or months.
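
The intermittent-sampling criteria can be written as a short check.
This is a sketch only: the function name and the list-of-twelve-counts
input are assumptions, and a quarter with two empty months is treated
as failing, which is one reading of the rule rather than a stated
requirement.

```python
def intermittent_year_is_valid(monthly_counts):
    """Check the intermittent-sampling criteria for an annual mean.

    monthly_counts holds the number of observations for January
    through December.
    """
    if len(monthly_counts) != 12:
        raise ValueError("expected twelve monthly counts")

    for q in range(4):
        months = monthly_counts[3 * q : 3 * q + 3]
        # At least five observations in every quarter.
        if sum(months) < 5:
            return False
        # If a month is empty, the second-smallest count must be at
        # least two, so both remaining months have two or more.
        if 0 in months and sorted(months)[1] < 2:
            return False
    return True


# Hypothetical counts: February empty, January and March each have >= 2.
print(intermittent_year_is_valid([2, 0, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2]))
```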

     In discussing the inferences that can be made from a given
sample, it is worth observing that while the annual mean can be
either under- or over-estimated the maximum and the second
high values can only be underestimated assuming no instrumental
error.  For example, if a simple hypergeometric probability

-------
  TABLE 6
SUMMARY CRITERIA FOR CONTINUOUS MEASUREMENTS

   Time Interval             Minimum Number of Observations
3-hour running average       3 consecutive hourly observations
8-hour running average       6 hourly observations
24-hour                      18 hourly observations
Monthly                      21 daily averages
Quarterly                    3 consecutive monthly averages
Yearly                       9 monthly averages with at least
                               two monthly averages per quarter

-------
     model is assumed, Table 7 shows the probability of detecting
     violations of the short-term standard as a function of
     sampling frequency.  From this table it may be seen that if
     samples are taken every sixth day the probability of detecting
     two excursions above the standard is less than 50% unless the
     site actually exceeds the standard 10 days per year.  This
     illustrates the weaknesses associated with determining maximum
     values on the basis of intermittent sampling.
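
     The hypergeometric calculation behind Table 7 can be sketched
     as follows, assuming the sampled days are a simple random
     selection from the 365 days of the year.  The function name and
     argument names are illustrative.

```python
from math import comb


def prob_two_or_more(excursion_days, sampled_days, days_in_year=365):
    """Probability that at least two excursion days are among the sampled days.

    Hypergeometric model: sampled_days are drawn without replacement
    from days_in_year, of which excursion_days exceed the standard.
    """
    total = comb(days_in_year, sampled_days)
    clean_days = days_in_year - excursion_days
    # No excursion day is sampled.
    p_none = comb(clean_days, sampled_days) / total
    # Exactly one excursion day is sampled.
    p_one = excursion_days * comb(clean_days, sampled_days - 1) / total
    return 1.0 - p_none - p_one


for k in (2, 4, 10):
    row = [prob_two_or_more(k, n) for n in (61, 122, 183)]
    print(k, " ".join(f"{p:.2f}" for p in row))
```

     With two excursion days and 61 sampled days, the probability
     reduces to (61/365)(60/364), or about .03.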

          Two possible solutions to this problem are (1) to
     intensify sampling schedules or (2) to use mathematical
     equations to extrapolate from the data to predict maximum
     values.  At the present time, there is no convenient predictive
     formula that can be applied on a general basis to give sufficiently
     accurate maximum values.  As a guide, the predictive formula
     developed by Larsen (1971) based on the log-normal distribution
     may be used to determine the possible magnitude of the
     underestimation due to intermittent sampling.  However, this
     empirical model assumes log-normality and independence and
     should not be used to determine compliance with the standards
     since its predictive accuracy has not been fully documented.

5.2.  Inferences About a Region
          Once conclusions have been made for each site in a region
     the next step is to draw conclusions concerning the region.  If
     any one of the sites exceeds the NAAQS then the region is not
     in compliance.  It should also be pointed out that the worst

-------
TABLE 7         PROBABILITY OF SELECTING TWO OR MORE DAYS WHEN SITE
                              IS ABOVE STANDARD

                       Sampling Frequency - Days per year
Actual no.
of excursions        61/365        122/365        183/365
      2                .03           .11            .25
      4                .13           .41            .69
      6                .26           .65            .89
      8                .40           .81            .96
     10                .52           .90            .99
     12                .62           .95            .99
     14                .71           .97            .99
     16                .78           .98            .99
     18                .83           .99            .99
     20                .87           .99            .99
     22                .91           .99            .99
     24                .93           .99            .99
     26                .95           .99            .99

-------
site in the region may still underestimate the magnitude of
the air pollution problem.  The only way in which a site may
overestimate the air pollution problem is if it is not
representative of the air to which receptors are exposed.  There
are guideline documents discussing this subject.  While it is
relatively easy to compare the air quality in a region with
the NAAQS it is not so easy to compare one region with another.
For example, one region may choose to concentrate most of its
monitoring efforts at sites having high pollution potential while
another  region may have numerous sites monitoring background
levels.  Therefore, extreme caution should be used if such
comparisons must be made and particular attention should  be
given to the placement of monitoring sites.
6.  SOME STATISTICAL TESTS
    When making inferences from air quality data it  is frequently
necessary  to have some objective means to make  judgments.  This
is the point at which statistical inference becomes  useful.  The
previous treatment has used statistics merely for descriptive
purposes in order to conveniently summarize the data. The
purpose  of statistical inference is to objectively substantiate
generalizations made from the data.  For this reason, two basic
statistical tests are discussed.
    While these statistical tests are relatively straightforward,
a certain degree of caution is required regarding the underlying
assumptions that determine their validity.  Since one of these
assumptions is particularly important in applications dealing

-------
with air quality data, it will be discussed in detail.
    In statistics, it is commonly assumed that the data to be
analyzed is a random sample of all the data and that the
measurements are independent.  While this may be approximately
true for intermittent data collected on a sampling scheme
comparable to that employed by the NASN, it may not be true for
all samples.  For the most part, these statistical assumptions
are merely a mathematical formulation of common sense ideas.
Certainly, if data were only collected on Sundays, it would
not be expected that the average of these numbers is truly
representative of the annual average.  Sampling schedules that
only monitor certain days of the week result in non-random
samples and their degree of usefulness is inherently limited.
The problem of independence is somewhat more subtle.  For
example, successive hourly oxidant measurements are not in-
dependent.  While the concept of statistical independence
may be clearly defined in mathematical terms, it is possible
to present an intuitive notion of what it entails.  Two
numbers may be thought of as being independent if knowing
one of the numbers does not help in guessing what the other
number is.  The classical example of this is rolling dice in
which knowing what number occurred on one die does not improve
a guess of what number occurred on the other.  With this in
mind,  it is apparent that knowing one hourly oxidant value
helps in guessing what the next hourly value will be.  It

-------
      should be noted that it is not necessary that it make the
      guess a certainty, only that it improve the chances of guessing
      correctly.
          With the ideas of randomness and independence in mind, it
      is possible to present two statistical techniques that are
      generally useful in practice.  The first test is commonly known
      as Student's t-test and is useful for examining the mean.  The
      second test is the non-parametric quantile test and despite
      the rather elegant name it is a convenient test for the median
      and other percentiles and is very easy to use.
6.1.  Student's t-test
          The Student's t-test is a commonly used statistical test for
      data that may be assumed to be normally distributed.  As mentioned
      earlier, air pollution is frequently assumed to be log-normally
      distributed so that the t-test may be employed to examine the
      logarithms of the data.  The application of this technique to
      determine confidence intervals for annual geometric means has
      been discussed by Hunt (1972) and is briefly treated here.  The
      present discussion examines construction of a confidence
      interval for an annual mean.  Extensions to comparisons of two
      means may also be performed but are not treated here since the
      approach is almost identical and can be found in basic statistical
      texts.  More general tests concerning trends at a site are
      examined in the guideline document for trend analysis.
          The basic application is that a set of data from an intermittent
      monitoring device has been obtained.  This data has been used
      to determine the annual geometric mean.  Since this data
      represents only a fraction of the total number of days in the year,
      the question arises as to how close the mean of the data is to
      the actual annual mean.  The statistical technique employed for
      this purpose is the confidence interval so that a probability
      statement may be made regarding the range of the true annual
      mean.
    To calculate a 95% confidence interval for the geometric
mean, the interval is first constructed for the arithmetic mean
of the logarithms.  To do this, the following calculations
are necessary:

    Let $\bar{x}_{\log} = \frac{1}{n} \sum_{i=1}^{n} \log x_i$, where n is the sample size.

    Let $s_{\log} = \left[ \frac{1}{n-1} \sum_{i=1}^{n} (\log x_i - \bar{x}_{\log})^2 \right]^{1/2}$.

    Let $d = t_{1-\alpha/2} \, \frac{s_{\log}}{\sqrt{n}} \left( 1 - \frac{n}{N} \right)^{1/2}$, where $t_{1-\alpha/2}$ is obtained
    from a table for Student's t-test, $1-\alpha$ is the confidence
    level and N is the possible number of samples,
    e.g. 365 for daily samples.

Then the lower and upper confidence limits for the geometric
mean, denoted as L and U respectively, are given by

    $L = \mathrm{EXP}(\bar{x}_{\log} - d)$

and $U = \mathrm{EXP}(\bar{x}_{\log} + d)$.

    It should be noted that in the above formulas the finite
correction factor, $(1 - n/N)$, was used since it is assumed that the

-------
      population size is finite rather than infinite.  For example,
      in considering daily measurements it is assumed that the
      population size is 365, i.e. the total number of days in the
      year.
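
      The confidence interval calculation can be sketched as follows.
      Several points are assumptions of this sketch rather than
      statements from the text: the function name, the use of natural
      logarithms (implied by EXP), the finite correction factor entering
      under the square root, and the t value being supplied by the
      user from a table (approximately 2.00 for 60 degrees of freedom
      at the 95% level).

```python
import math


def geometric_mean_interval(values, t_value, population_size=365):
    """Return (lower, geometric mean, upper) for an annual geometric mean.

    t_value is the tabled Student's t value for the chosen confidence
    level; it is passed in rather than computed.  The finite correction
    factor (1 - n/N) is applied under the square root, which is an
    assumption of this sketch.
    """
    n = len(values)
    logs = [math.log(x) for x in values]
    mean_log = sum(logs) / n
    s_log = math.sqrt(sum((y - mean_log) ** 2 for y in logs) / (n - 1))
    correction = math.sqrt(1.0 - n / population_size)
    d = t_value * (s_log / math.sqrt(n)) * correction
    return math.exp(mean_log - d), math.exp(mean_log), math.exp(mean_log + d)


# Hypothetical every-sixth-day sample of 61 values.
sample = [50.0, 100.0, 200.0] * 20 + [100.0]
lower, g, upper = geometric_mean_interval(sample, t_value=2.00)
print(f"{lower:.1f} < {g:.1f} < {upper:.1f}")
```

      Because the limits are antilogs of a symmetric interval, they are
      not symmetric about the geometric mean in concentration units.
      When every day of the year is sampled the correction factor is
      zero and the interval collapses to the geometric mean itself.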



6.2.  Non-Parametric Quantile Test



          In discussing the t-test it was pointed out that it is
      necessary to assume that the logarithms of the air pollution
      measurements are normally distributed.  In some cases, it may
      not be desirable to make this assumption.  For example, an
      examination of the data may show that such an assumption is
      unwarranted.  For such cases, non-parametric statistical tests
      are appropriate since they do not require any assumptions
      regarding the form of the underlying distribution.  Moreover,
      non-parametric tests are frequently quite easy to employ since
      many of the calculations are relatively simple.  A variety
      of non-parametric tests are available.  A more detailed
      description of the test discussed here is available in the text
      by Conover (1971).
          Quantile is a more general term than percentile.  For the
      present discussion, the test is used to examine the median but
      it may also be applied to any percentiles or quantiles.  It is
      also assumed that there are more than 20 observations since
      this is generally true for air quality problems and reduces
      the need for tables.

-------
        Let $x_1, x_2, \ldots, x_n$ be a sample of air quality measurements
    and suppose it is desired to test if the annual median is
    greater than a specific value, say s.
        Then it is only necessary to calculate the following two
    values:

        T = the number of sample values less than or equal to s

    and $t = pn + w_{\alpha} \sqrt{np(1-p)}$, where n is the sample size, p is
            the quantile value and $w_{\alpha}$ is the $\alpha$ quantile of a standard
            normal random variable.

        For tests at the .05 level $w_{\alpha}$ is -1.645.
        For tests concerning the median the quantile value is .5
        so the above formula becomes

        $t = .5n - 1.645 \sqrt{.25n}$

          $= .5n - .822 \sqrt{n}$.

        If T is less than t then the conclusion may be stated that
    "the median is greater than s" and that the result was obtained
    by employing the quantile test "at the 5% level."
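
        The quantile test can be sketched in a few lines.  The function
    name and the sample values are illustrative assumptions; the normal
    quantile is taken from Python's standard library instead of a table.

```python
import math
from statistics import NormalDist


def quantile_exceeds(values, s, p=0.5, alpha=0.05):
    """Large-sample quantile test that the p quantile is greater than s.

    Returns (decision, T, t), where decision is True when T < t.
    """
    n = len(values)
    T = sum(1 for x in values if x <= s)
    w_alpha = NormalDist().inv_cdf(alpha)  # about -1.645 for alpha = 0.05
    t = p * n + w_alpha * math.sqrt(n * p * (1.0 - p))
    return T < t, T, t


# Hypothetical sample of 100 measurements with values 1 through 100.
values = list(range(1, 101))
print(quantile_exceeds(values, s=30))  # T = 30, t is about 41.8
print(quantile_exceeds(values, s=45))  # T = 45 is not below t
```

        With n = 100 the critical value t is about 41.8, so the median
    can be declared greater than 30 at the 5% level but not greater
    than 45.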


7.  Basic Means of Obtaining Air Quality Data

        One station continuously monitoring oxidant can produce
    8,760 hourly observations in a year.  Therefore, considerable
    caution should be exercised when requesting air quality data since
    there is a considerable risk of being inundated with unnecessary
    numbers.  Usually when questions arise concerning air quality, the
    answer may be given in terms of summary statistics and it is not
    necessary to review the raw data.  Certain basic sources include
    the various

-------



periodic reports from State and local agencies as well as
EPA's reports on the NASN and CAMP monitoring efforts.
Overview reports with extensive appendices, such as The National
Air Monitoring Program: Air Quality and Emissions Trends Annual
Report, are also available.
     The National Aerometric Data Bank (NADB) provides many
summary files that may be accessed by time sharing terminals.
In addition, the NADB provides printouts containing general
information that may be easily looked up with no need to
access the computer.  Table 8 lists frequent questions and a
readily available source.

-------
TABLE 8
NADB OUTPUT FOR COMMON QUESTIONS ON AIR QUALITY

Question                                        Source

What data is available nationwide for a         Inventory by pollutant
     particular pollutant?
What data is available for a particular         Inventory by site
     geographical region?
What was maximum value at a site (annual)?      Any inventory
What was mean value at a site (annual)?         Any inventory -
                                                   if valid year
How many observations (annual)?                 Any inventory
Status of a site with respect to NAAQS?         Time Sharing Option (TSO)
Frequency Distribution                          Time Sharing Option (TSO)
Quarterly or monthly data                       Time Sharing Option (TSO)
Raw data                                        Time Sharing Option (TSO)
Description of the site such as UTM             Site File
     coordinates, county, operating agency, etc.

-------
                           REFERENCES

1.  Conover, W. J., "Practical Nonparametric Statistics,"
    John Wiley and Sons, Inc., New York, 1971.

2.  Hunt, W. F., Jr., "The Precision Associated with the
    Sampling Frequency of Log-Normally Distributed Air
    Pollutant Measurements," Journal of the Air Pollution
    Control Association, Volume 22, No. 9, September 1972.

3.  Larsen, R. I., "A Mathematical Model for Relating Air
    Quality Measurements to Air Quality Standards," AP-89,
    1971.

4.  Nehls, G. J. and G. G. Akland, "Procedures for Handling
    Aerometric Data," Journal of the Air Pollution Control
    Association, Volume 23, No. 3, March 1973.

5.  "Guidelines for the Evaluation of Air Quality Trends,
    Interim Guidelines," U. S. Environmental Protection
    Agency, Office of Air Quality Planning and Standards,
    Research Triangle Park, N. C., OAQPS No. 1.2-014,
    December 1973.

-------