Optimum meteorological and air pollution sampling network selection in cities : Volume II, Evaluation of wind field predictions for St. Louis


EPA
            United States
            Environmental Protection
            Agency
            Environmental Monitoring
            Systems Laboratory
            PO Box 15027
            Las Vegas NV 89114
EPA-600/4-79-069
October 1979
            Research and Development
Optimum Meteorological
and Air Pollution
Sampling Network
Selection in Cities:

Volume II - Evaluation
of Wind Field  Predictions
for St. Louis

-------
                   RESEARCH REPORTING SERIES

Research reports of the Office of Research and Development,  U.S. Environmental
Protection Agency, have been grouped into nine series. These nine broad categories
were established to facilitate further development and application of environmental
technology.   Elimination of traditional grouping was  consciously  planned to foster
technology  transfer and a maximum interface in related fields. The nine series are:


      1.    Environmental Health Effects Research
      2.    Environmental Protection Technology
      3.    Ecological Research
      4.    Environmental Monitoring
      5.    Socioeconomic Environmental Studies
      6.    Scientific and Technical Assessment Reports (STAR)
      7.    Interagency Energy-Environment Research  and Development
      8.    "Special" Reports
      9.    Miscellaneous Reports
This report has been assigned to the ENVIRONMENTAL MONITORING series. This series
describes research conducted to develop new or improved methods and instrumentation
for  the  identification and quantification  of environmental pollutants at the lowest
conceivably significant concentrations. It also includes studies to determine the ambient
concentrations of pollutants in the environment and/or the variance of pollutants as a
function of time or meteorological factors.
This document is available to the public through the National Technical Information
Service, Springfield, Virginia  22161.

-------
                                                   EPA-600/4-79-069
                                                   October 1979
           OPTIMUM METEOROLOGICAL AND AIR POLLUTION
             SAMPLING NETWORK SELECTION IN CITIES

Volume II:  Evaluation of Wind Field Predictions for St. Louis
            Fred M. Vukovich and C. Andrew Clayton
                  Research Triangle Institute
                        P. O. Box 12194
                    Research Triangle Park,
                     North Carolina  27709
                    Contract No. 68-03-2187
                        Project Officer

                       James L. McElroy
     Monitoring Systems Research and Development Division
        Environmental Monitoring Systems Laboratory
                   Las Vegas, Nevada  89114
          ENVIRONMENTAL MONITORING SYSTEMS LABORATORY
              OFFICE OF RESEARCH AND DEVELOPMENT
             U. S. ENVIRONMENTAL PROTECTION AGENCY
                   LAS VEGAS, NEVADA  89114

-------
                               DISCLAIMER
     This report has been reviewed by the Environmental Monitoring Systems
Laboratory-Las Vegas, U.S. Environmental Protection Agency,  and
approved for publication.  Approval does not signify that the contents
necessarily reflect the views and policies of the U.S.  Environmental
Protection Agency, nor does mention of trade names of commercial  products
constitute endorsement or recommendation for use.

-------
                                FOREWORD
      Protection of the environment requires effective regulatory actions
that are based on sound technical and scientific data.  This information
must include the quantitative description and linking of pollutant sources,
transport mechanisms, interactions, and resulting effects on man and his
environment.  Because of the complexities involved, assessment of specific
pollutants in the environment requires a total systems approach that tran-
scends the media of air, water, and land.  The Environmental Monitoring
Systems Laboratory-Las Vegas contributes to the formation and enhancement of
a sound monitoring data base for exposure assessment through programs designed
to:
          *  develop and optimize systems and strategies for moni-
             toring pollutants and their impact on the environment

          *  demonstrate new monitoring systems and technologies by
             applying them to fulfill special monitoring needs of
             the Agency's operating programs

     This report is the second in a series (see EPA-600/4-78-030) on a method
for designing meteorological and air quality monitoring networks and the
application of the method to the metropolitan St. Louis area.  It is concerned
with the evaluation of the meteorological (wind field) network selected for
St. Louis.  Regional or local agencies may find this method useful in plan-
ning new or adjusting existing aerometric monitoring networks.  The Monitoring
Systems Design and Analysis Staff may be contacted for further information
on the topic.
                                        George B. Morgan
                                             Director
                           Environmental Monitoring Systems Laboratory
                                            Las Vegas

-------
                                PREFACE
     This document is the second in a series on the development of a method-
ology for designing optimum meteorological and air quality monitoring networks
and the application of the methodology to the metropolitan St.  Louis area.
It deals with the evaluation of the meteorological (wind field) network.   The
first document (EPA-600/4-78-030) considered the theoretical aspects of the
methodology and the network(s) established for St. Louis.   Subsequent reports
will be concerned with verification of the methodology with regard to the
air quality.
                            James L. McElroy
                            Project Officer
             Environmental Monitoring Systems Laboratory
                              Las Vegas

-------
                              SUMMARY

     This report is the second in a series treating a method for develop-
ing optimum meteorological and air pollution networks and the application
of the methodology for St. Louis (EPA-600/4-78-030 describes the method
and the network for St. Louis).  This particular report deals with the
evaluation of the wind field determined from the optimum network.  For
this purpose, wind data obtained through summer (August 1975) and winter
(February-March 1976) field programs were reduced and validated.
The basic objective of the evaluation was to determine the precision and
accuracy of the procedures used for estimating the wind field.  The
procedures for determining the wind field involved applying stepwise
regression to a class of statistical models and data from a 19-station
network; the network (the optimum network) and the class of models
(linear statistical models involving subsets of a specific set of 13
terms) were determined during the theoretical phase of the study.
Evaluation included the selection of a large class of model forms to
compare with the 13-term class.  For this purpose, a basic set of 23
terms which were dictated by the results of the theoretical phase of the
study was chosen.  The evaluation also included estimations based on
data from all reporting stations, up to a total of 26 stations.

     The principal conclusion of this study was that application of
stepwise regression to the 13-term model together with wind data from
the 19-station optimum network produced predicted wind fields comparable
to those obtained by more general procedures (the 23-term model) applied
to a larger network (at most 26 stations).  This substantiated through
observed data the results of the theoretical analysis conducted in the
above-mentioned report.

-------
     An exhaustive evaluation was not feasible, largely due to numerous
analytical and data limitations.  Less than 50% of the total data collected
could be used for the analysis due to unreported and invalid data.
The wind data associated with the winter field program were atypical for
that period in that the period was characterized by southwesterly winds.
According to available statistical information, the winter period in the
St. Louis region is normally characterized by northwesterly winds.
Furthermore, relative measures had to be utilized in the evaluation
since the best model was unknown, and since only a small number of
additional (i.e., non-network) wind monitoring stations were available
in St. Louis.  Also, errors which arose from network deficiencies could
not be isolated from errors arising from other sources (e.g., model
deficiencies, measurement errors).

-------
                                CONTENTS


                                                                  Page

FOREWORD	      iii

PREFACE	       iv

SUMMARY	        v

LIST OF FIGURES	     viii

LIST OF TABLES	       ix

LIST OF SYMBOLS	      xii

ACKNOWLEDGMENTS	      xvi

1.   INTRODUCTION	        1
     Overview of the Proposed Methodology	        2
     Summary of Previous Results	        2
     Objectives and Scope of the Current Research	        5
     Objectives and Scope of Remaining Research	        5

2.   SUMMARY OF AVAILABLE WIND DATA	        7
     Description of Raw Data	        7
     Data Editing	       12

3.   EVALUATION TECHNIQUES	       21
     Selection of Modeling Procedures	       23
     Criteria for Evaluating Modeling Procedures	       29
     Criteria for Evaluating the RTI Network	       34

4.   EVALUATION RESULTS	       38
     Summary of Specific Models Selected by Alternative
       Approaches	       39
     Comparison of Alternative Modeling Procedures Using
       Wind Data From All Stations	       45
     Comparison of Alternative Modeling Procedures Using
       Wind Data From Stations in the RTI Network	       53
     Accuracy of Predicted Wind Fields	       58
     Combined Evaluative Measures	       65
     Selected Cases and Conditions	       68

5.   DISCUSSION OF RESULTS	       90
     Conclusion and Findings	       90
     Analytical Limitations	       93
     Remarks	       94

REFERENCES	       97

-------
                          LIST OF FIGURES
Number                                                            Page
          Location of Stations  in  the  RTI  Network and
          Other Non-Network Stations Used  in the  Eval-
          uation	
          Pooled Vector RMSE's For  Individual  Stations
          by Modeling Procedure	      61

          Plot of $f_\alpha(j)$ versus $\alpha$, for Five Modeling
          Procedures (j)	      67

          Plot of $g_\alpha(j)$ versus $\alpha$, for Five Modeling
          Procedures (j)	      67

          Pooled Measures  of  Estimation  and Prediction
          Errors Versus Prevailing  Wind  Speed.	      77

          Observed and Predicted Winds for Case  I:   (A)
          Observed Data;  (B)  Predicted Winds Using Proce-
          dure 1;  (C) Predicted Winds Using Procedure 2	      84

          Observed and Predicted Winds for Case  II:  (A)
          Observed Data;  (B)  Predicted Winds Using Proce-
          dure 1;  (C) Predicted Winds Using Procedure 2	      87

          Observed and Predicted Winds for Case  III:  (A)
          Observed Data;  (B)  Predicted Winds Using Proce-
          dure 1;  (C) Predicted Winds Using Procedure 2	      88

          Distribution of  Predicted Winds on a 2 km by
          2 km Grid  for Case  II Using Procedure  2	      95

-------
                          LIST OF TABLES

Number                                                           Page

   1      Geographic Locations and Terrain Elevations for
          Stations in the RTI Network	       4
          Geographic Locations and Terrain Elevations for
          Stations Not in the RTI Network	
   3      Coefficients Used for Determining Frictional
          Velocity Components	      11

   4      Mean Roughness Lengths (T ) , by Station	      12

   5      Distribution of Cases, by Date and Time-of-Day
          and by Date and Number of RTI Network Stations
          Reporting	      15

   6      Number of Cases for Which Valid Wind Data Are
          Reported, by Station	     16

   7      Number of Available and Potential Observations
          by Network and Season	      17

   8      Distribution of Cases, by Season and Prevailing
          Wind Conditions	      18

   9      Summary Statistics, by Station for Observed Wind
          Data Over All Cases	      20

  10      Summary of Modeling Procedures.	      29

  11      Distribution of Cases by Model Size—For Four
          Modeling Procedures Applied to Wind Component
          Data From Stations in the RTI Network and the
          Full Network	      40

  12      Pairwise Comparisons of Modeling Procedures in
          Terms of Model Sizes and Model Forms	      42

  13      Percentage of 908 Cases in Which Network/
          Modeling Procedures Resulted in the Same Model
          Form	      43

  14      Number of Cases For Which Specific Model Terms
          Are Selected, By Wind Component, Modeling Pro-
          cedure, and Network.	      44

  15      Summary of Analysis of Variance Results Based
          on Estimations From the Full Network	      46

-------
  16      Values of Pooled Evaluative Criteria by Season,
          Wind Component, and Modeling Procedure Based on
          the Full Network Estimations	      47

  17      Distributions of Residual Standard Deviations
          Over the 908 Cases For Four Modeling Procedures
          Applied to Data From All Stations	      49
  18      Distributions of Adjusted $R^2$ Statistics Over the
          908 Cases For Four Modeling Procedures Applied
          To Data From All Stations	      50

  19      Percentage Frequency Distributions of Residuals
          by Season, Wind Component, and Modeling Procedures
          (Over All Cases and All Stations) Based on Full
          Network Estimations	      51

  20      Percentage Frequency Distributions of Residuals
          by Wind Component and Modeling Procedure (Over All
          Cases and All Stations) Based on Full Network
          Estimations	      52

  21      Summary of Analysis of Variance Results Based
          on Estimations From the RTI Network	      54

  22      Values of Pooled Evaluative Criteria by Season,
          Wind Component, and Modeling Procedure Based on
          RTI Network Estimations	      55

  23      Distributions of Residual Standard Deviations
          Over the 908 Cases For Four Modeling Procedures
          Applied to Data From Stations in the RTI Net-
          work	      56
  24      Distributions of Adjusted $R^2$ Statistics Over the
          908 Cases For Four Modeling Procedures Applied
          to Data From Stations in the RTI Network	      57

  25      Means of Deviations Between Observed and Pre-
          dicted Values at Non-Network Stations, by  Wind
          Component and Modeling Procedure—Based on
          Estimations From RTI Network Data	      59

  26      Means of Deviations Between Observed and Pre-
          dicted Values at Non-Network Stations, by  Wind
          Component and Modeling Procedure—Based on
          Estimations From Full Network Data	      59

  27      Root Mean Square Errors  (mps) For Each Non-
          Network Station Based on Estimations From  the
          RTI Network, By Wind Component, Season, and
          Modeling Procedure	      60

-------
  28      Characterization of the Distributions Over the
          908 Cases of RMSE's Across All Non-Network Sta-
          tions—Based on Estimations From RTI Network
          Data	      63

  29      Characterization of the Distributions Over the
          908 Cases of RMSE's Across Stations in the Inner-
          Non-Network—Based on Estimations from RTI Net-
          work Data	      64

  30      Percentage Frequency Distributions of Devia-
          tions Between Observed and Predicted Wind Com-
          ponents at Non-RTI Network Stations—Estima-
          tions Based on Data From RTI Network Stations	      69

  31      Percentage Frequency Distribution of Devia-
          tions Between Observed and Predicted Values
          Based on Estimations From RTI Network Data	      72

  32      Sample Sizes, by Prevailing Wind Speed and
          Direction Categories	      73

  33      Summary of Estimation Errors By Prevailing
          Wind Speed and Direction Categories	      74

  34      Summary of Prediction Errors By Prevailing
          Wind Speed and Direction Categories	      76

  35      Percentage Errors in Wind Speed Predictions
          at Non-Network Stations, by Prevailing Wind
          Speed Categories	      79

  36      Prediction Models for Three Specific Cases	      80

  37      Analysis of Variance Results for Three Specific
          Cases	      82

  38      Root Mean Square Errors (mps) For Three Specific
          Cases	      83

-------
                          LIST OF SYMBOLS


U              west-east wind component in meters per  second  (mps)

V              south-north wind component in meters  per second  (mps)

W              wind speed (mps)

(U,V)          wind vector with components U and V

x              west-east geographic coordinate  relative to a  given
               origin, in kilometers  (km)

y              south-north geographic coordinate relative  to  a
               given origin  (km)

h or h(x,y)    terrain elevation in meters (m)  at the  point (x,y)
               relative to a fixed base plane

k              wind component index (k = 1 for U-component, k = 2
               for V-component)

i              station index

t              time index

j              index that identifies modeling procedures

$(x_i, y_i)$   geographic coordinates of the i-th station

(U  ,V  )        observed wind components at m minutes away  from  a
               nominal time point  (at 10 or 30  m above ground
               level)

(U  ,V  )        20-minute average of wind components, at 10 or 30 m
               above ground  level

(U',V')       20-minute average of wind components  at 10  m
               above ground  level, as estimated from observations
               at 30 m above ground level

(U*,V*)        west-east, south-north components of  the friction
               velocity

e              elevation above ground level  (m)

T              mean roughness length  (m)

$Z_{kt}$ or    20-minute average for wind component k at 10 m
$Z_{kt}(x,y)$  above ground level, at time t and at the point (x,y)

-------
$Z_{kt}(i)$                    $Z_{kt}(x_i, y_i)$, i.e., observed value of wind
                               component k at station i at time t

$\mathbf{x}$                   a 23 x 1 vector involving functions of x and y

$\boldsymbol{\beta}_{kt}$      a 23 x 1 vector of unknown parameters associated
                               with wind component k at time t

$\epsilon_{kt}$ or             the deviation, at the point (x,y), of wind
$\epsilon_{kt}(x,y)$           component k at time t from an assumed model of
                               the form $\mathbf{x}'\boldsymbol{\beta}_{kt}$

$\mathbf{x}_0$                 a 13 x 1 vector consisting of the first 13
                               elements of $\mathbf{x}$

$\boldsymbol{\beta}_{0kt}$     a 13 x 1 vector of unknown parameters associated
                               with wind component k at time t

$\epsilon_{0kt}$ or            the deviation, at the point (x,y), of wind
$\epsilon_{0kt}(x,y)$          component k at time t from an assumed model of
                               the form $\mathbf{x}_0'\boldsymbol{\beta}_{0kt}$

$\mathbf{Z}_{kt}$              a vector containing the $Z_{kt}(i)$, i = 1, 2, ...

$X^*$                          a matrix for which the i-th row consists of the
                               $\mathbf{x}'$ vector evaluated at $(x_i, y_i)$

Q                              an arbitrary subset of stations

F                              a network consisting of all (reporting) stations

R                              a subset of stations consisting of all
                               (reporting) stations in the RTI network

$n_t(Q)$                       the number of observations (i.e., reporting
                               stations) at time t in the network Q, where
                               Q = R or F

$p_{kt}(j,Q)$                  the number of terms in the model for wind
                               component k at time t when modeling procedure j
                               is applied to data in network Q, where
                               Q = R or F

$\hat{\boldsymbol{\beta}}_{kt}(j,Q)$
                               a vector of $p_{kt}(j,Q)$ estimated parameters
                               for component k at time t, obtained by applying
                               modeling procedure j to network Q, where
                               Q = R or F

$\mathbf{x}_{jkt}$             a vector obtained by retaining those elements of
                               $\mathbf{x}$ which correspond to the
                               $\hat{\boldsymbol{\beta}}_{kt}(j,Q)$ elements

$X_{jkt}$                      a matrix for which the i-th row consists of the
                               $\mathbf{x}_{jkt}'$ vector evaluated at
                               $(x_i, y_i)$

-------
$\hat{Z}_{kt}(j,Q,i)$          the predicted value of the k-th wind component
                               at time t at the point $(x_i, y_i)$ when
                               modeling procedure j is applied to data in
                               network Q, where Q = R or F

$\hat{\epsilon}_{kt}(j,F,i)$   the deviation between the observed wind
                               component, $Z_{kt}(i)$, and the predicted
                               component, $\hat{Z}_{kt}(j,F,i)$

$s^2_{kt}(j,Q)$                the residual variance for component k from the
                               model based on procedure j applied to wind data
                               from network Q (Q = R or F) at time t

$R^2_{kt}(j,Q)$                the proportion of the total variation in wind
                               component k at time t (over network Q) accounted
                               for by the model resulting from modeling
                               procedure j when it is applied to network Q
                               (Q = R or F), i.e., an $R^2$ statistic

$A^2_{kt}(j,Q)$                the adjusted $R^2$ statistic based on
                               $R^2_{kt}(j,Q)$

C                              an arbitrary subset of cases (i.e., t values)

$s^2_{kC}(j,Q)$                the pooled residual variance over C, obtained as
                               a weighted average of the $s^2_{kt}(j,Q)$ values

$R^2_{kC}(j,Q)$                the pooled $R^2$ statistic over C, i.e., the
                               proportion of the total within-case variation in
                               wind component k accounted for by applying
                               modeling procedure j to data from network Q
                               (Q = R or F)

$A^2_{kC}(j,Q)$                the pooled adjusted $R^2$ statistic based on
                               $R^2_{kC}(j,Q)$

$N_{CQ}$                       the number of wind observations in the
                               intersection of C and Q

$W_t(i)$                       observed wind speed at $(x_i, y_i)$ at time t

$\theta_t(i)$                  observed wind direction at $(x_i, y_i)$ at time t

$\hat{W}_t(j,Q,i)$             predicted wind speed at $(x_i, y_i)$ at time t,
                               based on applying modeling procedure j to data
                               from network Q (Q = R or F)

$\hat{\theta}_t(j,Q,i)$        predicted wind direction at $(x_i, y_i)$ at
                               time t, based on applying modeling procedure j
                               to data from network Q (Q = R or F)

s(j)                           the square root of
                               $s^2_{1C}(j,R) + s^2_{2C}(j,R)$, where C
                               consists of all cases

-------
r(j)           the pooled vector root mean square error associated
               with procedure j, pooled over both wind components,
               all cases, and all stations not in the RTI network

r*(j)          same as r(j) but over all interior stations not in
               the RTI network

$f_\alpha(j)$  a weighted average of $[r(j)]^2$ and $[s(j)]^2$, where
               $\alpha$ is the weight attached to the former

$g_\alpha(j)$  a weighted average of $[r^*(j)]^2$ and $[s(j)]^2$, where
               $\alpha$ is the weight attached to the former

-------
                         ACKNOWLEDGMENTS

     This report was prepared by the Research Triangle Institute (RTI),
Research Triangle Park, North Carolina, under contract No. 68-03-2187
for the U.S. Environmental Protection Agency (EPA).  The project officer
was Dr. James L. McElroy.  Many individuals from RTI participated in
this project.  Mr. J. W. Dunn was responsible for developing the computer
algorithm for processing and reducing the wind data.  Mr. Bobby
Crissman was responsible for the initial data reduction.  Mr. Clifford
Decker was responsible for management of the field program.

     We would also like to acknowledge the cooperation of Mr. Robert
Browning of EPA, Research Triangle Park, North Carolina, for providing
us with the Regional Air Pollution Study (RAPS) data; and Mr. Ashwin
Gajjar, St. Louis County Air Pollution Control Agency, for providing
data from the St. Louis City and County air pollution stations.

-------
                             SECTION 1

                           INTRODUCTION



     This report provides an evaluation of one aspect of an overall
methodology for generating estimated pollution concentration surfaces
over an urban area.  This methodology, if successful, would avoid three
of the major problems typically encountered in estimating such surfaces
directly from observed air quality data; these problems occur because:

     (a)  reliable estimation (for a single pollutant) requires a high
          resolution network of air quality monitoring stations,

     (b)  "optimal" networks for two different pollutants would generally
          be different because of different emission sources, and

     (c)  an "optimal" network (for a single pollutant) remains "optimal"
          only in the short-term because of changes in the emission
          sources.

The proposed methodology has the potential of overcoming these problems
by utilizing the emissions source inventory as a primary source of data
and by establishing a network which is "optimal" for estimating wind
fields.  The model development phase of the proposed methodology, as
well as its implementation in the St. Louis, Missouri area, is described
by Vukovich et al. (1978).

     The following subsections provide a brief description of the overall
concept, and summarize the statistical model form and sampling
network which resulted from applying the methodology in St. Louis.  The
specific objectives of this report are then described, along with a
description of the organization of the remainder of the report.

-------
OVERVIEW OF THE PROPOSED METHODOLOGY
     The proposed methodology involves six major steps:
          (1)  Utilize a three-dimensional hydrodynamic model to generate
               simulated wind fields for the (urban) area under a
               variety of (initial) meteorological conditions.
          (2)  Determine a class of statistical model forms relating
               winds to geographic location and topography which will
               yield a reasonable approximation to the simulated results
               for any of the initial conditions.
          (3)  Using the results of (2), determine an "optimal" set of
               sites for monitoring winds.
          (4)  Establish wind and air quality monitoring stations at the
               indicated sites.
          (5)  Estimate wind fields by fitting statistical models based
               on the class of forms determined in (2) to the observed
               data.
          (6)  Utilize an objective variational analysis model to esti-
               mate pollutant concentrations over the area by combining
               the emissions source inventory, the observed pollutant
               concentrations, and the estimated wind fields.
With minor modifications resulting from practical and economic constraints,
the first four steps above have been completed for the St. Louis area;
the following section describes the class of statistical models and the
network established in the St. Louis area.


SUMMARY OF PREVIOUS RESULTS
     Consider an arbitrary point in the St. Louis region with coor-
dinates (x,y) relative to a fixed origin, where x denotes distance in
kilometers (km) in the east direction and y, in the north direction.
Let h = h(x,y) denote the elevation in meters (m) at (x,y) relative to a
fixed base plane at river elevation of approximately 100 m.  Let
$Z_{kt} \equiv Z_{kt}(x,y)$ denote the value in meters per second (mps) of
the k-th wind component (k=1 for the west-east component, U; k=2 for the
south-north component, V) at time t.  The model form proposed by the
Research Triangle Institute (RTI) in Vukovich et al. (1978), which formed
the basis for determining the sampling network, was

     $Z_{kt} = \mathbf{x}_0' \boldsymbol{\beta}_{0kt} + \epsilon_{0kt}$                    (1)

where

     $\mathbf{x}_0' = (1 \;\; x \;\; y \;\; x^2 \;\; y^2 \;\; xy \;\; x^3 \;\; y^3 \;\; x^2y \;\; xy^2 \;\; x^4 \;\; y^4 \;\; h)$,

     $\boldsymbol{\beta}_{0kt}$ = a 13 x 1 vector of unknown parameters for
            component k at time t, and

     $\epsilon_{0kt} = \epsilon_{0kt}(x,y)$ = random deviation in component k
            at time t at the point (x,y).
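
As an illustration of how the model form in equation (1) might be fitted at a single time t, the following Python sketch builds the 13-element vector at each station and estimates the parameters by ordinary least squares.  It is a minimal sketch under stated assumptions, not the procedure used in the report: the 13 terms are read here as 1, x, y, x^2, y^2, xy, x^3, y^3, x^2 y, x y^2, x^4, y^4, h (an interpretation of the printed vector); the function names `design_row`, `design_matrix`, `fit_component`, and `predict_component` are hypothetical; and the station data are synthetic.

```python
import numpy as np

def design_row(x, y, h):
    """Assumed 13 terms of the model vector at (x, y) with elevation h."""
    return np.array([1.0, x, y, x**2, y**2, x * y, x**3, y**3,
                     x**2 * y, x * y**2, x**4, y**4, h])

def design_matrix(xs, ys, hs):
    """Stack one 13-term row per station."""
    return np.array([design_row(x, y, h) for x, y, h in zip(xs, ys, hs)])

def fit_component(xs, ys, hs, z):
    """Least-squares parameter estimates for one wind component at one time."""
    beta, *_ = np.linalg.lstsq(design_matrix(xs, ys, hs), z, rcond=None)
    return beta

def predict_component(beta, xs, ys, hs):
    """Evaluate the fitted model at the given points."""
    return design_matrix(xs, ys, hs) @ beta

# Synthetic 19-station example (km, km, m); these are NOT data from the report.
rng = np.random.default_rng(0)
xs = rng.uniform(-20.0, 20.0, 19)
ys = rng.uniform(-16.0, 16.0, 19)
hs = rng.uniform(5.0, 80.0, 19)
# Hypothetical west-east component values in mps, with added noise.
u_obs = 3.0 + 0.1 * xs - 0.2 * ys + rng.normal(0.0, 0.3, 19)

beta_u = fit_component(xs, ys, hs, u_obs)
u_fit = predict_component(beta_u, xs, ys, hs)
print("residual RMS (mps):", np.sqrt(np.mean((u_obs - u_fit) ** 2)))
```

With 19 stations and all 13 terms retained, only 6 residual degrees of freedom remain, which suggests why fitting a subset of the terms, as in the stepwise approach mentioned in the Summary, may be preferable to fitting the full vector.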



     The proposed network, which was subsequently established and which
is herein referred to as the RTI network, involves 19 stations (see
Figure 1).  Because of on-going data collection activities in the local
area, it was only necessary for RTI to set up three stations for this
evaluation.  Sixteen existing stations were situated in close proximity
to "optimal" locations established during the theoretical phase of the
study.  Table 1 shows the (x,y) coordinates and elevations (h) of the 19
stations in the RTI network.  Four of the 19 stations in the network are
St. Louis city/county stations (denoted by the STL prefix in the station
names), twelve are Regional Air Pollution Study (RAPS) Stations of the
United States Environmental Protection Agency (denoted by the EPA
prefix in the station names), and three stations (denoted by the RTI
prefix) were temporary stations set up by RTI specifically for this
research project.  The RTI stations were located on the grounds of
Incarnate Word Academy in northwest St. Louis county; on the grounds of
Kenrick Seminary in southwest St. Louis county; and on the grounds of the
East Side Sanitary District's South Pumping Station in East St. Louis,
Illinois.

          TABLE 1.   GEOGRAPHIC LOCATIONS AND TERRAIN ELEVATIONS FOR
                    STATIONS IN THE RTI NETWORK*

          Station            x              y             h
          Name              (km)           (km)          (m)

          STL008              0.            16.           45.
          RTI202            - 4.             8.           79.
          STL009            - 7.             2.           44.
          STL006            -20.           - 3.           46.
          RTI205            - 6.           - 6.           37.
          STL002              0.           -10.           12.
          RTI207             10.           -10.           11.
          EPA101              6.             1.           24.
          EPA102              5.             7.            5.
          EPA104              9.        [illegible]       13.
          EPA105              4.           - 3.           50.
          EPA106              1.           - 1.           36.
          EPA108             10.            12.            9.
          EPA109             18.             0.           13.
          EPA110              9.           - 6.            6.
          EPA113              0.            10.           55.
          EPA118              5.           -16.           28.
          EPA119         [illegible]       - 8.           56.
          EPA120            -16.             8.           37.

          *  Locations are defined relative to an origin at the inter-
             section of Lindell Blvd. and King's Highway in St. Louis.
             Elevations are defined relative to a local river elevation
             of approximately 100 m.

     The major emphasis of the second phase of the research project

involved the preparation and execution of a summer and winter field

program in St. Louis.  These field programs were held during a period

when EPA was performing an intensive study in St. Louis:  August, 1975,

and February and March, 1976.  During this time, there was a concerted

effort to maintain a high level of performance of the RAPS stations.

-------
OBJECTIVES AND SCOPE OF THE CURRENT RESEARCH
     The scope of the current effort is limited  to an analysis  of  the
wind data which were obtained during the summer  and winter field pro-
grams.  These data consisted of  the horizontal wind components  as
measured at various time  intervals and at 27 sites within the St.  Louis
region.  These 27 sites included the 19 stations involved in the RTI
network.  The objectives  of the  study are
     (1)  to develop an easily-automated estimation procedure, based on
          the model form in equation (1), for generating estimated
          wind fields, and
     (2)  to evaluate the performance of this procedure and of the RTI
          network.
Thus,  in terms of the six major  steps involved in the methodology, this
phase  of the research involves a demonstration of step 5, and an evalua-
tion of the overall methodology  up through step  5.
     Section 2 describes  the available data, its limitations, and  the
editing procedures employed in preparing the data for analysis.  The
analytical approach is described in Section 3 and the results are
summarized in Section 4.  Section 5 presents the conclusions, findings,
recommendations, and analytical  limitations of the study.


OBJECTIVES AND SCOPE OF REMAINING RESEARCH
     Assuming validation  of the  procedures and network for making  wind
field  predictions, the next step in the research project will involve an
evaluation of the objective variational analysis model (OVAM) used to
derive the estimated air  pollution distribution. The OVAM uses the
estimated wind field as an input parameter, along with the emissions
inventory and the air pollution concentrations as measured at the net-




work stations.  Carbon monoxide (CO) will be used in the evaluation.




     The evaluation of the OVAM will be made on a case study basis, with




each case study covering a 12- to 24-hour period.   The selected case




studies will be chosen so as to represent a variety of wind conditions




(speeds, directions) and of CO concentration distributions over the




monitoring stations.  The basic evaluation parameters will consist of




correlations and root mean squared errors between observed and predicted




CO concentrations at stations outside of the RTI network.  As a part of




this study, it will be determined if it is necessary to monitor CO at




each of the 19 network stations.

-------
                             SECTION 2

                  SUMMARY OF AVAILABLE WIND DATA



DESCRIPTION OF RAW DATA

     In addition to stations in the RTI network, eight additional sta-

tions provided data.  Coordinates and elevations of these stations are

shown in Table 2.

          TABLE 2.  GEOGRAPHIC LOCATIONS AND TERRAIN
                    ELEVATIONS FOR STATIONS NOT IN THE
                    RTI NETWORK*

          Station          x          y          h
          Name           (km)       (km)        (m)
          STL003           2          6         39
          STL004          -2         -1         39
          STL007         -10         10         81
          STL010          -8        -12         62
          EPA103          10          3         16
          EPA107           2          3         44
          EPA111           1         -7         19
          EPA112          -4          2         44

          *  Locations are defined relative to an origin
          at the intersection of Lindell Blvd. and King's
          Highway in St. Louis.  Elevations are defined
          relative to a local river elevation of approxi-
          mately 100 m.

The total set of 27 stations, whose locations are shown in Figure 1,

will be referred to as the full network; the above set of eight stations

will be referred to as non-network stations (meaning non-RTI-network

stations).  Stations STL007, STL010, and EPA103 will be referred to as

outer-non-network stations, since they are located on the border of the

innermost grid  (see Figure 1), whereas the remaining five non-network

stations will be called inner-non-network stations.  These two sets of

non-network stations are distinguished because it was shown in the first

report in this  series (Vukovich, et al., 1978) that, if wind data from

-------
Figure 1.  Location of stations in the RTI  network (solid dots) and
           other non-network stations (open dots)  used in the evaluation
           (interior  grid spacing  =  1 km)

-------
the RTI network were used to produce predictions at the non-network




stations, considerably better predictions should be achieved for inner-




non-network stations than for outer-non-network stations.




     The raw wind data consisted of 1-minute and 5-minute average




values from the EPA stations, 3-minute average values from the St. Louis




(STL) city/county stations and 5-minute average values from the RTI




stations.  Five-point averages centered at each half-hour were




constructed.  The nominal 20- to 25-minute averaging period is consist-




ent with the averaging performed in the hydrodynamic model, which pro-




duced the simulated wind fields upon which the RTI network was based.




For the EPA and RTI stations, these averages were computed from the
5-minute averages (for the U and V components, respectively) as

     [equations for the U and V averages missing from source text]

where the subscripts indicate deviations in minutes between the nominal
(hour or half hour) time point and the midpoint of the averaging
interval for the raw data.  For the city/county stations,

     [equations for the U and V averages missing from source text]

In either case, at least three of the five readings were required to be
present in order for an average wind to be used.
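
The completeness rule stated above (at least three of the five readings present) can be sketched as follows. This is a minimal illustration and not the processing code used in the study: the function name `five_point_average` is hypothetical, and averaging over only those readings that are present is an assumption, since the text does not say how a missing reading entered the average.

```python
from typing import Optional, Sequence


def five_point_average(readings: Sequence[Optional[float]],
                       min_present: int = 3) -> Optional[float]:
    """Average five readings centered on a nominal time point.

    A missing reading is given as None.  The average is returned only
    if at least `min_present` readings are available (three in the
    report); otherwise None is returned and the case has no wind value.
    """
    if len(readings) != 5:
        raise ValueError("expected exactly five readings")
    present = [r for r in readings if r is not None]
    if len(present) < min_present:
        return None
    # Assumption: the mean is taken over the readings that are present.
    return sum(present) / len(present)


ok = five_point_average([1.0, 2.0, None, 3.0, None])         # three present
rejected = five_point_average([1.0, None, None, 3.0, None])  # only two present
```

The same function would be applied separately to the U and V components at each station and half-hour.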




     Winds at the three RTI stations and at all of the St. Louis city/




county stations were measured at 10 m above ground level.  This was




also the case for three of the RAPS stations:  EPA108, EPA110, and




EPA118.  Measurements at  the remaining thirteen RAPS  stations, however,

were made at 30 m above ground level.  The wind data from these



stations were therefore inappropriate for evaluating the methodology.



To alleviate this problem, a profile equation for the surface boundary



layer (Estoque and Bhumralkar, 1969) was used to generate estimated



10-m winds at these thirteen stations using the winds at the 30-m  level.



The estimated wind components at 10 m at a particular station and  time



were determined as:
     U' = U_o/3 + 2.5 U* [L_10 - L_30]
                                                                  (2)
     V' = V_o/3 + 2.5 V* [L_10 - L_30]

where

     U' and V' are, respectively, the west-east and south-north compo-
        nents of the wind velocity at the 10-m level;

     U_o and V_o are, respectively, the west-east and south-north compo-
        nents of the wind velocity at the 30-m level;

     L_e = ln[(e + T_o)/T_o];

     T_o is the mean roughness length associated with the particular
        site;

     e is the elevation (m) above ground level; and

     U* and V* are, respectively, the west-east and south-north compo-
        nents of the friction velocity.



The U* component was determined as

     U* = [A + B|U_o| + C U_o^2] sign(U_o)                         (3)

where the coefficients A, B, and C were based on data relating the mean
wind speed to the friction velocity (J.I. Clarke, EPA-RTP, personal
communication, 1978).  A similar formula was used for the V*-component.
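
As an illustration of Eq. (3) and of the logarithmic term L_e that appears in the bracket of Eq. (2), the two quantities can be evaluated as in the sketch below. The function names are hypothetical and the 30-m wind component of 5 m/s is an arbitrary example value; the coefficients are the summer urban values from Table 3 and the roughness length is the Table 4 value for EPA101.

```python
import math


def log_term(e: float, t_o: float) -> float:
    """L_e = ln[(e + T_o) / T_o] for height e (m) and roughness length T_o (m)."""
    return math.log((e + t_o) / t_o)


def friction_component(w_o: float, a: float, b: float, c: float) -> float:
    """Eq. (3): [A + B|w_o| + C w_o^2] sign(w_o) for a 30-m wind component w_o."""
    sign = (w_o > 0) - (w_o < 0)          # +1, 0, or -1
    return (a + b * abs(w_o) + c * w_o ** 2) * sign


# Summer, urban coefficients (Table 3) and T_o for EPA101 (Table 4).
A, B, C = -0.04591, 0.18763, -0.01036
T_O = 0.72

u_o = 5.0                                  # example 30-m west-east component, m/s
u_star = friction_component(u_o, A, B, C)
delta_l = log_term(10.0, T_O) - log_term(30.0, T_O)   # the term [L_10 - L_30]
print(u_star, delta_l)
```

Because the sign of the 30-m component is carried through, a westward component of the same magnitude gives a friction velocity component of equal size and opposite sign.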



     The coefficients in Eq. (3) were determined separately for each



season (summer and winter)  and for each of three types of stations
(urban, suburban, rural).  They are based on comparative analyses be-



tween measured turbulence parameters and wind speeds that were performed



at numerous RAPS stations and were consolidated for the purposes of this



study.  The analyses were performed by, and the results acquired from,



the U.S. Environmental Protection Agency.  Values of the coefficients



are shown in Table 3 below:



          TABLE 3.  COEFFICIENTS USED FOR DETERMINING
                    FRICTIONAL VELOCITY COMPONENTS

                                          Coefficient
          Season    Region            A           B           C
          Summer    Urban         -0.04591     0.18763    -0.01036
                    Suburban      -0.05006     0.13023    -0.00212
                    Rural         -0.01640     0.05419     0.00102
          Winter    Urban         -0.07601     0.16372    -0.00469
                    Suburban      -0.04947     0.12742    -0.00275
                    Rural          0.02616     0.02902     0.00243

Originally, coefficients were also determined as a function of stabi-



lity.  However, the values obtained were judged to be sufficiently



similar so that such additional differentiation was unnecessary.



     Table 4 indicates the type of each station and its mean roughness
length (T_o), as used in the above conversion formulae.  The roughness
lengths were determined using the technique developed by Lettau (1969),
with parameters developed specifically for St. Louis (Vukovich et al.,
1976).

     The estimated U' and V' values determined for the 13 RAPS stations
from equations (2) and (3), along with the observed U and V values for
the other 14 stations, constituted the basic wind data upon which the
evaluations were performed.

-------
               TABLE 4.   MEAN ROUGHNESS LENGTHS (T_o),
                          BY STATION

               Type          Station       T_o (m)
               Urban         EPA101         0.72
                             EPA104         0.39
                             EPA106         1.08
                             EPA107         1.32
               Suburban      EPA102         0.20
                             EPA103         0.20
                             EPA105         0.60
                             EPA111         0.24
                             EPA112         0.48
                             EPA113         0.66
                             EPA119         0.66
               Rural         EPA109         0.20
                             EPA120         0.45

DATA EDITING

     For those time points (cases) in which only a few of the 19 RTI-

network stations provided data, the estimation of wind fields would be

quite tenuous; furthermore, evaluation of the performance of the network

for providing good predictions would be unrealistic in such cases.

Consequently, as a first step in preparing the data for analysis, all

cases in which more than one-third of the RTI-network stations failed to

furnish wind data were deleted from further consideration.  With this

requirement imposed on the data set — namely, that data be available

for at least 13 stations in the RTI network — there were 260 cases

available from the summer field program and 654 from the winter field

program.
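
The retention rule for cases can be written out as a one-line test. The sketch below is illustrative only (the function name `case_is_usable` is hypothetical); it shows that deleting cases in which more than one-third of the 19 network stations are missing is equivalent to requiring at least 13 reporting stations.

```python
def case_is_usable(n_reporting: int, n_network: int = 19) -> bool:
    """True unless more than one-third of the network stations are missing."""
    n_missing = n_network - n_reporting
    # "More than one-third missing" means 3 * n_missing > n_network.
    return 3 * n_missing <= n_network


# Station counts, from 10 to 19, that would be retained.
usable_counts = [n for n in range(10, 20) if case_is_usable(n)]
print(usable_counts)
```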

     A manual screening of these data was then performed.  Inconsisten-

cies in the city/county data relative to the remaining data in the first

six summer cases, which were scattered across 11 days (July 29 to August

8, 1975), led to the exclusion of these six cases from the basic summer


data  set.   These inconsistencies were apparently the result of calibra-

tion  problems.   Out of the remaining 254 summertime cases,  the manual

screening  resulted in the deletion of

      (a)   all wind data from EPA102, which appeared highly  inconsistent

           with  data at nearby stations,*

      (b)   three extremely peculiar wind values at other stations,  and

      (c)   forty-three consecutive observations for EPA120 and eight for

           RTI205 in which instrument failures were apparently respon-
           sible for producing zero values for both wind components.

Among the  254 summer cases, no data were available for two  stations:

STL010 and EPA111.

      A similar  editing of the winter data resulted in the exclusion of

all wind data from STL010, and partial exclusion of data from six  other

stations.   Counts of these exclusions, which also resulted  from instru-

ment  failures,  are shown below:

                    Initial No. of      No. of Cases        No. of Cases
    Station         Reported Cases        Deleted             Retained
    RTI202               454                184                 270
    RTI205               457                  7                 450
    RTI207               549                  4                 545
    STL008               639                550                  89
    STL007               639                543                  96
    STL010               500                500                   0
    EPA107               616                  3                 613

     The final edited data sets covered an 8-day period in August 1975

(August 9-16) and a 25-day period in the winter of 1976 (February 10 -

March 5).   Out of the 8 x 48 = 384 potential cases which could have

occurred during the 8-day summer field program, only 254 were actually
*    Major repairs were performed on  the wind monitoring equipment at

EPA102 between  the times of  the  summer  field program and the subsequent

winter program.


retained after all editing; only 654 wintertime cases were available,




out of a possible 25 x 48 = 1200.  Thus, the field programs not only




were of short duration (especially the summer program) but also failed




to provide "sufficient" data in many cases.  This is depicted in the




left-hand portion of Table 5, which shows the distribution of available




cases, by date and time-of-day.  It is clear from this table that a




large degree of clustering of cases within time periods occurred.




Consequently, increasing the number of cases by constructing three




rather than two 20-minute averages per hour would not have enhanced the




data base in terms of its coverage of additional wind conditions (see




Table 8).




     Unfortunately, a substantial amount of missing data occurred even




within the 908 cases for which 13 or more stations in the RTI network




reported data (because of the above-described editing, or because the




data were simply not reported).  For instance, the full set of 19 sta-




tions in the RTI network furnished coincident data in only 14 of the 908




cases; these cases all occurred during a 2-day period within the win-




ter field program, as shown in the right-hand portion of Table 5.  Eigh-




teen or more stations in the network reported in only 105 out of the 908




cases.




     The high incidence of missing data was not confined to stations in




the RTI network, as evidenced in Table 6.  Two of the non-network sta-




tions, STL007 and EPA111, had particularly low reporting rates (after




editing).  Assuming 8 full days for the summer program and 25 full days




for the winter program, the reporting rates in terms of individual




observations were as shown in Table 7.

TABLE 5.  DISTRIBUTION OF CASES, BY DATE AND TIME-OF-DAY AND BY DATE AND
          NUMBER OF RTI NETWORK STATIONS REPORTING

                                                                    No. of Reporting
                   Time-of-Day Reporting                        Stations in RTI Network
Date       0000-0530  0600-1130  1200-1730  1800-2330   Total    12-13   14   15   16   17   18   19
75/08/09          6          6          0         11      23        -    -   23    -    -    -    -
75/08/10         12         12         11         10      45        -    -   45    -    -    -    -
75/08/11         12         12         12         12      48       11   13   11   13    -    -    -
75/08/12         12         12         12         12      48        -    -    3   42    3    -    -
75/08/13         12          9          6          7      34        -    1    8   12    2   11    -
75/08/14          5          5          9          9      28        -    -    -    1    7   20    -
75/08/15         12         11          1          0      24        -    1    -    3    3   17    -
75/08/16          0          0          0          4       4        -    -    4    -    -    -    -
76/02/10         10         11         10          0      31        -    -    -   11   20    -    -
76/02/11          0          0          2          3       5        -    -    -    -    5    -    -
76/02/12          7          8         12          7      34        -    -    3   17   14    -    -
76/02/13          6          0          0          0       6        1    5    -    -    -    -    -
76/02/14          0          7         12          8      27        -    -    1    4   22    -    -
76/02/15         10         12         12          1      35        5    2    7   10   11    -    -
76/02/16          4         12         12          3      31        -    -    2    7   22    -    -
76/02/17          0          3          7          9      19        -    -    1    2   16    -    -
76/02/18         12          4          3         11      30        1    3    5    8   13    -    -
76/02/19         11          1          0          2      14        -    3    3    2    6    -    -
76/02/20          6          6          4          0      16        -    -    -    6    6    4    -
76/02/21          3          9          5          0      17        1    4    1    4    7    -    -
76/02/22          0          1          8          0       9        -    -    -    -    -    9    -
76/02/23          3         11          9         12      35        -    3    4    4   20    4    -
76/02/24         12          5          2         10      29        5    4   18    2    -    -    -
76/02/25         11         11         12         11      45        1    3   12   19   10    -    -
76/02/26         11          1          1         11      24        1    -    -    4   15    4    -
76/02/27         12         12         12         12      48        -    1    4    4   24   10    5
76/02/28          1          1         12         12      26        -    -    -    5   12    -    9
76/02/29         12         12         10         12      46        3    1    7   19   14    2    -
76/03/01          2          4         12         12      30        -    1    2    7   20    -    -
76/03/02         11         12         12         12      47        2    4    2   14   19    6    -
76/03/03         12          1          0          0      13        1    -    -    1   11    -    -
76/03/04          0          0         11         12      23        1    1    7    5    9    -    -
76/03/05         12          2          0          0      14        -    1    2    1    6    4    -

Summer Total     71         67         51         65     254       11   15   94   71   15   48    0
Winter Total    168        146        180        160     654       22   36   81  156  302   43   14
Overall Total   239        213        231        225     908       33   51  175  227  317   91   14

NOTE:  Counts for 12 and 13 reporting stations are shown combined.  The
       separate totals are 0 (summer) and 3 (winter) for 12 stations,
       and 11 (summer) and 19 (winter) for 13 stations.

-------
TABLE 6.  NUMBER OF CASES FOR WHICH VALID WIND DATA ARE REPORTED, BY
          STATION

                                          No. of Cases
     Network        Station       Summer     Winter     Total
     RTI:           STL008          254         89       343
                    RTI202          223        270       493
                    STL009          254        639       893
                    STL006          254        639       893
                    RTI205          242        450       692
                    STL002          253        639       892
                    RTI207          250        545       795
                    EPA101          116        618       734
                    EPA102            0        641       641
                    EPA104          240        639       879
                    EPA105          243        647       890
                    EPA106          254        638       892
                    EPA108          252        645       897
                    EPA109          253        514       767
                    EPA110          248        530       778
                    EPA113          245        636       881
                    EPA118          164        635       799
                    EPA119           70        628       698
                    EPA120          203        630       833
     Non-RTI:
       Outer:       STL007          254         96       350
                    EPA103          253        580       833
       Inner:       STL003          254        638       892
                    STL004          254        638       892
                    EPA107          231        613       844
                    EPA111            0        596       596
                    EPA112          237        624       861

NOTE:  Station STL010, in the outer non-network, is omitted because no
       "valid" data were reported from this station.

-------
     TABLE 7.   NUMBER OF AVAILABLE AND POTENTIAL OBSERVATIONS,
                BY NETWORK AND SEASON +

                                Summer Field Program        Winter Field Program
     Subset of      No. of      No.   Potential  Rate       No.   Potential  Rate
     Stations      Stations    Obs.   No. Obs.   (%)       Obs.   No. Obs.   (%)
     RTI Network      19       4018     7296     55.1     10672    22800     46.8
     Non-Network*      8       1483     3072     48.3      3785     9600     39.4
       Outer*          3        507     1152     44.0       676     3600     18.8
       Inner           5        976     1920     50.8      3109     6000     51.8
     Full Network     27       5501    10368     53.0     14457    32400     44.6

+    Potential no. observations = no. stations x 48 cases per day x no.
     days

*    For completeness, STL010 is counted as a potential station, although
     no valid wind data were obtained from this station.
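
The footnote formula for Table 7 can be checked directly. The short sketch below (the function name `reporting_rate` is hypothetical) reproduces the potential number of observations and the reporting rate for the RTI network in each season, using the observation counts given in the table.

```python
CASES_PER_DAY = 48  # one case per half-hour


def reporting_rate(n_obs: int, n_stations: int, n_days: int):
    """Return (potential observations, reporting rate in percent)."""
    potential = n_stations * CASES_PER_DAY * n_days
    return potential, round(100.0 * n_obs / potential, 1)


summer = reporting_rate(4018, n_stations=19, n_days=8)
winter = reporting_rate(10672, n_stations=19, n_days=25)
print(summer, winter)
```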


     It is clear  that the limited time span, the limited number of non-

network stations, and the large amount of missing data impose some

severe limitations on the model and network evaluations.  Fortunately,

analysis of historical, seasonal wind roses for the National Weather

Service Station at Lambert Air Field in St. Louis showed that the winds

which occurred during the 8-day summer field program were typical of the

wind conditions which are prevalent in St. Louis in the summer (i.e.,

predominantly south to southwest winds of low velocity).  The distribu-

tion of prevailing wind speeds/directions for these 254 cases is shown

in the upper portion of Table 8.  The prevailing wind speeds and direc-

tions for a particular case were based on the average wind vector over

the following outlying stations:  STL006, STL007, STL008, EPA118, EPA119,

EPA120.  This definition is maintained throughout this  report.  On the

other hand, the winter data also showed a predominant southerly wind (of
somewhat higher velocity), which contrasts with the wintertime pattern
of northerly winds typical of the St. Louis area.  Thus the model and
network evaluation is also limited in terms of the types of cases covered.

TABLE 8.  DISTRIBUTION OF CASES, BY SEASON AND PREVAILING WIND CONDITIONS

           Prevailing
             Speed                    Prevailing Direction
Season       (mps)       N  NE+E    SE     S    SW     W    NW    Total
Summer        0-1        1     -     7    24    14     8     1       55
              1-2        -     -     2    60    46     7     6      121
              2-3        -     -     9    35    19     7     -       70
              3-4        -     -     1     3     1     2     -        7
              4-5        -     -     -     -     -     1     -        1
             Total       1     -    19   122    80    25     7      254

Winter        0-1        -     1     3     3     4     1     -       12
              1-2        1     -     4    28     8     7     1       49
              2-3        -     -    18    47    19     5     -       89
              3-4        -     1    32   114    36    10     -      193
              4-5        -     -    11   114    33    18     -      176
              5-6        -     -     3    40    28    18     -       89
              6-7        -     -     -    11    13     6     -       30
              7-8        -     -     -     4     4     4     -       12
              8-9        -     -     -     4     -     -     -        4
             Total       1     2    71   365   145    69     1      654

Combined      0-1        1     1    10    27    18     9     1       67
              1-2        1     -     6    88    54    14     7      170
              2-3        -     -    27    82    38    12     -      159
              3-4        -     1    33   117    37    12     -      200
              4-5        -     -    11   114    33    19     -      177
              5-6        -     -     3    40    28    18     -       89
              6-7        -     -     -    11    13     6     -       30
              7-8        -     -     -     4     4     4     -       12
              8-9        -     -     -     4     -     -     -        4
             Total       2     2    90   487   225    94     8      908

     Prevailing wind speed and direction for a particular case is based
     on the average wind vector over the following outlying stations:
     STL006, STL007, STL008, EPA118, EPA119, EPA120.
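
The prevailing wind definition used for Table 8 (the average wind vector over six outlying stations) can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually coded for the study: the function name `prevailing_wind` is hypothetical, U is taken as positive toward the east and V as positive toward the north, the direction is the one from which the wind blows, and the eight 45-degree sectors centered on the compass points are an assumed binning.

```python
import math

SECTORS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def prevailing_wind(components):
    """Prevailing speed (m/s) and direction sector from station (U, V) pairs.

    `components` holds one (U, V) pair per reporting outlying station;
    stations with missing data are simply left out of the list.
    """
    u = sum(c[0] for c in components) / len(components)
    v = sum(c[1] for c in components) / len(components)
    speed = math.hypot(u, v)
    # Direction the wind blows FROM, in degrees clockwise from north.
    direction = math.degrees(math.atan2(-u, -v)) % 360.0
    sector = SECTORS[int((direction + 22.5) // 45.0) % 8]
    return speed, sector


southerly = prevailing_wind([(0.0, 3.0), (0.0, 1.0)])   # air moving northward
southwest = prevailing_wind([(2.0, 2.0)])               # air moving to the NE
print(southerly, southwest)
```

Averaging the components before computing speed means that stations with opposing winds partly cancel, so the prevailing speed can be lower than the mean of the station wind speeds.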




     An overall summary  of the  observed,  edited data is given in Table




9, which shows sample sizes  (N) and the means and standard deviations




(s.d.) of the wind components (denoted by U and V) and wind speeds  (de-




noted by W) for each station.   Some care must be exercised in comparing




the means of two  or more stations,  since  the averages are not neces-




sarily taken over the same set  of cases due to the presence of missing




data.

          TABLE 9.  SUMMARY STATISTICS BY STATION FOR
                    OBSERVED WIND DATA OVER ALL CASES

                      V MEAN    W MEAN    U S.D.    V S.D.    W S.D.
STATION       N       (mps)     (mps)     (mps)     (mps)     (mps)
STL008       343        ?       1.925     2.081     0.926     1.874
RTI202       493        ?       2.176     1.641     1.252     1.369
STL009       893        ?       3.076     1.423     1.953     1.689
STL006       893        ?       3.133     2.001     1.941     1.746
RTI205       692        ?       2.496     1.222     1.562     1.594
STL002       892        ?       3.725     1.802     1.782     1.604
RTI207       795      3.100     3.718     2.056     2.206     2.220
EPA101       741      3.189     3.846     1.837     1.659     1.360
EPA102       641      4.559     ?.527     2.669     2.274     1.929
EPA104       879      3.787     4.667     2.231     2.189     2.058
EPA105       890      3.2?7     3.610     1.634     1.778     1.646
EPA106       892      2.570     3.411     1.720     1.386     1.162
EPA108       697      4.233     ?.12?     2.671     2.968     2.044
EPA109       767      3.046     3.595     1.677     1.881     1.760
EPA110       778      2.969     3.830     2.246     1.904     2.029
EPA113       881      3.080     3.918     1.976     1.685     1.615
EPA118       799      4.496     5.201     2.426     2.580     2.448
EPA119       698      2.890     3.602     1.916     1.730     1.6?1
EPA120       833      2.305     3.007     1.718     1.464     1.352
STL007         ?      2.014     2.460     1.165     1.287     1.018
STL010         ?      0.0       0.0       0.0       0.0       0.0
EPA103         ?      4.522     5.483     2.590     2.437     2.115
STL003         ?      2.226     3.321     1.839     1.632     1.446
STL004         ?      3.095     3.851     2.230     1.776     1.782
EPA107         ?      2.756     3.314     1.479     1.480     1.274
EPA111         ?      3.532     4.313     2.189     1.905     1.703
EPA112         ?      2.680     3.383     1.793     1.564     1.411

     ? = entry or digit illegible in source.  The U MEAN column is not
     legible in the source.

-------
                             SECTION 3

                       EVALUATION TECHNIQUES



    The development of the RTI network was based on the 13-term model

form (Eq. (1)), as described in Vukovich et al. (1978).  Based on the

simulated data, it appeared reasonable to assume that a model of this

order of variability would yield adequate wind field predictions in all,

or virtually all, cases which might occur within the St. Louis area—if

actual and simulated winds behave similarly.  It was also apparent from

the simulated data that this order of variability (i.e., the full model)

would not necessarily be required in all, or even in most, cases.  That

is, some simpler model would be sufficient in the majority of cases.

Because fitting the full model in a case in which a simpler submodel is

appropriate can substantially decrease the precision of a predicted

value, the modeling procedure not only must provide estimates of the

regression coefficients but also must establish, through some variable

selection technique, the form of the model.  It should be noted that a

proper evaluation of the theoretical phase results requires that wind

fields be estimated via some submodel of the model given in Eq. (1).

In an actual implementation of the technique, however, other surface-

fitting techniques could be utilized for estimating the winds over the

region of interest.

     There are many possible variable selection procedures; in general,

three basic steps are involved:

     (1)  specification of a class of potential model forms from which
          the selection is to be made,

     (2)  determination of a single "good" model form from within this
          class, and

     (3)  estimation of the parameters of this model form.

-------
Step (1) was performed as a part of the network selection during the




theoretical phase of the study.  In the current context, the proposed




methodology can be considered successful in terms of selecting a model




form for estimating wind fields only if submodels of Eq. (1) can




provide "adequate" fits to the wind component data at any point in time.




In terms of both model and network selection, the methodology can be




considered successful for estimating wind fields if applying this model-




ing procedure to wind data from stations in the RTI network provides




accurate prediction of the winds over the region or, in practice, at




particular sites not in the RTI network.




     The first requirement for the evaluation is therefore to define the




wind field modeling procedure, i.e., to define precisely this aspect of




the methodology.  For this evaluation, the definition must be compatible




with the results of the theoretical phase and must therefore utilize the




model of Eq. (1) as its basis.  The second requirement is a definition




of alternative modeling procedures against which this procedure can be




compared.  The development of these modeling procedures is described in




the subsection below.




     The next step in the evaluation is to define measures of model




"adequacy" for making the comparisons among the alternative procedures.




Finally, measures of accuracy for judging the success of the overall




methodology (excluding the air quality predictions) are needed.  These




two steps are described in the last two subsections of this section.




     As indicated in the previous section, the 908 cases available for




evaluating the methodology do not constitute a probability sample of




time intervals; consequently, it was not possible to make valid statis-




tical inferences to the population of wind conditions occurring in St.
Louis during some given period  of  time  (e.g.,  one year).  On  the  other



hand, consideration of a  few  selected cases would also not appear to be



sufficient for evaluating the methodology.  Hence,  the basic  strategy



adopted for the evaluation involves  generating estimated wind fields for



all 908 cases, generating measures which reflect the precision and



accuracy of these estimates,  and then summarizing these measures  — in



terms of descriptive  statistics — over all cases and over various



subsets of cases.







SELECTION OF MODELING PROCEDURES



     The class of model forms indicated by the analysis of the simulated



wind data consists of all possible subsets of  the following twelve terms



(as defined in Eq. (1)):

     (x, y, x^2, y^2, xy, x^3, y^3, x^2 y, xy^2, x^4, y^4, h)          (4)



This assumes that a constant or intercept term would be required in any
selected model.  Thus there are 2^12 = 4,096 possible model forms (i.e.,
subsets of terms), which range in complexity from a constant, one-term
model (corresponding to the selection of no terms from (4)) to a full
13-term model corresponding to the selection of all 12 terms in (4).
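
The size of this class of model forms can be confirmed by enumeration. The sketch below evaluates the twelve candidate terms of (4) at a station location and counts the subsets; the function name `candidate_terms` and the example coordinates are hypothetical, and the intercept is assumed present in every model, as stated above.

```python
from itertools import combinations


def candidate_terms(x: float, y: float, h: float) -> dict:
    """The twelve candidate terms of (4) evaluated at location (x, y), elevation h."""
    return {
        "x": x, "y": y, "x^2": x ** 2, "y^2": y ** 2, "xy": x * y,
        "x^3": x ** 3, "y^3": y ** 3, "x^2 y": x ** 2 * y, "xy^2": x * y ** 2,
        "x^4": x ** 4, "y^4": y ** 4, "h": h,
    }


names = list(candidate_terms(0.0, 0.0, 0.0))

# Count every subset of the twelve terms, from the empty subset
# (intercept-only model) up to all twelve terms (full 13-term model).
n_forms = sum(1 for k in range(len(names) + 1)
              for _ in combinations(names, k))
print(len(names), n_forms)
```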



     Many algorithms  can  be used for selecting variables; however, most



of these procedures provide the user with a list of candidate models.



The user must then apply  some additional criterion  in order to arrive at



a single model form.  This is regarded as a major advantage of these



techniques; in the present context,  however, such techniques  are  not



practical unless one  can  also automate  the additional criterion because



of the large number of cases.   For instance, in this study, the user of



such a technique would have to  examine  1,816 lists  of candidate models
(908 cases x 2 wind components).  The burden on the user during an




actual implementation of the methodology would also be extreme if such




an approach were to be adopted.  Hence, one practical constraint on the




variable selection procedure to be used is that it be fully automated in




the sense that it incorporates its own stopping criterion and therefore




yields a single model for each individual case.  Even so, such an approach




cannot be advocated for general implementation unless, for selected




cases, one can carry out (a) an examination of residuals, and (b) a




comparison with alternative modeling procedures (such as all-possible




regressions).  Regardless of what procedure might be implemented, it




would also be essential that screening the wind data for erroneous




values precede the model fitting.




     Three sequential variable selection techniques which meet the above




described constraint are the forward selection technique, the backward




elimination technique, and the stepwise technique.  Draper and Smith




(1966) and Barr et al. (1976), for example, provide descriptions of




these techniques and their relative merits.  The stepwise procedure is




generally considered to be superior to either of the other techniques.




Also, the backward elimination technique, which successively deletes




terms from an assumed larger, "full" model, would encounter estimability




problems when the number of model terms exceeded the number of reporting




stations.  Hence, the stepwise regression approach was selected for use




in the evaluations.




     This procedure requires the use of two parameters referred to as




the "inclusion" and "retention" parameters.  As with the forward selec-




tion approach, the stepwise procedure begins by finding the best 2-




variable model; this assumes an intercept is always included and is
counted as one of the variables.  Here, variable A is considered better

than variable B if its correlation with the dependent variable  (i.e.,

the observed wind component data) is higher, or more generally, if the

partial F-statistic associated with variable A is larger than that for

variable B.  The variable with the largest F-statistic is retained in
the model if the significance probability associated with the F-statis-

tic is less than the "retention" parameter.  If so, partial F-statistics

associated with the remaining independent variables are computed and

their significance probabilities are compared with the "inclusion"

parameter; the variable with the smallest significance probability is

added if this probability is less than the "inclusion" parameter.  After

such a variable is added, partial F-statistics are computed for all

variables currently in the model to determine if any variable should be

deleted from the model.  A previously included variable is dropped if

its associated significance probability exceeds the "retention" parame-

ter.  After any such deletions have been made, the F-values for the

remaining variables are again determined to see if any meet the inclu-

sion criterion.  This process is continued until no variable can meet

the inclusion criterion or until deletion of the last included variable


occurs.
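
The stepwise procedure described above can be sketched in a few dozen lines. The code below is a minimal illustration and not the software used in the study: all function names are hypothetical, the least-squares fits are done by solving the normal equations directly, and the significance probability of each partial F-statistic is obtained by numerically integrating the Student-t density (a partial F with one numerator degree of freedom is the square of a t-statistic). It assumes the fitted models leave a nonzero residual.

```python
import math


def rss(X, y):
    """Residual sum of squares from a least-squares fit of y on the columns of X."""
    n, p = len(X), len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        for r in range(p):
            if r != k:
                f = A[r][k] / A[k][k]
                A[r] = [a - f * b for a, b in zip(A[r], A[k])]
    coef = [A[r][p] / A[r][r] for r in range(p)]
    return sum((y[i] - sum(X[i][c] * coef[c] for c in range(p))) ** 2
               for i in range(n))


def f_pvalue(F, df):
    """P(F(1, df) > F), by Simpson integration of the Student-t density."""
    t = math.sqrt(max(F, 0.0))
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda u: c * (1 + u * u / df) ** (-(df + 1) / 2)
    m = 2000                                   # even number of intervals
    h = t / m
    s = pdf(0.0) + pdf(t) + sum((4 if i % 2 else 2) * pdf(i * h)
                                for i in range(1, m))
    return max(0.0, 1.0 - 2.0 * s * h / 3.0)


def stepwise(terms, y, p_in=0.10, p_keep=0.10):
    """terms: dict of name -> list of values; returns the selected term names."""
    n, chosen = len(y), []

    def design(names):                         # intercept is always included
        return [[1.0] + [terms[t][i] for t in names] for i in range(n)]

    def partial_p(names, t):
        """Significance probability of the partial F for term t within names."""
        df = n - len(names) - 1
        full = rss(design(names), y)
        reduced = rss(design([s for s in names if s != t]), y)
        return f_pvalue((reduced - full) / (full / df), df)

    while n - len(chosen) - 2 > 0:             # keep residual d.f. positive
        rest = [t for t in terms if t not in chosen]
        if not rest:
            break
        p_add = {t: partial_p(chosen + [t], t) for t in rest}
        best = min(p_add, key=p_add.get)
        if p_add[best] >= p_in:
            break                              # no term meets the inclusion criterion
        chosen.append(best)
        while True:                            # retention check on all included terms
            p_now = {t: partial_p(chosen, t) for t in chosen}
            worst = max(p_now, key=p_now.get)
            if p_now[worst] <= p_keep:
                break
            chosen.remove(worst)
            if worst == best:
                return chosen                  # last included term was deleted
    return chosen


# Small synthetic check: y depends on x only; z is unrelated to y.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
e = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]
z = [1.0, -1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0]
y = [2.0 + 3.0 * xi + 0.1 * ei for xi, ei in zip(x, e)]
selected = stepwise({"x": x, "z": z}, y)
print(selected)
```

In an application to the wind data, `terms` would hold the twelve candidate terms of (4) evaluated at the reporting stations and `y` the observed values of one wind component for one case; modeling procedures 1 and 2 correspond to `p_in = p_keep = 0.10` and `p_in = p_keep = 0.20`, respectively.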

     Two pairs of inclusion and retention parameters were used:

          Modeling         Inclusion         Retention
          Procedure        Parameter         Parameter

              1              0.10              0.10
              2              0.20              0.20

These values were chosen, as opposed to smaller values, because of the

small effects expected for many of the candidate terms and because of

the small sample sizes — namely, about 15 stations per case for the RTI
network.  In such situations, the use of smaller parameter values is


generally not recommended because the derived models will tend to omit


one or more "good" predictors.


     Part of the evaluation procedure is thus a determination of which


of these procedures is the more appropriate.  Obviously, procedure 2


generates larger (i.e., more terms) models than does procedure 1; also,

procedure 2 produces larger $R^2$ statistics (the square of the multiple


correlation coefficient) and smaller residual sums of squares than pro-


cedure 1 achieves.  However, procedure 1 may produce better predictions


if procedure 2 tends to select "too many" terms.


     Another key question to be addressed in the evaluation involves the


choice of the initial class of model forms.  For instance, is there


another class of model forms which contains models that would provide


substantially better approximations to the wind fields in the St. Louis


area?  Obviously, this aspect of the evaluation can be carried out only


to a limited degree since there are an infinite number of possible model


forms which could be investigated.  The problem of evaluation is com-


pounded by the fact that many cases are involved.  In order to provide


some evaluation of this potential source of error in the methodology,


several other modeling procedures are considered.  Whereas procedures 1


and 2 above are consistent with the proposed methodology, these addi-


tional procedures, in one way or another, are inconsistent with it.


Hence, if performance of one of these additional procedures was judged


to be substantially superior to procedures 1 and 2, it would indicate a


deficiency in the proposed methodology.  On the other hand, "good"


performance by procedure 1 or 2 relative to the additional procedures


would tend to support this aspect of the methodology but would not, of
course, provide absolute proof of  it because  of  the  limitations involved

in the evaluation.

     The four additional modeling  procedures  used for the evaluation are

defined below:

     Modeling
     Procedure                          Description

         0          Fit the  full 13-term model by ordinary least
                    squares.

         3          Apply  stepwise regression to a larger class of model
                    forms, utilizing the same "inclusion" and "retention"
                    parameters as  used  for procedure 1.

         4          Same as  procedure 3, but  using the parameters of
                    procedure 2 rather  than those of procedure 1.

         5          Fit a  flat surface  (i.e., a  one-term model involving
                    the constant term)  by ordinary least squares.

As with procedures  1  and 2,  the above procedures are applied on a case-

by-case basis for each horizontal  wind  component.

     Procedures 0 and 5 represent  the extremes of the previously-defined

class  of model forms  used  in procedures 1 and 2.  These two procedures

are not considered  likely  candidates for modeling winds, but are defined

here because summary  statistics based on these procedures are used for

comparative purposes  in the  evaluation.

     Procedures 3 and 4 differ from procedures 1 and 2, respectively,

only in the choice  of initial terms from which a model is developed.

This initial class  of terms  for procedures 3  and 4 involves a total of

22 terms; in addition to all 12 terms shown in  (4),  the following 10

are also included:

     $$x^3y,\quad xy^3,\quad x^5,\quad y^5,\quad x^2y^2,\quad x^6,\quad y^6,\quad xh,\quad yh,\quad h^2.$$

The basis for  selecting  these  additional  terms was  the  analysis of  the

simulated data, as described in Vukovich et al. (1978).  This analysis
indicated that such terms, while less important than the 12-term  set,



were nevertheless useful for explaining some of the variation  in  some  of



the simulated cases.  The class of models based on the 22-term set

contains $2^{22}$ potential model forms; hence, this class is 1,024 times



larger than the class based on 12 terms.  Because the 12-term  set is a



subset of the 22-term set, it is clear that models based on procedure  3


(or 4) will generally explain more variation (i.e., larger $R^2$ values and



smaller residual sums of squares) than procedure 1 (or 2).  However, in



terms of accuracy of predictions, models based on procedure 1  or  2 could



still be superior to models based on procedures 3 or 4.



     All six of the modeling procedures described above can be regarded



as six different techniques for selecting a subset of 23 terms which



consists of an intercept plus 22 specific terms.  Let $\mathbf{x}$ denote the



column vector of these 23 terms at an arbitrary location in the St. Louis



area; that is,


     $$\mathbf{x}' = (1,\ x,\ y,\ x^2,\ y^2,\ xy,\ x^3,\ y^3,\ x^2y,\ xy^2,\ x^4,\ y^4,\ h,$$

     $$\qquad\ x^3y,\ xy^3,\ x^5,\ y^5,\ x^6,\ y^6,\ x^2y^2,\ xh,\ yh,\ h^2) \qquad (5)$$



Let $\boldsymbol{\beta}_{kt}$ denote a 23 x 1 vector of unknown coefficients for wind
component k at time t.  The general model can therefore be expressed as

     $$Z_{kt} = \mathbf{x}'\boldsymbol{\beta}_{kt} + e_{kt} \qquad (6)$$

where $Z_{kt} = Z_{kt}(x,y)$ is the observed value of the k-th wind component at
(x,y) and time t, and $e_{kt}$ is a random deviation in component k at time
t at the point (x,y).  Table 10 summarizes the six modeling procedures



with respect to this general model formulation.  As indicated in this



table, each procedure involves an assumption as to which coefficients



are negligible (i.e., which terms in the x vector are deleted).  Model



(6) reduces to model (1), for example, when the last ten terms of
(5) are assumed negligible.  Also, procedures 1 through 4 may deter-

mine, on the basis of  the statistical  tests involved in the stepwise

algorithm, that other  parameters can reasonably be assigned a zero

value.  In these cases, the selected model form will depend on what data
set is utilized, e.g., the full network or the RTI network.  Once the
model form has been determined, ordinary least squares is used for
estimating the parameters.

            TABLE 10.  SUMMARY OF MODELING PROCEDURES*

                                        Modeling Procedure
                              0       1       2       3       4       5

Coefficients assumed
to be non-zero:             1-13      1       1       1       1       1

Coefficients assumed
to be zero:                14-23   14-23   14-23    none    none    2-23

Coefficients which
may be zero, as
determined by stepwise
regression:                 none    2-13    2-13    2-23    2-23    none

Stepwise regression
parameters
  Inclusion:                 N/A     0.1     0.2     0.1     0.2     N/A
  Retention:                 N/A     0.1     0.2     0.1     0.2     N/A

      *  Term numbers appearing as tabular entries assume that terms
         are ordered as in definition (5).
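
The term ordering of definition (5) and the candidate sets of Table 10 can be written down compactly, which also makes the model-form counts quoted above easy to check. The sketch below is illustrative only; the names `term_vector`, `PROCEDURES`, and `n_model_forms` are hypothetical, and `h` is simply the quantity that enters definition (5) under that symbol.

```python
def term_vector(x, y, h):
    """The 23 terms of definition (5) evaluated at one location."""
    return [1.0, x, y, x**2, y**2, x*y, x**3, y**3, x**2*y, x*y**2, x**4, y**4, h,
            x**3*y, x*y**3, x**5, y**5, x**6, y**6, x**2*y**2, x*h, y*h, h**2]

# Term numbers are 1-based, as in Table 10.  "forced" terms are always in the
# model; "optional" terms may be dropped by stepwise regression; "p" is the
# common value of the inclusion and retention parameters (None if no stepwise).
PROCEDURES = {
    0: {"forced": list(range(1, 14)), "optional": [], "p": None},
    1: {"forced": [1], "optional": list(range(2, 14)), "p": 0.10},
    2: {"forced": [1], "optional": list(range(2, 14)), "p": 0.20},
    3: {"forced": [1], "optional": list(range(2, 24)), "p": 0.10},
    4: {"forced": [1], "optional": list(range(2, 24)), "p": 0.20},
    5: {"forced": [1], "optional": [], "p": None},
}

def n_model_forms(procedure):
    """Number of distinct model forms the procedure could return."""
    return 2 ** len(PROCEDURES[procedure]["optional"])
```

With these definitions, `n_model_forms(3) // n_model_forms(1)` evaluates to 1024, the factor by which the 22-term class exceeds the 12-term class.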



 CRITERIA FOR EVALUATING MODELING PROCEDURES

      Each of the  modeling procedures is applied,  at  a given point in

 time, to two sets of  wind data—the data from stations  in  the RTI

 network and the data  from all stations (i.e., the full  network).  Thus,
 for each case, twelve estimated wind fields are produced (2 networks x 6

 procedures),  as illustrated below:

     Network Used for                 Modeling Procedure
     Model Estimation        0     1     2     3     4     5

          RTI
          Full

     For evaluating the modeling procedures, data  from the full network
 (F) are utilized to determine the model forms and  to  estimate parame-
 ters.  For each case  (t) and wind component  (k), six  models are there-
 fore estimated.
     Assume that there are $n_t(F)$ stations in the full network which
 provide "valid" wind data at time t.  Let $p_{kt}(j,F)$ denote the number
 of terms in the (selected) model when procedure j is applied to this set
 of data.  Let $X^*$ denote a matrix consisting of $n_t(F)$ rows (one row
 corresponding to each reporting station) and 23 columns; the i-th row
 consists of the $\mathbf{x}'$ vector (5) evaluated at the coordinates of
 the i-th station.  Once the form of the model has been established, the
 least squares estimates are determined as

     $$\hat{\boldsymbol{\beta}}^{(F)}_{jkt} = (X_{jkt}' X_{jkt})^{-1} X_{jkt}' \mathbf{Z}_{kt}$$

 where
     $\hat{\boldsymbol{\beta}}^{(F)}_{jkt}$ is the vector of $p_{kt}(j,F)$ estimated coefficients from
          procedure j, applied to the full network (F),
     $\mathbf{Z}_{kt}$ is the vector of observed data, $Z_{kt}(x_i, y_i)$, $i = 1, 2, \ldots, n_t(F)$,
          and
     $X_{jkt}$ is a matrix obtained by deleting those columns (terms) of $X^*$
          that are associated with zero regression coefficients, as
          indicated by the particular procedure (see Table 10).
At an arbitrary point (x,y) in the region, the six predicted values of
the wind component are obtained as

     $$\hat{Z}_{jkt} = \mathbf{x}_{jkt}'\, \hat{\boldsymbol{\beta}}^{(F)}_{jkt} \qquad (7)$$

where $\mathbf{x}_{jkt}$ consists of the relevant model terms.  Hence, if coordinates
of the i-th station, $(x_i, y_i)$, are substituted into (7), predicted
values for this station are determined.  Let $\hat{Z}_{kt}(j,F,i)$ denote the
predicted value of the k-th wind component for case t and station i, when
procedure j is applied to the full network (denoted by F).  The observed
wind component at station i for case t is denoted by $Z_{kt}(i)$, i.e.,
$Z_{kt}(i) = Z_{kt}(x_i, y_i)$.  For each value of k, t, and i, there are six
deviations between observed and predicted values:

     $$e_{kt}(j,F,i) = Z_{kt}(i) - \hat{Z}_{kt}(j,F,i), \qquad j = 0, 1, \ldots, 5. \qquad (8)$$


These deviations form the  basis for evaluating the modeling procedures.

     It should be noted that the mean of these deviations is zero when

the average is taken over stations in the full network (denoted by $i \in F$);

that is,

     $$\sum_{i \in F} e_{kt}(j,F,i) = 0 \quad \text{for all } k,\ t, \text{ and } j. \qquad (9)$$

In this same  situation, the residual sums of squares corresponds to the

sum of the squared deviations:


     $$\sum_{i \in F} e_{kt}^2(j,F,i) = [\,n_t(F) - p_{kt}(j,F)\,]\; s_{kt}^2(j,F) \qquad (10)$$

where
     $n_t(F)$ = number of stations in network F providing valid data at
                time t,

     $p_{kt}(j,F)$ = number of terms in the selected model when procedure j
                is applied to data from network F, for component k at
                time t, and

     $s_{kt}^2(j,F)$ = the residual variance from the model based on procedure
                j when it is applied to data from network F, for
                component k at time t.

To simplify notation, let SSE(j) denote the sums of squared deviations
appearing in (10), for an arbitrary k and t.  Then, as previously
indicated, the following conditions must hold:

     $$SSE(0) \le SSE(2) \le SSE(1) \le SSE(5) \qquad (11)$$

     $$SSE(4) \le SSE(3) \le SSE(5). \qquad (12)$$

The following conditions also usually, though not necessarily, hold:

     $$SSE(3) \le SSE(1) \qquad (13)$$

     $$SSE(4) \le SSE(2). \qquad (14)$$


     For an individual case and wind component, typical ways for evalu-


ating the fits of various models are


     (a)  comparison of individual residuals


     (b)  comparison of frequency distributions of the residuals or of

          absolute values of residuals — or equivalently, the propor-

          tion of residuals less than some constant


     (c)  comparison of residual variances

     (d)  comparison of $R^2$ statistics

     (e)  comparison of adjusted $R^2$ statistics.


The residual variances are defined in (10) and can be rewritten as

     $$s_{kt}^2(j,F) = \frac{SSE(j)}{n_t(F) - p_{kt}(j,F)}. \qquad (15)$$

The $R^2$ statistics for a particular case are defined as

     $$R_{kt}^2(j,F) = \frac{SSE(5) - SSE(j)}{SSE(5)}. \qquad (16)$$

The adjusted $R^2$ statistics for a particular case are given by

     $$R_{a,kt}^2(j,F) = 1 - \frac{s_{kt}^2(j,F)}{s_{kt}^2(5,F)}. \qquad (17)$$

     It should be noted that $R^2$ statistics are highly dependent on model


size in situations where the number of parameters is large relative  to


the number of observations.  The same is true, to a lesser degree, for
the residual variance criterion.  The adjusted $R^2$ statistics avoid this


problem.
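
For a single case and wind component, criteria (c) through (e) can be computed from the observed and fitted values as sketched below. The function name `fit_statistics` is hypothetical, and the adjusted $R^2$ is written in the standard form, one minus the ratio of the residual variance to the variance about the case mean (the residual variance of the one-term model); that form is an assumption of this sketch.

```python
def fit_statistics(z, fitted, p):
    """Residual variance, R^2 and adjusted R^2 for one case.

    z      -- observed wind component values at the n reporting stations
    fitted -- least squares predictions at the same stations
    p      -- number of terms in the fitted model, intercept included
    """
    n = len(z)
    mean = sum(z) / n
    sse5 = sum((v - mean) ** 2 for v in z)               # SSE(5): constant-only fit
    sse = sum((v - f) ** 2 for v, f in zip(z, fitted))   # SSE(j), as in (10)
    s2 = sse / (n - p)                                   # residual variance
    return {
        "sse": sse,
        "s2": s2,
        "r2": (sse5 - sse) / sse5,
        "adj_r2": 1.0 - s2 / (sse5 / (n - 1)),
    }
```

For example, with observations `[0, 2, 1, 3]` and the straight-line fit `[0.3, 1.1, 1.9, 2.7]` (p = 2), the function returns a residual variance of 0.9, an $R^2$ of 0.64, and an adjusted $R^2$ of 0.46; the gap between the last two figures widens as p approaches n.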


     The general strategy  for  comparing modeling procedures over cases


involves (a) computing  the above-described  statistics for each case and


summarizing the distributions  of  such  statistics over all cases or over


relevant subsets of  cases  or  (b)  computing  analogous statistics "pooled"


over all cases or  over  relevant subsets of  cases.  The subsets of pri-


mary interest are  the following:


     -     season (i.e., the winter or summer field program),


     -     prevailing wind speed categories (0-2 mps, 2-4 mps, 4-6 mps,
           >6 mps),


     -     prevailing wind direction categories (E & SE, S, SW, other).


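     The pooled residual variance over an arbitrary subset of cases
(say, C) is defined as

     $$s_k^2(j,F,C) = \frac{\sum_{t \in C} [\,n_t(F) - p_{kt}(j,F)\,]\; s_{kt}^2(j,F)}{\sum_{t \in C} [\,n_t(F) - p_{kt}(j,F)\,]}. \qquad (18)$$

Note that this is a weighted average of the individual residual
variances.  The corresponding overall $R^2$'s are obtained as the proportions

     $$R_k^2(j,F,C) = \frac{\sum_{t \in C} SSE(5) - \sum_{t \in C} SSE(j)}{\sum_{t \in C} SSE(5)}. \qquad (19)$$

The associated adjusted $R^2$'s are computed as

     $$R_{a,k}^2(j,F,C) = 1 - \frac{s_k^2(j,F,C)}{s_k^2(5,F,C)}. \qquad (20)$$
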
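The pooled criteria can be accumulated over a subset of cases as sketched below. The function name `pooled_criteria` and the dictionary keys are hypothetical, and the adjusted $R^2$ is taken as one minus the ratio of the pooled residual variance to the pooled variance of the one-term model, which is an assumption of this sketch. Since $[n_t(F) - p_{kt}(j,F)]\,s_{kt}^2(j,F)$ equals SSE(j) for each case, the weighted average of variances reduces to a ratio of sums.

```python
def pooled_criteria(cases):
    """Pooled residual variance, R^2 and adjusted R^2 over a subset of cases.

    Each case is a dict with keys:
      n    -- number of reporting stations
      p    -- number of terms in the selected model
      sse  -- residual sum of squares of the selected model
      sse5 -- residual sum of squares of the constant-only model
    """
    df = sum(c["n"] - c["p"] for c in cases)       # pooled residual df
    df5 = sum(c["n"] - 1 for c in cases)           # pooled df, one-term model
    sse = sum(c["sse"] for c in cases)
    sse5 = sum(c["sse5"] for c in cases)
    s2 = sse / df                                  # weighted average of the s^2 values
    return {
        "s2": s2,
        "r2": (sse5 - sse) / sse5,
        "adj_r2": 1.0 - s2 / (sse5 / df5),
    }
```

With two made-up cases, `{"n": 4, "p": 2, "sse": 1.8, "sse5": 5.0}` and `{"n": 6, "p": 3, "sse": 1.2, "sse5": 4.0}`, the individual residual variances are 0.9 and 0.4 with weights 2 and 3, and the pooled value is 0.6.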
     Because the above-described criteria are based on residuals which




result from fitting models to the full network data, they cannot provide




a thorough evaluation of the modeling procedures.  One option, for




example, for extending the evaluation would be to apply the modeling




procedures to various subsets of the full network obtained by deleting




one or several data points and to examine the distributions of residuals




occurring at the omitted points.  Except for one special case, such a




procedure was not employed because of the large number of cases involved.




The special case involved selecting the subset of stations to be the RTI




network.  Then, such a procedure leads to a joint evaluation of the




modeling procedures and the RTI network—that is, an overall evaluation




of the proposed methodology for estimating wind fields.  This is discussed




in the following subsection.









CRITERIA FOR EVALUATING THE RTI NETWORK




     The remaining six predicted wind fields—those based on data from




stations in the RTI network—are utilized for evaluating the performance




of the RTI network.  This network evaluation can be carried out only in




a limited sense.  In particular, most of the evaluative measures must be




judged in terms of absolute, rather than relative, units, since observed




data are available at only a limited number of sites.  That is, there is




little "feel" for how well some other network of comparable size might




have performed.  Although comparing the performance of the RTI network




with that of the full network is useful in terms of the overall wind




prediction capability of the network, it does not provide a separate




evaluation of the RTI network.  Rather, differences in the evaluative




criteria for the two networks generally represent measures which reflect
both network and model differences.   Such differences do, however, yield
a limited comparative evaluation.
     The evaluative criteria  for  this aspect of  the evaluation are of
two basic types.  Both types  are  based on the deviations between ob-
served and predicted wind  component values, where  the predicted values
are determined by applying the modeling procedures to data from stations
in the RTI network.
     The first type of criteria, and their formulation and properties,
are completely analogous to those described in the previous subsection.
These criteria are obtained by using "R" to represent the RTI network
and substituting R for F into (7) through (20).  Such criteria do not,
of course, provide measures of the accuracy of the predictions but
simply characterize the predictions over the RTI network itself.
     Criteria of the  second type  do provide such measures of accuracy
and, consequently, are regarded as the more important type.  Let Q
denote some  subset of the  non-network stations.  For instance, Q might
represent
     (a)  the non-network,
     (b)  the inner non-network,
     (c)  the outer non-network,  or
     (d)  an individual station within the non-network.
Let C define subsets  of cases, as previously described.  The accuracy
measures are of  three basic types:
     (1)  means  of deviations, over C and Q,
     (2)  means  of squared deviations (or the  square root  thereof),  over
          C  and  Q, and
     (3)  frequency distributions of  deviations, over  C  and  Q.
     The mean deviations are defined, for each modeling procedure (j)
and each wind component (k), as

     $$\frac{1}{N_{CQ}} \sum_{t \in C} \sum_{i \in Q} e_{kt}(j,R,i) = \frac{1}{N_{CQ}} \sum_{t \in C} \sum_{i \in Q} [\,Z_{kt}(i) - \hat{Z}_{kt}(j,R,i)\,] \qquad (21)$$

where $N_{CQ} = \sum_{t \in C} n_t(Q)$ = the number of observed values occurring at
stations in the Q subset and within the set of cases C.

These means represent average biases over the particular subsets of
cases and non-network stations.  The root mean squared error (RMSE)
criteria are determined as the square root of

     $$\sum_{t \in C} \sum_{i \in Q} e_{kt}^2(j,R,i) \,/\, N_{CQ}. \qquad (22)$$

A mean squared error criterion for the Q and C subsets which encompasses
errors in both the U- and V-components is obtained by summing (22)
over k.  This criterion, referred to as the vector mean square error,
represents an average of the squared differences between the observed
and predicted wind vectors in terms of distances in the (U, V) plane.
The vector MSE can be partitioned into two components which represent
the mean squared errors in predicting wind speeds and in predicting wind
directions:

     $$\text{Vector MSE} = \frac{1}{N_{CQ}} \sum_{k=1}^{2} \sum_{t \in C} \sum_{i \in Q} [\,Z_{kt}(i) - \hat{Z}_{kt}(j,R,i)\,]^2$$

     $$= \frac{1}{N_{CQ}} \sum_{t \in C} \sum_{i \in Q} \Big\{ \sum_{k=1}^{2} Z_{kt}^2(i) + \sum_{k=1}^{2} \hat{Z}_{kt}^2(j,R,i) - 2 \sum_{k=1}^{2} Z_{kt}(i)\, \hat{Z}_{kt}(j,R,i) \Big\}$$

     $$= \frac{1}{N_{CQ}} \sum_{t \in C} \sum_{i \in Q} [\,W_t(i) - \hat{W}_t(j,R,i)\,]^2 + \frac{2}{N_{CQ}} \sum_{t \in C} \sum_{i \in Q} W_t(i)\, \hat{W}_t(j,R,i)\, \{1 - \cos[\theta_t(i) - \hat{\theta}_t(j,R,i)]\} \qquad (23)$$

where $W_t(i)$ and $\theta_t(i)$ are the observed wind speed and direction,
               respectively, for case t at station i, and
      $\hat{W}_t(j,R,i)$ and $\hat{\theta}_t(j,R,i)$ are the corresponding predicted values
               based on applying procedure j to data from the RTI
               network.
The first component in (23) is the MSE associated with predicting wind
speeds; the second term is the MSE associated with direction errors.
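The partition in (23) follows from the law of cosines applied to each pair of observed and predicted wind vectors, and it can be verified numerically. The sketch below is illustrative; the function name `vector_mse_parts` is hypothetical, and directions are measured as mathematical angles in the (U, V) plane, which leaves the cosine of the direction difference unchanged relative to a compass convention.

```python
import math

def vector_mse_parts(observed, predicted):
    """Return (vector MSE, speed component, direction component).

    observed, predicted -- equal-length lists of (u, v) wind component pairs,
    one pair per station and case in the subset being summarized.
    """
    n = len(observed)
    total = speed = direction = 0.0
    for (u, v), (uh, vh) in zip(observed, predicted):
        w, wh = math.hypot(u, v), math.hypot(uh, vh)       # observed, predicted speed
        dtheta = math.atan2(v, u) - math.atan2(vh, uh)     # direction difference
        total += (u - uh) ** 2 + (v - vh) ** 2
        speed += (w - wh) ** 2
        direction += 2.0 * w * wh * (1.0 - math.cos(dtheta))
    return total / n, speed / n, direction / n
```

For the two made-up pairs observed `[(3, 4), (1, 0)]` and predicted `[(4, 3), (0, 2)]`, the vector MSE is 3.5, of which 0.5 is attributable to speed errors and 3.0 to direction errors.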
                             SECTION 4
                        EVALUATION RESULTS


     Although it would be desirable to have evaluative measures which
would isolate the effects of the modeling procedures from those of the
network, this is not really feasible—because evaluation of the proce-
dures is conditional on what network is used and because evaluation of
the network requires that a given model be employed across a number of
alternative networks.  Consequently, the evaluation is organized as fol-
lows:
     (1)  Comparing modeling procedures over the full network when data
          from the full network are used to establish model forms and
          parameter estimates.
     (2)  Comparing modeling procedures over the RTI network when data
          from the RTI network are used to establish the model forms and
          parameter estimates and comparing results to those of (1)
          above.
     (3)  Evaluating the accuracy of predicted results at non-network
          stations, when estimation has been carried out using data from
          the RTI network stations.
It should be noted that (1) and (2) above, in contrast to (3), are
basically concerned with precision.  Also, (1) deals with the criteria
discussed in the second subsection of Section 3, whereas (2) and (3)
deal with the two types of criteria discussed in the last subsection of
that section.
     Before discussing the results of these evaluations, it is useful to
characterize the modeling procedures involving stepwise regression in
terms of the model forms that resulted from applying the algorithms.
This is the purpose of the subsection below.
SUMMARY OF SPECIFIC MODELS SELECTED  BY ALTERNATIVE APPROACHES




     Each of the four stepwise  regression  procedures  (procedures 1-4 of




Table 10) was applied to data from the RTI and  the full networks.  The




RLSTEP subroutine of the International Mathematical and Statistical




Libraries, Inc. (1975) was utilized to perform the stepwise regressions.




Potentially, eight different model forms could  result  for each specific




case and wind component.  Table 11 characterizes  the models selected by




the four procedures in terms of the  frequency with which various size




models result.   This table demonstrates that models containing more than




four or five terms are rarely selected.  As expected,  models from proce-




dure 4 are larger than those from procedure 3;  similarly, models from




procedure 2 are  larger than those from procedure  1.  The pattern of




these distributions is similar  for both the U-  and V-components; however,




smaller size models are much more frequent for  the U-component.  When




the same modeling procedure is  applied to  the two different networks,




there is a tendency for  the full network cases  to yield slightly larger




models; this, of course,  is not surprising because of  the increase in




statistical power which  results from the larger number of stations used




in the full-network estimations.




     Table 11 also indicates  the number of times  (out  of 908 cases) that




flat-surface models are  chosen  (i.e., the  number  of 1-term, constant




models).  This  is summarized, in terms of  percentages, below:





     Percentage of Cases in Which One-Term Models are Selected

                    RTI Network Data         Full Network Data
     Procedure         U        V               U        V
         1           36.2     15.2            30.4     10.2
         2           15.1      5.5            11.2      4.0
         3           28.3     10.1            20.5      7.8
         4            8.6      3.3             4.8      2.5
TABLE 11.  DISTRIBUTION OF CASES BY MODEL SIZE, FOR FOUR MODELING
           PROCEDURES APPLIED TO WIND COMPONENT DATA FROM STATIONS
           IN THE RTI NETWORK AND THE FULL NETWORK

 No. of          Using Data From              Using Data From
 Terms in          RTI Network                  Full Network
 Selected       Modeling Procedure           Modeling Procedure
 Model         1     2     3     4          1     2     3     4

                              U-Component
    1        329   137   257    78        276   102   186    44
    2        362   273   328   200        348   231   302   148
    3        136   223   190   231        161   233   227   243
    4         54   119    81   169         64   144   112   193
    5         19    86    24    79         48   110    52   112
    6          6    33    14    59          7    45    16    72
    7          2    21     9    37          4    23     7    43
    8          0     6     2    22          0    15     2    29
    9          0     6     2    19          0     1     3    12
   10          0     3     1     9          0     4     1     5
   11          0     1     0     2          0     0     0     6
   12          0     0     0     3          0     0     0     1

                              V-Component
    1        138    50    92    30         93    36    71    23
    2        287   127   234    74        237    84   189    59
    3        263   238   277   167        318   187   267   155
    4        155   193   172   184        162   233   220   190
    5         43   149    82   160         62   173    98   175
    6         13    79    33   112         27   122    49   133
    7          5    35    12    76          6    48     9    94
    8          2    25     5    40          2    19     3    45
    9          2     8     1    32          1     5     2    19
   10          0     4     0    21          0     1     0    12
   11          0     0     0     4          0     0     0     0
   12          0     0     0     5          0     0     0     1
   13          0     0     0     0          0     0     0     2
   14          -     -     0     1          -     -     0     0
   15          -     -     0     1          -     -     0     0
   16          -     -     0     1          -     -     0     0

These percentages indicate  how frequently each procedure  yields  a model




like the procedure 5 model.  These cases are important in that no variation

is accounted for by such models (i.e., $R^2 = 0$).
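
The percentages of one-term models can be recomputed from the one-term rows of Table 11 and the 908 cases. The sketch below does this; the names `ONE_TERM` and `one_term_percentages` are hypothetical, and the counts are transcribed from the first row of each block of Table 11.

```python
N_CASES = 908

# Number of cases in which a one-term (constant-only) model was selected,
# listed for procedures 1 through 4 (first row of each block of Table 11).
ONE_TERM = {
    ("RTI", "U"): [329, 137, 257, 78],
    ("RTI", "V"): [138, 50, 92, 30],
    ("Full", "U"): [276, 102, 186, 44],
    ("Full", "V"): [93, 36, 71, 23],
}

def one_term_percentages(counts, n_cases=N_CASES):
    """Convert case counts to percentages of all cases."""
    return [100.0 * c / n_cases for c in counts]
```

For instance, `one_term_percentages(ONE_TERM[("RTI", "U")])[0]` is 100 x 329/908, or about 36.2 percent.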




     Table 12 provides  pairwise comparisons  of the network/modeling pro-




cedures in terms of their model sizes and model forms.  The  two  methods




involved in a comparison  (denoted by  method  A and method  B in Table 12)




can differ in several ways,  as indicated  below:

                   Initial         Stepwise        Network
   Type of         Class of        Regression      Used as
   Comparison      Model Forms     Parameters      Data Base

      I            Different       Same            Same
      II           Same            Different       Same
      III          Different       Different       Same
      IV           Same            Same            Different
      V            Different       Same            Different
      VI           Same            Different       Different
      VII          Different       Different       Different

 As  might be expected,  similar size and similar  form models  occur more




 frequently when the two methods being compared  are  more  alike — for




 example, types I,  II,  and IV as compared to type VII.  Table 13 shows




 the results of Table 12 relating to the similarity  of model forms  in




 terms of percentages.



      Out of the 908 cases,  the number of times  that each of the 23 model




 terms occurred in a selected model is shown in  Table  14. Because  many




 of  the potential terms are highly correlated,  the inclusion of a particu-




 lar term in a model is highly dependent on what other  terms are involved




 in  the model; also, there are likely to be many models with essentially




 the same predictive capability.  Thus the results of  Table  14 merely




 provide a descriptive summary of the selected models  and should be so




 interpreted.




TABLE 12.  PAIRWISE COMPARISONS OF MODELING PROCEDURES IN TERMS OF MODEL SIZES AND MODEL FORMS

                             U-Component                          V-Component
                        Number of Cases With:                Number of Cases With:
 Type of   Methods†   Method A Method B  Same   Same      Method A Method B  Same   Same
 Compar-               Model    Model    Model  Model      Model    Model    Model  Model
 ison       A    B    Smaller  Smaller   Size   Form      Smaller  Smaller   Size   Form

 I         1R   3R      212       35     661    515         258       67     583    401
           2R   4R      367       91     451    263         426      123     359    167
           1F   3F      262       47     599    422         254       51     603    412
           2F   4F      393      119     396    192         367      123     418    187

 II        1R   2R      488        0     420    420         536        0     372    369
           3R   4R      566        2*    340    336         603        2*    303    296
           1F   2F      516        0     392    388         568        0     340    335
           3F   4F      573        0     335    329         585        0     323    308

 III       1R   4R      662       14     232    172         678       14     216    122
           3R   2R      410       95     403    258         457       97     354    186
           1F   4F      691       14     203    125         688       19     201    121
           3F   2F      419       93     396    217         476       74     358    172

 IV        1R   1F      212      112     578    511         267      156     485    390
           2R   2F      281      162     465    337         331      224     353    225
           3R   3F      265      140     503    405         296      206     406    265
           4R   4F      333      215     360    223         318      314     276    119

 V         1R   3F      398       77     433    278         401      126     381    221
           2R   4F      486      128     294    104         486      198     224     67
           1F   3R      248      186     474    322         291      229     388    204
           2F   4R      353      226     329    119         419      256     233     52

 VI        1R   2F      566       22     320    262         617       45     246    177
           3R   4F      640       41     227    154         616       70     222    116
           1F   2R      431       36     441    365         490       80     338    250
           3F   4R      483       64     361    256         570       69     269    157

 VII       1R   4F      722       18     168     94         708       42     158     76
           3R   2F      506       96     306    166         543      112     253     80
           1F   4R      608       44     256    146         648       62     198     87
           3F   2R      343      170     395    194         429      174     305    137

  †  The notation 1R means modeling procedure 1 applied to the RTI network; similarly, 2F means procedure 2
     using data from the full network of stations.  See Table 10 for definitions of modeling procedures.

  *  This number is not zero, because procedure 4R failed to meet round-off tolerances in these cases and the
     "selected" model from procedure 4R was defined to be the constant, one-term model.

TABLE 13.  PERCENTAGE OF 908 CASES IN WHICH NETWORK/MODELING
           PROCEDURES RESULTED IN THE SAME MODEL FORM

                              U-Component
 Method     2R     3R     4R     1F     2F     3F     4F
   1R     46.3   56.7   18.9   56.3   28.9   30.6   10.4
   2R            28.4   29.0   40.2   37.1   21.4   11.5
   3R                   37.0   35.5   18.3   44.6   17.0
   4R                          16.1   13.1   28.2   24.6
   1F                                 42.7   46.5   13.8
   2F                                        23.9   21.1
   3F                                               36.2

                              V-Component
 Method     2R     3R     4R     1F     2F     3F     4F
   1R     40.6   44.2   13.4   43.0   19.5   24.3    8.4
   2R            20.5   18.4   27.5   24.8   15.1    7.4
   3R                   32.6   22.5    8.8   29.2   12.8
   4R                           9.6    5.7   17.3   13.1
   1F                                 36.9   45.4   13.3
   2F                                        18.9   20.6
   3F                                               33.9

TABLE 14.  NUMBER OF CASES FOR WHICH SPECIFIC MODEL TERMS ARE SELECTED, BY WIND COMPONENT, MODELING
           PROCEDURE AND NETWORK

              Models for U-Component Predictions          Models for V-Component Predictions
            Using Data From      Using Data From        Using Data From      Using Data From
              RTI Network          Full Network           RTI Network          Full Network
 Term      Modeling Procedure   Modeling Procedure     Modeling Procedure   Modeling Procedure
 Number*     1    2    3    4     1    2    3    4       1    2    3    4     1    2    3    4

    1      908  908  908  908   908  908  908  908     908  908  908  908   908  908  908  908
    2       97  168   75  132   121  192  105  158     389  506  272  355   460  537  392  463
    3       74  135   64  106    88  156   91  144     120  193   96  193   107  189   97  175
    4       34   98   28   76    75  171   56  119      39  135   31   79    66  152   41   99
    5      140  247  123  195   151  249  114  152     187  290  154  252   177  270  136  193
    6       53  129   40  116    77  172   57  125      75  206   54  133    83  222   78  132
    7       74  175   27   74    97  193   40   99      79  209   39  151   123  270   71  151
    8       62  119   29   67    58  115   22   50      91  169   39   85    98  185   36   83
    9       34  112   42  112    48  133   61  132     104  213   88  176   156  276  143  238
   10      138  212  117  166   107  188   94  151     242  279  245  287   193  229  179  219
   11       42  111    8   28    62  129   12   38      38   96   10   50    49  133   14   42
   12       68  136   38   74    99  162   37   62      98  187   46   95   129  226   70  110
   13       98  196   71  142   130  221   89  150     113  175   86  135   159  240  124  186
   14        -    -   48  112     -    -   63  129       -    -   51  119     -    -   44  114
   15        -    -   38   95     -    -   45   98       -    -   44  107     -    -   34   98
   16        -    -   78  149     -    -   89  169       -    -   54  130     -    -   71  139
   17        -    -   38   65     -    -   37   60       -    -   58  125     -    -   65  118
   18        -    -   38   83     -    -   51  100       -    -   29   74     -    -   45   99
   19        -    -   35   62     -    -   47   96       -    -   64  122     -    -   67  116
   20        -    -  125  220     -    -  168  244       -    -  126  206     -    -  137  189
   21        -    -   54  109     -    -   70  126       -    -  186  272     -    -   98  183
   22        -    -   41  126     -    -   63  156       -    -   89  195     -    -  119  222
   23        -    -   53  133     -    -   58  136       -    -   51  137     -    -   50  109

       *  Terms are assumed to be ordered as in definition (5).

COMPARISON OF ALTERNATIVE MODELING  PROCEDURES USING WIND DATA FROM ALL
STATIONS


     Overall analyses of the data from  the  summer  and winter field


programs are shown in Table 15.  For  these  analyses, all data from each


season are pooled together.  The total  sums of  squares among the 254


summertime cases and the 654 wintertime cases are  each partitioned into


a between-case and a within-case component.  The various modeling proce-


dures are applied on a case-by-case basis and can  therefore have no


effect on the between-case component  of variation.  The within-case


component, which corresponds to fitting a constant for each case (i.e.,


using modeling procedure 5), can then be partitioned into a "pooled


regression" and a "pooled residual" component for  each of the modeling


procedures.  Only the latter of these two components is actually shown


in Table 15.
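
The bookkeeping behind this partition can be checked against the entries of Table 15. In the sketch below (function names hypothetical), the total mean square is rebuilt from the between-case and within-case lines, and the residual degrees of freedom for procedure 0 are obtained by removing 12 further degrees of freedom per case from the within-case line, on the assumption that every case is fitted with the full 13-term model.

```python
def total_mean_square(df_between, ms_between, df_within, ms_within):
    """Recombine between- and within-case sums of squares into the total mean square."""
    ss_total = df_between * ms_between + df_within * ms_within
    return ss_total / (df_between + df_within)

def full_model_residual_df(df_within, n_cases, n_extra_terms=12):
    """Residual df for procedure 0: each case spends n_extra_terms df beyond its constant."""
    return df_within - n_extra_terms * n_cases
```

For the summer U-component, `total_mean_square(253, 16.7408, 5247, 0.6528)` reproduces the tabulated total mean square of 1.3928 to within rounding, and `full_model_residual_df(5247, 254)` gives 2199.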


     The results shown in Table 15  are  utilized to compute values  of the


"pooled" criteria described in the  second subsection of Section 3.


These results are presented in Table  16 for each of the two seasons.  As


expected, all of the stepwise procedures result in smaller pooled residual

standard deviations and larger adjusted $R^2$'s than either procedure


0 or procedure 5.  In terms of these  measures of overall precision,


modeling procedure 4 is clearly superior for both  wind components  and


both seasons.  Modeling procedure 2 yields models which achieve, on


average, virtually the same precision as modeling  procedure 3.   How-


ever, it requires an average of about one more  term per case than  proce-


dure 3 requires.  On the average, procedure 1 involves fewer terms than


the other stepwise procedures and,  in terms of  the pooled precision


measures, appears the least favorable among the four stepwise


procedures.
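
As a hedged illustration of how the pooled criteria relate to the analysis of variance entries, the sketch below recomputes the summer U-component entries of Table 16 from the corresponding rows of Table 15. The formulas are assumptions (the usual definitions; the report's own definitions are in Section 3, which is not reproduced here), and the function and variable names are illustrative. Under these assumptions the rounded results agree with the tabulated values.

```python
import math

N_OBS = 5501      # summer observations (total degrees of freedom + 1)
N_CASES = 254     # summertime cases

# "Within Cases" row of Table 15, summer, U-component.
DF_WITHIN, MS_WITHIN = 5247, 0.6528

# procedure -> (residual degrees of freedom, residual mean square in mps^2)
residual = {
    0: (2199, 0.6279),
    1: (4942, 0.5574),
    2: (4628, 0.5033),
    3: (4829, 0.5029),
    4: (4490, 0.4495),
}


def pooled_criteria(df_res, ms_res):
    """Assumed definitions of the four pooled criteria."""
    avg_terms = (N_OBS - df_res) / N_CASES          # terms per case, incl. intercept
    std_dev = math.sqrt(ms_res)                     # pooled residual std. dev. (mps)
    r2 = 1.0 - (df_res * ms_res) / (DF_WITHIN * MS_WITHIN)
    adj_r2 = 1.0 - ms_res / MS_WITHIN
    return avg_terms, std_dev, r2, adj_r2


for proc, (df_res, ms_res) in residual.items():
    terms, sd, r2, adj = pooled_criteria(df_res, ms_res)
    print(f"procedure {proc}: terms={terms:.1f} sd={sd:.2f} R2={r2:.2f} adjR2={adj:.2f}")
```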

-------
TABLE 15.  SUMMARY OF ANALYSIS OF VARIANCE RESULTS BASED ON ESTIMATIONS
           FROM THE FULL NETWORK

                          Summer Field Program

Source of           Degrees of Freedom        Mean Squares (mps²)
Variation*              U          V              U            V

Total                5500       5500         1.3928       1.9584
Between Cases         253        253        16.7408      21.9731
Within Cases         5247       5247         0.6528       0.9933
Residual (0)         2199       2199         0.6279       0.7480
Residual (1)         4942       4773         0.5574       0.6770
Residual (2)         4628       4450         0.5033       0.5996
Residual (3)         4829       4708         0.5029       0.6373
Residual (4)         4490       4327         0.4495       0.5437

                          Winter Field Program

Source of           Degrees of Freedom        Mean Squares (mps²)
Variation*              U          V              U            V

Total               14456      14456         5.2119       4.5891
Between Cases         653        653        94.9021      64.3676
Within Cases        13803      13803         0.9688       1.7610
Residual (0)         5955       5955         0.9316       0.8924
Residual (1)        12995      12477         0.7289       0.8471
Residual (2)        12341      11671         0.6721       0.7447
Residual (3)        12752      12231         0.6865       0.7750
Residual (4)        11866      11245         0.6095       0.6599

 *   The notation "Residual (j)" means the pooled residual variation
     from fitting models determined by modeling procedure j.  It should
     be noted that "Within Cases" is equivalent to "Residual (5)".

-------
TABLE 16.   VALUES OF POOLED EVALUATIVE CRITERIA BY SEASON, WIND COMPONENT
           AND MODELING PROCEDURE BASED ON THE FULL NETWORK ESTIMATIONS

                    Wind                        Modeling Procedure
Statistic           Component  Season*     0      1      2      3      4      5

Average No. of         U         S       13.0    2.2    3.4    2.6    4.0    1.0
Model Terms                      W       13.0    2.2    3.2    2.6    4.0    1.0
(intercept
included)              V         S       13.0    2.9    4.1    3.1    4.6    1.0
                                 W       13.0    3.0    4.3    3.4    4.9    1.0

Pooled Residual        U         S       0.79   0.75   0.71   0.71   0.67   0.81
Std. Dev. (mps)                  W       0.97   0.85   0.82   0.83   0.78   0.98
                       V         S       0.86   0.82   0.77   0.80   0.74   1.00
                                 W       0.94   0.92   0.86   0.88   0.81   1.33

Pooled R²              U         S       0.60   0.20   0.32   0.29   0.41   0.00
                                 W       0.59   0.29   0.38   0.35   0.46   0.00
                       V         S       0.68   0.38   0.49   0.42   0.55   0.00
                                 W       0.78   0.57   0.64   0.61   0.69   0.00

Pooled                 U         S       0.04   0.15   0.23   0.23   0.31   0.00
Adjusted R²                      W       0.04   0.25   0.31   0.29   0.37   0.00
                       V         S       0.25   0.32   0.40   0.36   0.45   0.00
                                 W       0.49   0.52   0.58   0.56   0.63   0.00

 *   S = summer field program;
     W = winter field program.

-------
The following tendencies should also be noted:

     (a)  there is greater within-case variation from station-to-station
          in the V-component,

     (b)  there is greater total variation in both components in  the
          winter than in the summer,

     (c)  a smaller percentage of the total variation is accounted for
          by the models in the summer than in the winter for both the
          U- and V-components.

     Tables 17 and 18 present, respectively, the distributions of the
residual standard deviations and the distributions of the adjusted R²

values.  These distributions are based on all 908 cases.  Particular

note should be made of the similarity of the distributions for proce-

dures 2 and 3 in both of these tables.

     Tables 19 and 20 show distributions of the individual residuals

resulting from the full-network estimations.  Table 19 clearly indicates

the smaller summertime variation, as compared to that of the wintertime.

Table 20 combines the distributions of Table 19 over seasons.  In addi-

tion, the distributions of deviations between observed and predicted

wind speeds are shown.  Large positive deviations in the wind speeds

appear more frequently than large negative deviations, indicating (when

such errors occur) a tendency toward underprediction of the wind speeds.

The majority of the wind speed residuals, however, are less than 1.5

mps in magnitude, as shown by the percentages below that are derived

directly from Table 20 (W = observed, Ŵ = predicted wind speed):


    Modeling              Percentage of Observations With:
    Procedure           |W-Ŵ| < 1.5 mps          |W-Ŵ| > 1.5 mps

        1                 93.62                     6.38
        2                 95.45                     4.55
        3                 94.24                     5.76
        4                 96.20                     3.80
        5                 83.54                    16.46
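
The percentages above can be reproduced from the wind speed (W) rows of Table 20. The sketch below is a minimal check, assuming that the tabulated classes are 1 mps wide and centered on the listed midpoints, so that the classes at -1, 0, and +1 mps together cover deviations smaller than 1.5 mps in magnitude. The dictionary name is illustrative.

```python
# Wind speed (W) rows of Table 20: percentage of observations in the
# classes with midpoints -1, 0 and +1 mps, by modeling procedure.
wind_speed_pct = {
    1: (18.45, 52.87, 22.30),
    2: (16.69, 58.25, 20.51),
    3: (17.75, 55.15, 21.34),
    4: (15.28, 62.34, 18.58),
    5: (22.94, 39.78, 20.82),
}

within = {proc: round(sum(pcts), 2) for proc, pcts in wind_speed_pct.items()}
outside = {proc: round(100.0 - pct, 2) for proc, pct in within.items()}

for proc in wind_speed_pct:
    print(f"procedure {proc}: {within[proc]:6.2f} within, {outside[proc]:5.2f} outside")
```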

-------
TABLE 17.  DISTRIBUTIONS OF RESIDUAL STANDARD DEVIATIONS OVER THE 908
           CASES FOR FOUR MODELING PROCEDURES APPLIED TO DATA FROM ALL
           STATIONS

 Residual        Percentage Frequency Distributions       Cumulative Percentages
 Std. Dev.              Modeling Procedure                  Modeling Procedure
   (mps)            1       2       3       4            1       2       3       4

                                     U-Component
 0.0 - 0.5        14.5    18.6    18.0    24.2         14.5    18.6    18.0    24.2
 0.5 - 1.0        66.9    66.9    66.1    65.0         81.4    85.5    84.0    89.2
 1.0 - 1.5        18.0    14.1    15.4    10.4         99.3    99.6    99.4    99.6
 1.5 - 2.0         0.7     0.4     0.6     0.4        100.0   100.0   100.0   100.0

                                     V-Component
 0.0 - 0.5         5.5     8.0     7.2    13.8          5.5     8.0     7.2    13.8
 0.5 - 1.0        68.9    72.5    71.8    72.1         74.4    80.5    79.0    85.9
 1.0 - 1.5        24.1    19.1    20.4    14.0         98.6    99.6    99.3    99.9
 1.5 - 2.0         1.4     0.4     0.7     0.1        100.0   100.0   100.0   100.0

-------
TABLE 18.  DISTRIBUTIONS OF ADJUSTED R² STATISTICS OVER THE 908 CASES
           FOR FOUR MODELING PROCEDURES APPLIED TO DATA FROM ALL STATIONS

                 Percentage Frequency Distributions       Cumulative Percentages
 Adjusted               Modeling Procedure                  Modeling Procedure
    R²              1       2       3       4            1       2       3       4

                                     U-Component
       0.0        30.4    11.2    20.5     4.8         30.4    11.2    20.5     4.8
 0.0 - 0.2        30.3    36.2    27.6    26.9         60.7    47.5    48.1    31.7
 0.2 - 0.4        23.7    29.0    29.0    32.7         84.4    76.4    77.1    64.4
 0.4 - 0.6        10.2    14.9    14.3    20.2         94.6    91.3    91.4    84.6
 0.6 - 0.8         5.1     8.0     7.5    12.0         99.7    99.3    98.9    96.6
 0.8 - 1.0         0.3     0.7     1.1     3.4        100.0   100.0   100.0   100.0

                                     V-Component
       0.0        10.2     4.0     7.8     2.5         10.2     4.0     7.8     2.5
 0.0 - 0.2        12.6    10.5    11.0     8.6         22.8    14.4    18.8    11.1
 0.2 - 0.4        24.1    21.3    21.8    17.6         46.9    35.7    40.6    28.7
 0.4 - 0.6        30.7    32.2    30.3    29.0         77.6    67.8    70.9    57.7
 0.6 - 0.8        19.7    26.8    24.8    30.5         97.4    94.6    95.7    88.2
 0.8 - 1.0         2.6     5.4     4.3    11.8        100.0   100.0   100.0   100.0

-------
TABLE 19.  PERCENTAGE FREQUENCY DISTRIBUTIONS OF RESIDUALS BY SEASON, WIND COMPONENT, AND MODELING
           PROCEDURES OVER ALL STATIONS BASED ON FULL NETWORK ESTIMATIONS

                          Summer Field Program (5501 Observations)

 Wind   Modeling    Deviation Between Observed and Predicted Value (midpt. of interval)
 Comp.  Procedure    <-5     -4     -3     -2     -1      0      1      2      3      4     >5

  U        1          --   0.04   0.11   1.75  18.21  61.23  16.05   2.16   0.44   0.02     --
           2          --   0.04   0.07   1.42  16.11  65.53  14.72   1.80   0.29   0.02     --
           3          --   0.04   0.09   1.51  16.12  65.12  15.02   1.75   0.35   0.02     --
           4          --   0.04   0.07   1.25  13.65  70.57  12.60   1.53   0.27   0.02     --
           5          --   0.05   0.16   2.65  20.85  54.01  18.91   2.80   0.53   0.04     --

  V        1          --     --   0.22   2.16  21.20  54.99  18.00   3.04   0.38   0.02     --
           2          --     --   0.16   1.45  19.14  59.86  16.60   2.51   0.27     --     --
           3          --     --   0.18   1.78  20.45  56.95  17.25   3.02   0.35   0.02     --
           4          --     --   0.15   1.15  17.31  63.95  14.96   2.27   0.22     --     --
           5          --   0.02   0.78   5.33  21.61  45.36  21.12   4.29   1.38   0.09   0.02

                          Winter Field Program (14457 Observations)

 Wind   Modeling    Deviation Between Observed and Predicted Value (midpt. of interval)
 Comp.  Procedure    <-5     -4     -3     -2     -1      0      1      2      3      4     >5

  U        1        0.01   0.04   0.32   2.91  20.70  52.32  19.92   3.42   0.28   0.08   0.01
           2        0.01   0.03   0.30   2.46  19.10  56.51  18.52   2.78   0.22   0.08   0.01
           3        0.01   0.04   0.32   2.68  19.33  55.50  18.84   2.94   0.27   0.06   0.01
           4        0.01   0.02   0.24   2.19  16.74  61.38  16.99   2.17   0.19   0.06   0.01
           5        0.02   0.06   0.64   4.57  22.65  45.53  20.43   4.89   0.93   0.27   0.01

  V        1          --   0.04   0.26   3.33  22.43  48.77  20.88   3.72   0.47   0.09     --
           2          --   0.01   0.18   2.66  20.43  54.08  19.36   2.89   0.35   0.04     --
           3          --   0.04   0.18   2.87  21.33  51.97  19.89   3.25   0.39   0.08     --
           4          --   0.03   0.10   2.13  18.32  59.29  17.36   2.49   0.26   0.03     --
           5        0.22   0.48   1.61   7.40  24.85  35.07  18.27   8.13   3.09   0.73   0.14

-------
TABLE 20.  PERCENTAGE FREQUENCY DISTRIBUTIONS OF RESIDUALS BY WIND COMPONENT AND MODELING PROCEDURE
           OVER ALL CASES AND ALL STATIONS BASED ON FULL NETWORK ESTIMATIONS

 Wind   Modeling    Deviation Between Observed and Predicted Value (midpt. of interval)
 Comp.* Procedure    <-5     -4     -3     -2     -1      0      1      2      3      4     >5

  U        1        0.01   0.04   0.26   2.59  20.02  54.78  18.85   3.07   0.33   0.06   0.01
           2        0.01   0.03   0.24   2.17  18.28  58.99  17.47   2.51   0.24   0.06   0.01
           3        0.01   0.04   0.26   2.36  18.44  58.15  17.78   2.61   0.29   0.05   0.01
           4        0.01   0.03   0.20   1.93  15.89  63.91  15.78   1.99   0.22   0.05   0.01
           5        0.02   0.06   0.51   4.04  22.15  47.87  20.01   4.31   0.82   0.21   0.01

  V        1          --   0.03   0.25   3.01  22.09  50.49  20.09   3.53   0.45   0.07     --
           2          --   0.01   0.18   2.32  20.07  55.68  18.60   2.79   0.33   0.03     --
           3          --   0.03   0.18   2.57  21.09  53.34  19.16   3.19   0.38   0.07     --
           4          --   0.02   0.11   1.86  18.04  60.57  16.70   2.43   0.25   0.02     --
           5        0.16   0.36   1.38   6.83  23.96  37.90  19.06   7.07   2.62   0.56   0.11

  W        1          --   0.01   0.09   1.86  18.45  52.87  22.30   3.92   0.41   0.09   0.01
           2          --     --   0.04   1.25  16.69  58.25  20.51   2.96   0.27   0.03   0.01
           3          --   0.01   0.08   1.60  17.75  55.15  21.34   3.61   0.38   0.07   0.01
           4          --     --   0.02   0.94  15.28  62.34  18.58   2.61   0.22   0.02     --
           5        0.01   0.06   0.49   4.43  22.94  39.78  20.82   7.70   2.98   0.68   0.11

 *  W denotes wind speed.

-------
COMPARISON OF ALTERNATIVE MODELING PROCEDURES USING WIND DATA FROM
STATIONS IN THE RTI NETWORK
     The modeling procedures  applied  to  data from  the RTI network sta-
tions yield similar results to  those  described  in  the previous subsec-
tion.  Tables 21 through 24,  which are analogous to Tables 15 through
18, respectively, provide a summary of the major results.  It should
again be emphasized that these  results,  like those of the preceding
subsection, relate to  the precision of the modeling procedures rather
than to their accuracy.
     Comparison of these results  to those of the previous subsection in-
dicates that in general the criteria  values based  on the RTI network
estimations are slightly less consistent than those for the full net-
work.  More specifically, the results can be summarized as follows:
     (a)  In the winter, the  within-case variation over the RTI network
          stations is  somewhat  larger than  the  within-case variation
          over all stations —  for both  wind components; in the summer,
          the within-case variation for  the V-component is smaller for
          the RTI network than  for the full network.
     (b)  For the V-component,  the residual variances over the RTI net-
          work are usually smaller than  the corresponding quantities for
          the full network; for the U-component, they are about the same
          in the summer and larger in the winter than the corresponding
          full-network residual variances.
     (c)  Among the stepwise regression procedures, the pooled adjusted
          R² statistics from the RTI network estimations are quite
          comparable to those of the full network; the distributions of the
          adjusted R² statistics show that more large and more small
          adjusted R² values occur for the RTI-network estimations than
          for the full-network estimations.

-------
TABLE 21.  SUMMARY OF ANALYSIS OF VARIANCE RESULTS BASED ON ESTIMATIONS
           FROM THE RTI NETWORK

                          Summer Field Program

Source of           Degrees of Freedom        Mean Squares (mps²)
Variation*              U          V              U            V

Total                4017       4017         1.2919       1.7273
Between Cases         253        253        10.5446      13.9551
Within Cases         3764       3764         0.6700       0.9044
Residual (0)          716        716         0.5483       0.6147
Residual (1)         3515       3421         0.5419       0.6324
Residual (2)         3200       3096         0.4909       0.5380
Residual (3)         3434       3294         0.5041       0.5529
Residual (4)         3131       2747         0.4500       0.4175

                          Winter Field Program

Source of           Degrees of Freedom        Mean Squares (mps²)
Variation*              U          V              U            V

Total               10671      10671         5.2568       4.8337
Between Cases         653        653        69.4364      49.3675
Within Cases        10018      10018         1.0734       1.9309
Residual (0)         2170       2170         1.2154       0.8338
Residual (1)         9353       8786         0.8168       0.8065
Residual (2)         8744       8028         0.7411       0.6861
Residual (3)         9138       8576         0.7611       0.7199
Residual (4)         8209       7557         0.6635       0.5898

 *   The notation "Residual (j)" means the pooled residual variation
     from fitting models determined by modeling procedure j.  It should
     be noted that "Within Cases" is equivalent to "Residual (5)".

-------
TABLE 22.   VALUES OF POOLED EVALUATIVE CRITERIA BY SEASON, WIND COMPONENT, AND
           MODELING PROCEDURE BASED ON RTI NETWORK ESTIMATIONS

                    Wind                        Modeling Procedure
Statistic           Component  Season*     0      1      2      3      4      5

Average No. of         U         S       13.0    2.0    3.2    2.3    3.5    1.0
Model Terms                      W       13.0    2.0    2.9    2.3    3.8    1.0
(intercept
included)              V         S       13.0    2.4    3.6    2.9    5.0    1.0
                                 W       13.0    2.9    4.0    3.2    4.8    1.0

Pooled Residual        U         S       0.74   0.74   0.70   0.71   0.67   0.82
Std. Dev. (mps)                  W       1.10   0.90   0.86   0.87   0.81   1.04
                       V         S       0.78   0.80   0.73   0.74   0.65   0.95
                                 W       0.91   0.90   0.83   0.85   0.77   1.39

Pooled R²              U         S       0.84   0.24   0.38   0.32   0.44   0.00
                                 W       0.75   0.29   0.40   0.35   0.49   0.00
                       V         S       0.87   0.37   0.51   0.47   0.66   0.00
                                 W       0.91   0.63   0.72   0.68   0.77   0.00

Pooled                 U         S       0.18   0.19   0.27   0.25   0.33   0.00
Adjusted R²                      W       0.00   0.24   0.31   0.29   0.38   0.00
                       V         S       0.32   0.30   0.41   0.39   0.54   0.00
                                 W       0.57   0.58   0.64   0.63   0.69   0.00

 *   S = summer field program;
     W = winter field program.

-------
TABLE 23.   DISTRIBUTIONS OF RESIDUAL STANDARD DEVIATIONS OVER THE 908
           CASES FOR FOUR MODELING PROCEDURES APPLIED TO DATA FROM
           STATIONS IN THE RTI NETWORK

 Residual        Percentage Frequency Distributions       Cumulative Percentages
 Std. Dev.              Modeling Procedure                  Modeling Procedure
   (mps)            1       2       3       4            1       2       3       4

                                     U-Component
 0.0 - 0.5        15.3    19.7    18.9    28.1         15.3    19.7    18.9    28.1
 0.5 - 1.0        61.3    61.5    60.7    56.6         76.7    81.2    79.6    84.7
 1.0 - 1.5        21.4    17.7    19.2    14.4         98.0    98.9    98.8    99.1
 1.5 - 2.0         2.0     1.1     1.2     0.9        100.0   100.0   100.0   100.0

                                     V-Component
 0.0 - 0.5        10.1    15.5    15.4    31.6         10.1    15.5    15.4    31.6
 0.5 - 1.0        67.9    68.8    66.4    57.6         78.0    84.4    81.8    89.2
 1.0 - 1.5        20.5    15.1    17.4    10.2         98.5    99.4    99.2    99.4
 1.5 - 2.0         1.5     0.6     0.8     0.6        100.0   100.0   100.0   100.0

-------
TABLE 24.  DISTRIBUTIONS OF ADJUSTED R² STATISTICS OVER THE 908 CASES FOR
           FOUR MODELING PROCEDURES APPLIED TO DATA FROM STATIONS IN THE
           RTI NETWORK

                 Percentage Frequency Distributions       Cumulative Percentages
 Adjusted               Modeling Procedure                  Modeling Procedure
    R²              1       2       3       4            1       2       3       4

                                     U-Component
       0.0        36.2    15.1    28.3     8.8         36.2    15.1    28.3     8.6
 0.0 - 0.2        23.8    31.4    21.5    24.2         60.0    46.5    49.8    32.8
 0.2 - 0.4        21.0    24.2    24.3    26.2         81.1    70.7    74.1    59.0
 0.4 - 0.6        12.4    17.4    15.3    19.8         93.5    88.1    89.4    78.9
 0.6 - 0.8         5.4     8.7     7.8    12.7         98.9    96.8    97.2    91.5
 0.8 - 1.0         1.1     3.2     2.8     8.5        100.0   100.0   100.0   100.0

                                     V-Component
       0.0        15.2     5.5    10.1     3.3         15.2     5.5    10.1     3.3
 0.0 - 0.2         9.9    11.5     9.1     7.3         25.1    17.0    19.3    10.6
 0.2 - 0.4        19.1    16.2    15.6    10.2         44.2    33.1    34.9    20.8
 0.4 - 0.6        24.2    23.7    24.0    21.6         68.4    56.8    58.9    42.4
 0.6 - 0.8        24.7    29.3    27.3    27.6         93.1    86.1    86.2    70.0
 0.8 - 1.0         6.9    13.9    13.8    30.0        100.0   100.0   100.0   100.0

-------
ACCURACY OF PREDICTED WIND FIELDS




     Evaluation of the accuracy of the modeling procedures depends upon




the deviations between observed and predicted values at the non-network




stations, when the estimation is based on data from the RTI network.




The means of these deviations by wind component are shown in Table 25




for each of the seven non-network stations; for comparative purposes,




the corresponding mean deviations resulting from the full network esti-




mations are given in Table 26.  It is apparent from these results that




the largest discrepancies between the mean deviations of the two tables




occur for the outer-non-network stations STL007 and EPA103.  Except for




these two stations, the corresponding mean deviations of Tables 25 and




26 usually differ by less than 0.1 mps.




     Pooled root mean square errors (RMSE's) at each of the non-network




stations are presented in Table 27.  These are shown for each wind




component and season; the pooled vector RMSE's, denoted by (U,V), pro-




vide a convenient method of summarizing the errors over the two compo-




nents, as described at the end of Section 3.  In order to evaluate the




magnitude of the errors occurring at the non-network stations, the




pooled vector RMSE's shown in the last five rows of Table 27 are plotted




in Figure 2 along with the corresponding RMSE's for the RTI network




stations.  Root mean square errors based on the full-network (F) esti-




mations are also shown.  These appear to the left of each vertical line,




whereas those based on the RTI network data are shown on the right.




This plot clearly demonstrates the trend of decreasing RMSE's for the




full-network estimations when going from procedure 1 to procedure 4 and




the similar trend for RTI network stations based on the RTI network




estimations.  The greatest improvement in precision in going from proce-

-------
TABLE 25.  MEANS OF DEVIATIONS BETWEEN OBSERVED AND PREDICTED VALUES AT NON-
           NETWORK STATIONS, BY WIND COMPONENT AND MODELING PROCEDURE, BASED
           ON ESTIMATIONS FROM RTI NETWORK DATA

 Wind     Modeling                               Station
 Comp.    Procedure   STL003   STL004   STL007   EPA103   EPA107   EPA111   EPA112

 U (mps)     1         0.543   -0.395   -0.700    0.328   -0.044   -0.519   -0.202
             2         0.526   -0.440   -0.761    0.310   -0.083   -0.517   -0.239
             3         0.520   -0.418   -0.560    0.314   -0.067   -0.549   -0.238
             4         0.483   -0.466   -0.612    0.306   -0.113   -0.585   -0.308
             5         0.566   -0.338   -0.428    0.332    0.000   -0.542   -0.138

 V (mps)     1        -0.663    0.452    0.414    1.387   -0.146    0.210    0.175
             2        -0.693    0.531    0.558    1.232   -0.150    0.248    0.272
             3        -0.678    0.429    0.833    1.380   -0.182    0.225    0.193
             4        -0.733    0.506    1.273    1.228   -0.210    0.279    0.307
             5        -0.702    0.167   -0.215    1.709   -0.263    0.168   -0.197

TABLE 26.  MEANS OF DEVIATIONS BETWEEN OBSERVED AND PREDICTED VALUES AT NON-
           NETWORK STATIONS, BY WIND COMPONENT AND MODELING PROCEDURE, BASED
           ON ESTIMATIONS FROM FULL NETWORK DATA

 Wind     Modeling                               Station
 Comp.    Procedure   STL003   STL004   STL007   EPA103   EPA107   EPA111   EPA112

 U (mps)     1         0.545   -0.367   -0.416    0.292   -0.030   -0.468   -0.171
             2         0.510   -0.387   -0.359    0.258   -0.078   -0.430   -0.186
             3         0.492   -0.384   -0.229    0.261   -0.069   -0.479   -0.200
             4         0.452   -0.381   -0.173    0.193   -0.102   -0.466   -0.221
             5         0.571   -0.333   -0.433    0.331    0.008   -0.519   -0.133

 V (mps)     1        -0.716    0.392    0.273    1.126   -0.216    0.144    0.143
             2        -0.720    0.461    0.285    0.951   -0.205    0.164    0.233
             3        -0.683    0.387    0.328    1.131   -0.207    0.157    0.165
             4        -0.664    0.451    0.291    0.913   -0.190    0.209    0.255
             5        -0.731    0.138   -0.301    1.667   -0.290    0.180   -0.227

-------
TABLE 27.   ROOT MEAN SQUARE ERRORS (MPS) FOR EACH NON-NETWORK STATION BASED ON ESTIMATIONS
           FROM THE RTI NETWORK, BY WIND COMPONENT, SEASON, AND MODELING PROCEDURE

                                                                                        All Non-  Inner Non-
 Wind              Modeling                           Station                           Network   Network
 Comp.   Season    Procedure  STL003  STL004  STL007  EPA103  EPA107  EPA111  EPA112    Stations  Stations

 U       Summer       1        0.960   1.109   1.017   0.870   0.426      --   0.516     0.863     0.817
                      2        0.927   1.090   1.438   0.848   0.443      --   0.581     0.955     0.813
                      3        0.942   1.113   1.034   0.869   0.426      --   0.542     0.867     0.817
                      4        0.925   1.104   1.311   0.876   0.443      --   0.607     0.933     0.822
                      5        1.015   1.073   0.678   0.901   0.435      --   0.469     0.810     0.816

 U       Winter       1        0.849   0.973   1.359   1.014   0.494   0.783   0.663     0.833     0.772
                      2        0.850   1.041   1.382   0.981   0.493   0.781   0.696     0.846     0.795
                      3        0.842   0.991   1.790   1.030   0.487   0.817   0.672     0.864     0.782
                      4        0.841   1.069   2.001   1.021   0.510   0.867   0.767     0.911     0.833
                      5        0.859   0.903   1.461   1.112   0.571   0.794   0.641     0.851     0.765

 V       Summer       1        0.774   1.118   1.120   1.937   0.501      --   0.509     1.119     0.777
                      2        0.817   1.075   1.363   1.755   0.521      --   0.541     1.116     0.781
                      3        0.837   1.051   1.946   1.871   0.510      --   0.493     1.279     0.768
                      4        0.990   1.027   2.847   1.678   0.566      --   0.534     1.521     0.821
                      5        0.807   1.099   0.788   2.231   0.444      --   0.453     1.156     0.762

 V       Winter       1        1.220   1.183   1.088   1.689   0.659   0.771   0.704     1.095     0.944
                      2        1.202   1.259   1.147   1.582   0.641   0.786   0.780     1.092     0.971
                      3        1.193   1.164   1.265   1.695   0.639   0.795   0.681     1.092     0.930
                      4        1.191   1.238   1.592   1.609   0.635   0.823   0.777     1.109     0.968
                      5        1.245   0.989   1.589   1.971   0.729   0.770   0.725     1.163     0.918

 (U,V)   Summer       1        1.233   1.575   1.513   2.124   0.657      --   0.725     1.413     1.127
                      2        1.236   1.531   1.981   1.949   0.684      --   0.794     1.469     1.128
                      3        1.260   1.531   2.204   2.063   0.665      --   0.733     1.545     1.122
                      4        1.355   1.508   3.135   1.893   0.719      --   0.808     1.784     1.162
                      5        1.297   1.536   1.040   2.406   0.621      --   0.652     1.412     1.116

 (U,V)   Winter       1        1.486   1.532   1.741   1.970   0.824   1.099   0.967     1.376     1.220
                      2        1.472   1.633   1.796   1.861   0.809   1.108   1.045     1.381     1.255
                      3        1.460   1.529   2.192   1.984   0.804   1.140   0.956     1.392     1.215
                      4        1.458   1.636   2.558   1.906   0.815   1.196   1.092     1.436     1.277
                      5        1.512   1.339   2.158   2.263   0.926   1.106   0.967     1.441     1.195

 (U,V)   Combined     1        1.419   1.544   1.579   2.018   0.782   1.099   0.907     1.387     1.198
                      2        1.409   1.605   1.932   1.888   0.777   1.108   0.982     1.406     1.226
                      3        1.406   1.529   2.201   2.008   0.768   1.140   0.900     1.437     1.194
                      4        1.429   1.601   2.988   1.902   0.790   1.196   1.022     1.542     1.250
                      5        1.454   1.398   1.436   2.308   0.853   1.106   0.892     1.433     1.177

-------
[Figure 2: character plot, not reproduced.  Vertical axis: pooled vector
RMSE (mps); horizontal axis: modeling procedure (1 through 5).  For each
procedure, values based on the full network (F) appear to the left of the
vertical line and values based on the RTI network (R) to the right.
Legend: RTI network stations, inner non-network stations, outer
non-network stations.]

NOTE:  Some observations are not shown since computer would not overprint.

Figure 2.  Pooled vector RMSE's for individual stations by modeling procedure

-------
dure 1 to procedure 4 appears to occur for stations in the outlying




areas of the St. Louis region.
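
As a hedged illustration of the pooled vector RMSE's discussed above, the sketch below combines the U- and V-component RMSE's into a single vector value. The formula is an assumption (the report's definition is at the end of Section 3, which is not reproduced here): the vector RMSE is taken to be the square root of the sum of the squared component RMSE's. Under that assumption the tabulated (U,V) entries are reproduced to within rounding. The function name is illustrative.

```python
import math


def vector_rmse(rmse_u, rmse_v):
    """Assumed definition: combine component RMSE's (mps) into a vector RMSE."""
    return math.sqrt(rmse_u ** 2 + rmse_v ** 2)


# Table 27, station STL003, summer, procedure 1: U = 0.960, V = 0.774.
print(round(vector_rmse(0.960, 0.774), 3))

# Table 27, all non-network stations, summer, procedure 1: U = 0.863, V = 1.119.
print(round(vector_rmse(0.863, 1.119), 3))
```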




     Also apparent from Figure 2 is the increase in the RMSE's for the




non-network stations based on the RTI network estimations over the




corresponding values for the full network estimations.  These increases




tend to be most pronounced for procedures 3 and 4 and are quite dramatic




for the outer-non-network station, STL007.  Thus, among the non-network




stations, there is a general trend of increasing RMSE's in going from




procedure 1 to procedure 4, as contrasted to the reverse trend for the




RTI network stations.  As shown in Table 27, at least one of the first




three procedures yields a smaller RMSE than procedure 4 at each of the




non-network stations.  This suggests that the apparently higher precision




of procedure 4 relative to the other procedures is obtained by overfitting




(i.e., including too many terms) in some cases; this results in




a loss in accuracy relative to the first three procedures.  Figure 2




also shows that, although the flat-surface models (procedure 5) provide




predictions at the non-network stations which are nearly comparable in




accuracy to those of procedures 1, 2, and 3, the precision of procedure




5, as measured by the RMSE's at the RTI network stations, is substan-




tially poorer than that of the stepwise regression procedures.




     Among the four stepwise regression procedures, procedure 1 would




appear to yield the most accurate results across all seven non-network




stations; procedure 3 appears more accurate across the five inner-non-




network stations.  These general conclusions are supported by the re-




sults of Tables 28 and 29, which show various statistics that summarize




the distributions of the RMSE's over all cases.  Table 28 provides these

-------
TABLE 28.  CHARACTERIZATION OF THE DISTRIBUTIONS OVER THE 908 CASES OF RMSE'S ACROSS ALL NON-NETWORK STATIONS
           BASED ON ESTIMATIONS FROM RTI NETWORK DATA

                                       Std. Dev.
 Wind    Modeling    Pooled    Mean    of       Maximum     Percentage of Cases with RMSE (mps):
 Comp.   Procedure   RMSE      RMSE    RMSE     RMSE
                     (mps)     (mps)   (mps)    (mps)       <0.5    <1.0    <1.5    <2.0    <2.5

 U          1        0.842    0.778    0.313    2.193       19.2    79.0    97.6    99.7   100.0
            2        0.878    0.794    0.364    5.038       19.1    76.9    97.0    99.3    99.8
            3        0.865    0.787    0.349    3.488       19.2    77.6    96.1    99.3    99.8
            4        0.917    0.822    0.394    3.588       17.2    74.8    94.2    98.5    99.2
            5        0.840    0.772    0.324    2.202       18.7    79.6    96.6    99.7   100.0

 V          1        1.102    1.039    0.365    2.478        5.7    47.5    89.6    98.8   100.0
            2        1.093    1.035    0.365    2.898        4.8    49.7    90.0    98.5    99.8
            3        1.147    1.066    0.419    5.494        4.6    46.0    87.8    97.6    99.4
            4        1.239    1.116    0.531    6.441        4.7    44.6    85.2    96.1    98.6
            5        1.161    1.095    0.374    2.357        4.8    41.3    86.7    97.9   100.0

 W          1        1.086    1.023    0.361    2.245        5.7    50.4    90.1    98.8   100.0
            2        1.061    0.995    0.361    3.434        5.3    54.3    91.0    98.9    99.8
            3        1.097    1.027    0.374    2.763        5.5    51.0    89.0    98.3    99.8
            4        1.113    1.030    0.406    3.737        5.7    50.9    88.9    97.6    99.2
            5        1.188    1.118    0.385    2.453        4.5    39.6    84.9    98.1   100.0

 (U,V)      1        1.387    1.328    0.389    2.844        0.2    20.0    70.5    94.5    99.3
            2        1.406    1.336    0.427    5.117        0.2    20.5    70.0    93.2    98.8
            3        1.437    1.359    0.455    5.525        0.2    18.9    69.6    91.7    98.5
            4        1.542    1.428    0.565    6.467        0.2    18.2    65.0    89.0    96.1
            5        1.433    1.373    0.394    3.114        0.3    15.3    65.2    92.8    99.7

-------
TABLE 29.  CHARACTERIZATION OF THE DISTRIBUTIONS OVER THE 908 CASES OF RMSE'S ACROSS STATIONS IN THE INNER-
           NON-NETWORK BASED ON ESTIMATIONS FROM RTI NETWORK DATA

                                       Std. Dev.
 Wind    Modeling    Pooled    Mean    of       Maximum     Percentage of Cases with RMSE (mps):
 Comp.   Procedure   RMSE      RMSE    RMSE     RMSE
                     (mps)     (mps)   (mps)    (mps)       <0.5    <1.0    <1.5    <2.0    <2.5

 U          1        0.783    0.717    0.319    2.372       26.0    83.4    97.8    99.7   100.0
            2        0.800    0.728    0.331    2.372       25.0    82.2    97.2    99.7   100.0
            3        0.791    0.722    0.326    2.372       25.1    82.6    97.2    99.7   100.0
            4        0.830    0.751    0.350    2.372       23.1    79.5    95.9    99.3   100.0
            5        0.778    0.714    0.318    2.275       26.1    84.1    97.9    99.6   100.0

 V          1        0.907    0.819    0.377    2.645       19.7    72.7    94.3    99.2    99.8
            2        0.929    0.842    0.376    2.645       17.5    70.5    94.2    98.9    99.9
            3        0.894    0.811    0.362    2.306       19.2    74.7    94.7    99.3   100.0
            4        0.935    0.851    0.376    2.254       17.0    69.4    93.6    99.2   100.0
            5        0.883    0.809    0.344    2.418       18.3    75.2    96.0    99.6   100.0

 W          1        0.830    0.759    0.342    2.411       23.9    77.9    96.7    99.6   100.0
            2        0.835    0.765    0.333    2.445       22.5    78.9    96.8    99.7   100.0
            3        0.816    0.746    0.332    2.410       24.2    79.2    97.1    99.7   100.0
            4        0.845    0.775    0.334    2.397       20.6    76.9    96.7    99.8   100.0
            5        0.836    0.768    0.341    2.359       22.1    78.0    97.1    99.3   100.0

 (U,V)      1        1.198    1.127    0.397    2.773        2.4    42.6    83.2    97.6    99.4
            2        1.226    1.153    0.402    2.850        2.2    40.4    82.6    96.6    99.2
            3        1.194    1.124    0.393    2.587        2.3    43.1    83.0    97.1    99.8
            4        1.250    1.176    0.411    2.570        2.1    38.8    79.1    95.9    99.8
            5        1.177    1.113    0.380    2.594        2.6    41.7    85.1    97.4    99.7

-------
measures for all of  the non-network whereas  Table  29  provides  comparable

results for the inner-non-network only.



COMBINED EVALUATIVE  MEASURES

      The overall merit of  a  procedure must  be  judged by some  combined

measure of its estimation error  (precision)  and its prediction error

(accuracy).  An overall measure  of the precision of a procedure is the

square root of the sum of the pooled residual variances for the two wind

components, based on estimations from the  RTI network data; these

values are shown below and  are denoted by  s(j),  where j indicates the

particular procedure:

                         Pooled Residual                    s(j):
    Modeling             Variances (mps²)                Square Root
  Procedure (j)        U         V        Total        of Total (mps)

       1R           0.7417    0.7577     1.4994            1.224
       2R           0.6741    0.6449     1.3190            1.148
       3R           0.6909    0.6736     1.3645            1.168
       4R           0.6045    0.5439     1.1484            1.072
       5R           0.9632    1.6508     2.6140            1.617
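
The precision measure s(j) follows directly from its definition in the text: the square root of the sum of the pooled residual variances for the two wind components. The short sketch below recomputes it from the tabulated U and V variances; the dictionary names are illustrative.

```python
import math

# procedure -> (U, V) pooled residual variances in mps^2, RTI network estimations
pooled_residual_variance = {
    1: (0.7417, 0.7577),
    2: (0.6741, 0.6449),
    3: (0.6909, 0.6736),
    4: (0.6045, 0.5439),
    5: (0.9632, 1.6508),
}

# s(j): square root of the summed U and V variances, in mps
s = {j: math.sqrt(u + v) for j, (u, v) in pooled_residual_variance.items()}

for j, value in s.items():
    print(f"s({j}) = {value:.3f} mps")
```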





A compatible measure that  reflects  the  accuracy of a procedure is the

pooled vector  RMSE over  stations  not  in the  RTI network.  Values of

these quantities  were shown for  the entire non-network in Table 28 and

for the inner non-network in Table 29.

     Let α, where 0 ≤ α ≤ 1, be used as a weighting factor to reflect

the importance of accuracy relative to precision; define

     f_\alpha(j) = \alpha [r(j)]^2 + (1 - \alpha) [s(j)]^2               (24)

and

     g_\alpha(j) = \alpha [r^*(j)]^2 + (1 - \alpha) [s(j)]^2,            (25)

where r(j) and  r*(j)  represent,  respectively,  the  pooled vector RMSE's

over all non-network  stations  and  over  inner-non-network stations.  Note

-------
that α=0 corresponds to assuming that precision of a particular procedure
is of paramount importance and that accuracy can be completely ignored.
Choosing α=1, on the other hand, would completely ignore how well the
particular procedure actually fit the data which were used to produce the
estimates (i.e., the RTI network data).  Regarding estimation and
prediction errors to be of equal importance (i.e., α=0.5) would result in
the selection of procedure 2 or procedure 4 as the "best" procedure,
depending upon whether (24) or (25), respectively, is used as the
criterion.  As indicated by Figure 3, however, for this choice of α, there
is little difference among the four stepwise procedures.  Figure 4, which
shows values of g_α(j) versus α, indicates little preference among the
stepwise procedures when α ≈ 0.75.  Although the choice of a particular α
value is arbitrary, values in the range 0.5 to 0.8 would appear to be most
reasonable; this corresponds to assuming that prediction errors are at
least as important as estimation errors and may be up to 4 times more
important (α/(1-α) = 4 when α=0.8).  It should be noted that values of
f_α(2) and f_α(3) are close for all values of α.  The same holds true for
g_α(2) and g_α(3).  For α > 0.5, values of f_α(2) and g_α(2) are also close
to values of f_α(1) and g_α(1), respectively.
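
The combined criteria (24) and (25) can be evaluated directly from values tabulated earlier. The sketch below is a minimal illustration, assuming that r(j) and r*(j) are the pooled (U,V) RMSE's of Tables 28 and 29 and that [s(j)]² is the "Total" pooled residual variance listed above; the function and variable names are illustrative. With equal weights it selects procedure 2 under (24) and procedure 4 under (25), in agreement with the text.

```python
# [s(j)]^2: total pooled residual variance (mps^2), RTI network estimations
S2 = {1: 1.4994, 2: 1.3190, 3: 1.3645, 4: 1.1484, 5: 2.6140}
# r(j): pooled vector RMSE (mps) over all non-network stations (Table 28)
R_ALL = {1: 1.387, 2: 1.406, 3: 1.437, 4: 1.542, 5: 1.433}
# r*(j): pooled vector RMSE (mps) over inner non-network stations (Table 29)
R_INNER = {1: 1.198, 2: 1.226, 3: 1.194, 4: 1.250, 5: 1.177}


def combined(alpha, r):
    """Weighted sum of squared accuracy and precision measures, per procedure."""
    return {j: alpha * r[j] ** 2 + (1.0 - alpha) * S2[j] for j in S2}


def best(alpha, r):
    """Procedure with the smallest combined criterion for a given alpha."""
    scores = combined(alpha, r)
    return min(scores, key=scores.get)


for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    f = combined(alpha, R_ALL)     # criterion (24)
    g = combined(alpha, R_INNER)   # criterion (25)
    print(alpha, {j: round(v, 3) for j, v in f.items()},
          {j: round(v, 3) for j, v in g.items()})
```

Evaluating `best` over a grid of α values gives a quick view of where the preferred procedure changes, which is the information Figures 3 and 4 convey graphically.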



     Figures 3 and 4 suggest that procedure 4, because of its tendency
to produce inaccurate results, is the least preferable of the stepwise
procedures.  Among the first three procedures, there is no clear
preference:  larger values of α tend to support procedure 1 whereas smaller
values tend to support procedures 2 or 3.  Over the range 0.5 to 0.8,
procedure 2 might be selected because when it is not "best", its f_α and
g_α values are never "much larger" than the corresponding values for the
procedure with the smallest f_α and g_α values.  The same can be said for

-------
[Figures 3 and 4: line plots, not reproduced.  Vertical axes in (mps)²,
with tick labels from 1.0 to 2.6; horizontal axes show α from 0.0 to 1.00.]

Figure 3.  Plot of f_α(j) versus α, for five modeling procedures (j)

Figure 4.  Plot of g_α(j) versus α, for five modeling procedures (j)

-------
procedure 1, however, for α values greater than 0.6 or 0.7.  Also, the




consistency of procedure 1 over the non-network stations would tend to




support its use.




     Tables 30 and 31 show frequency distributions of the deviations
between observed and predicted values for procedures 1 and 2 (and, for
comparative purposes, for procedure 5).  Table 30 shows these
distributions for each non-network station and for the non-network as a
whole.  Table 31 shows the distributions over the RTI network, the full
network, and the outer and inner non-networks.  In both tables, all
available observations are considered.  Again, no strong preference for
procedure 1 over procedure 2, or vice versa, is discernible.
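
The distributions in Tables 30 and 31 are tabulated by interval midpoint. A minimal sketch of such a tabulation follows. It assumes unit-width intervals centered on the integers from -4 to 4 mps plus two open-ended tail classes, which is inferred from the table headings; the sign convention (observed minus predicted) and the sample deviations are assumptions made for illustration.

```python
import math
from collections import Counter

# Class labels in table order: lower tail, midpoints -4..4, upper tail.
LABELS = ["<-4"] + [str(m) for m in range(-4, 5)] + [">4"]

def classify(dev):
    """Return the class label for one deviation (assumed observed - predicted, mps)."""
    if dev < -4.5:
        return "<-4"
    if dev >= 4.5:
        return ">4"
    # Unit-width interval [m - 0.5, m + 0.5) centered on the integer m.
    return str(math.floor(dev + 0.5))

def percentage_distribution(deviations):
    """Percentage of deviations falling in each class."""
    counts = Counter(classify(d) for d in deviations)
    n = len(deviations)
    return {label: 100.0 * counts[label] / n for label in LABELS}

sample = [-0.2, 0.4, 1.1, -1.7, 0.0, 5.2, -0.6, 0.3]  # hypothetical deviations
dist = percentage_distribution(sample)
print({label: round(pct, 2) for label, pct in dist.items() if pct > 0})
```

Each row of the tables is then one such distribution, computed over the observations available for a given station (or subset of stations) and modeling procedure.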









SELECTED CASES AND CONDITIONS




     The previous subsections have presented various evaluative measures
which, for the most part, have represented averages over a large number
of cases.  Such summaries, while essential for reducing the volume of
data, can also be misleading in some situations.  For instance, the
importance of a difference of 0.1 mps in the average RMSE's of two
procedures averaged over all cases may be difficult to judge.  Such a
difference could be caused by a few extreme cases or could be the result
of small differences in a large number of cases.  Although the various
frequency distributions shown in previous sections provide some insight
regarding the relative performance of the modeling procedures on a more
case-specific basis, an overall "feel" for how the procedures might
perform in a specific situation may be lacking.




     Consequently, this subsection provides some additional detail and
illustrative examples that should prove useful.  Two major types of
results are presented:

-------
       TABLE 30.   PERCENTAGE FREQUENCY DISTRIBUTIONS OF DEVIATIONS BETWEEN OBSERVED AND PREDICTED WIND COMPONENTS
                  AT NON-RTI NETWORK STATIONS -- ESTIMATIONS BASED ON DATA FROM RTI NETWORK STATIONS

                       Deviation Between Observed and Predicted U-Components (midpoint of interval, in mps)
Modeling
Procedure  Station    N    <-4     -4     -3     -2     -1      0      1      2      3      4     >4
    1      STL003   892     --     --     --   0.56   6.28  37.67  48.32   7.06   0.11     --     --
           STL004   892     --   0.22   1.23   9.53  33.07  41.03  12.33   2.24   0.34     --     --
           STL007   350     --   0.29   2.86  12.57  41.43  36.86   5.14   0.86     --     --     --
           EPA103   833     --     --   0.12   1.56  13.21  48.14  26.65   7.92   2.16   0.24     --
           EPA107   844     --     --     --   0.24  13.74  74.05  11.73   0.24     --     --     --
           EPA111   596     --   0.17   0.17   3.69  45.97  46.31   3.69     --     --     --     --
           EPA112   861     --     --   0.12   1.05  28.11  61.32   8.71   0.58     --   0.12     --
           Total   5268     --   0.08   0.46   3.41  23.50  50.51  18.55   3.02   0.42   0.06     --

    2      STL003   892     --     --     --   0.67   6.17  38.79  46.86   7.40   0.11     --     --
           STL004   892     --   0.45   1.46  10.99  32.40  40.47  11.77   2.13   0.34     --     --
           STL007   350   0.86   0.57   3.14  12.86  39.14  35.43   6.86   1.14     --     --     --
           EPA103   833     --     --   0.12   1.56  14.05  48.02  26.65   8.40   1.08   0.12     --
           EPA107   844     --     --     --   0.47  15.40  74.17   9.95     --     --     --     --
           EPA111   596     --   0.17   0.17   4.19  45.64  46.48   3.36     --     --     --     --
           EPA112   861     --     --   0.12   1.28  32.40  56.68   8.71   0.70     --   0.12     --
           Total   5268   0.06   0.13   0.51   3.83  24.28  49.77  18.00   3.13   0.25   0.04     --

    5      STL003   892     --     --     --   0.22   6.61  37.00  47.09   8.86   0.22     --     --
           STL004   892     --   0.22   0.90   7.40  32.29  43.83  12.89   1.91   0.56     --     --
           STL007   350     --   0.57   2.57   4.86  35.14  45.71  10.00   1.14     --     --     --
           EPA103   833     --     --   0.48   2.16  12.48  47.42  26.17   7.92   3.24   0.12     --
           EPA107   844     --     --     --   1.30  12.91  70.97  14.34   0.47     --     --     --
           EPA111   596     --   0.34     --   4.19  46.14  46.81   2.52     --     --     --     --
           EPA112   861     --     --   0.12   0.58  23.69  65.16   9.87   0.46     --   0.12     --
           Total   5268     --   0.11   0.42   2.73  22.06  51.54  19.15   3.03   0.65   0.04     --

-------
       TABLE 30 (continued)

                       Deviation Between Observed and Predicted V-Components (midpoint of interval, in mps)
Modeling
Procedure  Station    N    <-4     -4     -3     -2     -1      0      1      2      3      4     >4
    1      STL003   892     --   0.22   1.91  13.90  43.72  31.61   7.29   1.35     --     --     --
           STL004   892     --     --     --   3.36  15.70  33.52  30.94  12.33   3.81   0.34     --
           STL007   350     --     --   0.29   3.71  13.71  33.71  35.71  10.86   1.43   0.57     --
           EPA103   833     --     --   0.12   0.48   4.44  15.13  29.89  36.25  11.52   2.16     --
           EPA107   844     --     --   0.12   1.90  24.17  62.32  11.26   0.24     --     --     --
           EPA111   596     --     --     --   0.34  17.11  46.98  32.05   3.02   0.50     --     --
           EPA112   861     --     --     --   0.12  13.47  59.23  24.51   2.56   0.12     --     --
           Total   5268     --   0.04   0.38   3.61  19.68  40.64  23.01   9.57   2.64   0.44     --

    2      STL003   892     --   0.11   1.35  14.69  45.63  30.04   7.17   1.01     --     --     --
           STL004   892     --     --   0.11   3.59  12.89  32.51  32.74  13.12   4.37   0.67     --
           STL007   350     --     --   0.86   3.14  11.71  32.57  32.86  15.71   1.71   0.86   0.57
           EPA103   833     --     --   0.12   0.72   5.64  16.09  34.93  32.41   8.40   1.68     --
           EPA107   844     --     --     --   1.54  25.71  61.37  11.14   0.12   0.12     --     --
           EPA111   596     --     --     --   0.50  14.77  48.49  31.54   4.19   0.50     --     --
           EPA112   861     --     --     --   0.23  11.96  52.73  31.36   3.48   0.23     --     --
           Total   5268     --   0.02   0.32   3.76  19.32  39.24  24.94   9.62   2.30   0.44   0.04

    5      STL003   892     --   0.56   1.46  15.25  45.07  29.15   7.06   1.46     --     --     --
           STL004   892     --     --     --   5.38  20.40  37.67  27.13   7.74   1.68     --     --
           STL007   350   0.29   0.29   2.29   9.14  22.57  40.57  22.57   2.29     --     --     --
           EPA103   833     --     --   0.24   0.84   3.12  11.16  22.45  37.45  21.13   3.48   0.12
           EPA107   844     --     --   0.12   2.73  31.28  56.28   9.60     --     --     --     --
           EPA111   596     --     --     --   0.34  19.46  47.48  28.86   3.52   0.34     --     --
           EPA112   861     --     --     --   2.21  29.50  56.68  11.03   0.58     --     --     --
           Total   5268   0.02   0.11   0.46   5.07  25.11  39.43  17.44   8.12   3.66   0.55   0.02

-------
       TABLE 30 (continued)

                       Deviation Between Observed and Predicted Wind Speeds (midpoint of interval, in mps)
Modeling
Procedure  Station    N    <-4     -4     -3     -2     -1      0      1      2      3      4     >4
    1      STL003   892     --   0.22   1.23   5.94  30.83  42.94  16.70   2.02   0.11     --     --
           STL004   892     --     --     --   1.23  11.21  36.32  35.99  13.57   1.46   0.22     --
           STL007   350     --   0.29   1.14   5.43  12.86  47.14  27.14   5.71   0.29     --     --
           EPA103   833     --     --     --     --   1.32   8.64  31.33  40.46  15.49   2.52   0.24
           EPA107   844     --     --   0.12   3.08  25.36  58.41  12.91   0.12     --     --     --
           EPA111   596     --     --     --   0.50  17.95  53.02  26.68   1.68   0.17     --     --
           EPA112   861     --     --     --   0.23  11.61  60.63  25.32   1.74   0.23   0.12   0.12
           Total   5268     --   0.06   0.30   2.16  16.17  43.19  24.91   9.91   2.79   0.46   0.06

    2      STL003   892     --   0.11   1.01   6.39  32.74  42.38  15.47   1.79   0.11     --     --
           STL004   892     --     --     --   0.78   9.08  34.75  40.70  12.67   1.79   0.22     --
           STL007   350   0.57     --   2.57   3.43  12.57  42.86  30.29   6.86   0.86     --     --
           EPA103   833     --     --     --   0.12   2.40  11.04  35.65  38.54  10.32   1.68   0.24
           EPA107   844     --     --   0.12   2.13  27.73  59.36  10.55   0.12     --     --     --
           EPA111   596     --     --     --   0.84  14.77  55.03  26.34   2.85   0.17     --     --
           EPA112   861     --     --     --   0.23  11.03  54.47  31.82   1.97   0.23   0.12   0.12
           Total   5268   0.04   0.02   0.36   1.94  16.21  42.29  27.03   9.66   2.07   0.32   0.06

    5      STL003   892     --   0.45   0.90   6.61  29.82  42.60  16.82   2.58   0.22     --     --
           STL004   892     --     --     --   2.80  17.38  40.25  30.04   7.96   1.46   0.11     --
           STL007   350   0.57   1.14   3.43   9.71  13.14  49.43  21.43   1.14     --     --     --
           EPA103   833     --     --     --   0.12   0.96   6.12  21.01  40.94  25.81   4.68   0.36
           EPA107   844     --     --   0.12   4.98  31.87  50.47  12.44   0.12     --     --     --
           EPA111   596     --     --     --   1.01  19.13  54.70  23.49   1.68     --     --     --
           EPA112   861     --     --     --   2.09  29.62  52.96  14.05   1.05   0.12   0.12     --
           Total   5268   0.04   0.15   0.40   3.51  21.13  41.21  19.63   8.71   4.38   0.78   0.06

-------
   TABLE 31.  PERCENTAGE FREQUENCY DISTRIBUTION OF DEVIATIONS BETWEEN OBSERVED AND PREDICTED VALUES BASED ON
              ESTIMATIONS FROM RTI NETWORK DATA

                                       Deviation Between Observed and Predicted Values (midpoint of interval, in mps)
          Modeling   Subset of    No.
Variable  Procedure  Stations     Obs.    <-4     -4     -3     -2     -1      0      1      2      3      4     >4
U-Comp.       1      Outer-non    1183     --   0.08   0.93   4.82  21.56  44.80  20.29   5.83   1.52   0.17     --
                     Inner-non    4085     --   0.07   0.32   3.01  24.06  52.17  18.04   2.20   0.10   0.02     --
                     RTI         14690   0.01   0.03   0.41   3.08  19.61  54.27  18.97   3.18   0.37   0.06   0.01
                     Full        19958   0.01   0.04   0.42   3.17  20.64  53.28  18.86   3.14   0.38   0.06   0.01
              2      Outer-non    1183   0.25   0.17   1.01   4.90  21.47  44.29  20.79   6.26   0.76   0.08     --
                     Inner-non    4085     --   0.12   0.37   3.53  25.09  51.36  17.18   1.81   0.10   0.02     --
                     RTI         14690   0.01   0.02   0.34   2.58  17.41  59.24  17.71   2.40   0.22   0.06   0.01
                     Full        19958   0.03   0.05   0.39   2.91  19.22  56.74  17.78   2.60   0.23   0.06   0.01
              5      Outer-non    1183     --   0.17   1.10   2.96  19.19  46.91  21.39   5.92   2.28   0.08     --
                     Inner-non    4085     --   0.10   0.22   2.67  22.89  52.88  18.51   2.55   0.17   0.02     --
                     RTI         14690   0.02   0.05   0.62   4.73  22.10  46.39  20.37   4.59   0.87   0.25   0.01
                     Full        19958   0.02   0.07   0.57   4.20  22.09  47.75  20.05   4.25   0.81   0.19   0.01

V-Comp.       1      Outer-non    1183     --     --   0.17   1.44   7.19  20.63  31.61  28.74   8.54   1.69     --
                     Inner-non    4085     --   0.05   0.44   4.24  23.30  46.44  20.51   4.01   0.93   0.07     --
                     RTI         14690     --   0.01   0.21   2.97  21.25  51.42  21.05   2.80   0.25   0.03     --
                     Full        19958     --   0.02   0.26   3.14  20.84  48.57  21.57   4.59   0.88   0.14     --
              2      Outer-non    1183     --     --   0.34   1.44   7.44  20.96  34.32  27.47   6.42   1.44   0.17
                     Inner-non    4085     --   0.02   0.32   4.43  22.77  44.53  22.23   4.46   1.10   0.15     --
                     RTI         14690     --     --   0.12   1.91  18.49  59.24  18.05   1.99   0.19   0.02     --
                     Full        19958     --   0.01   0.17   2.40  18.71  53.96  19.87   4.01   0.75   0.13   0.01
              5      Outer-non    1183   0.08   0.08   0.85   3.30   8.88  19.86  22.49  27.05  14.88   2.45   0.08
                     Inner-non    4085     --   0.12   0.34   5.58  29.82  45.09  15.99   2.64   0.42     --     --
                     RTI         14690   0.21   0.45   1.71   7.39  23.08  35.22  22.04   6.89   2.33   0.57   0.12
                     Full        19958   0.16   0.36   1.38   6.77  23.61  36.33  20.82   7.22   2.69   0.57   0.09

-------
     1.   Summaries over various subsets of cases, and
     2.   Detailed results for several individual cases.
The results shown are limited to modeling procedures 1 and 2, since the
combined measures of the previous subsection indicate that these two
procedures are certainly competitive with the two procedures that
utilize the larger class of model terms.
     The prevailing wind speeds and directions are utilized to group the
908 cases into subsets over which the various evaluation measures are
computed.  Four prevailing wind speed categories and four prevailing
wind direction categories are used; these are shown in Table 32 below,
along with their relevant sample sizes (number of cases and
observations).
TABLE 32.  SAMPLE SIZES, BY PREVAILING WIND SPEED AND DIRECTION CATEGORIES

                                      Number of Observations
Prevailing Wind        No. of      RTI        Non-      Inner-Non-
Condition              Cases     Network    Network      Network
Speed (mps):   <2        237       3813       1377          965
               2-4       359       5733       2057         1629
               4-6       266       4377       1572         1274
               >6         46        767        262          217
Direction:     E,SE       92       1565        560          436
               S         487       7787       2829         2208
               SW        225       3599       1268          973
               Other     104       1739        611          468
Total                    908      14690       5268         4085

     Table 33 presents values of three evaluation measures that
characterize the magnitude of the estimation errors.  These are shown for
both procedures 1 and 2 applied to, and evaluated over, the stations in
the RTI network.  Parts A and B of this table indicate that the estimation
errors tend to be larger for the higher wind speed cases.  This is true,

-------
TABLE 33.  SUMMARY OF ESTIMATION ERRORS BY PREVAILING WIND SPEED
           AND DIRECTION CATEGORIES

A. Pooled Residual Standard Deviations (mps)

Wind    Modeling     Prevailing Wind Speed (mps)      Prevailing Wind Direction
Comp.   Procedure     <2     2-4     4-6     >6      E,SE      S      SW    Other
U          1R       0.612   0.806   1.022   1.221   0.868   0.862   0.884   0.800
U          2R       0.580   0.766   0.973   1.170   0.813   0.839   0.828   0.710
V          1R       0.707   0.832   0.978   1.203   0.784   0.854   0.893   0.969
V          2R       0.663   0.764   0.898   1.115   0.697   0.782   0.827   0.923
(U,V)      1R       0.935   1.158   1.415   1.714   1.170   1.213   1.256   1.256
(U,V)      2R       0.881   1.082   1.324   1.617   1.071   1.147   1.171   1.165

B. Percentage of Cases With Residual Standard Deviations Less Than 1.0 mps

Wind    Modeling     Prevailing Wind Speed (mps)      Prevailing Wind Direction
Comp.   Procedure     <2     2-4     4-6     >6      E,SE      S      SW    Other
U          1R        95.8    84.7    56.0    34.8    75.0    76.8    72.0    87.5
U          2R        97.0    88.9    62.4    47.8    84.8    79.5    77.8    93.3
V          1R        92.4    84.4    63.9    34.8    90.2    79.5    75.1    66.3
V          2R        94.5    90.5    74.4    41.3    93.5    87.3    81.8    68.3

C. Pooled Adjusted R² Statistics

Wind    Modeling     Prevailing Wind Speed (mps)      Prevailing Wind Direction
Comp.   Procedure     <2     2-4     4-6     >6      E,SE      S      SW    Other
U          1R       0.253   0.217   0.240   0.218   0.253   0.147   0.247   0.458
U          2R       0.331   0.291   0.311   0.282   0.345   0.191   0.339   0.573
V          1R       0.397   0.472   0.600   0.640   0.237   0.532   0.625   0.492
V          2R       0.470   0.555   0.663   0.690   0.396   0.607   0.678   0.539

-------
even though a larger percentage of the variation is typically accounted
for in the high-speed cases, as is demonstrated in Part C of Table 33.
On an absolute scale, smaller estimation errors occur for the east/
southeast category than for the other wind direction categories; this
appears to be a reflection of the trend noted above for wind speed,
since the average wind speed over cases in this wind direction category
is less than that for the other direction categories.  As might be
expected, only a small percentage of the variation is accounted for in
the less-dominant wind component (for instance, in the U-component when
a prevailing southerly wind occurs).  In terms of estimation errors, the
improvement of procedure 2 over procedure 1 appears to be quite
consistent across all eight (4 speed and 4 direction) categories and both
wind components.  Differences in the pooled adjusted R² statistics for
procedures 1 and 2, for example, range from about 0.05 to about 0.12.

     Table 34 provides two basic measures of the prediction errors,
namely, pooled RMSE's over all non-network stations (Part A) and over
all inner-non-network stations (Part B), categorized by prevailing wind
speed and by prevailing wind direction.  The pooled RMSE's for
procedure 1 are usually slightly smaller than the corresponding RMSE's for
procedure 2.  As with the estimation errors, there is a definite pattern
of larger RMSE's for the higher wind speed cases as compared to the
lower speed cases; this trend appears to be more pronounced for the
inner-non-network stations (Part B).  Figure 5 illustrates this trend
and permits a visual comparison of the relative magnitudes of the
estimation and prediction errors to be made for the various wind speed
categories.
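
The (U,V) entries in Tables 33 and 34 appear to be the square root of the sum of the squared component entries. The sketch below checks this assumed relationship against a few tabulated values for procedure 1R. The relationship is inferred from the numbers and is not stated in this section, and the function name is illustrative.

```python
import math

def vector_measure(u, v):
    """Combine component error measures, assuming sqrt(u**2 + v**2)."""
    return math.hypot(u, v)

# (U entry, V entry, tabulated (U,V) entry) for procedure 1R, in mps.
checks = [
    (0.612, 0.707, 0.935),  # Table 33A, prevailing speed < 2 mps
    (1.221, 1.203, 1.714),  # Table 33A, prevailing speed > 6 mps
    (0.716, 1.026, 1.251),  # Table 34A, prevailing speed < 2 mps
]
for u, v, tabulated in checks:
    print(f"computed {vector_measure(u, v):.3f}, tabulated {tabulated:.3f}")
```

If the assumption holds, the vector measures carry no information beyond the two component measures; they simply summarize them on a single scale.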

-------
TABLE 34.  SUMMARY OF PREDICTION ERRORS BY PREVAILING WIND SPEED AND DIRECTION
           CATEGORIES

A. Pooled Root Mean Square Errors Over All Non-Network Stations (mps)

Wind    Modeling     Prevailing Wind Speed (mps)      Prevailing Wind Direction
Comp.   Procedure     <2     2-4     4-6     >6      E,SE      S      SW    Other
U          1R       0.716   0.834   0.927   0.968   0.958   0.758   0.876   1.009
U          2R       0.744   0.900   0.941   0.957   1.047   0.820   0.874   0.970
V          1R       1.026   1.062   1.106   1.643   0.923   1.098   1.192   1.077
V          2R       1.009   1.057   1.119   1.622   0.948   1.078   1.198   1.106
W          1R       1.032   1.062   1.081   1.499   0.985   1.082   1.131   1.096
W          2R       0.985   1.042   1.072   1.452   0.991   1.054   1.099   1.074
(U,V)      1R       1.251   1.351   1.443   1.907   1.330   1.334   1.479   1.476
(U,V)      2R       1.253   1.388   1.462   1.884   1.412   1.355   1.482   1.472

B. Pooled Root Mean Square Errors Over All Inner-Non-Network Stations (mps)

Wind    Modeling     Prevailing Wind Speed (mps)      Prevailing Wind Direction
Comp.   Procedure     <2     2-4     4-6     >6      E,SE      S      SW    Other
U          1R       0.682   0.776   0.851   0.841   0.933   0.743   0.803   0.770
U          2R       0.671   0.800   0.875   0.857   1.018   0.754   0.805   0.766
V          1R       0.710   0.821   0.983   1.585   0.747   0.823   1.043   1.099
V          2R       0.724   0.833   1.028   1.584   0.795   0.841   1.059   1.129
W          1R       0.741   0.792   0.841   1.283   0.792   0.786   0.922   0.864
W          2R       0.716   0.798   0.871   1.265   0.813   0.790   0.918   0.874
(U,V)      1R       0.985   1.130   1.300   1.795   1.195   1.109   1.316   1.343
(U,V)      2R       0.987   1.156   1.350   1.801   1.292   1.130   1.330   1.364

-------
[Figure 5.  Pooled measures of estimation and prediction errors versus
 prevailing wind speed.  Vertical axis: error (mps), 0.8 to 1.9;
 horizontal axis: prevailing wind speed (<2, 2-4, 4-6, >6 mps).  Curves:
 pooled vector residual std. dev. for procedure 1R; pooled vector
 residual std. dev. for procedure 2R; pooled vector RMSE
 (inner-non-network)*; pooled vector RMSE (non-network)*.
 * Shown for procedure 1R; a very similar curve occurs for procedure 2R.]

-------
     Percentage errors in predicting wind speeds at non-network stations
are summarized in Table 35 by prevailing wind speed categories.  These
percentage errors are shown for procedures 1 and 2 applied to data from
the RTI network and, for comparative purposes, for procedure 4 applied
to the full network data.  The percentage errors tend to decrease with
increasing wind speed for all three of these procedures.  Over the
inner-non-network, the percentage errors for procedures 1R and 2R are
about 20% greater than the corresponding percentage errors for procedure
4F.  Percentage errors for procedures 1R and 2R, over all non-network
stations, are roughly 30% larger than the percentage errors for
procedure 4F.
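
The percentage errors follow from the Table 35 heading, 100% x (RMSE(W) / mean wind speed). As a worked example, the sketch below recomputes the procedure 1R entries for all non-network stations from the pooled wind speed RMSE's in Part A of Table 34 and the mean wind speeds in Table 35. The function name is illustrative.

```python
def percentage_error(rmse_w, mean_w):
    """100 x RMSE(W) / mean wind speed, following the Table 35 heading."""
    return 100.0 * rmse_w / mean_w

# Procedure 1R over all non-network stations:
# (pooled wind speed RMSE from Table 34A, mean wind speed from Table 35), in mps.
cases = {
    "<2 mps":  (1.032, 2.129),
    "2-4 mps": (1.062, 3.622),
    "4-6 mps": (1.081, 5.036),
    ">6 mps":  (1.499, 6.896),
}
for category, (rmse_w, mean_w) in cases.items():
    print(f"{category}: {percentage_error(rmse_w, mean_w):.1f}%")
```

The computed values reproduce the tabulated 48.5, 29.3, 21.5, and 21.7 percent. The decline with wind speed reflects the mean wind speed in the denominator growing faster than the RMSE in the numerator.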




     In order to further demonstrate the performance of the modeling
procedures in specific cases, three particular cases were selected for
detailed examination:

                                     Prevailing Winds
     Case     Date       Time      Speed       Direction
     I        8/12/75    1800     2.56 mps       158°
     II       2/20/76    1600     4.83 mps       136°
     III      2/21/76    1700     5.59 mps       274°

It should be emphasized that these cases were picked arbitrarily.  They
do not necessarily reflect "typical" cases from among the 908 cases;
Case I, for example, was purposely chosen as a worst case situation for
modeling procedure 2, in that the vector RMSE over all non-network
stations for this case was much larger than for any other case.

     The prediction models determined by procedures 1 and 2 for these
cases are given in Table 36.  These particular models illustrate the
typical pattern of larger, more complex models for modeling procedure 2,
as compared to procedure 1.  It should also be noted that in two of the

-------
TABLE 35.  PERCENTAGE ERRORS IN WIND SPEED PREDICTIONS AT NON-NETWORK
           STATIONS, BY PREVAILING WIND SPEED CATEGORIES

                                               100% x (RMSE(W)/W-bar)
               Prevailing     Mean Wind        For Modeling Procedure:
Subset of      Wind Speed     Speed (mps)
Stations       Category       (W-bar)           1R       2R       4F
Inner-Non-     <2 mps          1.882           39.
Network        2-4 mps         3.344           23.
Stations       4-6 mps         4.732           17.8
               >6 mps          6.383           20.1

               Overall         3.593           23.1              19.1

All Non-       <2 mps          2.129           48.5     46.3     35.2
Network        2-4 mps         3.622           29.3     28.8     21.6
Stations       4-6 mps         5.036           21.5     21.3     16.5
               >6 mps          6.896           21.7     21.1     17.0

               Overall         3.816           28.5     27.8     21.3

-------
                 TABLE 36.  PREDICTION MODELS FOR THREE SPECIFIC CASES

        Modeling
Case    Procedure    Prediction Model Based on Data From RTI Network

I           1        U = -0.30969 - 0.00127536xy^2
                     V = 3.23072 - 0.00027154y^3 + 0.00154632x^2y

            2        U = 0.95191 - 0.170683x + 0.0095480x^2 - 0.0454584xy + 0.00445697x^2y
                         - 0.00176493xy^2 - 0.038375h
                     V = 3.26253 + 0.00111345x^2y + 0.00103673xy^2 - 0.000027405y^4

II          1        U = -3.81314 - 0.085727x
                     V = 4.76958 - 0.0118349y^2 - 0.00026648x^3 + 0.00232374xy^2

            2        U = -3.60660 - 0.195394x + 0.00045859x^3
                     V = 4.76958 - 0.0118349y^2 - 0.00026648x^3 + 0.00232374xy^2

III         1        U = 5.32123 + 0.00113637xy^2
                     V = 0.09533 - 0.000014250x^4

            2        U = 5.84047 + 0.162089x + 0.0043193xy - 0.00056576x^3 - 0.0174165h
                     V = 0.09533 - 0.000014250x^4

-------
cases (Cases II and III) the same model was selected for the V-component
by both procedures.
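
To show how such a model turns into a predicted wind, the sketch below evaluates the Case I, procedure 1 model of Table 36 at a point and converts the components to a speed and direction. The coordinate units and origin are not restated in this section, so the evaluation points are hypothetical. The direction conversion assumes that U is positive toward the east, V is positive toward the north, and direction is the bearing from which the wind blows.

```python
import math

def case1_procedure1(x, y):
    """U and V (mps) from the Case I, procedure 1 model in Table 36."""
    u = -0.30969 - 0.00127536 * x * y ** 2
    v = 3.23072 - 0.00027154 * y ** 3 + 0.00154632 * x ** 2 * y
    return u, v

def speed_and_direction(u, v):
    """Wind speed and the bearing (degrees) the wind blows from.
    Assumes U is eastward-positive and V is northward-positive."""
    speed = math.hypot(u, v)
    direction = math.degrees(math.atan2(-u, -v)) % 360.0
    return speed, direction

# Hypothetical evaluation points; at the origin only the constants remain.
for x, y in [(0.0, 0.0), (5.0, 5.0)]:
    u, v = case1_procedure1(x, y)
    speed, direction = speed_and_direction(u, v)
    print(f"(x, y) = ({x}, {y}): U = {u:.3f}, V = {v:.3f}, "
          f"speed = {speed:.2f} mps, from {direction:.0f} deg")
```

Under these assumptions the constant terms alone give a light wind from slightly east of south, which is at least consistent with the prevailing direction of 158° reported for Case I.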


     Table 37 summarizes the fit of the models over the RTI network
stations (i.e., over the set of stations actually used for determining
the model form and parameter estimates).  Modeling procedure 2 generally
accounts for more of the variation in winds among these stations (i.e.,
larger R² values).  That is, the predicted surfaces from procedure 2
will typically have more hills, valleys, ridges, etc. than those from
procedure 1, and therefore, if these are "real" (e.g., as demonstrated
by comparing predicted values with observed data from the non-network
stations), it would be the preferred procedure.  On the other hand,
because procedure 2 yields more complex polynomials, it is more likely
to produce spurious hills, valleys, ridges, etc. in the wind field over
those areas not in the vicinity of one or more RTI network stations, for
instance, in the outlying areas of the region.  This is well illustrated
by Case I, in which the RMSE's over the inner-non-network are quite
comparable for the two procedures (see Table 38, Part B), whereas the
RMSE for procedure 2 over the entire non-network is extremely large
relative to the corresponding RMSE from procedure 1 (see Table 38,
Part A).  The large deviation between observed and predicted winds at
station STL007 accounts for this discrepancy; it should be noted that wind
data were not available for any RTI network station near the STL007
site.

     The observed data for Case I, as shown in Figure 6(A), indicate
that the wind flow is generally out of the south-southeast with wind
speeds across the city ranging from about 1 to 6 mps and averaging about
3.3 mps.  The flow pattern suggests the influence of a heat island

-------
                 TABLE 37.  ANALYSIS OF VARIANCE RESULTS FOR THREE SPECIFIC CASES

       No. Stations                       No. of        Sums of Squares (mps)
          in RTI     Modeling    Wind    Terms in                                    Residual
Case     Network     Procedure   Comp.    Model      Total   Regression  Residual   Variance*    R²    F Value+
I           16           1        U         2       26.5793     6.8707    19.7087     1.4078    0.258     4.881
                                  V         3       18.2802     8.5297     9.7504     0.7500    0.467     5.686
                         2        U         7       26.5793    21.5190     5.0603     0.5623    0.810     6.379
                                  V         4       18.2802    11.2206     7.0596     0.5883    0.614     6.358
II          18           1        U         2       47.8158    11.6919    36.1238     2.2577    0.245     5.179
                                  V         4       33.4045    24.5170     8.8875     0.6348    0.734    12.873
                         2        U         3       47.8158    17.3583    30.4575     2.0305    0.363     4.274
                                  V         4       33.4045    24.5170     8.8875     0.6348    0.734    12.873
III         17           1        U         2       20.2199     7.6288    12.5911     0.8394    0.377     9.088
                                  V         2       19.8347     6.6933    13.1415     0.8761    0.337     7.640
                         2        U         5       20.2199    15.9330     4.2869     0.3572    0.784    11.150
                                  V         2       19.8347     6.6933    13.1415     0.8761    0.337     7.640

        *  The residual variance is calculated by dividing the residual sum of squares by the number of residual degrees of
           freedom.  This degrees of freedom is the number of stations minus the number of model terms.

        +  The F-value is calculated as the ratio of the regression mean square to the residual variance.  The degrees of
           freedom for the regression mean square is one less than the number of model terms.
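
The two footnotes above can be turned into a short calculation. The sketch below recomputes the residual variance and F value for the Case I, procedure 1, U-component row of Table 37. It also computes R² as the regression sum of squares divided by the total, which is the usual definition and is assumed here rather than stated in the footnotes. The function name is illustrative.

```python
def anova_summary(n_stations, n_terms, ss_regression, ss_residual):
    """Residual variance and F value per the Table 37 footnotes,
    plus R^2 under the assumed definition SS_regression / SS_total."""
    ss_total = ss_regression + ss_residual
    residual_variance = ss_residual / (n_stations - n_terms)
    regression_mean_square = ss_regression / (n_terms - 1)
    return {
        "residual_variance": residual_variance,
        "r_squared": ss_regression / ss_total,
        "f_value": regression_mean_square / residual_variance,
    }

# Case I, procedure 1, U-component: 16 stations, 2 model terms.
result = anova_summary(16, 2, 6.8707, 19.7087)
print({name: round(value, 3) for name, value in result.items()})
```

The results agree with the tabulated 1.4078, 0.258, and 4.881 to the precision shown. The same check applied to other rows is a quick way to spot transcription errors in the sums of squares.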

-------
A.  RMSE's Over All Non-Network Stations

Wind     Modeling              Case
Comp.    Procedure      I       II      III
U            1        1.245   1.485   1.160
U            2        5.038   1.663   0.779
V            1        1.418   1.384   0.759
V            2        0.896   1.384   0.759
W            1        1.247   1.487   1.140
W            2        3.434   1.511   0.806
(U,V)        1        1.886   2.030   1.387
(U,V)        2        5.117   2.164   1.088

B.  RMSE's Over All Inner-Non-Network Stations

Wind     Modeling              Case
Comp.    Procedure      I       II      III
U            1        1.840   1.648   0.801
U            2        1.891   1.910   0.606
V            1        0.574   1.085   0.816
V            2        0.524   1.085   0.816
W            1        1.068   1.210   0.785
W            2        0.048   1.321   0.655
(U,V)        1        1.928   1.973   1.143
(U,V)        2        1.963   2.196   1.016

-------
Figure  6.  Observed and predicted winds for case I:  (A)  observed
          data;  (B) predicted winds using procedure 1;  (C) predicted
          winds  using procedure 2

-------
circulation having strong convergence in the northwestern part of the
city.  Such flow patterns associated with heat island circulation have
been observed before in the city of St. Louis (Vukovich, Dunn et al.,
1979).




     The wind predictions for Case I were based on only 16 of the 19
stations in the RTI network.  The three non-reporting stations were RAPS
stations 102, 119, and 120.  Station 119 is located at the outer
boundary of the southwestern portion of the network and station 120, at
the outer boundary of the northwestern portion (see Figure 1).  Large
errors in the predicted wind field might be expected in these regions due
to the absence of wind data from these areas of the network.  That is, the
predictions in these areas would essentially represent extrapolation of
the polynomial models outside the range of the data; it is well known
that such extrapolation is highly error prone.
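
The sensitivity of polynomial models to extrapolation can be illustrated with one of the fitted models. The sketch below takes the Case II, procedure 2 U-component model from Table 36, which depends on x only, and shows how the cubic term grows as x increases. The x values are hypothetical, since the coordinate units and the extent of the station network are not restated in this section.

```python
def u_model(x):
    """Case II, procedure 2 U-component model from Table 36 (no y terms)."""
    return -3.60660 - 0.195394 * x + 0.00045859 * x ** 3

def cubic_term(x):
    """Contribution of the x^3 term alone."""
    return 0.00045859 * x ** 3

# Hypothetical x-coordinates; the larger ones stand in for points
# beyond the range covered by the stations used in the fit.
for x in (5.0, 10.0, 20.0, 40.0):
    print(f"x = {x:5.1f}: U = {u_model(x):8.3f}, cubic term = {cubic_term(x):7.3f}")
```

Doubling x multiplies the cubic term by eight, so a term that is a small correction inside the fitted range can dominate the prediction a short distance outside it.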




     The predicted wind fields for Case I determined by modeling
procedures 1 and 2 are shown in Figures 6(B) and 6(C), respectively.
Figure 6(B) shows a general flow pattern from the south-southeast with
wind speeds ranging from about 1.4 to 5.1 mps.  Although some convergence
in the flow downstream of the city is evident, it is not as intense as
that appearing in the observed flow field.  The predicted wind field from
procedure 2 (Figure 6(C)) also shows the general south-southeasterly
flow pattern; the predicted wind speeds range from about 1.6 to 11.7
mps.  A southwesterly wind with a speed of 11.7 mps is predicted for
STL007.  This station is in the northwestern zone of the region and thus
represents an area in which extrapolation occurs when no data are
available from EPA120.  If STL007 is excluded, the predicted wind speeds
range from 1.6 to 6.2 mps across the other stations; the predicted field
also shows the strong convergence downstream of the city that was
apparent in the observed data.  Except for the problem of extrapolation
caused by the missing data, it would therefore appear that modeling
procedure 2 performed better than procedure 1 in this case.




     The wind data for Case II (Figure 7(A)) show a general
southeasterly flow with wind speeds ranging from 1.8 to 9.4 mps.  The
average wind speed was 6.0 mps.  There is some indication of convergence
immediately downstream of the city which may be associated with the heat
island circulation; this convergence is not as significant as that found
in Case I.  There is also an apparent speed convergence over the city,
probably due to the increased friction in that region.




     The predicted wind fields were based on 18 of the 19 stations in
the RTI network.  The missing station was RTI202, which is located in
the interior of the network domain (see Figure 1).  Both of the
predicted wind fields for this case, shown in Figures 7(B) and 7(C),
appear to pick up the indicated speed convergence over the central
portion of the city.  Procedure 2 appears to indicate the convergence
downstream of the city somewhat better than procedure 1.




     The observed data for Case III (Figure 8(A)) indicate flow from the
west with wind speeds ranging from 4 to 8 mps.  The wind distribution
shows no significant distortion of the flow pattern due to the presence
of the city except for a slight decrease in wind speed over the central
portion of the city, again due to the increased friction in that region.




     The predicted wind fields for Case III were obtained by utilizing
17 of the RTI network stations.  Data were missing from RTI stations 202
and 205, which are located in the interior of the network.  The flow
pattern obtained from modeling procedure 1 (Figure 8(B)) is very similar
to the observed data, although the lower wind speeds in the central city
(relative to the surrounding regions) are not as obvious as those of
Figure 8(A).  The flow field based on procedure 2 (Figure 8(C)) is also
quite similar to the observed flow field; in this case, the lower wind
speeds over the urban region are somewhat more evident.

Figure 7.  Observed and predicted winds for case II:  (A) observed
           data;  (B) predicted winds using procedure 1;  (C) predicted
           winds using procedure 2

Figure 8.  Observed and predicted winds for case III:  (A) observed
           data;  (B) predicted winds using procedure 1;  (C) predicted
           winds using procedure 2

     Based on these three cases, it appears that modeling procedure 2
may produce predicted wind fields with general characteristics more
similar to the observed wind field than the procedure 1 predictions.
The results also indicate that missing data may lead to substantially
poorer predictions in some areas within the region, particularly when
the missing data occur at the boundaries of the network.  In such
cases, it will be necessary to redefine the network domain so as to
avoid the effects of extrapolations.

-------
                             SECTION 5
                       DISCUSSION OF RESULTS


CONCLUSIONS AND FINDINGS
     The primary conclusion of this study is that a polynomial model
derived by stepwise regression on 13 model terms and applied to the 19-
station RTI network could produce predicted wind fields for St. Louis
comparable to those produced by similar procedures applied to a larger
class of model terms and a larger network (i.e., 23 terms and 26
stations).  The 13-term model and the 19-station network were selected in
the theoretical phase of this research program (Vukovich et al.,
1978)  based on  the  argument that the addition of terms in the
model and/or stations in the network would not markedly improve the
analysis of the wind field.  This hypothesis has now been substantiated
using observed data.  The conclusion of this study is based on the
following findings:
          In terms of estimation errors (precision), the results of
     applying four stepwise regression procedures to wind data from the
     RTI network (the "optimum" network) and from the full network
     indicate that comparable results are obtained for the RTI network
     and the full network, although the estimations for the full network
     yield somewhat more consistent adjusted R² values across the
     various cases.
          The four stepwise regression procedures are clearly superior
     to both procedure 0 (fitting the full 13-term model) and procedure
     5 (fitting a flat surface).  This indicates that stepwise regres-
     sion techniques offer a practical method for automating the model
     form determination over a large number of cases; prior screening of
     the data for outliers, however, may hamper implementation of any
     automated, quick-response method for model estimation.

-------
     Among the four stepwise regression procedures, the procedure
permitting the most complex model forms (i.e., procedure 4) yields
the smallest estimation errors.
     Procedure 2, which  utilizes a class of model forms consistent
with the overall methodology, yields residual variances that are
comparable to those  of procedure 3 which utilizes a  larger class of
model forms.
     Procedure 1, which  differs from procedure 2 only in that it
uses a stepwise regression parameter of 0.1 instead  of 0.2, appears
the least favorable  of the four stepwise regression  procedures in
terms of estimation  errors.
     Pooled residual standard deviations for the individual wind
components obtained  from procedure 1 are about 0.08 mps larger than
those for procedure  4 and about 0.04 mps larger than those for
procedures 2 and 3.
     In terms of predictions at the non-RTI-network  stations
(accuracy), procedure 4  is clearly less accurate than the other
procedures (which tended to  produce simpler models than those of
procedure 4).  The mean square errors over the entire set of seven
non-network stations and over all  cases are somewhat better for
procedure 1 than for procedure 2 or 3;  over the subset of five
non-network stations in  the  interior portion of the  St. Louis
region, however, procedure 3 appears more favorable.
     Over interior non-network stations, percentage  errors for
predicting wind speeds by procedures 1 and 2 averaged 23%, when
data from the RTI network are utilized.  This compares favorably
with a corresponding error of 19% for procedure 4 applied to the
full network of stations.
     A subjective weighting  to reflect the relative  importance of
estimation errors and prediction errors was utilized to judge the
overall performance  of the various estimation procedures.  If the
prediction errors are considered the more important of the two
types, either of the two procedures consistent with  the overall
methodology (i.e., procedures 1 and 2)  or procedure  3 may be con-
sidered "best" depending upon the  particular criterion chosen
(e.g., the particular weight chosen and the particular set of
     stations and/or cases considered).  It is clear, therefore, that
     little improvement is achieved by expanding the class of models
     from the 13-term set up to the 23-term set of candidate terms.
          Magnitudes of the average and pooled root mean square errors
     for procedures 1 and 2 — across all non-network stations and all
     cases — are roughly 0.1 to 0.2 mps larger than the corresponding
     pooled standard deviations over the RTI network.
          Individual case studies, which serve to illustrate the ana-
     lysis for several wind directions and wind speeds, indicated that
     procedure 2 performed better than procedure 1, and that the tech-
     nique yielded estimates of the wind field that closely compared to
     the observed data.
          Over the range of wind directions encountered, there was
     little change in the estimation error due to wind direction.
     Unfortunately, only three major wind directions occurred with
     regularity (i.e., flow from the southeast, south, and southwest).
     The variation of the errors with wind speed was also not
     substantial, although larger absolute errors and smaller
     percentage errors tended to occur for cases with high wind speeds.
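
The precision measures referred to in these findings (adjusted R² for a single case, and residual standard deviations pooled over cases) can be illustrated with a minimal Python sketch.  The function names, the convention that `n_terms` excludes the intercept, and the sample numbers are assumptions made for illustration; they are not taken from the software used in this study.

```python
import math


def adjusted_r2(observed, fitted, n_terms):
    """Adjusted R^2 for a least-squares fit.

    n_terms is the number of fitted terms other than the intercept
    (an assumed convention for this sketch).
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    ss_resid = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    r2 = 1.0 - ss_resid / ss_total
    # Penalize R^2 for the number of terms in the model.
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_terms - 1)


def pooled_residual_sd(cases):
    """Pooled residual standard deviation over several cases.

    Each case is a pair (residual sum of squares, residual degrees of
    freedom); the pooled variance is total SS over total d.f.
    """
    total_ss = sum(ss for ss, _ in cases)
    total_df = sum(df for _, df in cases)
    return math.sqrt(total_ss / total_df)


# Hypothetical numbers, for illustration only.
example_observed = [1.0, 2.0, 3.0, 4.0, 5.0]
example_fitted = [1.1, 1.9, 3.2, 3.8, 5.0]
print(adjusted_r2(example_observed, example_fitted, n_terms=1))
print(pooled_residual_sd([(2.0, 8), (6.0, 8)]))
```

Pooling by degrees of freedom, rather than averaging the per-case standard deviations, gives cases with more reporting stations proportionally more weight.
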
     It was not possible to determine whether the 19-station RTI network
was "the" optimum network for the city of St. Louis because there were
not sufficient data or a sufficient number of auxiliary stations to test
the 19-station network against all other possible networks.  Furthermore,
the theoretical phase showed that the network chosen for St. Louis
is likely to be near-optimal only for the procedures used; had another
procedure been used, a different network would likely have been
selected.  The reliability of the wind field analysis will also
depend on the results of the prediction of the air pollution analysis
model, since the wind field is an input parameter to that model.  How-
ever, the results of this study have, in our opinion, demonstrated that
the methodology can be used to determine the locations of a reasonable
number of stations from which wind data can yield reasonable wind field
estimates over the domain of the network.

ANALYTICAL LIMITATIONS

     The data available for demonstrating the wind field estimation
procedures and for evaluating the sampling network have several
limitations.  Because of economic constraints, the data span a limited
period (a total of 33 days).  Even within this period, large amounts of
data were unreported, invalid, or unusable.  For example, only 908 of a
potential 1584 cases were usable; in terms of individual observations,
less than 50% of the potential number were available.  In addition, for
13 of the 16 RAPS stations, the winds at 10 m above ground had to be
estimated from winds observed at 30 m above ground level.  The wind
fields associated with the winter field program were atypical for that
time of year, according to a statistical analysis of wind data obtained
from the National Climatic Center for the synoptic weather station
(Lambert Field); the winter regime is generally characterized by
northwesterly winds but, during the winter data collection period,
southwesterly winds predominated.

     The model/network evaluations were also limited by several
practical constraints.  First, only seven stations not in the RTI
network furnished data for which comparisons between observed and
predicted winds could be made.  Second, comparisons involved in the
evaluation typically had to be made in terms of absolute measures, such
as root mean squared errors, rather than relative measures, since the
"best" model was unknown and since only a relatively small number of
potential network designs could be judged (i.e., those which were
subsets of the full network).  Furthermore, errors which arose from
deficiencies in the network could not be isolated from those that were
caused by other sources (e.g., measurement errors, model deficiencies,
etc.).

     The findings outlined above provide, in our opinion, an accurate
assessment of the major results of this study; they are obviously made
within the context of the limitations described above.

REMARKS

     The evaluation of the RTI network was based on data obtained from
the U.S. Environmental Protection Agency's Regional Air Pollution Study,
the St. Louis City/County Air Pollution Network, and three stations set
up by the Research Triangle Institute.  Overall, there were 26 stations
utilized, including the 19-station "optimum" network.  Though the
economic burden to obtain these data was significant, the data were not
sufficient to make a complete evaluation of the network.

     In the application of this technique to other cities, an
evaluation of the network will certainly be necessary.  It is not
feasible for future evaluations to bear the same economic burden as the
present evaluation.  Nevertheless, after establishing the optimum
network, a period should be set aside in which data are collected at the
network stations and at locations not in the network.  Non-network data
can be collected by a mobile van during periods when the wind is in
quasi-steady state.  Case studies should be examined in which wind
speeds and wind directions differ from case to case.  The results of the
case study analyses will yield estimates of the reliability of the
network.  This technique should also be applied in evaluation of the air
pollution distribution obtained from the objective variational analysis
model.
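
A reliability check of the kind described above amounts to comparing predicted and observed winds at locations outside the network.  The following minimal Python sketch shows one way this might be done, using a root mean square error for a wind component and a mean percentage error for wind speed.  The function names, the sample values, and the particular definition of percentage error (absolute speed difference relative to the observed speed) are assumptions for illustration and may differ from the definitions used in this report.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between paired observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mean_percent_speed_error(obs_uv, pred_uv):
    """Mean absolute wind-speed error as a percentage of observed speed.

    obs_uv and pred_uv are lists of (u, v) component pairs in mps, one
    pair per non-network station.  Observed speeds must be nonzero.
    """
    errors = []
    for (uo, vo), (up, vp) in zip(obs_uv, pred_uv):
        speed_obs = math.hypot(uo, vo)
        speed_pred = math.hypot(up, vp)
        errors.append(abs(speed_pred - speed_obs) / speed_obs * 100.0)
    return sum(errors) / len(errors)


# Hypothetical observations at two non-network stations (u, v in mps).
observed_uv = [(3.0, 4.0), (0.0, 2.0)]
predicted_uv = [(3.3, 4.4), (0.0, 1.8)]

u_rmse = rmse([u for u, _ in observed_uv], [u for u, _ in predicted_uv])
print(u_rmse, mean_percent_speed_error(observed_uv, predicted_uv))
```

Reporting both measures is useful because, as noted in the findings, absolute errors and percentage errors can move in opposite directions as wind speed increases.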

     The 13-term class of model forms was used in the evaluation of the
wind field in order to test and validate the methodology developed in
the theoretical phase of this research project.  Now that this aspect is
complete and the results are positive, other surface fitting procedures
for estimating the wind field should be investigated.  For example, one
procedure which would avoid the extrapolation problems of the polynomial
models is gravitational-weighted (inverse of distance squared)
interpolation.  This approach uses only those data points close to the
grid point for which a wind prediction is being made; generally, this
allows extrapolation into locations a small distance outside the domain
of the network without large error.
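
The inverse-distance-squared interpolation suggested above can be sketched in a few lines of Python.  This is a minimal illustration, not a procedure that was used in this study: the function name, the 10-km search radius, the choice to interpolate the u and v components separately, and the station values are all assumptions.

```python
import math


def idw_wind(grid_xy, stations, radius_km=10.0):
    """Inverse-distance-squared estimate of (u, v) at one grid point.

    stations is a list of (x_km, y_km, u, v).  Only stations within
    radius_km of the grid point contribute.  Returns None when no
    station lies within the search radius.
    """
    gx, gy = grid_xy
    weight_sum = u_sum = v_sum = 0.0
    for x, y, u, v in stations:
        d = math.hypot(x - gx, y - gy)
        if d == 0.0:
            return (u, v)  # grid point coincides with a station
        if d > radius_km:
            continue       # ignore distant stations
        w = 1.0 / d ** 2   # weight falls off as distance squared
        weight_sum += w
        u_sum += w * u
        v_sum += w * v
    if weight_sum == 0.0:
        return None
    return (u_sum / weight_sum, v_sum / weight_sum)


# Two hypothetical stations 2 km apart; winds in mps.
stations = [(0.0, 0.0, 2.0, 0.0), (2.0, 0.0, 4.0, 2.0)]
print(idw_wind((1.0, 0.0), stations))  # midpoint: equal weights
print(idw_wind((0.5, 0.0), stations))  # nearer the first station
```

Because each estimate is a weighted average of nearby observations, it cannot exceed the range of those observations, which is why this approach avoids the runaway values a polynomial surface can produce outside the network.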




     The wind analysis is an input parameter to the objective
variational analysis model (OVAM) to be used to derive the air pollution
distribution.  The evaluation of that model will take place in the next
phase and will utilize wind field predictions at selected grid points.
Figure 9 provides an illustration (Case II, Procedure 2 in the last
subsection of Section 4) of the predicted wind field as it would be used
in the OVAM.  The predicted winds at each grid point in the 20-km x
20-km area would be utilized as inputs.  A grid spacing of 2 km is
utilized in this figure.
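
As a rough illustration of how gridded wind inputs of this kind might be assembled, the Python sketch below builds a 2-km grid over a 20-km x 20-km area and evaluates a wind model at each grid point.  The function names, the grid origin, the inclusion of both boundaries (giving 11 x 11 = 121 points), and the stand-in `example_model` are assumptions; the stand-in is not one of the fitted models of this report.

```python
def make_grid(extent_km=20.0, spacing_km=2.0, origin=(0.0, 0.0)):
    """Return (x, y) grid points covering a square area, boundaries included."""
    n = int(round(extent_km / spacing_km)) + 1
    return [
        (origin[0] + i * spacing_km, origin[1] + j * spacing_km)
        for j in range(n)
        for i in range(n)
    ]


def predict_on_grid(grid, wind_model):
    """Evaluate wind_model(x, y) -> (u, v) at every grid point.

    Returns a list of (x, y, u, v) tuples.
    """
    return [(x, y) + tuple(wind_model(x, y)) for x, y in grid]


def example_model(x, y):
    """Hypothetical smooth wind field in mps, for illustration only."""
    return (2.0 + 0.1 * x, 1.0 - 0.05 * y)


grid = make_grid()
field = predict_on_grid(grid, example_model)
print(len(field), field[0], field[-1])
```

Any callable that returns a (u, v) pair for a location can be passed as `wind_model`, so a fitted regression surface or an interpolation routine could be substituted without changing the gridding code.
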
Figure 9.  Distribution of predicted  winds  on a  2-km  by  2-km  grid
           for case II using procedure 2
                            REFERENCES
Barr, A.J., J.H. Goodnight, J.P. Sall, and J.T. Helwig, 1976:  A User's
Guide to SAS 76, SAS Institute, Sparks Press, Raleigh, N.C.

Draper, N.R. and H. Smith, 1966:  Applied Regression Analysis, John
Wiley and Sons, New York.

Estoque, M.A. and C.M. Bhumralkar, 1969:  "Flow Over a Localized Heat
Source", Monthly Weather Review, 97, 850-859.

International Mathematical and Statistical Libraries, 1975:  IMSL
Library Reference Manual.

Lettau, H., 1969:  "Note on Aerodynamic Roughness-Parameter Estimation
on the Basis of Roughness-Element Description", J. Appl. Meteor., 8,
822-832.

Vukovich, F.M., J.W. Dunn, and B.W. Crissman, 1976:  "A Theoretical
Study of the St. Louis Heat Island:  The Wind and Temperature
Distribution", J. Appl. Meteor., 15, 417-440.

Vukovich, F.M., W.D. Bach, Jr., and C.A. Clayton, 1978:  "Optimum
Meteorological and Air Pollution Sampling Network Selection in Cities,
Volume I:  Theory and Design for St. Louis", Environmental Monitoring
Series, EPA-600/4-78-030.

Vukovich, F.M., J.W. Dunn, and W.J. King, 1979:  "Observations and
Simulations of Diurnal Variation of the Urban Heat Island Circulation
and Associated Ozone Variations:  A Case Study".  Submitted to J. Appl.
Meteor.
TECHNICAL REPORT DATA
1. REPORT NO.
EPA-600/4-79-069
3. RECIPIENT'S ACCESSION-NO.
4. TITLE AND SUBTITLE
OPTIMUM METEOROLOGICAL AND AIR POLLUTION NETWORK
SELECTION IN CITIES: Volume II - Evaluation of Wind
Field Predictions for St. Louis
5. REPORT DATE
October 1979
6. PERFORMING ORGANIZATION CODE
7. AUTHOR(S)
Fred M. Vukovich and C. Andrew Clayton
8. PERFORMING ORGANIZATION REPORT NO,
9. PERFORMING ORGANIZATION NAME AND ADDRESS
Research Triangle Institute
P.O. Box 12094
Research Triangle Park, North Carolina 27709
10. PROGRAM ELEMENT NO.
1HE775
11. CONTRACT/GRANT NO.
63-03-2187
12. SPONSORING AGENCY NAME AND ADDRESS
U.S. Environmental Protection Agency—Las Vegas, NV
Office of Research and Development
Environmental Monitoring and Support Laboratory
Las Vegas, NV 89114
13. TYPE OF REPORT AND PERIOD COVERED
period ending Feb. 1979
14. SPONSORING AGENCY CODE
EPA/600/07
15. SUPPLEMENTARY NOTES
This report is the second in a series on this topic (see EPA-600/4-78-030).
For further information contact J.L. McElroy, Project Officer (702)736-2969, X241, Las Veg,
16. ABSTRACT
This report is the second in a series on the development of a method for
designing optimum meteorological and air pollution sampling networks and its
application to St. Louis, Missouri (see EPA-600/4-78-030). It involves the
evaluation of the wind field network and utilizes wind data collected during
special summer and winter field programs.
The evaluation considers the precision and accuracy of the procedure used for
estimating the wind field. The basic procedure for determining the wind field
involves applying stepwise regression to a class of linear statistical models
involving subsets of 13 specific terms and data from a 19-station network
determined during the theoretical phase of the study. The evaluation includes
the selection of a larger class of model forms and a basic set of 23 terms to
compare with the 13-term class, and includes estimations based on data from
all reporting stations, up to a total of 26 stations.
The results demonstrate that application of 13-term modeling procedures to wind
data from the 19-station network can produce predicted wind fields comparable
to those produced by similar but more general procedures applied to a larger
(26-station) network, and that the method can objectively provide a reasonable
estimate of the wind field over the domain of the network. An exhaustive
evaluation was not feasible, largely because of numerous analytical and data
limitations.
17. KEY WORDS AND DOCUMENT ANALYSIS
a. DESCRIPTORS
b. IDENTIFIERS/OPEN ENDED TERMS
c. COSATI Field/Group
mathematical models
wind field
air pollution
meteorology
sampling network
St. Louis, Missouri
43F
55C
68A
72E
18. DISTRIBUTION STATEMENT
RELEASE TO THE PUBLIC
19. SECURITY CLASS (This Report)
UNCLASSIFIED
21. NO. OF PAGES
114
20. SECURITY CLASS (This page)
UNCLASSIFIED
22. PRICE
EPA Form 2220-1 (9-73)
U.S. GOVERNMENT PRINTING OFFICE 683-091/2209

-------