EPA
United States
Environmental Protection
Agency
Environmental Monitoring
Systems Laboratory
PO Box 15027
Las Vegas NV 89114
EPA-600/4-79-069
October 1979
Research and Development
Optimum Meteorological
and Air Pollution
Sampling Network
Selection in Cities:
Volume II - Evaluation
of Wind Field Predictions
for St. Louis
-------
RESEARCH REPORTING SERIES
Research reports of the Office of Research and Development, U.S. Environmental
Protection Agency, have been grouped into nine series. These nine broad categories
were established to facilitate further development and application of environmental
technology. Elimination of traditional grouping was consciously planned to foster
technology transfer and a maximum interface in related fields. The nine series are:
1. Environmental Health Effects Research
2. Environmental Protection Technology
3. Ecological Research
4. Environmental Monitoring
5. Socioeconomic Environmental Studies
6. Scientific and Technical Assessment Reports (STAR)
7. Interagency Energy-Environment Research and Development
8. "Special" Reports
9. Miscellaneous Reports
This report has been assigned to the ENVIRONMENTAL MONITORING series. This series
describes research conducted to develop new or improved methods and instrumentation
for the identification and quantification of environmental pollutants at the lowest
conceivably significant concentrations. It also includes studies to determine the ambient
concentrations of pollutants in the environment and/or the variance of pollutants as a
function of time or meteorological factors.
This document is available to the public through the National Technical Information
Service, Springfield, Virginia 22161.
-------
EPA-600/4-79-069
October 1979
OPTIMUM METEOROLOGICAL AND AIR POLLUTION
SAMPLING NETWORK SELECTION IN CITIES
Volume II: Evaluation of Wind Field Predictions for St. Louis
Fred M. Vukovich and C. Andrew Clayton
Research Triangle Institute
P. O. Box 12194
Research Triangle Park,
North Carolina 27709
Contract No. 68-03-2187
Project Officer
James L. McElroy
Monitoring Systems Research and Development Division
Environmental Monitoring Systems Laboratory
Las Vegas, Nevada 89114
ENVIRONMENTAL MONITORING SYSTEMS LABORATORY
OFFICE OF RESEARCH AND DEVELOPMENT
U. S. ENVIRONMENTAL PROTECTION AGENCY
LAS VEGAS, NEVADA 89114
-------
DISCLAIMER
This report has been reviewed by the Environmental Monitoring Systems
Laboratory-Las Vegas, U.S. Environmental Protection Agency, and
approved for publication. Approval does not signify that the contents
necessarily reflect the views and policies of the U.S. Environmental
Protection Agency, nor does mention of trade names of commercial products
constitute endorsement or recommendation for use.
ii
-------
FOREWORD
Protection of the environment requires effective regulatory actions
that are based on sound technical and scientific data. This information
must include the quantitative description and linking of pollutant sources,
transport mechanisms, interactions, and resulting effects on man and his
environment. Because of the complexities involved, assessment of specific
pollutants in the environment requires a total systems approach that tran-
scends the media of air, water, and land. The Environmental Monitoring
Systems Laboratory-Las Vegas contributes to the formation and enhancement of
a sound monitoring data base for exposure assessment through programs designed
to:
* develop and optimize systems and strategies for moni-
toring pollutants and their impact on the environment
* demonstrate new monitoring systems and technologies by
applying them to fulfill special monitoring needs of
the Agency's operating programs
This report is the second in a series (see EPA-600/4-78-030) on a method
for designing meteorological and air quality monitoring networks and the
application of the method to the metropolitan St. Louis area. It is concerned
with the evaluation of the meteorological (wind field) network selected for
St. Louis. Regional or local agencies may find this method useful in plan-
ning new or adjusting existing aerometric monitoring networks. The Monitoring
Systems Design and Analysis Staff may be contacted for further information
on the topic.
George B. Morgan
Director
Environmental Monitoring Systems Laboratory
Las Vegas
iii
-------
PREFACE
This document is the second in a series on the development of a method-
ology for designing optimum meteorological and air quality monitoring networks
and the application of the methodology to the metropolitan St. Louis area.
It deals with the evaluation of the meteorological (wind field) network. The
first document (EPA-600/4-78-030) considered the theoretical aspects of the
methodology and the network(s) established for St. Louis. Subsequent reports
will be concerned with verification of the methodology with regard to the
air quality.
James L. McElroy
Project Officer
Environmental Monitoring Systems Laboratory
Las Vegas
iv
-------
SUMMARY
This report is the second in a series treating a method for develop-
ing optimum meteorological and air pollution networks and the application
of the methodology for St. Louis (EPA-600/4-78-030 describes the method
and the network for St. Louis). This particular report deals with the
evaluation of the wind field determined from the optimum network. For
this purpose, wind data obtained through summer (August 1975) and win-
ter (February-March 1976) field programs were reduced and validated.
The basic objective of the evaluation was to determine the precision and
accuracy of the procedures used for estimating the wind field. The
procedures for determining the wind field involved applying stepwise
regression to a class of statistical models and data from a 19-station
network; the network (the optimum network) and the class of models
(linear statistical models involving subsets of a specific set of 13
terms) were determined during the theoretical phase of the study.
Evaluation included the selection of a large class of model forms to
compare with the 13-term class. For this purpose, a basic set of 23
terms which were dictated by the results of the theoretical phase of the
study was chosen. The evaluation also included estimations based on
data from all reporting stations, up to a total of 26 stations.
The principal conclusion of this study was that application of
stepwise regression to the 13-term model together with wind data from
the 19-station optimum network produced predicted wind fields comparable
to those obtained by more general procedures (the 23-term model) applied
to a larger network (at most 26 stations). This substantiated through
observed data the results of the theoretical analysis conducted in the
above-mentioned report.
v
-------
An exhaustive evaluation was not feasible largely due to numerous
analytical and data limitations. Less than 50% of the total data collec-
ted could be used for the analysis due to unreported and invalid data.
The wind data associated with the winter field program was atypical for
that period in that the period was characterized by southwesterly winds.
According to available statistical information, the winter period in the
St. Louis region is normally characterized by northwesterly winds.
Furthermore, relative measures had to be utilized in the evaluation
since the best model was unknown, and since only a small number of
additional (i.e., non-network) wind monitoring stations were available
in St. Louis. Also, errors which arose from network deficiencies could
not be isolated from errors arising from other sources (e.g., model
deficiencies, measurement errors).
vi
-------
CONTENTS
Page
FOREWORD iii
PREFACE iv
SUMMARY v
LIST OF FIGURES viii
LIST OF TABLES ix
LIST OF SYMBOLS xii
ACKNOWLEDGMENTS xvi
1. INTRODUCTION 1
Overview of the Proposed Methodology 2
Summary of Previous Results 2
Objectives and Scope of the Current Research 5
Objectives and Scope of Remaining Research 5
2. SUMMARY OF AVAILABLE WIND DATA 7
Description of Raw Data 7
Data Editing 12
3. EVALUATION TECHNIQUES 21
Selection of Modeling Procedures 23
Criteria for Evaluating Modeling Procedures 29
Criteria for Evaluating the RTI Network 34
4. EVALUATION RESULTS 38
Summary of Specific Models Selected by Alternative
Approaches 39
Comparison of Alternative Modeling Procedures Using
Wind Data From All Stations 45
Comparison of Alternative Modeling Procedures Using
Wind Data From Stations in the RTI Network 53
Accuracy of Predicted Wind Fields 58
Combined Evaluative Measures 65
Selected Cases and Conditions 68
5. DISCUSSION OF RESULTS 90
Conclusion and Findings 90
Analytical Limitations 93
Remarks 94
REFERENCES 97
vii
-------
LIST OF FIGURES
Number Page
1 Location of Stations in the RTI Network and
Other Non-Network Stations Used in the Eval-
uation 8
2 Pooled Vector RMSE's For Individual Stations
by Modeling Procedure 61
3 Plot of $f_a(j)$ versus a, for Five Modeling
Procedures (j) 67
4 Plot of $g_a(j)$ versus a, for Five Modeling
Procedures (j) 67
5 Pooled Measures of Estimation and Prediction
Errors Versus Prevailing Wind Speed 77
6 Observed and Predicted Winds for Case I: (A)
Observed Data; (B) Predicted Winds Using Proce-
dure 1; (C) Predicted Winds Using Procedure 2 84
7 Observed and Predicted Winds for Case II: (A)
Observed Data; (B) Predicted Winds Using Proce-
dure 1; (C) Predicted Winds Using Procedure 2 87
8 Observed and Predicted Winds for Case III: (A)
Observed Data; (B) Predicted Winds Using Proce-
dure 1; (C) Predicted Winds Using Procedure 2 88
9 Distribution of Predicted Winds on a 2 km by
2 km Grid for Case II Using Procedure 2 95
viii
-------
LIST OF TABLES
Number Page
1 Geographic Locations and Terrain Elevations for
Stations in the RTI Network 4
2 Geographic Locations and Terrain Elevations for
Stations Not in the RTI Network 7
3 Coefficients Used for Determining Frictional
Velocity Components 11
4 Mean Roughness Lengths ($T_o$), by Station 12
5 Distribution of Cases, by Date and Time-of-Day
and by Date and Number of RTI Network Stations
Reporting 15
6 Number of Cases for Which Valid Wind Data Are
Reported, by Station 16
7 Number of Available and Potential Observations
by Network and Season 17
8 Distribution of Cases, by Season and Prevailing
Wind Conditions 18
9 Summary Statistics, by Station for Observed Wind
Data Over All Cases 20
10 Summary of Modeling Procedures 29
11 Distribution of Cases by Model Size, For Four
Modeling Procedures Applied to Wind Component
Data From Stations in the RTI Network and the
Full Network 40
12 Pairwise Comparisons of Modeling Procedures in
Terms of Model Sizes and Model Forms 42
13 Percentage of 908 Cases in Which Network/
Modeling Procedures Resulted in the Same Model
Form 43
14 Number of Cases For Which Specific Model Terms
Are Selected, By Wind Component, Modeling Pro-
cedure, and Network 44
15 Summary of Analysis of Variance Results Based
on Estimations From the Full Network 46
ix
-------
LIST OF TABLES (cont'd)
16 Values of Pooled Evaluative Criteria by Season,
Wind Component, and Modeling Procedure Based on
the Full Network Estimations 47
17 Distributions of Residual Standard Deviations
Over the 908 Cases For Four Modeling Procedures
Applied to Data From All Stations 49
18 Distributions of Adjusted $R^2$ Statistics Over the
908 Cases For Four Modeling Procedures Applied
To Data From All Stations 50
19 Percentage Frequency Distributions of Residuals
by Season, Wind Component, and Modeling Procedures
(Over All Cases and All Stations) Based on Full
Network Estimations 51
20 Percentage Frequency Distributions of Residuals
by Wind Component and Modeling Procedure (Over All
Cases and All Stations) Based on Full Network
Estimations 52
21 Summary of Analysis of Variance Results Based
on Estimations From the RTI Network 54
22 Values of Pooled Evaluative Criteria by Season,
Wind Component, and Modeling Procedure Based on
RTI Network Estimations 55
23 Distributions of Residual Standard Deviations
Over the 908 Cases For Four Modeling Procedures
Applied to Data From Stations in the RTI Net-
work 56
24 Distributions of Adjusted $R^2$ Statistics Over the
908 Cases For Four Modeling Procedures Applied
to Data From Stations in the RTI Network 57
25 Means of Deviations Between Observed and Pre-
dicted Values at Non-Network Stations, by Wind
Component and Modeling Procedure, Based on
Estimations From RTI Network Data 59
26 Means of Deviations Between Observed and Pre-
dicted Values at Non-Network Stations, by Wind
Component and Modeling Procedure, Based on
Estimations From Full Network Data 59
27 Root Mean Square Errors (mps) For Each Non-
Network Station Based on Estimations From the
RTI Network, By Wind Component, Season, and
Modeling Procedure 60
x
-------
LIST OF TABLES (cont'd)
28 Characterization of the Distributions Over the
908 Cases of RMSE's Across All Non-Network Sta-
tions, Based on Estimations From RTI Network
Data 63
29 Characterization of the Distributions Over the
908 Cases of RMSE's Across Stations in the Inner-
Non-Network, Based on Estimations from RTI Net-
work Data 64
30 Percentage Frequency Distributions of Devia-
tions Between Observed and Predicted Wind Com-
ponents at Non-RTI Network Stations, Estima-
tions Based on Data From RTI Network Stations 69
31 Percentage Frequency Distribution of Devia-
tions Between Observed and Predicted Values
Based on Estimations From RTI Network Data 72
32 Sample Sizes, by Prevailing Wind Speed and
Direction Categories 73
33 Summary of Estimation Errors By Prevailing
Wind Speed and Direction Categories 74
34 Summary of Prediction Errors By Prevailing
Wind Speed and Direction Categories 76
35 Percentage Errors in Wind Speed Predictions
at Non-Network Stations, by Prevailing Wind
Speed Categories 79
36 Prediction Models for Three Specific Cases 80
37 Analysis of Variance Results for Three Specific
Cases 82
38 Root Mean Square Errors (mps) For Three Specific
Cases 83
-------
LIST OF SYMBOLS
U west-east wind component in meters per second (mps)
V south-north wind component in meters per second (mps)
W wind speed (mps)
(U,V) wind vector with components U and V
x west-east geographic coordinate relative to a given
origin, in kilometers (km)
y south-north geographic coordinate relative to a
given origin (km)
h or h(x,y) terrain elevation in meters (m) at the point (x,y)
relative to a fixed base plane
k wind component index (k = 1 for U-component, k = 2
for V-component)
i station index
t time index
j index that identifies modeling procedures
$(x_i, y_i)$ geographic coordinates of the ith station
$(U_m, V_m)$ observed wind components at m minutes away from a
nominal time point (at 10 or 30 m above ground
level)
$(U_o, V_o)$ 20-minute average of wind components, at 10 or 30 m
above ground level
(U',V') 20-minute average of wind components at 10 m
above ground level, as estimated from observations
at 30 m above ground level
(U*,V*) west-east, south-north components of the friction
velocity
e elevation above ground level (m)
$T_o$ mean roughness length (m)
$Z_{kt}$ or $Z_{kt}(x,y)$ 20-minute average for wind component k at 10 m
above ground level, at time t and at the point (x,y)
xii
-------
LIST OF SYMBOLS (cont'd)
$Z_{kt}(i)$ $Z_{kt}(x_i, y_i)$, i.e., observed value of wind component k
at station i at time t
$\underline{x}$ a 23 x 1 vector involving functions of x and y
$\underline{\beta}_{kt}$ a 23 x 1 vector of unknown parameters associated with
wind component k at time t
$\varepsilon_{kt}$ or $\varepsilon_{kt}(x,y)$ the deviation, at the point (x,y), of wind component
k at time t from an assumed model of the form $\underline{x}'\underline{\beta}_{kt}$
$\underline{x}_0$ a 13 x 1 vector consisting of the first 13 elements
of $\underline{x}$
$\underline{\beta}_{0kt}$ a 13 x 1 vector of unknown parameters associated with
wind component k at time t
$\varepsilon_{0kt}$ or $\varepsilon_{0kt}(x,y)$ the deviation, at the point (x,y), of wind component
k at time t from an assumed model of the form $\underline{x}_0'\underline{\beta}_{0kt}$
$\underline{Z}_{kt}$ a vector containing the $Z_{kt}(i)$, i = 1, 2, ...
$X$ a matrix for which the ith row consists of the $\underline{x}'$
vector evaluated at $(x_i, y_i)$
Q an arbitrary subset of stations
F a network consisting of all (reporting) stations
R a subset of stations consisting of all (reporting)
stations in the RTI network
$n_t(Q)$ the number of observations (i.e., reporting stations)
at time t in the network Q, where Q = R or F
$p_{kt}(j,Q)$ the number of terms in the model for wind component
k at time t when modeling procedure j is applied to
data in network Q, where Q = R or F
$\underline{\beta}_{jkt}$ vector of $p_{kt}(j,Q)$ estimated parameters for compo-
nent k at time t, obtained by applying modeling pro-
cedure j to network Q, where Q = R or F
$\underline{x}_{jkt}$ a vector obtained by retaining those elements of $\underline{x}$
which correspond to the $\underline{\beta}_{jkt}$ elements
$X_{jkt}$ a matrix for which the ith row consists of the $\underline{x}_{jkt}'$
vector evaluated at $(x_i, y_i)$
xiii
-------
LIST OF SYMBOLS (cont'd)
$\hat{Z}_{kt}(j,Q,i)$ the predicted value of the kth wind component at
time t at the point $(x_i, y_i)$ when modeling procedure
j is applied to data in network Q, where Q = R or F
$\hat{\varepsilon}_{kt}(j,F,i)$ the deviation between the observed wind component,
$Z_{kt}(i)$, and the predicted component, $\hat{Z}_{kt}(j,F,i)$
$s^2_{kt}(j,Q)$ the residual variance for component k from the model
based on procedure j applied to wind data from net-
work Q (Q = R or F) at time t
$R^2_{kt}(j,Q)$ the proportion of the total variation in wind compo-
nent k at time t (over network Q) accounted for by
the model resulting from modeling procedure j when it
is applied to network Q (Q = R or F), i.e., an
$R^2$ statistic
$A^2_{kt}(j,Q)$ the adjusted $R^2$ statistic based on $R^2_{kt}(j,Q)$
C an arbitrary subset of cases (i.e., t values)
$s^2_{kC}(j,Q)$ the pooled residual variance over C, obtained as a
weighted average of the $s^2_{kt}(j,Q)$ values
$R^2_{kC}(j,Q)$ the pooled $R^2$ statistic over C, i.e., the proportion
of the total within-case variation in wind component
k accounted for by applying modeling procedure j to
data from network Q (Q = R or F)
$A^2_{kC}(j,Q)$ the pooled adjusted $R^2$ statistic based on $R^2_{kC}(j,Q)$
$N_{CQ}$ the number of wind observations in the intersection
of C and Q
$W_t(i)$ observed wind speed at $(x_i, y_i)$ at time t
$\theta_t(i)$ observed wind direction at $(x_i, y_i)$ at time t
$\hat{W}_t(j,Q,i)$ predicted wind speed at $(x_i, y_i)$ at time t, based on
applying modeling procedure j to data from network Q
(Q = R or F)
$\hat{\theta}_t(j,Q,i)$ predicted wind direction at $(x_i, y_i)$ at time t, based
on applying modeling procedure j to data from network
Q (Q = R or F)
s(j) the square root of $s^2_{1C}(j,R) + s^2_{2C}(j,R)$, where C con-
sists of all cases
xiv
-------
LIST OF SYMBOLS (cont'd)
r(j) the pooled vector root mean square error associated
with procedure j pooled over both wind components,
all cases, and all stations not in the RTI network
r*(j) same as r(j) but over all interior stations not in
the RTI network
$f_a(j)$ a weighted average of $[r(j)]^2$ and $[s(j)]^2$, where a
is the weight attached to the former
$g_a(j)$ a weighted average of $[r^*(j)]^2$ and $[s(j)]^2$, where
a is the weight attached to the former
xv
-------
ACKNOWLEDGMENTS
This report was prepared by the Research Triangle Institute (RTI),
Research Triangle Park, North Carolina, under contract No. 68-03-2187
for the U.S. Environmental Protection Agency (EPA). The project officer
was Dr. James L. McElroy. Many individuals from RTI participated in
this project. Mr. J. W. Dunn was responsible for developing the com-
puter algorithm for processing and reducing the wind data. Mr. Bobby
Crissman was responsible for the initial data reduction. Mr. Clifford
Decker was responsible for management of the field program.
We would also like to acknowledge the cooperation of Mr. Robert
Browning of EPA, Research Triangle Park, North Carolina, for providing
us with the Regional Air Pollution Study (RAPS) data; and Mr. Ashwin
Gajjar, St. Louis County Air Pollution Control Agency, for providing
data from the St. Louis City and County air pollution stations.
xvi
-------
SECTION 1
INTRODUCTION
This report provides an evaluation of one aspect of an overall
methodology for generating estimated pollution concentration surfaces
over an urban area. This methodology, if successful, would avoid three
of the major problems typically encountered in estimating such surfaces
directly from observed air quality data; these problems occur because:
(a) reliable estimation (for a single pollutant) requires a high
resolution network of air quality monitoring stations,
(b) "optimal" networks for two different pollutants would gen-
erally be different because of different emission sources, and
(c) an "optimal" network (for a single pollutant) remains "optimal"
only in the short-term because of changes in the emission
sources.
The proposed methodology has the potential of overcoming these problems
by utilizing the emissions source inventory as a primary source of data
and by establishing a network which is "optimal" for estimating wind
fields. The model development phase of the proposed methodology, as
well as its implementation in the St. Louis, Missouri area, is described
by Vukovich et al. (1978).
The following subsections provide a brief description of the over-
all concept, and summarize the statistical model form and sampling
network which resulted from applying the methodology in St. Louis. The
specific objectives of this report are then described, along with a
description of the organization of the remainder of the report.
-------
OVERVIEW OF THE PROPOSED METHODOLOGY
The proposed methodology involves six major steps:
(1) Utilize a three-dimensional hydrodynamic model to gene-
rate simulated wind fields for the (urban) area under a
variety of (initial) meteorological conditions.
(2) Determine a class of statistical model forms relating
winds to geographic location and topography which will
yield a reasonable approximation to the simulated results
for any of the initial conditions.
(3) Using the results of (2), determine an "optimal" set of
sites for monitoring winds.
(4) Establish wind and air quality monitoring stations at the
indicated sites.
(5) Estimate wind fields by fitting statistical models based
on the class of forms determined in (2) to the observed
data.
(6) Utilize an objective variational analysis model to esti-
mate pollutant concentrations over the area by combining
the emissions source inventory, the observed pollutant
concentrations, and the estimated wind fields.
With minor modifications resulting from practical and economic constraints,
the first four steps above have been completed for the St. Louis area;
the following section describes the class of statistical models and the
network established in the St. Louis area.
SUMMARY OF PREVIOUS RESULTS
Consider an arbitrary point in the St. Louis region with coor-
dinates (x,y) relative to a fixed origin, where x denotes distance in
kilometers (km) in the east direction and y, in the north direction.
Let h = h(x,y) denote the elevation in meters (m) at (x,y) relative to a
-2-
-------
fixed base plane at river elevation of approximately 100 m. Let
$Z_{kt} \equiv Z_{kt}(x,y)$ denote the value in meters per second (mps) of the kth
wind component (k=1 for the west-east component, U; k=2 for the south-
north component, V) at time t. The model form proposed by the Research
Triangle Institute (RTI) in Vukovich et al. (1978), which formed the
basis for determining the sampling network, was
$$Z_{kt} = \underline{x}_0' \underline{\beta}_{0kt} + \varepsilon_{0kt} \qquad (1)$$
where
$\underline{x}_0' = (1 \;\; x \;\; y \;\; x^2 \;\; y^2 \;\; xy \;\; x^3 \;\; y^3 \;\; x^2 y \;\; x y^2 \;\; x^4 \;\; y^4 \;\; h)$,
$\underline{\beta}_{0kt}$ = a 13 x 1 vector of unknown parameters for component k
at time t, and
$\varepsilon_{0kt} = \varepsilon_{0kt}(x,y)$ = random deviation in component k at time t at
the point (x,y).
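As a rough illustration of how a model of the form in equation (1) can be fitted at a single time point, the following Python sketch builds the 13-term vector at each station and solves for the parameters by ordinary least squares. It is not the stepwise regression procedure applied in this study, which selects subsets of the 13 terms; the function names (design_row, fit_full_model, predict) and the use of NumPy are assumptions made purely for illustration.

```python
import numpy as np


def design_row(x, y, h):
    """Return the 13 terms of the vector x_0 at the point (x, y) with elevation h."""
    return np.array([
        1.0, x, y,
        x**2, y**2, x * y,
        x**3, y**3, x**2 * y, x * y**2,
        x**4, y**4,
        h,
    ])


def fit_full_model(xs, ys, hs, z):
    """Least-squares estimate of the 13 parameters for one wind component.

    xs, ys, hs: station coordinates (km) and terrain elevations (m)
    z: observed wind component (mps) at the same stations
    """
    # One row per station, one column per model term.
    X = np.array([design_row(x, y, h) for x, y, h in zip(xs, ys, hs)])
    beta, *_ = np.linalg.lstsq(X, np.asarray(z, dtype=float), rcond=None)
    return beta


def predict(beta, x, y, h):
    """Predicted wind component at an arbitrary point (x, y) with elevation h."""
    return float(design_row(x, y, h) @ beta)
```

With 19 stations and all 13 terms, only 6 residual degrees of freedom remain, so a full fit of this kind leaves little information for judging the quality of the fit.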
The proposed network, which was subsequently established and which
is herein referred to as the RTI network, involves 19 stations (see
Figure 1). Because of on-going data collection activities in the local
area, it was only necessary for RTI to set up three stations for this
evaluation. Sixteen existing stations were situated in close proximity
to "optimal" locations established during the theoretical phase of the
study. Table 1 shows the (x,y) coordinates and elevations (h) of the 19
stations in the RTI network. Four of the 19 stations in the network are
St. Louis city/county stations (denoted by the STL prefix in the station
names), twelve are Regional Air Pollution Study (RAPS) Stations of the
United States Environmental Protection Agency (denoted by the EPA
prefix in the station names), and three stations (denoted by the RTI
prefix) were temporary stations set up by RTI specifically for this research
project. The RTI stations were located on the grounds of Incarnate Word
Academy in northwest St. Louis county; on the grounds of Kenrick Seminary
in southwest St. Louis county; and on the grounds of the East Side Sanitary
District's South Pumping Station in East St. Louis, Illinois.
-3-
-------
TABLE 1. GEOGRAPHIC LOCATIONS AND TERRAIN ELEVATIONS FOR
STATIONS IN THE RTI NETWORK*
Station    x             y             h
Name       (km)          (km)          (m)
STL008       0.           16.          45.
RTI202     - 4.            8.          79.
STL009     - 7.            2.          44.
STL006     -20.          - 3.          46.
RTI205     - 6.          - 6.          37.
STL002       0.          -10.          12.
RTI207      10.          -10.          11.
EPA101       6.            1.          24.
EPA102       5.            7.           5.
EPA104       9.          [illegible]   13.
EPA105       4.          - 3.          50.
EPA106       1.          - 1.          36.
EPA108      10.           12.           9.
EPA109      18.            0.          13.
EPA110       9.          - 6.           6.
EPA113       0.           10.          55.
EPA118       5.          -16.          28.
EPA119     [illegible]   - 8.          56.
EPA120     -16.            8.          37.
* Locations are defined relative to an origin at the inter-
section of Lindell Blvd. and King's Highway in St. Louis.
Elevations are defined relative to a local river elevation
of approximately 100 m.
The major emphasis of the second phase of the research project
involved the preparation and execution of a summer and winter field
program in St. Louis. These field programs were held during a period
when EPA was performing an intensive study in St. Louis: August, 1975,
and February and March, 1976. During this time, there was a concerted
effort to maintain a high level of performance of the RAPS stations.
-------
OBJECTIVES AND SCOPE OF THE CURRENT RESEARCH
The scope of the current effort is limited to an analysis of the
wind data which were obtained during the summer and winter field pro-
grams. These data consisted of the horizontal wind components as
measured at various time intervals and at 27 sites within the St. Louis
region. These 27 sites included the 19 stations involved in the RTI
network. The objectives of the study are
(1) to develop an easily-automated estimation procedure, based on
the model form in equation (1), for generating estimated
wind fields, and
(2) to evaluate the performance of this procedure and of the RTI
network.
Thus, in terms of the six major steps involved in the methodology, this
phase of the research involves a demonstration of step 5, and an evalua-
tion of the overall methodology up through step 5.
Section 2 describes the available data, its limitations, and the
editing procedures employed in preparing the data for analysis. The
analytical approach is described in Section 3 and the results are
summarized in Section 4. Section 5 presents the conclusions, findings,
recommendations, and analytical limitations of the study.
OBJECTIVES AND SCOPE OF REMAINING RESEARCH
Assuming validation of the procedures and network for making wind
field predictions, the next step in the research project will involve an
evaluation of the objective variational analysis model (OVAM) used to
derive the estimated air pollution distribution. The OVAM uses the
estimated wind field as an input parameter, along with the emissions
-5-
-------
inventory and the air pollution concentrations as measured at the net-
work stations. Carbon monoxide (CO) will be used in the evaluation.
The evaluation of the OVAM will be made on a case study basis, with
each case study covering a 12- to 24-hour period. The selected case
studies will be chosen so as to represent a variety of wind conditions
(speeds, directions) and of CO concentration distributions over the
monitoring stations. The basic evaluation parameters will consist of
correlations and root mean squared errors between observed and predicted
CO concentrations at stations outside of the RTI network. As a part of
this study, it will be determined if it is necessary to monitor CO at
each of the 19 network stations.
-6-
-------
SECTION 2
SUMMARY OF AVAILABLE WIND DATA
DESCRIPTION OF RAW DATA
In addition to stations in the RTI network, eight additional sta-
tions provided data. Coordinates and elevations of these stations are
shown in Table 2.
TABLE 2. GEOGRAPHIC LOCATIONS AND TERRAIN
ELEVATIONS FOR STATIONS NOT IN THE
RTI NETWORK*
Station    x       y       h
Name       (km)    (km)    (m)
STL003       2       6      39
STL004      -2      -1      39
STL007     -10      10      81
STL010      -8     -12      62
EPA103      10       3      16
EPA107       2       3      44
EPA111       1      -7      19
EPA112      -4       2      44
* Locations are defined relative to an origin
at the intersection of Lindell Blvd. and King's
Highway in St. Louis. Elevations are defined
relative to a local river elevation of approxi-
mately 100 m.
The total set of 27 stations, whose locations are shown in Figure 1,
will be referred to as the full network; the above set of eight stations
will be referred to as non-network stations (meaning non-RTI-network
stations). Stations STL007, STL010, and EPA103 will be referred to as
outer-non-network stations, since they are located on the border of the
innermost grid (see Figure 1), whereas the remaining five non-network
stations will be called inner-non-network stations. These two sets of
non-network stations are distinguished because it was shown in the first
report in this series (Vukovich, et al., 1978) that, if wind data from
-7-
-------
Figure 1. Location of stations in the RTI network (solid dots) and
other non-network stations (open dots) used in the evaluation
(interior grid spacing = 1 km)
-8-
-------
the RTI network were used to produce predictions at the non-network
stations, considerably better predictions should be achieved for inner-
non-network stations than for outer-non-network stations.
The raw wind data consisted of 1-minute and 5-minute average
values from the EPA stations, 3-minute average values from the St. Louis
(STL) city/county stations and 5-minute average values from the RTI
stations. Five-point averages centered at each half-hour were
constructed. The nominal 20- to 25-minute averaging period is consist-
ent with the averaging performed in the hydrodynamic model, which pro-
duced the simulated wind fields upon which the RTI network was based.
For the EPA and RTI stations, these averages were computed from the
5-minute averages (for the U and V components, respectively) as
and
where the subscripts indicate deviations in minutes between the nominal
(hour or half hour) time point and the midpoint of the averaging inter-
val for the raw data. For the city/county stations,
and
In either case, at least three of the five readings were required to be
present in order for an average wind to be used.
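The averaging rule described above can be sketched in Python as follows. The sketch assumes that the average is taken over whichever of the five readings are present, which the text does not state explicitly; the function name five_point_average and the min_valid parameter are hypothetical.

```python
import math


def five_point_average(readings, min_valid=3):
    """Average up to five readings centered on a nominal time point.

    Missing readings are given as None or NaN. Returns None when fewer
    than `min_valid` readings are present, so the case is not used.
    """
    valid = [r for r in readings
             if r is not None and not math.isnan(r)]
    if len(valid) < min_valid:
        return None
    # Assumption: the mean is taken over the readings that are present.
    return sum(valid) / len(valid)
```

The same function would be applied separately to the U and V components at each station and nominal time point.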
Winds at the three RTI stations and at all of the St. Louis city/
county stations were measured at 10 m above ground level. This was
also the case for three of the RAPS stations: EPA108, EPA110, and
EPA118. Measurements at the remaining thirteen RAPS stations, however,
-------
were made at 30 m above ground level. The wind data from these
stations were therefore inappropriate for evaluating the methodology.
To alleviate this problem, a profile equation for the surface boundary
layer (Estoque and Bhumralkar, 1969) was used to generate estimated
10-m winds at these thirteen stations using the winds at the 30-m level.
The estimated wind components at 10 m at a particular station and time
were determined as:
$$U' = U_o + 2.5\,U^* [L_{10} - L_{30}]$$
$$V' = V_o + 2.5\,V^* [L_{10} - L_{30}] \qquad (2)$$
where
U' and V' are, respectively, the west-east and south-north compo-
nents of the wind velocity at the 10-m level;
$U_o$ and $V_o$ are, respectively, the west-east and south-north compo-
nents of the wind velocity at the 30-m level;
$L_e = \ln\left[\frac{e + T_o}{T_o}\right]$;
$T_o$ is the mean roughness length associated with the particular
site;
e is the elevation (m) above ground level; and
U* and V* are, respectively, the west-east and south-north compo-
nents of the friction velocity.
The U* component was determined as
$$U^* = [A + B|U_o| + C U_o^2]\,\mathrm{sign}(U_o) \qquad (3)$$
where the coefficients A, B, and C were based on data relating the mean
wind speed to the friction velocity (J.I. Clarke, EPA-RTP, personal
communique, 1978). A similar formula was used for the V*-component.
The coefficients in Eq. (3) were determined separately for each
season (summer and winter) and for each of three types of stations
(urban, suburban, rural). They are based on comparative analyses be-
tween measured turbulence parameters and wind speeds that were performed
at numerous RAPS stations and were consolidated for the purposes of this
study. The analyses were performed by, and the results acquired from,
the U.S. Environmental Protection Agency. Values of the coefficients
are shown in Table 3 below:
TABLE 3. COEFFICIENTS USED FOR DETERMINING
FRICTIONAL VELOCITY COMPONENTS

                              Coefficient
Season    Region         A           B           C
Summer    Urban      -0.04591     0.18763    -0.01036
          Suburban   -0.05006     0.13023    -0.00212
          Rural      -0.01640     0.05419     0.00102
Winter    Urban      -0.07601     0.16372    -0.00469
          Suburban   -0.04947     0.12742    -0.00275
          Rural       0.02616     0.02902     0.00243

Originally, coefficients were also determined as a function of stabi-
lity. However, the values obtained were judged to be sufficiently
similar so that such additional differentiation was unnecessary.
Table 4 indicates the type of each station and its mean roughness
length ($T_0$), as used in the above conversion formulae. The roughness
lengths were determined using the technique developed by Lettau (1969),
with parameters developed specifically for St. Louis (Vukovich et al.,
1976).
The estimated U' and V' values determined for the 13 RAPS stations
from equations (2) and (3), along with the observed U and V values for
the other 14 stations, constituted the basic wind data upon which the
evaluations were performed.
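As a rough illustration of the 30-m to 10-m conversion, the sketch below applies Eqs. (2) and (3) to a single wind component. It assumes the logarithmic-profile form U' = U_0 + 2.5 U* (L_10 - L_30) with L_e = ln[(e + T_0)/T_0], and uses the Table 3 coefficients; the function names and the example wind value are hypothetical.

```python
import math

# (A, B, C) from Table 3, keyed by (season, station type).
COEFFS = {
    ("summer", "urban"):    (-0.04591, 0.18763, -0.01036),
    ("summer", "suburban"): (-0.05006, 0.13023, -0.00212),
    ("summer", "rural"):    (-0.01640, 0.05419,  0.00102),
    ("winter", "urban"):    (-0.07601, 0.16372, -0.00469),
    ("winter", "suburban"): (-0.04947, 0.12742, -0.00275),
    ("winter", "rural"):    ( 0.02616, 0.02902,  0.00243),
}


def sign(value):
    """Return -1, 0, or +1 according to the sign of value."""
    return (value > 0) - (value < 0)


def friction_component(w30, season, station_type):
    """Eq. (3): friction-velocity component from the 30-m wind component."""
    a, b, c = COEFFS[(season, station_type)]
    return (a + b * abs(w30) + c * w30 ** 2) * sign(w30)


def log_term(elevation_m, roughness_m):
    """L_e = ln[(e + T_0) / T_0]."""
    return math.log((elevation_m + roughness_m) / roughness_m)


def component_at_10m(w30, season, station_type, roughness_m):
    """Eq. (2), in the assumed form: estimated 10-m component from the 30-m one."""
    w_star = friction_component(w30, season, station_type)
    return w30 + 2.5 * w_star * (log_term(10.0, roughness_m)
                                 - log_term(30.0, roughness_m))


# Hypothetical example: a 4 m/s west-east component at 30 m, summer,
# at an urban site with a 0.72-m roughness length.
print(round(component_at_10m(4.0, "summer", "urban", 0.72), 2))
```

The same function serves for the V component, since Eq. (3) has the same form for both.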
TABLE 4. MEAN ROUGHNESS LENGTHS ($T_0$),
BY STATION

Type        Station    $T_0$ (m)
Urban       EPA101       0.72
            EPA104       0.39
            EPA106       1.08
            EPA107       1.32
Suburban    EPA102       0.20
            EPA103       0.20
            EPA105       0.60
            EPA111       0.24
            EPA112       0.48
            EPA113       0.66
            EPA119       0.66
Rural       EPA109       0.20
            EPA120       0.45

DATA EDITING
For those time points (cases) in which only a few of the 19 RTI-
network stations provided data, the estimation of wind fields would be
quite tenuous; furthermore, evaluation of the performance of the network
for providing good predictions would be unrealistic in such cases.
Consequently, as a first step in preparing the data for analysis, all
cases in which more than one-third of the RTI-network stations failed to
furnish wind data were deleted from further consideration. With this
requirement imposed on the data set (namely, that data be available
for at least 13 stations in the RTI network), there were 260 cases
available from the summer field program and 654 from the winter field
program.
A manual screening of these data was then performed. Inconsisten-
cies in the city/county data relative to the remaining data in the first
six summer cases, which were scattered across 11 days (July 29 to August
8, 1975), led to the exclusion of these six cases from the basic summer
data set. These inconsistencies were apparently the result of calibra-
tion problems. Out of the remaining 254 summertime cases, the manual
screening resulted in the deletion of
(a) all wind data from EPA102, which appeared highly inconsistent
with data at nearby stations,*
(b) three extremely peculiar wind values at other stations, and
(c) forty-three consecutive observations for EPA120 and eight for
RTI205 in which instrument failures were apparently respon-
sible for producing zero values for both wind components.
Among the 254 summer cases, no data were available for two stations:
STL010 and EPA111.
A similar editing of the winter data resulted in the exclusion of
all wind data from STL010, and partial exclusion of data from six other
stations. Counts of these exclusions, which also resulted from instru-
ment failures, are shown below:
            Initial No. of    No. of Cases   No. of Cases
Station     Reported Cases      Deleted        Retained
RTI202           454              184            270
RTI205           457                7            450
RTI207           549                4            545
STL008           639              550             89
STL007           639              543             96
STL010           500              500              0
EPA107           616                3            613

The final edited data sets covered an 8-day period in August 1975
(August 9-16) and a 25-day period in the winter of 1976 (February 10 -
March 5). Out of the 8 x 48 = 384 potential cases which could have
occurred during the 8-day summer field program, only 254 were actually
retained after all editing; only 654 wintertime cases were available,
out of a possible 25 x 48 = 1200. Thus, the field programs not only
were of short duration (especially the summer program) but also failed
to provide "sufficient" data in many cases. This is depicted in the
left-hand portion of Table 5, which shows the distribution of available
cases, by date and time-of-day. It is clear from this table that a
large degree of clustering of cases within time periods occurred.
Consequently, increasing the number of cases by constructing three
rather than two 20-minute averages per hour would not have enhanced the
data base in terms of its coverage of additional wind conditions (see
Table 8).
* Major repairs were performed on the wind monitoring equipment at
EPA102 between the times of the summer field program and the subsequent
winter program.
Unfortunately, a substantial amount of missing data occurred even
within the 908 cases for which 13 or more stations in the RTI network
reported data (because of the above-described editing, or because the
data were simply not reported). For instance, the full set of 19 sta-
tions in the RTI network furnished coincident data in only 14 of the 908
cases; these cases all occurred during a 2-day period within the win-
ter field program, as shown in the right-hand portion of Table 5. Eigh-
teen or more stations in the network reported in only 105 out of the 908
cases.
The high incidence of missing data was not confined to stations in
the RTI network, as evidenced in Table 6. Two of the non-network sta-
tions, STL007 and EPA111, had particularly low reporting rates (after
editing). Assuming 8 full days for the summer program and 25 full days
for the winter program, the reporting rates in terms of individual
observations were as shown in Table 7.
TABLE 5. DISTRIBUTION OF CASES, BY DATE AND TIME-OF-DAY AND BY DATE AND NUMBER OF RTI NETWORK
STATIONS REPORTING
No. of Reporting
Time-of-Day Reporting Total Stations in RTI Network
Date
75/08/09
10
11
12
13
14
15
i f.
-LO
76/02/10
12
i ^
X J
14
15
16
17
18
19
20
21
22
X. £.
23
24
25
26
27
28
29
76/03/01
02
03
04
05
Summer Total
Winter Total
0000-0530
6
12
12
12
12
5
12
10
n
u
7
f.
o
0
10
4
0
12
11
6
3
n
\J
3
12
11
11
12
1
12
2
11
12
0
12
71
168
0600-1130
6
12
12
12
9
5
11
11
n
u
8
7
12
12
3
4
1
6
9
i
A.
11
5
11
1
12
1
12
4
12
1
0
2
67
146
1200-1730
0
11
12
12
6
9
1
10
9
£.
12
12
12
12
7
3
0
4
5
a
U
9
2
12
1
12
12
10
12
12
0
11
0
51
180
1800-2330
11
10
12
12
7
9
0
0
0
J
7
8
1
3
9
11
2
0
0
n
\J
12
10
11
11
12
12
12
12
12
0
12
0
65
160
23
45
48
48
34
28
24
31
c
J
34
27
35
31
19
30
14
16
17
q
y
35
29
45
24
48
26
46
30
47
13
23
14
254
654
12 13
__
11
n
5
1
1
5
1
I
3
2
1
1
0 11
3 19
14
13
1
1
2
3
3
4
3
4
3
1
1
1
4
1
1
15
36
15
23
45
11
3
8
/,
3
1
7
2
1
5
3
1
4
18
12
4
7
2
2
7
2
94
81
16
13
42
12
1
3
11
17
4
10
7
2
8
2
6
4
4
2
19
4
4
5
19
7
14
1
5
1
71
56
17
3
2
7
3
20
5
14
22
11
22
16
13
6
6
7
20
10
15
24
12
14
20
19
11
9
6
15
302
18
11
20
17
4
4
4
10
2
6
4
48
43
19
^_
5
9
_
0
14
Overall Total 239
213
231
225
908
30 51 175 227 317 91 14
-------
TABLE 6. NUMBER OF CASES FOR WHICH VALID WIND DATA ARE REPORTED, BY
STATION

                                   No. of Cases
Network         Station     Summer    Winter    Total
RTI:            STL008        254        89      343
                RTI202        223       270      493
                STL009        254       639      893
                STL006        254       639      893
                RTI205        242       450      692
                STL002        253       639      892
                RTI207        250       545      795
                EPA101        116       618      734
                EPA102          0       641      641
                EPA104        240       639      879
                EPA105        243       647      890
                EPA106        254       638      892
                EPA108        252       645      897
                EPA109        253       514      767
                EPA110        248       530      778
                EPA113        245       636      881
                EPA118        164       635      799
                EPA119         70       628      698
                EPA120        203       630      833
Non-RTI:
  Outer:        STL007        254        96      350
                EPA103        253       580      833
  Inner:        STL003        254       638      892
                STL004        254       638      892
                EPA107        231       613      844
                EPA111          0       596      596
                EPA112        237       624      861

NOTE: Station STL010, in the outer non-network, is omitted because no
"valid" data were reported from this station.
TABLE 7. NUMBER OF AVAILABLE AND POTENTIAL OBSERVATIONS,
BY NETWORK AND SEASON+

                           Summer Field Program       Winter Field Program
Subset of      No. of     No.    Potential   Rate     No.    Potential   Rate
Stations      Stations    Obs.   No. Obs.    (%)      Obs.   No. Obs.    (%)
RTI Network      19       4018     7296      55.1    10672    22800      46.8
Non-Network*      8       1483     3072      48.3     3785     9600      39.4
  Outer*          3        507     1152      44.0      676     3600      18.8
  Inner           5        976     1920      50.8     3109     6000      51.8
Full Network     27       5501    10368      53.0    14457    32400      44.6

+ Potential no. observations = no. stations x 48 cases per day x no.
days
* For completeness, STL010 is counted as a potential station, although
no valid wind data were obtained from this station.
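The potential-observation counts and reporting rates in Table 7 follow directly from the footnoted formula. A minimal sketch of that arithmetic, with a hypothetical function name, is:

```python
CASES_PER_DAY = 48  # two 20-minute averages per hour


def reporting_rate(n_obs, n_stations, n_days):
    """Return (potential observations, reporting rate in percent)."""
    potential = n_stations * CASES_PER_DAY * n_days
    return potential, 100.0 * n_obs / potential


# RTI network: 19 stations, 8 summer days and 25 winter days (Table 7).
summer_potential, summer_rate = reporting_rate(4018, 19, 8)
winter_potential, winter_rate = reporting_rate(10672, 19, 25)
print(summer_potential, round(summer_rate, 1))
print(winter_potential, round(winter_rate, 1))
```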
It is clear that the limited time span, the limited number of non-
network stations, and the large amount of missing data impose some
severe limitations on the model and network evaluations. Fortunately,
analysis of historical, seasonal wind roses for the National Weather
Service Station at Lambert Air Field in St. Louis showed that the winds
which occurred during the 8-day summer field program were typical of the
wind conditions which are prevalent in St. Louis in the summer (i.e.,
predominantly south to southwest winds of low velocity). The distribu-
tion of prevailing wind speeds/directions for these 254 cases is shown
in the upper portion of Table 8. The prevailing wind speeds and direc-
tions for a particular case were based on the average wind vector over
the following outlying stations: STL006, STL007, STL008, EPA118, EPA119,
EPA120. This definition is maintained throughout this report. On the
other hand, the winter data also showed a predominant southerly wind (of
TABLE 8. DISTRIBUTION OF CASES, BY SEASON AND PREVAILING WIND CONDITIONS

           Prevailing               Prevailing Direction
           Speed
Season     (mps)      N, NE, E    SE      S     SW      W     NW    Total
Summer     0-1            1        7     24     14      8      1      55
           1-2                     2     60     46      7      6     121
           2-3                     9     35     19      7             70
           3-4                     1      3      1      2              7
           4-5                                          1              1
           Total          1       19    122     80     25      7     254

Winter     0-1            1        3      3      4      1             12
           1-2            1        4     28      8      7      1      49
           2-3                    18     47     19      5             89
           3-4            1       32    114     36     10            193
           4-5                    11    114     33     18            176
           5-6                     3     40     28     18             89
           6-7                           11     13      6             30
           7-8                            4      4      4             12
           8-9                            4                            4
           Total          3       71    365    145     69      1     654

Combined   0-1            2       10     27     18      9      1      67
           1-2            1        6     88     54     14      7     170
           2-3                    27     82     38     12            159
           3-4            1       33    117     37     12            200
           4-5                    11    114     33     19            177
           5-6                     3     40     28     18             89
           6-7                           11     13      6             30
           7-8                            4      4      4             12
           8-9                            4                            4
           Total          4       90    487    225     94      8     908

Prevailing wind speed and direction for a particular case is based
on the average wind vector over the following outlying stations:
STL006, STL007, STL008, EPA118, EPA119, EPA120.
somewhat higher velocity), which contrasts with the wintertime pattern
of northerly winds typical of the St. Louis area. Thus the model and
network evaluation is also limited in terms of the types of cases covered.
An overall summary of the observed, edited data is given in Table
9, which shows sample sizes (N) and the means and standard deviations
(s.d.) of the wind components (denoted by U and V) and wind speeds (de-
noted by W) for each station. Some care must be exercised in comparing
the means of two or more stations, since the averages are not neces-
sarily taken over the same set of cases due to the presence of missing
data.
TABLE 9. SUMMARY STATISTICS BY STATION FOR
OBSERVED WIND DATA OVER ALL CASES
O
J5.I Al ION
S LU06
R 1202
S L U 0 9
S LOUb
R I 2 U b
S L002
H 1207
EPA101
EPAJ.U2
EHA104
EH All) b
EHAlOb
EPA10H
EPA 10 9
EPA110
EPA115
EH Aliti
EPA119
EPA120
S'lLUU/
SIL010
EPA103
S 1 L 0 () 5
SILU04
EPA107
EPAill
EPA112
M
S'»3
'IS 3
ns3
oy3
6^2
BS2
7S5
741
ftMl
B79
«yo
8S2
697
767
77ft
pfil
7^9
6 9 a
833
39
3.100
3.189
4.559
3.787
3,2?7
2,570
4,233
3,046
2.969
3.080
4,496
2,890
2.305
2.014
o, n
4.522
2,226
3.095
2,756
3,532
2.680
W fit AN
(mps)
1.925
2.176
3.076"
3.133
2.496
3.725
3.718
3.846
b.527
4.667
3.610
3.411
b,12b
3.595
3.830
3.918
5.201
3.602
3.007
2.460
0.0
5.483
3.321
3.851
3.314
4.313
3.383
U S.I),
(mps)
2.081
1,641
1.423
2.001
1,222
1.802
2.056
1.837
2.669
2.231
U634
1,720
2,671
1.677
2.246
1.976
2.426
1.916
1.718
1.165
n.O
2.590
1 .839
2.230
1.479
2.189
1.793
V S.D.
Tmp₯)
0.926
1.252
1.953
1.941
1.562
1.782
2.206
1.659
2,274
2.189
1.778
1.386
2.968
1.881
1.904
1.685
2,580
1.730
1.464
1.287
0.0
... 2>437
1,632
1.776
1.4BO
1.905
1.564
W S . 0 .
Tmps)
1,874
1,369
1.689"
1.746
1,594
1.604
2,220
1.360
1.929
2.058
1,646
1,162
2,Q44
1.760
2.029
1.615
2,448
1.6U1
1,352
1.018
0.0
2.115
1,446
1.782
1.274
1.703
1.411
-------
SECTION 3
EVALUATION TECHNIQUES
The development of the RTI network was based on the 13-term model
form (Eq. (1)), as described in Vukovich et al. (1978). Based on the
simulated data, it appeared reasonable to assume that a model of this
order of variability would yield adequate wind field predictions in all,
or virtually all, cases which might occur within the St. Louis area, if
actual and simulated winds behave similarly. It was also apparent from
the simulated data that this order of variability (i.e., the full model)
would not necessarily be required in all, or even in most, cases. That
is, some simpler model would be sufficient in the majority of cases.
Because fitting the full model in a case in which a simpler submodel is
appropriate can substantially decrease the precision of a predicted
value, the modeling procedure not only must provide estimates of the
regression coefficients but also must establish, through some variable
selection technique, the form of the model. It should be noted that a
proper evaluation of the theoretical phase results requires that wind
fields be estimated via some submodel of the model given in Eq. (1).
In an actual implementation of the technique, however, other surface-
fitting techniques could be utilized for estimating the winds over the
region of interest.
There are many possible variable selection procedures; in general,
three basic steps are involved:
(1) specification of a class of potential model forms from which
the selection is to be made,
(2) determination of a single "good" model form from within this
class, and
(3) estimation of the parameters of this model form.
Step (1) was performed as a part of the network selection during the
theoretical phase of the study. In the current context, the proposed
methodology can be considered successful in terms of selecting a model
form for estimating wind fields only if submodels of Eq. (1) can
provide "adequate" fits to the wind component data at any point in time.
In terms of both model and network selection, the methodology can be
considered successful for estimating wind fields if applying this model-
ing procedure to wind data from stations in the RTI network provides
accurate prediction of the winds over the region or, in practice, at
particular sites not in the RTI network.
The first requirement for the evaluation is therefore to define the
wind field modeling procedure, i.e., to define precisely this aspect of
the methodology. For this evaluation, the definition must be compatible
with the results of the theoretical phase and must therefore utilize the
model of Eq. (1) as its basis. The second requirement is a definition
of alternative modeling procedures against which this procedure can be
compared. The development of these modeling procedures is described in
the subsection below.
The next step in the evaluation is to define measures of model
"adequacy" for making the comparisons among the alternative procedures.
Finally, measures of accuracy for judging the success of the overall
methodology (excluding the air quality predictions) are needed. These
two steps are described in the last two subsections of this section.
As indicated in the previous section, the 908 cases available for
evaluating the methodology do not constitute a probability sample of
time intervals; consequently, it was not possible to make valid statis-
tical inferences to the population of wind conditions occurring in St.
Louis during some given period of time (e.g., one year). On the other
hand, consideration of a few selected cases would also not appear to be
sufficient for evaluating the methodology. Hence, the basic strategy
adopted for the evaluation involves generating estimated wind fields for
all 908 cases, generating measures which reflect the precision and
accuracy of these estimates, and then summarizing these measures in
terms of descriptive statistics over all cases and over various
subsets of cases.
SELECTION OF MODELING PROCEDURES
The class of model forms indicated by the analysis of the simulated
wind data consists of all possible subsets of the following twelve terms
(as defined in Eq. (1)):
$$(x,\ y,\ x^2,\ y^2,\ xy,\ x^3,\ y^3,\ x^2y,\ xy^2,\ x^4,\ y^4,\ h) \qquad (4)$$
This assumes that a constant or intercept term would be required in any
selected model. Thus there are $2^{12}$ = 4,096 possible model forms (i.e.,
subsets of terms), which range in complexity from a constant, one-term
model (corresponding to the selection of no terms from (4)) to a full
13-term model corresponding to the selection of all 12 terms in (4).
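The count of 4,096 model forms can be checked by enumerating every subset of the twelve terms, each combined with the intercept. The sketch below is illustrative only; the term labels are plain strings chosen for this example.

```python
from itertools import combinations

# The twelve candidate terms of (4), written as labels.
TERMS = ["x", "y", "x^2", "y^2", "xy", "x^3", "y^3",
         "x^2*y", "x*y^2", "x^4", "y^4", "h"]


def model_forms(terms):
    """Yield every model form: the intercept plus one subset of the terms."""
    for size in range(len(terms) + 1):
        for subset in combinations(terms, size):
            yield ("1",) + subset


forms = list(model_forms(TERMS))
print(len(forms))      # number of candidate model forms
print(forms[0])        # the constant, one-term model
print(len(forms[-1]))  # number of terms in the full model
```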
Many algorithms can be used for selecting variables; however, most
of these procedures provide the user with a list of candidate models.
The user must then apply some additional criterion in order to arrive at
a single model form. This is regarded as a major advantage of these
techniques; in the present context, however, such techniques are not
practical unless one can also automate the additional criterion because
of the large number of cases. For instance, in this study, the user of
such a technique would have to examine 1,816 lists of candidate models
(908 cases x 2 wind components). The burden on the user during an
actual implementation of the methodology would also be extreme if such
an approach were to be adopted. Hence, one practical constraint on the
variable selection procedure to be used is that it be fully automated in
the sense that it incorporates its own stopping criterion and therefore
yields a single model for each individual case. Even so, such an approach
cannot be advocated for general implementation unless, for selected
cases, one can carry out (a) an examination of residuals, and (b) a
comparison with alternative modeling procedures (such as all-possible
regressions). Regardless of what procedure might be implemented, it
would also be essential that screening the wind data for erroneous
values precede the model fitting.
Three sequential variable selection techniques which meet the above
described constraint are the forward selection technique, the backward
elimination technique, and the stepwise technique. Draper and Smith
(1966) and Barr et al. (1976), for example, provide descriptions of
these techniques and their relative merits. The stepwise procedure is
generally considered to be superior to either of the other techniques.
Also, the backward elimination technique, which successively deletes
terms from an assumed larger, "full" model, would encounter estimability
problems when the number of model terms exceeded the number of reporting
stations. Hence, the stepwise regression approach was selected for use
in the evaluations.
This procedure requires the use of two parameters referred to as
the "inclusion" and "retention" parameters. As with the forward
selection approach, the stepwise procedure begins by finding the best
2-variable model; this assumes an intercept is always included and is
counted as one of the variables. Here, variable A is considered better
than variable B if its correlation with the dependent variable (i.e.,
the observed wind component data) is higher, or more generally, if the
partial F-statistic associated with variable A is larger than that for
variable B. The variable with the largest F-statistic is retained in
the model if the significance probability associated with the
F-statistic is less than the "retention" parameter. If so, partial
F-statistics associated with the remaining independent variables are
computed and their significance probabilities are compared with the
"inclusion" parameter; the variable with the smallest significance
probability is added if this probability is less than the "inclusion"
parameter. After such a variable is added, partial F-statistics are
computed for all variables currently in the model to determine if any
variable should be deleted from the model. A previously included
variable is dropped if its associated significance probability exceeds
the "retention" parameter. After any such deletions have been made, the
F-values for the remaining variables are again determined to see if any
meet the inclusion criterion. This process is continued until no
variable can meet the inclusion criterion or until deletion of the last
included variable occurs.
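The stepwise logic described above can be sketched in plain Python. This is not the program used in the study: it is a minimal illustration that solves the normal equations by Gaussian elimination, obtains the significance probability of each partial F-statistic (1 numerator degree of freedom) by numerically integrating the Student t density, and applies the inclusion parameter to every addition. All function names and the sample data are hypothetical, and collinear candidate terms are not handled.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def sse(columns, z):
    """Residual sum of squares from the least-squares fit of z on columns."""
    k = len(columns)
    xtx = [[sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
           for i in range(k)]
    xtz = [sum(p * q for p, q in zip(columns[i], z)) for i in range(k)]
    beta = solve(xtx, xtz)
    fitted = [sum(beta[i] * columns[i][r] for i in range(k)) for r in range(len(z))]
    return sum((zi - fi) ** 2 for zi, fi in zip(z, fitted))


def p_value(f_stat, df):
    """P(F > f_stat) for (1, df) degrees of freedom, using F = t^2 and
    Simpson integration of the Student t density."""
    t = math.sqrt(max(f_stat, 0.0))
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    dens = lambda v: const * (1 + v * v / df) ** (-(df + 1) / 2)
    n = 2000
    h = t / n
    area = dens(0.0) + dens(t) + sum((4 if i % 2 else 2) * dens(i * h) for i in range(1, n))
    return min(1.0, max(0.0, 1 - 2 * area * h / 3))


def stepwise(candidates, z, p_in=0.10, p_out=0.10):
    """Return the names of the selected terms (the intercept is always kept)."""
    n = len(z)
    ones = [1.0] * n
    chosen = []

    def fit(names):
        return sse([ones] + [candidates[c] for c in names], z)

    def partial_p(names, name):
        df = n - len(names) - 1          # residual degrees of freedom
        if df <= 0:
            return 1.0
        full = fit(names)
        gain = fit([c for c in names if c != name]) - full
        if gain <= 1e-12:
            return 1.0
        return 0.0 if full <= 1e-12 else p_value(gain / (full / df), df)

    for _ in range(10 * len(candidates) + 10):   # guard against cycling
        scored = [(partial_p(chosen + [c], c), c) for c in candidates if c not in chosen]
        if not scored or min(scored)[0] >= p_in:
            break
        chosen.append(min(scored)[1])
        while chosen:                             # retention check
            worst_p, worst = max((partial_p(chosen, c), c) for c in chosen)
            if worst_p <= p_out:
                break
            chosen.remove(worst)
    return chosen


# Hypothetical data for ten stations: the wind component depends on x only.
xs = [float(i) for i in range(10)]
ys = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0, 1.0, 3.0, 9.0, 6.0]
zs = [2.0 + 3.0 * xi for xi in xs]
print(stepwise({"x": xs, "y": ys}, zs))
```

Keeping the inclusion parameter no larger than the retention parameter, as in both pairs of values used in this study, prevents a term from being added and then immediately dropped.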
Two pairs of inclusion and retention parameters were used:
Modeling Inclusion Retention
Procedure Parameter Parameter
1 0.10 0.10
2 0.20 0.20
These values were chosen, as opposed to smaller values, because of the
small effects expected for many of the candidate terms and because of
the small sample sizes (namely, about 15 stations per case for the RTI
network). In such situations, the use of smaller parameter values is
generally not recommended because the derived models will tend to omit
one or more "good" predictors.
Part of the evaluation procedure is thus a determination of which
of these procedures is the more appropriate. Obviously, procedure 2
generates larger (i.e., more terms) models than does procedure 1; also,
procedure 2 produces larger $R^2$ statistics (the square of the multiple
correlation coefficient) and smaller residual sums of squares than
procedure 1 achieves. However, procedure 1 may produce better predictions
if procedure 2 tends to select "too many" terms.
Another key question to be addressed in the evaluation involves the
choice of the initial class of model forms. For instance, is there
another class of model forms which contains models that would provide
substantially better approximations to the wind fields in the St. Louis
area? Obviously, this aspect of the evaluation can be carried out only
to a limited degree since there are an infinite number of possible model
forms which could be investigated. The problem of evaluation is com-
pounded by the fact that many cases are involved. In order to provide
some evaluation of this potential source of error in the methodology,
several other modeling procedures are considered. Whereas procedures 1
and 2 above are consistent with the proposed methodology, these addi-
tional procedures, in one way or another, are inconsistent with it.
Hence, if performance of one of these additional procedures was judged
to be substantially superior to procedures 1 and 2, it would indicate a
deficiency in the proposed methodology. On the other hand, "good"
performance by procedure 1 or 2 relative to the additional procedures
would tend to support this aspect of the methodology but would not, of
course, provide absolute proof of it because of the limitations involved
in the evaluation.
The four additional modeling procedures used for the evaluation are
defined below:
Modeling
Procedure Description
0 Fit the full 13-term model by ordinary least
squares.
3 Apply stepwise regression to a larger class of model
forms, utilizing the same "inclusion" and "retention"
parameters as used for procedure 1.
4 Same as procedure 3, but using the parameters of
procedure 2 rather than those of procedure 1.
5 Fit a flat surface (i.e., a one-term model involving
the constant term) by ordinary least squares.
As with procedures 1 and 2, the above procedures are applied on a case-
by-case basis for each horizontal wind component.
Procedures 0 and 5 represent the extremes of the previously-defined
class of model forms used in procedures 1 and 2. These two procedures
are not considered likely candidates for modeling winds, but are defined
here because summary statistics based on these procedures are used for
comparative purposes in the evaluation.
Procedures 3 and 4 differ from procedures 1 and 2, respectively,
only in the choice of initial terms from which a model is developed.
This initial class of terms for procedures 3 and 4 involves a total of
22 terms; in addition to all 12 terms shown in (4), the following 10
are also included:
$$x^3y,\ xy^3,\ x^5,\ y^5,\ x^6,\ y^6,\ x^2y^2,\ xh,\ yh,\ h^2$$
The basis for selecting these additional terms was the analysis of the
simulated data, as described in Vukovich et al. (1978). This analysis
indicated that such terms, while less important than the 12-term set,
were nevertheless useful for explaining some of the variation in some of
the simulated cases. The class of models based on the 22-term set
contains $2^{22}$ potential model forms; hence, this class is 1,024 times
larger than the class based on 12 terms. Because the 12-term set is a
subset of the 22-term set, it is clear that models based on procedure 3
(or 4) will generally explain more variation (i.e., larger $R^2$ values and
smaller residual sums of squares) than procedure 1 (or 2). However, in
terms of accuracy of predictions, models based on procedure 1 or 2 could
still be superior to models based on procedures 3 or 4.
All six of the modeling procedures described above can be regarded
as six different techniques for selecting a subset of 23 terms which
consists of an intercept plus 22 specific terms. Let $\mathbf{x}$ denote the
column vector of these 23 terms at an arbitrary location in the St. Louis
area; that is,
$$\mathbf{x}' = (1,\ x,\ y,\ x^2,\ y^2,\ xy,\ x^3,\ y^3,\ x^2y,\ xy^2,\ x^4,\ y^4,\ h,\ x^3y,\ xy^3,\ x^5,\ y^5,\ x^6,\ y^6,\ x^2y^2,\ xh,\ yh,\ h^2) \qquad (5)$$
Let $\boldsymbol{\beta}_{kt}$ denote a 23 x 1 vector of unknown coefficients for wind
component k at time t. The general model can therefore be expressed as
$$Z_{kt} = \mathbf{x}'\boldsymbol{\beta}_{kt} + e_{kt} \qquad (6)$$
where $Z_{kt} = Z_{kt}(x,y)$ is the observed value of the k-th wind component at
(x,y) and time t, and $e_{kt}$ is a random deviation in component k at time
t at the point (x,y). Table 10 summarizes the six modeling procedures
with respect to this general model formulation. As indicated in this
table, each procedure involves an assumption as to which coefficients
are negligible (i.e., which terms in the $\mathbf{x}$ vector are deleted). Model
(6) reduces to model (1), for example, when the last ten terms of
(5) are assumed negligible. Also, procedures 1 through 4 may determine,
on the basis of the statistical tests involved in the stepwise
algorithm, that other parameters can reasonably be assigned a zero
value. In these cases, the selected model form will depend on what data
set is utilized, e.g., the full network or the RTI network. Once the
model form has been determined, ordinary least squares is used for
estimating the parameters.

TABLE 10. SUMMARY OF MODELING PROCEDURES*

                                        Modeling Procedure
                               0       1       2       3       4       5
Coefficients assumed
to be non-zero:               1-13     1       1       1       1       1
Coefficients assumed
to be zero:                  14-23   14-23   14-23    none    none    2-23
Coefficients which may
be zero, as determined
by stepwise regression:       none    2-13    2-13    2-23    2-23    none
Stepwise regression
parameters -
  Inclusion:                  N/A     0.1     0.2     0.1     0.2     N/A
  Retention:                  N/A     0.1     0.2     0.1     0.2     N/A

* Term numbers appearing as tabular entries assume that terms
are ordered as in definition (5).
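To make the term ordering concrete, the sketch below builds the 23-term vector of definition (5) at one location and keeps only the terms that a given procedure is allowed to consider. The variable h is taken as given, as in Eq. (1); the function names, the dictionary `CANDIDATE_TERMS`, and the sample coordinates are hypothetical.

```python
def term_vector(x, y, h):
    """The 23 terms of definition (5), in order, evaluated at one location."""
    return [
        1.0, x, y, x**2, y**2, x*y, x**3, y**3, x**2*y, x*y**2, x**4, y**4, h,
        x**3*y, x*y**3, x**5, y**5, x**6, y**6, x**2*y**2, x*h, y*h, h**2,
    ]


# Term numbers (1-based) that each modeling procedure may retain:
# procedures 0-2 draw on terms 1-13, procedures 3-4 on all 23 terms,
# and procedure 5 on the constant term only.
CANDIDATE_TERMS = {
    0: range(1, 14), 1: range(1, 14), 2: range(1, 14),
    3: range(1, 24), 4: range(1, 24), 5: range(1, 2),
}


def design_row(x, y, h, procedure):
    """One row of the design matrix restricted to a procedure's candidate terms."""
    full = term_vector(x, y, h)
    return [full[n - 1] for n in CANDIDATE_TERMS[procedure]]


print(len(term_vector(2.0, 3.0, 5.0)))    # all 23 terms
print(len(design_row(2.0, 3.0, 5.0, 1)))  # the 13-term class
print(design_row(2.0, 3.0, 5.0, 5))       # flat-surface model
```

For procedures 1 through 4 the stepwise algorithm would then choose which of these candidate columns to keep.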
CRITERIA FOR EVALUATING MODELING PROCEDURES
Each of the modeling procedures is applied, at a given point in
time, to two sets of wind data: the data from stations in the RTI
network and the data from all stations (i.e., the full network). Thus,
for each case, twelve estimated wind fields are produced (2 networks x 6
procedures), as illustrated below:
Network Used for                 Modeling Procedure
Model Estimation          0     1     2     3     4     5
RTI
Full
For evaluating the modeling procedures, data from the full network
(F) are utilized to determine the model forms and to estimate parame-
ters. For each case (t) and wind component (k), six models are there-
fore estimated.
Assume that there are $n_t(F)$ stations in the full network which
provide "valid" wind data at time t. Let $p_{kt}(j,F)$ denote the number of
terms in the (selected) model when procedure j is applied to this set of
data. Let $X^*$ denote a matrix consisting of $n_t(F)$ rows (one row
corresponding to each reporting station) and 23 columns; the i-th row
consists of the $\mathbf{x}'$ vector (5) evaluated at the coordinates of the i-th
station. Once the form of the model has been established, the least
squares estimates are determined as
$$\hat{\boldsymbol{\beta}}^{(F)}_{jkt} = (X'_{jkt} X_{jkt})^{-1} X'_{jkt} \mathbf{Z}_{kt}$$
where
$\hat{\boldsymbol{\beta}}^{(F)}_{jkt}$ is the vector of $p_{kt}(j,F)$ estimated coefficients from
procedure j, applied to the full network (F),
$\mathbf{Z}_{kt}$ is the vector of observed data, $Z_{kt}(x_i,y_i)$, i=1,2,..., $n_t(F)$,
and
$X_{jkt}$ is a matrix obtained by deleting those columns (terms) of $X^*$
that are associated with zero regression coefficients, as
indicated by the particular procedure (see Table 10).
At an arbitrary point (x,y) in the region, the six predicted values of
the wind component are obtained as
$$\hat{Z}_{jkt} = \mathbf{x}'_{jkt}\, \hat{\boldsymbol{\beta}}^{(F)}_{jkt} \qquad (7)$$
where $\mathbf{x}_{jkt}$ consists of the relevant model terms. Hence, if coordinates
of the i-th station, $(x_i,y_i)$, are substituted into (7), predicted
values for this station are determined. Let $\hat{Z}_{kt}(j,F,i)$ denote the
predicted value of the k-th wind component for case t and station i, when
procedure j is applied to the full network (denoted by F). The observed
wind component at station i for case t is denoted by $Z_{kt}(i)$, i.e.,
let $Z_{kt}(i) = Z_{kt}(x_i,y_i)$. For each value of k, t, and i, there are six
deviations between observed and predicted values:
$$e_{kt}(j,F,i) = Z_{kt}(i) - \hat{Z}_{kt}(j,F,i), \qquad j=0,1,\ldots,5. \qquad (8)$$
These deviations form the basis for evaluating the modeling procedures.
It should be noted that the mean of these deviations is zero when
the average is taken over stations in the full network (denoted by $i \in F$);
that is,
$$\sum_{i \in F} e_{kt}(j,F,i) = 0 \quad \text{for all } k,\ t, \text{ and } j. \qquad (9)$$
In this same situation, the residual sum of squares corresponds to the
sum of the squared deviations:
$$\sum_{i \in F} e^2_{kt}(j,F,i) = [n_t(F) - p_{kt}(j,F)]\, s^2_{kt}(j,F) \qquad (10)$$
where
$n_t(F)$ = number of stations in network F providing valid data at
time t,
$p_{kt}(j,F)$ = number of terms in the selected model when procedure j
is applied to data from network F, for component k at
time t, and
$s^2_{kt}(j,F)$ = the residual variance from the model based on procedure
j when it is applied to data from network F, for
component k at time t.
To simplify notation, let SSE(j) denote the sums of squared deviations
appearing in (10) for an arbitrary k and t. Then, as previously
indicated, the following conditions must hold:
$$SSE(0) \le SSE(2) \le SSE(1) \le SSE(5) \qquad (11)$$
$$SSE(4) \le SSE(3) \le SSE(5). \qquad (12)$$
The following conditions also usually, though not necessarily, hold:
$$SSE(3) \le SSE(1) \qquad (13)$$
$$SSE(4) \le SSE(2). \qquad (14)$$
For an individual case and wind component, typical ways for
evaluating the fits of various models are
(a) comparison of individual residuals
(b) comparison of frequency distributions of the residuals or of
absolute values of residuals or, equivalently, the proportion
of residuals less than some constant
(c) comparison of residual variances
(d) comparison of $R^2$ statistics
(e) comparison of adjusted $R^2$ statistics.
The residual variances are defined in (10) and can be rewritten as
$$s^2_{kt}(j,F) = \frac{SSE(j)}{n_t(F) - p_{kt}(j,F)} \qquad (15)$$
The $R^2$ statistics for a particular case are defined as
$$R^2_{kt}(j,F) = \frac{SSE(5) - SSE(j)}{SSE(5)} \qquad (16)$$
The adjusted $R^2$ statistics for a particular case are given by
$$\bar{R}^2_{kt}(j,F) = 1 - \frac{n_t(F) - 1}{n_t(F) - p_{kt}(j,F)} \left[ 1 - R^2_{kt}(j,F) \right] \qquad (17)$$
It should be noted that $R^2$ statistics are highly dependent on model
size in situations where the number of parameters is large relative to
the number of observations. The same is true, to a lesser degree, for
the residual variance criterion. The adjusted $R^2$ statistics avoid this
problem.
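The three single-case criteria can be computed from the residual sums of squares alone. The sketch below assumes the standard degrees-of-freedom adjustment for the adjusted R-squared, with the one-term model of procedure 5 as the baseline; the function name and the numbers in the example are hypothetical.

```python
def fit_statistics(sse_j, sse_5, n_stations, n_terms):
    """Residual variance, R-squared and adjusted R-squared for one case.

    sse_j      -- residual sum of squares for procedure j
    sse_5      -- residual sum of squares for the constant-only model
    n_stations -- number of stations reporting valid data
    n_terms    -- number of terms in the selected model (incl. intercept)
    The adjusted R-squared uses the standard (n - 1)/(n - p) correction,
    which is an assumption of this sketch.
    """
    dof = n_stations - n_terms
    s2 = sse_j / dof
    r2 = (sse_5 - sse_j) / sse_5
    adj_r2 = 1.0 - (1.0 - r2) * (n_stations - 1) / dof
    return s2, r2, adj_r2


# Hypothetical case: 15 stations, a 5-term model, SSE(5) = 40, SSE(j) = 10.
print(fit_statistics(10.0, 40.0, 15, 5))
```

With few stations per case the gap between the two R-squared values can be large, which is why the adjusted statistic is the safer basis for comparing models of different size.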
The general strategy for comparing modeling procedures over cases
involves (a) computing the above-described statistics for each case and
summarizing the distributions of such statistics over all cases or over
relevant subsets of cases or (b) computing analogous statistics "pooled"
over all cases or over relevant subsets of cases. The subsets of primary
interest are the following:
- season (i.e., the winter or summer field program),
- prevailing wind speed categories (0-2 mps, 2-4 mps, 4-6 mps,
  >6 mps),
- prevailing wind direction categories (E & SE, S, SW, other).
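The pooled residual variance over an arbitrary subset of cases
(say, C) is defined as
$$ s_{kC}^2(j,F) = \frac{\sum_{t \in C} \left[ n_t(F) - P_{kt}(j,F) \right] s_{kt}^2(j,F)}{\sum_{t \in C} \left[ n_t(F) - P_{kt}(j,F) \right]}. \qquad (18) $$
Note that this is a weighted average of the individual residual
variances. The corresponding overall $R^2$'s are obtained as the proportions
$$ \frac{\sum_{t \in C} \mathrm{SSE}(5) - \sum_{t \in C} \mathrm{SSE}(j)}{\sum_{t \in C} \mathrm{SSE}(5)}. \qquad (19) $$
The associated adjusted $R^2$'s are computed as
$$ 1 - \frac{s_{kC}^2(j,F)}{s_{kC}^2(5,F)}. \qquad (20) $$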
Because the above-described criteria are based on residuals which
result from fitting models to the full network data, they cannot provide
a thorough evaluation of the modeling procedures. One option, for
example, for extending the evaluation would be to apply the modeling
procedures to various subsets of the full network obtained by deleting
one or several data points and to examine the distributions of residuals
occurring at the omitted points. Except for one special case, such a
procedure was not employed because of the large number of cases involved.
The special case involved selecting the subset of stations to be the RTI
network. Then, such a procedure leads to a joint evaluation of the
modeling procedures and the RTI network, that is, an overall evaluation
of the proposed methodology for estimating wind fields. This is discussed
in the following subsection.
CRITERIA FOR EVALUATING THE RTI NETWORK
The remaining six predicted wind fields (those based on data from
stations in the RTI network) are utilized for evaluating the performance
of the RTI network. This network evaluation can be carried out only in
a limited sense. In particular, most of the evaluative measures must be
judged in terms of absolute, rather than relative, units, since observed
data are available at only a limited number of sites. That is, there is
little "feel" for how well some other network of comparable size might
have performed. Although comparing the performance of the RTI network
with that of the full network is useful in terms of the overall wind
prediction capability of the network, it does not provide a separate
evaluation of the RTI network. Rather, differences in the evaluative
criteria for the two networks generally represent measures which reflect
-34-
-------
both network and model differences. Such differences do, however, yield
a limited comparative evaluation.
The evaluative criteria for this aspect of the evaluation are of
two basic types. Both types are based on the deviations between ob-
served and predicted wind component values, where the predicted values
are determined by applying the modeling procedures to data from stations
in the RTI network.
The first type of criteria, and their formulation and properties,
are completely analogous to those described in the previous subsection.
These criteria are obtained by using "R" to represent the RTI network
and substituting R for F into (7) through (20). Such criteria do not,
of course, provide measures of the accuracy of the predictions but
simply characterize the predictions over the RTI network itself.
Criteria of the second type do provide such measures of accuracy
and, consequently, are regarded as the more important type. Let Q
denote some subset of the non-network stations. For instance, Q might
represent
(a) the non-network,
(b) the inner non-network,
(c) the outer non-network, or
(d) an individual station within the non-network.
Let C define subsets of cases, as previously described. The accuracy
measures are of three basic types:
(1) means of deviations, over C and Q,
(2) means of squared deviations (or the square root thereof), over
C and Q, and
(3) frequency distributions of deviations, over C and Q.
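The mean deviations are defined, for each modeling procedure (j)
and each wind component (k), as
$$ \frac{1}{N_{CQ}} \sum_{t \in C} \sum_{i \in Q} \left[ Z_{kt}(i) - \hat{Z}_{kt}(j,R,i) \right], \qquad (21) $$
where $Z_{kt}(i)$ and $\hat{Z}_{kt}(j,R,i)$ are the observed and predicted
values of component k for case t at station i, and
$N_{CQ} = \sum_{t \in C} n_t(Q)$ = the number of observed values occurring at
stations in the Q subset and within the set of cases C.
These means represent average biases over the particular subsets of
cases and non-network stations. The root mean squared error (RMSE)
criteria are determined as the square root of
$$ \frac{1}{N_{CQ}} \sum_{t \in C} \sum_{i \in Q} \left[ Z_{kt}(i) - \hat{Z}_{kt}(j,R,i) \right]^2. \qquad (22) $$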
A mean squared error criterion for the Q and C subsets which encompasses
errors in both the U- and V-components is obtained by summing (22)
over k. This criterion, referred to as the vector mean square error,
represents an average of the squared differences between the observed
and predicted wind vectors in terms of distances in the (U, V) plane.
The vector MSE can be partitioned into two components which represent
the mean squared errors in predicting wind speeds and in predicting wind
directions:
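$$ \text{Vector MSE} = \frac{1}{N_{CQ}} \sum_{k=1}^{2} \sum_{t \in C} \sum_{i \in Q} \left[ Z_{kt}(i) - \hat{Z}_{kt}(j,R,i) \right]^2 $$
$$ = \frac{1}{N_{CQ}} \sum_{t \in C} \sum_{i \in Q} \left\{ \sum_{k=1}^{2} Z_{kt}^2(i) + \sum_{k=1}^{2} \hat{Z}_{kt}^2(j,R,i) - 2 \sum_{k=1}^{2} Z_{kt}(i) \hat{Z}_{kt}(j,R,i) \right\} $$
$$ = \frac{1}{N_{CQ}} \sum_{t \in C} \sum_{i \in Q} \left[ W_t(i) - \hat{W}_t(j,R,i) \right]^2 + \frac{2}{N_{CQ}} \sum_{t \in C} \sum_{i \in Q} W_t(i) \hat{W}_t(j,R,i) \left\{ 1 - \cos\left[ \theta_t(i) - \hat{\theta}_t(j,R,i) \right] \right\} \qquad (23) $$
where $W_t(i)$ and $\theta_t(i)$ are the observed wind speed and direction,
respectively, for case t at station i, and
$\hat{W}_t(j,R,i)$ and $\hat{\theta}_t(j,R,i)$ are the corresponding predicted
values based on applying procedure j to data from the RTI network.
The first component in (23) is the MSE associated with predicting wind
speeds; the second term is the MSE associated with direction errors.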
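The partition in (23) can be checked numerically. The following Python sketch is illustrative only: the function name and the sample (U, V) pairs are hypothetical, and angles are measured with `atan2` in the (U, V) plane, which is an assumption that does not affect the result because only the difference between observed and predicted directions enters.

```python
import math


def vector_mse_parts(observed, predicted):
    """Return (vector MSE, speed MSE, direction MSE) for paired (u, v) values."""
    n = len(observed)
    total = speed = direction = 0.0
    for (u, v), (u_hat, v_hat) in zip(observed, predicted):
        w, w_hat = math.hypot(u, v), math.hypot(u_hat, v_hat)
        d_theta = math.atan2(v, u) - math.atan2(v_hat, u_hat)
        total += (u - u_hat) ** 2 + (v - v_hat) ** 2      # squared vector error
        speed += (w - w_hat) ** 2                         # speed component
        direction += 2.0 * w * w_hat * (1.0 - math.cos(d_theta))  # direction component
    return total / n, speed / n, direction / n


# Hypothetical observed and predicted wind components (mps).
obs = [(3.0, 4.0), (-1.0, 2.0), (0.5, -2.0)]
pred = [(2.5, 4.5), (-0.5, 1.0), (1.0, -1.5)]
vector_mse, speed_mse, direction_mse = vector_mse_parts(obs, pred)
print(vector_mse, speed_mse + direction_mse)
```

The two printed numbers agree up to floating-point rounding, since the squared distance between two vectors equals the squared difference of their lengths plus twice the product of the lengths times one minus the cosine of the angle between them.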
-37-
-------
SECTION 4
EVALUATION RESULTS
Although it would be desirable to have evaluative measures which
would isolate the effects of the modeling procedures from those of the
network, this is not really feasible, because evaluation of the
procedures is conditional on what network is used and because evaluation of
the network requires that a given model be employed across a number of
alternative networks. Consequently, the evaluation is organized as fol-
lows:
(1) Comparing modeling procedures over the full network when data
from the full network are used to establish model forms and
parameter estimates.
(2) Comparing modeling procedures over the RTI network when data
from the RTI network are used to establish the model forms and
parameter estimates and comparing results to those of (1)
above.
(3) Evaluating the accuracy of predicted results at non-network
stations, when estimation has been carried out using data from
the RTI network stations.
It should be noted that (1) and (2) above, in contrast to (3), are
basically concerned with precision. Also, (1) deals with the criteria
discussed in the second subsection of Section 3, whereas (2) and (3)
deal with the two types of criteria discussed in the last subsection of
that section.
Before discussing the results of these evaluations, it is useful to
characterize the modeling procedures involving stepwise regression in
terms of the model forms that resulted from applying the algorithms.
This is the purpose of the subsection below.
-38-
-------
SUMMARY OF SPECIFIC MODELS SELECTED BY ALTERNATIVE APPROACHES
Each of the four stepwise regression procedures (procedures 1-4 of
Table 10) was applied to data from the RTI and the full networks. The
RLSTEP subroutine of the International Mathematical and Statistical
Libraries, Inc. (1975) was utilized to perform the stepwise regressions.
Potentially, eight different model forms could result for each specific
case and wind component. Table 11 characterizes the models selected by
the four procedures in terms of the frequency with which various size
models result. This table demonstrates that models containing more than
four or five terms are rarely selected. As expected, models from proce-
dure 4 are larger than those from procedure 3; similarly, models from
procedure 2 are larger than those from procedure 1. The pattern of
these distributions is similar for both the U- and V-components; however,
smaller size models are much more frequent for the U-component. When
the same modeling procedure is applied to the two different networks,
there is a tendency for the full network cases to yield slightly larger
models; this, of course, is not surprising because of the increase in
statistical power which results from the larger number of stations used
in the full-network estimations.
Table 11 also indicates the number of times (out of 908 cases) that
flat-surface models are chosen (i.e., the number of 1-term, constant
models). This is summarized, in terms of percentages, below:
Percentage of Cases in Which One-Term Models are Selected

                 RTI Network Data      Full Network Data
Procedure          U        V            U        V
    1            36.2     15.2         30.4     10.2
    2            15.1      5.5         11.2      4.0
    3            28.3     10.1         20.5      7.8
    4             8.6      3.3          4.8      2.5
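These percentages follow directly from the one-term rows of Table 11. A short Python sketch of the arithmetic is given below; the counts are the one-term counts from Table 11, while the dictionary layout and variable names are only illustrative.

```python
N_CASES = 908

# One-term model counts from Table 11, ordered by modeling procedure 1-4.
one_term_counts = {
    ("RTI", "U"): [329, 137, 257, 78],
    ("RTI", "V"): [138, 50, 92, 30],
    ("Full", "U"): [276, 102, 186, 44],
    ("Full", "V"): [93, 36, 71, 23],
}

# Convert each count to a percentage of the 908 cases, rounded to one decimal.
one_term_percent = {
    key: [round(100.0 * count / N_CASES, 1) for count in counts]
    for key, counts in one_term_counts.items()
}

for (network, component), row in one_term_percent.items():
    print(network, component, row)
```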
-39-
-------
TABLE 11. DISTRIBUTION OF CASES BY MODEL SIZE FOR FOUR MODELING
          PROCEDURES APPLIED TO WIND COMPONENT DATA FROM STATIONS
          IN THE RTI NETWORK AND THE FULL NETWORK

                     Using Data From             Using Data From
No. of                 RTI Network                 Full Network
Terms in            Modeling Procedure          Modeling Procedure
Selected Model     1     2     3     4         1     2     3     4

U-Component
   1             329   137   257    78       276   102   186    44
   2             362   273   328   200       348   231   302   148
   3             136   223   190   231       161   233   227   243
   4              54   119    81   169        64   144   112   193
   5              19    86    24    79        48   110    52   112
   6               6    33    14    59         7    45    16    72
   7               2    21     9    37         4    23     7    43
   8               0     6     2    22         0    15     2    29
   9               0     6     2    19         0     1     3    12
  10               0     3     1     9         0     4     1     5
  11               0     1     0     2         0     0     0     6
  12               0     0     0     3         0     0     0     1

V-Component
   1             138    50    92    30        93    36    71    23
   2             287   127   234    74       237    84   189    59
   3             263   238   277   167       318   187   267   155
   4             155   193   172   184       162   233   220   190
   5              43   149    82   160        62   173    98   175
   6              13    79    33   112        27   122    49   133
   7               5    35    12    76         6    48     9    94
   8               2    25     5    40         2    19     3    45
   9               2     8     1    32         1     5     2    19
  10               0     4     0    21         0     1     0    12
  11               0     0     0     4         0     0     0     0
  12               0     0     0     5         0     0     0     1
  13               0     0     0     0         0     0     0     2
  14                           0     1                     0     0
  15                           0     1                     0     0
  16                           0     1                     0     0
-------
These percentages indicate how frequently each procedure yields a model
like the procedure 5 model. These cases are important in that no variation
is accounted for by such models (i.e., $R^2 = 0$).
Table 12 provides pairwise comparisons of the network/modeling pro-
cedures in terms of their model sizes and model forms. The two methods
involved in a comparison (denoted by method A and method B in Table 12)
can differ in several ways, as indicated below:
              Initial         Stepwise        Network
Type of       Class of        Regression      Used as
Comparison    Model Forms     Parameters      Data Base
   I          Different       Same            Same
   II         Same            Different       Same
   III        Different       Different       Same
   IV         Same            Same            Different
   V          Different       Same            Different
   VI         Same            Different       Different
   VII        Different       Different       Different
As might be expected, similar size and similar form models occur more
frequently when the two methods being compared are more alike; for
example, types I, II, and IV as compared to type VII. Table 13 shows
the results of Table 12 relating to the similarity of model forms in
terms of percentages.
Out of the 908 cases, the number of times that each of the 23 model
terms occurred in a selected model is shown in Table 14. Because many
of the potential terms are highly correlated, the inclusion of a particu-
lar term in a model is highly dependent on what other terms are involved
in the model; also, there are likely to be many models with essentially
the same predictive capability. Thus the results of Table 14 merely
provide a descriptive summary of the selected models and should be so
interpreted.
-41-
-------
TABLE 12. PAIRWISE COMPARISONS OF MODELING PROCEDURES IN TERMS OF MODEL SIZES AND MODEL FORMS

                              U-Component                          V-Component
                         Number of Cases With:                Number of Cases With:
Type of               Method A  Method B  Same   Same      Method A  Method B  Same   Same
Compar-   Methods     Model     Model     Model  Model     Model     Model     Model  Model
ison      A     B     Smaller   Smaller   Size   Form      Smaller   Smaller   Size   Form

I         1R    3R      212        35     661    515         258        67     583    401
          2R    4R      367        91     451    263         426       123     359    167
          1F    3F      262        47     599    422         254        51     603    412
          2F    4F      393       119     396    192         367       123     418    187
II        1R    2R      488         0     420    420         536         0     372    369
          3R    4R      566         2*    340    336         603         2*    303    296
          1F    2F      516         0     392    388         568         0     340    335
          3F    4F      573         0     335    329         585         0     323    308
III       1R    4R      662        14     232    172         678        14     216    122
          3R    2R      410        95     403    258         457        97     354    186
          1F    4F      691        14     203    125         688        19     201    121
          3F    2F      419        93     396    217         476        74     358    172
IV        1R    1F      212       112     578    511         267       156     485    390
          2R    2F      281       162     465    337         331       224     353    225
          3R    3F      265       140     503    405         296       206     406    265
          4R    4F      333       215     360    223         318       314     276    119
V         1R    3F      398        77     433    278         401       126     381    221
          2R    4F      486       128     294    104         486       198     224     67
          1F    3R      248       186     474    322         291       229     388    204
          2F    4R      353       226     329    119         419       256     233     52
VI        1R    2F      566        22     320    262         617        45     246    177
          3R    4F      640        41     227    154         616        70     222    116
          1F    2R      431        36     441    365         490        80     338    250
          3F    4R      483        64     361    256         570        69     269    157
VII       1R    4F      722        18     168     94         708        42     158     76
          3R    2F      506        96     306    166         543       112     253     80
          1F    4R      608        44     256    146         648        62     198     87
          3F    2R      343       170     395    194         429       174     305    137

The notation 1R means modeling procedure 1 applied to the RTI network; similarly, 2F means procedure 2
using data from the full network of stations. See Table 10 for definitions of modeling procedures.
* This number is not zero, because procedure 4R failed to meet round-off tolerances in these cases and the
"selected" model from procedure 4R was defined to be the constant, one-term model.
-------
TABLE 13. PERCENTAGE OF 908 CASES IN WHICH NETWORK/MODELING
          PROCEDURES RESULTED IN THE SAME MODEL FORM

U-Component
Method     2R     3R     4R     1F     2F     3F     4F
  1R      46.3   56.7   18.9   56.3   28.9   30.6   10.4
  2R             28.4   29.0   40.2   37.1   21.4   11.5
  3R                    37.0   35.5   18.3   44.6   17.0
  4R                           16.1   13.1   28.2   24.6
  1F                                  42.7   46.5   13.8
  2F                                         23.9   21.1
  3F                                                36.2

V-Component
Method     2R     3R     4R     1F     2F     3F     4F
  1R      40.6   44.2   13.4   43.0   19.5   24.3    8.4
  2R             20.5   18.4   27.5   24.8   15.1    7.4
  3R                    32.6   22.5    8.8   29.2   12.8
  4R                            9.6    5.7   17.3   13.1
  1F                                  36.9   45.4   13.3
  2F                                         18.9   20.6
  3F                                                33.9
-------
TABLE 14. NUMBER OF CASES FOR WHICH SPECIFIC MODEL TERMS ARE SELECTED, BY WIND COMPONENT, MODELING
          PROCEDURE AND NETWORK

Models for U-Component Predictions

              Using Data From             Using Data From
                RTI Network                 Full Network
Term         Modeling Procedure          Modeling Procedure
Number*     1     2     3     4         1     2     3     4
   1       908   908   908   908       908   908   908   908
   2        97   168    75   132       121   192   105   158
   3        74   135    64   106        88   156    91   144
   4        34    98    28    76        75   171    56   119
   5       140   247   123   195       151   249   114   152
   6        53   129    40   116        77   172    57   125
   7        74   175    27    74        97   193    40    99
   8        62   119    29    67        58   115    22    50
   9        34   112    42   112        48   133    61   132
  10       138   212   117   166       107   188    94   151
  11        42   111     8    28        62   129    12    38
  12        68   136    38    74        99   162    37    62
  13        98   196    71   142       130   221    89   150
  14                    48   112                    63   129
  15                    38    95                    45    98
  16                    78   149                    89   169
  17                    38    65                    37    60
  18                    38    83                    51   100
  19                    35    62                    47    96
  20                   125   220                   168   244
  21                    54   109                    70   126
  22                    41   126                    63   156
  23                    53   133                    58   136

Models for V-Component Predictions

              Using Data From             Using Data From
                RTI Network                 Full Network
Term         Modeling Procedure          Modeling Procedure
Number*     1     2     3     4         1     2     3     4
   1       908   908   908   908       908   908   908   908
   2       389   506   272   355       460   537   392   463
   3       120   193    96   193       107   189    97   175
   4        39   135    31    79        66   152    41    99
   5       187   290   154   252       177   270   136   193
   6        75   206    54   133        83   222    78   132
   7        79   209    39   151       123   270    71   151
   8        91   169    39    85        98   185    36    83
   9       104   213    88   176       156   276   143   238
  10       242   279   245   287       193   229   179   219
  11        38    96    10    50        49   133    14    42
  12        98   187    46    95       129   226    70   110
  13       113   175    86   135       159   240   124   186
  14                    51   119                    44   114
  15                    44   107                    34    98
  16                    54   130                    71   139
  17                    58   125                    65   118
  18                    29    74                    45    99
  19                    64   122                    67   116
  20                   126   206                   137   189
  21                   186   272                    98   183
  22                    89   195                   119   222
  23                    51   137                    50   109

* Terms are assumed to be ordered as in definition (5).
-------
COMPARISON OF ALTERNATIVE MODELING PROCEDURES USING WIND DATA FROM ALL
STATIONS
Overall analyses of the data from the summer and winter field
programs are shown in Table 15. For these analyses, all data from each
season are pooled together. The total sums of squares among the 254
summertime cases and the 654 wintertime cases are each partitioned into
a between-case and a within-case component. The various modeling proce-
dures are applied on a case-by-case basis and can therefore have no
effect on the between-case component of variation. The within-case
component, which corresponds to fitting a constant for each case (i.e.,
using modeling procedure 5), can then be partitioned into a "pooled
regression" and a "pooled residual" component for each of the modeling
procedures. Only the latter of these two components is actually shown
in Table 15.
The results shown in Table 15 are utilized to compute values of the
"pooled" criteria described in the second subsection of Section 3.
These results are presented in Table 16 for each of the two seasons. As
expected, all of the stepwise procedures result in smaller pooled residual
standard deviations and larger adjusted $R^2$'s than either procedure
0 or procedure 5. In terms of these measures of overall precision,
modeling procedure 4 is clearly superior for both wind components and
both seasons. Modeling procedure 2 yields models which achieve, on
average, virtually the same precision as modeling procedure 3. However,
it requires an average of about one more term per case than procedure 3
requires. On the average, procedure 1 involves fewer terms than
the other stepwise procedures and, in terms of the pooled precision
measures, appears the least favorable among the four stepwise procedures.
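The step from the analysis-of-variance entries of Table 15 to the pooled criteria of Table 16 can be sketched in Python as follows. The degrees of freedom and mean squares are the summer U-component entries of Table 15; the function and variable names are hypothetical, and the average number of terms is recovered here from the degrees of freedom, which is an assumption about how that row of Table 16 relates to Table 15.

```python
import math

N_CASES = 254                 # summertime cases
within = (5247, 0.6528)       # (degrees of freedom, mean square); same as Residual (5)
residual = {                  # Residual (j) rows for the summer U-component
    0: (2199, 0.6279),
    1: (4942, 0.5574),
    2: (4628, 0.5033),
    3: (4829, 0.5029),
    4: (4490, 0.4495),
}


def pooled_criteria(df, ms, df5, ms5, n_cases):
    """Pooled criteria for one procedure from its residual df and mean square."""
    sse, sse5 = df * ms, df5 * ms5            # pooled sums of squares
    return {
        "avg_terms": (df5 - df) / n_cases + 1.0,
        "std_dev": math.sqrt(ms),
        "r2": 1.0 - sse / sse5,
        "adj_r2": 1.0 - ms / ms5,
    }


criteria = {j: pooled_criteria(df, ms, *within, N_CASES)
            for j, (df, ms) in residual.items()}
for j, c in criteria.items():
    print(j, {name: round(value, 2) for name, value in c.items()})
```

For procedure 4 this gives about 4.0 terms, a pooled residual standard deviation of 0.67 mps, a pooled $R^2$ of 0.41, and a pooled adjusted $R^2$ of 0.31.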
-45-
-------
TABLE 15. SUMMARY OF ANALYSIS OF VARIANCE RESULTS BASED ON ESTIMATIONS
          FROM THE FULL NETWORK

                        Summer Field Program
Source of         Degrees of Freedom      Mean Squares (mps)^2
Variation*           U         V             U           V
Total              5500      5500         1.3928      1.9584
Between Cases       253       253        16.7408     21.9731
Within Cases       5247      5247         0.6528      0.9933
Residual (0)       2199      2199         0.6279      0.7480
Residual (1)       4942      4773         0.5574      0.6770
Residual (2)       4628      4450         0.5033      0.5996
Residual (3)       4829      4708         0.5029      0.6373
Residual (4)       4490      4327         0.4495      0.5437

                        Winter Field Program
Source of         Degrees of Freedom      Mean Squares (mps)^2
Variation*           U         V             U           V
Total             14456     14456         5.2119      4.5891
Between Cases       653       653        94.9021     64.3676
Within Cases      13803     13803         0.9688      1.7610
Residual (0)       5955      5955         0.9316      0.8924
Residual (1)      12995     12477         0.7289      0.8471
Residual (2)      12341     11671         0.6721      0.7447
Residual (3)      12752     12231         0.6865      0.7750
Residual (4)      11866     11245         0.6095      0.6599

* The notation "Residual (j)" means the pooled residual variation
from fitting models determined by modeling procedure j. It should
be noted that "Within Cases" is equivalent to "Residual (5)".
-------
TABLE 16. VALUES OF POOLED EVALUATIVE CRITERIA BY SEASON, WIND COMPONENT,
          AND MODELING PROCEDURE BASED ON THE FULL NETWORK ESTIMATIONS

                     Wind                       Modeling Procedure
Statistic            Component  Season*     0      1      2      3      4      5
Average No. of       U          S         13.0    2.2    3.4    2.6    4.0    1.0
Model Terms                     W         13.0    2.2    3.2    2.6    4.0    1.0
(intercept           V          S         13.0    2.9    4.1    3.1    4.6    1.0
included)                       W         13.0    3.0    4.3    3.4    4.9    1.0
Pooled Residual      U          S         0.79   0.75   0.71   0.71   0.67   0.81
Std. Dev. (mps)                 W         0.97   0.85   0.82   0.83   0.78   0.98
                     V          S         0.86   0.82   0.77   0.80   0.74   1.00
                                W         0.94   0.92   0.86   0.88   0.81   1.33
Pooled R^2           U          S         0.60   0.20   0.32   0.29   0.41   0.00
                                W         0.59   0.29   0.38   0.35   0.46   0.00
                     V          S         0.68   0.38   0.49   0.42   0.55   0.00
                                W         0.78   0.57   0.64   0.61   0.69   0.00
Pooled               U          S         0.04   0.15   0.23   0.23   0.31   0.00
Adjusted R^2                    W         0.04   0.25   0.31   0.29   0.37   0.00
                     V          S         0.25   0.32   0.40   0.36   0.45   0.00
                                W         0.49   0.52   0.58   0.56   0.63   0.00

* S = summer field program;
  W = winter field program.
-------
The following tendencies should also be noted:
(a) there is greater within-case variation from station-to-station
in the V-component,
(b) there is greater total variation in both components in the
winter than in the summer,
(c) a smaller percentage of the total variation is accounted for
by the models in the summer than in the winter for both the
U- and V-components.
Tables 17 and 18 present, respectively, the distributions of the
residual standard deviations and the distributions of the adjusted $R^2$
values. These distributions are based on all 908 cases. Particular
note should be made of the similarity of the distributions for procedures
2 and 3 in both of these tables.
Tables 19 and 20 show distributions of the individual residuals
resulting from the full-network estimations. Table 19 clearly indicates
the smaller summertime variation, as compared to that of the wintertime.
Table 20 combines the distributions of Table 19 over seasons. In addi-
tion, the distributions of deviations between observed and predicted
wind speeds are shown. Large positive deviations in the wind speeds
appear more frequently than large negative deviations, indicating (when
such errors occur) a tendency toward underprediction of the wind speeds.
The majority of the wind speed residuals, however, are less than 1.5
mps, as shown by the percentages below that are derived directly from
Table 20:

Modeling        Percentage of Observations With:
Procedure    $|W - \hat{W}|$ < 1.5 mps    $|W - \hat{W}|$ > 1.5 mps
    1                 93.62                       6.38
    2                 95.45                       4.55
    3                 94.24                       5.76
    4                 96.20                       3.80
    5                 83.54                      16.46
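The derivation from Table 20 amounts to adding the three central intervals of the wind speed (W) rows. The Python sketch below assumes that the intervals with midpoints -1, 0, and 1 mps are each 1 mps wide, so that together they cover deviations smaller than 1.5 mps in absolute value; the variable names are illustrative.

```python
# Percentages from the W rows of Table 20 for the intervals with midpoints -1, 0, +1 mps.
central_bins = {
    1: (18.45, 52.87, 22.30),
    2: (16.69, 58.25, 20.51),
    3: (17.75, 55.15, 21.34),
    4: (15.28, 62.34, 18.58),
    5: (22.94, 39.78, 20.82),
}

# Percentage of observations within 1.5 mps, and the remainder beyond 1.5 mps.
within_1p5 = {proc: round(sum(bins), 2) for proc, bins in central_bins.items()}
beyond_1p5 = {proc: round(100.0 - pct, 2) for proc, pct in within_1p5.items()}

for proc in sorted(within_1p5):
    print(proc, within_1p5[proc], beyond_1p5[proc])
```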
-48-
-------
TABLE 17. DISTRIBUTIONS OF RESIDUAL STANDARD DEVIATIONS OVER THE 908
          CASES FOR FOUR MODELING PROCEDURES APPLIED TO DATA FROM ALL
          STATIONS

Residual          Percentage Frequency
Std. Dev.            Distributions              Cumulative Percentages
(mps)             Modeling Procedure              Modeling Procedure
                  1      2      3      4         1      2      3      4
U-Component
0.0 - 0.5       14.5   18.6   18.0   24.2      14.5   18.6   18.0   24.2
0.5 - 1.0       66.9   66.9   66.1   65.0      81.4   85.5   84.0   89.2
1.0 - 1.5       18.0   14.1   15.4   10.4      99.3   99.6   99.4   99.6
1.5 - 2.0        0.7    0.4    0.6    0.4     100.0  100.0  100.0  100.0
V-Component
0.0 - 0.5        5.5    8.0    7.2   13.8       5.5    8.0    7.2   13.8
0.5 - 1.0       68.9   72.5   71.8   72.1      74.4   80.5   79.0   85.9
1.0 - 1.5       24.1   19.1   20.4   14.0      98.6   99.6   99.3   99.9
1.5 - 2.0        1.4    0.4    0.7    0.1     100.0  100.0  100.0  100.0
-------
TABLE 18. DISTRIBUTIONS OF ADJUSTED R^2 STATISTICS OVER THE 908 CASES
          FOR FOUR MODELING PROCEDURES APPLIED TO DATA FROM ALL STATIONS

                  Percentage Frequency
Adjusted             Distributions              Cumulative Percentages
R^2               Modeling Procedure              Modeling Procedure
                  1      2      3      4         1      2      3      4
U-Component
      0.0       30.4   11.2   20.5    4.8      30.4   11.2   20.5    4.8
0.0 - 0.2       30.3   36.2   27.6   26.9      60.7   47.5   48.1   31.7
0.2 - 0.4       23.7   29.0   29.0   32.7      84.4   76.4   77.1   64.4
0.4 - 0.6       10.2   14.9   14.3   20.2      94.6   91.3   91.4   84.6
0.6 - 0.8        5.1    8.0    7.5   12.0      99.7   99.3   98.9   96.6
0.8 - 1.0        0.3    0.7    1.1    3.4     100.0  100.0  100.0  100.0
V-Component
      0.0       10.2    4.0    7.8    2.5      10.2    4.0    7.8    2.5
0.0 - 0.2       12.6   10.5   11.0    8.6      22.8   14.4   18.8   11.1
0.2 - 0.4       24.1   21.3   21.8   17.6      46.9   35.7   40.6   28.7
0.4 - 0.6       30.7   32.2   30.3   29.0      77.6   67.8   70.9   57.7
0.6 - 0.8       19.7   26.8   24.8   30.5      97.4   94.6   95.7   88.2
0.8 - 1.0        2.6    5.4    4.3   11.8     100.0  100.0  100.0  100.0
-------
TABLE 19. PERCENTAGE FREQUENCY DISTRIBUTIONS OF RESIDUALS BY SEASON, WIND COMPONENT, AND MODELING
PROCEDURES OVER ALL STATIONS BASED ON FULL NETWORK ESTIMATIONS
Summer
Wind
Comp.
U
V
Winter
Wind
Comp.
U
V
Field Program
Modeling
Procedure
1
2
3
4
5
1
2
3
4
5
Field Program
Modeling
Procedure
1
2
3
4
5
1
2
3
4
5
(5501 Observations)
Deviation
<-5 -4
0.04
0.04
0.04
0.04
0.05
__
0.02
Between
-3
0.11
0.07
0.09
0.07
0.16
0.22
0.16
0.18
0.15
0.78
Observed and
-2
1.75
1.42
1.51
1.25
2.65
2.16
1.45
1.78
1.15
5.33
-1
18.21
16.11
16.12
13.65
20.85
21.20
19.14
20.45
17.31
21.61
Predicted Value (midpt. of interval)
0
61.23
65.53
65.12
70.57
54.01
54.99
59.86
56.95
63.95
45.36
1
16.05
14.72
15.02
12.60
18.91
18.00
16.60
17.25
14.96
21.12
2
2.16
1.80
1.75
1.53
2.80
3.04
2.51
3.02
2.27
4.29
3
0.44
0.29
0.35
0.27
0.53
0.38
0.27
0.35
0.22
1.38
4
0.02
0.02
0.02
0.02
0.04
0.02
0.02
0.09
>5
__
0.02
(14457 Observations)
0.01 0.04
0.01 0.03
0.01 0.04
0.01 0.02
0.02 0.06
0.04
0.01
0.04
0.03
0.22 0.48
0.32
0.30
0.32
0.24
0.64
0.26
0.18
0.18
0.10
1.61
2.91
2.46
2.68
2.19
4.57
3.33
2.66
2.87
2.13
7.40
20.70
19.10
19.33
16.74
22.65
22.43
20.43
21.33
18.32
24.85
52.32
56.51
55.50
61.38
45.53
48.77
54.08
51.97
59.29
35.07
19.92
18.52
18.84
16.99
20.43
20.88
19.36
19.89
17.36
18.27
3.42
2.78
2.94
2.17
4.89
3.72
2.89
3.25
2.49
8.13
0.28
0.22
0.27
0.19
0.93
0.47
0.35
0.39
0.26
3.09
0.08
0.08
0.06
0.06
0.27
0.09
0.04
0.08
0.03
0.73
0.01
0.01
0.01
0.01
0.01
0.14
-------
TABLE 20. PERCENTAGE FREQUENCY DISTRIBUTIONS OF RESIDUALS BY WIND COMPONENT AND MODELING PROCEDURE
OVER ALL CASES AND ALL STATIONS BASED ON FULL NETWORK ESTIMATIONS
Wind
Comp . *
U
V
W
Modeling
Procedure
1
2
3
4
5
1
2
3
4
5
1
2
3
4
5
Deviation
<-5
0.01
0.01
0.01
0.01
0.02
0.16
0.01
-4
0.04
0.03
0.04
0.03
0.06
0.03
0.01
0.03
0.02
0.36
0.01
0.01
0.06
Between Observed and
-3
0.26
0.24
0.26
0.20
0.51
0.25
0.18
0.18
0.11
1.38
0.09
0.04
0.08
0.02
0.49
-2
2.59
2.17
2.36
1.93
4.04
3.01
2.32
2.57
1.86
6.83
1.86
1.25
1.60
0.94
4.43
-1
20.02
18.28
18.44
15.89
22.15
22.09
20.07
21.09
18.04
23.96
18.45
16.69
17.75
15.28
22.94
Predicted Value (midpt. of interval)
0
54.78
58.99
58.15
63.91
47.87
50.49
55.68
53.34
60.57
37.90
52.87
58.25
55.15
62.34
39.78
1
18.85
17.47
17.78
15.78
20.01
20.09
18.60
19.16
16.70
19.06
22.30
20.51
21.34
18.58
20.82
2
3.07
2.51
2.61
1.99
4.31
3.53
2.79
3.19
2.43
7.07
3.92
2.96
3.61
2.61
7.70
3
0.33
0.24
0.29
0.22
0.82
0.45
0.33
0.38
0.25
2.62
0.41
0.27
0.38
0.22
2.98
4
0.06
0.06
0.05
0.05
0.21
0.07
0.03
0.07
0.02
0.56
0.09
0.03
0.07
0.02
0.68
>5
0.01
0.01
0.01
0.01
0.01
0.11
0.01
0.01
0.01
0.11
* W denotes wind speed.
-------
COMPARISON OF ALTERNATIVE MODELING PROCEDURES USING WIND DATA FROM
STATIONS IN THE RTI NETWORK
The modeling procedures applied to data from the RTI network sta-
tions yield similar results to those described in the previous subsec-
tion. Tables 21 through 24, which are analogous to Tables 15 through
18, respectively, provide a summary of the major results. It should
again be emphasized that these results, like those of the preceding
subsection, relate to the precision of the modeling procedures rather
than to their accuracy.
Comparison of these results to those of the previous subsection in-
dicates that in general the criteria values based on the RTI network
estimations are slightly less consistent than those for the full net-
work. More specifically, the results can be summarized as follows:
(a) In the winter, the within-case variation over the RTI network
stations is somewhat larger than the within-case variation
over all stations for both wind components; in the summer,
the within-case variation for the V-component is smaller for
the RTI network than for the full network.
(b) For the V-component, the residual variances over the RTI net-
work are usually smaller than the corresponding quantities for
the full network; for the U-component, they are about the same
in the summer and larger in the winter than the corresponding
full-network residual variances.
(c) Among the stepwise regression procedures, the pooled adjusted
$R^2$ statistics from the RTI network estimations are quite comparable
to those of the full network; the distributions of the
adjusted $R^2$ statistics show that more large and more small
adjusted $R^2$ values occur for the RTI-network estimations than
for the full-network estimations.
-53-
-------
TABLE 21. SUMMARY OF ANALYSIS OF VARIANCE RESULTS BASED ON ESTIMATIONS
          FROM THE RTI NETWORK

                        Summer Field Program
Source of         Degrees of Freedom      Mean Squares (mps)^2
Variation*           U         V             U           V
Total              4017      4017         1.2919      1.7273
Between Cases       253       253        10.5446     13.9551
Within Cases       3764      3764         0.6700      0.9044
Residual (0)        716       716         0.5483      0.6147
Residual (1)       3515      3421         0.5419      0.6324
Residual (2)       3200      3096         0.4909      0.5380
Residual (3)       3434      3294         0.5041      0.5529
Residual (4)       3131      2747         0.4500      0.4175

                        Winter Field Program
Source of         Degrees of Freedom      Mean Squares (mps)^2
Variation*           U         V             U           V
Total             10671     10671         5.2568      4.8337
Between Cases       653       653        69.4364     49.3675
Within Cases      10018     10018         1.0734      1.9309
Residual (0)       2170      2170         1.2154      0.8338
Residual (1)       9353      8786         0.8168      0.8065
Residual (2)       8744      8028         0.7411      0.6861
Residual (3)       9138      8576         0.7611      0.7199
Residual (4)       8209      7557         0.6635      0.5898

* The notation "Residual (j)" means the pooled residual variation
from fitting models determined by modeling procedure j. It should
be noted that "Within Cases" is equivalent to "Residual (5)".
-------
TABLE 22. VALUES OF POOLED EVALUATIVE CRITERIA BY SEASON, WIND COMPONENT, AND
          MODELING PROCEDURE BASED ON RTI NETWORK ESTIMATIONS

                     Wind                       Modeling Procedure
Statistic            Component  Season*     0      1      2      3      4      5
Average No. of       U          S         13.0    2.0    3.2    2.3    3.5    1.0
Model Terms                     W         13.0    2.0    2.9    2.3    3.8    1.0
(intercept           V          S         13.0    2.4    3.6    2.9    5.0    1.0
included)                       W         13.0    2.9    4.0    3.2    4.8    1.0
Pooled Residual      U          S         0.74   0.74   0.70   0.71   0.67   0.82
Std. Dev. (mps)                 W         1.10   0.90   0.86   0.87   0.81   1.04
                     V          S         0.78   0.80   0.73   0.74   0.65   0.95
                                W         0.91   0.90   0.83   0.85   0.77   1.39
Pooled R^2           U          S         0.84   0.24   0.38   0.32   0.44   0.00
                                W         0.75   0.29   0.40   0.35   0.49   0.00
                     V          S         0.87   0.37   0.51   0.47   0.66   0.00
                                W         0.91   0.63   0.72   0.68   0.77   0.00
Pooled               U          S         0.18   0.19   0.27   0.25   0.33   0.00
Adjusted R^2                    W         0.00   0.24   0.31   0.29   0.38   0.00
                     V          S         0.32   0.30   0.41   0.39   0.54   0.00
                                W         0.57   0.58   0.64   0.63   0.69   0.00

* S = summer field program;
  W = winter field program.
-------
TABLE 23. DISTRIBUTIONS OF RESIDUAL STANDARD DEVIATIONS OVER THE 908
          CASES FOR FOUR MODELING PROCEDURES APPLIED TO DATA FROM
          STATIONS IN THE RTI NETWORK

Residual          Percentage Frequency
Std. Dev.            Distributions              Cumulative Percentages
(mps)             Modeling Procedure              Modeling Procedure
                  1      2      3      4         1      2      3      4
U-Component
0.0 - 0.5       15.3   19.7   18.9   28.1      15.3   19.7   18.9   28.1
0.5 - 1.0       61.3   61.5   60.7   56.6      76.7   81.2   79.6   84.7
1.0 - 1.5       21.4   17.7   19.2   14.4      98.0   98.9   98.8   99.1
1.5 - 2.0        2.0    1.1    1.2    0.9     100.0  100.0  100.0  100.0
V-Component
0.0 - 0.5       10.1   15.5   15.4   31.6      10.1   15.5   15.4   31.6
0.5 - 1.0       67.9   68.8   66.4   57.6      78.0   84.4   81.8   89.2
1.0 - 1.5       20.5   15.1   17.4   10.2      98.5   99.4   99.2   99.4
1.5 - 2.0        1.5    0.6    0.8    0.6     100.0  100.0  100.0  100.0
-------
TABLE 24. DISTRIBUTIONS OF ADJUSTED R^2 STATISTICS OVER THE 908 CASES FOR
          FOUR MODELING PROCEDURES APPLIED TO DATA FROM STATIONS IN THE
          RTI NETWORK

                  Percentage Frequency
Adjusted             Distributions              Cumulative Percentages
R^2               Modeling Procedure              Modeling Procedure
                  1      2      3      4         1      2      3      4
U-Component
      0.0       36.2   15.1   28.3    8.6      36.2   15.1   28.3    8.6
0.0 - 0.2       23.8   31.4   21.5   24.2      60.0   46.5   49.8   32.8
0.2 - 0.4       21.0   24.2   24.3   26.2      81.1   70.7   74.1   59.0
0.4 - 0.6       12.4   17.4   15.3   19.8      93.5   88.1   89.4   78.9
0.6 - 0.8        5.4    8.7    7.8   12.7      98.9   96.8   97.2   91.5
0.8 - 1.0        1.1    3.2    2.8    8.5     100.0  100.0  100.0  100.0
V-Component
      0.0       15.2    5.5   10.1    3.3      15.2    5.5   10.1    3.3
0.0 - 0.2        9.9   11.5    9.1    7.3      25.1   17.0   19.3   10.6
0.2 - 0.4       19.1   16.2   15.6   10.2      44.2   33.1   34.9   20.8
0.4 - 0.6       24.2   23.7   24.0   21.6      68.4   56.8   58.9   42.4
0.6 - 0.8       24.7   29.3   27.3   27.6      93.1   86.1   86.2   70.0
0.8 - 1.0        6.9   13.9   13.8   30.0     100.0  100.0  100.0  100.0
-------
ACCURACY OF PREDICTED WIND FIELDS
Evaluation of the accuracy of the modeling procedures depends upon
the deviations between observed and predicted values at the non-network
stations, when the estimation is based on data from the RTI network.
The means of these deviations by wind component are shown in Table 25
for each of the seven non-network stations; for comparative purposes,
the corresponding mean deviations resulting from the full network esti-
mations are given in Table 26. It is apparent from these results that
the largest discrepancies between the mean deviations of the two tables
occur for the outer-non-network stations STL007 and EPA103. Except for
these two stations, the corresponding mean deviations of Tables 25 and
26 usually differ by less than 0.1 mps.
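The comparison between the two tables can be illustrated with a short Python sketch. The values are the V-component mean deviations for three of the stations, taken from Tables 25 and 26 and ordered by modeling procedure 1 through 5; the data layout and names are illustrative only.

```python
# V-component mean deviations (mps) by modeling procedure 1-5.
rti_network = {      # Table 25: estimations from RTI network data
    "STL007": [0.414, 0.558, 0.833, 1.273, -0.215],
    "EPA103": [1.387, 1.232, 1.380, 1.228, 1.709],
    "EPA111": [0.210, 0.248, 0.225, 0.279, 0.168],
}
full_network = {     # Table 26: estimations from full network data
    "STL007": [0.273, 0.285, 0.328, 0.291, -0.301],
    "EPA103": [1.126, 0.951, 1.131, 0.913, 1.667],
    "EPA111": [0.144, 0.164, 0.157, 0.209, 0.180],
}

# Largest absolute difference between the two tables at each station.
max_abs_diff = {
    station: max(abs(r - f) for r, f in zip(rti_network[station], full_network[station]))
    for station in rti_network
}
for station, diff in max_abs_diff.items():
    print(station, round(diff, 3))
```

For these three stations, the largest differences are roughly 0.98 mps at STL007 and 0.32 mps at EPA103, while at EPA111 every difference stays below 0.1 mps.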
Pooled root mean square errors (RMSE's) at each of the non-network
stations are presented in Table 27. These are shown for each wind
component and season; the pooled vector RMSE's, denoted by (U,V), pro-
vide a convenient method of summarizing the errors over the two compo-
nents, as described at the end of Section 3. In order to evaluate the
magnitude of the errors occurring at the non-network stations, the
pooled vector RMSE's shown in the last five rows of Table 27 are plotted
in Figure 2 along with the corresponding RMSE's for the RTI network
stations. Root mean square errors based on the full-network (F) esti-
mations are also shown. These appear to the left of each vertical line,
whereas those based on the RTI network data are shown on the right.
This plot clearly demonstrates the trend of decreasing RMSE's for the
full-network estimations when going from procedure 1 to procedure 4 and
the similar trend for RTI network stations based on the RTI network
estimations. The greatest improvement in precision in going from proce-
-58-
-------
TABLE 25. MEANS OF DEVIATIONS BETWEEN OBSERVED AND PREDICTED VALUES AT NON-
          NETWORK STATIONS, BY WIND COMPONENT AND MODELING PROCEDURE, BASED
          ON ESTIMATIONS FROM RTI NETWORK DATA

Wind      Modeling                                Station
Comp.     Procedure   STL003   STL004   STL007   EPA103   EPA107   EPA111   EPA112
U (mps)       1        0.543   -0.395   -0.700    0.328   -0.044   -0.519   -0.202
              2        0.526   -0.440   -0.761    0.310   -0.083   -0.517   -0.239
              3        0.520   -0.418   -0.560    0.314   -0.067   -0.549   -0.238
              4        0.483   -0.466   -0.612    0.306   -0.113   -0.585   -0.308
              5        0.566   -0.338   -0.428    0.332    0.000   -0.542   -0.138
V (mps)       1       -0.663    0.452    0.414    1.387   -0.146    0.210    0.175
              2       -0.693    0.531    0.558    1.232   -0.150    0.248    0.272
              3       -0.678    0.429    0.833    1.380   -0.182    0.225    0.193
              4       -0.733    0.506    1.273    1.228   -0.210    0.279    0.307
              5       -0.702    0.167   -0.215    1.709   -0.263    0.168   -0.197

TABLE 26. MEANS OF DEVIATIONS BETWEEN OBSERVED AND PREDICTED VALUES AT NON-
NETWORK STATIONS, BY WIND COMPONENT AND MODELING PROCEDURE, BASED
ON ESTIMATIONS FROM FULL NETWORK DATA
Wind      Modeling                               Station
Comp.     Procedure   STL003   STL004   STL007   EPA103   EPA107   EPA111   EPA112
U (mps)       1        0.545   -0.367   -0.416    0.292   -0.030   -0.468   -0.171
              2        0.510   -0.387   -0.359    0.258   -0.078   -0.430   -0.186
              3        0.492   -0.384   -0.229    0.261   -0.069   -0.479   -0.200
              4        0.452   -0.381   -0.173    0.193   -0.102   -0.466   -0.221
              5        0.571   -0.333   -0.433    0.331    0.008   -0.519   -0.133
V (mps)       1       -0.716    0.392    0.273    1.126   -0.216    0.144    0.143
              2       -0.720    0.461    0.285    0.951   -0.205    0.164    0.233
              3       -0.683    0.387    0.328    1.131   -0.207    0.157    0.165
              4       -0.664    0.451    0.291    0.913   -0.190    0.209    0.255
              5       -0.731    0.138   -0.301    1.667   -0.290    0.180   -0.227
-------
TABLE 27. ROOT MEAN SQUARE ERRORS (MPS) FOR EACH NON-NETWORK STATION BASED ON ESTIMATIONS
FROM THE RTI NETWORK, BY WIND COMPONENT, SEASON, AND MODELING PROCEDURE
Wind
Comp.
U
U
V
V
(U,V)
(U,V)
(U,V)
Modeling
Season Procedure
Summer 1
2
3
4
5
Winter 1
2
3
4
5
Summer 1
2
3
4
5
Winter 1
2
3
4
5
Summer 1
2
3
4
5
Winter 1
2
3
4
5
Combined 1
2
3
4
5
STL003
0.960
0.927
0.942
0.925
1.015
0.849
0.850
0.842
0.841
1.859
0.774
0.817
0.837
0.990
0.807
1.220
1.202
1.193
1.191
1.245
1.233
1.236
1.260
1.355
1.297
1.486
1.472
1.460
1.458
1.512
1.419
1.409
1.406
1.429
1.454
STL004
1.109
1.090
1.113
1.104
1.073
0.973
1.041
0.991
1.069
0.903
1.118
1.075
1.051
1.027
1.099
1.183
1.259
1.164
1.238
0.989
1.575
1.531
1.531
1.508
1.536
1.532
1.633
1.529
1.636
1.339
1.544
1.605
1.529
1.601
1.398
STL007
1.017
1.438
1.034
1.311
0.678
1.359
1.382
1.790
2.001
1.461
1.120
1.363
1.946
2.847
0.788
1.088
1.147
1.265
1.592
1.589
1.513
1.981
2.204
3.135
1.040
1.741
1.796
2.192
2.558
2.158
1.579
1.932
2.201
2.988
1.436
Station
EPA103
0.870
0.848
0.869
0.876
0.901
1.014
0.981
1.030
1.021
1.112
1.937
1.755
1.871
1.678
2.231
1.689
1.582
1.695
1.609
1.971
2.124
1.949
2.063
1.893
2.406
1.970
1.861
1.984
1.906
2.263
2.018
1.888
2.008
1.902
2.308
EPA107
0.426
0.443
0.426
0.443
0.435
0.494
0.493
0.487
0.510
0.571
0.501
0.521
0.510
0.566
0.444
0.659
0.641
0.639
0.635
0.729
0.657
0.684
0.665
0.719
0.621
0.824
0.809
0.804
0.815
0.926
0.782
0.777
0.768
0.790
0.853
EPA111
0.783
0.781
0.817
0.867
0.794
0.771
0.786
0.795
0.823
0.770
1.099
1.108
1.140
1.196
1.106
1.099
1.108
1.140
1.196
1.106
EPA112
0.516
0.581
0.542
0.607
0.469
0.663
0.696
0.672
0.767
0.641
0.509
0.541
0.493
0.534
0.453
0.704
0.780
0.681
0.777
0.725
0.725
0.794
0.733
0.808
0.652
0.967
1.045
0.956
1.092
0.967
0.907
0.982
0.900
1.022
0.892
All Non-
Network
Stations
0.863
0.955
0.867
0.933
0.310
0.833
0.846
0.864
0.911
0.851
1.119
1.116
1.279
1.521
1.156
1.095
1.092
1.092
1.109
1.163
1.413
1.469
1.545
1.784
1.412
1.376
1.381
1.392
1.436
1.441
1.387
1.406
1.437
1.542
1.433
Inner Non-
Network
Stations
0.317
0.813
0.817
0.822
0.816
0.772
0.795
0.782
0.833
0.765
0.777
0.781
0.768
0.821
0.762
0.944
0.971
0.930
0.968
0.918
1.127
1.128
1.122
1.162
1.116
1.220
1.255
1.215
1.277
1.195
1.198
1.226
1.194
1.250
1.177
-------
[Figure 2: printer plot of pooled vector RMSE (mps, vertical axis) against
modeling procedure (1 through 5, horizontal axis). For each procedure,
values based on the full network (F) appear to the left of the vertical
line and values based on the RTI network (R) appear to the right.
Legend: * RTI Network Stations; X Inner Non-Network Stations;
O Outer Non-Network Stations.
NOTE: Some observations are not shown since computer would not overprint.]
Figure 2. Pooled vector RMSE's for individual stations by modeling procedure
-------
dure 1 to procedure 4 appears to occur for stations in the outlying
areas of the St. Louis region.
Also apparent from Figure 2 is the increase in the RMSE's for the
non-network stations based on the RTI network estimations over the
corresponding values for the full network estimations. These increases
tend to be most pronounced for procedures 3 and 4 and are quite dramatic
for the outer-non-network station, STL007. Thus, among the non-network
stations, there is a general trend of increasing RMSE's in going from
procedure 1 to procedure 4, as contrasted to the reverse trend for the
RTI network stations. As shown in Table 27, at least one of the first
three procedures yields a smaller RMSE than procedure 4 at each of the
non-network stations. This suggests that the apparently higher
precision of procedure 4 relative to the other procedures is obtained by
overfitting (i.e., including too many terms) in some cases; this results in
a loss in accuracy relative to the first three procedures. Figure 2
also shows that, although the flat-surface models (procedure 5) provide
predictions at the non-network stations which are nearly comparable in
accuracy to those of procedures 1, 2, and 3, the precision of procedure
5, as measured by the RMSE's at the RTI network stations, is substan-
tially poorer than that of the stepwise regression procedures.
Among the four stepwise regression procedures, procedure 1 would
appear to yield the most accurate results across all seven non-network
stations; procedure 3 appears more accurate across the five inner-non-
network stations. These general conclusions are supported by the re-
sults of Tables 28 and 29, which show various statistics that summarize
the distributions of the RMSE's over all cases. Table 28 provides these
-------
TABLE 28. CHARACTERIZATION OF THE DISTRIBUTIONS OVER THE 908 CASES OF RMSE'S ACROSS ALL NON-NETWORK STATIONS
BASED ON ESTIMATIONS FROM RTI NETWORK DATA
                     Pooled   Mean     Std. Dev.  Maximum
Wind     Modeling    RMSE     RMSE     of RMSE    RMSE      Percentage of Cases with RMSE (mps):
Comp.    Procedure   (mps)    (mps)    (mps)      (mps)     <0.5    <1.0    <1.5    <2.0    <2.5
U           1        0.842    0.778    0.313      2.193     19.2    79.0    97.6    99.7   100.0
            2        0.878    0.794    0.364      5.038     19.1    76.9    97.0    99.3    99.8
            3        0.865    0.787    0.349      3.488     19.2    77.6    96.1    99.3    99.8
            4        0.917    0.822    0.394      3.588     17.2    74.8    94.2    98.5    99.2
            5        0.840    0.772    0.324      2.202     18.7    79.6    96.6    99.7   100.0
V           1        1.102    1.039    0.365      2.478      5.7    47.5    89.6    98.8   100.0
            2        1.093    1.035    0.365      2.898      4.8    49.7    90.0    98.5    99.8
            3        1.147    1.066    0.419      5.494      4.6    46.0    87.8    97.6    99.4
            4        1.239    1.116    0.531      6.441      4.7    44.6    85.2    96.1    98.6
            5        1.161    1.095    0.374      2.357      4.8    41.3    86.7    97.9   100.0
W           1        1.086    1.023    0.361      2.245      5.7    50.4    90.1    98.8   100.0
            2        1.061    0.995    0.361      3.434      5.3    54.3    91.0    98.9    99.8
            3        1.097    1.027    0.374      2.763      5.5    51.0    89.0    98.3    99.8
            4        1.113    1.030    0.406      3.737      5.7    50.9    88.9    97.6    99.2
            5        1.188    1.118    0.385      2.453      4.5    39.6    84.9    98.1   100.0
(U,V)       1        1.387    1.328    0.389      2.844      0.2    20.0    70.5    94.5    99.3
            2        1.406    1.336    0.427      5.117      0.2    20.5    70.0    93.2    98.8
            3        1.437    1.359    0.455      5.525      0.2    18.9    69.6    91.7    98.5
            4        1.542    1.428    0.565      6.467      0.2    18.2    65.0    89.0    96.1
            5        1.433    1.373    0.394      3.114      0.3    15.3    65.2    92.8    99.7
-------
TABLE 29. CHARACTERIZATION OF THE DISTRIBUTIONS OVER THE 908 CASES OF RMSE'S ACROSS STATIONS IN THE INNER-
NON-NETWORK BASED ON ESTIMATIONS FROM RTI NETWORK DATA
                     Pooled   Mean     Std. Dev.  Maximum
Wind     Modeling    RMSE     RMSE     of RMSE    RMSE      Percentage of Cases with RMSE (mps):
Comp.    Procedure   (mps)    (mps)    (mps)      (mps)     <0.5    <1.0    <1.5    <2.0    <2.5
U           1        0.783    0.717    0.319      2.372     26.0    83.4    97.8    99.7   100.0
            2        0.800    0.728    0.331      2.372     25.0    82.2    97.2    99.7   100.0
            3        0.791    0.722    0.326      2.372     25.1    82.6    97.2    99.7   100.0
            4        0.830    0.751    0.350      2.372     23.1    79.5    95.9    99.3   100.0
            5        0.778    0.714    0.318      2.275     26.1    84.1    97.9    99.6   100.0
V           1        0.907    0.819    0.377      2.645     19.7    72.7    94.3    99.2    99.8
            2        0.929    0.842    0.376      2.645     17.5    70.5    94.2    98.9    99.9
            3        0.894    0.811    0.362      2.306     19.2    74.7    94.7    99.3   100.0
            4        0.935    0.851    0.376      2.254     17.0    69.4    93.6    99.2   100.0
            5        0.883    0.809    0.344      2.418     18.3    75.2    96.0    99.6   100.0
W           1        0.830    0.759    0.342      2.411     23.9    77.9    96.7    99.6   100.0
            2        0.835    0.765    0.333      2.445     22.5    78.9    96.8    99.7   100.0
            3        0.816    0.746    0.332      2.410     24.2    79.2    97.1    99.7   100.0
            4        0.845    0.775    0.334      2.397     20.6    76.9    96.7    99.8   100.0
            5        0.836    0.768    0.341      2.359     22.1    78.0    97.1    99.3   100.0
(U,V)       1        1.198    1.127    0.397      2.773      2.4    42.6    83.2    97.6    99.4
            2        1.226    1.153    0.402      2.850      2.2    40.4    82.6    96.6    99.2
            3        1.194    1.124    0.393      2.587      2.3    43.1    83.0    97.1    99.8
            4        1.250    1.176    0.411      2.570      2.1    38.8    79.1    95.9    99.8
            5        1.177    1.113    0.380      2.594      2.6    41.7    85.1    97.4    99.7
-------
measures for all of the non-network whereas Table 29 provides comparable
results for the inner-non-network only.
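The relation between the pooled and mean RMSE columns of Tables 28 and 29 can be illustrated with a short calculation. The sketch below assumes that a pooled RMSE behaves like the root mean square of the per-case RMSE's with equal weight on every case, so that it is approximately sqrt(mean^2 + std. dev.^2). The pooling actually used in the report is defined in Section 3 and may weight cases by their number of observations, so only approximate agreement should be expected. The function name is illustrative; the numbers are copied from Table 28.

```python
import math

# (mean RMSE, std. dev. of RMSE, tabulated pooled RMSE), all in mps, from Table 28.
TABLE_28_ROWS = {
    ("U", 1): (0.778, 0.313, 0.842),
    ("V", 4): (1.116, 0.531, 1.239),
    ("(U,V)", 1): (1.328, 0.389, 1.387),
    ("(U,V)", 4): (1.428, 0.565, 1.542),
}


def approx_pooled_rmse(mean_rmse, sd_rmse):
    """Root mean square of per-case RMSE's, assuming equal case weights."""
    return math.sqrt(mean_rmse ** 2 + sd_rmse ** 2)


for (component, procedure), (mean_rmse, sd_rmse, pooled) in TABLE_28_ROWS.items():
    approx = approx_pooled_rmse(mean_rmse, sd_rmse)
    print(component, procedure, round(approx, 3), pooled)
```

Under this assumption, a procedure whose per-case RMSE's are more variable shows a wider gap between its pooled and mean RMSE, which is one way to read the procedure 4 rows.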
COMBINED EVALUATIVE MEASURES
The overall merit of a procedure must be judged by some combined
measure of its estimation error (precision) and its prediction error
(accuracy). An overall measure of the precision of a procedure is the
square root of the sum of the pooled residual variances for the two wind
components, based on estimations from the RTI network data; these
values are shown below and are denoted by s(j), where j indicates the
particular procedure:
Modeling          Pooled Residual Variances (mps²)       s(j): Square Root
Procedure (j)       U           V          Total          of Total (mps)
     1R           0.7417      0.7577      1.4994              1.224
     2R           0.6741      0.6449      1.3190              1.148
     3R           0.6909      0.6736      1.3645              1.168
     4R           0.6045      0.5439      1.1484              1.072
     5R           0.9632      1.6508      2.6140              1.617
A compatible measure that reflects the accuracy of a procedure is the
pooled vector RMSE over stations not in the RTI network. Values of
these quantities were shown for the entire non-network in Table 28 and
for the inner non-network in Table 29.
Let α, where 0 ≤ α ≤ 1, be used as a weighting factor to reflect
the importance of accuracy relative to precision; define
$f_\alpha(j) = \alpha [r(j)]^2 + (1-\alpha) [s(j)]^2$   (24)
and
$g_\alpha(j) = \alpha [r^*(j)]^2 + (1-\alpha) [s(j)]^2$,   (25)
where r(j) and r*(j) represent, respectively, the pooled vector RMSE's
over all non-network stations and over inner-non-network stations. Note
that α=0 corresponds to assuming that precision of a particular
procedure is of paramount importance and that accuracy can be completely
ignored. Choosing α=1, on the other hand, would completely ignore how
well the particular procedure actually fit the data which were used to
produce the estimates (i.e., the RTI network data). Regarding
estimation and prediction errors to be of equal importance (i.e., α=0.5)
would result in the selection of procedure 2 or 4 as the "best"
procedure, depending upon whether (24) or (25) is used as the criterion.
As indicated by Figure 3, however, for this choice of α, there is little
difference among the four stepwise procedures. Figure 4, which shows
values of g_α(j) versus α, indicates little preference among the
stepwise procedures when α≈0.75. Although the choice of a particular α
value is arbitrary, values in the range 0.5 to 0.8 would appear to be most
reasonable; this corresponds to assuming that prediction errors are at
least as important as estimation errors and may be up to 4 times more
important. It should be noted that values of f_α(2) and f_α(3) are close
for all values of α. The same holds true for g_α(2) and g_α(3). For
α > 0.5, values of f_α(2) and g_α(2) are also close to values of f_α(1) and
g_α(1), respectively.
Figures 3 and 4 suggest that procedure 4, because of its tendency
to produce inaccurate results, is the least preferable of the stepwise
procedures. Among the first three procedures, there is no clear
preference: larger values of α tend to support procedure 1 whereas smaller
values tend to support procedures 2 or 3. Over the range 0.5 to 0.8,
procedure 2 might be selected because when it is not "best", its f_α and
g_α values are never "much larger" than the corresponding values for the
procedure with the smallest f_α and g_α values. The same can be said for
procedure 1, however, for α values greater than 0.6 or 0.7. Also, the
consistency of procedure 1 over the non-network stations would tend to
support its use.
[Figures 3 and 4: line plots with vertical axis in (mps)², from 1.0 to 2.6,
and horizontal axis α, from 0 to 1.00.]
Figure 3. Plot of f_α(j) versus α, for five modeling procedures (j)
Figure 4. Plot of g_α(j) versus α, for five modeling procedures (j)
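The combined measures of equations (24) and (25) can be recomputed from the tabulated quantities. The following is a minimal sketch, not the authors' program: the names `combined_measure` and `rank_procedures` are illustrative, the residual variances are those listed above for procedures 1R through 5R, and the pooled vector RMSE's are the (U,V) pooled values of Tables 28 and 29.

```python
# Pooled residual variances (mps^2) from the RTI network estimations.
VAR_U = {1: 0.7417, 2: 0.6741, 3: 0.6909, 4: 0.6045, 5: 0.9632}
VAR_V = {1: 0.7577, 2: 0.6449, 3: 0.6736, 4: 0.5439, 5: 1.6508}
# Pooled vector RMSE's (mps): r(j) over all non-network stations (Table 28)
# and r*(j) over inner-non-network stations (Table 29).
R_ALL = {1: 1.387, 2: 1.406, 3: 1.437, 4: 1.542, 5: 1.433}
R_INNER = {1: 1.198, 2: 1.226, 3: 1.194, 4: 1.250, 5: 1.177}


def combined_measure(alpha, r, s_squared):
    """Weighted sum of squared prediction error r and squared estimation error s."""
    return alpha * r ** 2 + (1.0 - alpha) * s_squared


def rank_procedures(alpha, rmse_by_procedure):
    """Return (procedure, score) pairs sorted from smallest to largest score."""
    scores = {
        j: combined_measure(alpha, r, VAR_U[j] + VAR_V[j])
        for j, r in rmse_by_procedure.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


for alpha in (0.5, 0.75):
    best_f = rank_procedures(alpha, R_ALL)[0]
    best_g = rank_procedures(alpha, R_INNER)[0]
    print(alpha, "f:", best_f, "g:", best_g)
```

With α=0.5 this sketch selects procedure 2 under criterion (24) and procedure 4 under criterion (25), in agreement with the text; other values of α can be explored by changing the loop.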
Tables 30 and 31 show frequency distributions of the deviations
between observed and predicted values for procedures 1 and 2 (and, for
comparative purposes, for procedure 5). Table 30 shows these distribu-
tions for each non-network station and for the non-network as a whole.
Table 31 shows the distributions over the RTI network, the full network,
and the outer and inner non-networks. In both tables, all available
observations are considered. Again, no strong preference for procedure
1 over procedure 2, or vice versa, is discernible.
SELECTED CASES AND CONDITIONS
Results of the previous subsections have presented various evalua-
tive measures which, for the most part, have represented averages over
a large number of cases. Such summaries, while quite essential for
reducing the volume of data, can also be misleading in some situations.
For instance, the importance of a difference of 0.1 mps in the average
RMSE's of two procedures averaged over all cases may be difficult to
judge. Such a difference could be caused by a few extreme cases or
could be the result of small differences in a large number of cases.
Although the various frequency distributions shown in previous sections
provide some insight regarding the relative performance of the modeling
procedures on a more case-specific basis, an overall "feel" for how the
procedures might perform in a specific situation may be lacking.
Consequently, this subsection provides some additional detail and
illustrative examples that should prove useful. Two major types of
results are presented:
-------
TABLE 30. PERCENTAGE FREQUENCY DISTRIBUTIONS OF DEVIATIONS BETWEEN OBSERVED AND PREDICTED WIND COMPONENTS
AT NON-RTI NETWORK STATIONS; ESTIMATIONS BASED ON DATA FROM RTI NETWORK STATIONS
Modeling
Procedure
1
2
5
Deviation Between Observed and Predicted U-Components (midpoint
Station
STL003
STL004
STL007
EPA103
EPA107
EPA111
EPA112
Total
STL003
STL004
STL007
EPA103
EPA107
EPA111
EPA112
Total
STL003
STL004
STL007
EPA103
EPA107
EPA111
EPA112
Total
N
892
892
350
833
844
596
861
5268
892
892
350
833
844
596
861
5268
892
892
350
833
844
596
861
5268
<-4 -4
__
0.22
0.29
0.17
0.08
_ _
0.45
0.86 0.57
0.17
0.06 0.13
0.22
0.57
0.34
0.11
-3
1.23
2.86
0.12
0.17
0.12
0.46
__
1.46
3.14
0.12
0.17
0.12
0.51
__
0.90
2.57
0.48
0.12
0.42
-2
0.56
9.53
12.57
1.56
0.24
3.69
1.05
3.41
0.67
10.99
12.86
1.56
0.47
4.19
1.28
3.83
0.22
7.40
4.86
2.16
1.30
4.19
0.58
2.73
-1
6.28
33.07
41.43
13.21
13.74
45.97
28.11
23.50
6.17
32.40
39.14
14.05
15.40
45.64
32.40
24.28
6.61
32.29
35.14
12.48
12.91
46.14
23.69
22.06
0
37.67
41.03
36.86
48.14
74.05
46.31
61.32
50.51
38.79
40.47
35.43
48.02
74.17
46.48
56.68
49.77
37.00
43.83
45.71
47.42
70.97
46.81
65.16
51.54
1
48.32
12.33
5.14
26.65
11.73
3.69
8.71
18.55
46.86
11.77
6.86
26.65
9.95
3.36
8.71
18.00
47.09
12.89
10.00
26.17
14.34
2.52
9.87
19.15
2
7.06
2.24
0.86
7.92
0.24
0.58
3.02
7.40
2.13
1.14
8.40
0.70
3.13
8.86
1.91
1.14
7.92
0.47
0.46
3.03
of interval, in mps)
3
0.11
0.34
2.16
0.42
0.11
0.34
1.08
0.25
0.22
0.56
3.24
0.65
4
__
0.24
0.12
0.06
0.12
0.12
0.04
__
0.12
0.12
0.04
>4
__
-------
TABLE 30 (continued)
Modeling
Procedure
1
2
5
Deviation Between Observed and Predicted V-Components (midpoint
Station
STL003
STL004
STL007
EPA103
EPA107
EPA111
EPA112
Total
STL003
STL004
STL007
EPA103
EPA107
EPA111
EPA112
Total
STL003
STL004
STL007
EPA103
EPA107
EPA111
EPA112
N <-4
892
892
350
833
844
596
861
5268
892
892
350
833
844
596
861
5268
892
892
350 0.29
833
844
596
861
-4
0.22
0.04
0.11
0.02
0.56
0.29
-3
1.91
0.29
0.12
0.12
0.38
1.35
0.11
0.86
0.12
0.32
1.46
2.29
0.24
0.12
-2
13.90
3.36
3.71
0.48
1.90
0.34
0.12
3.61
14.69
3.59
3.14
0.72
1.54
0.50
0.23
3.76
15.25
5.38
9.14
0.84
2.73
0.34
2.21
-1
43.72
15.70
13.71
4.44
24.17
17.11
13.47
19.68
45.63
12.89
11.71
5.64
25.71
14.77
11.96
19.32
45.07
20.40
22.57
3.12
31.28
19.46
29.50
0
31.61
33.52
33.71
15.13
62.32
46.98
59.23
40.64
30.04
32.51
32.57
16.09
61.37
48.49
52.73
39.24
29.15
37.67
40.57
11.16
56.28
47.48
56.68
1
7.29
30.94
35.71
29.89
11.26
32.05
24.51
23.01
7.17
32.74
32.86
34.93
11.14
31.54
31.36
24.94
7.06
27.13
22.57
22.45
9.60
28.86
11.03
2
1.35
12.33
10.86
36.25
0.24
3.02
2.56
9.57
1.01
13.12
15.71
32.41
0.12
4.19
3.48
9.62
1.46
7.74
2.29
37.45
3.52
0.58
of interval, in mps)
3
3.81
1.43
11.52
0.50
0.12
2.64
4.37
1.71
8.40
0.12
0.50
0.23
2.30
1.68
21.13
0.34
4
0.34
0.57
2.16
0.44
-
0.67
0.86
1.68
0.44
3.48
>4
0.57
0.04
0.12
Total
5268 0.02 0.11 0.46 5.07 25.11 39.43 17.44 8.12 3.66 0.55 0.02
-------
TABLE 30 (continued)
Modeling
Procedure
1
2
5
Deviation Between Observed and Predicted
Station
STL003
STL004
STL007
EPA103
EPA107
EPA111
EPA112
Total
STL003
STL004
STL007
EPA103
EPA107
EPA111
EPA112
Total
STL003
STL004
STL007
EPA103
EPA107
EPA111
EPA112
N
892
892
350
833
844
596
861
5268
892
892
350
833
844
596
861
5268
892
892
350
833
844
596
861
<-4 -4
0.22
0.29
__
0.06
0.11
0.57
0.04 0.02
0.45
0.57 1.14
-3
1.23
1.14
0.12
0.30
1.01
2.57
0.12
0.36
0.90
3.43
0.12
-2
5.94
1.23
5.43
3.08
0.50
0.23
2.16
6.39
0.78
3.43
0.12
2.13
0.84
0.23
1.94
6.61
2.80
9.71
0.12
4.98
1.01
2.09
-1
30.83
11.21
12.86
1.32
25.36
17.95
11.61
16.17
32.74
9.08
12.57
2.40
27.73
14.77
11.03
16.21
29.82
17.38
13.14
0.96
31.87
19.13
29.62
Wind Speeds (midpoint of interval,
0
42.94
36.32
47.14
8.64
58.41
53.02
60.63
43.19
42.38
34.75
42.86
11.04
59.36
55.03
54.47
42.29
42.60
40.25
49.43
6.12
50.47
54.70
52.96
1
16.70
35.99
27.14
31.33
12.91
26.68
25.32
24.91
15.47
40.70
30.29
35.65
10.55
26.34
31.82
27.03
16.82
30.04
21.43
21.01
12.44
23.49
14.05
2
2.02
13.57
5.71
40.46
0.12
1.68
1.74
9.91
1.79
12.67
6.86
38.54
0.12
2.85
1.97
9.66
2.58
7.96
1.14
40.94
0.12
1.68
1.05
3
0.11
1.46
0.29
15.49
0.17
0.23
2.79
0.11
1.79
0.86
10.32
0.17
0.23
2.07
0.22
1.46
25.81
0.12
4
0.22
2.52
0.12
0.46
_
0.22
1.68
0.12
0.32
_ .
0.11
4.68
0.12
in mps)
>4
0.24
0.12
0.06
0.24
0.12
0.06
0.36
Total
5268 0.04 0.15 0.40 3.51 21.13 41.21 19.63 8.71 4.38 0.78 0.06
-------
TABLE 31. PERCENTAGE FREQUENCY DISTRIBUTION OF DEVIATIONS BETWEEN OBSERVED AND PREDICTED VALUES BASED ON
ESTIMATIONS FROM RTI NETWORK DATA
Modeling Subset of
Variable Procedure Stations
U-Comp. 1 Outer-non
Inner-non
RTI
Full
2 Outer-non
Inner-non
RTI
Full
5 Outer-non
Inner-non
RTI
Full
No. Deviation Between Observed and Predicted Values (midpoint of interval,
Obs.
1183
4085
14690
19958
1183
4085
14690
19958
1183
4085
14690
19958
<-4
0.01
0.01
0.25
0.01
0.03
0.02
0.02
-4
0.08
0.07
0.03
0.04
0.17
0.12
0.02
0.05
0.17
0.10
0.05
0.07
-3
0.93
0.32
0.41
0.42
1.01
0.37
0.34
0.39
1.10
0.22
0.62
0.57
-2
4.82
3.01
3.08
3.17
4.90
3.53
2.58
2.91
2.96
2.67
4.73
4.20
-1
21.56
24.06
19.61
20.64
21.47
25.09
17.41
19.22
19.19
22.89
22.10
22.09
0
44.80
52.17
54.27
53.28
44.29
51.36
59.24
56.74
46.91
52.88
46.39
47.75
1
20.29
18.04
18.97
18.86
20.79
17.18
17.71
17.78
21.39
18.51
20.37
20.05
2
5.83
2.20
3.18
3.14
6.26
1.81
2.40
2.60
5.92
2.55
4.59
4.25
3
1.52
0.10
0.37
0.38
0.76
0.10
0.22
0.23
2.28
0.17
0.87
0.81
4
0.17
0.02
0.06
0.06
0.08
0.02
0.06
0.06
0.08
0.02
0.25
0.19
in mps)
>4
0.01
0.01
0.01
0.01
0.01
0.01
V-Comp. 1 Outer-non
Inner-non
RTI
Full
2 Outer-non
Inner-non
RTI
Full
5 Outer-non
Inner-non
RTI
1183
4085
14690
19958
1183
4085
14690
19958
1183
4085
14690
0.08
0.21
0.05
0.01
0.02
0.02
0.01
0.08
0.12
0.45
0.17
0.44
0.21
0.26
0.34
0.32
0.12
0.17
0.85
0.34
1.71
1.44
4.24
2.97
3.14
1.44
4.43
1.91
2.40
3.30
5.58
7.39
7.19
23.30
21.25
20.84
7.44
22.77
18.49
18.71
8.88
29.82
23.08
20.63
46.44
51.42
48.57
20.96
44.53
59.24
53.96
19.86
45.09
35.22
31.61
20.51
21.05
21.57
34.32
22.23
18.05
19.87
22.49
15.99
22.04
28.74
4.01
2.80
4.59
27.47
4.46
1.99
4.01
27.05
2.64
6.89
8.54
0.93
0.25
0.88
6.42
1.10
0.19
0.75
14.88
0.42
2.33
1.69
0.07
0.03
0.14
1.44
0.15
0.02
0.13
2.45
0.57
0.17
0.01
0.08
0.12
Full
19958 0.16 0.36 1.38 6.77 23.61 36.33 20.82 7.22 2.69 0.57 0.09
-------
1. Summaries over various subsets of cases, and
2. Detailed results for several individual cases.
The results shown are limited to modeling procedures 1 and 2, since the
combined measures of the previous subsection indicate that these two
procedures are certainly competitive with the two procedures that uti-
lize the larger class of model terms.
The prevailing wind speeds and directions are utilized to group the
908 cases into subsets over which the various evaluation measures are
computed. Four prevailing wind speed categories and four prevailing
wind direction categories are used; these are shown in Table 32 below,
along with their relevant sample sizes (number of cases and observations).
TABLE 32. SAMPLE SIZES, BY PREVAILING WIND SPEED AND DIRECTION CATEGORIES
Prevailing Wind     No. of             Number of Observations
Condition           Cases    RTI Network   Non-Network   Inner-Non-Network
Speed (mps):
  <2                 237        3813          1377             965
  2-4                359        5733          2057            1629
  4-6                266        4377          1572            1274
  >6                  46         767           262             217
Direction:
  E,SE                92        1565           560             436
  S                  487        7787          2829            2208
  SW                 225        3599          1268             973
  Other              104        1739           611             468
Total                908       14690          5268            4085
Table 33 presents values of three evaluation measures that charac-
terize the magnitude of the estimation errors. These are shown for both
procedures 1 and 2 applied to, and evaluated over, the stations in the
RTI network. Parts A and B of this table indicate that the estimation
errors tend to be larger for the higher wind speed cases. This is true,
-------
TABLE 33. SUMMARY OF ESTIMATION ERRORS BY PREVAILING WIND SPEED
AND DIRECTION CATEGORIES
A. Pooled Residual Standard Deviations (mps)
Wind     Modeling     Prevailing Wind Speed (mps)      Prevailing Wind Direction
Comp.    Procedure    <2      2-4     4-6     >6       E,SE    S       SW      Other
U           1R        0.612   0.806   1.022   1.221    0.868   0.862   0.884   0.800
            2R        0.580   0.766   0.973   1.170    0.813   0.839   0.828   0.710
V           1R        0.707   0.832   0.978   1.203    0.784   0.854   0.893   0.969
            2R        0.663   0.764   0.898   1.115    0.697   0.782   0.827   0.923
(U,V)       1R        0.935   1.158   1.415   1.714    1.170   1.213   1.256   1.256
            2R        0.881   1.082   1.324   1.617    1.071   1.147   1.171   1.165
B. Percentage of Cases With Residual Standard Deviations Less Than 1.0 mps
Wind     Modeling     Prevailing Wind Speed (mps)      Prevailing Wind Direction
Comp.    Procedure    <2      2-4     4-6     >6       E,SE    S       SW      Other
U           1R        95.8    84.7    56.0    34.8     75.0    76.8    72.0    87.5
            2R        97.0    88.9    62.4    47.8     84.8    79.5    77.8    93.3
V           1R        92.4    84.4    63.9    34.8     90.2    79.5    75.1    66.3
            2R        94.5    90.5    74.4    41.3     93.5    87.3    81.8    68.3
C. Pooled Adjusted R² Statistics
Wind     Modeling     Prevailing Wind Speed (mps)      Prevailing Wind Direction
Comp.    Procedure    <2      2-4     4-6     >6       E,SE    S       SW      Other
U           1R        0.253   0.217   0.240   0.218    0.253   0.147   0.247   0.458
            2R        0.331   0.291   0.311   0.282    0.345   0.191   0.339   0.573
V           1R        0.397   0.472   0.600   0.640    0.237   0.532   0.625   0.492
            2R        0.470   0.555   0.663   0.690    0.396   0.607   0.678   0.539
-------
even though a larger percentage of the variation is typically accounted
for in the high-speed cases, as is demonstrated in Part C of Table 33.
On an absolute scale, smaller estimation errors occur for the east/
southeast category than for the other wind direction categories; this
appears to be a reflection of the trend noted above for wind speed,
since the average wind speed over cases in this wind direction category
is less than that for the other direction categories. As might be
expected, only a small percentage of the variation is accounted for in
the less-dominant wind component; for instance, in the U-component when
a prevailing southerly wind occurs. In terms of estimation errors, the
improvement of procedure 2 over procedure 1 appears to be quite
consistent across all eight (4 speed and 4 direction) categories and both
wind components. Differences in the pooled adjusted R² statistics for
procedures 1 and 2, for example, range from about 0.05 to about 0.12.
Table 34 provides two basic measures of the prediction errors,
namely, pooled RMSE's over all non-network stations (Part A) and over
all inner-non-network stations (Part B), categorized by prevailing wind
speed and by prevailing wind direction. The pooled RMSE's for
procedure 1 are usually slightly smaller than the corresponding RMSE's for
procedure 2. As with the estimation errors, there is a definite pattern
of larger RMSE's for the higher wind speed cases as compared to the
lower speed cases; this trend appears to be more pronounced for the
inner-non-network stations (Part B). Figure 5 illustrates this trend
and permits a visual comparison of the relative magnitudes of the
estimation and prediction errors to be made for the various wind speed
categories.
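The (U,V) rows of Tables 33 and 34 appear to combine the two component errors as a root sum of squares. The sketch below checks that reading against the procedure 1R values for all non-network stations in Part A of Table 34. It is an assumption inferred from the tabulated numbers; the formal definition of the pooled vector RMSE is the one given at the end of Section 3. The function name is illustrative.

```python
import math

# (U RMSE, V RMSE, tabulated (U,V) RMSE) in mps for procedure 1R,
# all non-network stations, by prevailing wind speed category (Table 34, Part A).
PART_A_1R = {
    "<2": (0.716, 1.026, 1.251),
    "2-4": (0.834, 1.062, 1.351),
    "4-6": (0.927, 1.106, 1.443),
    ">6": (0.968, 1.643, 1.907),
}


def vector_rmse(rmse_u, rmse_v):
    """Root sum of squares of the two component RMSE's."""
    return math.hypot(rmse_u, rmse_v)


for category, (rmse_u, rmse_v, tabulated) in PART_A_1R.items():
    print(category, round(vector_rmse(rmse_u, rmse_v), 3), tabulated)
```

The recomputed values agree with the tabulated (U,V) entries to within rounding, which supports using the vector RMSE as a single summary of the two components.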
-------
TABLE 34. SUMMARY OF PREDICTION ERRORS BY PREVAILING WIND SPEED AND DIRECTION
CATEGORIES
A. Pooled Root Mean Square Errors Over All Non-Network Stations (mps)
Wind     Modeling     Prevailing Wind Speed (mps)      Prevailing Wind Direction
Comp.    Procedure    <2      2-4     4-6     >6       E,SE    S       SW      Other
U           1R        0.716   0.834   0.927   0.968    0.958   0.758   0.876   1.009
            2R        0.744   0.900   0.941   0.957    1.047   0.820   0.874   0.970
V           1R        1.026   1.062   1.106   1.643    0.923   1.098   1.192   1.077
            2R        1.009   1.057   1.119   1.622    0.948   1.078   1.198   1.106
W           1R        1.032   1.062   1.081   1.499    0.985   1.082   1.131   1.096
            2R        0.985   1.042   1.072   1.452    0.991   1.054   1.099   1.074
(U,V)       1R        1.251   1.351   1.443   1.907    1.330   1.334   1.479   1.476
            2R        1.253   1.388   1.462   1.884    1.412   1.355   1.482   1.472
B. Pooled Root Mean Square Errors Over All Inner-Non-Network Stations (mps)
Wind     Modeling     Prevailing Wind Speed (mps)      Prevailing Wind Direction
Comp.    Procedure    <2      2-4     4-6     >6       E,SE    S       SW      Other
U           1R        0.682   0.776   0.851   0.841    0.933   0.743   0.803   0.770
            2R        0.671   0.800   0.875   0.857    1.018   0.754   0.805   0.766
V           1R        0.710   0.821   0.983   1.585    0.747   0.823   1.043   1.099
            2R        0.724   0.833   1.028   1.584    0.795   0.841   1.059   1.129
W           1R        0.741   0.792   0.841   1.283    0.792   0.786   0.922   0.864
            2R        0.716   0.798   0.871   1.265    0.813   0.790   0.918   0.874
(U,V)       1R        0.985   1.130   1.300   1.795    1.195   1.109   1.316   1.343
            2R        0.987   1.156   1.350   1.801    1.292   1.130   1.330   1.364
-------
[Figure 5: line plot with vertical axis in mps, from 0.8 to 1.9, and
horizontal axis the prevailing wind speed category (<2, 2-4, 4-6, and
>6 mps). Curves shown:
Pooled Vector Residual Std. Dev., Procedure 1R
Pooled Vector Residual Std. Dev., Procedure 2R
Pooled Vector RMSE (Inner-Non-Network)*
Pooled Vector RMSE (Non-Network)*
* Shown for procedure 1R; a very similar curve occurs for procedure 2R.]
Figure 5. Pooled measures of estimation and prediction errors
versus prevailing wind speed
-------
Percentage errors in predicting wind speeds at non-network stations
are summarized in Table 35 by prevailing wind speed categories. These
percentage errors are shown for procedures 1 and 2 applied to data from
the RTI network and, for comparative purposes, for procedure 4 applied
to the full network data. The percentage errors tend to decrease with
increasing wind speed for all three of these procedures. Over the
inner-non-network, the percentage errors for procedures 1R and 2R are
about 20% greater than the corresponding percentage errors for procedure
4F. Percentage errors for procedures 1R and 2R, over all non-network
stations, are roughly 30% larger than the percentage errors for proce-
dure 4F.
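The percentage errors of Table 35 follow the formula in its heading, 100% x (RMSE(W)/W), where W in the denominator is the mean wind speed of the category. As a hedged illustration, the sketch below recomputes the procedure 1R column for all non-network stations from the pooled wind speed RMSE's of Table 34 (Part A) and the mean wind speeds of Table 35. The function name is illustrative, and pairing these two tables in this way is an assumption made for the example.

```python
# Prevailing wind speed categories for all non-network stations.
CATEGORIES = ["<2", "2-4", "4-6", ">6"]
# Pooled RMSE of wind speed W (mps), procedure 1R, Table 34 Part A.
RMSE_W_1R = [1.032, 1.062, 1.081, 1.499]
# Mean wind speed (mps) in each category, Table 35.
MEAN_W = [2.129, 3.622, 5.036, 6.896]


def percentage_error(rmse_w, mean_w):
    """Wind speed RMSE expressed as a percentage of the mean wind speed."""
    return 100.0 * rmse_w / mean_w


percent_errors = [
    round(percentage_error(rmse, mean), 1)
    for rmse, mean in zip(RMSE_W_1R, MEAN_W)
]
for category, value in zip(CATEGORIES, percent_errors):
    print(category, value)
```

The absolute RMSE grows with wind speed while the percentage error shrinks, because the mean wind speed in the denominator grows faster than the error.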
In order to further demonstrate the performance of the modeling
procedures in specific cases, three particular cases were selected for
detailed examination:
                               Prevailing Winds
Case    Date       Time      Speed       Direction
I       8/12/75    1800      2.56 mps    158°
II      2/20/76    1600      4.83 mps    136°
III     2/21/76    1700      5.59 mps    274°
It should be emphasized that these cases were picked arbitrarily. They
do not necessarily reflect "typical" cases from among the 908 cases;
Case I, for example, was purposely chosen as a worst case situation for
modeling procedure 2, in that the vector RMSE over all non-network
stations for this case was much larger than for any other case.
The prediction models determined by procedures 1 and 2 for these
cases are given in Table 36. These particular models illustrate the
typical pattern of larger, more complex models for modeling procedure 2,
as compared to procedure 1. It should also be noted that in two of the
-------
TABLE 35. PERCENTAGE ERRORS IN WIND SPEED PREDICTIONS AT NON-NETWORK
STATIONS. BY PREVAILING WIND SPEED CATEGORIES
Subset of
Stations
Prevailing
Wind Speed
Category
Mean Wind
Speed (mps)
(W)
Inner-non
Network
Stations
S mps
2-4 mps
4-6 mps
>6 mps
Overall
1.882
3.344
4.732
6.383
3.593
100% x (RMSE(W)/W)
For Modeling Procedure:
1R 2R 4F
39.
23.
17.8
20.1
23.1
19.1
All Non-
Network
Stations
22 mps
2-4 mps
4-6 mps
>6 mps
Overall
2.
3.
5.
6.
129
622
036
896
3.816
48.5
29.3
21.5
21.7
28.5
46.3
28.8
21.3
21.1
27.8
35.2
21.6
16.5
17.0
21.3
-------
TABLE 36. PREDICTION MODELS FOR THREE SPECIFIC CASES
       Modeling
Case   Procedure   Prediction Model Based on Data From RTI Network
I         1        U = -0.30969 - 0.00127536xy²
                   V = 3.23072 - 0.00027154y³ + 0.00154632x²y
          2        U = 0.95191 - 0.170683x + 0.0095480x² - 0.0454584xy + 0.00445697x²y
                       - 0.00176493xy² - 0.038375h
                   V = 3.26253 + 0.00111345x²y + 0.00103673xy² - 0.000027405y⁴
II        1        U = -3.81314 - 0.085727x
                   V = 4.76958 - 0.0118349y² - 0.00026648x³ + 0.00232374xy²
          2        U = -3.60660 - 0.195394x + 0.00045859x³
                   V = 4.76958 - 0.0118349y² - 0.00026648x³ + 0.00232374xy²
III       1        U = 5.32123 + 0.00113637xy²
                   V = 0.09533 - 0.000014250x⁴
          2        U = 5.84047 + 0.162089x + 0.0043193xy - 0.00056576x³ - 0.0174165h
                   V = 0.09533 - 0.000014250x⁴
-------
cases (Cases II and III) the same model was selected for the V-component
by both procedures.
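To show how a model from Table 36 would be used, the sketch below evaluates the Case III, procedure 1 polynomials at one point and converts the components to a speed and direction. Several things here are assumptions rather than statements from this section: the units and origin of the x and y coordinates, the convention that U is positive toward the east and V positive toward the north, and the reporting of direction as the bearing the wind blows from. The function names are illustrative.

```python
import math


def case3_procedure1(x, y):
    """Case III, procedure 1 model of Table 36; returns (U, V) in mps."""
    u = 5.32123 + 0.00113637 * x * y ** 2
    v = 0.09533 - 0.000014250 * x ** 4
    return u, v


def speed_and_direction(u, v):
    """Wind speed (mps) and direction the wind blows from (degrees from north).

    Assumes U is positive toward the east and V positive toward the north.
    """
    speed = math.hypot(u, v)
    direction = math.degrees(math.atan2(-u, -v)) % 360.0
    return speed, direction


# Evaluate at the coordinate origin (an arbitrary illustrative point).
u0, v0 = case3_procedure1(0.0, 0.0)
speed0, direction0 = speed_and_direction(u0, v0)
print(round(u0, 3), round(v0, 3), round(speed0, 2), round(direction0, 1))
```

Under these assumptions the origin value is a nearly westerly wind of a little over 5 mps, which is at least of the same character as the prevailing wind listed for Case III.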
Table 37 summarizes the fit of the models over the RTI network
stations (i.e., over the set of stations actually used for determining
the model form and parameter estimates). Modeling procedure 2 generally
accounts for more of the variation in winds among these stations (i.e.,
larger R² values). That is, the predicted surfaces from procedure 2
will typically have more hills, valleys, ridges, etc. than those from
procedure 1, and therefore, if these are "real" (e.g., as demonstrated
by comparing predicted values with observed data from the non-network
stations), it would be the preferred procedure. On the other hand,
because procedure 2 yields more complex polynomials, it is more likely
to produce spurious hills, valleys, ridges, etc. in the wind field over
those areas not in the vicinity of one or more RTI network stations; for
instance, in the outlying areas of the region. This is well illustrated
by Case I, in which the RMSE's over the inner-non-network are quite
comparable for the two procedures (see Table 38, Part B), whereas the
RMSE for procedure 2 over the entire non-network is extremely large
relative to the corresponding RMSE from procedure 1 (see Table 38,
Part A). The large deviation between observed and predicted winds at
station STL007 accounts for this discrepancy; it should be noted that wind
data were not available for any RTI network station near the STL007
site.
The observed data for Case I, as shown in Figure 6(A), indicate
that the wind flow is generally out of the south-southeast with wind
speeds across the city ranging from about 1 to 6 mps and averaging about
3.3 mps. The flow pattern suggests the influence of a heat island
-------
TABLE 37. ANALYSIS OF VARIANCE RESULTS FOR THREE SPECIFIC CASES

       No. Stations                     No. of    Sums of Squares (mps^2)
       in RTI        Modeling   Wind    Terms                                     Residual           F
Case   Network       Procedure  Comp.   in Model  Total     Regression  Residual  Variance*  R^2     Value+
I      16            1          U       2         26.5793   6.8707      19.7087   1.4078     0.258   4.881
                                V       3         18.2802   8.5297      9.7504    0.7500     0.467   5.686
                     2          U       7         26.5793   21.5190     5.0603    0.5623     0.810   6.379
                                V       4         18.2802   11.2206     7.0596    0.5883     0.614   6.358
II     18            1          U       2         47.8158   11.6919     36.1238   2.2577     0.245   5.179
                                V       4         33.4045   24.5170     8.8875    0.6348     0.734   12.873
                     2          U       3         47.8158   17.3583     30.4575   2.0305     0.363   4.274
                                V       4         33.4045   24.5170     8.8875    0.6348     0.734   12.873
III    17            1          U       2         20.2199   7.6288      12.5911   0.8394     0.377   9.088
                                V       2         19.8347   6.6933      13.1415   0.8761     0.337   7.640
                     2          U       5         20.2199   15.9330     4.2869    0.3572     0.788   11.150
                                V       2         19.8347   6.6933      13.1415   0.8761     0.337   7.640

* The residual variance is calculated by dividing the residual sum of squares by the number of residual degrees of
freedom. The residual degrees of freedom is the number of stations minus the number of model terms.
+ The F-value is calculated as the ratio of the regression mean square to the residual variance. The degrees of
freedom for the regression mean square is one less than the number of model terms.
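
The footnote definitions can be checked against any row of Table 37. The following is a minimal sketch in Python, not part of the original analysis (which used SAS); the function name anova_summary is hypothetical, and R^2 is taken here as the usual ratio of regression to total sum of squares, which is an assumption since the footnotes do not define it.

```python
def anova_summary(n_stations, n_terms, ss_regression, ss_residual):
    """Return (residual_variance, r_squared, f_value) for one fitted model.

    Degrees of freedom follow the Table 37 footnotes; R^2 uses the
    standard definition SS_regression / SS_total (assumed, not stated).
    """
    ss_total = ss_regression + ss_residual
    df_residual = n_stations - n_terms   # stations minus model terms
    df_regression = n_terms - 1          # one less than the number of terms
    residual_variance = ss_residual / df_residual
    r_squared = ss_regression / ss_total
    f_value = (ss_regression / df_regression) / residual_variance
    return residual_variance, r_squared, f_value


# Case I, procedure 1, U-component (sums of squares from Table 37).
res_var, r2, f = anova_summary(16, 2, 6.8707, 19.7087)
print(round(res_var, 4), round(r2, 3), round(f, 3))  # 1.4078 0.258 4.881
```

Applying the same function to the other rows reproduces the tabulated residual variances and F values to within rounding.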
-------
TABLE 38

A. RMSE's Over All Non-Network Stations

Wind     Modeling              Case
Comp.    Procedure    I        II       III
U        1            1.245    1.485    1.160
         2            5.038    1.663    0.779
V        1            1.418    1.384    0.759
         2            0.896    1.384    0.759
W        1            1.247    1.487    1.140
         2            3.434    1.511    0.806
(U, V)   1            1.886    2.030    1.387
         2            5.117    2.164    1.088

B. RMSE's Over All Inner-Non-Network Stations

Wind     Modeling              Case
Comp.    Procedure    I        II       III
U        1            1.840    1.648    0.801
         2            1.891    1.910    0.606
V        1            0.574    1.085    0.816
         2            0.524    1.085    0.816
W        1            1.068    1.210    0.785
         2            0.048    1.321    0.655
(U, V)   1            1.928    1.973    1.143
         2            1.963    2.196    1.016
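
The RMSE entries of Table 38 can be illustrated with a short Python sketch. The helper names are hypothetical and the station values in the first call are made up for illustration. The combination rule in vector_rmse is an inference: the tabulated (U, V) entries agree with the root sum of squares of the U and V entries, but that definition is not stated in this section.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between paired observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def vector_rmse(rmse_u, rmse_v):
    """Combine component RMSEs as sqrt(U^2 + V^2) (assumed definition)."""
    return math.hypot(rmse_u, rmse_v)


# Hypothetical observed and predicted U-components (mps) at three stations.
print(round(rmse([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]), 3))  # 0.645

# Part A, Case I, procedure 2: U = 5.038, V = 0.896 (values from Table 38).
print(round(vector_rmse(5.038, 0.896), 3))  # 5.117
```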
-------
Figure 6. Observed and predicted winds for case I: (A) observed
data; (B) predicted winds using procedure 1; (C) predicted
winds using procedure 2
-------
circulation having strong convergence in the northwestern part of the
city. Such flow patterns associated with heat island circulation have
been observed before in the city of St. Louis (Vukovich, Dunn et al.,
1979).
The wind predictions for Case I were based on only 16 of the 19
stations in the RTI network. The three non-reporting stations were RAPS
stations 102, 119, and 120. Station 119 is located at the outer boun-
dary of the southwestern portion of the network and station 120, at the
outer boundary of the northwestern portion (see Figure 1). Large errors
in the predicted wind field might be expected in these regions due to
the absence of wind data from these areas of the network. That is, the
predictions in these areas would essentially represent extrapolation of
the polynomial models outside the range of the data; it is well known
that such extrapolation is highly error prone.
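
The growth of prediction error outside the range of the data can be illustrated with the simplest possible case. The sketch below is a simplification, not the report's two-dimensional polynomial models: it uses the textbook result for a least-squares straight line, in which the variance of a predicted value at x0 equals sigma^2 times (1/n + (x0 - xbar)^2 / Sxx). The station coordinates are hypothetical.

```python
def prediction_variance_factor(xs, x0):
    """Return Var(y_hat(x0)) / sigma^2 for a least-squares line fitted at xs."""
    n = len(xs)
    x_mean = sum(xs) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return 1.0 / n + (x0 - x_mean) ** 2 / sxx


# Hypothetical station positions (km) along one axis of a network.
stations = [-8.0, -4.0, 0.0, 4.0, 8.0]
for x0 in (0.0, 8.0, 16.0):
    # Center of the network, its outer boundary, and a point well outside it.
    print(x0, round(prediction_variance_factor(stations, x0), 2))
```

In this example the factor rises from 0.2 at the center to 0.6 at the outermost station and 1.8 at twice that distance. Higher-order polynomial terms generally make the growth steeper, which is in line with the large error at STL007 under procedure 2.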
The predicted wind fields for Case I determined by modeling procedures
1 and 2 are shown in Figures 6(B) and 6(C), respectively. Figure
6(B) shows a general flow pattern from the south-southeast with wind
speeds ranging from about 1.4 to 5.1 mps. Although some convergence in
the flow downstream of the city is evident, it is not as intense as that
appearing in the observed flow field. The predicted wind field from
procedure 2 (Figure 6(C)) also shows the general south-southeasterly
flow pattern; the predicted wind speeds range from about 1.6 to 11.7
mps. A southwesterly wind with a speed of 11.7 mps is predicted for
STL007. This station is in the northwestern zone of the region and thus
represents an area in which extrapolation occurs when no data are avail-
able from EPA120. If STL007 is excluded, the predicted wind speeds
range from 1.6 to 6.2 mps across the other stations; the predicted field
also shows the strong convergence downstream of the city that was appa-
rent in the observed data. Except for the problem of extrapolation
caused by the missing data, it would therefore appear that modeling
procedure 2 performed better than procedure 1 in this case.
The wind data for Case II (Figure 7(A)) show a general southeasterly
flow with wind speeds ranging from 1.8 to 9.4 mps. The average
wind speed was 6.0 mps. There is some indication of convergence imme-
diately downstream of the city which may be associated with the heat
island circulation; this convergence is not as significant as that found
in Case I. There is also an apparent speed convergence over the city,
probably due to the increased friction in that region.
The predicted wind fields were based on 18 of the 19 stations in
the RTI network. The missing station was RTI202, which is located in
the interior of the network domain (see Figure 1). Both of the pre-
dicted wind fields for this case, shown in Figures 7(B) and 7(C), appear
to pick up the indicated speed convergence over the central portion of
the city. Procedure 2 appears to indicate the convergence downstream of
the city somewhat better than procedure 1.
The observed data for Case III (Figure 8(A)) indicate flow from the
west with wind speeds ranging from 4 to 8 mps. The wind distribution
shows no significant distortion of the flow pattern due to the presence
of the city except for a slight decrease in wind speed over the central
portion of the city again due to the increased friction in that region.
The predicted wind fields for Case III were obtained by utilizing
17 of the RTI network stations. Data were missing from RTI stations 202
and 205, which are located in the interior of the network. The flow
pattern obtained from modeling procedure 1 (Figure 8(B)) is very similar
-------
Figure 7. Observed and predicted winds for case II: (A) observed
data; (B) predicted winds using procedure 1; (C) predicted
winds using procedure 2
-------
Figure 8. Observed and predicted winds for case III: (A) observed
data; (B) predicted winds using procedure 1; (C) predicted
winds using procedure 2
-------
to the observed data, although the lower wind speeds in the central city
(relative to the surrounding regions) are not as obvious as those of
Figure 8(A). The flow field based on procedure 2 (Figure 8(C)) is also
quite similar to the observed flow field; in this case, the lower wind
speeds over the urban region are somewhat more evident.
Based on these three cases, it appears that modeling procedure 2
may produce predicted wind fields with general characteristics more
similar to the observed wind field than the procedure 1 predictions.
The results also indicate that missing data may lead to substantially
poorer predictions in some areas within the region, particularly when
the missing data occur at the boundaries of the network. In such
cases, it will be necessary to redefine the network domain so as to
avoid the effects of extrapolations.
-------
SECTION 5
DISCUSSION OF RESULTS
CONCLUSIONS AND FINDINGS
The primary conclusion of this study is that a polynomial model
derived by stepwise regression on 13 model terms and applied to the 19-
station RTI network could produce predicted wind fields for St. Louis
comparable to those produced by similar procedures applied to a larger
class of model terms and a larger network (i.e., 23 terms and 26 stations).
The 13-term model and the 19-station network were selected in
the theoretical phase of this research program (Vukovich et al.,
1978) based on the argument that the addition of terms in the
model and/or stations in the network would not markedly improve the
analysis of the wind field. This hypothesis has now been substantiated
using observed data. The conclusion of this study is based on the
following findings:
In terms of estimation errors (precision), the results of
applying four stepwise regression procedures to wind data from the
RTI network (the "optimum" network) and from the full network
indicate that comparable results are obtained for the RTI network
and the full network, although the estimations for the full network
yield somewhat more consistent adjusted R^2 values across the various
cases.
The four stepwise regression procedures are clearly superior
to both procedure 0 (fitting the full 13-term model) and procedure
5 (fitting a flat surface). This indicates that stepwise regres-
sion techniques offer a practical method for automating the model
form determination over a large number of cases; prior screening of
the data for outliers, however, may hamper implementation of any
automated, quick-response method for model estimation.
Among the four stepwise regression procedures, the procedure
permitting the most complex model forms (i.e., procedure 4) yields
the smallest estimation errors.
Procedure 2, which utilizes a class of model forms consistent
with the overall methodology, yields residual variances that are
comparable to those of procedure 3 which utilizes a larger class of
model forms.
Procedure 1, which differs from procedure 2 only in that it
uses a stepwise regression parameter of 0.1 instead of 0.2, appears
the least favorable of the four stepwise regression procedures in
terms of estimation errors.
Pooled residual standard deviations for the individual wind
components obtained from procedure 1 are about 0.08 mps larger than
those for procedure 4 and about 0.04 mps larger than those for
procedures 2 and 3.
In terms of predictions at the non-RTI-network stations
(accuracy), procedure 4 is clearly less accurate than the other
procedures (which tended to produce simpler models than those of
procedure 4). The mean square errors over the entire set of seven
non-network stations and over all cases are somewhat better for
procedure 1 than for procedure 2 or 3; over the subset of five
non-network stations in the interior portion of the St. Louis
region, however, procedure 3 appears more favorable.
Over interior non-network stations, percentage errors for
predicting wind speeds by procedures 1 and 2 averaged 23%, when
data from the RTI network are utilized. This compares favorably
with a corresponding error of 19% for procedure 4 applied to the
full network of stations.
A subjective weighting to reflect the relative importance of
estimation errors and prediction errors was utilized to judge the
overall performance of the various estimation procedures. If the
prediction errors are considered the more important of the two
types, either of the two procedures consistent with the overall
methodology (i.e., procedures 1 and 2) or procedure 3 may be con-
sidered "best" depending upon the particular criterion chosen
(e.g., the particular weight chosen and the particular set of
stations and/or cases considered). It is clear, therefore, that
little improvement is achieved by expanding the class of models
from the 13-term set up to the 23-term set of candidate terms.
Magnitudes of the average and pooled root mean square errors
for procedures 1 and 2 across all non-network stations and all
cases are roughly 0.1 to 0.2 mps larger than the corresponding
pooled standard deviations over the RTI network.
Individual case studies, which serve to illustrate the ana-
lysis for several wind directions and wind speeds, indicated that
procedure 2 performed better than procedure 1, and that the tech-
nique yielded estimates of the wind field that closely compared to
the observed data.
Over the range of wind directions encountered, there was
little change in the estimation error due to wind direction.
Unfortunately, only three major wind directions occurred with regularity
(i.e., flow from the southeast, south, and southwest). The
variation of the errors with wind speed was also not substantial
although larger absolute errors and smaller percentage errors
tended to occur for cases with high wind speeds.
It was not possible to determine whether the 19-station RTI network
was "the" optimum network for the city of St. Louis because there were
not sufficient data or a sufficient number of auxiliary stations to test
the 19-station network against all other possible networks. Further-
more, the theoretical phase showed that the network chosen for St. Louis
is likely to be near-optimal only for the procedures used; if another
procedure had been used, it is likely that a different network would have
been selected. The reliability of the wind field analysis will also
depend on the results of the prediction of the air pollution analysis
model, since the wind field is an input parameter to that model. How-
ever, the results of this study have, in our opinion, demonstrated that
the methodology can be used to determine the locations of a reasonable
number of stations from which wind data can yield reasonable wind field
estimates over the domain of the network.
ANALYTICAL LIMITATIONS
The data available for demonstrating the wind field estimation
procedures and for evaluating the sampling network have several limita-
tions. These data had, due to economic constraints, a limited time
span (i.e., a total of 33 days). Even within this period, there were
large amounts of unreported, invalid, and unusable data. For example, only
908 cases out of a potential number of 1584 cases were usable; in terms of
individual observations, less than 50% of the potential number were
available. In addition, for 13 of the 16 RAPS stations, the winds at 10
m above ground had to be estimated from winds observed at 30 m above the
ground level. The wind fields associated with the winter field program
were atypical for that time of the year according to statistical analy-
sis of wind data obtained from the National Climatic Center for the
synoptic weather station (Lambert Field); the winter regime is generally
characterized by northwesterly winds but, during the winter data collec-
tion period, southwesterly winds predominated.
The model/network evaluations were also limited by several practi-
cal constraints. First, only seven stations not in the RTI network
furnished data for which comparisons between observed and predicted
winds could be made. Secondly, comparisons involved in the evaluation
typically had to be made in terms of absolute measures such as root mean
squared errors rather than relative measures, since the "best" model was
unknown and since only a relatively small number of potential network
designs could be judged (i.e., those which were subsets of the full
network). Furthermore, errors which arose from deficiencies in the
network could not be isolated from those that were caused by other
sources (e.g., measurement errors, model deficiencies, etc.).
The findings outlined above provide, in our opinion, an accurate
assessment of the major results of this study; they are obviously made
within the context of the limitations described above.
REMARKS
The evaluation of the RTI network was based on data obtained from
the U.S. Environmental Protection Agency's Regional Air Pollution Study,
the St. Louis City/County Air Pollution Network, and three stations set
up by the Research Triangle Institute. Overall, there were 26 stations
utilized, including the 19-station "optimum" network. Though the econo-
mic burden to obtain these data was significant, the data were not
sufficient to make a complete evaluation of the network.
In the application of this technique to other cities, an evaluation
of the network will certainly be necessary. It is not feasible for
future evaluations to face the same economic burden as the present
evaluation. Nevertheless, after establishing the optimum network, a
period should be set aside in which data are collected at the network
stations and at locations not in the network. Non-network data can be
collected by a mobile van during periods when the wind is in quasi-
steady state. Case studies should be examined in which wind speeds and
wind directions differ from case to case. The results of the case study
analyses will yield estimates of the reliability of the network. This
technique should also be applied in evaluation of the air pollution
distribution obtained from the objective variational analysis model.
The 13-term class of model forms was used in the evaluation of the
wind field in order to test and validate the methodology developed in
the theoretical phase of this research project. Now that this aspect is
complete and the results are positive, other surface fitting procedures
for estimating the wind field should be investigated. For example, one
procedure which would avoid the extrapolation problems of the polynomial
models is gravitational-weighted (inverse of distance squared) interpo-
lation. This approach uses only those data points close to the grid
point for which a wind prediction is being made; generally, this allows
extrapolation into locations a small distance outside the domain of the
network without large error.
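
As a rough illustration of the gravitational-weighted approach suggested above, the following Python sketch estimates the wind components at a grid point from nearby stations using inverse-distance-squared weights. It is a minimal sketch under stated assumptions, not an implementation evaluated in this study: the function name idw_wind, the search radius, and the station values are all hypothetical.

```python
def idw_wind(stations, x, y, radius_km=10.0):
    """Inverse-distance-squared estimate of (u, v) at grid point (x, y).

    stations: list of (x_km, y_km, u_mps, v_mps) tuples.
    Only stations within radius_km contribute (assumed search radius).
    Returns None when no station lies within the radius.
    """
    weight_sum = u_sum = v_sum = 0.0
    for sx, sy, u, v in stations:
        d2 = (sx - x) ** 2 + (sy - y) ** 2
        if d2 == 0.0:
            return u, v                 # grid point coincides with a station
        if d2 <= radius_km ** 2:
            w = 1.0 / d2                # weight is the inverse of distance squared
            weight_sum += w
            u_sum += w * u
            v_sum += w * v
    if weight_sum == 0.0:
        return None
    return u_sum / weight_sum, v_sum / weight_sum


# Two hypothetical stations 4 km apart; estimate at a point 1 km from the first.
stations = [(0.0, 0.0, 2.0, 4.0), (4.0, 0.0, 4.0, 2.0)]
print(idw_wind(stations, 1.0, 0.0))
```

Because each estimate is a weighted average of nearby observations, it cannot exceed the range of those observations, which is why this kind of scheme avoids the runaway values that a polynomial can produce outside the network.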
The wind analysis is an input parameter to the objective varia-
tional analysis model (OVAM) to be used to derive the air pollution
distribution. The evaluation of that model will take place in the next
phase and will utilize wind field predictions at selected grid points.
Figure 9 provides an illustration (Case II, Procedure 2 in the last
subsection of Section 4) of the predicted wind field as it would be used
in the OVAM. The predicted winds at each grid point in the 20-km x 20-
km area would be utilized as inputs. A grid spacing of 2 km is utilized
in this figure.
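
The gridding step can be sketched as follows in Python. This is only an illustration: the grid is assumed to be centered on the origin and to include both edges, the first-order surface stands in for whichever polynomial form the stepwise procedure selects, and the coefficients are hypothetical rather than fitted values from this study.

```python
def make_grid(size_km=20.0, spacing_km=2.0):
    """Return (x, y) points covering a size_km by size_km square area."""
    n = int(round(size_km / spacing_km)) + 1   # include both edges
    half = size_km / 2.0
    axis = [-half + i * spacing_km for i in range(n)]
    return [(x, y) for y in axis for x in axis]


def evaluate_surface(coef, x, y):
    """Evaluate c0 + c1*x + c2*y (a stand-in for the selected model form)."""
    c0, c1, c2 = coef
    return c0 + c1 * x + c2 * y


u_coef = (-3.0, 0.05, 0.10)   # hypothetical U-component coefficients
v_coef = (4.0, -0.02, 0.08)   # hypothetical V-component coefficients

grid = make_grid()
wind_field = [
    (x, y, evaluate_surface(u_coef, x, y), evaluate_surface(v_coef, x, y))
    for x, y in grid
]
print(len(wind_field))  # 121 grid points for 2-km spacing over 20 km
```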
-------
Figure 9. Distribution of predicted winds on a 2-km by 2-km grid
for case II using procedure 2
-------
REFERENCES
Barr, A.J., J.H. Goodnight, J.P. Sall, and J.T. Helwig, 1976: A User's
Guide to SAS - 76, SAS Institute, Sparks Press, Raleigh, N. C.
Draper, N.R. and H. Smith, 1966: Applied Regression Analysis, John
Wiley and Sons, New York.
Estoque, M.A. and C.M. Bhumralkar, 1969: "Flow Over a Localized Heat
Source", Monthly Weather Review, 97, 850-859.
International Mathematical and Statistical Libraries, 1975: IMSL
Library Reference Manual.
Lettau, H., 1969: "Note on Aerodynamic Roughness-Parameter Estimation
on the Basis of Roughness-Element Description", J. Appl. Meteor., 8,
822-832.
Vukovich, F.M., J.W. Dunn, and B.W. Crissman, 1976: "A Theoretical
Study of the St. Louis Heat Island: The Wind and Temperature
Distribution", J. Appl. Meteor., 15, 417-440.
Vukovich, F.M., W.D. Bach, Jr., and C.A. Clayton, 1978: "Optimum
Meteorological and Air Pollution Sampling Network Selection in Cities,
Volume I: Theory and Design for St. Louis", Environmental Monitoring
Series, EPA-600/4-78-030.
Vukovich, F.M., J.W. Dunn, and W.J. King, 1979: Observations and
Simulations of Diurnal Variation of the Urban Heat Island Circulation
and Associated Ozone Variations: A Case Study. Submitted to J. Appl.
Meteor.
-------
TECHNICAL REPORT DATA
(Please read Instructions on the reverse before completing)
1. REPORT NO.
EPA-600/4-79-069
3. RECIPIENT'S ACCESSION NO.
4. TITLE AND SUBTITLE
OPTIMUM METEOROLOGICAL AND AIR POLLUTION NETWORK
SELECTION IN CITIES: Volume II - Evaluation of Wind
Field Predictions for St. Louis
5. REPORT DATE
October 1979
6. PERFORMING ORGANIZATION CODE
7. AUTHOR(S)
Fred M. Vukovich and C. Andrew Clayton
8. PERFORMING ORGANIZATION REPORT NO.
9. PERFORMING ORGANIZATION NAME AND ADDRESS
Research Triangle Institute
P.O. Box 12194
Research Triangle Park, North Carolina 27709
10. PROGRAM ELEMENT NO.
1HE775
11. CONTRACT/GRANT NO.
68-03-2187
12. SPONSORING AGENCY NAME AND ADDRESS
U.S. Environmental Protection Agency, Las Vegas, NV
Office of Research and Development
Environmental Monitoring and Support Laboratory
Las Vegas, NV 89114
13. TYPE OF REPORT AND PERIOD COVERED
period ending Feb. 1979
14. SPONSORING AGENCY CODE
EPA/600/07
15. SUPPLEMENTARY NOTES
This report is the second in a series on this topic (see EPA-600/4-78-030).
For further information contact J.L. McElroy, Project Officer (702)736-2969, X241, Las Vegas
16. ABSTRACT
This report is the second in a series on the development of a method for
designing optimum meteorological and air pollution sampling networks and its
application for St. Louis, Missouri (see EPA-600/4-78-030). It involves the
evaluation of the wind field network and utilizes wind data collected during
special summer and winter field programs.
The evaluation considers the precision and accuracy of the procedure used for
estimating the wind field. The basic procedure for determining the wind field
involves applying stepwise regression to a class of linear statistical models
involving subsets of 13 specific terms and data from a 19-station network
determined during the theoretical phase of the study. The evaluation includes
the selection of a larger class of model forms and a basic set of 23 terms to
compare with the 13-term class and includes estimations based on data from all
reporting stations, up to a total of 26 stations.
The results demonstrate that application of 13-term modeling procedures to wind
data from the 19-station network can produce predicted wind fields comparable to
those produced by similar but more general procedures applied to a larger
(26-station) network and that the method can objectively provide a reasonable
estimate of the wind field over the domain of the network. An exhaustive
evaluation was not feasible due largely to numerous analytical and data
limitations.
17. KEY WORDS AND DOCUMENT ANALYSIS
a. DESCRIPTORS
mathematical models
wind field
air pollution
meteorology
sampling network
b. IDENTIFIERS/OPEN ENDED TERMS
St. Louis, Missouri
c. COSATI Field/Group
43F
55C
68A
72E
18. DISTRIBUTION STATEMENT
RELEASE TO THE PUBLIC
19. SECURITY CLASS (This Report)
UNCLASSIFIED
20. SECURITY CLASS (This page)
UNCLASSIFIED
21. NO. OF PAGES
114
22. PRICE
EPA Form 2220-1 (9-73)
U.S. GOVERNMENT PRINTING OFFICE 683-091/2209
-------