EPA/600/R-01/071
September, 2001
Probabilistic Aquatic Exposure
Assessment for Pesticides
I: Foundations
By
Lawrence A. Burns, Ph.D.
Ecologist, Ecosystems Research Division
U.S. Environmental Protection Agency
960 College Station Road
Athens, Georgia 30605-2700
National Exposure Research Laboratory
Office of Research and Development
U.S. Environmental Protection Agency
Research Triangle Park, NC 27711
-------
Notice
The U.S. Environmental Protection Agency through its Office of Research and Development funded and
managed the research described here under GPRA Goal 4, Preventing Pollution and Reducing Risk in
Communities, Homes, Workplaces and Ecosystems, Objective 4.3, Safe Handling and Use of Commercial
Chemicals and Microorganisms, Subobjective 4.3.4, Human Health and Ecosystems, Task 6519, Advanced
Pesticide Risk Assessment Technology. It has been subjected to the Agency's peer and administrative review
and approved for publication as an EPA document. Mention of trade names or commercial products does not
constitute endorsement or recommendation for use.
Abstract
Models that capture underlying mechanisms and processes are necessary for reliable extrapolation of
laboratory chemical data to field conditions. For validation, these models require a major revision of the
conventional model testing paradigm to better recognize the conflict between model user's and model
developer's risk (as Type I and Type II errors) in statistical testing of model predictions. The predictive
reliability of the models must be hypothesized and tested by methods that lead to conclusions of the form "the
model predictions are within a factor-of-two of reality at least 95% of the time." Once predictive reliability is
established, it can be treated as a "method error" within a probabilistic risk assessment framework. This report,
developed under APM 131 ("Develop a Probability-Based Methodology for Conducting Regional Aquatic
Ecosystem Exposure and Vulnerability Assessments for Pesticides"), describes a step-by-step process for
establishing the predictive reliability of exposure models.
Monte Carlo simulation is the preferred method for capturing variability in environmental driving forces and
uncertainty in chemical measurements. Latin Hypercube Sampling (LHS) software is under development to
promote efficient computer simulation studies and production of tabular and graphical outputs. Desirable
outputs include exposure metrics tailored to available toxicological data expressed as distribution functions
(pdf, cdf) and, if needed, empirical distribution functions suitable for use in Monte Carlo risk assessments
combining exposure and effects distributions. ORD numerical models for pesticide exposure supported under
this research program include a model of spray drift (AgDisp), a cropland pesticide persistence model (PRZM),
a surface water exposure model (EXAMS), and a model of fish bioaccumulation (BASS). A unified climatological
database for these models is being assembled by combining two National Weather Service (NWS) products: the
Solar and Meteorological Surface Observation Network (SAMSON) data for 1961-1990, and the Hourly United
States Weather Observations (HUSWO) data for 1990-1995. Together these NWS products provide coordinated
access to solar radiation, sky cover, temperature, relative humidity, station atmospheric pressure, wind direction
and speed, and precipitation. By using observational data for the models, "trace-matching" Monte Carlo
simulation studies can transmit the effects of environmental variability directly to exposure metrics, by-passing
issues of correlation (covariance) among external driving forces. Additional datasets in preparation include soils
and land-use (planted crops) data summarized for the State divisions of Major Land Resource Areas (MLRA),
derived from National Resource Inventory (NRI) studies.
This report covers a period from May 2, 2001 to September 30, 2001, and work was completed as of September 30, 2001.
-------
Preface
Predictive modeling is an important tool for assessing the environmental safety of new pesticidal active
ingredients and new uses for currently registered products, and for evaluating the implications of new findings
in their environmental and product chemistries. Climate, soil properties, limnology, and agronomic practices
influence exposure by controlling the movement of pesticides within the agricultural landscape and by
governing the speed and products of transformation reactions. These factors vary with time and with location
within the often continent-wide use patterns of agricultural chemicals. This variability, together with
measurement uncertainties in the values of chemical properties, mandates a statistical and probabilistic
approach to exposure assessment. An effective pesticide modeling technology must include validated
algorithms for transport and transformation of pesticides, extensive databases of agro-ecosystem scenarios
(crop and soil properties, meteorology, limnology, fish community ecology) and graphical user interfaces to
maximize the ease of production and interpretation of complex probabilistic analyses. Several agencies collect
data of significance for environmental safety, but these data must be assembled in usable forms, organized by
appropriate landscape units, and made accessible to simulation models if their potential is to be realized.
This report is a foundational document for predictive modeling in support of pesticide exposure studies. The
Environmental Protection Agency's approach to pesticide regulation in its Office of Pesticide Programs (OPP)
is outlined, with a brief discussion of the development of probabilistic exposure assessment within OPP's
Environmental Fate and Effects Division (EFED). The epistemological basis for predictive modeling in support
of EFED assessments is reviewed, with a brief description of Monte Carlo methods for incorporating variability
and uncertainty into process-based models. Within the Monte Carlo context, the predictive uncertainty of
numerical models must be explicitly quantified; a method of model validation and quantification of predictive
uncertainty developed for this project is reported here along with a brief review of prior validation studies of
the models. Finally, source materials for databases under development to support "trace matched" input
parameters for exposure modeling are briefly documented.
Lawrence A. Burns, Ecologist
Ecosystems Research Branch
Foreword
Environmental protection efforts are increasingly directed toward preventing adverse health and ecological
effects associated with specific chemical compounds of natural or human origin. As part of the Ecosystems
Research Division's research on the occurrence, movement, transformation, impact, and control of
environmental contaminants, the Ecosystems Assessment Branch studies complexes of environmental
processes that control the transport, transformation, degradation, fate, and impact of pollutants or other
materials in soil and water and develops models for assessing the risks associated with exposures to
chemical contaminants.
Rosemarie C. Russo, Director
Ecosystems Research Division
Athens, Georgia
-------
Contents
Notice ii
Abstract ii
Preface iii
Foreword iii
Abbreviations, Acronyms, and Symbols vi
Acknowledgments vi
Introduction to Pesticide Safety Assessment 1
Steps in Risk Assessment 1
The OPP "Quotient Method" of Assessing Ecological Risk 2
Ecotoxicological Hazard Assessment 2
Environmental Exposure Assessment 3
Ecological Risk Characterization 3
Probabilistic Assessment 4
Modeling Issues and Technique 6
Pesticide Exposure Modeling 6
Concepts of Model Building and Model Testing 6
Empirical Science and Numerical Extrapolation Models 8
Variability and Uncertainty 9
Dealing with Variability and Uncertainty: The Monte Carlo Method 10
Interpretation and Presentation of the Results of Monte Carlo Analysis 11
Validation, Verification, and Performance Measures 13
The Validation Problem 13
Why Can't Models be Proven? 13
Testing the Performance Capabilities of Models 14
How Can Prediction Uncertainty be Quantified? 14
Risk and Decision Analysis in Model Testing 15
Goals and Constraints of Performance Tests 15
Good Statistical Practice for Testing Models 17
A Methodology for Performance Testing (Validation) of Simulation Models 17
A Step-Wise Procedure for Performance Testing 21
A Substantial Example: Photolysis of DMDE in EXAMS 21
Descriptive Statistics and Predictive Uncertainty 24
Predictive Validity of Exposure Models 24
Current Model Validation Status 25
AgDisp/AgDrift 25
Pesticide Root Zone Model (PRZM) 26
-------
Exposure Analysis Modeling System (EXAMS) 27
Bioaccumulation and Aquatic System Simulator (BASS) 27
DataBase Documentation 28
Agricultural Geography 28
Physiography of LRR and MLRA 28
Meteorology: SAMSON/HUSWO 29
Soils and Land Use 29
National Resource Inventory Data Characteristics 30
Stratospheric Ozone from the TOMS 30
The Ozone Measurement 30
The Data Files 31
Problems with the Data 31
Pesticide Usage 31
Appendix: SAMSON/HUSWO Stations 35
References 40
-------
Abbreviations, Acronyms, and Symbols
α  Probability of Type I error; the probability of rejecting the null hypothesis H0 when H0 is in fact true; P{reject H0 | H0}
β  Probability of Type II error; the probability of accepting the null hypothesis H0 when in fact the alternate hypothesis Ha is true; P{accept H0 | Ha}; the probability of accepting H0 when H0 is in fact false
(1 - β)  The probability of rejecting H0 when it is in fact false; the power function of a test
100 × α  Significance level expressed as a percentage (e.g., 5%)
100(1 - α)  Confidence level expressed as a percentage (e.g., 95%)
AgDisp  Agricultural Dispersion Model
BASS  Bioaccumulation and Aquatic System Simulator
d.f.  Degrees of freedom
ECOFRAM  Ecological Committee on FIFRA Risk Assessment Methods
EFED  Environmental Fate and Effects Division of OPP
EPA  Environmental Protection Agency
EXAMS  Exposure Analysis Modeling System
FGETS  Food and Gill Exchange of Toxic Substances
FIFRA  Federal Insecticide, Fungicide, and Rodenticide Act (legislation governing pesticide registration in the United States)
μ  The true mean of a variable
ν  Degrees of freedom
NRC  National Research Council, National Academy of Sciences
OPP  Office of Pesticide Programs, U.S. Environmental Protection Agency
ORD  Office of Research and Development, U.S. Environmental Protection Agency
PRZM  Pesticide Root Zone Model
σ²  The variance of a variable
S², s²  Sample estimate of variance
S, s, S.D.  Sample standard deviation
S.E.  Standard error of the mean
x̄  Sample estimate of the mean
Acknowledgments
Development of the model validation technique described in this report benefitted from reviews by R.A.
Ambrose, M.C. Barber, L.A. Suarez, and J. Babendreier. The descriptions of the current validation status of
the AgDisp and BASS models were supplied by S.L. Bird and M.C. Barber respectively. L.A. Suarez prepared
Figure 1 and Figure 2; L. Prieto prepared Figure 12 and Figure 13. Discussions with the OPP/EFED
implementation team for probabilistic risk assessment (Kathryn Gallagher, James Lin, Leslie Touart, Ron
Parker, et al.) were helpful in formulating the objectives of the work herein reported. Figure 1, Figure 2 and
Figure 3 are from Mathematical Modeling of Biological Systems by Harvey J. Gold, copyright © 1977 by John
Wiley & Sons, Inc. This material is used by permission of John Wiley & Sons, Inc.
-------
Introduction to Pesticide Safety Assessment
Exposure assessment modeling is an important component of
U.S. Environmental Protection Agency (EPA) ecological risk
assessments for pesticides. Estimates of uncertainty in model
results, although long recognized as desirable by the Office of
Pesticide Programs (OPP) [1], have primarily been
accommodated in the regulatory process by imposing safety
factors and various conservative assumptions on modeling
scenarios and exposure metrics. The Agency uses a "tiered"
approach to risk assessment as a means of focusing attention on
the most problematic pesticides and use patterns. This approach
is one of screening out, via the use of conservative assumptions,
pesticides posing minimal risk to non-target biota. Those
materials failing to pass simple screening tests are remanded to
higher tier, more complex risk analyses. Thus, the most
conservative assumptions are not used to restrict usage or "ban"
chemicals, but only to segregate materials according to the
intensity of scrutiny they are to receive during the analysis phase
of a risk assessment.
Recommendations and analyses of the National Research
Council (NRC) have been formative in the development of OPP
regulatory analyses and procedures. NRC's analysis of risk
assessment in regulatory agencies [2] was adopted early by EPA
as its standard approach to chemical safety issues. The NRC first
separated risk assessment from risk management, in order to
encourage clarification of the boundaries between the technical
and scientific components of regulatory activities, as against the
social, economic, and political pressures that constrain
regulatory decision-making.
Risk assessment is the characterization of the potential adverse
effects of exposure to environmental hazards. Risk assessments
include several elements: description of the potential effects on
organisms based on toxicological, epidemiological, and
ecological research; extrapolation from those results to predict
the type and estimate the severity and extent of effects in natural
populations under given conditions of exposure; estimation of
the species and locations of organisms exposed at various
intensities and durations; and summary judgments on the
existence and magnitude of ecological problems. The process
includes both quantitative risk assessment, with its reliance on
numerical results, and qualitative expressions or judgments made
during model parameterization and in the evaluation of the
uncertainties inherent in generalized (e.g., regional) exposure
analyses and laboratory-to-field extrapolation of toxicological
studies.
Risk management is the process of evaluating alternative
regulatory actions and selecting among them. Risk management,
which is carried out by EPA under its various regulatory
mandates, is the Agency decision-making process that entails
consideration of political, social, economic, and engineering
information to develop, analyze, and compare possible
regulatory responses to a potential ecotoxicological hazard. The
selectionprocess necessarily requires the use of value judgments
on such issues as the acceptability of risk and the reasonableness
of the costs of controls.
Steps in Risk Assessment
The NRC report divided risk assessment into four major steps:
hazard identification, dose-response assessment, exposure
assessment, and risk characterization. A risk assessment may
stop with the first step, hazard identification, if no adverse
effects are found, or if the Agency elects to take regulatory
action without further analysis for reasons of policy or statutory
mandate.
Hazard identification is the process of determining whether
exposure to an agent can cause an increase in the incidence of a
biological effect (mortality, reproductive impairment, etc.). It
involves characterizing the nature and strength of the evidence
of causation. In the case of pesticides, biological effects must be
evaluated in the context both of efficacy (biocidal impact on the
target organism), and the incidental endangerment of non-target
organisms, both on-site and off-site following transport of the
pesticide out of the intended target area. In the case of industrial
chemicals and waste disposal operations, biological effects are
universally undesirable. There are, however, few cases in which
direct biological data are available. The question, therefore, is
often restated in terms of the effects on laboratory animals or
other test systems (microcosms, mesocosms), e.g., "Does the
agent produce mortality in test animals?" Positive answers to
such questions are typically taken as evidence that an agent may
pose a risk to exposed natural systems. Information from short-
term in vitro (e.g., bacteriological bioassay) tests and inferences
from structural similarities to known chemical hazards may also
be considered.
-------
Dose-response assessment is the process of (1) characterizing
the relation between the dose of an agent administered or
received and the incidence of an adverse effect in an
experimentally exposed population, and (2) of estimating the
incidence of the effect in natural populations and ecosystems as
a function of exposure to the agent. It takes account of intensity
and duration of exposure, age patterns of exposure, and possibly
other variables that might affect response such as sex, seasonal
variation in condition, and other modifying factors. A dose-
response assessment usually requires extrapolation from high to
low dose and extrapolation from test species to potential target
species. A dose-response assessment should describe andjustify
the methods of extrapolation used to predict incidence, and
should characterize the statistical and biological uncertainties in
these methods.
Exposure assessment, of special interest in the present context,
is the process of measuring or estimating the intensity,
frequency, and duration of contact of the biota with an agent
currently present in the environment, or of estimating
hypothetical exposures that might arise from the release of new
chemicals into the environment. In its most complete form, it
describes the magnitude, duration, schedule, and route of
exposure; the size, condition, and species of the biological
entities exposed; and the uncertainties in all estimates. Exposure
assessment is often used to identify feasible prospective
regulatory control options and to predict the effects of available
control technologies on exposure. Quantitative exposure
prediction is often required in EPA regulatory analyses. Goals
may include an estimate of the effects of new chemicals prior to
manufacture or from increased production volumes of existing
chemicals, an evaluation of off-site impacts of pesticides during
the initial registration process or prior to expansion of use areas,
or establishment of remediation priorities among hazardous
waste sites.
Risk characterization is the process of estimating the incidence
of an effect under the various conditions of contamination of
natural systems and biological contact described in the exposure
assessment. It is performed by combining the exposure and dose-
response assessments. The ultimate effects of the uncertainties
in the preceding steps are described in this step. Within OPP, at
lower tiers this step is usually conducted using a quotient
method, i.e., a direct comparison of exposure and dose-response
results using a ratio of the two to infer the magnitude of risk.
The OPP "Quotient Method" of Assessing
Ecological Risk1
The EPA's Office of Pesticide Programs (OPP) adapted the NRC
1983 [2] paradigm to better fit its legislated mandates, goals,
organizational structures and procedures. The goal of OPP
ecological risk assessments is to provide the scientific basis
needed to support and inform Agency risk management
decisions and regulatory actions. These can range from
registration of a pesticide, to placing restrictions on the permitted
usages of a chemical, to requiring detailed laboratory or field
testing of a chemical to better evaluate impacts, to outright
prohibition of a chemical. Combining hazard identification and
dose-response assessment into a single domain, OPP parsed
chemical safety as a matter of ecotoxicological hazard
assessment, environmental exposure assessment, and ecological
risk characterization.
Ecotoxicological Hazard Assessment, in OPP practice,
combines hazard identification and dose-response assessment
into five steps leading to the ultimate goal of the assessment, a
toxicological level of concern: the concentrations that, if equaled
or exceeded in the environment, could reasonably be expected
to produce adverse effects in the biota. This step-wise
methodology encompasses five specific procedures:
1. Define the endpoints of concern. The term endpoint is used to
denote the specific effects of pesticides that merit evaluation
during risk assessment. Organisms exposed to a chemical may
die, may fail to develop normally, or may fail to reproduce.
Organisms may bioconcentrate chemicals to levels in their
tissues that harm their predators2 or the detrital food chain,
irrespective of effects upon themselves. In addition, an early
identification of the species potentially at risk can guide the
assessment into the most effective avenues of investigation.
2. Select an appropriate test species for laboratory toxicological
investigation. The myriad of potentially exposed species and
ecosystems forces the use of experimental models, and the use
of inferential rules to extrapolate from the laboratory to field
exposure conditions. OPP has identified surrogate species for
several categories of birds, mammals, fishes, reptiles,
amphibians, invertebrates, and plants. The ideal test species is
sensitive to chemical insult, ecologically ubiquitous with a well-
known life history that accommodates the constraints of
toxicological investigations, is highly valued by society, and is
easily and inexpensively cultured in toxicological laboratories.
The objective is to avoid the massive uncertainties and
controversy that plague human health assessments that must
extrapolate from mouse to man: if the test organism is per se of
concern, then the laboratory toxicology is directly relevant. The
surrogates chosen by OPP - rainbow trout, mallard duck, etc.- fit
these criteria rather well. Still, in many instances toxicology
must be inferred. For example, it is not generally feasible to test
endangered species directly for toxicological impact, although
some endangered species have been cultured by the US Fish and
Wildlife Service for just such purposes (Foster L. Mayer, Jr.,
personal communication).
1 Developed from Office of Pesticides and Toxic
Substances (1990). State of the Practice Ecological Risk Assessment.
US Environmental Protection Agency, Washington, DC.
2 One obvious example is the closing of commercial
fisheries due to contamination of the fishes with PCBs.
-------
3. OPP uses a tiered testing system of four progressively more
complex and expensive tests. Testing begins with short-term
acute and sub-chronic laboratory studies; those most commonly
conducted develop basic dose-response data (the LC50 and the
LD50). The second and third tiers usually include an expanded
series of acute and chronic tests for a wider variety of organisms,
as well as fish bioaccumulation factors. The fourth tier may
include field tests and mesocosm or pond studies.
4. Validation of test data is so important to sound environmental
regulation that OPP regards it as a separate sub-discipline of
ecotoxicological hazard assessment. The National Research
Council [2] listed among the aims of dose-response assessment
the characterization of "statistical and biological uncertainties."
The OPP analysis of test data evaluates accuracy (e.g.,
appropriate test conditions, interfering collateral contaminants),
precision (adherence to prescribed methods yields repeatability
within about a factor of two), and sensitivity (the design and
conduct of a test must be adequate to detect the endpoint, i.e.,
the probability of a false negative must be evaluated). The Office
of Pesticide Programs (OPP) has developed, and made publicly
available, Standard Evaluation Procedures (SEP) for almost
every kind of data required for OPP hazard assessments.
5. Finally, from the foregoing steps, OPP analysts develop a
toxicological level of concern. This is the concentration that is
compared to environmental exposures to produce a risk quotient;
it is the concentration that reasonably can be expected to cause
adverse effects. Although ideally the measured LC50 can be used
directly, it often must be extrapolated from test subjects to
additional species in order to provide a margin of exposure or
safety margin (OPP); or to generate an Office of Pollution
Prevention and Toxics (OPPT) assessment factor.
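As a purely illustrative sketch of the arithmetic in this last step (the LC50 value and the assessment factor below are hypothetical; the factors actually applied are those prescribed in OPP and OPPT guidance):

    # Illustrative only: deriving a toxicological level of concern (TLC) from a
    # measured endpoint by applying a safety/assessment factor.
    lc50_ug_per_L = 1200.0       # hypothetical 96-h LC50 for a surrogate species
    assessment_factor = 10.0     # hypothetical laboratory-to-field extrapolation margin
    tlc_ug_per_L = lc50_ug_per_L / assessment_factor
    print(f"TLC ~ {tlc_ug_per_L:.0f} ug/L")   # 120 ug/L in this example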
Environmental Exposure Assessment develops
information of two kinds, essentially in conformance with the
NRC [2] exposition:
• The intensity, duration, and frequency of contact between
the biota and a potentially harmful agent - variously known as
Expected Environmental Concentrations (EEC), Estimated
Environmental Concentrations (EEC), Predicted Environmental
Exposure or Predicted Environmental Concentration (PEC), etc.
• A profile (size, condition, species) of organisms potentially
exposed to a chemical, and their distribution in the environment.
1. Estimating environmental concentrations requires several
procedures. For pesticides, registrants submit information
detailing target crops, application rates, frequency, timing,
method, etc. The Pesticide Root Zone Model (PRZM) [3]
calculates the transport off-site of pesticides and transformation
products from the specifics of climate, soils, and crops. A direct
interface between PRZM and EXAMS surface water models [4]
collects the PRZM edge-of-field pesticide export data and
converts them into EXAMS Mode 3 input load sequences.
OPP also constructs more direct estimates of chemical loadings
on aquatic systems: some herbicides are applied directly to water
bodies, and OPP frequently estimates the mass of pesticide
entering water bodies due to spray drift. Exposure estimates of
several kinds are performed in support of the risk assessment
tiers (see Text Box 1, adapted from [5]).
Ecological Risk Characterization is the final component of
ecological risk assessment in OPP chemical safety analyses. Risk
characterization compares the toxicological levels of concern
Preliminary exposure analysis includes simple laboratory tests and models to provide an initial fate profile for a pesticide (hydrolysis
and photolysis in soil and water, aerobic and anaerobic soil metabolism, and mobility).
Fate and transport assessment provides a comprehensive profile of the chemical (persistence, mobility, leachability, binding capacity,
transformation products (metabolites and "degradates")) and may include field dissipation studies, published literature, other
field monitoring data, ground-water studies, and modeled surface water estimates.
Estimated environmental concentrations (EEC) are derived during the exposure analysis or comprehensive fate and transport
assessment. There are four EEC estimation procedures:
Level 1: A direct-application, high-exposure model designed to estimate direct exposure to a nonflowing, shallow-water (<15 cm)
system.
Level 2: Adds simple drift or runoff exposure variables such as drainage basin size, surface area of receiving water, average depth,
pesticide solubility, surface runoff, or spray drift loss, which attenuate the Level 1 direct application model estimate.
Level 3: Computer runoff and aquatic exposure simulation models. A loading model (SWRRB-WQ, PRZM, etc.) is used to estimate
field losses of pesticide associated with surface runoff and erosion; these losses then serve as input to a partitioning model
(EXAMS) to estimate sorbed and dissolved residue concentrations. Simulations are based on either reference environment
scenarios or environmental scenarios derived from typical pesticide use circumstances.
Level 4: Stochastic modeling where EECs are expressed as exceedence probabilities for the environment, field, and cropping conditions.
Text Box 1. Generalized exposure analysis methods and procedures used in prospective ecological risk screens of pesticides [5]
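The arithmetic behind a Level 1 estimate of Text Box 1 can be sketched as follows; the application rate is hypothetical, and OPP's actual Level 1 model may differ in detail:

    # Minimal sketch (illustrative values) of a Level 1 direct-application EEC.
    application_rate_kg_per_ha = 1.0             # hypothetical application rate
    depth_m = 0.15                               # shallow, nonflowing system (<15 cm)
    area_m2 = 10_000.0                           # one treated hectare
    water_volume_L = area_m2 * depth_m * 1000.0  # cubic meters converted to liters
    mass_ug = application_rate_kg_per_ha * 1.0e9 # kilograms converted to micrograms
    eec_ug_per_L = mass_ug / water_volume_L      # about 667 ug/L for these inputs
    print(f"Level 1 EEC ~ {eec_ug_per_L:.0f} ug/L")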
-------
with the EECs developed in exposure assessment to judge
whether there is sufficient risk to warrant further investigations
or regulatory action. There are four steps in risk characterization:
First, the quotient of the EEC and the toxicological level of concern (TLC) is calculated to arrive at the quotient of the "quotient method": EEC / TLC = Quotient. Quotients > 1 imply that frank effects are likely and regulatory action is indicated. Quotients « 1 (e.g., 0.01) imply risk is slight and little or no
action is required. Quotients near 1 represent uncertainty in the
risk estimate and usually require additional data. The second step
of OPP risk characterization is to compare the quotient to
regulatory criteria. Third, OPP may augment its analyses with
confirmatory data from mesocosm and field studies, incident
report observations of mortalities, and ecological simulation
models. The final step in OPP ecological risk assessment is an
evaluation of the weight of the evidence, that is, a review of the
quotient method analyses and additional evidence available from
field tests, incident reports, and simulation models. This step
includes evaluation of the quality of all available data and the
frequency and magnitude of effects in various media, while
retaining the flexibility to include all available relevant scientific
information in the final risk assessment.
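The quotient comparison at the heart of this procedure is simple enough to sketch directly. The EEC and TLC values and the decision thresholds below are illustrative placeholders, not OPP's regulatory criteria:

    # Sketch of the "quotient method" screening comparison (illustrative values).
    def risk_quotient(eec, tlc):
        """Ratio of estimated environmental concentration to the level of concern."""
        return eec / tlc

    def screening_reading(q):
        # Thresholds below are an illustrative reading of the narrative above.
        if q >= 1.0:
            return "adverse effects likely; regulatory action indicated"
        if q <= 0.01:
            return "risk slight; little or no action required"
        return "intermediate; additional data usually required"

    q = risk_quotient(eec=35.0, tlc=120.0)   # hypothetical ug/L values
    print(round(q, 2), screening_reading(q))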
Although predating their formulation, the ecological risk
assessment methods used by OPP are consistent with EPA's [6]
risk assessment guidelines. Two pesticide studies (carbofuran
and synthetic pyrethroids) were influential during the
development of the guidelines [7]. OPP has a continuing interest
in the further development of probabilistic exposure analysis,
methods for predicting the efficacy of risk mitigation measures,
and the development of a set of standard data bases for
consistent parameterization of fate and transport models [5, 8].
Probabilistic Assessment
OPP regards the quotient method as an entrenched, useful
technology with numerous acknowledged weaknesses, including
deficiencies in evaluation of indirect effects, disregard of
incremental dose impacts, and neglect of effects at higher levels
of organization. It nonetheless continues to provide a useful
lower tier deterministic "screening method" when coupled to
appropriately conservative assumptions. The exposure portion of
these assessments has, however, been conducted for some time
with explicit accounting for environmental variability within the
limited scenarios employed. In these analyses, the PRZM/EXAMS
models are run with between 22 and 36 years of input
meteorological data, and the 90th percentiles of several exposure
metrics are captured from their probabilistic plotting position.
Although more informative than a simple dilution calculation, a
fuller exposition of the properties of the distribution, with
expansion of the analysis to include a variety of physiographic
regions, soils, and agricultural landscapes, could serve to place
the current point estimates in a fuller national, multi-use context.
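The percentile capture described above can be sketched in a few lines; the annual exposure values are hypothetical stand-ins for PRZM/EXAMS output, and the Weibull plotting position used here is only one of several conventions:

    # Sketch: 90th-percentile exposure metric from a multi-year simulation,
    # using the Weibull plotting position p = i / (n + 1).  Values are hypothetical.
    import numpy as np

    annual_metric = np.sort(np.array([1.2, 0.8, 3.5, 2.1, 0.4, 5.0,
                                      1.9, 2.7, 0.9, 4.2, 1.1, 3.0]))  # ug/L
    n = annual_metric.size
    plotting_position = np.arange(1, n + 1) / (n + 1.0)
    p90 = np.interp(0.90, plotting_position, annual_metric)
    print(f"90th-percentile exposure ~ {p90:.2f} ug/L")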
In a congressionally mandated review of EPA risk assessment
procedures [9], the National Academy of Sciences strongly
criticized EPA's approach to risk assessment of hazardous air
pollutants insofar as "it does not supplant or supplement
artificially precise single estimates of risk ('point estimates')
with ranges of values or quantitative estimates of
uncertainty...This obscures the uncertainties inherent in risk
estimation, although the uncertainties themselves do not go
away...without uncertainty analysis it can be quite difficult to
determine the conservatism of an estimate." In 1997, the EPA
established its Policy for Use of Probabilistic Analysis in Risk
Assessment [10] and published a set of "guiding principles" for
conducting such analyses using Monte Carlo methods [11]. The
policy reaffirms the place of deterministic methods in the suite
of Agency methods: "[Probabilistic] analysis should be a part of
a tiered approach to risk assessment that progresses from simpler
(e.g., deterministic) to more complex (e.g., probabilistic)
analyses as the risk management situation requires." More
importantly, the policy statement establishes a set of "conditions
for acceptance" by the Agency of probabilistic analyses. These
conditions, intended to encourage the ideals of transparency,
reproducibility, and the use of sound methods, identify factors
to be considered by Agency staff in implementing the policy (see
page 5).
In May of 1996, the OPP EFED (Environmental Fate and Effects
Division) presented two ecological risk assessment case studies
to its FIFRA Scientific Advisory Panel (SAP) for comment. The
SAP affirmed the value of the process, but urged that OPP begin
development of tools and methods for probabilistic assessments.
In response, EFED instituted an "Ecological Committee on FIFRA
Risk Assessment Methods" (ECOFRAM) composed of four
workgroups (aquatic and terrestrial exposure and effects).
ECOFRAM was composed of experts drawn from government
agencies, academia, environmental groups, industry, and other
stakeholders. Following completion of the ECOFRAM draft report,
EPA conducted an "Aquatic Peer Input Workshop" on June 22-
23, 1999. One consensus view in the reviewer comments was
that the models used for exposure assessment (PRZM/EXAMS
inter alia) need validation. These validation studies should
include field evaluation of both structural and parameter
(scenario) reliability and performance characteristics. Case study
examples of the proper use of the models in regulatory analysis
should be developed as well. The goal of standardized and
transparent scenarios for a variety of physiographic regions,
crops, and aquatic ecosystem types was encouraged, with an
ultimate aim of developing a complete database suitable for
systematic, regular use in pesticide risk assessments.
As part of the process of further developing and implementing
probabilistic risk assessment approaches, OPP/EFED executed a
case study including both deterministic and probabilistic risk
assessments. The aquatic assessment was limited to four crops
(corn, cotton, potatoes, and grapes) for a single example
chemical and product formulation [12]. In exploring the
statistical properties of outputs from the PRZM/EXAMS linked
exposure models, it was observed that no single distribution
family consistently fit the output data and the quality of fit varied
widely among scenarios. A 2-D Monte Carlo model was
-------
therefore developed using an empirical distribution function to represent exposures, in preference to a fitted theoretical distribution. This study was presented to OPP's FIFRA Scientific Advisory Panel (SAP) in March of 2001. In its critique [13], the SAP strongly endorsed the direction taken by OPP, and "the Panel concluded that field verification (for effects and chemical fate) of model predictions is very important and needs to be conducted."
1. The purpose and scope of the assessment should be clearly articulated in a "problem formulation" section that includes a full
discussion of any highly exposed or highly susceptible subpopulations [that have been] evaluated.... The questions the assessment
attempts to answer are to be discussed and the assessment endpoints are to be well defined.
2. The methods used for the analysis (including all models used, all data upon which the assessment is based, and all assumptions
that have a significant impact upon the results) are to be documented and easily located in the report. This documentation is to
include a discussion of the degree to which the data used are representative of the population under study. Also, this
documentation is to include the names of the models and software used to generate the analysis. Sufficient information is to be
provided to allow the results of the analysis to be independently reproduced.
3. The results of sensitivity analyses are to be presented and discussed in the report. Probabilistic techniques should be applied to
the compounds, pathways, and factors of importance to the assessment, as determined by sensitivity analyses or other basic
requirements of the assessment.
4. The presence or absence of moderate to strong correlations or dependencies between the input variables is to be discussed and
accounted for in the analysis, along with the effects these have on the output distribution.
5. Information for each input and output distribution is to be provided in the report. This includes tabular and graphical
representations of the distributions (e.g., probability density function and cumulative distribution function plots) that indicate the
location of any point estimates of interest (e.g., mean, median, 95th percentile). The selection of distributions is to be explained
and justified. For both the input and output distributions, variability and uncertainty are to be differentiated where possible.
6. The numerical stability of the central tendency and the higher end (i.e., tail) of the output distributions are to be presented and
discussed.
7. Calculations of exposures and risks using deterministic (e.g., point estimate) methods are to be reported if possible. Providing
these values will allow comparisons between the probabilistic analysis and past or screening level risk assessments. Further,
deterministic estimates may be used to answer scenario specific questions and to facilitate risk communication. When comparisons
are made, it is important to explain the similarities and differences in the underlying data, assumptions, and models.
8. Since fixed exposure assumptions (e.g., exposure duration, body weight) are sometimes embedded in the toxicity metrics (e.g.,
Reference Doses,...[96-hour LC50]), the exposure estimates from the probabilistic output distribution are to be aligned with the
toxicity metric.
Text Box 2. EPA Policy: "Conditions for Acceptance" of Probabilistic Risk Assessments [10]
-------
Modeling Issues and Technique
Pesticide Exposure Modeling
Models serve a variety of purposes in the scientific enterprise. In
the most general terms, a "model" is a representation of some
aspect of physical reality intended to enhance or express our
understanding of that reality. Thus, physical models may serve
as tools for studying the hydraulics of river systems,
architectural models display the visual esthetics of a proposed
structure, experimental models facilitate medical investigation of
the causes of disease, statistical models help interpret the results
of experiments, and mathematical process models express
phenomena in symbolic formalisms. The goal of encapsulating
physical phenomena in the mathematics of underlying process
has its roots in antiquity, as for example in the work of
Pythagoras (fl. 530 B.C.E.) on the integer physics of vibrating
strings - the foundation of Western music. Models based in
underlying process are almost universally held to be more
reliable guides to action than are statistical extensions of
observed trends or tendencies, an idea often expressed as
"correlation is not causation." Mathematical process models
occupy pride of place in the sciences because they canbe readily
manipulated to derive the consequences of the understanding
imbedded in their structure. Examination of these consequences
then serves as a test of the adequacy of the perceptions of reality
underlying the model. The testing process is beset, however,
with intractable epistemological problems with troublesome
ethical implications for the use of models to inform public policy
and regulatory decisions [14].
Models are notoriously easy to create and notoriously difficult
to evaluate. Because the creation of models is intrinsic to
science, the modeling enterprise has resisted attempts at
standardization on the (quite legitimate) grounds that objectives
and methods are unique to each discipline. In the context of
"regulatory models," in which mathematical constructs have
been encapsulated in computer codes to assist with regulatory
decisions and rule-makings, there are strong economic and
political reasons to cast doubt upon the reliability of these
tools - at least when the conclusions are unpalatable. Regulatory
decisions often embody an attempt to mediate among competing
interests, and so are often unpalatable to some. Hostile scrutiny
of models underlying regulatory decisions is thus the norm; the
value of standard methods for evaluating regulatory models is
apparent.
The questions routinely asked of technical staff by EPA
pesticide regulatory officials include [15]:
"What models did we use? Have they been validated? Are
they widely accepted and scientifically sound?" and
"How predictive and confident are we in using them?"
These simple questions raise deep issues of the basis and
reliability of knowledge, the social aspects of technique, and the
limits of forecasting in the face of incomplete information. Can
a qualitative epistemological question ever be answered in the
simple affirmative? Quantitative analysis can contribute to
discussion of the weight and kinds of evidence required to
support a decision, but cannot of itself create a value structure to
support regulatory judgement. All models are fundamentally
linguistic constructs, i.e., symbolic representations of an external
physical reality. Perception of phenomena, modes and character
of data collections, mathematical and logical inference, and the
nature of available decisions and decision processes are
inextricably intertwined. Because language provides the vehicle
of communication and creates much of the controversy over
model validation, clarity of language is the first task: in Oreskes'
[16] phrase, "calling a model validated does not mean it is
valid."
Concepts of Model Building and Model Testing
The terms to be understood include validation, verification, and
calibration. In brief, validation, as used in the vernacular
exhibited above, obviously pertains to suitability and reliability
for the task at hand (from its Latin root valere, to be strong3). Its
first meaning is one of having legal force, as in a valid
automobile license plate. It has also, however, seen extensive use
in scientific contexts as a term for experimental confirmation of
theory, or as a term for model testing, e.g., "a process of
obtaining assurance that a model is a correct representation of
the process or systems for which it is intended" [17].
Verification, with roots in Latin verus (truth), speaks more
immediately to the issue of "truth," although here too many of its
3 Lexicographical discussion based on Webster's Third New
International Dictionary (Unabridged), G. & C. Merriam Company,
Springfield MA (1971).
-------
senses are of the courtroom rather than of the state of our
knowledge. Verification has sometimes been restricted to issues
of the internal integrity of models [18], but has also been used in
ways synonymous with some form of external validation [e.g.,
19]. The distinction in common usage remains clear, however:
a police officer may have a need to verify that a license plate is
valid. Finally, calibration carries an idea that parameters of a
model can be adjusted to "ascertain the proper correction
factors" via comparing model outputs with observed data-rather
as though a model were a physical measuring device in need of
tuning to achieve its best performance. Natural language derives
its power from its allusive force, so the bald assertion that a
model is "valid" or has "been validated" inevitably suggests that
its ability to make accurate predictions has been established. In
actuality, such an assertion may mean almost anything, or
nothing at all, and may well be met with the counter that
"[v]erification and validation of numerical models of natural
systems is impossible" [20]. This last idea is not merely
provocative: the limits of empirical scientific knowledge as
reliable guides to action are too easily forgotten in our
technocratic era. Unfortunately, the incompleteness of scientific
knowledge has been used as Luddite artillery by industrialist and
environmentalist alike to attack inconvenient regulatory
decisions. The validity of pesticide exposure models, and what
is meant by it, is thus a matter deserving of close and detailed
attention.
The difference between interpolation and extrapolation is also
germane to what follows. Broadly, interpolation is the process
of estimating intermediate values that fall within the range of
prior observations, and extrapolation is the process of extending
estimation beyond the range of prior observations. Interpolation
often can rely on straight-forward statistical description of a
dataset. There are pitfalls even in simple descriptive models,
however.
In Figure 1 the pattern of y as f(x) is described with a straight-
line relationship, a curvilinear relationship, and two high-
precision equations (empirical models!) providing an exact fit to
all the observations (ignoring, as usual, the possibility of error in
the measurements). The straight line is the simplest model of
these data and, if Ockham's razor in its naive form ("simpler is
better") is applied, it is the best. Noticing that the residuals
exhibit some systematic structure (the midrange values tend to
fall well above the line; the end members are all below) usually
is sufficient justification for imposing a curvilinear
relationship-in the name of being more "faithful to the observed
data." William of Ockham's4 stricture has not, incidentally, been
4 b. c. 1285, Ockham, Surrey?, England; d. 1347/49, Munich,
Bavaria (now in Germany). Franciscan philosopher, theologian, and political
writer, a late scholastic thinker regarded as the founder of a form of
nominalism and a pioneer of modern scientific epistemology. Marilyn
McCord Adams, William Ockham, 2 vol. (1987), discusses in detail his
thinking on a variety of complex topics; an informative summary of his life
and thinking can be found in the Internet Encyclopedia of Philosophy
(http://www.utm.edu/research/iep)
violated, for his injunction was not "simpler is better," but rather
"plurality should not be assumed without necessity" (or more
precisely "non sunt multiplicanda entia praeter necessitatem").
Necessity is clearly a matter of judgement, so Ockham's razor is
not quite the reliable guide to virtuous model-building
sometimes alleged, and "keep it simple, stupid" is at best a
grotesque modern parody of his idea.
The pitfall of an obsessive "faithfulness to the data" is also
illustrated in Figure 1: two functions have been fitted to the
dataset, each of which passes through every observed point with
absolute fidelity. Note, however, that they disagree at every
interpolation point, illustrating a point to which we shall return:
there is a multiplicity of models (or, equivalently,
parameterizations of model structures) equally able to represent
any dataset. In this case, each of these two models can claim
high fidelity (in both accuracy and precision) as a result of
calibration. Which would be the better for prediction? Simply
using the mean and variance of the observations would be
preferable to a policy of despair in which every possible
parameter set and descriptive equation is given equal weight in
estimating the "uncertainty" of interpolated predicted values.
Thus: simple numerical testing of models is an incomplete and
error-prone (albeit necessary) guide to reliability, even when no
more than a simple interpolation among observations is all that
is desired. Extrapolation, the extending of estimation beyond the
range of observation, invites another flavor of disaster
altogether.
Figure 1. The dangers of interpolation (adapted from [21]).
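The pitfall can be reproduced numerically with a small sketch: two interpolants that each pass exactly through the same observations nonetheless disagree between the observation points. The data and the two interpolants below are illustrative, not those of Gold's figure:

    # Two exact-fit models of the same data disagree at interpolation points.
    import numpy as np
    from scipy.interpolate import CubicSpline

    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([0.1, 0.9, 1.4, 2.6, 2.5, 4.1])   # hypothetical observations

    poly = np.polynomial.Polynomial.fit(x, y, deg=len(x) - 1)  # degree-5 exact fit
    spline = CubicSpline(x, y)                                 # piecewise-cubic exact fit

    x_mid = (x[:-1] + x[1:]) / 2.0                  # points midway between observations
    print(np.round(poly(x_mid) - spline(x_mid), 3)) # nonzero: the "perfect" fits disagree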
The dangers of extrapolation [21] are illustrated in Figure 2.
The two functions illustrated describe the observations equally
well, and for interpolation purposes there is little need to choose
between them. Once beyond the range of observation, however,
their behavior is completely different. The choice, if
interpolation will not suffice, must be made on grounds wholly
external to the data itself. If, for example, the matter at hand is
one of crop yields in response to soil amendments, Liebig's
"Law of the Minimum" suggests the dotted line as the more
reliable guide. If the data represent enzyme activity as a function
of pH, the dashed line is the more plausible. The empiricist ideal
of "letting the data speak for itself' cannot answer the need.
-------
Empirical Science and Numerical Extrapolation
Models
The principal distinction here is that of a descriptive, or
correlative, as opposed to an explanatory, model [21] (Figure 3).
The function of the correlative model, often developed by
statistical analysis of experimental or observational data, is to
provide some "adequate" function describing the data, which is
then used as a predictive tool. The process is one of collecting
relevant data, passing the dataset through some statistical
machinery, and then using the resulting parameterized
mathematical function for prediction as in Figure 1. The
predictions can form a basis for additional investigations, from
which additional data can be collected to improve the calibration
of the model. The left side of Figure 3 illustrates the process.
Figure 2. The dangers of extrapolation (from [21]).
Figure 3. The model-building process [21], contrasting correlative (left) and explanatory (right) approaches. Scientific
models begin from empirical data. Conceptual models
may be formulated directly from the data, or by inference
from descriptive correlation analysis.
The right side of Figure 3 illustrates explanatory modeling, as
contrasted with correlative models, in the language of [21]. The
useful distinction here is not one of "theoretical" vs. "empirical"
models, a dichotomy that is little more than a politically charged
relict of an ancient dispute between Platonists and Aristoteleans.
In both cases, scientific models begin from a set of empirical
data collected because judged to be relevant to some
phenomenon of interest. The construction of an explanatory
model is driven less by the desire to describe the dataset than by
an ambition to discover a physical mechanism underlying the
observed behavior. This conceptual model is then captured by a
mathematical relationship, and tested by examining its ability to
describe the original data. To the extent it succeeds,
understanding of the phenomenon is increased, and similar
events are amenable to interpretation in terms of similar physical
causes. To the extent it fails, revision of the mathematics to
achieve a better fit is not an acceptable strategy: the physical
reasons for the failure must be analyzed.
Prediction now becomes possible beyond the range of prior
observation because the mathematics of the explanatory model
is not so tightly tied to the details of the empirical dataset from
which the analysis began. These new predictions can be tested
against elements of the original data, or by collecting additional
data for a direct test of the extrapolations. As the collection of
successful ("confirmatory") predictions accumulates, confidence
in the model increases, and its reliability for inference
concerning related situations becomes established.
A major advantage of explanatory models over correlative
models (or, in politicized terms, "theoretical" models over
"empirical" models; "process" models over "statistical" models)
lies in the explanatory models' reduced structural uncertainty.
Explanatory models offer better capabilities for the
extrapolations required to support policy decisions. These
capabilities flow from the organic process understanding built
into the fundamental structure of the model. In contrast, an
"empirical" model confronted with new data may require an
entirely new structure to maintain its "curve-fitting" ability, and
may thus radically alter even its interpolated predictions.
This tale of the social history of a model derives its rhetorical
strength from its echoes of the standard "hypothetico-deductive"
model of the scientific enterprise at large [22, 23]. In this view,
science proceeds from triumph to triumph by forming precise
hypotheses regarding physical phenomena, constructing critical
experimental tests, and thus from incontrovertible rejection of
hypothesis proceeding to the next step in the long accretion of
scientific wisdom. It has been suggested, indeed, that any
scientific idea not thus subject to "Popperian falsification" is not
worthy of the name of science at all, being presumably mere
philately or mysticism. One of the more important limitations of
Popper's ideas is their inability to recognize the importance of
the interpretive branch of Figure 3, as for example in the
guidance of medical research provided by Darwinian biology.
Another weakness of this idealization arises from the
-------
observation that scientific theories are seldom abandoned on the
basis of a single (or multiplicitous) contrary observation, but are
patched and re-patched until the absurdity of the situation forces
a general reorganization of the underlying conceptual model - a
"paradigm shift," in the much-abused phrase [24]. Just as a
single falsification does not warrant the wholesale abandonment
of a model, however, neither does a single (or multiplicitous)
successful prediction warrant any but tentative, conditional, and
skeptical confidence. Still, prediction is a necessary precondition
to rational regulation, and governments are not likely to revert to
the methods of the ancient Roman haruspex. Understanding the
nature of principled objections to the use of models for policy
formulation, and their limitations and appropriate use, is critical
for responsible analysis in support of regulatory activities.
When models are used to guide regulatory decisions, the demand
that the models' reliability as a guide be assessed is eminently
reasonable. Because society increasingly demands infallibility of
public officials, that regulatory decisions guarantee a risk-free
environment, and that the economic costs of regulation be
vanishingly small, Agency decision-makers pressure technical
staff to guarantee the "validity" of decision tools. A fear then
arises that Gresham's Law (bad currency drives out good) will
virtually guarantee that models which are not claimed to be
"valid" will be supplanted by models of less scrupulous
patrimony. Because model "uncertainty" and its quantification
are increasingly valued, however, the bald affirmation that a
model has been validated and verified is increasingly met with
a healthy skepticism. Issues of model validation are dealt with
more completely beginning on page 13.
Variability and Uncertainty
Variability and uncertainty are separated by their occupancy of
different probability spaces, and by their ramifications for
decision-making [9]. Variability refers to intrinsic heterogeneity
in a quantity. It cannot be diminished by additional study, but
only better characterized. Rainfall is a good example of a
variable quantity of importance for pesticide exposure. Rainfall
varies from place to place, and over time at any given place.
Rainfall can be characterized by its intensity, duration, areal
extent, total volume, etc., depending on the purposes of the
analysis. Models of pesticide export from treated areas can be
developed on the basis of annual rainfall/runoff volumes, as
averages across physiographic regions, or as responses to
specific rainfall events. Individual rainfall events may be
characterized by 15-minute, hourly, daily, or annual totals.
Model development is often constrained by the unavailability of
suitably detailed input data to drive model responses to events.
Uncertainty, by contrast, can be reduced by additional study; it
expresses a lack of knowledge of the true value of a quantity
(which may also be masking variability). Uncertainty in pesticide
risk assessment may arise from several sources. Imperfect
measurements of chemical properties (e.g., transformation rate
constants) are inevitable in the laboratory investigations
underpinning registration studies. Chemical parameter
uncertainty develops from random and systematic errors in
measurement that can be adequately characterized using
conventional statistical technique.
Exposure models are subject to structural (inadequate process
algorithms) and parameter (inadequate parameter measurement)
uncertainty. These often interact. For example, in characterizing
the ability of rainfall to penetrate the soil, heterogeneity across
an entire planted field may be summarized in a single infiltration
parameter, which then becomes exceedingly difficult to measure
during site investigations. The predictive reliability of models
can be adequately characterized only by means of validation
studies featuring direct comparison of model predictions with
independent observations from empirical reality.
Additional uncertainties arise from spatial and temporal
aggregations and approximations used to construct "scenarios"
- the definitions of watershed geography and agroecosystem
properties used to represent the treated landscape. Uncertainty
also arises from the assumption that specimen "surrogate"
systems can provide an adequate measure of protection for all
protected endpoints (e.g., the use of farm ponds as a surrogate
for all potentially affected aquatic ecosystems [13], or mallard
ducks as a surrogate for all exposed waterfowl). Scenario and
surrogate uncertainties are in large measure matters of regulatory
experience and policy development.
In their ramifications, uncertainty "forces decision-makers to
judge how probable it is that risks will be overestimated or
underestimated for every member of the exposed population,
whereas variability forces them to cope with the certainty that
different individuals will be subjected to risks both above and
below any reference point one chooses" [9].
The distinction between uncertainty and variability is sometimes
difficult to maintain. The frequentist view of probability carries
an implied symmetry between frequency of occurrence and
probability of detection. For example, if there is a 10⁻³ chance
that any given body of water is contaminated, then inspecting
water samples from 1,000 waterbodies would presumably locate
the contaminated member of the set. Or, put another way, if a
water body is selected at random, there is only a 0.001 chance of
detecting contamination. This symmetry can lead to some
confusion in risk communication: In a population of 200 million
individuals, an individual risk of 10⁻⁶ is negligible for each single
person, but the statement seems to imply that 200 individuals are
at serious risk. Similarly, variability can contribute to
uncertainty: when a quantity varies by several orders of
magnitude, a precise estimate of the true mean can require a
dauntingly large data set.
Probability distributions are used to quantify both variability and
uncertainty, but the interpretation given the distributions differs.
The concepts can perhaps be separated by reserving the term
"frequency distribution" to describe variability, and "probability
distribution" to represent uncertainty [25]. Distributions for
variable quantities thus represent the relative frequencies of
values drawn from specified intervals, and distributions of
uncertain quantities represent the degree of belief or subjective
probability (colored by measurement error, etc.) that a specified
value falls within a specified interval [26]. Uncertainty regarding
variability can be viewed as probability regarding frequency
[27].
It should be acknowledged, then, that variability and uncertainty
are to some degree inextricably intertwined. For example,
statistical summaries may capture variability in a few
distributional parameters, sacrificing spatial or temporal
precision for the sake of reducing a welter of detail to
manageable proportions. Again, regularly spaced point
measurements of the concentration of pesticides in watercourses
sacrifice temporal detail to economic restraints, leaving
uncertainty as to the true duration of episodes of contamination
driven by rainfall events. Still, the motivation for separating
natural variability from uncertainty is to clarify the contributions
to decision processes of irreducibly stochastic phenomena, e.g.,
rainfall variability, as against the contribution of more readily
remediable measurement errors and knowledge gaps. Reductions
in uncertainty, by, for example, improving the precision and
detail of laboratory investigations of chemical properties, or
through field investigations of the efficacy of mitigation
techniques, can improve the reliability of regulatory decisions.
In contrast, the irreducible variability of natural phenomena will
always introduce unwelcome uncertainty into regulatory
decision-making.
Dealing with Variability and Uncertainty: The
Monte Carlo Method
The "Monte Carlo method" was developed at Los Alamos
National Laboratory to solve problems in weapons design
stemming from the need to combine "stochastic and
deterministic flows" [28] in the study of the "interaction of high
energy neutrons with heavy nuclei" [29]. In this problem, the
path of any particle and its sequence of interactions with other
particles is physically determined by its initial velocity. The
initial velocity of a particle cannot, however, be precisely
predicted. In studying this problem, it developed that "the
practical procedure is to produce a large number of examples ...
and then to examine the relative proportion[s]" of each of the
potential outcomes, thus producing a frequency distribution
predictive of the probable behavior of the ensemble.
Monte Carlo methods take advantage of the ability of computers
to produce random numbers drawn from a uniform distribution,
which are then translated into the actual distributions of the
stochastic input variables. The method was concisely defined by
Metropolis and Ulam [29]: "Once a uniformly distributed
random set is available, sets with a prescribed probability
distribution f(x) can be obtained from it by first drawing from a uniform uncorrelated distribution, and then using, instead of the number x which was drawn, another value y = g(x) where g(x) was computed in advance so that the values y possess the distribution f(x)." The name of the method is derived from games
of chance that play out according to prescribed rules following
an initial random event (e.g., randomization of card order, roll of
the dice, etc.). In discussing the method, Metropolis and Ulam
[29] noted that practical applications must, in the specification
of the functions g(x), allow for covariance (correlations among
input variables) to avoid physically impossible combinations of
parameters.
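The transformation g(x) described by Metropolis and Ulam is, in modern practice, usually the inverse of the cumulative distribution function of the target distribution. The following minimal Python sketch illustrates the idea; the exponential half-life example and all numeric values are illustrative assumptions, not part of any Agency software.

    import random
    import math

    def inverse_transform_exponential(rate, n):
        """Draw n samples from an exponential distribution with the given
        rate constant by passing uniform draws u on (0,1) through the
        inverse cdf g(u) = -ln(1 - u)/rate."""
        samples = []
        for _ in range(n):
            u = random.random()              # uniform, uncorrelated draw
            samples.append(-math.log(1.0 - u) / rate)
        return samples

    # Example: 10,000 realizations of a first-order decay rate constant
    draws = inverse_transform_exponential(rate=0.05, n=10000)
    print(sum(draws) / len(draws))           # sample mean, near 1/0.05 = 20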
In Monte Carlo simulation studies, the analyst must "perform a
finite number of experiments" that translate input variability and
uncertainty into the output distributions of interest. The best way
to specify the appropriate "finite number" is never entirely
obvious. If too few iterations are performed, the output
distributions are so contaminated by sampling errors as to be
unreliable, especially in the distribution tails that are usually of
greatest interest for risk assessment. If too many iterations are
employed, the analysis becomes computationally burdensome
and inappropriate for routine use. One solution, essentially
Bayesian in outlook, is to track the statistical properties of the
output distributions as the simulation proceeds and terminate the
process when these properties stabilize. This translates the
difficulty into one of defining adequate stability, which is at least
more tractable than that of a priori defining the number of
"experiments" to be executed on the computer.
Several means of sampling the input distributions f(x) are
commonly used; these include simple random sampling, "trace
matching," and Latin Hypercube Sampling (LHS). Simple
random sampling is usually deprecated because of its
inefficiency, i.e., an excessively large number of simulations is
required to achieve stable output distributions. In "trace
matching," only actual observed values of the input variables are
employed. This has the advantage of not requiring that the input
data be matched to a distribution for sampling. It thus eliminates
the sometimes contentious process of deciding on proper
representation of the tails of the input distributions and the
proper means of interpolating among observations. It has the
parallel disadvantage, however, of potentially failing to capture
extreme values and of being unnecessarily contaminated with
gross measurement errors. Proper use of observational data for
Monte Carlo input thus demands effective quality control
procedures for database development, and some independent
knowledge of the intrinsic variability of the natural processes
underlying the data.
In Latin Hypercube Sampling (LHS), the input distribution is
divided into n intervals of equal probability, where n is at a
minimum the number of simulations to be run. Samples are
drawn once, without replacement, from each of these intervals,
thus ensuring efficient representation of the entire range of the
distribution. Sampling within the intervals may be either entirely
random or by explicit choice of the median of the interval, a
technique known as "median LHS."
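A minimal sketch of median Latin Hypercube Sampling for a single input variable follows, assuming the inverse cdf (percent point function) of the input distribution is available; the standard Normal input used here is illustrative only.

    import random
    from statistics import NormalDist

    def median_lhs(inv_cdf, n):
        """Divide [0,1] into n equal-probability intervals, take the median
        of each interval, map it through the inverse cdf, and shuffle so
        that samples for different variables can be paired at random."""
        probs = [(i + 0.5) / n for i in range(n)]      # interval medians
        samples = [inv_cdf(p) for p in probs]
        random.shuffle(samples)                        # one draw per stratum
        return samples

    # Example: 20 stratified draws from a standard Normal input
    draws = median_lhs(NormalDist(0.0, 1.0).inv_cdf, 20)

For fully random LHS, the probability for each stratum would be drawn uniformly within its interval rather than taken at the median.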
Interpretation and Presentation of the Results of
Monte Carlo Analysis
Monte Carlo methods can be iterated during exposure analysis.
An initial simulation using mean values of input parameters can
identify dominant fate processes from an EXAMS model's
standard sensitivity analysis, with subsequent application of
probabilistic techniques to the governing parameters of those
processes. When variable and uncertain parameters are collected
during simulation and paired with output exposure metrics,
multiple regression analysis can indicate the contribution of
each input to the output variability [30].
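A minimal sketch of such a regression-based sensitivity summary, assuming the sampled inputs and the paired output metric have been collected as numpy arrays (ordinary least squares on standardized variables; the shapes and names are illustrative):

    import numpy as np

    def regression_sensitivities(inputs, output):
        """Ordinary least-squares regression of a Monte Carlo output metric
        on the sampled input parameters; the standardized coefficients rank
        each input's contribution to the output variability."""
        X = (inputs - inputs.mean(axis=0)) / inputs.std(axis=0)
        y = (output - output.mean()) / output.std()
        X = np.column_stack([np.ones(len(y)), X])        # add an intercept
        coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
        return coefs[1:]                                 # drop the intercept

    # inputs: shape (n_runs, n_parameters); output: shape (n_runs,)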
In presenting the results of a Monte Carlo analysis, a "tiered"
presentation style is recommended by EPA [11], in which the
level of detail increases with each successive tier. For example,
the first tier may be a one-page summary with graphs of the final
risk metrics and a summary of the study, the second tier an
executive summary, the third a detailed report. Documentation
of model inputs is an important element of these reports.
Reliable, coordinated synoptic observational datasets offer a
number of advantages for exposure analysis: prior
documentation relieves the individual analyst of the need to
research environmental data for each analysis, simple trace
matching obviates the need to fit the observations to an input
distribution, the tails of the (empirical) distribution are not in
dispute because they need not be invoked, and covariance among
inputs is automatically accommodated. Uncertainty in chemical
measurements should be treated as Normal distributions in
almost all cases. Variability in laboratory data results from the
accumulation of errors classically giving rise to the Normal
distribution. Although an individual set of measurements may
show statistically detectable skew or kurtosis, it is very likely
that such results arise from sampling error rather than a
genuinely non-Normal measurement process [31].
The outputs from a Monte Carlo exposure analysis can be
presented as cumulative exceedence curves. In constructing a
particular analysis, the exposure findings should be matched to
the biological endpoint and the available toxicity metric. For
example, for mortality of annually reproducing fishes the 96-
hour LC50 can be compared to the largest average 4-day
concentration for each year at a given site. For multiple sites, the 90th percentiles of the 4-day events could be assembled. If chronic data for reproductive failure are available, 21-day average
concentrations during the breeding period can be assembled for
single sites over multiple years, or for some percentile of the
single site distributions over multiple sites. These distributions
can then be passed to a fuller probabilistic risk assessment incorporating uncertainty in the toxicity data, or compared directly to point estimates of toxicity as in Figure 4.

Figure 4. Cumulative exceedence curves (abscissa: percent of years, area, or sites with equal or higher concentrations) constructed from single-site temporal data or multi-site data summaries; peak or short-term average and long-term average concentrations are compared against acute and chronic toxicity thresholds.
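As a concrete illustration of matching the exposure metric to the toxicity metric, the sketch below computes the largest 4-day running-average concentration in each year of a simulated daily concentration series; the 90th percentile across sites could then be taken from the resulting annual maxima. The data structure (a dictionary of daily concentrations keyed by year) is an assumption for illustration only.

    def annual_max_running_average(daily_by_year, window=4):
        """For each year, return the largest running average of length
        `window` from that year's daily concentration series."""
        maxima = {}
        for year, series in daily_by_year.items():
            best = 0.0
            for i in range(len(series) - window + 1):
                avg = sum(series[i:i + window]) / window
                best = max(best, avg)
            maxima[year] = best
        return maxima

    # The annual maxima can then be ranked to form an exceedence curve,
    # e.g., the fraction of years in which the 4-day average exceeded a
    # 96-hour LC50.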
Full probability density function (pdf) and cumulative
distribution function (cdf) graphs can convey different
perspectives. These may also be constructed to match available
toxicity metrics, including, for example, 24-h, 48-h, 96-h, 21-
day, 90-day, breeding season or annual exposures for limnetic or
benthic zones, for individual high-risk sites or across multiple
sites representing the full suite of physiographic zones in the
pesticide's use area. Point estimates and mean values from
simpler analyses can be indicated on these graphs for
reference, as in Figure 5 and Figure 6.
The pdf plot displays values of a random variable (concentration
expressed in exposure metrics over short intervals) on the
horizontal axis (abscissa) and relative frequency of occurrence
or probability density on the vertical axis (ordinate). The pdf
(Figure 5) is useful for displaying the relative probability of
values, the most likely values (e.g., modes), the shape of the
distribution (skew, kurtosis), and small changes in probability
density.
A cumulative distribution function plots, on the ordinate, the
probability that a value of the exposure metric (random variable)
is less than a specific value on the abscissa. These plots (Figure
6) can display fractiles (including the median), probability
intervals (including confidence intervals), stochastic dominance,
and mixed, continuous, and discrete distributions.
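A minimal sketch of assembling an empirical cdf (and fractiles such as the median) from Monte Carlo output, suitable for plotting in the style of Figure 6; the plotting step itself is omitted, and the simple fractile rule is an illustrative choice.

    def empirical_cdf(values):
        """Return (sorted values, cumulative probabilities) suitable for
        plotting concentration on the abscissa and the probability of a
        lesser-or-equal value on the ordinate."""
        xs = sorted(values)
        n = len(xs)
        probs = [(i + 1) / n for i in range(n)]
        return xs, probs

    def fractile(values, p):
        """Simple empirical fractile (e.g., p = 0.5 for the median)."""
        xs = sorted(values)
        return xs[min(int(p * len(xs)), len(xs) - 1)]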
Figure 5. Example Monte Carlo estimate of a probability density function (pdf) of concentration (mg/L).

Figure 6. Example Monte Carlo estimate of a cumulative distribution function (cdf) of concentration (mg/L), with the mean value indicated.
Validation, Verification, and Performance Measures
The Validation Problem
The idea that the "quality" of a model should be assayed is not
entirely the same as the idea that models must be tested by
comparing their predictions to independent empirical data. The
"quality" of a model depends on (1) the quality of its underlying
science, (2) the degree to which it has been shown to be "robust"
(i.e., insensitive) in the face of violations of assumptions and
approximations used in structuring the model, (3) "verification"
studies confirming that the model indeed behaves according to
its design specifications, and (4) demonstration studies of its
parameter sensitivities and the relative performance of its
internal constituent process models. These issues have been
discussed in some detail in the context of the initial development
of EXAMS [32], including, for example, tests of a simplified
model of exchange at the benthic boundary layer, examination
of the effect of neglecting the impact of sorption to suspended
sediments on the photochemical absorption spectrum, and the
structural and parameter sensitivity of volatilization models.
Although such studies are a necessary preliminary, studies of the
performance of a model confronted with full-scale, independent
empirical datasets reflective of the regulatory context are the
only means of quantifying its "performance validity." Indeed,
even models of very high quality (i.e., founded in well-
established science, codes tested against an extensive set of
limiting cases and parallel analytical solutions ("benchmarked"),
etc.) may fail when applied in a regulatory or policy
development environment [33]. Such failures may arise, not only
or even principally from failures in the underlying model, but
from failures of the numerical representation to accurately solve
the underlying equations in a complex field setting (for which
there is now no equivalent analytical simplification), or from an
inadequate representation of the environmental setting itself. In
addition, there is clearly a difference between the ability of a
model to describe a given situation ("history matching") by
calibration of its parameters, and its ability to adequately
represent new observations using the existing calibration,
standard defaults, or generalized parameters. The latter has been
termed a "post-audit" [34, 35] and provides the most fertile
ground for generating a history of quantified measures of
performance validity. Post-audit failures frequently expose the
limitations of an initial calibration data set that failed to fully
reflect underlying variability in environmental processes. More
seriously, post-audit studies may reveal actual structural
deficiencies, i.e., failures in the underlying conceptualization of
the problem domain (see, for example, the case histories
enumerated by Konikow and Bredehoeft [33]). These
deficiencies are often then repaired, emphasizing that numerical
models share some properties with scientific theory: they are
often most useful as a means to enhance understanding and
guide further work, and are liable to be patched and re-patched
until they are supplanted by something entirely new. The task of
quantifying prediction uncertainty is thus complicated by the felt
obligation to use the results of a "validation test" to improve the
structure of the model or, somewhat less usefully, to re-calibrate
and thereby update knowledge of the parameter space able to
generate acceptable history matches.
A further difficulty arises from the fact that, even absent re-
calibration or re-structuring, successful empirical (post-audit)
testing ("validation") cannot conclusively demonstrate that a
model will have good predictive reliability when applied to novel
situations [33,36]. Thus, because "valid" is used by government
agencies to signify "reliable for support of regulatory decision-
making," Oreskes [16] has observed that "calling a model
validated does not mean it is valid." To simply affirm that a
model has been verified or validated ipso facto implies that its
truth has been demonstrated, and it therefore can serve as a
reliable basis for regulatory decisions. The tendency of scientists
to claim that validation and verification mean something else
within the technical community is ingenuous. The terms have
indeed been given a variety of precise and honest definitions in
the course of technical discussions, e.g., "Verification is usually
defined as ensuring that the model behaves (runs) as intended,
and validation is usually defined as determining that an adequate
agreement exists between the entity being modelled and the
model/or its intended use" (emphasis added) [37]. That said, it
must be observed that resistance to direct translation of the terms
valid and verified into public discourse has been futile. It is
equally futile to demand that the terms be abandoned, however,
because a refusal to answer the query "Have [the models] been
validated?" [15] would be viewed as obfuscation. The answer
rather lies in a serious response to the collateral questions posed:
"Are they widely accepted and scientifically sound?" and "How
predictive and confident are we in using them?" [15] which
address issues of the technical basis of a model and the degree
to which uncertainties in its predicted values can be quantified.
Why Can't Models be Proven?
Absolute proof of a proposition or assertion can only be
accomplished within the confines of a closed logical system -
geometry, mathematics, symbolic logic. Within the natural
sciences, truth is elusive and ultimately unattainable, for a
variety of reasons. First and most importantly, any set of
observations we make on nature comes about from the operation
of many underlying processes. Thus, data collected to test a
model, even when completely consistent with the model
predictions, do not "prove" the model, they only fail to disprove
it. When the data and the model disagree, it is usually difficult
to pinpoint the cause, not least because the model contains many
components any one of which may be the primary root of the
problem. The model may utilize well-corroborated components
that are relevant to underlying process mechanics but are
overridden by governing processes at the differing temporal or
spatial scale of the test situation [38]. Similarly, when the model
and data agree, the presence of multiple components introduces
the possibility that the agreement is wholly fortuitous, for errors
in one component may have been offset by errors in another.
Although modelers occasionally claim to find this a comfort, it
is unlikely that errors offsetting one another in one situation
would continue to oblige in another. In addition, numerical
models in the natural sciences are "under-determined" in their
parameters, i.e., there is a very large set of parameter values that
will result in a given set of output values. It is thus impossible
to obtain a unique set of calibrated parameters to match any
given set of field observations.
Numerical model extrapolations are necessary to evaluate the
behavior of new pesticides, or of new uses proposed for existing
compounds. These predictions are, however, unavoidably
contaminated by uncertainties intrinsic to both the modeling and
the data-gathering process. In structuring a process model, the
level of detail chosen for the model requires that some detail -
spatial, temporal, or causal - be either summarized or omitted as
irrelevant. For example, the amount of suspended matter in a
water column may be portrayed as a single concentration value,
of specified organic matter content, averaged over a full month,
applying to the full length of a stream reach. This representation
is clearly false: suspended matter usually contains a mixture of
particle sizes, some of which will deposit from the moving water
and some of which will remain suspended, the amount of
sediment and its particle size distribution changing during the
month as rain storms pass through the area, etc. Difficulties with
testing this model arise from several sources. Sediment-
associated chemical is represented as a single data element,
presumably for the purposes of evaluating the removal of the
chemical from ecotoxicological concern. When testing or
parameterizing the model, however, the stream can only be
sampled at specific places and times, with an assumption that the
sampling program can adequately represent the "true" mean
values required for the model (even leaving aside the underlying
question of whether this model adequately represents
toxicological bioavailability). Uncertainty thus adheres to the
data used to parameterize the model, the data used to test it, and
to the approximation originally used to construct the model.
Increasing the detail and complexity of the model does not solve
the problem. The data collection effort now becomes more
complex, with a dependency on more complex instrumentation
and analyses with their own uncertainties. The new model
structure is conditioned (as envisioned in this example) on the
selection among competing multi-phase sorption models
developed under different laboratory conditions with a different
set of experimental uncertainties. An increase in the knowledge
embedded in the model carries substantial costs.
Although a more complex model incorporating additional
knowledge of the system or additional spatial or temporal detail
intuitively should provide more reliable predictive power, the
necessary increase in its parameter complexity increases its level
of under-determination. The number of sets of parameter values
equally able to achieve a match to the observed history thus
inevitably increases with model complexity. Because many of
these parameter sets will lead to contradictory predictions when
the model is confronted with new external data (e.g., a new
chemical or environmental setting), a good ability to match prior
experience may not be indicative of model reliability as a guide
to good regulatory decisions [36]. Many of the important tests of
a model must therefore be conducted in terms of critical tests of
its constituent hypotheses. Other tests should be conducted at the
boundaries of its use in current regulatory contexts in order to
test its ability to provide useful information under novel
conditions. Although model testing in particular field situations
can serve to demonstrate a model's ability to reflect particular
realities, only in the unlikely circumstance that this particular
field setting were fully representative of the more general safety
evaluation problem would the problem of establishing predictive
model scenarios be resolved. The problem of "model
validation," as conventionally phrased, may thus be an unreliable
guide to the utility of a model for prediction and reliable
regulatory guidance: once again, "calling a model validated does
not mean it is valid" [16].
Testing the Performance Capabilities of Models
Despite the substantial intellectual resources devoted in recent
decades to constructing quantitative frameworks for testing
operational models [17-19, 37, 39-54], specific simple
procedures for appropriately quantifying model performance
remain incompletely developed. Quantitative assessment is too
frequently replaced with subjective evaluations consisting of
little more than a time-series plot of point observations and
model-produced continuous lines, coupled with qualitative
statements as to the adequacy of the "fit." The problem of
subjective validity criteria is especially acute in the case of
models designed for hazard and risk analysis of toxic chemicals,
because of a continual conflict in the model user's, as against the
model builder's, perception of the risk and decision structures
surrounding validation studies.
How Can Prediction Uncertainty be Quantified?
Two (related) approaches to establishing the intrinsic accuracy
and precision of a model are feasible: a descriptive approach, in
which measures of model performance are accumulated to give
a continually improving picture of model reliability, or a
hypothesis-testing approach in which the model's ability to meet
pre-established performance standards is tested by some
appropriate statistic. In either case, a Bayesian perspective is
required: no single test can serve to unambiguously validate or
invalidate a model. A series of tests can serve, in the aggregate,
to establish confidence or fully discredit the model. Clearly,
classical Bayesian judgement is required: the point at which
enough testing has been done to declare the process complete is
an entirely social decision. The descriptive approach suffers
from an inability to evaluate the significance of any individual
test because the performance criteria are unspecified and there
is, therefore, a daunting array of possible performance reports
[19]. The hypothesis-testing approach suffers from the need to
establish performance criteria in advance to give a yardstick for
evaluating the model. The descriptive approach can, however, be
construed as an extension of the hypothesis-testing approach,
and the hypothesis-testing approach also can serve to frame the
discussion in terms of the risks involved in using models to
guide policy and regulatory decision-making.
Risk and Decision Analysis in Model Testing
In all analyses, there is a risk that the analyst will infer the wrong
decision from the test data. There are two varieties of such
errors: in Type I, or alpha (α) error, the analyst mistakenly accepts the alternate hypothesis (Ha) when the null hypothesis (H0) is in fact true. In Type II, or beta (β) error, the analyst accepts the null hypothesis as being true when the alternate is in fact true. The most cogent analysis of the role of risk in inference and decision was developed by Blaise Pascal (1623-1662, French mathematician, physicist, and philosopher). Pascal clarified the role of the consequences of wrong decisions in directing the analyst's attention to the appropriate choice of error controls, as in the example of Table 1. Here it is clearly most important to minimize the probability of a Type I (α) error (rejecting a true H0), because of the significant negative consequences involved. Therefore, α, the probability of mistakenly rejecting a true H0, should be made as small as possible, and β, the probability of a Type II error, can perhaps for most purposes be ignored.
Table 1. Probability matrix for hypothesis testing
(H0: God exists  vs.  Ha: God does not exist)

  If the truth is:            and the decision taken is to:
                              Believe (accept H0)               Not believe (reject H0)
  H0: God exists              Correct decision; P = (1-α).      Analyst makes α error; P = α.
                              Consequence: Heaven               Consequence: Hell
  Ha: God does not exist      Analyst makes β error; P = β.     Correct decision; P = (1-β).
                              Consequence: unmerited faith      Consequence: atheism

R.A. Fisher, in his development of statistical practice for application to agricultural experimentation, understood that the tendency of the observer to desire a particular outcome is often a complicating factor in experimental design. For example, when
the effect of a new fertilizer or pesticide on crop yields is under
investigation, there is a natural tendency to want the new
material to prove to be efficacious. This tendency to see positive
results where none exist can be guarded against (Table 2) by
posing H0 as a "null" or no-difference hypothesis, and then
making it difficult to reject (i.e., minimize the probability of an α error). In this instance H0 is posed vs. a composite single-sided alternative that improvement exceeds some critical factor δ, i.e., the material must produce a "substantial" improvement, and should it damage the crop it is of no further interest. Notice, in Table 2, that this device for institutionalizing intellectual honesty also allocates the more severe consequences of wrong decisions (ruin of individual and corporate reputations for probity and competence) to α errors. β errors generally merely
lead to the question, "Are we sure this material should be
abandoned?" - with the option of reserving judgement and
extending field trials if the material is believed on other grounds
to perhaps be efficacious after all. When this "no difference"
null hypothesis approach to H0 is used uncritically for testing the
adequacy of ecotoxicological models, however, difficulties begin
to arise. In what follows, "validation" will be used as a
convenient synonym for "quantitatively testing the ability of a
model to meet performance criteria," or, in the phrase of [37],
"determining that an adequate agreement exists between the
entity being modelled and the model for its intended use." The
primary tasks include, first, establishing a quantitative definition
of "adequate agreement," and then defining appropriate test
methods.

Table 2. Decision matrix for agricultural efficacy experiment
(H0: Treatment ineffective, no improvement in yield  vs.  Ha: Improvement in yield of size δ)

  If the truth in fact is:    and the decision that can be made is:
                              Accept H0 (discard material)      Reject H0 (promote material)
  No improvement in yield     Correct; P = (1-α).               Wrong: α error; P = α.
                              Consequence: discard              Consequence: tout faulty goods,
                              faulty material                   ruin for all
  Improvement in yield        Wrong: β error; P = β.            Correct; P = (1-β).
                              Consequence: discard              Consequence: solve world food
                              promising material                problem, prosper
Goals and Constraints of Performance Tests
The term "null hypothesis" (H0) generally, although not of
necessity, is used to refer to a condition of "no difference."
Almost all validation studies have been constructed around a
null hypothesis that "the model is valid," that is, that the mean
values of the model predictions and of the observations on the
prototype are not "significantly different" [40]; or that "model
error is negligible" [43]. This approach has several inherent
difficulties.
First, the observations on the prototype and the measurements of
the parameters used to drive the model are both subject to error.
Because the variance in the parameters propagates into the
model predictions, the paradox arises that a less precise model
is better able to withstand validation tests. Notice, in Figure 7,
that model M2 is both less accurate and less precise than model M1, but under conventional testing will be perceived as more "valid". For example, regression of experimental
observations on model predictions has been suggested as a
formal validation method [51]. In this case the validity of the
model is judged against an ideal finding, phrased as a null
hypothesis, of a slope of the regression line of 1 and an intercept
of 0. This test suffers from the ambiguity depicted in Figure 7:
the more scatter in the data, the larger the standard error of the
slope, and the more difficult it is to reject the null hypothesis.
Models with more scatter are thus less likely to be rejected [55].
(Data sets with few values or badly contaminated with
measurement errors give rise to equivalent ambiguities.)
Figure 7. Distributions of observations on a prototype P and two models M1 and M2 (abscissa: value of observation).
Secondly, when validation studies are posed so that the main
hypothesis (H0) is that there are no detectable differences
between the model and the real-world "prototype," the primary
hypothesis amounts to a statement that "the model is valid." As
a result, the primary default statistical focus is on minimizing the
risk of rejecting a true model (builder's risk), as a Type I (α) error.
The resulting decision matrix (Table 3) illustrates the fallacies of this approach. For a given sample size, in fact, increasing the stringency of the test statistic (by decreasing α) increases the likelihood of a β error. Regulatory scientists who must use environmental models are concerned, quite properly, almost exclusively with the dangers of using erroneous models. In this design, Type II (β) errors are the risk that they must perforce
well-controlled, perceptive users of models are subject to
continual Angst as to the reliability of their regulatory and
evaluative tools.
A similar situation arises in experimental evaluations of the
effects of pesticides on aquatic life. These tests have in some
instances been conducted by dosing small ponds or
"mesocosms" with the pesticide, comparing the dosed systems
with undosed controls. When the efficacy model of Table 2 is
employed for analysis of the results (Table 4), the paradox arises
that the most efficient way to certify the safety of a pesticide is
to conduct an inferior experiment. The β error, which is poorly
controlled or simply allowed to vary in response to sample size,
contains what may be the more serious failure - allowing
dangerous materials to slip through the safety evaluation and
registration process undetected.
In some instances, validation studies have been designed to
control user's risks by creating an especially demanding
experimental frame that increases the likelihood of detecting
false models [56]. The statistical power of such designs is
always open to post-facto criticism, however, and they do not
fully meet the need for wholly objective reporting on the
performance and reliability of a simulation model. The problem,
then, is that the focus of risk control in the usual run of
validation studies is on the modeler's risk of rejecting a true
model (Type I error), and the user's risk is ignored.
Testing methodologies can in fact be designed to control both
Type I and Type II risks, while evaluating both accuracy and
precision (bias and dispersion) of simulation models. These
objectives can be met by reformulating the goals of validation
studies to better conform with several principles of good
statistical practice.
Table 3. Conventional (flawed) decision matrix for model validation studies
(H0: No difference between "model" and "prototype," model is therefore "valid"
 vs.  Ha: Model and prototype substantially differ, model is fatally flawed)

  If the truth in fact is:        and the decision that can be made is:
                                  Accept H0 (use the model)         Reject H0 (reject the model)
  No difference                   Correct; P = (1-α).               Wrong: α error; P = α.
                                  Consequence: accept model,        Consequence: modeler's ruin:
                                  use for decisions                 reject good model
  Model and prototype differ      Wrong: β error; P = β.            Correct; P = (1-β).
                                  Consequence: user's ruin:         Consequence: better models
                                  faulty safety decisions           must be developed
Table 4. Conventional (flawed) decision matrix for aquatic safety testing
(H0: No difference between "treatment" and "control," pesticide is therefore safe
 vs.  Ha: Treatment and control differ, pesticide is causing ecological harm (or benefit?))

  If the truth in fact is:        and the decision that can be made is:
                                  Accept H0 (pesticide safe)        Reject H0 (pesticide harmful)
  No difference                   Correct; P = (1-α).               Wrong: α error; P = α.
                                  Consequence: register safe        Consequence: ban materials that
                                  pesticide, allow runoff           pose little danger
  Pesticide has an impact         Wrong: β error; P = β.            Correct; P = (1-β).
                                  Consequence: allow use of         Consequence: ban dangerous
                                  harmful material                  materials; seek substitutes
Good Statistical Practice for Testing Models
First, when conducting a statistical evaluation of any
experimental situation, good practice demands that the null
hypothesis be phrased as the opposite of what the experimenter
wishes to prove; the tests can then be constructed so as to protect against unconscious bias (stemming from the modeler's natural desire to have constructed a valid model) by making it difficult to reject H0. In validating simulation models, then, the null hypothesis should always be formulated in terms of H0: the
model is invalid, vs. an alternative hypothesis Ha: the model is
valid. (Note that this is the opposite of the usual practice in the
field.) This has the advantage of re-assigning the user's risk to
Type I error, and the modeler's risk to Type II error, and allows
for a "no decision" option (continued model development) in
addition to the usual "two-frame" approach to validation [57].
Second, whenever possible, statistical tests should be formulated so that H0 is a "simple" (as opposed to a "composite") hypothesis that can be tested against a one-sided (composite) alternative, so that it is possible to specify α and β, the probabilities of committing Type I and Type II errors, and determine the minimum sample size needed to attain this degree of precision [58:262]. Parametric tests should be preferred over non-parametric techniques whenever possible: non-parametric tests often require fewer computations and assumptions about underlying distributions, but they also carry greater probabilities
of Type II errors (rejecting a valid simulation model) for a given
level of user's risk, a situation highly undesirable from the
modeler's perspective!
Third, it must be recognized that both the model predictions and
the observations on the prototype involve the same experimental
unit, even though both are subject to (often independent)
measurement errors. For this reason, only "paired sample"
techniques are fully appropriate in most such situations. As in
the case of non-parametric analyses, when inappropriate
techniques (e.g., tests of differences between means) are applied,
the frequency of Type II errors (modeler's risk) will tend to rise,
in this case as a response to increasing variances. Finally, given
a situation in which parametric tests can be applied, both user's
and modeler's risks can be controlled at pre-determined levels
during the design of the study by selection of an appropriate
sample size [59].
A Methodology for Performance Testing (Validation) of
Simulation Models
To apply these principles, it must first be recognized that
objective test methods require objective criteria for judgment of
model validity. These criteria usually must be based on
considerations external to the model, that is, on the needs of the
model user for accuracy and precision in model outputs.
Chemical exposure models are usually used in a context of
toxicological safety evaluations, in which, given the parallel
uncertainty in toxicological data and inferences, a consistent
factor-of-two error is probably not excessive [60]. We can thus
formulate an objective test of validity as follows:
"Predicted values from the model must be within a
factor of two of reality at least ninety-five percent of
the time."
(The particular error factor and reliability criterion selected is
not significant; the methodology would apply to a 25% error
factor, or a factor-of-three error, a 99% adherence to the error
criterion, or any other numerical specification of acceptable
uncertainty and reliability in model predictions.) Paired sample
techniques require that the validation focus on the point-to-point
differences between the model and the prototype. The error in
the prediction is a function of the difference between the model
prediction "P" and the observation on the prototype "O". The
observational pairs are usually taken over a period of time,
leading to a series {P1,O1; P2,O2; ...}.
Some thought should be given to the effect of serial correlation
and independence of the pairs on the significance of the test
results. For example, when studying a series of contamination
episodes in a watercourse, if the hydrology is relatively simple, dissipation of the chemical may merely reflect flow
characteristics. In this case, comparisons might be tailored to
compare the predicted and observed integrated dose for each
event, rather than comparing every measured concentration to its
predicted pair. Complete point-by-point comparisons may,
however, be entirely appropriate in a more complex hydrologic
regime or a more complex water quality setting.
Predictions can be arrived at in several ways, for example:
•  When the model parameters are also measured at times t1, t2, ..., the test could be called a "structural validation," that is, it tests the mathematical structure of the model.
•  When the model is run with mean or nominal values of the parameters (degenerate random variables, i.e., of zero variance), the test could be termed a "parameter validation." It tests the ability of the model to function adequately when its input data are subject to larger errors and approximations.
•  When the model has been designed for Monte Carlo sampling of the parameter space, parameter validation using degenerate random variables can be followed by Monte Carlo tests that incorporate stochastic effects into the model predictions. These tests could provide a statistical validation of models with explicit provision for stochasticity in their (environmental and/or chemical) input parameters.
Regardless of the type of validation test, paired-sample error
measures allow for simultaneous stochastic effects on both P
(the model predictions) and O (observations on the prototype).
Errors in parameter estimations (expressed in P) and
measurements (sampling errors in O) are both incorporated into
the test procedure. As a result, the model with the larger variance
(M2 in Figure 7) loses its apparent advantage over its more
precise cousin.
The first test that must be executed is a simple test for positive
correlation between the P and O series. (Notice that regression
techniques cannot be applied rigorously here because neither P
nor O is known precisely.) If P and O are not positively
correlated, then the mean value of O would presumably be a
better model than is the simulation program. This test can be
simply phrased as
H0: ρ = 0 vs. Ha: ρ > 0
where ρ is the population correlation coefficient, and testing is conducted at a significance level α of 0.01 in order to maximize protection from false models. (Notice that should it develop that ρ < 0, the one-sided test will tend to support H0 rather than Ha.) In
designed validation studies, models are unlikely to fail this test
because of the wide range of conditions typically selected for
testing. In studies of real-world systems, failures are more likely;
in any case correlation serves as a convenient screening test and
confirmation that continued analysis is warranted. Although
sensitivity analysis is of most value during model development,
finding significant correlation in this test also serves to confirm
that the model is sensitive to parameters that co-vary with the
observed data in the current validation study.
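A sketch of this one-sided correlation screen, using the usual t statistic for a sample correlation coefficient; scipy is assumed to be available for the t critical value, and the function is illustrative rather than part of any Agency software.

    import math
    from scipy.stats import t as t_dist

    def positively_correlated(pred, obs, alpha=0.01):
        """One-sided test of H0: rho = 0 vs. Ha: rho > 0 for paired model
        predictions and observations; returns True if H0 is rejected."""
        n = len(pred)
        mp, mo = sum(pred) / n, sum(obs) / n
        sxy = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
        sxx = sum((p - mp) ** 2 for p in pred)
        syy = sum((o - mo) ** 2 for o in obs)
        r = sxy / math.sqrt(sxx * syy)
        t_stat = r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
        return t_stat > t_dist.ppf(1.0 - alpha, n - 2)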
Implementation of the validity criterion given above requires a
measure of the observed differences D such that a situation of no
difference between the P and O gives a value of zero, increasing
to a critical value D* at a factor-of-two difference. For
parametric testing, the underlying distribution of the comparison
function D should be Normal, and the sampled distribution of D
in a specific test should be at least approximately normally
distributed. A simple ratio is not symmetric on the interval of
factor-of-two under- and over-prediction, and thus not a
candidate. One candidate transformation is the logarithmic, such
that D = log(P) - log(O). This function has the desired
properties:
at P = O,        D = 0.0
at P = 2 × O,    D = +0.301
at P = O/2,      D = -0.301
Thus, the maximum permissible bias (i.e., systematic over- or under-prediction), for a model with absolute precision (zero variance), would occur at |μD| = 0.301 = D*.
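As a minimal, purely descriptive sketch (the formal tests follow below), the error measure and the raw fraction of predictions within a factor of two can be computed as shown; the paired values are assumed to be positive concentrations.

    import math

    D_STAR = math.log10(2.0)          # 0.301, the factor-of-two bound

    def error_measure(pred, obs):
        """D = log10(P) - log10(O) for each prediction/observation pair."""
        return [math.log10(p) - math.log10(o) for p, o in zip(pred, obs)]

    def fraction_within_factor_of_two(pred, obs):
        d = error_measure(pred, obs)
        return sum(1 for x in d if abs(x) <= D_STAR) / len(d)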
The validity criterion includes a precision constraint as well,
here taken as a constraint that 95% of the errors must be within
the limits of permissible bias. This requirement leads to a
maximum permissible value for the standard deviation of D, in
addition to its maximum mean value. For a completely unbiased model (μD = 0), a 95% confidence interval on D may not exceed
the limits [-0.301, +0.301] (i.e., a factor of 2; log(2) = 0.301) in
order that the model be judged valid. For a normally distributed
random variable, 95% of the area under the curve is
encompassed by the interval -1.96 to +1.96 in z, where z is a
random variable having the Standard Normal distribution
N(z; 0, 1) (that is, z is normally distributed with mean zero and
variance 1). The corresponding points on the underlying
distribution of D (assuming it to be Normal) are the limits of
permissible bias -0.301 and +0.301 (Figure 8). In order to
compute the maximum permissible value for a, the standard
deviation of D, we need only observe that when D takes on any
value X, the corresponding standardized normal random variable
z assumes the value (X - μ)/σ. For an unbiased model (μD = 0), the critical value of the standard deviation of D at the boundary of the region of acceptable variances, say σ*, is (Figure 8):
σ* = D*/1.96 = 0.301/1.96 = 0.154
This suggests that one test that must be performed during the
validation is
H0: σ = 0.154 vs. Ha: σ < 0.154
using, for example, a χ² test with α = 0.01. Note that the test is
again phrased with a simple null hypothesis, a one-sided
composite alternative, and is designed to preserve our interest in
valid models by making it difficult to reject H0. As with the test for correlation given above, if σ > 0.154 the test will tend to support the null hypothesis and the model will fail validation.

Figure 8. Maximum values of bias (D* = ±0.301, corresponding to z = ±1.96) and dispersion (σ*) permitted of a model meeting the validation criteria.
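One standard form of such a test is the chi-square test for a single variance, sketched here under the assumption that this is the χ² test the text intends; the lower-tail rejection region follows from the one-sided alternative, and scipy is assumed for the quantile.

    from statistics import stdev
    from scipy.stats import chi2

    def precision_adequate(d_values, sigma_star=0.154, alpha=0.01):
        """Reject H0: sigma = sigma_star in favor of Ha: sigma < sigma_star
        when (n-1) S^2 / sigma_star^2 falls below the lower-tail chi-square
        critical value with n-1 degrees of freedom."""
        n = len(d_values)
        s2 = stdev(d_values) ** 2
        statistic = (n - 1) * s2 / sigma_star ** 2
        return statistic < chi2.ppf(alpha, n - 1)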
Clearly a goodness-of-fit test of the supposition that D is
normally distributed will also be required, which can be posed
in the conventional way as
H0: D is (approximately) normal, vs. Ha: D is distributed in some other way.
This test can also be conducted via a χ² test, but in this case, the goal of protecting against false models suggests that it be somewhat easier to reject H0. We can, for example, relax α to 0.10, in preference to α = 0.01, in order to continue to minimize model user's risks, in the sense that this makes it more likely that the premises upon which the analysis is constructed will be rejected. The statistic computed for this test is a value of a random variable whose distribution is approximately chi-square with (p - q - 1) degrees of freedom (d.f.), where p is the number of cells and q is the number of parameters estimated from the data. As two parameters (μ and σ) must be estimated, a minimum of four cells (groups) is required to conduct this test (with only a single remaining degree of freedom). It is customary to use this test only when none of the expected frequencies is less than five
[58:284]. This suggests that sample sizes should not be less than
twenty in order to validate normality assumptions.
Alternative tests of normality are available for smaller sample
sizes. The Shapiro-Wilk test [61] has on occasion been
recommended by EPA as a "test of choice" fornormality [11].
For this test, the n observations onD are ordered so that X>X ^,
i=2,3,...n, and the test parameter b is calculated from
n/2
where the ain are tabulated coefficients for the elements of the
sum; the values of the ain depend on the sample size n. Wn, the
test statistic, is then calculated from Wn=b2/[(n-l)s2], s1 the
sample variance of theXj. If Wnis smaller than the critical value
W,^ the null hypothesis (of normality) is rejected. A description
of the test and tables of the required coefficients and percentage
points of the test statistic are available for 2<«<50 in [62].
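In practice, tabulated coefficients are no longer needed: the Shapiro-Wilk test is available in standard statistical libraries and reports a p-value directly. A minimal sketch using scipy (assumed available), with the relaxed α = 0.10 suggested above:

    from scipy.stats import shapiro

    def normality_ok(d_values, alpha=0.10):
        """Shapiro-Wilk test of the error measure D; returns True if the
        hypothesis of normality is NOT rejected at the stated alpha."""
        w_statistic, p_value = shapiro(d_values)
        return p_value >= alpha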
The method of constructing sample sizes for controlled levels of
user's (α) and modeler's (β) risk [59] requires the specification of acceptable levels of both. In this instance, we will select both risk levels at 0.01. Clearly the appropriate values for these risks
depend on the perceived social significance of the acceptance
and use of false models, and of the rejection of true models; the
methodology does not depend on the particular values selected.
This methodology does possess the significant advantage,
however, of making the risks explicit.
Combining bias and dispersion in the model performance
criterion requires specification of the least effect that must be
detectable in the analysis (δ). Because σ is unknown in paired-sample analyses, when calculating the minimum required sample size δ must be phrased as a function of standard deviation (S.D.,
here the S.D. of the error measure D). This value of the S.D. can
be derived by considering the fact that all real models possess
some bias, even if it is not detectable (i.e., not "statistically
significant"). It should be emphasized that the simple presence
of detectable bias is not of any particular interest in a validation
study because its presence can be conceded from the outset, and
sufficient testing is certain ultimately to detect it. The points of
concern are, first, whether the accuracy and precision of the
model are sufficient to meet the validity criterion (here that 95%
of predictions be within a factor of two of reality), and, second,
the method of translating this criterion into an objective test that
is based on the bias and dispersion of the error measure.
The smallest interval of interest (the least detectable effect) will
occur when sufficient bias is present to allocate all violations of
the validity criterion to one side or the other of the error
distribution (Figure 9): this value is necessarily smaller than any
instance in which the permitted 5% of predictions larger than the
permissible (factor-of-two) standard are occurring as both under-
and over-predictions. The left side of Figure 9 depicts a case in
which the model is producing predictions somewhat smaller than the observations on the prototype, that is, D is less than zero (the argument is symmetrical for D > 0). Assuming that σ < σ* (which
assumption will be validated in a separate test), we need concern
ourselves only with the left-hand side of the distribution of D.
The true value of D must be greater than -D* by an amount
sufficient that only 5% of the total area under the curve is to the
left of -D*. In the standard normal distribution, this (single-
sided) value corresponds to z = -1.645, and this is the interval δ which must be detectable in the validation study. (The final test will be posed in terms of single-sided alternatives, with selection of the appropriate test predicated on whether the estimated mean value of D is less than, or greater than, zero.) As was done above for determining the maximum permissible value of σ (σ* = 0.154), the value of δ can be computed by taking advantage of the relationship between any normally distributed random variable X and the Standard Normal variate z: z = (X - μ)/σ. As z = 1.645, and (X - μ) is the distance D* - μD = δ, the difference that must be detectable in the study, substitution in this formula leads to δ = (D* - μD) = 1.645 σ.
The formula for computation of the required minimum sample
size [59] is
N = (Uα + Uβ)² σ² / δ²
where Uα and Uβ are the user's and modeler's risk points on the normal probability curve; here both have a (single-sided) value of z0.01 = 2.326 [59:325]. Substituting δ = 1.645 σ yields a value of N = 7.997. As the true value of σ is actually unknown, this estimate of N is too low. When σ² is unknown, validity tests must be based on the (single-sided) t distribution rather than the Normal distribution, and the equation for N must be revised to read
N1 = (tα + tβ)² σ² / δ²
Taking N - 1 = 6.997 degrees of freedom, the probability point on the (single-sided) t distribution for a 0.01 level of risk is 3.0 [59:326], and
N1 = (3 + 3)² / (1.645)² = 13.3
Therefore, the collection of 14 samples will control both user's and modeler's risk at a level of 0.01, for a normally distributed error measure having a standard deviation σ < 0.154. (Recall, however, that a minimum of 20 samples is required for a χ² test of the normality of the error measure; the Shapiro-Wilk test is available for N = 14.)
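The two-step calculation above can be written out as a short sketch; scipy is assumed for the normal and t quantiles, and the rounding-up step is the only addition beyond the text's arithmetic.

    import math
    from scipy.stats import norm, t as t_dist

    def required_sample_size(alpha=0.01, beta=0.01, delta_in_sd=1.645):
        """Two-step estimate following the text: a Normal-theory first
        estimate of N, then one refinement using the t distribution with
        the first estimate's N - 1 degrees of freedom."""
        u = norm.ppf(1.0 - alpha) + norm.ppf(1.0 - beta)   # 2.326 + 2.326
        n0 = (u / delta_in_sd) ** 2                        # about 8.0
        tt = t_dist.ppf(1.0 - alpha, n0 - 1) + t_dist.ppf(1.0 - beta, n0 - 1)
        n1 = (tt / delta_in_sd) ** 2                       # about 13.3
        return math.ceil(n1)

    print(required_sample_size())                          # 14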
Figure 9. Definition sketch for formulation of objective validation tests (5% of the area of the error distribution lies to the left of -D*).
The actual single-sided tests on the error measure can now be
specified. These tests incorporate both the accuracy and the
precision requirements of the validity criterion imposed on the
model, with the selection of the appropriate test dependent on
the sign of the sample mean of the error measure D (Figure 9).
In order to phrase the final test as a simple (non-composite) null
hypothesis, we will specify that the model will be rejected as
invalid if it is at (or, by implication, beyond) the boundary of the
validity criterion. This is a slightly more restrictive test than the
original criterion, but its advantages in posing the null
hypothesis for risk control greatly outweigh any loss of latitude
in acceptable model performance. (The criterion was not initially
specified in this form because of the convoluted language
required: It now reads "the model must be closer to reality than
a factor of two, at least slightly more than 95% of the time.")
The appropriate hypotheses are, for D̄ > 0,
H0: μD = (D* - δ), with the alternate Ha: μD < (D* - δ)
and, for D̄ < 0,
H0: μD = (-D* + δ), with the alternate Ha: μD > (-D* + δ)
where δ, the least significant difference, is now computed from the sample standard deviation S of D, and the t distribution at a ("single-sided") probability point of 0.05 (if based on a validity criterion of 95% precision) with n - 1 degrees of freedom:
δ = t(0.05, n-1) S
The null hypothesis is then tested, for D̄ > 0, by comparing the value from the single-sided t distribution (tc = t(α, n-1)), α the level of user's risk, to the sample t statistic computed from
t = (D̄ - (D* - δ)) √n / S
and H0 is rejected if t < -tc. When D̄ < 0, the sample t statistic is computed from
t = (D̄ - (-D* + δ)) √n / S
and H0 is rejected if t > tc.
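A sketch of this final test, following the reconstruction above; the √n scaling is the standard paired-sample form and, like the rejection direction for a positive mean error, is an assumption where the printed equations were illegible. scipy supplies the t quantiles.

    import math
    from statistics import mean, stdev
    from scipy.stats import t as t_dist

    D_STAR = math.log10(2.0)                    # 0.301, factor-of-two bound

    def validity_t_test(d_values, alpha=0.01, precision=0.95):
        """Single-sided test of H0 (model at the boundary of invalidity)
        vs. Ha (model meets the accuracy/precision criterion); returns
        True when H0 is rejected, i.e., the model is 'not invalid'."""
        n = len(d_values)
        d_bar, s = mean(d_values), stdev(d_values)
        delta = t_dist.ppf(precision, n - 1) * s      # least significant difference
        t_crit = t_dist.ppf(1.0 - alpha, n - 1)       # user's-risk critical point
        if d_bar >= 0.0:
            t_stat = (d_bar - (D_STAR - delta)) * math.sqrt(n) / s
            return t_stat < -t_crit
        t_stat = (d_bar - (-D_STAR + delta)) * math.sqrt(n) / s
        return t_stat > t_crit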
A Step-Wise Procedure for Performance Testing
The steps required for a validation study can now be
enumerated:
Step 1: Select appropriate validity criteria and establish
acceptable levels of user's and modeler's risk. (Except as
otherwise noted, upon completion of Step 2 the level of user's
risk selected in Step 1 should be used as the a risk for all
subsequent analyses.) As here illustrated, one choice for
ecotoxicological models is to specify that the model predictions
and the observed data must differ by no more than a factor of
two, at least 95% of the time, with the level of user's risk
controlled at 1%, i.e., no more than one chance in a hundred of
accepting an invalid model. In the example of DMDE photolysis
in EXAMS (below), in which a constituent hypothesis of the
model is being tested, the acceptable difference is set at a factor
of 2.0×, at least 99% of the time, with a 1% user's risk of the
model being falsely accepted as meeting the criterion.
Step 2: Develop an appropriate error measure for
H0: the model is invalid-it cannot meet an acceptable range
of accuracy and precision in the current context, vs.
Ha: the model is valid (acceptable), at least in this instance.
Step 3: Determine the minimum sample size requirements
needed to control risks at acceptable levels and to provide
sufficient data for tests of underlying assumptions.
Step 4: Collect point-for-point sample data on the model output
and an experimental unit (prototype), producing a set of paired
samples of simulation model predictions and observations on
equivalent properties of the prototype system.
Step 5: Test the paired data for significant correlation; reject the
model if they are not positively correlated.
Step 6: Compute the dataset for the error measure D, and test the
hypothesis that D is approximately normal (via a χ² or other test
at a level of significance suitably less restrictive than the a risk
level in use to control user's risks). If the null hypothesis is
rejected, the analysis is invalid and will at the least require
reformulation.
Step 7: Test the null hypothesis
Hg: a = a", the model is unacceptably imprecise, vs. the one-
sided composite alternative
Ha: a < a', the model precision is adequate, where a* is the
critical maximum permissible variance derived from the
validity criterion imposed in Step 1.
Step 8: Test the full validity of the simulation model by
comparison of the computed value of the t statistic to the appropriate (α, ν) point on the t distribution, ν the degrees of
freedom.
The decision matrix is given in Table 5; note the allocation of model user's risk to α error. If H0 is rejected then the model can be concluded to be "not invalid" (at least in this instance) with certainty of at least (1-α), where α is the level of model user's risk selected in Step 1. In the DMDE photolysis example presented here, if |t| > |tc|, the model can be accepted as having
passed this particular performance test, with 99% certainty.
A Substantial Example: Photolysis of DMDE in EXAMS
The Exposure Analysis Modeling System EXAMS [4] contains
algorithms for computing radiative transfer in the atmosphere as
a function of location and climate, and for coupling the
absorption spectrum and quantum yield of a synthetic chemical
to computed spectral irradiance in order to arrive at an estimate
of the rate of photochemical reactions in the aqueous
environment. Zepp [personal communication; 63] has examined
the clear-sky behavior of 1,1-bis(p-methoxyphenyl)-2,2-dichloroethylene (DMDE) at solar noon, collecting 20 independent samples over the course of the calendar year. These
experiments provide an opportunity to evaluate EXAMS'
constituent hypotheses governing direct photolysis kinetics,
within the context of the complete model system.
The EXAMS program calculates rate constants based on mean
whole-day (i.e., over the 24-hour period) conditions on the day
of the month when the solar declination results in the monthly
mean value of the incoming extra-terrestrial irradiance [64:62].
For testing, EXAMS was loaded with the declination and radius
vector of the Earth on the specific dates at which the
observations O were taken. To form a commensurate dataset, the
rate constant k computed by EXAMS must be cosine corrected for having been averaged over the course of a day, and then adjusted for day length:

    k (noon value) = (daily value) × (π/2) × (24/day length)
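A small sketch of this correction (illustrative only; the function name is an assumption):

    import math

    def noon_rate_constant(k_daily_mean, day_length_hours):
        """Convert a whole-day (24-h) mean photolysis rate constant to a solar-noon
        value: rescale to the daylight period, then apply the (pi/2) cosine peak factor."""
        return k_daily_mean * (24.0 / day_length_hours) * (math.pi / 2.0)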
The parameter robustness of the model was investigated by loading the code with standard values of its environmental input parameters. Monthly mean values of stratospheric ozone were loaded automatically via specification of the latitude (33.94°) and longitude (-83.32°) of the Athens, Georgia (USA) laboratory at which the observations were collected. (See page 30 for a description of EXAMS' internal dataset, a compilation of data from the TOMS (Total Ozone Mapping Spectrometer) instrument flown on the Nimbus-7 spacecraft). EXAMS' input monthly atmospheric turbidities [65] were taken as a constant 2 km (the default monthly values) using a "Rural" atmospheric type. These data thus constitute a fairly rigorous test of the model's ability to withstand reasonable levels of error and approximation in its input parameters, and represent typical use in a regulatory environment in which the default data are routinely employed.

Table 5. Decision matrix for unambiguous validation testing of environmental models

                                        and the decision that can be made is:
If the truth in fact is:                Accept H0 - reject the             Reject H0 - accept the
                                        model as "invalid"                 model as "valid"

H0: Dispersion or differences of        Correct decision: P = (1-α)        Wrong: α error (P = α)
model too large - model is thus         Consequence: continue model        Consequence: accept bad model,
"invalid"                               development                        make poor decisions

vs. Ha: model has acceptable            Wrong: β error (P = β)             Correct decision: P = (1-β)
precision and accuracy, and can         Consequence: wasted effort         Consequence: use "valid" model
be accepted as "valid"                  fixing valid model                 to improve safety
For testing a constituent process algorithm of EXAMS, the
acceptable validity criterion, i.e., the uncertainty adhering to a
point estimate from the model, was taken as "model and
observation must differ by less than a factor of two, at least
slightly better than 99% of the time, with 99% certainty." Putting
Step 1 in another way, passing the test will serve to certify, with
less than 1% chance of error, that 99% of the estimates are
within a factor of two of equivalent experimental observations.
(Note that, so long as (kt), the product of the decay rate constant
and time, is greater than about 4, statements concerning half-lives are equivalent
to within about 2% to statements concerning relative exposure
concentrations.)
For Step 2, D is taken as log(P) - log(O) and D*, the limit of acceptable difference, is log(2.0). The criterion of 99% of differences within 2.0× results in a maximum permissible standard deviation (σ*) of log(2.0)/2.576 = 0.301/2.576 = 0.1168.
(The area under the standard normal between -2.576 and +2.576
encloses 99% of the total.)
For Step 3, calculation of the minimum number of pairs required
to achieve 0.01 α and β risks, the least detectable effect δ is based on the point on the normal probability curve for which 99% of the curve lies to the right of D* (see Figure 9, here using 0.01, rather than 0.05, as the area outside D*). The ("single-sided") value of z(0.01) = 2.326, so δ = 2.326σ. The minimum sample size to achieve a 1% Type I user's and Type II modeler's risk is (3+3)²/(2.326)² = 7.
Step 4. Table 6 gives observed half-lives, EXAMS' predicted
values, and the computed differences (D = log(P) - log (O)).
Step 5. Correlation analysis to validate the assumption that the simulation model is a better predictor of the observations than is the sample mean. Correlation of the O and P datasets in Table 6 yields a value of the sample correlation coefficient r of 0.93. In testing the null hypothesis (H0: ρ = 0) vs. the one-sided alternative (Ha: ρ > 0) at an α = 0.01 level of significance, the computed value of the z statistic must be greater than the single-sided z(0.01) = 2.326. Computing the z statistic associated with this value of r in the usual way [58:311] gives

    z = r √(n-2) / √(1 - r²)

which, for r = 0.93 and n = 20, far exceeds the critical value of 2.326.
The null hypothesis can thus be safely rejected, leading to the
conclusion that the observations and model predictions are
indeed positively correlated and the analysis may proceed.
Step 6. Test the (approximate) normality of the distribution of
the error measure D. This step tests the null hypothesis
H0: D is approximately normal,
vs. Ha: D is not normally distributed and the
analysis must be reformulated.
To execute the χ² test, the sample mean and standard deviation
are used to predict the frequency distribution of the observations
D on the pairs. These expected values are then compared to the
actual distribution of the sample differences via a χ² "goodness-of-fit" test. The test can be conducted by partitioning the
observed data into the required minimum of four cells (to have
at least one d.f.), each of which meets the requirement of
expected frequency >5, by constructing four cells of equal
frequency centered on the sample mean (-0.0032). The value of
z in the Standard Normal (0.675) giving the 25% points was
converted to matching intervals on the distribution of D, assuming D to be normally distributed with mean -0.0032 and
standard deviation 0.0707. This has the effect of partitioning D
into 4 cells with expected frequency of 5 values of D in each
cell. The 20 observations in Table 6 were then assigned to the
matching cells in Table 7.
From the results shown in Table 7, the χ² statistic is computed in the usual way as the sum of the (O-E)²/E, giving in this case a value of 0 (zero). This value is less than the (0.10, 1) critical value of χ² (2.70), so the null hypothesis that the distribution of D is approximately normal cannot be rejected. Execution of a Shapiro-Wilk test on the ordered set of D yields W = 0.9659. As 0.9659 > W(20, 0.10) = 0.920, the null hypothesis that D has a normal
distribution again cannot be rejected and the analysis may
proceed.
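A compact sketch of these normality checks (illustrative; scipy and the helper name are assumptions, and the quartile partition follows the text):

    import numpy as np
    from scipy import stats

    def normality_checks(d, alpha_chi2=0.10):
        """Chi-square goodness-of-fit on four equal-probability cells of N(mean, s),
        plus a Shapiro-Wilk test, for the error measure D."""
        d = np.asarray(d, dtype=float)
        n, mu, s = d.size, d.mean(), d.std(ddof=1)
        edges = stats.norm.ppf([0.25, 0.50, 0.75], loc=mu, scale=s)  # 25/50/75% points
        observed = np.bincount(np.searchsorted(edges, d), minlength=4)
        expected = np.full(4, n / 4.0)
        chi2 = ((observed - expected) ** 2 / expected).sum()
        chi2_crit = stats.chi2.ppf(1.0 - alpha_chi2, df=1)  # 4 cells - 2 fitted moments - 1
        w_stat, w_p = stats.shapiro(d)
        return chi2, chi2_crit, w_stat, w_p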
Table 6. Observed and predicted EXAMS mid-day half-lives of DMDE at Athens, Georgia.

            Ozone (O3)¹     Mid-day Half-lives (hours)     Difference
Date        cm NTP          Observed        Model          metric (D)
2 Jan       0.282           4.41            3.35           -0.1194
23 Jan      0.282           3.38            2.69           -0.0992
31 Jan      0.282           2.30            2.39            0.0167
19 Feb      0.296           1.67            1.81            0.0350
24 Apr      0.316           1.36            1.03           -0.1207
21 Jun      0.312           0.96            0.93           -0.0138
23 Jun      0.312           1.02            0.93           -0.0401
25 Aug      0.288           1.06            1.05           -0.0041
26 Sep      0.274           1.32            1.30           -0.0066
30 Sep      0.274           1.55            1.36           -0.0568
2 Oct       0.264           1.20            1.30            0.0348
31 Oct      0.264           1.82            2.02            0.0453
1 Nov       0.268           1.40            1.89            0.1303
1 Nov       0.268           1.45            1.89            0.1151
19 Nov      0.268           2.35            2.50            0.0269
20 Nov      0.268           2.24            2.51            0.0494
3 Dec       0.278           3.00            2.99           -0.0015
4 Dec       0.278           2.60            3.01            0.0636
12 Dec      0.278           3.40            3.19           -0.0277
22 Dec      0.278           4.05            3.28           -0.0916

Sample mean (arithmetic)    2.13            2.07           -0.0032
Sample standard
deviation (S)               1.04            0.84            0.0707

¹ Monthly mean O3 from Nimbus-7 TOMS.
Step 7. This step tests the sample standard deviation for
exceedence of the critical value specified in the original validity
criterion. The hypotheses are phrased as
    H0: σ = σ* = 0.1168, vs. Ha: σ < 0.1168
In this test, the null hypothesis can be rejected if the computed χ² statistic is less than the value of χ² for the (1-α) level of significance with n-1 (ν = 19) degrees of freedom (7.633). The sample statistic is
    χ² = (n-1)S²/σ*² = (19)(0.0707)²/(0.1168)² = 6.96
and, as 6.96 < 7.633, it can be concluded with 99% confidence
that the precision of the model is adequate.
Step 8: Test the validity of the model, given criteria for
acceptable accuracy and precision. Because the sample mean of
the error observations is less than zero, the appropriate null
hypothesis is
    H0: μD = (-D* + δ) vs. Ha: μD > (-D* + δ)
In this case, D* = 0.301, and, selecting the appropriate point on the t distribution to detect a 1% (one-sided) error rate,
    δ = t(0.01, 19) × S/√n = (2.539)(0.0707)/√20 = 0.0402
The (single-sided) critical value tc for 19 degrees of freedom (ν) at the α = 0.01 level of significance is 2.539. The null hypothesis can be rejected if the computed t statistic is > 2.539.
The computed value is
    t = (D̄ - (-D* + δ)) / (S/√n) = (-0.0032 - (-0.301 + 0.0402)) / (0.0707/√20) = 16.3
As 16.3 > tc (= 2.539), the null hypothesis can be rejected, and the simulation model can, with 99% confidence, be accepted as satisfying the performance criterion: better than 99% of EXAMS' estimates differ by no more than 2× from experimental observations of clear-sky photolysis of DMDE.

Table 7. χ² test of normality of error measure D

Intervals of 25% on N(D; -0.0032, 0.0707)    Obs Counts    Expected Prop    Exp Counts
< -0.0509                                    5             0.25             5
-0.0509 to -0.0032                           5             0.25             5
-0.0032 to +0.0445                           5             0.25             5
> +0.0445                                    5             0.25             5
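The worked numbers of Steps 7 and 8 can be reproduced from the Table 6 error measures; the following sketch is illustrative only (scipy is an assumption of the sketch, not a tool used in the report):

    import numpy as np
    from scipy import stats

    # Error measure D = log(P) - log(O) for the 20 sampling dates in Table 6
    D = np.array([-0.1194, -0.0992, 0.0167, 0.0350, -0.1207, -0.0138, -0.0401,
                  -0.0041, -0.0066, -0.0568, 0.0348, 0.0453, 0.1303, 0.1151,
                  0.0269, 0.0494, -0.0015, 0.0636, -0.0277, -0.0916])
    n, d_bar, s = D.size, D.mean(), D.std(ddof=1)            # 20, about -0.0032, about 0.0707
    d_star, sigma_star, alpha = np.log10(2.0), np.log10(2.0) / 2.576, 0.01

    # Step 7: precision test; reject H0 if the chi-square statistic falls below
    # the lower 1% point of chi-square with n-1 degrees of freedom
    chi2_stat = (n - 1) * s**2 / sigma_star**2               # about 6.96
    chi2_crit = stats.chi2.ppf(alpha, df=n - 1)              # about 7.63

    # Step 8: one-sided t test of bias against the factor-of-two bound
    tc = stats.t.ppf(1.0 - alpha, df=n - 1)                  # about 2.539
    delta = tc * s / np.sqrt(n)                              # about 0.0402
    t_stat = (d_bar - (-d_star + delta)) / (s / np.sqrt(n))  # about 16.3
    model_accepted = (chi2_stat < chi2_crit) and (t_stat > tc)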
Descriptive Statistics and Predictive Uncertainty
The uncertainty attaching to any point estimate of near-surface
photolytic half-lives can be developed directly from the error
measure D in Table 6. D has a mean value of -0.0032 and
standard deviation 0.0707, giving a 99% confidence interval of
-0.0032±0.0452. This interval encompasses uncertainties in the
model, uncertainties in the measurement of light absorption
spectra and quantum yields, and uncertainties in the
experimental procedure used to measure DMDE photolysis.
Because all these contribute to the uncertainty in final exposure
estimates, it is appropriate to combine them into a single
variance estimate.
The initial question was phrased in terms of exposure
concentrations rather than the estimates of photolytic half-life
that were used to test an internal constituent hypothesis of the
EXAMS model. These specific results can, however, be used to
project the uncertainty in photolysis kinetics onto the model
exposure estimates used for risk assessment. EXAMS produces,
as part of its standard outputs, estimates of acute and chronic
mean exposure concentrations. Granting the average difference
between the experimental and computed half-lives as deriving
from model bias (rather than experimental error), the statistical
properties of D can be translated into variation in predicted
exposure concentrations via transformation of the distribution of
half-lives into photolytic decay rate constants, and integration of
the resulting chemical decay curves. By positing an example
(albeit unrealistically simplified) scenario with an available
analytical solution, the case can also be used to "verify" sensu
[37] or "bench-mark" sensu [20] the numerical algorithms used
in EXAMS to compute the time course of pesticide dissipation and
to arrive at ecotoxicological exposure estimates.
For purposes of concrete illustration, consider the example of a chemical with an observed ("true") half-life of exactly 2 days, giving a disappearance rate constant k of 0.3466 d⁻¹. For a simple dissipation of DMDE from clean, shallow water, the average value over t days is (C0 - Ct)/(kt), where C0 is the initial concentration and Ct the concentration at time t. Taking the
initial concentration as 1 mg/L, the "true" concentration after
four days would be 0.25 mg/L, and the average (the 96-hour
"acute exposure" value) would be 0.541 mg/L. Similarly, at the
expiration of 21 days the average concentration (the "chronic"
value) would be 0.137 mg/L. Evaluation of the predictive
uncertainty of the model, in terms of the original requirement
that exposure estimates be within a factor of two of reality at
least 99% of the time, can now be accomplished. Examination of
Table 8 shows that EXAMS' estimates of acute (4-day) and
chronic (21-day, 60-day, 90-day, and annual) exposures are well
within the factor-of-two criterion at the end points of the 99%
confidence limit on D (expressed as half-lives), thus confirming
that the original requirement stated for testing predictions from
the model has been satisfied.
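As an arithmetic check on these figures (a sketch under the simplified first-order dissipation assumption stated above, not a run of EXAMS itself):

    from math import log, exp

    def average_concentration(half_life_days, t_days, c0=1.0):
        """Average concentration over t days of simple first-order dissipation:
        (C0 - Ct)/(kt), with k = ln(2)/half-life and Ct = C0*exp(-kt)."""
        k = log(2.0) / half_life_days
        return c0 * (1.0 - exp(-k * t_days)) / (k * t_days)

    acute_96h = average_concentration(2.0, 4.0)     # about 0.541 mg/L for C0 = 1 mg/L
    chronic_21d = average_concentration(2.0, 21.0)  # about 0.137 mg/L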
Given its restricted range of process chemistry, this example
validation is an illustration of the fact that global validation can
only be accomplished through a series of specific validations,
each of which is critically dependent on some subset of the
constituent hypotheses of the model. In this instance, however,
it has been unambiguously demonstrated that the EXAMS code
can compute clear-sky radiative transfer and near-surface
photolysis of synthetic chemicals, even under severe conditions
of parameter approximation. Although EXAMS has many
additional capabilities, each deserving specific validation, every
instance of successful (objective, risk-controlled) validation adds
to the confidence that can be placed in this simulation model, via
its incremental Bayesian reduction in global model user's risk.
Specifically, an accumulation of tests formulated to demonstrate,
with 99% certainty, that the model is producing estimates within
a factor of two of observed values more than 95% of the time,
could be used to develop and support a variance or expected
method error or intrinsic uncertainty for model point estimates
used in probabilistic risk assessment.
Predictive Validity of Exposure Models
The quality and reliability of a model becomes established
through a variety of tests. Internal tests of the quality of model
construction are a necessary, but not a sufficient, condition for
establishing a model's value as a regulatory tool: testing the
strength of the welds in an automobile frame does not eliminate
the need for full-scale crash tests. The only appropriate tests of
a model's reliability as a predictive tool for regulatory decision
("validity") are those conducted by comparison of full model
capabilities with independent experimental and field
observations of model output quantities. Even so, the
conclusions drawn must be qualified by the observation that any
given test can only exercise a subset of the constituent
hypotheses and computational techniques of the model.
Objective validations can only be conducted when the criteria
for validity are objectively specified. Because the social
consequences of accepting false models (inadequate chemical
safety regulations) are much more serious than the consequences
of rejecting true models (continued research and validation
studies), model validations should always be phrased to test the
null hypothesis that "the model is invalid." As only a single
experimental unit is available for each measurement, paired
sample techniques must be used to create a comparison metric D.
When the underlying distribution of D can be identified, the
tools of parametric hypothesis testing and statistical decision
theory become available to the analyst, as in this validation of
the EXAMS radiative transfer and direct photolysis algorithms.
These tools allow for preliminary experimental designs
guaranteed to control user's and modeler's risks at acceptable
levels. In addition, the usual tools of statistical inference
(confidence limits, etc.) can be used to compare models, to
compute the validity properties (bias and dispersion) exhibited
by a simulation model when it is subjected to particular
experimental conditions, and to contribute to estimates of
prediction uncertainty. Even when non-parametric tests must be
employed, however, the adverse consequences of using false models mandate that model user's risks be assigned to Type I errors by proper formulation of the null hypothesis as "this model is invalid" until proven otherwise.

Table 8. Prediction uncertainty for rapid photolytic dissipation of DMDE from surface water

                                      Lower 99%    "True" Value    "True" Value    Upper 99%
Half-Life (days)                      1.789        2.000           2.000           2.203
Source                                EXAMS        Analytical      EXAMS           EXAMS
Acute (96-hour) Exposure (mg/L)       0.515        0.541           0.546           0.574
Chronic (21-day) Exposure (mg/L)      0.124        0.137           0.139           0.152
Average 60-day Exposure (mg/L)        0.0436       0.0481          0.0486          0.0534
Average 90-day Exposure (mg/L)        0.0290       0.0321          0.0324          0.0356
Average Annual Exposure (mg/L)        0.00716      0.00791         0.00798         0.00878
It should also be remarked that this analysis has been formulated
from an implicit view that the task of the model is to provide an
adequate mimic of experimental results. In fact, what is sought
is concordance between two views, the experimental and the
mathematical, of a single natural object, in this instance chemical transformation in the presence of sunlight. Discord could accrue
from errors in the mathematics, errors in the conduct of the
experiments and measurements, or from errors in underlying
physical and chemical conceptualizations. The similarity
between complex numerical models and scientific theory is
apparent. The continual attention to and repair of flaws
discovered in numerical models differs little in motivation or
technique from the more general development process of
scientific theory.
Neither concordance nor dissonance between model and data
should be taken as definitive, however, for notorious instances
of the failure of each can be found in the history of science [14].
Astronomers of the 16th century, in response to Copernicus'
heliocentric theory, searched for apparent movement of the stars
("stellar parallax") and, finding none, rejected Copernicus'
theory based on the failure of observation to confirm theory.
Today we realize that contemporary telescopes were inadequate
to the task, and the failure was one of observation rather than
one of theory. The wiser course, when confronted with a failure
of observation to conform to theory, is often to reserve
judgement.
In the 19th century, Lord Kelvin, working from well-established
physical principles, calculated the age of the Earth from the rate
of cooling of an initially molten sphere and concluded that
Lyell's geology and Darwin's evolution by natural selection
were invalid because Earth's history contained insufficient time
for Uniformitarian processes to have brought about the observed
world. This view held sway for a generation, and it was not until
the discovery of radiogenic heat that evolutionary biology and
geology began to regain their lost ground. The wiser course,
when confronted with a failure of theory to conform to
observation, is often to reserve judgement.
Finally, it should also be observed that 16th-century astronomers
had a competing theory of the heavens at hand in the Ptolemaic
view of the universe. Had that theory been computerized, its
proponents would have been well-justified in terming it a
"validated" model, with myriad instances of concordance
between model prediction and astronomical observation. The
wiser course, when confronted with a concordance of
observation and theory, is often to proceed with regulatory
action while always remembering the limits of scientific
knowledge and retaining a willingness to revisit regulatory
decisions in response to new discoveries and an evolving
paradigm of risk assessment.
Current Model Validation Status
AgDisp/AgDrift
A systematic evaluation of the AgDisp algorithms, which
simulate off-site drift and deposition of aerially applied
pesticides, contained in the AgDRIFT® model [66] was
performed by comparing model simulations to field trial data
collected by the Spray Drift Task Force [67]. Field trial data
used for model evaluation included 161 separate trials of typical
agriculture aerial applications under a wide range of application
and meteorological conditions. Input for model simulations
included information on the aircraft and spray equipment, spray
material, meteorology, and site geometry. The model input data
sets were generated independently of the field deposition results
- i.e., model inputs were in no way altered or selected to
improve the fit of model output to field results. AgDisp shows
a response similar to that of the field observations for many
application variables (e.g., droplet size, application height, wind
speed). AgDisp is, however, sensitive to evaporative effects, and
modeled deposition in the far field responds to wet bulb
depression although the field observations did not. The model
tended to over-predict deposition rates relative to the field data
for far-field distances, particularly under evaporative conditions.
AgDisp was in good agreement with field results for estimating
near-field buffer zones needed to manage human, crop,
livestock, and ecological exposure.
Pesticide Root Zone Model (PRZM)
The FIFRA Environmental Model Validation Task Force
(FEMVTF), a collaborative effort of scientists from the crop
protection industry and the U.S. Environmental Protection
Agency, compared the results of PRZM predictions with
measured data collected in 18 different leaching and runoff field
studies as part of a process to improve confidence in the results
of regulatory modeling; the following discussion is drawn from
their report [68].
In its initial phase, the Model Validation Project reviewed
existing, published studies on the validation of PRZM. The
primary purpose of this literature review was to assess the
quality and quantity of existing information to determine
whether additional model validation studies were needed. A
second purpose of the literature review was to collect
information that would be useful in planning future model
validation studies. The report [68] summarizes both aspects of
this literature review and presents the reasons why the FIFRA
Exposure Modeling Work Group concluded that more validation
research would be useful in improving confidence in models
used in regulatory assessments.
The literature search identified 35 articles involving the
calibration/validation of model simulations with measured data.
These studies included data from seven countries on three
continents and a number of different compounds. Due to the
varied nature of the papers and the lack of details for both model
predictions and measured results, a detailed comparison of
model predictions to observed data proved impossible. The
majority of the papers indicated good agreement between model
predictions and measurements, or that the models generally
predicted more movement than actually occurred. These results,
given the wide range of conditions reported in the papers, were
taken by FEMVTF to lend general support to the use of PRZM in
the regulatory process, especially for predicting leaching.
Following review of this literature, the FIFRA Exposure Modeling
Work Group decided that additional comparisons of field data
and model predictions would be useful to supplement existing
studies in helping improve confidence in the regulatory use of
environmental models for predicting leaching and runoff. The
following observations contributed to this decision:
• None of the published studies used the current version (3) of
the model (this is especially important in that PRZM runoff
routines have significantly evolved in version 3).
• Very few of the studies focused on runoff losses (most studies
focused on the mobility of crop protection products in the
soil profile).
• The number of studies having quantitative validation results is
minimal. Since few of the published studies consider model
validation the primary purpose of the field experiments,
often data sets were not as extensive as would be desirable
for model validation.
• Modelers were aware of field results in most of the studies
(although in some of the studies where the field results were
known, modelers claimed to make no adjustments to the
input parameters). Therefore, in these studies the
comparisons of model predictions and experimental
measurements could be considered calibration - in model
validation the modeler should have no knowledge of the
field results to prevent biasing the selection of input
parameters.
The Task Force report concludes that PRZM provides a
reasonable estimate of chemical runoff at the edge of a field.
Simulations based on the best choices for input parameters (no
conservatism built into input parameters) were generally within
an order of magnitude of measured data, with better agreement
observed both for larger events and for cumulative values over
the study period. When the model input parameters were
calibrated to improve the hydrology, the fit between predicted
values and observed data improved (results usually within a
factor of three). When conservatism was deliberately introduced
into the input pesticide parameters, substantial over-prediction
of runoff losses occurred.
Simulations with PRZM obtained reasonable estimates of
leaching in homogeneous soils where preferential flow was not
significant. PRZM usually did a good job of predicting movement
of bromide in soil (soil pore water concentrations were generally
within a factor of two of predicted values). For simulations
based on the best choices for input parameters (no conservatism
built into input parameters), predictions of soil pore water
concentrations for pesticides were usually within a factor of
three. This was about a factor of two closer than when
conservative assumptions were used to define input pesticide
parameters. When the model input parameters were calibrated to
improve the hydrology, predicted pesticide concentrations were
usually within a factor of two of measured concentrations.
Because of the sensitivity of leaching to degradation rate, the
best predictions were obtained with pesticides with relatively
slow degradation rates.
Differences in initial work conducted by different analysts
demonstrated the importance of having a "standard operating
procedure" to define the selection of all model input parameters.
The FEMVTF concluded that the most satisfactory way to
implement regulatory modeling is through the development of a
"shell" that could provide all input parameters related to the
scenario, with the user providing only the parameters related to
the specific pesticide being assessed.
Exposure Analysis Modeling System (EXAMS)
Validation studies of EXAMS have been conducted, inter alia, in
the Monongahela River, USA [69], an outdoor pond in Germany
[70], a bay (Norrsundet Bay) on the east coast of Sweden [71,
72], Japanese rice paddies [73], and rivers in the UK [74] and in
South Dakota [75].
In the Monongahela River study [69], model predictions of
phenol concentration downstream from a steel mill effluent were
compared to ambient data. Agreement between observed levels
of phenol and model predictions was best when the
concentration of oxidizing species was treated as a reach-
specific calibration parameter, although satisfactory agreement
was evident using a single value of 10⁻⁸ M.
The dyestuff Disperse Yellow (DY 42, C.I. No. 10338) was
introduced into an outdoor pond [70]. The model was judged to
show "good agreement" with the measured behavior of the dye,
but the published concentration time-series clearly indicates that
the exchange rate used in the authors' pond scenario
underestimated the velocity of capture of the dye by benthic
sediments.
Norrsundet Bay is heavily polluted with kraft mill effluent [71,
72]. The model scenario was calibrated using data on chloroform
in wastewater and seawater, and then tested on four other
pollutants present in the wastewater (2,4,6-trichlorophenol,
3,4,5-trichloroguaiacol, tetra-chloroguaiacol, and
tetrachlorocatechol). The results were judged "satisfactory" for
three of the test compounds; failures with tetrachlorocatechol
were suspected to arise from this compound's high affinity for
suspended sediment.
In studies of the dissipation of sulfonylurea herbicides in rice
paddy [73], model scenarios were calibrated against paddy water
dissipation data from two 1-m2 simulated paddies to obtain
values for the benthic exchange coefficient and oxidative radical
concentrations. The EXAMS scenario then successfully predicted
the partitioning and degradation that led to half-lives in paddy
water of 3 - 4 days observed in field studies conducted in Japan.
River studies in the UK [74] compared model predictions to
downstream losses of styrene, xylenes, dichloro-benzenes, and
4-phenyldodecane in treatment plant effluent. Quantitative
predictions compared well with observed values for those
compounds for which reliable environmental rate data were
available. Rapid losses of 4-phenyldodecane were observed in
the river, but there were not sufficient data on the postulated loss
mechanism (indirect photolysis) to derive input parameters for
the model. Fate and transport of an anionic surfactant were
studied in Rapid Creek, South Dakota [75]. Laboratory limnetic
and benthic biolysis constants were used, and benthic exchange
was calibrated to field data. EXAMS predictions agreed with
observed water column and sediment concentrations to within
factors of ±2 and ±4 respectively.
Validation studies of 1,4-dichlorobenzene in Lake Zurich,
Switzerland [32] showed excellent agreement (differences
<10%) between values measured in the Lake and EXAMS
predictions; this study is included as an example application in
the EXAMS user's guide [4]. Volatilization of diazinon,
parathion, methyl parathion, and malathion from water, wet soil,
and a water-soil mixture were studied in a simple environmental
chamber and the results compared to EXAMS estimates of
volatilization rates [76]. Experimental and predicted daily
fractions volatilized agreed within a factor of three for diazinon,
methyl parathion, and malathion, and within a factor of five for
parathion, despite the fact that EXAMS was not designed for use
with wet soil systems.
Bioaccumulation and Aquatic System Simulator
(BASS)
BASS' bioconcentration and bioaccumulation algorithms have been validated by comparing its predicted uptake and elimination rates to values published in the peer-reviewed literature [77, 78]. These comparisons encompass a wide variety of fish species, including Atlantic salmon (Salmo salar), brook trout (Salvelinus fontinalis), brown trout (Salmo trutta), carp (Cyprinus carpio), fathead minnow (Pimephales promelas),
flagfish (Jordanella floridae), goldfish (Carassius auratus),
golden orfe (Leuciscus idus), guppy (Poecilia reticulata),
killifish (Oryzias latipes), mosquitofish (Gambusia affinis), and
rainbow trout (Oncorhynchus mykiss), as well as a wide variety
of chemical classes (brominated benzenes, brominated toluenes,
chlorinated anisoles, chlorinated benzenes, chlorinated toluenes, organophosphorus pesticides, polybrominated biphenyls (PBBs), polycyclic aromatic hydrocarbons (PAHs), polychlorinated biphenyls (PCBs), polychlorinated
dibenzofurans, polychlorinated dibenzodioxins, polychlorinated
diphenyl ethers, polychlorinated insecticides, etc.). Reduced
major axis regression analysis of observed vs. predicted
exchange rates demonstrated excellent agreement between
predicted rates of gill uptake and elimination vs. published
values.
For organic chemicals FGETS/BASS bioaccumulation algorithms
have also been validated by simulations of mixtures of PCBs in
Lake Ontario salmonids and laboratory studies [79]. In these
studies FGETS/BASS simulations of PCBs in Lake Ontario
salmonids agreed well with observed data, and FGETS/BASS
correctly simulated the relative contribution of gill and dietary
routes of exposure for such hydrophobic chemicals as
polychlorinated dibenzodioxins. For sulfhydryl-binding metals,
BASS' bioconcentration algorithms have been validated by
simulations of methyl mercury bioaccumulation in Florida
Everglades fish communities [80]. As with the Lake Ontario
PCB simulations, BASS methyl mercury simulations of fish
communities in the Florida Everglades agreed well with
observed data. Validation studies of BASS'S bioenergetic growth
algorithms are also available [79, 80].
DataBase Documentation
Accurate Monte Carlo simulation requires careful attention to
covariance among input parameters driving the models. The
assumption of independence among variables that are in fact
correlated has on some occasions been found to artificially
deflate, and on others to inflate, the variability observed in
model outputs. "Trace matching" is the term used for parameter
inputs that are based on observational data absent any attempt to
fit distribution functions to the data. This technique is
particularly attractive for geographic and climatological input
datasets, for reasons both of preserving an accurate
representation of extreme events and for robust preservation of
covariances among variables. Several datasets for driving
simulation models are under development; this section
documents their sources and some aspects of initial database
design.
Agricultural Geography
The geographic base for developing pesticide exposure datasets
is the intersection of MLRA (Major Land Resource Areas) and
state boundaries (Figure 12). The Soil Conservation Service
(SCS), in its Agriculture Handbook 296 (AH-296), published a
map and accompanying text description delineating 20 Land
Resource Regions (LRR) of the coterminous United States [81].
Each LRR is subdivided into Major Land Resource Areas
(MLRA); these are the fundamental geography of physiographic
units used in this project. The LRR and MLRA were updated in
1984 through release of a Digital Line Graph (DLG) of the MLRA
by the National Cartographic Center (Fort Worth, Texas) of the
Soil Conservation Service. This update included major and
minor boundary changes, and the elimination of duplicate MLRA
in LRR (e.g., 048A occurs only in LRR D and was removed from
LRR E).
AH-296 assembled available information about the land as a
resource for farming, ranching, forestry, engineering, recreation,
and other uses. The 1981 release of AH-296 improved upon its
predecessor (AH-296 of 1965) in its refined cartography,
identification of the soils present in each MLRA, description of
the potential natural vegetation of each, and the addition of
Alaska, Hawaii, and Puerto Rico to the inventory. AH-296 was
designed as a resource for making decisions in state-wide,
interstate, regional, and national agricultural planning, as a base
for natural resource inventories, as a framework for
extrapolating results within physiographic units, and for
organizing resource conservation programs.
The land resource categories used at state and national levels are,
in increasing geographical extent, land resource units, land
resource areas, and land resource regions. Land resource units
are geographic areas, usually a few thousand hectares in extent,
that are characterized by a particular pattern of soils, climate,
water resources, and land use. A unit can be a single continuous
area or several separate nearby areas. Land resource units (LRU)
are the basic units from which major land resource areas are
determined. LRU are also the basic units for state land resource
maps. They are usually co-extensive with state general soil map
units, although some general soil map units are subdivided to
produce LRU because of significant geographic distinctions in
climate, water resources, or land use. Major Land Resource
Areas (MLRA) are geographically associated land resource units
(LRU). In AH-296, MLRA are further grouped into Land Resource
Regions (LRR) that are designated by capital letters and
identified with a descriptive geographical name (see Figure 12).
For example, the descriptive name for Land Resource Region S
is the "Northern Atlantic Slope Diversified Farming Region."
The MLRA subdivisions of the LRR range in number from a
minimum of 2 (Alluvium; Silty Uplands) MLRA divisions in
Region O (Mississippi Delta Cotton and Feed Grains Region),
to a high of 23 MLRA in Region D (Western Range and Irrigated
Region). In land area, the 181 MLRA range from a low of 2,476 km² (C-16, California Delta sector of Region C - California Subtropical Fruit, Truck, and Specialty Crop Region), to the 294,252 km² of the Northern Rocky Mountains (E-43, in the Rocky Mountain Range and Forest Region), with a median size of 30,080 km². Outlines of the MLRA are indicated in Figure 12.
Physiography of LRR and MLRA
AH-296 briefly describes the dominant physical characteristics
of the Land Resource Regions and the Major Land Resource
Areas under the headings of land use, elevation and topography,
climate, water, soils, and potential natural vegetation:
Land use - The extent of the land used for cropland, pasture,
range, forests, industrial and urban development, and other
special purposes is indicated. A list of the principal crops
grown and the type of farming practiced is included.
Elevation and topography - a range in elevation above sea level and any significant exceptions are provided for the
MLRA as a whole. The topography of the area is described.
Climate - AH-296 gives a range of the annual precipitation for
the driest parts of the area to the wettest and the seasonal
distribution of precipitation, plus a range of the average
annual temperature and the average freeze-free period
characteristic of different parts of the MLRA.
Water - Information is provided on surface stream-flow and
ground water, and the source of water for municipal use and
irrigation. MLRAs dependent on other areas for water supply
and those supplying water to other areas are identified.
Soils - The dominant soils are identified according to the
principal suborders, great groups, and representative soil
series.
Potential natural vegetation - the plant species that the MLRA
can support are identified by their common names.
Meteorology: SAMSON/HUSWO
Although "weather generator" software is available and is
regularly used for water resource and climate studies, it has been
observed that the weather sequences thus generated are weak in
their ability to capture the extreme events that are usually of
greatest importance in risk assessments. In a study [82] of the
USCLIMATE [83,84] and CLIGEN [85] models, the authors remark
that "Annual and monthly precipitation statistics (means,
standard deviations, and extremes) were adequately replicated by
both models, but daily amounts, particularly typical extreme
amounts in any given year, were not entirely satisfactorily
modeled by either model. USCLIMATE consistently
underestimated extreme daily amounts, by as much as 50%." In
a study [86] of WGEN [87] (itself an element of USCLIMATE) and
LARS-WG [88] at 18 climatologically diverse sites in the USA,
Europe, and Asia, the authors conclude that the gamma
distribution used in WGEN "probably tends to overestimate the
probability of larger values" of rainfall. This result, although
opposite in tendency to that of [82], is no less undesirable. Both
models had a lower inter-annual variance in monthly mean
precipitation than that in the observed data, and neither generator
"performed uniformly well in simulating the daily variances of
the climate variables."
Issues of accurately preserving the covariance among parameters
can be completely by-passed by using observed synoptic data,
and all danger of generating impossible input scenarios (e.g.,
days of heavy rainfall coupled with maximum drying potential)
can be avoided, given adequate quality assurance of the input
datasets. Weather generator software can be problematic in this
regard; for example, CLIGEN generates temperature, solar
radiation, and precipitation independently of one another, so the
covariance structure of daily sequences is clearly not preserved
in the model outputs. Semenov et al. [86] concluded that failures
to represent variance in LARS-WG and WGEN were "likely to be
due to the observed data containing many periods in which
successive values are highly correlated..."
Data from SAMSON (Solar and Meteorological Surface Observation Network) are available as a three-volume CD-ROM
disk set that contains observational and modeled meteorological
and solar radiation data for the period 1961-1990. An additional
CD-ROM (HUSWO, Hourly United States Weather Observations)
extends the data set to 1995. Combined data are available for
234 National Weather Service stations in the United States,
Guam and Puerto Rico (Figure 13). Appendix A lists the
available stations by their standard WBAN (Weather Bureau
Army Navy) number, with station location (City, State),
geographic (latitude and longitude) coordinates, and station
elevation (m).
The hourly SAMSON solar elements are: extraterrestrial horizontal
and extraterrestrial direct normal radiation; and global, diffuse,
and direct normal radiation. Meteorological elements are: total
and opaque sky cover, temperature and dew point, relative
humidity, station pressure, wind direction and speed, visibility,
ceiling height, present weather, precipitable water, aerosol
optical depth, snow depth, days since last snowfall, and hourly
precipitation. An additional five years of data (1991-1995) were
acquired on CD from NCDC (National Climatic Data Center) as
the HUSWO (Hourly United States Weather Observations)
product. HUSWO is an update to the SAMSON files. Weather
elements in the files include total and opaque sky cover;
temperature and dew point; relative humidity; station pressure;
wind direction and speed; visibility; ceiling height; present
weather; ASOS cloud layer data; snow depth; and hourly
precipitation for most stations. Stations for which hourly
precipitation is unavailable are indicated in the station list of
Appendix A.
Soils and Land Use
The National Resources Inventory (NRI) is a statistically-based
inventory of land cover and use, soil erosion, prime farmland,
wetlands, and other natural resource characteristics on non-
Federal rural land in the United States. Inventories are conducted
at 5-year intervals by the U.S. Department of Agriculture's
Natural Resources Conservation Service (NRCS, formerly the
Soil Conservation Service (SCS)), to determine the condition and
trends in the use of soil, water, and related resources nationwide
and statewide.
The 1992 NRI covered 170 data elements at some 800,000
sample points on the 1.5 billion acres of non-Federal land in the
USA - some 75 percent of the Nation's land area. At each
sample point, information is available for three years-1982,
1987, and 1992. Data is currently being re-summarized for 1997.
Originated as a means of getting accurate natural resource
information to USDA policymakers, the NRI has become useful to
a variety of users. The NRI contains three codes identifying the
geographic location of each point by its Major Land Resource
Area (MLRA), Hydrologic Unit Code (HUC), and County
(represented by five-digit codes). The MLRAs are geographically-
associated land resource units, which in turn are geographic
areas, usually several thousand acres in extent, characterized by
a particular pattern of soils, climate, water resources, and land
use. Hydrologic Unit Codes (HUC) consist of eight digits
denoting major stream drainage basins as defined and digitized
by the U.S. Geological Survey. County five-digit codes are
standard FIPS (Federal Information Processing Standards)
identifiers in which the first two digits identify the State and the
remaining three the individual Counties. With these geographies
combined in a GIS and with the Federal lands masked out, the
individual regions represent the smallest spatial feature that can
be used to locate NRI samples. Privacy concerns preclude
detailed analysis at this finest level of spatial resolution,
however. NRI data are statistically reliable for national, regional,
state, and sub-state analysis; for this project the state-level
sections of MLRAs have been chosen for summary. The NRI was
scientifically designed and conducted, and is based on
recognized statistical sampling methods. The data are used in
national, state, and local planning, university research, and
private sector analysis. NRI data help shape major environmental
and land-use decisions, and hold considerable potential for
contributing to analysis of potential pesticide off-site migration,
fate and effects.
National Resource Inventory Data Characteristics
Data collected in the 1982, 1987, 1992 and 1997 NRI provide a
basis for analysis of 5-year and 10-year trends in resource
conditions. Many data items in the 1997 NRI are consistent with
previous inventories. In addition, the NRI is linked to the Natural
Resources Conservation Service's extensive Soil Interpretations
Records to provide additional soils information suitable for the
PRZM model.
Data elements consistent within the NRI database are:
Farmstead, urban, and built-up areas
Farmstead and field windbreaks
Streams less than 1/8 mile wide and water bodies less than
40 acres
Type of land ownership
Soils information - soil classification, soil properties, and
soil interpretations such as prime farmland
Land cover/use - cropland, pasture land, rangeland, forest
land, barren land, rural land, urban and built-up areas
The cropland land cover/use category includes areas used for the
production of adapted crops for harvest, including row crops,
small grain crops, hay crops, nursery crops, orchard crops, and
other specialty crops. Cultivated cropland includes land
identified as being in row or close-grown crops, summer fallow,
aquaculture in crop rotation, hayland or pastureland in a rotation
with row or close grown crops, or horticulture that is double
cropped. Also included is "cropland not planted" because of
weather conditions, or because the land is in a USDA set-aside
or similar short-term program, or because of other short-term
circumstances. Non-cultivated cropland includes land that is in
a permanent hayland or horticultural crop cover; hayland that is
managed for the production of forage crops (grasses, legumes)
that are machine harvested; horticultural cropland that is used for
growing fruit, nut, berry, vineyard, and other bush fruit, and
similar crops. Cropland information represented includes:
Cropping history
Irrigation-type and source of water
Erosion data-wind and water
Wetlands-classification of wetlands and deepwater habitats
in the U.S. (not in 1987)
Conservation practices and treatment needed
Potential conversion to cropland
Rangeland condition, apparent trend of condition
New data elements added for the 1992 NRI included:
Streams greater than 1/8 mile wide and water bodies by
kind and size greater than 40 acres
Conservation Reserve Program land under contract
Type of earth cover - crop, tree, shrub, grass-herbaceous,
barren, artificial, water
Forest type group
Primary and secondary use of land and water
Wildlife habitat diversity
Irrigation water delivery system
Food Security Act (FSA) wetland classification
For rangeland areas - range site name and number, woody
canopy, noxious weeds
Concentrated flow, gully, and streambank erosion
Conservation treatment needed
Type of conservation tillage
Stratospheric Ozone from the TOMS
Photochemical transformation of pesticides is driven by sunlight
reaching the surface of the Earth. Higher-energy wavelengths in
the ultraviolet portion of the solar spectrum are in most cases the
most potent for effecting these transformations. Exposure
models that estimate ground-level solar spectral intensity require
access to stratospheric ozone data as an input to calculations of
losses in incoming radiation during passage through the
atmosphere. To meet this requirement, a world-wide database
was developed from the latest release (1996, using Version 7 of
the data reduction algorithm) of ozone data from the TOMS (Total
Ozone Mapping Spectrometer) instrument flown on the Nimbus-
7 spacecraft [89]. The dataset was derived from a 2 CD-ROM set
containing data covering the entire Nimbus-7 TOMS lifetime
(November 1, 1978 through May 6, 1993), given as monthly
averages.
The Ozone Measurement
The Nimbus-7 spacecraft was in a south-to-north,
sun-synchronous polar orbit so that it was always close to local
noon/midnight below the spacecraft. Thus, ozone measurements
were taken for the entire world every 24 hours. TOMS directly
measured the ultraviolet sunlight scattered by the Earth's
atmosphere. This NASA-developed instrument measured ozone
indirectly by mapping ultraviolet light emitted by the Sun to that
scattered from the Earth's atmosphere back to the satellite. Total
column ozone was inferred from the differential absorption of
scattered sunlight in the ultraviolet range. Ozone was calculated
by taking the ratio of two wavelengths (312 nm and 331 nm, for
example), where one wavelength is strongly absorbed by ozone
while the other is absorbed only weakly. The instrument had a
50 kilometer square field of view at the sub-satellite point. TOMS
collected 35 measurements every 8 seconds as it scanned right
to left, producing approximately 200,000 ozone measurements
daily. These individual measurements varied typically between
100 and 650 Dobson Units (DU) and averaged about 300 DU.
This is equivalent to a 0.3 cm (about a 10th of an inch) thick
layer of pure ozone gas at NTP (Normal Temperature and
Pressure).
The Data Files
Gridded Monthly Average
For each month, the individual TOMS measurements were
averaged into grid cells covering 1 degree of latitude by 1.25
degrees of longitude. The 180x288 ASCII data array contains data
from 90S to 90N, from 180W to 180E. Each ozone value is a 3
digit integer. For each grid cell, at least 20 days of good data in any given month and year were required for the monthly average to be computed. For pesticide exposure
modeling, these files were averaged to provide grand mean
monthly values for the period of record for each grid cell.
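For illustration only (the function and any details of row/column orientation beyond what is stated above are assumptions of this sketch), a latitude/longitude pair might be mapped to a grid cell as follows:

    def toms_grid_indices(lat, lon):
        """Map latitude (-90 to 90) and longitude (-180 to 180) to (row, column)
        indices of the 180 x 288 monthly-average array (1 degree x 1.25 degree cells,
        row 0 at 90S, column 0 at 180W)."""
        row = min(int((lat + 90.0) / 1.0), 179)
        col = min(int((lon + 180.0) / 1.25), 287)
        return row, col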
Zonal Means
Monthly zonal means were available from the file
\zonalavg\zonalmon.n7t on the 2nd CD. The averages are for 5
degree latitude zones, area-weighted. At least 75% of possible
data in a given zone was required to be present for the mean to
be calculated. In 1978 and 1979 there were missing days when
the TOMS instrument was turned off to conserve power. In the
later years there are at least some data every day. The units of
measurement for the zonal means are Dobson Units. EXAMS'
ozone module defaults to zonal means when the data file cannot
be found.
Problems with the Data
Polar Night TOMS measured ozone using scattered sunlight; it
is not possible to measure ozone when there is no sun (in the
polar regions in winter). Consequently, for example, the
Antarctic polar regions for August and September always have
areas of missing data due to polar night. These gaps were filled
by the expedient of averaging the monthly zonal means across all
available years, interpolating from polar dusk to polar dawn
during periods of continuous darkness, and then substituting
these values for zeros remaining in the monthly gridded dataset
after incorporating all available monthly gridded data.
Missing Data During 1978 and 1979 the TOMS instrument was
turned off periodically to conserve power, including a 5-day
period (6/14- 6/18) in June 1979. On many days, data were lost
due to missing orbits or other problems. The sample size among
grid cells is thus not identical. The variance (2 S.E.) in the ozone
data over the 14-year lifetime of the instrument is, however, only
1.5%.
High Terrain The ozone reported is total column ozone to the
ground. Over high mountains (the Himalayas, the Andes) low
ozone will be noticed relative to surrounding low terrain. This is
not an error.
Pesticide Usage
Current pesticide use patterns and rates can be of value for
evaluating exposure of entire classes of compounds (e.g., the
organophosphorus insecticides) or the actual usage pattern of
single registered compounds.
Both EPA [90, 91] and USDA accumulate data on the sale and
use of pesticides. USDA pesticide use surveys include eight
benchmark years (1964, 1966, 1971, 1976, 1982, 1990, 1991,
and 1992) [92]. Consistent information over time is, however,
only available for eleven crops: corn, cotton, soybeans, wheat,
rice, grain sorghum, peanuts, fall potatoes, other vegetables,
citrus, and apples. Under the sponsorship of EPA, USDA, and
the Water Resources Division of USGS, the National Center for
Food and Agricultural Policy (NCFAP) assembled a
comprehensive database of pesticide use in American agriculture
[93]. The NCFAP database is not specific to any particular year; it is a summary compilation of studies conducted by public agencies over the four-year period 1990-1993, including
• National Agricultural Statistics Service (NASS) surveys of pesticide use in field crops, vegetable crops, and fruit and nut crops,
• reports funded by the USDA Cooperative Extension Service (CES),
• pesticide benefit assessments from the USDA National Agricultural Pesticide Impact Assessment Program (NAPIAP), and
• State of California compilations of farmers' pesticide use records,
supplemented by
• NCFAP surveys of Extension Service specialists, and, where necessary, imputations developed from the assumption that neighboring States' pesticide use profiles are similar.
The 15,740 individual use records in the database - covering 200
active ingredients (a.i.) and 87 crops - are State-level point
estimates focused on two use coefficients: (1) the percent of a
crop's acreage in a State treated with an individual a.i., and (2)
the average annual application rate of the active ingredient per
treated acre.
Because these data represent the average application and
treatment rates by State, they do not yield precise estimates of
use at the sub-State level. The State use coefficients represent an
average for the entire State and consequently do not reflect the
local variability of cropping and management practices. This is
an irreducible uncertainty not readily amenable to quantification.
The reliability of these (State-level) estimates can, however, be
evaluated from NASS (National Agricultural Statistics Service)
assessments of the coefficient of variation (c.v.) or percentage
relative standard error (% rse) of data in the NASS chemical
usage reports used in building the NCFAP database [94-96]. The
variability due to sampling error is calculated for all chemical
and acreage variables in the NASS surveys, and expressed as a
percentage of the estimate. Table 9 shows the entire range of
sampling variability for percent of acres treated and application
rate for each crop class (field crops, fruits, vegetables) across all
crops and States surveyed. The particular value to be used
should be selected based upon the number of reports used to
develop an estimate for a particular crop in a particular State; for
aggregated totals (e.g., all field crops within a multi-State region,
combined usage of organo-phosphorus insecticides within an
entire State, etc.), the combined sample size should be used to
select the appropriate % rse.
These data can be used to calculate approximate confidence
limits, and to evaluate the significance of inter-annual variability
in the source data underlying the NCFAP database. In general
the % rse can be interpreted by imagining that the surveys are
repeated many times using the same sample size: in two out of
three cases, the outcome would not differ from the database
value by more than the stated sampling variability. Approximate
confidence bands can be calculated by applying values from
Table 2 to NCFAP data elements. For example, if a tabulated
value gives 20% of a field crop treated with a specific pesticide,
the (66%) confidence band for a State with few reports would be
20±(20x0.35) or 20±7 percent of the crop acreage. For a State
with a large sample size, the confidence interval would be
20±(20x().10) or 20±2 percent of the crop acreage. For
comparison of values, an overlap of confidence bands at twice
the % rse (i.e., 2 standard errors) indicates that the estimates
have only a 1 in 20 chance of being genuinely different.
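The arithmetic of this paragraph can be summarized in the brief Python sketch below; the rse values 0.35 and 0.10 are simply the two values used in the example above, and in practice would be read from Table 9 for the appropriate crop class, percent-treated bracket, and sample size.

    # Approximate confidence bands from a tabulated NCFAP value and a Table 9 rse.
    def confidence_band(estimate, rse, n_se=1):
        """estimate +/- n_se * (rse * estimate); n_se = 1 gives the ~66% band,
        n_se = 2 gives roughly a 95% band."""
        half_width = n_se * rse * estimate
        return estimate - half_width, estimate + half_width

    # 20% of a field crop treated, State with few reports (rse = 35%): 20 +/- 7
    print(confidence_band(20.0, 0.35))  # (13.0, 27.0)

    # Same estimate, large sample size (rse = 10%): 20 +/- 2
    print(confidence_band(20.0, 0.10))  # (18.0, 22.0)

    def genuinely_different(est1, rse1, est2, rse2):
        """Estimates whose 2-standard-error bands do not overlap differ at
        roughly the 1-in-20 (5%) level."""
        lo1, hi1 = confidence_band(est1, rse1, n_se=2)
        lo2, hi2 = confidence_band(est2, rse2, n_se=2)
        return hi1 < lo2 or hi2 < lo1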
Table 9. Reliability statement (sampling variability expressed as percentage relative standard error (rse) of the estimate) of
1992 pesticide data (Fruits survey conducted in 1993 crop year) in NASS agricultural chemical usage reports. The rse to be
applied to a specific datum depends on the size of the sample used to develop the item; the table entries indicate the range
of rse encountered in the data sets.

Tabulated %      Field Crops (1992 Crop Year)    Fruits (1993 Crop Year)        Vegetables (1992 Crop Year)
acres treated    Acres Treated    Appl. Rate     Acres Treated    Appl. Rate    Acres Treated    Appl. Rate
< 10             40-100           1-60           25-90            1-30          35-85            1-10
10-24            10-35            5-35           15-65            1-20          20-70            1-10
25-49            5-15             1-30           10-35            1-20          10-40            1-10
50-75            5-15             5-25           5-20             1-15          5-20             1-10
> 75             1-5              1-10           1-10             1-5           1-5              1-10
[Map: State parts of Major Land Resource Areas (MLRAs); Albers Equal Area projection; scale bar in kilometers.]
Figure 12. Land Resource Regions and Major Land Resource Areas with State Boundaries.
LAND RESOURCE REGIONS
A  Northwestern Forest, Forage, and Specialty Crop Region
B  Northwestern Wheat and Range Region
C  California Subtropical Fruit, Truck, and Specialty Crop Region
D  Western Range and Irrigated Region
E  Rocky Mountain Range and Forest Region
F  Northern Great Plains Spring Wheat Region
G  Western Great Plains Range and Irrigated Region
H  Central Great Plains Winter Wheat and Range Region
I  Southwest Plateaus and Plains Range and Cotton Region
J  Southwestern Prairies Cotton and Forage Region
K  Northern Lake States Forest and Forage Region
L  Lake States Fruit, Truck, and Dairy Region
M  Central Feed Grains and Livestock Region
N  East and Central Farming and Forest Region
O  Mississippi Delta Cotton and Feed Grains Region
P  South Atlantic and Gulf Slope Cash Crops, Forest, and Livestock Region
R  Northeastern Forage and Forest Region
S  Northern Atlantic Slope Diversified Farming Region
T  Atlantic and Gulf Coast Lowland Forest and Crop Region
U  Florida Subtropical Fruit, Truck Crop, and Range Region
[Map: SAMSON/HUSWO stations and State parts of MLRAs; legend distinguishes primary solar radiation stations from derived solar radiation stations; Albers Equal Area projection.]
Figure 13. Location of SAMSON/HUSWO Stations for Integrated Climatological Database.
Appendix: SAMSON/HUSWO Stations
Weather Bureau Army Navy (WBAN) number, station location (City and State), geographic coordinates (latitude and longitude), and
elevation (m above mean sea level) of the 234 stations available for the coordinated climatological dataset for AgDisp, PRZM, and EXAMS (see Figure
13). Primary stations (those with measured solar radiation data for at least one year) are in bold type; stations lacking hourly
precipitation data are italicized.
WBAN  Station  State  Lat.  Long.  Elev. (m)
03103  Flagstaff  AZ  35.1  -111.7  2135
03812  Asheville  NC  35.4  -82.5  661
03813  Macon  GA  32.7  -83.7  110
03820  Augusta  GA  33.4  -82.0  45
03822  Savannah  GA  32.1  -81.2  16
03856  Huntsville  AL  34.7  -86.8  190
03860  Huntington  WV  38.4  -82.6  255
03870  Greenville  SC  34.9  -82.2  296
03927  Fort Worth  TX  32.8  -97.1  164
03928  Wichita  KS  37.7  -97.4  408
03937  Lake Charles  LA  30.1  -93.2  3
03940  Jackson  MS  32.3  -90.1  101
03945  Columbia  MO  38.8  -92.2  270
03947  Kansas City  MO  39.3  -94.7  315
04725  Binghamton  NY  42.2  -76.0  499
04751  Bradford  PA  41.8  -78.6  600
11641  San Juan  PR  18.4  -66.0  19
12834  Daytona Beach  FL  29.2  -81.1  12
12836  Key West  FL  24.6  -81.8  1
12839  Miami  FL  25.8  -80.3  2
12842  Tampa  FL  28.0  -82.5  3
12844  West Palm Beach  FL  26.7  -80.1  6
12912  Victoria  TX  28.9  -96.9  32
12916  New Orleans  LA  30.0  -90.3  3
12917  Port Arthur  TX  30.0  -94.0  7
12919  Brownsville  TX  25.9  -97.4  6
12921  San Antonio  TX  29.5  -98.5  242
12924  Corpus Christi  TX  27.8  -97.5  13
12960  Houston  TX  30.0  -95.4  33
13722  Raleigh  NC  35.9  -78.8  134
13723  Greensboro  NC  36.1  -80.0  270
13729  Elkins  WV  38.9  -79.9  594
13733  Lynchburg  VA  37.3  -79.2  279
13737  Norfolk  VA  36.9  -76.2  9
13739  Philadelphia  PA  39.9  -75.3  9
13740  Richmond  VA  37.5  -77.3  50
13741  Roanoke  VA  37.3  -80.0  358
13748  Wilmington  NC  34.3  -77.9  9
13781  Wilmington  DE  39.7  -75.6  24
13865  Meridian  MS  32.3  -88.8  94
13866  Charleston  WV  38.4  -81.6  290
13873  Athens  GA  34.0  -83.3  244
13874  Atlanta  GA  33.7  -84.4  315
13876  Birmingham  AL  33.6  -86.8  192
13877  Bristol  TN  36.5  -82.4  459
13880  Charleston  SC  32.9  -80.0  12
13881  Charlotte  NC  35.2  -80.9  234
13882  Chattanooga  TN  35.0  -85.2  210
13883  Columbia  SC  34.0  -81.1  69
13889  Jacksonville  FL  30.5  -81.7  9
13891  Knoxville  TN  35.8  -84.0  299
13893  Memphis  TN  35.1  -90.0  87
13894  Mobile  AL  30.7  -88.3  67
13895  Montgomery  AL  32.3  -86.4  62
13897  Nashville  TN  36.1  -86.7  180
13957  Shreveport  LA  32.5  -93.8  79
13958  Austin  TX  30.3  -97.7  189
13959  Waco  TX  31.6  -97.2  155
13962  Abilene  TX  32.4  -99.7  534
13963  Little Rock  AR  34.7  -92.2  81
13964  Fort Smith  AR  35.3  -94.4  141
13966  Wichita Falls  TX  34.0  -98.5  314
13967  Oklahoma City  OK  35.4  -97.6  397
13968  Tulsa  OK  36.2  -95.9  206
13970  Baton Rouge  LA  30.5  -91.2  23
13985  Dodge City  KS  37.8  -100.0  787
13994  St. Louis  MO  38.8  -90.4  172
13995  Springfield  MO  37.2  -93.4  387
13996  Topeka  KS  39.1  -95.6  270
14607  Caribou  ME  46.9  -68.0  190
14733  Buffalo  NY  42.9  -78.7  215
14734  Newark  NJ  40.7  -74.2  9
14735  Albany  NY  42.8  -73.8  89
14737  Allentown  PA  40.7  -75.4  117
14739  Boston  MA  42.4  -71.0  5
14740  Hartford  CT  41.9  -72.7  55
14742  Burlington  VT  44.5  -73.2  104
14745  Concord  NH  43.2  -71.5  105
14751  Harrisburg  PA  40.2  -76.9  106
14764  Portland  ME  43.7  -70.3  19
14765  Providence  RI  41.7  -71.4  19
14768  Rochester  NY  43.1  -77.7  169
14771  Syracuse  NY  43.1  -76.1  124
14777  Wilkes-Barre  PA  41.3  -75.7  289
14778  Williamsport  PA  41.3  -77.1  243
14820  Cleveland  OH  41.4  -81.9  245
14821  Columbus  OH  40.0  -82.9  254
14826  Flint  MI  43.0  -83.7  233
14827  Fort Wayne  IN  41.0  -85.2  252
14836  Lansing  MI  42.8  -84.6  256
14837  Madison  WI  43.1  -89.3  262
14839  Milwaukee  WI  43.0  -87.9  211
14840  Muskegon  MI  43.2  -86.3  191
14842  Peoria  IL  40.7  -89.7  199
14847  Sault Ste. Marie  MI  46.5  -84.4  221
14848  South Bend  IN  41.7  -86.3  236
14850  Traverse City  MI  44.7  -85.6  192
14852  Youngstown  OH  41.3  -80.7  361
14860  Erie  PA  42.1  -80.2  225
14891  Mansfield  OH  40.8  -82.5  395
14895  Akron  OH  40.9  -81.4  377
14898  Green Bay  WI  44.5  -88.1  214
14913  Duluth  MN  46.8  -92.2  432
14914  Fargo  ND  46.9  -96.8  274
14918  International Falls  MN  48.6  -93.4  361
14920  La Crosse  WI  43.9  -91.3  205
14922  Minneapolis  MN  44.9  -93.2  255
14923  Moline  IL  41.5  -90.5  181
14925  Rochester  MN  43.9  -92.5  402
14926  Saint Cloud  MN  45.6  -94.1  313
14933  Des Moines  IA  41.5  -93.7  294
14935  Grand Island  NE  41.0  -98.3  566
14936  Huron  SD  44.4  -98.2  393
14940  Mason City  IA  43.2  -93.3  373
14941  Norfolk  NE  42.0  -97.4  471
14943  Sioux City  IA  42.4  -96.4  336
14944  Sioux Falls  SD  43.6  -96.7  435
14991  Eau Claire  WI  44.9  -91.5  273
21504  Hilo  HI  19.7  -155.1  11
22516  Kahului  HI  20.9  -156.4  15
22521  Honolulu  HI  21.3  -157.9  5
22536  Lihue  HI  22.0  -159.4  45
23023  Midland/Odessa  TX  31.9  -102.2  871
23034  San Angelo  TX  31.4  -100.5  582
23042  Lubbock  TX  33.7  -101.8  988
23044  El Paso  TX  31.8  -106.4  1194
23047  Amarillo  TX  35.2  -101.7  1098
23050  Albuquerque  NM  35.1  -106.6  1619
23061  Alamosa  CO  37.5  -105.9  2297
23065  Goodland  KS  39.4  -101.7  1124
23066  Grand Junction  CO  39.1  -108.5  1475
23129  Long Beach  CA  33.8  -118.2  17
23153  Tonopah  NV  38.1  -117.1  1653
23154  Ely  NV  39.3  -114.9  1906
23155  Bakersfield  CA  35.4  -119.1  150
23160  Tucson  AZ  32.1  -110.9  779
23161  Daggett  CA  34.9  -116.8  588
23169  Las Vegas  NV  36.1  -115.2  664
23174  Los Angeles  CA  33.9  -118.4  32
23183  Phoenix  AZ  33.4  -112.0  339
23184  Prescott  AZ  34.7  -112.4  1531
23185  Reno  NV  39.5  -119.8  1341
23188  San Diego  CA  32.7  -117.2  9
23232  Sacramento  CA  38.5  -121.5  8
23234  San Francisco  CA  37.6  -122.4  5
23273  Santa Maria  CA  34.9  -120.5  72
24011  Bismarck  ND  46.8  -100.8  502
24013  Minot  ND  48.3  -101.3  522
24018  Cheyenne  WY  41.2  -104.8  1872
24021  Lander  WY  42.8  -108.7  1696
24023  North Platte  NE  41.1  -100.7  849
24025  Pierre  SD  44.4  -100.3  526
24027  Rock Springs  WY  41.6  -109.1  2056
24028  Scottsbluff  NE  41.9  -103.6  1206
24029  Sheridan  WY  44.8  -107.0  1209
24033  Billings  MT  45.8  -108.5  1088
24089  Casper  WY  42.9  -106.5  1612
24090  Rapid City  SD  44.1  -103.1  966
24121  Elko  NV  40.8  -115.8  1547
24127  Salt Lake City  UT  40.8  -112.0  1288
24128  Winnemucca  NV  40.9  -117.8  1323
24131  Boise  ID  43.6  -116.2  874
24143  Great Falls  MT  47.5  -111.4  1116
24144  Helena  MT  46.6  -112.0  1188
24146  Kalispell  MT  48.3  -114.3  904
24153  Missoula  MT  46.9  -114.1  972
24155  Pendleton  OR  45.7  -118.9  456
24156  Pocatello  ID  42.9  -112.6  1365
24157  Spokane  WA  47.6  -117.5  721
24221  Eugene  OR  44.1  -123.2  109
24225  Medford  OR  42.4  -122.9  396
24227  Olympia  WA  47.0  -122.9  61
24229  Portland  OR  45.6  -122.6  12
24230  Redmond/Bend  OR  44.3  -121.2  940
24232  Salem  OR  44.9  -123.0  61
24233  Seattle/Tacoma  WA  47.5  -122.3  122
24243  Yakima  WA  46.6  -120.5  325
24283  Arcata  CA  41.0  -124.1  69
24284  North Bend  OR  43.4  -124.3  5
25308  Annette  AK  55.0  -131.6  34
25339  Yakutat  AK  59.5  -139.7  9
25501  Kodiak  AK  57.8  -152.3  34
25503  King Salmon  AK  58.7  -156.7  15
25624  Cold Bay  AK  55.2  -162.7  29
25713  St Paul Is.  AK  57.2  -170.2  7
26411  Fairbanks  AK  64.8  -147.9  138
26415  Big Delta  AK  64.0  -145.7  388
26425  Gulkana  AK  62.2  -145.5  481
26451  Anchorage  AK  61.2  -150.0  35
26510  McGrath  AK  63.0  -155.6  103
26528  Talkeetna  AK  62.3  -150.1  105
26533  Bettles  AK  66.9  -151.5  205
26615  Bethel  AK  60.8  -161.8  46
26616  Kotzebue  AK  66.9  -162.6  5
26617  Nome  AK  64.5  -165.4  7
27502  Barrow  AK  71.3  -156.8  4
41415  Guam  PI  13.6  -144.8  110
93037  Colorado Springs  CO  38.8  -104.7  1881
93058  Pueblo  CO  38.3  -104.5  1439
93129  Cedar City  UT  37.7  -113.1  1712
93193  Fresno  CA  36.8  -119.7  100
93721  Baltimore  MD  39.2  -76.7  47
93729  Cape Hatteras  NC  35.3  -75.6  2
93730  Atlantic City  NJ  39.5  -74.6  20
93738  Sterling (Washington-Dulles Airpt.)  VA  39.0  -77.5  82
93805  Tallahassee/Apalachicola  FL  30.4  -84.4  21
93814  Covington  KY  39.1  -84.7  271
93815  Dayton  OH  39.9  -84.2  306
93817  Evansville  IN  38.1  -87.5  118
93819  Indianapolis  IN  39.7  -86.3  246
93820  Lexington  KY  38.0  -84.6  301
93821  Louisville  KY  38.2  -85.7  149
93822  Springfield  IL  39.8  -89.7  187
93842  Columbus  GA  32.5  -85.0  136
93987  Lufkin  TX  31.2  -94.8  96
94008  Glasgow  MT  48.2  -106.6  700
94018/23062  Boulder/Denver  CO  39.8  -104.9  1610
94185  Burns  OR  43.6  -119.1  1271
94224  Astoria  OR  46.2  -123.9  7
94240  Quillayute  WA  48.0  -124.6  55
94702  Bridgeport  CT  41.2  -73.1  2
94725  Massena  NY  44.9  -74.9  63
94728/14732  New York (LGA)  NY  40.8  -73.9  11
94814  Houghton  MI  47.2  -88.5  329
94822  Rockford  IL  42.2  -89.1  221
94823  Pittsburgh  PA  40.5  -80.2  373
94830  Toledo  OH  41.6  -83.8  211
94846  Chicago  IL  41.8  -87.8  190
94847  Detroit  MI  42.4  -83.0  191
94849  Alpena  MI  45.1  -83.6  210
94860  Grand Rapids  MI  42.9  -85.5  245
94910  Waterloo  IA  42.6  -92.4  265
94918/14942  Omaha  NE  41.3  -95.9  298
94746  Worcester  MA  42.3  -71.9  301
References
1. Urban DJ, Cook NJ. 1986. Ecological Risk Assessment.
Hazard Evaluation Division Standard Evaluation Procedure
EPA 540/9-85-001. US EPA Office of Pesticide Programs,
Washington, DC, USA.
2. NRC. 1983. Risk Assessment in the Federal Government:
Managing the Process. National Research Council,
Committee on the Institutional Means for Assessment of
Risks to Public Health. National Academy Press,
Washington, DC, USA.
3. Carsel RF, Mulkey LA, Lorber MN, Baskin LB. 1985. The
pesticide root zone model (PRZM): A procedure for
evaluating pesticide leaching threats to groundwater. Ecol
Model 30:49-69.
4. Burns LA. 2000. Exposure Analysis Modeling System
(EXAMS): User Manual and System Documentation.
EPA/600/R-00/081. U. S. Environmental Protection
Agency, Athens, GA, USA.
5. CENR. 1999. Ecological risk assessment under FIFRA. In
Ecological Risk Assessment in the Federal Government
CENR/5-99/001. National Science and Technology
Council, Committee on Environment and Natural Resources
(CENR), Washington, DC, USA, pp 3.1-3.11.
6. EPA. 1998. Guidelines for Ecological Risk Assessment.
Fed Regist 63:26846-26924.
7. EPA. 1993. A Review of Ecological Assessment Case
Studies from a Risk Assessment Perspective. Risk
Assessment Forum EPA/630/R-92/005. U.S. Environmental
Protection Agency, Washington, DC, USA.
8. SETAC. 1994. Final Report: Aquatic Risk Assessment and
Mitigation Dialogue Group. Society of Environmental
Toxicology and Chemistry, Pensacola, FL, USA.
9. NRC. 1994. Science and Judgment in Risk Assessment.
National Research Council, Board on Environmental
Studies and Toxicology. National Academy Press,
Washington, DC, USA.
10. Hansen F. 1997. Policy for Use of Probabilistic Analysis in
Risk Assessment at the U.S. Environmental Protection
Agency. In Memorandum dated May 15, 1997: Use of
Probabilistic Techniques (Including Monte Carlo Analysis)
in Risk Assessment. U. S. Environmental Protection
Agency, Washington, DC, USA.
11. Firestone M, Fenner-Crisp P, Barry T, Bennett D, Chang S,
Callahan M, Burke A, Michaud J, Olsen M, Cirone P,
Barnes D, Wood WP, Knott SM. 1997. Guiding Principles
for Monte Carlo Analysis. EPA/630/R-97/001. U.S.
Environmental Protection Agency, Washington, DC, USA.
12. Gallagher K, Touart L, Lin J, Barry T. 2001. A Probabilistic
Model and Process to Assess Risks to Aquatic Organisms.
Report prepared for March 13-16, 2001 FIFRA Scientific
Advisory Panel Meeting; available at
http://www.epa.gov/scipoly/sap/2001/march/aquatic.pdf
U.S. Environmental Protection Agency, Washington, DC,
USA.
13. SAP. 2001. Probabilistic Models and Methodologies:
Advancing the Ecological Risk Assessment Process in the
EPA Office of Pesticide Programs. Report of FIFRA
Scientific Advisory Panel Meeting, March 13-16, 2001,
held at the Sheraton Crystal City Hotel, Arlington, Virginia.
SAP Report No. 2001-06, available at
http://www.epa.gov/scipoly/sap/2001/march/march132001.pdf.
US Environmental Protection Agency FIFRA
Scientific Advisory Panel, Washington, DC, USA.
14. Oreskes N. 2000. Why believe a computer? Models,
measures, and meaning in the natural world. In
Schneiderman JS, ed, The Earth Around Us: Maintaining
a Livable Planet. W. H. Freeman and Company, New York,
NY, USA, pp 70-82.
15. Sunzenauer I. 1997. 1000 questions. E-mail dated
07/08/1997.
16. Oreskes N. 1998. Evaluation (not validation) of quantitative
models. Environ Health Perspect 106 (Suppl. 6):1453-1460.
17. Maloszewski P, Zuber A. 1992. On the calibration and
validation of mathematical models for the interpretation of
tracer experiments in groundwater. Advances in Water
Resources 15:47-62.
18. Mihram GA. 1972. Some practical aspects of the
verification and validation of simulation models.
Operational Research Quarterly 23:17-29.
19. Thomann RV. 1982. Verification of water quality models.
J Envir Eng Div, Proc ASCE 108:923-940.
20. Oreskes N, Shrader-Frechette K, Belitz K. 1994.
Verification, validation, and confirmation of numerical
models in the earth sciences. Science 263:641-646.
21. Gold HJ. 1977. Mathematical Modeling of Biological
Systems. John Wiley & Sons, New York, NY, USA.
22. Popper KR. 1962. Conjectures and Refutations: the Growth
of Scientific Knowledge. Basic Books, New York, NY,
USA.
23. Popper KR. 1968. The Logic of Scientific Discovery. 3rd
(revised) ed. Hutchinson, London, UK.
24. Kuhn TS. 1970. The Structure of Scientific Revolutions. 2nd
ed, International Encyclopedia of Unified Science, Vol 2.
University of Chicago Press, Chicago, IL, USA.
25. Cullen AC, Frey HC. 1999. Probabilistic Techniques in
Exposure Assessment: a Handbook for Dealing with
Variability and Uncertainty in Models and Inputs. Plenum
Press, New York, NY, USA.
26. IAEA. 1989. Evaluating the Reliability of Predictions Made
Using Environmental Transfer Models. Safety Series 100.
International Atomic Energy Agency, Vienna, Austria.
27. Kaplan S, Garrick BJ. 1981. On the quantitative definition
of risk. Risk Analysis 1:11-27.
28. Ulam SM, von Neumann J. 1945. Random ergodic
theorems. Bulletin of the American Mathematical Society
51:660.
29. Metropolis N, Ulam S. 1949. The Monte Carlo method.
Journal of the American Statistical Association 44:335-341.
30. EPA. 1997. Exposure Factors Handbook. Volume I -
General Factors. EPA/600/P-95/002Fa. U.S. Environmental
Protection Agency, Washington, DC, USA.
31. Farrar D. 2001. Distributions for risk assessments: Some
regulatory and statistical perspectives (in preparation). In
Pellston Workshop on the Application of Uncertainty
Analysis to Ecological Risks of Pesticides. SETAC Press,
Pensacola, FL, USA.
32. Burns LA. 1983. Validation of exposure models: the role of
conceptual verification, sensitivity analysis, and alternative
hypotheses. In Bishop WE, Cardwell RD, Heidolph BB,
eds, Aquatic Toxicology and Hazard Assessment, Vol
ASTM STP 802. American Society for Testing and
Materials, Philadelphia, PA, USA, pp 255-281.
33. Konikow LF, Bredehoeft JD. 1992. Ground-water models
cannot be validated. Advances in Water Resources 15:75-
83.
34. Anderson MP, Woessner WW. 1992. The role of the
postaudit in model validation. Advances in Water Resources
15:167-173.
35. Mossman DJ, Schnoor JL. 1989. Post-audit study of
dieldrin bioconcentration model. Journal of Environmental
Engineering 115:675-679.
36. Beck MB. 1987. Water quality modeling: A review of the
analysis of uncertainty. Water Resour Res 23:1393-1442.
37. Sargent RG. 1982. Verification and validation of simulation
models. In Cellier FE, ed, Progress in Modelling and
Simulation. Academic Press, London, UK, pp 159-169.
38. Rastetter EB. 1996. Validating models of ecosystem
response to global change. BioScience 46:190-198.
39. Aldenberg T, Janse JH, Kramer PRO. 1995. Fitting the
dynamic model PCLake to a multi-lake survey through
Bayesian statistics. Ecol Model 78:83-99.
40. Balci O, Sargent RG. 1981. A methodology for cost-risk
analysis in the statistical validation of simulation models.
Communications of the ACM 24:190-197.
41. Beck MB, Ravetz JR, Mulkey LA, Barnwell TO. 1997. On
the problem of model validation for predictive exposure
assessments. Stochastic Hydrology and Hydraulics 11:229-
254.
42. Flavelle P. 1992. A quantitative measure of model
validation and its potential use for regulatory purposes.
Advances in Water Resources 15:5-13.
43. Luis SM, McLaughlin D. 1992. A stochastic approach to
model validation. Advances in Water Resources 15:15-32.
44. Lynch DR, Davies AM, eds. 1995. Quantitative Skill
Assessment for Coastal Ocean Models, Coastal and
Estuarine Studies Vol 47. American Geophysical Union,
Washington, DC, USA.
45. Marcus AH, Elias RW. 1998. Some useful statistical
methods for model validation. Environ Health Perspect 106
(Suppl. 6):1541-1550.
46. Mayer DG, Butler DG. 1993. Statistical validation. Ecol
Model 68:21-32.
47. Oren TI. 1981. Concepts and criteria to assess acceptability
of simulation studies: A frame of reference.
Communications of the ACM 24:180-189.
48. Parrish RS, Smith CN. 1990. A method for testing whether
model predictions fall within a prescribed factor of true
values, with an application to pesticide leaching. Ecol
Model 51:59-72.
49. Preisendorfer RW, Barnett TP. 1983. Numerical model-
reality intercomparison tests using small-sample statistics.
Journal of the Atmospheric Sciences 40.
50. Reckhow KH, Chapra SC. 1983. Confirmation of water
quality models. Ecol Model 20:113-133.
51. Reckhow KH, Clements JT, Dodd RC. 1990. Statistical
evaluation of mechanistic water-quality models. Journal of
Environmental Engineering 116:250-268.
52. Shaeffer DL. 1980. A model evaluation methodology
applicable to environmental assessment models. Ecol Model
8:275-295.
53. Venkatram A. 1982. A framework for evaluating air quality
models. Boundary-Layer Meteorol 24:371-385.
54. Willmott CJ, Ackleson SG, Davis RE, Feddema JJ, Klink
KM, Legates DR, O'Donnell J, Rowe CM. 1985. Statistics
for the evaluation and comparison of models. J Geophys
Res 90:8995-9005.
55. Mitchell PL. 1997. Misuse of regression for empirical
validation of models. Agric Syst 54:313-326.
56. Lassiter RR, Parrish RS, Burns LA. 1986. Decomposition
by planktonic and attached microorganisms improves
chemical fate models. Environ Toxicol Chem 5:29-39.
57. Shrader-Frechette KS. 1994. Science, environmental risk
assessment, and the frame problem. BioScience 44:548-551.
58. Freund JE. 1962. Mathematical Statistics. Prentice-Hall,
Englewood Cliffs, NJ, USA.
59. Diamond WJ. 1981. Practical Experiment Designs for
Engineers and Scientists. Lifetime Learning Publications,
Belmont, CA, USA.
60. USEPA. 1982. Testing for Field Applicability of Chemical
Exposure Models. Workshop on Field Applicability
Testing, 15-18 March 1982. Exposure Modeling Committee
Report. US Environmental Protection Agency, Athens, GA,
USA.
61. Shapiro SS, Wilk MB. 1965. An analysis of variance test
for normality (complete samples). Biometrika 52:591-611.
62. Bratley P, Fox BL, Schrage LE. 1987. A Guide to
Simulation. 2nd ed. Springer-Verlag, New York, NY, USA.
63. Zepp RG, Cline DM. 1977. Rates of direct photolysis in
aquatic environment. Environ Sci Technol 11:359-366.
64. Iqbal M. 1983. An Introduction to Solar Radiation.
Academic Press, New York, NY, USA.
65. Green AES, Schippnick PF. 1982. UV-B reaching the
surface. In Calkins J, ed, The Role of Solar Ultraviolet
Radiation in Marine Ecosystems. Plenum Press, New York,
NY, USA, pp 5-27.
66. Teske ME, Bird SL, Esterly DM, Curbishley TB, Ray SL,
Perry SG. 2001. AgDRIFT®: A model for estimating near-
field spray drift. Environ Toxicol Chem (Accepted).
67. Bird SL, Perry SG, Teske ME. 2001. Evaluation of the
AgDRIFT® aerial spray drift model. Environ Toxicol Chem
(Accepted).
68. Jones RL, Russell MH, eds. 2000. FIFRA Environmental
Model Validation Task Force Final Report (Penultimate
Draft).
69. Pollard JE, Hern SC. 1985. A field test of the Exams model
in the Monongahela River. Environ Toxicol Chem 4:362-
369.
70. Schramm K-W, Hirsch M, Twele R, Hutzinger O. 1988.
Measured and modeled fate of Disperse Yellow 42 in an
outdoor pond. Chemosphere 17:587-595.
71. Kolset K, Aschjem BF, Christopherson N, Heiberg A,
Vigerust B. 1988. Evaluation of some chemical fate and
transport models. A case study on the pollution of the
Norrsundet Bay (Sweden). In Angeletti G, Bjerseth A, eds,
Organic Micropollutants in the Aquatic Environment
(Proceedings of the Fifth European Symposium, held in
Rome, Italy October 20-22, 1987). Kluwer Academic
Publishers, Dordrecht, pp 372-386.
72. Kolset K, Heiberg A. 1988. Evaluation of the 'fugacity'
(FEQUM) and the 'EXAMS' chemical fate and transport
models: A case study on the pollution of the Norrsundet
Bay (Sweden). Water Sci Technol 20:1-12.
73. Armbrust KL, Okamoto Y, Grochulska J, Barefoot AC.
1999. Predicting the dissipation of bensulfuron methyl and
azimsulfuron in rice paddies using the computer model
EXAMS2. J Pest Sci 24:357-363.
74. Tynan P, Watts CD, Sowray A, Hammond I. 1991. Field
measurement and modelling for styrene, xylenes,
dichlorobenzenes and 4-phenyldodecane. In Proceedings of
the 6th European Symposium on Aquatic Environment,
1990, pp 20-37.
75. Games LM. 1982. Field validation of Exposure Analysis
Modeling System (Exams) in a flowing stream. In Dickson
KL, Maki AW, Cairns J, Jr., eds, Modeling the Fate of
Chemicals in the Aquatic Environment. Ann Arbor Science
Publ., Ann Arbor, Michigan, pp 325-346.
76. Sanders PF, Seiber JN. 1984. Organophosphorus pesticide
volatilization: Model soil pits and evaporation ponds. In
Kreuger RF, Seiber JN, eds, Treatment and Disposal of
Pesticide Wastes. ACS Symposium Series, Vol 259.
American Chemical Society, Washington, D.C., pp 279-
295.
77. Barber MC, Suarez LA, Lassiter RR. 1988. Modeling
bioconcentration of non-polar organic pollutants by fish.
Environ Toxicol Chem 7:545-558.
78. Barber MC. 2002. A comparison of models for predicting
chemical bioconcentration in fish. Can J Fish Aquat Sci (In
preparation).
79. Barber MC, Suarez LA, Lassiter RR. 1991. Modelling
bioaccumulation of organic pollutants in fish with an
application to PCBs in Lake Ontario salmonids. Can J Fish
Aquat Sci 48:318-337.
80. Barber MC. 2001. Bioaccumulation and Aquatic System
Simulator (BASS). User's Manual Beta Test Version 2.1.
EPA/600/R-01/035. U.S. Environmental Protection Agency,
Office of Research and Development, Athens, GA, USA.
81. SCS. 1981. Land Resource Regions and Major Land
Resource Areas of the United States. Agriculture Handbook
296. United States Department of Agriculture Soil
Conservation Service, Washington, DC, USA.
82. Johnson GL, Hanson CL, Hardegree SP, Ballard EB. 1996.
Stochastic weather simulation: Overview and analysis of
two commonly used models. Journal of Applied
Meteorology 35:1878-1896.
83. Hanson CL, Cumming KA, Woolhiser DA, Richardson
CW. 1993. Program for Daily Weather Simulation. Water
Resources Investigations Rep. 93-4018. U.S. Geological
Survey, Denver, CO, USA.
84. Hanson CL, Cumming KA, Woolhiser DA, Richardson
CW. 1994. Microcomputer Program for Daily Weather
Simulation in the Contiguous United States. Agricultural
Research Service ARS-114. U.S. Department of
Agriculture, Boise, ID, USA.
85. Nicks AD, Gander GA. 1994. CLIGEN: A weather
generator for climate inputs to water resource and other
models. In Proceedings of the Fifth International
Conference on Computers in Agriculture. American Society
of Agricultural Engineers, Orlando, FL, USA.
86. Semenov MA, Brooks RJ, Barrow EM, Richardson CW.
1998. Comparison of the WGEN and LARS-WG stochastic
weather generators for diverse climates. Clim Res 10:95-
107.
87. Richardson CW, Wright DA. 1984. WGEN: A Model for
Generating Daily Weather Variables. Agricultural Research
Service ARS-8. U.S. Department of Agriculture,
Washington, DC, USA.
88. Semenov MA, Barrow EM. 1997. Use of a stochastic
weather generator in the development of climate change
scenarios. Climatic Change 35:397-414.
89. McPeters RD, Bhartia PK, Krueger AJ, Herman JR,
Schlesinger BM, Wellemeyer CG, Seftor CJ, Jaross G,
Taylor SL, Swissler T, Torres O, Labow G, Byerly W,
Cebula RP. 1996. Nimbus-7 Total Ozone Mapping
Spectrometer (TOMS) Data Products User's Guide. NASA
Reference Publication, National Aeronautics and Space
Administration, Scientific and Technical Information
Branch, Lanham, MD, USA.
90. Aspelin AL. 1994. Pesticides Industry Sales and Usage:
1992 and 1993 Market Estimates. EPA 733-K-94-001. US
EPA Office of Prevention, Pesticides and Toxic Substances,
Washington, DC, USA.
91. Aspelin AL. 1997. Pesticides Industry Sales and Usage:
1994 and 1995 Market Estimates. EPA 733-K-97-002. US
EPA Office of Prevention, Pesticides and Toxic Substances,
Washington, DC, USA.
92. USDA. 1994. Agricultural Resources and Environmental
Indicators. Agricultural Handbook 705. U.S. Department of
Agriculture, Economic Research Service, Natural Resources
and Environment Division, Washington, DC, USA.
93. Gianessi L, Anderson JE. 1995. Pesticide Use in U.S. Crop
Production: National Summary Report. National Center for
Food and Agricultural Policy, Washington, DC, USA.
94. NASS. 1993. Agricultural Chemical Usage: 1992 Field
Crops Summary. Ag Ch 1(93). U.S. Department of
Agriculture, National Agricultural Statistics Service and
Economics Research Service, Washington, DC, USA.
95. NASS. 1993. Agricultural Chemical Usage:
Vegetables-1992 Summary. Ag Ch 1(93). U.S. Department
of Agriculture, National Agricultural Statistics Service and
Economics Research Service, Washington, DC, USA.
96. NASS. 1994. Agricultural Chemical Usage: 1993 Fruits
Summary. Ag Ch 1(94). U.S. Department of Agriculture,
National Agricultural Statistics Service and Economics
Research Service, Washington, DC, USA.