Assessment of the Contribution to
Personal Exposures of Air Toxics from
Mobile Sources
United States
Environmental Protection
Agency
-------
Assessment and Standards Division
Office of Transportation and Air Quality
U.S. Environmental Protection Agency
Prepared for EPA by
Clifford P. Weisel, PhD
Environmental & Occupational Health Sciences Institute
Robert Wood Johnson Medical School
University of Medicine and Dentistry of New Jersey
EPA Contract No. 68-C-03-149
NOTICE
This technical report does not necessarily represent final EPA decisions or
positions. It is intended to present technical analysis of issues using data
that are currently available. The purpose in the release of such reports is to
facilitate the exchange of technical information and to inform the public of
technical developments which may form the basis for a final EPA decision,
position, or regulatory action.
EPA420-R-05-025
December 2005
-------
Executive Summary:
To evaluate the effect of proximity to mobile source emissions on ambient air
surrounding residences, linear regression models were used to relate concentrations of
selected volatile organic compounds, carbonyls, PM2.5 mass, elemental carbon, and
organic carbon to mobile emission sources. The log-transformed ambient air
concentration of each air toxic measured in Elizabeth, NJ during the
Relationship of Indoor, Outdoor and Personal Air (RIOPA) study was used as the
dependent variable; the inverse distances to roadways, gas stations, and point sources,
together with meteorological parameters, were the independent variables. The
homes, roadways, and point and area sources in and around Elizabeth, NJ were geocoded using
Geographic Information System (GIS) techniques to determine the distances between the
homes and potential ambient sources. Meteorological data (wind speed, wind direction,
temperature, and atmospheric pressure) were obtained from the NOAA
Weather-Bureau-Army-Navy (WBAN) station at Newark Liberty International Airport, which is
immediately north of Elizabeth, and mixing height data were obtained from Brookhaven, NY (the
closest station to Elizabeth reporting that type of data). The meteorological data were
averaged over the 48-hour sampling period to provide a single value for each sample.
The roads were stratified into six roadway types based on categories used in the EPA
MOBILE 6 model. Quality assurance steps were taken to confirm the location of each
home and source, including direct visits to Elizabeth to verify addresses and
coordinates. Various regression models (and selection criteria) were used to confirm that
a repeatable set of associations was obtained.
-------
All target aromatic compounds (benzene, toluene, ethylbenzene, m,p-xylene, o-xylene),
methyl tert-butyl ether (MTBE), PM2.5, and organic carbon were statistically associated
with the inverse distance to urban major arterials (FC14) or the interstate highway
(FC11); MTBE, benzene, m,p-xylene, and o-xylene were
statistically associated with the inverse distance to gasoline stations; the carbonyl
compounds (acetaldehyde, acrolein, and formaldehyde) were not associated with the
inverse distance to roadways; PM2.5 and elemental carbon were associated with area
sources of diesel emissions based on truck or bus depot and idling activity; and two PAH
compounds (coronene, associated with gasoline emissions, and benzo[ghi]perylene,
associated with mobile emissions) were statistically associated with the inverse distance
to FC11 and PM area sources. Two
volatile compounds and two PM constituents without mobile sources (carbon tetrachloride,
tetrachloroethylene, sulfur and selenium), which were examined as controls to check for
spurious associations, were not associated with distance to roadways.
The regression models had overall r2 values between 0.16 and 0.67, indicating that between
approximately 20% and 70% of the variability in the air concentrations was explained by
the models. However, the partial r2 of the distance terms was less than 10%, as
meteorology was a more important factor in controlling the variations in the
concentrations than the distance between the home and a mobile emission source. The
effect of mobile source emissions appears to be confined to residences very close to the
sources (within 200 meters), though within that distance it can cause increases of several µg/m3,
depending on the source strength for that compound. Thus, for most homes in
Elizabeth, NJ the influence of mobile sources is to raise the general background levels of
the compounds emitted, with the increase dependent upon the meteorological conditions,
especially the atmospheric stability, and there appear to be only small increases in
concentrations as the distance decreases. For homes in very close proximity (no
more than several hundred meters) to gas stations and highly trafficked roadways, the
regression models predict changes in the median ambient air concentration around the
homes from the typical background levels by 2 to 10 µg/m3.
-------
BACKGROUND
The Relationship between Indoor, Outdoor, and Personal Air (RIOPA) study was
undertaken to determine the influence of outdoor sources on indoor and personal air
concentrations of a set of volatile organic compounds (VOCs), aldehydes, and PM2.5
mass. Indoor/outdoor polyaromatic hydrocarbons (PAHs), and elemental carbon/organic
carbon (EC/OC) concentrations were also measured (Weisel et al 2004a,b). The study
collected data on indoor, outdoor, and personal air concentrations for approximately 300
non-smoking homes in Los Angeles (CA), Elizabeth (NJ), and Houston (TX), visited
twice from the summer of 1999 to the spring of 2001. Either one or two homes were
visited on a single day, though some days had samples collected from three or four
homes. Samples were collected throughout the year. This report focuses on the analysis
of VOCs, carbonyls and PM2.5 associated with mobile source emissions and air samples
collected outside residences in Elizabeth, NJ. Mobile sources are one dominant source of
aromatic hydrocarbons and PM2.5 in Elizabeth, NJ. Before examining the ambient
source contributions to indoor/personal VOCs, the association between source emissions
and ambient VOC concentrations near residences should be established. One approach
is to evaluate the role of proximity to the potential emission sources and
meteorological conditions on ambient concentrations. Precise proximity information
between the residences where the samples were collected and potential emission sources
is needed, along with locally collected meteorological information, to evaluate the effect
of proximity and meteorological conditions. All roadway classes bisect the city of
Elizabeth, NJ, so wide distributions of distances to each roadway type and gasoline
stations exist. The home selection criteria included over-sampling homes close to heavily
Final Report 1 11/22/2004
-------
trafficked roadways and near gasoline stations. The association of air concentration
and proximity to mobile sources was examined by deriving linear regression equations
using air concentration as the dependent variable and proximity to roadways, gasoline
stations, and point sources and meteorological parameters as the independent variables.
Attempts to detect such statistical associations have met with only minimal success unless
the homes were located very close to the roadways. The RIOPA dataset was
designed to contain a substantial number of homes within 0.5 km of mobile sources,
thereby allowing examination of the effect of proximity to mobile source emissions in
a northeastern urban environment, Elizabeth, NJ.
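The regression form described above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions, not the analysis code used for this report: the function name fit_log_linear, the choice of ordinary least squares, and the synthetic inputs are all hypothetical, and the report's models included several distance and meteorological terms chosen by selection criteria that are not reproduced here.

```python
import numpy as np

def fit_log_linear(conc, dist_m, met):
    """Fit ln(conc) = b0 + b1 * (1 / dist) + met @ b_met by ordinary least squares.

    conc   : ambient concentrations (must be > 0)
    dist_m : distance from each home to the source, in meters
    met    : one or more meteorological covariates (1-D or 2-D array)
    Returns (coefficients, r2). All names here are illustrative assumptions.
    """
    conc = np.asarray(conc, dtype=float)
    inv_dist = 1.0 / np.asarray(dist_m, dtype=float)
    met = np.asarray(met, dtype=float)
    if met.ndim == 1:
        met = met[:, None]
    # Design matrix: intercept, inverse distance, meteorological terms.
    X = np.column_stack([np.ones(len(conc)), inv_dist, met])
    y = np.log(conc)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - resid.dot(resid) / ((y - y.mean()) ** 2).sum()
    return coef, r2

# Synthetic illustration only; these numbers are made up, not RIOPA data.
rng = np.random.default_rng(0)
dist = rng.uniform(20.0, 2000.0, size=200)
wind = rng.uniform(1.0, 8.0, size=200)
ln_c = 0.5 + 30.0 / dist - 0.1 * wind + rng.normal(0.0, 0.05, size=200)
coef, r2 = fit_log_linear(np.exp(ln_c), dist, wind)
print("intercept, 1/distance, wind:", coef, "r2:", r2)
```

Because the inverse distance falls off quickly, a term of this form mainly separates homes very close to a source from all others, which is consistent with the emphasis on homes within 0.5 km.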
METHODOLOGY
Construction of the RIOPA Database
The major components in the RIOPA database were sample information, analysis
results, and questionnaire responses. The database was implemented in Microsoft Access
97® and upgraded in Microsoft Access 2000®. A decomposition process was used to
remove internal duplication in a series of steps without loss of data. Every tabular record
was indexed with a unique data-independent primary key. The unique, data-independent
primary key enables the linking, indexing, filtering and sorting of records in multiple
tables and their components. A normalization process was used to reorganize the data
into a streamlined, effective tabular structure. For decomposition and normalization, the
Access commands 'selection query' and 'make table query' were most frequently used.
To find repetitions of an identical record in a table, the Access command 'find the
duplicate query' was used, while 'find unmatched query' was used to determine when
data were missing. Establishing relationships between tables by assigning a unique
primary key, such as an identification field, was essential for database performance.
Each sampling home and sample was assigned a unique identification number
(ID) prior to collecting the sample. Each unique sample number was linked to the home
ID so the samples associated with each home could be identified. The home ID was
coded to identify the state the sample was collected in using the two letter state
abbreviations (CA, TX, and NJ), followed by a three-digit number unique for each home
in that state. Within the three-digit number, the first digit indicated whether the visit
was the first or a repeat visit (1 or 2, respectively), and the second and third digits gave the
chronological order in which the house was selected (00-99). A unique five-digit sequential
number was assigned to each sample as the sample identifier. The first digit of the sample ID
was reserved to identify sample types, while the remaining digits were randomly assigned so
that the analyst could not determine where the sample came from or the sample type
(indoor, outdoor, personal, blank, duplicate) prior to analysis. The descriptions of the data
fields contained in the RIOPA database are listed in Tables 1 to 3.
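The home ID scheme described above can be decoded mechanically. The sketch below assumes the ID is written as the two-letter state abbreviation immediately followed by the three digits (for example "NJ203"); the exact written form, the function name parse_home_id, and the HomeVisit type are assumptions for illustration and are not taken from the RIOPA database.

```python
import re
from typing import NamedTuple

class HomeVisit(NamedTuple):
    state: str       # "CA", "TX", or "NJ"
    visit: int       # 1 = first visit, 2 = repeat visit
    home_order: int  # chronological selection order, 0-99

# Assumed layout: state abbreviation + visit digit + two-digit order, no separator.
_HOME_ID = re.compile(r"^(CA|TX|NJ)([12])(\d{2})$")

def parse_home_id(home_id: str) -> HomeVisit:
    """Split a home ID into state, visit number, and selection order."""
    match = _HOME_ID.match(home_id.strip().upper())
    if match is None:
        raise ValueError(f"not a valid home ID: {home_id!r}")
    state, visit, order = match.groups()
    return HomeVisit(state, int(visit), int(order))

print(parse_home_id("NJ203"))
```

Validating IDs this way at data entry catches transposed or truncated identifiers before they break the links between the sample and home tables.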
Quality Assurance of Database
The following quality assurance protocols were followed at each data entry and
modification step to find data entry errors and repeated or missing data. All the sampling
information, analysis results and questionnaire data were transferred into the database.
Quality assurance at the data entry level was performed by having an individual who did
not enter the data compare the original written sampling records to the electronic data
files. Validation equations were used in Access Query to identify potential data entry
errors, especially for the fields containing calculated values. Access commands were used
to find duplicate data entries ("find the duplicate query"), which were deleted from the
database, and missing data across different tables ("find the unmatched query").
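For readers who do not use Access, the two checks above (duplicate keys within a table, and records in one table with no match in another) can be sketched in plain Python. This is an analogy to the Access query wizards, not a reproduction of them; the function names, the dictionary-based records, and the sample values are hypothetical.

```python
from collections import Counter

def find_duplicates(records, key):
    """Return the key values that occur more than once in a table."""
    counts = Counter(record[key] for record in records)
    return sorted(value for value, n in counts.items() if n > 1)

def find_unmatched(left, right, key):
    """Return records in `left` whose key has no counterpart in `right`."""
    right_keys = {record[key] for record in right}
    return [record for record in left if record[key] not in right_keys]

# Made-up rows for illustration only.
samples = [
    {"sample_id": "10234", "home_id": "NJ101"},
    {"sample_id": "10234", "home_id": "NJ101"},  # duplicate entry
    {"sample_id": "20877", "home_id": "NJ102"},
]
results = [{"sample_id": "10234", "benzene": 1.2}]

print(find_duplicates(samples, "sample_id"))
print(find_unmatched(samples, results, "sample_id"))
```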
All detailed information concerning the sample collection, sample analysis and
questionnaires were consolidated and compiled into the main database in an organized
manner as illustrated in Figure 1 (adopted from Weisel et al. 2004b, RIOPA final report).
The final database was reviewed by research associates, experienced in analyzing each
specific type of sample. This review included cross checking keyed data entries against
the original printed hard copy of the analytical data. The research associate double-
checked all the calculations used to transform the analytical data into the reported
ambient air concentrations. Finalized data were confirmed by reapplying all of the
calculations to the original analytical data. After the research associate completed his or
her verification, the initial database was then classified as the preliminary database.
The field teams validated the preliminary database by reviewing the field
sampling information and confirming the calculations that incorporated the information
from the field sampling sheets. The field teams then made any necessary corrections and
noted the change, which was then reported back to the originator for further confirmation
of the needed correction. After the field teams made their comments and corrections, the
principal investigators randomly checked the data by cross-referencing the electronic data
for a subset of samples with the respective original data from the analytical results or
sampling information sheets.
-------
Table 1. Components and Data Fields of the Sampling Information in the RIOPA Database

Data Fields: Description
Home ID: Unique identification number with state abbreviation
Source ID, location: PFT source ID number (alpha-numeric) and location (floor-room)
CAT ID: Capillary absorbent tube ID (numeric)
Sample ID: Unique 5 digit number linked to the house ID, identifying contaminant category measured (VOC, DNPH, DNSH, Teflon and Quartz filter for PM2.5, PUF)
Sampling date, time: Date (mm/dd/yy) and time (hh:mm) sampling started and ended
Sample duration: Calculated duration of sampling in minutes
Flow rate: Initial, final and average flow rate of pump (cc/min, or L/min)
Sample volume: Calculated volume of sample (L, or m3)
Pump elapsed time: Pump elapsed time recorded on the pump counter in minutes
Pump recorded volume: Pump recorded volume of air sampled (m3)
Sample type: Sample type (indoor, outdoor, personal adult, child, duplicate, blank, control)
Equipment ID: Pump, head and battery IDs
Leak test: Leak test check done before and after sampling (yes/no)
-------
Table 2. Components and Data Fields of the Information of the Analysis Results in the RIOPA Database

Data Fields: Description
VOCs: Concentration (ppb, µg/m3) of 1,3-butadiene, methylene chloride, chloroprene, methyl tert butyl ether, carbon tetrachloride, chloroform, benzene, m,p-xylene, toluene, trichloroethylene, tetrachloroethylene, ethylbenzene, o-xylene, styrene, α-pinene, β-pinene, d-limonene, 1,4-dichlorobenzene
Carbonyls: Concentration (ppb, µg/m3) of formaldehyde, acetaldehyde, acetone, acrolein, propionaldehyde, crotonaldehyde, benzaldehyde, hexaldehyde, glyoxal, methylglyoxal
PM2.5: PM2.5 mass, Concentration (ppb, µg/m3) of organic carbon (OC) and elemental carbon (EC), elements; Ag, Al, As, Ba, Be, Bi, Br, Ca, Cd, Cl, Co, Cr, Cs, Cu, Fe, Ga, Ge, Hg, In, K, La, Mn, Mo, Ni, P, Pb, Pd, Rb, S, Sb, Se, Si, Sn, Sr, Ti, Tl, U, V, Y, Zn, Zr
PAHs: Concentration (ppb, µg/m3) of gas/particle phase polycyclic aromatic hydrocarbons; Dibenzothiophene, Phenanthrene, Anthracene, 2-Methylanthracene, 1-Methylanthracene, 1-Methylphenanthrene, 9-Methylanthracene, 4,5-Methylenephenanthrene, 3,6-Dimethylphenanthrene, 9,10-Dimethylanthracene, Fluoranthene, Pyrene, Benzo[a]fluorene, Retene, Benzo[b]fluorene, Cyclopenta[c,d]pyrene, Benzo[a]anthracene, Chrysene+Triphenylene, Benzo[b]naphtho[2,1-d]thiophene, Benzo[b+k]fluoranthene, Benzo[e]pyrene, Benzo[a]pyrene, Perylene, Indeno[1,2,3-c,d]pyrene, Dibenzo[a,c+a,h]anthracene, Benzo[g,h,i]perylene, Coronene
House Information: Air exchange rate (1/hr) and the volume of house (m3)
Meteorological Information: Temperature and relative humidity measured inside and outside of house
-------
Table 3. Components and Data Fields of the Questionnaire Data in the RIOPA Database

Data Fields: Description
Technician Walkthrough: Evaluation of the house and its usage and a description of the neighborhood regarding possible sources.
Baseline Survey: Household and participant characteristics; demographics and socioeconomic status; housing characteristics, facilities and usage; personal exposure activities before the study period; and respiratory health status of participant
Activity Questionnaire: A detailed series of questions related to activities, duration and use of consumer products
Time Diary: 48-hour activity log listing the time spent in each microenvironment
-------
[Figure 1: flow diagram. Elements shown: field sampling/data collection (questionnaires, field sampling information sheets, samples), questionnaire database (restricted access), initial database, initial verification by research associates, preliminary database, verification and comments by field teams, final verification by PIs, sub-databases (CA, NJ, TX), and analysis (summary tables, plots, statistical analysis).]
Figure 1. The Flow Diagram of the Transference of Information from the Field Sampling to Database Construction and
the Quality Assurance Processes (adopted from Final Report of the RIOPA study)
-------
Data Integration in the RIOPA Database
To expand the utility of the RIOPA database and to facilitate data analysis with
meteorological and geographical datasets, several public-domain databases were
either imported into or linked to the RIOPA database. The details of the integration of the
databases are illustrated in Figure 2. The databases included were the National Emission
Inventory of 1999 (version 3.0 final for HAPs and criteria pollutants, US EPA), National
Climatological data obtained from the National Oceanic and Atmospheric
Administration (NOAA), 2000 US Census data, 2000 TIGER/Line data, and Roadway
Information & Transportation data obtained from NJ DOT (Table 4).
National Emission Inventory of 1999
The emission data for the states of New Jersey and New York from mobile, area,
and point sources were obtained from the 1999 National Emission Inventory (NEI; the
final version 3.0 for the hazardous air pollutants, released in December 2003, and the final
version 3.0 for the criteria pollutants, released in February 2004). The datasets were divided into four
categories (On-road, Non-road, Point, Non-Point) and were available from the Technology
Transfer Network, Clearinghouse for Inventories and Emission Factors (TTN CHIEF,
http://www.epa.gov/ttn/chief/net/1999inventory.html). The emission sources of
compounds collected in the RIOPA study were selected from the inventory datasets of
the counties containing or adjacent to the RIOPA study area. The counties were Union,
Essex, and Hudson Counties, New Jersey, and Richmond County, New York.
The emissions from on road mobile sources were calculated to evaluate which
road types to consider in the regression models. Actual emission rates were not used as
inputs in the models since only statistical associations were examined in this analysis and
not a comparison of predicted to measured concentrations. The emissions were
calculated by multiplying emission factors (g/mile), estimated by US EPA using the MOBILE
6.2 model, by vehicle miles traveled (VMT, 10^6 miles). The VMT were estimated from
the sampled traffic counts of road segments in the Federal Highway Administration
(FHWA)'s Highway Statistics 1999 (US EPA, Documents for NEI, 2003). The emission
estimates for each county were stratified by road types (6 urban categories of public roads
were present in Union County) and by twelve vehicle types. The emission rate per unit
length of public road by functional classification was estimated from the total roadway
mileages of Union County and the annual total emission from on-road sources in Union
County. The emission rates of selected VOCs by roadway class are listed in Table 5. The
emission rates calculated for the major roadways (FC11, urban interstate highways;
FC12, urban other freeways and expressways; FC14, urban major arterials) were
roughly 6 to 90 fold higher than the emission rates for the minor classes of roadways
(FC16, urban minor arterial; FC17, urban collector; and FC19, urban local). To apportion
the annual total amount of emissions from the on-road mobile sources countywide to
Elizabeth, the ratio of the roadway mileage in Elizabeth to the roadway mileage in Union
County was calculated for each category of functional classification. The public roadway
lengths in Elizabeth were 11.5, 3.7, and 22.3 kilometers for FC11, FC12, and FC14,
respectively. The percentages of the public roadway miles in Elizabeth classified as
FC11, FC12 and FC14 were 3.5%, 11% and 6.7%, respectively. The proportion of major
roadway miles in Elizabeth was larger than that in Union County as a whole (FC11,
1.4%; FC14, 3.6%), in New York-Northeast New Jersey (FC11, 1.3%; FC14, 5.4%) and
in the composite of New Jersey urban areas (FC11, 1.2%; FC14, 5.4%). As a result, the
proportion of urban local roads (FC19) was lower (64%) than the proportion of local
roads in the other areas mentioned (over 70%) (Table 6). The largest
contributions to on-road source emissions in Elizabeth were from roadways of FC14
(about 33%), followed by contributions from roadways of FC11 (about 30%). More than
75% of aromatic compounds and MTBE were emitted from major roadways (FC14,
FC11, FC12) according to the emission inventory data and public roadway information of
New Jersey.
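The arithmetic behind the on-road estimates (emission factor times VMT, conversion to a rate per unit road length, and apportionment to Elizabeth by roadway mileage ratio) can be sketched as follows. The function names and the example inputs are hypothetical, and the unit conversions are an assumed route to the µg/sec-m units of Table 5; the report does not list the exact steps or input values.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
METERS_PER_MILE = 1609.344

def annual_emission_g(ef_g_per_mile, vmt_million_miles):
    """Annual emission (g) = emission factor (g/mile) x VMT (10^6 miles)."""
    return ef_g_per_mile * vmt_million_miles * 1e6

def emission_rate_ug_per_s_m(annual_g, road_miles):
    """Spread an annual emission evenly over time and road length (ug/s-m)."""
    return annual_g * 1e6 / (SECONDS_PER_YEAR * road_miles * METERS_PER_MILE)

def apportion_by_mileage(county_annual_g, city_road_miles, county_road_miles):
    """Assign the city a share of the county total equal to its mileage share."""
    return county_annual_g * city_road_miles / county_road_miles

# Made-up inputs for one roadway class; not values from the NEI.
county_g = annual_emission_g(ef_g_per_mile=0.05, vmt_million_miles=400.0)
print(emission_rate_ug_per_s_m(county_g, road_miles=20.0))
print(apportion_by_mileage(county_g, city_road_miles=7.0, county_road_miles=20.0))
```

Apportioning by mileage within each functional class assumes traffic density on a given class of road is uniform across the county, which is the simplification implied by using a mileage ratio.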
Emissions from a specific area source were estimated from the annual emission
estimate for Elizabeth divided by the total number of area sources in Union County. The
population ratio of Elizabeth to Union County was used to apportion the annual emissions
for specific area sources to Elizabeth. The National Emission Inventory of point sources
provided the annual emissions and the coordinates of each source. The daily emission from a point
source was estimated by dividing the annual total by 365 days, which assumes that the
facility operated every day. Emissions from non-road mobile sources were ignored
because the contribution of non-road sources (lawn and garden equipment, snowmobiles,
snow blowers, construction equipment, etc.) in the study area, an urban center, was
considerably lower than the on-road emissions or the non-road emissions in the more
suburban regions of Union County.
A number of non-point sources for diesel emissions were identified in and near
Elizabeth, NJ. These included a truck depot and a bus depot in northeast Elizabeth, the
Port Authority Marine Terminal in east Elizabeth, and Newark Liberty International
Airport, located north-northeast of Elizabeth. All of these locations were north to northeast
of the majority of sampling locations, though the truck and bus depots were close to
a subset of homes. No residences are located within either the seaport or the airport.
The Meteorological Data for New Jersey
Surface Observation Data
Meteorological data for Elizabeth, New Jersey, were obtained from
NCDC/NOAA (National Climatic Data Center, National Oceanic and Atmospheric
Administration). The data are part of the quality assured national climatological database.
The datasets contain hourly observation tables, along with daily and monthly summary
tables covering the entire period of the RIOPA Study. The hourly observation datasets
were used because those could be matched to the exact 48-hour sampling time of
individual samples. The ASCII data files were linked to the RIOPA weather database for
data extraction. First, the meteorological data were selected from the observation station
closest to the study area, the Weather-Bureau-Army-Navy (WBAN) station at
Newark Liberty International Airport (EWR, WBAN 14734; latitude 40.72°, longitude
-74.17°). Next, a series of selection queries in Access was used to retrieve the hourly
observations corresponding to each individual sample according to the date and time
the sampling started and ended.
Among the meteorological data extracted, the variables considered as possibly
influencing the ambient air concentrations were the dry bulb temperature (°F), relative
humidity (%), precipitation (inches), station atmospheric pressure (inHg), resultant wind
speed (knots), and resultant wind direction (tens of degrees from true north). The English
units were converted to SI units. The meteorological values averaged over each individual
48-hour sampling period were wind speed (U, m/s), temperature (K), atmospheric
pressure (mmHg), and relative humidity (RH, %). The precipitation was totaled for the
48-hour sampling period.
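The unit conversions and 48-hour summaries described above can be sketched as below. The conversion factors are standard; the function names are hypothetical. The report does not say how wind direction was combined over the sampling period, so the vector average shown here is one common approach and is an assumption, not the documented method.

```python
import math

def fahrenheit_to_kelvin(t_f):
    return (t_f - 32.0) * 5.0 / 9.0 + 273.15

def knots_to_m_per_s(speed_kt):
    return speed_kt * 1852.0 / 3600.0   # 1 knot = 1852 m per hour

def inhg_to_mmhg(p_inhg):
    return p_inhg * 25.4                # 1 inch = 25.4 mm

def mean(values):
    """Simple arithmetic mean over the hourly values of one sampling period."""
    values = list(values)
    return sum(values) / len(values)

def vector_mean_wind(speeds, dirs_deg):
    """Vector-average wind; directions are degrees the wind blows FROM.

    Directions reported in tens of degrees must be multiplied by 10 first.
    Returns (resultant speed, resultant direction in degrees).
    """
    n = len(speeds)
    u = -sum(s * math.sin(math.radians(d)) for s, d in zip(speeds, dirs_deg)) / n
    v = -sum(s * math.cos(math.radians(d)) for s, d in zip(speeds, dirs_deg)) / n
    direction = math.degrees(math.atan2(-u, -v)) % 360.0
    return math.hypot(u, v), direction

# Made-up hourly values for illustration only.
temps_k = [fahrenheit_to_kelvin(t) for t in (68.0, 70.0, 72.0)]
print(mean(temps_k), vector_mean_wind([5.0, 5.0], [90.0, 90.0]))
```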
Mixing Height Data
The mixing height data were obtained from NCDC/NOAA and were computed
using source code made available by the US EPA. The dataset was
computed from the upper air data of Brookhaven, NY and the surface data of Newark,
NJ. Brookhaven, NY was the closest monitoring station to the RIOPA study site
recording upper air data. Mixing heights were reported as AM and PM mixing
heights. The values were averaged for individual homes over the corresponding
48-hour sampling duration.
Atmospheric Pasquill Stability
The atmospheric Pasquill stability classes, with a time resolution of 3 hours, were
retrieved from the NOAA Air Resources Laboratory's READY (Real-time Environmental
Applications and Display sYstem) web site (http://www.arl.noaa.gov/ready.html). The
archived datasets were EDAS (Eta Data Assimilation System) meteorological data
(80 km, 3 hourly, US). The representative coordinates of Elizabeth (latitude 40.65°,
longitude -74.20°) were used as the location. The text results were tabulated and the
stability time-series plots were saved for individual sample dates when available. The
48-hour average stability was calculated from the stability time-series classes for each
sample. The classification of the atmospheric stability is described in Table 7.
-------
Table 4. Description of Integrated Data from Databases in the Public Domain

Databases: Description

National Climatological Data
Hourly observations: ASOS; WBAN number, date, time in local standard time, sky conditions, visibility, significant weather types, dry bulb temperature, dew point temperature, wet bulb temperature, relative humidity, wind speed, wind direction, wind characteristic gusts, value for wind character, station pressure, pressure tendency, sea level pressure, report type, precipitation totals in inches
Hourly precipitation: ASOS; WBAN number, date, time, hourly precipitation
Daily table: ASOS; WBAN number, date, temperature (maximum, minimum, average, departure from normal, average dew point, average wet bulb), degree days (heating, cooling), significant weather types, snow/ice depth and water equivalent, precipitation, snowfall, pressure (average station and average sea level), resultant wind speed, resultant wind direction, average speed, maximum 5 second, 2 minute speed and direction
Mixing height: Morning and afternoon mixing height (meters) produced from surface air and upper air data by NCDC/NOAA
Atmospheric stability: Atmospheric Pasquill stability class from NOAA Air Resources Laboratory

National Emission Inventory Data
On-road sources: County level estimates are stratified by type of roadways and vehicles; NEI for criteria pollutants and HAPs for year 1999 (version 3 final)
Non-road sources: NEI for criteria pollutants and HAPs for year 1999 (version 3 final)
Point sources: County level estimates from registered point sources; NEI for criteria pollutants and HAPs for year 1999 (version 3 final)
Non-point sources: County level estimates of non-point sources; NEI for criteria pollutants and HAPs for year 1999 (version 3 final)

Geographic Information and the Spatial Data
Transportation data: Public roadway mileages, functional class of roadways, vehicle miles traveled by stratified vehicle types; NJ DOT
Census 2000 TIGER data: Line features (roadways, railroads, hydrography etc.), municipality; from US Census Bureau
-------
Table 5. Estimated Emission Rates (µg/sec-m) of Selected VOCs for Public Roadways of
Union County by its Functional Classes. (Estimation based on 1999 NEI v3 Final)

VOCs           FC11    FC12    FC14    FC16    FC17    FC19
Xylene         50.9    69.8    29.2    5.0     3.4     0.9
Toluene        88.2    121.0   50.4    8.7     5.9     1.6
MTBE           44.4    60.9    25.6    4.4     3.0     0.8
Benzene        32.2    44.1    18.0    3.1     2.1     0.5
Ethylbenzene   13.3    18.3    7.7     1.3     0.9     0.2
Formaldehyde   17.6    24.2    11.1    1.92    1.31    0.3
Acetaldehyde   5.12    7.02    3.22    0.56    0.38    0.1
Acrolein       0.70    0.95    0.49    0.09    0.06    0.01

Table 6. Percent Contribution of On-road Source Emission by Roadway Types in the City
of Elizabeth, NJ (Estimation based on 1999 NEI v3 Final)

VOCs           FC11    FC12    FC14    FC16    FC17    FC19    Total
Xylene         29.4    13.1    32.6    9.9     5.2     9.8     100
Toluene        29.5    13.1    32.6    9.9     5.2     9.7     100
MTBE           29.2    13.0    32.5    9.9     5.2     10.2    100
Benzene        29.8    13.3    32.4    9.8     5.2     9.4     100
Ethylbenzene   29.2    13.0    32.7    9.9     5.2     9.9     100
Formaldehyde   17.4    14.3    27.4    16.5    7.2     17.1    100
Acetaldehyde   17.4    14.4    27.4    16.5    7.2     17.0    100
Acrolein       16.0    13.2    28.4    17.1    7.5     17.8    100
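The comparisons drawn in the text from Tables 5 and 6 (how many times higher the major-roadway emission rates are than the minor-roadway rates, and what share of on-road emissions in Elizabeth comes from FC11, FC12 and FC14) can be recomputed directly from the tabulated values. The sketch below uses the table entries as given; the function names and the choice to compare the lowest major rate with the highest minor rate, and the highest major rate with the lowest minor rate, are assumptions about how such a range might be formed.

```python
ROAD_CLASSES = ("FC11", "FC12", "FC14", "FC16", "FC17", "FC19")
MAJOR = ("FC11", "FC12", "FC14")

# Table 5: emission rates (ug/sec-m), in ROAD_CLASSES order.
TABLE5 = {
    "Xylene": (50.9, 69.8, 29.2, 5.0, 3.4, 0.9),
    "Toluene": (88.2, 121.0, 50.4, 8.7, 5.9, 1.6),
    "MTBE": (44.4, 60.9, 25.6, 4.4, 3.0, 0.8),
    "Benzene": (32.2, 44.1, 18.0, 3.1, 2.1, 0.5),
    "Ethylbenzene": (13.3, 18.3, 7.7, 1.3, 0.9, 0.2),
    "Formaldehyde": (17.6, 24.2, 11.1, 1.92, 1.31, 0.3),
    "Acetaldehyde": (5.12, 7.02, 3.22, 0.56, 0.38, 0.1),
    "Acrolein": (0.70, 0.95, 0.49, 0.09, 0.06, 0.01),
}

# Table 6: percent contribution in Elizabeth, in ROAD_CLASSES order.
TABLE6 = {
    "Xylene": (29.4, 13.1, 32.6, 9.9, 5.2, 9.8),
    "Toluene": (29.5, 13.1, 32.6, 9.9, 5.2, 9.7),
    "MTBE": (29.2, 13.0, 32.5, 9.9, 5.2, 10.2),
    "Benzene": (29.8, 13.3, 32.4, 9.8, 5.2, 9.4),
    "Ethylbenzene": (29.2, 13.0, 32.7, 9.9, 5.2, 9.9),
    "Formaldehyde": (17.4, 14.3, 27.4, 16.5, 7.2, 17.1),
    "Acetaldehyde": (17.4, 14.4, 27.4, 16.5, 7.2, 17.0),
    "Acrolein": (16.0, 13.2, 28.4, 17.1, 7.5, 17.8),
}

def fold_range(rates):
    """Smallest and largest major-to-minor emission rate ratios for one compound."""
    by_class = dict(zip(ROAD_CLASSES, rates))
    major = [by_class[c] for c in MAJOR]
    minor = [v for c, v in by_class.items() if c not in MAJOR]
    return min(major) / max(minor), max(major) / min(minor)

def major_share(percents):
    """Percent of on-road emissions attributed to FC11 + FC12 + FC14."""
    by_class = dict(zip(ROAD_CLASSES, percents))
    return sum(by_class[c] for c in MAJOR)

for voc in TABLE5:
    low, high = fold_range(TABLE5[voc])
    print(f"{voc}: {low:.1f} to {high:.1f} fold; major roads {major_share(TABLE6[voc]):.1f}%")
```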
-------
Table 7. The Description of the Classification of the Atmospheric Pasquill Stability
Pasquill Stability Class Description Coded
A Extremely unstable conditions 1
B Moderately unstable conditions 2
C Slightly unstable conditions 3
D Neutral conditions 4
E Slightly stable conditions 5
F Moderately stable conditions 6
G Extremely stable 7
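Using the numeric coding in Table 7, a 48-hour average stability can be computed from the 3-hourly class series (16 values per sample). The sketch below is a minimal illustration; the function name mean_stability and the example series are hypothetical, and the handling of missing 3-hour periods in the actual analysis is not described in the report.

```python
# Numeric codes from Table 7.
PASQUILL_CODE = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7}

def mean_stability(classes):
    """Average the coded Pasquill classes over one sampling period."""
    codes = [PASQUILL_CODE[c.upper()] for c in classes]
    if not codes:
        raise ValueError("no stability classes supplied")
    return sum(codes) / len(codes)

# Made-up 48-hour series at 3-hour resolution (16 values).
series = ["D", "E", "F", "F", "D", "C", "B", "D"] * 2
print(mean_stability(series))
```

Averaging coded classes treats the A to G scale as evenly spaced, which is a simplification inherent in using a single mean value per sample.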
-------
[Figure 2: flow diagram of data collection and integration processes. Elements shown: National Emission Inventory (HAPs and criteria pollutants, 1999; on-road, non-road, point, and non-point sources), with data extraction by state, county, and pollutant and estimation of emission rates; National Climatological Data (NOAA ASOS data, 1999 to 2001; hourly, daily, and monthly tables; mixing height and atmospheric stability), with extraction for sampled dates/times and calculation of 48-hour averages; Geographical Information (2000 U.S. Census data, 2000 TIGER/Line data, transportation data from NJ DOT, list of small businesses from the HAZMAT Team of Union Co, NJ, digital ortho quarter quadrangles), with layer overlays, visual verification of locations, and confirmation of sampler locations and area sources by re-visiting with GPS; GIS layers and geo-database (proximity calculation, spatial analysis); RIOPA database (sampling information, sample analysis data, questionnaire database); and data analysis (statistical analysis).]
Figure 2. Data Integration Processes of the Public Databases into the RIOPA Database for Data Analysis of New Jersey Site
-------
Geographical Information Systems
ArcView GIS (version 3.1, ESRI, Inc.) was used to build the geographical inputs
for statistical analysis. The spatial analyst extension was used for geoprocessing operations such as
dissolve, merge, clip, union, spatial join, and select themes. Downloaded scripts were
used to measure the distances between geographical locations. Geographic
coordinates were assigned using the ArcScript Addxycoo (ESRI) in the NAD83 (North
American Datum 1983), New Jersey State Plane 1983 projection, with units of decimal
degrees and feet. The GIS application itself provided a powerful database tool for integration of
datasets by joining and linking databases.
Census 2000 TIGER/Line® Datasets
The Census 2000 TIGER® (Topologically Integrated Geographic Encoding and
Referencing system) datasets were downloaded from the Geography Network (US
Census Bureau, Geography Division, http://www.census.gov/geo/www/tiger). The line
features included were roads, railroads, and hydrography. The polygon features were
municipal boundaries such as county, township, and city borderlines. The resulting map
included the spatial data of Union County, NJ, and also the spatial features of the
adjacent counties (Essex and Hudson Counties, NJ, and Richmond County, NY), since the
proximity information and source emissions were also reviewed for these counties
(Figure 3).
-------
Digital Images
Digital orthoquarter quadrangles (DOQQs) combine the image characteristics of a
photograph with the geometric qualities of a map. The primary digital orthophotoquad is
a 1-meter ground resolution, quarter-quadrangle (3.75 minutes of latitude by 3.75 minutes
of longitude) image cast on the Universal Transverse Mercator (UTM) projection on the
North American Datum of 1983 (NAD83). For the RIOPA study area in New Jersey, the
corresponding 1997 DOQQs were downloaded from the New Jersey Image Warehouse
site of the NJ DEP, Bureau of GIS (http://njgin.nj.gov/OITJW/index.jsp). The
downloaded DOQQs are listed in Table 8. Figure 4 illustrates the digital image of the
City of Elizabeth with municipal borderlines.
New Jersey Road Network
The functional classes of roadways (Table 9) in Elizabeth were obtained from the
functional classification map of Union County from the Bureau of Transportation Data
and Development of the New Jersey Department of Transportation
(http://www.state.nj.us/transportation/refdata). The functional class information was
assigned to the appropriate road segments using the roadway line feature layer of the
ArcView GIS® project file. The Straight Line Diagrams provided a graphical
representation of state, toll, and county roads and showed intersecting streets and
administrative and geometric characteristics. They also provided the width of the
roadways for estimating the general offset distance from the centerline of roadways. The
offset distance used was one half the roadway width and was required to specify the
location of each home relative to the roadway centerline. This allowed the home to be
placed on the correct side of the roadway rather than on the centerline, and allowed the
distance from the home to the centerline of the roadway to be calculated. Offset
distances of 20 to 30 meters were used based on the functional classes of the roadways.
The customized map of the public roadways in the study area is illustrated in Figure 5.
Location of Area and Point Sources
Lists of the street locations of service stations were obtained from visual
observation and written records made during the sampling, from web sites that list
gasoline stations by zip code for price comparison
(http://www.gaspricewatch.com/USGas_index.asp), and from the yellow pages
(http://www.yellowbook.com) for Elizabeth, New Jersey. After combining and
comparing the information contained in these lists, it was determined that a more
reliable compilation was still needed. This was obtained from the Emergency
Response/HAZMAT of Union County, Division of Environmental Health and
Emergency Management. A list of the dry cleaning facilities actually operating in the
City of Elizabeth, NJ, was also obtained from the HAZMAT Team of Union County.
Figures 6 and 7 are maps of the gas stations and dry cleaning facilities identified and
located in the study area. The latitude and longitude of the point sources identified in the
study area were taken from the emission inventory database (Table 10) and used to
generate customized maps by making event themes (Figure 8).
-------
Quality Assurance of Geographical Data
To evaluate the effect of proximity and meteorological conditions simultaneously,
the relative locations of sources and sampling sites should be defined precisely. All
downloaded geographical layers were overlaid on the New Jersey State Plane of NAD83.
TIGER maps placed road centerlines substantial distances (15 to more than 50 meters)
from their actual locations based on aerial photos (DOQQs). Therefore, to obtain the
accuracy needed for the proximity data, TIGER data were evaluated before geocoding and
calculating the distance between road centerline and receptor location. The errors in the
2000 TIGER/Line® data were corrected by following the centerlines of the roadways
observed on the overlaid DOQQs as reference themes. The point themes were finalized
after correcting the locations based on the street information collected during
confirmation trips made by driving to each address listed in the RIOPA dataset, the
digital orthophotos, the Elizabeth City engineer's map, pictures taken during the
sampling, and the GPS readings from the confirmation trip. The GPS unit used to read the
coordinates was a GeoStats wearable GeoLogger™. The GPS reading was used solely as an
aid to locate the houses during the quality assurance visit to Elizabeth. The values
retrieved from the GPS were not used in the data analysis; rather, the longitude and
latitude obtained from the GIS mapping were used. The corrected point themes were the
locations of the outdoor samplers, point sources, gas stations, and dry cleaning
facilities (Figures 5-8). Approximate receptor (outdoor sampler) locations are given for
each residence in Figure 9 for illustration purposes, to maintain the confidentiality of
the subjects; the actual location coordinates were used to determine proximity to sources.
-------
Measurement and Calculation of Geographical Data
The locations of the residences, point sources, and area sources were determined
by the address-matching technique within ArcView on the corrected and quality-assured
line files from Census 2000 TIGER/Line® as the reference theme, using US streets with
zones. The spatial coordinates of the point themes, such as residences, point sources, gas
stations, and dry cleaning facilities, were determined by "Addxycoo", a commonly used
ArcScript. The distances from point theme to point theme and from point theme to line
theme were measured by "the nearest features", an extension available on ESRI's site for
ArcScripts (http://arcscripts.esri.com).
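The two distance measurements described above (point theme to point theme, and point theme to line theme) can be illustrated with a minimal sketch in planar coordinates. This is not the ArcScript used in the study; the function names and the coordinates below are hypothetical, and the sketch assumes that all locations are already projected into one planar system such as State Plane feet.

```python
import math

def point_to_point(p, q):
    """Straight-line distance between two planar points (same units as input)."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def point_to_segment(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: both end points coincide.
        return point_to_point(p, a)
    # Project p onto the segment and clamp the projection to the end points.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return point_to_point(p, (a[0] + t * dx, a[1] + t * dy))

def point_to_polyline(p, vertices):
    """Shortest distance from p to a road centerline given as a list of vertices."""
    return min(point_to_segment(p, a, b) for a, b in zip(vertices, vertices[1:]))

# Hypothetical coordinates in feet (not taken from the RIOPA data).
home = (100.0, 50.0)
road_centerline = [(0.0, 0.0), (200.0, 0.0), (200.0, 300.0)]
gas_station = (400.0, 450.0)

d_road = point_to_polyline(home, road_centerline)   # point theme to line theme
d_station = point_to_point(home, gas_station)       # point theme to point theme
print(d_road, d_station)
```

Working in a projected planar system keeps the distance a simple Euclidean calculation, which is a reasonable approximation over a city-sized study area.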
-------
Table 8. The List of Digital Orthoquarter Quadrangles Used in this Study for Quality
Assurance (Source: NJ DEP)
QQ Number    QQ Name
514          SE ROSELLE NJ
521          NW ELIZABETH NJ-NY
522          NE ELIZABETH NJ-NY
523          SW ELIZABETH NJ-NY
524          SE ELIZABETH NJ-NY
Table 9. The Functional Classification of Public Roadways in Urban Area (Source: NJ
DOT)
Functional Class    Description
FC 11               Urban Interstate Highways
FC 12               Urban Other Highways/Freeways
FC 14               Urban Major Arterial
FC 16               Urban Minor Arterial
FC 17               Urban Collector
FC 19               Urban Local
-------
Table 10. The Point Sources of Selected VOCs Used for Data Analysis (Source: 1999
NEI for HAPs version 3 Final)
PS ID       Emissions   Facility/Process                    X         Y
Xyl_PS1     1.95        Refinery                            -74.22    40.64
Xyl_PS2     0.94        Tanker Terminal                     -74.25    40.63
Xyl_PS3     0.91        Industry                            -74.19    40.67
Xyl_PS4     0.24        Aviation Service                    -74.17    40.70
Tol_PS1     4.05        Refinery                            -74.22    40.64
Tol_PS2     3.03        Tanker Terminal                     -74.25    40.63
Tol_PS3     2.14        Industry                            -74.19    40.69
Tol_PS4     0.50        Industry                            -74.22    40.63
Tol_PS5     0.38        Aviation Service                    -74.17    40.70
Bzn_PS1     4.55        Refinery                            -74.22    40.64
Bzn_PS2     1.73        Tanker Terminal                     -74.25    40.63
Bzn_PS3     0.20        Joint Meeting of Essex and Union    -74.20    40.64
Bzn_PS4     0.10        Aviation Service                    -74.17    40.70
Ebz_PS1     0.60        Refinery                            -74.22    40.64
Ebz_PS2     0.27        Industry                            -74.19    40.67
MTBE_PS1    43.50       Refinery                            -74.22    40.64
PCE_PS1     1.03        Refinery                            -74.22    40.64
PS ID: Point Source ID; Emissions: annual total generation in metric tons; X:
Longitude; Y: Latitude.
-------
Figure 3. The Location of Union County and City of Elizabeth in New Jersey
-------
Figure 4. Digital Image of Study Area, the City of Elizabeth, New Jersey (Source of DOQQs: NJ DEP, jpeg97)
-------
[Map not reproduced. Legend: major roadways by functional class (FC11, FC12, FC14).]
Figure 5. Major Public Roadways in Study Area, the City of Elizabeth, New Jersey
-------
Figure 6. Identified Gas Stations in Study Area, the City of Elizabeth, New Jersey
(Source: HAZMAT List of Union County)
-------
Figure 7. Identified Dry Cleaning Facilities in Study Area, the City of Elizabeth, New
Jersey (Source: HAZMAT List of Union County)
-------
[Map not reproduced. Legend: point sources by annual emissions (<1 ton, 1 - 4.55 ton,
> 4.55 ton).]
Figure 8. Identified and Selected Point Sources of VOCs Studied in Study Area, the
City of Elizabeth, New Jersey (Source: 1999 NEI for HAPs, Version 3
Final)
-------
Figure 9. Approximate Locations of Outdoor Samplers in the City of
Elizabeth, New Jersey (Locations randomly shifted by small amount for
illustration purpose to preserve subjects' confidentiality, Source: RIOPA
Questionnaire Database, 2003)
-------
Statistical Analysis
Statistical Treatment of Data
The SAS system for Windows (version 8.02) and SPSS for Windows (version
12.0) were used for all statistical analyses. The blank-subtracted, temperature-adjusted,
and uncensored ambient air concentrations (µg/m3) of the selected air toxics and PM2.5
were evaluated.
The distributions of the residential ambient air concentrations were examined by
the one-sample Kolmogorov-Smirnov (K-S) test to evaluate their normality. Natural log
transformation of the concentrations was performed because it provided distributions that
were closer to a normal distribution, with more constant variance, than the untransformed
concentrations. Any zero values in the uncensored dataset were replaced with one half
the minimum detection limit prior to the statistical analysis.
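The zero replacement and log transformation described above can be sketched as follows. The function name, the concentrations, and the detection limit are hypothetical and serve only to illustrate the two steps; they are not values from the RIOPA dataset.

```python
import math

def ln_transform(concentrations, detection_limit):
    """Replace zero values with half the detection limit, then take natural logs."""
    half_mdl = detection_limit / 2.0
    return [math.log(c if c > 0 else half_mdl) for c in concentrations]

# Hypothetical concentrations (ug/m3) and minimum detection limit.
conc = [3.2, 0.0, 1.5]
mdl = 0.4
ln_conc = ln_transform(conc, mdl)
print(ln_conc)
```

The substitution is needed because the natural logarithm of zero is undefined; half the detection limit is the replacement stated in the text.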
The sample means, standard deviations, median, percentiles, the minimum and
maximum values for the variables were computed. The scatter plots of residential
ambient air concentration and each independent variable were examined for obvious
associations.
Bivariate Pearson correlation coefficients and their significance were computed to
examine the correlations between the response variables and the predictor variables, as
a preliminary selection of the more influential explanatory predictors among the groups
of candidate variables. Correlations were examined for the untransformed,
ln-transformed, inverse, squared, and inverse-squared forms of the concentration and
predictor variables.
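A minimal sketch of this screening step is shown below: the Pearson correlation between ln-transformed concentration and several transformations of a distance predictor. The data are hypothetical, the helper name `pearson_r` is an assumption of this sketch, and the significance test that SAS and SPSS report alongside each coefficient is omitted.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical distances to a roadway (km) and concentrations (ug/m3).
distance_km = [0.1, 0.2, 0.5, 1.0, 2.0]
conc = [4.0, 2.5, 1.6, 1.2, 1.0]
ln_conc = [math.log(c) for c in conc]

# Candidate transformations of the predictor, as listed in the text.
candidates = {
    "untransformed": distance_km,
    "ln": [math.log(d) for d in distance_km],
    "inverse": [1.0 / d for d in distance_km],
    "squared": [d ** 2 for d in distance_km],
    "inverse squared": [1.0 / d ** 2 for d in distance_km],
}
correlations = {name: pearson_r(values, ln_conc) for name, values in candidates.items()}
for name, r in correlations.items():
    print(f"{name:16s} r = {r:+.3f}")
```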
Two samples were collected from most homes several months apart, and on most
days one or two homes were visited, though occasionally three or four homes were sampled
on a single day. The Mixed Model procedure (PROC MIXED) in SAS was run with home
identification number and with date as the repeated measure to evaluate whether multiple
samples at the same location or date affected the results. No effect was observed.
Multiple Linear Regression Analyses
Multiple regression analysis was used to examine the association between the
ambient air concentrations and the proximity and meteorological variables. A multiple
linear regression equation that expresses the response variable as a linear combination of
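(p - 1) predictor variables has the form:
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_{p-1} X_{i,p-1} + \varepsilon_i$$
where:
$Y_i$ is the response in the $i$th trial
$\beta_0, \beta_1, \ldots, \beta_{p-1}$ are the parameters (regression coefficients)
$X_{i1}, X_{i2}, \ldots, X_{i,p-1}$ are the values of the predictor variables
$\varepsilon_i$ is the error term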
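As an illustration of how the parameters of such an equation are estimated, the sketch below fits an ordinary least-squares model by solving the normal equations in pure Python. It is not the SAS procedure used in the study. The two predictors (an inverse distance and a wind speed), their values, and the coefficients used to generate the response are all hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols_fit(predictors, response):
    """Return [b0, b1, ..., b_{p-1}] minimizing the sum of squared errors."""
    rows = [[1.0] + list(r) for r in predictors]  # leading 1 for the intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, response)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical predictors: (inverse distance to a roadway in 1/km, wind speed in m/s).
X = [(10.0, 2.0), (5.0, 3.0), (2.0, 4.5), (1.0, 3.5), (0.5, 6.0)]
# Hypothetical ln concentrations generated without noise from known coefficients.
true_beta = (1.0, 0.08, -0.2)
y = [true_beta[0] + true_beta[1] * x1 + true_beta[2] * x2 for x1, x2 in X]

beta = ols_fit(X, y)
print(beta)
```

Because the response here is generated without an error term, the fitted coefficients should reproduce the generating values up to rounding; with real data the residuals would then be examined against the assumptions stated in the text.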
This equation assumes that the relationship of the independent variables with the
response variable is linear, and that the error terms are normally distributed with equal
variance. Two of the explanatory variable groups considered important for predicting
residential ambient air concentrations of the selected VOCs were the proximity of a
residence to the emission sources and the corresponding meteorological conditions.
Distances from residences to the mobile, area, and point emission sources identified from
the emission inventories, together with wind speed, atmospheric stability, mixing height,
temperature, relative humidity, precipitation, and atmospheric pressure, were used as the
independent variables.
The predictors associated with elevated ambient air concentrations around
residences were selected using several multiple linear regression methods (forward
selection, backward elimination, stepwise selection, r-square, and maximum r2
improvement) to verify that consistent results were obtained independent of the
selection method used. Final models were determined using stepwise selection. The
default criteria of each method in the SAS program (version 8.02) were used for selecting
variables to be included in the resulting model. The parameter selection criteria used
for forward selection, backward elimination, and stepwise selection were p<0.50, p<0.10,
and p<0.15, respectively. Because the selection criteria differed, the number of
predictors included in the resulting models differed. The models selected by the
different selection methods were compared and evaluated by the p values of the parameter
estimates of the predictor variables and the composition of variables in the model. When
the best-fitting model was selected for a VOC compound, the model and the corresponding
statistics were also evaluated. The equality of error variances of the best-fitting model
was examined visually on the appropriate diagnostic plots and with the computed
statistics. See Appendix A for a discussion of the multicollinearity that was identified
among several meteorological variables.
Identification and Tests of Outlying Observations
Details on how outliers were determined are given in Appendix A. Standardized
residuals were examined against a criterion of tinv(0.95, n-p-1) = ±1.654, based on a
minimum of 170 degrees of freedom, to determine whether a value was a statistical outlier.
The presence of outliers suggests that other processes, not accounted for by the
independent variables selected, were contributing to the concentration or that there was
analytical uncertainty in the measurement. The final model chosen excluded those values
(which were <10% of the measurements) to determine the strength of the model for the data
that could be predicted, as the focus of this analysis was to establish how proximity
affected concentration. A separate analysis could be informative to indicate why outliers
to the regression analysis exist. The actual degrees of freedom for each compound were as
follows: m,p-xylene (171); o-xylene (174); toluene (174); benzene (175); ethylbenzene
(171); MTBE (169); and PCE (161).
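The screening rule can be sketched as below. This is a simplified illustration, not the Appendix A procedure: each residual is scaled only by the root mean squared error (leverage is ignored), the residuals are hypothetical, and the critical value 1.654 is taken directly from the text rather than recomputed.

```python
import math

T_CRIT = 1.654  # tinv(0.95, 170) as quoted in the text

def flag_outliers(residuals, n_params, t_crit=T_CRIT):
    """Flag residuals whose scaled magnitude exceeds the critical t value.

    Simplification: residuals are divided by sqrt(MSE) only, so this is a
    semistudentized residual rather than a fully studentized one.
    """
    n = len(residuals)
    mse = sum(e * e for e in residuals) / (n - n_params)
    s = math.sqrt(mse)
    return [abs(e / s) > t_crit for e in residuals]

# Hypothetical residuals from a model with two estimated parameters.
residuals = [0.1, -0.2, 0.15, -0.1, 0.05, 2.0, -0.05, 0.1, -0.15, 0.1]
flags = flag_outliers(residuals, n_params=2)
print(flags)
```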
To test whether removing the outliers from the multiple regression model biased the
model outcome, that is, to verify that removing the outlying observations did not
eliminate specific conditions or situations, analysis of variance (ANOVA) tests were
used to compare the means of the predictor variables between the group of outliers and
the group of non-outliers. The regression model was then run excluding the outliers to
obtain the final, best-fit equation for each compound.
Diagnostics of Unequal Error Variances and Multicollinearity
To test the assumption of equal error variance, a test for heteroscedasticity was
used to determine whether the error variance was constant over all cases (Neter et al.,
1996). The null hypothesis for this test is that the errors are homoscedastic and
independent of the predictors. Therefore, equal error variance was assumed in the
best-fitting model when the probability (p) of the chi-square test was greater than 0.05.
The multicollinearity, which results from linear interactions between the predictor
variables, was tested because codependency might be detrimental when interpreting the
resulting regression model. First, the bivariate Pearson correlations between pairs of
predictors included in the final models were examined to identify the highly correlated
pairs of the predictors. Second, the magnitude of variance inflation factor (VIF) was
examined to determine if it was greater than 10. Third, the condition index and
eigenvalue were examined from the collinearity diagnostics. A condition index greater
than 100 together with an eigenvalue smaller than 0.01 was considered evidence of
multicollinearity in the model, since those values indicate the presence of highly
correlated variables when the proportion of variation is greater than 0.5.
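The variance inflation factor check can be illustrated for the simplest case of two predictors, where VIF = 1 / (1 - r^2) and r is the Pearson correlation between the two predictors. With more predictors, r^2 is replaced by the R^2 from regressing each predictor on all of the others. The variable names and values below are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

# Hypothetical, strongly related pair (for example two meteorological variables).
stability = [1.0, 2.0, 3.0, 4.0, 5.0]
mixing_height = [2.1, 3.9, 6.2, 7.8, 10.1]
# Hypothetical pair with zero sample correlation.
unrelated = [1.0, -1.0, 0.0, -1.0, 1.0]

vif_high = vif_two_predictors(stability, mixing_height)
vif_low = vif_two_predictors(stability, unrelated)
print(vif_high > 10, vif_low)
```

A VIF above 10, the threshold used in the text, corresponds to a pairwise correlation magnitude of roughly 0.95 or more in the two-predictor case.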
Use of Dummy Variables (for Seasonality)
Three indicator (dummy) variables were introduced into the finalized best-fitting
models of the selected VOCs. To avoid a model that is not of full rank, dummy
variables were generated for spring, summer, and fall only, by assigning 1 for the season
of the sampled date and 0 for the other seasons. Winter is therefore defined by all three
indicator variables being zero.
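The indicator coding can be sketched as follows. The report does not state how dates were assigned to seasons, so the month ranges below (December to February as winter, and so on) are an assumption made only for this illustration, as is the function name.

```python
from datetime import date

def season_dummies(sample_date):
    """Return spring/summer/fall indicators; winter is the all-zero reference.

    Assumed month ranges: Mar-May spring, Jun-Aug summer, Sep-Nov fall.
    """
    month = sample_date.month
    return {
        "spring": int(month in (3, 4, 5)),
        "summer": int(month in (6, 7, 8)),
        "fall": int(month in (9, 10, 11)),
    }

winter_sample = season_dummies(date(2000, 1, 15))
summer_sample = season_dummies(date(2000, 7, 4))
print(winter_sample, summer_sample)
```

Using three indicators for four seasons avoids the exact linear dependence with the intercept that a fourth indicator would introduce.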
-------
RESULTS
Dataset Extraction for Data Analysis
The RIOPA database was integrated with source emission inventory and
meteorological information to provide datasets for statistical data analyses that
contained accurate proximity information on the emission sources for each sample, with
the corresponding meteorological conditions for each 48-hour sampling period. The
blank-subtracted, temperature-adjusted, uncensored residential ambient air concentrations
of the following were examined: selected VOCs (m,p-xylene, o-xylene, toluene, benzene,
ethylbenzene, and MTBE, with carbon tetrachloride and PCE as control compounds);
aldehydes (formaldehyde, acetaldehyde, and acrolein); and particulate matter (PM2.5,
elemental carbon, organic carbon, and two PAHs). The distances from residences to
identified mobile, area, and point sources were determined, as were the averages of the
meteorological variables for each time period in which a sample was collected.
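Averaging hourly meteorological observations over a 48-hour sampling period can be sketched as below. The function name, the start time, and the hourly series are hypothetical; the sketch only shows the windowing and averaging step, not the extraction from the NOAA tables.

```python
from datetime import datetime, timedelta

def window_mean(observations, start, hours=48):
    """Mean of the values observed in [start, start + hours); None if there are none."""
    end = start + timedelta(hours=hours)
    values = [value for timestamp, value in observations if start <= timestamp < end]
    return sum(values) / len(values) if values else None

# Hypothetical hourly series: 72 hours of values equal to the hour index.
sample_start = datetime(2000, 1, 1, 0, 0)
hourly = [(sample_start + timedelta(hours=h), float(h)) for h in range(72)]

mean_48h = window_mean(hourly, sample_start)
print(mean_48h)
```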
Descriptive Statistics
The sample means, standard deviations, median, percentiles, and the maximum
values for the concentrations (µg/m3) of the selected target compounds measured in
residential ambient air are listed in Table 11. The sample means, standard deviations,
median, percentiles, the minimum and maximum values of the closest distances from the
location of the RIOPA sampler to the public roadways by functional class and by
roadway name are listed in Tables 12 and 13, respectively. The sample means, standard
deviations, median, percentiles, the minimum and maximum values of distances from
sampler to the closest area and point sources are listed in Table 14. The sample means,
-------
Table 11. Concentrations of Selected VOCs in Residential Ambient Air (µg/m3, N=183)
                               Standard    Percentiles                                 NJ Urban Comparison
Compounds              Mean    Deviation   25      50      75      90      Maximum     Concentration A
m,p-Xylene             3.25    4.29        1.51    2.37    3.97    6.44    51.21       2.6
o-Xylene               1.71    6.51        0.59    0.94    1.38    2.16    80.98       1.2
Toluene                6.82    5.83        2.59    4.83    9.36    14.67   32.88       5.7
Benzene                1.50    1.54        0.69    1.22    1.90    2.68    18.06       0.62
Ethylbenzene           1.34    2.74        0.46    0.99    1.74    2.51    36.24       0.92
MTBE                   5.75    5.34        2.23    4.35    7.51    12.13   27.17       6.83
Tetrachloroethylene    1.10    3.09        0.50    0.74    1.11    1.50    41.82       0.40
Carbon Tetrachloride   0.84    2.28        0.48    0.69    0.81    0.94    39.1        0.09
Formaldehyde           6.35    2.81        2.71    7.09    8.29    9.33    10.7        2.3
Acetaldehyde           8.88    6.50        3.05    7.86    10.2    14.6    38.7        1.1
Acrolein               0.89    1.29        0.13    0.39    0.78    1.69    6.21        -
PM2.5 Mass             20.4    10.7        13.8    18.2    25.5    30.9    71.7        15.8
Elemental Carbon       1.36    0.64        0.92    1.29    1.72    1.96    3.51        -
Organic Carbon         3.33    1.73        2.07    3.00    4.00    5.61    9.46        -
A: NJDEP mean concentrations reported in Elizabeth, NJ, 2001
(www.state.nj.us/dep/airmon/toxics01.pdf)
-------
Table 12. The Closest Distances from Sampler Location to the Public Roadways by
Functional Classes (km, N=183)
                  Standard               Percentiles
Roads     Mean    Deviation   Minimum    25      50      75      Maximum
FC11      1.53    1.05        0.04       0.68    1.33    2.28    3.70
FC12      2.53    1.16        0.02       1.47    2.87    3.44    5.58
FC14      0.50    0.54        0.01       0.11    0.33    0.65    2.49
FC16      0.19    0.17        0.01       0.07    0.13    0.32    0.78
FC17      0.29    0.22        0.02       0.11    0.25    0.39    0.97
FC19      0.03    0.02        0.00       0.02    0.03    0.04    0.13

Table 13. The Closest Distances from Sampler Location to Individual Public Roadways (km,
N=183)
                   Standard               Percentiles
Roads      Mean    Deviation   Minimum    25      50      75      Maximum
I95 a      1.89    1.23        0.05       0.86    1.73    2.88    5.33
Rt1 b      1.10    0.83        0.03       0.42    0.93    1.72    3.62
Rt27 b     1.23    0.83        0.04       0.50    1.02    1.85    3.40
Rt28 b     1.54    0.87        0.10       0.88    1.51    2.08    3.59
Rt439 b    0.93    0.81        0.01       0.24    0.61    1.61    2.86
a: Interstate (FC11), b: Major Arterial (FC14)
-------
Table 14. The Closest Distances from Sampler Location to Area Sources and Point
Sources Likely to Impact Elizabeth, NJ (km, N=183)
                                              Standard               Percentiles
Emission Sources                      Mean    Deviation   Minimum    25      50      75      Maximum
Gas Station                           0.36    0.21        0.03       0.22    0.36    0.49    1.01
Dry Cleaning Facilities               0.55    0.39        0.06       0.25    0.43    0.77    1.69
Refinery a                            2.98    1.12        0.84       2.06    3.07    3.77    5.76
Tanker Terminal b                     4.78    1.14        3.23       3.78    4.58    5.72    7.69
Industry c                            2.51    1.00        0.62       1.75    2.60    3.26    5.63
Aviation Service d                    5.92    1.15        2.80       5.03    6.18    6.91    8.63
Industry e                            4.19    1.11        0.81       3.50    4.43    5.08    6.64
Industry f                            3.27    1.14        0.99       2.46    3.24    3.98    6.04
Joint Meeting of Essex and Union g    2.77    1.20        0.40       1.91    2.36    3.80    5.81
a: Refinery = Xyl_PS1, Tol_PS1, Bzn_PS1, Ebz_PS1, MTBE_PS1, PCE_PS1; b: Tanker
Terminal = Xyl_PS2, Tol_PS2, Bzn_PS2; c: Industry = Xyl_PS3, Ebz_PS2; d: Aviation
Service = Xyl_PS4, Tol_PS5, Bzn_PS4; e: Industry = Tol_PS3; f: Industry = Tol_PS4; g:
Joint Meeting of Essex and Union = Bzn_PS3
Table 15. The Meteorological Variables (N=183)
                                         Standard               Percentiles
Variable, Unit                 Mean      Deviation   Minimum    25       50       75       Maximum
Temperature, K                 284.2     8.0         265.5      279.4    284.6    289.9    303.3
Wind Speed, m/s                4.3       1.1         1.9        3.6      4.4      5.1      8.0
Relative Humidity, %           66.3      12.6        42.7       58.1     66.4     75.9     91.8
Atmospheric Pressure, mmHg     762.3     4.5         750.3      759.6    761.6    765.5    773.1
Precipitation, mm              0.0207    0.0249      0.000      0.000    0.010    0.040    0.130
Mixing Heights, km             1.027     0.362       0.414      0.767    0.948    1.214    2.099
Pasquill Stability Class       5.028     0.444       3.867      4.706    5.000    5.300    6.063
-------
standard deviations, median, percentiles, the minimum and maximum values of the
meteorological condition variables are listed in Table 15.
DISCUSSION
Prior to establishing the best-fit linear regression equations for each compound,
bivariate Pearson correlations were computed to guide the inclusion of different
variables and to examine associations among the variables. The ln-transformed
concentration data were used since the concentration distribution was consistent with a
lognormal distribution and linear regression analysis assumes a normal distribution for
the dependent variable. The inverse distance was used since, under idealized Gaussian
dispersion, concentration declines with the inverse of distance from line sources, such
as roadways, and with the square of the inverse distance from point sources. The square
of the inverse distance was also examined, but no differences in results were observed,
so only the inverse distance was retained in the final mathematical models. As described
in the methods section and Appendix A, outliers were identified for the regression model
calculated from the entire data set, and a second regression model was determined after
eliminating the outliers. The variables selected in the model were examined for
multicollinearity. Details for the model evaluated for each compound are given in
Appendix A.
Test of Outlying Observations
To verify that the outlying observations were not eliminated based on specific
conditions, ANOVA tests were performed on the means of the predictor variables
between the group of outliers and the group of non-outliers. Duncan's multiple range test
results indicated that the means of the predictor variables were not significantly
different between the groups of outliers and non-outliers for the selected VOCs. The
frequency of outliers removed is listed by season in Table 16.
Model Summaries
The relative contributions of proximity to ambient sources and of the
corresponding meteorological conditions to residential ambient air concentrations of the
selected air toxics and PM2.5 were determined by multiple linear regression analyses
(Table 16). The F statistics were significant (p<0.0001) for all overall models except
that for carbon tetrachloride. Probabilities for parameter estimates were more
significant and the r2 larger for the meteorological variables than for the proximity
variables. This implies that a greater percentage of the explanatory power of the
regression equations for these compounds was due to changes in the meteorological
conditions than to the distance to a source (see below). There were some interactions
between the predictor variables in the best-fitting model, especially between the
meteorological variables. The model coefficients of determination for the compounds that
included proximity predictors varied between 0.16 and 0.47 (Table 17). The samples and
meteorological data were averaged over 48 hours, reducing the possibility of accounting
for shorter term variability that could alter the air concentrations.
Among the variables associated with proximity to mobile source emissions, the
inverse distance to major urban arterial roadways (FC14) was selected as a significant
predictor in the best-fitting models of residential ambient air concentrations of all of
the aromatic compounds, and the inverse distance to the NJ Turnpike (FC11) for PM2.5,
organic carbon, and the two individual PAHs examined (coronene and benzo[ghi]perylene).
The inverse distance to the closest gas station was included as a predictor in the models
of residential ambient air concentrations of m,p-xylene, o-xylene, benzene, and MTBE.
The inverse distance to areas in Elizabeth that had high truck traffic, including loading
and unloading and therefore idling trucks, was included in the models for PM2.5 and
elemental carbon, while the inverse distance to the refinery in Linden, NJ was included in
the regression equation for elemental carbon. The inverse distance to the closest dry
cleaning facility was selected as a significant predictor variable in the model of
residential ambient air concentration of PCE in Elizabeth, NJ. No variables associated
with the inverse distance to sources were identified for the three aldehyde compounds.
Nor were any of the mobile source proximity factors included in the models for the
control species that do not have mobile source emissions: carbon tetrachloride,
tetrachloroethylene, particulate sulfur, and particulate selenium.
Among the meteorological condition variables, atmospheric stability, mixing
height, temperature, wind speed, and relative humidity were significantly associated with
one or more of the residential ambient air concentrations. Atmospheric stability and
temperature were consistently included as statistically significant predictors in the
best-fitting models of the aromatic compounds, MTBE, and the particulate species, while
mixing height was selected for acrolein. Atmospheric stability is calculated based on
mixing height and temperature. Wind speed was included with a negative coefficient in
most models.
-------
A consistency in the parameter estimates of the proximity variables is observed
among the aromatic compounds. The order of mobile emission strength in Elizabeth, NJ
(Table 2) is toluene, xylenes (m,p-xylene is greater than o-xylene), benzene, and
ethylbenzene, the same order as the magnitude of the coefficients in the regression
equations, though the sum of the coefficients of o- and m,p-xylene exceeds that of
toluene. The order of the coefficients for proximity to gasoline stations (GS^-1) is
MTBE, m,p-xylene, benzene, and o-xylene, with GS^-1 not included in the regression
equation for toluene. MTBE is the compound with the highest concentration in gasoline and
has the highest vapor pressure (0.309 atm) of the VOCs studied. The next most prevalent
compound of those that included GS^-1 is m,p-xylene. Lastly, while o-xylene might be at a
higher concentration than benzene in gasoline, benzene has a higher vapor pressure
(0.125 atm) than the xylenes or ethylbenzene (0.0109-0.0125 atm). Thus, the order of the
parameter coefficients is consistent with the abundance of these compounds in gasoline
as modified by the vapor pressure. It is unclear why proximity to gas stations was not
included in toluene's regression equation, since its concentration in gasoline is second
only to MTBE and its vapor pressure is between that of benzene and m,p-xylene. The lack
of inclusion of GS^-1 in the regression equation for ethylbenzene might reflect its lower
concentration in gasoline and its lower air concentration, with more values being below
detection. The regression equation for MTBE, unlike those for the aromatic compounds,
did not include distance from arterial roadways but did include distance from the
interstate highway. These
differences may reflect the more efficient combustion and removal in the catalytic
converter of MTBE compared to the aromatic compounds and lower tailpipe emissions
along with a (Poulopoulos and Philippopoulos 2003).
-------
Two polyaromatic hydrocarbons (PAHs) measured in the PM2.5, coronene and
benzo[ghi]perylene, were evaluated for effects of proximity to mobile sources. Coronene
has been used as an index PAH compound to differentiate between gasoline and diesel
vehicles because coronene is found in emissions from gasoline-powered vehicles but has
not been detected in diesel emissions (Rogge et al., 1993). Benzo[ghi]perylene is present
in both diesel vehicle and gasoline vehicle exhaust, so it should be an individual
compound representative of mobile source emissions (Harrison et al., 1996). It should be
noted that these compounds are also emitted from other combustion sources, so they may
not be solely from mobile sources. The regression equations for both compounds included
the inverse distance to FC11 as well as atmospheric stability, temperature, and wind
speed (Table 17). Elemental carbon was associated with FC14. FC14 has three roadways
(Rt 1/9, Rt 27, Rt 439) that are major truck thoroughfares. A number of the homes in the
study are very close to FC14, and FC14 was the mobile source area associated with the
aromatic hydrocarbons. No clear difference was identified between the sources that
contributed to PM2.5 mass and those that contributed to the two individual PAHs that may
be markers of mobile sources. The weakest associations were observed for organic carbon,
which is expected to have more sources besides combustion and diesel emissions than the
other components of PM. All PM components were influenced by a variety of meteorological
factors and by proximity to the NJ Turnpike, major arterial roads, and/or truck
loading/unloading areas.
As a check on the possibility that an inherent bias in the sampling or
analyses caused the associations with proximity to mobile sources in the
regression equations, two volatile compounds without mobile sources, carbon tetrachloride
and tetrachloroethene, were evaluated. Carbon tetrachloride has few industrial or
commercial uses and therefore minimal sources in Elizabeth, NJ, while tetrachloroethene
is the primary solvent used in the dry cleaning industry. No parameters, neither
proximity nor meteorological variables, were included in the regression equation for
carbon tetrachloride at the p<0.15 criterion, indicating that neither local sources nor
distance to roadways influenced the variability in its measured concentration. This is
consistent with the lack of local sources. Meteorological variables were included in the
regression equation for tetrachloroethene, but not proximity to roadways or gasoline
stations. Since tetrachloroethene is used in dry cleaning, the distances between the
sampling locations and dry cleaning facilities were determined and evaluated in the
regression equation. The final regression equation for tetrachloroethene included the
inverse distance to dry cleaning facilities, atmospheric stability, temperature, wind
speed and relative humidity, with partial r2 values and coefficients for the
meteorological variables similar to those identified in the regression equations for the
compounds derived from mobile sources (Table 17).
To evaluate whether all particulate components might show associations with
mobile sources, selenium, an element measured in PM2.5 that is not expected to be
associated with mobile sources, was also examined. Its regression equation did not
include any term for proximity to mobile sources.
Effects of Source Proximity
The common interpretation of a regression coefficient is that it estimates the
change in the response variable per unit increase in the predictor variable. This
interpretation has limitations when the predictor variables are seriously intercorrelated.
When highly correlated predictor variables vary together, the magnitude of the change in
the outcome variable attributed to a single predictor variable is altered. Since, based on
the previous diagnostics, the multicollinearity in the models was not serious enough to
require immediate remedial measures (Appendix A), the effect of individual parameter
estimates on the concentrations was evaluated by holding all the variables constant
except for the variable being evaluated. This approach allows the model to be evaluated
for the effect of a single variable across its range of values while all other variables
are held constant. It is a type of sensitivity analysis. The assigned constant values were
the median value for the meteorological variables and the maximum value for the distance
variables. The maximum value of the distance was used since the smallest changes in
concentration with distance would be expected at the furthest distance from the source.
Plots of the predicted air concentration with distance for each aromatic compound, MTBE
and the PM2.5 components, derived from the best fit regression equations, are given in
figures 11-19.
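The one-variable-at-a-time evaluation described above can be sketched in a few lines. This is a minimal illustration only, assuming a log-linear model of the form ln(C) = b0 + sum(b_i * x_i); the coefficient values, predictor names and fixed settings below are hypothetical and are not taken from Table 17.

```python
import math

def predicted_concentration(intercept, coefs, values):
    """Back-transform ln(C) = b0 + sum(b_i * x_i) to a concentration."""
    ln_c = intercept + sum(coefs[name] * values[name] for name in coefs)
    return math.exp(ln_c)

def sweep_distance(intercept, coefs, fixed, swept, distances_m):
    """Vary one inverse-distance term; hold every other predictor at its fixed value."""
    curve = []
    for d in distances_m:
        values = dict(fixed)
        values[swept] = 1.0 / d  # the models use inverse distance (1/m)
        curve.append((d, predicted_concentration(intercept, coefs, values)))
    return curve

# Hypothetical coefficients and settings, for illustration only.
intercept = 1.0
coefs = {
    "inv_dist_road": 15.0,
    "inv_dist_gas": 10.0,
    "stability": 0.5,
    "wind_speed": -0.1,
}
fixed = {
    "inv_dist_gas": 1.0 / 1200.0,  # other distance term held at its maximum distance
    "stability": 4.0,              # meteorological terms held at their median values
    "wind_speed": 3.5,
}

curve = sweep_distance(intercept, coefs, fixed, "inv_dist_road", [20, 50, 100, 200, 500])
for d, c in curve:
    print(f"{d:>4} m  {c:.2f}")
```

Because only the swept term changes, the ratio between any two points on the curve depends solely on that term's coefficient, which is what makes the plotted curves comparable across pollutants.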
The decline with distance follows an exponential form since the
regression equations included distance as an inverse term and the concentration was
log transformed. For the roadways (both FC11 and FC14) and
gasoline stations, the decline in predicted concentration is rapid during the first 200 m,
with little change due to roadways beyond that distance. The magnitude of this change
between 20 meters, the distance of the closest samples, and 200 meters ranged from a
factor of approximately two for the PM2.5 constituents to four for the aromatic compounds
and MTBE. The scatter plots of concentration with distance for the actual data (figures 20
to 45) are consistent with the rapid falloff in concentration with distance predicted by
the regression equations, though the falloff may extend to a slightly greater distance,
where the changes in concentration appear to be small. The predicted effect of the PM area
sources of truck loading and unloading on PM2.5 and elemental carbon extends over a longer
distance than that of the roadways. This is probably a statistical artifact of the
multiple area sources associated with truck loading and unloading, which are all to the
east/northeast of Elizabeth, and of the difficulty in assigning the appropriate distance
to a site that covers a large area (figures 46 and 47). The FC19 roadways, which are small
local roads, show a maximum effect on PM2.5 mass at 10 to 20 meters. This is suspect, as a
true mobile source contribution of PM2.5 from streets in this category is minimal: the
traffic is very light, with little if any truck traffic. The scatter plot suggests that
only a few data points are responsible for the observation, so it may be a statistical
association with other causes. All homes will be close to an FC19 roadway, even if they
are near major roadways as well, as is evident from the maximum distance to a roadway
classified as FC19 for any home being less than 100 meters.
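The factor-of-two-to-four change between 20 and 200 meters can be related to the size of an inverse-distance coefficient. The following is a worked sketch, assuming a model with a single inverse-distance term and all other predictors held constant; the symbols \beta_0, \beta_1 and d are notation chosen here for illustration and the implied coefficient values are not quoted from Table 17.

```latex
\ln C(d) = \beta_0 + \frac{\beta_1}{d} + (\text{terms held constant})
\quad\Longrightarrow\quad
\frac{C(20)}{C(200)}
  = \exp\!\left[\beta_1\left(\frac{1}{20} - \frac{1}{200}\right)\right]
  = e^{0.045\,\beta_1}

\frac{C(20)}{C(200)} = 2 \;\Rightarrow\; \beta_1 = \frac{\ln 2}{0.045} \approx 15.4\ \text{m},
\qquad
\frac{C(20)}{C(200)} = 4 \;\Rightarrow\; \beta_1 = \frac{\ln 4}{0.045} \approx 30.8\ \text{m}
```

Under these assumptions, the same expression shows why the curve flattens at larger distances: the exponent depends on the difference of inverse distances, which is already small between 200 and 500 meters (0.003 per meter, compared with 0.045 between 20 and 200 meters).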
One meteorological parameter that we could not adequately incorporate in the
models was wind direction. Several different approaches to representing wind direction
were examined, including categorizing wind direction based on the amount of variability in
wind direction during the sampling period as well as evaluating the dominant
direction. However, the micrometeorology around the sample sites could not be
definitively represented by the meteorological station at Newark Airport, since
directional changes are expected around buildings and roadways. Thus, the effect of wind
direction could not be adequately represented in the model, and the final models
therefore did not include that term.
-------
The meteorological variables contributed more to the explanatory power of the
regression equations than the proximity variables. One possible reason for this is that
more homes were sampled at distances greater than 200 meters than at distances closer
than 200 meters, the distance with the maximum predicted effect of the roadway. If only
homes within 200 m of a major roadway had been included in the study, it is possible that
the effect of proximity would be stronger. The regression equations suggest that the
effect of distance due to mobile sources is minimal for homes further than 200 meters from
major roadways, and that concentration changes near homes more than 200 meters from
roadways or gasoline stations would depend upon meteorology, which controls the
urban background levels for constant emission sources within an urban center and the
transport of pollutants from outside Elizabeth, NJ. Exploratory analyses of only homes
within 200 meters and homes within 500 meters of FC11, the NJ Turnpike, suggest that
inverse distance to the NJ Turnpike was a potential predictor variable, but the FC11 term
did not reach statistical significance at p<0.15 for the aromatic compounds, probably due
to the small n in that sub-sample of homes.
None of the regression models for the three aldehyde compounds studied,
formaldehyde, acetaldehyde and acrolein, included the inverse distance to any of the
mobile sources, even though they are exhaust emission products. The
positive association with the distance to FC11 roadways for formaldehyde implies that
roadways are not a source of formaldehyde. It is more likely that it is a result of an
association among FC11 distance, formaldehyde concentration and a third variable not
evaluated. It is possible, though, that photochemical production of formaldehyde increases
with distance from the roadway in a manner similar to ozone, which is higher away from
roadways than directly adjacent to them because there is a time component to its
maximum concentration. To evaluate whether the effect of proximity to
roadways could be observed in the absence of photochemistry, regression equations were
also determined for data from days when the mean temperature during sampling was <10°C,
when photochemistry is expected to be minimal. Again, only meteorological variables
were included in the regression equations for formaldehyde and acetaldehyde at
p<0.15 (Table 17). These analyses had a smaller sample size and so had less statistical
power to identify an association.
-------
Summary
Mobile sources (cars, trucks and gasoline stations) are a main source of aromatic
hydrocarbons, methyl tert-butyl ether, PM2.5, elemental carbon and selected PAHs in
Elizabeth, NJ. Meteorological factors, in particular atmospheric stability, wind speed and
temperature, were statistical predictors of the overall concentration of these pollutants
in the ambient air surrounding homes in the area. The air concentrations at homes that
were very close to roadways and gasoline stations, within 200 to 500 meters, were inversely
related to the distance to those sources. Increases in the concentrations for the closest
residences are predicted to be factors of two to four above what might be considered the
background levels for the area. Area sources that were associated with truck activity or
possibly other mobile sources (airport or shipping terminal) also appear to increase the
PM levels associated with diesel emissions. These increases in ambient air concentrations
for homes near ambient sources could potentially result in corresponding increases in
personal exposure for individuals living in homes without smokers, since the ambient air
surrounding homes penetrates into the home and a strong association has been found
between ambient air concentrations outside a portion of the homes studied during the
RIOPA study and both indoor and personal air concentrations for these compounds (examples
in Figures 48 to 50, from Weisel et al. 2004b).
-------
Table 16. Frequency of Non-detects, Outliers, and Non-outliers by Season

Compound       Season   Non-Detects      Outliers         Non-Outliers
                        N      %         N      %         N      %
m,p-Xylene     Fall     1      25        4      30.77     51     30.72
               Spring   -      -         1      7.69      36     21.69
               Summer   1      25        6      46.15     46     27.71
               Winter   2      50        2      15.38     33     19.88
o-Xylene       Fall     -      -         4      26.67     52     31.14
               Spring   -      -         1      6.67      36     21.56
               Summer   1      100       4      26.67     48     28.74
               Winter   -      -         6      40        31     18.56
Toluene        Fall     -      -         1      6.67      55     33.33
               Spring   -      -         7      46.67     30     18.18
               Summer   3      100       4      26.67     46     27.88
               Winter   -      -         3      20        34     20.61
Benzene        Fall     -      -         5      27.78     51     31.1
               Spring   -      -         1      5.56      36     21.95
               Summer   -      -         5      27.78     48     29.27
               Winter   1      100       7      38.89     29     17.68
Ethylbenzene   Fall     3      42.86     5      29.41     48     30.19
               Spring   -      -         1      5.88      36     22.64
               Summer   3      42.86     7      41.18     43     27.04
               Winter   1      14.29     4      23.53     32     20.13
MTBE           Fall     -      -         6      21.43     50     34.01
               Spring   6      75        7      25        24     16.33
               Summer   1      12.5      10     35.71     42     28.57
               Winter   1      12.5      5      17.86     31     21.09
PCE            Fall     5      33.33     3      23.08     48     30.97
               Spring   1      6.67      5      38.46     31     20
               Summer   -      -         3      23.08     50     32.26
               Winter   9      60        2      15.38     26     16.77
-------
Table 17. Summary of Finalized Best-fitting Models of Selected VOCs (p<0.15 used as
criterion for inclusion)

Pollutant       Total r2   Intercept β0(SE)
m,p-Xylene      0.33       4.9(1.7)b
o-Xylene        0.42       4.5(1.4)b
Toluene         0.31       3.1(1.8)d
Benzene         0.41       10.(1.3)a
Ethylbenzene    0.16       6.0(2.4)c
MTBE            0.25       -2.7(2.3)**
PERC            0.31       2.5(1.3)c
Formaldehyde    0.15       9.2(1.1)
Acetaldehyde    0.13       2.2(0.3)
Acrolein        0.046      -.68(.33)

Mobile/Area/Point Source terms, in table order
Xi        βi(SE)         P-r2
FC14^-1   7.9(4.4)d      0.01
FC14^-1   7.4(4.5)d      0.01
FC14^-1   14.7(4.4)b     0.04
FC14^-1   10.1(1.5)*
FC14^-2   9.7(5.6)c      0.02
FC14^-2   22.3(14.3)c    0.01
GS^-1     17.4(6.3)t     0.04
GS^-1     9.5(5.5)c      0.02
GS^-1     5.5(3.3)d      0.01
GS^-1     33.6(8.3)a     0.09
DCF^-1    32.7(12.4)b    0.04

Meteorological Variables, in table order
Xi        βi(SE)         P-r2
Stab      0.54(0.11)*    0.18
Stab      0.52(0.09)a    0.27
Stab      0.71(0.12)a    0.22
Stab      16.1(5.6)b     0.03
Stab      0.44(0.16)b    0.08
Stab      0.24(0.15)c    0.01
Stab      0.14(0.09)*    0.01
MH        0.61(.32)      .046
K         -0.02(0.005)*  0.09
K         -0.02(0.004)a  0.09
K         -0.02(0.006)b  0.03
K         0.30(0.10)*    0.10
K         -0.03(0.007)b  0.06
K         0.01(0.007)c   0.01
K         -0.01(0.004)b  0.05
K         -0.11(.03)     .094
K         .023(.007)     .093
U         -0.12(0.04)b   0.04
U         -0.04(0.004)*  0.25
U         -0.11(0.07)c   0.015
U         -0.19(0.06)*   0.12
U         -0.14(0.04)"   0.18
U         -0.31(.22)     .014
U         -.13(.06)      .036
RH        0.01(0.005)c   0.02
RH        0.01(0.003)b   0.03
-------
Pollutant           Total r2   Intercept β0(SE)
PM2.5               0.47       1.0(0.6)
EC                  0.40       -2.5(0.6)
OC                  0.33       -2.2(.7)
Coronene            0.67       24(4)
Benzo[ghi]pyrene    0.66       22(4)
Sulfur              0.52       4.3(2.2)
Selenium            0.41       1.2(3.5)

Mobile/Area/Point Source terms, in table order
Xi         βi(SE)     P-r2
FC11^-1    20(11)     .016
REF^-1     630(26)    .033
FC11^-1    39(21)     .043
FC11^-1    133(42)    0.091
FC11^-1    123(38)    .094
03         29(5)      .090
03         27(9)      .059
FC19^-1    4.2(1.7)   .052
TRUCK^-1   51(30)     .016
TRUCK^-1   78(36)     .050

Meteorological Variables, in table order
Xi       βi(SE)         P-r2
Stab     0.43(.09)      .32
Stab     0.32(0.13)     .078
Stab     0.66(0.14)     .25
Stab     0.81(0.28)     .06
Stab     0.71(.26)      .060
Stab     0.44(.11)      .14
Stab     0.93(.19)      .29
U        -0.13(.04)     .066
RH       .011(.004)     .24
Precip   .013(.006)     .035
K        -0.10(.01)     .28
K        -0.087(.01)    .26
K        -0.023(.008)   .23
K        -0.15(.08)     .029
U        -0.41(.10)     .24
U        -0.38(.10)     .25
U        -0.11(.05)     .039
U        -0.019(.007)   .034

Analysis of aldehyde data for days when the temperature was <10°C, to evaluate role of photochemistry

Pollutant      Total r2
Formaldehyde   0.13
Acetaldehyde   0.13
Acrolein       0.046

Intercept β0(SE), in table order: 1.5(0.5); 2.2(0.3)

Meteorological Variables, in table order
Xi   βi(SE)        P-r2
K    -0.03(.02)    .063
K    -0.03(0.02)   0.068
U    -0.13(.07)    .007
MH   0.43(.23)     .03
-------
Xi, i-th predictor variable; β0, intercept of model; βi, parameter estimate of i-th predictor;
SE, standard error of parameter estimates; r2, coefficient of determination; P-r2, partial r
square of the variable.
^-1 indicates inverse values; ^-2 indicates inverse square values.
p<0.15 used as selection criterion for inclusion of a variable in the model
FC14^-1 is the inverse distance (m) to the nearest major arterial roadways
GS^-1 is the inverse distance (m) to the nearest gasoline station
PS^-1 is the inverse distance (m) to a point source (Linden Refinery)
DCF^-1 is the inverse distance (m) to the nearest dry cleaning facility
TRUCK^-1 is the inverse distance (m) to the major truck loading areas
Airport^-1 is the inverse distance (m) to Newark International Airport
Stab is atmospheric stability
K is temperature (°K)
U is the wind speed (m
MH is the mixing height (km)
Precip is precipitation (total mm)
-------
Regression Model Predictions Figures 10-20
These figures show the change in concentration as predicted by the regression models
while varying the indicated variable from the minimum to the maximum value observed
during the study and holding all other variables in the model constant (median value for
the meteorological variables or maximum value for the distance variables). The side bar
is a box and whisker plot of the measured concentrations during the study (mean, median,
5th, 25th, 75th, 95th percentiles) for comparison.
Scatter Plots of Distance to Concentration Figures 21-45
These figures are scatter plots of the measured concentration against the
shortest distance between each home and the nearest roadway in each class or gasoline
station for all values in the study. The figures provide a visualization of the association
of concentration with distance to mobile sources without consideration of meteorology, a
major factor that influences concentration.
-------
[Plot: Effect of Distance to Sources on Residential Ambient Air Concentration of m,p-Xylene; x-axis: Distance, meters]
Figure 10: Effect of the Distance to the Emission Sources on the Residential Ambient Air Concentration of m,p-Xylene Estimated by the Best-fitting
Model (Box Plot Shows Mean and Quartiles of Distribution of m,p-Xylene Concentrations)
-------
[Plot: Effect of Distance to Sources on Residential Ambient Air Concentration of Benzene; x-axis: Distance, meters]
Figure 11: Effect of the Distance to the Emission Sources on the Residential Ambient Air Concentration of Benzene Estimated by the Best-fitting Model (Box
Plot Shows Mean and Quartiles of Distribution of Benzene Concentrations).
-------
[Plot: Effect of Distance to Sources on Residential Ambient Air Concentration of MTBE; x-axis: Distance, meters]
Figure 12: Effect of the Distance to the Emission Sources on the Residential Ambient Air Concentration of MTBE Estimated by the Best-fitting Model
(Box Plot Shows Mean and Quartiles of Distribution of MTBE Concentrations).
-------
[Plot: Effect of Distance to Sources on Residential Ambient Air Concentration of o-Xylene; x-axis: Distance, meters]
Figure 13: Effect of the Distance to the Emission Sources on the Residential Ambient Air Concentration of o-Xylene Estimated by the Best-fitting
Model (Box Plot Shows Mean and Quartiles of Distribution of o-Xylene Concentrations).
-------
[Plot: Effect of Distance to FC14 on Residential Ambient Air Concentration of Toluene; x-axis: Distance, meters]
Figure 14: Effect of the Distance to the Emission Sources on the Residential Ambient Air Concentration of Toluene Estimated by the Best-fitting
Model (Box Plot Shows Mean and Quartiles of Distribution of Toluene Concentrations)
-------
[Plot: Effect of Distance to FC14 on Residential Ambient Air Concentration of Ethylbenzene; x-axis: Distance, meters]
Figure 15: Effect of the Distance to the Mobile Source Emission on the Residential Ambient Air Concentration of Ethylbenzene Estimated by the Best-
fitting Model (Box Plot Shows Mean and Quartiles of Distribution of Ethylbenzene Concentrations)
-------
[Plot: Effect of Distance to Sources on Residential Ambient Air Concentration of PM2.5; x-axis: Distance, meters]
Figure 16: Model Prediction Of PM2.5 Concentration With Distance To FC11, FC19 And Truck Loading And Unloading Region Estimated By The Best-Fitting
Model (Box Plot Shows Mean And Quartiles Of Distribution Of PM2.5 Concentrations)
-------
[Plot: Effect of Distance to FC14 on Residential Ambient Air Concentration of EC; x-axis: Distance, meters]
Figure 17: Model prediction of elemental carbon concentration with distance to Truck loading/dock area Estimated By The Best-Fitting Model (Box Plot
Shows Mean And Quartiles Of Distribution Of Elemental Carbon Concentrations)
-------
[Plot: Effect of Distance to Sources on Residential Ambient Air Concentration of OC; x-axis: Distance, meters]
Figure 18: Model Prediction Of Organic Carbon Concentration With Distance To FC11 Roadways Estimated By The Best-Fitting Model (Box Plot Shows
Mean And Quartiles Of Distribution Of Organic Carbon Concentrations)
-------
[Plot: Effect of Distance to FC11 on Residential Ambient Air Concentration of Coronene; x-axis: Distance, meters]
Figure 19: Model Prediction Of Coronene Concentration With Distance To FC11. Estimated By The Best-Fitting Model (Box Plot Shows Mean And
Quartiles Of Distribution Of Coronene Concentrations)
-------
[Plot: Effect of Distance to FC11 on Residential Ambient Air Concentration of Benzo[ghi]pyrene; x-axis: Distance, meters]
Figure 20: Model Prediction Of Benzo[ghi]pyrene Concentration With Distance To FC11. Estimated By The Best-Fitting Model (Box Plot Shows Mean And
Quartiles Of Distribution Of Benzo[ghi]pyrene Concentrations)
-------
[Scatter plot: mp Xylene FC11; x-axis: Distance FC11]
Figure 21. Scatter plot of m/p xylene with distance from FC11 Roadways, major urban arterial.
-------
[Scatter plot: Benzene; x-axis: Distance FC11]
Figure 22. Scatter plot of benzene with distance from FC11 Roadways, major urban arterial.
-------
[Scatter plot: PM2.5 with FC11; x-axis: Distance to FC11 (km)]
Figure 23. Scatter plot of PM2.5 with distance from FC11 Roadways, major urban arterial.
-------
[Scatter plot: EC FC11; x-axis: FC11 Distance]
Figure 24. Scatter plot of elemental carbon with distance from FC11 Roadways, major urban arterial.
-------
[Scatter plot: mp Xylene FC14; x-axis: Distance FC14]
Figure 25. Scatter plot of m/p xylene with distance from FC14 Roadways, interstate.
-------
[Scatter plot: o Xylene FC14]
-------
[Scatter plot: Ben-FC14; x-axis: Distance FC14]
Figure 27. Scatter plot of benzene with distance from FC14 Roadways, interstate.
-------
[Scatter plot: Toluene FC14; x-axis: Distance FC14]
Figure 29. Scatter plot of toluene with distance from FC14 Roadways, interstate.
-------
[Scatter plot: Ethyl Benzene FC14; x-axis: Distance FC14]
Figure 30. Scatter plot of ethyl benzene with distance from FC14 Roadways, interstate.
-------
[Scatter plot: MTBE FC14; x-axis: Distance FC14]
Figure 31. Scatter plot of methyl tert butyl ether (MTBE) with distance from FC14 Roadways, interstate.
-------
[Scatter plot: PM2.5 and FC14; x-axis: Distance to FC14 (km)]
Figure 32. Scatter plot of PM2.5 mass with distance from FC14 Roadways, interstate.
-------
[Scatter plot: EC FC14; x-axis: FC14 Distance]
Figure 33. Scatter plot of elemental carbon with distance from FC14 Roadways, interstate.
-------
[Scatter plot: OC FC14; x-axis: FC14 Distance]
Figure 34. Scatter plot of organic carbon with distance from FC14 Roadways, interstate.
-------
[Scatter plot: Tetrachloroethene FC14; x-axis: Distance FC14]
Figure 35. Scatter plot of tetrachloroethylene with distance from FC14 Roadways, interstate.
-------
[Scatter plot: PM with F19; x-axis: Distance to F19 (km)]
Figure 36. Scatter plot of PM2.5 Mass with distance from FC19 Roadways, small local roads.
-------
[Scatter plot: OC FC19; x-axis: FC19 Distance]
Figure 38. Scatter plot of organic carbon with distance from FC19 Roadways, small local roads
-------
[Scatter plot: m/p Xylene; x-axis: Distance Gas Station (km)]
Figure 39. Scatter plot of m/p xylene with distance from closest gasoline station.
-------
[Scatter plot: o Xylene; x-axis: Distance Gas Station (km)]
Figure 40. Scatter plot of o xylene with distance from closest gasoline station.
-------
[Scatter plot: Benzene; x-axis: Distance Gas Station (km)]
Figure 41. Scatter plot of benzene with distance from closest gasoline station.
-------
[Scatter plot: MTBE; x-axis: Distance Gas Station (km)]
Figure 42. Scatter plot of methyl tert butyl ether with distance from closest gasoline station.
-------
[Scatter plot: Tetrachloroethene; x-axis: Distance Gas Station (km)]
Figure 43. Scatter plot of tetrachloroethylene with distance from closest gasoline station.
-------
[Scatter plot: EC PM02 Distance; x-axis: PM02 Distance]
Figure 44. Scatter plot of PM2.5 Mass with distance from PM02, truck loading area.
-------
[Scatter plot: PM PM Source 03; x-axis: Distance PM03]
Figure 44. Scatter plot of PM2.5 Mass with distance from PM 03 truck loading area and dock.
-------
[Scatter plot: PM Source 1-3 Plots; x-axis: Dist PM2 or PM1]
Figure 46. Scatter plot of distance from homes to PM 1, 2 and 3 source regions. Patterns same indicating a high correlation for homes close to these sources (a
few hundred yards).
-------
[Scatter plot: PM Source 1-3 Plots; x-axis: Dist PM2 or PM1]
Figure 47. Scatter plot of distance from homes to PM 1, 2 and 3 source regions to 2 km distance.
-------
[Scatter plots: Methyl tert butyl ether, three panels (n=505, n=504, n=502) with 1:1 line; axes: Outdoor, Indoor and Personal concentration (µg/m3)]
Figure 48. Scatter plots of MTBE for indoor/outdoor, outdoor/personal and indoor/personal showing that there are homes around the 1:1 line so that pollutants
arising from outdoor will affect personal exposure.
-------
[Scatter plots: m,p-Xylene, three panels (n=505, n=504, n=502) with 1:1 line; axes: Outdoor, Indoor and Personal concentration (µg/m3)]
Figure 49. Scatter plots of m,p-xylene for indoor/outdoor, outdoor/personal and indoor/personal showing that a subset of homes are around the 1:1 line so that pollutants
arising from outdoor will affect personal exposure.
-------
[Scatter plots: PM2.5, three panels (n=292, n=256, n=246) with 1:1 line; axes: Outdoor, Indoor and Personal concentration (µg/m3)]
Figure 50. Scatter plots of PM2.5 Mass for indoor/outdoor, outdoor/personal and indoor/personal showing that some homes are parallel to the 1:1 line so that
pollutants arising from outdoor will affect personal exposure.
-------
References:
Harrison, RM, Smith, DJR and Luhana, L, Source apportionment of atmospheric
polycyclic aromatic hydrocarbons collected from an urban location in
Birmingham, UK. Environmental Science and Technology, 30, 825-832, 1996.
Neter, J, Kutner, MH, Nachtsheim, CJ and Wasserman, W, Applied Linear Statistical Models.
4th edition, R.D. Irwin, Inc, Homewood, IL, 1996.
Poulopoulos, SG and Philippopoulos, CJ, The Effect of Adding Oxygenated
Compounds to Gasoline on Automotive Exhaust Emissions. Transactions of the ASME,
125, 344-350, 2003.
Rogge, WF, Hildemann, LM, Mazurek, MA, Cass, GR and Simoneit, BRT,
Sources of fine organic aerosol. 2. Noncatalyst and catalyst-equipped automobiles and
heavy-duty diesel trucks. Environmental Science and Technology, 27, 636-651, 1993.
Weisel, CP, Zhang, JJ, Turpin, BJ, Morandi, MT, Colome, S, Stock, TH, Spektor, DM,
Korn, L, Winer, A, Alimokhtari, S, Kwon, J, Mohan, K, Harrington, R, Giovanetti, R,
Cui, W, Afshar, M, Maberti, S and Shendell, D, The Relationships of Indoor, Outdoor and
Personal Air (RIOPA) Study: Study Design, Methods and Quality Assurance/Control
Results. Journal of Exposure Analysis and Environmental Epidemiology, In Press 2004.
Weisel, Zhang, Turpin, Morandi, Colome, Stock and Spektor, Relationships of Indoor,
Outdoor and Personal Air (RIOPA). HEI Final Report, In Press 2004.
-------
Appendix A
m,p-Xylene
Bivariate Pearson Correlation
The correlation coefficient between the ln-transformed m,p-xylene concentrations
and the distance to urban interstate (FC11) roadways was -0.20 (p=0.007). The correlation
coefficients between the ln-transformed m,p-xylene concentrations and the inverse distance
to urban major arterial (FC14) roadways and the inverse distance to urban collector (FC17)
roadways were 0.19 (p=0.01) and 0.22 (p=0.0034), respectively. The correlations between the
ambient air concentration of m,p-xylene and the distances to individual roadways were examined
for the major roadway classes (I-95 for FC11; Rt. 1, Rt. 27, Rt. 28, Rt. 439 for FC14). The
distance to I-95 was statistically significantly correlated with the ln-transformed
concentration of m,p-xylene in the residential ambient air (-0.194, p=0.0093). The distance to
US Highway Route 1 was also statistically significantly correlated with the ln-transformed
concentration of m,p-xylene in the residential ambient air (-0.272, p=0.0002).
The correlation coefficient between the ln-transformed ambient air concentration of m,p-xylene
and the inverse distance to the closest gas station was 0.28 (p=0.0002). For m,p-xylene,
only the point sources that were closer than 3 km to any of the sampled homes
and had emissions larger than 0.9 tons of annual total generation were considered in the data
analysis. Two point sources met these criteria: a refinery in Linden and an industrial
emission source in Elizabeth. Only the distance between the refinery and the residences had a
statistically significant correlation with the ln-transformed m,p-xylene concentrations (-0.17,
p=0.022).
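The bivariate step described above can be illustrated with a minimal Python sketch. It computes a Pearson coefficient and the t statistic that underlies its significance test. The function names are hypothetical, and the sample size of 183 is an assumption taken from the N reported with Table A-1; the correlations in the text may have been computed on a different number of observations.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic (n - 2 degrees of freedom) for testing r = 0."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# r = -0.20 is the reported correlation with distance to FC11 roadways;
# n = 183 is an assumed sample size (see the lead-in).
t_fc11 = t_statistic(-0.20, 183)
print(round(t_fc11, 3))
```

Under that assumed sample size the t statistic is about -2.75 on 181 degrees of freedom, which is in line with the reported two-sided p-value of 0.007.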
-------
A-2
The Pearson correlation coefficients of the meteorological variables that were
statistically significantly correlated with the ln-transformed m,p-xylene concentrations at
α = 0.05 were: atmospheric stability, 0.348 (p<0.0001); mixing height, -0.254 (p=0.0009);
wind speed, -0.235 (p=0.0014); and temperature, -0.19 (p=0.0101). The correlation
coefficients of precipitation and relative humidity were 0.125 (p=0.091) and 0.129 (p=0.082),
respectively. Atmospheric pressure was not correlated with the ambient m,p-xylene
concentration (p=0.47).
Preliminary Selection of Predictors
The preliminary regression analysis was performed on the ln-transformed m,p-xylene
concentration to determine the relative importance of variables within each type
(proximity and meteorological) of independent variable. The distances to the roadways,
either original or transformed, were grouped by functional class (FC) to examine the
importance of proximity to the mobile sources for the m,p-xylene air concentration. When the
distances to the functional classes were analyzed, the distances to the urban interstates
(FC11) and the urban principal arterials (FC14) were included in the resulting linear
regression model (p<0.15) with r2 = 0.0819. For the gas stations and the point sources, the
inverse form of the closest distance was always selected as the strongest explanatory
predictor variable in the model regardless of the selection method. The proximity to the
refinery was also selected as the larger explanatory variable, along with an industry site,
in the model. Among the meteorological variables, the 48-hour averaged mixing height,
temperature, and wind speed were selected in the preliminary regression analysis. When the
48-hour averaged atmospheric stability was introduced to the initial group of other
meteorological variables, the model was improved (increased r2), but the mixing height was
eliminated from the resulting model.
Selection of the Best-fitting Model
The variables selected by the different regression methods were relatively consistent.
Atmospheric stability, temperature and wind speed were included as predictors in the model,
as were the inverse distances to the major roadways (FC11) and to gasoline stations (Table
A-1). The association with the distance to the refinery was not significant. The parameters and
analysis of variance of the regression equation for the m,p-xylene ambient air concentration
for the best-fitting model with 6 variables selected are given in Table A-1. The C(p), which
is Mallows' Cp statistic, associated with this particular subset of variables was determined to
be 7.0. The resulting model was appropriate in its number of parameters, because the number
of parameters (p), including the intercept, in the best-fitting model exactly matched the
value of C(p). The diagnostic plots (the residual plot against the predicted values,
the normal probability-probability (PP) plot and the normal quantile-quantile (QQ) plot of the
residuals) were generated and visually examined (Figures A-1 and A-2 and Appendix B). The
residuals were randomly distributed without showing any obvious trend or particular
pattern (Figure A-1), indicating a close-to-normal distribution and constant variance.
The PP plot was nearly linear, so the error term of the model could be considered to follow a
normal distribution. Based on the visual diagnosis, there was no significant evidence of lack
of fit or of unequal error variance for the best 7-parameter regression model.
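The reasoning that a subset model is adequate when C(p) is close to p follows from the usual definition of Mallows' statistic, Cp = SSE_p / MSE_full - (n - 2p), where SSE_p is the error sum of squares of the subset model with p parameters (including the intercept) and MSE_full is the error mean square of the full candidate model. The Python sketch below is illustrative only: the function name and the numeric inputs are hypothetical and are not taken from the report's regression output.

```python
def mallows_cp(sse_subset, mse_full, n_obs, n_params):
    """Mallows' Cp for a subset model.

    sse_subset: error sum of squares of the subset model
    mse_full:   error mean square of the full candidate model
    n_obs:      number of observations
    n_params:   parameters in the subset model, intercept included
    """
    return sse_subset / mse_full - (n_obs - 2 * n_params)

# Hypothetical illustration: if the subset model has no bias, its SSE is
# about mse_full * (n - p), and Cp comes out close to p.
n, p = 183, 6
mse_full = 0.56                      # assumed value, for illustration only
sse_unbiased = mse_full * (n - p)
print(mallows_cp(sse_unbiased, mse_full, n, p))
```

A Cp well above p would point to bias from omitted predictors, which is why the stepwise summaries track Cp as variables enter.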
Possible outliers were identified (Figure A-2) using the test statistic (±t_inv, 0.95,
n-p-1 = 175) of ±1.6545. The regression equation was recalculated after removal of the
seventeen outliers. The parameters for the best-fit equation are given in Table A-2. An
increase in the r2 was obtained after removal of the outliers. The residuals plotted against
the predicted values (Figure A-3) appeared more randomly distributed than those in Figure A-1.
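The outlier screen can be sketched as follows. This is a simplified illustration, not the report's procedure: it scales raw residuals by the root mean square error, whereas the degrees of freedom quoted in the text (n-p-1) suggest that studentized deleted residuals were used. The function names and sample values are hypothetical.

```python
import math

def standardized_residuals(observed, predicted, n_params):
    """Residuals divided by the root mean square error of the fit."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    dof = len(residuals) - n_params
    rmse = math.sqrt(sum(e * e for e in residuals) / dof)
    return [e / rmse for e in residuals]

def flag_outliers(scaled_residuals, cutoff=1.645):
    """Indices of observations whose scaled residual exceeds the cutoff."""
    return [i for i, r in enumerate(scaled_residuals) if abs(r) > cutoff]

# Hypothetical scaled residuals; the 2nd and 4th exceed the cutoff.
print(flag_outliers([0.2, -1.7, 1.0, 2.5]))
```

Because a one-sided 0.95 quantile applied in both directions flags roughly 10% of well-behaved observations, the number of points removed by such a screen will normally be substantial.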
Diagnostics of Equal Variances and Multicollinearity Diagnostics
To test the assumption of equal variance, the heteroscedasticity of the parameter
estimates was tested, as was multicollinearity (Appendix C). The chi-square was 18 with a
probability of 0.19, a value greater than 0.05. Therefore, the variances of the parameter
estimates could be concluded to be not significantly different. As a consequence,
equal error variances in the parameter estimates were assumed in the best-fitting 6-parameter
regression model. The multicollinearity of the predictor variables in the best-fitting model
was also tested. The bivariate Pearson correlations between pairs of predictors included in
the model were examined for any significant correlation between the predictors.
The variance inflations for all predictors were close to 1, a value smaller than 10,
suggesting that there was no significant collinearity between the predictors in the model.
However, the collinearity diagnostics suggest that there were possible co-dependences,
which might overspecify the model outcome. In particular the meteorological conditions
were somewhat correlated and commercial enterprises, such as gasoline stations, are
preferentially located on or near major roadways so correlations in the proximity variables
could exist.
In an attempt to reduce the multicollinearity diagnosed, the temperature, which
had a proportion of variation larger than 0.5, was removed from the best-fitting model and the
multicollinearity of the resulting model was diagnosed. When the temperature was removed from
the model, the coefficient of determination (r2) of the resulting 5-parameter model decreased
from 0.33 to 0.24, and the condition index decreased from 116 to 41. The interaction
between the predictors in the 5-parameter model appeared to decrease after removal of the
temperature from the model, but the eigenvalue was still smaller than 0.01 (0.0023) and the
proportion of variation of the stability (0.97) was still greater than 0.5.
The multicollinearity diagnostics described above exhibited divergent results. The
largest condition index and proportion of variation indicated potential collinearity may exist
in the predictors. However, the variance inflation factors were much smaller than 10 for all
five parameter estimates, indicating that multicollinearity may not be a problem. Neter et
al. (1996) suggested that even when there is serious multicollinearity, the fitted model may
be useful for estimating mean responses or making predictions, if the inferences of the fitted
regression model are restricted to the same multicollinearity pattern as the data on which the
regression model is based. Consequently, it was concluded that retaining all predictors
included in the best-fitting model with its qualitative characteristics is more beneficial for
explanatory observational purposes of this research than dropping the potentially
intercorrelated predictors from the model. Similar considerations were used for the other
compounds as well.
-------
A-6
Table A-1. Results of the Best-fitting 7-Parameter Model for m,p-Xylene

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               5          35.9789        7.19578      12.81    <.0001
Error             177         99.43418        0.56178
Corrected Total   182         135.4131

Root MSE                    0.74952    R-Square             0.2657
Dependent Mean              0.81562    Adjusted R-Square    0.2450
Coefficient of Variation   91.89477

Parameter Estimates
Variable    Label                            DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept   Intercept                         1          5.56153            2.24241          2.48      0.0141
F14_1mInv   (Distance to FC14)^-1             1         14.56178            5.17879          2.81      0.0055
GS1mInv     (Distance to Gas Station)^-1      1         22.46222            8.56119          2.62      0.0095
Stab4       Atmospheric Stability             1          0.52630            0.14753          3.57      0.0005
K5          Temperature                       1         -0.02472            0.00671         -3.69      0.0003
U4          Wind Speed                        1         -0.12254            0.05923         -2.07      0.0400

Summary of Stepwise Selection
Step   Variable Entered                Partial R-Square    Model R-Square       Cp      F Value    Pr > F
1      Atmospheric Stability                0.1317             0.1317        27.9107     27.46     <.0001
2      (Distance to Gas Station)^-1         0.0460             0.1778        18.9388     10.08     0.0018
3      Temperature                          0.0362             0.2140        12.3164      8.24     0.0046
4      (Distance to FC14)^-1                0.0340             0.2479         6.2173      8.04     0.0051
5      Wind Speed                           0.0178             0.2657         3.9860      4.28     0.0400
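The summary statistics in Table A-1 are linked by standard identities, so the table can be checked for internal consistency from its sums of squares and degrees of freedom alone. The short Python sketch below does this; the function name is hypothetical, and the inputs are the Table A-1 values.

```python
import math

def anova_summary(ss_model, ss_error, df_model, df_error, dependent_mean):
    """Derive the usual regression summary statistics from an ANOVA table."""
    ss_total = ss_model + ss_error
    df_total = df_model + df_error
    ms_model = ss_model / df_model
    ms_error = ss_error / df_error
    r_square = ss_model / ss_total
    root_mse = math.sqrt(ms_error)
    return {
        "f_value": ms_model / ms_error,
        "r_square": r_square,
        "adj_r_square": 1.0 - (1.0 - r_square) * df_total / df_error,
        "root_mse": root_mse,
        "coeff_var": 100.0 * root_mse / dependent_mean,
    }

# Inputs taken from Table A-1 (m,p-xylene, before outlier removal).
summary = anova_summary(35.9789, 99.43418, 5, 177, 0.81562)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

The derived values agree with the tabulated F value, R-square, adjusted R-square, root MSE and coefficient of variation to within rounding.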
-------
A-7
Residual Plot of the Best Fit Model of m,p-Xylene
Ln(m,p-Xylene) = 5.5615 + 14.562 f14_1mInv + 22.462 GS1mInv + 0.5263 Stab4 - 0.0247 K5 - 0.1225 U4
N = 183; Rsq = 0.2657; AdjRsq = 0.2450; RMSE = 0.7495
[Plot of residuals against predicted value not recoverable.]
Figure A-1. Residual vs. Predicted Plot of the Best-fitting 7-Parameter Model of m,p-Xylene
Outliers of Model for m,p-Xylene
[Plot not recoverable; it carries the same fitted equation as Figure A-1.]
-------
A-8
Table A-2. Results of the Best-fitting 5-Parameter Model for m,p-Xylene after Removing the
Twenty Outliers

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               5         22.98982        4.59796      15.99    <.0001
Error             162         46.58357        0.28755
Corrected Total   167         69.57338

Root MSE                    0.53624    R-Square             0.3304
Dependent Mean              0.86365    Adjusted R-Square    0.3098
Coefficient of Variation   62.08976

Parameter Estimates
Variable    Label                            DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept   Intercept                         1          4.94236            1.70161          2.90      0.0042
F14_1mInv   (Distance to FC14)^-1             1          7.94739            4.43103          1.79      0.0747
GS1mInv     (Distance to Gas Station)^-1      1         17.43615            6.29951          2.77      0.0063
Stab4       Atmospheric Stability             1          0.53744            0.11065          4.86      <.0001
K4          Temperature                       1         -0.0232             0.00507         -4.58      <.0001
U4          Wind Speed                        1         -0.0653             0.04438         -1.47      0.1431

Summary of Stepwise Selection
Step   Variable Entered                Partial R-Square    Model R-Square       Cp      F Value    Pr > F
1      Atmospheric Stability                0.1813             0.1813        32.3859     36.76     <.0001
2      Temperature                          0.0871             0.2684        13.4982     19.64     <.0001
3      (Distance to Gas Station)^-1         0.0385             0.3068         6.2732      9.10     0.0030
4      (Distance to FC14)^-1                0.0147             0.3215         4.7579      3.52     0.0624
5      Wind Speed                           0.0090             0.3304         4.6109      2.17     0.1431
-------
A-9
Residual Plot of the Best Fit Model of m,p-Xylene
Ln(m,p-Xylene), outliers removed = 4.9424 + 7.9474 f14_1mInv + 17.436 GS1mInv + 0.5374 Stab4 - 0.0232 K5 - 0.0653 U4
N = 168; Rsq = 0.3304; AdjRsq = 0.3098
[Plot of residuals against predicted value not recoverable.]
-------
A-10
o-Xylene
Bivariate Pearson Correlation
The correlation coefficients between the ln-transformed o-xylene concentrations and the
distance to urban interstate (FC11) roadways and the distance to urban major arterial (FC14)
roadways were -0.147 (p=0.048) and -0.148 (p=0.046), respectively. The distance to US
Highway Route 1 also had a statistically significant correlation coefficient of -0.266
(p=0.0003). The correlation coefficient between the ln-transformed ambient air concentration of
o-xylene and the inverse distance to the closest gas station was 0.24 (p=0.0011). The refinery
was the only point source whose distance to the residences had a statistically significant
correlation with the ln-transformed o-xylene concentrations (0.174, p=0.019).
The meteorological variables that were statistically significantly correlated with
o-xylene concentrations were wind speed (-0.30, p<0.0001), atmospheric stability (0.427,
p<0.0001), mixing height (-0.28, p=0.0002), relative humidity (0.16, p=0.027), and
temperature (-0.11, p<0.15). Precipitation and atmospheric pressure were not correlated with
the residential ambient air concentration of o-xylene.
Preliminary Selection of Predictors
A series of preliminary regression analyses for each group of variables was
performed using the ln-transformed o-xylene concentrations to determine which variables to
include in the model. The distances to the closest gas station, the refinery, and the urban
major arterial roadways (FC14) were selected as important predictors among the variables
that describe the distance between sources and residences. From the meteorological
variables, wind speed, temperature, and stability were selected as predictor variables.
-------
A-ll
Selection of the Best-fitting Model
The predictor variables selected by the different regression model selection methods
for the residential ambient air concentration of o-xylene were relatively consistent. The
meteorological variables that were consistently included in the series of regression models
were atmospheric stability, temperature, and wind speed, in order of selection. The C(p)
was 7, the same as the number of parameters included in the model. The parameter estimates
were significant (p<0.05), except for the intercept (p=0.26). The model statistics are
summarized in Table A-3. As illustrated in Figure A-5, the residuals were distributed
relatively randomly. The PP plot was nearly linear, implying that the error term of the model
followed a normal distribution. Twelve data points were identified as possible outliers by
using a test statistic of ±1.645 (0.95, df=175, Figure A-6). The analysis of variance,
parameter estimates, and summary of model statistics for the best-fitting 7-parameter
model for o-xylene after removal of the outliers are listed in Table A-4. The selected model was
statistically significant (p<0.0001). The residuals for the model with the outliers removed
were randomly distributed without showing any obvious trend or particular pattern
(Figure A-7). The standardized residuals of the best-fitting model were close to a normal
distribution and had constant variance. Based on a visual diagnosis of the residual plot,
probability plot, and quantile plot, there was no evidence of a lack of fit or unequal error
variance for the best-fitting 7-parameter regression model for ambient residential
o-xylene. The Mallows' Cp statistic associated with this particular subset of variables was
determined to be 7.0, indicating that the resulting model had the appropriate number of
parameters.
-------
A-12
Diagnostics of Equal Variances and Multicollinearity Diagnostics
To test the assumption of equal variance, the heteroscedasticity of the parameter
estimates was tested. The chi-square was 28 with a probability of 0.104, a value greater than
0.05 (Appendix C). Therefore, the variances of the parameter estimates could be concluded
to be not significantly different. As a consequence, equal error variances in the parameter
estimates were assumed in the best-fitting 7-parameter model. The same considerations
about parameter correlations expressed for m,p-xylene also apply to o-xylene.
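The probability attached to the heteroscedasticity chi-square can be reproduced from the statistic and its degrees of freedom. The Python sketch below uses the closed-form upper-tail probability of the chi-square distribution, which is exact when the degrees of freedom are even. The function name is hypothetical; the statistic (28.23) and the 20 degrees of freedom are those reported in the test of first and second moment specification for o-xylene.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of
    degrees of freedom, using the exact finite-sum form."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0          # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

p_value = chi2_upper_tail_even_df(28.23, 20)
print(round(p_value, 4))
```

The result is close to 0.104, matching the reported probability, and because it exceeds 0.05 the equal-variance assumption is not rejected.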
-------
A-13
Table A-3. Results of the Best-fitting 7-Parameter Model for o-Xylene

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               5         39.22208        7.84442      17.24    <.0001
Error             177          80.5563        0.45512
Corrected Total   182         119.7784

Root MSE                     0.67463    R-Square             0.3275
Dependent Mean              -0.09397    Adjusted R-Square    0.3085
Coefficient of Variation   -717.895

Parameter Estimates
Variable    Label                            DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept   Intercept                         1          2.28427            2.01835          1.13      0.2593
F14_1mInv   (Distance to FC14)^-1             1         20.31620            4.66133          4.36      <.0001
GS1mInv     (Distance to Gas Station)^-1      1         13.94798            7.70577          1.81      0.0720
Stab4       Atmospheric Stability             1          0.63575            0.13279          4.79      <.0001
K5          Temperature                       1         -0.01848            0.00604         -3.06      0.0025
U4          Wind Speed                        1         -0.11480            0.05331         -2.15      0.0326

Summary of Stepwise Selection
Step   Variable Entered                Partial R-Square    Model R-Square       Cp      F Value    Pr > F
1      Atmospheric Stability                0.1907             0.1907        31.3278     42.65     <.0001
2      (Distance to FC14)^-1                0.0802             0.2709        12.4773     19.81     <.0001
3      Temperature                          0.0285             0.2994         7.0815      7.27     0.0077
4      Wind Speed                           0.0156             0.3150         5.0189      4.06     0.0454
5      (Distance to Gas Station)^-1         0.0124             0.3275         3.7836      3.28     0.0720
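The F values in the stepwise summary can be recovered from the partial and model R-square columns: for the variable entered at a given step, F = partial R-square / (1 - model R-square) x error degrees of freedom, where the error degrees of freedom are n minus the number of predictors in the model minus one. The Python sketch below applies this to the Table A-3 entries with n = 183; the function name is hypothetical.

```python
def partial_f(partial_r2, model_r2, n_obs, n_predictors):
    """F statistic for the variable entered at a stepwise step."""
    df_error = n_obs - n_predictors - 1
    return partial_r2 / (1.0 - model_r2) * df_error

# (partial R-square, model R-square) for steps 1 to 5 of Table A-3.
steps = [
    (0.1907, 0.1907),
    (0.0802, 0.2709),
    (0.0285, 0.2994),
    (0.0156, 0.3150),
    (0.0124, 0.3275),
]
f_values = [partial_f(p, m, 183, k) for k, (p, m) in enumerate(steps, start=1)]
print([round(f, 2) for f in f_values])
```

The computed values differ from the tabulated F values only in the second decimal place, which is consistent with the R-square columns having been rounded to four decimals.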
-------
A-14
Residual Plot of the Best Fit Model of o-Xylene
Ln(o-Xylene) = 2.2843 + 20.316 f14_1mInv + 13.948 GS1mInv + 0.6357 Stab4 - 0.0185 K5 - 0.1148 U4
N = 183; Rsq = 0.3275; AdjRsq = 0.3085; RMSE = 0.6746
[Plot of residuals against predicted value not recoverable.]
Figure A-5. Residual Plot of the Model of o-Xylene
Outliers of Model for o-Xylene
[Plot against observation number not recoverable; same fitted equation and statistics as Figure A-5.]
Figure A-6. Outliers of Model of o-Xylene
-------
A-15
Table A-4. Results of the Best-fitting 7-Parameter Regression Model for o-Xylene after
Removal of the Outliers

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               5         23.35913        4.67183      23.69    <.0001
Error             162         31.95114        0.19723
Corrected Total   167         55.31027

Root MSE                      0.44410    R-Square             0.4223
Dependent Mean               -0.09736    Adjusted R-Square    0.4045
Coefficient of Variation   -456.16000

Parameter Estimates
Variable    Label                            DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept   Intercept                         1          4.45813            1.40740          3.17      0.0018
F14_1mInv   (Distance to FC14)^-1             1          7.44373            4.48291          1.66      0.0988
GS1mInv     (Distance to Gas Station)^-1      1          9.54244            5.47996          1.74      0.0835
Stab4       Atmospheric Stability             1          0.52092            0.09234          5.64      <.0001
K5          Temperature                       1         -0.02352            0.00419         -5.62      <.0001
U4          Wind Speed                        1         -0.12197            0.03697         -3.30      0.0012

Summary of Stepwise Selection
Step   Variable Entered                Partial R-Square    Model R-Square       Cp      F Value    Pr > F
1      Atmospheric Stability                0.2717             0.2717        38.3416     61.92     <.0001
2      Temperature                          0.0877             0.3594        15.9706     22.59     <.0001
3      Wind Speed                           0.0353             0.3947         8.1589      9.57     0.0023
4      (Distance to Gas Station)^-1         0.0178             0.4125         5.2175      4.93     0.0277
5      (Distance to FC14)^-1                0.0098             0.4223         4.4861      2.76     0.0988
-------
A-16
Residual Plot of the Best Fit Model of o-Xylene
Ln(o-Xylene), outliers removed = 4.4581 + 7.4437 f14_1mInv + 9.5424 GS1mInv + 0.5209 Stab4 - 0.0235 K5 - 0.122 U4
N = 168; Rsq = 0.4223; AdjRsq = 0.4045; RMSE = 0.4441
[Plot of residuals against predicted value not recoverable.]
Figure A-7. Residual vs. Predicted Plot of the Best-fitting 7-Parameter Model of o-Xylene
after Removing the Outliers
Cp Plot with Reference Lines
[Plot of Cp against P not recoverable; same fitted equation and statistics as Figure A-7. Reference lines: CP = 2P - (P for full model) + 1, and CP = P.]
-------
A-17
Table 4.A- Results of the Test of Multicollinearity of Predictor Variables Included in the
Best-fitting Model for o-Xylene

Parameter Estimates
Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|    Tolerance    Variance Inflation
Intercept     1         5.19346            1.43598           3.62      0.0004        .              0
f14_1mInv     1         8.75248            4.30991           2.03      0.0439     0.92783        1.07778
GS1mInv       1         9.80788            5.25048           1.87      0.0636     0.91647        1.09114
Stab4         1         0.53003            0.08902           5.95      <.0001     0.72367        1.38185
K4            1        -0.02635            0.00443          -5.94      <.0001     0.93148        1.07355
U4            1        -0.13256            0.03548          -3.74      0.0003     0.69736        1.43397

Collinearity Diagnostics
                                               ------------------------- Proportion of Variation -------------------------
Number    Eigenvalue    Condition Index    Intercept     f14_1mInv     GS1mInv     Stab4         K4             U4
1         4.83871           1.00000        0.00002176    0.01278       0.01262     0.00022105    0.00002821     0.00172
2         0.63347           2.76376        0.00004494    0.37172       0.34247     0.00041480    0.00005795     0.00400
3         0.46833           3.21432        3.48031E-9    0.60663       0.62097     0.00000123    4.828461E-8    0.00002712
4         0.05572           9.31914        0.00038414    0.00002814    0.00518     0.01731       0.00064391     0.57135
5         0.00347          37.36627        0.02094       0.00109       0.01645     0.88898       0.05375        0.27248
6         0.00030792      125.35642        0.97861       0.00776       0.00231     0.09308       0.94552        0.15042

Table 4.4.4. Results of the Test of Heteroscedasticity of Parameter Estimates Determined in
the Best-fitting Model for o-Xylene

Consistent Covariance of Estimates
Variable     Intercept        f14_1mInv        GS1mInv          Stab4            K4               U4
Intercept     1.6675971894     0.6946045492    -1.338990966     -0.041187547     -0.004861092     -0.017907343
f14_1mInv     0.6946045492    12.512236841     -2.924001768      0.0253987847    -0.003166084      0.0004076095
GS1mInv      -1.338990966     -2.924001768     15.630366869     -0.007104998      0.0047403892    -0.010061741
Stab4        -0.041187547      0.0253987847    -0.007104998      0.0068550523     3.7178761E-7     0.0013929763
K4           -0.004861092     -0.003166084      0.0047403892     3.7178761E-7     0.0000168767     0.000016424
U4           -0.017907343      0.0004076095    -0.010061741      0.0013929763     0.000016424      0.0015020694

Test of First and Second Moment Specification
DF    Chi-Square    Pr > ChiSq
20      28.23         0.1040
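Two of the multicollinearity diagnostics reported for o-xylene follow directly from other columns of the same output: the variance inflation factor is the reciprocal of the tolerance, and each condition index is the square root of the largest eigenvalue divided by the eigenvalue in question. The Python sketch below recomputes both from the tabulated values; the function names are hypothetical.

```python
import math

def variance_inflation(tolerance):
    """Variance inflation factor as the reciprocal of the tolerance."""
    return 1.0 / tolerance

def condition_indices(eigenvalues):
    """Condition index for each eigenvalue: sqrt(largest / eigenvalue)."""
    largest = max(eigenvalues)
    return [math.sqrt(largest / ev) for ev in eigenvalues]

# Tolerances and eigenvalues as tabulated for the o-xylene model.
tolerances = [0.92783, 0.91647, 0.72367, 0.93148, 0.69736]
eigenvalues = [4.83871, 0.63347, 0.46833, 0.05572, 0.00347, 0.00030792]

print([round(variance_inflation(t), 5) for t in tolerances])
print([round(c, 2) for c in condition_indices(eigenvalues)])
```

The two diagnostics measure different things, which is how the divergent results noted in the text can arise: every variance inflation factor is well below 10, yet the smallest eigenvalue yields a condition index above 100, driven largely by the intercept and temperature terms.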
-------
A-18
Toluene
Bivariate Pearson Correlation
The correlation coefficients between the ln-transformed toluene concentration and the
distance to FC14 roadways and the inverse distance to FC14 roadways were -0.177 (p=0.018) and
0.172 (p=0.02), respectively. The distance to US Highway Route 1 was also statistically
significantly correlated with the ln-transformed concentration of toluene in the residential
ambient air (-0.162, p=0.03). The correlation coefficient between the ln-transformed toluene
concentration and the distance to the closest gas station was -0.18 (p=0.015). The refinery
was the only point source whose distance from the residence had a statistically
significant correlation with the ln-transformed toluene concentration in ambient air (-0.15,
p=0.04).
The meteorological variables that correlated with toluene concentration were
atmospheric stability (0.31, p<0.0001), wind speed (-0.198, p<0.01), mixing height (-0.19,
p<0.01), relative humidity (0.17, p<0.05), temperature (-0.135, p<0.1), and precipitation
(0.115, p<0.15). Atmospheric pressure was not correlated with the residential ambient air
concentration of toluene.
Preliminary Selection of Predictors
The distance to the refinery and the distance to major urban arterial roadways
(FC14) were selected as important predictors among the variables that describe the distance
between sources and residences. From the meteorological variables, wind speed and
atmospheric stability were selected as predictor variables (p<0.15).
Selection of the Best-fitting Model
-------
A-19
The predictor variables selected from the proximity variables and the meteorological
variables by the different selection methods for the regression model of the residential
ambient air concentration of toluene were consistent. The meteorological variables included in
the regression model were atmospheric stability and temperature. The inverse distance
to the closest major urban arterial roadway (FC14) was included in the model as a predictor
from among the variables describing proximity to major roadways. The inverse distance to the
refinery was included as a significant predictor variable in the model from among the variables
describing distance to point sources. The distance to the gas station was not selected as a
predictor for the model of toluene. The model was statistically significant (p<0.0001) with an
r2 of 0.199 (adjusted r2 of 0.181).
The parameter estimates for the meteorological predictors were significant at p<0.01
and the parameter estimates for the proximity variables in the model were significant at
p<0.15. The model statistics are summarized in Table A-5. The residuals were distributed
relatively randomly (Figure A-9). The error term of the model followed a normal distribution,
based on the linearity observed in the PP plot (Appendix B). Possible outliers were
identified by using a test statistic of ±1.645 (0.95, df = 165), and the model was refit
after removing them (Figure A-10). The analysis of variance, parameter
estimates, and summary of model statistics for the best-fitting 5-parameter model for
toluene are listed in Table A-5. The removal of the outliers improved the r2 from 0.20 to
0.33. The residuals were randomly distributed without showing any obvious trend or
particular pattern, based on a visual examination (Figure A-11). The standardized residuals of
the best-fitting model appeared to be close to a normal distribution and had constant variance.
The residual, PP and QQ plots showed no visual evidence of lack of fit or unequal
error variance. The Mallows' Cp statistic associated with this particular subset of variables
was determined to be 5, indicating that the resulting model had an appropriate number of
parameters.
Diagnostics of Equal Variances and Multicollinearity Diagnostics
To test the assumption of equal variance, the heteroscedasticity of the parameter
estimates was tested, as was multicollinearity (Appendix C). The chi-square was 28.15
with a probability of 0.014, a value smaller than 0.05. Therefore, the variances of the
parameter estimates were concluded to be significantly different. As a consequence, the error
variances in the parameter estimates could not be assumed to be equal for the best-fitting
5-parameter model of toluene. The bivariate Pearson correlations between pairs of predictors
included in the model showed a statistically significant correlation
between 'the inverse distance to the refinery' and 'the closest distance to the urban major
arterial roadways (FC14)' (-0.284, p<0.0001).
The variance inflation factors for the predictor variables were close to 1 (1.01 to 1.11),
well below 10. Based on the variance inflation factors, there was no significant
collinearity between the predictors in the model. However, in the collinearity
diagnostics, the condition index was 107, which was greater than 100, and the eigenvalue was
close to zero (0.00038), which was smaller than 0.01. The proportions of variation of the
intercept (0.98) and temperature (0.97) were greater than 0.5, indicating that the two
parameters interacted. Therefore, there were possible co-dependences in the model, which
might overspecify the model outcome.
-------
A-21
Table A-5. Results of the 5-Parameter Multiple Linear Regression Model for Toluene

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               4         39.48395        9.87099      12.03    <.0001
Error             178         146.1108        0.82085
Corrected Total   182         185.5947

Root MSE                    0.90601    R-Square             0.2127
Dependent Mean              1.52498    Adjusted R-Square    0.1951
Coefficient of Variation   59.41107

Parameter Estimates
Variable    Label                     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept   Intercept                  1          6.29278            2.41687          2.60      0.0100
f14_1mInv   (Distance to FC14)^-1      1         16.66451            6.14926          2.71      0.0074
Stab4       Atmospheric Stability      1          0.65082            0.16197          4.02      <.0001
K4          Temperature                1         -0.03208            0.00830         -3.87      0.0002
RH5         Relative Humidity          1          0.01554            0.00613          2.54      0.0120

Summary of Stepwise Selection
Step   Variable Entered          Partial R-Square    Model R-Square       Cp      F Value    Pr > F
1      Atmospheric Stability          0.1162             0.1162        23.6494     23.80     <.0001
2      Temperature                    0.0384             0.1546        16.8377      8.18     0.0047
3      (Distance to FC14)^-1          0.0296             0.1843        12.0408      6.50     0.0116
4      Relative Humidity              0.0285             0.2127         7.5148      6.44     0.0120
-------
A-22
Residual Plot of the Best Fit Model of Toluene
Ln(Toluene) = 6.2928 + 16.665 f14_1mInv + 0.6508 Stab4 - 0.0321 K5 + 0.0155 RH5
N = 183; Rsq = 0.2127; AdjRsq = 0.1951; RMSE = 0.906
[Plot of residuals against predicted value not recoverable.]
Figure A-9. Residual vs. Predicted Plot of the 5-Parameter Model of Toluene
Outliers of Model for Toluene
[Plot against observation number not recoverable; same fitted equation and statistics as Figure A-9.]
Figure A-10. Outliers of Model of Toluene
-------
A-23
Table A-6. Results of the Best-fitting 5-Parameter Model for Toluene after Removing the
Outliers

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               4         30.71102        7.67776      18.42    <.0001
Error             162         67.51842        0.41678
Corrected Total   166         98.22944

Root MSE                    0.64559    R-Square             0.3126
Dependent Mean              1.60929    Adjusted R-Square    0.2957
Coefficient of Variation   40.11608

Parameter Estimates
Variable    Label                     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept   Intercept                  1          3.11017            1.84905          1.68      0.0945
f14_1mInv   (Distance to FC14)^-1      1         14.72149            4.43634          3.32      0.0011
Stab4       Atmospheric Stability      1          0.70584            0.12046          5.86      <.0001
K4          Temperature                1         -0.02067            0.00635         -3.25      0.0014
RH5         Relative Humidity          1          0.01116            0.00480          2.33      0.0212

Summary of Stepwise Selection
Step   Variable Entered          Partial R-Square    Model R-Square       Cp      F Value    Pr > F
1      Atmospheric Stability          0.2245             0.2245        18.6321     47.76     <.0001
2      (Distance to FC14)^-1          0.0376             0.2620        11.8339      8.35     0.0044
3      Temperature                    0.0276             0.2897         7.3616      6.34     0.0128
4      Relative Humidity              0.0230             0.3126         3.9805      5.42     0.0212
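Because the dependent variable is the ln-transformed concentration, each coefficient in Table A-6 acts multiplicatively on the concentration itself: changing a predictor by an amount d, with the other predictors held fixed, multiplies the predicted concentration by exp(coefficient x d). The Python sketch below illustrates this with two Table A-6 coefficients. The function name is hypothetical, and the units of the predictors (one stability class, ten temperature units) are assumptions, since the table does not state them.

```python
import math

def concentration_multiplier(coefficient, change):
    """Factor by which the predicted concentration changes when one
    predictor changes by `change` in a ln-linear model."""
    return math.exp(coefficient * change)

# Coefficients from Table A-6 (toluene, outliers removed).
stability_effect = concentration_multiplier(0.70584, 1.0)      # +1 stability unit
temperature_effect = concentration_multiplier(-0.02067, 10.0)  # +10 temperature units

print(round(stability_effect, 3), round(temperature_effect, 3))
```

Read this way, a one-unit increase in the stability variable roughly doubles the predicted toluene concentration, while a ten-unit temperature increase lowers it by a little under 20%.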
-------
A-24
Residual Plot of the Best Fit Model of Toluene
Ln(Toluene), outliers removed = 3.1102 + 14.721 f14_1mInv + 0.7058 Stab4 - 0.0207 K5 + 0.0112 RH5
N = 167; Rsq = 0.3126; AdjRsq = 0.2957
[Plot of residuals against predicted value not recoverable.]
-------
A-25
Benzene
Bivariate Pearson Correlation
A negative correlation coefficient of -0.196 (p=0.008) was determined between the
distance to US Highway Route 1 and the ln-transformed benzene concentrations.
The correlation coefficient between the ln-transformed benzene concentration and the inverse
distance to the closest gas station was significant (0.259, p=0.0004). Among the distances to
the four identified point sources, none was significantly correlated with the benzene
concentration.
The meteorological variables that correlated with benzene concentrations were
temperature (-0.392, p<0.0001), atmospheric stability (0.263, p=0.0003), mixing height
(-0.224, p=0.0024), wind speed (-0.167, p=0.025), and atmospheric pressure (0.165, p=0.026).
Precipitation and relative humidity were not significantly correlated with the benzene
concentration.
Preliminary Selection of Predictors
The distances to the closest gas station, the refinery, and the US highway Route 1
were selected as important predictors of ambient benzene concentration (p<0.15). Wind
speed, temperature, and atmospheric stability were selected as predictor variables from the
meteorological variables (p<0.15).
Selection of the Best-fitting Model
The predictor variables selected by the various selection methods in models for the
residential ambient air concentration of benzene were relatively consistent. The
meteorological variables that were consistently included were temperature, atmospheric
stability, and wind speed, in order of selection. The C(p) suggested that the 6-parameter model
was appropriate. The model statistics and parameter estimates are summarized in Table A-7.
The residuals were distributed relatively randomly (Figure A-13), suggesting that there was
equal variance in the residuals. The probability plot (Appendix B) was linear, indicating that
the error term of the model followed a normal distribution.
Possible outliers were identified based on a test statistic of + 1.645 (0.95, df. = 175)
(Figure A-14). After the removal of sixteen possible Outliers, the model became the 5-
parameter model because the wind speed was not included in the best-fitting regression
model. The analysis of the variance, parameter estimates, and the summary of model
statistics for the best-fitting 5-parameter model for benzene are listed in Table A-8. A visual
examination of the residuals indicated that they were randomly distributed without showing
any obvious trend or any particular pattern (Figure A-15). The standardized residual of the
best-fitting model was close to a normal distribution with constant variances. There was no
visual evidence for the lack of a fit or of significant unequal error variance for the best-fitting
5-parameter regression model for the residential ambient air benzene concentration. The
Mallows' Cp statistic associated with this particular subset of variables was determined at 5,
suggesting the model result had appropriate number of parameters included.
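The outlier screening step can be sketched as follows. This is a minimal illustration under stated assumptions: the report does not say which form of standardized residual was used, so the sketch simply divides each residual by the root mean square error and ignores leverage. The function name `flag_outliers` and the residual values are hypothetical.

```python
import math

def flag_outliers(residuals, n_params, cutoff=1.645):
    """Return indices whose standardized residual exceeds the cutoff in magnitude.

    Assumption: standardized residual = residual / root MSE, where
    MSE = SSE / (n - n_params). Leverage is not accounted for.
    """
    n = len(residuals)
    sse = sum(e * e for e in residuals)
    root_mse = math.sqrt(sse / (n - n_params))
    return [i for i, e in enumerate(residuals) if abs(e / root_mse) > cutoff]

# Hypothetical residuals from a fitted model with 2 parameters (illustration only).
residuals = [0.10, -0.20, 0.05, 1.50, -0.15, 0.00, 0.12, -1.40, 0.08, -0.10]
outliers = flag_outliers(residuals, n_params=2)
print(outliers)
```

After flagged observations are removed, the model is refit and the selection procedure repeated, which is why the set of selected predictors can change between the two tables reported for each compound.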
Diagnostics of Equal Variances and Multicollinearity Diagnostics
To test the assumption of equal variance, heteroscedasticity was tested, along with
multicollinearity among the predictors. The chi-square statistic was 18.85 with a probability
of 0.17, a value greater than 0.05. Therefore, the hypothesis of equal error variances was
not rejected, and equal error variances were assumed in the best-fitting 5-parameter model.
The bivariate Pearson correlations between pairs of predictors included in the model showed a
statistically significant correlation between 'the inverse distance to the closest gas
station' and 'the inverse distance to the closest urban major arterial roadway (FC14)'
(0.184, p=0.013).
The variance inflation factors for the predictor variables were close to 1 (1.01 ~ 1.06),
well below 10. Based on the variance inflation factors, there was no significant
collinearity between the predictors in the model. However, in the collinearity
diagnostics, the condition index was 103, which was greater than 100, and the corresponding
eigenvalue was close to zero (0.00036), which was smaller than 0.01.
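As a rough check on the scale of the variance inflation factors, the two-predictor special case can be worked directly from the pairwise correlation reported above. This is only a sketch: in a model with more than two predictors, each variance inflation factor uses the R-square from regressing that predictor on all of the others, so the pairwise value below is a lower bound and not the report's calculation. The function name is hypothetical.

```python
def vif_from_pairwise_r(r):
    """Variance inflation factor 1 / (1 - r^2) for the two-predictor case."""
    return 1.0 / (1.0 - r * r)

# Pairwise correlation between the two inverse-distance predictors (from the text).
r_gas_fc14 = 0.184
vif = vif_from_pairwise_r(r_gas_fc14)
print(f"VIF = {vif:.3f}")
```

The result is about 1.035, which falls inside the 1.01 to 1.06 range quoted for the benzene model.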
-------
A-28
Table A-7. Results of the Best-fitting 6-Parameter Model for Benzene
Analysis of Variance
Source              DF    Sum of Squares    Mean Square    F Value    Pr > F
Model                3          26.86251        8.95417      20.62    <.0001
Error              179          77.73533        0.43428
Corrected Total    182         104.5978
Root MSE                      0.6590    R-Square             0.2568
Dependent Mean                0.1278    Adjusted R-Square    0.2444
Coefficient of Variation    515.6321
Parameter Estimates
Variable     Label                           DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept    Intercept                        1              11.46110           1.74199       6.58      <.0001
GSlmlnv      (Distance to Gas Station)^-1     1              26.16590           7.30908       3.58      0.0004
K4           Temperature                      1              -0.03743           0.00585      -6.4       <.0001
U4           Wind Speed                       1              -0.17239           0.04456      -3.87      0.0002
Summary of Stepwise Selection
Step    Variable Entered                Partial R-Square    Model R-Square         Cp    F Value    Pr > F
1       Temperature                               0.1407            0.1407    27.2654      29.64    <.0001
2       Wind Speed                                0.0629            0.2036    14.1672      14.22    0.0002
3       (Distance to Gas Station)^-1              0.0532            0.2568     3.3947      12.82    0.0004
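The summary statistics in Table A-7 follow from the sums of squares and degrees of freedom by standard identities. The short sketch below recomputes them as a consistency check; the variable names are illustrative, and the input numbers are copied from the table.

```python
import math

# Values copied from the Analysis of Variance block of Table A-7.
ss_model, ss_error, ss_total = 26.86251, 77.73533, 104.5978
df_model, df_error, df_total = 3, 179, 182

ms_model = ss_model / df_model          # mean square for the model
ms_error = ss_error / df_error          # mean square error
f_value = ms_model / ms_error           # overall F statistic
r_square = ss_model / ss_total          # coefficient of determination
adj_r_square = 1.0 - (1.0 - r_square) * df_total / df_error
root_mse = math.sqrt(ms_error)

print(f"F = {f_value:.2f}, R-square = {r_square:.4f}, "
      f"adjusted R-square = {adj_r_square:.4f}, root MSE = {root_mse:.4f}")
```

The recomputed values agree with the tabulated F value, R-square, adjusted R-square, and root MSE to the precision shown in the table.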
-------
A-29
[Figure: Residual Plot of the Best Fit Model of Benzene. Fitted model: ln(benzene) = 11.461
+ 26.166 (Distance to Gas Station)^-1 - 0.0374 (temperature) - 0.1724 (wind speed).
N = 183; R-square = 0.2568; adjusted R-square = 0.2444.]
-------
A-30
Table A-8. Results of the Best-fitting 5-Parameter Model for Benzene after Removing the
Outliers
Analysis of Variance
Source              DF    Sum of Squares    Mean Square    F Value    Pr > F
Model                5           26.0494        5.20987       22.8    <.0001
Error              163           37.2387        0.22846
Corrected Total    168           63.2881
Root MSE                      0.4780    R-Square             0.4116
Dependent Mean                0.1438    Adjusted R-Square    0.3936
Coefficient of Variation    332.5000
Parameter Estimates
Variable     Label                           DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept    Intercept                        1              10.07440           1.49805       6.73      <.0001
F14_lmlnv    (Distance to FC14)^-1            1               5.49770           3.32981       1.65      0.1007
GSlmlnv      (Distance to Gas Station)^-1     1              16.14780           5.57504       2.90      0.0043
Stab4        Atmospheric Stability            1               0.30356           0.09971       3.04      0.0027
K4           Temperature                      1              -0.03914           0.00447      -8.76      <.0001
U4           Wind Speed                       1              -0.08488           0.03966      -2.14      0.0338
Summary of Stepwise Selection
Step    Variable Entered                Partial R-Square    Model R-Square         Cp    F Value    Pr > F
1       Temperature                               0.2510            0.2510    41.1680      55.95    <.0001
2       Atmospheric Stability                     0.0996            0.3506    15.7541      25.46    <.0001
3       (Distance to Gas Station)^-1              0.0340            0.3846     8.3926       9.12    0.0029
4       Wind Speed                                0.0172            0.4018     5.6629       4.71    0.0314
5       (Distance to FC14)^-1                     0.0098            0.4116     4.9545       2.73    0.1007
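Because the dependent variable is the ln-transformed concentration and proximity enters as an inverse distance, a coefficient in Table A-8 can be read as a multiplicative effect on concentration. The sketch below shows that reading for the gas station term. It is an illustration only: the distance unit is assumed here to be meters, which this section does not state, and the distances 50 and 500 are hypothetical.

```python
import math

# Parameter estimate for (Distance to Gas Station)^-1 from Table A-8.
B_GAS_STATION = 16.14780

def concentration_ratio(d_near, d_far, coef=B_GAS_STATION):
    """Ratio of predicted concentrations at two distances, other predictors held equal.

    From ln(C) = ... + coef / d, the ratio C(d_near) / C(d_far) equals
    exp(coef * (1/d_near - 1/d_far)).
    """
    return math.exp(coef * (1.0 / d_near - 1.0 / d_far))

# Hypothetical distances; the unit (meters) is an assumption, not from the report.
ratio = concentration_ratio(50.0, 500.0)
print(f"predicted concentration ratio: {ratio:.3f}")
```

Under these assumptions the inverse-distance form makes the predicted effect fall off quickly, so most of the difference arises at short distances from the source.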
-------
A-31
[Figure: Residual Plot of the Best Fit Model of Benzene; x-axis: Predicted Value. Fitted
model: ln(benzene) = 10.074 + 5.4977 (Distance to FC14)^-1 + 16.148 (Distance to Gas
Station)^-1 + 0.3036 (atmospheric stability) - 0.0391 (temperature) - 0.0849 (wind speed).
N = 169; R-square = 0.4116; adjusted R-square = 0.3936; RMSE = 0.478.]
Figure A-15. Residual vs. Predicted Plot of the Best-fitting 5-Parameter Model of Benzene
after Removing the Outliers
[Figure: Cp Plot with Reference Lines; x-axis: P; reference lines: CP = P and
CP = 2P - (P for full model) + 1. N = 169; R-square = 0.4116; adjusted R-square = 0.3936;
RMSE = 0.478.]
Figure A-16. Cp Plot for the Best-fitting 5-Parameter Model for Benzene after Removing the
Outliers
-------
A-32
4.7. Ethylbenzene
4.7.1. Bivariate Pearson Correlation
Ethylbenzene concentration was not significantly correlated with any of the
roadway-classification proximity variables, either directly or following any transformation.
The inverse distance from the sampler location to US Highway Route 1 (one of the individual
roadways of FC14) was significantly correlated with the untransformed ambient air
concentration of ethylbenzene (0.167, p=0.024). The inverse distance to the closest gas
station was also significantly correlated with the untransformed ambient air concentration of
ethylbenzene (0.206, p=0.005). There were two identified point sources of ethylbenzene in
the study area, but only the distance to the refinery was correlated with ethylbenzene
concentration (p<0.1).
The meteorological variables that correlated with ln-transformed ethylbenzene
concentrations were atmospheric stability (0.27, p=0.0003), wind speed (-0.17, p=0.024), and
temperature (-0.14, p=0.06). Mixing height, relative humidity, precipitation, and atmospheric
pressure were not significantly correlated with the ambient air concentration of
ethylbenzene.
4.7.2. Preliminary Selection of Predictors
A series of preliminary regression analyses for each group of variables was
performed using the ln-transformed ethylbenzene concentrations to determine which
variables to include in the model. The distance to the closest gas station was selected as an
important predictor among the variables that describe the distance between sources and
residences. From the meteorological variables, atmospheric stability was selected as a
predictor variable (p<0.15).
-------
A-33
4.7.3. Selection of the Best-fitting Model
The predictor variables selected in the models by the different selection methods for
the residential ambient air concentration of ethylbenzene were relatively consistent.
Atmospheric stability and temperature were selected among the meteorological variables as
predictor variables. The inverse square distance to the urban major arterial roadways (FC14)
was selected as a significant predictor in the model of ethylbenzene among the source
proximity variables. The C(p) was 4.3, close to the number of parameters (4) included in
the model. The parameter estimates were significant at p<0.05 for the meteorological variables
(atmospheric stability and temperature). The model statistics are summarized in Table A-9.
The residuals were distributed relatively randomly (Figure A-17), suggesting equal variance
in the residuals. The probability plot (Appendix C) was linear, indicating that the
error term of the model followed a normal distribution. Possible outliers were identified
using a test statistic of ±1.645 (0.95, df = 175) (Figure A-18). The analysis of variance,
parameter estimates, and summary of model statistics for the best-fitting 4-parameter
model for ethylbenzene after removal of outliers are listed in Table A-10. The removal of
the seventeen outliers did not improve the r² as much as was observed for the models of the
other VOCs in this research. However, the p-values of the parameter estimates of the
best-fitting model improved for all variables (p<0.05).
The residuals for the model with the outliers removed were more randomly
distributed (Figure A-19) compared to the distribution before removal (Figure A-17). The
standardized residuals of the best-fitting model were close to a normal distribution and had
constant variance. The probability plot was linear, indicating that the error term followed a
normal distribution. Based on a visual diagnosis, there was no evidence of a lack of fit or
unequal error variance for the best-fitting 4-parameter regression model for the ambient
residential ethylbenzene. The Mallows' Cp statistic associated with this particular subset of
variables was 4 (Figure A-20), indicating that the resulting model had the
appropriate number of parameters.
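The reasoning behind comparing Mallows' Cp with the number of parameters can be sketched with the usual definition, Cp = SSE_p / MSE_full - (n - 2p), where p counts the parameters of the candidate model including the intercept. The numbers below are hypothetical and are chosen only to show that an unbiased candidate gives Cp close to p while a model missing an important predictor gives Cp well above p; they are not taken from the report's tables.

```python
def mallows_cp(sse_subset, mse_full, n_obs, n_params):
    """Mallows' Cp for a candidate model.

    sse_subset: error sum of squares of the candidate model
    mse_full:   mean square error of the full model (estimate of error variance)
    n_obs:      number of observations
    n_params:   parameters in the candidate model, including the intercept
    """
    return sse_subset / mse_full - (n_obs - 2 * n_params)

# Hypothetical numbers for illustration only.
n_obs, mse_full = 100, 0.5

# Candidate whose SSE equals mse_full * (n - p): Cp equals p.
cp_good = mallows_cp(sse_subset=48.0, mse_full=mse_full, n_obs=n_obs, n_params=4)

# Smaller candidate that leaves out an important predictor: Cp is far above p.
cp_biased = mallows_cp(sse_subset=60.0, mse_full=mse_full, n_obs=n_obs, n_params=3)

print(cp_good, cp_biased)
```

This is the comparison that the "CP = P" reference line on the Cp plots is intended to make visible.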
Diagnostics of Equal Variances and Multicollinearity Diagnostics
To test the assumption of equal variance, heteroscedasticity was tested, along with
multicollinearity among the predictors. The chi-square statistic was 16.88 with a probability
of 0.0506, a value slightly greater than 0.05. Therefore, the hypothesis of equal error
variances was not rejected, and equal error variances were assumed in the best-fitting
4-parameter model. The bivariate Pearson correlations between pairs of predictors did not
identify any statistically significant correlations between predictors in the model at
α = 0.05.
The variance inflation factors for the predictor variables were close to 1 (1.007 ~ 1.023),
well below 10. Based on the variance inflation factors, there was no significant
collinearity between the predictors in the model. In the collinearity diagnostics,
the condition index was 91, which was smaller than 100. However, the corresponding eigenvalue
was close to zero (0.00037), which was smaller than 0.01. The proportions of variation of the
intercept (0.98) and temperature (0.97) were greater than 0.5, indicating that the two
parameters were interdependent. As described for m,p-xylene, some codependency among the
variables exists.
-------
A-35
Table A-9. Results of the Best-fitting 4-Parameter Model for Ethylbenzene
Analysis of Variance
Source              DF    Sum of Squares    Mean Square    F Value    Pr > F
Model                4           32.8909        8.22271       6.56    <.0001
Error              178          223.166         1.25374
Corrected Total    182          256.057
Root MSE                       1.1197    R-Square             0.1285
Dependent Mean                -0.2723    Adjusted R-Square    0.1089
Coefficient of Variation    -411.2200
Parameter Estimates
Variable     Label                    DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept    Intercept                 1               5.32543           3.34905       1.59      0.1136
F14_lmlnv    (Distance to FC14)^-1     1              13.31980           7.59097       1.75      0.0810
Stab4        Atmospheric Stability     1               0.52795           0.21804       2.42      0.0165
K5           Temperature               1              -0.02653           0.00999      -2.66      0.0086
U4           Wind Speed                1              -0.17165           0.08826      -1.94      0.0534
Summary of Stepwise Selection
Step    Variable Entered         Partial R-Square    Model R-Square         Cp    F Value    Pr > F
1       Atmospheric Stability              0.0741            0.0741     7.2130      14.49    0.0002
2       Temperature                        0.0204            0.0945     5.1088       4.06    0.0455
3       Wind Speed                         0.0189            0.1134     3.3142       3.81    0.0525
4       (Distance to FC14)^-1              0.0151            0.1285     2.2823       3.08    0.0810
-------
A-36
[Figure: Residual Plot of the Best Fit Model of Ethylbenzene; x-axis: Predicted Value.
Fitted model: ln(ethylbenzene) = 5.3254 + 13.32 (Distance to FC14)^-1 + 0.5279 (atmospheric
stability) - 0.0265 (temperature) - 0.1716 (wind speed). N = 183; R-square = 0.1285;
adjusted R-square = 0.1089; RMSE = 1.1197.]
Figure A-17. Residual vs. Predicted Plot of the 4-Parameter Model of Ethylbenzene
[Figure: Outliers of Model for Ethylbenzene; x-axis: Observation Number. N = 183;
R-square = 0.1285; adjusted R-square = 0.1089; RMSE = 1.1197.]
Figure A-18. Outliers of 4-Parameter Model of Ethylbenzene
-------
A-37
Table A-10. Results of the Best-fitting 4-Parameter Multiple Linear Regression Model for
Ethylbenzene after Removing the Outliers
Analysis of Variance
Source              DF    Sum of Squares    Mean Square    F Value    Pr > F
Model                4           20.4924        5.12309       7.96    <.0001
Error              164          105.564         0.64368
Corrected Total    168          126.056
Root MSE                       0.8023    R-Square             0.1626
Dependent Mean                -0.1066    Adjusted R-Square    0.1421
Coefficient of Variation    -752.6100
Parameter Estimates
Variable     Label                    DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept    Intercept                 1               5.98125           2.43806       2.45      0.0152
F14_lmlnv    (Distance to FC14)^-1     1               9.68110           5.62338       1.72      0.0870
Stab4        Atmospheric Stability     1               0.43775           0.16287       2.69      0.0079
K4           Temperature               1              -0.02747           0.00732      -3.75      0.0002
U4           Wind Speed                1              -0.11372           0.06587      -1.73      0.0861
Summary of Stepwise Selection
Step    Variable Entered         Partial R-Square    Model R-Square         Cp    F Value    Pr > F
1       Atmospheric Stability              0.0774            0.0774    13.4719      14.01    0.0002
2       Temperature                        0.0552            0.1326     4.8002      10.56    0.0014
3       Wind Speed                         0.0149            0.1474     3.9236       2.88    0.0917
4       (Distance to FC14)^-1              0.0151            0.1626     2.9959       2.96    0.0870
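Two relationships tie the blocks of Table A-10 together: each t value is the parameter estimate divided by its standard error, and the partial R-square values from the stepwise summary add up to the final model R-square. The sketch below checks both using numbers copied from the table; the variable names are illustrative.

```python
# (estimate, standard error, reported t value) from the Parameter Estimates block.
parameters = {
    "Intercept": (5.98125, 2.43806, 2.45),
    "(Distance to FC14)^-1": (9.68110, 5.62338, 1.72),
    "Atmospheric Stability": (0.43775, 0.16287, 2.69),
    "Temperature": (-0.02747, 0.00732, -3.75),
    "Wind Speed": (-0.11372, 0.06587, -1.73),
}

# t value = estimate / standard error
t_values = {name: est / se for name, (est, se, _) in parameters.items()}

# Partial R-square contributed at each stepwise step.
partial_r_square = [0.0774, 0.0552, 0.0149, 0.0151]
model_r_square = sum(partial_r_square)

for name, t in t_values.items():
    print(f"{name}: t = {t:.2f}")
print(f"model R-square = {model_r_square:.4f}")
```

Small differences in the last digit, such as the step 3 cumulative value in the table, are consistent with rounding of the partial values.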
-------
A-38
[Figure: Residual Plot of the Best Fit Model of Ethylbenzene; x-axis: Predicted Value.
Fitted model: ln(ethylbenzene) = 5.9813 + 9.6811 (Distance to FC14)^-1 + 0.4377 (atmospheric
stability) - 0.0275 (temperature) - 0.1137 (wind speed). N = 169; R-square = 0.1626;
adjusted R-square = 0.1421; RMSE = 0.8023.]
Figure A-19. Residual vs. Predicted Plot of the Best-fitting 4-Parameter Model of
Ethylbenzene after Removing the Outliers
[Figure: Cp Plot with Reference Lines; x-axis: P; reference lines: CP = P and
CP = 2P - (P for full model) + 1. N = 169; R-square = 0.1626; adjusted R-square = 0.1421;
RMSE = 0.8023.]
Figure A-20. Cp Plot for the Best-fitting 4-Parameter Model for Ethylbenzene after
Removing the Outliers
-------
A-39
4.8. Methyl tert-Butyl Ether (MTBE)
4.8.1. Bivariate Pearson Correlation
The correlation coefficient between the untransformed ambient air concentration of
MTBE and the distance and the inverse distance to the nearest FC11 roadway (urban
interstate highways) was 0.22 (p=0.0027). The only identified point source of MTBE within
3 kilometers of Elizabeth, NJ, that generated more than 0.9 tons in 1999 was the refinery in
Linden. However, the distance to the refinery was not significantly correlated with the MTBE
air concentration.
The meteorological variables that were significantly correlated with the
ln-transformed MTBE air concentrations were atmospheric stability (0.296, p<0.0001), wind
speed (-0.265, p=0.0004), relative humidity (0.196, p=0.0094), and temperature (0.173,
p=0.022). Atmospheric pressure and precipitation were not correlated with the MTBE
air concentrations.
4.8.2. Preliminary Selection of Predictors
The distance from the air sampler to the closest gas station was selected as a
predictor of ambient air concentration of MTBE at p<0.15. The distances to the major
roadways and the refinery were not selected as significant predictors. Atmospheric stability
was selected as a predictor from the meteorological variables.
4.8.3. Selection of the Best-fitting Model
The predictor variables selected by the different regression model selection methods
for the residential ambient air concentration of MTBE were relatively consistent. The
meteorological variables that were consistently included in the series of regression models
were atmospheric stability, temperature, and wind speed, in order of selection. The
distance to the closest interstate roadways (FC11) and the distance to the closest major urban
arterial roadways (FC14) were not selected as significant predictor variables in the model of
MTBE. The distance to the closest gas station was included as a significant predictor variable
in the model of MTBE. The model statistics are summarized in Table A-11. The residuals
were relatively randomly distributed but had some irregular pattern in error variances (Figure
A-21). The PP plot was nearly linear, implying that the error term of the model followed a
normal distribution (Appendix B). Fourteen data points were identified as possible outliers
using a test statistic of ±1.655 at 0.95, df=165 (Figure A-22). The analysis of
variance, parameter estimates, and summary of model statistics for the best-fitting
5-parameter model for MTBE after removing outliers are listed in Table A-12. After removing
the outliers, the parameter estimates became more significant for all variables. A visual
examination of the residuals indicated that they were randomly distributed without showing
any obvious trend or any particular pattern (Figure A-23). The standardized residuals of the
best-fitting model were close to a normal distribution with constant variance. The PP plot
was nearly linear, implying that the error term of the model followed a normal distribution
(Appendix C). There was no visual evidence of a lack of fit or unequal error variance for
the best-fitting 5-parameter regression model for the residential ambient air MTBE
concentrations. The Mallows' Cp statistic associated with this particular subset of variables
was 5.0. Since the number of parameters (p) including the intercept in the best-fitting model
was 5, the resulting model had the appropriate number of parameters.
-------
A-41
Diagnostics of Equal Variances and Multicollinearity Diagnostics
To test the assumption of equal variance, heteroscedasticity was tested, along with
multicollinearity among the predictors. The chi-square statistic was 15.55 with a probability
of 0.34, a value greater than 0.05. Therefore, the hypothesis of equal error variances was
not rejected, and equal error variances were assumed in the best-fitting 5-parameter model.
The bivariate Pearson correlations between pairs of predictors included in the model
identified statistically significant correlations between the wind speed and atmospheric
stability (-0.51, p<0.0001), and between the wind speed and temperature (-0.25, p=0.0007).
The variance inflation factors for seven predictor variables were close to 1 (1.03 ~ 1.46),
well below 10. Based on the variance inflation factors, there was no significant
collinearity between the predictors in the model. However, in the collinearity
diagnostics, the condition index was 115, which was greater than 100, and the corresponding
eigenvalue was close to zero (0.00033), which was smaller than 0.01. The proportions of
variation of the intercept (0.98) and temperature (0.94) were greater than 0.5, indicating
that the two parameters were interdependent. Therefore, there were possible codependencies
between some predictors, which might overspecify the model outcome.
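The condition index cited in these diagnostics is conventionally computed from the eigenvalues of the scaled cross-product matrix of the predictors, as the square root of the largest eigenvalue divided by each eigenvalue in turn. The sketch below assumes that conventional definition, since the report does not spell out the calculation, and uses hypothetical eigenvalues chosen only for illustration.

```python
import math

def condition_indices(eigenvalues):
    """Square root of the largest eigenvalue divided by each eigenvalue."""
    largest = max(eigenvalues)
    return [math.sqrt(largest / ev) for ev in eigenvalues]

# Hypothetical eigenvalues of a scaled X'X matrix (illustration only).
eigenvalues = [3.0, 0.9, 0.00025]
indices = condition_indices(eigenvalues)

# Thresholds used in the text: condition index above 100, eigenvalue below 0.01.
flagged = [
    (ev, ci) for ev, ci in zip(eigenvalues, indices) if ci > 100.0 and ev < 0.01
]
print(indices)
print(flagged)
```

A single very small eigenvalue is enough to push the largest condition index past 100, which is how a model can show low variance inflation factors and still be flagged by this diagnostic, for example when the intercept and a predictor with little relative variation are nearly dependent.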
-------
A-42
Table A-11. Results of the Best-fitting 5-Parameter Model for MTBE
Analysis of Variance
Source              DF    Sum of Squares    Mean Square    F Value    Pr > F
Model                2           23.4697        11.7349       8.23    0.0004
Error              180          256.619         1.42566
Corrected Total    182          280.089
Root MSE                     1.1940    R-Square             0.0838
Dependent Mean               1.2495    Adjusted R-Square    0.0736
Coefficient of Variation    95.5620
Parameter Estimates
Variable     Label                           DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept    Intercept                        1               1.97438           0.35393       5.58      <.0001
GSlmlnv      (Distance to Gas Station)^-1     1              39.49380          13.21040       2.99      0.0032
U4           Wind Speed                       1              -0.21592           0.07758      -2.78      0.0060
Summary of Stepwise Selection
Step    Variable Entered                Partial R-Square    Model R-Square        Cp    F Value    Pr > F
1       (Distance to Gas Station)^-1              0.0444            0.0444    5.9898       8.40    0.0042
2       Wind Speed                                0.0394            0.0838    0.3584       7.75    0.0060
-------
A-43
[Figure: Residual Plot of the Best Fit Model of MTBE; x-axis: Predicted Value. Fitted model:
ln(MTBE) = 1.9744 + 39.494 (Distance to Gas Station)^-1 - 0.2159 (wind speed). N = 183;
R-square = 0.0838; adjusted R-square = 0.0736; RMSE = 1.194.]
Figure A-21. Residual vs. Predicted Plot of the Best-fitting 5-Parameter Model of MTBE
[Figure: Outliers of Model for MTBE; x-axis: Observation Number. R-square = 0.0838;
adjusted R-square = 0.0736.]
Figure A-22. Outliers of 5-Parameter Model of MTBE
-------
A-44
Table A-12. Results of the Best-fitting 5-Parameter Model for MTBE after Removing the
Outliers
Analysis of Variance
Source              DF    Sum of Squares    Mean Square    F Value    Pr > F
Model                5           29.1548        5.83095      10.97    <.0001
Error              165           87.6716        0.53134
Corrected Total    170          116.826
Root MSE                     0.7289    R-Square             0.2496
Dependent Mean               1.4525    Adjusted R-Square    0.2268
Coefficient of Variation    50.1840
Parameter Estimates
Variable     Label                           DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept    Intercept                        1              -2.74300           2.27071      -1.21      0.2288
F11_lmlnv    (Distance to FC11)^-1            1              22.25470          14.30720       1.56      0.1217
GSlmlnv      (Distance to Gas Station)^-1     1              33.56270           8.28221       4.05      <.0001
Stab4        Atmospheric Stability            1               0.24348           0.14963       1.63      0.1056
K5           Temperature                      1               0.01239           0.00669       1.85      0.0659
U4           Wind Speed                       1              -0.18702           0.06022      -3.11      0.0022
Summary of Stepwise Selection
Step    Variable Entered                Partial R-Square    Model R-Square         Cp    F Value    Pr > F
1       Wind Speed                                0.1226            0.1226    23.1543      23.62    <.0001
2       (Distance to Gas Station)^-1              0.0897            0.2123     5.7156      19.13    <.0001
3       Temperature                               0.0136            0.2259     4.7657       2.94    0.0885
4       Atmospheric Stability                     0.0126            0.2386     4.0309       2.75    0.0991
5       (Distance to FC11)^-1                     0.0110            0.2496     3.6458       2.42    0.1217
-------
A-45
[Figure: Residual Plot of the Best Fit Model of MTBE; x-axis: Predicted Value. Fitted model:
ln(MTBE) = -2.743 + 22.255 (Distance to FC11)^-1 + 33.563 (Distance to Gas Station)^-1
+ 0.2435 (atmospheric stability) + 0.0124 (temperature) - 0.187 (wind speed). N = 171;
R-square = 0.2496; adjusted R-square = 0.2268; RMSE = 0.7289.]
Figure A-23. Residual vs. Predicted Plot of the Best-fitting 5-Parameter Model of MTBE
after Removing the Outliers
[Figure: Cp Plot with Reference Lines; x-axis: P; reference lines: CP = P and
CP = 2P - (P for full model) + 1. N = 171; R-square = 0.2496; adjusted R-square = 0.2268;
RMSE = 0.7289.]
Figure A-24. Cp Plot for the Best-fitting 5-Parameter Model for MTBE after Removing the
Outliers
-------
A-46
4.9. Tetrachloroethylene (PCE)
4.9.1. Bivariate Pearson Correlation
The distance to the closest dry cleaning facility (DCF1) was statistically significantly
correlated with the ln-transformed ambient air concentration of PCE at α = 0.01, regardless
of the form of transformation of the distance variable. The refinery was the only identified
point source of PCE, with a combined annual generation of 1.0 ton. The distance to the
refinery was not significantly correlated with PCE in any of the analyses. Statistically
significant correlations (p<0.01) were also found for the distance to US Highway Route 1
and the distance to the closest gas station. Other roadway proximity variables were not
significantly correlated.
The meteorological variables that were significantly correlated with PCE
concentrations were wind speed (-0.373, p<0.0001), relative humidity (0.313, p<0.0001),
atmospheric stability (0.282, p=0.0001), mixing height (-0.254, p=0.0005), and precipitation
(0.235, p=0.0013). Temperature and atmospheric pressure were not correlated with the
residential ambient air concentration of PCE.
4.9.2. Preliminary Selection of Predictors
A series of preliminary regression analyses was performed on the ln-transformed
PCE concentration to determine which variables to include in the model. The distance to
the closest dry cleaning facility was selected as an important predictor of ambient PCE
concentration from the variables that describe the distance between sources and residences.
Wind speed, precipitation, and stability were selected as predictor variables (p<0.15)
from the meteorological variables.
-------
A-47
4.9.3. Selection of the Best-fitting Model
The predictor variables selected by the different regression model selection methods
for the residential ambient air concentration of PCE were relatively consistent. The
meteorological variables that were consistently included in the series of regression models
were wind speed, temperature, atmospheric stability, and relative humidity, in order of
selection. The inverse distance to the closest dry cleaning facility was selected as a
significant predictor variable in the best-fitting model of PCE, while, as expected, the
distances to major roadways and gas stations were not selected. The model statistics are
summarized in Table A-13. The residuals were relatively randomly distributed but had a few
outliers in error variances (Figure A-25). The PP plot was nearly linear, implying that the
error term of the model followed a normal distribution. Eight data points were identified as
possible outliers using a test statistic of ±1.655 at 0.95, df=165 (Figure A-26). The
analysis of variance, parameter estimates, and summary of model statistics for the
best-fitting 6-parameter model for PCE after removing the outliers are listed in Table A-14.
The selected model was statistically significant (p<0.0001).
A visual examination of the residuals indicated that they were randomly distributed
without showing any obvious trend or any particular pattern (Figure A-27). The standardized
residuals of the best-fitting model were close to a normal distribution with constant
variance. The PP plot was nearly linear, implying that the error term of the model followed a
normal distribution. There was no visual evidence of lack of fit or unequal error variance for
the best 6-parameter regression model for the residential ambient air PCE concentration.
The Mallows' Cp statistic associated with this particular subset of variables was
6.0 (Figure A-28). Since the number of parameters (p) including the intercept in the
best-fitting model was 6, the resulting model had the appropriate number of parameters.
-------
A-48
Diagnostics of Equal Variances and Multicollinearity Diagnostics
To test the assumption of equal variance, heteroscedasticity was tested, along with
multicollinearity among the predictors. The chi-square statistic was 18.1 with a probability
of 0.58, a value greater than 0.05. Therefore, the hypothesis of equal error variances was
not rejected, and equal error variances were assumed in the best-fitting 6-parameter model.
The bivariate Pearson correlations identified statistically significant correlations between
the wind speed and atmospheric stability (-0.51, p<0.0001), and between the wind speed and
temperature (-0.25, p=0.0007). Relative humidity was also significantly correlated with
atmospheric stability (0.36, p<0.0001) and with wind speed (-0.43, p<0.0001).
The variance inflation factors for seven predictor variables were close to 1 (1.04 ~ 1.44),
well below 10. Based on the variance inflation factors, there was no significant
collinearity between the predictors in the model. However, in the collinearity
diagnostics, the condition index was 130, which was greater than 100, and the corresponding
eigenvalue was close to zero (0.00033), which was smaller than 0.01.
-------
A-49
Table A-13. Results of the Best-fitting 6-Parameter Model for PCE
Analysis of Variance
Source              DF    Sum of Squares    Mean Square    F Value    Pr > F
Model                4           25.689         6.42224      11.85    <.0001
Error              178           96.5014        0.54214
Corrected Total    182          122.190
Root MSE                       0.7363    R-Square             0.2102
Dependent Mean                -0.3394    Adjusted R-Square    0.1925
Coefficient of Variation    -216.9400
Parameter Estimates
Variable     Label                    DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept    Intercept                 1              -0.65455           0.86751      -0.75      0.4515
DCFlmlnv     (Distance to DCF)^-1      1              54.38580          21.18230       2.57      0.0111
Stab4        Atmospheric Stability     1               0.21071           0.14312       1.47      0.1427
U4           Wind Speed                1              -0.22587           0.05600      -4.03      <.0001
Precip5      Precipitation             1               0.01026           0.00388       2.65      0.0088
Summary of Stepwise Selection
Step    Variable Entered         Partial R-Square    Model R-Square         Cp    F Value    Pr > F
1       Wind Speed                         0.1389            0.1389    13.2138      29.20    <.0001
2       (Distance to DCF)^-1               0.0314            0.1704     8.1961       6.82    0.0098
3       Precipitation                      0.0303            0.2006     3.4399       6.78    0.0100
4       Atmospheric Stability              0.0096            0.2102     3.2933       2.17    0.1427
-------
A-50
[Figure: Residual Plot of the Best Fit Model of PCE. Fitted model: ln(PCE) = -0.6546
+ 54.386 (Distance to DCF)^-1 + 0.2107 (atmospheric stability) - 0.2259 (wind speed)
+ 0.0103 (precipitation). N = 183; R-square = 0.2102; adjusted R-square = 0.1925.]
-------
A-51
Table A-14. Results of the Best-fitting 6-Parameter Model for PCE after Removing the
Outliers
Analysis of Variance
Source              DF    Sum of Squares    Mean Square    F Value    Pr > F
Model                5           12.5816        2.51633      14.36    <.0001
Error              158           27.6904        0.17526
Corrected Total    163           40.2721
Root MSE                       0.4186    R-Square             0.3124
Dependent Mean                -0.2064    Adjusted R-Square    0.2907
Coefficient of Variation    -202.8800
Parameter Estimates
Variable     Label                    DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept    Intercept                 1               2.49450           1.34715       1.85      0.0659
DCFlmlnv     (Distance to DCF)^-1      1              32.67340          12.39640       2.64      0.0092
Stab4        Atmospheric Stability     1               0.14442           0.08831       1.64      0.1040
K5           Temperature               1              -0.01229           0.00416      -2.96      0.0036
U4           Wind Speed                1              -0.14410           0.03588      -4.02      <.0001
RH4          Relative Humidity         1               0.00913           0.00301       3.04      0.0028
Summary of Stepwise Selection
Step    Variable Entered         Partial R-Square    Model R-Square         Cp    F Value    Pr > F
1       Wind Speed                         0.1847            0.1847    25.6505      36.70    <.0001
2       (Distance to DCF)^-1               0.0381            0.2228    18.9710       7.90    0.0056
3       Relative Humidity                  0.0317            0.2545    13.7554       6.80    0.0100
4       Temperature                        0.0463            0.3008     5.2151      10.53    0.0014
5       Atmospheric Stability              0.0116            0.3124     4.5652       2.67    0.1040
-------
A-52
[Figure: Residual Plot of the Best Fit Model of PCE; x-axis: Predicted Value. Fitted model:
ln(PCE) = 2.4945 + 32.673 (Distance to DCF)^-1 + 0.1444 (atmospheric stability)
- 0.0123 (temperature) - 0.1441 (wind speed) + 0.0091 (relative humidity). N = 164;
R-square = 0.3124; adjusted R-square = 0.2907; RMSE = 0.4186.]
Figure A-27. Residual vs. Predicted Plot of the Best-fitting 6-Parameter Model of PCE after
Removing the Outliers
[Figure: Cp Plot with Reference Lines; x-axis: P; reference lines: CP = P and
CP = 2P - (P for full model) + 1. N = 164; R-square = 0.3124; adjusted R-square = 0.2907;
RMSE = 0.4186.]
Figure A-28. Cp Plot for the Best-fitting 6-Parameter Model for PCE after Removing the
Outliers
-------
A-53
4.10. PM2.5 Mass
4.10.1. Bivariate Pearson Correlation
The correlation coefficients of the ln-transformed PM2.5 mass and the inverse distance
to urban interstate roadways (FC11), minor arterial roads (FC16) and local roads (FC19)
were 0.21 (p=0.03), 0.22 (p=0.03) and 0.23 (p=0.02), respectively.
The meteorological variables that were statistically significantly correlated with the
ln-transformed PM2.5 mass, with their Pearson correlation coefficients, were: atmospheric
stability, 0.56 (p<0.0001); mixing height, -0.26 (p=0.01); wind speed, -0.50 (p<0.0001); and
relative humidity, 0.39 (p<0.0001).
4.10.2. Preliminary Selection of Predictors
The preliminary regression analysis was performed on the ln-transformed PM2.5 mass
to determine the relative importance of variables within the same type (proximity or
meteorological) of independent variable. The distances to the urban interstates (FC11) and
local roadways (FC19) were included in the linear regression model (p<0.15). The inverse
distance to a truck loading/unloading area (PM03) was also selected. Among the
meteorological variables, atmospheric stability, temperature, atmospheric pressure and wind
speed were selected.
Selection of the Best-fitting Model
The variables selected by the different regression methods were relatively
consistent. Atmospheric stability was the most important factor in the regression model,
with a partial r2 of 0.318. Wind speed, temperature and atmospheric pressure were also
included as predictors in the model. The model also included the inverse distances to the
major roadways (FC11), local roadways (FC19) and truck loading area (PM03). The
parameters and analysis of variance of the regression equation for the PM2.5 mass
ambient air concentration for the best-fitting model, with 7 variables selected, are given in
Table A-15. The C(p), which is Mallows' Cp statistic, associated with this particular
subset of variables was determined to be 8.0. The resulting model was appropriate in
number of parameters, because the number of parameters (p), including the intercept, in
the best-fitting model matches the C(p) value. The diagnostic plots (the residual plot
against the predicted values and the normal probability-probability (PP) plot) were
generated and visually examined (Figure A-30 and Appendix B). The residuals were
randomly distributed without any obvious trend or particular pattern (Figure
A-30), were close to a normal distribution and had constant variance. The PP plot was
nearly linear, so the error term of the model can be considered to follow a normal
distribution. Based on the visual diagnosis, there was no significant evidence of lack of fit
or of significant unequal error variance for the best 8-parameter regression model.
No evidence of outliers was found.
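The C(p) values quoted for the PM2.5 models can be reproduced from the analysis-of-variance entries in Tables A-15 and A-16. The sketch below assumes the usual definition of Mallows' Cp, Cp = SSE_p / MSE_full - (n - 2p), with p counting the intercept; the function name is illustrative.

```python
def mallows_cp(sse_subset, mse_full, n, p):
    """Mallows' Cp = SSE_p / MSE_full - (n - 2p), with p counting the intercept."""
    return sse_subset / mse_full - (n - 2 * p)

n = 102                      # observations (Corrected Total DF of 101, plus 1)
sse_full = 12.57688          # error sum of squares, 7-variable model (Table A-15)
mse_full = sse_full / 94     # error mean square on 94 degrees of freedom

# Full model: 7 variables + intercept, so p = 8.
cp_full = mallows_cp(sse_full, mse_full, n, p=8)

# Subset with 5 variables + intercept (p = 6); its error sum of squares is
# the value reported in Table A-16.
cp_subset = mallows_cp(13.26860, mse_full, n, p=6)

print(f"Cp (full, p=8)   = {cp_full:.4f}")
print(f"Cp (subset, p=6) = {cp_subset:.4f}")
```

Under this definition the largest candidate model always has Cp equal to p, so the informative comparison is between subsets: the 5-variable subset gives a Cp of about 9.17, in line with step 5 of the stepwise summary in Table A-15.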
Diagnostics of Equal Variances and Multicollinearity Diagnostics
The standardized residuals of the "best-fitting" model are close to a normal
distribution and have constant variance. A Shapiro-Wilk W test for normality was also
performed, and the p-value was large (0.61), indicating that we cannot reject the hypothesis
that the residuals are normally distributed. Based on these results, there was no evidence
of a lack of fit or unequal error variance for this 8-parameter regression model.
The multicollinearity of the predictor variables was tested by checking their
variance inflation factors (vif), which varied between 0 and 1.35 and were never higher
than 10 (reference value). This indicates that there was no significant collinearity
between the predictors in the model. However, the collinearity diagnostic tests showed a
condition index of 651 (much higher than 30, the reference value) and an eigenvalue of
0.00002 (much smaller than 0.01, the reference value). The proportions of variation for the
intercept (0.99) and for the pressure (0.98) were greater than 0.5 (reference value),
indicating that the two parameters were probably interacting and that the 8-parameter
model could be over-specified. To a certain extent, however, this might be unavoidable
because it is extremely unlikely for all parameters to be completely independent of
(uncorrelated with) each other.
In order to lessen the degree of multicollinearity diagnosed, the atmospheric
pressure, which showed some interaction with the intercept, was removed from the
predictor variables and a new multiple regression analysis was run. The coefficient of
determination (r2) of the resulting "best-fitting" 7-parameter model decreased from
0.50 to 0.49, and the condition index decreased from 651 to 127.3. The interaction
between the predictors in this new 7-parameter model appeared to decrease after the
removal of the pressure from the model, but the eigenvalue was still smaller than 0.01
(0.0003), and the proportions of variation of the intercept (0.98) and the temperature
(0.94) were still greater than 0.5. Therefore, the temperature was removed from the
7-parameter model and a new multiple regression analysis was run. This time, the
coefficient of determination (r2) for the resulting 6-parameter model decreased from
0.49 to 0.47, the condition index decreased from 127.3 to 43.2 (much closer to 30, the
reference value), and the eigenvalue increased to 0.002 (much closer to 0.01, the
reference value). However, the proportions of variation of the intercept (0.949) and
of the atmospheric stability (0.839) were still greater than 0.5. Eliminating the stability
parameter from the model resulted in a drastic decrease of the coefficient of
determination (r2); thus, we concluded that the 6-parameter regression equation
probably better describes the ln-transformed outdoor concentration of PM2.5 (Table A-16).
[Residual vs. Predicted Plot; horizontal axis: Predicted Value]
LnPM = 13.914 + 25.773 F11_1Inv + 3.6275 F19_1Inv + 58.636 PM03DIS_Inv - 0.0093 K4 - 0.163 U4
- 0.0131 mmHG4 + 0.4138 Stab4
N = 102, Rsq = 0.5010, AdjRsq = 0.4639, RMSE = 0.3658
Figure A-30. Residual vs. Predicted Plot of the Best-fitting 6-Parameter Model of PM25 Mass
-------
Table A-15. Results of the Best-fitting 7-Parameter Model for LnPM2.5
Analysis of Variance
Source            DF   Sum of Squares  Mean Square  F Value  Pr > F
Model               7  12.62973        1.80425      13.48    <.0001
Error              94  12.57688        0.13380
Corrected Total   101  25.20661
Parameter Estimates
Variable     Label                  DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept    Intercept              1   13.91429             6.84008         4.14    0.0447
F11_1Inv     Distance to FC11       1   25.77270            11.93020         4.67    0.0333
F19_1Inv     Distance to FC19       1    3.62748             1.65616         4.80    0.0310
PM03DIS_Inv                         1   58.63622            29.32827         4.00    0.0485
K4           Temperature            1   -0.00927             0.00476         3.79    0.0545
U4                                  1   -0.16301             0.03855        17.88    <.0001
mmHG4                               1   -0.01311             0.00827         2.52    0.1160
Stab4        Atmospheric Stability  1    0.41383             0.09362        19.54    <.0001
Summary of Stepwise Selection
Step  Variable Entered  Partial R-Square  Model R-Square  Cp       F Value  Pr > F
1     Stab4             0.3184            0.3184          30.4114  46.71    <.0001
2     U4                0.0655            0.3839          20.0725  10.52    0.0016
3     F19_1Inv          0.0524            0.4363          12.1924   9.12    0.0032
4     F11_1Inv          0.0210            0.4573          10.2394   3.75    0.0557
5     PM03DIS_Inv       0.0163            0.4736           9.1699   2.97    0.0880
6     K4                0.0141            0.4877           8.5164   2.61    0.1094
7     mmHG4             0.0134            0.5010           8.0000   2.52    0.1160
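The summary statistics reported with the PM2.5 model (F value, R-square, adjusted R-square and RMSE) follow directly from the analysis-of-variance entries in Table A-15. A minimal sketch of that arithmetic, using standard definitions and illustrative variable names:

```python
import math

# Analysis-of-variance entries from Table A-15 (LnPM2.5 model).
ss_model, df_model = 12.62973, 7
ss_error, df_error = 12.57688, 94
ss_total, df_total = 25.20661, 101

ms_model = ss_model / df_model          # model mean square
ms_error = ss_error / df_error          # error mean square

f_value = ms_model / ms_error
r_squared = ss_model / ss_total
# Adjusted R-square compares error and total variances per degree of freedom.
adj_r_squared = 1.0 - (ss_error / df_error) / (ss_total / df_total)
rmse = math.sqrt(ms_error)

print(f"F = {f_value:.2f}, Rsq = {r_squared:.4f}, "
      f"AdjRsq = {adj_r_squared:.4f}, RMSE = {rmse:.4f}")
```

These values agree with the F value of 13.48 in the table and with the Rsq, AdjRsq and RMSE printed on the residual plot of the same model.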
-------
A-58
Table A-16. Results of the Best-fitting 5-Parameter Model for PM2.5 after Removing
Atmospheric Pressure and Temperature
Analysis of Variance
Source            DF   Sum of Squares  Mean Square  F Value  Pr > F
Model               5  11.93801        2.38760      17.27    <.0001
Error              96  13.26860        0.13821
Corrected Total   101  25.20661
Parameter Estimates
Variable     Label             DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept    Intercept         1    1.06000             0.57297         3.42    0.0674
F11_1Inv     Distance to FC11  1   25.27425            12.03839         4.41    0.0384
F19_1Inv     Distance to FC19  1    4.19851             1.65726         6.42    0.0129
PM03DIS_Inv                    1   50.94389            29.55414         2.97    0.0880
U4                             1   -0.13037             0.03636        12.86    0.0005
Stab4                          1    0.42820             0.09491        20.35    <.0001
Summary of Stepwise Selection
Step  Variable Entered  Partial R-Square  Model R-Square  Cp       F Value  Pr > F
1     Stab4             0.3184            0.3184          26.3067  46.71    <.0001
2     U4                0.0655            0.3839          16.3623  10.52    0.0016
3     F19_1Inv          0.0524            0.4363           8.7980   9.12    0.0032
4     F11_1Inv          0.0210            0.4573           6.9713   3.75    0.0557
5     PM03DIS_Inv       0.0163            0.4736           6.0000   2.97    0.0880
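To illustrate how the fitted equation in Table A-16 would be applied, the sketch below evaluates the model for a single hypothetical home and back-transforms the ln-scale prediction. The coefficients are those in the table; the input values, their units and the function name are assumptions made only for this illustration.

```python
import math

# Coefficients from Table A-16 (dependent variable: ln of PM2.5 mass).
COEF = {
    "Intercept": 1.06000,
    "F11_1Inv": 25.27425,
    "F19_1Inv": 4.19851,
    "PM03DIS_Inv": 50.94389,
    "U4": -0.13037,
    "Stab4": 0.42820,
}

def predict_ln_pm25(predictors):
    """Linear predictor: intercept plus coefficient times value for each term."""
    return COEF["Intercept"] + sum(COEF[name] * value
                                   for name, value in predictors.items())

# Hypothetical inputs (assumed units; not taken from the study data).
home = {
    "F11_1Inv": 1.0 / 500.0,      # inverse distance to an urban interstate
    "F19_1Inv": 1.0 / 20.0,       # inverse distance to a local road
    "PM03DIS_Inv": 1.0 / 2000.0,  # inverse distance to the truck loading area
    "U4": 3.0,                    # wind speed
    "Stab4": 4.0,                 # atmospheric stability
}

ln_pm25 = predict_ln_pm25(home)
print(f"predicted ln(PM2.5) = {ln_pm25:.4f}")
print(f"back-transformed    = {math.exp(ln_pm25):.1f}")
```

Exponentiating the prediction returns a value on the concentration scale; it should be read as a geometric-mean type estimate because the model was fitted to ln-transformed data.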
-------
A-59
Elemental Carbon
Bivariate Pearson Correlation
The correlation coefficients between the ln-transformed elemental carbon concentration
and the inverse distances to urban interstate roadways (FC11) and minor arterial roads
(FC16) were 0.28 (p=0.04) and 0.29 (p=0.03), respectively. Distance to hamburger restaurants
where broiling of meats occurs had a correlation coefficient of 0.35 (p=0.01).
The meteorological variables that were statistically significantly correlated with the
ln-transformed elemental carbon concentration, with their Pearson correlation coefficients,
were: atmospheric stability, 0.43 (p<0.0001); mixing height, -0.34 (p=0.01); wind speed, -0.33
(p=0.02); relative humidity, 0.49 (p<0.0001); and precipitation, 0.29 (p=0.03).
Preliminary Selection of Predictors
A preliminary regression analysis was performed on the ln-transformed elemental
carbon concentration to determine the relative importance of variables within each type
(proximity and meteorological) of independent variable. The distance to the urban major
arterial roadways (FC14) was included in the resulting linear regression model (p<0.15).
Inverse distance to a truck loading/unloading seaport area (PM02) was also selected.
Among the meteorological variables, atmospheric stability and relative humidity were selected.
Selection of the Best-fitting Model
The variables selected by the different regression methods were relatively
consistent. Atmospheric stability and relative humidity were included as predictors in the
model. The model also included the inverse distances to the major roadways (FC14) and
truck loading/seaport area (PM02). The parameters and analysis of variance of the
regression equation for elemental carbon ambient air concentration for the best-fitting
model, with 4 variables selected, are given in Table A-17. The C(p), which is Mallows' Cp
statistic, associated with this particular subset of variables was determined to be 5. The
resulting model was appropriate in number of parameters, because the number of
parameters (p), including the intercept, in the best-fitting model matches the C(p) value.
The diagnostic plots (the residual plot against the predicted values and the normal
probability-probability (PP) plot) were generated and visually examined (Figures A- and
Appendix B). The residuals were randomly distributed without any obvious
trend or particular pattern (Figure A-31), were close to a normal distribution and had
constant variance. The PP plot was nearly linear, so the error term
of the model can be considered to follow a normal distribution. Based on the visual diagnosis,
there was no significant evidence of lack of fit or of significant unequal error variance for
the best 5-parameter regression model.
No evidence of outliers was found.
Diagnostics of Equal Variances and Multicollinearity Diagnostics
The standardized residuals of the "best-fitting" model are close to a normal
distribution and have constant variance. A Shapiro-Wilk W test for normality showed a
large p-value (0.65), indicating that we cannot reject the hypothesis that the residuals
are normally distributed.
From the figure above (PP plot) we see that the distribution of the residuals
does not appear heteroscedastic and, therefore, we accept the hypothesis of homogeneity of
variance of the residuals in the 5-parameter model. Also in this case we checked the
multicollinearity of the predictor variables through their vif values, which varied between
0 and 1.29, indicating that there was no significant collinearity between the predictors in
the model. The collinearity diagnostic tests showed a condition index of 37 (a little higher
than 30) and an eigenvalue of 0.003 (smaller than 0.01). The proportions of variation for the
intercept (0.92) and for the atmospheric stability (0.95) were greater than 0.5. However,
because of the very low vif values we concluded that there was no significant collinearity
between the predictors in the model and that the 5-parameter regression equation shown
before is basically adequate to describe the variation in outdoor concentration of LnEC.
[Residual vs. Predicted Plot; horizontal axis: Predicted Value]
LnEC = -2.7687 + 8.4105 F14_1Inv + 944.59 PM02DIS_Inv + 0.0137 RH4 + 0.3547 Stab4
N = 52, Rsq = 0.3765, AdjRsq = 0.3234, RMSE = 0.3789
Figure A-31. Residual vs. Predicted Plot of the Best-fitting 6-Parameter Model of Elemental
Carbon
-------
A-62
Table A-17. Results of the Best-fitting 4-Parameter Model for Elemental Carbon
Analysis of Variance
Source            DF  Sum of Squares  Mean Square  F Value  Pr > F
Model              4   4.07456        1.01864      7.09     0.0001
Error             47   6.74801        0.14357
Corrected Total   51  10.82257
Parameter Estimates
Variable     Label                  DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept    Intercept              1    -2.76874             0.68589       16.30    0.0002
F14_1Inv     Distance to FC14       1     8.41047             3.84635        4.78    0.0338
PM02DIS_Inv                         1   944.59427           494.54016        3.65    0.0622
RH4                                 1     0.01371             0.00498        7.60    0.0083
Stab4        Atmospheric Stability  1     0.35474             0.14301        6.15    0.0168
Summary of Stepwise Selection
Step  Variable Entered  Partial R-Square  Model R-Square  Cp      F Value  Pr > F
1     RH4               0.2357            0.2357          9.6140  15.42    0.0003
2     Stab4             0.0562            0.2919          7.3748   3.89    0.0542
3     F14_1Inv          0.0362            0.3281          6.6483   2.58    0.1145
4     PM02DIS_Inv       0.0484            0.3765          5.0000   3.65    0.0622
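When reading the Parameter Estimates in Table A-17, it can help to check how the tabulated test statistic relates to the estimate and its standard error. The sketch below, with illustrative names, compares each tabulated value with the squared ratio (estimate / standard error); the agreement suggests that the column reports an F-type statistic with one numerator degree of freedom, which is an inference from the numbers rather than something stated in the report.

```python
# (estimate, standard error, tabulated statistic) from Table A-17.
rows = {
    "Intercept":   (-2.76874, 0.68589, 16.30),
    "F14_1Inv":    (8.41047, 3.84635, 4.78),
    "PM02DIS_Inv": (944.59427, 494.54016, 3.65),
    "RH4":         (0.01371, 0.00498, 7.60),
    "Stab4":       (0.35474, 0.14301, 6.15),
}

squared_ratio = {}
for name, (estimate, std_error, tabulated) in rows.items():
    t = estimate / std_error
    squared_ratio[name] = t * t      # equals the 1-df F statistic
    print(f"{name:<12} t = {t:7.3f}  t^2 = {squared_ratio[name]:6.2f}  "
          f"tabulated = {tabulated:5.2f}")
```

Small differences in the last digit are expected because the estimates and standard errors are themselves rounded.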
-------
A-63
Organic Carbon
Bivariate Pearson Correlation
The correlation coefficient between the ln-transformed organic carbon concentration and
the inverse distance to minor urban arterial roadways (FC16) was 0.29 (p=0.04). The
meteorological variables that were statistically significantly correlated (α=0.05) with the
ln-transformed organic carbon concentration, with their Pearson correlation coefficients, were
atmospheric stability, 0.51 (p<0.0001), and relative humidity, 0.49 (p<0.0001).
Preliminary Selection of Predictors
A preliminary regression analysis was performed on the ln-transformed organic
carbon concentration to determine the relative importance of variables within each type
(proximity and meteorological) of independent variable. The distance to the interstate
roadways (FC11) was included in the resulting linear regression model (p<0.15).
Atmospheric stability was included from the meteorological variables.
Selection of the Best-fitting Model
The variables selected by the different regression methods were relatively
consistent. Atmospheric stability was included as a predictor in the model. The model
also included the inverse distance to the interstate (FC11). The parameters and analysis of
variance of the regression equation for organic carbon ambient air concentration for
the best-fitting model, with 2 variables selected, are given in Table A-18. The C(p), which
is Mallows' Cp statistic, associated with this particular subset of variables was determined
to be 3. The resulting model was appropriate in number of parameters, because the
number of parameters (p), including the intercept, in the best-fitting model matches the
C(p) value. The diagnostic plots (the residual plot against the predicted values and the
normal probability-probability (PP) plot) were generated and visually examined (Figure
A-32 and Appendix B). The residuals were randomly distributed without any
obvious trend or particular pattern (Figure A-32), were close to a normal distribution
and had constant variance. The PP plot was nearly linear, so the
error term of the model can be considered to follow a normal distribution. Based on the
visual diagnosis, there was no significant evidence of lack of fit or of significant unequal
error variance for the best 3-parameter regression model.
No evidence of outliers was found.
Diagnostics of Equal Variances and Multicollinearity Diagnostics
The standardized residuals of the "best-fitting" model are close to a normal
distribution and have constant variance. A Shapiro-Wilk W test for normality showed a
large p-value (0.79), indicating that we cannot reject the hypothesis that the residuals
are normally distributed. From the figure above (PP plot) we see that also in this case we
can accept the hypothesis of homogeneity of variance of the residuals.
The vif values of the predictor variables varied between 0 and 1.006 (never
higher than 10, the reference value). The collinearity diagnostic tests showed a condition
index of 25.86 (lower than 30, the reference value) but an eigenvalue of 0.004 (a little
smaller than 0.01, the reference value). Even though the proportions of variation for the
intercept and for the atmospheric stability were greater than 0.5 (both were 0.998),
we can still assume that there was no significant collinearity between the predictors in the
model because of the extremely low vif values and, therefore, that the 3-parameter regression
equation shown before is basically adequate to describe the variation in outdoor
concentration of LnOC.
[Residual vs. Predicted Plot; horizontal axis: Predicted Value]
LnOC = -1.6752 + 26.646 F11_1Inv + 0.5557 Stab4
N = 52, Rsq = 0.2889, AdjRsq = 0.2599, RMSE = 0.4041
Figure A-32. Residual vs. Predicted Plot of the Best-fitting 6-Parameter Model of Organic
Carbon
-------
Table A-18. Results of the Best-fitting 3-Parameter Model for Organic Carbon
Analysis of Variance
Source            DF  Sum of Squares  Mean Square  F Value  Pr > F
Model              2   3.25109        1.62555      9.95     0.0002
Error             49   8.00212        0.16331
Corrected Total   51  11.25321
Parameter Estimates
Variable   Label                  DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept  Intercept              1   -1.67517             0.66211         6.40    0.0147
F11_1Inv   Distance to FC11       1   26.64584            19.36632         1.89    0.1751
Stab4      Atmospheric Stability  1    0.55571             0.13474        17.01    0.0001
Summary of Stepwise Selection
Step  Variable Entered  Partial R-Square  Model R-Square  Cp      F Value  Pr > F
1     Stab4             0.2614            0.2614          2.8931  17.70    0.0001
4     F11_1Inv          0.0275            0.2889          3.000    1.89    0.1751
-------
A-67
Coronene and Benzo-ghi-Pyrene
Bivariate Pearson Correlation
The correlation coefficients between the ln-transformed Coronene and Benzo-ghi-Pyrene
concentrations and the inverse distance to urban collector roadways (FC17) were 0.44
(p=0.04) for B-ghi-P and 0.42 (p<0.0001) for COR. The meteorological variables that were
statistically significantly correlated (α=0.05) with the ln-transformed Coronene and
Benzo-ghi-Pyrene concentrations, with their Pearson correlation coefficients, were atmospheric
stability, 0.40 (B-ghi-P) and 0.44 (COR) (p<0.0001); temperature, -0.42 (both) (p<0.0001);
and mixing height, -0.29 (B-ghi-P) and -0.31 (COR) (p<0.04).
Preliminary Selection of Predictors
A preliminary regression analysis was performed on the ln-transformed Coronene and
Benzo-ghi-Pyrene concentrations to determine the relative importance of variables within
each type (proximity and meteorological) of independent variable. The distances to the
interstate roadways (FC11) and to Newark Airport (PM01) were included in the resulting linear
regression model (p<0.15). Atmospheric stability, temperature and precipitation were
included from the meteorological variables.
Selection of the Best-fitting Model
The variables selected by the different regression methods were relatively
consistent. Atmospheric stability was included as a predictor in the model. The model
also included the inverse distances to the interstate (FC11) and PM01. The parameters and
analysis of variance of the regression equations for Coronene and Benzo-ghi-Pyrene
ambient air concentrations for the best-fitting models, with 5 variables selected, are given in
Tables A-19 and A-20. The C(p), which is Mallows' Cp statistic, associated with this
particular subset of variables was determined to be 6. The resulting model was
appropriate in number of parameters, because the number of parameters (p), including the
intercept, in the best-fitting model matches the C(p) value. The diagnostic plots (the
residual plot against the predicted values and the normal probability-probability (PP) plot)
were generated and visually examined (Figure A-33 and Appendix B). The residuals
were randomly distributed without any obvious trend or particular pattern
(Figure A-33), were close to a normal distribution and had constant variance. The PP plot
was nearly linear, so the error term of the model can be considered to follow a normal
distribution. Based on the visual diagnosis, there was no significant evidence of lack of fit
or of significant unequal error variance for the best 6-parameter regression models.
No evidence of outliers was found.
Diagnostics of Equal Variances and Multicollinearity Diagnostics
The standardized residuals of the "best-fitting" models are close to a normal
distribution and have constant variance. A Shapiro-Wilk W test for normality gave
large p-values (0.72 and 0.75 for LnB-ghi-P and LnCOR, respectively), so we cannot reject
the hypothesis that the residuals are normally distributed.
From the residual versus predicted values plots we see that the
distributions of the residuals do not appear overly heteroscedastic and, therefore, we can
accept the hypothesis of homogeneity of variance of the residuals in both 6-parameter
models.
The vif values varied between 0 and 1.27 for both PAHs, which indicates that
there was no significant collinearity between the predictors in the two models. Once
again, the collinearity diagnostic tests showed condition indexes and eigenvalues that
were, respectively, slightly higher and lower than the reference values (30 and 0.01), and
proportions of variation for two of the predictor variables that were higher than 0.5 (the
reference value). However, because of the very low vif values obtained we concluded
that there was no significant collinearity between the predictors in either of the two models
and that the two 6-parameter regression equations shown before are basically adequate
to describe the variations in outdoor concentrations of LnB-ghi-P and LnCOR.
[Residual vs. Predicted Plots]
Figure A-33. Residual vs. Predicted Plot of the Best-fitting 6-Parameter Model of COR and
B-ghi-P
-------
Table A-19. Results of the Best-fitting 5-Parameter Model for Coronene
Parameter Estimates
Variable     Label             DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept    Intercept         1    14.23612             4.72659        9.07    0.0045
F11_1Inv     Distance to FC11  1   125.00731            53.04655        5.55    0.0236
PM01DIS_Inv                    1   563.35265           371.56318        2.30    0.1375
Precip4                        1   -13.04686             5.65791        5.32    0.0265
K4                             1    -0.08337             0.01603       27.04    <.0001
Stab4                          1     1.63208             0.35474       21.17    <.0001
Summary of Stepwise Selection
Step  Variable Entered  Partial R-Square  Model R-Square  Cp       F Value  Pr > F
1     K4                0.2140            0.2140          25.6691  11.71    0.0014
2     Stab4             0.1966            0.4106          10.9925  14.01    0.0005
3     F11_1Inv          0.0588            0.4694           8.0059   4.54    0.0391
4     Precip4           0.0437            0.5131           6.2988   3.59    0.0654
5     PM01DIS_Inv       0.0271            0.5402           6.0000   2.30    0.1375
-------
Table A-20. Results of the Best-fitting 5-Parameter Model for Benzo-ghi-Pyrene
Parameter Estimates
Variable     Label                  DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept    Intercept              1    13.55955             4.27716       10.05    0.0030
F11_1Inv     Distance to FC11       1   125.65797            48.00264        6.85    0.0125
PM01DIS_Inv                         1   629.84523           336.23325        3.51    0.0685
Precip4                             1   -12.18382             5.11993        5.66    0.0223
K4           Temperature            1    -0.07630             0.01451       27.65    <.0001
Stab4        Atmospheric Stability  1     1.37241             0.32101       18.28    0.0001
Summary of Stepwise Selection
Step  Variable Entered  Partial R-Square  Model R-Square  Cp       F Value  Pr > F
1     K4                0.2160            0.2160          24.5663  11.85    0.0013
2     Stab4             0.1613            0.3773          13.0761  10.88    0.0020
3     F11_1Inv          0.0719            0.4493           9.0590   5.36    0.0257
4     Precip4           0.0424            0.4917           7.5090   3.34    0.0751
5     PM01DIS_Inv       0.0420            0.5337           6.0000   3.51    0.0685
-------
APPENDIX B. PP (Probability), QQ (Quantile) Plots
[PP and QQ plots of the best-fit model of m,p-Xylene, panels (A)-(D). Axes: Cumulative
Distribution of Residual (PP plots); Normal Quantile (QQ plots)]
Before outlier removal (A, B): ln(m,p-Xylene) = 5.5615 + 14.562 f14_1mInv + 22.462 GS1mInv
+ 0.5263 Stab4 - 0.0247 K5 - 0.1225 U4; N = 183, Rsq = 0.2657, AdjRsq = 0.2450, RMSE = 0.7495
After outlier removal (C, D): ln(m,p-Xylene) = 4.9424 + 7.9474 f14_1mInv + 17.436 GS1mInv
+ 0.5374 Stab4 - 0.0232 K5 - 0.0653 U4; Rsq = 0.3304, AdjRsq = 0.3098, RMSE = 0.5362
Figure B-1. Plots for Model of m,p-Xylene before (A,B) and after (C,D) removal of outliers
Final Report
B-l
-------
[PP and QQ plots of the best-fit model of o-Xylene, panels (A)-(D). Axes: Cumulative
Distribution of Residual (PP plots); Normal Quantile (QQ plots)]
Before outlier removal (A, B): ln(o-Xylene) = 2.2843 + 20.316 f14_1mInv + 13.948 GS1mInv
+ 0.6357 Stab4 - 0.0185 K5 - 0.1148 U4; N = 183, Rsq = 0.3275, AdjRsq = 0.3085, RMSE = 0.5746
After outlier removal (C, D): ln(o-Xylene) = 4.4581 + 7.4437 f14_1mInv + 9.5424 GS1mInv
+ 0.5209 Stab4 - 0.0235 K5 - 0.122 U4; N = 168, Rsq = 0.4223, AdjRsq = 0.4045, RMSE = 0.4441
Figure B-2. Plots for Model of o-Xylene before (A,B) and after (C,D) removal of outliers
-------
[PP and QQ plots of the best-fit model of Toluene, panels (A)-(D). Axes: Cumulative
Distribution of Residual (PP plots); Normal Quantile (QQ plots)]
Before outlier removal (A, B): ln(Toluene) = 6.2928 + 16.665 f14_1mInv + 0.6508 Stab4
- 0.0321 K5 + 0.0155 RH5; N = 183, AdjRsq = 0.1951, RMSE = 0.906
After outlier removal (C, D): ln(Toluene) = 3.1102 + 14.721 f14_1mInv + 0.7058 Stab4
- 0.0207 K5 + 0.0112 RH5; AdjRsq = 0.2957, RMSE = 0.6456
Figure B-3. Plots for Model of Toluene before (A,B) and after (C,D) removal of outliers
-------
[PP and QQ plots of the best-fit model of Benzene, panels (A)-(D). Axes: Cumulative
Distribution of Residual (PP plots); Normal Quantile (QQ plots)]
Before outlier removal (A, B): ln(Benzene) = 11.461 + 26.166 GS1mInv - 0.0374 K5 - 0.1724 U4;
Rsq = 0.2568, AdjRsq = 0.2444, RMSE = 0.659
After outlier removal (C, D): ln(Benzene) = 10.074 + 5.4977 f14_1mInv + 16.148 GS1mInv
+ 0.3036 Stab4 - 0.0391 K5 - 0.0849 U4; Rsq = 0.4116, AdjRsq = 0.3936, RMSE = 0.478
Figure B-4. Plots for Model of Benzene before (A,B) and after (C,D) removal of outliers
-------
[PP and QQ plots of the best-fit model of Ethylbenzene, panels (A)-(D). Axes: Cumulative
Distribution of Residual (PP plots); Normal Quantile (QQ plots)]
Before outlier removal (A, B): ln(Ethylbenzene) = 5.3254 + 13.32 f14_1mInv + 0.5279 Stab4
- 0.0265 K5 - 0.1716 U4; Rsq = 0.1285, AdjRsq = 0.1089, RMSE = 1.1197
After outlier removal (C, D): ln(Ethylbenzene) = 5.9813 + 9.6811 f14_1mInv + 0.4377 Stab4
- 0.0275 K5 - 0.1137 U4; AdjRsq = 0.1421, RMSE = 0.8023
Figure B-5. Plots for Model of Ethylbenzene before (A,B) and after (C,D) removal of outliers
-------
[PP and QQ plots of the best-fit model of MTBE, panels (A)-(D). Axes: Cumulative
Distribution of Residual (PP plots); Normal Quantile (QQ plots)]
Before outlier removal (A, B): ln(MTBE) = 1.9744 + 39.494 GS1mInv - 0.2159 U4;
Rsq = 0.0838, AdjRsq = 0.0736, RMSE = 1.194
After outlier removal (C, D): ln(MTBE) = -2.743 + 22.255 f11_1mInv + 33.563 GS1mInv
+ 0.2435 Stab4 + 0.0124 K5 - 0.187 U4; N = 171, Rsq = 0.2496, AdjRsq = 0.2268
Figure B-6. Plots for Model of MTBE before (A,B) and after (C,D) removal of outliers
-------
[PP and QQ plots of the best-fit model of PCE, panels (A)-(D). Axes: Cumulative
Distribution of Residual (PP plots); Normal Quantile (QQ plots)]
Before outlier removal (A, B): ln(PCE) = -0.6546 + 54.386 DCF1mInv + 0.2107 Stab4
- 0.2259 U4 + 0.0103 Precip5; Rsq = 0.2102, AdjRsq = 0.1925, RMSE = 0.7363
After outlier removal (C, D): ln(PCE) = 2.4945 + 32.673 DCF1mInv + 0.1444 Stab4 - 0.0123 K5
- 0.1441 U4 + 0.0091 RH5; N = 164, Rsq = 0.3124, AdjRsq = 0.2907, RMSE = 0.4186
Figure B-7. Plots for Model of PCE before (A,B) and after (C,D) removal of outliers
-------
[PP plot of the best-fit model of PM2.5 Mass]
Figure B-7. Plots for Model of PM2.5 Mass
[PP plot of the best-fit model of Elemental Carbon]
Figure B-7. Plots for Model of Elemental Carbon
-------
[PP plot of the best-fit model of Organic Carbon; axis: Cumulative Distribution of Residual]
Figure B-7. Plots for Model of Organic Carbon
[PP plots (two panels); axis: Cumulative Distribution of Residual]
Figure B-7. Plots for Model of Organic Carbon
-------
APPENDIX C. Diagnostic Results of Equal Variance and Multicollinearity
M,p-Xylene
Consistent Covariance of Estimates
Variable   Intercept  f14_1mInv  GS1mInv   Stab4     K5        U4
Intercept   2.4504     0.30152   -0.86165  -0.08012  -0.00668  -0.03337
f14_1mInv   0.30152   10.3825    -2.90185   0.04758  -0.00222   0.00483
GS1mInv    -0.86165   -2.90185   35.1448   -0.05396   0.00323   0.01211
Stab4      -0.08012    0.04758   -0.05396   0.00971   7.5E-05   0.00231
K5         -0.00668   -0.00222    0.00323   7.5E-05   2.1E-05   4.8E-05
U4         -0.03337    0.00483    0.01211   0.00231   4.8E-05   0.00197
Test of First and Second Moment Specification
DF  Chi-Square  Pr > ChiSq
20  23.57       0.2616
Parameter Estimates
Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|  Tolerance  Variance Inflation
Intercept  1    4.94236            1.70161          2.9     0.0042    .          0
f14_1mInv  1    7.94739            4.43103          1.79    0.0747    0.95826    1.04356
GS1mInv    1   17.4362             6.29951          2.77    0.0063    0.93148    1.07356
Stab4      1    0.53744            0.11065          4.86    <.0001    0.74522    1.34188
K5         1   -0.0232             0.00507         -4.58    <.0001    0.91088    1.09784
U4         1   -0.0653             0.04438         -1.47    0.1431    0.70158    1.42535
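The Tolerance and Variance Inflation columns above are two views of the same quantity: the variance inflation factor is the reciprocal of the tolerance, and the tolerance is 1 minus the R-square obtained when a predictor is regressed on the other predictors. A small sketch using the tolerance values reported for the m,p-Xylene model (dictionary names are illustrative):

```python
# Tolerance values reported for the m,p-Xylene model predictors.
tolerance = {
    "f14_1mInv": 0.95826,
    "GS1mInv": 0.93148,
    "Stab4": 0.74522,
    "K5": 0.91088,
    "U4": 0.70158,
}

# Variance inflation factor: VIF_j = 1 / tolerance_j = 1 / (1 - R_j^2).
vif = {name: 1.0 / tol for name, tol in tolerance.items()}

# R_j^2: share of each predictor's variance explained by the other predictors.
r2_on_others = {name: 1.0 - tol for name, tol in tolerance.items()}

for name in tolerance:
    print(f"{name:<10} VIF = {vif[name]:.5f}  R^2 on others = {r2_on_others[name]:.3f}")
```

All of the resulting factors are far below the reference value of 10 used in this report.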
-------
Collinearity Diagnostics
                    Condition   Proportion of Variation
Number  Eigenvalue  Index       Intercept   f14_1mInv   GS1mInv    Stab4      K5         U4
1       4.77579     1           2.5E-05     0.01274     0.0131     0.00023    3.4E-05    0.00179
2       0.64666     2.7176      4.1E-05     0.66273     0.14372    0.00036    5.6E-05    0.0034
3       0.51787     3.03677     7.3E-06     0.31247     0.81381    4.6E-05    1.1E-05    0.0003
4       0.05567     9.26198     0.00043     0.00407     0.00016    0.01681    0.00081    0.58238
5       0.00365     36.1517     0.0199      0.00799     0.02775    0.86081    0.06304    0.22528
6       0.00035     116.442     0.9796      1.3E-06     0.00147    0.12174    0.93606    0.18686
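The condition index column can be related to the eigenvalue column: each index is the square root of the largest eigenvalue divided by the eigenvalue in that row. A minimal Python sketch using the m,p-xylene eigenvalues as printed follows; the function name is an illustrative choice. Because the printed eigenvalues are rounded, the recomputed indices for the smallest eigenvalues differ slightly from the SAS values.

```python
import math


def condition_indices(eigenvalues):
    """Condition index for each eigenvalue: sqrt(largest / eigenvalue)."""
    largest = max(eigenvalues)
    return [math.sqrt(largest / value) for value in eigenvalues]


# Eigenvalues copied from the m,p-xylene collinearity diagnostics
eigenvalues = [4.77579, 0.64666, 0.51787, 0.05567, 0.00365, 0.00035]

for number, (value, index) in enumerate(
        zip(eigenvalues, condition_indices(eigenvalues)), start=1):
    print(f"{number}  eigenvalue = {value:<8}  condition index = {index:.4f}")
```

The first four recomputed indices match the printed column to the digits shown, which suggests the sketch reflects how the column was derived.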
-------
o-Xylene
Consistent Covariance of Estimates
Variable     Intercept   f14_1mInv   GS1mInv     Stab4       K5          U4
Intercept    1.67737     0.71942     -1.41352    -0.05168    -0.00462    -0.01991
f14_1mInv    0.71942     13.7652     -3.15844    0.00684     -0.00276    -0.00947
GS1mInv      -1.41352    -3.15844    15.3048     0.00274     0.00464     -0.0015
Stab4        -0.05168    0.00684     0.00274     0.00738     2.7E-05     0.00141
K5           -0.00462    -0.00276    0.00464     2.7E-05     1.5E-05     2.1E-05
U4           -0.01991    -0.00947    -0.0015     0.00141     2.1E-05     0.0016
Test of First and Second Moment Specification
DF    Chi-Square    Pr > ChiSq
20    26.89         0.1384
Results of Multicollinearity Test on the Final Model of o-Xylene
Parameter Estimates
                  Parameter   Standard                                    Variance
Variable     DF   Estimate    Error      t Value   Pr > |t|   Tolerance   Inflation
Intercept    1    4.45813     1.4074     3.17      0.0018     .           0
f14_1mInv    1    7.44373     4.48291    1.66      0.0988     0.93057     1.07461
GS1mInv      1    9.54244     5.47996    1.74      0.0835     0.91381     1.09431
Stab4        1    0.52092     0.09234    5.64      <.0001     0.72058     1.38777
K5           1    -0.02352    0.00419    -5.62     <.0001     0.9089      1.10023
U4           1    -0.12197    0.03697    -3.3      0.0012     0.68247     1.46527
-------
Collinearity Diagnostics
                    Condition   Proportion of Variation
Number  Eigenvalue  Index       Intercept   f14_1mInv   GS1mInv    Stab4      K5         U4
1       4.83844     1           2.4E-05     0.01282     0.01258    0.00022    3.4E-05    0.0017
2       0.63267     2.76544     5E-05       0.36289     0.35101    0.00041    6.9E-05    0.00403
3       0.46846     3.21379     1.97E-10    0.61709     0.60919    1.9E-06    3.02E-08   1.3E-05
4       0.05647     9.25612     0.00042     2.5E-05     0.00486    0.01685    0.00079    0.5583
5       0.0036      36.6669     0.02017     0.00148     0.01935    0.85845    0.06507    0.24109
6       0.00035     116.918     0.97934     0.00569     0.00301    0.12406    0.93403    0.19487
-------
Toluene
Consistent Covariance of Estimates
Variable     Intercept   f14_1mInv   Stab4       K5          RH5
Intercept    2.53139     0.99584     -0.03324    -0.00899    0.00266
f14_1mInv    0.99584     15.3306     -0.0417     -0.00354    0.00176
Stab4        -0.03324    -0.0417     0.00931     -1.1E-05    -0.00015
K5           -0.00899    -0.00354    -1.1E-05    3.5E-05     -1.2E-05
RH5          0.00266     0.00176     -0.00015    -1.2E-05    2E-05
Test of First and Second Moment Specification
DF    Chi-Square    Pr > ChiSq
14    26.13         0.0249
Results of Multicollinearity Test on the Final Model of Toluene
Parameter Estimates
                  Parameter   Standard                                    Variance
Variable     DF   Estimate    Error      t Value   Pr > |t|   Tolerance   Inflation
Intercept    1    3.11017     1.84905    1.68      0.0945     .           0
f14_1mInv    1    14.7215     4.43634    3.32      0.0011     0.98854     1.01159
Stab4        1    0.70584     0.12046    5.86      <.0001     0.84859     1.17842
K5           1    -0.02067    0.00635    -3.25     0.0014     0.84105     1.18898
RH5          1    0.01116     0.0048     2.33      0.0212     0.73064     1.36867
-------
Collinearity Diagnostics
                    Condition   Proportion of Variation
Number  Eigenvalue  Index       Intercept   f14_1mInv   Stab4      K5         RH5
1       4.31674     1           3.8E-05     0.01532     0.00035    4E-05      0.00122
2       0.65659     2.56408     2.5E-05     0.97236     0.00024    2.5E-05    0.00092
3       0.02114     14.291      0.00462     0.00271     0.01229    0.00315    0.81896
4       0.00516     28.9175     0.01539     0.0002      0.91675    0.02567    0.03874
5       0.00038     106.883     0.97993     0.00941     0.07038    0.97112    0.14016
-------
Benzene
Consistent Covariance of Estimates
Variable     Intercept   f14_1mInv   GS1mInv     Stab4       K5          U4
Intercept    1.92094     0.77079     0.1021      -0.08161    -0.00501    -0.02127
f14_1mInv    0.77079     7.55125     -2.74611    -0.02564    -0.00231    -0.0074
GS1mInv      0.1021      -2.74611    28.7272     -0.07682    0.00096     -0.03097
Stab4        -0.08161    -0.02564    -0.07682    0.0103      7.3E-05     0.00224
K5           -0.00501    -0.00231    0.00096     7.3E-05     1.6E-05     1.3E-05
U4           -0.02127    -0.0074     -0.03097    0.00224     1.3E-05     0.00161
Test of First and Second Moment Specification
DF    Chi-Square    Pr > ChiSq
20    25.52         0.1824
Results of Multicollinearity Test on the Final Model of Benzene
Parameter Estimates
                  Parameter   Standard                                    Variance
Variable     DF   Estimate    Error      t Value   Pr > |t|   Tolerance   Inflation
Intercept    1    10.0744     1.49805    6.73      <.0001     .           0
f14_1mInv    1    5.4977      3.32981    1.65      0.1007     0.95886     1.0429
GS1mInv      1    16.1478     5.57504    2.9       0.0043     0.94282     1.06065
Stab4        1    0.30356     0.09971    3.04      0.0027     0.7092      1.41003
K5           1    -0.03914    0.00447    -8.76     <.0001     0.9031      1.10729
U4           1    -0.08488    0.03966    -2.14     0.0338     0.66435     1.50522
-------
Collinearity Diagnostics
                    Condition   Proportion of Variation
Number  Eigenvalue  Index       Intercept   f14_1mInv   GS1mInv    Stab4      K5         U4
1       4.74689     1           2.6E-05     0.01248     0.01339    0.00023    3.5E-05    0.00175
2       0.66641     2.66891     3.5E-05     0.7678      0.07717    0.00029    4.6E-05    0.00289
3       0.52523     3.00627     1.3E-05     0.20451     0.88566    8.8E-05    1.9E-05    0.00068
4       0.0576      9.0781      0.0004      0.00464     8.8E-06    0.01655    0.00077    0.54143
5       0.00351     36.755      0.02129     0.00542     0.02252    0.86748    0.0673     0.25551
6       0.00036     114.882     0.97823     0.00515     0.00125    0.11536    0.93183    0.19774
-------
Ethylbenzene
Consistent Covariance of Estimates
Variable     Intercept   f14_1mInv   Stab4       K5          U4
Intercept    6.25969     0.16325     -0.23772    -0.01664    -0.07101
f14_1mInv    0.16325     31.8971     0.0485      -0.00188    -0.02487
Stab4        -0.23772    0.0485      0.02746     0.00027     0.00441
K5           -0.01664    -0.00188    0.00027     5.2E-05     8.2E-05
U4           -0.07101    -0.02487    0.00441     8.2E-05     0.00625
Test of First and Second Moment Specification
DF    Chi-Square    Pr > ChiSq
14    22.44         0.0700
Table E.10. Results of Multicollinearity Test on the Final Model of Ethylbenzene
Parameter Estimates
                  Parameter   Standard                                    Variance
Variable     DF   Estimate    Error      t Value   Pr > |t|   Tolerance   Inflation
Intercept    1    5.98125     2.43806    2.45      0.0152     .           0
f14_1mInv    1    9.6811      5.62338    1.72      0.087      0.99172     1.00835
Stab4        1    0.43775     0.16287    2.69      0.0079     0.75403     1.32621
K5           1    -0.02747    0.00732    -3.75     0.0002     0.92634     1.07952
U4           1    -0.11372    0.06587    -1.73     0.0861     0.71056     1.40735
-------
Collinearity Diagnostics
                    Condition   Proportion of Variation
Number  Eigenvalue  Index       Intercept   f14_1mInv   Stab4      K5         U4
1       4.29372     1           3.4E-05     0.01575     0.0003     4.6E-05    0.00227
2       0.64669     2.57673     2.4E-05     0.9744      0.0002     3E-05      0.0019
3       0.05557     8.79047     0.00046     0.00241     0.01763    0.00084    0.58435
4       0.00365     34.3084     0.0224      0.00158     0.87486    0.06731    0.23986
5       0.00038     106.199     0.97709     0.00585     0.10702    0.93178    0.17163
-------
Methyl tert Butyl Ether (MTBE)
Consistent Covariance of Estimates
Variable     Intercept   f11_1mInv   GS1mInv     Stab4       K5          U4
Intercept    4.96409     0.29828     -1.29803    -0.13099    -0.01409    -0.06632
f11_1mInv    0.29828     210.102     -18.3631    0.08589     -0.00205    -0.09811
GS1mInv      -1.29803    -18.3631    61.7254     -0.34534    0.00883     0.03525
Stab4        -0.13099    0.08589     -0.34534    0.0222      3.8E-05     0.0028
K5           -0.01409    -0.00205    0.00883     3.8E-05     4.6E-05     0.00014
U4           -0.06632    -0.09811    0.03525     0.0028      0.00014     0.00261
Test of First and Second Moment Specification
DF    Chi-Square    Pr > ChiSq
20    18.20         0.5745
Results of Multicollinearity Test on the Final Model of MTBE
Parameter Estimates
                  Parameter   Standard                                    Variance
Variable     DF   Estimate    Error      t Value   Pr > |t|   Tolerance   Inflation
Intercept    1    -2.743      2.27071    -1.21     0.2288     .           0
f11_1mInv    1    22.2547     14.3072    1.56      0.1217     0.97446     1.02621
GS1mInv      1    33.5627     8.28221    4.05      <.0001     0.95972     1.04197
Stab4        1    0.24348     0.14963    1.63      0.1056     0.70196     1.42459
K5           1    0.01239     0.00669    1.85      0.0659     0.89917     1.11213
U4           1    -0.18702    0.06022    -3.11     0.0022     0.6496      1.53942
-------
Collinearity Diagnostics
                    Condition   Proportion of Variation
Number  Eigenvalue  Index       Intercept   f11_1mInv   GS1mInv    Stab4      K5         U4
1       4.66055     1           2.7E-05     0.01139     0.01397    0.00024    3.8E-05    0.00183
2       0.73182     2.52358     2E-05       0.94942     0.00242    0.00018    2.8E-05    0.00096
3       0.54484     2.92472     2.6E-05     0.0308      0.95844    0.00018    3.7E-05    0.00209
4       0.0588      8.90275     0.00038     0.00666     0.00263    0.01602    0.00077    0.53461
5       0.00364     35.8027     0.01965     0.00015     0.02173    0.85101    0.06764    0.24151
6       0.00036     113.396     0.9799      0.00157     0.00081    0.13237    0.93148    0.21899
-------
Tetrachloroethylene (PCE)
Consistent Covariance of Estimates
Variable     Intercept   DCF1mInv    Stab4       K5          U4          RH5
Intercept    1.73111     -0.03794    -0.0554     -0.00463    -0.02507    -0.00017
DCF1mInv     -0.03794    200.107     -0.25514    0.0035      -0.01538    -0.00311
Stab4        -0.0554     -0.25514    0.00855     2.7E-05     0.00145     -8.6E-06
K5           -0.00463    0.0035      2.7E-05     1.5E-05     3.5E-05     -1.5E-06
U4           -0.02507    -0.01538    0.00145     3.5E-05     0.00135     3.1E-05
RH5          -0.00017    -0.00311    -8.6E-06    -1.49E-06   3.1E-05     7.86E-06
Test of First and Second Moment Specification
DF    Chi-Square    Pr > ChiSq
20    11.97         0.9170
Results of Multicollinearity Test on the Final Model of PCE
Parameter Estimates
                  Parameter   Standard                                    Variance
Variable     DF   Estimate    Error      t Value   Pr > |t|   Tolerance   Inflation
Intercept    1    2.4945      1.34715    1.85      0.0659     .           0
DCF1mInv     1    32.6734     12.3964    2.64      0.0092     0.97021     1.0307
Stab4        1    0.14442     0.08831    1.64      0.104      0.74611     1.34028
K5           1    -0.01229    0.00416    -2.96     0.0036     0.85224     1.17338
U4           1    -0.1441     0.03588    -4.02     <.0001     0.71002     1.40842
RH5          1    0.00913     0.00301    3.04      0.0028     0.75523     1.3241
-------
Collinearity Diagnostics
                    Condition   Proportion of Variation
Number  Eigenvalue  Index       Intercept   DCF1mInv    Stab4      K5         U4         RH5
1       5.53937     1           1.9E-05     0.0088      0.00017    2.4E-05    0.00132    0.0008
2       0.371       3.86407     3.4E-05     0.94315     0.00023    4.5E-05    0.00492    0.0016
3       0.06942     8.933       1.7E-05     0.01382     0.00338    6.8E-05    0.42245    0.0973
4       0.01614     18.5246     0.00354     0.02428     0.06688    0.00359    0.23735    0.84095
5       0.00374     38.4987     0.02173     0.00461     0.79268    0.05781    0.22567    0.00917
6       0.00034     127.603     0.97466     0.00533     0.13666    0.93846    0.10829    0.05018
-------
C-10 PM2.5 Mass
Summary of Stepwise Selection
      Variable                     Number    Partial    Model
Step  Entered       Label          Vars In   R-Square   R-Square   C(p)      F Value   Pr > F
1     Stab4         Stab4          1         0.3184     0.3184     30.4114   46.71     <.0001
2     U4            U4             2         0.0655     0.3839     20.0725   10.52     0.0016
3     f19_1inv      f19_1inv       3         0.0524     0.4363     12.1924   9.12      0.0032
4     f11_1Inv      f11_1Inv       4         0.0210     0.4573     10.2394   3.75      0.0557
5     PM03DIS_inv   PM03DIS_inv    5         0.0163     0.4736     9.1699    2.97      0.0880
6     K4            K4             6         0.0141     0.4877     8.5164    2.61      0.1094
7     mmHG4         mmHG4          7         0.0134     0.5010     8.0000    2.52      0.1160
Analysis of Variance
                         Sum of      Mean
Source            DF     Squares     Square     F Value   Pr > F
Model             7      12.62973    1.80425    13.48     <.0001
Error             94     12.57688    0.13380
Corrected Total   101    25.20661
              Parameter   Standard
Variable      Estimate    Error       Type II SS   F Value   Pr > F
Intercept     13.91429    6.84008     0.55366      4.14      0.0447
f11_1Inv      25.77270    11.93020    0.62441      4.67      0.0333
f19_1inv      3.62748     1.65616     0.64188      4.80      0.0310
PM03DIS_inv   58.63622    29.32827    0.53482      4.00      0.0485
K4            -0.00927    0.00476     0.50718      3.79      0.0545
U4            -0.16301    0.03855     2.39254      17.88     <.0001
mmHG4         -0.01311    0.00827     0.33669      2.52      0.1160
Stab4         0.41383     0.09362     2.61421      19.54     <.0001
Bounds on condition number: 1.5607, 58.934
Summary of Stepwise Selection
      Variable                     Number    Partial    Model
Step  Entered       Label          Vars In   R-Square   R-Square   C(p)      F Value   Pr > F
1     Stab4         Stab4          1         0.3184     0.3184     26.3067   46.71     <.0001
2     U4            U4             2         0.0655     0.3839     16.3623   10.52     0.0016
3     f19_1inv      f19_1inv       3         0.0524     0.4363     8.7980    9.12      0.0032
4     f11_1Inv      f11_1Inv       4         0.0210     0.4573     6.9713    3.75      0.0557
5     PM03DIS_inv   PM03DIS_inv    5         0.0163     0.4736     6.0000    2.97      0.0880
Analysis of Variance
                         Sum of      Mean
Source            DF     Squares     Square     F Value   Pr > F
Model             5      11.93801    2.38760    17.27     <.0001
Error             96     13.26860    0.13821
Corrected Total   101    25.20661
Find The Best Fitted Model for PM
Stepwise Selection: Step 5
              Parameter   Standard
Variable      Estimate    Error       Type II SS   F Value   Pr > F
Intercept     1.06000     0.57297     0.47304      3.42      0.0674
f11_1Inv      25.27425    12.03839    0.60922      4.41      0.0384
f19_1inv      4.19851     1.65726     0.88708      6.42      0.0129
PM03DIS_inv   50.94389    29.55414    0.41068      2.97      0.0880
U4            -0.13037    0.03636     1.77726      12.86     0.0005
Stab4         0.42820     0.09491     2.81332      20.35     <.0001
Bounds on condition number: 1.3453, 28.976
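The F values in the stepwise summaries can be recovered from the R-square columns. For the variable entered at a given step, F equals the partial R-square divided by (1 - model R-square) / (n - k - 1), where k is the number of variables in the model after entry. The sketch below is a minimal Python illustration using the PM2.5 summary and assuming n = 102 observations (corrected total DF of 101 plus one). The function name and the short labels are hypothetical, and small differences from the printed F values reflect rounding of the R-square columns.

```python
def partial_f(partial_r2, model_r2, n_obs, n_vars):
    """F statistic for the variable added at a step.

    model_r2 is the R-square after the variable enters, and n_vars is the
    number of regressors in the model after the variable enters.
    """
    residual_df = n_obs - n_vars - 1
    return partial_r2 / ((1.0 - model_r2) / residual_df)


N_OBS = 102  # assumption: corrected total DF (101) + 1

# (label, partial R-square, model R-square) from the PM2.5 stepwise summary
steps = [
    ("Stab4", 0.3184, 0.3184),
    ("U4", 0.0655, 0.3839),
    ("f19", 0.0524, 0.4363),
    ("f11", 0.0210, 0.4573),
    ("PM03DIS", 0.0163, 0.4736),
]

for n_vars, (label, partial_r2, model_r2) in enumerate(steps, start=1):
    f_stat = partial_f(partial_r2, model_r2, N_OBS, n_vars)
    print(f"step {n_vars}: {label:8s} F = {f_stat:.2f}")
```

The same relation holds for the other stepwise and forward-selection summaries in this appendix once the matching sample size is used.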
-------
C-11 Elemental Carbon
Summary of Stepwise Selection
      Variable                     Number    Partial    Model
Step  Entered       Label          Vars In   R-Square   R-Square   C(p)      F Value   Pr > F
1     RH4           RH4            1         0.2357     0.2357     9.6140    15.42     0.0003
2     Stab4         Stab4          2         0.0562     0.2919     7.3748    3.89      0.0542
3     f14_1inv      f14_1inv       3         0.0362     0.3281     6.6483    2.58      0.1145
4     PM02DIS_inv   PM02DIS_inv    4         0.0484     0.3765     5.0000    3.65      0.0622
Analysis of Variance
                         Sum of      Mean
Source            DF     Squares     Square     F Value   Pr > F
Model             4      4.07456     1.01864    7.09      0.0001
Error             47     6.74801     0.14357
Corrected Total   51     10.82257
              Parameter   Standard
Variable      Estimate    Error       Type II SS   F Value   Pr > F
Intercept     -2.76874    0.68589     2.33955      16.30     0.0002
f14_1inv      8.41047     3.84635     0.68647      4.78      0.0338
PM02DIS_inv   944.59427   494.54016   0.52380      3.65      0.0622
RH4           0.01371     0.00498     1.09096      7.60      0.0083
Stab4         0.35474     0.14301     0.88334      6.15      0.0168
Bounds on condition number: 1.2892, 19.405
-------
C-12 Organic Carbon
Summary of Forward Selection
      Variable                 Number    Partial    Model
Step  Entered     Label        Vars In   R-Square   R-Square   C(p)     F Value   Pr > F
1     Stab4       Stab4        1         0.2614     0.2614     2.8931   17.70     0.0001
2     f11_1Inv    f11_1Inv     2         0.0275     0.2889     3.0000   1.89      0.1751
Analysis of Variance
                         Sum of      Mean
Source            DF     Squares     Square     F Value   Pr > F
Model             2      3.25109     1.62555    9.95      0.0002
Error             49     8.00212     0.16331
Corrected Total   51     11.25321
Find The Best Fitted Model for PM
Forward Selection: Step 2
              Parameter   Standard
Variable      Estimate    Error       Type II SS   F Value   Pr > F
Intercept     -1.67517    0.66211     1.04536      6.40      0.0147
f11_1Inv      26.64584    19.36632    0.30915      1.89      0.1751
Stab4         0.55571     0.13474     2.77773      17.01     0.0001
Bounds on condition number: 1.0061, 4.0244
-------
C-13 PAHs
The REG Procedure
Model: MODEL1
Dependent Variable: LnBghiPP
Summary of Stepwise Selection
      Variable                     Number    Partial    Model
Step  Entered       Label          Vars In   R-Square   R-Square   C(p)      F Value   Pr > F
1     K4            K4             1         0.2160     0.2160     24.5663   11.85     0.0013
2     Stab4         Stab4          2         0.1613     0.3773     13.0761   10.88     0.0020
3     f11_1Inv      f11_1Inv       3         0.0719     0.4493     9.0590    5.36      0.0257
4     Precip4       Precip4        4         0.0424     0.4917     7.5090    3.34      0.0751
5     PM01DIS_inv   PM01DIS_inv    5         0.0420     0.5337     6.0000    3.51      0.0685
Stepwise Selection: Step 5
              Parameter   Standard
Variable      Estimate    Error       Type II SS   F Value   Pr > F
Intercept     13.55955    4.27716     6.21906      10.05     0.0030
f11_1Inv      125.65797   48.00264    4.24028      6.85      0.0125
PM01DIS_inv   629.84523   336.23325   2.17136      3.51      0.0685
Precip4       -12.18382   5.11993     3.50416      5.66      0.0223
K4            -0.07630    0.01451     17.10874     27.65     <.0001
Stab4         1.37241     0.32101     11.31041     18.28     0.0001
Bounds on condition number: 1.3725, 30.547
-------
The REG Procedure
Model: MODEL1
Dependent Variable: LnCORP
Summary of Stepwise Selection
      Variable                     Number    Partial    Model
Step  Entered       Label          Vars In   R-Square   R-Square   C(p)      F Value   Pr > F
1     K4            K4             1         0.2140     0.2140     25.6691   11.71     0.0014
2     Stab4         Stab4          2         0.1966     0.4106     10.9925   14.01     0.0005
3     f11_1Inv      f11_1Inv       3         0.0588     0.4694     8.0059    4.54      0.0391
4     Precip4       Precip4        4         0.0437     0.5131     6.2988    3.59      0.0654
5     PM01DIS_inv   PM01DIS_inv    5         0.0271     0.5402     6.0000    2.30      0.1375
Stepwise Selection: Step 5
              Parameter   Standard
Variable      Estimate    Error       Type II SS   F Value   Pr > F
Intercept     14.23612    4.72659     6.85515      9.07      0.0045
f11_1Inv      125.00731   53.04655    4.19648      5.55      0.0236
PM01DIS_inv   563.35265   371.56318   1.73710      2.30      0.1375
Precip4       -13.04686   5.65791     4.01818      5.32      0.0265
K4            -0.08337    0.01603     20.43059     27.04     <.0001
Stab4         1.63208     0.35474     15.99544     21.17     <.0001
Bounds on condition number: 1.3725, 30.547
-------
Pearson Correlation between X Variables
Variables: f11_1, f12_1, f14_1, f16_1, f17_1, f19_1, GS1, DCF1, Tol_PS1
Prob > |r| under H0: Rho=0
[Correlation matrix; coefficient values are not legible in the source]
-------
Pearson Correlation between X Variables
The CORR Procedure
Simple Statistics
Variables: f11_1mInv, f12_1mInv, f14_1mInv, f16_1mInv, f17_1mInv, f19_1mInv, GS1mInv, DCF1mInv, Tol_PS1mInv
Prob > |r| under H0: Rho=0
[Correlation matrix; coefficient values are not legible in the source]
-------
Correlation between X Variables
The CORR Procedure
Variables legible in the source: Stab4, U4, RH5, Precip
Prob > |r| under H0: Rho=0
[Correlation matrix; coefficient values are not legible in the source]
-------
The CORR Procedure
Variables legible in the source: oXO, BznO, EbzO, MTBEO, PCEO
Prob > |r| under H0: Rho=0
[Correlation matrix; legible entries: 0.61564 (<.0001), oXO 0.43479 (<.0001), 0.83145; row and column positions are otherwise not recoverable]
-------
The CORR Procedure
Simple Statistics
Variables: LnmpXO, LnoXO, LnTolO, LnBznO, LnEbzO, LnMTBEO, LnPCEO, LnCCl4O
Pearson Correlation Coefficients, N = 183
Prob > |r| under H0: Rho=0
[Correlation matrix; legible entries: 0.62676, 0.27904; row and column positions are not recoverable]
-------
General Comments:
Overall, the report reflects a substantial amount of high-quality work, and reflects good
practices in ensuring the quality of geographic data for use in subsequent analysis. The
statistical approaches are supported by independent data, including relative vapor
pressures for BTEX species.
General comments follow, with detailed in-line comments following.
1.
The choice of ordinary (multiple) linear regression for analysis of this data set is
acceptable, but several caveats are in order. While a full description of the RIOPA data
collection protocol has not yet been published, it is EPA's understanding that several
homes were monitored concurrently for 48 hr (say n per subset), after which a new set of
homes was monitored. To conduct monitoring at 100 homes, 100/n = p different rounds
of home data collection would have to be undertaken. This process introduces an issue of
non-independence of data for homes monitored during the same one of the p rounds of
data collection. During the same 48 hr period of monitoring, the homes being monitored
shared the same meteorological data (used in the current analysis). As such, these data
may be analogous to the "clustering" phenomenon in surveys (associated with a loss of
sampling efficiency). While this is unlikely to have a significant impact on the
magnitude of the regression coefficients, it may have a substantial effect on their
estimated standard errors. One way to significantly strengthen the current analysis would
be to include for 2-3 compounds (say, one species each of PM, VOC, and PAH) a
sensitivity analysis in which a mixed effects model is applied to the data sets, to account
for random within-"cluster" variation. In SAS, PROC MIXED would be
used for such an analysis. Addition of a 1st-order autocorrelation term for data collected
simultaneously would also be appropriate here.
It is now reported that typically 1 or 2 homes were sampled on a single day, though some
days had 3 or 4 homes sampled, so clustering should not be an issue. PROC MIXED was
run with date as the repeated variable and no autocorrelation was found (pages 33-34).
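The loss of sampling efficiency from clustering that the reviewer mentions is often gauged with the survey-sampling design effect, 1 + (m - 1) * rho, where m is the number of units per cluster and rho is the intraclass correlation; standard errors are inflated by roughly the square root of that factor. The sketch below is only a rough Python illustration of why one or two homes per sampling day limits the effect. The intraclass correlations are hypothetical values, not estimates from the RIOPA data, and the formula assumes equal-size clusters and applies to a mean rather than to regression coefficients, so it does not replace the mixed-model check described above.

```python
import math


def design_effect(cluster_size, icc):
    """Approximate variance inflation from sampling in clusters of equal size."""
    return 1.0 + (cluster_size - 1.0) * icc


def se_inflation(cluster_size, icc):
    """Approximate multiplier on the standard error implied by the design effect."""
    return math.sqrt(design_effect(cluster_size, icc))


# Homes sampled on the same day; the ICC values are hypothetical.
for homes_per_day in (1, 2, 4):
    for icc in (0.1, 0.5):
        factor = se_inflation(homes_per_day, icc)
        print(f"homes/day = {homes_per_day}, ICC = {icc}: SE x {factor:.2f}")
```

With one home per day the factor is exactly 1 regardless of the correlation, and it grows only as more homes share a sampling day.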
1.5
On a related note, the low partial R-square of most of the regression coefficients should
be further explored. One interpretation is that spatial patterns are relatively small
contributors to overall variability in ambient concentrations. Another is that, given the
RIOPA sampling approach, the small number of concurrently monitored homes
resulted in assignment of a larger portion of explained variability to day-to-day
("sampling cluster" to "sampling cluster") variation than would be observed given a
"balanced" design in which spatial and temporal variability would be more separable.
Recently, the Battelle Memorial Institute conducted an analysis of sources of
variability in EPA's pilot project for air toxics monitoring in ambient air within
several cities nationally. At a series of fixed sites with simultaneous measurements,
within-city spatial variability contributed almost as great a fraction of total variability
as temporal variability (Battelle Memorial Institute and Sonoma Technologies, Inc.
-------
(2003) Draft technical report for Phase II air toxics monitoring data: analyses and
network design recommendations. Prepared for Lake Michigan Air Directors
Consortium, Des Plaines, Illinois 60018). A discussion of the role of study design in
interpretation of these results is appropriate in the report.
More details of the study design are reported and a copy of a paper in press detailing
that information is included. The clustering due to either date or location is not a
problem based on the study design, as homes were selected throughout the 18 month
study period from all sections of the city without concentrating on any portion of the city
during individual time periods.
2.
Appendix A and the results section discuss diagnostic procedures applied to regression
outputs to determine multicollinearity. However, neither the Appendix nor the main report
provides reasoning for decisions to apply corrective measures or not. For instance, it is
mentioned that the distance to gas stations is significantly correlated with distance to
major roadways. What was the strength of this association? If greater than about 0.85,
this could lead to unstable coefficients. The relative significance of the associations
provides some assurance that variances are not super-inflated, but when one of the
distance terms was removed, did the other remain stable? Such description is necessary
for the reader to be able to properly interpret the regression results. Other areas where
further rationale is needed include decisions not to correct multicollinearity in cases with
failing diagnostics (e.g. condition index).
More details on the reason for the decisions have been included in Appendix A.
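The reviewer's question about the strength of the association between the two distance terms amounts to computing a Pearson correlation and comparing its magnitude with the suggested cutoff of about 0.85. A minimal Python sketch follows. The inverse-distance values are hypothetical numbers made up for illustration, not RIOPA data, and the function and variable names are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


THRESHOLD = 0.85  # approximate cutoff suggested in the comment above

# Hypothetical inverse distances (1/m) for six homes, for illustration only
inv_dist_roadway = [0.0100, 0.0040, 0.0200, 0.0020, 0.0080, 0.0050]
inv_dist_gas_station = [0.0030, 0.0045, 0.0060, 0.0010, 0.0020, 0.0050]

r = pearson_r(inv_dist_roadway, inv_dist_gas_station)
verdict = "exceeds" if abs(r) > THRESHOLD else "is below"
print(f"r = {r:.3f}, which {verdict} the {THRESHOLD} cutoff")
```

A correlation below the cutoff does not by itself rule out instability, which is why the comment also asks whether one distance coefficient stays stable when the other term is removed.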
3.
Why were "traditional" residual diagnostics not employed? Cook's Distance, etc.
provide the standard approach to such diagnosis, but the rationale for not using them is
not provided here.
The approach used to look at the residuals was a traditional residual diagnostic, and this is
now more clearly stated. Cook's Distance was not used, as it was more time consuming
and not thought to provide additional information beyond what was obtained for the objective
being considered: to derive a cohesive database to examine the role of proximity on
ambient concentration. The exclusion of outliers, which probably had other variables
affecting their concentrations, was undertaken to address this fundamental issue and is a
restrictive approach to identifying outliers, probably classifying some values as outliers that
were not.
4.
Please include a separate reference section, rather than citing the entire source in the text
itself.
Provided
-------
COMMENTS OF RICH COOK, EPA OTAQ
Chad -
I only had a chance to skim this before going on AL, but I have a few comments:
1) In the section "National Emissions Inventory for 1999," I think a little more
discussion of how county level VMT is developed would help. Pechan actually
starts with State level VMT reported in HPMS by States (from sampling), which is
then allocated to the county level using roadway miles for 12 functional classes
and vehicle class splits. This is briefly outlined in the technical documentation
for the NEI. Joe Somers can help with a description if needed.
The VMT analysis was used as a guide to indicate which roadways to group
together and in the statistical analyses. The actual emissions were not included
in the regression equations. This is now stated in the text. Thus, a more detailed
description of how they were derived is not warranted.
2) Table 10 - residential ambient air concentrations - I think it would be helpful
to compare these data to local ambient monitor data, or maybe even national
averages from AIRS, presuming resources permit. Aldehyde concentrations are
much higher than typically seen at ambient monitors. I wonder why?
Concentrations in the area measured by NJDEP have been added to the table.
3) When discussing why there is not a roadway proximity relationship for
aldehydes, it might be worth presenting some estimates of the secondary
contribution. I know that some modeling has estimated 90% of formaldehyde is
secondary. Again, this is subject to resource availability.
It is not clear to me how to include more on secondary contribution to aldehydes
using the approach taken other than what was done, examining the data by
splitting it into days above and below 10 °C, where different amounts of secondary
production should occur. More detailed source emission modeling, which
includes secondary production for formaldehyde, is being done by Dr. Panos
Georgopoulos with funding from the ACC and may address this issue in the
future.
4) I am surprised there is a signal for the two PAH compounds they measured.
Nationwide, less than 20% of PAH emissions are from mobile sources. This
suggests a pretty strong roadway effect, I think, given all the noise.
The effect does seem strong, but the compounds were selected as ones with
major mobile contributions.
5) Cliff says that coronene (which is misspelled in several places) is associated
more with gasoline vehicles and benzo(ghi)perylene more with diesels, but that
their analysis saw no clear difference in source contributions. I checked the
emission factors we used in the 1999 NEI and found the following:
-------
The text has been altered to indicate that coronene is predominantly derived from
gasoline vehicles, with the appropriate reference, while benzo(ghi)perylene is
derived from both gasoline and diesel vehicles.
a) Average emission rate for light duty vehicles and trucks (Norbeck, J. M.,
T. D. Durbin, and T. J. Truex. 1998. Measurement of Primary Particulate Matter
Emissions from Light Duty Motor Vehicles. Prepared by College of Engineering, Center
for Environmental Research and Technology, University of California, for Coordinating
Research Council and South Coast Air Quality Management District. Tables 1 6 and
= 0.017mg/mi
b) Average emission rate for heavy duty diesels (Watson, J. D., E. Fujita, J.
C. Chow, and B. Zielinska. 1998. Northern Front Range Air Quality Study. Desert
Research Institute. See Table 4.4-4, page 4-41.) = 0.013 mg/mi
So I am wondering what the source of data is that shows benzo(g,h,i)perylene is
coming mostly from diesels. There is no reference in the report.
This is a good product. I hope these comments help.
Rich Cook
Environmental Scientist
U.S. EPA
Office of Transportation and Air Quality
2000 Traverwood Drive
Ann Arbor, MI 48105
Phone: 734-214-4827 Fax: 734-214-4939
-------
Stephen Graham
09/13/2004 10:53 AM
To: Chad Bailey/AA/USEPA/US@EPA
cc: Janet Burke/RTP/USEPA/US@EPA
Subject: Re: RIOPA draft report
Hi Chad,
Some brief comments and questions. Overall, the draft needs work on sentence structure
in both the text and descriptions in the tables/figures.
1) Should have more about the sample collection design (what samples collected and
when, for those included in this work) in background
More has been included and an in press paper is provided to give greater details.
2) All emission rate estimates in Table 4 are generally correlated (based on the
MOBILE6.2 modeling, I assume).
a) there is artificial variability introduced for lesser emitted chemicals (e.g. the
aldehydes) due to rounding
b) since they are different roadways, should they not have different distributions of
vehicle classes on them resulting in different emission distributions for those chemicals
listed?
c) unsure why this was done since not used in regressions
As indicated in the response to a comment by Richard Cook, this was done to facilitate
the grouping of the roadways and individual emission rates were not included in the
regression model so the effect of rounding is not important. The individual road classes
are expected to have different vehicle distributions but each chemical and road class was
individually examined so this effect should be accounted for by the analyses.
3) If using statistics for "normal" data, then one should use normal data or at least the
most normal data. Several transformations were mentioned on page 33 and then
correlations performed on each of the transformed variables. Why do all possible
pairwise correlations, other than to 'see what gives the highest R'?
Only the Ln transformation of the concentration data was used in the analyses. As part
of the exploratory work to make sure that an association was not missed more extensive
correlations were evaluated.
4) In using multiple regression approaches (forward selection, backward elimination, etc)
a statement about what each does to the estimate of variance is warranted.
Only the stepwise procedure was used for deriving the final models. The others were run to verify
that consistent results were obtained independent of how the regression equations were
derived.
5) For influential ("outliers") statistics, why not use something more standard like Cook's
D (apparently similar to what was used in this study; Cook's D uses the F distribution rather
than t), DFFITS, DFBETAS, COVRATIO?
-------
See the explanation provided to the 1st reviewer.
6) condition number of 10-30 indicates mild collinearity, 30-100 moderate, >100 severe.
Impact of excluding/flagging only severe category should be mentioned, although it looks
like even parameters with severe collinearity were indeed included in the "final" models.
We agree that collinearity did exist, but it was predominantly in the meteorology
variables, so the models were deemed acceptable for examining, particularly in a semi-
quantitative manner, the role of proximity.
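The bands quoted in comment 6 can be applied directly to the largest condition index reported for each VOC model in Appendix C. The following is a minimal Python sketch. The function name is illustrative, and the treatment of values falling exactly on 30 or 100 is an assumption, since the comment gives the ranges only loosely.

```python
def collinearity_band(condition_index):
    """Classify a condition index using the bands quoted in comment 6.

    Assumption: exactly 30 is treated as moderate and exactly 100 as moderate;
    the comment does not specify the boundary handling.
    """
    if condition_index > 100:
        return "severe"
    if condition_index >= 30:
        return "moderate"
    if condition_index >= 10:
        return "mild"
    return "none flagged"


# Largest condition index for each final VOC model, copied from Appendix C
largest_index = {
    "m,p-xylene": 116.442,
    "o-xylene": 116.918,
    "toluene": 106.883,
    "benzene": 114.882,
    "ethylbenzene": 106.199,
    "MTBE": 113.396,
    "PCE": 127.603,
}

for compound, index in largest_index.items():
    print(f"{compound:13s} {index:8.3f}  {collinearity_band(index)}")
```

In each of these models the largest index loads mainly on the intercept and the temperature term (K5) in the proportion-of-variation columns, which is consistent with the response that the collinearity sits predominantly in the meteorology variables.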
7) coronene was misspelled several differing ways.
Fixed
8) number of outliers in table 15 is not consistent with Appendix regression outliers. For
example Table 15 lists 13 outliers for m,p-xylene, page A-3 states 17, Table A-2 states
20. QA check should be done here.
Fixed
9) In observing some of the stats for m,p-xylene, it seems that a 5-parameter model was
best and used, rather than a 7 parameter mentioned in page A-3
Fixed
10) Table 16
a) lists 5 different significance levels ranging from 0.0001 through 0.105 (and I think in
the text it is mentioned on occasion as highly significant, more significant, etc.).
Establish a level of significance (e.g., p<0.05), and either something is statistically
significant or not, rather than varying degrees of significant.
For the final model, which was based on a stepwise procedure, p<0.15 was used as the
criterion for inclusion of a variable. The other routines were allowed to have less stringent
significance criteria as part of the exploratory analyses.
b) does not indicate significance level for aldehyde, PM, PAH, and OC/EC parameter
estimates
All used p<0.15
c) precip units are not mentioned, it is apparently a significant parameter for the PAHs
only, but it did not really rain/snow that much over the study period (maximum listed in
table 14 is 0.13 mm if units are correct). Would one expect that much washout from so
little precipitation? Even if it were inches, the median is 0.01, barely trace-level
precipitation. If it is real, why no impact to the PM since essentially these PAH would
all be associated with some form of particulate matter? I suspect that the precipitation is
acting as a surrogate for some other parameter that has not been measured or possibly
systematic error in PAH measurements.
Units now given. The regression suggests an association, not an explanation. It could
be another variable that both correlate with.
-------
d) for ethyl benzene, inverse squared transformation was used and coefficient estimate is
167.14 in table A-10 and also in Appendix C, however is listed as 0.17 in Table 16. This
should be corrected, but I have a comment: It is good to see a general consistency among
BTEX coefficient estimates as expected, however, not sure why the inverse squared was
used outside of "it made a better model". It is not very significant (r2=0.16) and would
rather see it in the same units as the others.
All now use inverse of the transformed variable, not square.
e) in general the distance parameters (FC, GS, DCF, Truck, etc) did not add very much
to explaining variation in residential concentrations, even for the true mobile source
chemicals. This is not surprising since the chemical is more than likely to never travel on a
direct vector from highway A to home 1. This tells us immediately that if we want to
know the impact a roadway is having on a residence, we need to do a better job of
measuring this in the future (i.e., the 'dilution' or mixing with other air not originating
from this source as a function of distance and micrometeorological conditions
(estimated?), the time-of-day, day-of-week, month-of-year (i.e., modified AADT)),
otherwise we are really just taking stabs at it in the dark.
Agree
f) cannot remember why ambient concentration is not used as a parameter, even for a
single central site monitor since it will probably do more for the model than the distance
parameters.
No central site data were available.
11) correlation was mentioned among some of the input parameters. I would like to see
what the actual correlations between FC14 and GS for the residences are rather than a
brief mention. This may be evident in the predictions given in figures 10, 11, and 13 that
show no effective difference in using either the FC or GS distance parameter. What
about stability and temperature by season, are there correlations here?
A correlation matrix has now been included in Appendix D.
12) not sure where ridge regression was used (technique mentioned on page 37)
Not used for the data presented, section has been removed.
13) no reference section included
Reference section added.
-------