United States Environmental Protection Agency
Air and Radiation
EPA420-R-95-004
September 1995

Development of a Methodology for Estimating Basic Emission Rates
for Use in the MOBILE Emission Factor Model

-------
                                  SR95-09-03
         Development of a Methodology for Estimating
                Basic Emission Rates for Use in the
                  MOBILE Emission Factor Model
                                  prepared for:

                       U.S. Environmental Protection Agency
                          under Contract No.  68-C4-0056
                           Work Assignment No. 0-06
                               September 30, 1995
                                  prepared by:

                                  Phil Heirigs
                                Robert G. Dulla

                              Sierra Research, Inc.
                                  1801 J Street
                             Sacramento, CA 95814
                                (916)444-6666
Although the information described in this report has been funded wholly or in part by the United
States Environmental Protection Agency under Contract No. 68-C4-0056, it has not been
subjected to the Agency's peer and administrative review and is being released for information
purposes only. It therefore may not necessarily reflect the views of the Agency and no official
endorsement should be inferred.

-------


                           Table of Contents

1.   Introduction

     Background
     Organization of the Report

2.   Overview of Emission Factors Development for MOBILE5

     Database Adjustments
     Fuel/Temperature Adjustments
     IM240-to-FTP Correlations
     Correlation Adjustments
     TECH5 Inputs

3.   Alternative Methods for Using IM240 Data to Develop Basic
     Emission Rates

     Survey Summary
     IM240-to-FTP Conversion Procedure
     TECH5 Inputs
     Treatment of Light-Duty Trucks

4.   Adjusting I/M Data to a Non-I/M Basis

5.   Incorporating "Off-Cycle" Emissions into MOBILE

6.   Use of State-Generated IM240 Data in MOBILE

     Data Collection and Development of Simulated FTP Scores
     Development of Basic Emission Rates from Simulated FTP Scores

7.   References

Appendix A - EIRG's Responses to Questionnaire on Developing
             Basic Emission Rates from IM240 Data

-------
                             List of Figures


2-1  Outline of Emission Factor Development for MOBILE5

2-2  Change to the HC Cold-Start Offset as a Function of
     Mileage for 1983+ MPFI Vehicles

3-1  Comparison of Very High + Super Emitter Fractions -
     TECH5 vs. Hammond Data for MPFI/CL Vehicles

4-1  Effect of an I/M Program on Emissions as a Function of
     Repair Cycle


                             List of Tables


2-1  Final Seasonal Fuel/Temperature Adjustments Used for
     MOBILE5a (Ratio of Lab/Indolene IM240 Scores to Lane/
     Tank Fuel IM240 Scores)

2-2  IM240-to-FTP Correlation Equations Developed for MOBILE5

3-1  Summary of IM240/Basic Emission Rate Survey Scores

3-2  Effect of Multiple Counting of Foreign Vehicles on the
     Distribution of Emitter Categories by Technology Type for
     1983 and Later Model Years

-------
                           1.  INTRODUCTION
Background

With the release of MOBILE5, the U.S. Environmental Protection Agency
(EPA) made a significant departure from the historical method of using
its Emission Factors database to develop exhaust basic emission rate
(BER) equations (i.e., the non-I/M emission rates in the model).  In
previous versions of MOBILE, data used for the BERs were collected
through a process often referred to as "surveillance" testing, where
vehicle owners are randomly contacted (usually by letter) and asked to
give up their cars for a week of testing.  Over the years, EPA has
become concerned that the vehicles they receive for the Emission Factors
testing are not representative of the in-use fleet,  particularly with
respect to the fraction of poorly maintained,  high-emitting vehicles.
This has been primarily attributed to a sample selection bias; e.g., if
a vehicle owner knows that his or her car has been poorly maintained or
has been tampered with, he or she will not voluntarily submit it for
emissions testing.

To overcome sample bias concerns (and to provide a much larger sample
for analysis), EPA used IM240 emissions data collected during the
initial two years of an inspection and maintenance (I/M) program in
Hammond, IN, to develop the exhaust basic emission rate equations for
MOBILE5a.*  It was felt that this approach would provide a less biased
sample because all vehicle owners had to participate in the state-run
portion of the program.   EPA then recruited vehicles from the state-run
testing lanes for the EPA tests.

Because all of the exhaust emission relations contained in MOBILE (e.g.,
temperature corrections,  speed corrections, etc.)  are based on FTP
testing with certification fuel (Indolene), a means to convert the IM240
data collected at the lane on tank fuel to an FTP/Indolene basis  was
needed.  This conversion process was a multi-step procedure,  consisting
of the steps  listed below.

     •  Factors that accounted for the differences in ambient
        temperatures and fuel characteristics between conditions
        experienced during IM240 testing at the I/M lane and IM240
        testing in the laboratory were developed from a subset of
        Hammond lane vehicles.

     •  Those factors were used to convert all the Hammond lane IM240
        data (tested with tank fuel) to a laboratory/Indolene IM240
        basis.

     •  Correlation equations between IM240 emissions on Indolene
        measured in the lab and FTP values on Indolene in the lab were
        developed from a sample of vehicles.

     •  These correlation equations were then applied to all of the
        Hammond IM240 data (first adjusted for fuel and temperature
        differences) to put all data on an FTP/Indolene basis.

* Vehicles were tested in their first I/M "cycle," and therefore the
data represent emissions from a non-I/M fleet.


Once the IM240-to-FTP conversion process described above was completed,
the TECH5 model was used to calculate the BER equations (zero-mile level
and deterioration rates) for MOBILE.  The TECH5 model uses a "regime"
approach to develop emission rates (as a function of vehicle mileage) by
model-year group (i.e., 1981-1982 and 1983+) and technology (i.e.,
closed-loop multipoint fuel injection (MPFI/CL), closed-loop
throttle-body injection (TBI/CL), closed-loop carbureted (CARB/CL), and
open-loop).  Four emitter groups (or regimes) are defined in the TECH5
model: normals, highs, very highs, and supers.  Emission rates (by
model-year group/technology) are determined by multiplying the emission
rate of each emitter category by the fraction of each emitter category
making up the fleet at mileage intervals corresponding to vehicle age.
Thus, two primary inputs to the TECH5 model are the emitter-category
emission rates and the emitter-category population growth rates.  Once
the model-year group/technology emission rates are calculated,
model-year-specific emission factors (which are input to MOBILE5a) are
generated by weighting the emission rates of each group by its expected
fraction of the fleet.
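
The regime weighting described above can be illustrated with a minimal Python sketch. It is not TECH5 code: the function name and all numeric values are hypothetical, chosen only to show how a composite emission rate follows from the emitter-category emission rates and the emitter-category fleet fractions at one mileage point.

```python
def composite_emission_rate(category_rates, category_fractions):
    """Weight each emitter category's emission rate (g/mi) by its fleet fraction."""
    if set(category_rates) != set(category_fractions):
        raise ValueError("rates and fractions must cover the same categories")
    if abs(sum(category_fractions.values()) - 1.0) > 1e-9:
        raise ValueError("category fractions must sum to 1")
    return sum(category_rates[c] * category_fractions[c] for c in category_rates)


# Hypothetical inputs for one model-year group/technology at one mileage.
rates = {"normal": 0.3, "high": 1.2, "very_high": 3.0, "super": 15.0}
fractions = {"normal": 0.85, "high": 0.08, "very_high": 0.06, "super": 0.01}

composite = composite_emission_rate(rates, fractions)
print(f"composite emission rate: {composite:.3f} g/mi")
```

Repeating the calculation at successive mileage intervals, with fractions that grow according to the emitter-category growth rates, would trace out the emission rate as a function of mileage.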

Although the IM240-to-FTP conversion approach provides a considerably
larger sample from which to develop BER equations for the MOBILE model,
several potential shortcomings have been identified in evaluations
sponsored by the American Petroleum Institute (API).^1,2*  Thus, in Work
Assignment 0-06 of Contract No. 68-C4-0056, EPA directed Sierra Research,
Inc. (Sierra)** to perform an evaluation of ways in which the use of
IM240 data for the development  of  basic  emission rates  could  be
improved.  In addition,  the Work Assignment called for  an assessment of
how IM240 data collected in an  I/M area  could be adjusted to  a  non-I/M
basis, recommendations for  incorporating off-cycle effects into the
MOBILE model, and a review  of methodologies by  which state-generated
IM240 data could be used to develop user-input  basic emission rates for
MOBILE.   This report documents  the  evaluations  performed under  this Work
Assignment.
* Superscripts denote references listed in Section 7 of this report.

** Sierra received assistance from subcontractors Air Improvement
Resource (AIR) and Energy and Environmental Analysis (EEA) during the
performance of this work assignment.

-------
Organization  of the Report

Following this introduction,  Section 2 provides an overview of  the
methods used to develop basic emission rate equations for MOBILE5 from
the Hammond IM240 data.  Section 3 follows with a discussion of
alternative methods for developing basic emission rates  from IM240  data.
An assessment of methods to adjust IM240 data collected  in an I/M area
to a non-I/M basis is contained in Section 4,  while recommendations for
incorporating off-cycle emissions into the MOBILE model  are presented in
Section 5.  Section 6 is a discussion of how IM240 data  collected by
states could be used to develop user-input basic emission rates  for
MOBILE, and Section 7 lists the references cited in this report.

-------
          2.  OVERVIEW OF EMISSION FACTORS DEVELOPMENT
                              FOR MOBILE5

Before proposing a method (or methods) to develop basic emission rates
for the MOBILE model from IM240 data, it is useful to review the
procedure used to generate basic emission rates for MOBILE5.  That
approach, which is diagrammed in Figure 2-1, was based on converting
IM240 data collected in Hammond, IN, to an FTP basis prior to the
development of inputs for the TECH5 model.  The conversion process
(which consisted of a number of individual adjustments in addition to
the development of IM240-to-FTP correlation equations) and the TECH5
inputs developed from those converted data are described in this section
of the report.


Database Adjustments

Prior to the development and application of IM240-to-FTP correlations,
several adjustments were made to the IM240 database so that it better
reflected a national average mix of domestic and foreign vehicles.   In
addition, vehicles that had missing or suspicious odometer readings were
deleted,  as were vehicles tested in March and April on days in which the
temperature was 25°F or more above the monthly average.   These
adjustments are described below.

Foreign Manufacturers - Because the vehicles tested in Hammond did not
accurately reflect the national average fraction of foreign vehicles,
each foreign vehicle in the database was counted two to four times.
This adjustment increased the 1981 and later model year light-duty
vehicle sample size from 6,597 to 7,821.

Missing or Suspicious Mileage - A number of vehicles in the Hammond
database had '0'  or missing mileage and were deleted.   In addition,
vehicles that were coded as having an odometer reading > 300,000 miles
were deleted.  This adjustment decreased the database from 7,821 to
6,999 records.

Seasonal Outliers - Data collected on 14 test dates in March and April
when the ambient temperature was 25°F or more above the monthly average
were deleted because many of those vehicles were statistical outliers.
(Excessive purge was thought to be influencing the IM240 results.)   This
affected a relatively small number of vehicles,  and it resulted in
decreasing the database from 6,999 to 6,826 records.
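
The three database adjustments above can be sketched in Python as follows. This is an illustration only, not EPA's procedure as implemented: the record layout, the function names, the test dates, and the rule for how many times a foreign vehicle is counted are all hypothetical (the report states only that each foreign vehicle was counted two to four times). The adjustments are applied here in a single pass; the final set of records does not depend on the order in which they are applied.

```python
def adjust_database(records, hot_dates, foreign_copies):
    """Apply the foreign-vehicle weighting and the two deletion rules."""
    adjusted = []
    for rec in records:
        odometer = rec.get("odometer")
        # Delete records with '0' or missing mileage, or > 300,000 miles.
        if odometer is None or odometer == 0 or odometer > 300_000:
            continue
        # Delete records from the unusually warm March/April test dates.
        if rec["test_date"] in hot_dates:
            continue
        # Count each foreign vehicle more than once (hypothetical rule).
        copies = foreign_copies(rec) if rec["foreign"] else 1
        adjusted.extend([rec] * copies)
    return adjusted


# Hypothetical sample records; dates and mileages are invented.
sample = [
    {"odometer": 45_000, "test_date": "d1", "foreign": False},
    {"odometer": 0, "test_date": "d1", "foreign": False},
    {"odometer": None, "test_date": "d1", "foreign": True},
    {"odometer": 350_000, "test_date": "d1", "foreign": False},
    {"odometer": 60_000, "test_date": "hot", "foreign": False},
    {"odometer": 30_000, "test_date": "d2", "foreign": True},
]
adjusted = adjust_database(sample, hot_dates={"hot"},
                           foreign_copies=lambda rec: 2)
print(len(adjusted), "records after adjustment")
```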

-------
                               Figure 2-1

          Outline of Emission Factor Development for MOBILE5*

   1.  IM240 Data Collected in Hammond, Indiana
   2.  Database Adjustments
          Non-representative foreign manufacturers
          Missing/suspicious mileage
          Seasonal outliers
   3.  Fuel/Temperature Adjustments
          Applied to get lane/tank fuel IM240s on a lab/Indolene basis
   4.  IM240-to-FTP Correlations
          Lab/Indolene IM240s correlated with lab/Indolene FTPs
          Cold-start function
          Application of residuals
   5.  Predicted FTP Scores from IM240 Data
   6.  Emitter category emission levels
       Emitter category growth functions

* Discussion of components in shaded boxes follows.

-------
Fuel/Temperature  Adjustments

Because EPA wished to develop the IM240-to-FTP correlations from
vehicles tested over the IM240 in a laboratory with Indolene, a method was needed
to account for the differences between the lane and the lab before the
correlation equations were applied to the Hammond lane IM240 data.  For
the Hammond database, it was felt that those differences were primarily
related to tank fuel versus Indolene and the temperature differences
occurring between the lane and the lab.  (However, a number of other
differences could also impact test variability between the lane and the
lab, e.g.,  vehicle preconditioning procedures,  inconsistent dynamometer
settings,  how well the IM240 speed-time trace is followed,  etc.)

The fuel/temperature adjustments prepared for MOBILE5 were based on a
subset of the Hammond vehicles that were tested at the lane on tank fuel
and at the lab on Indolene.  Adjustment factors were developed by season
(i.e.,  March-April,  May-June,  July-September,  and October-February)  and
the following emitter categories:

     •   Normal HC/CO - lane IM240 ≤ 1.64 g/mi HC and ≤ 13.6 g/mi CO,
     •   High HC/CO - lane IM240 > 1.64 g/mi HC or > 13.6 g/mi CO,
     •   Normal NOx - lane IM240 ≤ 2.0 g/mi NOx, and
     •   High NOx - lane IM240 > 2.0 g/mi NOx.

Once the data were segregated as outlined above,  the mean emission
levels  for  the lane/tank fuel scores and the lab/Indolene scores  were
determined.   Adjustment factors were then developed from the ratio of
these mean  values.  A summary of those adjustment factors is shown in
Table 2-1.
                               Table 2-1

     Final Seasonal Fuel/Temperature Adjustments Used for MOBILE5a
   (Ratio of Lab/Indolene IM240 Scores to Lane/Tank Fuel IM240 Scores)

              Emitter          Seasonal Adjustment Factor
   Pollutant  Group     Mar-Apr   May-Jun   Jul-Sep   Oct-Feb
   ---------  -------   -------   -------   -------   -------
   HC         Normal     0.766     0.884     0.823     0.880
   HC         High       0.851     0.940     0.935     1.137
   CO         Normal     1.072     1.007     0.792     1.036
   CO         High       0.934     1.038     0.880     1.074
   NOx        Normal     0.809     0.825     0.913     0.862
   NOx        High       0.784     0.736     0.669     0.826
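
As a sketch of how the Table 2-1 factors might be applied to a single lane test, the Python below multiplies each lane/tank fuel IM240 score by the factor for its season and emitter group. The factor values are taken from Table 2-1; the function names, the month-to-season mapping for individual months, and the reading that HC and CO share one joint HC/CO emitter group are assumptions made for illustration.

```python
SEASONS = ("Mar-Apr", "May-Jun", "Jul-Sep", "Oct-Feb")

# (pollutant, emitter group) -> factors in SEASONS order, from Table 2-1.
FACTORS = {
    ("HC", "normal"): (0.766, 0.884, 0.823, 0.880),
    ("HC", "high"): (0.851, 0.940, 0.935, 1.137),
    ("CO", "normal"): (1.072, 1.007, 0.792, 1.036),
    ("CO", "high"): (0.934, 1.038, 0.880, 1.074),
    ("NOx", "normal"): (0.809, 0.825, 0.913, 0.862),
    ("NOx", "high"): (0.784, 0.736, 0.669, 0.826),
}


def season_index(month):
    """Map a calendar month (1-12) to its position in SEASONS."""
    if not 1 <= month <= 12:
        raise ValueError("month must be 1-12")
    if month in (3, 4):
        return 0
    if month in (5, 6):
        return 1
    if month in (7, 8, 9):
        return 2
    return 3  # October through February


def lane_to_lab(hc, co, nox, month):
    """Convert lane/tank fuel IM240 scores (g/mi) to a lab/Indolene basis."""
    i = season_index(month)
    hcco_group = "normal" if hc <= 1.64 and co <= 13.6 else "high"
    nox_group = "normal" if nox <= 2.0 else "high"
    return (hc * FACTORS[("HC", hcco_group)][i],
            co * FACTORS[("CO", hcco_group)][i],
            nox * FACTORS[("NOx", nox_group)][i])


lab_normal = lane_to_lab(1.0, 10.0, 1.0, month=7)
lab_high = lane_to_lab(2.0, 10.0, 2.5, month=1)
print(lab_normal, lab_high)
```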

-------
 IM240-to-FTP Correlations

 Once the Hammond lane IM240 data were adjusted to a  lab/Indolene basis,
 correlation equations relating the IM240 to the FTP  were applied to the
 data.  The IM240-to-FTP correlations were based on a regression analysis
 of data collected from vehicles tested over the IM240 on Indolene and
 the FTP on Indolene.  (The database used for the correlation analysis
 included vehicles from the Hammond program as well as vehicles tested in
Ann Arbor.)  The regressions were performed according to the following
model-year groups and technology types:

     •  1981-1982,
     •  1981+ open-loop,
     •  1983+ carbureted/closed-loop,
     •  1983+ throttle-body injection/closed-loop, and
     •  1983+ multipoint fuel-injection/closed-loop.

The HC and CO correlations were performed in log space with a cold-start
offset ("X" in the equation below) that varied by technology, while the
NOx correlations were based on a simple linear equation without a cold-
start offset value:

     •  Log10(FTPHC/CO - X) = b + m*Log10(IM240HC/CO)
     •  FTPNOx = b + m*IM240NOx

For cases in which (FTPHC/CO - X) < 0.01, the IM240 score was substituted
for (FTPHC/CO - X).  In this way, errors resulting from taking the
logarithm of a negative number were avoided.  In addition, if the
intercept term was not statistically different from zero at the 95%
confidence level,  the regressions were re-run without an intercept.
Table 2-2 summarizes the results of the correlation analysis.
                                Table 2-2

        IM240-to-FTP Correlation Equations Developed for MOBILE5

              Model Year/
   Pollutant  Technology          N      X         b        m       R2
   ---------  ---------------   ---   -----   -------   ------   -----
   HC         1981-1982          58   0.309    0.1382   1.0715   0.909
   HC         1981+ Open-Loop    24   0.315    0.1448   0.9654   0.879
   HC         1983+ CARB/CL      73   0.195    0.0000   0.9745   0.905
   HC         1983+ TBI/CL      224   0.180    0.0000   0.9840   0.873
   HC         1983+ MPFI/CL     211   0.222    0.0000   0.9520   0.915
   CO         1981-1982          58   2.140    0.0000   1.004    0.943
   CO         1981+ Open-Loop    24   1.640    0.3090   0.851    0.904
   CO         1983+ CARB/CL      73   1.579    0.0000   0.906    0.873
   CO         1983+ TBI/CL      224   1.541   -0.1386   1.072    0.782
   CO         1983+ MPFI/CL     211   1.696    0.0000   0.886    0.780
   NOx        1981-1982          58   NA       0.2534   0.7737   0.825
   NOx        1981+ Open-Loop    24   NA       0.0000   0.9306   0.976
   NOx        1983+ CARB/CL      73   NA       0.0000   0.8925   0.961
   NOx        1983+ TBI/CL      224   NA       0.0767   0.8234   0.901
   NOx        1983+ MPFI/CL     266   NA       0.1250   0.7730   0.825
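
To show how the Table 2-2 equations turn an adjusted IM240 score into a predicted FTP score, the Python sketch below solves the log-space HC/CO equation for FTP and evaluates the linear NOx equation. The coefficients are the 1983+ MPFI/CL rows of Table 2-2; the function names are hypothetical, and the mileage-dependent cold-start offset and random residuals discussed under Correlation Adjustments are deliberately left out.

```python
import math

# 1983+ MPFI/CL coefficients from Table 2-2.
MPFI_HC = {"X": 0.222, "b": 0.0, "m": 0.9520}
MPFI_CO = {"X": 1.696, "b": 0.0, "m": 0.886}
MPFI_NOX = {"b": 0.1250, "m": 0.7730}


def predict_ftp_hc_co(im240, coef):
    """Solve Log10(FTP - X) = b + m*Log10(IM240) for FTP (g/mi)."""
    if im240 <= 0:
        raise ValueError("IM240 score must be positive to take its logarithm")
    return coef["X"] + 10.0 ** (coef["b"] + coef["m"] * math.log10(im240))


def predict_ftp_nox(im240, coef):
    """Evaluate the linear relation FTP = b + m*IM240 (g/mi)."""
    return coef["b"] + coef["m"] * im240


ftp_hc = predict_ftp_hc_co(1.0, MPFI_HC)
ftp_co = predict_ftp_hc_co(10.0, MPFI_CO)
ftp_nox = predict_ftp_nox(1.0, MPFI_NOX)
print(ftp_hc, ftp_co, ftp_nox)
```

With an IM240 HC score of 1.0 g/mi the logarithm term vanishes, so the predicted FTP HC is simply 1.0 plus the cold-start offset X.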

-------
Correlation Adjustments

When the correlation equations were applied to the lane IM240 scores
(which had been corrected to a lab/Indolene basis), two additional
adjustments were made.  First, the cold-start offset was assumed to be a
function of vehicle odometer (although the correlations were performed
with a constant X value); second, regression residuals were randomly
applied to each data point.

Cold-Start Offset - The  cold-start offset  (X)  values used in the above
correlation equations were developed, by technology group, from the mean
value of  the  difference  between the  FTP and the IM240 based on normal
emitters with FTP values greater than the IM240 (i.e., the value of
(FTP - IM240) was determined for each normal emitter, and the mean of
the positive results was used as X).  When the correlation equations
were  applied  to  the IM240  data,  the  value of X was adjusted to account
for the effects  of aging and  mileage.   Development of this adjustment
for 1983+ model  years is described below.   (A  slightly  different
procedure was used for 1981-1982 model-year vehicles.)

The value of  X in the correlation equations reflects the cold-start
offset  at the mean mileage of the correlation  sample.   At mileages  below
this mean, it follows that X  should  be  decreased by some amount to
account for the  fact  that  the catalyst  has  been aged less and  is
expected  to be more active.   (Alternatively, X should be increased  at
mileages  above the mean.)  Thus,  the cold-start offset  is actually  X
plus an increment that is a function of vehicle odometer, i.e.,

     X-Offset Function = f(x) = X + f(Odometer)


EPA has defined f(Odometer) in the above equation to be "the difference
of the model year means regression for normal emitters and a 'New' line
created by connecting a point on the model year means regression line at
the mean mileage of the correlation sample with the zero mile level used
in MOBILE4.1."  The X-offset function is therefore:

     f(x) = X + ZML_New - ZML_Means + ODOM*(DET_New - DET_Means)

where ZML and DET are the zero-mile level and slope (deterioration rate)
of the 'New' line and of the model year means regression line, and ODOM
is the vehicle odometer reading.  A plot of the two lines described above
for HC from multipoint fuel-injected vehicles is shown in Figure 2-2
(X = 0.222 g/mi for HC in that group).
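
The construction of the 'New' line and the resulting X-offset function can be sketched in Python as follows. The zero-mile levels (0.269 and 0.308 g/mi) are the values shown in Figure 2-2 and X = 0.222 g/mi is the HC value for 1983+ MPFI vehicles; the slope of the model year means regression line and the mean mileage of the correlation sample are hypothetical placeholders, because the report text does not state them. Odometer is expressed in units of 10,000 miles.

```python
def new_line_slope(zml_new, zml_means, det_means, mean_odom):
    """Slope of the 'New' line joining (0, zml_new) to the means line at mean_odom."""
    value_at_mean = zml_means + det_means * mean_odom
    return (value_at_mean - zml_new) / mean_odom


def cold_start_offset(x, odom, zml_new, zml_means, det_new, det_means):
    """X-offset function: X plus ('New' line minus model year means line)."""
    return x + (zml_new - zml_means) + odom * (det_new - det_means)


X_HC = 0.222          # g/mi, 1983+ MPFI/CL (Table 2-2)
ZML_NEW = 0.269       # g/mi, MOBILE4.1 zero-mile level (Figure 2-2)
ZML_MEANS = 0.308     # g/mi, means regression intercept (Figure 2-2)
DET_MEANS = 0.02      # g/mi per 10,000 miles -- hypothetical
MEAN_ODOM = 5.0       # 10,000 miles -- hypothetical

det_new = new_line_slope(ZML_NEW, ZML_MEANS, DET_MEANS, MEAN_ODOM)
offset_at_zero = cold_start_offset(X_HC, 0.0, ZML_NEW, ZML_MEANS,
                                   det_new, DET_MEANS)
offset_at_mean = cold_start_offset(X_HC, MEAN_ODOM, ZML_NEW, ZML_MEANS,
                                   det_new, DET_MEANS)
print(offset_at_zero, offset_at_mean)
```

Because the two lines intersect at the mean mileage by construction, the offset equals X there regardless of the placeholder values; it is smaller than X below the mean mileage and larger above it.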

Regression Residuals - Another adjustment made during the application of
the correlation equations was the addition of randomized regression
residuals, i.e.,

     Log10(FTPHC/CO - X) = b + m*Log10(IM240HC/CO) + res
     FTPNOx = b + m*IM240NOx + res

where "res" represents regression residuals from the correlation sample.

-------
                                  Figure 2-2

                Change to the HC Cold-Start Offset as a Function
                      of Mileage for 1983+ MPFI Vehicles

   [Plot of the model year means regression line and the 'New' line
   based on the MOBILE4.1 ZML versus odometer (10,000 miles), where
   cold-start offset = X + f(Odometer).  At 0 miles, f(Odometer) =
   0.269 - 0.308 = -0.039 g/mi.  Below the mean mileage the cold-start
   offset is decreased relative to X; above the mean mileage it is
   increased relative to X.]

According to EPA, adding  the  residuals randomly to the FTP emission
levels predicted by the correlation equations  attempts to restore  a
distribution of predicted FTP values for a given IM240 score.
Otherwise,  there will be  a single predicted FTP value for each  IM240
score.  A distribution of predicted FTP scores and emission levels is
important for some analyses,  such as the determination of I/M credits.
For example,  if residuals were not applied, 100% of the FTP emissions
from a certain emitter group  could be identified on the basis of the
IM240 score.
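
The effect of randomly applied residuals can be illustrated with a short Python sketch. It assumes, for illustration only, that a residual is drawn with equal probability from the set of residuals in the correlation sample and added in log space before the HC/CO equation is solved for FTP; the residual values listed are invented, and the report does not describe the sampling mechanics.

```python
import math
import random


def predict_ftp_with_residual(im240, x, b, m, residuals, rng):
    """Predict FTP HC or CO (g/mi) with one randomly drawn log-space residual."""
    res = rng.choice(residuals)
    return x + 10.0 ** (b + m * math.log10(im240) + res)


# Hypothetical residuals standing in for those of the correlation sample.
sample_residuals = [-0.15, -0.05, 0.0, 0.05, 0.15]
rng = random.Random(1995)  # seeded so the sketch is repeatable

# Twenty vehicles with the same IM240 HC score (1983+ MPFI/CL coefficients).
predicted = [predict_ftp_with_residual(1.0, 0.222, 0.0, 0.9520,
                                       sample_residuals, rng)
             for _ in range(20)]
print(min(predicted), max(predicted))
```

Vehicles with identical IM240 scores now receive a spread of predicted FTP values instead of a single value.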

TECH5 Inputs

Once the Hammond data were converted to predicted FTP scores, the
results were used to develop inputs to the TECH5 model (i.e.,
emitter-category emission rates and growth functions).  The following
emitter categories were used in TECH5 for HC and CO emissions:

     •  Normal HC/CO - HC ≤ 0.82 g/mi and CO ≤ 10.2 g/mi,
     •  High HC/CO - HC > 0.82 g/mi or CO > 10.2 g/mi,
     •  Very High HC/CO - HC > 1.64 g/mi or CO > 13.6 g/mi, and
     •  Super HC/CO - HC > 10.0 g/mi or CO > 150.0 g/mi.

NOx emissions were analyzed separately from HC and CO, with only two
emitter categories being defined:  normals (≤ 2.0 g/mi) and highs
(> 2.0 g/mi).
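
A minimal Python sketch of these category definitions follows. Because the thresholds as listed overlap (a super also exceeds the high and very high cutpoints), the sketch assumes each vehicle is assigned to the most severe category whose threshold it exceeds; that ordering and the function names are assumptions, not a description of the TECH5 code.

```python
def classify_hc_co(hc, co):
    """Assign an HC/CO emitter category from predicted FTP scores (g/mi)."""
    if hc > 10.0 or co > 150.0:
        return "super"
    if hc > 1.64 or co > 13.6:
        return "very high"
    if hc > 0.82 or co > 10.2:
        return "high"
    return "normal"


def classify_nox(nox):
    """Assign a NOx emitter category from the predicted FTP score (g/mi)."""
    return "high" if nox > 2.0 else "normal"


print(classify_hc_co(0.5, 14.0), classify_nox(2.1))
```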

-------
 The  data were  also  segregated by  the  following  technology groups:

      •  open-loop,
      •  carbureted/closed-loop,
      •  throttle-body  injection  (TBI)/closed-loop,  and
      •  multipoint  fuel-injection (MPFI)/closed-loop.


 Finally, emission rates were determined  separately  for  1981-1982 model
 year vehicles  and 1983+ model year vehicles.

 HC/CO Emission Rates - For HC and CO, the emitter category emission
 rates (i.e., zero-mile level (ZM) and deterioration rates (DRs)) were
 constructed as outlined below.

      1. MOBILE4.1 ZMs were used for 1981-82 normals.
      2. 1983+  DRs were used for 1981-82  normals, highs, and very highs.
      3. Emission rates of normals were capped at the same rate  for
        1981-82 and 1983+ groups.
      4. Normal caps were set at the maximum of  the 1981-82 or 1983+
        100,000-mile levels calculated from the 1981-82 and 1983+ ZM
        and DR for  normal emitters.
      5. Deterioration rates that  were negative  and without significance
        were assumed to be zero.
      6. Regression  of carburetor  very highs was performed for 1983-1988
        model  years only (although the regression results were  applied
        to all 1983+ carbureted vehicles).   Including 1989 resulted in
        a negative  ZM.
      7.  A covariance analysis was used for fuel-injected very highs
        that resulted in the same DR but different ZM levels for the
        1981-82 group and the 1983+ group.   (This resulted in
        substantially higher HC and CO emission rates from the  1983+
        group  compared to the 1981-82 group.)
      8.  All model years were combined for supers.


NOx Emission Rates  - The following procedure was used to develop NOx
emitter category emission rates:

      1.  1981-82 model year normals used  the MOBILE4.1 ZM and the DR was
        determined  from the mean  emission level and mileage of  the
        Hammond sample.
      2.  1983+  model year normals used a  covariance analysis that forced
        the deterioration rates to be equal for vehicles certified to
        1.0 and 0.7 g/mi NOx.  (This resulted in different zero-mile
        levels.)
      3.  High NOx emitters used DRs from  the normal NOx  emitters, and
        the ZM levels were back-calculated from the mean emission level
        and mileage of the Hammond sample.


Growth Functions - The growth functions assigned to the emitter
categories are as important as the emission level of each category.
For MOBILE5, EPA wanted to base emission control system deterioration on
both vehicle age and mileage.  This was done by using data from 1987 and

-------
later model years to establish  the growth rate of non-normals  (i.e.,
highs + very highs + supers)  for mileages less than  50,000.  For
mileages above 50,000, data  from the  1981-86 model years were  used  for
the TBI and carbureted technology groups, while data  from  1984-86 model
years were used for MPFI vehicles.   (EPA judged that  pre-1984  MPFI
represented "prototype" technology.)

The method used to establish  the emitter-category growth rates was  based
on first developing growth rates for  the following emitter groups:

     •  supers,
     •  very highs + supers, and
     •  highs + very highs + supers.

Once these were established,  individual emitter-category growth rates
were determined by subtraction.

The analytical technique used to develop the growth functions for each
of the above groups is best explained with an example.  For the MPFI
very highs + supers group, the following process was used.  First, the
<50,000-mile growth rate was established by determining the fraction of
very highs + supers in all 1987+ MPFI data.  In the Hammond sample,
there were 155 very highs + supers out of 1,716 total vehicles in this
group (i.e., 9.03%).  This fraction was then divided by the average
mileage of the group (28,182) to obtain a growth rate of 0.03205/10,000
miles.  The growth rate beyond 50,000 miles was calculated by first
determining the fraction of very highs + supers in the 1984-1986 model
year group (138/460, or 30.0%) and the average mileage of that group
(68,464).  The second growth rate was then calculated by linear
extrapolation of a line connecting the fraction of very highs + supers
at 50,000 miles (i.e., 5*0.03205, or 16.0%) and the point established
from the >50,000 1984-86 group (i.e., 0.300 at 68,464 miles).  This
resulted in a >50,000-mile growth rate of 0.07568/10,000 miles.
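
The arithmetic in this example can be checked with a short Python sketch. The vehicle counts and average mileages are those quoted above; the function and variable names are hypothetical. Growth rates are expressed as a fleet fraction per 10,000 miles.

```python
def two_segment_growth_rates(n_low, total_low, mean_miles_low,
                             n_high, total_high, mean_miles_high,
                             break_miles=50_000):
    """Return (rate below break, rate above break), each per 10,000 miles."""
    # Segment 1: line from the origin through (mean mileage, fraction).
    frac_low = n_low / total_low
    rate_low = frac_low / (mean_miles_low / 10_000)
    # Fraction reached at the 50,000-mile break point.
    frac_at_break = rate_low * (break_miles / 10_000)
    # Segment 2: line from the break point through the older group's point.
    frac_high = n_high / total_high
    rate_high = (frac_high - frac_at_break) / (
        (mean_miles_high - break_miles) / 10_000)
    return rate_low, rate_high


# MPFI very highs + supers: 1987+ data, then 1984-1986 model years.
rate_low, rate_high = two_segment_growth_rates(155, 1_716, 28_182,
                                               138, 460, 68_464)
print(f"<50,000 miles: {rate_low:.5f} per 10,000 miles")
print(f">50,000 miles: {rate_high:.5f} per 10,000 miles")
```

Carrying the unrounded first-segment rate through the calculation gives values that agree with the 0.03205 and 0.07568 per 10,000 miles reported above.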

-------
        3.  ALTERNATIVE METHODS  FOR USING  IM240  DATA TO
                   DEVELOP  BASIC  EMISSION RATES

In developing alternatives to the MOBILE5 methodology for using IM240
data to generate basic emission rate equations, the following approach
was used.  First, members of an ad hoc Emission Inventory Review Group
(EIRG)* were asked to provide their thoughts on the strengths and
weaknesses of the MOBILE5 methodology.  In addition, they were asked to
suggest alternatives to that methodology.  Their responses were then
used to formulate an informal survey in which the MOBILE5 methods and
proposed alternatives were ranked.  The results of that survey helped
focus the development of the recommended alternatives presented in this
section.

The discussion below first presents a brief summary of the survey
responses.  That is followed by a description of each of the adjustments
and calculations performed in the MOBILE5 approach, with a summary of
the concerns and limitations expressed by the EIRG.  Recommended
alternatives conclude each discussion point.  This discussion is
structured in two parts:  (1) the IM240-to-FTP conversion procedure, and
(2) inputs to the TECH5 model.  Although the Scope of Work for this
project called for the development of a single methodology for
estimating MOBILE basic emission rates from IM240 data,  in some cases it
is impossible to recommend a single method without first reviewing the
results of several alternatives.  Thus, some portions of the following
discussion contain more than one recommended approach.


Survey Summary

As described above, a questionnaire was circulated to members of the
EIRG that summarized the methods used to develop basic emission rate
equations for MOBILE5 and asked for a listing of strengths, weaknesses,
and alternatives to each specific adjustment that was performed.  (For a
summary of the methods used for MOBILE5, refer to Section 2 of this
report.)  The responses to that questionnaire, which are summarized in
Appendix A, helped form the basis of a survey that was distributed to
the EIRG.  In that survey, participants were asked to rank the
importance of specific data adjustments and alternative methods that
could be used in the development of basic emission rate equations from
IM240 data.  The purpose of the survey was to provide a more objective
ranking of the importance of adjustments and alternative methods, which
would then help focus efforts to expand on some of the EIRG's
recommendations.

* The EIRG was made up of individuals responsible for emission factor
development from EPA's Office of Mobile Sources, the California Air
Resources Board's Mobile Source Division, EEA, AIR, and Sierra.

Although surveys were not filled out by the entire EIRG, responses were
received from six participants.  A summary of those responses, with an
average score for each question/recommendation, is contained in
Table 3-1.  In general, the results indicate that alternatives to the
methods used to develop MOBILE5 basic emission rate equations from the
Hammond IM240 data are preferred.


IM240-to-FTP  Conversion Procedure

The first step in the development of basic emission rates for MOBILE5
was to convert the lane IM240 data collected in Hammond to an FTP basis.
That process consisted of several steps, including adjustments for a
non-representative mix of foreign and domestic vehicles, corrections for
suspicious or missing mileages,  and corrections to get the lane IM240
data (collected with tank fuel)  on a laboratory/Indolene basis (these
are thought of as temperature and fuel corrections).   This last step was
necessary because the IM240-to-FTP correlations were based on laboratory
IM240 and FTP tests with a standard fuel (Indolene)  at a standard
temperature.  Finally,  correlation equations were applied to the lane
data to generate simulated FTP scores for the entire lane IM240
database.

Below is a brief description of the methods used in the IM240-to-FTP
conversion process for MOBILE5.  Following the description of each
adjustment/method is a summary of the concerns expressed by the EIRG and
recommended alternatives.

Database Adjustments - Foreign Manufacturers - Because the vehicles
tested in Hammond did not accurately reflect the national average
fraction of foreign vehicles,  each foreign vehicle in the database was
counted two to four times.

Limitations and Concerns with Current Approach - The EIRG generally
agreed that sampling biases should be accounted for if it can be
established that durability differences are significant.  The
foreign/domestic split is only one possible bias,  and it is possible
that durability differences among engine families or among manufacturers
are equally important.

Recommended Alternative - The survey respondents did not reach a
consensus on how to proceed with a sample selection bias correction.
However, it is clear that the first step is to determine if durability
differences are significant.  That can be done in a number of different
ways.  For example, Table 3-2 presents the distribution of emitter
categories (i.e., normals, highs, very highs, and supers) with and
without the foreign vehicle adjustment used in MOBILE5.  The table
indicates that the impact of not making this adjustment is most
pronounced for carbureted vehicles,  with only slight changes occurring
in the distribution of emitter categories for the fuel-injected
technologies.

                                Table 3-1

            Summary of IM240/Basic Emission Rate Survey Scores

                                                                             Respondent Scores
Adjustment/Methodology                                                DJB  RAR   JL  LSC  ROD  PLH    Avg
(- = no score given)

Database Adjustments - Weighting of Foreign Vehicles
  Is this adjustment needed?                                            3    2    3    3    5    3    3.2
  If so, what is the best approach?
    Current method                                                      3    2    -    1    1    2    1.8
    Manufacturer/engine family basis                                    3    4    -    4    5    3    3.8
    Foreign/domestic with more emphasis on tech type                    5    3    4    4    4    3    3.8

Database Adjustments - Missing or Suspicious Mileage
  Is this adjustment needed?                                            4    5    4    5    4    5    4.5
  If so, what is the best approach?
    Current method                                                      3    2    -    1    1    2    1.8
    Change to an age-based analysis                                     4    3    -    4    4    2    3.4
    Assign sample average mileage                                       3    4    -    1    4    2    2.8
    Assign average mileage based on vehicle age                         2    5    4    2    5    5    3.8

Database Adjustments - Seasonal Outliers
  Is this adjustment needed?                                            4    -    2    3    5    4    3.6
  If so, what is the best approach?
    Current method                                                      3    -    -    2    2    2    2.3
    Limit use of data to FTP temperature ranges                         4    5    4    3    1    4    3.5
    Establish a temperature range for each RVP "season"                 3    -    -    3    5    5    4.0
    Temperature correct the outliers                                    1    -    -    2    5    2    2.5
    Determine if purge is really higher during those periods            2    -    4    4    3    4    3.4
    Only reject data on basis of statistical/engr analyses              2    -    4    4    5    2    3.4

Fuel/Temperature Adjustments
  Is this adjustment needed?                                            -    4    1    3    5    5    3.6
  If so, what is the best approach?
    Current method                                                      3    2    -    2    1    2    2.0
    Choose records that are similar to FTP conditions                   4    3    -    3    1    4    3.0
    Multivariate analysis of differences                                3    3    -    4    5    3    3.6
    Quantify temperature effects independently of fuel                  3    5    5    3    3    4    3.8
    Statistical analysis of all external variables                      3    3    -    4    4    4    3.6
    Correlate FTP directly with lane IM240s                             2    3    -    3    1    3    2.4
    MOBILE temp/RVP factors to adjust lane IM240s                       2    1    -    3    4    2    2.4
    Split data into temperature regimes                                 4    1    -    3    4    2    2.8

IM240-to-FTP Correlations
  What is the best approach for correlating IM240 with FTP?
    Current method                                                      3    1    -    2    1    1    1.6
    Multiple correlations by emitter group                              4    4    -    4    3    3    3.6
    Explore different equational forms, choose best stats               3    5    -    4    3    4    3.8
    Base choice of equational form on most random variance              3    5    -    2    2    3    3.0
    Regress IM240 against individual FTP bags                           4    5    -    4    4    3    4.0
    Eliminate cold-start offset; use IM240 for FTP bags 2/3             5    5    4    3    5    5    4.5

Correlation Adjustments - Cold-Start Offset
  What is the best way of determining a cold-start offset?
    Current method                                                      2    1    -    2    0    1    1.2
    Regress bag 1 versus IM240 and use if statistically significant     2    5    -    3    3    3    3.2
    Use IM240 for non-start emissions; FTP for cold start               4    3    -    3    4    5    3.8
    Use IM240 for bag 2/3; bag 1 vs bag 2/3 from FTP data               3    3    4    3    5    4    3.7

Correlation Adjustments - Regression Residuals
  Should regression residuals be used in IM240/FTP regressions?         4    4    4    2    2    3    3.2
  If so, what is the best approach?
    Add in randomized residuals when applying correlation eqn           3    -    -    2    1    3    2.3
    Develop a probability distribution                                  4    -    -    2    2    5    3.3
    Use a log-normal or Weibull distribution of residuals               3    -    4    2    3    4    3.2

TECH5 Inputs - Emission Rates
  Should it be assumed that DRs are the same for different MYs?         5    2    3    1    1    1    2.2
  Is there any basis for using MOBILE4.1 rates in this analysis?        4    1    2    1    1    1    1.7
  Do the MY breakpoints adequately reflect developing versus
    mature technology?                                                  4    1    2    3    1    1    2.0

TECH5 Inputs - Growth Functions
  What is the best way to develop emitter growth functions?
    Current method                                                      2    1    -    1    1    1    1.2
    Add a third linear growth rate beyond 100,000 miles                 4    3    -    3    1    3    2.8
    Analyze the data in 10,000-mile bins                                4    5    5    4    5    5    4.7

                                Table 3-2

 Effect of Multiple Counting of Foreign Vehicles on the Distribution of
  Emitter Categories by Technology Type for 1983 and Later Model Years

                        Total                 Emitter Category
                        Data
Technology              Points    Normal      High     V. High     Super

Multiple Foreigns(a)
  MPFI/CL                2208     0.788      0.077      0.131      0.004
  TBI/CL                 1991     0.718      0.141      0.135      0.007
  CARB/CL                1654     0.540      0.149      0.303      0.008
  OPEN LOOP               252     0.214      0.210      0.567      0.008

Single Foreigns
  MPFI/CL                1742     0.776      0.082      0.138      0.005
  TBI/CL                 1873     0.722      0.138      0.133      0.007
  CARB/CL                1344     0.503      0.158      0.331      0.008
  OPEN LOOP               196     0.189      0.194      0.607      0.010

(a) Represents data used for MOBILE5.

It is recommended that a similar analysis (or some type of analysis of
variance) be performed on a manufacturer-specific basis to determine if
durability differences exist.  Based on the data presented in Table 3-2,
it appears that this would be most important for carbureted vehicles.
It would also be useful to determine if manufacturer-specific (or
foreign/domestic) differences are more prevalent as a function of model
year (e.g., do the early 1980 model year vehicles exhibit a greater
emissions difference than late 1980 model year vehicles).   If those
differences do exist, then the data should be weighted accordingly.

Database Adjustments - Missing or Suspicious Mileage - A number of
vehicles in the Hammond database had '0'  or missing mileage and were
deleted.  In addition, vehicles that were coded as having an odometer
reading > 300,000 miles were deleted.

Limitations and Concerns with Current Approach - The primary concern
with deleting these data points is that valid data are removed.   This
may be a particular problem for high-mileage vehicles which are badly
needed in the database.

Recommended Alternative - There was fairly strong sentiment that
corrections for missing and suspicious mileage should be made.  In terms
of suspicious mileages, we recommend running an odometer "cleaning"
routine on all vehicles to identify vehicles with unusually high or low
mileage accumulation rates.  That can be done by first estimating the
age of the vehicle at the time it was tested based on the difference
between the test date and the model year.*  The age at the time of
testing  can  be used  to flag  vehicles with mileage accumulation  rates
below  3,000  miles per  year or above 30,000 miles per year  for closer
inspection.   (Clearly,  other mileage accumulation cutpoints  could be
used in  this type of analysis.)
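
The odometer "cleaning" routine described above can be sketched as follows. This is a minimal illustration, not the routine used for MOBILE5: the function names and the returned labels are hypothetical, the 3,000 and 30,000 miles-per-year cutpoints are the ones quoted in the text, and the age estimate follows the model-year convention stated in the accompanying footnote (in-service date of April 1 of the model year, or midway between October 1 and the test date for vehicles tested in their initial model year).

```python
from datetime import date

LOW_RATE = 3_000    # miles per year, lower cutpoint quoted in the text
HIGH_RATE = 30_000  # miles per year, upper cutpoint quoted in the text


def vehicle_age_years(model_year, test_date):
    """Estimate vehicle age at test time from the model-year convention."""
    my_start = date(model_year - 1, 10, 1)   # model year starts October 1
    my_end = date(model_year, 9, 30)         # and ends September 30
    if my_start <= test_date <= my_end:
        # Tested during its initial model year: assume it entered service
        # midway between the start of the model year and the test date.
        days = (test_date - my_start).days / 2
    else:
        # Otherwise assume service began April 1 of the model year.
        days = (test_date - date(model_year, 4, 1)).days
    return days / 365.25


def flag_mileage(model_year, test_date, odometer):
    """Return 'missing', 'review', 'low', 'high', or 'ok' (hypothetical labels)."""
    if odometer is None or odometer <= 0:
        return "missing"
    age = vehicle_age_years(model_year, test_date)
    if age <= 0:
        return "review"   # test date precedes the assumed in-service date
    rate = odometer / age  # average miles accumulated per year
    if rate < LOW_RATE:
        return "low"
    if rate > HIGH_RATE:
        return "high"
    return "ok"
```

Records flagged "low" or "high" would be set aside for closer inspection rather than deleted outright, consistent with the concern about losing valid high-mileage data.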

In  the Hammond database, there were a  significant number  (about 10%) of
vehicles with missing  odometer readings.  Thus, some method  to  estimate
the mileage  of those vehicles is recommended.  The general consensus of
the EIRG was to assign those vehicles  the average mileage  of the
remaining vehicles in  the  database based on the age of the vehicle at
the time it  was tested.
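
A sketch of the age-based mileage assignment follows. The record layout (a list of dictionaries with `age` and `odometer` keys) and the function name are assumptions made for illustration; vehicle age is taken as a whole number of years so that vehicles can be grouped.

```python
from collections import defaultdict


def fill_missing_mileage(records):
    """Assign the age-specific average odometer to records that lack one.

    records: list of dicts with 'age' (whole years) and 'odometer'
    (miles, or None when missing). Returns a new list; the input is
    left unchanged.
    """
    totals = defaultdict(lambda: [0.0, 0])   # age -> [mileage sum, count]
    for r in records:
        if r["odometer"] is not None:
            totals[r["age"]][0] += r["odometer"]
            totals[r["age"]][1] += 1

    filled = []
    for r in records:
        odo = r["odometer"]
        if odo is None and r["age"] in totals:
            mileage_sum, count = totals[r["age"]]
            odo = mileage_sum / count
        # If no vehicle of that age has a valid reading, the value stays None.
        filled.append({**r, "odometer": odo})
    return filled
```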

A broader issue related to this adjustment is whether to develop
emission factors based on  vehicle age  rather than accumulated mileage.
Both accumulated mileage and vehicle age play a role in emissions
deterioration and emitter-category growth functions.   (Emitter-category
growth functions are discussed in detail below under "TECH5 Inputs.")
To the extent that some deterioration  of emission control  systems is due
to weathering effects,  emitter-category growth would be best
characterized by vehicle age.  To the  extent that deterioration is due
to vehicle use, emitter-category growth would be best characterized by
odometer reading.  As  currently structured, TECH5 and MOBILE5 use a
fixed  relationship between vehicle age and odometer so that only one of
these  variables can be used  in determining the population of the various
emitter categories.  The average relationship between vehicle age and
odometer reading shows  that  the average vehicle is driven  fewer miles
per year as  it ages.    Consequently, a nonlinear relationship between
either of these variables  and emitter-category population sizes
represents the combined effects of both.  As described below, this is
the approach  that is recommended for the development of the emitter-
category growth functions.

If age-based  versus odometer-based emission deterioration is still a
concern,  the  analysis  could be limited to vehicles that are within a
certain fraction (or standard deviation) of the mean mileage for each
vehicle age  in the dataset.  This is a reasonably easy adjustment to
perform,  but  it has the disadvantage of eliminating real,  valid data
points.

* Model years are assumed to run from October 1 of the previous year to
September 30 of the model year.  The midpoint date of April 1 in the
model year is assumed as the initial operating date of the vehicle.  In
cases where the vehicle is tested during its initial model year, it is
assumed to have been placed in operation midway between the start of the
model year and the test date.

Lane/Tank Fuel-to-Lab/Indolene Adjustments - Because it is desirable to
develop the IM240-to-FTP correlation equations with vehicles operated in
a lab on Indolene, a means to convert the lane/tank fuel IM240 scores to
a lab/Indolene basis is needed.  It is thought that this adjustment is
primarily a fuel and temperature correction that accounts for the
differences between the lane and the lab.  For MOBILE5, this adjustment
was developed from a subset of Hammond vehicles that were tested at the
lane on tank fuel and in the lab on Indolene.  Adjustment factors were
developed by season (i.e., March-April, May-June, July-September, and
October-February) and emitter category.

 In  addition  to  the general  fuel/temperature  correction,  data  collected
 in  March and April on 14 test dates when the ambient  temperature was
 25°F  or more above the monthly average were  deleted because many of
 those vehicles  were statistical  outliers.   (Excessive purge was thought
 to  be influencing  the IM240 results.)

 Limitations  and Concerns with Current Approach  - The  EIRG's major
 complaint with  the lane-to-lab adjustment utilized in the  development of
 MOBILE5 emission factors is that it may  not  have accounted for all
 external variables affecting  emissions  (e.g., preconditioning effects).
 In  terms of  the deletion of data points  collected  on  the aforementioned
 test  dates,  there  was concern that  true  high emitters were deleted.

 Recommended  Alternative  - Although  a variety of alternatives  were
 offered by the  EIRG,  none stood  out as being vastly superior  to the
 others.   There  was general agreement that temperature effects should be
 quantified separately from fuel  effects, but the mechanism to do that is
 unclear given that fuel  samples  were not taken  in  the Hammond program.
 The easiest  and most  straightforward way is  to  consider only  data  that
 were  collected  within the FTP temperature range.   However, this may
 vastly  reduce the  number of valid records.   As  a first cut on this
 adjustment,  the number of tests  performed outside  of  the FTP  temperature
 range should be determined from  the  Hammond  database.  If  too many tests
 are discarded,  this approach  would  not be practicable.   (However, even if
 the Hammond  IM240  database, which contains approximately 16,000 records,
 is severely  diminished,  the large volume of  data available from
 operating  IM240  programs could be used to fill  the void.)

Another  approach that has merit  is  to establish a  different temperature
 range for  each  RVP  "season" that  would result in similar vapor
generation rates.  This  would minimize the effect  that excessive purge
might have on IM240 emission  rates.  In addition,   under hot stabilized
operation  (which,  ideally, is  the mode the vehicle is in during the
 IM240 test),   the temperature  impact  on emissions is not significant.
Establishing  temperature ranges  could be accomplished by analyzing fuel
samples  from  a  number of vehicles each week  or month, and  recording the
 test temperature for  each vehicle and the diurnal  temperature profile on
each test  date.  For  the Hammond  database, it would be worthwhile  to
determine  the availability of  fuel volatility statistics for  that  area
during  the test  program.  This information is likely  to be available for
the summer (e.g.,  through RVP compliance testing),  but the winter months
may pose difficulties.   A survey  of  refiners  supplying fuel to that area
may provide winter  fuel  specifications.

To serve as a check on possible preconditioning or excessive purge
problems,  a  comparison of the  IM240  bag 1 and bag  2 scores needs to be
performed  on  the lane data.   If  the  ratio (or difference)  of bag 1 and
bag 2 is outside of predetermined limits, that record should be
discarded.  A data set that may be useful to determine what those  limits
should be  is  the ASM/IM240 comparison test program conducted by EPA in
Phoenix.   In  that  program, roughly half of the vehicles were  tested with
 the  ASM first,  while the other  half  were tested with the IM240 first.
 The  IM240  scores  that were  collected immediately following the ASM test
 should  represent  a well-preconditioned subset  of vehicles.

 Correlation Equations - Once the Hammond lane IM240 data were adjusted
 to a lab/Indolene basis,  correlation equations relating the IM240  to the
 FTP  were applied  to the  data.   The IM240-to-FTP correlations were  based
 on a regression analysis of data  collected from vehicles tested over the
 IM240 on Indolene and the FTP on  Indolene.   (The database used for the
 correlation  analysis included vehicles from the Hammond program as well
 as vehicles  tested in Ann Arbor.)  The regressions were performed
 according  to the  following  model  year groups and technology types:

      •   1981-1982,
      •   1981+ open-loop,
      •   1983+ carbureted/closed-loop,
      •   1983+ throttle-body injection/closed-loop, and
      •   1983+ multipoint fuel-injection/closed-loop.


 The  HC  and CO correlations  were performed  in log space  with a cold-start
 offset  ("X"  in  the  equation below) that  varied by technology,  while  the
 NOx  correlations  were based on a  simple  linear equation without  a  cold-
 start offset value:

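      •   $\log_{10}(\mathrm{FTP}_{\mathrm{HC/CO}} - X) = b + m \cdot \log_{10}(\mathrm{IM240}_{\mathrm{HC/CO}}) + \mathrm{res}$
      •   $\mathrm{FTP}_{\mathrm{NOx}} = b + m \cdot \mathrm{IM240}_{\mathrm{NOx}} + \mathrm{res}$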


 For cases in which $(\mathrm{FTP}_{\mathrm{HC/CO}} - X) < 0.01$, the IM240 score was substituted
 for $(\mathrm{FTP}_{\mathrm{HC/CO}} - X)$.  In this way, errors resulting from taking the
 logarithm  of  a  negative  number were  avoided.   The "res"  term in  the
 equation above  represents regression  residuals  from  the correlation
 sample.
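
To make the two equational forms concrete, the sketch below applies them to a single IM240 score and shows the substitution rule used when forming the regression target. It is an illustration only: the function names are hypothetical, and any values passed for b, m, X, and res are placeholders, since the fitted MOBILE5 coefficients are not given in this section.

```python
import math


def simulated_ftp_hc_co(im240, b, m, x, res=0.0):
    """Solve log10(FTP - X) = b + m*log10(IM240) + res for FTP (HC or CO)."""
    return x + 10 ** (b + m * math.log10(im240) + res)


def simulated_ftp_nox(im240, b, m, res=0.0):
    """Apply the simple linear form used for NOx (no cold-start offset)."""
    return b + m * im240 + res


def regression_target(ftp, im240, x):
    """Dependent variable used when fitting the HC/CO equation.

    When (FTP - X) falls below 0.01, the IM240 score is substituted for
    (FTP - X) so that the logarithm is always taken of a positive number
    (this assumes the IM240 score itself is positive).
    """
    shifted = ftp - x
    if shifted < 0.01:
        shifted = im240
    return math.log10(shifted)
```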

 Limitations  and Concerns with Current Approach -  The  EIRG expressed  two
primary concerns with the correlation method developed for MOBILE5.
 First,  the use  of the cold  start  offset  ("X" in the  equation above)
 implies  that  the  IM240 can  predict vehicle emissions  during  cold
 operation.    Since the IM240 is a  hot  test, it  should  be used only  to
 estimate running  emissions.   Second,   there was  general  discomfort  with
 the use of the  log-based equation.   In addition,  it was not  clear  to
 some members  of the  EIRG  that adding  residuals  was necessary for
developing basic  emission rate equations  (particularly  since these data
were not used to generate the I/M identification rates used by TECH5 to
develop the I/M credits matrices for MOBILE5).

Recommended  Alternative  - We  recommend that IM240 data  be used only  to
predict hot  stabilized vehicle operation.  That being the case,  there
remains a question of whether the IM240  should  be correlated only  with
bag  2 of the  FTP  or with  a  combination of bag  2  and bag 3.   Because  a
combination  of  bag 2  and bag  3 encompasses a broader  range  of vehicle
operation  (i.e.,  the  speed  during bag  2 never  goes above 35  mph, with
most operation  below  30 mph),  it  is  recommended that  the correlation be
performed between the IM240 and a "hot FTP"  (i.e., bag  2 weighted  52.1%
 and bag  3 weighted  47.9%).  Although bag  3 contains a start, the impact
 of that  is minimal  because  the engine and emission control  system do not
 cool off significantly  during the  10-minute soak between bag 2 and
 bag 3.
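
Written out, the "hot FTP" composite uses the two bag weights quoted above. The bag values in the worked line are hypothetical and serve only to show the arithmetic.

```latex
\mathrm{FTP}_{\mathrm{hot}} = 0.521\,\mathrm{Bag}_2 + 0.479\,\mathrm{Bag}_3
% Hypothetical example: Bag_2 = 0.20 g/mi, Bag_3 = 0.30 g/mi
% FTP_hot = 0.521(0.20) + 0.479(0.30) = 0.1042 + 0.1437 = 0.2479 g/mi
```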

 The IM240/FTP regressions should be developed by exploring  a number of
 different correlation equations and choosing the one(s) that gives the
 best agreement.  (Different  sets of data may have different  equations
 giving the best  agreement.)  One possible form to consider  is the use of
 separate regressions for the normal-emitting vehicles and the high-
 emitting vehicles.  This can be done without the arbitrary  selection of
 a break point by testing all possible data points as a break point.  The
 final break point is selected as the one  that provides the minimum
 error.  In addition to  exploring different functional forms for the
 regression equations, the technology groups chosen for MOBILE5 should be
 reevaluated.
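
The break-point search described above can be sketched as an exhaustive scan: every candidate split of the sorted data is tried, a separate least-squares line is fitted to each side, and the split with the smallest total squared error is kept. This is a minimal illustration under assumed conventions (simple linear fits on each side, at least two points per group); the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (intercept, slope, squared error)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, sse


def best_break_point(pairs, min_points=2):
    """Try every data point as the break between two regressions.

    pairs: list of (im240, ftp) tuples. Returns (total squared error,
    IM240 value at which the upper group starts) for the best split.
    """
    pairs = sorted(pairs)
    best = None
    for k in range(min_points, len(pairs) - min_points + 1):
        low, high = pairs[:k], pairs[k:]
        sse = fit_line(*zip(*low))[2] + fit_line(*zip(*high))[2]
        if best is None or sse < best[0]:
            best = (sse, pairs[k][0])
    return best
```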

 Although it is unnecessary  for the development of basic emission rates,
 it may be desired to add regression residuals to the correlation
 equations to obtain a more  random distribution of predictions.   If this
 adjustment is made, it  should be based on developing a distribution of
 residuals about  the mean.   It is likely that such a distribution would
 take the form of a  log-normal distribution,  with the longer "tail"  being
 above the mean.  For all cases in which residuals are applied,  an
 evaluation of how the application of those residuals changes the overall
 distribution of  emissions must be performed.   If the mean emission
 levels are changed  by adding in residuals, the method used to
 incorporate that effect must be reviewed and modified as necessary.
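
One reason the mean can shift is worth stating explicitly. As an illustration, assume the residuals added in log space are normally distributed with zero mean and standard deviation $\sigma$ (in $\log_{10}$ units); this distributional form is an assumption, not something established from the Hammond data. The multiplicative effect on the predicted emission level then has a mean greater than one:

```latex
\mathrm{res} \sim N(0, \sigma^2)
\quad\Longrightarrow\quad
E\!\left[10^{\mathrm{res}}\right] = \exp\!\left(\frac{(\sigma \ln 10)^2}{2}\right) > 1
% Hypothetical example: sigma = 0.2 gives exp((0.2 x 2.3026)^2 / 2), or about 1.11,
% i.e., an increase of roughly 11% in the mean of (FTP - X).
```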

 A final point related to the correlation analysis is how to account for
 cold start emissions.   It is recommended that a cold start offset (i.e.,
 bag 1 - bag 3)  be developed from FTP data, with different factors being
 developed based on  technology and emitter category.   As discussed below
 (in the section on  off-cycle emissions),  it may be possible to  determine
 cold start emissions from recent testing conducted by CARB on the LA92
 cycle.  The start component of the LA92 is much more representative of
 driving behavior during vehicle start-up than bag 1  of the FTP  because
 it contains a wider range of in-use operation.   Alternatively,  a
 separate correlation between bag 1 and bag 2/bag 3 of the FTP could be
 developed using only FTP data.   Since bag 2/bag 3 estimates are
 available through the IM240/FTP correlation outlined above,  it  would
 then be possible to determine bag 1 emissions.   This approach has the
 advantage of being  computationally simple, but it may not adequately
 describe emissions  during vehicle start-up.


TECH5 Inputs

Once the IM240 data were converted to an FTP basis, the FTP-based
results were used to develop inputs to the TECH5 model.  There are two
primary inputs to the TECH model that drive the computation of basic
(i.e., non-I/M) emission rates for use in MOBILE: technology-specific
emitter-category emission rates, and technology-specific
emitter-category growth functions.  In TECH5 (which was used to develop
basic emission rates for MOBILE5), four technology groups were used for
1981 and later model year vehicles:

      •  open-loop,
      •  carbureted/closed-loop,
      •  throttle-body fuel-injection  (TBI)/closed-loop, and
      •  multipoint  fuel-injection  (MPFI)/closed-loop.


 The emitter categories used in TECH5 were defined as follows:

      •  Normal HC/CO     -  HC ≤ 0.82 g/mi and CO ≤ 10.2 g/mi,
      •  High HC/CO       -  HC > 0.82 g/mi or CO > 10.2 g/mi,
      •  Very High HC/CO  -  HC > 1.64 g/mi or CO > 13.6 g/mi, and
      •  Super HC/CO      -  HC > 10.0 g/mi or CO > 150.0 g/mi.


 NOx  emissions were  analyzed  separately from HC and CO, with only two
 emitter categories being defined:  normals (≤ 2.0 g/mi) and
 highs (> 2.0 g/mi).
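
The category definitions can be expressed as a small classifier. Because the High, Very High, and Super definitions overlap as written, this sketch assumes each vehicle is assigned to the most severe category whose cutpoints it exceeds; that tie-breaking rule and the function names are assumptions for illustration.

```python
def hc_co_category(hc, co):
    """Classify a vehicle by FTP HC and CO (g/mi), most severe category first."""
    if hc > 10.0 or co > 150.0:
        return "super"
    if hc > 1.64 or co > 13.6:
        return "very high"
    if hc > 0.82 or co > 10.2:
        return "high"
    return "normal"


def nox_category(nox):
    """Classify a vehicle by FTP NOx (g/mi); NOx is treated separately."""
    return "high" if nox > 2.0 else "normal"
```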

 Emitter-Category Emission Rates - The emitter category emission rates
 developed for TECH5 were based on a mix-and-match methodology that
 appeared  somewhat arbitrary.  Thus, most recommendations of the EIRG
 were directed at a  more consistent  approach to developing emitter-
 category  emission rates.

 Limitations and Concerns with Current Approach - The EIRG expressed
 concern about the use of MOBILE4.1  zero-mile  levels for the 1981-1982
 normal emitters in  the development  of emitter-category emission rates
 for MOBILE5.   EPA indicated  that this was done because using only the
 Hammond data for this group  resulted in a zero-mile level that was above
 the emission standard.  This occurred because there were few low-mileage
 1981-1982 model year vehicles in the Hammond database, and the
 regression was being driven  by vehicles with well over 50,000 miles.
 Another concern expressed by the EIRG was that 1983+ deterioration rates
 were used for the 1981-1982 model year group; it is not clear that this
 adequately reflects the difference between evolving (1981-1982) and
 mature (1983+)  technologies, particularly for fuel-injected vehicles.

 Recommended Alternative - As a first cut, the choice of emitter category
 cutpoints needs to be re-evaluated based on a statistical analysis of
 the data  (e.g.,  through a "cluster" analysis) rather than multiples of
 the emission standards.  (It is our understanding that this is being
 done under another work assignment; thus, alternative  cutpoints for
 emitter categories were not  investigated in this effort.)   This re-
 evaluation should also be extended  to the model-year and technology
 groups used for analysis.  Second,  once the emitter categories and
 technology groups are chosen, emission rates should be determined
 independently for each - there is no reason to force deterioration rates
 of early 1980 model year vehicles to be the same as early 1990 model
year vehicles.   For cases in which data are sparse (e.g.,  low-mileage
 1981-1982 model-year normals), it may be possible to bolster the data
 set with FTP data collected  at the Ann Arbor  laboratory.  If the
 emitter-category cutpoints are chosen properly,  a normal emitter's
 characteristic emission rate should be  somewhat independent of the type
 of  I/M program under which the vehicle  was  operating (i.e., an I/M
 program changes the distribution of vehicles  among emitter categories
 but  not necessarily the emission rate of  those  categories - this is the
 basic  assumption used in California's emission  factor and I/M benefits
 model,  CALIMFAC).

 Emitter-Category Growth Functions - In MOBILE5, the development of
 emitter-category growth functions relied  on a very simplistic approach
 in which the fraction of non-normals as a function of mileage was
 determined by drawing a line through three  points  - from the origin
 through a point representing the non-normal fraction of 1987+ model year
 vehicles at that  group's average mileage  (for the  < 50,000-mile growth
 rate),  and from the point where that line crossed  the 50,000-mile mark
 through a point representing the non-normal fraction of 1981-1986 model
 year vehicles at  that group's average mileage (for the  > 50,000-mile
 growth  rate).
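
The two-segment construction just described can be written as a short function. The sketch assumes each model-year group is summarized by a single (average mileage, non-normal fraction) point; the function name, argument names, and any numbers used to exercise it are hypothetical.

```python
def two_segment_growth(miles, low_point, high_point, break_miles=50_000):
    """Non-normal fraction versus mileage from two connected line segments.

    low_point:  (average mileage, non-normal fraction) for the 1987+ group
    high_point: (average mileage, non-normal fraction) for the 1981-1986 group
    """
    m1, f1 = low_point
    m2, f2 = high_point
    slope_low = f1 / m1                          # origin through the low point
    f_break = slope_low * break_miles            # value at the break point
    slope_high = (f2 - f_break) / (m2 - break_miles)
    if miles <= break_miles:
        return slope_low * miles
    return f_break + slope_high * (miles - break_miles)
```

Because the second slope is fixed by a single high-mileage point, any shift in that group's average mileage or non-normal fraction changes the entire post-50,000-mile line.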

 Limitations and Concerns with Current Approach  - The EIRG expressed
 concern that the approach used in MOBILE5 is too sensitive to the
 post-50,000-mile sample distribution, and that the 50,000-mile break
 point is essentially arbitrary.  Figure 3-1 compares the TECH5 very
 high+super emitter fractions versus the data collected in the Hammond
 program for MPFI vehicles.  As the figure shows, the growth in the very
 high+super emitters is nonlinear and the TECH5 method tends to inflate
 emissions at higher mileages.

                              Figure 3-1

                Comparison of Very High+Super Emitter Fractions
                TECH5 vs. Hammond Data for MPFI/CL Vehicles

    [Plot not reproduced: emitter fraction versus odometer (10,000 miles),
    comparing the TECH5 growth function with Hammond data by model-year
    group.]

    * Emitter fractions based on Hammond data.
    NOTE: Numbers in parentheses indicate sample size.

Recommended Alternative  - From  the  survey  responses  received  from  the
EIRG,  there is very  strong  sentiment  that  the  emitter-category growth
functions  should be  developed from  a  statistical analysis of  data  broken
up  into  10,000-mile  bins.   It appears  that  there are sufficient data in
the Hammond sample to do this with  reasonable  certainty up to 100,000
miles.   However, the data beyond 100,000 miles should probably be
segregated into 25,000-mile bins.
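
A sketch of the binning step follows, using 10,000-mile bins below 100,000 miles and 25,000-mile bins above, as recommended. The function names and the (mileage, emitter category) record layout are assumptions for illustration.

```python
def mileage_bin(miles):
    """Return the (lower, upper) bounds of the mileage bin containing miles."""
    if miles < 100_000:
        lower = (miles // 10_000) * 10_000
        return lower, lower + 10_000
    lower = 100_000 + ((miles - 100_000) // 25_000) * 25_000
    return lower, lower + 25_000


def bin_fractions(records, category):
    """Fraction of vehicles in `category` for each mileage bin.

    records: iterable of (miles, emitter_category) pairs.
    Returns {bin: (fraction, vehicle count)}; the count can later serve
    as the weight for that bin in a weighted regression.
    """
    counts = {}
    for miles, cat in records:
        b = mileage_bin(miles)
        total, hits = counts.get(b, (0, 0))
        counts[b] = (total + 1, hits + (cat == category))
    return {b: (hits / total, total) for b, (total, hits) in sorted(counts.items())}
```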

Any number of analytical approaches can be  used to develop emitter-
category growth functions.  One method is  to account for the  possibility
that the emitter-category growth functions  could be a linear  relation
(as a  function of mileage) or a nonlinear  relation with either
increasing or decreasing slope.  This  can be modeled with the following
regression equation:

     $p_i = A + B(\mathrm{mile}) + C(\mathrm{mile})^2 + D(\mathrm{mile})^{1/2}$

where $p_i$ represents the fraction of emitter group $i$ as a function of
vehicle  mileage; A,  B, C, and D are regression constants; and mile
represents  vehicle mileage.  The emitter-category growth functions
should be  developed  using a weighted analysis where the weight for each
mileage  bin represents the total number of  vehicles in that bin.

The SAS  regression procedure REG, using the "adjusted R-squared" method,
can be used with the above equation to determine the regime growth
functions.  This method computes regression results for all possible
combinations of variables.  Seven different regression equations are
possible with this approach:

     •   linear term only (C = D = 0),
     •   quadratic term only (B = D = 0),
     •   square-root term only (B = C = 0),
     •   linear and quadratic terms (D = 0),
     •   linear and square-root terms (C = 0),
     •   quadratic and square-root terms (B = 0), and
     •   all terms.
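
The all-subsets idea can be illustrated without SAS. The sketch below is a stand-in written in plain Python, not a reproduction of PROC REG: it fits each of the seven term combinations by weighted least squares (weights being the vehicle counts per bin) and ranks them by an adjusted R-squared computed from weighted sums of squares. The function names are hypothetical, and mileage is assumed to be expressed in units of 10,000 miles to keep the normal equations well scaled.

```python
import math
from itertools import combinations

# Candidate terms; the intercept A is always included.
TERMS = {"linear": lambda m: m, "quadratic": lambda m: m * m, "sqrt": math.sqrt}


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def weighted_fit(miles, fracs, weights, names):
    """Weighted least squares for the chosen terms; returns (coefs, adj R2)."""
    rows = [[1.0] + [TERMS[t](m) for t in names] for m in miles]
    k = len(names) + 1                      # number of fitted constants
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
             for j in range(k)] for i in range(k)]
    xtwy = [sum(w * r[i] * y for r, w, y in zip(rows, weights, fracs))
            for i in range(k)]
    coef = solve(xtwx, xtwy)
    ybar = sum(w * y for w, y in zip(weights, fracs)) / sum(weights)
    sse = sum(w * (y - sum(c * v for c, v in zip(coef, r))) ** 2
              for r, w, y in zip(rows, weights, fracs))
    sst = sum(w * (y - ybar) ** 2 for w, y in zip(weights, fracs))
    n = len(miles)
    adj_r2 = 1 - (sse / (n - k)) / (sst / (n - 1))
    return coef, adj_r2


def all_subsets(miles, fracs, weights):
    """Fit all seven term combinations, best adjusted R-squared first."""
    results = []
    for size in (1, 2, 3):
        for names in combinations(TERMS, size):
            coef, adj_r2 = weighted_fit(miles, fracs, weights, names)
            results.append((adj_r2, names, coef))
    return sorted(results, key=lambda item: item[0], reverse=True)
```

Adjusted R-squared penalizes additional terms, so a three-term equation is preferred only when it improves the fit by more than the lost degree of freedom would suggest.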
An alternative to the above could be the development of a step-wise
linear fit, which is similar to what was done for MOBILE5.  However,
such an analysis should not be constrained to a predetermined flex point
(i.e., 50,000 miles), nor should it be constrained to only two lines.

Individual equations would be selected based on their regression
statistics; however, since the sum of emitter-group fractions must equal
100%, the process and constraints outlined below must be followed.

     1. Compute emitter-category populations using the equations
        selected for each emitter group.

     2. Set any individual population fraction greater than 100% to
        100%.

     3. Set any negative individual population fraction to zero.

     4. Normalize the population fractions resulting from steps 1 and 2
        to a sum of 100%.
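
The steps above can be sketched as a single function applied at each mileage of interest. The sketch assumes normalization is applied after both clipping steps, and the dictionary layout and function name are hypothetical.

```python
def normalize_fractions(raw):
    """Clip regression-predicted fractions to [0, 1] and rescale to sum to 1.

    raw: {emitter category: fraction predicted by that category's equation}
    """
    # Steps 2 and 3: cap at 100% and remove negative values.
    clipped = {k: min(max(v, 0.0), 1.0) for k, v in raw.items()}
    total = sum(clipped.values())
    if total == 0:
        raise ValueError("all predicted fractions were clipped to zero")
    # Step 4: rescale so the categories sum to 100%.
    return {k: v / total for k, v in clipped.items()}
```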

 The  adjustment process  outlined above would be done prior to comparing
 the  agreement of  the  regression results with the input data.  The set of
 equations  that provided the best match to the entire set of data would
 then be  selected.


Treatment of Light-Duty Trucks

Light-duty-truck basic emission rate equations for EPA's MOBILE and
CARB's EMFAC models have historically been based primarily on passenger
car data.  That is because there have not been enough emissions data
collected on light-duty trucks to support an independent analysis.  In
the past, light-duty-truck emission rates have been determined by first
evaluating passenger car emission rates by technology type (e.g.,
carbureted versus fuel-injected) and then calculating the light-duty-
truck, model-year-specific emission rates by weighting the technology-
specific passenger car rates by the expected technology mix for light-
duty trucks.   Since the introduction of new technology on light-duty
trucks has generally lagged passenger cars by a few years (at least for
pre-1990 model year vehicles), the basic emission rates for light-duty
trucks were  higher than for passenger cars.  In addition,  adjustments
were also applied to account for the fact that light-duty trucks are
certified to less stringent numerical emission standards.   This latter
adjustment is  typically performed by applying a ratio of emission
standards to the passenger car basic emission rate zero-mile level,
while the deterioration rate is left unchanged.
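
In equation form, the historical adjustment can be written as shown below. The linear form of the basic emission rate (zero-mile level plus deterioration rate times mileage) and the symbol names are assumptions made for illustration; only the use of a ratio of standards on the zero-mile level, with the deterioration rate left unchanged, comes from the description above.

```latex
\mathrm{ZML}_{\mathrm{LDT}} = \mathrm{ZML}_{\mathrm{car}} \times
    \frac{\mathrm{Std}_{\mathrm{LDT}}}{\mathrm{Std}_{\mathrm{car}}},
\qquad
\mathrm{DR}_{\mathrm{LDT}} = \mathrm{DR}_{\mathrm{car}}
% Assumed linear form: BER_LDT(mile) = ZML_LDT + DR_LDT x mile
```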

There is concern that the above approach may understate emissions from
light-duty trucks because deterioration rates  (or,  more properly, the
growth rate  of high-emitting vehicles)  are based on passenger cars,
which are generally subjected to a less severe duty cycle than light-
duty trucks.  For that reason, we recommend that light-duty-truck
emission rates be determined independently from passenger cars,  using
IM240 data collected from light-duty trucks.   The data to do this will
be available within the next few years as IM240 programs are implemented
in various communities.  It is unclear that the IM240-to-FTP conversion
procedure would need to be tailored specifically for light-duty trucks,
but certainly the emitter category emission rates and growth functions
(input to the TECH5 model)  developed from the simulated FTP scores
should be based on light-duty-truck data.  As a short-term alternative,
IM240 data from Arizona,  Colorado,  and Maine should be reviewed to
determine if there is a significant difference in emissions
deterioration between cars and light trucks.   If there is,  then scaling
factors,  which are a function of vehicle mileage,  could be developed and
applied to the light-duty-truck basic emission rates (if they have been
based on passenger car data).

            4.  ADJUSTING I/M DATA TO A NON-I/M BASIS

 In the future, there are likely to be considerable IM240 data made
 available  for  emission  factor development.  Obviously, one source of
 those data is  I/M programs  that  are using the IM240 procedure.  However,
 EPA's I/M  rule requires a program effectiveness evaluation that includes
 IM240 testing on a minimum of 0.1% of the vehicle fleet.  Thus, programs
 not running the IM240 as  part of their standard test protocol will be
 required to collect IM240 data on at least some vehicles.

 One shortfall  related to  the use of state-generated IM240 data is that
 the data are from a fleet of vehicles subject to I/M," while the basic
 emission rates  used in  the  MOBILE model reflect a non-I/M condition.
 (I/M benefits  are determined in MOBILE based on I/M test type, test
 frequency,  compliance rate, waiver rate, etc.)  Thus, if the state-
 generated  IM240 data are  to be used in future versions of MOBILE, a
 method  to  adjust the I/M  data to a non-I/M basis is needed.

 This section presents alternative views on how to adjust IM240 data
 collected  in an I/M area  to a non-I/M basis.  Five different methods are
 discussed,  with recommendations for both short-term and long-term
 approaches.

 Use  of  Only Those Data Collected in Non-I/M Areas - One option for
 developing  non-I/M basic  emission rate equations is to continue using
 IM240 data  collected in non-I/M areas,  or in areas that are in their
 first I/M  cycle.  Non-I/M area testing could be accomplished through the
 use  of  a portable dynamometer in conjunction with a random pullover
 program.  Alternatively,  IM240 testing with a portable dynamometer could
 be  linked with annual safety inspections for areas that have those
 inspections but do not have an I/M program.   First-cycle IM240 data
 should be available in a  few areas of the country that will be starting
 up  I/M programs for the first time in the next year or two.  One source
 of  first-cycle IM240 data recently collected is Maine, which ran IM240
 tests from  July 1994 to the fall of that year.

 Although this is the preferred method of developing non-I/M emission
 rates, it is not a workable long-term solution.  The cost of operating a
 roving IM240 data collection program would likely be prohibitive, and
 the availability of first-cycle I/M data will diminish in future years.

 * For areas that are implementing I/M programs for the first time in
 response to the I/M rule, the data collected in the first "cycle" could
 be considered non-I/M data.  However, the majority of areas implementing
 enhanced I/M programs already have some kind of I/M program in place.

 Use of Remote Sensing to Develop Emitter-Category Distributions -
 Although there is considerable support for remote sensing and the idea
 of determining in-use emission rates from RSD measurements is
 conceptually appealing, there remain serious obstacles to the use of
 this technology.  In theory, RSD readings collected in a non-I/M area
 could be compared to RSD readings collected in an I/M area  (i.e.,  the
 area  in which IM240  data are  collected), and the  distribution  of
 vehicles among emitter  categories (based on RSD readings) in the I/M
 area  could be tuned  to  match  the non-I/M area distribution.  In
 practice,  there is so much  variability in  RSD measurements  (e.g.,  from
 siting differences,  equipment differences,  driver behavior, etc.)  that
 discerning a  20%  to  30% difference in emissions as a result of an I/M
 program would be  unlikely.

 For the reasons stated  above,  adjusting I/M data  to a non-I/M  basis
 using remote  sensing measurements is  not likely to provide an  acceptable
 degree of  certainty.

 Development of a Statistical Model Similar to TECH5 or CALIMFAC - One
 approach to developing non-I/M IM240 scores from data collected in an
 I/M area is to develop a statistical model that accounts for all of the
 parameters considered in the current I/M models, i.e., essentially run
 TECH5 or CALIMFAC "backwards."  In this approach, all of the constants
 in the emitter-category growth functions, emission rates, I/M  inspection
 and repair effectiveness, etc.  would  be viewed as  parameters that  could
 be varied  (within set bounds)  to  obtain the  best  possible agreement
 between predicted and measured emissions for  the  entire fleet.

 The optimization process  would continuously  compare the TECH predictions
 with  the actual measured emissions  for the set  of  data vehicles.   The
 parameters in  the model  would  be  adjusted,  based  on the error  in  this
 comparison, using the algorithms  of the optimization procedure.  A new
 comparison would be  made  between  measured emissions  and those  predicted
 using  the new  set of model parameters.  This  process would be  repeated
 until  a  satisfactory agreement  between predictions  and data was
 obtained.  The  optimization process for a new version of TECH  would be
 initiated by using the  parameters  from the previous version.    There are
 many different  approaches to optimization in  this  type of situation.
 The problem would be non-linear because emissions are the product of
 emitter-category populations and emission rates: the emission rate
 parameters and  the emitter-category population parameters would interact
 in multiplicative terms,  resulting  in non-linearity.  Thus,  linear
 programming (which is generally convergent) could  not be used,  and a
 non-linear optimization  technique would be necessary.
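
 The iterative compare-and-adjust loop described above can be sketched
 as follows.  This is only an illustration under stated assumptions: the
 two-category fleet model, the parameter names, the bounds, the synthetic
 "measured" data, and the simple compass (coordinate) search are all
 hypothetical stand-ins, not the TECH or CALIMFAC formulation or any
 optimization routine named in this report.

```python
# Hypothetical two-category fleet model; all names and numbers are illustrative.
AGES = list(range(1, 11))


def predict(params, age):
    """Fleet-average emission rate (g/mi) at a given vehicle age."""
    r_normal, r_high, growth = params
    high_fraction = min(1.0, growth * age)  # emitter-category population
    # Population times emission rate: the multiplicative (non-linear) term.
    return (1.0 - high_fraction) * r_normal + high_fraction * r_high


def sse(params, measured):
    """Sum of squared errors between predicted and measured emissions."""
    return sum((predict(params, a) - m) ** 2 for a, m in zip(AGES, measured))


def fit(start, steps, bounds, measured, sweeps=200):
    """Bounded compass search that accepts only error-reducing moves."""
    best = list(start)
    steps = list(steps)
    best_err = sse(best, measured)
    for _ in range(sweeps):
        improved = False
        for i in range(len(best)):
            for delta in (steps[i], -steps[i]):
                trial = list(best)
                lo, hi = bounds[i]
                trial[i] = min(hi, max(lo, trial[i] + delta))
                err = sse(trial, measured)
                if err < best_err:
                    best, best_err, improved = trial, err, True
                    break
        if not improved:
            steps = [s / 2.0 for s in steps]  # refine the search step
    return best, best_err


# Synthetic "measured" data generated from assumed true parameters.
TRUE_PARAMS = (0.3, 3.0, 0.04)
measured = [predict(TRUE_PARAMS, a) for a in AGES]

start = (0.4, 2.5, 0.03)  # stand-in for the previous model version
bounds = [(0.0, 1.0), (1.0, 6.0), (0.0, 0.1)]
initial_err = sse(start, measured)
fitted, final_err = fit(start, steps=(0.05, 0.5, 0.01),
                        bounds=bounds, measured=measured)
```

 In this toy model only the normal-emitter rate and the product of the
 growth rate and the rate difference are identifiable from fleet
 averages, so several parameter sets can reproduce the same data.  That
 is one reason the bounds on each parameter matter in practice.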

 Clearly, one of the drawbacks  to  this approach  is  that it would take a
 fairly  significant effort to develop and maintain  such a model.  Thus,
 it is unclear  that this approach  could be used  in  the short term,  and
 long-term usage would depend on EPA's commitment to support such a
model.

 Short-Term Recommendation - Continued Use of  Non-I/M Data - In the short
 term  (i.e., the next year or two),  the most reasonable approach to
developing non-I/M emission rates  is the continued  use of IM240 data
 collected in the first  cycle of I/M programs.  This  is most important
 for the development of  emitter-category growth  functions,  which really
drive overall emission deterioration rates.   For emitter-category
emission rates, the differentiation between data collected in  non-I/M
versus I/M programs  is  less important.  In fact, mixing the non-I/M and
 I/M data from Hammond would bolster the database used for MOBILE5.  To
serve as a check on the growth functions derived from the Hammond
non-I/M data, IM240 data from the Maine program (or other first-cycle
programs) could be used.

Long-Term Recommendation - Analysis of Repair Cycle Data  - Although
preferred, the continued use of IM240 data collected in non-I/M areas  is
probably not a valid long-term option for developing non-I/M emission
factors.  Because of that, a means to account for I/M effects is needed.
An alternative to RSD data analysis or a large statistical program is  to
simply calculate the repair cycle benefits observed in an operating
IM240 program, and assume that emission deterioration between repair
cycles is equal to the emission deterioration in the absence of an I/M
program.  This approach is illustrated in Figure 4-1, which shows the
classical sawtooth pattern associated with I/M test and repair.  The
data used to perform this analysis should be available as part of the
I/M program data collection responsibilities outlined in the I/M rule.
Each I/M test record must contain the vehicle identification number and
the category of test performed (i.e.,  initial test,  first retest, etc.).
Thus,  it will be possible to determine the before-repair state of
vehicles in each repair cycle (the top points of the sawtooth
illustrated in Figure 4-1) and the after-repair state of vehicles (the
bottom points of the sawtooth in Figure 4-1).  Assuming that the
deterioration observed from one cycle to the next (or one age to the
next)  is relatively independent of the I/M program,  adding the repair
benefit from the previous cycle (e.g., "A" in Figure 4-1) to the before-
repair point of the current cycle (i.e.,  the top of line "B" in Figure
4-1)  would give the non-I/M emission rate.
                               Figure 4-1
                  Effect of an I/M Program on Emissions
                      as a Function of Repair Cycle
       [Figure not reproduced: sawtooth plot of emissions versus
       vehicle age/odometer, with repair benefits labeled A through E]

-------
To describe  the concept,  the above discussion  focuses on average
emission rates; however,  it would also be possible  to use this approach
for developing emitter-category growth functions, or to superimpose an
emitter-category distribution associated with  each  of the A  through E
offsets on the next cycle or vehicle age.  This approach offers the
advantage of being conceptually simple, and the information  needed to
perform the  calculations  should be available with the IM240  emissions
data collected by states operating an IM240 test program.
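
The repair-cycle calculation can be illustrated with a short sketch.
The function name and the sample rates are hypothetical.  The text
describes adding the previous cycle's repair benefit to the current
before-repair point; for cycles beyond the second, this sketch carries
forward the sum of all earlier benefits, which is our assumed reading of
how the offsets A through E would accumulate.

```python
def non_im_rates(before_repair, after_repair):
    """Estimate non-I/M emission rates (g/mi) from sawtooth points.

    before_repair[i] and after_repair[i] are the fleet-average rates at
    the top and bottom of the sawtooth for inspection cycle i.
    """
    if len(before_repair) != len(after_repair):
        raise ValueError("need one before/after pair per cycle")
    rates = []
    carried_benefit = 0.0  # sum of repair benefits from earlier cycles
    for before, after in zip(before_repair, after_repair):
        rates.append(before + carried_benefit)
        carried_benefit += before - after  # this cycle's repair benefit
    return rates


# Illustrative values only (g/mi), not data from any program.
before = [1.0, 1.2, 1.4]
after = [0.8, 0.9, 1.0]
estimated = non_im_rates(before, after)
```

With these sample values the estimated non-I/M rates are roughly 1.0,
1.4, and 1.9 g/mi, i.e., the sawtooth with the repair drops removed.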

A key assumption in the approach outlined above is  that emissions
deterioration  (or, similarly, the growth rate  of high-emitting vehicles)
between one  I/M cycle and the next is independent of the I/M program.
This is the same assumption used in the current version of CARB's
CALIMFAC model,3  and was based on an analysis  of CARB's  First I/M
Evaluation Program "recapture" vehicles.   These vehicles were tested and
repaired during the program, then were returned to  the test  laboratory
after approximately six months in customer service.  This analysis
showed that,  with the exception of pre-1975 model year vehicles,  post-
repair emissions deteriorate at essentially the same rate as pre-repair
emissions in the tested vehicles.

To further validate the above assumption, an analysis of CARB's Second
I/M Evaluation Program should be performed.  (This program is commonly
referred to as the "1,100-Car Study.")   In Phase 1 of that project
(conducted from January 1991 to March 1992),  approximately 1,100
vehicles that initially failed an I/M test received an FTP before and
after repair.  Phase 2 of the project involved FTP testing of recaptured
vehicles after one year, while Phase 3  of the project involved FTP
testing of vehicles after two years (prior to  their next regularly
scheduled biennial inspection).   Approximately 750 vehicles were  tested
in Phase 2,  and 500 were tested in Phase  3.  This database represents a
fairly robust sample of between-inspection tests,  but it has never been
thoroughly analyzed for this purpose.

It may also be possible to use the approach described above  in
conjunction with the IM240 data collected as part of each state's I/M
program evaluation requirements (i.e.,  the 0.1% testing requirement in
§51.353 of the rule)  to develop non-I/M emission rates from data
collected in an I/M area.  However,  these IM240 data are supposed to be
collected at the time of initial inspection,  so after-repair data would
likely not be available for those vehicles, i.e.,  only the top points in
the "sawtooth" illustrated in Figure 4-1  would be available  for
analysis.   The bottom points of the sawtooth could be estimated based on
additional data analysis and reporting required in the I/M rule.
Section 51.353 also requires states to perform a program evaluation that
includes an assessment of the effectiveness of repairs performed on
vehicles that failed the tailpipe emission test.  Depending upon the
level of detail included in that assessment,  it may be possible to use
that evaluation to estimate the repair  benefit illustrated by the
letters A through E in Figure 4-1.

-------
      5.  INCORPORATING  "OFF-CYCLE"  EMISSIONS  INTO  MOBILE
In the past four years, there has been an extensive effort on the part
of EPA and CARB to better understand in-use driving behavior.  That
effort has led to the development of alternative drive cycles that
include higher speeds and acceleration rates than are included in the
FTP.  It is generally recognized that vehicle operation under these more
severe conditions results in higher emissions than occur using the FTP.
Because the MOBILE model is based on emission data collected over the
FTP, EPA has requested an evaluation of methods that could be used to
incorporate off-cycle emissions in the next version of the MOBILE model.

A significant limitation to developing a method to account for off-cycle
emissions is the lack of data that have thus far been collected over
alternative driving cycles.   To date,  there have been two primary test
programs that have collected emissions data over alternative cycles:
(1) EPA and industry testing to support the supplemental FTP rulemaking,
and (2)  CARB "Unified Cycle" (LA92)  testing to support inventory
development.   Presented below is a brief description of these programs
and our recommendations for incorporating off-cycle emissions into the
MOBILE model.

Supplemental FTP Rulemaking - In February of this year, EPA published a
Notice of Proposed Rulemaking (NPRM)  recommending revisions to the FTP.
That rule would require vehicle manufacturers to conduct a Supplemental
Federal Test Procedure (SFTP)  which includes three new driving cycles
(or "bags")  to control emissions during air conditioning usage,
intermediate soak times and vehicle start-up,  and aggressive driving.
Only the effects of vehicle start-up and aggressive (or off-cycle)
driving are being considered in this Work Assignment.

Under contract to EPA and CARB,  Sierra has developed a number of
different driving cycles from instrumented vehicle data collected in
Baltimore and chase car data collected in Los Angeles.  Those cycles
include:

     •  a start cycle ("ST01") that is representative of the first four
        minutes of vehicle operation;

     •  an aggressive driving cycle ("REP05") that reflects speeds and
        accelerations not covered by the LA4 cycle; and

     •  a remnant ("REM01") cycle, which is intended to represent the
        balance of in-use driving not already covered by the ST01 and
        REP05.

In addition,  two "composite" cycles have been developed that capture the
range of speed and acceleration events observed in the drive cycle
databases - the EPA Composite cycle (based on Baltimore and Los Angeles
data) and the LA92 cycle (based on only Los Angeles data).   Ideally,
 proper  weighting  of  the  ST01,  REP05,  and  REM01  cycles  would  result  in
 equivalency  with  the EPA Composite  cycle.

 During  the development of the  SFTP  rulemaking,  EPA  tested  eight  well-
 maintained 1991-1993 model  year  vehicles  over the FTP,  ST01,  REP05,  and
 REM01 cycles.   These vehicles  were  also tested  on two  driving cycles
 that represented  extreme acceleration and speed profiles.  One of those
 cycles  was developed by  EPA/industry  ("HL07"),  and  one was developed by
 CARB ("ARB02").  By weighting the ST01, REP05, and REM01 cycles
 according to the  fraction of VMT represented by these  cycles,  EPA found
 that emissions  increased by 0.04 g/mi NMHC, 2.8 g/mi CO, and 0.08 g/mi
 NOx relative to the  hot  FTP results.   (The average  hot FTP emission  rate
 for these vehicles was 0.04 g/mi HC,  1.6  g/mi CO, and  0.19 g/mi  NOx.)

 Following the completion of EPA's testing, auto manufacturers  sponsored
 an emission  test  program.   That  effort consisted of 26  late-model
 vehicles that were tested on the FTP,  REP05, HL07,  and ARB02.  Little
 testing was  conducted on the REM01  since  the focus  of  the EPA/industry
 effort was on developing a  control  cycle  (i.e.,  certification),  and  the
 REM01 cycle  was thought  of  as  an inventory cycle.

 Based on the results  of  the above test programs, a  high-speed/load
 transient control cycle  was developed (termed "US06")  which  is a
 600-second test comprised of segments  of  the REP05  and  the ARB02 cycles.
 It should be noted that  this cycle  was developed with  the intent of
 controlling  emissions from  aggressive  driving and transient  operation.
 It was not developed  for the purpose  of evaluating  in-use emissions.

 CARB Unified Cycle Test Program - In the summer of 1992, Sierra recorded
 in-use speed-time profiles  of  randomly selected vehicles that were
 followed by  a chase  car.   During this  chase car study,  which was
 sponsored by CARB, data  were collected over a mix of road routes
 designed to  represent all travel occurring in the Los Angeles area.
 These data were then used to develop  a "composite" driving cycle (the
 LA92 cycle)   designed  to  match  the overall speed-acceleration
 distribution observed in the Los Angeles data set.  To date,  CARB has
 performed FTP and LA92* emission tests on roughly 250 vehicles during
 two separate test programs.  As part  of CARB's 12th In-Use Surveillance
 Program, 170 1983 and later model year vehicles were tested over the
 LA92 cycle.   In addition, CARB conducted a special  test program that ran
 from late 1993 to mid-1994  in which 80 1971 and later model year
 vehicles were tested on the LA92 and the FTP.  Clearly, the CARB testing
 offers a much larger and more representative database from which to
 develop off-cycle corrections than does EPA's SFTP program.

 * CARB performs the LA92 emission test in a manner similar to the FTP.
 The test begins with a cold start, and emissions from the first 300
 seconds of the cycle are collected in bag 1.  Emissions from the
 remainder of the LA92 cycle are collected in bag 2.  The vehicle is then
 allowed to soak with the engine off for 10 minutes, and the first 300
 seconds of the LA92 are re-run, comprising bag 3 of the test.  CARB
 computes a composite LA92 emission rate by assuming 43% of starts are
 cold starts and 57% of starts are hot starts.  This is the same approach
 used to compute a weighted FTP score.  However, because bags 1 and 3 of
 the LA92 test are much shorter than bags 1 and 3 of the FTP (1.2 versus
 3.6 miles) and bag 2 of the LA92 is longer than bag 2 of the FTP (8.6
 versus 3.9 miles), the factors used to weight each bag's g/mi emission
 rate are much different for the FTP and the LA92.
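
 The effect of bag length on the composite weighting described in the
 LA92 footnote can be illustrated numerically.  The sketch below assumes
 the distance-weighted form of the composite (43% cold-start and 57%
 hot-start weighting applied to bag 1 and bag 3 mass and distance); the
 function names are hypothetical, and the exact computation used by CARB
 is not given in this report.

```python
def bag_weights(d1, d2, d3, cold_fraction=0.43):
    """Effective weights applied to each bag's g/mi emission rate.

    d1, d2, d3 are the bag distances in miles.  Bag 1 is weighted by the
    cold-start fraction and bag 3 by the hot-start fraction.
    """
    hot_fraction = 1.0 - cold_fraction
    total = cold_fraction * d1 + d2 + hot_fraction * d3
    return (cold_fraction * d1 / total,
            d2 / total,
            hot_fraction * d3 / total)


def composite_rate(bag_rates, distances):
    """Composite g/mi rate from per-bag g/mi rates and bag distances."""
    return sum(w * r for w, r in zip(bag_weights(*distances), bag_rates))


FTP_MILES = (3.6, 3.9, 3.6)    # bag distances cited in the text
LA92_MILES = (1.2, 8.6, 1.2)

ftp_w = bag_weights(*FTP_MILES)
la92_w = bag_weights(*LA92_MILES)
```

 Under these assumptions the FTP weights work out to about 0.21, 0.52,
 and 0.27 for bags 1 through 3, while the LA92 weights are about 0.05,
 0.88, and 0.07, which is why the same bag-level results yield very
 different composites on the two cycles.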

In addition to the data already collected, CARB is planning in the next
year  to test 75 vehicles over the FTP, the LA92, and eight different
speed cycles developed from the Los Angeles chase car  data.  CARB also
has plans to test  approximately 250 vehicles over the  FTP and LA92
cycles  in its  next in-use surveillance project.  Many  of these data are
likely to be available prior to the next major release of the MOBILE model.

Recommended Approach - Although it may be tempting to  rely on the data
collected as part  of the SFTP development process to make adjustments to
MOBILE  for off-cycle effects, there are a number of problems associated
with  the use of those data.  First, only eight vehicles have been tested
over  the full  complement of cycles thought to capture  start-up and off-
cycle events.  Although the industry data were more robust in terms of
the number of  vehicles tested, those vehicles were tested only on the
FTP,  REP05, HL07,  and ARB02 cycles.  Additionally, SFTP data collected
in the  future  will likely be over the US06 cycle,  which is a combination
of the  REP05 and ARB02 cycles and was not designed to  represent in-use
driving.  The  use  of the US06 data for in-use emission estimates would
require some kind  of correlation or adjustment to get  the data on a
REP05-cycle basis.  Any adjustment of that kind would  introduce
additional uncertainty into the results.  Finally, and most importantly,
simply weighting the ST01, REP05, and REM01 cycles does not best reflect the
proper mix of  speed and acceleration observed in the chase car and
instrumented vehicle databases.   That mix is better represented by one
of the composite cycles developed by Sierra.

Because of the data deficiencies in the SFTP test program,  it is much
more appropriate to develop off-cycle corrections from the LA92
emissions data collected by CARB for the purposes of estimating in-use
emissions.   Since  the LA92  cycle matches the acceleration/speed profiles
from all in-use vehicle operation (at least for Los Angeles),  a ratio of
the LA92 results to the corresponding FTP results will provide a good
indication of  the  emissions increase associated with off-cycle events.
Although it could be argued that the use of a data set developed with
the EPA Composite cycle (which incorporated in-use driving patterns in
Baltimore and Los Angeles)  would be more appropriate,  sufficient data
are not available  to characterize vehicle operation and emissions over
this cycle.

In terms of developing an off-cycle correction factor,  the CARB data
would allow a reasonable accounting for possible emission differences by
technology.  The 1993-94 special test program included 12 pre-1975 model
year vehicles,  31  1975 to 1980 model year vehicles,  and 37 1981 to 1992
model year vehicles.   All of the vehicles tested over the LA92 in the
12th Surveillance  Program (approximately 170)  were from the 1983 and
later model years.  With LA92 and FTP tests conducted on slightly over
200 1981 and later model year vehicles,  it would be possible to
investigate differences by fuel delivery technology and perhaps by
emitter category.  This approach will become more attractive as
additional data are collected by CARB.

Depending on the way in which start emissions are treated in the next
version of MOBILE, the actual development of vehicle start-up and off-
cycle correction factors could be performed in a number of different
ways.   For example, if start emissions are separated from running
 emissions,  then  bag  1  of  the  LA92  could  be  correlated with  bag  1  of  the
 FTP  (e.g.,  through a regression  analysis).   This  is  particularly
 important  since  bag  1  of  the  LA92  is much more  reflective of vehicle
 start-up operation than the FTP  bag 1.   If  start  emissions  are  treated
 as an  offset,  the difference  between bag 1  and  bag 3 of  the LA92  could
 be compared to the difference between bag 1  and bag  3 of the FTP.
 Alternatively, it may  be  desirable to determine start emissions directly
 from bag 1  (or bag 1 - bag 3) of the LA92 data  without considering the
 FTP data.

 Hot stabilized emissions  can  be  corrected for off-cycle  events by
 comparing a combination of bags  2 and 3  of  the  LA92  cycle (i.e., a "hot
 LA92") to a combination of bags 2 and 3 of the FTP.*  A correction
 factor can  be  developed by taking a simple  ratio  of  the  LA92 results to
 the FTP results  or through a  regression  analysis.  As with  the start
 correction,  the  data should be segregated by technology and perhaps
 emitter category (i.e., "normal" versus  "high"  emitters).   Although  this
 adjustment  inherently  includes a speed correction, the principal
 adjustment is to account for the failure of the FTP to adequately cover
 the full range of speeds  and  accelerations occurring in customer
 service.

 * In our opinion, the combination of bags 2 and 3 is more representative
 of stabilized operation than bag 2 alone.
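
 A simple-ratio correction factor of the kind described above, segregated
 by technology and emitter category, could be computed along the
 following lines.  The record layout, group labels, and sample values are
 hypothetical and are not drawn from the CARB data; a regression could be
 substituted for the ratio of group totals used here.

```python
from collections import defaultdict


def ratio_correction_factors(records):
    """Ratio of hot LA92 to hot FTP emissions for each group.

    records: iterable of (technology, emitter_category, hot_ftp, hot_la92)
    with both emission values in g/mi.  The ratio of group sums equals
    the ratio of group means.
    """
    sums = defaultdict(lambda: [0.0, 0.0])
    for technology, category, hot_ftp, hot_la92 in records:
        group = sums[(technology, category)]
        group[0] += hot_ftp
        group[1] += hot_la92
    return {key: la92 / ftp
            for key, (ftp, la92) in sums.items() if ftp > 0.0}


# Illustrative records only (g/mi).
sample = [
    ("multipoint", "normal", 0.2, 0.3),
    ("multipoint", "normal", 0.4, 0.6),
    ("carbureted", "high", 2.0, 2.4),
]
factors = ratio_correction_factors(sample)
```

 With these sample records the factors are about 1.5 for the
 multipoint/normal group and 1.2 for the carbureted/high group.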

-------
         6.  USE  OF STATE-GENERATED IM240 DATA  IN MOBILE
This Work Assignment also called for a review of methodologies that
states could use to develop locality-specific basic emission rates for
use in MOBILE.  The development of locality-specific basic emission
rates has the obvious advantage of allowing an area to accurately
represent its fleet of light-duty vehicles, while minimizing the
reliance on certain relations in the MOBILE (and TECH) models that have
been developed with the intent of reflecting national averages.   In
addition, developing locality-specific emission rates has the potential
to better reflect the impact of a particular I/M program on vehicular
emissions.  However, a number of issues must be considered in order to
have confidence that the local predictions are more representative of an
area than the estimates obtained by simply running MOBILE.

There are two steps involved in developing basic emission rate equations
from state-generated IM240 data.  First,  the IM240 data need to be
collected and converted to an FTP basis.   Second,  the simulated FTP
scores need to be analyzed to develop basic emission rate equations.
Although most of the procedures that would have to be followed to
generate locality-specific emission rates from IM240 data have already
been discussed in this document, this section reviews the areas  of
particular importance that would have to be considered in such an
analysis.


Data Collection and Development of Simulated FTP  Scores

As discussed previously,  there are potentially two sources  of IM240 data
that will be available from which to develop basic emission rate
equations.  Obviously,  if a state has an I/M program based on IM240
exhaust measurements,  the data collected in the program can be used.
For states not conducting IM240 testing as part of the standard I/M
program,  IM240 data will be available from the program evaluation
requirements in Section 51.353 of the I/M rule (i.e.,  0.1%  of the
subject vehicle fleet must be tested each year over the IM240 cycle or
another transient mass emission test approved as equivalent).   Although
ready access to these data makes the development of locality-specific
emission rates attractive,  there are a number of additional pieces of
data and test requirements that would be needed before the  data could be
used to develop simulated FTP scores.

Because it is unlikely that states would have the resources to conduct
FTP tests on a subset of vehicles tested at the I/M lane,  EPA-generated
IM240-to-FTP correlation equations would have to be used.   Since those
correlations are based on IM240 tests conducted in a laboratory
environment on a standard test fuel (i.e.,  Indolene),  the state-
collected IM240 data would have to be adjusted to reflect a standard
fuel and temperature.   To ensure that proper data are available  to
correctly predict FTP-based emission rates from IM240 data,  states
should be required to collect a number of pieces of information related
 to  the  fleet  of  vehicles  they  intend  to  use  for  emission  factor
 development.   This  includes  the  following:

     •  Ambient temperature should be recorded for all vehicles tested.
        The maximum and minimum daily temperatures should also be
        recorded.

     •  Fuel samples should be collected and analyzed for a subset of
        vehicles included in the database.

By analyzing the  test  temperature and  fuel parameters, a determination
can be made as to whether RVP/temperature interactions  (which may lead
to excessive purge) are having an inordinate influence on the results.
Those test records that are outside of a predetermined RVP/temperature
window could be excluded from the analysis.  In addition, the analysis
of other fuel parameters (e.g., oxygenates, aromatics, sulfur) might
allow base fuel/Indolene correction factors to be developed from the
reformulated gasoline  Complex Model (or from the data that were used to
formulate the Complex  Model, with correlations developed based on Bag 3
of the FTP).

In terms of data collection efforts, several other issues would need to
be considered by the states.  First, only full IM240 tests should be
used for emission factor development.   With the implementation of fast-
pass and fast-fail algorithms, it is unclear how many vehicles will
receive a full IM240 test in an operating program and whether those
vehicles that do receive a full IM240 will be representative of the
entire fleet.   Thus,  we recommend a means to ensure that vehicles
selected for emission  factor development be chosen at random  (e.g.,
every 20th vehicle tested at the lane) and identified as emission factor
vehicles.  In addition, those vehicles should be tested over the
complete IM240 cycle regardless of whether they pass or fail the fast-
pass or fast-fail cutpoints.  If this procedure is not followed,  very
clean and very dirty vehicles will not be properly represented in the
emission factor data set.   Second, vehicles selected for emission factor
development should be  run over a short preconditioning cycle prior to
the IM240 test (e.g.,   two to three minutes at 40 mph).  This would help
ensure that vehicles that may have cooled off in the queue are back up
to operating temperature before being tested.  Finally,  information on
technology type (e.g.,  carbureted, throttle-body injection,  multipoint
injection;  open-loop,   closed-loop) would be needed if the IM240 data
gathered in the program are to be used to forecast emissions.   It may be
possible to determine technology type with a VIN decoding routine;*
however, if states do not have access to such a program, then technology
information would also have to be recorded for each vehicle used for
emission factor development.

* The I/M rule requires VINs to be recorded for each I/M test record.
For 17-character VINs (which have been the standard since the early
1980s), the 9th character represents the "check digit," which is
intended to verify the accuracy of the VIN.  The check digit is
determined by a mathematical routine in which each VIN character is
assigned a number, which is then multiplied by a preset value based on
its position in the VIN.  These products are then summed and divided by
11, and the remainder represents the VIN check digit.  To ensure the
accuracy of the VINs collected in I/M lanes, it is recommended that an
electronic cleaning routine be used to verify the VIN check digit.
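
An electronic cleaning routine of the kind recommended for verifying the
VIN check digit might look like the sketch below.  The letter values and
position weights are not given in this report; they are taken from the
federal VIN standard (49 CFR Part 565), under which a remainder of 10 is
written as "X".  The function names are hypothetical.

```python
# Position weights and letter values from the federal VIN standard
# (49 CFR Part 565); they are not stated in this report.
VIN_WEIGHTS = (8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2)
LETTER_VALUES = dict(zip("ABCDEFGH", range(1, 9)))
LETTER_VALUES.update(zip("JKLMN", range(1, 6)))
LETTER_VALUES.update({"P": 7, "R": 9})
LETTER_VALUES.update(zip("STUVWXYZ", range(2, 10)))
DIGITS = "0123456789"


def expected_check_digit(vin):
    """Return the check digit implied by a 17-character VIN."""
    vin = vin.strip().upper()
    if len(vin) != 17:
        raise ValueError("VIN must have 17 characters")
    total = 0
    for char, weight in zip(vin, VIN_WEIGHTS):
        if char in DIGITS:
            value = int(char)
        elif char in LETTER_VALUES:
            value = LETTER_VALUES[char]
        else:
            # I, O, and Q are not permitted in a VIN.
            raise ValueError("invalid VIN character: " + char)
        total += value * weight
    remainder = total % 11
    return "X" if remainder == 10 else str(remainder)


def vin_is_valid(vin):
    """True if the 9th character matches the computed check digit."""
    try:
        return expected_check_digit(vin) == vin.strip().upper()[8]
    except ValueError:
        return False
```

Records failing this check could be flagged for manual review rather
than discarded, since a single mis-keyed character is the most likely
cause of a mismatch.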


Development  of  Basic  Emission  Rates from Simulated FTP
Scores

Once the  IM240 data  are converted to an FTP basis,  emission rate
equations can be developed.   If only the current year rates are desired,
the analytical technique  would be fairly straightforward.   The data
would first be sorted by  vehicle type (i.e.,  car versus truck)  and model
year.  Next, the average  pre-inspection emission rate would be
determined.   (If the data are from an operating IM240 program,  there
should be a field  indicating  whether the test is a  baseline or retest;
IM240 data collected as part  of the  0.1% requirement are supposed to
reflect emissions  immediately prior  to inspection.)   The effect of the
I/M program on each  model-year emission rate would  then be estimated
based on  the fraction of  failures and the benefit of repair (taking into
account waivered vehicles).   The repair benefit could be determined from
an analysis of pre-repair and post-repair I/M data  which should be
available from the repair effectiveness analysis required in the I/M
rule.  Finally,  the  model-year-specific emission rates would be
determined from  the  pre-inspection emission rates and the after-repair
emission  rates based on whether the  I/M program in  place is an annual
program or a biennial program.   For  an annual program,  the model-year-
specific  emission rate  would  be calculated as follows:

     ER_{AA} = Fraction_{Pass} * ER_{Pre-Pass}
               + (1 - Fraction_{Pass}) * (ER_{Pre-Fail} + ER_{After-Rep}) / 2

where:

     ER_{AA}         = Annual average emission rate,
     ER_{Pre-Pass}   = Pre-inspection emission rate for passing vehicles,
     ER_{Pre-Fail}   = Pre-inspection emission rate for failing vehicles,
     ER_{After-Rep}  = After-repair emission rate, and
     Fraction_{Pass} = Fraction of vehicles passing the inspection.

For a biennial program, the following  equation would be  used:

     ER_{BA} = Fraction_{Inspected} * ER_{AA}
               + (1 - Fraction_{Inspected}) * ER_{Pre-All}


where ER_{BA} is the model-year-specific emission rate for a biennial
program, ER_{AA} is the annual average rate defined above,
Fraction_{Inspected} is the fraction of vehicles inspected in a given year
(i.e., 50% in a perfectly biennial program), and ER_{Pre-All} is the
pre-inspection emission rate of all vehicles.
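
As a worked illustration of the annual and biennial equations, the
sketch below evaluates both for one model year.  The function names and
input values are hypothetical, and the pre-inspection rate for all
vehicles is assumed here to be the pass-fraction-weighted average of the
passing and failing pre-inspection rates.

```python
def annual_average_rate(fraction_pass, er_pre_pass, er_pre_fail,
                        er_after_rep):
    """Model-year emission rate (g/mi) under an annual program.

    Failing vehicles are assigned the midpoint of their pre-inspection
    and after-repair rates.
    """
    return (fraction_pass * er_pre_pass
            + (1.0 - fraction_pass) * (er_pre_fail + er_after_rep) / 2.0)


def biennial_average_rate(fraction_inspected, er_aa, er_pre_all):
    """Model-year emission rate (g/mi) under a biennial program."""
    return (fraction_inspected * er_aa
            + (1.0 - fraction_inspected) * er_pre_all)


# Illustrative inputs only (g/mi), not measured data.
fraction_pass = 0.8
er_pre_pass, er_pre_fail, er_after_rep = 0.5, 3.0, 1.0

er_aa = annual_average_rate(fraction_pass, er_pre_pass,
                            er_pre_fail, er_after_rep)
# Assumed: all-vehicle pre-inspection rate as a pass-weighted average.
er_pre_all = fraction_pass * er_pre_pass + (1.0 - fraction_pass) * er_pre_fail
er_ba = biennial_average_rate(0.5, er_aa, er_pre_all)
```

With these inputs the annual rate is about 0.8 g/mi and the biennial
rate about 0.9 g/mi; the biennial value is higher because half of the
fleet goes uninspected in any given year.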

The method described above provides  only the  mean  model-year emission
rates for one calendar year.  Thus,  if the  data  were collected in 1995,
the model-year-specific emission rates could  only  be used in conjunction
with MOBILE to develop a calendar  year 1995 emission estimate  (i.e.,  by
inputting the model-year rates as  zero-mile levels and specifying zero
for deterioration rates).  Clearly,  not being able to forecast emissions
is a significant shortcoming  of the  above approach,  and  a means to
develop emission rates described by a  zero-mile level and a
deterioration rate is needed.

To develop emission factors that can be used with MOBILE to forecast
emissions, a method similar to that described above could be used.
First, the simulated FTP data would be sorted by vehicle type, age  (or
odometer), and technology  (e.g., carbureted, throttle-body injection,
multipoint injection).  Pre-inspection and after-repair emission rates
would then be determined by vehicle age  (or odometer), which would
result in a plot similar to that illustrated in Figure 4-1 for each
technology.  A single emission value would be determined for each
vehicle age by weighting the pre-inspection and after-repair points
based on whether the I/M program in effect has an annual or biennial
inspection frequency.  A regression analysis would then be performed on
these points to develop zero-mile levels and deterioration rates as a
function of vehicle technology.  Next,  model-year emission factors would
be calculated by weighting the technology-specific rates by the mix of
those technologies observed in the fleet.  In performing this analysis,
care would have to be taken to ensure that the vehicles included in
calculations had been certified to the same emission levels.   To account
for future emission standards,  the zero-mile levels would be adjusted by
the ratio of future-to-current standards.  (Deterioration rates would
remain unchanged,  as they represent the I/M program in effect.)  Note
that this approach provides a future inventory that includes the impact
of an I/M program,  but it does not predict the benefit of the I/M
program.
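
The regression, technology weighting, and standards adjustment described
above might be sketched as follows.  This is a minimal illustration
under stated assumptions, not the procedure used for MOBILE: the
function names, the technology labels, and all numerical values are
hypothetical; odometer is assumed to be expressed in units of 10,000
miles; and each input point is assumed to be already weighted for the
inspection frequency of the I/M program.

```python
def fit_zml_dr(odometer, emissions):
    """Ordinary least-squares line: returns (zero-mile level, deterioration rate)."""
    n = len(odometer)
    mean_x = sum(odometer) / n
    mean_y = sum(emissions) / n
    sxx = sum((x - mean_x) ** 2 for x in odometer)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(odometer, emissions))
    dr = sxy / sxx                  # slope: g/mi per unit of odometer
    zml = mean_y - dr * mean_x      # intercept: g/mi at zero miles
    return zml, dr


def fleet_rate(tech_fits, tech_mix):
    """Weight technology-specific (ZML, DR) pairs by the observed technology mix."""
    zml = sum(tech_mix[t] * tech_fits[t][0] for t in tech_mix)
    dr = sum(tech_mix[t] * tech_fits[t][1] for t in tech_mix)
    return zml, dr


def adjust_for_standard(zml, dr, current_std, future_std):
    """Scale the ZML by the future-to-current standard ratio; the DR is unchanged."""
    return zml * (future_std / current_std), dr


# Hypothetical points for one technology (odometer in 10,000-mile units, g/mi).
zml, dr = fit_zml_dr([0, 1, 2, 3], [0.2, 0.3, 0.4, 0.5])

# Hypothetical technology-specific fits and fleet fractions.
fits = {"carbureted": (0.4, 0.10), "multipoint": (0.2, 0.05)}
mix = {"carbureted": 0.25, "multipoint": 0.75}
my_zml, my_dr = fleet_rate(fits, mix)
future_zml, future_dr = adjust_for_standard(my_zml, my_dr, current_std=0.41, future_std=0.205)
print(zml, dr, my_zml, my_dr, future_zml, future_dr)
```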

Once the model-year zero-mile levels and deterioration rates are
determined,  they can be input to MOBILE (as user-input emission rates)
and the model run.   Note that since these rates already account for the
presence of the I/M program,  the I/M options in MOBILE would not be
invoked.

Summary

Although the potential exists for states to develop locality-specific
basic emission rates from IM240 data collected as part of an operating
I/M program or the program evaluation requirements of the I/M rule,  it
is unclear how many states will attempt to do this.  This judgment is
based primarily on the following two factors.

     •  At this time, it appears that only a small number of states are
        likely to include IM240 testing in their enhanced I/M programs;
        thus, available IM240 data will come from the program
        evaluation requirements (i.e., 0.1% of the subject fleet must
        be tested).  This results in a much smaller number of test
        records upon which to perform the analyses described above.

     •  Based on the information presented above, a significant
        investment in time and resources will be required on the part
        of states to develop basic emission rate equations from IM240
        data.
                                   ###

-------
                            7.  REFERENCES

1.   "Investigation of MOBILE5a Emission Factors: Evaluation of IM240-
     to-FTP Correlation and Base Emission Rate Equations," Prepared by
     Sierra Research for the American Petroleum Institute, API
     Publication Number 4605,  June 1994.

2.   "Investigation of MOBILE5a Emission Factors: Assessment of Exhaust
     and Nonexhaust Emission Factor Methodologies and Oxygenate
     Effects," Prepared by Systems Application International for the
     American Petroleum Institute, API Publication Number 4603,  June
     1994.

3.   "Development of the CALIMFAC California I/M Benefits Model,"
     Prepared by Sierra Research for the California Air Resources
     Board,  Report No. SR-91-01-01,  January 1991.
                                  ###

-------
                              APPENDIX A

    EIRG'S RESPONSES TO QUESTIONNAIRE ON DEVELOPING BASIC
                  EMISSION RATES FROM IM240 DATA


                        Database Adjustments


Weight Foreign Manufacturers - Because the vehicles tested in Hammond
did not accurately reflect the national average fraction of foreign
vehicles, each foreign vehicle in the database was counted 2 to 4 times.
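
As an illustration of this kind of reweighting, the sketch below
computes the replication weight needed for foreign vehicles to match a
target fleet share and then forms a weighted mean emission level.  The
function names, sample counts, target share, and emission values are
all hypothetical; the actual weights used for the Hammond data are not
reproduced here.

```python
def foreign_weight(n_foreign, n_domestic, target_foreign_share):
    """Weight per foreign record so the weighted foreign share equals the target.

    Domestic records are assumed to carry a weight of 1.
    """
    return (target_foreign_share * n_domestic) / (
        (1.0 - target_foreign_share) * n_foreign
    )


def weighted_mean(records, weight_foreign):
    """Mean emission level with foreign records counted weight_foreign times.

    records: list of (is_foreign, emission_g_per_mi) tuples.
    """
    total_weight = 0.0
    total = 0.0
    for is_foreign, emission in records:
        w = weight_foreign if is_foreign else 1.0
        total_weight += w
        total += w * emission
    return total / total_weight


# Hypothetical sample: 10 foreign and 90 domestic vehicles, target share of 25%.
w = foreign_weight(n_foreign=10, n_domestic=90, target_foreign_share=0.25)

sample = [(True, 1.0), (False, 2.0), (False, 2.0), (False, 2.0)]
print(w, weighted_mean(sample, w))
```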

Strengths:

1.   Accounts for fleet mix biases in testing areas.

2.   Foreign vehicles generally have a much lower DF  than domestic.

3.   Accounts for under-represented manufacturers.   Comparison with the
     non-weighted results indicated a net increase in non-normals,
     which was most pronounced for carbureted closed-loop technology.

4.   Important to account for foreign/domestic split  because of
     differences in quality,  durability,  etc.


Weaknesses:

1.   No area using MOBILE will match the  assumed national average
     fleet,  and there is no way to account for this.

2.   Variations among engine families are just as significant as
     foreign/domestic.

3.   Has it been established that foreign vehicles are a bias?  If it
     is important, why not treat those vehicles as a separate
     technology group?  What about displacement, mileage, etc.?  Seems
     like an arbitrary adjustment.

4.   Method used may be based on a poor sample and not reflect a
     representative mix of foreign vehicles.


Alternatives:

1.   This is a second-order effect - ignore it.

2.   Predict base emission rates on a manufacturer/model year basis.

3.   Develop emission factors separately for foreign/domestic and allow
     users the option to input that parameter.

4.   More sophisticated analysis by engine family or by groups of
     engine families.

5.   This should not be a problem for tech-group-specific analyses.

6.   Whether this correction is applied depends on how significant any
     technology or durability differences are.  Is foreign vs. domestic
     enough, or should all individual manufacturers be weighted?  If
     differences are significant, use sampling theory to pick the
     optimum sample for desired weighting.

7.   Develop a method to check representativeness of the available
     foreign data, e.g., technology, manufacturer, age.  Compare with a
     more robust sample in a more representative area and then modify
     the weighting factors.


 Missing or Suspicious Mileage - A  number of vehicles in the Hammond
 database had  '0' or missing mileage and were deleted.  In addition,
 vehicles that were coded as having an  odometer reading > 300,000 miles
 were deleted.
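
A screening rule of this kind could be expressed as shown below.  The
record layout (a dictionary with an "odometer" field) and the sample
values are hypothetical; only the two deletion criteria come from the
description above.

```python
MAX_ODOMETER_MILES = 300_000  # readings above this value are treated as suspect


def has_valid_mileage(odometer):
    """Return True unless the reading is missing, zero, or above 300,000 miles."""
    if odometer is None:
        return False
    return 0 < odometer <= MAX_ODOMETER_MILES


def screen_mileage(records):
    """Keep only the records whose odometer reading passes the screen."""
    return [r for r in records if has_valid_mileage(r["odometer"])]


# Hypothetical records (illustration only).
vehicles = [
    {"id": "A", "odometer": 52_000},
    {"id": "B", "odometer": 0},
    {"id": "C", "odometer": None},
    {"id": "D", "odometer": 345_000},
    {"id": "E", "odometer": 300_000},
]
kept = screen_mileage(vehicles)
print([v["id"] for v in kept])
```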

 Strengths:

 1.   Removes bad data that could incorrectly influence emissions vs.
     mileage regressions, particularly for zero mileage.

 2.   Prevents compromising odometer-based relationships.

 3.   Avoids large  statistical impacts  from inclusion of extreme and
     likely erroneous mileages.

 4.   Obvious way of screening questionable data.


 Weaknesses:

1.   Some suspicious data could be valid data points.

2.   Eliminating records reduces sample size.

3.   High-mileage,  poor  condition vehicles may be more likely to have
     odometer problems.

4.   Deletes valid data.

5.   May remove actual high-mileage vehicles, which are badly needed in
     the database.

6.   Limits sample size.


Alternatives:

1.   Group vehicles according to age for some statistics (e.g.,
     fraction of high emitters at 5 years vs. 50,000 miles).  In this
     way, incorrect mileages are not an issue.

2.   Compare odometer-based relationships with these vehicles
     classified by the mean mileage for that model year (i.e., assign
     to them the mean mileage by model year or age).

3.   Leave vehicles in the database and assign to them the average
     mileage of the remaining vehicles.  This does not affect the slope
     of the regressions, just the y-intercept.

4.    Use an age-odometer algorithm to identify suspected erroneous
     data.  This works for both high  and  low mileage vehicles.

5.    Generate mileage as a function of age, but consider each  year's
     distribution of travel.  Look at that distribution in the
     database; if it is OK (i.e.,  not too wide), take mean values as a
     function of age and use that.


Seasonal Outliers - Data collected on 14  test  dates in March and April
when the ambient temperature was  25°F or more above the monthly average
were deleted because many of those vehicles were statistical outliers.
(Excessive purge was thought to be influencing the IM240 results.)
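
The date-based screen described above might look like the following
sketch.  The record fields, the monthly averages, and the choice of
which temperature reading to compare (here, a single ambient value per
test) are assumptions made for illustration; only the 25°F threshold
comes from the description.

```python
def is_excluded_test(ambient_temp_f, monthly_avg_temp_f, threshold_f=25.0):
    """True when the ambient temperature is 25 F or more above the monthly average."""
    return (ambient_temp_f - monthly_avg_temp_f) >= threshold_f


def screen_by_temperature(records, monthly_avg_temp_f):
    """Drop tests run on unusually warm days.

    records: dicts with "month" and "temp_f" keys (hypothetical layout).
    monthly_avg_temp_f: dict mapping month name to average temperature (F).
    """
    return [
        r for r in records
        if not is_excluded_test(r["temp_f"], monthly_avg_temp_f[r["month"]])
    ]


# Hypothetical monthly averages and test records (illustration only).
averages = {"March": 40.0, "April": 50.0}
tests = [
    {"id": 1, "month": "March", "temp_f": 65.0},  # exactly 25 F above: excluded
    {"id": 2, "month": "March", "temp_f": 64.0},  # kept
    {"id": 3, "month": "April", "temp_f": 80.0},  # 30 F above: excluded
]
retained = screen_by_temperature(tests, averages)
print([t["id"] for t in retained])
```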

Strengths:

1.    Excessive purge is added separately  in MOBILE (i.e., through
     temperature/RVP corrections)  and must be  eliminated from  data used
     to estimate base emission rates.

2.    Excessive purge is a problem  at high RVP  and temperature; this
     method solves it.

3.    Rejection of statistical outliers with data errors enhances the
     value of the database.

4.    Solves the problem of excessive purge effects.


Weaknesses:

1.   Reduces sample size.

2.   Deletes valid data, particularly when temperatures are high in the
     spring or fall.  High ozone episodes can occur during these times
     and improved emission factors under these conditions are worth
     some effort to develop.

3.   True high emitters may be deleted.

4.   Rejection of true outliers reduces the accuracy of the database.

5.   If it happened 14 out of 60 days, are these true outliers?

6.   This type of activity occurs in the real world.  How do the models
     account for this effect?


Alternatives:

1.   If sample is large enough,  only use data collected within the FTP
     temperature range for the IM240/FTP correlation and base emission
     rates.  IM240 data at different temperatures could be used to
     develop temperature/fuel correction factors.

2.   If this is a real problem in the spring and fall,  perhaps there
     should be a correction factor.

3.   Determine if purge is higher during these times than on hot mid-
     summer days.  Estimate vapor generation during running conditions
     and diurnals under both conditions using actual temperatures and
     estimates of local RVP.   If spring/fall vapor generation (and thus
     purge) is relatively high,  then fuel RVP is likely an important
     factor to include in the analysis (along with temperature).   If
     both seasons show similar vapor generation levels,  retain data and
     correct for temperature.

4.   Temperature-correct the outliers to see if that gives more
     realistic results.

5.   Data rejection should be based on a combination of statistical
     plus engineering analysis.   The procedure of rejecting all data
     when performance problems are suspected is a good one.   The long-
     range goal of IM240/FTP correlation will probably require some
     temperature correction correlation.   The problem of excessive
     purge may be reduced with ORVR-sized canisters (or increased if
     the vehicle was just refueled).

6.   Develop seasonal emission factors (i.e.,  summer,  winter,
     spring/fall) based on temperatures and fuels reflective of those
     seasons.

7.   Set some  test temperature range for each RVP "season" within which
     data are  used for FTP correlations.

8.   Rather than deleting data out of hand,  compute the
     temperature/fuel impact and use the results to validate the
     performance of the model.

-------
                   Fuel/Temperature Adjustments


Because EPA wished to develop the IM240-to-FTP correlations based on
vehicles IM240 tested in a laboratory with Indolene, a method was needed
to account for the differences between the lane and the lab before the
correlation equations were applied to the Hammond lane IM240 data.  For
the Hammond database, it was felt that those differences were primarily
related to tank fuel versus Indolene and the temperature differences
occurring between the lane and the lab.   (However, a number of other
differences could also impact test variability between the lane and the
lab,  e.g.,  vehicle preconditioning procedures,  inconsistent dynamometer
settings,  how well the IM240 speed-time trace is followed, etc.)

The fuel/temperature adjustments prepared for MOBILE5  were based on a
subset of the Hammond vehicles that were tested at the lane on tank fuel
and at the lab on Indolene.  Adjustment factors were developed by season
(i.e., March-April,  May-June,  July-September, and October-February) and
the following emitter categories:

     •  Normal HC/CO - lane IM240 ≤ 1.64 g/mi HC and ≤ 13.6 g/mi CO,
     •  High HC/CO - lane IM240 > 1.64 g/mi HC or > 13.6 g/mi CO,
     •  Normal NOx - lane IM240 ≤ 2.0 g/mi NOx, and
     •  High NOx - lane IM240 > 2.0 g/mi NOx.

Once the data were segregated as outlined above,  the mean emission
levels for the lane/tank fuel scores  and the lab/Indolene scores  were
determined.   Adjustment factors were  then developed from the ratio of
these mean values.
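
The emitter classification and the ratio-of-means adjustment can be
sketched as follows.  The cutpoints are those listed above; everything
else is assumed for illustration.  In particular, the text does not
state the direction of the ratio, so the factor below is taken as the
lab/Indolene mean divided by the lane/tank fuel mean, to be multiplied
by a lane score within one season and emitter category.  The scores are
hypothetical.

```python
HC_CUT, CO_CUT, NOX_CUT = 1.64, 13.6, 2.0  # g/mi cutpoints on the lane IM240


def hc_co_category(hc, co):
    """Normal only if both HC and CO are at or below their cutpoints."""
    return "normal" if (hc <= HC_CUT and co <= CO_CUT) else "high"


def nox_category(nox):
    return "normal" if nox <= NOX_CUT else "high"


def adjustment_factor(lane_scores, lab_scores):
    """Ratio of mean lab/Indolene score to mean lane/tank fuel score (assumed direction)."""
    lane_mean = sum(lane_scores) / len(lane_scores)
    lab_mean = sum(lab_scores) / len(lab_scores)
    return lab_mean / lane_mean


# Hypothetical paired HC scores for one season and emitter category.
lane_hc = [2.0, 4.0]
lab_hc = [1.5, 3.0]
factor = adjustment_factor(lane_hc, lab_hc)
adjusted = [factor * score for score in lane_hc]
print(hc_co_category(1.2, 15.0), nox_category(2.0), factor, adjusted)
```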

Strengths:

1.    Since MOBILE adjusts for fuel and temperature separately,  the base
     emission rates must be adjusted  to FTP conditions.   This approach
     is simple and easy to understand.

2.    At least some accounting for major differences.

3.    Only compares large sets of data.   Test-to-test and vehicle-to-
     vehicle variability is reduced.

4.    Any adjustment that accounts for variation due to external factors
     is helpful in the overall correlation.

5.    Accounts for differences between the lane and the lab.

6.    Some accounting for seasonal impacts.


Weaknesses:

1.    The two-step adjustment adds uncertainty.   A simple adjustment may
     not be appropriate.  The emission groupings were not chosen for
     best results.  No technology groupings.

2.   Two-variable analysis may explain only part of the difference on
     specific vehicles.

3.   Includes possible offset in lab and lane measurements.  Merges
     fuel and temperature effects, when temperature is known and fuel
     specifications  (at  the  lane) are not.  Fuel effects are
     sufficiently difficult  to assess in controlled experiments with
     multiple tests on repeatable vehicles.  Probably impossible to
     determine under the test conditions existing here.

4.   One set of factors is used to correct for fuel/temperature when
     going from lane IM240 to lab IM240, and a different set of factors
     when going from lab FTP to real-world FTP.  Shouldn't factors be
     similar to fuel/temperature factors for FTP bag 3?

5.   Was this part of a comprehensive study to determine the effects of
     different external variables?

6.   Data from Hammond were extremely variable and not all of that
     variability could be reasonably explained by temperature and fuel
     effects (e.g., for 20% of the vehicles, the lane/tank fuel and
     lab/Indolene IM240 results for HC and/or CO differed by more than
     a factor of 3).

7.   These adjustments appear trivial considering a) cloning of foreign
     vehicles,  b)  using an average "X" in the regression equation,
     c) the "X" is of questionable merit,  d)  log space was used,  and
     e) residuals are applied.

8.   Not clear how the other differences - vehicle preconditioning,
     etc.  - are accounted for when analyzing test results.   Also,
     average temperature may mask the effect of unusual swings.


Alternatives:

1.   Fuel samples would aid in the adjustment and allow comparisons
     between fuels and temperature.

2.   If sample size allows, choosing only records that are similar to
     FTP conditions may make the adjustment less important.

3.   Possibly develop new emitter groupings or technology groupings.

4.   Develop a more sophisticated multivariate analysis of differences
     using an engineering model.

5.   Quantify temperature effect independently from fuel effect  (i.e.,
     IM240 versus temperature correlation).   Use measured ambient
     temperatures at the time of the IM240 test.  Do not segregate by
     season, given use of actual test temperature.  Fuel-related
     reasons for segregating by season appear to be weak.  While
     volatility changes with season, so does average temperature, in a
     compensating manner.  Largest volatility-related effects will be
     on cold/hot days within a season,  not across seasons.  Given no
     knowledge of the lane fuel parameters,  the fuel effect will be
     part of the constant in the temperature regression.  If, on
     average, in-use fuel is somewhat  "dirtier" than Indolene, then the
     lane emissions will generally be  higher than the lab measurements,
     temperature effects aside.

6.   Should try to do a statistical analysis to determine the
     significance of all possible external variables on the final
     correlation, then account for those variables with
     statistically significant impacts.

7.   Correlate FTP directly with lane  IM240 scores,  using only those
     conditions  (i.e.,  temperature and fuel) that reasonably match the
     FTP.

8.   Some accounting for inconsistent preconditioning should be
     considered.   For example,  look at the IM240 bag 1 vs bag 2 scores
     and possibly delete record if difference is outside a pre-
     determined window.  Alternatively, compare lane bag results to lab
     bag results.  If the difference is large in bag 1 but not bag 2,  a
     preconditioning problem could have existed.

9.   Use MOBILE temperature and RVP correction factors (for bag 3 or a
     combination of bags 2 and 3)  to adjust the lane scores to an FTP
     temperature and Indolene basis,  or at least use this information
     as a reality check on the factors developed with the test data.
     The MOBILE approach (or similar temperature/fuel factors developed
     specifically from IM240 tests)  could possibly also be used in a
     state-based IM240-to-FTP analysis.  (It is unlikely that states
     would have the resources to run the lab/Indolene IM240s for
     generating their own fuel/temperature corrections.)

10.   Split the data into more temperature-specific regimes based on
     values recorded each day (i.e.,  look at weather data) and base the
     corrections on the temperature regimes.

-------
                     IM240-to-FTP Correlations


Once the Hammond lane IM240 data were adjusted to a lab/Indolene basis,
correlation equations relating the  IM240 to  the FTP  were applied to  the
data.  The IM240-to-FTP correlations were based on a regression analysis
of data collected from vehicles tested over  the IM240 on Indolene and
the FTP on Indolene.  (The database used for the correlation analysis
included vehicles from the Hammond program as well as vehicles tested in
Ann Arbor.)  The regressions were performed  according to the following
model year groups and technology types:

     •  1981-1982,
     •  1981+ open-loop,
     •  1983+ carbureted/closed-loop,
     •  1983+ throttle-body injection/closed-loop, and
     •  1983+ multipoint fuel-injection/closed-loop.


The HC and CO correlations were performed in log space with a cold start
offset ("X" in the equation below)  that varied  by technology, while  the
NOx correlations were based on a simple linear  equation without a cold
start offset value:

     •  Log10(FTP_{HC/CO} - X) = b + m*Log10(IM240_{HC/CO})
     •  FTP_{NOx} = b + m*IM240_{NOx}


For cases in which (FTP_{HC/CO} - X) < 0.01, the IM240 score was substituted
for (FTP_{HC/CO} - X).  In this way, errors resulting from taking the
logarithm of a negative number were avoided.  (A discussion of the cold
start offset is included in the next section.)
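
To make the two functional forms concrete, the sketch below applies
them to an IM240 score and shows the substitution rule used when
forming the dependent variable for the HC and CO regressions.  The
coefficients b, m, and X shown are hypothetical placeholders, not the
values developed for MOBILE5, and the function names are illustrative
only.

```python
import math


def predict_ftp_hc_co(im240, b, m, x):
    """Invert Log10(FTP - X) = b + m*Log10(IM240) to obtain the predicted FTP."""
    return x + 10.0 ** (b + m * math.log10(im240))


def predict_ftp_nox(im240, b, m):
    """Linear NOx form with no cold start offset."""
    return b + m * im240


def regression_target(ftp, im240, x):
    """Dependent variable for the HC/CO fit, with the substitution rule applied."""
    diff = ftp - x
    if diff < 0.01:
        diff = im240  # substitute the IM240 score to avoid the log of a non-positive value
    return math.log10(diff)


# Hypothetical coefficients (illustration only).
b, m, x = 0.0, 1.0, 0.5
print(predict_ftp_hc_co(2.0, b, m, x))    # with b=0 and m=1 this reduces to X + IM240
print(predict_ftp_nox(1.2, 0.1, 0.9))
print(regression_target(0.3, 0.2, x))     # FTP - X is negative, so IM240 is used
```

Written out, the log-space form is equivalent to
FTP = X + 10^b * IM240^m, so the offset X enters additively while the
IM240 score enters as a power function.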

Strengths:

1.   For an average,  these correlations should  provide a good estimate
     of average FTP emissions.

2.   Cold start emission excess could be unrelated to hot start
     emissions.   Any relationship between hot and cold emissions will
     automatically be included in the slope.

3.   Use of different technology group regressions for limited number
     of groups is a good balance between sample size and accounting  for
     different vehicles.

4.   This is a relatively straightforward and easy technique.

5.   It's slick and simplifies the use of the data.

Weaknesses:

1.   To  the extent  that  individual predicted  FTP values  are used,  these
     correlations are only good  for averages.  These  technology
     groupings were not  chosen for the best correlations.  One fit was
     used  for all emitter groups.

2.   Non-linear relationships were not investigated.

3.   Unclear what analyses were  performed  to  decide on logarithmic
     relationship for HC and CO  and the  linear relationship for NOx.

4.   The log-based equation is equivalent to:

        FTP_{HC/CO} = X + 10^b * [IM240_{HC/CO}]^m

     Is this a realistic regression?   (For b=0 and m=1,  it gives a
     simple regression.)

5.   Has it been established that disaggregation by technology
     groupings is justified?

6.   There is really no connection between the IM240 and cold start.
     Cold start should be directly calculated from FTP data.

7.   This method implies that the IM240  is being defined as equivalent
     to a  "no-start" FTP, and there is no basis for this.  The fact
     that FTP-X can be negative bears this out.

8.   Calculating X from mean[FTP - IM240] implicitly assumes that the
     IM240 is equal to a "hot FTP."  Is  this reasonable?

9.   It is not at all clear that X does  a good job of accounting for
     the cold start offset.   The fact that there were problems with
     negative numbers suggests that it did not.


Alternatives:

1.   Develop multiple correlations separately for emitter groups,
     possibly for new technology groupings.

2.   Explore different equational forms, but it is unclear that
     statistics would improve.

3.   Perform both log and linear regressions and examine the variance
     about the regression line.  The approach that shows a variance
     that is constant and randomly distributed about the regression
     line regardless of IM240 level is preferred, regardless of the
     correlation coefficient.  If a log function is still preferred for
     HC and CO, then switch to the more complex approach.

4.   Regress individual FTP bag data against IM240 level.  Use of  "X"
     should not be necessary for bags 2 and 3,  and may not be necessary
     for bag 1.  Again, be sure that the assumed functional
     relationship meets the basic assumptions necessary for performing
     a regression.

5.   Manufacturers have claimed that catalyst washcoat technology was
     significantly improved in the latter 1980s.  The analysis should
     explore another major model-year group.

6.   Try different regression formulae and pick the one with the best
     statistics.

7.   Get rid of the cold start offset in the IM240 correlation and only
     use the IM240 to predict bag 2 and/or bag 3 (or,  alternatively,  a
     "Hot FTP", i.e.,  [0.521*Bag 2 + 0.479*Bag 3]  = b + m*IM240).  The
     cold start offset could then be calculated from available FTP
     data,  with consideration for emitter groups.

8.   Focus on the relationship between IM240 and bags 2 and 3.
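
The "Hot FTP" composite suggested in the alternatives above can be
illustrated with a short sketch.  The bag weights 0.521 and 0.479 are
taken from that suggestion; defining the cold start offset as the
measured composite FTP minus the Hot FTP is one possible reading of
"calculated from available FTP data" and is an assumption, as are the
function names and the bag values.

```python
def hot_ftp(bag2, bag3):
    """Composite of the stabilized (bag 2) and hot start (bag 3) results."""
    return 0.521 * bag2 + 0.479 * bag3


def cold_start_offset(ftp, bag2, bag3):
    """Assumed definition: measured composite FTP minus the Hot FTP."""
    return ftp - hot_ftp(bag2, bag3)


# Hypothetical g/mi results for one vehicle (illustration only).
ftp_composite, bag2, bag3 = 1.8, 1.0, 2.0
print(hot_ftp(bag2, bag3), cold_start_offset(ftp_composite, bag2, bag3))
```

With this split, the IM240 would only need to predict the Hot FTP, and
the offset would come entirely from FTP bag data.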

-------
                      Correlation Adjustments


When the correlation equations were applied to the lane IM240 scores
(which had been corrected to a lab/Indolene basis), two additional
adjustments were made.  First, the cold start offset was assumed to be a
function of vehicle odometer reading (although the correlations were
performed with a constant X value), and second, regression residuals
were randomly applied to each data point.

Cold  Start Offset - The cold start offset  (X) values used in the above
correlation equations were developed, by technology group, from the mean
value of the difference between the FTP and  the IM240 for normal
emitters with FTP values greater than the IM240 (i.e., the value of
(FTP - IM240) was determined for each normal emitter, and the mean of the
positive results was used as X).  When the correlation equations were
applied to the IM240  data, the value of X was adjusted to account for
the effects of aging  and mileage.  The way that this adjustment was
developed for 1983+ model years is described below.   (A  slightly
different procedure was used for 1981-1982 model year vehicles.)

The value of X in the correlation equations reflects the cold  start
offset at the mean mileage of the correlation sample.  At mileages less
than this mean,  it follows that X should be decreased by some  amount to
account for the fact  that the catalyst has been aged less and  is
expected to be more active.  (Conversely, X should be increased at
mileages above the mean.)   Thus,  the cold start offset is actually X
plus an increment that is a function of vehicle odometer, i.e.,

     X-Offset Function = f(x) = X + f(Odometer)

EPA has defined f(Odometer) in the above equation to be "the difference
of the model year means regression for normal emitters and a 'New' line
created by connecting a point on the model year means' regression line
at the mean mileage of the correlation sample with the zero mile level
used in MOBILE4.1."  The X-offset function is therefore:

     f(x) = X + ZML_{MOBILE4.1} - ZML_{MY Means} + ODOM * (DET_{New} - DET_{MY Means})
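
A small numerical sketch may help in reading the X-offset function.
All values below are hypothetical, odometer is assumed to be in units
of 10,000 miles, and the slope of the "New" line is derived from the
verbal definition quoted above (a line through the MOBILE4.1 zero-mile
level and the point on the model year means regression at the mean
mileage of the correlation sample).

```python
def new_line_slope(zml_mobile41, zml_my_means, det_my_means, mean_odometer):
    """Slope of the 'New' line joining the MOBILE4.1 ZML to the MY-means line
    evaluated at the mean mileage of the correlation sample."""
    y_at_mean = zml_my_means + det_my_means * mean_odometer
    return (y_at_mean - zml_mobile41) / mean_odometer


def x_offset(x, odometer, zml_mobile41, zml_my_means, det_new, det_my_means):
    """X plus the odometer-dependent increment defined by the X-offset function."""
    return x + (zml_mobile41 - zml_my_means) + odometer * (det_new - det_my_means)


# Hypothetical inputs (illustration only).
x = 0.30
zml_mobile41, zml_my_means, det_my_means = 0.20, 0.40, 0.05
mean_odo = 5.0  # mean mileage of the correlation sample, in 10,000-mile units

det_new = new_line_slope(zml_mobile41, zml_my_means, det_my_means, mean_odo)
for odo in (0.0, 5.0, 10.0):
    print(odo, x_offset(x, odo, zml_mobile41, zml_my_means, det_new, det_my_means))
```

By construction the increment is zero at the mean mileage of the
correlation sample, so the offset equals X there; with these inputs it
is smaller than X at lower mileage and larger at higher mileage.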
Strengths:

1.   Simple in concept and allows IM240 data to directly replace FTP
     data in the TECH model.

2.   Will yield directionally consistent results.

3.   Accounts for cold start emissions which would not be measured in
     IM240 (unless IM240 vehicle is not warm).

4.   Calculates a cold start offset that is a function of vehicle
     age/mileage.


Weaknesses:

1.   Clumsy handling.  May not reflect the  "true" effects of cold
     starts (and hot starts).

2.   Why is MOBILE4.1 used as the  "gold" standard?

3.   Creates a mileage effect based on the results of two unrelated
     analyses.  The effect is then extrapolated far beyond the mileage
     at which data have been collected.

4.   Looks pretty hokey.

5.   Hard to tell.  What do statistics for measured versus computed
     cold start FTP emissions look like?

6.   Really odd way to perform this adjustment.  Why was MOBILE4.1
     brought into this analysis at all?

7.   It is unclear as to why X is defined as being independent of
     emissions (i.e., it's based on normals) but varies with
     age/odometer.  Isn't age important only because emissions
     deteriorate accordingly?

8.   Continues to use "X",  which is a poor surrogate for cold start.


Alternatives:

1.   Use IM240 data only to estimate non-start emission rates.  Use
     other data for start emissions directly.

2.   First determine whether cold start emissions are related to hot
     start emissions (e.g., regress bag 1 versus IM240).   If such a
     relationship exists, then it can be determined directly from the
     regression.   If not, cold start emissions must be determined from
     bag 1 data.   IM240 data should not be used, nor should estimated
     FTP data from IM240 data.  The IM240/FTP relationship is too
     uncertain,  and clearly produces higher in-use FTP estimates versus
     previous FTP measurements (i.e., MOBILE4.1).  The slope of cold
     start emissions versus mileage would be due in part to the change
     in methodology and overestimate of the mileage effect.

3.   Use IM240 to correlate with bag 2/bag 3 of the FTP.   Develop
     separate correlation between bag 1 and bag 2/bag 3 using only FTP
     data.

4.   Reexamine from scratch other possible adjustments (e.g.,
     multiplicative versus additive).  Look at obtaining actual data on
     cold start offset versus odometer as opposed to MOBILE4.1
     correlation.

5.   Develop a cold start offset that is entirely separate from the
     IM240 data.   Using FTP data, this could be done in a number of
     different ways.

6.   Low-mileage cold start offset can be determined from bag FTP
     results or from new car FTP versus IM240 tests.


Regression Residuals - Another adjustment made during the application of
the correlation equations was the addition of randomized regression
residuals,  i.e.,

     Log10(FTP_{HC/CO} - X) = b + m*Log10(IM240_{HC/CO}) + res
     FTP_{NOx} = b + m*IM240_{NOx} + res

where "res" represents regression residuals from the correlation sample.
According to EPA,  adding the residuals randomly to the FTP emission
levels predicted by the correlation equations attempts to restore a
distribution of predicted FTP values for a given IM240 score.
Otherwise,  there will be a single predicted FTP value  for each IM240
score.  A distribution of predicted FTP scores and emission levels is
important for some analyses, such as the determination of I/M credits.
For example,  if residuals were not applied,  100% of the FTP emissions
from a certain emitter group could be identified on the basis of the
IM240 score.
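
The random application of residuals for HC and CO might be sketched as
shown below.  The coefficients and the pool of log-space residuals are
hypothetical, and drawing residuals with replacement from a seeded
generator is an assumption made for illustration; the text does not
describe how the random assignment was actually carried out.

```python
import math
import random


def predict_ftp_with_residual(im240, b, m, x, residuals, rng):
    """Predict FTP HC or CO from an IM240 score, adding one randomly drawn residual."""
    res = rng.choice(residuals)  # log-space residual from the correlation sample
    return x + 10.0 ** (b + m * math.log10(im240) + res)


# Hypothetical coefficients and residual pool (residuals sum to zero in log space).
b, m, x = 0.0, 1.0, 0.5
residuals = [-0.3, -0.1, 0.0, 0.1, 0.3]

rng = random.Random(1995)  # fixed seed so the same assignment can be reproduced
scores = [0.4, 0.4, 0.4, 0.4]  # identical IM240 scores
predicted = [predict_ftp_with_residual(s, b, m, x, residuals, rng) for s in scores]
print(predicted)
```

Identical IM240 scores can now map to different predicted FTP values,
which is the purpose of the adjustment.  Recording the seed (or the
residual assigned to each vehicle) is what would allow another analyst
to replicate the result.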

Strengths:

1.   Without some adjustment,  the individual predicted FTP values will
     tend to clump around the mean,  making any evaluations that depend
     on emission distributions (e.g.,  number of high emitters) suspect.
     Residuals are actual observed distribution effects.

2.   A relatively simple non-parametric way to model emission
     distributions.

3.   Agree with concept.

4.   Good for IM240 ID rates - not necessarily for FTP analysis.

5.   Introduces a "distribution" back into the data.

6.   Converting emission data from normal to log space, regressing in
     log space,  and then converting back to normal emissions tends to
     yield a lower average emission level when compared to a simple
     average of the original data.  This occurs because the average of
     the logarithms is akin to a geometric mean,  which is always lower
     than the arithmetic mean when some variability is present.  When
     the goal is to determine a relative (i.e.,  percent) change in
     emissions,  then this reduction in the mean is not a problem.
     However,  if the goal is to estimate absolute emission levels, then
     the reduction is a problem,  since the atmospheric impact is the
     average of the emissions,  not their logarithms.   In this case, EPA
     desired to develop absolute estimates of FTP emissions from IM240
     emissions.   EPA may have added the residuals in order to
     compensate for the inherent downward bias in the logarithmic
     analysis relative to the atmospheric effect.

7.   Simple to implement.
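
The downward bias of log-space averaging noted in the strengths above
can be checked with a few lines of Python.  The emission values are
hypothetical and the function names are illustrative only.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def log_space_mean(values):
    """Average the base-10 logarithms, then convert back (a geometric mean)."""
    mean_log = sum(math.log10(v) for v in values) / len(values)
    return 10.0 ** mean_log


# Hypothetical g/mi values with one high emitter (illustration only).
emissions = [0.2, 0.5, 1.0, 8.0]
print(arithmetic_mean(emissions), log_space_mean(emissions))
```

For any set of positive values that are not all equal, the log-space
mean is below the arithmetic mean, and the gap widens as the
distribution becomes more skewed by high emitters.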

 Weaknesses:

 1.    Since  the  residuals are randomly applied,  the analysis cannot be
      replicated without a mapping of which vehicles used which residual
      value.

 2.    Too dependent on  the individual points and character of the
      database used; could have problems with homoskedasticity.

 3.    Should add regression residual to value without cold start offset.
      Log space  could cause problems.

 4.    The residuals were developed in log space, so the sum of these
      residuals  (in log space) was zero.  However, when the antilog was
      taken  in the regression equation, it led  to a net increase in
      predicted  FTP emissions (relative to the  non-residual equation).
      This significantly influenced the fraction of normals and non-
      normals in the predicted FTP database (e.g., for the MPFI group,
      the fraction of normal emitters was 78.8%  with the residuals
      applied, and 90.2% without the residuals  applied).  Is this
      consistent with the relative difference between a linear
      regression and the log-space regression?

 5.    It is  not  clear that any analyses were done to demonstrate that
      application of residuals in fact yielded  results that matched the
      atmospheric impact of the original data.

 6.   Assuming this was done rigorously,  this is a good technique.
     However, it is compromised by the relatively large "X" effect,
     which  is not rigorous.

 7.   Not clear  that the use of the residuals in log space provides a
     representative distribution of predicted  FTP scores.  In fact,  it
     seems unlikely to do so.
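
The mechanism behind weakness 4 can be sketched in a few lines.  This is an illustration under stated assumptions, not a reconstruction of EPA's procedure: the residuals and the predicted HC level are hypothetical, and the residuals are applied deterministically rather than at random.  Only the 0.82 g/mi normal-emitter HC cutpoint is taken from this appendix.

```python
import math

# Hypothetical log-space residuals; like regression residuals, they sum to zero.
log_residuals = [-1.0, -0.5, 0.0, 0.5, 1.0]

# Hypothetical predicted FTP HC level (g/mi) shared by five vehicles.
predicted_hc = 0.70
HC_NORMAL_CUTPOINT = 0.82  # normal-emitter HC cutpoint quoted in this appendix

# Adding a residual in log space multiplies the prediction by exp(residual).
adjusted = [predicted_hc * math.exp(r) for r in log_residuals]

mean_without = predicted_hc
mean_with = sum(adjusted) / len(adjusted)

normals_without = sum(1 for _ in log_residuals if predicted_hc < HC_NORMAL_CUTPOINT)
normals_with = sum(1 for hc in adjusted if hc < HC_NORMAL_CUTPOINT)

print(f"mean HC without residuals = {mean_without:.3f} g/mi")
print(f"mean HC with residuals    = {mean_with:.3f} g/mi")
print(f"normals: {normals_without} of 5 without, {normals_with} of 5 with")
```

Because the exponential is convex, residuals that average zero in log space raise the mean after the antilog is taken, and vehicles near a cutpoint can move into a higher emitter category.
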
Alternatives:

1.   Rather than adjusting each predicted FTP value individually, a
     "probability" distribution might be developed which could be used
     to predict what portion of the fleet with a given IM240 score was
     above or below a given FTP score.

2.   Could use log-normal formulation or Weibull distribution, with x
     and a derived from data.  (Earlier suggested by EEA to EPA but
     rejected as being too complex.)

3.   Need to ask how this changes the overall distribution.  There is
     some initial distribution of IM240 scores.  Without the addition
     of the residuals this distribution will not change when FTP values
     are computed.  Do they change when residuals are added?  If so,
     are the results reasonable?

4.   This is a good idea, although it is unclear that it really needs
     to be done if all the database is used for is base emission rates.
     If it is desired to do this,  make sure that application of
     residuals does not unintentionally skew results.

5.    Compare the arithmetic means of the original FTP data and the
     estimated FTP levels using the IM240/FTP correlation.  Do this
     with and without the residuals added back in.   Use the technique
     that matches the original data best.

                             TECH5 Inputs
Once the Hammond data were converted to predicted FTP scores, the
results were used to develop inputs to the TECH5 model (i.e., emitter
category emission rates and growth functions).  The following emitter
categories were used in TECH5 for HC and CO emissions:

     •  Normal HC/CO - HC < 0.82 g/mi and CO < 10.2 g/mi,
     •  High HC/CO - HC > 0.82 g/mi or CO > 10.2 g/mi,
     •  Very High HC/CO - HC > 1.64 g/mi or CO > 13.6 g/mi, and
     •  Super HC/CO - HC > 10.0 g/mi or CO > 150.0 g/mi.


NOx emissions were analyzed separately from HC and CO, with only two
emitter categories being defined - normals (< 2.0 g/mi)  and highs (>
2.0 g/mi).
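
The emitter category definitions above can be expressed as a small classifier.  This is a minimal sketch, not TECH5 code: the function names are hypothetical, the categories are checked from most to least severe on the assumption that the nested cutpoints are meant to be resolved that way, and a value exactly on a cutpoint is assigned to the lower category because the text does not say how ties are handled.

```python
def classify_hc_co(hc, co):
    """Return the HC/CO emitter category for FTP HC and CO levels in g/mi."""
    # Assumption: the most severe matching category wins, since the
    # listed definitions overlap (e.g., every super is also a very high).
    if hc > 10.0 or co > 150.0:
        return "super"
    if hc > 1.64 or co > 13.6:
        return "very high"
    if hc > 0.82 or co > 10.2:
        return "high"
    return "normal"


def classify_nox(nox):
    """Return the NOx emitter category for an FTP NOx level in g/mi."""
    return "high" if nox > 2.0 else "normal"


# Hypothetical vehicles: (HC, CO, NOx) in g/mi.
for hc, co, nox in [(0.5, 5.0, 1.5), (0.9, 5.0, 2.5), (12.0, 5.0, 0.8)]:
    print(hc, co, nox, classify_hc_co(hc, co), classify_nox(nox))
```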

The data were also segregated by the following technology groups:

     •  open-loop,
     •  carbureted/closed-loop,
     •  throttle-body injection (TBI)/closed-loop, and
     •  multipoint fuel-injection (MPFI)/closed-loop.


Finally,  emission rates were determined separately for 1981-1982 model-
year vehicles and 1983+ model-year vehicles.

HC/CO Emission Rates - For HC and CO,  the emitter category emission
rates (i.e.,  zero-mile level (ZM)  and deterioration rates (DRs))  were
constructed as follows:

1.   MOBILE4.1 ZMs were used for 1981-82  normals.
2.   1983+  DRs were used for 1981-82 normals,  highs,  and very highs.
3.   Emission rates of normals were capped at the same rate for 1981-82
     and 1983+ groups.
4.   Normal caps were set at the maximum of the 1981-82  or 1983+
     100,000-mile levels calculated from the 1981-82 and 1983+ ZM and
     DR for normal emitters.
5.   Deterioration rates that were negative and without  significance
     were assumed to be zero.
6.   Regression of carburetor very highs  was performed for 1983-1988
     model  years only (although the regression results were applied to
     all 1983+ carbureted vehicles).   Including 1989 resulted in a
     negative ZM.
7.   A covariance analysis was used for fuel-injected very highs that
     resulted in the same DR but different ZM levels for the 1981-82
     group  and the 1983+ group.   (This resulted in substantially higher
     HC and CO emission rates from the 1983+ group compared to the
     1981-82 group.)
8.   All model years were combined for supers.
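
Steps 3 through 5 can be sketched as follows.  This is an illustration only: the function names and all numeric values are hypothetical, and the DR is assumed to be expressed in g/mi per 10,000 miles, a unit the text above does not state.

```python
def effective_dr(dr, is_significant):
    """Step 5: treat a negative, non-significant DR as zero."""
    if dr < 0 and not is_significant:
        return 0.0
    return dr


def level_at(zm, dr, miles):
    """Linear emission level (g/mi); DR assumed per 10,000 miles."""
    return zm + dr * miles / 10_000


def normal_cap(zm_early, dr_early, zm_late, dr_late):
    """Step 4: cap is the larger of the two 100,000-mile levels."""
    return max(level_at(zm_early, dr_early, 100_000),
               level_at(zm_late, dr_late, 100_000))


def capped_rate(zm, dr, miles, cap):
    """Step 3: emission rate of normals, limited by the shared cap."""
    return min(level_at(zm, dr, miles), cap)


# Hypothetical ZM (g/mi) and DR values for 1981-82 and 1983+ normals.
cap = normal_cap(0.30, 0.05, 0.25, 0.05)
print(f"cap = {cap:.2f} g/mi")
print(f"rate at 150,000 miles = {capped_rate(0.30, 0.05, 150_000, cap):.2f} g/mi")
```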

Strengths:

 1.   Allows detailed evaluation of  the effects of high emitters and  the
     effect of control programs  (i.e., I/M).

 2.   Accounts for mileage  impact on emission rates.

 3.   Recognizes that technology changes/improvements  impact
     deterioration rates and creates a structure to account  for the
     effect.
Weaknesses:

1.   Emitter groups should be statistically chosen instead of based on
     emission standards.  Technology groupings need to be selected
     based on emission performance.  User input in MOBILE has no impact
     on the assumptions used for the base emission rates.

2.   Mix and match approach is not defensible.  Should use the same
     data set for all analyses of regime sizes and emission levels.

3.   Could double-count impact of emission deterioration and regime
     growth.

4.   Many of the assumptions on when technology changed appear
     arbitrary and do not account for differential performance that may
     occur within the defined groups (distributional effects).


Alternatives:

1.   Possibly develop new emitter groups and technology groups.

2.   Incorporate this function into the MOBILE code.

3.   Use more regimes so that emissions are not a function of odometer
     in a given regime.

4.   Use the same data set for all analyses.  In cases where data are
     sparse,  say so and do the best you can with what you've got.  In
     some cases, if you are slim on data it probably means there are
     not that many in the fleet (e.g.,  1981-82 MPFI vehicles)  so the
     impact on fleet-average emission estimates (which is ultimately
     what we are trying to figure out here)  is minimal.  On the other
     hand,  there probably were sufficient data for the 1981-82
     carbureted group to analyze by itself and not use the 1983+ DRs.
     If there was concern about the number of normals in this group
     (which was probably low,  given the fact the 1981-82 vehicles were
     10 years old when tested), why not pull in some of the FTP data
     from Ann Arbor testing to represent low-mileage normals?  It's
     really the emitter category growth functions that drive the
     deterioration rates anyway.

5.   Need to discuss too many issues, e.g., making FI DRs the same for
     different groups does not seem reasonable unless it can be proven
     that the assumption holds.


 NOx Emission  Rates  -  The following procedure was used  to develop NOx
 emitter category emission rates:

 1.    1981-82  model-year normals used the MOBILE4.1  ZM  and  the  DR was
      determined from  the mean  emission level and mileage of the Hammond
      sample.
 2.    1983+ model-year normals  used a covariance analysis that  forced
      the deterioration rates to be equal for vehicles  certified to  1.0
      and 0.7  g/mi NOx.   (This  resulted in different zero-mile  levels.)
 3.    High NOx emitters used DRs from the normal NOx emitters,  and the
      ZM levels  were back-calculated  from the mean emission level and
      mileage  of the Hammond sample.
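
Steps 1 and 3 both amount to drawing a straight line through the sample's mean point.  The sketch below shows the two back-calculations under stated assumptions: the function names and all numbers are hypothetical (they are not MOBILE4.1 or Hammond values), and the DR is assumed to be in g/mi per 10,000 miles.

```python
def dr_from_mean(mean_level, mean_miles, zm):
    """Step 1: DR implied by a fixed ZM and the sample mean point."""
    return (mean_level - zm) / (mean_miles / 10_000)


def zm_from_mean(mean_level, mean_miles, dr):
    """Step 3: ZM implied by a borrowed DR and the sample mean point."""
    return mean_level - dr * mean_miles / 10_000


# Hypothetical normals: ZM of 0.8 g/mi, mean of 1.2 g/mi at 80,000 miles.
dr_normal = dr_from_mean(1.2, 80_000, 0.8)

# Hypothetical highs: mean of 3.5 g/mi at 80,000 miles, DR taken from normals.
zm_high = zm_from_mean(3.5, 80_000, dr_normal)

print(f"normal DR = {dr_normal:.3f} g/mi per 10,000 miles")
print(f"high ZM   = {zm_high:.2f} g/mi")
```

With a small DR borrowed from the normals, nearly all of the highs' mean emission level is pushed into the zero-mile level.
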
Strengths:

1.   Using the ZM levels ties the results to actual FTP data.

2.   Covariance analysis can check the null hypothesis that DRs are the
     same for different standards.

3.   Simple approach.


Weaknesses:

1.   NOx emission estimates are significantly affected by several very
     high NOx emitters that are in the 8-12 g/mi range for IM240.  It
     is unclear if they received accompanying FTP tests.  Engine-out
     NOx for most of these vehicles should be in the 4-5 g/mi range.
     There has been no explanation as to why these were so high - there
     should be a review of the database to see if they were improperly
     tested at the I/M lane.

2.   Why not compute DRs for highs?  The method used could lead to
     unrealistically high zero-mile emissions for highs.

3.   The assumptions drive the emission estimates and it is not clear
     how well they represent real-world occurrences - how is this
     validated?


Alternatives:

1.   Accept hypothesis that NOx emission DFs may be zero or negative?

2.   Compute DFs for each group as shown in the data.

Growth Functions - As important as the emission level of each emitter
category are the growth functions assigned to those categories.  For
MOBILE5, EPA wanted to base emission control system deterioration on
both vehicle age and mileage.  This was done by using data from the 1987
and later model years to establish the growth rate of non-normals (i.e.,
highs + very highs + supers) for mileages less than 50,000.  For
mileages above 50,000, data from the 1981-86 model years were used for
the TBI and carbureted technology groups, while data from 1984-86 model
years were used for MPFI vehicles.  (EPA judged that pre-1984 MPFI
represented "prototype" technology.)

The method used to establish  the emitter category growth  rates was based
on first  developing  growth rates for the following emitter  groups:

     •  supers,
     •  very  highs + supers,  and
     •  highs + very highs +  supers.


Once these were established,  individual emitter  category  growth rates
were determined by subtraction.
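
The subtraction step can be sketched as below.  The function name is hypothetical and the three cumulative growth rates passed in are made-up illustrative values, expressed per 10,000 miles.

```python
def category_rates(supers, very_highs_plus_supers, all_non_normals):
    """Split cumulative growth rates into individual category rates."""
    return {
        "super": supers,
        "very high": very_highs_plus_supers - supers,
        "high": all_non_normals - very_highs_plus_supers,
    }


# Hypothetical cumulative growth rates (fraction of fleet per 10,000 miles).
rates = category_rates(0.005, 0.030, 0.060)
for name, rate in rates.items():
    print(f"{name}: {rate:.3f} per 10,000 miles")
```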

The analytical technique used to develop the growth functions for each
of the above groups  is best explained with an example.   For the MPFI
very highs+supers group, the following process was used.  First, the
<50,000 mile growth  rate was established by determining the fraction of
very highs+supers from all 1987+ MPFI data.   In  the Hammond sample,
there were 155 very  highs+supers out of 1,716 total vehicles in this
group (i.e.,  9.03%).   This fraction was then divided by the average
mileage of the group (28,182)  to obtain a growth rate of  0.03205/10,000
miles.   The growth rate beyond 50,000 was calculated by first
determining the fraction of very highs+supers in the 1984-1986 model
year group (138/460,  or 30.0%) and the average mileage of that group
(68,464).  The second growth rate was then calculated by linear
extrapolation of a line connecting the fraction  of very highs+supers at
50,000 miles  (i.e.,  5*0.03205, or 16.0%) and the point established from
the >50,000 1984-86  group (i.e.,  0.300 at 68,464 miles).  This resulted
in a >50,000 growth  rate of 0.07568/10,000 miles.
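
The arithmetic in this example can be checked with a short script.  The counts and mean mileages are those quoted in the paragraph above; the function name and its parameter names are hypothetical, and the sketch assumes, as the example does, that the first segment starts from zero non-normals at zero miles.

```python
def two_segment_growth(n_low, total_low, mean_miles_low,
                       n_high, total_high, mean_miles_high,
                       break_miles=50_000):
    """Return (rate below break, rate above break), each per 10,000 miles."""
    # First segment: line from zero at zero miles through the low-mileage group.
    rate_low = (n_low / total_low) / (mean_miles_low / 10_000)
    # Fraction implied at the break point by the first segment.
    frac_at_break = rate_low * break_miles / 10_000
    # Second segment: line from the break point through the high-mileage group.
    rate_high = ((n_high / total_high) - frac_at_break) / (
        (mean_miles_high - break_miles) / 10_000)
    return rate_low, rate_high


# Counts and mean mileages quoted in the text for MPFI very highs+supers.
rate_low, rate_high = two_segment_growth(155, 1_716, 28_182, 138, 460, 68_464)
print(f"<50,000 miles: {rate_low:.5f} per 10,000 miles")
print(f">50,000 miles: {rate_high:.5f} per 10,000 miles")
```

Rounded to five decimal places, the two rates agree with the 0.03205 and 0.07568 values given in the text.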

Strengths:

1.   Simple,  yet accounts for non-linearity.

2.   Straightforward technique, but it is unclear how this  is a
     function of both age and mileage.

3.   It gets numbers that can be used in the model.

4.   It's an easy procedure and gives the 50,000-mile kink  that has
     been assumed for years.

5.   Simple to do.

Weaknesses:

1.   Does not account  for  "shape" of curve at very high mileages.

2.   This technique is too sensitive to post-50,000 mile sample
     distribution.

3.   The 50,000-mile break point is essentially arbitrary.   (The only
     objective reason  to choose it is that it represents the
     certification mileage.)  Because of relatively low mileages of
     samples, calculated slopes are extrapolated far beyond the range
     of the data and can drive important policy decisions.  The second
     slope is likely overestimated based on the assumption of zero high
     emitters at zero miles.  This assumes that there is a significant
     number of high emitters at 1,000 miles and minimizes the projected
     number of high emitters at 50,000 miles.  This latter fact in turn
     maximizes the second slope.

4.   It does not account for variation in growth rate or zero
     population at zero miles.

5.   The 50,000-mile kink is not supported by the data.   This method
     artificially inflates the second DR.

6.   Linear growth rate is not obvious;  it's more likely to tail off at
     high mileage.  The method has not been validated and it is
     extremely sensitive to the two bins under and over 50,000 miles.
Alternatives:

1.   Cap number of highs, very highs, and supers?  Add third linear
     growth beyond 100,000 miles?  Non-linear fit?

2.   Statistical analysis of p(high,super) in 10,000-mile bins.

3.   Break data into small, but meaningful mileage increments  (e.g.,
     10,000 mile increments).  Plot fractions of high emitters and
     emissions of normals and highs.  Perform linear and non-linear
     regressions to determine if the slope at higher mileage is
     increasing or decreasing and, if so, whether the effect is
     statistically significant.

4.   Break data into 10,000-mile bins (at least up to 100K),  and
     determine the fraction of each emitter category in each bin.  Use
     a regression analysis to develop emitter category growth
     functions.

5.   Break the data into more bins and track the growth rate and
     revisit the assumptions about the model year groups included in
     the analysis (i.e., it seems to mix mileage and model year when it
     should just be mileage).
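
The binning approach suggested in alternatives 2 through 5 might look like the sketch below.  It is one possible reading, not a method from the report: the function names are hypothetical, the records are made-up, and the straight-line fit is unweighted (it ignores how many vehicles fall in each bin).

```python
from collections import defaultdict


def bin_fractions(records, bin_width=10_000):
    """Return [(bin midpoint in miles, fraction of high emitters), ...].

    records is an iterable of (odometer_miles, is_high_emitter) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # bin index -> [highs, total]
    for miles, is_high in records:
        idx = int(miles // bin_width)
        counts[idx][0] += int(is_high)
        counts[idx][1] += 1
    return [((idx + 0.5) * bin_width, highs / total)
            for idx, (highs, total) in sorted(counts.items())]


def least_squares(points):
    """Unweighted straight-line fit; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical (odometer, is_high_emitter) records.
records = [(5_000, False), (8_000, False), (12_000, False),
           (18_000, True), (22_000, True), (27_000, True)]
points = bin_fractions(records)
slope, intercept = least_squares(points)
print(f"growth rate = {slope * 10_000:.3f} per 10,000 miles")
```

Fitting the binned fractions directly avoids fixing a break point in advance; whether a change in slope near 50,000 miles is real could then be judged from the fit rather than assumed.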
