EPA
  United States Environmental Protection Agency
  Office of Transportation and Air Quality
  EPA420-B-04-010
  July 2004

            Guidance on Use of Remote
            Sensing for Evaluation of
            I/M Program Performance

                    Certification and Compliance Division
                   Office of Transportation and Air Quality
                    U.S. Environmental Protection Agency
                               NOTICE
  This Technical Report does not necessarily represent final EPA decisions or positions.
It is intended to present technical analysis of issues using data that are currently available.
       The purpose in the release of such reports is to facilitate an exchange of
       technical information and to inform the public of technical developments.

1. Introduction
2. Background
  2.1. History of I/M
3. General Approaches to I/M program evaluation
  3.1 Defining Program Evaluation
  3.2. On-Road Data Analysis
  3.4. Three RSD Program Evaluation Methods
4. Equipment Specifications and Measurement Procedures
  4.1. The Remote Sensing System
  4.2. Theory of Operation
  4.3. Operation
  4.4. Operational Difficulties
     4.4.1. Signal/Noise Considerations
     4.4.2. Weather
     4.4.3. Interference
     4.4.4. Optical Alignment
     4.4.5. Emissions Variability
  4.5. Instruments
     4.5.1. Calibration Checks
     4.5.2. Other Instrument Parameters
  4.6. Site Description
  4.7. Measurements
     4.7.1. Data Collection
     4.7.2. Multiple Measurements
     4.7.3. Operators
  4.8. Database Format
  4.9. Department of Motor Vehicle Data
  4.10. Note Any Changes that Could Affect the Analysis
5. Design Parameters and Quality Assurance/Quality Control Protocols
  5.1. Overview
  5.2. Vehicle Population
  5.3. Vehicle Loads
  5.4. Vehicle Identification
  5.5. Instrument Calibration
  5.6. Measurement Methods
  5.7. Socioeconomics
  5.8. Seasonal Effects
  5.9. Program Avoidance
  5.10. Regional Differences (policies, environment, fuel composition, etc.)
  5.11. Program Details
  5.12. Emissions Distributions
6. Evaluation Methods
  6.1. Step Change Method
     6.1.1. Description
     6.1.2. Application Examples

FINAL                                                                          - 2 -

-------
     6.1.3. Potential Systematic Errors
  6.2. Comprehensive Method
     6.2.1. Description
     6.2.2. Application Examples
     6.2.3. Steps
     6.2.4. Advantages/Disadvantages
     6.2.5. Potential Systematic Errors
  6.3. Reference Method
     6.3.1. Description
     6.3.2. Application Examples
     6.3.3. Applying the Method
7. Summary
8. References
Appendix A: On-Road Evaluation of a Remote Sensing Unit

1. Introduction
This document is intended to provide guidance for performing I/M program evaluations using a
Remote Sensing Device (RSD). Section 2 provides background on EPA regulation of state I/M
programs and a history of the methods used to evaluate these programs. Section 3 describes
different approaches to evaluating I/M programs, using roadside pullover data or independent
remote sensing measurements. Section 4 summarizes equipment specifications and measurement
procedures, while Section 5 outlines important design parameters for the collection and analysis
of RSD program evaluation data. Section 5 also discusses quality control issues that should be
considered in an evaluation. Section 6 describes in detail three alternative methods to perform
short-term evaluations of I/M programs using remote sensing data and discusses the advantages
and disadvantages of each. The three methods are the Step Change, the Comprehensive, and the
Reference analysis methods. How in-program data can be used to evaluate the long-term,
cumulative effect of I/M programs is covered in a separate document. Appendix A contains some
simple trouble-shooting methods that can be applied in the field as a first check to determine
whether an RSD unit is functioning properly.

It is strongly recommended that any state considering the use of RSD for  program evaluation
purposes work closely with their respective regional EPA office and the Office of Transportation
and Air Quality to  ensure the most up-to-date  practices are incorporated into  the evaluation.
Furthermore, states  interested  in using RSD for program evaluation must  recognize the need
within their own agencies to develop a  minimum  level of expertise with the technology and
procedures to ensure reliable data are collected and analyses performed.

It should also be recognized, given the difficulties associated with I/M program evaluations, that
an evaluation based on both out-of-program data (e.g., RSD) and in-program data will provide a
more accurate estimate of overall program performance than relying on one method alone.

2. Background

2.1. History of I/M

The  Environmental Protection Agency (EPA) has  had oversight and  policy  development
responsibility for vehicle inspection and maintenance (I/M)  programs since the passage of the
Clean Air Act (CAA) in 1970 (1)*, which included I/M as an option for improving air quality.
The first I/M program was implemented in New Jersey in 1974 and consisted of an annual idle
test of 1968 and newer light-duty gasoline-powered vehicles conducted at a centralized facility.
No tampering checks were performed and no repair waivers were allowed.

I/M was first mandated for areas with long-term air quality problems beginning with the Clean
Air Act Amendments of 1977 (2). EPA issued its first guidance for such programs in 1978 (3);
this guidance addressed State Implementation Plan (SIP) elements such as minimum emission
reduction requirements,  administrative requirements, and  implementation  schedules.   This
original I/M  guidance was quite broad and  difficult to enforce, given EPA's lack of legal
authority to establish minimum,  Federal, I/M implementation requirements.   This lack of
regulatory authority — and the state-to-state inconsistency with regard to I/M program design that
resulted from it — was cited in audits of EPA's oversight of the I/M requirement conducted by
both the Agency's own Inspector General and the General Accounting Office.

In response to the above-cited deficiencies, the 1990 Amendments to the Clean Air Act (CAAA)
(4) were much more prescriptive with regard to I/M requirements while also expanding I/M's role
as an attainment strategy. The CAAA required EPA to develop Federally enforceable guidance
for two levels of I/M program: "basic" I/M for areas designated as moderate non-attainment, and
"enhanced" I/M for serious and worse non-attainment areas, as well as for areas within an Ozone
Transport Region  (OTR), regardless  of attainment  status.   This guidance  was to  include
minimum performance standards for basic and enhanced I/M programs and was also to address a
range of program implementation issues, such as network design, test procedures, oversight and
enforcement requirements, waivers,  funding, etc.  The CAAA further mandated that enhanced
I/M programs were to be: annual (unless biennial was proven to be equally effective), centralized
(unless  decentralized  was shown to be equally effective),  and enforced  through registration
denial (unless a pre-existing enforcement mechanism was shown to be more effective).

In response to the CAAA,  EPA published  its I/M rule  on November  5,  1992 (5),  which
established the minimum procedural and administrative  requirements to be  met by basic and
enhanced I/M programs.  This rule  also included  a performance standard for basic I/M based
upon the original New Jersey I/M program and a  separate performance standard for enhanced
I/M, based on the following program elements:

   •   Centralized, annual testing of MY 1968 and newer light-duty vehicles (LDVs) and
       light-duty trucks (LDTs) rated up to 8,500 pounds GVWR.

   •   Tailpipe test: MY1968-1980 - idle; MY1981-1985 - two-speed idle; MY1986 and newer
       - IM240.

   •   Evaporative system test: MY1983 and newer - pressure; MY1986 and newer - purge test.

   •   Visual inspection: MY1984 and newer - catalyst and fuel inlet restrictor.

* References are denoted by underlined italic numerals in parentheses and are listed in Section 8.

Note that the phrase "performance standard" used above was initially used in the CAA and is
misleading in that it more accurately describes program  design.  Adhering to the "performance
standard" does not guarantee an I/M  program will meet a specific level of emissions reductions.
Therefore, the performance standard is not what is required to be implemented; it is the bar
against which a program is to be compared.

At the time the I/M rule was published in 1992, the enhanced I/M performance standard was
projected to achieve a 28% reduction in volatile organic compounds (VOCs), a 31% reduction in
carbon monoxide (CO), and a 9% reduction in oxides of nitrogen (NOx) by the year 2000 from a
No-I/M fleet.   The basic I/M  performance standard, in turn, was projected to yield a  5%
reduction  in VOCs and 16%  reduction in CO.  These projections  were made based upon
computer simulations run using  1992 national default assumptions for vehicle age distributions,
mileage accumulation, fuel  composition,  etc.,  and  were performed  using the most  current
emission factor model then available for mobile sources, MOBILE 4.1.  That version of the
MOBILE model was the first to include a roughly 50% credit discount  for decentralized I/M
programs, based upon EPA's experience with the high degree of improper testing found in such
programs.  This discount was incorporated into the 1992 rule, and served to address the CAAA's
implicit requirement  that EPA distinguish between the  relative effectiveness of centralized
versus decentralized programs.

The CAAA also required that enhanced I/M programs include the use of on-road testing and that
they conduct evaluations of program  effectiveness biennially (though no explicit connection was
made  between these two requirements).  In establishing guidelines for the program evaluation
requirement, the 1992 I/M rule specified that enhanced I/M programs were to perform separate,
state-administered or observed IM240 tests on a random sample of 0.1% of the subject fleet in
support of the biennial evaluation.   Unfortunately, the program evaluation procedure  for
analyzing the 0.1% sample was never developed  with sufficient detail to actually be used by the
states.  In  defining the on-road testing requirement, the 1992 rule  required that an additional
0.5%  of the fleet  be tested using either remote  sensing  devices (RSD) or road-side pullovers.
Furthermore, the role that this additional testing was to play — i.e., whether it was to be used to
achieve emission reductions over and above those ordinarily achieved by the  program,  or
whether it could be used to aid in program evaluation — was never adequately addressed.

At the time the 1992 I/M rule was being promulgated, EPA was criticized for not considering
alternatives to the IM240. California in particular argued in favor of the Acceleration Simulation
Mode  (ASM)  test, a steady-state,  dynamometer-based  test developed  by  California,  Sierra
Research, and Southwest Research Institute.  In fact, this  test had been considered by EPA while
the I/M rule was under development, but the combination of IM240, purge, and pressure testing
was deemed sufficiently superior to the ASM that EPA dismissed ASM as an option for
enhanced I/M programs. Nevertheless, EPA continued to evaluate the ASM test in conjunction
with the State of California and by  early 1995, sufficient data had been generated to support
EPA's  recognizing  ASM as  an acceptable program element  for  meeting the  enhanced
performance standard.

In early 1995, when the ASM test was  first deemed an acceptable alternative to IM240,  the
presumptive, 50% discount for decentralized programs was still in place.  Even at  that time,
however, the practical  importance of the discount was waning, in large part due to program
flexibility  introduced by  EPA aimed at allowing enhanced I/M  areas to use their preferred
decentralized program designs.  This flexibility was  created by  replacing the single, enhanced
I/M performance standard with a total of three enhanced performance standards:

   * High Enhanced: Essentially the same as the enhanced I/M performance standard originally
     promulgated in 1992.

   * Low Enhanced: Essentially the basic I/M performance standard, but with light trucks and
     visual inspections added.  This standard was intended to apply  to those areas that could
     meet their other clean air requirements (i.e., 15%,  post-1996 ROP, attainment) without
     needing all the emission reduction credit generated by a high enhanced I/M program.

   * OTR Low Enhanced: Sub-basic. Intended to provide relief to those areas located inside the
     OTR which — if located anywhere else in the country — would not have to do I/M at all.

Despite the additional flexibility afforded enhanced  I/M areas by the new standards outlined
above, in November 1995 Congress passed and the President signed the National Highway
System Designation Act (NHSDA) (6), which included a provision that allowed decentralized
I/M  programs to claim  100%  of the  SIP credit that would  be allowed for an  otherwise
comparable centralized I/M program.  These credit claims were to be based upon a "good faith
estimate" of program effectiveness, and were to be substantiated with actual program data 18
months after approval.  The evaluation methodology to be used for this 18-month demonstration
was developed by the Environmental Council of the States (ECOS), though the criteria used were
primarily qualitative, as opposed to quantitative.  As a result, the ECOS criteria developed for the
18-month NHSDA evaluations were not deemed an adequate replacement for the biennial program
effectiveness evaluation required by the CAAA and the I/M rule.

In January 1998, EPA revised the I/M rule's original provisions for program evaluation by
removing the requirement that the evaluation be based on IM240 or some equivalent
mass-emission transient test (METT) and replacing it with the more flexible requirement that the
program evaluation methodology simply be "sound" (7).  In October 1998, EPA published a
guidance memorandum that outlined what the  Agency considered to be acceptable, "sound,"
alternative program  evaluation methods  (8).  All the methods approved  in the October 1998
guidance were based on tailpipe testing and required comparison to Arizona's enhanced I/M
program as a benchmark using a methodology developed by Sierra Research under contract to
EPA.  Even though EPA recognized that an RSD-based program evaluation method may be
possible,  a  court-ordered  deadline of October 30, 1998 for release of the guidance  prevented
EPA from approving an RSD-based approach at that time.

The focus of this document is to address EPA's concerns about RSD-based program
evaluation methods with regard to equipment specifications, site selection, and data collection, as
well as to outline and explain the advantages and limitations of each RSD analysis methodology.
As its operating premise, EPA  recognizes that every program  evaluation method will have its
limitations, regardless of whether it is based upon an RSD approach or more traditional, tailpipe-
based measurements.  Therefore, no particular program evaluation methodology is viewed as a
"golden standard." Ideally, each evaluation method would yield similar conclusions  regarding
program effectiveness, provided they were performed correctly. Unfortunately, it is unlikely we
will see such agreement among methods in actual practice, due to the likelihood that different
evaluation procedures will be biased toward different segments of the in-use fleet. Therefore, it
is conceivable that the most accurate assessment of I/M program effectiveness will result from
evaluations which combine multiple program evaluation methods.
3. General Approaches to I/M program evaluation

3.1 Defining Program Evaluation

Aside from the technical challenges involved in gathering I/M program evaluation data, there are
also subtleties regarding which data are necessary. The evaluation of Basic I/M programs is
strictly qualitative, as per standard SIP policy protocols used to evaluate stationary source
emission reductions.  Historically, these types of qualitative evaluations have included
verification of such parameters as waiver rates, compliance rates, and quality assurance/
quality control procedures, but they have not involved quantitative estimates of emission
reductions using in-program or out-of-program data.

The evaluation of Enhanced I/M programs is not as clearly defined and is left to the discretion of
the regional EPA office based on the data available. In some instances, it may be possible to
estimate the cumulative emission reductions; that is, the current fleet emissions are compared to
what that same fleet's emissions would be if no I/M program were in existence.  However, directly
measuring the fleet's emissions to determine the No-I/M baseline is not possible in an area that
has implemented an I/M program.  Therefore, in order to determine quantitatively whether the
level of SIP credit being claimed is being achieved in practice, it becomes necessary to rely on
modeling projections to estimate the No-I/M  fleet emissions or measure the emissions of a
surrogate fleet that is representative of the I/M fleet.  The RSD procedures outlined in this
guidance provide methods for estimating a fleet's No-I/M emissions using a surrogate fleet.

Two other analyses are also possible that  can  provide useful information  regarding  program
performance.  The first method may be thought of as "one-cycle" since it compares the current
I/M fleet emissions to the same I/M fleet's emissions from a previous year or cycle. An analysis
such as this would yield information with regard to how the program is improving or declining
from year to year.  The other method should be considered "incremental" in that it compares the
current I/M fleet's emissions to that same fleet's emissions while being subjected to a different
I/M program, for instance, comparing a fleet's emissions in an area that has just implemented an
IM240 program to that same fleet's emissions the previous year when a Basic Program was in
operation.  It should be noted that there is a small window of opportunity prior to and during the
start-up of any I/M program, or program change, to actually measure the fleet emissions that
would provide empirical data on the No-I/M fleet emissions.  If resources and time permit, it is
recommended that these baseline data be gathered in order to reduce I/M program evaluation
dependency on modeling projections and provide the most accurate measure of I/M program
performance.
3.2. On-Road Data Analysis

Remote sensing measurements can be used as a tool to help achieve the main goal of all I/M
programs, namely the reduction of on-road emissions. The general advantages of remote sensing
data are the following:

       i)      The testing is unscheduled and measures on-road emissions.
       ii)     A sample of all vehicles driving in an area can be tested.
       iii)    A very large sample of vehicles can be tested for a fraction of the cost of I/M lane
              testing.
       iv)    Vehicles can be tested over a range  of driving conditions, rather than merely the
              conditions specified in the I/M test*.
       v)     Vehicles that are often not tested due to condition, size or special dynamometer
              requirements (heavy duty vehicles, vehicles considered unsafe to test, vehicles
              requiring four-wheel-drive dynamometers) can be measured.
       vi)    The on-road data can evaluate the extent to which owners are repairing their
              vehicles prior to emission testing.  This is a program benefit that cannot be easily
              measured by means of in-program data without the use of surveys.
       vii)   RSD measurements can be directly converted to mass emissions per volume or mass
              of fuel burned and may be used to develop emission inventories independent of
              models.

In a well-designed remote sensing program, roadway grade and environmental conditions at the
measurement  site, as well  as vehicle speed  and acceleration, will  be measured and used to
calculate the vehicle load for each individual emissions measurement.  Analyses can then be
performed on a subset of measurements with  a distribution of loads similar to that encountered
by a single vehicle on the program's I/M test. In addition, by employing careful site selection
criteria, remote sensing has the potential to measure emissions under driving modes not currently
incorporated into I/M tests.
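
The load calculation described above can be sketched as follows. This section does not state
the load equation, so the sketch assumes the vehicle specific power (VSP) approximation for
light-duty vehicles that is commonly attributed to Jimenez-Palacios; the coefficients, the
function names, the record fields, and the load window are all illustrative assumptions rather
than values taken from this guidance.

```python
import math


def vehicle_specific_power(speed_mps, accel_mps2, grade_percent):
    """Approximate vehicle specific power in kW per metric ton.

    ASSUMPTION: generic light-duty VSP approximation; the coefficients
    are not taken from this guidance document.
    """
    # Convert percent grade (rise over run) to the sine of the road angle.
    sin_grade = math.sin(math.atan(grade_percent / 100.0))
    return (speed_mps * (1.1 * accel_mps2 + 9.81 * sin_grade + 0.132)
            + 0.000302 * speed_mps ** 3)


def select_by_load(measurements, vsp_low, vsp_high):
    """Keep measurements whose VSP falls inside a hypothetical load window."""
    selected = []
    for m in measurements:
        vsp = vehicle_specific_power(
            m["speed_mps"], m["accel_mps2"], m["grade_percent"])
        if vsp_low <= vsp <= vsp_high:
            selected.append(m)
    return selected


if __name__ == "__main__":
    # Hypothetical RSD records: speed (m/s), acceleration (m/s^2), grade (%).
    sample = [
        {"speed_mps": 10.0, "accel_mps2": 0.0, "grade_percent": 0.0},
        {"speed_mps": 15.0, "accel_mps2": 1.5, "grade_percent": 3.0},
    ]
    for m in sample:
        print(round(vehicle_specific_power(**m), 2))
```

In practice the load window would be chosen to match the range of loads on the program's own
I/M test; the window here is only a placeholder.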

* By measuring vehicles on-road, RSD has the ability to measure vehicle performance at high power, "off-cycle"
conditions that cannot be readily measured on a dynamometer because of tire slip, tire damage, safety concerns,
vehicle owner concern and damage claims. Although off-cycle emissions are not regulated by the vehicle
certification process, and their measurement may not be desired for I/M evaluation, they may be an important
component of estimating the mobile source inventory. Therefore, on-road measurement of high power, "off-cycle"
performance may be used to develop a complete emissions inventory and to assess the effectiveness of repairs under
"off-cycle" conditions.

Emissions measured by remote sensing instruments, and in idle and ASM tests, are reported in
terms of concentration of total exhaust.  Remote sensing data, then, can be directly compared
with emissions results from I/M programs utilizing idle or ASM testing.  However, some
enhanced I/M programs measure mass emissions, and report emission results in grams per mile.
Remote sensing concentration measurements can be converted to grams per gallon, using
combustion chemistry equations, and then grams per mile, using an estimate of the instantaneous
fuel economy (miles per gallon) of the vehicle at the time of measurement.  The accuracy of the
conversion from emissions concentration to grams per mile depends on the accuracy of the
estimate of instantaneous fuel economy. Fuel economy varies by vehicle type, technology and
age, as well as by vehicle load, thus complicating the conversion.  Areas conducting IM240 or
ASM testing should plot mean RSD emissions against mean initial IM240 or ASM emissions by
vehicle type and model year.  These plots typically show a linear relationship with high
correlation coefficients and can be used to establish a direct relationship between the RSD
measurements and the I/M test results. IM240 program data also include CO2 emissions and
thus can be directly converted to emissions per gallon and compared to on-road data. These
comparisons have been published and show R^2 generally greater than 0.95, although the slopes
and intercepts are not 1.0 and 0.0 (10).
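
The second step of the conversion described above is simple arithmetic: grams per mile equals
grams per gallon divided by miles per gallon. The following minimal sketch illustrates how
sensitive the result is to the fuel economy estimate. The function name and the numerical
values are hypothetical and are not measured data.

```python
def grams_per_mile(grams_per_gallon, miles_per_gallon):
    """Convert a fuel-based emission rate (g/gal) to a distance-based rate (g/mi)."""
    if miles_per_gallon <= 0:
        raise ValueError("fuel economy must be positive")
    return grams_per_gallon / miles_per_gallon


if __name__ == "__main__":
    co_g_per_gal = 200.0  # hypothetical fuel-based CO emission rate
    # The same g/gal reading gives different g/mi results depending on the
    # instantaneous fuel economy assumed for the vehicle.
    for mpg in (15.0, 20.0, 25.0):
        print(mpg, round(grams_per_mile(co_g_per_gal, mpg), 2))
```

With these illustrative numbers, assuming 25 mpg instead of 20 mpg lowers the estimate from
10 g/mi to 8 g/mi, so an error in the fuel economy estimate carries directly into the
grams-per-mile result.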

In particular, remote sensing data can be used in several ways to evaluate the effectiveness of an
I/M program:

       i)     Remote sensing programs measure vehicles at different times relative to their last
             I/M test.  Therefore, remote sensing data can be used to estimate how quickly
             repair effectiveness diminishes over time and how much repair is made just prior
             to the I/M test, as well as track changes in fleet emissions due to changes in test
             procedures.

       ii)     Remote sensing programs measure almost every vehicle that drives by the
              instrument, regardless of whether it is participating in the I/M program.  Remote
              sensing data therefore can be used to estimate the number and emissions of
              vehicles legally exempted from, or illegally avoiding, the I/M program.  In
              addition, remote sensing data can identify individual vehicles that never complete
              the current I/M cycle, or that do not report for testing in a subsequent test cycle,
              but are still being driven in the I/M area.

However, as with in-program data, there are inherent limitations to RSD data.

       i)     The primary objection raised by opponents of RSD is that it must be assumed that
             a one second snapshot of the vehicle's emissions is characteristic of that vehicle's
             emission profile.

       ii)     Fleet coverage is also a real concern, as it is often difficult to obtain
              readings on more than 50% of the fleet, which means that there may not be any
              emission readings for half of the vehicle population.

       iii)    The quality  control and quality  assurance aspects of RSD  data  collection and
             analysis have not been as well documented as those for traditional tailpipe testing.

Random  roadside  pullover  testing has  similar advantages to  remote sensing;  the  test is
unscheduled, and vehicles can be tested at different times relative to their last I/M test. However,
roadside  testing programs may be more expensive  and time-consuming  than  some  remote
sensing programs, and so many fewer vehicles can be tested. California has operated a roadside
pullover testing program for several years. An advantage of roadside testing is that the vehicles
can be tested using the same test methods as those employed in the I/M program. They can also
be inspected for visual or functional failures. However, the sample of vehicles participating in
the California roadside testing program may not reflect the on-road fleet, since participation in
the program is not mandatory, and it is also difficult to verify that vehicle selection is unbiased.
Furthermore, roadside pullovers are politically unacceptable in many areas.

3.4. Three RSD Program Evaluation Methods

In this document three  methods,  not necessarily  exclusive, of using  remote sensing  data to
analyze  I/M  program  effectiveness are  discussed.    These  are  the Step  Change,  the
Comprehensive, and the Reference Methods. The Step Change and Comprehensive evaluation
methods are quite similar.  Remote sensing measurements are made on  a fleet of vehicles in an
I/M area.   The fleet is then divided into two  sub-fleets, based on whether  or not individual
vehicles have been tested under the current I/M program. The emissions of the two sub-fleets
are then compared, after accounting for differences in vehicle type and age.   The difference in
the emissions of the tested fleet and the untested fleet is the apparent benefit of the I/M program
in reducing emissions.
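
The tested versus untested comparison described above can be sketched as follows. The text
does not specify how differences in vehicle type and age are accounted for; this sketch assumes
one simple option, namely averaging emissions within each vehicle type and model year stratum
and weighting both sub-fleets by the same combined stratum counts. All function and field names
are hypothetical.

```python
from collections import defaultdict


def stratum_stats(records):
    """Return {(vehicle_type, model_year): (mean_emission, count)}."""
    totals = defaultdict(lambda: [0.0, 0])
    for r in records:
        key = (r["vehicle_type"], r["model_year"])
        totals[key][0] += r["emission"]
        totals[key][1] += 1
    return {key: (total / n, n) for key, (total, n) in totals.items()}


def apparent_benefit(tested, untested):
    """Compare sub-fleets on a common vehicle type / model year mix.

    ASSUMPTION: strata present in only one sub-fleet are ignored, and both
    sub-fleets are weighted by the combined number of readings per stratum.
    Returns (absolute difference, fractional difference).
    """
    t_stats = stratum_stats(tested)
    u_stats = stratum_stats(untested)
    shared = sorted(set(t_stats) & set(u_stats))
    weights = {k: t_stats[k][1] + u_stats[k][1] for k in shared}
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("no vehicle type / model year strata in common")
    tested_mean = sum(t_stats[k][0] * weights[k] for k in shared) / total_weight
    untested_mean = sum(u_stats[k][0] * weights[k] for k in shared) / total_weight
    difference = untested_mean - tested_mean
    fraction = difference / untested_mean if untested_mean else float("nan")
    return difference, fraction


if __name__ == "__main__":
    # Hypothetical RSD readings (arbitrary concentration units).
    tested = [
        {"vehicle_type": "LDV", "model_year": 1990, "emission": 1.0},
        {"vehicle_type": "LDV", "model_year": 1990, "emission": 1.0},
        {"vehicle_type": "LDV", "model_year": 1995, "emission": 0.5},
    ]
    untested = [
        {"vehicle_type": "LDV", "model_year": 1990, "emission": 2.0},
        {"vehicle_type": "LDV", "model_year": 1995, "emission": 1.0},
        {"vehicle_type": "LDV", "model_year": 1995, "emission": 1.0},
    ]
    print(apparent_benefit(tested, untested))
```

Weighting both sub-fleets by the same stratum counts keeps a difference in age mix between the
tested and untested vehicles from being mistaken for a program effect.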

The primary difference between the two methods is the number of remote sensing measurements
required.  The Step  Change Method can be performed using a relatively  small number of
measurements,  on the order of 20,000 to 50,000.  The Comprehensive Method requires many
more remote sensing measurements (several million in the Phoenix example) in order to perform
the detailed analyses of program effectiveness. Collecting this much remote sensing data can be
relatively expensive; however, if such data are already being collected as part of another program
(such  as  a Clean Screen program), the additional cost of analyzing the data  is minimal. The
drawback of the Step Change and Comprehensive Methods (aside from the  general concerns
with regard to RSD mentioned above) is  that they  only measure the effect of incremental
changes in I/M programs unless repeated year after year.

The Reference  Method is designed to measure the full effect of an I/M program on a  vehicle
fleet, by comparing the emissions  of a fleet subject to I/M with estimated fleet emissions if no
I/M program were in place.  The accuracy of the Reference Method hinges on  the ability to find
a fleet in a non-I/M area as similar to the I/M area fleet as possible. Because vehicle emissions
are quite variable, both between vehicles and within an  individual vehicle, and because many
differences between vehicle fleets  and their environment can affect vehicle emissions, finding a
suitable reference area can be challenging. One way to determine the degree of bias in the
reference fleet is to obtain data  from a second reference fleet; if there  are few biases, the two
reference fleets should look the same.  The Reference Method can also be used to compare the
impact of two I/M programs in different locales. Although this will provide a relative comparison
between two programs, it will not provide any data to compare an I/M program to a No-I/M fleet.

Figure 3.1 below illustrates some of these differences.

          Figure 3.1. I/M Program Evaluation Methods Using Remote Sensing Data

   [Figure not reproduced. Labels: Basic I/M, Enhanced I/M, Test. Panel notes follow.]

   Reference Method compares emissions of vehicles in an I/M program [tested fleet] with those
   of vehicles not in an I/M program [reference fleet].

   Step Method compares emissions in one cycle [tested fleet] with those in previous cycle
   [untested fleet]. Measures effect of incremental changes to program. Because untested fleet
   measured later in cycle than tested fleet, may overstate incremental program effect.

   Comprehensive Method compares emissions of fleet at different points in I/M cycle. Measures
   effect of pre-test repair, delay in post-test repair, and emissions deterioration over time.

4. Equipment Specifications and Measurement Procedures

4.1. The Remote Sensing System
Figure 4.1 shows a generic diagram of an RSD system which measures CO, CO2, HC, NO, and
smoke opacity set up along a single lane of road. The make and model year of the vehicle are
identified from the video picture.
                          Figure 4.1: RSD Operational Diagram
   [Diagram not reproduced. Labeled components: weather station, IR/UV source, emissions
   detector, calibration gas, license plate video.]
4.2. Theory of Operation
Remote Sensing Devices have been designed to emulate the results one would obtain using a
conventional exhaust gas analyzer. Because the effective plume path length and amount of
plume seen depend on turbulence and wind, one can only determine ratios of CO, HC, or NO to
CO2.   Assuming complete and instantaneous mixing, these  ratios, Q for  CO/CO2, Q' for
HC/CO2, and Q" for NO/CO2 are constant for a given exhaust plume. By themselves, Q and Q'
are useful parameters with which to describe the combustion system.  When the corresponding
combustion equations are  solved many components of the vehicle operating characteristics can
be determined including the instantaneous air/fuel ratio  and the  % CO,% HC, and % NO which
would be read by a tailpipe probe. The equations given below are based upon a carbon mass
balance and make use of the fact that the IR HC analysis method only measures about one half of
the carbon which would be measured by means of an FID for instance.
             % CO2 = 42/(2.79 + 2Q + 0.84Q')
             % CO = Q * (% CO2)
             % HC = Q' * (% CO2)
             % NO = Q" * (% CO2)
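
As a rough illustration, the ratio-to-concentration equations above can be written in a few lines of Python. This is a minimal sketch, not any vendor's software: the function and argument names are hypothetical, and the denominator is taken as 2.79 + 2Q + 0.84Q'.

```python
def concentrations_from_ratios(q_co, q_hc, q_no):
    """Convert plume ratios to tailpipe-equivalent percent concentrations.

    q_co, q_hc and q_no stand for Q (CO/CO2), Q' (HC/CO2) and Q" (NO/CO2).
    The names are illustrative; only the arithmetic comes from the text.
    """
    # Assumed form of the carbon-balance denominator: 2.79 + 2Q + 0.84Q'
    pct_co2 = 42.0 / (2.79 + 2.0 * q_co + 0.84 * q_hc)
    return {
        "CO2": pct_co2,
        "CO": q_co * pct_co2,
        "HC": q_hc * pct_co2,
        "NO": q_no * pct_co2,
    }


# A plume containing CO2 but no CO, HC or NO is reported as about 15% CO2.
clean = concentrations_from_ratios(0.0, 0.0, 0.0)
print(round(clean["CO2"], 2))  # prints 15.05
```

Because only the three ratios are inputs, any absolute concentration reported this way is a derived quantity.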

To derive mass emissions in g/gal of fuel from Q and Q' a fuel density of 0.75 g/mL and the
carbon-hydrogen ratio of 1:2 are assumed to yield:

              CO2 mass emission (g/gal) = 8922/(1 + Q + 6Q')
              CO mass emission (g/gal) = 5678*Q/(1 + Q + 6Q')
              HC mass emission (g/gal) = 8922*2*Q'/(1 + Q + 6Q')
              NO mass emission (g/gal) = 6083*Q"/(1 + Q + 6Q')

The vehicle's instantaneous air to fuel ratio is

              A/F by mass = 4.93(3 + 2Q)/(1 + Q + 6Q')
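
The fuel-based mass emission and air/fuel expressions can be sketched the same way. This is only an illustration under the stated assumptions (fuel density 0.75 g/mL, carbon-hydrogen ratio 1:2); the function names are hypothetical, and the CO2 constant is taken as 8922, consistent with the HC expression.

```python
def mass_emissions_g_per_gal(q_co, q_hc, q_no):
    """Grams of each species emitted per gallon of fuel burned.

    q_co, q_hc and q_no are Q, Q' and Q" (illustrative argument names).
    """
    denom = 1.0 + q_co + 6.0 * q_hc  # carbon balance term shared by all species
    return {
        "CO2": 8922.0 / denom,
        "CO": 5678.0 * q_co / denom,
        "HC": 8922.0 * 2.0 * q_hc / denom,
        "NO": 6083.0 * q_no / denom,
    }


def air_fuel_ratio_by_mass(q_co, q_hc):
    """Instantaneous air/fuel ratio by mass from Q and Q'."""
    return 4.93 * (3.0 + 2.0 * q_co) / (1.0 + q_co + 6.0 * q_hc)


clean_mass = mass_emissions_g_per_gal(0.0, 0.0, 0.0)
clean_af = air_fuel_ratio_by_mass(0.0, 0.0)
```

With Q and Q' both zero, the air/fuel expression reduces to 4.93 x 3, or about 14.8.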

All diesel and most gasoline powered vehicles show a Q and Q' near zero since they emit little to
no CO or HC. To observe a Q greater than zero, the engine must have a fuel-rich air/fuel ratio
and the emission control system, if present, must not be fully operational (11).

In the  case of diesel combustion, misfire causes high HC  readings.  Since the overall air/fuel
ratio is very lean, even when over-fueling and sooting are taking place, CO emissions only arise
from pockets of incomplete combustion, and  are limited to about 3% CO, compared to a broken
gasoline-powered vehicle which can exceed 12% CO.

Recently, the ability to measure nitric oxide (NO) has been added to the existing IR capabilities.
The light source, across the road, now contains a deuterium or xenon arc lamp and IR/UV beam-
splitter which  is mounted in such a manner  that the net result from  the source is a collimated
beam of UV and IR light.  As with  CO  and  HC measurements, the NO measurements  are
possible by ratioing to the CO2 measured in the plume.  Each pollutant except HC is a specific
gas that can be measured and calibrated unambiguously.  Exhaust HC is a very complex
mixture of oxygenated and unoxygenated hydrocarbons.  The filter chosen measures carbon-
hydrogen stretching vibrations, which are present, but not equally, in all HC compounds. This
system can easily distinguish gross polluters  from low emitters, but the results  on an individual
vehicle cannot be expected to correlate perfectly with a flame ionization detector, with  ozone-
forming reactivity, or with  air toxicity, since the three are not correlated to one another.  For
large sample sizes the fleet average emissions correlate well with IM240 g/mi measurements
(12).

Newer technologies may also be used in place of the UV/IR detectors described above, such as
tunable diode lasers.

4.3. Operation

When  a  motor vehicle passes through the beam of  a calibrated instrument on the road,  the
computer notices the blocked intensity of the reference beam. This causes the  previous 200 ms
of data (20 points) to be stored in memory  as the "before car" buffer. The blocked voltages are
continuously interrogated both to remember the lowest values (zero offset) and to look for a
beam unblock  signal.  When an unblock signal is recognized, the video picture is frozen into the
video screen memory and thus goes to the image recorder, and the next 50 data points (1/2 sec of
exhaust) are placed in a data table.  The zero offsets are subtracted from all data.  The data
stream is interrogated for the highest CO2 voltage. This is the least polluted 10 ms average seen
during the 0.7 sec. of data devoted to this vehicle.  This set of data (often, but not always, in the
before car buffer) then becomes the "clean air reference" (CAR) against which all other data are
compared.  After all signals have been ratioed to the reference channel, and the results ratioed to
the CAR result for that channel, one has a set of 50 post-car, corrected, fractional
transmissions, which are converted to gas concentrations such as would have been observed in
a gas analyzer.  These concentrations are then correlated to CO2 and the slope and error of the
slope determined.   These slopes (the ratios  of  the pollutants to CO2) are corrected by the
correction factors determined for that time by means of roadside calibration.  These slopes now
are the Q, Q' and Q" described earlier.
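
The slope step described above can be sketched as a simple regression of pollutant concentration against CO2 over the post-car data points. The text does not specify the fitting method, so ordinary least squares with an intercept is an assumption here, and the function name is hypothetical.

```python
import math


def slope_and_error(co2, pollutant):
    """Least-squares slope of pollutant vs. CO2 and its standard error.

    Assumes an ordinary least-squares fit with intercept; actual RSD
    software may use a different fitting procedure.
    """
    n = len(co2)
    mean_x = sum(co2) / n
    mean_y = sum(pollutant) / n
    sxx = sum((x - mean_x) ** 2 for x in co2)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(co2, pollutant))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(co2, pollutant))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope, std_err


# Illustrative data: CO tracks CO2 with a ratio of 0.1.
q, q_err = slope_and_error([0.0, 1.0, 2.0, 3.0], [0.0, 0.1, 0.2, 0.3])
```

A reading whose slope error exceeds a preset limit would then be flagged as invalid, as the software traps described in this section do.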

The data obtained for each vehicle provide three pollutant ratios.  The RSD software now solves
the combustion equation for the measured pollutant ratios, compares the errors to preset error
limits, and, if acceptable, reports the measurements as % CO, % CO2, % HC, and % NO such as
would be measured by a tailpipe probe with the results corrected for water and for any excess air
which may not have participated in combustion.  In view of the fact that the instrument is
calibrated with propane, percent HC is reported as propane; however, other HC species such as
hexane or 1,3 butadiene could be used for this purpose as well.  The four derived concentrations,
% CO, % HC, % NO, and % CO2, are placed on the video output together with the vehicle image
(which has been waiting without results for about 0.7 sec.).

This image now stays on the screen until the next vehicle comes by to repeat the process.  If
these results are to be compared to vehicles of known emissions, or gas cylinders puffed into the
beam, it is important to compare the three ratios and not the four derived concentrations since
there are not actually four independent pieces of information.  For example, if a person blocks
the beam and exhales into it during the 1/2 sec.  after they  have unblocked the beam, the
computer sees the exhaled CO2, finds no CO,  HC, or NO, and reports zeros for those pollutants
and about 15% CO2. Exhaled breath rarely contains even 2% CO2, but the system only  measures
the ratios, and  assumes (incorrectly  in this  case)  that the  emissions  are from a fully
stoichiometric automobile using gasoline as fuel.  A puff from a cylinder which contains 50%
CO and 50% CO2 would be read as 8.6% CO and  8.6% CO2 because the ratio is what is
measured not the absolute concentrations.

Special software traps should be employed to deal with two cars traveling very close together.  In
this case, the before car buffer from in front of the first is used as a potential source of clean air
reference for the exhaust of the  second.  The video picture of the  first is replaced by the second
before any data are overwritten. High pickup  trucks thus often get two pictures, only the last of
which has emissions data.

Other software traps reject data when the slope errors are too large, and when there  is no sign of
any significant exhaust  plume (such as behind 18-wheel trailers whose tractors have elevated
exhausts).

For the interested reader, Appendix A  contains a brief description of some trouble-shooting
procedures that can be performed quickly in the field as a first step to verify if an RSD unit is
operating properly or if it is in need of service.
4.4. Operational Difficulties

4.4.1 Signal/Noise Considerations
Remote emissions measurements would all be very straightforward if one were able to measure
directly behind the tailpipe of each passing  car.  Absorptions would be large, and the system
signal/noise (S/N) would not be limiting.   In  fact,  vehicle tailpipes are  not in  standardized
configurations, vehicle engine sizes are not uniform, and there is very rapid turbulent dilution of
the exhaust behind vehicles  moving faster than about 5 mph.   Thus, one is forced  to make
engineering tradeoffs between the desire to  measure all vehicles and the necessity to have an
adequate S/N so as not to report incorrect exhaust emissions values.

The detection  of gas  absorption is based upon the reduction of signal on one detector versus the
reference detector. Thus, the average car measured at an uphill freeway ramp in Denver shows
an exhaust plume already diluted by a factor of about 10.  This situation gives rise to an easily
measurable 14% reduction in the CO2 voltage. Because the average CO content is about 1/20th
of the CO2 and the HC 1/10th of the CO, the average total changes in CO and HC voltages are
only 3 and 1 part in 1000, respectively.  The NO channel shows a similar response as HC. Thus,
the instrument builder's  challenge is to build a system in which part per thousand changes in IR
and UV intensity are accurately  measured in all weather conditions beside a normal  road  at a
measurement frequency of 100 Hz.  At other locations, the plume dilution factor  is 100 and a
decision must be made  whether  the individual instrument's S/N is adequate for readings to be
reported or if the data should be  reported as invalid.  This bleak outlook is somewhat mitigated
by the fact that the  source need only maintain a stable intensity for about two seconds for a
complete measurement series and the fact that the data reduction process intrinsically "averages"
all the 1/2 sec. data to only three ratios.

Newer technologies  having  improved S/N ratios may be  available and used  over greater
distances.

4.4.2. Weather
Measuring light intensities over a 10 m  path to better than a few  parts per thousand can be
inhibited by bad weather. Ambient temperature and humidity variations are not a  problem, but
snowflakes and heavy rain add too much noise to all data channels. Wet or very dusty roadways
cause a plume of spray or dust behind vehicles moving  above about 10 mph. These plumes  also
add noise to the system,  and generally increase the data rejection rate to an unacceptable level.

At the most productive sites, the  remote sensor can gather data on 10,000 vehicles  in a working
day; thus, it often generates  data faster than  the operator can handle. In such cases, taking the
day off to analyze data when the weather conditions  are not appropriate may be beneficial.
Gross polluting vehicles are thought to be the same vehicles on dry as well as on wet days.

4.4.3. Interference
The HC wavelength suffers from some interference from gas phase, and certainly from
particulate phase, water (so-called "steam" plumes from colder vehicles operating at low ambient
temperatures).  When steam plumes  are so thick that you cannot see through them (Fairbanks,
AK.,  at forty below zero) the system no longer operates since all wavelengths are absorbed or
scattered too much for useful data to be acquired.

4.4.4. Optical Alignment
If the instrument is not perfectly optically aligned, the voltages are likely to be very sensitive to
equipment vibration. Since moving vehicles both shake the roadway and generate wind pulses,
rigid  instrument mounting is as important as perfect internal and external  optical  alignment.
Software is written so that these noise sources generate "invalid" flags.  Proper alignment at a
well characterized RSD-site can yield 95% valid RSD readings on passing vehicles using UV/IR
detector technology.

The  system is designed to operate on a single-lane road.  Freeway ramps, turn lanes, and the
inevitable road closures for sewer, gas, water, telephone, and road maintenance are often good
candidates for RSD emission measurement sites.  Multiple-lane operation has been reported but
is not recommended.

4.4.5 Emissions Variability
Emissions of motor vehicles  are not constant from second to second or from day to day. Broken
vehicles  in  particular  often seem to have  a large random component to their emissions
irrespective  of what test is used to make  the measurement (13).   Some  vehicle emission
variability has known causes such as the initial operation  of cold vehicles before the engine
control system stabilizes and the catalyst begins operation, or when the vehicle is accelerated at
full throttle.  Both situations  give rise to large CO and HC emissions from even well-maintained
vehicles, but can be minimized through careful site selection.

4.5 Instruments

4.5.1. Calibration Checks
Two separate calibration procedures  should be performed on  every remote  sensing unit.  The
first is conducted in a laboratory and should be performed  by the  equipment manufacturer.  It
may consist of exposure in  the  laboratory  at a path length of about 22 ft to known absolute
concentrations of NO, CO, CO2, and propane in an 8 cm IR  flow cell with CaF2 or other IR
transmitting windows. The calibration curves are used to establish the fundamental sensitivity of
each detector/ filter combination to the gas of interest.  The results of this  calibration should be
provided to the state or contracting party upon request.

The second calibration should be performed every hour (14) during operation until the stability of the
individual system is quantified and  characterized using statistical  process  control methods.
Once control  charts have  been established,  the  calibration  frequency  may be  reduced
appropriately. Several puffs  of gas designed to simulate all measured components of the exhaust
are released from a cylinder containing certified amounts of NO, CO, CO2,  and propane into the
optical beam path. The ratio readings from the instrument are compared to those certified by the
cylinder manufacturer.  In this way the system never actually measures exhaust emissions; it
basically compares the pollutant ratios in a known standard gas cylinder and those measured in
the vehicle exhaust.
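
One way to picture the roadside calibration is as a multiplicative correction factor derived from the cylinder puffs. The text does not give the formula, so the form below (certified ratio divided by the mean measured ratio) is an assumption, and all names and values are hypothetical.

```python
def correction_factor(certified_ratio, measured_ratios):
    """Ratio of the certified cylinder value to the mean puff reading.

    Assumed form only; the calibration arithmetic is not given in the text.
    """
    mean_measured = sum(measured_ratios) / len(measured_ratios)
    return certified_ratio / mean_measured


def apply_correction(raw_slope, factor):
    """Scale a raw pollutant/CO2 slope by the calibration factor."""
    return raw_slope * factor


# Illustrative cylinder with equal CO and CO2 (certified CO/CO2 = 1.0)
# and three puff readings that come in slightly low.
co_factor = correction_factor(1.0, [0.95, 0.97, 0.93])
corrected_q = apply_correction(0.19, co_factor)
```

Tracking such factors over successive hourly checks is the kind of record a control chart would be built from.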

The gases used for the second calibration shall be certified to +/-2% of a known NIST standard
and be in the following ranges:
              CO          1-9%
              HC as C3    300-4100 ppm
              NO          1500-3600 ppm
              CO2         5-14% (with the balance oxygen-free nitrogen)

Additionally, some  quick checks  are provided in Appendix A that may be useful in  trouble-
shooting equipment in the field.

4.5.2. Other Instrument Parameters
At a minimum the following parameters shall also be recorded in all RSD program evaluation
studies for each RSD site in a station log.  The log may be kept electronically or in hardcopy
format.

       i)      A  description  of the RSD equipment including light source,  make/model of
              instrument, and detector type.
       ii)     The name of the operator and the van.  If more than one operator or van are used,
              key and record which operator and/or van was used for each measurement.
       iii)     Complete description of the calibration procedure.
       iv)     Audit check results
       v)      Calibration check results
       vi)     Any equipment changes
       vii)    Verification of speed and acceleration measurement devices
4.6. Site Description

A site description for each RSD data collection site shall be generated that shall include the
following information.

       i)     Road map with features affecting traffic flow.
       ii)    Note any change in the position of the light source, detector, etc. from previous
             RSD studies
       iii)    Note any change in traffic patterns from previous RSD studies.
       iv)    Note the altitude of the site and the road grade.  Include a field in the database
             showing the road grade in percent for all measurements.
       v)    Digital picture of the site including all cones,  etc, that would influence motorist
             driving patterns.
       vi)    Global Positioning System coordinates based on the NAD83 reference standard.

4.7. Measurements

4.7.1. Data Collection
The following measurements shall be recorded at each site where RSD program evaluation data
are collected.
       i)    %CO2, %CO, %NO, %HC, maximum CO2, all error terms, restarts, and negative
               emission numbers.  Include a field showing whether HC is reported as propane
               or hexane.
       ii)   Speed and acceleration. Vehicle Specific Power (VSP) shall be calculated as described
               below. Valid VSP values shall be between 0 and 20 kW/ton.
                  VSP (kW/t) = 4.39*sin(slope)*v + 0.22*v*a + 0.0954*v + 0.0000272*v^3
                      where "a" is vehicle acceleration in mph/s, "v" is vehicle speed in mph,
                      and slope is the road grade in degrees.
       iii)   Location of speed  measurement  relative  to emission  measurement.   It  is
               recommended that vehicle speed be measured 5-10 m prior to the emissions
               measurements.
       iv)   Time and date of measurement
       v)   License  plate.  Record all plates including in-state,  out-of-state (OS),  dealer (D),
               paper plate (PP), obscured plate (OP), and no-plate-visible (NPV)
       vi)   Hourly temperature, barometric pressure, and relative humidity
       vii)  Describe how plume  strength is determined and flagged, as well as the criteria for
               rejecting measurement attempts.
       viii)  Site reference label
       ix)   RSD unit number or unique identifier
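
The VSP calculation in item (ii) can be sketched as follows. The function names are hypothetical; the coefficients and the 0-20 kW/ton validity window come from the text. A helper is included to convert a road grade recorded in percent to degrees, since the site description calls for grade in percent while the equation takes degrees.

```python
import math


def grade_percent_to_degrees(grade_percent):
    """Convert road grade in percent (rise/run x 100) to degrees."""
    return math.degrees(math.atan(grade_percent / 100.0))


def vsp_kw_per_ton(speed_mph, accel_mph_per_s, grade_deg):
    """Vehicle Specific Power in kW/ton from speed, acceleration and grade."""
    v = speed_mph
    return (4.39 * math.sin(math.radians(grade_deg)) * v
            + 0.22 * v * accel_mph_per_s
            + 0.0954 * v
            + 0.0000272 * v ** 3)


def is_valid_vsp(vsp):
    """True when VSP falls inside the 0-20 kW/ton window."""
    return 0.0 <= vsp <= 20.0


# Steady 30 mph on a level road.
cruise_vsp = vsp_kw_per_ton(30.0, 0.0, 0.0)
```

For the steady 30 mph case only the last two terms contribute, giving about 3.6 kW/ton.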

4.7.2. Multiple Measurements
Multiple measurements made on the same vehicle shall be treated in one of the following ways;
however, the program evaluation report will clearly state which method has been chosen and the
rationale behind this choice.  A multiple measurement is not restricted by the timeframe over
which it is collected; the readings may be hours, days, weeks or months apart.  Option (iv) below is
recommended,  although  there  may be  circumstances  when  another  option may  be more
appropriate.

       i)    Multiple measurements are treated as independent readings
       ii)   Multiple measurements are averaged and treated as a single reading
       iii)   Multiple measurements are discarded and only the first reading is used
       iv)   The  maximum,  minimum  and  average  values  are  reported  to  provide  as
               comprehensive a snapshot of a vehicle's emission profile as possible.
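
Option (iv) can be sketched as a per-vehicle summary. This is a minimal illustration; the function name, the use of the license plate as the vehicle key, and the sample values are all hypothetical.

```python
from collections import defaultdict


def summarize_by_vehicle(readings):
    """Group (plate, value) readings and report count, min, max and mean."""
    grouped = defaultdict(list)
    for plate, value in readings:
        grouped[plate].append(value)
    return {
        plate: {
            "n": len(values),
            "min": min(values),
            "max": max(values),
            "mean": sum(values) / len(values),
        }
        for plate, values in grouped.items()
    }


# Illustrative %CO readings: one vehicle seen three times, one seen once.
summary = summarize_by_vehicle(
    [("ABC123", 0.2), ("XYZ789", 1.5), ("ABC123", 0.4), ("ABC123", 0.3)]
)
```

Keeping the count alongside the three statistics lets a later analysis fall back to options (i) through (iii) if needed.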

4.7.3. Operators
Care must be taken to ensure operators are properly trained in the routine operation of the
equipment and fully understand and implement the required QA/QC procedures. Furthermore, it
is imperative that daily vehicle quotas do not compromise the operators' judgments or actions
with regard to QA/QC and the data collection process.

Footnote to the VSP equation in Section 4.7.1: This equation should be considered generic in that it may be applied to all types of vehicles. More accurate
equations dependent on MY and/or vehicle type may be developed in the future.

4.8. Database Format

The RSD data collected shall be made available in an ASCII text file that may be easily ported
into a standard commercially available database software package such as Access, Oracle, SAS,
etc. If special procedures are required to port the data into such a software package the software
code or procedures shall be provided upon request.

4.9. Department of Motor Vehicle Data

Department of Motor Vehicle data shall be reported as follows.

      i)     Date DMV data received from DMV
      ii)     Information indicating how current the most recent DMV data in the file are.
      iii)    VIN, Model Year, Make, Model, Fuel Type, Vehicle Type, Zip Code
      iv)    I/M test date.
      v)     I/M test results in g/mi, ppm or percent.

4.10. Note Any Changes that Could Affect the Analysis

Any changes to the I/M program which would impact the analysis shall be recorded and reported
in the program evaluation report.  Such changes may include, but are not limited to, changes in
the operational details of the I/M program itself, or the use of a seasonal fuel program to reduce
mobile source emissions.
5. Design Parameters and Quality Assurance/Quality Control Protocols

5.1. Overview

This section outlines a number of critical issues that must be addressed to perform a program
evaluation using RSD technology.   These issues  include data collection design parameters,
equipment specifications, calibration procedures, quality control, and several known sources of
bias in vehicle emissions measurements that can affect any evaluation of an I/M program. Some
of these  are unique  to remote sensing data,  while others apply to evaluations based on in-
program  data as well. The issues or types of bias that must be considered in a remote sensing
program  evaluation have been broadly grouped into the following categories and discussed under
the appropriate headings  below:  vehicle population,  vehicle  load,  vehicle  identification,
instrument  calibration,  measurement  method,  socioeconomics,  seasonal   effects,  program
avoidance, regional differences, program details and emissions distributions.

The importance of five issues (vehicle load, program avoidance, vehicle identification, program
details and emissions distributions) is roughly similar for each of the three evaluation methods.
Because the Reference Method relies on measurement in two different geographic regions, it is
most sensitive to all of the remaining types of bias. The likelihood of bias can be minimized if
multiple  reference sites are chosen  and the sites are well-characterized with common load
characteristics.  Because the Comprehensive Method requires large numbers of measurements,
multiple vans and  sites can increase a bias due to instrument calibration, socioeconomics, and
seasonal effects.  In collecting data at a single site over a short time period, the Step Method
eliminates the potential for socioeconomic and seasonal bias between the two measured
subfleets; however, the estimate of program effectiveness may be biased if the site chosen or the
time of testing does not capture the distribution of driver socioeconomics or environmental
variables representative of the I/M area. This potential source of bias can be tested by comparing
the measured fleet numbers by model year to other data*, bearing in mind that on-road
measurements are expected to capture newer, higher annual mileage vehicles more than older,
lower annual mileage vehicles.
5.2. Vehicle Population
Goal: Account for differences in vehicle fleet distributions in the program evaluation analysis.

Perhaps the most common source of bias when comparing emissions of two fleets of vehicles is
the vehicle distribution of the two fleets.  Older, higher mileage, vehicles tend to have higher
emissions than newer, lower mileage, vehicles.  Light duty trucks were  built to less  strict
emissions standards than passenger cars, and are observed to have higher in-use emissions.  In
addition, there is a wide range in average emissions by vehicle model, even for vehicles of the
same age (15).

Differences in vehicle fleets can be determined by comparing vehicle distributions of the two
fleets by type and model year.  (Note: The Step, Comprehensive and Reference Methods all
compare fleet averages;  however, the composition  of these sub-fleets is  different for each
method.) Average  emissions  by  type  and  model year should be calculated for each fleet and
compared to determine any emissions differences between the two fleets. The average emissions
for each fleet should then be weighted by a  single distribution of vehicles by type and model year
(preferably that of the I/M program area), to determine the overall fleet emissions and the percent
difference between the two fleets.

Table 5.1 displays example CO emissions by model year from samples of vehicles measured in
a reference area  and  an I/M area. The composite fleet averages  of 0.86% CO for the reference
area  and 0.58% CO for the I/M area suggest the I/M area vehicles are 32% cleaner. This is not a
fair comparison, however, because it is evident from the fleet fraction percentages (Columns D
and G) that the I/M area sample contains a greater proportion of newer vehicles.

To overcome this,  the I/M area model year CO contributions are re-weighted according to the
reference area fleet fraction percentages. This  is shown in column H. The adjusted composite
emissions level for the I/M area  is now 0.76% CO, resulting in  an apparent 12% (1-0.76/0.86)
benefit from the program.  It should be noted that this 12% apparent benefit should be converted
to a mass basis to be more meaningful and allow more direct comparisons to other I/M program
evaluation results as well as results from other air pollution control programs.

* These data may be obtained from Department of Motor Vehicle (DMV) records or modeling defaults, although
empirical DMV data would be preferred.
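
The re-weighting described for Table 5.1 can be sketched with a small example. The three vehicle groups and all values below are hypothetical and chosen only to keep the arithmetic visible; the function name is likewise an assumption.

```python
def composite(avg_by_group, fraction_by_group):
    """Fleet-average emissions: group averages weighted by fleet fraction."""
    total = sum(fraction_by_group.values())
    return sum(avg_by_group[g] * fraction_by_group[g]
               for g in fraction_by_group) / total


# Hypothetical fleets (average %CO and fleet fraction per age group).
ref_avg = {"old": 2.4, "mid": 1.1, "new": 0.2}
ref_frac = {"old": 0.20, "mid": 0.30, "new": 0.50}
im_avg = {"old": 2.0, "mid": 1.0, "new": 0.2}
im_frac = {"old": 0.10, "mid": 0.20, "new": 0.70}

ref_composite = composite(ref_avg, ref_frac)   # reference area, own fleet mix
im_unadjusted = composite(im_avg, im_frac)     # I/M area, own fleet mix
im_adjusted = composite(im_avg, ref_frac)      # I/M area, reference fleet mix
apparent_benefit = 1.0 - im_adjusted / ref_composite
```

In this illustration the unadjusted comparison (0.54 versus 0.91) overstates the difference because the hypothetical I/M sample is newer; after re-weighting, the I/M composite is 0.80 and the apparent benefit is about 12%.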

Of course, this raises the question as to what extent the greater proportion of newer vehicles in
the I/M fleet is the result of the I/M program.  Addressing this question is difficult.  No current
analyses of in-program or out-of-program data provide information in this regard. At this time,
further studies are needed to address this issue.

Because the Step Change and Comprehensive Methods compare fleets of vehicles from the same
I/M  area, there  is likely to be little difference between the  two fleets with respect to  fleet
distribution.   However, when  using the  Reference Method, vehicle populations can be
significantly   different  between  different  geographical  areas, as  can  fuel  composition,
environmental factors, and motorist socioeconomic status (discussed below).

                     Table 5.1: Average RSD Readings by Model Year

                   Reference Area                    I/M Area
  A          B          C        D          E          F        G          H
  Model      Avg CO     Count    Fleet %    Avg CO     Count    Fleet %    E x D
  Year       (% CO)                         (% CO)
  Pre-60     3.45          70     0.04%     1.60          16     0.03%     0.00
  Y60-65     4.12         390     0.20%     3.61          39     0.08%     0.01
  Y66-70     3.50        1333     0.69%     3.24         137     0.28%     0.02
  Y71-75     2.74        2661     1.37%     2.50         310     0.64%     0.03
  Y76-80     2.42       10259     5.27%     2.19        1173     2.44%     0.12
  Y81        2.24        2818     1.45%     1.64         373     0.78%     0.02
  Y82        1.94        3430     1.76%     1.34         470     0.98%     0.02
  Y83        1.71        5440     2.80%     1.36         707     1.47%     0.04
  Y84        1.64        8424     4.33%     1.23        1203     2.50%     0.05
  Y85        1.39       10322     5.31%     1.18        1654     3.44%     0.06
  Y86        0.99       12067     6.20%     0.83        2172     4.51%     0.05
  Y87        0.83       12532     6.44%     0.77        2497     5.19%     0.05
  Y88        0.72       14410     7.41%     0.70        2853     5.93%     0.05
  Y89        0.68       14803     7.61%     0.61        3059     6.36%     0.05
  Y90        0.56       14479     7.44%     0.53        3366     7.00%     0.04
  Y91        0.50       14666     7.54%     0.50        3717     7.72%     0.04
  Y92        0.43       12977     6.67%     0.42        3645     7.57%     0.03
  Y93        0.37       14617     7.51%     0.36        4350     9.04%     0.03
  Y94        0.28       13222     6.80%     0.30        4507     9.37%     0.02
  Y95        0.23       15055     7.74%     0.26        5435    11.29%     0.02
  Y96        0.17        9668     4.97%     0.20        4320     8.98%     0.01
  Y97        0.11         876     0.45%     0.21        2116     4.40%     0.00
  Avg/Tot    0.86      194519   100.00%     0.58       48119   100.00%     0.76

5.3. Vehicle Loads
Goal: Ensure that RSD measurements are made under known vehicle operating conditions.

Another important source of potential bias is the load under which the vehicle is operating when
the emissions measurement is made. Emissions per gallon are much less speed and load
dependent than emissions per mile; nevertheless, load is an important variable. Researchers use
vehicle specific power (VSP; equation given in Section 4.7.1), which is a function of vehicle speed,
acceleration, drag coefficient, tire rolling resistance, and roadway grade, to characterize the
load the vehicle is operating under at the time the measurement is made (16, 17).

On-road remote  sensing units measure tailpipe exhaust plumes for a fraction of a second as
vehicles pass by the unit.  HC, CO and NOx pollutant emissions are estimated by comparing the
ratio of their concentrations to the concentration of CO2 seen in the vehicle exhaust plume.
Although the remote sensing unit does not measure the volume of exhaust gases produced, a
number of vehicle load conditions can elevate the remote sensing observed emission levels:

   i) When a motorist lifts his/her foot off the gas pedal, the volume of air and  fuel flowing
      through the vehicle engine  and exhaust system  is  suddenly  reduced.   Under these
      circumstances, the ratio of HC and CO to the now reduced level of CO2 is often increased.
      Although the volume and mass of emissions are substantially reduced when  a driver lifts
      off the gas, to the remote  sensing unit, the ratio of the concentrations of HC and CO to
      CO2 are actually higher and a higher emissions value is recorded.  This effect is greatest
      for HC.

   ii) When a  motorist presses sharply  on  the accelerator, the vehicle may go  into what is
      termed an 'off-cycle' condition.   The current generation of vehicles has been certified
      using the Federal Test Procedure; however, this test does not cover the full power range of
      the vehicle.  Consequently, vehicles were designed to minimize emissions only over the
      power range tested in the certification cycle.  At higher powers, so-called "off-cycle" or
      power enrichment  emissions  often  increase  dramatically  although the  vehicle  is
      functioning as designed.  Under these circumstances, a vehicle can have high emissions
      when measured by remote sensing but may meet the I/M inspection requirements.  This
      effect is greatest for CO and NOx.

For  these reasons,  multiple  remote sensing measurements  for  the  same vehicle  can  vary
considerably if the site is such that  the operating mode  of the  vehicle at the  time of the
measurement is not consistent.  As  stated earlier (Section 4.7.2), it is recommended that in the
case of multiple measurements, all  data are retained, or the maximum, minimum  and average
values are reported to provide as comprehensive a snapshot of a vehicle's emission profile as
possible. For broken vehicles, the variability and the likelihood of high readings is extreme. For
low  emitting, new or well-maintained vehicles, variability  caused by driving mode changes
under normal operating circumstances is very small.

The load under which each individual vehicle is driving,  or VSP, should be calculated based  on
vehicle speed, acceleration, and  roadway grade, as described earlier. The  distribution of VSP
should then be compared between  different  remote sensing sites to determine if vehicles are
being driven differently at different sites. If there are enough remote sensing measurements,
average  emissions by vehicle  type and model year can  be  weighted by a common  VSP
distribution to remove any bias introduced by different vehicle loads at different remote sensing
sites.
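
The site comparison and common-distribution weighting described above can be sketched as follows. The 5 kW/ton bin width, the function names, and the sample values are assumptions made for illustration; only the 0-20 kW/ton valid range comes from this guidance.

```python
def vsp_distribution(vsp_values, bin_width=5.0, low=0.0, high=20.0):
    """Fraction of valid readings falling in each VSP bin at one site."""
    n_bins = int((high - low) / bin_width)
    counts = [0] * n_bins
    for vsp in vsp_values:
        if low <= vsp <= high:
            # A reading exactly at the upper limit goes in the last bin.
            index = min(int((vsp - low) // bin_width), n_bins - 1)
            counts[index] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins


def weight_by_common_vsp(avg_emissions_by_bin, common_distribution):
    """Site-average emissions re-weighted to a shared VSP distribution."""
    return sum(a * w for a, w in zip(avg_emissions_by_bin, common_distribution))


# Illustrative site: the 25 kW/ton reading falls outside the valid range.
site_dist = vsp_distribution([1.0, 6.0, 7.0, 12.0, 20.0, 25.0])
site_avg = weight_by_common_vsp([0.2, 0.3, 0.5, 0.9], [0.25, 0.25, 0.25, 0.25])
```

Comparing `site_dist` between sites shows whether vehicles are being driven differently; the weighting step is only meaningful when every bin has enough measurements to give a stable average.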

With regard to repair effectiveness, it is important to  recognize that not only  are  absolute
emission levels sensitive to vehicle load; the percent change in emissions from vehicle repair is
as well.  An analysis of repair effectiveness on a  sample of vehicles given a full IM240 test
before and after repair indicates that  the percent reduction  in  emissions over the moderately
loaded portion of the IM240 was only half that of the reduction over the entire IM240 (18).
Therefore,  it is critical that any analysis of remote  sensing data used  to  characterize fleet
emissions in general or estimate repair effectiveness include the calculation of vehicle load. To
minimize the possibility of a driver making sudden throttle changes, it is recommended that remote
sensing units be sited in locations such as highway on- or off-ramps.  In addition, analyses that
rely  on data  from more  than one remote sensing site should re-weight average emissions at
different sites by a similar distribution of vehicle loads, to allow proper comparison of emissions
data collected at each site.  There is some evidence that  older vehicles behave differently than
newer vehicles with respect  to VSP.  In the future, vehicles designed to meet supplemental FTP
certification  requirements  can  be  expected to  behave  differently than  today's  vehicles.
Consequently, adjusting calculations, if required,  should probably divide the populations into
several  ranges  of model  years.  Table 5.2 illustrates  the various loads vehicles are subject to
during emission tests or accelerations.

                            Table 5.2 Examples of VSP Values

Activity                  VSP (kW/metric ton)
Maximum Rated Power       44-120
0-60 in 15 seconds        33
60 mph up 4% grade        23
FTP or IM240 max          23
Typical RSD site          10-15
Average IM240             8
ASM5015                   6
ASM2525                   5

Figures 5.1, 5.2 and 5.3 illustrate the relationships between emission and VSP for various vehicle
MY groupings. Maintaining as narrow a VSP window as possible will help minimize variability
between site measurements, although there may be practical limitations of how tight the VSP
operating window can be held.  The data presented  in the  following three figures  indicate
relatively constant CO and HC emissions for VSP values between 5 and 20 kW/metric ton, while
NO emissions are more variable  even if the VSP window is reduced to 10 to 20 kW/metric ton.
Therefore, for this data set it would appear that a VSP range of 15 +/- 5 kW/metric ton would be
the recommended target to minimize site-to-site load variability.

Figure 5.1 RSD %CO vs VSP
(Denver Remote Sensing Clean Screen Pilot 12/99)
[Plot: RSD CO vs. Specific Power; x-axis: Specific Power (kW/t), -15 to 40]

Figure 5.2 RSD %HC (C6) vs VSP
(Denver Remote Sensing Clean Screen Pilot 12/99)
[Plot: RSD HC vs. Specific Power; x-axis: Specific Power (kW/t), -15 to 40]

Figure 5.3 RSD %NO vs VSP
(Denver Remote Sensing Clean Screen Pilot 12/99)
[Plot: RSD NO vs. Specific Power; x-axis: Specific Power (kW/t), -15 to 40]

5.4. Vehicle Identification
Goal: Identify vehicle license plate so RSD emissions may be linked to specific vehicle and I/M
test result if available.

Optical character recognition is commonly used to read license plates in RSD studies; however,
care must be taken to ensure these data are accurate.  The license plate's design or color scheme
may adversely affect the accuracy of the data, which would result in errors in linking
the RSD reading with the correct I/M test result. If manual entry is used to enter license
plate data into a database, procedures should be developed to identify and correct transcription
errors.

It  must also be understood that  depending  on  a  state's infrastructure  regarding vehicle
registration tracking and ease of access to the I/M test database, matching the RSD data with the
appropriate I/M test result can be more difficult than anticipated.

5.5. Instrument Calibration
Goal: Ensure RSD units are calibrated using standardized procedures.

More detailed calibration specifications are provided in Section 4.5; however, it should be noted
that the accuracy specifications on instruments may have a greater range than the differences
between fleets, so the instruments may meet specifications but still give significantly different
results.  For example, if the CO specification is +/- 0.25%, at a typical fleet average of 1% CO,
one system could be centered at 1.05% and another at 0.95%.  Both are well within specification
but would report a 10% difference in two identical fleets.
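
The arithmetic in this example can be checked with a short Python sketch. The function name and the choice to express the difference relative to the midpoint of the two readings are assumptions made for illustration; the tolerance and readings are the ones quoted in the text.

```python
def apparent_difference(reading_a, reading_b):
    """Difference between two fleet averages as a fraction of their midpoint."""
    midpoint = (reading_a + reading_b) / 2.0
    return (reading_a - reading_b) / midpoint


tolerance = 0.25              # CO specification, +/- percent CO
true_average = 1.0            # typical fleet average, percent CO
unit_a, unit_b = 1.05, 0.95   # averages reported by two units for identical fleets

# Both units sit well inside the specification...
within_spec = all(abs(u - true_average) <= tolerance for u in (unit_a, unit_b))
# ...yet the two identical fleets appear to differ by about 10%.
difference = apparent_difference(unit_a, unit_b)
```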

Several approaches are possible  for identifying and correcting  this problem. Not  all may be
feasible:

       i)      Examine unit certification and audit data to determine offsets.
       ii)      Run the units side by side to obtain comparative results.
       iii)     Compare emission distributions for new model years of vehicles whose emissions
              profiles are expected to be the same in both fleets.

5.6. Measurement Methods
Goal: Convert concentration-based RSD measurements on individual vehicles to mass-based fleet
estimates.

Remote sensing measures emissions in terms of concentration ratios in the total exhaust, while
I/M programs that use idle or ASM testing measure emissions concentrations.  Programs that use
IM240 or IM240 derivatives, however, use concentration readings, air flow and miles driven on a
dynamometer to calculate mass emissions.  To put these measurements on a common mass basis,
fuel consumption data for an area may be used with fleet average RSD or ASM measurements
expressed in units of g/kg fuel to determine the fleet average emissions, or the fleet average
emissions could be converted to g/mi values by using instantaneous vehicle fuel economy estimates.
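
A minimal sketch of the two unit conversions mentioned above follows. The gasoline density of 2.8 kg per gallon is an approximate value assumed for illustration, and the function names and sample numbers are hypothetical; an actual analysis should use local fuel properties and fuel economy estimates.

```python
def grams_per_mile(g_per_kg_fuel, miles_per_gallon, fuel_density_kg_per_gal=2.8):
    """Convert a fuel-based emission factor (g/kg fuel) to a distance basis (g/mi)."""
    if miles_per_gallon <= 0:
        raise ValueError("fuel economy must be positive")
    kg_fuel_per_mile = fuel_density_kg_per_gal / miles_per_gallon
    return g_per_kg_fuel * kg_fuel_per_mile


def area_emissions_tonnes(g_per_kg_fuel, fuel_consumed_kg):
    """Mass emitted (metric tons) when fuel_consumed_kg of fuel is burned in an area."""
    return g_per_kg_fuel * fuel_consumed_kg / 1.0e6


# Hypothetical fleet average of 50 g CO per kg fuel at 28 mpg.
co_g_per_mile = grams_per_mile(50.0, 28.0)
# Hypothetical area fuel use of one million kg.
co_tonnes = area_emissions_tonnes(50.0, 1.0e6)
```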

Also, as mentioned earlier  (Section 3.3) areas conducting IM240  or  ASM testing should plot
mean RSD emissions against mean initial IM240/ASM emissions by vehicle type and model
year.  These plots typically  show a linear relationship with high correlation coefficients and can
be used to establish a direct relationship between the RSD measurements and the I/M test results.
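
One way to quantify the relationship in such a plot is an ordinary least-squares fit, sketched below in Python. The data points are invented for illustration (one point per model year group), and the function name is an assumption; the guidance does not prescribe a particular fitting procedure.

```python
def linear_fit(x, y):
    """Least-squares fit y = slope * x + intercept, plus the correlation coefficient r."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


# Hypothetical group means: initial IM240 CO (g/mi) and RSD CO (%).
im240_mean = [5.0, 10.0, 20.0, 30.0]
rsd_mean = [0.15, 0.25, 0.45, 0.65]
slope, intercept, r = linear_fit(im240_mean, rsd_mean)
```

With a fit of this kind, a fleet average RSD reading can be translated into an approximate I/M test equivalent, subject to the scatter indicated by `r`.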

5.7. Socioeconomics
Goal: Minimize the socioeconomic influence on data collection so that the I/M program benefits
are quantified and not the socioeconomic differences that exist between fleets due to income.

It  is believed  that the  vehicles owned by relatively low-income  drivers tend to  have  higher
emissions, from  a combination of vehicle  age and mileage, model, and historical maintenance
practices.   Researchers  have  found that  vehicle  owner  socioeconomics can affect vehicle
emissions independent of even vehicle type, age, and model (19). Specifically, in one study CO
and HC emissions were found to be roughly 25% higher in Lynwood CA than in El Monte CA (20).
The socioeconomic background of the drivers of vehicles measured by remote sensing can be
quite different depending on where the instrument is located.

The effect of driver socioeconomics on remote sensing emissions can be identified by graphing
average emissions by  vehicle  type and age for each measurement site, after correcting for
different load conditions at each site.  Driver socioeconomics must be considered when selecting
sites for remote sensing measurement. If measurements from different sites are to be compared,
such as under the Reference Method, sites with  similar driver socioeconomics  should be used.
One method to determine if a true cross section of vehicles is being sampled is to  plot the
percentage of RSD measurements vs. ZIP code*.

If it is discovered that the differences in fleet emissions between two sites are due primarily to
socioeconomic factors, there is no easy way to  deconvolute the existing data.  Therefore,  this
issue should be addressed in the planning phase before any data are collected.

5.8. Seasonal Effects
Goal: Minimize the influence of seasonal variables on data collection.

Since no existing I/M programs vary their cutpoints by season, seasonal effects may
influence a vehicle's measured emissions and therefore whether it passes its I/M test.  However,
the seasonal effects impact vehicle operations independently of whether emissions are measured
by in-program  analyzers  or RSD.   Therefore, a seasonal effect may introduce a bias when
comparing, for instance, remote sensing measurements taken during two distinct time periods.

Vehicle emissions as measured by the Arizona program vary by season as depicted in Figures
5.4-5.6.  Figure 5.4 shows the daily average CO of initial IM240 tests of Arizona passenger cars
over a three-year period (filled circles, left scale).  Emissions of cars that are fast-passed or
fast-failed are extrapolated to their full IM240 equivalents.  The trend in the maximum daily
temperature is also shown (gray lines, right scale).  The solid vertical lines denote the calendar
years, whereas the dashed vertical lines denote the changes in fuel.  CO and HC are higher in
warmer summer months, while NOx shows the opposite seasonal trend and is higher in winter
months. Colorado IM240 data show similar seasonal patterns.

It is unclear whether the seasonal variation is due to a combination of seasonal temperatures and
changes in fuel composition, or to inadequate conditioning of vehicles  prior to testing.  The
seasonal variation in Arizona remote sensing (Figure 5.5) and loaded idle (Figure 5.6) data
appears to mirror that of the Arizona IM240 emissions, suggesting that vehicle conditioning is
not the cause of the variation.  However, the seasonal variation in CO and HC in the Wisconsin
IM240 program (Figure 5.7) and the Minnesota idle program (Figure 5.8) are in  the opposite
direction, that is, CO and HC are higher in winter months. (The trend in Wisconsin NOx follows
that of Arizona and Colorado.) More analysis is needed to better understand these seasonal
trends, and why they differ by area; however, these trends can be identified using RSD and
should be discussed as a component of an I/M program evaluation.

* Other parameters may be used to segregate the data such as IM area, previous IM test result or MY.  A specific
example in which IM area was used may be found in the June 19, 2000 Inspection & Maintenance Review
Committee Report, "Evaluation of the Enhanced Smog Check Program", Appendix F.

Average emissions can be plotted by time periods (preferably weeks or days) and compared with
average temperatures and fuel seasons to  determine if there is a seasonal variation in remote
sensing  and/or I/M  emissions.  To reduce any seasonal effect on emissions, remote sensing
measurements for the Reference Method should be made during roughly the same time period.
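
The weekly averaging described above can be sketched in Python as follows. Grouping by ISO calendar week, the function name, and the sample readings are assumptions made for illustration; daily grouping works the same way with the date itself as the key.

```python
from collections import defaultdict
from datetime import date


def weekly_means(measurements):
    """Average emissions by ISO week.

    measurements: iterable of (date, emission) pairs.
    Returns {(iso_year, iso_week): mean emission}, ordered by week.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for day, emission in measurements:
        iso = day.isocalendar()
        key = (iso[0], iso[1])
        sums[key] += emission
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}


# Hypothetical RSD %CO readings on three days in January 1996.
readings = [
    (date(1996, 1, 2), 0.4),
    (date(1996, 1, 4), 0.6),
    (date(1996, 1, 9), 0.8),
]
by_week = weekly_means(readings)
```

The resulting weekly series can then be set beside average temperature and fuel season for the same weeks.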

Figure 5.4. Daily Average CO, Arizona IM240
[Plot: Daily Average CO (adjusted), Initial Tests of Passenger Cars, 1995-97 Arizona IM240; x-axis: Day]

Figure 5.5. Daily Average CO, Arizona Remote Sensing
[Plot: Average Remote Sensing CO, by Day, 1996-1997 Arizona; x-axis: Day]

Figure 5.6. Daily Average CO, Arizona Loaded Idle (Pima County)
[Plot: Average Loaded Idle CO, by Day, 1995-97 Arizona; x-axis: Day]

Figure 5.7. Daily Average CO, Wisconsin IM240
[Plot: Daily Average CO, Initial Tests of Passenger Cars, 1996-97 Wisconsin IM240; x-axis: Day]

Figure 5.8. Daily Average CO, Minnesota Idle
[Plot: Daily Average CO, Initial Tests of Passenger Cars, 1991-95 Minnesota Idle; x-axis: Day]

5.9. Program Avoidance
Goal: Account for emissions from motorists who are avoiding the I/M program.

There is evidence that I/M programs are inducing owners to re-register their vehicles outside of
I/M areas (10, 21). If these re-registrations are legitimate, i.e., drivers relocating their residences
or selling their vehicles to new owners outside of the I/M area, then the program has helped to
reduce emissions in  the  I/M area.   However, there is evidence that a portion of these re-
registrations are attempts to avoid I/M testing and many  of these vehicles  continue to be driven
in the I/M area (12, 22). Studies have estimated that program avoidance can lower the apparent
CO reductions on the order of 2% (10, 12). This program avoidance complicates any evaluation
of an I/M program, in that analysis of I/M data would indicate emissions reductions (vehicles
leaving area) that are not occurring on the road.  As discussed above, remote sensing data can
include such vehicles in their estimate of fleet emissions. In addition, remote sensing data can be
used to identify the subset of vehicles that are no longer registered in the I/M area but continue to
be driven in the area.

The design of a remote sensing program itself can influence which vehicles are measured under
the program.  A program which provides a negative incentive, such as additional I/M testing, for
driving past a remote sensor may encourage  drivers to avoid having their vehicle measured by a
remote sensor.  On the other hand, a program that is intended for research  purposes only, or
provides only a positive incentive (the possibility of being exempted from the next I/M test), will
result in a more representative sample of vehicles measured.

The distribution of vehicles (by type and model  year) measured  by remote sensing should be
compared with the distribution of vehicles registered in the area,  or reporting for I/M testing.
Any differences between the two  distributions  may indicate  a bias in  one of the samples and
suggest a possible program avoidance issue that needs to be addressed.   However, care must be
taken when performing this comparison because RSD measurements reflect on-road driving
distributions while traditional I/M testing is registration based.  Therefore, it is  possible that RSD
will over-sample newer vehicles relative to a registration-based I/M program.
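
A simple form of this comparison is sketched below in Python: compute the share of each model year in the RSD sample and in the registration (or I/M) records, then take the difference. The function names and the sample data are hypothetical, and a full comparison would also split by vehicle type.

```python
from collections import Counter


def shares(model_years):
    """Fraction of vehicles in each model year."""
    counts = Counter(model_years)
    total = sum(counts.values())
    return {my: n / total for my, n in counts.items()}


def share_differences(rsd_model_years, registered_model_years):
    """RSD share minus registration share for every model year in either list."""
    rsd = shares(rsd_model_years)
    reg = shares(registered_model_years)
    return {
        my: rsd.get(my, 0.0) - reg.get(my, 0.0)
        for my in sorted(set(rsd) | set(reg))
    }


# Hypothetical samples: RSD sees relatively more 2000 MY vehicles than are registered.
rsd_sample = [2000] * 6 + [1990] * 4
registered = [2000] * 5 + [1990] * 5
diffs = share_differences(rsd_sample, registered)
```

A positive difference for newer model years is expected to some degree because newer vehicles are driven more, so only differences beyond that pattern point to a sampling bias or avoidance.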

5.10. Regional Differences (policies, environment, fuel composition, etc.)
Goal: Account for differences in fleet emissions not attributable to I/M  across  geographic
regions.

A number of additional variables, such as environmental conditions, fuel composition, vehicle
registration, safety inspection, public attitudes, and tax policies, can result in biases in
emissions measurements made in different regions. These biases would have the biggest impact
on an evaluation using the Reference Method.  Some  of these potential biases and methods for
minimizing their impact are discussed in more detail in the Reference Method section below.

5.11. Program Details
Goal: Identify and account for I/M program operation details in the program evaluation analysis.

Biennial I/M programs use a simple technique  to determine which vehicles are to be tested in
which year.  For example, Colorado requires vehicles of even model years to be tested in even
calendar years, and vehicles of odd model years to be tested in  odd calendar years. Arizona
bases a vehicle's test year on whether the last digit in the vehicle identification number is odd or
even.  Different states have different policies regarding whether I/M tests are required when a
vehicle changes ownership, and the circumstances under which a vehicle's registration date and
year changes. Additionally, vehicles that are newly registered in AZ or CO must be tested when
they are first registered, regardless of their model year or last digit in their VIN. These factors
complicate the determination of whether a particular vehicle  has  been  tested under the current
I/M program  or not.  Therefore, it is essential that the date of each  vehicle's last I/M test be used
to determine whether the vehicle has been tested under the current I/M program.

States also often have different policies regarding vehicle license plates. The license plate of
a car sold in Colorado stays with the original owner, whereas the license plate is transferred to
the new owner in Arizona.   These policies may complicate the  matching of license plates
observed by remote sensing units with the correct vehicle and I/M test result information. These
and  similar details  of registration and  I/M programs  should be understood to minimize the
misidentification of the tested and untested vehicle fleet.

5.12. Emissions Distributions
Goal: Identify possible sources of bias in the measured emissions.

One way  of determining whether emissions measurements are  biased  is to compare average
emissions by vehicle type and model year, as described in Section 5.2.  If average emissions by
model year are  consistently higher for one group of vehicles than another, then the emissions of
that group of vehicles may be biased by  some of the factors discussed above. Another approach
is to compare the distribution of emissions of a subset of similar vehicles in the I/M-tested and
untested fleets.   Because vehicle emissions are highly skewed, with a majority of vehicles with
low  emissions and a small number of vehicles with very high emissions, differences between
groups of vehicles will be more readily apparent if the distribution is plotted on a log-normal
scale. Three ways to compare emissions distributions are outlined below.

   i)     One way of looking for changes in the shape of emissions distributions is to look at
          the contribution of the dirtiest 10% of vehicles which contribute a large percentage of
          the total emissions.   Table  5.3  illustrates  the  contribution  of the dirtiest 10% of
          vehicles in each model year.  The vehicles' CO emissions were measured at multiple
          RSD sites that have been divided into three groups based on the mean vehicle specific
          power of vehicles measured at the site.   The percentages in Table 5.3 show that the
          percentage of emissions concentrated in the dirtiest 10% of vehicles is greatest among
          the newest model vehicles that have a smaller number of high emitters.

   ii)     Another method is to divide vehicles into equal groups (quintiles or deciles), and plot
          the average emissions of each group.  Decile plots focus  attention on the majority of
          vehicles that have relatively low emissions. Figure 5.9 is a decile plot using the same
          data  as Table 5.3  for  2000 model  vehicles; however,  now it becomes  easier to
          distinguish between the low and high emitting vehicles.

   iii)    The  third method is to plot the full distribution of vehicles, rather than quintiles or
          deciles; the full distribution allows closer examination of the differences in the small
          number of high emitters in two samples of vehicles. Figure 5.10 is a full distribution
          plot  of the data shown in Table 5.3 and Figure 5.9 for 2000 model vehicles.  The use
          of a  logarithmic scale highlights the difference among the few high emitters in each
          data set.
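
The first two comparisons above can be sketched in Python as follows. The function names and the ten sample readings are hypothetical, and the sketch assumes at least ten vehicles and a positive emissions total.

```python
def top_decile_share(emissions):
    """Fraction of total emissions contributed by the dirtiest 10% of vehicles."""
    ranked = sorted(emissions, reverse=True)
    total = sum(ranked)
    if len(ranked) < 10 or total <= 0:
        raise ValueError("need at least 10 vehicles and a positive total")
    n_top = len(ranked) // 10
    return sum(ranked[:n_top]) / total


def decile_means(emissions):
    """Average emissions of each tenth of the fleet, cleanest decile first."""
    ranked = sorted(emissions)
    n = len(ranked)
    if n < 10:
        raise ValueError("need at least 10 vehicles")
    means = []
    for i in range(10):
        group = ranked[i * n // 10:(i + 1) * n // 10]
        means.append(sum(group) / len(group))
    return means


# Hypothetical %CO readings: nine clean vehicles and one high emitter.
sample = [1.0] * 9 + [11.0]
share = top_decile_share(sample)
deciles = decile_means(sample)
```

Running both functions on the I/M-tested and untested groups separately gives two sets of numbers that can be compared directly.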

                          Table 5.3. CO Emissions 10% dirtiest by MY

                 Site VSP (kW/t)
Year       <15     15-17.5   >17.5
1976-80    46%     43%       39%
1981       47%     44%       42%
1982       50%     44%       44%
1983       51%     48%       48%
1984       54%     48%       48%
1985       54%     50%       48%
1986       60%     56%       53%
1987       61%     57%       55%
1988       62%     60%       59%
1989       62%     60%       59%
1990       60%     60%       61%
1991       61%     62%       63%
1992       61%     61%       64%
1993       60%     60%       63%
1994       59%     60%       65%
1995       61%     63%       67%
1996       61%     61%       67%
1997       63%     63%       68%
1998       65%     66%       72%
1999       68%     68%       74%
2000       71%     73%       77%
2001       72%     75%       77%

Figure 5.9. CO Emissions Decile Plots
[Plot: 2000 Model RSD CO; axes: % of Vehicles (100 to 10) and CO]

Figure 5.10. CO Emissions Full Emissions Distributions
[Plot: 2000 Model RSD CO; axes: % of Vehicles and CO%; one series per site VSP group: <15 kW/t, 15-17.5 kW/t, >17.5 kW/t]

6. Evaluation Methods
This  section  outlines  three  methods  to  use remote  sensing data to analyze  I/M program
effectiveness over the short term.  The first two methods, the Step Change and Comprehensive
methods, involve remote sensing measurements collected in an I/M area; the final method, the
Reference  Method, compares remote sensing measurements collected  in an I/M area with
measurements collected in an external, or reference, area. Each of these methods is described in more
detail below.

6.1. Step Change Method

6.1.1. Description

There are several reasons to expect on-road emission reductions independent of an I/M
program. New technology vehicles are lower emitting for a given fleet age than older
technology vehicles. Depending on local and national economic factors, the fleet age itself may
be changing (newer vehicles are lower emitting).  It is also possible that public education,
willingness to carry out required maintenance, and auto repair industry capability are
changing irrespective of the presence or absence of an I/M program. All these factors make it
important not only to measure the on-road emission reductions of the I/M fleet, but also to
measure the emissions of a well matched control fleet, preferably differing only in I/M status.

The Step Change Method is an on-road evaluation of new or changed I/M programs using a built-in
representative control group.  On-road emissions are the parameter which I/M programs are
intended to control**; however, most I/M programs emphasize testing of fully warmed-up
exhaust emissions.  If an I/M exhaust emissions failure is followed by successful repair,
scrapping of the vehicle, or relocation to a region from which it is rarely driven in the program
area, then the program should show on-road exhaust emission reductions. When a new I/M
program starts or when there is a major program change, there is a window of opportunity to
evaluate the effectiveness of that change in reducing on-road emissions.  That window arises
when the new (or changed) program has impacted about 50% of the local fleet. If an annual
program starts, the window is after about six months. In a biennial program the window is
after the first year.  The concept behind this evaluation is that the untested fleet serves as the
representative control group for the tested fleet.  Ideally, data collection should be carried out at a
sufficient number of sites in the area to ensure appropriate representation, and sampling should
include surface streets as well as highway on/off ramps; however, a single well-traveled site can
be representative of an I/M area.  As mentioned in Section 5.7, one method to determine if a true
cross section of vehicles is being sampled is to plot the percentage of RSD measurements vs. ZIP
code.

** It is tacitly assumed on-road emissions are controlled by linking the I/M standards to certification standards, as
vehicle emissions shouldn't be expected to be reduced below their certification levels.  Whether this strategy is
appropriate or valid with respect to reducing on-road emissions and improving air quality is a discussion beyond the
scope of this document.

6.1.2. Application Examples

Colorado had various versions of decentralized idle/2500 tests since the early 1980s and
switched in the Denver metro area to a biennial centralized IM240-based program on January 1,
1995. Because the program is biennial, by January of 1996, roughly half the measured fleet (odd
MY) had been through the new I/M program and the other half (even MY) had missed a year of
their old annual program. On-road monitoring was carried out for five days in January of 1996 at
a single heavily trafficked  site. Approximately 26,000  valid, plate-matched records were
obtained.

Data were collected at a freeway off-ramp to eliminate cold-start vehicle emissions. Vehicle load
was not measured, as it was assumed that tightly curved uphill ramps have little off-cycle power
enrichment and that the tested and untested MY are randomly interspersed and subject to the same
loads, thus making for a valid comparison independent of load.  Additionally, the VSP concept
was at best in the developmental stage.  However, EPA strongly recommends that vehicle load
be characterized using VSP as described earlier for all program evaluation studies.

DMV records provided county of registration, I/M eligibility and most recent I/M status (pass,
fail, or waiver).  Individual emission databases are not normally distributed; however, if one
treats the means from each measurement day as independent samples, then these sub-samples
can be analyzed using normal statistics. This resulted in 5 means (1 for each day) per fleet. For
a fleet of about 26,000 vehicles it was found that the uncertainty in the apparent emissions
benefits is ±2%.  This error would be reduced with a larger fleet size provided that approximate
equality between tested and untested vehicles could be maintained.
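
The daily-means approach can be illustrated with a short Python sketch. It is one plausible reading of "normal statistics" (standard error of the daily means, combined in quadrature), not necessarily the calculation used in the cited study, and the daily values are invented. The sketch also ignores the uncertainty in the denominator when the difference is expressed as a percentage.

```python
from statistics import mean, stdev


def mean_and_standard_error(daily_means):
    """Mean of the daily means and its standard error (needs at least 2 days)."""
    n = len(daily_means)
    return mean(daily_means), stdev(daily_means) / n ** 0.5


def percent_benefit(untested_daily, tested_daily):
    """Apparent benefit and its uncertainty, both as a percent of the untested mean."""
    untested, se_untested = mean_and_standard_error(untested_daily)
    tested, se_tested = mean_and_standard_error(tested_daily)
    se_difference = (se_untested ** 2 + se_tested ** 2) ** 0.5
    benefit = 100.0 * (untested - tested) / untested
    uncertainty = 100.0 * se_difference / untested
    return benefit, uncertainty


# Hypothetical daily mean %CO for five measurement days per fleet.
untested_days = [1.0, 1.1, 0.9, 1.0, 1.0]
tested_days = [0.93, 0.93, 0.93, 0.93, 0.93]
benefit, uncertainty = percent_benefit(untested_days, tested_days)
```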

The first analysis was "eligible and certainly tested" versus "eligible in the future but not tested",
giving an apparent 7±2% CO benefit. During this first analysis it was recognized that many
vehicles should have been tested but were not, so a second analysis was "should have been
tested" versus "not tested".  This reduced the apparent benefit to 6±2%. Approximately 1300
vehicles registered in locations not required to take the I/M test were also measured at one site
and these vehicles showed higher average on-road emissions. However, they also showed an
alternation of emissions by MY, as if the I/M program had caused failing vehicles to be
reregistered to outlying counties yet continue to be driven in Denver. A follow-up study a
year later confirmed that this effect is indeed happening and, when included for that site, reduces
the apparent benefit by 2%.  The contribution of these "repair avoidance" cheaters to the basin-wide
fleet emissions cannot be determined from one freeway interchange site, but their emissions
were large enough that at the measurement site the 6±2% apparent I/M CO benefit was reduced
to 4±2% (10, 12).

The same database actually allowed for two other I/M benefit tests of lower precision. Using
only the even MY vehicles, the on-road emissions of those tested versus those untested were
evaluated. This resulted in a 5±3% apparent I/M CO benefit. Evaluation of the difference in
on-road emissions between vehicles of all MY tested within four months before the measurement
time and two months after indicated an apparent 8±6% benefit for CO. On-road benefits for HC
and NO were insignificant. The analyses discussed above were published in the literature (10).

Several factors obscured the clean 50/50 split between untested even MY and tested odd MY in
these studies.  For instance, many  1994 MY vehicles were tested in 1995, 1995 and later MY
new vehicles obtained  a four-year I/M waiver, and all  vehicles had to take the I/M test upon
change of ownership regardless of MY.  However, many of these potentially confounding factors
can be corrected.

6.1.3. Potential Systematic Errors

A major advantage of a single-site, single-time I/M evaluation study is that instrument calibration
and vehicle load/speed are irrelevant, since both fleets are subject to the same measurement
system. A second advantage is that the measured and the control fleets are perfectly matched
socioeconomically. A third advantage is that the evaluation can be carried out with only a single
week of work to within 2% accuracy levels, and the fleet average remote sensing data have been
shown to correlate very well with fleet average IM240 data (10).

However, three disadvantages are apparent. First, the window of opportunity exists only when a
new program starts up or when a program change predicted to have a measurable effect is
initiated. Second, the reference group of untested vehicles may not be a correct reference.
Third, added diligence is needed to ensure that a representative sample is obtained.

There  is some  evidence that change of ownership vehicles  have higher emissions  than  the
average of the same MY. This effect would cause the average of the untested even MY vehicles
(the control group) to be biased low and thus cause an underestimate of apparent I/M benefit. It
is possible to attempt to correct for this bias25. This study eliminated the large sample of 1994
MY vehicles which had been tested because they were very numerous and certainly a few
months older than the untested (last quarter) of the  1994 MY.  These two effects both lower the
apparent emission of the untested  fleet, thus increasing the apparent I/M benefit  from  the
previous 4%-7% range to 8%-11% with the same ±2% error. The last two analyses are not
affected by these corrections and remain at 5±3% and 8±6% apparent I/M benefit for CO (12).

There had been an annual I/M program in place in Denver for more than ten years. The odd MY
fleet took the old test in 1994 and the new in 1995. The untested even MY fleet skipped testing
in  1995 because their scheduled IM240 was in 1996. If the old  program had no benefit, then this
skip introduces no bias. If the old program had emissions benefits which last a long time (long
repair lifetimes as in the EPA Mobile model) then no bias is introduced, but, the apparent benefit
is that of the new program relative to the older one; not relative to  a "no I/M" baseline. To the
extent that repair lifetimes are not as long as modeled by EPA and  the old  program did lead to
reduced emissions, then the skipped annual test moves the control group back toward the no I/M
line, thus overestimating the I/M benefit relative to the previous program but with the upper limit
being relative to no I/M.

To correct for this bias, one needs to estimate both the emission reductions from the previous
(idle/2500) program and the apparent repair lifetime, but this is not straightforward. If from the
DMV records one can  determine which  tested odd  MY vehicles were not changing ownership,
then the even MY bias  is removed and the study measures the  apparent I/M benefit for the fleet
which does not change ownership.

6.2. Comprehensive Method

6.2.1. Description

The  Comprehensive Method involves comparing remote sensing emission measurements of a
fleet of vehicles measured prior to initial I/M testing with those of a fleet of vehicles measured
after final I/M testing.  The difference in fleet average remote sensing  emissions is  the initial
percent reduction due to the I/M program.  Sufficient numbers of measurements are made so that
emission reductions can  be  evaluated by  vehicle type and model year,  and by I/M result.
Important observations about repair effectiveness and program avoidance can be made if enough
vehicles are measured.

One  of the main reasons for using remote sensing measurements to evaluate the effectiveness of
I/M programs is that remote sensors measure emissions of vehicles that may not be participating
in an I/M program.  The Comprehensive Method differs from other remote sensing methods, in
that  it  explicitly compares emissions reductions of the I/M tested fleet as measured by  the
program  and as measured independently  by remote sensing. The Comprehensive Method can
also  be used to compare the  emissions of the I/M-tested fleet with  those of the non-I/M-tested
fleet, as can the other methods.

6.2.2. Application Examples

The  Comprehensive Method concept was  first applied by Doug Lawson, using unscheduled
roadside idle testing of randomly selected vehicles from CARB's  1989, 1990, and 1991 random
roadside  surveys.  Lawson found that average emissions levels of vehicles tested prior to their
I/M test were about the same  as those of vehicles tested after their I/M test. The emissions levels
measured during the scheduled I/M tests were 60% less than  the  emissions measured during
unscheduled testing either before or afterwards (24). The analysis was limited, in that fewer than
5,000 vehicles were analyzed in any given year.

Radian International was the first to apply this  method using  remote sensing data,  in a 1997
evaluation of California's I/M program for the California Bureau of Automotive Repair (25). For
their analysis Radian had  access to over 3.5 million RSD measurements from the Statewide On-
Road Emissions Measurement System. Because  of concerns regarding the accuracy of some of
the RSD instruments, the first 6 months or so of RSD data were not included in the analysis (the
report  gives no  indication  of how  many  measurements,  or vehicles,  were involved in  the
analysis).  Radian  also excluded RSD measurements  taken at sites that had a relatively high
percentage of high emitting vehicles from the newest model years.  Radian  grouped the RSD
measurements into two time  periods:  30 to 90 days prior and 0 to  90 days after. Radian also
grouped vehicles by model year group and I/M outcome (initial pass, initial fail/final pass, initial
fail/no  final pass).  However, despite the large sample  size, Radian did not have enough remote
sensing measurements to compare pre- and  post-I/M remote  sensing  emissions of the  same
vehicles (that is, a total of three emissions measurements on  each vehicle), so the cost  associated
with such a study should not be underestimated.

More recently the Comprehensive Method was used by Lawrence Berkeley Laboratory in
analyzing 4 million remote sensing measurements on 1.2 million vehicles in the Phoenix I/M
area (18, 22, 26). It was found that initial emissions reductions as measured by remote sensing
were roughly half those measured by the initial and final IM240 tests; IM240 data indicated a
15% reduction in fleet-wide CO and HC emissions due to the program (g/mi units), while the
remote sensing data indicated only a 7% reduction in CO and an 11% reduction in HC emissions
(g/kg fuel units). Because CO and HC emissions reductions carry a small gas mileage benefit,
the per-mile emission reduction as measured by RSD would be slightly higher. For
instance, assuming a 10% gas mileage improvement and a 10% emissions reduction after repair
would increase the 7% CO and 11% HC g/kg fuel RSD measurements to 8% and 12%,
respectively. However, these values are still below the 15% reductions determined using IM240
data. Part of this discrepancy may be due to the different loads vehicles are subjected to under
IM240 testing and remote sensing measurement. As in the earlier Step Method study, the VSP
concept was still under development and not available as a tool to reduce measurement bias due
to vehicle load.

The Comprehensive analysis found that average remote sensing emissions increased as vehicles
got further from their I/M test; the initial 12% reduction in fleet-wide CO emissions less than one
month after I/M testing declined to only a 6% reduction in  fleet-wide CO emissions one year
after I/M testing. In other words the repair benefits did not last nearly as long as they do in the
I/M models.  The Comprehensive  analysis also found that average RSD emissions increase  as
vehicles get closer to their scheduled I/M test; this is especially true  for vehicles that fail  their
initial I/M test. An  analysis of emissions trends in the weeks prior to their initial I/M test
indicates that the average emissions of these initial fail vehicles do decline slightly immediately
prior to I/M testing, suggesting that pre-test repairs and/or adjustments are being made.

6.2.3. Steps

Under the Comprehensive Method, a large number of remote sensing measurements are taken at
suitable sites throughout an I/M area. License plates from the remote  sensing measurements are
then matched with license plates either in a registration database,  or in the I/M testing database.
How remote sensing measurements are matched with vehicle information depends on how each
state registers vehicles.   For instance,  some states (such as  Arizona)  assign license plates  to
vehicles; when  a vehicle is sold, the license plate stays with the vehicle. In contrast, other states
(e.g. Colorado)  assign license plates to a driver and when a vehicle is sold the license plate stays
with the driver  and can be affixed to a new vehicle. It is critical that license plates obtained from
remote sensing  programs be matched to the correct vehicle, and in some cases this will require
tracking a vehicle's VIN to link it  with the appropriate I/M test record and then match it to the
RSD measurement.  It must be understood that depending on a state's infrastructure regarding
vehicle  registration tracking and ease of access to the I/M test database, this task can be very
difficult.
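
Where plates follow the driver rather than the vehicle, the match has to go through the VIN that
a plate was attached to on the measurement date. The sketch below illustrates that lookup. The
record layouts, plate and VIN values, and function names are hypothetical; actual registration
and I/M databases will differ by state.

```python
from datetime import date

# Hypothetical registration history: (plate, vin, first_day, last_day).
# The same plate appears on two different vehicles over time.
REGISTRATIONS = [
    ("ABC123", "VIN0001", date(1996, 1, 1), date(1996, 8, 31)),
    ("ABC123", "VIN0002", date(1996, 9, 1), date(1997, 12, 31)),
]

# Hypothetical I/M test records keyed by VIN: (test_date, result).
IM_TESTS = {
    "VIN0001": [(date(1996, 5, 10), "pass")],
    "VIN0002": [(date(1997, 2, 3), "fail"), (date(1997, 2, 20), "pass")],
}


def vin_for_plate(plate, measured_on, registrations=REGISTRATIONS):
    """Return the VIN the plate was registered to on the measurement date."""
    for reg_plate, vin, first_day, last_day in registrations:
        if reg_plate == plate and first_day <= measured_on <= last_day:
            return vin
    return None  # unmatched plate (out of state, unreadable, and so on)


def im_tests_for_reading(plate, measured_on):
    """Link one remote sensing reading to a VIN and that VIN's I/M tests."""
    vin = vin_for_plate(plate, measured_on)
    return vin, IM_TESTS.get(vin, [])
```

Matching on plate alone would have attached the 1997 failing test to a reading taken in mid-1996
on a different vehicle.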

The result is a large database of vehicles, some with multiple remote sensing measurements and
multiple I/M tests (vehicles that fail their initial I/M test and return for subsequent testing).
Vehicles are then  classified into several groups, based on the results of their I/M  test(s):  1)
vehicles that pass their initial I/M test; 2) vehicles that fail their initial test but pass a subsequent
test; 3) vehicles that fail their initial test and do not receive a subsequent I/M test; and 4) vehicles
that fail their initial test and fail a subsequent I/M test.  Vehicles can be further categorized into
more groups, based on the time between initial and final I/M test, or the results of their emissions
test vs. the results of visual or functional I/M tests.
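
The four outcome groups can be assigned mechanically from each vehicle's chronological test
results. The sketch below is a minimal illustration; the function name and the "pass"/"fail"
strings are hypothetical, and a real I/M database would need its own result codes mapped first.

```python
def classify_im_outcome(results):
    """Assign one of the four I/M outcome groups described in the text.

    `results` is the chronological list of "pass"/"fail" strings for one
    vehicle within a single I/M cycle (a hypothetical encoding).
    """
    if not results:
        raise ValueError("vehicle has no I/M test results")
    if results[0] == "pass":
        return 1  # passed the initial test
    if len(results) == 1:
        return 3  # failed the initial test, no subsequent test
    if results[-1] == "pass":
        return 2  # failed the initial test, passed a subsequent test
    return 4      # failed the initial test and a subsequent test
```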

Individual  records are then categorized  based on  the  time between  the  remote  sensing
measurement  and the initial  or final  I/M  test.   For example, individual  remote  sensing
measurements of vehicles can be grouped into 3 month time periods (0  to 3 months, 3 to 6
months, 6 to 9 months) prior to the vehicle's initial I/M test and after the vehicle's final I/M test.
These time periods can be  shortened to as little as one month or one week, depending on  the
number of remote sensing measurements.  Remote sensing measurements of individual vehicles
with multiple measurements in  a  given  time  period can be  averaged to  obtain  a single
measurement for that vehicle in that time period, or can be treated as independent observations
(meaning that some vehicles are "double-counted" in some time periods).
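
The time-period grouping and the per-vehicle averaging option can be sketched as below. The
90-day bin width stands in for "3 months", readings taken between the initial and final test are
left unlabeled, and all names are hypothetical; these are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def period_label(measured_on, initial_test, final_test, bin_days=90):
    """Label a reading by its time bin before the initial or after the final test.

    Bin 0 covers days 1 to bin_days, bin 1 the next bin_days, and so on.
    Readings on or between the two test dates return None.
    """
    if measured_on < initial_test:
        days = (initial_test - measured_on).days
        return ("before", (days - 1) // bin_days)
    if measured_on > final_test:
        days = (measured_on - final_test).days
        return ("after", (days - 1) // bin_days)
    return None


def average_per_vehicle(readings):
    """Collapse multiple readings per vehicle and period into one mean value.

    `readings` is an iterable of (vin, period, value); unlabeled readings
    (period None) are dropped.
    """
    grouped = defaultdict(list)
    for vin, period, value in readings:
        if period is not None:
            grouped[(vin, period)].append(value)
    return {key: mean(values) for key, values in grouped.items()}
```

Skipping `average_per_vehicle` and using the raw readings corresponds to treating them as
independent observations, with the double-counting noted above.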

Given the size of the database collected in the Comprehensive Method, valuable insight into
repairs and repair durability can also be gained. Analyses should include calculating average
emissions as  a function of time period, I/M result, vehicle type and model year and plotting the
results.   To  determine the initial  effectiveness of the I/M program, only  remote  sensing
measurements over a relatively short period should be used, to minimize the  impact of changes
to the vehicle on the results. For example, average emissions of up to 3 months prior to initial
I/M test  can be compared with average emissions of up to 3 months  after final I/M test.  The
difference in average remote sensing emissions is the initial emissions reduction due to repair of
many vehicles identified by the I/M program as high emitters (some of the emission reduction
may also be due to vehicles passing a subsequent I/M test without any repairs being made). The
emission reduction can be calculated for the entire tested fleet to determine the overall impact on
the  fleet, as well as for subsets of the fleet with different I/M results, to determine the impact of
the  program  on, say, vehicles that fail initial I/M testing.   The initial emissions reductions as
measured by remote sensing can then be compared  with the initial emissions reductions as
measured by  I/M testing.  The analysis can also be extended to time periods further after I/M
testing to analyze the short-term durability of any repairs made under the I/M program.
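
The before/after comparison, for the whole tested fleet and for each I/M result group, can be
sketched as follows. The record layout, group names, and emission values are made up for
illustration, and the sketch ignores the uncertainty on each average.

```python
from collections import defaultdict
from statistics import mean


def percent_reduction(before_values, after_values):
    """Percent drop in average emissions from the before to the after period."""
    before_avg = mean(before_values)
    return 100.0 * (before_avg - mean(after_values)) / before_avg


def reduction_by_group(records):
    """Percent reduction per I/M result group, plus the whole fleet under "all".

    `records` is an iterable of (im_group, phase, value) with phase equal to
    "before" (e.g. up to 3 months before the initial test) or "after"
    (e.g. up to 3 months after the final test).
    """
    values = defaultdict(lambda: {"before": [], "after": []})
    for group, phase, value in records:
        values[group][phase].append(value)
        values["all"][phase].append(value)
    return {
        group: percent_reduction(v["before"], v["after"])
        for group, v in values.items()
        if v["before"] and v["after"]
    }


# Made-up readings for illustration only.
sample = [
    ("initial pass", "before", 0.2), ("initial pass", "after", 0.2),
    ("fail/pass", "before", 1.0), ("fail/pass", "after", 0.5),
]
reductions = reduction_by_group(sample)
```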

There is evidence that some vehicles are repaired or receive maintenance just before their
scheduled I/M test; these pre-test repairs may result in the initial I/M test underestimating  the
average emissions prior to I/M testing.  This underestimation of the baseline emissions may in
turn result in an underestimation of the effectiveness of the I/M program. The data in Figure 6.1
provide evidence that owners do perform maintenance prior to an I/M test, and survey data
indicated that 35% of vehicle owners brought their vehicle in for a tune-up prior to their initial
test. The figure shows average weekly remote sensing CO emissions in different time periods
before the initial, and after the final, I/M test of each vehicle. The figure indicates that emissions
increase as vehicles get closer to their I/M test; however, emissions decrease substantially (12%)
about three weeks prior to the initial I/M test.  An evaluation based only on measurements taken
immediately before and after I/M testing would estimate an 8% reduction in emissions. If the
effect of pre-test repairs and adjustments is included, however, the reduction attributable to the
program increases to 18%. To minimize the effect of pre-test repairs on baseline emissions,
remote sensing measurements made within a month before a scheduled I/M test can be excluded
from the analysis (i.e., remote sensing measurements from 1 to 3 months prior to the initial I/M
test can be compared with remote sensing measurements from 0 to 3 months after the final I/M
test; Radian used this approach in their analysis of California RSD data).
Figure 6.1. Average CO RSD Emissions by Time Period, 1996-97 Arizona Remote Sensing

[Figure not reproduced. Horizontal axis: number of weeks prior to initial IM240 (13 to 1),
followed by number of weeks after final IM240 (1 to 7). Vertical axis: average CO RSD
emissions. Annotations: A, 12% reduction; B, 8% reduction; A + B, 18% reduction.]

6.2.4. Advantages/Disadvantages

There are several advantages to using the Comprehensive Method:

   i)   The initial emissions reductions  attributable to the  program can  be independently
        measured, and can be compared with those measured by the program itself.

   ii)   The repair effectiveness over the short-term (i.e., up to 2 years after final I/M testing)
        can be independently measured. Short-term repair effectiveness can be compared with
        long-term repair effectiveness as measured using multiple years of in-program data on
        the same vehicles.

   iii)  The effect of pre-test repairs on average emissions can be measured.

   iv)  Because large numbers of remote sensing measurements are made, the Comprehensive
        Method allows the identification of vehicles that do not report for, or do not complete,
        I/M testing, yet are still being driven in the I/M area.  Video camera surveillance can
        also be used to identify non-compliant vehicles, at less expense than remote sensing
        measurement; however, video cameras will only provide information on registration
        avoidance without any air quality data on high emitting vehicles.

The primary disadvantage of the Comprehensive Method is that it requires a large number (on
the order of millions) of remote sensing measurements. The method can be applied on smaller
sample sizes (20,000 or more), but the error on the fleet average emissions estimate will increase.
Since RSD measurements  made up to roughly 3 months prior to and after I/M testing are most
representative of the condition of the vehicles when they were tested under the I/M program,
only these measurements can be used to estimate initial  program effectiveness. In a biennial (24-
month) I/M program, therefore, only about a quarter of the vehicles measured by RSD will have
been measured within 3 months of their I/M test.  However, the remaining RSD  measurements
can be used to estimate short-term repair effectiveness, and the effect of pre-test repairs on fleet
emissions.
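
The "about a quarter" figure and the penalty for smaller samples follow from simple arithmetic,
sketched below. The sketch assumes readings are spread evenly over the testing cycle and that
individual readings are independent, so the standard error of a fleet average scales with
1/sqrt(n); both are simplifying assumptions, and the function names are hypothetical.

```python
import math


def fraction_near_test(window_months, cycle_months):
    """Share of evenly spread readings that fall within `window_months`
    before or after an I/M test recurring every `cycle_months`."""
    return min(1.0, 2.0 * window_months / cycle_months)


def error_growth(large_sample, small_sample):
    """Factor by which the standard error of a fleet average grows when
    the sample shrinks, assuming independent readings (SE ~ 1/sqrt(n))."""
    return math.sqrt(large_sample / small_sample)


biennial_share = fraction_near_test(3, 24)   # 3-month window, 24-month cycle
annual_share = fraction_near_test(3, 12)     # same window, annual program
growth = error_growth(1_000_000, 20_000)     # one million vs. 20,000 readings
```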

6.2.5.  Potential Systematic Errors

Because the Comprehensive  Method relies on large numbers of remote sensing measurements,
the remote sensing program  will likely have to occur  over several months or possibly a year.
Vehicle emissions as measured by the Arizona and Colorado IM240 programs vary by season;
HC and CO are higher in warmer summer months, while NOx is higher in winter months. It is
unclear whether this  variation is due to a combination  of seasonal temperatures and changes in
fuel composition, or to inadequate conditioning of vehicles prior to testing (the seasonal variation
in the Wisconsin IM240 program data, Arizona remote sensing data, and the Minnesota idle
program data are in the opposite direction of the variation in the Arizona and Colorado IM240
program data). No existing I/M programs vary their cutpoints by season to account for seasonal
effects on emissions.  There is a possibility that seasonal  variation in emissions measured by
remote sensing and the I/M program may introduce a systematic bias in the analysis.

The efficiency of remote sensing sites in identifying unique  vehicles decreases over time; that is,
many vehicles drive by the same sites every day. So concentrating the remote sensing program
on a handful of sites,  measured throughout the year,  may limit the  total  number of vehicles
measured. More sites may be used to increase the number of vehicles measured; however, this
may increase any effect  of site  bias  (either  due to the fleet of vehicles or the  roadway
configuration at individual sites) on the evaluation results.   The  vehicle specific power  of
individual remote sensing readings can be calculated, using roadway grade at the remote sensing
site as well  as speed and acceleration measurements, and used to  minimize  any  site bias
attributable to site characteristics, as discussed in Section 5.3.
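
As an illustration of the calculation, the sketch below computes vehicle specific power from
speed, acceleration, and roadway grade. The coefficients are those of a commonly cited generic
light-duty approximation (result in kW per metric ton) and are an assumption here; they are not
taken from this section, and the form given in Section 5.3 should be used where it differs.

```python
import math

MPH_TO_MPS = 0.44704  # miles per hour to meters per second


def vehicle_specific_power(speed_mph, accel_mph_per_s, grade_percent):
    """Approximate VSP in kW per metric ton for a light-duty vehicle.

    VSP = v * (1.1 * a + 9.81 * sin(theta) + 0.132) + 0.000302 * v**3
    with v in m/s, a in m/s^2, and theta the roadway angle. The
    coefficients are generic light-duty values (an assumption).
    """
    v = speed_mph * MPH_TO_MPS
    a = accel_mph_per_s * MPH_TO_MPS
    sin_theta = math.sin(math.atan(grade_percent / 100.0))
    return v * (1.1 * a + 9.81 * sin_theta + 0.132) + 0.000302 * v ** 3


flat_cruise = vehicle_specific_power(30.0, 0.0, 0.0)
uphill_accel = vehicle_specific_power(30.0, 1.0, 3.0)
```

Readings from different sites can then be restricted to a common VSP range before fleet averages
are compared.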

6.3. Reference Method

6.3.1. Description

The  Reference Method for evaluating I/M programs involves comparing remote sensing data
from vehicles registered in an I/M program area to vehicles registered in a non-I/M program
area.  (The Reference Method may also be used to compare the fleet average emissions from one
I/M program to the fleet average emissions of another I/M program, although this section focuses
on the I/M to No-I/M comparison.)  Obtaining an adequate sample size of non-I/M program
vehicles will typically require conducting measurements in a separate geographic area, the
"reference" area. The reference area, because it has no I/M program, provides a
surrogate untested fleet. The difference in fleet emissions between the I/M program area being
evaluated and its "reference" area represents the emission reductions attributable to the I/M
program. Additionally, this difference can be compared with that predicted by mobile source
emission models, such as MOBILE, to determine an overall effectiveness rating. The validity of this
approach depends upon selecting  a reference area  without distinctive characteristics that will
systematically bias the evaluation, as well as the accuracy  of the model if such an approach is
used. This section  provides  general  guidance for conducting such an evaluation,  including
selection of a reference area, data needs, and data analysis approaches.
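
The core comparison can be sketched as below. Expressing the effectiveness rating as the ratio of
the observed to the modeled reduction is an assumption made for illustration, since the text does
not define the rating; the function names and values are likewise hypothetical.

```python
from statistics import mean


def fractional_reduction(reference_values, im_values):
    """Observed reduction of the I/M fleet average relative to the
    reference (non-I/M) fleet average."""
    reference_avg = mean(reference_values)
    return (reference_avg - mean(im_values)) / reference_avg


def effectiveness_ratio(observed_reduction, modeled_reduction):
    """Observed reduction as a fraction of the model-predicted reduction
    (one possible definition of an effectiveness rating)."""
    if modeled_reduction <= 0:
        raise ValueError("modeled reduction must be positive")
    return observed_reduction / modeled_reduction


# Made-up fleet readings and a made-up model prediction of a 20% reduction.
observed = fractional_reduction([1.0, 1.0], [0.9, 0.9])
rating = effectiveness_ratio(observed, 0.20)
```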

6.3.2. Application Examples

The  Air Quality Laboratory of Georgia Institute of Technology used the Reference Method to
evaluate the effectiveness of the basic I/M program in place  in Atlanta in 1994. At that time, I/M
was  required for vehicles registered in only four counties of the Atlanta  13-county metropolitan
area:  Fulton, DeKalb, Cobb and Gwinnett.  The remaining nine counties, which were not tested
until enhanced I/M was implemented, served as the reference fleet. The results of the evaluation
indicated that Atlanta's basic I/M program was more  effective for cars than predicted by the
MOBILE model, but less effective than predicted for trucks. The Georgia Department of Natural
Resources used this result to support the mobile source emission reduction credit claimed in the
State of Georgia's 1996 State Implementation Plan.  The Reference Method was also used to
evaluate Atlanta's enhanced I/M program in October 2000.

6.3.3. Applying the Method

Using the Reference Method for I/M program evaluation involves three major tasks: selecting a
reference area, gathering the necessary data, and analyzing that data.

6.3.3.1. Reference Area Selection

Six key criteria to consider in selecting a reference area are presented below.

   i)  Distance
       Perhaps the most critical criterion for selecting a reference area is suitable geographic
       distance from the I/M program area. Recent analyses of Denver and Ohio registrations
       suggest that I/M programs motivate vehicles to migrate out of an area to adjacent
       non-I/M counties (12, 27). Thus, if an agency were to select an adjacent area to evaluate its
       I/M program, higher-emitting vehicles may migrate to the reference area, making for an
       artificially dirtier untested fleet. Therefore, reference areas should be chosen at a
       significant distance from the I/M program area to lower the probability of vehicle
       migration.

   ii)  Fleet Age
       Fleet age is another critical factor in selecting a reference area because vehicle age is a
       well-documented contributor to automobile emissions. To illustrate, comparisons between
       an older fleet within an I/M area and a younger fleet in a reference area will
       underestimate I/M program effectiveness. Isolating emissions by model year between the
       older and younger fleet will improve the comparison, but such controls will not account
       for the effects of higher annual vehicle miles traveled (VMT) or potentially higher
       maintenance rates of the older fleet that influence emissions. VMT data
       are not readily available in all jurisdictions, but  may be inferred using traffic count data
       and vehicle  population information from the state department of transportation.  While
       VMT may  be estimated  from other data  sources, maintenance rates are generally
       unobservable. Thus,  the reference  fleet should be roughly the same age  as the I/M area
       fleet. Comparable fleet age can be determined most easily by a bar chart that plots the
       percentage distribution of vehicles within each model year for the I/M program area and
       its reference area.

   iii) Climate
       Climate is another key consideration  in selecting a reference  area. A variety of factors
       related to climate affect automobile emissions, and thus the selection of a reference area.
       For example, salt may be applied to roads in colder climates, potentially resulting in
       higher  rates of catalytic converter rusting,  which in turn influences vehicle emission
       control capacity. At the other extreme, high temperatures, such as those found in Arizona,
       may more rapidly dissolve the polymer used in emission components, adversely affecting
       their functioning. Altitude is another climatic factor which may  result in  differential
       emissions through potentially faster deterioration rates  of emission control systems. A
       wealth of resources - including National Weather Service  data - are available to assist
       policymakers in identifying areas  within their region that  provide comparable climatic
       conditions.

   iv) I/M Program Policies
       Differences  in policy programs between an I/M evaluation area and its reference  area
       may bias program evaluation. For example, a safety inspection program  that requires
       functional lights and brakes may speed fleet  turnover by denying registration to poor-
       condition vehicles. While the emission profiles of these vehicles are uncertain, it is a
       reasonable  hypothesis that they  are  higher than average  emitters  and that a safety
       inspection program will weed some of them  out, thus shifting fleet emissions downward.
       Thus, the presence  of a safety program  in a reference area might underestimate the
       effectiveness of the I/M program being evaluated by providing an artificially low baseline
       for comparison.

   v)  Motor Vehicle Tax System
       The tax system for motor vehicles is another source of variance in the fleet distribution.
       To illustrate, an ad valorem tax that declines rapidly with vehicle age may have the effect
       of slowing fleet turnover by making ownership of older vehicles more affordable.
       Conversely, ad valorem taxes in a reference area that are onerous across all model years
       may  shift the income level of older vehicle owners upward such that the socioeconomic
       characteristics of vehicle  owners are not  equivalent  by model  year between the
       comparison areas.  State policies  on antique vehicles  can also influence fleet  age and
       condition. For example, Georgia vehicles 25 years and older receive permanent tags with
       no further requirements for taxation, emissions testing or registration.  This exemption
       may  result in a concentration of very old vehicles compared with other areas that offer no
       such  exemption.  These  are just  a few examples of how public policies  seemingly
       unrelated to   air quality can  nonetheless influence  fleet  emissions.  Consequently,
       policymakers should research policy programs in candidate states to rule out the potential
       for systematic  emission biases that could result from their presence.

   vi) Socioeconomic Factors
       Finally, socioeconomic conditions are the least studied of the  influences on automobile
       emissions. Most  of the evidence  regarding the influence of socioeconomics  on fleet
       condition and  emissions is anecdotal, relying on conventional  wisdom that less affluent
       people will drive older vehicles (an assertion for which there is some evidence)  and that
       they  cannot afford to properly maintain their vehicles (for which there is little evidence).
       Another assumption is that older motorists  drive their cars infrequently but maintain them
       well. While socioeconomic conditions have received relatively little scholarly attention in
       comparison with  physical influences on automobile emissions, it is nonetheless wise to
       consider them in selecting  a reference  area because they may represent the unobserved
       influences of maintenance practices, driving behavior, and culture.
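
For the fleet age criterion (ii) above, the model year percentage distributions of the two
areas can be tabulated before they are charted. The sketch below is illustrative; the function
names and the model year lists are hypothetical, and no threshold for an acceptable gap is
implied.

```python
from collections import Counter


def model_year_shares(model_years):
    """Percentage of vehicles in each model year, sorted by model year."""
    counts = Counter(model_years)
    total = sum(counts.values())
    return {my: 100.0 * n / total for my, n in sorted(counts.items())}


def largest_share_gap(area_a, area_b):
    """Largest difference, in percentage points, between the model year
    shares of two fleets."""
    shares_a = model_year_shares(area_a)
    shares_b = model_year_shares(area_b)
    years = set(shares_a) | set(shares_b)
    return max(abs(shares_a.get(y, 0.0) - shares_b.get(y, 0.0)) for y in years)


# Made-up model years for matched vehicles in each area.
im_area_fleet = [1990, 1990, 1995, 1996]
reference_fleet = [1990, 1995, 1995, 1996]
gap = largest_share_gap(im_area_fleet, reference_fleet)
```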

6.3.3.2. Data Needs

In addition to remote sensing data from the I/M evaluation area and its  comparison fleet, the
Reference Method requires registration data,  I/M records, and  model outputs, assuming it is
desired to include the model  as a part of the analysis protocol. Remote sensing data should be
collected from the I/M program area and its reference area under similar physical conditions and
within roughly the same timeframe. Simultaneous data collection prevents differences that may
occur due to temperature effects on emissions or seasonal policy changes such as fuel changes.

Registration data are needed to generate the characteristics of remotely sensed vehicles, such as
registration address, model year and vehicle type. The registration address is particularly critical
for identifying whether a vehicle is located in the I/M area or reference area.  For example, if the
I/M program area and reference area are located near one another, then it is possible to measure
inspected vehicles in  the reference area  and reference area vehicles  in the  I/M program area.
Registration address can  also be used to generate demographic characteristics for the registration
area.  This process, known as geocoding, locates the census block  group  of the registration
address. The census block group, in turn, can be used to generate demographic data from the
most recent national census on its residents. These demographic data include median household
income,  median  family  income, and the number of  households receiving  social  security,
retirement and public assistance. Given the inverse relationship between a census block group's
median household income and the average age of its registered fleet, these data provide
additional controls for fleet age, as well as safeguard controls for the unobservable influences of
maintenance practices, driving habits, and cultural effects (28).

I/M records provide two optional pieces of information  for the Reference Method. The first is
odometer data. Odometer data can be used to extract  annual  vehicle miles traveled (VMT),
which  contribute to wear and tear and ultimate deterioration of a vehicle's emission control
system. VMT is   typically calculated by subtracting odometer readings  for two  consecutive
years, dividing by the number of days between inspections, and multiplying that figure by 365.
(The daily mileage must be multiplied by 730 for states with biennial testing.)
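
The odometer calculation can be sketched as follows. The function name, argument layout, and
sample readings are hypothetical. The default scaling of 365 days gives miles per year; passing
730 instead gives miles per two-year testing cycle.

```python
from datetime import date


def scaled_vmt(odometer_start, date_start, odometer_end, date_end,
               days_per_period=365):
    """Miles traveled per period, from two odometer readings.

    Daily mileage is the odometer difference divided by the days between
    inspections; it is then scaled to `days_per_period` (365 for annual
    VMT, 730 for mileage over a biennial cycle).
    """
    days = (date_end - date_start).days
    if days <= 0:
        raise ValueError("second inspection must follow the first")
    miles = odometer_end - odometer_start
    if miles < 0:
        raise ValueError("odometer decreased; reading may be misrecorded")
    return miles / days * days_per_period


# Made-up readings 730 days apart, as in a biennial program.
annual = scaled_vmt(50_000, date(1994, 3, 1), 64_600, date(1996, 2, 29))
per_cycle = scaled_vmt(50_000, date(1994, 3, 1), 64_600, date(1996, 2, 29),
                       days_per_period=730)
```

Rejecting negative differences is a basic screen; odometer data in I/M records can contain entry
errors that would otherwise distort the averages.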

I/M records can also be used to identify  "invalid" reference area vehicles and non-compliant
inspection  area  vehicles. If  a significant number  of reference  area vehicles  have recently
migrated from the inspection area, it is possible that the evaluation will be biased high or low
depending on the average emission level of the migrating vehicles. I/M records can also be used
to estimate noncompliance in the I/M program area by identifying vehicles whose emissions
inspections have lapsed. This information prevents the I/M fleet from appearing artificially dirty,
while contributing valuable information to the  compliance  aspect of program performance.
Finally,  emission factor modeling  output  from MOBILE  or another model that predicts
emissions of the  inspected and non-inspected fleets can then be used to compare with real-world
differences in inspected and non-inspected fleets measured by the remote sensing data.

RSD data can also be combined with exhaust emission factors for cars  and light-duty trucks
extracted from the model tailpipe emission factors.  These  emission factors project average
grams/mile by model year and are the product of a range of inputs, including program design
(testing technology, model-year coverage, and emissions standards), fleet characteristics (fleet
VMT and age distribution), and operating modes (hot stabilized emissions to correlate with the
condition of in-use vehicles).  Inputs for the I/M-county fleet will include the design elements of
the current program such as the emissions analyzer, range of model years required for inspection,
and the testing mode, e.g. one-speed idle testing. I/M program elements for the non-I/M fleet are
simply omitted.  The modeling  process will also require the model  year distribution of  the
evaluation and reference fleets.

It should be noted that use of MOBILE may introduce analytical complexity and increase
technical uncertainty in the results, because the model's internal coding inherently makes
comparisons and computational assumptions that the user may not fully appreciate.

6.3.3.3. Data Analysis

The Reference Method can involve a variety of analytical approaches to assess the effectiveness
of an I/M program. The raw emissions of an I/M program area and its reference  area can be
compared with histograms to  determine any differences in the distribution of high emitters, low
emitters, and median points. The significance of emissions differences by model year can be
determined through error bar charts that plot the mean emissions plus the associated uncertainty.
Regression modeling can be used to determine the influence of  registration in the I/M program
area versus its reference area on emissions. RSD emission differences in inspected and reference
fleets can be compared to the differences predicted by EPA mobile models to determine an I/M
program effectiveness rating (29).
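The model-year comparison behind such an error bar chart can be sketched as below. This is only an illustration under stated assumptions: the function names and the sample readings are hypothetical, and the standard error of the mean is used as the "associated uncertainty", which is a simplification because on-road emissions are strongly skewed.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_stderr(values):
    """Return (mean, standard error of the mean) for a list of readings."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two readings")
    return mean(values), stdev(values) / sqrt(n)


def compare_fleets(im_readings, ref_readings):
    """Per model year: (reference mean - I/M mean, combined standard error)."""
    results = {}
    # Only model years present in both fleets can be compared.
    for year in sorted(set(im_readings) & set(ref_readings)):
        im_mean, im_se = mean_and_stderr(im_readings[year])
        ref_mean, ref_se = mean_and_stderr(ref_readings[year])
        results[year] = (ref_mean - im_mean, sqrt(im_se ** 2 + ref_se ** 2))
    return results


# Hypothetical %CO readings keyed by model year (not real data).
im_fleet = {1995: [0.20, 0.40, 0.30, 0.50], 1996: [0.10, 0.30, 0.20, 0.20]}
ref_fleet = {1995: [0.50, 0.70, 0.60, 0.80], 1996: [0.20, 0.40, 0.30, 0.30],
             1997: [0.10, 0.20]}

comparison = compare_fleets(im_fleet, ref_fleet)
for year, (diff, diff_se) in comparison.items():
    print(f"{year}: reference - I/M = {diff:.2f} +/- {diff_se:.2f} %CO")
```

A difference that is small relative to its combined standard error would not, on this simple reading, be distinguishable from sampling noise.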

6.3.4. Advantages and Disadvantages

The Reference Method has strengths and weaknesses for the evaluation of I/M programs. Most
importantly, it is a quantitative estimate of I/M  effectiveness  that is easy to calculate given
adequate data, although incorporating modeling output into the analysis will certainly add a layer
of complexity. As an external reference point for evaluating I/M programs, it provides ongoing
opportunities for evaluation; whether a program is within a year of implementation or five years
into operation is irrelevant. However, a significant amount of information is required beyond
remote sensing data, including registration records, I/M records and model outputs.
Furthermore, no reference area will completely match the I/M area profile, so there is always
the risk that some characteristic will systematically bias the I/M program evaluation higher or
lower than it should be. Finally, the method will not work in some states  (such as California),
where there are no reference fleets because the entire state is included in the I/M program area.
The Reference Method can also be used to compare  on-road  emissions  in the  region to be
evaluated to  those in another region,  such as Arizona, where I/M  effectiveness  has been
estimated by other methods.

7. Summary
Three methods for estimating  I/M program effectiveness using RSD data were outlined  in this
guidance.  Every effort was made to provide as  much detail as possible with regard to data
collection procedures, QA/QC  protocols, analysis methods, and sources of error or possible bias
associated with a given  method; however, it is recognized that improvements to those methods
outlined in this document will continue  to evolve.   Therefore, it is strongly recommended that
any state considering the use of RSD for program evaluation purposes work closely with their
respective regional EPA office and the Office of Transportation and  Air Quality to  ensure the
most up-to-date practices are incorporated into the evaluation. Furthermore, states interested in
using RSD for program evaluation must recognize the need within their own agencies to develop
a minimum level of expertise with  the  technology and procedures to ensure reliable data are
collected and analyses are performed properly.

It should also be recognized, given the difficulties associated with I/M program evaluations, that
an evaluation based on both out-of-program data (e.g., RSD) and in-program data will provide a
more accurate estimate of overall program performance than relying on one method
alone.

8. References
  1 Clean Air Act, 1970.
  2 Clean Air Act Amendments, 1977.
  3 EPA Inspection/Maintenance Policy Guidance, 1978.
  4 Clean Air Act Amendments, 1990.
  5 57 FR 52950 or 40 CFR Part 51, IM Program Requirements; Final Rule, November 5, 1992.
  6 National Highway System Designation Act of 1995 (23 U.S.C. 101).
  7 62 FR 1362 or 40 CFR Parts 51 and 52, Minor Amendments to Inspection Maintenance
      Program Evaluation Requirements; Amendment to the Final Rule, January 9, 1998.
  8 "Guidance on Alternative IM Program Evaluation Methods", EPA Memo, Office of Mobile
      Sources, Regional and State Programs Division, October 30, 1998.
  9 Singer, Harley, Littlejohn, Ho and Vo, "Scaling of Infrared Remote Sensor Hydrocarbon
      Measurements for Motor Vehicle Emission Inventory Calculations", ES&T (32)21,
      p.3241, 1998.
  10 Stedman, Bishop, Aldrete, Slott, "On-Road Evaluation of an Automobile Emission Test
      Program", ES&T, 31, p.927, 1997.
  11 Stedman and Bishop, "Measuring the Emissions of Passing Cars", Accounts of Chemical
      Research, 29(10), p.489, 1996.
  12 Stedman, Bishop, Slott, "Repair Avoidance and Evaluating Inspection and Maintenance
      Programs", ES&T, 32, p.1544, 1998.
  13 Wenzel, Singer, and Slott, "Some Issues In the Statistical Analysis of Vehicle Emissions",
      J. Transportation and Statistics (3)2, p.1, September 2000.
  14 Mann and Jones, CRC Report, "On-Road Remote Sensing of Automobile Emissions in the
      Research Triangle Park, North Carolina Area: 1997 and 1998", p.5, March 2000.
  15 Wenzel and Gumerman, "In-Use Emissions by Vehicle Model", Presented at 8th CRC On-
      Road Vehicle Emissions Workshop, San Diego, CA, April 1998.
  16 McClintock, "The Colorado enhanced I/M Program 0.5% Sample Annual Report", Remote
      Sensing Technologies Inc., Prepared for Colorado Department of Public Health and
      Environment, 1998.
  17 Jimenez, McClintock, McRae, Nelson and Zahniser, "Vehicle Specific Power: A Useful
      Parameter for Remote Sensing and Emission Studies", Presented at 8th CRC On-Road
      Vehicle Emissions Workshop, San Diego, CA, April 1998.
  18 Wenzel, "Reducing Emissions from In-Use Vehicles: An Evaluation of the Phoenix
      Inspection and Maintenance Program using Test Results and Independent Emissions
      Measurement", Environmental Science and Policy, (4), p.359, 2001.
  19 Wenzel, "I/M Failure Rates by Vehicle Model", Presented at 7th CRC On-Road Vehicle
      Emissions Workshop, San Diego, CA, April 1997.
  20 Stedman, Bishop, Beaton, Peterson, Guenther, McVey and Zhang, "On-Road Remote
      Sensing of CO and HC Emissions in CA", Final Report to Air Resources Board,
      A032-093.
  21 McClintock, "The Denver Remote Sensing Clean Screening Pilot", Prepared for the
      Colorado Department of Public Health and Environment, 1999.
  22 Wenzel, "Reducing Emissions from In-Use Vehicles: An Evaluation of the Phoenix
      Inspection and Maintenance Program using Test Results and Independent Emissions
      Measurement", Environmental Science and Policy, (4), p.377, 2001.
  23 Slott, "The Use of Remote Sensing Measurements to Evaluate Control Strategies:
      Measurements at the End of the First and Second Year of Colorado's Biennial Enhanced
      I/M Program", Presented at the 8th CRC On-Road Vehicle Emissions Workshop, San
      Diego, CA, April 1998.
  24 Lawson, "'Passing the test' - Human behavior and California's Smog Check program", J.
      Air Waste Manage. Assoc., 43, p.1567, 1993.
  25 Klausmeier and Weyn, "Using Remote Sensing Devices (RSD) to Evaluate the California
      Smog Check Program", Report to the California Bureau of Automotive Repair, October
      2, 1997.
  26 Wenzel, "Human Behavior in I/M Programs", Presented at the 15th Annual Mobile
      Sources/Clean Air Conference, Snowmass, CO, September 1999.
  27 McClintock, "I/M Program Avoidance and Enforcement", Presented at the 15th Annual
      Mobile Sources/Clean Air Conference, Snowmass, CO, September 1999.
  28 Leisha DeHart-Davis, private communication with Jim Lindner.
  29 Rodgers, Lorang, DeHart-Davis, "Measuring I/M Program Effectiveness Using Optical
      RSD: Results of the Continuous Atlanta Fleet Evaluation", Atmospheric Environment,
      submitted for publication.

Appendix A: On-Road Evaluation of a Remote Sensing Unit

All on-road remote sensors carry out at least a measurement of the CO/CO2 ratio in the
exhaust of a passing vehicle. It is possible for an interested party to carry out a
quantitative evaluation of the precision of this measurement. This evaluation can be done
without going to the expense and complexity of an on-road  audit using a vehicle of
known emissions (wet gas audit), or a vehicle designed to puff surrogate compressed gas
mixtures of known ratios (dry gas audit).

The measurement of exhaust CO/CO2 ratio is obtained by estimating the slope of a graph
of CO versus CO2 (or more properly delta CO versus delta CO2). The evaluation is
carried out by observing the quality of the individual data points which are used to derive
this slope. Several on-road remote sensors operate for 0.5 seconds at 100 Hz, thus
obtaining 50 data points for this correlation. Several on-road remote sensors use a puff of
gas of known CO/CO2 ratio as a field calibration. For these  sensors, the system operator
can display the CO/CO2 graph from a calibration, whether the calibration was considered
valid or not.
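The slope estimate described above can be illustrated with an ordinary least-squares fit. This is a sketch only: the appendix does not specify how any particular instrument computes its slope, and the function name and the synthetic 50-point puff (true ratio 0.1 plus small alternating noise) are assumptions made for the example.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept, r_squared)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length series of at least three points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("no spread in x, so the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # With no spread in y the fit explains nothing; report r^2 as 0.
    r_squared = sxy ** 2 / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared


# Synthetic half-second puff: 50 points (0.5 s at 100 Hz), true CO/CO2 = 0.1.
co2 = [0.1 * i for i in range(50)]
co = [0.1 * x + (0.005 if i % 2 == 0 else -0.005) for i, x in enumerate(co2)]

slope, intercept, r2 = fit_line(co2, co)
print(f"CO/CO2 slope = {slope:.4f}, r^2 = {r2:.4f}")
```

The slope is the reported CO/CO2 ratio, while the r^2 value and the spread of the CO2 data indicate how far that ratio can be trusted.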

EVALUATION OF A CALIBRATION PUFF:

Figure 1 shows a valid CO/CO2, HC/CO2 and NO/CO2 on-road calibration puff (FEAT
3002, Sept. 27, 2001, Casa Grande, AZ). When evaluating a remote sensor, the first
parameter to note is the quality of the data and the fit. In the case shown, all 50 points are
almost touching the straight line and r² = 0.99. The next parameter to note is the extent of
the data spread on the CO, HC, NO and CO2 axes. Different instruments use different
units. These graphs show the gas concentrations %CO, %HC (propane), %NO and %CO2
in an 8 cm cell. These units are chosen to correspond approximately to what would be
measured were one to directly probe a tailpipe. The units, however, do not matter; it is the
spread of both gases in a plot such as Figure 1 that is important to note.

Figure 2 shows a CO/CO2, HC/CO2 and NO/CO2 on-road calibration puff (FEAT 3002,
August 29, 2001, Phoenix, AZ). This was not a valid calibration. In this case, the
calibration gas appears to be  mixed with exhaust from a vehicle which had recently
passed through the optical beam. It is not important that occasional invalid calibrations
look bad. It is important that the instrument is able to obtain valid calibrations, which
look like Figure  1, and are carried out with a data spread comparable to a typical
automobile at the same site. This parameter also must be determined at the roadside in
order to evaluate the instrument.

It should be noted that gas optical absorption data in air spectroscopy are often given in unusual units,
because what is measured is the product of concentration and path length. Thus atm.cm, %.cm,
ppm.cm, or even % in 8 cm are all units which may be used, and all can be inter-converted. In fact, the CO2
plume from a typical car as measured by an on-road sensor can be as large as 1 atm.cm, but more often is
0.01 atm.cm, which could also be rendered as 1 %.cm. CO is typically 1/10 of that, and HC and NO
1/100.
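The inter-conversion mentioned in the note is simple arithmetic, sketched below. The function names are hypothetical, and the conversion to atm.cm assumes a total pressure of one atmosphere, so that 1% by volume corresponds to a partial pressure of 0.01 atm.

```python
def percent_in_cell_to_percent_cm(percent, cell_length_cm=8.0):
    """Concentration quoted as '% in an N cm cell' expressed as a %.cm column."""
    return percent * cell_length_cm


def percent_cm_to_atm_cm(percent_cm):
    """Assumes 1 atm total pressure: 1 %.cm = 0.01 atm.cm."""
    return percent_cm / 100.0


def percent_cm_to_ppm_cm(percent_cm):
    """1 % = 10,000 ppm, so 1 %.cm = 10,000 ppm.cm."""
    return percent_cm * 10_000.0


# Example: a reading of 1% CO2 in an 8 cm cell.
column_percent_cm = percent_in_cell_to_percent_cm(1.0)
print(column_percent_cm, "%.cm")
print(percent_cm_to_atm_cm(column_percent_cm), "atm.cm")
print(percent_cm_to_ppm_cm(column_percent_cm), "ppm.cm")
```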

Another noise evaluation that any instrument should be able to perform is a
calibration run without any added calibration gas. The resulting graph is
uninteresting, namely a cluster of points at the origin. However, the spread of these points
along each of the axes is a direct measure of the noise which the instrument will see from
all passing vehicles. Again, the spread should be compared to the spread expected from a
typical motor vehicle in a realistic roadway situation using the same remote sensing unit.
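One way to put a number on this comparison is sketched below. The appendix does not define "spread" precisely, so the choice here (standard deviation of the blank run divided by the range of a vehicle's readings) is an assumption, as are the function names and the sample values.

```python
from statistics import pstdev


def axis_spread(values):
    """Return (range, standard deviation) of the readings along one axis."""
    return max(values) - min(values), pstdev(values)


def noise_to_signal(blank_values, vehicle_values):
    """Scatter of a no-gas calibration relative to a typical vehicle's spread."""
    _, blank_sd = axis_spread(blank_values)
    vehicle_range, _ = axis_spread(vehicle_values)
    if vehicle_range == 0:
        raise ValueError("vehicle readings show no spread")
    return blank_sd / vehicle_range


# Hypothetical %CO2 readings: a blank run and a typical passing vehicle.
blank_co2 = [0.01, -0.01, 0.02, -0.02, 0.00, 0.01, -0.01, 0.00]
vehicle_co2 = [0.3, 0.5, 0.8, 1.0, 1.3, 1.1, 0.7, 0.4]

ratio = noise_to_signal(blank_co2, vehicle_co2)
print(f"Blank noise is {ratio:.1%} of the vehicle CO2 spread")
```

The same calculation would be repeated for the CO, HC and NO axes, since the noise level generally differs by channel.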

Figure 1. Half-second puff calibration plots for CO, HC and NO. The straight lines are linear least squares
regressions of the data.

Figure 2. Half-second calibration gas puff for CO, HC and NO which has been contaminated with exhaust
from a passing vehicle. (Horizontal axis: %CO2.)

EVALUATION OF INDIVIDUAL MOTOR VEHICLE EMISSIONS:

At the roadside, when the instrument is operating and calibrated, call up and observe
CO/CO2 ratio graphs from about three randomly chosen vehicles. The skewed
distribution of emissions implies that these are all likely to be low-emitting cars with very
small CO/CO2 slopes. The parameter to observe on these graphs is the range (spread) of
the CO2 data. If the CO axis is auto-scaling, the noise may look very bad but actually be
very good. Note the CO2 spread. It should be comparable to that of the calibration, or at
least no smaller than about one-tenth of it.

Figure 3 shows typical data from a passing vehicle. The CO2 readings are from about
0.3% to 1.3%, for a total spread of 1% CO2 in 8 cm. The spread for the calibration shown
in Figure 1 is about 4.5% and in Figure 2 about 2.2%. In both cases the calibration spread is
comparable to, although larger than, that of the on-road data. Now it is necessary to
evaluate the CO/CO2 graph on a vehicle with a CO/CO2 ratio higher than zero. If the raw
data are stored and can be recalled and graphed for each vehicle, then wait for a vehicle
with CO/CO2 > 0.25 (about 3.5% CO on the video screen). Now observe this CO/CO2
graph. The CO2 spread should be comparable to that of the three low CO emitters observed
earlier. The CO spread should be comparable to the CO spread on the calibration puff, or
at least no smaller than about one-tenth of it. If these criteria are met and this graph looks
"good" (for instance, r² > 0.9), then you have an instrument likely to provide precise and,
if the calibration gas supplier is trustworthy, accurate data.
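The roadside criteria above can be collected into a simple checklist, sketched below. The thresholds (one-tenth of the calibration spread, r^2 above 0.9) come from the text, but the function names, the use of range as the measure of spread, and all sample values are assumptions for illustration.

```python
def spread(values):
    """Range of the readings along one axis."""
    return max(values) - min(values)


def roadside_check(cal_co2, cal_co, veh_co2, veh_co, r_squared,
                   min_spread_fraction=0.1, min_r_squared=0.9):
    """Apply the spread and fit-quality criteria to one elevated-CO vehicle."""
    co2_ok = spread(veh_co2) >= min_spread_fraction * spread(cal_co2)
    co_ok = spread(veh_co) >= min_spread_fraction * spread(cal_co)
    fit_ok = r_squared > min_r_squared
    return {
        "co2_spread_ok": co2_ok,
        "co_spread_ok": co_ok,
        "fit_ok": fit_ok,
        "all_ok": co2_ok and co_ok and fit_ok,
    }


# Hypothetical readings (% in 8 cm); only the extremes matter for the range.
cal_co2, cal_co = [0.0, 4.5], [0.0, 0.45]
veh_co2, veh_co = [0.3, 1.3], [0.10, 0.35]

result = roadside_check(cal_co2, cal_co, veh_co2, veh_co, r_squared=0.95)
print(result)
```

The r_squared argument would come from the straight-line fit of the vehicle's CO versus CO2 data, however the instrument reports it.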

Figure 4 shows on-road CO/CO2 data from a cold-start vehicle measured at the
University of Denver. A similar evaluation analysis can be carried out for HC and NO;
however, if the CO/CO2 data do not pass muster, then HC/CO2 and NO/CO2 are much
less useful because the readings are missing a major component of the carbon balance.
Note also that HC emissions are smaller and harder to measure than CO, so more
(relative) noise is to be expected. If the data you see at roadside are of similar or better
quality, then you are observing a good instrument. If they are not up to this quality, then
you should think twice about accepting the data until the operator/vendor can convince
you that the instrument is functioning properly.

The ability to read vehicle exhaust independently of vehicle type should also be verified.
This may be done by making a note of the valid reading rate from normal sedans and
from SUVs and pickups while observing roadside operations. In a perfect world all
vehicles with ground-level exhaust would be measured. In reality some are not, but this
should be observed to be a random process or a systematic one caused by driving mode
(noticeable decelerations), not one caused by vehicle type or body height.
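A tally of roadside notes like the following is one way to compare valid reading rates by vehicle type. It is a minimal sketch: the category labels, function name, and counts are hypothetical, and no statistical test is applied to the difference.

```python
from collections import defaultdict


def valid_rates(observations):
    """observations: iterable of (vehicle_type, is_valid) pairs noted at roadside."""
    counts = defaultdict(lambda: [0, 0])  # vehicle_type -> [valid, total]
    for vehicle_type, is_valid in observations:
        counts[vehicle_type][1] += 1
        if is_valid:
            counts[vehicle_type][0] += 1
    return {t: valid / total for t, (valid, total) in counts.items()}


# Hypothetical tally: 18 of 20 sedans and 12 of 20 SUVs/pickups read validly.
notes = ([("sedan", True)] * 18 + [("sedan", False)] * 2
         + [("suv_pickup", True)] * 12 + [("suv_pickup", False)] * 8)

rates = valid_rates(notes)
for vehicle_type, rate in rates.items():
    print(f"{vehicle_type}: {rate:.0%} valid")
```

A gap as large as the one in this made-up tally would suggest that missed readings depend on vehicle type rather than on chance or driving mode.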

Figure 3. In-use data for a low CO emitting vehicle.

Figure 4. In-use data from a cold-start vehicle with elevated levels of CO.

EVALUATION USING EXHALED BREATH:

A non-smoking human exhales CO2 and negligible amounts of CO, HC and NO. The
remote sensor should be able to read human breath as it would a passing car, as long as the
breath is accompanied by a blocked and then unblocked optical beam. Fifteen readings of breath with
the FEAT instrument in the laboratory yielded a mean CO reading of 0.07% with a
standard deviation of 0.04%. HC read a mean of 39 ppm propane with a standard
deviation of 50 ppm, and NO a mean of -3 ppm with a standard deviation of 18 ppm.