Methods to Identify Pattern Case Failures from I/M Data

Office of Mobile Source Emissions
U.S. Environmental Protection Agency

Prepared for EPA by
Energy and Environmental Analysis, Inc.
EPA Contract No. 68-03-1865
Work Assignment No. 20

United States Environmental Protection Agency
EPA-420-R-86-001
September 1986
TABLE OF CONTENTS

1. INTRODUCTION

2. CENTRALIZED STATE AUTOMOTIVE VEHICLE EMISSION INSPECTION/MAINTENANCE
   PROGRAMS SURVEY
   2.1  Overview
   2.2  Testing Requirements
   2.3  Sample Characteristics
   2.4  Data Recording and Availability
   2.5  Special Features and Considerations
   2.6  Data from California
   2.7  Summary

3. DATA PROCESSING
   3.1  Overview
   3.2  Step 1: Data Cleaning
   3.3  Step 2: Record Standardization
   3.4  Step 3: Tracking Test Sequences
   3.5  Step 4: VIN Decoding
   3.6  Step 5: Analysis Output

4. STATISTICAL ANALYSIS FOR IDENTIFYING PATTERN CASE FAILURES
   4.1  Introduction
   4.2  Description of the Data and Underlying Assumptions
   4.3  Question No. 1: Defining "High Failure Rate"
   4.4  Question No. 2: Gauging and Adjusting for Technology Group Impact
   4.5  Question No. 3: Effect of Reduced Failure Rates on Methods
   4.6  Question No. 4: Combining Data Across States
   4.7  Question No. 5: How to Analyze Results from Alternative Sets of
        Cutpoints

5. SUMMARY AND CONCLUSIONS
LIST OF TABLES

2-1  General Testing Requirements
2-2  Sample Characteristics
2-3  Data Recording and Availability
2-4  Special Features and Considerations
2-5  Ranking of Data Usefulness for Several I/M Programs
3-1  Connecticut 1984 Quarter 1 and Quarter 2 VIN and Plate Counts
3-2  Kentucky Data Sorting
3-3  VIN Decoding of CT1984
3-4  VIN Decoding of KY1984
4-1  Failure Rate Summary by Engine Family
5-1  Cost of Data Analysis

LIST OF FIGURES

3-1  Program Flowchart to Compute Failure Rates
4-1  Example Distribution of Failure Rates
1. INTRODUCTION

EPA applies the term "pattern case failures" to groups of vehicles that fail
the I/M test procedure at an "unusually high rate." These groups may fail a
particular type of I/M test or, more generally, may fail different types of
short tests, for a variety of reasons. The reasons range from design defects
common to the particular group to an emission control system component
failing at excessive rates and causing vehicles in that group to fail. These
"pattern case" failures cause difficulties in I/M programs, as such vehicles
may not be easily repairable, or it may be appropriate to modify the test
procedure for some vehicles. In other cases, EPA may need to require
manufacturers to recall these vehicles for modification.
EPA has traditionally relied on information supplied by individual I/M
programs, individual car owners, or, in some instances, manufacturers to
identify "pattern case" failures. Under a previous work assignment for the
EPA, EEA obtained data from three I/M programs and calculated failure rates
at the certification engine family level and at several distinct cutpoints.
The failure rates were utilized by EPA to identify engine families that were
potential pattern case failures. The objective of this work assignment was
to investigate methods to identify pattern case failures using I/M data on a
routine basis and to:

•  Minimize the complexity and time required to obtain such data
•  Enhance statistical methods to better resolve pattern cases.
Accordingly, EEA organized the work effort into three separate areas. The
first area is data availability, where EEA investigated the quality, quantity
and types of data available from I/M programs on a routine basis. The second
area is the processing required to calculate failure rates by engine family,
which to some extent depends on the type and cleanliness of data supplied by
the states. The third area encompasses the statistical tools required to
identify pattern case failures, given sample size and observed failure rates.
We have assumed that EPA is interested in detecting pattern case failures in
vehicles that are model year 1981 or newer, since these vehicles are covered
by the "207(b)" Emission Warranty. Furthermore, our analysis is restricted
to light-duty vehicles (LDV) and light-duty trucks (LDT).
This report is organized as follows. Section 2 discusses our findings on the
type, quality, quantity and availability of data from seven I/M programs; the
findings are based on contacts with I/M program managers in seven locations.
Section 3 details the processing requirements, starting from the raw data as
provided by individual I/M programs through to the final product of computed
failure rates by engine family. Section 4, prepared in conjunction with a
subcontractor, Analysis and Simulation, Inc., provides a range of statistical
tools required for identification of so-called "pattern case" failures.
Although EPA has used the χ² test, we believe that more sophisticated methods
are required for the analysis. Section 5 summarizes our findings and
recommends the analyses we believe would be of greatest value to EPA.
2. CENTRALIZED STATE AUTOMOTIVE VEHICLE EMISSION
INSPECTION/MAINTENANCE PROGRAMS SURVEY
2.1 OVERVIEW
The difficulties associated with the rapid identification of pattern case
failures are due to the quality, quantity and availability of test data
for analysis. The choice of inspection/maintenance data to analyze
pattern case failures must seek to maximize the availability of an
adequate sample of data that is relatively error free, contains all of
the variables of interest and does not require inordinate delays.
Previous analysis for the EPA has shown that failure rates are sensitive
to test procedures, and potentially to climatic variables. It is in
EPA's interest to obtain data from several different inspection/main-
tenance programs that represent different test procedures and are
geographically dispersed. In fact, failure rate differences arising
from test procedural differences or climatic/geographic differences may
be quantifiable if there is adequate data.
The requirement for large quantities of relatively clean and unbiased data
containing the Vehicle Identification Number (VIN) for each vehicle tested
resulted in narrowing the scope of our effort to encompass only centralized
I/M program data; analysis of data from decentralized I/M programs has shown
that much of the data is suspect. In this analysis, we examined a variety of
different centralized I/M programs that represent the range of diversity in
location, test procedures and data handling procedures. EPA has also been
interested in determining pattern case failures for California vehicles;
this, of course, required that we examine the California program even though
the program is not centralized. The analysis of data from California is
considered separately in this section.
A telephone survey was conducted to determine the individual characteristics
of, and differences between, the vehicle inspection/maintenance programs of
seven states which use centralized (i.e., state or contractor operated)
inspection facilities. The objective of the survey was to determine which
state program(s) produce emissions test data which require little or no
pre-processing cleanup or editing, and are available on a frequent (monthly
or quarterly) and timely basis. Individuals surveyed were state I/M program
engineers and technicians or private contractors involved in the day-to-day
operations of the programs. The seven programs selected were from diverse
geographic regions of the United States: Northeast, Middle Atlantic,
Southeast, Midwest, Southwest and Far West. The surveyed states are Arizona,
Connecticut, Illinois, Kentucky, Maryland, Washington and Wisconsin. As
stated above, California is also considered, but separately from the analysis
of centralized programs.

This section discusses the general testing requirements, sample
characteristics, data recording and availability, and special features of
each of the seven programs surveyed.
2.2 TESTING REQUIREMENTS
Although all state I/M programs ostensibly test light-duty vehicles on the
idle test, there appear to be considerable variations in the definition of
"light-duty," the distinctions between cars and trucks, the actual test
procedure used, and the pass/fail requirements. EPA definitions classify all
cars as light-duty vehicles and (since 1979) all trucks up to 8,500 lb GVW as
light-duty trucks. Our survey recorded general confusion regarding the 8,500
lb cutpoint, with some states covering vehicles only up to 6,000 lb and
others up to 10,000 lb GVW. To the extent EPA is interested in engine
families and in light-duty trucks between 6,000 and 8,500 lb GVW, there could
be problems in obtaining good data. Test methods vary primarily in the
preconditioning requirement, although some states now have pass/fail criteria
on emissions at high idle or in loaded mode tests. Cutpoint distinctions
between cars and light trucks are rarer.
The survey results are summarized in Table 2-1. The first column in
Table 2-1 lists the vehicles eligible for testing under the state
requirements. The second column is included to clarify the definition of
"light duty" as it pertains to vehicles eligible for testing (henceforth,
"vehicles" will refer to cars and trucks, unless otherwise noted) All
of the programs test all registered vehicles in the light duty category
covering at least the twelve most recent model years, which represent
approximately ninety percent of the in-use cars and trucks. Both
Illinois and Wisconsin test vehicles up to 8,000 lb (rather than 8,500
lb) GVW, but this distinction is based on registered GVW, which may not
be consistent with actual GVW. Washington and Maryland classify vehicles
only to 6,000 lb GVW as light-duty while Connecticut tests vehicles to
10,000 lb registered GVW. Complete capture of LDTs between 6,000 and 8,500 lb
GVW is a potential problem with this variation.
The four types of emission tests for gasoline powered light duty vehicles are
listed in the third column of Table 2-1. The tests are: 1) T1, a
non-preconditioned idle test; 2) T2, a loaded mode cruise on a dynamometer;
3) T3, a final idle test after preconditioning either at 2500 rpm idle or in
loaded mode; and 4) T4, a 2500 rpm no-load test. Arizona and Connecticut are
unique in that they use the T3 test only for vehicles which failed the T1
test. Arizona, Connecticut, and Wisconsin precondition using a loaded mode
cruise on a dynamometer, while the other programs precondition at idle.
Knowing the tests performed will allow for a comparison of failure rate
patterns between different testing sequences to gauge the effect of alternate
preconditioning and testing procedures.
TABLE 2-1  GENERAL TESTING REQUIREMENTS

                                        "Light Duty"  Test Sequence  Standards/Tests, MYR
State  Vehicles Included                Definition    for Gas LDV(a) 1981 and Newer LDV

AZ     All gasoline and diesel          0-8,500 lbs   T1/T2/T3       207(b)(b)/T1 (T3 for
       vehicles, 1972 and newer(c)                                   failed vehicles)
CT     All gasoline powered cars and    0-8,500 lbs   T1/T2/T3       207(b)/T1 (T3 for
       trucks, 1968 and newer                                        failed vehicles)
IL     All gasoline powered cars and    0-8,000 lbs   T1/T4/T3       207(b)/T4, T3
       trucks, 1968 and newer
KY     All gasoline and diesel powered  0-8,500 lbs   T4/T3          207(b)/T3
       vehicles, all years
MD     All gasoline powered cars and    0-6,000 lbs   T4/T3          207(b)/T3
       trucks, last 12 years
WA     All gasoline powered cars and    0-6,000 lbs   T4/T3          1.5(2.0)/300/T3
       trucks, last 14 years
WI     All gasoline powered cars and    0-8,000 lbs   T2/T3          207(b)/T3 (4.0/400
       trucks, last 15 years                                         for LDT's 1981-86)

(a) T1 = first idle; T2 = loaded cruise; T3 = final idle; T4 = 2500 rpm
    preconditioning.
(b) CO = 1.2%, HC = 220 ppm.
(c) Will test all 1967+ MYR gas and diesel starting 1/87.
The fourth column of Table 2-1 lists the standards for model year 1981 and
newer light duty vehicles. It can be seen that all the surveyed I/M programs
except Washington use the U.S. EPA suggested "207(b)" standards of 1.2
percent carbon monoxide (CO) and 220 parts per million hydrocarbon (HC).
This simplifies the task of comparing the failure rates for all 1981 and
newer light duty vehicles across test procedures and states. Illinois is
unique in this group in having 2500 rpm idle standards.
2.3 SAMPLE CHARACTERISTICS
It is preferable to have a large, clean sample of data for the analysis of
pattern case failures. The cleanliness of the sample is affected by
retest/multiple test data and by the appearance of vehicles in the emissions
data that are difficult to track or can cause confusion.
Table 2-2 summarizes the characteristics of interest in each I/M program data
base. Columns one and two pertain to all vehicles (light, medium and heavy
duty, as applicable) tested, while the third column is specific to light duty
vehicles. The first column lists the total number of vehicles tested each
month. The greater the sample size, the more significant any failure
patterns detected will be. Sample sizes ranged from 31,000 vehicles per
month in Kentucky to 210,000 per month in Illinois for vehicles of all model
years.

The overall average failure rates (in percent) are listed in the second
column. The failure rates are affected by the types and ages of vehicles
subject to testing as well as by the pass/fail standards in effect. As these
failure rates include all classes of vehicles (except for Maryland), they can
be used as a general guideline for expected failure rates. EPA may wish to
focus analysis on states reporting higher than average failure rates.
TABLE 2-2  SAMPLE CHARACTERISTICS

       Monthly      Reported Average
State  Sample Size  Failure Rate (%)  Test Requirements

AZ     110,000      81 - 11.9         1. New cars: first anniversary
                    82 -  8.4         2. Migrant out-of-state: prior to
                    83 -  4.9            registration
                    84 -  3.0         3. Change of ownership:
                                         a) sold by dealer: prior to sale
                                         b) sold by individual: new owner's
                                            registration renewal (will change
                                            to sale 1/87)

CT     133,000      81 -  6.04        1. New cars: first anniversary
                    82 -  3.91        2. Migrant out-of-state: prior to
                    83 -  2.45           registration
                    84 -  2.21        3. Change of ownership: VIR renewal

IL     210,000      N/A               1. New cars: first anniversary
                                      2. Migrant out-of-state: first
                                         anniversary
                                      3. Change of ownership: registration
                                         renewal

KY     31,000       81 - 11.05        1. New cars: first anniversary
                    82 -  7.14        2. Migrant out-of-state: registration
                    83 -  3.09           renewal
                    84 -  2.43        3. Change of ownership: registration
                                         renewal

MD     133,000      81-84 - 6.3       1. New cars: first anniversary
                                      2. Migrant out-of-state: prior to
                                         registration
                                      3. Change of ownership: VIR renewal

WA     50,000       81 -  5.86        1. New cars: first anniversary
                    82 -  4.42        2. Migrant out-of-state: prior to
                    83 -  3.49           registration
                    84 -  5.58        3. Change of ownership: registration
                                         renewal

WI     135,000      81 - 11.9         1. New cars: registration renewal
                    82 -  8.4            (minimum 90 days)
                    83 -  4.9         2. Migrant out-of-state: registration
                    84 -  3.0            renewal
                                      3. Change of ownership: registration
                                         renewal
The third column of Table 2-2 summarizes the points at which three classes of
vehicles enter the sample data bases. The three classes of vehicles for
which first test requirements are listed are new cars bought in state, cars
being registered in state for the first time (new cars bought out of state
and used cars migrating to the state), and used cars that have had a change
of ownership. All of the surveyed states waive testing requirements for new
cars either until the registration renewal date (no minimum grace period in
Kentucky, 90 day grace period in Wisconsin) or until the car is one year old.
These practices limit the number of new cars, which presumably have a very
low failure rate, present in the sample data bases. Migrant out-of-state
cars are usually tested prior to first in-state registration, except in
Illinois, where they are given a one year grace period, and in Kentucky and
Washington, where they are tested at registration renewal. Used vehicles are
often tested at the new owner's registration renewal; Arizona tests used cars
sold by dealers prior to sale. Preregistration testing leads to the
possibility that the same vehicle will be tested more than once in a twelve
month period, thereby biasing failure rate patterns. Moreover, testing of
unregistered vehicles results in data with blank fields (or nonsense entries)
for VIN and license plate. Cleaning requirements are therefore higher when
such vehicles are present. Connecticut, Illinois and Maryland test vehicles
only when the Vehicle Inspection Report assigned to the vehicle is due for
renewal, and hence these problems are avoided.
2.4 DATA RECORDING AND AVAILABILITY
Table 2-3 summarizes the test data recording methods and practices used in
the various programs, plus the frequency with which the raw data would be
available for analysis.
The first column of Table 2-3 lists the method used to enter vehicle
identification information (e.g., VIN, license plate) into the test data
bases.
TABLE 2-3  DATA RECORDING AND AVAILABILITY

       Vehicle ID/Test           Differentiate First
State  Results Entry Method(a)   Test from Retests           Tape Availability

AZ     Pre 7/86: manual/on line  No; every third test        Monthly, from state
       7/86+: on line/on line    entered as first
CT     Manual/on line            No; every third test        Quarterly, from state
                                 entered as first
IL     On line/on line           Yes                         Monthly, from state
KY     On line/on line           Yes                         Quarterly, from
                                                             contractor
MD     On line/on line           Yes; paid retests may be    Annually, from state
                                 entered as first test
WA     On line/on line           Yes                         Semi-annually, from
                                                             state
WI     Bar code/on line          Yes; paid retests may be    Monthly, from state
                                 entered as first test

(a) Manual: written or key punched by the test operator. On line: 1) vehicle
    ID information from DMV records, 2) test results recorded by the test
    apparatus.
Vehicle identification is entered either manually or directly from a
preexisting data base. The direct (on-line) data entry method is preferable,
as fewer errors are introduced and/or propagated, particularly in recording
the VIN. Direct data entry makes vehicle tracking, either during one test
year or from year to year, more accurate. Only Connecticut relies on manual
data entry, while Arizona converted to automated entry as of July 1986.
Two methods of recording tests and retests are used. One technique is to
indicate as a retest any test that is not the initial test for a vehicle in a
given year. This method is preferred for analysis, as it simplifies the task
of determining first test failure rates. The second technique is to record
every paid test, or every third test, for a vehicle as a first test. This
method is not preferred, due to the potential difficulty in recognizing first
tests and calculating first test failure rates.

The third column in Table 2-3 lists tape availability frequencies. These
frequencies represent the minimum time between deliveries of raw data tapes
to EPA from the states involved, and will dictate the frequency with which
EPA can perform analyses. Two states, Maryland and Washington, do not
produce master tapes on a schedule which lends itself to frequent data
analysis.

Also listed in the third column are the sources of the raw test data. All
states will provide the raw data except Kentucky, whose data can be provided
by the contractor which administers the tests. In all cases, it appears that
EPA's help will be required for the data to be released to a contractor. In
addition, Connecticut requires a confidentiality of data agreement.
2.5 SPECIAL FEATURES AND CONSIDERATIONS
Table 2-4 lists selected special features of each I/M program that can affect
the usefulness of its data. The first column considers the final disposition
of vehicles which failed all tests and retests during a given year.
Knowledge of waivers, and when they are granted, allows year to year tracking
of failed/waived vehicles. The second column lists the method used to handle
retest records. Ideally, all test and retest data for a given vehicle would
reside on one record, with waivers indicated. This method lends itself to
the most accurate tracking of final vehicle results, which can be monitored
year to year to determine emissions deterioration between inspections.
Earlier analysis by EEA suggests that a population of vehicles fails at every
inspection and is waived every year.

Only Arizona, Washington and Wisconsin merge test and retest records
together. Of these three, only Arizona and Wisconsin indicate the final
status (pass/fail/waive) of each vehicle. Illinois, Kentucky and Maryland
indicate waivers on the final test or retest record for each vehicle.
Connecticut and Washington keep separate files containing waiver information;
these files are not merged with the test results files, so they are of no use
to the study.
2.6 DATA FROM CALIFORNIA
Since California has separate standards and different engine families than
the other 49 states, recognition of pattern case California families requires
data from the State of California's inspection program. The data has two
drawbacks: first, the program is decentralized and the quality of inspections
unknown; second, the data does not contain the VIN, which is a basic
requirement for identifying engine family. However, there are some possible
actions that one can take to enhance the value of the data.
TABLE 2-4  SPECIAL FEATURES AND CONSIDERATIONS

State  Waiver Record Handling                 Retest Record Handling

AZ     Waiver indicator                       Merged with first test record
CT     Separate waiver file; merged with      Any retest records are separate
       test file when available (still        entries
       waiting for 1985 waivers)
IL     Waiver indicator on final test record  Any retest records are separate
                                              entries
KY     Waiver indicator on final test record  Separate record by vehicle visit
                                              to inspection station
MD     Waiver indicator on final test record  Separate record by vehicle visit
                                              to inspection station
WA     Waiver not indicated on tape;          Merged with first test record
       physically tracked
WI     Waiver indicator on final test record  Results of up to 3 tests and/or
                                              retests per record
California requires that all vehicles up to 8,500 lb GVW in seven major
metropolitan areas be inspected biennially, but this 8,500 lb GVW limit is
based on registered GVW (as in many other states). The test used is an idle
test, with 2500 rpm preconditioning. 1980 and later vehicles must meet
standards for both the 2500 rpm test and the idle test. Loaded-mode cruise
test standards are "on the books," but none of the regions require a
loaded-mode test. California is unique in having (normal) idle test
standards that vary by technology type for 1980 and later cars, as shown
below.
                                   HC (ppm)   CO (%)
No catalyst                          150       2.5
Oxidation catalyst                   150       2.5
Three-way open loop catalyst         150       1.2
Three-way closed loop catalyst       100       1.0
2500 rpm test standards are uniform at 220 ppm HC/1.2 percent CO for all
technologies. It is not clear what percentage of cars are misidentified and
subjected to inspection at the wrong standards.
Data entry on the vehicle description is manual, and includes license plate,
vehicle type, GVW, make abbreviation, model year, number of cylinders, engine
size and odometer. EEA's examination of the records indicates that far fewer
records than expected qualified as LDTs. As a result, it is possible that
LDTs are being misclassified with respect to technology category. On
average, slightly less than 10 percent of all records are classified as LDTs,
but registration records indicate that LDT penetration in California is over
25 percent for newer model years. All emission entries are automatic, with
HC, CO, CO2 and RPM recorded. Test records are stored on a cassette tape at
each mechanic station.
Data cassettes are collected on a monthly basis by a contractor and
transcribed onto a mainframe computer by the Bureau of Automotive Repair
(BAR) and its contractors. (This may be changed to process the records
quarterly in the near future.) The total sample size is very large, with
600,000+ vehicles tested per month. In 1986, over 200,000 of these vehicles
are 1980 and later models. One advantage of the California data is that the
Bureau of Automotive Repair already performs extensive cleaning of the data
to eliminate records of calibration data, aborted tests, invalid tests, etc.
Moreover, first test records and retest records for a given vehicle are
merged, and issuance of a waiver is noted. If vehicles have multiple first
test records, these records are not merged, especially if they are from
different stations. Yet another advantage of the California data is that the
BAR utilizes SAS for its analysis, which will make the data easily adaptable
to the processing system described in Section 4.
A major drawback of the California data base is the lack of the VIN. We are
currently using the VIN to determine:

•  Make
•  Model year
•  Model name (carline)
•  Engine displacement
•  Aspiration (natural/turbocharged)
•  Gasoline/diesel
•  GVW category (for trucks)

All but two of these variables are being manually recorded for each vehicle
in California; the two variables not recorded are carline and aspiration.
For over 80 percent of all vehicles, knowledge of the engine displacement,
make and model year is sufficient to track the engine family. (Of course,
neither the VIN nor the above variables reveal California/Federal
certification.) For the purpose of a general analysis that reveals most, but
not all, pattern failures, this may be sufficient. EEA does perceive a
potential problem with not being able to distinguish between cars and light
trucks: if the vehicle type information is poor, it is conceivable that
unambiguous determination of emission control technology will be possible
only in a small number of cases.
The second method to overcome the data problem is to use the license plate
information and Department of Motor Vehicles records to identify the VIN.
This would be expensive, as there are roughly 15 million light duty vehicles
in the state, roughly a third of which are 1980 or later. BAR staff have
contemplated this measure and believe that successful matches to correct VINs
will occur in about 70 percent of the cases. We cannot actually determine
the quality of the VIN data unless a small sample of license plates is
matched to VIN records and the VIN data examined. This represents a possible
area for additional exploration by EPA in the future.
A very interesting feature of the California program is that it utilizes the
2500 rpm test for pass/fail determination, and its idle cutpoints are more
stringent than the EPA "207(b)" cutpoints. Either as a result of those
factors, or due to other factors, California reports the highest failure rate
for 1980+ cars in the nation. On average the failure rate (both tests
combined) is approximately 25 percent, with even model year 1984 vehicles
reporting failure rates of over 10 percent in 1985. These failure rates are
much higher than those in other centralized I/M programs, even at the same
cutpoints. In general, decentralized programs typically display low failure
rates; California's failure rates are therefore surprising, given that state
officials have suggested that tampering and misfueling of vehicles are lower
in California.
These factors suggest that California data could be useful to EPA,
especially if matching license plate to VIN proves feasible.
2.7 SUMMARY
We have attempted to rank various aspects of each state's I/M program for
usefulness to the proposed analysis. This ranking is based on the categories
below; no weights have been placed on the categories, but EPA may wish to
weight the categories differently depending on its immediate objectives. The
categories are as follows, and a small scoring sketch follows the list.
•  Vehicle coverage to 8,500 lb - We have awarded 2 points for complete
   coverage, 1 for an intermediate point (such as 8,000 lb) and 0 for
   coverage up to 6,000 lb.

•  Preconditioned idle test - We have awarded 2 points if all vehicles are
   subject to uniform preconditioning, 0 if there is no requirement, and 1 if
   preconditioning covers only failed vehicles. Uniform preconditioning is
   necessary to compare idle emissions and failure rates at cutpoints that
   differ from those in use.

•  Use of 207(b) standards - We have awarded 2 points for standard cutpoints,
   making failure rate comparisons simple, 1 point for a situation where
   207(b) cutpoints are applied to only part of the 0-8,500 lb GVW fleet, and
   0 for non-standard cutpoints.

•  Sample size - We have awarded 2 points if the total monthly sample is over
   100,000 vehicles, 1 if it is between 50,000 and 100,000, and 0 if it is
   lower than 50,000.

•  Data cleanliness - This refers to the presence of clean data for all
   fields. In general, manual entry of data on vehicle descriptions results
   in many errors and is awarded 0 points. Fully automated systems that
   track vehicles through their registration records are awarded 2 points,
   while those dependent on some manual inputs, e.g., test cycle number, are
   awarded 1 point.

•  Ability to distinguish first test - Given that many vehicles have multiple
   records, either due to retests or to several "first" tests, unambiguous
   determination of the first test in any calendar year is valuable. If the
   state can make this determination with accuracy, it simplifies processing
   requirements. A score of 2 is provided if the test sequence variable is
   judged highly reliable, 1 if there is no on-line tracking of vehicles, and
   0 if there are known errors in this variable.

•  Ability to gauge final outcome - This can be important if tracking waiver
   rates, or the repairability of pattern failures, is an issue of interest.
   If all retests and the final outcome (pass, fail, waiver, waiver type) of
   a given test sequence are available in the data, the score is 2 points.
   Availability of waiver data in a separate file that can be merged with the
   data is given a 1 point score, and data sets that do not show the final
   outcome for failed vehicles are awarded 0 points.

•  Data pre-sort - If the state pre-sorts all of the data to match records of
   each vehicle for test and retest, as well as for any unusual tests (e.g.,
   change of ownership, multiple first tests), it is awarded 2 points; 1
   point if only test and retest data is merged, and 0 if vehicle test
   records are unsequenced.

•  Retrieval time - This factor shows how long it takes to obtain the data
   after test completion. A score of 2 indicates data availability within 3
   months, a score of 1 indicates data requiring 6 months, and a score of 0 a
   period longer than 6 months.
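
The scoring itself is a simple (optionally weighted) sum of the 0/1/2 points
above. A minimal Python sketch, with abbreviated category names and
illustrative example values (not the survey's actual scores):

    # Hypothetical totaling of the per-category usefulness points.
    CATEGORIES = ["coverage", "preconditioning", "cutpoints", "sample_size",
                  "cleanliness", "first_test", "final_outcome", "pre_sort",
                  "retrieval_time"]

    def total_score(scores, weights=None):
        """Sum the 0/1/2 points per category; default weights are all 1."""
        weights = weights or {}
        return sum(scores[c] * weights.get(c, 1.0) for c in CATEGORIES)

    # e.g., a program scoring 2 everywhere except a 0 for retrieval time:
    example = {c: 2 for c in CATEGORIES}
    example["retrieval_time"] = 0
    print(total_score(example))  # 16.0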
The scores for each of the 7 centralized programs and California are shown in
Table 2-5. Assuming all factors are weighted equally, the Illinois, Kentucky
and Wisconsin data appear to be the best, while Arizona, Connecticut,
Maryland and Washington are less preferable. (Arizona's score is for the
modernized system in use since July 1986.) Based on our previous experience,
in which we found the problems with the Connecticut, Washington and (old)
Arizona data to be of similar magnitude, the scores appear to reflect their
usefulness to the analysis well.
TABLE 2-5  RANKING OF DATA USEFULNESS FOR SEVERAL I/M PROGRAMS

[The per-category scores (vehicle coverage up to 8,500 lb, uniform
preconditioning, 207(b) cutpoints, sample size, data cleanliness,
distinguish first test, distinguish final outcome, data pre-sorted, and
time delay) are not legible in this copy; the totals were:]

State   Total
AZ*      12
CT       10
IL       15
KY       14
MD       12
WA       11
WI       14
CA       12

* Revised program from July 1986.
3. DATA PROCESSING
3.1 OVERVIEW
The most resource intensive phase of the analysis is data processing. As
described in Section 2, each state processes in the neighborhood of 100,000
to 200,000 records monthly. As a rule of thumb, for newer model years each
model year accounts for 7.5 percent of all records. The LDV/LDT split is
typically between 4:1 and 6:1 for a given model year. If we assume that
EPA's interest in any calendar year is in the five model years covered by the
emissions warranty, and that processing is performed on six months of records
at a time, the data base of interest is somewhere between 225,000 and 450,000
records for each state. Analysis of such large data bases requires enormous
amounts of computer time, and the slow turnaround of each run (usually
overnight) makes it difficult to identify and correct errors.
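
A worked check of that estimate, using the figures above:

    # Record-volume estimate: monthly records x six months x five warranty
    # model years x ~7.5 percent of records per model year.
    for monthly in (100_000, 200_000):
        per_model_year = 0.075
        batch = monthly * 6
        print(int(batch * 5 * per_model_year))
    # -> 225000 and 450000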
As a result, the data processing requirements are separated into a number of
steps, with outputs at the end of each step to allow for error identification
and correction. EEA's previous experience with I/M data programs suggests
that these intermediate outputs are very important to the success of the
project. Accordingly, the analysis of data has been divided into six steps:

•  Data cleaning
•  Standardization of variables/formats
•  Sorting and sequencing
•  VIN decoding
•  Merging all data on individual vehicles
•  Analysis of failure rates and output

The steps are shown schematically in Figure 3-1 and discussed below.
FIGURE 3-1  PROGRAM FLOWCHART TO COMPUTE FAILURE RATES BY ENGINE FAMILY

    Raw I/M data
      -> Step 1:   Data cleaning
      -> Step 2:   Record standardization
      -> Step 3:   Tracking test sequences
      -> Step 4.1: Convert to TEXT data
      -> Step 4:   VIN decoding
      -> Step 4.2: Merge
      -> Step 5:   Analysis output
3.2 STEP 1: DATA CLEANING
Any real world data tend to have errors; I/M data, especially from certain
programs, have errors related to vehicle or emission variables, missing data
fields, or records generated from calibration and aborted tests that should
not be used for analysis. The cleaning step removes such records by
correcting or deleting them, and also includes verification and conversion of
data into an appropriate format.

The first activity is to verify the data by reading the raw tape and checking
the values of vehicle descriptors (make, model year, type, cylinders,
odometer) and emission readings to make sure they do not exceed the allowable
range or contain blank fields. Another item included in the verification is
the test result variable ("P" or "F"); this can be computed by comparing the
emission readings to the standards associated with the particular model year
(and, in some cases, the number of cylinders). This step assumes that the
tape copy of data received from an I/M program is in the format agreed upon
and that the variables are properly understood.
The second activity is the actual process of removing data records that are
incorrect, have missing fields or are not relevant for this analysis. In
this step, only tests of light-duty vehicles (LDV) and light-duty trucks
(LDT) are retained, and an additional step of removing all diesel LDV/LDT may
be performed if there is an appropriate indicator field. In particular, the
vehicle type indicator (absent in some states) is retained for error checking
against the VIN-decoded vehicle type. In addition, aborted tests,
calibration tests and incomplete test records are deleted.
The third activity involves assigning a separately computed pass/fail
variable (distinct from the one recorded) for results checking at the end of
the other processing steps. This is very useful in determining whether other
factors are being used in the program to pass or fail vehicles (e.g., a
tolerance on the standard that would pass vehicles very slightly above
standards, or an underhood inspection that fails vehicles passing the
emission test).
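
A minimal Python sketch of these three activities; the 220 ppm HC/1.2% CO
cutpoints are from the text, but the record layout and range limits are
assumptions for illustration:

    # Illustrative Step 1 sketch; field names and range limits are assumed.
    ALLOWED_TYPES = {"LDV", "LDT"}
    MAX_HC_PPM, MAX_CO_PCT = 2000.0, 14.0   # assumed analyzer range limits

    def clean(records):
        """Yield range-checked light-duty records with a computed pass/fail."""
        for r in records:
            if r.get("vehicle_type") not in ALLOWED_TYPES:
                continue                    # keep light-duty records only
            try:
                hc = float(r["hc_ppm"])
                co = float(r["co_pct"])
            except (KeyError, ValueError):
                continue                    # blank or garbled emission fields
            if not (0 <= hc <= MAX_HC_PPM and 0 <= co <= MAX_CO_PCT):
                continue                    # outside the allowable range
            # Separately computed result, compared later with the recorded
            # "P"/"F" variable (the third activity above).
            r["computed_result"] = "F" if (hc > 220 or co > 1.2) else "P"
            yield r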
Statistics on the total number of records, and their breakdown by model year,
vehicle type, test month/year, make and test result before and after
processing, are recommended outputs for this step. The statistics are useful
in determining what percentage of data is being rejected by the cleaning, and
any bias in record rejection (i.e., failed vehicles having a larger
percentage of records rejected than passed vehicles). A very high record
rejection rate or a large bias in record rejection may require acquisition of
a new data tape, or discussions with the program managers to pinpoint the
causes of the errors.

Although this processing step appears routine, it has been EEA's experience
that this step is necessary but tedious. In addition, this cleaning step
cannot be "standardized," as the types of errors and the requirements for
record rejection vary from program to program. As an example, the coding of
emission values can be in percent CO or in hundredths of a percent CO, and
format specifications may neglect to mention the units. Moreover, EEA has
encountered unannounced format changes in a program, leading to considerable
confusion. Therefore, this step cannot be automated, but instead requires
considerable intervention on the part of both a programmer and an analyst.
3.3 STEP 2: RECORD STANDARDIZATION
It is anticipated that data from several different I/M programs will be
processed through the VIN decoder and analyzed. Accordingly, this step deals
with:

•  Dropping unnecessary variables
•  Developing a standard format for variables of interest
•  Standardizing alpha-numeric variables
•  Reading into SAS

Several variables recorded in the data set at any I/M program will be
irrelevant to the analysis required in this assignment. They include records
of inspection sticker number, tax codes, safety test results, repair cost
information, etc. Moreover, since we plan to analyze only 1981 and later
vehicles (or the last five model years), we can delete all unnecessary
records and fields to minimize data storage requirements and processing
costs.
The second activity in this step involves creating a standard format for all
variables of interest. While this process is relatively straightforward, one
area of particular concern is the test procedure and the several variations
in the procedure and in pass/fail requirements. The number of HC/CO emission
records varies according to the procedures in use, which include:

•  All vehicles subjected to preconditioning, with only one test at idle
•  An unpreconditioned idle test, with preconditioning and a second idle test
   only for failed vehicles
•  Unpreconditioned idle, high idle/loaded mode and second idle tests for all
   vehicles, for a total of three tests.

Pass/fail determination can be based upon any one, two or all three test
modes in some states; additionally, test and pass/fail requirements can vary
by model year. There is variation in the types of preconditioning, and
states may change the test over time. These test specific emission records
must be carefully tracked, and the format must allow specification of any
combination of test mode and pass/fail criteria. In previous work efforts
for EPA, EEA suggested a standard format, but we now believe this should be
enhanced by an additional variable that provides information on which test
results are used to determine pass/fail, and by distinguishing between blank,
missing and "zero" fields.
Standard format specification and care in the conversion of alpha-numeric
variables apply primarily to the VIN, license plate, and MAKE codes.
Problems can arise in reading such variables, and field length specifications
are critical to avoid truncation errors. The MAKE code is required primarily
for error checks in sorting, as detailed below. Typically, no standard
abbreviations are used for makes, and multiple alternatives are used in the
same state for designating the same make. Moreover, identical abbreviations
can lead to confusion; a common one is the use of "MERC" to denote both
Mercury and Mercedes-Benz. In the past, EEA has utilized a dictionary that
maps up to 99 percent of non-standard abbreviations into standard MAKE codes.
This dictionary is constructed by printing out all values of MAKE in the raw
data tape and assigning the non-standard entries to standard codes. This
time consuming effort may not be necessary if the VIN data is clean, as
described in the following subsection.
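
A sketch of such a dictionary in Python; the entries shown are examples only,
since a real dictionary is built from the raw MAKE values actually found on
each tape:

    # Illustrative MAKE dictionary (example entries, not the full mapping).
    MAKE_MAP = {
        "CHEVROLET": "CHEV", "CHEVY": "CHEV", "CHEV": "CHEV",
        "MERCEDES": "MBNZ", "MERZ": "MBNZ",
        "MERC": None,  # ambiguous: Mercury or Mercedes-Benz; resolve via VIN
    }

    def standardize_make(raw):
        """Return the standard MAKE code, or None if unknown or ambiguous."""
        return MAKE_MAP.get(raw.strip().upper())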
The Step 2 processing reports will generate statistics on the number of
records and statistics on each field for blanks or missing data. If
required, a MAKE code frequency table and a report on makes not mapped into
standard codes can also be provided.
3.4 STEP 3: TRACKING TEST SEQUENCES
This step is required to separate first test and retest records, as well as
to track multiple "first" tests. As described in Section 2 of this report,
most centralized I/M programs have a variable to indicate first test or
retest, but EEA's experience has been that the variable is not completely
reliable. For example, in Arizona and Connecticut, every third test (second
retest) is counted as a first test.

The data base must first be sorted to match all available records to a single
vehicle. Two types of test must be distinguished:

•  Multiple first tests
•  Multiple retests.
Multiple first test records can occur in an I/M program if motorists who fail
a first test go to a separate I/M station, sometimes on the same day, hoping
the vehicle can pass on a second try at a different station. Programs like
Kentucky's have on-line computers that will prevent motorists from claiming a
second "first test," but many I/M programs cannot recognize such vehicles as
having already completed a first test. Multiple first test records can also
occur over the course of six months or a year if vehicles are required to go
through both an annual inspection and an inspection at change of ownership.

Retest records are easily confused with first test records, as many owners
let the allowable repair period elapse before they appear for their retest.
In states with change of ownership inspections, it is sometimes difficult to
distinguish exactly which tests are retests. In addition, vehicles with
multiple retests have records that are more susceptible to incorrect data
entry (especially in I/M programs with manual data entry). Tracking of first
tests and retests is important for two reasons:

•  Record confusion exists only for vehicles failing the first test, and
   their elimination would result in biased calculations.
•  In the interest of 207(b) warranty enforcement, EPA may need to know the
   final outcome of test sequences.

Sorting of all records by date for each vehicle is required for assigning
test sequence numbers. For the purposes of this work effort, we have not
pursued an algorithm for assigning the correct retest or multiple first test
number, and have instead focused on simply determining, with as much accuracy
as possible, the first test for a given vehicle in a given year.
Sorting can be based solely on VIN; however, in states with manual data
entry, VIN keypunch errors may result in a poor match of records. EEA has
used VIN or (MAKE and MYR and LICENSE PLATE) as a second sorting criterion.
All three variables must be matched together because license plates need not
be unique between commercial and non-commercial vehicles and, in some states,
plates can be transferred from one vehicle to another. Contrasting the
number of record matches under the two methods is a useful check of VIN
keypunch errors. This, of course, requires the extra effort in Step 2 for
MAKE code standardization.
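
A Python sketch of the two sorting criteria and the per-vehicle test counts
they yield; the record field names are assumptions:

    from itertools import groupby

    def sequence_counts(records, key):
        """Sort one calendar year's records by `key`; count tests per vehicle."""
        recs = sorted(records, key=key)
        return {k: len(list(g)) for k, g in groupby(recs, key=key)}

    def compare_sorts(records):
        by_vin = sequence_counts(records, key=lambda r: r["vin"])
        by_plate = sequence_counts(
            records, key=lambda r: (r["make"], r["myr"], r["plate"]))
        return by_vin, by_plate

    # Contrasting the two count distributions, as in Table 3-1, flags VIN
    # keypunch errors (vehicles whose counts disagree between the methods).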
As an example, we utilized a sample of records from Connecticut to illustrate
the range of variation observed. The sorting was performed in two different
ways: first by VIN only, and second by license plate and MAKE/MODEL YEAR.
Table 3-1 shows the results in a matrix of record counts by the two methods.
If VIN sorting produces a record count for any particular vehicle of N, and
sorting by license plate/MAKE/MYR a record count for the same vehicle of M,
ideally M should equal N. In fact, a small percentage of cars sorted by the
second method show values of M lower than N, but in no case does M exceed N.
This indicates that sorting by VIN is superior in all instances to sorting by
license plate/MAKE/MYR if the data is from a limited time period. Over a
longer period, such as two years, this may not be true.
Kentucky does not record the license plate, but theoretically has a test
number variable that is very reliable. Table 3-2 shows the results of the
VIN sort number N as a function of Kentucky's test number: 1, 2, and S
(greater than 2). Clearly, for N=1 it is possible to have vehicles with a
higher Kentucky test number if their previous records are in an earlier data
tape. Surprisingly, 12 percent of the test records for N=2 were labelled by
Kentucky as first tests, indicating potential deficiencies in the system.
The table illustrates the need for the sorting step even when we analyze data
from a highly computerized I/M program.
TABLE 3-1  CONNECTICUT 1984 QUARTER 1 AND QUARTER 2 VIN AND PLATE COUNTS
N = TEST SEQUENCE BY VIN; M = TEST SEQUENCE BY PLATE

[SAS crosstabulation of N by M; the full table is not legible in this copy.
The bulk of records lie on the diagonal (e.g., 143,683 records with
N = M = 1, 78.87 percent of all records, and 35,648 records with N = M = 2),
with a small percentage of records showing M < N and none showing M > N.]
TABLE 3-2  KENTUCKY DATA SORTING
EEA TEST NUMBERS (N, BY VIN SORT) VERSUS KENTUCKY TEST NUMBERS (1, 2, >2)

[SAS crosstabulation; the full table is not legible in this copy. Of the
records with N = 1, the great majority carry Kentucky test number 1; of the
1,779 records with N = 2, 209 (about 12 percent) were labelled by Kentucky
as first tests.]
3.5 STEP 4: VIN DECODING
A method for decoding the VIN to obtain the engine family designation and
emission control system type (fuel control, catalyst, or secondary air/fuel
system type) has been developed by EEA and is now available as a stand-alone
program. (For a description, see "VIN Decoder User's Guide," EEA Report to
the EPA, September 1986.) The current VIN decoder is capable of analyzing
and decoding VINs for model year 1981-1984 light-duty vehicles and light-duty
trucks. Another product of the VIN decoder is an error report that allows
tracking of the number of records with VIN errors.
For this step, EEA recommends that only the license plate, make and model
year be retained with the VIN in a separate data tape in TEXT format for
input to the VIN decoder, thus minimizing memory requirements and
input/output processing. The operation of the VIN decoder as a unit is
straightforward, and the error analysis is output as required. A sample of
Kentucky and Connecticut calendar year 1984 data was processed to reveal the
typical percentages of record retention. Table 3-3 shows the VINs decoded
for vehicles designated as MYR 1981-1984 in Connecticut. As can be seen,
only 82.8 percent of VINs are successfully decoded. Two major error types,
08 and 11, account for most of the VIN errors. Error code 08 arises from
failure of the validity test, and 11 arises from a non-standard VIN format,
potentially as a result of truncation of the VIN.
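
The validity test behind error code 08 is presumably the position-9 check
digit required on 1981 and later VINs; the following sketch implements that
standard algorithm (not necessarily the decoder's actual code):

    # Standard 17-character VIN check-digit computation.
    TRANSLIT = {**{str(d): d for d in range(10)},
                **dict(zip("ABCDEFGH", range(1, 9))),   # A=1 ... H=8
                **dict(zip("JKLMN", range(1, 6))),      # J=1 ... N=5
                "P": 7, "R": 9,
                **dict(zip("STUVWXYZ", range(2, 10)))}  # S=2 ... Z=9
    WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]

    def check_digit_ok(vin):
        """True if the VIN's computed check digit matches position 9."""
        if len(vin) != 17 or any(c not in TRANSLIT for c in vin):
            return False  # non-standard format (the error code 11 situation)
        total = sum(TRANSLIT[c] * w for c, w in zip(vin, WEIGHTS))
        expected = "X" if total % 11 == 10 else str(total % 11)
        return vin[8] == expected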
Table 3-4 shows the results of VIN decoding for Kentucky data: 94.85 percent
are successfully decoded, and the major error type (code 02) arises from the
particular engine key not being found in the table. One explanation for this
is that running changes are not currently being incorporated into the VIN
decoder's certification data tape. The examples show that the VIN decoding
success rate is likely to vary from 80 to 95 percent.
TABLE 3-3  VIN DECODING OF CT1984 (TABLE OF MYR BY ERRLVL)

[SAS crosstabulation of model year by VIN-decoder error level; the full
table is not legible in this copy. Of 184,338 records, 82.80 percent decoded
successfully (error level 00); error code 08 accounted for 4.07 percent of
records and error code 11 for 10.89 percent, with the remaining error codes
each below 1.3 percent.]
TABLE 3-4  VIN DECODING OF KY1984 (TABLE OF MYR BY ERRLVL)

[SAS crosstabulation of model year by VIN-decoder error level; the full
table is not legible in this copy. Of 29,126 records, 94.85 percent decoded
successfully (error level 00); the largest single error category accounted
for 2.32 percent of records.]
At the 80 percent end of that range, some formal steps are required to check
for the high error rate. If the VIN errors are biased towards failed
vehicles, then deletion of records with VIN errors could substantially alter
the failure rate computations. One check for data with high error rates is
to tabulate the VIN errors for "passed" and "failed" vehicles separately and
examine the statistics for bias, as sketched below.
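
A minimal Python sketch of that bias check; the record field names and error
coding are assumptions:

    from collections import Counter

    def vin_error_rates(records):
        """VIN-decode error rate among passed vs. failed first-test records."""
        totals, errors = Counter(), Counter()
        for r in records:
            result = r["result"]               # recorded "P" or "F"
            totals[result] += 1
            if r["vin_error_code"] != "00":    # "00" = decoded successfully
                errors[result] += 1
        return {k: errors[k] / totals[k] for k in totals}

    # A markedly higher rate for "F" records would mean that dropping bad-VIN
    # records biases the computed failure rates downward.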
Two other administrative problems have been noted by EEA. The certification
data tape for any specific model year, released by EPA usually in March, is
based on pre-model-year data and does not contain any running changes made by
manufacturers during the model year. Apparently, EPA has a separate file in
which running changes for all major manufacturers except GM are maintained.
That file cannot be easily merged with the certification data, as the formats
are not similar. At this point, the resolution of the running change problem
does not appear simple, and it may remain unresolved.
The second administrative problem relates to vehicles whose title-specific
model year, VIN-decoded model year and engine family model year are
inconsistent. EEA was not able to resolve why this problem exists, but has
learned informally that there may be some confusion, at the close of one
model year and the beginning of the next, between the VIN model year and the
engine family model year. In general, this has resulted in only a small
number of vehicles (less than 1 percent of the sample) being potentially
misclassified as to engine family designation.
EPA is aware that VIN decoding does not allow recognition of 49-state versus
California certification. Decoding by manufacturer has allowed EEA to
establish that even in a state neighboring California (like Arizona), the
number of California vehicles is less than 4 percent; in states further from
California, it is anticipated that California vehicles are less than 1
percent of the population. Moreover, 49-state and California cars have, in
recent years, become nearly technologically identical and differ only in
calibration. As a result, we believe this issue is not of significant
concern, except in the case of California data, where 49-state vehicles are
estimated to be 10-15 percent of the vehicle population.
Finally, EPA has been interested in failure rate by engine family and
transmission type, as certain models with automatic transmissions have
been reported as pattern case failures. We examined the VIN code and
determined that only four manufacturers - AMC, Honda, Renault and Subaru
- entered transmission type information in the VIN. As a result,
computation of failure rates at this level of detail is not possible.
3.6 STEP 5: ANALYSIS OUTPUT
After merging the VIN decoder outputs with the emissions test data, the
generation of failure rates by engine family is a straightforward step,
requiring only the cutpoints and the combinations of test results (T1, T2,
T3) to be considered for the determination of failure. One advantage of
utilizing SAS is that it can provide failure rate statistics by engine family
and by other levels of aggregation, such as emission control type,
manufacturer, or vehicle type, with very little additional programming. The
generation of output tables in SAS is less convenient, but EEA already has
extensive programs to generate tables of failure rates at different strata.
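
The production processing described here uses SAS; purely to illustrate the
computation, an equivalent Python/pandas sketch with assumed column names:

    import pandas as pd

    df = pd.read_csv("merged_im_records.csv")  # tests merged with VIN decode

    def fails(row, hc_cut, co_cut):
        """Failure at one cutpoint set, judged here on the final idle test."""
        return row["t3_hc_ppm"] > hc_cut or row["t3_co_pct"] > co_cut

    df["fail_207b"] = df.apply(fails, axis=1, hc_cut=220, co_cut=1.2)
    df["fail_tight"] = df.apply(fails, axis=1, hc_cut=100, co_cut=0.5)

    # Failure rate and sample size by engine family; regrouping by
    # manufacturer, technology type, etc. is a one-line change to the key.
    summary = df.groupby("engine_family").agg(
        n=("fail_207b", "size"),
        rate_207b=("fail_207b", "mean"),
        rate_tight=("fail_tight", "mean"),
    )
    print(summary.sort_values("rate_207b", ascending=False).head(20))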
A second advantage of utilizing SAS is that the output data file can be used
directly for further statistical analysis to determine which engine families
are pattern case failures. The methods, and their availability in SAS, are
addressed in Section 4 of this report.
4. STATISTICAL ANALYSIS FOR IDENTIFYING PATTERN CASE FAILURES
4.1 INTRODUCTION
EPA is interested in identifying engine families that may be failing at rates
significantly higher than average on the state vehicle
inspection/maintenance test. An engine family corresponds to a unique
make/model/engine size/emission technology, and is used by EPA to determine
certification to standards and for recall. A failure is recognized when
tailpipe HC and CO idle emission concentrations exceed a given set of
cutpoints. EPA is interested in failure rates computed for at least two sets
of cutpoints: 100 ppm HC/0.5% CO and 220 ppm HC/1.2% CO.
Once the failure rates by engine family at each cutpoint are computed based
on each individual state's I/M data, there are some additional complications
in comparisons between states. Each state has a slightly different I/M test
procedure that can give rise to differences in failure rates. In addition,
climatic variables, such as temperature, can also influence failure rates.
Given all of these effects, the questions are what statistical test or tests
should be employed to recognize high failure rate families, and how much data
is required to recognize these families. One candidate approach is sketched
below.
Given this situation, there are five topical questions of interest to the
EPA. They are:

1. Defining "high failure rate." EPA has used a test comparing each
   family's failure rate to the fleet average failure rate. Is this
   appropriate if the high failure rate families are a significant
   portion of the fleet, thus biasing the average?

2. Since there are different technologies used to meet standards --
   e.g., carburetor versus fuel injection -- should the failure rate
   for each family be compared to others in the same technology group?

3. For many of the newer model years, the fleet average failure rates
   are very low -- 1 to 2 percent. How should test methods and sample
   sizes be structured in the comparisons?
4. What is the most appropriate statistical test to compare a given
   engine family's failure rate across state-specific data? How can
   data from different states be combined to increase resolution?

5. Should the recommended statistical tests be performed separately
   for each set of cutpoints?
4.2 DESCRIPTION OF THE DATA AND UNDERLYING ASSUMPTIONS

For the purposes of this inquiry, the available processed data can be
described as a collection of distinct vehicle-test samples, each sample
characterized by a sample size (number of vehicles tested) and two test
results: (1) the number or percent of vehicles failing the test under
criterion 207(b), i.e., cutpoints 220 ppm HC/1.2% CO; and (2) the number or
percent failing under cutpoints 100 ppm HC/0.5% CO. The qualities that
define a distinct vehicle-test sample are: the unique engine family (as
certified by EPA) to which the vehicles belong and the state in which the
tests occurred. Engine families are, further, classifiable by model year,
vehicle class (light-duty vehicle, LDV, or light-duty truck, LDT),
manufacturer, and emission control technology type. The block structure for
the vehicle-test samples may thus be diagrammed as shown below:

    VEHICLE-TEST SAMPLE
        TESTED VEHICLE
            TEST STATE (S)
            MODEL YEAR (Y) x VEHICLE CLASS (C) x MANUFACTURER (M)
                x EMISSION CONTROL TECHNOLOGY (T)
                    ENGINE FAMILY (E)

The structure may also be expressed in a conventional algebraic notation by
S x ((Y x C x M x T) -> E). The x symbol denotes crossing of "treatments"
while the -> symbol denotes nesting. Thus, within each (Y, C, M, T)
combination there will
be zero or more distinct engine families (E), but the individual engine
families within one (Y, C, M, T) combination bear no relation to engine
families within any other combination.*

The attached page from an EPA report (Table 4-1) illustrates the nature of
the data for state (S) = Arizona, model year (Y) = 1981, vehicle class
(C) = light-duty vehicle (LDV), and four particular manufacturers (M) =
American Motors, Chrysler, Ford, and General Motors, together with the
individual 1981 LDV engine families (E) of these manufacturers. Each engine
family belongs to a specific emission control technology type (T), and these
are written in for most of the families on the page. Each line thus
represents a specific vehicle-test sample and includes the three essential
numerical outputs for the inquiry: the number in the sample (N) and the
calculated failure rates P1 and P2, corresponding to the two designated sets
of cutpoints, 220 ppm HC/1.2% CO and 100 ppm HC/0.5% CO.

A few preliminary observations. Sample size N varies over a tremendous
range. Although engine family entries with N < 10 are typically the result
of erroneous decoding of the vehicle identification number (VIN) and can
usually be ignored, the differences in precision of the estimated failure
rates are still very great. In fact, some popular domestic manufacturer
engine families have sample sizes exceeding 10,000 in some state programs.
Further, 13 distinct emission control technology types (T) have been defined
in the EPA report [1] for characterizing all model year 1981 vehicles, but
no (C, M) combination contains engine families falling into more than four
technology types. Furthermore, the number of engine families corresponding
to a particular (Y, C, M, T) combination that is represented also varies.
Thus, the block structure is quite sparse and unbalanced for several
reasons -- the restriction to a small subset of all possible (Y, C, M, T)
combinations, variable numbers of engine families per represented
combination, and widely varying sample sizes among the individual families.

* There is an exception to this statement. Some engine family certifica-
  tions are carried over from one model year to the next. The possibility
  of identifying some engine families across model years therefore exists,
  but will be ignored in the present analysis.
TABLE 4-1

FAILURE RATE SUMMARY BY ENGINE FAMILY
MODEL YEAR 1981 LIGHT-DUTY VEHICLES, INITIAL TESTS, ARIZONA

                                       FAILURE RATE (%)
ENGINE FAMILY             N      ARIZONA    207(B) (P1)   100/0.5 (P2)

AMERICAN MOTORS
BAM258V2HP7            2020        9.3         11.6          18.5
BAM151V2BC4             362        1.9          1.9           3.6

CHRYSLER CORP.
BCR1.7V2HJ1             823        6.7          7.8          12.9
BCR2.2W2HA5            3508        4.2          5.6           8.7
BCR2.6V2BJ2            1940        1.5          2.7           5.5
BCR3.7V1BA0             793        3.8          5.2           8.7
BCR5.2V2HJ4             774        2.1          3.7          12.3
BCR5.2V4HC1             116       12.9         15.5          28.4
BCR5.2V9FAX             155        0.6          1.9           3.2

FORD MOTOR CO.
1.6AP                  6108        7.2          9.4          19.5
2.3AHF                  799        5.9          6.3           8.5
2.3AX                  2132        7.4          8.3          11.1
3.3GQF                10621        6.9          8.9          16.0
4.2/5.0AAC             1111        2.3          3.1           6.2
4.2/5.0GCC/ACC           29       24.1         27.6          34.5
4.2/5.0GCC/GCF          142        9.9         10.6          16.9
4.2/5.0GCF             2248       13.5         16.1          23.5
4.2/5.0MAF             1743        3.7          4.8           9.0
5.0CCF                 1274        2.0          6.0          22.9
5.8HBPF                 407        9.8         13.3          19.7

GENERAL MOTORS
11C2NDM/NN             9595        2.4          2.9           4.7
11D2AC                 3153        2.1          3.4          11.1
11E2AC                 7839        2.8          3.6          11.2
11L4AC                 4295        3.8          5.2          14.9
11L4ACJ                 281        3.6          4.6          14.2
11M2TNQZ               7574       11.6         15.4          32.5
12H2AD                 1123        1.8          2.4           6.4
12S4AB                  113        1.8          3.5          12.4
12S4ABD                 154        4.5          5.8          10.4
12X2NN                 8068        0.8          1.2           2.6
13H2AE                 2295        1.7          2.7           8.6
13Y4AR                 3904        2.5          3.6           8.4
14E2TM                14517        6.4          8.8          30.6
14E4NBD                 204        4.4          6.4          23.5
14F4AE                 1322        1.9          3.0           7.5
16T5ADB                1587        0.5          0.8           3.6
16T5ARB/DB             1745        0.7          1.7           3.1

Note: Handwritten annotations on the original page assign an emission
control technology type to most families, e.g., CARB/OXD/PMP, CARB/3CL/PMP,
CARB/3CL/OXD/PMP, CARB/3WAY/OXD/PMP, FI/3CL/OXD/PMP, and CARB/3CL/PLS.
Other variables may be recorded which characterize individual tested
vehicles and which could very well be correlated with measured emission
levels. Notable among these are: odometer mileage, age (calendar year -
model year), and month of test. For purposes of the present inquiry,
however, these factors will be ignored, and it will be assumed that each
vehicle-test sample, i.e., (S, Y, C, M, T, E) combination, is a sample from
a homogeneous population. The population is then fully described
statistically by two parameters: p_1 and p_2, the probabilities of failing
cutpoint sets 220/1.2% and 100/0.5%, respectively. The response data P_1
and P_2 represent estimates of these underlying parameters. There is a
slightly more unifying way of viewing the two responses which derives from
the fact that the 220/1.2% failures are a subset of the 100/0.5% failures,
i.e., one criterion subsumes the other. This is the categorical response
viewpoint, which says that the result of a test puts the vehicle into one
of three mutually exclusive categories: pass 100/0.5%, fail 100/0.5% but
pass 220/1.2%, and fail 220/1.2%. The associated probabilities are 1 - P_2,
P_2 - P_1, and P_1.

We proceed, next, to consider the five questions posed in the Statement of
Work. The focus in Question 1 ("high failure rate") is on comparing engine
families within a "fleet," without specific reference to explanatory
factors. The details of the block structure defined above will not be
involved. It is in response to Question 2, which raises the issue of
technology type influence, where we will introduce an approach for
assessing the significance of effects attributable to various factors
represented in the block structure. Our discussion in Question 3 will
consider the interplay of sample size, diminishing failure rates, and a
correspondingly lowered criterion for "high failure rate" in affecting the
power with which high failure rate families can be successfully identified.
In dealing with Question 4 on across-state comparisons, we will expand on
the approach suggested in Question 2, which should also result in attaining
increased explanatory power. Suggestions for handling multiple-valued
response, as requested in Question 5, will be made.
4.3 QUESTION NO. 1: DEFINING "HIGH FAILURE RATE"

Consider a data set of k vehicle-test samples (engine families) denoted
by (n_1, P_1), ..., (n_k, P_k), where n_i is the ith sample size and P_i is
the failure rate within the ith sample calculated with respect to a single
set of cutpoints. The issue of multiple sets of cutpoints is reserved for
Question 5. It is presumed that this data set is restricted to a particular
state, a particular vehicle class (LDV or LDT), and a particular model year.
Even though many different manufacturers are involved and they apply a
variety of emission control technologies, in principle one might have
expected a fairly homogeneous collection of true failure rates, because the
test method, the distribution of environmental conditions, statutory
emission standards for vehicle certification, and the state-of-the-art
apply uniformly over this set of engine families. In practice, one finds a
considerable spread of estimated rates. The problem posed is to quantify
the notion of "high failure rate" and to describe a procedure for
identifying the subset of engine families which can confidently be said to
have high failure rates.

A concrete example provides a useful framework for discussion. In the
accompanying figure are plotted (in rank order) the estimated 220/1.2%
failure rates +/-1 standard error for 20 engine families more or less
serially selected from the first three listed manufacturers in the failure
rate summary table for model year 1981 LDV's in Arizona (Table 4-1). No one
is likely to argue about calling families 18-20 high failure rate families.
What about families 16 and 17? Their estimates are distinctively high, but,
because of small sample size, comparison tests with any of the smaller-rate
families are not likely to show any statistically significant difference.
What about families 1-15? There is no intuitively obvious way of
partitioning that group into "normal" and "high" rate subgroups; still,
statistical comparison tests would likely show 11-14 significantly
different from 1 and 2.

What are some of the classical statistical methods for multiple compari-
sons among engine family "treatments" which could be applied here? Many
methods are inapplicable because of unequal sample sizes. One in particular
is Duncan's multiple range test, even with Kramer's extension to unequal
sample sizes [2].
FIGURE 4-1 EXAMPLE DISTRIBUTION OF FAILURE RATES

(The figure plots, for each of the 20 engine family samples, the estimated
failure rate +/-1 standard error on a 0-30% scale, with the samples ranked
1 through 20 in order of increasing rate. Sample sizes N, from sample 1 to
sample 20, are: 169, 1179, 1988, 1863, 90, 1680, 43, 1003, 742, 5755, 870,
1499, 4861, 2047, 24, 16, 14, 2763, 307, 41.)
The problem with the extension is that it cannot properly handle large
variations in sample size, which is precisely what is needed here. This is
unfortunate, because the SAS statistical software package [3] has Duncan's
procedure with Kramer's extension. A very well-known method due to
Scheffe [4] and the "multiple-t" method are both applicable. For both
methods, it is necessary to compute the within-treatments mean square,
MS_w, which can be expressed as a pooled variance, namely,

$$ MS_w = \frac{\sum_{i=1}^{k}(n_i-1)\,s_i^2}{\sum_{i=1}^{k}(n_i-1)} = \frac{1}{n-k}\sum_{i=1}^{k}(n_i-1)\,s_i^2 $$

where n = \sum_i n_i and s_i^2 is the estimated variance of response within
the ith treatment. Since, for the binomial samples, we have

$$ s_i^2 = P_i\,(1-P_i) $$

it follows that

$$ MS_w = \frac{1}{n-k}\sum_{i=1}^{k}(n_i-1)\,P_i\,(1-P_i) $$

Under Scheffe's method, we may simultaneously test for differences in
failure rate among any number of engine family pairs (i, j) at significance
level alpha by checking for whether the inequality

$$ |P_i - P_j| \;>\; \left[(k-1)\,F_{1-\alpha}(k-1,\,n-k)\;MS_w\left(\frac{1}{n_i}+\frac{1}{n_j}\right)\right]^{1/2} $$
is satisfied, where F_{1-alpha}(k-1, n-k) is the 100(1-alpha) percentile of
the F-distribution with k-1, n-k degrees of freedom.* Under the multiple-t
method, if we preset the total number of comparison tests we wish to make
at m, then the above test changes to checking for satisfaction of

$$ |P_i - P_j| \;>\; t_{1-\alpha/2m}(n-k)\,\sqrt{MS_w\left(\frac{1}{n_i}+\frac{1}{n_j}\right)} $$

where t_{1-alpha/2m}(n-k) is the 100(1-alpha/2m) percentile of the
t-distribution with n-k degrees of freedom. Recall that, in this
application, n is the total number of vehicle tests and k is the number of
different engine families to which the vehicles belong. The level alpha is
at the user's discretion, but typical values used are 0.01 and 0.05.
Inasmuch as n-k is expected to be quite large,

$$ (k-1)\,F_{1-\alpha}(k-1,\,n-k) \quad\text{and}\quad t_{1-\alpha/2m}(n-k) $$

may be approximated by

$$ \chi^2_{1-\alpha}(k-1) \quad\text{and}\quad z_{1-\alpha/2m} $$

respectively (referring, in turn, to the 100(1-alpha) percentile of the
chi-square distribution with k-1 degrees of freedom and the 100(1-alpha/2m)
percentile of the standard normal distribution). Finally, consideration of
the likely ranges of interest for k and m leads to the conclusion that the
multiple-t method will invariably have the greater power for a fixed level
of significance alpha. Thus, we have reduced the multiple comparison tests
of interest to that of checking for satisfaction of

$$ |P_i - P_j| \;>\; z_{1-\alpha/2m}\,\sqrt{MS_w\left(\frac{1}{n_i}+\frac{1}{n_j}\right)} $$

* Statements about the significance level of this and subsequent tests are
  only approximate because the individual vehicle responses are clearly not
  normally distributed with homogeneous variance. However, the approxima-
  tion is expected to be reasonably good because of the generally large
  sample sizes.
in order to establish that the true failure rates for the engine families
in question, p_i and p_j, can be asserted to be different. As a concrete
example, suppose we have 50 engine families, each with 1,000 vehicle tests
(for a total n of 50,000), and that the pooled mean square MS_w is 0.02.
Select level of significance alpha = 0.01, and assume that, at most, 50
comparison tests will be made (m = 50). Then we need to find the 99.99
percentile point of the standard normal distribution, which is 3.72, and
this sets the value of the right-hand expression in the above inequality
to 0.023. This means that any two engine families whose calculated failure
rates differ by more than 2.3 percentage points may be inferred to have
different true failure rates. (At most m = 50 such comparison tests are
permitted to keep the level of significance at alpha = 0.01.)
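As a check on the arithmetic, the short Python sketch below reproduces this
example (scipy is assumed to be available; the report's own computations
were done by hand or in SAS).

    # Worked example: k = 50 families of 1,000 tests each (n = 50,000),
    # MS_w = 0.02, alpha = 0.01, at most m = 50 two-sided comparisons.
    from scipy.stats import norm

    MS_w, n_i, n_j = 0.02, 1000, 1000
    alpha, m = 0.01, 50
    z = norm.ppf(1 - alpha / (2 * m))              # 99.99th pctile, ~3.72
    threshold = z * (MS_w * (1 / n_i + 1 / n_j)) ** 0.5
    print(round(z, 2), round(threshold, 4))        # 3.72 0.0235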
Incidentally, if we want to be able to assert that p_i > p_j, then replace
the above by the corresponding one-sided test, viz., check for satisfaction
of

$$ P_i - P_j \;>\; z_{1-\alpha/m}\,\sqrt{MS_w\left(\frac{1}{n_i}+\frac{1}{n_j}\right)} $$

Remember, for significance level alpha to be applicable, m must be a preset
maximum number of comparisons we are permitted to make.
The above statistical comparison test will ultimately prove to be useful,
but it first requires an externally imposed criterion or line of demarcation
to define the meaning of "high failure rate." The following procedure is
proposed. Rearrange the engine families in the data set in increasing order
of P_i. We will have thereby generated a new sequence (n_1', P_1'), ...,
(n_k', P_k') with P_i' <= P_{i+1}'. Select a fraction r for partitioning of
the engine families into "normal" and "candidate high rate" families. A
typically recommended value for r is 0.8. Find the smallest index l such
that

$$ \sum_{i=1}^{\ell} n_i' \;\ge\; r\sum_{i=1}^{k} n_i' \;=\; rn $$
The set {(n_1', P_1'), ..., (n_l', P_l')} constitutes the defined core set
of "normal failure rate" engine families. The remaining m = k - l engine
families are denoted "candidate high failure rate" families.

An alternative way of establishing the core normal failure rate set might
be to select a maximum acceptable failure rate p* and find the largest
index l such that P_l' <= p*. For every r criterion, as defined above,
there will be an equivalent p* criterion that results in the same
partition. As an example, p* = 6% was imposed on the model year 1981 LDV
data set for Arizona (for the 220/1.2% cutpoint failure rates). This
partitioned the data set of 122,000 vehicle-tests into a "normal failure
rate" set of 102,000 vehicle-tests (with a mean failure rate of 2.5%) and a
"candidate high failure rate" set of 20,000 vehicle-tests (with a mean
failure rate of 12.1%). The equivalent r criterion would have been
r of about 0.84.

After establishing the core normal failure rate set, coalesce it into a
single pooled sample of size

$$ n_0 = \sum_{i=1}^{\ell} n_i' $$

and estimated failure rate

$$ P_0 = \frac{1}{n_0}\sum_{i=1}^{\ell} n_i'\,P_i' $$

The multiple-t method with the one-sided test option is now applied. Recall
that there are m = k - l candidate high failure rate engine families:
(n_{l+1}', P_{l+1}'), ..., (n_k', P_k'). Compute MS_w, as previously
defined, using the pooled normal failure rate set as a single family or
"treatment." Select a desired level of significance. A recommended value is
alpha = 0.05. Perform the m one-sided comparison tests:

$$ P_i' - P_0 \;>\; z_{1-\alpha/m}\,\sqrt{MS_w\left(\frac{1}{n_0}+\frac{1}{n_i'}\right)}\,,\qquad i = \ell+1,\,\ldots,\,k $$
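The sketch below strings the whole procedure together for one state, model
year, and vehicle class. It is a minimal Python rendering under the stated
assumptions (r = 0.8, alpha = 0.05), not the report's SAS implementation;
inputs are simply the per-family sample sizes and observed failure rates.

    # Sketch of the proposed partition-and-test procedure.
    import numpy as np
    from scipy.stats import norm

    def high_failure_families(n, P, r=0.8, alpha=0.05):
        n, P = np.asarray(n, float), np.asarray(P, float)
        order = np.argsort(P)                    # rank families by rate
        n, P = n[order], P[order]
        cum = np.cumsum(n)
        ell = int(np.searchsorted(cum, r * cum[-1])) + 1
        m = len(n) - ell                         # candidate families
        if m == 0:
            return []
        n0 = cum[ell - 1]                        # pooled "normal" set size
        P0 = float(np.sum(n[:ell] * P[:ell])) / n0
        g_n = np.r_[n0, n[ell:]]                 # pooled set as one treatment
        g_P = np.r_[P0, P[ell:]]
        MS_w = np.sum((g_n - 1) * g_P * (1 - g_P)) / (g_n.sum() - len(g_n))
        z = norm.ppf(1 - alpha / m)              # one-sided multiple-t bound
        return [int(order[i]) for i in range(ell, len(n))
                if P[i] - P0 > z * np.sqrt(MS_w * (1 / n0 + 1 / n[i]))]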
If the inequality is satisfied, engine family (n_i', P_i') is designated a
high failure rate family. If not, the engine family is set aside. After all
m comparison tests are completed, the set-aside families are absorbed into
the core normal failure rate set, and the combined collection is referred
to as the normal failure rate families. This collection may thus include
some estimated high rate engine families which could not be asserted with
confidence to be high rate families. The net result is to define a final
collection of high failure rate engine families.

If the above method were applied to the previously illustrated example of
20 engine family samples, a very plausible outcome, depending on a
reasonable choice of r or p* and alpha, could have been that families 15
through 20 would have initially been designated candidate high-failure-rate
families, but that, after application of the multiple-t method, only
families 18, 19, and 20 would retain the high rate designation.
A comment should be made about the possible role of cluster analysis in
finding "natural" partitions of engine families into similar groups or
clusters. A comprehensive treatment of this methodology is given by
Hartigan [5]. Unfortunately, much of the emphasis is on multidimensional
deterministic data. The usual approach is to introduce a metric from which
a "distance" can be derived for every pair of data points. The aim of
clustering is to minimize intra-cluster distances while maximizing
inter-cluster distances. A reasonable distance definition for engine
families could be the closest separation between the +/-1 standard error
intervals centered about their estimated failure rates. Applying this
definition to the previous example, the distance from family 5 to 16 would
be zero, while from 17 to 14 it would be about 4 percentage points. A
cluster procedure might then establish 19 and 20 as a single cluster and
the remaining families into perhaps one, two, or three clusters. How
number 15 fares would depend on the particular algorithm and optimization
criterion used. The SAS package has a cluster algorithm which,
unfortunately, uses an internally generated Euclidean distance that cannot
be accommodated to provide the interval-separation distance function
described above. Some problems with the application of cluster analyses to
the present problem are that multiple clusters could evolve that have no
useful interpretation, and that partitioning may expressly not occur in the
region of failure
rate values where one would like to see the distinction between normal and
high failure rates being made.
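For completeness, the sketch below implements the interval-separation
distance just described and feeds it to a single-linkage clustering
routine. SciPy stands in here for the SAS procedure (which, as noted,
cannot accept such a custom distance); the inputs P and se are each
family's estimated rate and one-standard-error value.

    # Interval-separation distance between engine families: the gap
    # between their (P - se, P + se) intervals, zero if they overlap.
    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage
    from scipy.spatial.distance import squareform

    def interval_clusters(P, se, height=0.0):
        P, se = np.asarray(P, float), np.asarray(se, float)
        lo, hi = P - se, P + se
        gap = (np.maximum(lo[:, None], lo[None, :])
               - np.minimum(hi[:, None], hi[None, :]))
        d = np.maximum(gap, 0.0)                 # 0 where intervals overlap
        np.fill_diagonal(d, 0.0)
        Z = linkage(squareform(d), method="single")
        return fcluster(Z, t=height, criterion="distance")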
A note on EPA's use of a "chi-square test comparing each family's failure
rate to the fleet average failure rate." This phrase does not precisely
define the procedure in use. We presume it is the following. Define, for
each engine family i = 1, ..., k:

$$ f_i = n_i P_i, \qquad s_i = n_i(1-P_i) $$

(reconstituting the original numerical counts of failures and passes within
the ith family);

$$ g_i = \sum_j f_j - f_i, \qquad t_i = \sum_j s_j - s_i $$

(counts of failures and passes within all the other families);

$$ n = \sum_i n_i, \qquad \bar{P} = \sum_j f_j / n $$

(fleet average failure rate); and

$$ \hat{f}_i = n_i\bar{P}, \quad \hat{s}_i = n_i - \hat{f}_i, \quad \hat{g}_i = (n-n_i)\bar{P}, \quad \hat{t}_i = n - n_i - \hat{g}_i $$

(expected counts of failures and passes assuming homogeneity). For each
engine family i = 1, ..., k, form the 2 x 2 table

                       FAIL        PASS
    ith family          f_i         s_i
    All the rest        g_i         t_i
and compute the single-degree-of-freedom chi-square statistic

$$ \chi^2 = \frac{(f_i-\hat{f}_i)^2}{\hat{f}_i} + \frac{(s_i-\hat{s}_i)^2}{\hat{s}_i} + \frac{(g_i-\hat{g}_i)^2}{\hat{g}_i} + \frac{(t_i-\hat{t}_i)^2}{\hat{t}_i} $$

to test for homogeneity, i.e., equality of proportions. If
chi-square > chi-square_{1-alpha}(1), then the ith engine family's true
failure rate can be said to differ from the true failure rate of all the
rest, at significance level alpha. If, furthermore, f_i exceeds its
expected value, the statement may be amended to state that the ith engine
family's true failure rate is greater than the failure rate of all the
rest.

Several problems are seen with this presumed procedure. First, it relies
entirely on the notion of statistical significance. It is well known that,
given sufficiently large sample sizes, just about all compared populations
will be significantly different. Second, it is a multiple comparison test,
and the significance level needs to be appropriately reduced to maintain
overall level alpha. Third, if a particular engine family is determined to
be a high failure rate family at some stage of this sequential procedure,
it should subsequently be removed from the total class of engine families.
Nevertheless, a problem would still remain in that the procedure may then
be sensitive to the order in which the comparisons are made.
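A sketch of this presumed procedure, for one family against all the rest,
might read as follows (scipy assumed; the test itself is standard):

    # Presumed EPA test: 2x2 chi-square of the i-th family vs. the rest.
    from scipy.stats import chi2

    def family_vs_rest(n, fails, i, alpha=0.05):
        N, F = sum(n), sum(fails)
        f_i, s_i = fails[i], n[i] - fails[i]          # family counts
        g_i, t_i = F - f_i, (N - n[i]) - (F - f_i)    # all-the-rest counts
        p = F / N                                     # fleet average rate
        expected = [n[i] * p, n[i] * (1 - p),
                    (N - n[i]) * p, (N - n[i]) * (1 - p)]
        x2 = sum((o - e) ** 2 / e
                 for o, e in zip((f_i, s_i, g_i, t_i), expected))
        # "High" only if significant AND observed failures exceed expected.
        return x2 > chi2.ppf(1 - alpha, df=1) and f_i > n[i] * p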
4.4 QUESTION NO. 2: GAUGING AND ADJUSTING FOR TECHNOLOGY GROUP IMPACT
The question literally asked is: should each family be compared to others
in the same technology group? We reformulate the question as follows. Can
the technology group character of an engine family be used to explain some
of the variation in failure rates among families? If so, can the failure
rates be adjusted to remove effects due to the use of different
technologies, so that remaining differences among families due to other
causes would be highlighted?
We trust that this reformulation is sufficiently comprehensive to cover the
intent of the original question.

The answers are, of course, yes. In fact, a further generalization of the
question is suggested -- why not also look to other characteristics, such
as manufacturer, model year, and LDV/LDT class, as potential contributing
explanatory factors for failure rate variations among engine families? In
particular, a cursory examination of the data provided suggests marked
systematic influences associated with specific manufacturers. The extension
to cover LDV's and LDT's as well as multiple model years would help to
provide a more unified framework for interpretation of the data.

We propose an additive linear model (with no interactions) as follows. Let
index

    i = 1, ..., I denote manufacturer (M)
    j = 1, ..., J denote emission control technology group (T)
    k = 1, 2 denote LDV, LDT class, respectively (C) (K = 2)
    l = 1, ..., L denote model year (Y)
    m = 1, ..., M(i, j, k, l) denote the mth engine family within cell
        (i, j, k, l) (E)

Define P_{ijklm} to be the observed failure rate of the mth engine family
within cell (i, j, k, l). The model is expressed as:

$$ P_{ijk\ell m} = \mu + \alpha_i + \beta_j + \gamma_k + \delta_\ell + \theta_{ijk\ell m} + \varepsilon_{ijk\ell m} $$

with constraints

$$ \sum_i \alpha_i = \sum_j \beta_j = \sum_k \gamma_k = \sum_\ell \delta_\ell = 0 $$

and

$$ \sum_{m=1}^{M(i,j,k,\ell)} \theta_{ijk\ell m} = 0 \quad\text{for all } i, j, k, \ell $$
The parameters mu, alpha_i, beta_j, gamma_k, delta_l, and theta_{ijklm},
which represent the overall mean, M effects, T effects, C effects, Y
effects, and E effects, respectively, are estimated from the observed
{P_{ijklm}} data. The epsilon terms represent residual (unexplained)
effects.

The above proposed model can be readily implemented on available software
packages which provide for linear analysis of categorical responses. In
particular, the SAS package [3] has the FUNCAT procedure, which is
sufficiently comprehensive to handle the very unbalanced (wide sample size
variations) and sparse (not all (i, j, k, l) cells occupied) type of
problem which would be characteristic of the {P_{ijklm}} data set. For
example, for model year 1982 vehicle-tests in Connecticut, we found that
among the 9 x 10 x 2 = 180 possible manufacturer x technology x vehicle
class cells, only 34 were occupied by at least one engine family.
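As a hedged modern analogue of such a FUNCAT fit, the Python sketch below
estimates the same additive structure as a weighted binomial GLM with
sum-to-zero coding, which mirrors the model's constraints. The statsmodels
package and the column names (mfr, tech, vclass, myear, fails, n) are
assumptions for illustration, and the nested engine-family term is omitted
for brevity.

    # Illustrative analogue of the additive categorical-response model;
    # the report's own fits used SAS PROC FUNCAT. Column names assumed.
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    def fit_effects_model(df):
        df = df.assign(rate=df["fails"] / df["n"])   # observed P
        model = smf.glm(
            "rate ~ C(mfr, Sum) + C(tech, Sum) + C(vclass, Sum)"
            " + C(myear, Sum)",
            data=df,
            family=sm.families.Binomial(),
            var_weights=df["n"],          # weight each cell by sample size
        )
        return model.fit()                # .summary() lists effect tests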
The output of FUNCAT, in addition to parameter estimates, provides chi-
square statistics for testing hypotheses that each of the main effects is
significant, and that each of the individual parameter estimates is
significantly different from zero. It also computes the chi-square
statistic for assessing the level of significance of the residual or
unexplained effects. Once statistical significance is established for main
effects and individual parameters, the issue of substantive significance
can be investigated. For example, if M, T, C, and E effects were all found
significant, but not Y effects, and if

    mu       =  3.5%  (significant)
    alpha_1  =  1.3%  (significant)
    beta_1   = -0.7%  (not significant)
    gamma_1  = -2.0%  (significant)
    delta_1  =  1.4%  (not significant)
    theta_{11111}, theta_{11112} = +/-1.5%  (not significant)

one might draw the following conclusions: for both engine families 1 and 2
within the manufacturer-1, technology-1, vehicle class-1 (LDV), model
year-1 cell, a reasonable estimate for failure rate is
3.5 + 1.3 - 2.0 = 2.8%; this
number is explained as the sum of 3.5 - 2.0 = 1.5% (mean failure rate for
all LDV's) and 1.3% (effect due to manufacturer 1). Note that even though T
effects are found overall to be significant, the particular estimate for
beta_1 (representing the contribution due to technology group 1) is
statistically not significantly different from zero. Hence, it is treated
as a zero contribution. Other technology groups must have had a significant
impact in order for the overall technology effect to be significant, but
apparently not group 1. A similar argument leads to the neglect of the
+/-1.5% estimates for the two engine family contributions. On the other
hand, suppose that theta_{11111} and theta_{11112} were both statistically
significant but evaluated at +/-0.2%. This possibility could arise if the
two families in question had very large sample sizes. In this instance, one
could view the individual engine family effect as substantively
insignificant and again choose to ignore it, keeping a common failure rate
estimate of 2.8% for both families.

Hopefully, a non-interactive effects model will prove to be adequate, as
would be evidenced by a small or insignificant level of residual effects.
Such a result would lead to relatively simple and plausible explanations
for sources of failure rate variation. If residual effects come out to be
significant, one might wish to explore certain interactions, but this
extension will be limited by the degrees of freedom available in the sparse
experimental design for the problem under consideration.

In summary, application of a categorical response linear model to the
vehicle-test data would help to identify the major sources of variation in
observed failure rates. It would, in effect, also allow each engine family
to be compared to others having common features, like same technology group
or same manufacturer.
4.5 QUESTION NO. 3: EFFECT OF REDUCED FAILURE RATES ON METHODS

If failure rates among all engine families follow a generally diminishing
trend with successive model years, but the desired level of precision
remains invariant, then the situation would actually improve. On the other
hand, if the required precision is also reduced in direct proportion to the
lowered mean
4-16
-------
overall failure rates, then the situation would worsen. These conclusions
derive from the fundamental properties of binomial distributions. Suppose
emission tests were performed on n vehicles belonging to an engine family
whose underlying probability of failure is p. Then the resulting number of
failures F is a binomial random variable with mean np and standard
deviation sigma_F = sqrt(np(1-p)). Consequently, the derived failure
proportion or failure rate P = F/n has mean p and standard deviation
sigma_P = sqrt(p(1-p)/n). Our principal interest is in p << 1, so that
sigma_P is approximately sqrt(p/n), and the relative precision
sigma_P / p, approximately 1/sqrt(np), degrades as p declines unless the
sample size n is increased in inverse proportion.
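A two-line numerical check of this behavior, holding n fixed while p
falls:

    # Relative precision sigma_P / p = sqrt((1 - p) / (n p)).
    import math

    def relative_precision(p, n):
        return math.sqrt(p * (1 - p) / n) / p

    for p in (0.10, 0.02):
        print(p, round(relative_precision(p, 1000), 3))
    # 0.1 0.095 -- 0.02 0.221: a fivefold drop in p requires roughly a
    # fivefold increase in n to hold relative precision constant.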
It is not necessarily true, however, that downward trends in failure rates
are describable by a general scale contraction. It is entirely possible for
some of the failure reductions to follow a simple scale translation rather
than a contraction, in which case discriminability should actually
increase. However, not all rate reductions can be translations, because
rates cannot fall below zero.

The net conclusion which we draw is that the statistical procedures
themselves need not be modified, but that the power of these methods will
likely (though not necessarily) be reduced. If power does reduce, a
compensatory strategy is to increase n, i.e., accumulate more data. This
could be accomplished by using a longer time interval; for example, waiting
for six months of data where the previous practice was to commence analysis
at three months.
4.6 QUESTION NO. 4: COMBINING DATA ACROSS STATES

State-specific data are easily incorporated into the categorical response
linear model described under Question 2, for which a packaged procedure is
readily available within SAS. In detail, we introduce

    index n = 1, ..., N to denote state (S)

and revise the model as follows:

$$ P_{ijk\ell mn} = \mu + \alpha_i + \beta_j + \gamma_k + \delta_\ell + \theta_{ijk\ell m} + \omega_n + \varepsilon_{ijk\ell mn} $$

with the added constraint \sum_n \omega_n = 0. The parameters omega_n
represent the effects of individual states on failure rates. As before,
only a main effect is introduced, in the anticipation that there are no
appreciable interactions with other factors. Note that an "n" index is not
added to the engine family parameters theta_{ijklm}, because engine
families identify across states.
Assuming that noninteractive state effects will be found adequate, the
augmentation of the model should have two significant benefits. First, it
would permit one to derive more valid measures of state influences than
could be inferred from simple comparison of overall state data set means.
The reason, obviously, is that state-to-state differences in the detailed
distributions of other effects (technologies, manufacturers, etc.)
introduce spurious differences in the raw means. The second benefit is that
the additional multi-state data should add power to the determination of
effects due to the other factors. Of course, if strong interactions between
state and other factors were to be demonstrated by a large increase in the
residual variation, then these benefits may diminish or be vitiated. As
noted before, limited exploration of interactions can be conducted, if
necessary.
4.7 QUESTION NO. 5: HOW TO ANALYZE RESULTS FROM ALTERNATIVE SETS OF CUTPOINTS

Since EPA is interested in alternative sets of cutpoints, the recommended
statistical tests should be done for each set. High failure rate engine
families can be identified in each set, but not necessarily using the same
external standard or line of demarcation between "normal" and "high" rates.
In fact, cursory examination of some of the data suggests that the 100/0.5%
criterion results, on the average, in roughly three times the failure rate
produced by the 220/1.2% criterion. If the r fraction method were used for
set partitioning, it would tend naturally to establish a higher equivalent
failure rate criterion for the 100/0.5% cutpoints. This may be an argument
for using the r fraction method rather than the direct p* criterion, since
the latter requires separate designation of p* for the two sets of
cutpoints; the results may not then be comparable in severity of the
pruning achieved.

Because of the substantially higher failure rates associated with the more
stringent cutpoints (which can be viewed as a scale expansion effect), we
should expect increased relative precision and therefore a sharper ability
(via the multiple-t tests) to determine that a candidate high failure rate
engine family is a high rate family when it truly is in that category.
The FUNCAT procedure in the SAS package readily accepts categorical re-
sponses of any multiplicity (and even dimensionality). Thus, for each
vehicle-test sample one would read in, in addition to the design effects
categories (i, j, k, l, m, n), the response data as n, P_1, and P_2 rather
than just n, P. There is provision within FUNCAT to define a scalar
response function of the input probabilities. One could first select P_1
and run the model, then select P_2 and rerun the model, to get an analysis
of variance and significant effects estimation with reference to each set
of cutpoints. Comparison of the two results may shed light on the
sensitivity of various effects to cutpoint criteria. These sensitivities
have to do with the detailed distributions of measured HC and CO
concentrations within individual engine families. If these distributions
are fairly smooth and similar in shape (in the vicinity of the cutpoints)
over most engine families, then one should not expect much difference in
effects evaluation for the two cutpoint sets. If, however, the distribution
saturates between HC cutpoints for some families but not for others,
depending, say, on technology type, then profound differences in
significant effects may be found in the two analyses.

The flexibility of FUNCAT with respect to response function also permits
running the model for such response combinations as P_2 - P_1 or P_2/P_1.
These results would help focus on effects which contribute to translational
or scaling dissimilarities over the set of engine families.
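The derived responses mentioned above are simple to construct from each
sample's (n, P_1, P_2). The hedged sketch below adds them as columns so any
of them can be fed to the same effects model; note the ratio is undefined
where P_1 = 0.

    # Alternative response functions per vehicle-test sample (columns
    # P1 and P2 are assumed names from the earlier failure-rate step).
    import pandas as pd

    def add_responses(df: pd.DataFrame) -> pd.DataFrame:
        out = df.copy()
        out["diff"] = out["P2"] - out["P1"]     # translational dissimilarity
        out["ratio"] = out["P2"] / out["P1"]    # scaling dissimilarity
        out["pass_both"] = 1.0 - out["P2"]      # three mutually exclusive
        out["fail_strict_only"] = out["P2"] - out["P1"]   # category
        out["fail_both"] = out["P1"]            # probabilities
        return out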
REFERENCES FOR SECTION 4

1. Analysis of Emission Test Failure Rates in Centralized I/M Programs,
   Report to the EPA, EEA, September 1985.

2. Kramer, C. Y., Extension of Multiple Range Tests to Group Means with
   Unequal Numbers of Replications, Biometrics, 12, 307-310, 1956.

3. SAS User's Guide, 1979 Edition, SAS Institute Inc., Cary, NC, 1979.

4. Afifi, A. A., and Azen, S. P., Statistical Analysis - A Computer
   Oriented Approach, Academic Press, New York, 1972, pp. 74-75.

5. Hartigan, J. A., Clustering Algorithms, John Wiley & Sons, New York,
   1975.
5. SUMMARY AND CONCLUSIONS
The identification of "pattern case" failures using emission data from
centralized I/M programs for 1981 and later model year light-duty
vehicles and light-duty trucks has been investigated in this work
assignment. The objective is to define a system that will lend itself
to rapid, periodic analysis of data. Our investigation identified
issues in three areas:

- Selection of I/M programs from which one can obtain relatively
  clean data for the analysis with little time lag.

- A simplified processing scheme to minimize costs and turnaround
  time.

- Improved statistical methods to better define pattern failures
  and possibly identify effects of test procedures and/or
  ambient variables.
The selection of I/M programs for data analysis depends, to some degree,
on the questions being investigated. Our analysis shows that highly
automated programs such as those in Illinois, Wisconsin and Kentucky are
best suited in terms of data cleanliness and rapid "turnaround" of test
data. It will be possible to perform analysis on a quarterly basis if
data from these states are used. Unfortunately, these states also
utilize different test procedures, preventing easy data comparison
across states. Earlier analysis of data suggests that EPA should
examine data from states using identical test procedures and across
states using different test procedures, so that false failures related
to the test procedures can be identified. Moreover, all centralized
programs are improving their data acquisition methods. In a few years,
data from all states may be relatively similar as far as cleanliness and
turnaround time. At the current time, we would recommend the following:

- Investigate the three procedures currently used - idle with no
preconditioning, idle with 2500 rpm preconditioning, and idle
with loaded-mode preconditioning.
- Investigate I/M programs such that each test procedure type is
  utilized in at least two geographically distinct programs.
  This will require investigation of at least six I/M programs.

- Select the six I/M programs from the universe of I/M programs
  based on a combination of sample size (at least a total sample
  of 50,000 per month), automated data acquisition, and rapid
  turnaround. We believe that, realistically, semi-annual analysis
  of data from six programs will be possible.

- California engine families require California data. Although
  it has several drawbacks, the California data also suggest
  some interesting possibilities that may make an engine-family-
  specific analysis feasible.
Data processing steps required after acquiring the data from the states
include cleaning, sorting, VIN decoding and failure rate calculations.
The cleaning step is a general "front-end" step that will require highly
variable effort, depending on the relative cleanliness of the input
data. However, even the best data sources require some cleaning, if
only to eliminate calibration records, aborted tests, heavy-duty vehicles,
etc. One of the problems is that each state's program is constantly being
changed, and the cleaning steps will have to reflect these changes. This
step, therefore, requires programmer intervention and can be tedious.

Sorting and sequencing of data is required to recognize initial tests and
retests for the same vehicle. Although other elaborate schemes have
been considered, a VIN-based sort may be adequate for this analysis.
Sequencing is not an issue if pattern cases are to be recognized based
only on a first test failure; other issues, such as vehicle repairability
and waiver rates, may be interesting to EPA but will require additional
sequencing steps.

Other steps, including VIN decoding and calculation of failure rates, are
straightforward. Identification of transmission type is generally not
possible. In addition, there is no easy resolution to the "running
change in certification" problems. Problems with end-of-model-year
vehicles, and with certification family model year versus vehicle model
year, appear to be restricted to very few cars. EPA certification staff
have claimed that proper specification of carryover engine families should
be no problem if the final version of the certification tape is used; there
may be some residual problems unknown to EPA.
Once failure rates by engine family have been calculated, a number of
statistical tools can be employed to identify pattern cases and address
several related issues. We recommend the multiple-t test to both define
and identify pattern failures; this test is an improvement over EPA's
current chi-square test. We have proposed statistical linear models that
can evaluate technology-specific, manufacturer-specific, and testing-
procedure-specific influences. In addition, we have suggested methods to
use data from two sets of cutpoints, and methods to combine data from
several states. All of the proposed methods fit well into the processing
framework, in that they are available in SAS (Statistical Analysis
System).
EPA has inquired about sample size requirements for the analysis, and
this cannot be answered in an absolute sense. The sample size required
to identify any particular engine family depends on:

- The sales of that engine family

- The failure rate of that family in comparison to the fleet
  average failure rate

- The statistical significance with which EPA can claim the
  family is a "pattern case"

- Confounding factors such as technology-specific rates and
  response to ambient conditions.

At this point, it does not serve any purpose to fix a sample size;
rather, as sample sizes are increased, one can expect pattern failures
to be recognized for low-sales families with greater precision.
Finally, EPA has requested some specific estimates of cost. We have
attempted to estimate the cost of analyzing one year's worth of data
from a program which tests 100,000 vehicles per month, where one-third of
the vehicles are from the five newest model years. Thus, a total of 1.2
million vehicle inspections (up to 1.6 - 1.8 million records) are
obtained, and 400,000 vehicles' data are separated, cleaned, sorted and
VIN decoded. The resulting output of failure rates by engine family is
then statistically tested for pattern cases; no other statistical
analysis is performed. Costs are summarized in Table 5-1 for one such
data source. As can be seen, computer costs, if the analysis is done on
a private time-sharing mainframe, are very high. On the other hand,
access to government computers can reduce computer costs by a factor of
3. Costs for analysis of six programs will be six times the estimate;
however, the estimate does not scale linearly with sample size. Halving
the sample size will reduce costs only by about 25 percent.
TABLE 5-1

COST OF DATA ANALYSIS

Assumptions - Initial tape has 1.2 million vehicles, of which 400,000 are
1981+ light-duty. Processing on mainframe - IBM 3033 or equivalent.

CPU time (initial cleanup)*              60 minutes
CPU time (all other processing)         120 minutes

Computer costs (government system)
    CPU time @ $1/sec                   $10,800
    I/O costs                            $1,000
    Tape storage                           $250
    Disk storage                           $250
    Connect time                           $700
    Total                               $13,000

Computer costs (private systems)       ~$40,000

Labor costs
    Programmer                          120 hours
    Manager                              40 hours
    Analyst                              40 hours
    Total cost @ $45.00/hour             $9,000

* May be higher or lower depending on data source.