Results of the 2020-2022 USEPA Burn Wise Residential Wood Heater Testing Laboratory Proficiency Test


Stack Test Solutions, LLC
August 30, 2022

John F. Buresh
Principal Research Scientist

-------
Table of Contents

Introduction
Materials and Methods
Discussion
Conclusions and Suggestions
Table 1 Results
Attachments 1-3 Dixon Tests

-------
Introduction

Stack Test Solutions (STS), sole provider of the 2020-2022 USEPA Burn Wise
Proficiency Test Program, began proficiency testing at the first laboratory in
February of 2020. This round of proficiency testing was completed when the last
laboratory submitted its final report in May of 2022. In all, eight laboratories
participated in the program to remain on the USEPA list of accredited Residential Wood
Heater Testing Laboratories in the Burn Wise program. Those eight are (in
alphabetical order):

ClearStak
Danish Technical Institute
Intertek
OMNI-Test Laboratories
PFS Teco
Poly-Tests Services
Research Institute of Sweden
Strojirensky Zkusebni Ustav

It should be pointed out that the proficiency test was performed at one lab for each of
these companies. If any of these companies has more than one lab performing wood
stove certifications, STS cannot verify that the same techniques, lab setup, or
testing equipment are used at any satellite laboratories of these companies. STS
submits this final report to satisfy the USEPA Burn Wise Proficiency Test Provider
requirements as described in the USEPA protocols.

Materials and Methods

Special acknowledgement should be given to Indeck Energy, who provided the pellets
for both the conditioning burn and the test burn. The pellets were predominantly
from oak trees with some maple and possibly small amounts of birch. This made for
consistent pellets that held up well. The pellets were % inches in diameter and
between % inches and % inches long. All pellets were taken within half an hour of
each other from the pelletizing and bagging line. While the conditioning pellets
were shipped "as is", care was taken to store the test burn pellets in nominal
5-pound hermetically sealed, evacuated storage bags to ensure there was no moisture
or oxidation degradation between pellets burned in February of 2020 and March of
2022. Pellets were shipped near the testing date to ensure they would not have time
to degrade even if the vacuum seal was lost during shipping.

All single use audit samples (filters) were procured from ERA-QC in Colorado. ERA-QC
is a well-known provider of environmental sampling audit materials. Filter audit samples
were 47 mm glass fiber filters with a known quantity of a white dry material on the
surface. These were sent in advance of the proficiency test so STS could observe the
final weight on site.

Mr. John F. Buresh of STS arrived at the laboratories on Monday mornings and, after
introductions, provided the laboratory staff with the operating parameters provided by
the USEPA and began the proficiency test. The proficiency test included all activities
described in the USEPA Protocols: observation of laboratory technique, equipment
setup, cleaning activities, and sample recovery activities, and inspection of all
equipment associated with the testing as it pertained to ASTM 2515 and ASTM 2779. STS
remained at the testing location during all of the testing periods and followed the
sample throughout the stages of recovery until the final deposition in the desiccating
trays.

Upon completion of the final test run, STS affixed seals on the stoves and provided the
laboratories with a final review of observations and allowed time for any follow-up
questions. STS collected the final results of the calculation sheets and documented
the final analysis of the audit samples. Several weeks later, the laboratories provided
STS with draft reports that were finalized shortly thereafter.

Upon receiving the final reports, STS ran the Dixon outlier test on the gram per
kilogram of fuel combusted emissions of each individual test run and found that no
individual run was an outlier within the data set of 24 runs (Attachment 1). STS also
ran the Dixon outlier test on the Sample Train blank and Room Air blank data
(Attachments 2 and 3). A data set of 8 is the minimum that can be run with the Dixon
outlier test.
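As an illustration of the calculation behind Attachment 1, the following minimal
Python sketch computes the upper-tail Dixon ratio for the 24 grams/kg values as
printed in Table 1. It assumes the r22 form of the ratio, which is the form commonly
tabulated for data sets of this size. The function name dixon_r22_upper and the
variable names are hypothetical, and the critical values are copied from Attachment 1
rather than derived. It is a sketch under those assumptions, not necessarily the
procedure or software STS used.

```python
# Grams/kg for the 24 runs, as printed in Table 1 (three runs per laboratory).
runs_g_per_kg = [
    1.68, 1.71, 1.48,
    1.07, 1.20, 1.06,
    1.02, 1.06, 1.05,
    1.34, 1.30, 1.36,
    1.070, 1.030, 0.977,
    1.35, 1.29, 1.37,
    1.60, 1.54, 1.57,
    1.390, 1.318, 1.337,
]

# Critical values for n = 24, copied from Attachment 1 (not derived here).
CRITICAL_N24 = {"10%": 0.367, "5%": 0.413, "1%": 0.497}


def dixon_r22_upper(values):
    """Assumed r22 ratio for the largest value: (x[n] - x[n-2]) / (x[n] - x[3])."""
    x = sorted(values)
    if len(x) < 14:
        raise ValueError("this sketch only handles the larger-sample r22 form")
    return (x[-1] - x[-3]) / (x[-1] - x[2])


stat = dixon_r22_upper(runs_g_per_kg)
# A run is flagged at a given significance level when the ratio exceeds
# the critical value for that level.
flags = {level: stat > crit for level, crit in CRITICAL_N24.items()}

print(f"largest run = {max(runs_g_per_kg)} g/kg, ratio = {stat:.3f}")
print(flags)
```

Under these assumptions the ratio rounds to 0.162, which matches the upper-tail test
statistic reported in Attachment 1 and falls below all three critical values.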

-------
Discussion

The Covid pandemic started shortly after we had begun our 2020-2021 Round Robin
Proficiency Testing Program. STS and the laboratories had to work around this,
changing our schedules as international travel to half the labs was suspended. Some
labs had to reschedule due to Covid among their staff. STS staff were not immune to
the pandemic, coming down with Covid during a trip to Europe to service labs there.
Due to the severity of Covid, the USEPA granted an extra year for the labs and STS to
complete this round of testing.

STS attended all activities of the USEPA protocols in person with the exception of three
audit filter weighings. Two laboratories did not pass the initial probe audit
filter portion of the test and were required to redo that portion of the proficiency
test. A third laboratory had difficulty with the local customs and postal authorities,
and the filter did not arrive at the laboratory until after STS had left the country.
All activities of the redo, including the opening and the initial, intermediate, and
final weighings, were observed via a Microsoft Teams connection.

One laboratory had the unfortunate experience of losing their sample stove. It was
explained to STS that stoves undergoing or having undergone certification have
identifiers to keep them stored indefinitely. Other stoves without those identifiers are
removed on a regular basis. Unfortunately, their proficiency test stove did not have the
proper identifiers and was removed and sent for disposal. This was discovered shortly
before STS was to arrive and a new stove could not be procured in time. Another
laboratory shipped their stove before STS arrived and STS was able to find the security
labels intact and removed them in time for the test.

The Laboratories are de-identified by a color code. STS retains the actual data under
the laboratory name for records kept at STS company offices. The color-coded final
results can be found in Table 1.

At all of the participating Laboratories, STS inspected the wood stoves, the mixing and
sampling ductwork, the external sensors, the sampling trains, and the recovery areas to
ensure they met the standards in ASTM 2515 and ASTM 2779. STS utilized a checklist
that was developed from requirements found in ASTM 2515 and ASTM 2779.

STS observed minor discrepancies from the ASTM methods from lab to lab, but
observed nothing that we believe would invalidate the results of the testing, or overly
bias the results. STS found all labs and staff were capable of performing the testing.

No laboratory was without findings or deviations from the written Method. Some
findings we were able to correct immediately; issues such as thermocouple locations
or pitot markings were corrected on the spot. Several labs had inappropriate
transition elbows between the mixing duct and the sampling duct. They agreed to
correct that in the near term and be ready for inspection at the next round of
proficiency testing. Several labs utilized Method 5 sample trains, which are not
nominally designed to sample at rates near 5-10 liters per minute (lpm), and their
calibrations were not appropriate for the sampling range.

We found that one laboratory's Sample Train blank did not meet the 90% outlier
requirement set by the USEPA in the 2020 Protocols. In reviewing the data, our
statistician insisted that this test is inappropriate for such a small sample when
operating at such low values, near or below the practical quantification limits of the
methods. Perhaps an actual outlier could be found over several rounds of blank tests,
when the data set reaches more than 20. Alternatively, having each lab perform several
blanks to achieve a statistically valid number would be another option. Whichever way
we approach this in the future, the current data do not justify identifying any
laboratory as outside the round robin under the original scheme.
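The sensitivity the statistician describes can be seen by working the numbers. The
sketch below is a minimal Python illustration, assuming the r11 form of the Dixon
ratio that is commonly tabulated for 8 to 10 observations, applied to the eight Sample
Train blank values as printed in Table 1. The function and variable names are
hypothetical, and the critical values are copied from Attachment 2 rather than
derived.

```python
# Sample Train blank values (mg) for the eight laboratories, as printed in Table 1.
train_blanks_mg = [0.00, 0.00017, 0.0, 0.0000, 0.0000, 0.0, 0.2, 0.008]

# Critical values for n = 8, copied from Attachment 2 (not derived here).
CRITICAL_N8 = {"10%": 0.479, "5%": 0.554, "1%": 0.683}


def dixon_r11(values):
    """Return assumed (upper, lower) r11 ratios for a data set of 8 to 10 points."""
    x = sorted(values)
    if not 8 <= len(x) <= 10:
        raise ValueError("this sketch only handles 8 to 10 observations")
    # Upper tail: gap between the two largest values over the range
    # from the second-smallest value to the largest.
    upper = (x[-1] - x[-2]) / (x[-1] - x[1])
    # Lower tail: gap between the two smallest values over the range
    # from the smallest value to the second-largest.
    lower = (x[1] - x[0]) / (x[-2] - x[0])
    return upper, lower


upper, lower = dixon_r11(train_blanks_mg)
upper_flags = {level: upper > crit for level, crit in CRITICAL_N8.items()}

print(f"upper-tail ratio = {upper:.3f}, lower-tail ratio = {lower:.3f}")
print(upper_flags)
```

With six of the eight blanks at or within a fraction of a microgram of zero, the
denominator is essentially the largest value itself, so the single 0.2 mg blank yields
a ratio of 0.960 and exceeds every critical value, while the lower-tail ratio is
0.000. Both figures agree with Attachment 2.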

Other examples of shortcomings and corrections this round included:

1. Train not leak-checked for the entire 60 seconds: I informed the lab that the
method describes a 60-second leak check, even though the DGM was not moving
at a pace that would indicate a leak check failure. The next leak check(s)
was/were performed properly.

2.	Hood conical area not meeting 4X diameter of chimney requirement. I ensured
through detailed observation that ALL chimney emissions into the hood were
captured.

3.	Anemometers not scaled low enough to meet Method specifications. There was
not much more that could be done about this other than procuring a new
anemometer, which was not possible in our time available at the lab(s). In my
observation, the air was not moving at the sample location, and the available
anemometer indicated no movement of air.

4.	Sample probe location inaccurate. There are two factors to use when calculating
the sample locations, and the lab(s) missed the second requirement. Upon
being presented with the secondary consideration, the lab(s) made the necessary
corrections on the probe and sampled correctly.

5.	Filter exposed for longer than 2 minutes during recovery. We reviewed the
method language and discussed how handling changes could be made to
minimize exposure of the filters.

6.	Various deficient laboratory techniques. We discussed how standard laboratory
practices could be implemented to minimize risk of losing sample or data.

7.	Duct lengths not meeting method specifications. We presented our
measurements, had the lab double-check them, and reviewed the test
methodology language. The duct was long by around 12". Again, testing equipment
that would change the outcome is not what a lab would want during a round
robin test. It appeared they put things together and did not measure the final
result. The lab indicated they would correct the length for future testing.

8.	Not using gloves when handling filters and probes. A couple of labs did not use
gloves for handling the filters and the probes. I explained that for a round robin
test, it would be in their interest to utilize gloves, as the majority of labs do.
They found gloves for subsequent sample handling.

9.	No permanent (machine ink) identification marks on filters. The lab used regular
pen to identify the filters. They felt this was adequate, but indicated they would
investigate finding machine ink for labeling filters.

All of these findings were documented and reviewed with the laboratory managers in
their respective labs.

Conclusions and Suggestions

The 2020-2022 USEPA Burn Wise Proficiency Testing Program was the second
association between STS and the laboratories, and the laboratories were more
comfortable and confident with STS attendance. The laboratories allowed me to review
technique and equipment, and as some of this could be considered proprietary, STS
again made all efforts to avoid any documentation that might identify the individual
laboratory. STS concludes that all the results are accurate and represent the actual
testing and procedures of the individual laboratories.

After the inaugural proficiency test, STS made several suggestions. Each is listed
below along with how it affected this round of testing.

1.	The Dixon Test used to identify outliers suggests that 8 samples be the minimum
number used in the test, as it is a significance test, not a confidence test. The
USEPA decided to consider each test run to be an independent variable, rather
than the average of the three runs from each laboratory, to improve the statistical
strength of the analysis. While this is not an unreasonable decision, it presumes
each run is independent of the prior run. Observations in the field indicated that
since the burn pot in the stove was not cleaned after each run, the prior run
possibly influenced the succeeding emissions value. Field observations recorded
one combustion pot so fouled that the stove could not re-light until the pot was
agitated with pellets in the ash to allow for initial start-up. Be that as it may, the
Dixon test did not find any of the 24 runs to be outliers due to the wide variability
of the test runs. The mean was 2.00 grams per kilogram with a standard deviation of
0.121 grams per kilogram. If the next protocols use this scheme to
identify outliers, STS suggests that both the combustion pot and the
duct be cleaned prior to each run to reduce this variable.

The USEPA protocols were changed to require the combustion pot to be cleaned
prior to each test run. The duct cleaning was NOT incorporated. As can be seen
in the data from each lab, the between-run variability of each lab was reduced, as
were the actual measured emissions. It is the opinion of STS that this one
change provided a much more accurate assessment of the laboratory staff's
internal quality assurance skills in sampling.

The USEPA did not incorporate the suggestion to clean the chimney prior to
each run, requiring instead that it be cleaned prior to the first run only.

2.	For the Probe analysis of the Protocols, a 2% error limit was overly liberal,
considering the probes were 39-45 grams in mass. For the next proficiency test,
STS suggests using a Dixon test on three separate probe challenges to each lab.
STS will present the probes on day one and the labs will have three days to
achieve final weights. If final weight cannot be met while STS is on the premises,
they can be finished as per protocols approved by the EPA for remote viewing of
laboratory practices. STS suggests calculating the outlier based on those 24
independent measurements.

The USEPA dropped the probe audit portion of the proficiency test. This
suggestion became moot.

3. The calculations data set proved problematic for many of the laboratories. Their
spreadsheets were not designed to take single points. Some found the only way
to calculate the results was by hand instead of with the spreadsheets they normally
use. The rounding conventions and carrying of significant figures in the answer
sheet did not seem to follow USEPA conventions. Perhaps STS can work with
the USEPA on developing the answer key for the next calculations sheet to ensure
we understand it well enough to provide guidance to the laboratories. The
USEPA might want to consider either creating a data set of 60 data points
for an hour of simulated testing, representing the data that a laboratory
must collect on a run, and challenging the laboratory in that manner, or
possibly dropping this portion of the test.

The USEPA dropped this portion of the proficiency test.

4.	STS will work with audit sample providers to find a proper audit sample with
suspended particulate in acetone for the labs that recover the sample equipment
with solvent for gravimetric analysis.

The USEPA dropped this portion of the proficiency test.

5.	STS recognized room air balance and combustion air might be contributing factors
in combustion efficiency for these small stoves. To eliminate this, STS suggests
the USEPA consider requiring that these stoves be attached to an unobstructed
outside air source.

The USEPA did not include this as a requirement in the most recent proficiency
test.

6.	STS suggests the USEPA require a flow to be performed prior to each run.

This was included in the most recent Protocols.

7.	STS requests guidance whether the leak check should occur with the flow meter
(rotameter) or the dry gas meter.

STS again requests guidance.

8.	Examining the first runs from each laboratory, it appears two or possibly three
laboratories would not pass the Dixon test and would have been identified as
outliers. Allowing for three runs protects the labs from a very unforgiving
statistical analysis. STS suggests the USEPA consider maintaining the 3-run
course.

The USEPA maintained the three-run course.

9.	Three-hour test runs allow the laboratories to complete testing in two days. If
the EPA wants greater mass collected on the filters, they could consider going up
to four-hour test runs. Five-hour test runs would require the laboratories to have
three days of testing.

The USEPA maintained the three-hour run.

10. The back filter never collected any measurable particulate, and in many
instances actually subtracted from the total catch. I presume this requirement is
for wetter wood testing when there is a greater chance of condensable material
being captured. For this testing, the USEPA could consider either dropping the need
for the back filter or only using the data when the mass is a positive value.

The back filter was maintained in the test.

11.	STS recognized some laboratories chose to induce draft in the chimney to a
number just below the ASTM limit of 1.25 Pa (0.005 inches of water). STS is not
certain what that does for combustion, but it probably has an effect. STS
suggests the USEPA consider dropping that limit to 0.25 Pa (0.001 inches
of water) to eliminate that effect.

The ASTM limit was used.

12.	The stove has a 1-9 setting, with 1 being lowest and 9 being highest. The
laboratories operated at the #4 setting for the tests. There was some variability that
might be innate or due to some other lab parameter, possibly room air balance or
draft induction. STS suggests the EPA select a number (1-9) for the succeeding
rounds of testing.

The USEPA selected the #7 operation setting for the testing.

13.	Required sample flow rate was an issue last year, as the EPA requested one at
10 lpm, when the method does not allow greater than 1 (LPM). There were
some labs that did not have the proper equipment to reach the 10 lpm rate.
STS suggests the EPA provide STS with ample time to review protocols in
advance of the proficiency tests to ensure appropriateness of the test
parameters.

The USEPA reduced the sample rate to 5 lpm.

Below are ideas and suggestions for the USEPA to consider when developing the
protocols for the next round of proficiency testing (2023-2024):

1. The air flow issue should be addressed. We had the fortune to see one stove
operated by two labs. While neither lab had an air source pulling in outside air, it
is very possible that the ventilation system at one lab allowed for more air and a
better burn than at the other. The particulate results suggest that. As I examined
the rooms where testing was occurring, it was very apparent that the combustion air
could be affected by multiple fans pulling air out of the room. Although it may be
difficult for some labs to accommodate this, the lack of an unobstructed outside
combustion air source may very well be the reason for the differences between labs
seen in this round of testing. The manufacturer is very specific in this
requirement.

STS suggests again that the manufacturer's requirement for outside air be
heeded.

2. The 5 lpm was still too high for some labs.

STS suggests going to a longer run and dropping the sample rate to 3 or 4
lpm.

3. The USEPA has indicated it plans to reduce the testing to 1 run per lab for the
2023-2024 proficiency test. That may well reduce the time labs have to put aside
for testing, and I understand that burden. However, in reviewing the data and
performing the Dixon test on one sample for each lab, moving only one sample
result towards the mean (not an outlier) caused laboratories to fail the 95%
significance test on both the high and low side. 8 samples are the absolute
minimum one can use on the Dixon test. And the statistical treatment is very
rigid and unforgiving when n=8. STS does not make a suggestion on this
decision, but wishes to provide this warning.

4. As can be seen in the data, the labs nearer the low end of the particulate loading
demonstrated greater variability in the results, which may be due to sampling
near the limit of quantification (LOQ). If a requirement for an outside combustion
air source reduces all the stoves' emissions, we may see LOQ issues driving
variability in the program. That would undercut our purpose of evaluating
proficiency and make chance a greater part of the differences between
laboratories. STS suggests sample volume of air be considered in the next
proficiency test.

-------
Table 1

Laboratory: [color label missing]

Parameter             Run 1    Run 2    Run 3    Mean     Values
grams/hour            2.24     2.30     2.06     2.20
grams/kg              1.68     1.71     1.48     1.61
Sample Blank Value                                        0.00 mg
Room Blank Value                                          0.1 mg
Filter Error                                              -1.1 mg

Laboratory: Pink

Parameter             Run 1    Run 2    Run 3    Mean     Values
grams/hour            1.92     2.18     1.99     2.03
grams/kg              1.07     1.20     1.06     1.11
Sample Blank Value                                        0.00017 mg
Room Blank Value                                          0.00008 mg
Filter Error                                              -0.93 mg

Laboratory: White

Parameter             Run 1    Run 2    Run 3    Mean     Values
grams/hour            1.94     1.99     1.93     1.96
grams/kg              1.02     1.06     1.05     1.04
Sample Blank Value                                        0.0 mg
Room Blank Value                                          0.0 mg
Filter Error                                              -0.7 mg

Laboratory: [color label missing]

Parameter             Run 1    Run 2    Run 3    Mean     Values
grams/hour            2.59     2.42     2.50     2.50
grams/kg              1.34     1.30     1.36     1.33
Sample Blank Value                                        0.0000 mg
Room Blank Value                                          0.0000 mg
Filter Error                                              -1.3 mg

Laboratory: Orange

Parameter             Run 1    Run 2    Run 3    Mean     Values
grams/hour            2.101    1.977    1.895    1.991
grams/kg              1.070    1.030    0.977    1.026
Sample Blank Value                                        0.0000 mg
Room Blank Value                                          0.0000 mg
Filter Error                                              -1.0 mg

Laboratory: Blonde

Parameter             Run 1    Run 2    Run 3    Mean     Values
grams/hour            2.46     2.39     2.50     2.45
grams/kg              1.35     1.29     1.37     1.34
Sample Blank Value                                        0.0 mg
Room Blank Value                                          -0.20 mg
Filter Error                                              -2.8 mg

Laboratory: [color label missing]

Parameter             Run 1    Run 2    Run 3    Mean     Values
grams/hour            2.93     2.84     2.73     2.83
grams/kg              1.60     1.54     1.57     1.57
Sample Blank Value                                        0.2 mg
Room Blank Value                                          0.00 mg
Filter Error          -0.5                                -0.7 mg

Laboratory: [color label missing]

Parameter             Run 1    Run 2    Run 3    Mean     Values
grams/hour            2.675    2.445    2.480    2.53
grams/kg              1.390    1.318    1.337    1.348
Sample Blank Value                                        0.008 mg
Room Blank Value                                          -0.19 mg
Filter Error                                              -1.01 mg

-------
Attachment 1

Dixon's Outlier Test ASTM 2515 Emissions Testing

Number of Observations = 24
10% critical value: 0.367
5% critical value: 0.413
1% critical value: 0.497

1.	Observation Value 1.71 is a Potential Outlier (Upper
Tail)?

Test Statistic: 0.162

For 10% significance level, 1.71 is not an outlier.
For 5% significance level, 1.71 is not an outlier.
For 1% significance level, 1.71 is not an outlier.

2.	Observation Value 0.997 is a Potential Outlier (Lower
Tail)?

Test Statistic: 0.055

For 10% significance level, 0.997 is not an outlier.
For 5% significance level, 0.997 is not an outlier.
For 1% significance level, 0.997 is not an outlier.

-------
Attachment 2

Dixon's Outlier Test for Sample Train Blank

Number of Observations = 8
10% critical value: 0.479
5% critical value: 0.554
1% critical value: 0.683

1. Observation Value 0.2 is a Potential Outlier (Upper
Tail)?

Test Statistic: 0.960

For 10% significance level, 0.2 is an outlier.
For 5% significance level, 0.2 is an outlier.
For 1% significance level, 0.2 is an outlier.

2. Observation Value 0 is a Potential Outlier (Lower Tail)?

Test Statistic: 0.000

For 10% significance level, 0 is not an outlier.
For 5% significance level, 0 is not an outlier.
For 1% significance level, 0 is not an outlier.

-------
Attachment 3

Dixon's Outlier Test for Room Air Blank

Number of Observations = 8
10% critical value: 0.479
5% critical value: 0.554
1% critical value: 0.683

1. Observation Value 0.1 is a Potential Outlier (Upper
Tail)?

Test Statistic: 0.345

For 10% significance level, 0.1 is not an outlier.
For 5% significance level, 0.1 is not an outlier.
For 1% significance level, 0.1 is not an outlier.

2. Observation Value -0.2 is a Potential Outlier (Lower
Tail)?

Test Statistic: 0.050

For 10% significance level, -0.2 is not an outlier.
For 5% significance level, -0.2 is not an outlier.
For 1% significance level, -0.2 is not an outlier.

-------