ESTIMATION METHOD 8: Estimation of Variance of the Size-Weighted Cumulative
Distribution Function for the Total of a Discrete Resource; Horvitz-Thompson Variance Estimator

1	Scope and Application

This method calculates the estimated variance of the estimated size-weighted cumulative
distribution function (CDF) for the total of a discrete resource that has an indicator value equal to or
less than a given indicator level. The size-weight is a measurement of each unit of the discrete
resource, such as the area of a lake. Two variance estimators are presented in this method. An
estimate can be produced for the entire population or for an arbitrary subpopulation of known or
unknown size, where size means the size-weighted total of the subpopulation. The method applies to any probability
sample and the variance estimate will be produced at the supplied indicator levels of interest. This
method does not include estimators for the CDF. For information on CDF estimators, refer to
Section 7.

2	Statistical Estimation Overview

A sample of $n_a$ units is selected from subpopulation $a$ with known inclusion probabilities
$\pi = \{\pi_1, \ldots, \pi_i, \ldots, \pi_{n_a}\}$, joint inclusion probabilities $\pi_{ij}$ for $i \ne j$, and size-weight
values $w = \{w_1, \ldots, w_i, \ldots, w_{n_a}\}$. The indicator is evaluated for each unit and represented by
$y = \{y_1, \ldots, y_i, \ldots, y_{n_a}\}$. The inclusion probabilities are design dependent and should be furnished
with the design points. See Section 9 for further discussion.

The Horvitz-Thompson variance estimator of the size-weighted CDF for total, $\hat V[\hat F_a(x_k)]$, is
calculated for each indicator level of interest, $x_k$. Two Horvitz-Thompson variance estimators are
presented in this method. The first is a variance estimator of the Horvitz-Thompson estimator of a
total. The second is a variance estimator of a Horvitz-Thompson ratio estimator; it requires as input
the CDF estimates produced using the Horvitz-Thompson ratio estimator of the size-weighted CDF
for total, along with the known subpopulation size.

The output consists of the estimated variance values.

3	Conditions Under Which This Method Applies

•	Probability sample with known inclusion probabilities and joint inclusion probabilities

•	Discrete resource

•	Arbitrary subpopulation

•	All units sampled from the subpopulation must be accounted for before applying this
method


4 Required Elements

4.1	Input Data

$y_i$ = value of the indicator for the $i$th unit sampled from subpopulation $a$.

$\pi_i$ = inclusion probability for selecting the $i$th unit of subpopulation $a$.

$\pi_{ij}$ = joint inclusion probability for selecting both the $i$th and $j$th units of subpopulation $a$.

$w_i$ = size-weight value for the $i$th unit sampled from subpopulation $a$.

$\hat F_a(x_k)$ = estimated size-weighted CDF (total) for indicator value $x_k$ in subpopulation $a$.

4.2	Additional Components

$n_a$ = number of units sampled from subpopulation $a$.

$x_k$ = $k$th indicator level of interest.

$W_a$ = subpopulation size (size-weighted total), if known.
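Before applying either estimator, it can help to confirm that the inputs listed above are mutually consistent. The sketch below is a hypothetical helper (`validate_inputs` is an illustrative name, not part of this method); it assumes the joint inclusion probabilities are supplied as a full square nested list with `pi_joint[i][j]` equal to $\pi_{ij}$, and it checks only general properties of inclusion probabilities: $0 < \pi_i \le 1$, symmetry of $\pi_{ij}$, and $0 < \pi_{ij} \le \min(\pi_i, \pi_j)$. Strictly positive $\pi_{ij}$ are required because the Horvitz-Thompson variance formulas divide by them.

```python
import math


def validate_inputs(y, w, pi, pi_joint):
    """Hypothetical input check for the Horvitz-Thompson variance estimators.

    y, w and pi are sequences of length n_a; pi_joint is an n_a x n_a nested
    list with pi_joint[i][j] = pi_ij (diagonal entries are not checked).
    Raises ValueError on the first problem found.
    """
    n = len(y)
    if not (len(w) == len(pi) == len(pi_joint) == n):
        raise ValueError("y, w, pi and pi_joint must describe the same n_a units")
    for i in range(n):
        if not 0.0 < pi[i] <= 1.0:
            raise ValueError(f"pi[{i}] must lie in (0, 1]")
        if len(pi_joint[i]) != n:
            raise ValueError("pi_joint must be a square n_a x n_a table")
        for j in range(n):
            if i == j:
                continue
            p_ij = pi_joint[i][j]
            # The table must be symmetric: pi_ij = pi_ji
            if not math.isclose(p_ij, pi_joint[j][i]):
                raise ValueError(f"pi_joint is not symmetric at ({i}, {j})")
            # A pair cannot be selected more often than either member, and
            # pi_ij = 0 would make the variance formulas undefined
            if not 0.0 < p_ij <= min(pi[i], pi[j]):
                raise ValueError(f"pi_joint[{i}][{j}] must lie in (0, min(pi_i, pi_j)]")


# Usage: a two-unit sample that passes every check (no exception is raised)
validate_inputs([1.0, 2.0], [3.0, 4.0], [0.5, 0.5], [[0.5, 0.2], [0.2, 0.5]])
```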

5 Formulas and Definitions

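The estimated variance of the estimated size-weighted CDF (total) for indicator value $x_k$ in
subpopulation $a$, $\hat V[\hat F_a(x_k)]$, using the Horvitz-Thompson variance estimator of the
Horvitz-Thompson estimator of a CDF, is

$$\hat V[\hat F_a(x_k)] = \sum_{i=1}^{n_a} w_i^2\, I(y_i \le x_k)\, \frac{1-\pi_i}{\pi_i^2} + \sum_{i=1}^{n_a} \sum_{\substack{j=1 \\ j \ne i}}^{n_a} w_i w_j\, I(y_i \le x_k)\, I(y_j \le x_k)\, \frac{\pi_{ij} - \pi_i \pi_j}{\pi_{ij}\, \pi_i\, \pi_j}.$$

The estimated variance with known subpopulation size, $W_a$, using the Horvitz-Thompson variance
estimator of the Horvitz-Thompson ratio estimator of a CDF, is

$$\hat V[\hat F_a(x_k)] = \left(\frac{W_a}{\hat W_a}\right)^2 \left[\sum_{i=1}^{n_a} d_i^2\, \frac{1-\pi_i}{\pi_i^2} + \sum_{i=1}^{n_a} \sum_{\substack{j=1 \\ j \ne i}}^{n_a} d_i d_j\, \frac{\pi_{ij} - \pi_i \pi_j}{\pi_{ij}\, \pi_i\, \pi_j}\right],$$

where

$$\hat W_a = \sum_{i=1}^{n_a} \frac{w_i}{\pi_i} \qquad \text{and} \qquad d_j = w_j \left[ I(y_j \le x_k) - \frac{\hat F_a(x_k)}{W_a} \right].$$
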
For these equations:

$\hat F_a(x_k)$ = estimated size-weighted CDF (total) for indicator value $x_k$ in subpopulation $a$.

$I(y_i \le x_k)$ = 1 if $y_i \le x_k$; 0 otherwise.

$x_k$ = $k$th indicator level of interest.

$y_i$ = value of the indicator for the $i$th unit sampled from subpopulation $a$.

$\pi_i$ = inclusion probability for selecting the $i$th unit of subpopulation $a$.

$\pi_{ij}$ = joint inclusion probability for selecting both the $i$th and $j$th units of subpopulation $a$.

$w_i$ = size-weight value for the $i$th unit sampled from subpopulation $a$.

$n_a$ = number of units sampled from subpopulation $a$.
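The two estimators share one quadratic form and differ only in the per-unit values it is applied to ($w_i I(y_i \le x_k)$ for the Horvitz-Thompson estimator, $d_i$ for the ratio estimator) and in the $(W_a/\hat W_a)^2$ factor. The following is a minimal Python sketch of both, not a reference implementation: the function names are hypothetical, the joint inclusion probabilities are assumed to be supplied as a full square nested list `pi_joint` with `pi_joint[i][j]` equal to $\pi_{ij}$, and `f_hat` is the ratio-estimator CDF value $\hat F_a(x_k)$ supplied by the user.

```python
def ht_quadratic_form(values, pi, pi_joint):
    """Horvitz-Thompson variance form applied to per-unit values v_i:
    sum_i v_i^2 (1 - pi_i) / pi_i^2
      + sum_i sum_{j != i} v_i v_j (pi_ij - pi_i pi_j) / (pi_ij pi_i pi_j)
    """
    n = len(values)
    total = 0.0
    for i in range(n):
        total += values[i] ** 2 * (1.0 - pi[i]) / pi[i] ** 2
        for j in range(n):
            if j != i:
                p_ij = pi_joint[i][j]
                total += values[i] * values[j] * (p_ij - pi[i] * pi[j]) / (p_ij * pi[i] * pi[j])
    return total


def indicator(y_i, x_k):
    """I(y_i <= x_k): 1.0 if the unit's indicator value is at or below x_k."""
    return 1.0 if y_i <= x_k else 0.0


def var_ht_total(y, w, pi, pi_joint, x_k):
    """Variance estimator of the HT estimator of the size-weighted CDF (total)."""
    values = [w_i * indicator(y_i, x_k) for y_i, w_i in zip(y, w)]
    return ht_quadratic_form(values, pi, pi_joint)


def var_ratio_total(y, w, pi, pi_joint, x_k, f_hat, w_a):
    """Variance estimator of the HT ratio estimator with known size w_a."""
    w_hat = sum(w_i / pi_i for w_i, pi_i in zip(w, pi))  # estimated size W_hat_a
    d = [w_i * (indicator(y_i, x_k) - f_hat / w_a) for y_i, w_i in zip(y, w)]
    return (w_a / w_hat) ** 2 * ht_quadratic_form(d, pi, pi_joint)


# Usage on a toy sample: 2 units drawn by simple random sampling from 4
y, w, pi = [1.0, 3.0], [2.0, 4.0], [0.5, 0.5]
pi_joint = [[0.5, 1 / 6], [1 / 6, 0.5]]
print(var_ht_total(y, w, pi, pi_joint, x_k=1.0))  # 8.0 (only unit 1 counts)
print(var_ratio_total(y, w, pi, pi_joint, x_k=3.0, f_hat=12.0, w_a=12.0))  # 0.0
```

The toy sample ($\pi_i = 0.5$, $\pi_{12} = 1/6$) is chosen only so the results can be checked by hand; at the largest indicator value the ratio-estimator CDF equals $W_a$, every $d_i$ is zero, and the variance vanishes.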

6 Procedure

6.1 Enter Data

Input the sample data consisting of the indicator values, $y_i$, and their associated inclusion
probabilities, $\pi_i$, and size-weights, $w_i$. For example:

Calcium    Inclusion      Lake Area
($y_i$)    Probability    ($w_i$)
           ($\pi_i$)
1.5992     .07734          24.249
2.3707     .00375          92.251
1.5992     .75000          28.018
2.0000     .75000          52.953
7.0000     .00375         362.254
2.8196     .02227         140.671
1.2204     .01406           7.758
1.5992     .03750          29.702
2.9399     .00586         149.276
 .7395     .00375           1.081

6.2 Sort Data

Sort the sample data in nondecreasing order based on the $y_i$ indicator values. Keep all occurrences
of an indicator value to obtain correct results.

Calcium    Inclusion      Lake Area
($y_i$)    Probability    ($w_i$)
           ($\pi_i$)
 .7395     .00375           1.081
1.2204     .01406           7.758
1.5992     .07734          24.249
1.5992     .75000          28.018
1.5992     .03750          29.702
2.0000     .75000          52.953
2.3707     .00375          92.251
2.8196     .02227         140.671
2.9399     .00586         149.276
7.0000     .00375         362.254
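Steps 6.1 and 6.2 amount to loading (indicator, inclusion probability, size-weight) triples and sorting them by indicator value. A minimal Python sketch using the example data: Python's built-in `sorted` is stable, so the three lakes tied at calcium 1.5992 are all retained and keep their input order, which reproduces the sorted table above. The variable names are illustrative.

```python
# Example data from Section 6.1: (calcium y_i, inclusion probability pi_i, lake area w_i)
sample = [
    (1.5992, 0.07734, 24.249),
    (2.3707, 0.00375, 92.251),
    (1.5992, 0.75000, 28.018),
    (2.0000, 0.75000, 52.953),
    (7.0000, 0.00375, 362.254),
    (2.8196, 0.02227, 140.671),
    (1.2204, 0.01406, 7.758),
    (1.5992, 0.03750, 29.702),
    (2.9399, 0.00586, 149.276),
    (0.7395, 0.00375, 1.081),
]

# Sort in nondecreasing order of y_i; the stable sort keeps every tied value
sorted_sample = sorted(sample, key=lambda row: row[0])
y = [row[0] for row in sorted_sample]
pi = [row[1] for row in sorted_sample]
w = [row[2] for row in sorted_sample]

for y_i, pi_i, w_i in sorted_sample:
    print(f"{y_i:7.4f}  {pi_i:.5f}  {w_i:8.3f}")
```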


6.3 Compute or Input Joint Inclusion Probabilities

The required joint inclusion probabilities are given in the following table; units are numbered in the
sorted order of Section 6.2. For this example, they were computed with Overton's approximation
(see Section 9),

$$\pi_{ij} = \frac{2(n-1)\,\pi_i \pi_j}{2n - \pi_i - \pi_j},$$

with $n = n_a = 10$. Because $\pi_{ij} = \pi_{ji}$, only the lower triangle is shown.

Joint Inclusion Probability, $\pi_{ij}$

  i\j     1        2        3        4        5        6        7        8        9
   2   .000047
   3   .000262  .000983
   4   .002630  .009867  .054457
   5   .000127  .000476  .002625  .026350
   6   .002630  .009867  .054457  .547297  .026350
   7   .000013  .000047  .000262  .002630  .000127  .002630
   8   .000075  .000282  .001558  .015636  .000754  .015636  .000075
   9   .000020  .000074  .000410  .004111  .000198  .004111  .000020  .000118
  10   .000013  .000047  .000262  .002630  .000127  .002630  .000013  .000075  .000020
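Overton's approximation is simple to evaluate for every pair of units. The sketch below assumes the inclusion probabilities are already in the sorted order of Section 6.2, takes $n$ as the number of sampled units, and uses the illustrative name `overton_joint`. Filling the diagonal with $\pi_i$ is a convention chosen here; the variance formulas use only the off-diagonal entries.

```python
def overton_joint(pi):
    """Overton's approximation: pi_ij = 2(n-1) pi_i pi_j / (2n - pi_i - pi_j).

    Returns a full n x n nested list; the diagonal holds pi_i.
    """
    n = len(pi)
    return [
        [
            pi[i] if i == j
            else 2 * (n - 1) * pi[i] * pi[j] / (2 * n - pi[i] - pi[j])
            for j in range(n)
        ]
        for i in range(n)
    ]


# Inclusion probabilities in the sorted order of Section 6.2
pi = [0.00375, 0.01406, 0.07734, 0.75, 0.0375, 0.75, 0.00375, 0.02227, 0.00586, 0.00375]
joint = overton_joint(pi)

# Print the lower triangle with 1-based labels (rows i = 2..10, columns j = 1..i-1)
for i in range(1, len(pi)):
    print(f"{i + 1:3d}", "  ".join(f"{joint[i][j]:.6f}" for j in range(i)))
```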

6.4 Obtain Subpopulation Size

Input $W_a$ if using a known subpopulation size. $W_a$ = 156000 for this dataset.

Calculate $\hat W_a$ from the sample data only if using the variance estimator of the Horvitz-Thompson
ratio estimator of a CDF. Divide each $w_i$ by its inclusion probability, $\pi_i$, for all units in sample
$a$, and sum these quantities to obtain $\hat W_a$:

$$\hat W_a = \frac{1.081}{.00375} + \frac{7.758}{.01406} + \frac{24.249}{.07734} + \cdots + \frac{362.254}{.00375} = 155045.265$$

for this dataset.
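The estimated subpopulation size is the Horvitz-Thompson estimate of the total size-weight, so it can be checked with a one-line sum. A minimal Python sketch using the example data (the order of the units does not affect the sum; variable names are illustrative):

```python
# Lake areas w_i and inclusion probabilities pi_i from the example (sorted order)
w = [1.081, 7.758, 24.249, 28.018, 29.702, 52.953, 92.251, 140.671, 149.276, 362.254]
pi = [0.00375, 0.01406, 0.07734, 0.75, 0.0375, 0.75, 0.00375, 0.02227, 0.00586, 0.00375]

# W_hat_a = sum_i w_i / pi_i
w_hat = sum(w_i / pi_i for w_i, pi_i in zip(w, pi))
print(f"{w_hat:.3f}")
```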


6.5 Input Indicator Levels of Interest and Estimated CDF Values

For this example, the variance is of interest at each distinct observed indicator value (the jump
points of the empirical CDF), so the $x_k$ values are .7395, 1.2204, 1.5992, 2, 2.3707, 2.8196, 2.9399, and 7.

Input $\hat F_a(x_k)$ for each $x_k$ if the Horvitz-Thompson ratio estimator was used to estimate the CDF.

Calcium    Size-Weighted CDF for Total,
($x_k$)    Ratio Estimator, $\hat F_a(x_k)$
 .7395          290
1.2204          845
1.5992         1995
2.0000         2066
2.3707        26818
2.8196        33174
2.9399        58804
7.0000       156000
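The tabulated values are consistent with a ratio estimator of the form $\hat F_a(x_k) = W_a \left[\sum_i w_i I(y_i \le x_k)/\pi_i\right] / \hat W_a$: the Horvitz-Thompson estimate of the size-weighted total at or below $x_k$, rescaled so that the CDF reaches the known $W_a$ at the largest indicator value. That form is an assumption made here for illustration (the CDF estimators themselves are documented in the associated method); the sketch below reproduces the table to within rounding and keeps the unrounded values for the variance step. The function name `ratio_cdf_total` is hypothetical.

```python
# Sorted example data (Section 6.2) and known subpopulation size (Section 6.4)
y = [0.7395, 1.2204, 1.5992, 1.5992, 1.5992, 2.0, 2.3707, 2.8196, 2.9399, 7.0]
pi = [0.00375, 0.01406, 0.07734, 0.75, 0.0375, 0.75, 0.00375, 0.02227, 0.00586, 0.00375]
w = [1.081, 7.758, 24.249, 28.018, 29.702, 52.953, 92.251, 140.671, 149.276, 362.254]
W_A = 156000.0


def ratio_cdf_total(y, w, pi, x_k, w_a):
    """Assumed ratio form: w_a * (HT total of w_i I(y_i <= x_k)) / W_hat_a."""
    w_hat = sum(w_i / pi_i for w_i, pi_i in zip(w, pi))
    below = sum(w_i / pi_i for y_i, w_i, pi_i in zip(y, w, pi) if y_i <= x_k)
    return w_a * below / w_hat


x_levels = sorted(set(y))  # distinct observed values: the empirical CDF jump points
f_hat = [ratio_cdf_total(y, w, pi, x_k, W_A) for x_k in x_levels]
for x_k, f in zip(x_levels, f_hat):
    print(f"{x_k:7.4f}  {f:10.0f}")
```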

6.6 Compute Estimated Variance Values

Calculate $\hat V[\hat F_a(x_k)]$ for each $x_k$ using the formulas from Section 5.

Compare each $y_i$ to $x_k$. Set $I(y_i \le x_k) = 1$ if $y_i \le x_k$, and 0 otherwise.
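Putting the steps together, the sketch below recomputes both variance columns of the output table for the example data. It is a self-contained illustration under stated assumptions: joint inclusion probabilities come from Overton's approximation, the ratio-estimator CDF is taken in the form $W_a \left[\sum_i w_i I(y_i \le x_k)/\pi_i\right] / \hat W_a$ and kept unrounded, and all function names are hypothetical. Variances are divided by 100,000 to match the tabulated units.

```python
# Sorted example data (Section 6.2) and known subpopulation size (Section 6.4)
y = [0.7395, 1.2204, 1.5992, 1.5992, 1.5992, 2.0, 2.3707, 2.8196, 2.9399, 7.0]
pi = [0.00375, 0.01406, 0.07734, 0.75, 0.0375, 0.75, 0.00375, 0.02227, 0.00586, 0.00375]
w = [1.081, 7.758, 24.249, 28.018, 29.702, 52.953, 92.251, 140.671, 149.276, 362.254]
W_A = 156000.0


def overton_joint(pi):
    """Joint inclusion probabilities from Overton's approximation (diagonal = pi_i)."""
    n = len(pi)
    return [[pi[i] if i == j else 2 * (n - 1) * pi[i] * pi[j] / (2 * n - pi[i] - pi[j])
             for j in range(n)] for i in range(n)]


def ht_quadratic_form(v, pi, pj):
    """sum_i v_i^2 (1-pi_i)/pi_i^2 + sum_{i != j} v_i v_j (pi_ij - pi_i pi_j)/(pi_ij pi_i pi_j)."""
    n = len(v)
    total = 0.0
    for i in range(n):
        total += v[i] ** 2 * (1.0 - pi[i]) / pi[i] ** 2
        for j in range(n):
            if j != i:
                total += v[i] * v[j] * (pj[i][j] - pi[i] * pi[j]) / (pj[i][j] * pi[i] * pi[j])
    return total


def variance_table(y, w, pi, w_a):
    """Return (x_k, ratio-estimator variance, HT-estimator variance) for each distinct y."""
    n = len(y)
    pj = overton_joint(pi)
    w_hat = sum(w[i] / pi[i] for i in range(n))
    rows = []
    for x_k in sorted(set(y)):
        ind = [1.0 if y_i <= x_k else 0.0 for y_i in y]  # I(y_i <= x_k)
        f_hat = w_a * sum(w[i] * ind[i] / pi[i] for i in range(n)) / w_hat
        d = [w[i] * (ind[i] - f_hat / w_a) for i in range(n)]
        v_ratio = (w_a / w_hat) ** 2 * ht_quadratic_form(d, pi, pj)
        v_ht = ht_quadratic_form([w[i] * ind[i] for i in range(n)], pi, pj)
        rows.append((x_k, v_ratio, v_ht))
    return rows


for x_k, v_ratio, v_ht in variance_table(y, w, pi, W_A):
    print(f"{x_k:7.4f}  {v_ratio / 1e5:12.5f}  {v_ht / 1e5:12.5f}")
```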
6.7 Output Results

Output the indicator levels of interest and at least the associated estimated variance, $\hat V[\hat F_a(x_k)]$.
In the table below, variances are shown in units of 100,000; multiply by 100,000 to obtain the
estimated variance.

          Estimated Variance of    Estimated Variance of
          Size-Weighted CDF        Size-Weighted CDF
          for Total,               for Total,
Calcium   Ratio Estimator,         Horvitz-Thompson
($x_k$)   $W_a$ = 156000           Estimator
          (x 100,000)              (x 100,000)
 .7395        1.33932                   .82786
1.2204        7.75618                  3.47933
1.5992       31.28985                  7.80689
2.0000       32.70403                  7.63202
2.3707     7976.63932               5928.54471
2.8196     9540.63204               5950.33953
2.9399    20483.46898              10551.14618
7.0000        0                     91052.68363

7	Associated Methods

An appropriate estimator for the estimated CDF may be found in Method 4 (Horvitz-Thompson
Estimator).

8	Validation Data

Actual data with results, EMAP Design and Statistics Dataset #8, are available for comparing
results from other versions of these algorithms.

9	Notes

Inclusion probabilities, $\pi_i$, and joint inclusion probabilities, $\pi_{ij}$, are determined by the design and
should be furnished with the design points. In some instances, the joint inclusion probabilities may
be calculated from a formula such as Overton's approximation,

$$\pi_{ij} = \frac{2(n-1)\,\pi_i \pi_j}{2n - \pi_i - \pi_j},$$

which is used in Section 6.3.


10 References

Cochran, W. G. 1977. Sampling techniques. 3rd ed. New York: John Wiley & Sons.

Lesser, V. M., and W. S. Overton. 1994. EMAP status estimation: Statistical procedures and
algorithms. EPA/620/R-94/008. Washington, DC: U.S. Environmental Protection Agency.

Overton, W. S., D. White, and D. L. Stevens Jr. 1990. Design report for EMAP, Environmental
Monitoring and Assessment Program. EPA 600/3-91/053. Corvallis, OR: U.S. Environmental
Protection Agency, Environmental Research Laboratory.

Särndal, C. E., B. Swensson, and J. Wretman. 1992. Model assisted survey sampling. New York:
Springer-Verlag.

