CORRECTING FOR SHARED MEASUREMENT ERROR
IN COMPLEX DOSIMETRY SYSTEMS
by
Terri Kang Johnson
------------------------------------------------------------------------------------------------------
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(BIOSTATISTICS)
May 2007
Copyright 2007 Terri Kang Johnson
Dedication
In loving memory of my late grandfather
Acknowledgements
I would like to thank my committee members: Drs. Daniel Stram, Duncan
Thomas, Bryan Langholz, Frank Gilliland, and Simon Tavaré. This work would not
have been fruitful without their patience and support. I also would like to give
special thanks to Drs. Ann Hamilton and Pete Kraft for being my mentors and good
friends all these years.
Table of Contents
Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
1.1 Foreword
1.2 Terminology and Notation
1.3 Measurement Error Components
1.4 Classical vs. Berkson Error
1.5 Assessment of Measurement Error Distribution
1.6 Effects of Exposure Measurement Errors
1.6.1 Effects on Mean Structures
1.6.2 Effect on Variance Structures
1.6.3 Effect on Associations and Interactions between Covariates and Outcome
1.7 Existing Methods of Correcting for Measurement Errors
1.7.1 Two Stage Approach (regression substitution)
1.7.2 Two Stage Approach (regression substitution): Correction for Standard Error and Confidence Interval
1.7.3 Maximum Likelihood Method
1.8 Correcting for Measurement Error in Complex Dosimetry System with Shared Dosimetry Error
1.8.1 Multiple Imputation
1.8.2 Monte-Carlo Maximum Likelihood
1.8.3 Fully Parametric Bootstrap Method
Chapter 2: Colorado Plateau Uranium Miners and Oak Ridge National Laboratory Cohorts
2.1 Colorado Plateau Uranium Miners
2.1.1 Work History Data
2.1.2 Exposure Data
2.1.3 Modification of Data in Previous Study (Stram et al., 1999)
2.1.4 Guesstimates
2.1.5 Publicly Available Data
2.1.6 Description of Subjects and Data
2.2 Oak Ridge National Laboratory Workers
2.2.1 Data Collection, Policies, and Practices
2.2.2 Dose Adjustments
2.2.3 Complex Dosimetry System
2.2.4 Description of Subjects and Data
Chapter 3: Update of Single Imputation for Colorado Plateau Uranium Miners
3.1 Correction For Measurement Error
3.1.1 The Model for True Dose and Imputation
3.1.2 Mine-Year Sampling Error
3.2 The Two-Stage Clonal Expansion (TSCE) Model
3.2.1 Biological Rationale
3.2.2 Mathematical Formulation
3.3 Complex Dosimetry System and Shared Errors
3.4 Results
3.4.1 Single Imputation of Dose
3.4.2 The Two-Stage Clonal Expansion Model
3.4.3 Complex Dosimetry System and Shared Errors
3.5 Discussion
Chapter 4: Simulation Experiment: MCML and Fully Parametric Bootstrap Methods
4.1 Methods
4.2 Results
4.3 Discussion
Chapter 5: Applications: MCML and Fully Parametric Bootstrap Methods
5.1 Fully Parametric Bootstrap Method on the ORNL Data
5.2 Validation of the MCML Method
5.3 MCML Method on the Colorado Plateau Uranium Miners Data
5.3.1 Total Dose Risk on the Colorado Plateau Uranium Miners Data
5.3.2 Total Dose and Dose-rate Risks on the Colorado Plateau Uranium Miners Data
Chapter 6: Conclusions and Future Research
References
Appendix A
List of Tables
Table 1: Frequency Distribution of Mine-Year Exposure Dose Estimates by Year and State
Table 2: Bias Factors and Uncertainties in the ORNL Data
Table 3: Results of Fitting Multilevel Random Slope and Intercept Model by Maximum Likelihood (Stram et al. 1999)
Table 4: Mean Coefficient of Variation Comparison with Mine-Year Measurement Data with and without Guesstimates in 1950
Table 5: Comparison of Maximum Likelihood Estimates of the Model Parameters (lag time = 9 years)
Table 6: Distributions of between-Miner Covariances and Products of Means
Table 7: Simple Linear Regression (OLS) of Covariance on Product of Means
Table 8: Distributions of between-Miner Variances and Squares of Means
Table 9: Simple Linear Regression (OLS) of Variances and Squares of Means
Table 10: Summary of Beta Estimate from the Fully Parametric Bootstrap Method over 500 Simulations
Table 11: Summary of MCML Validity Test
Table 12: Dose-rate Parameter Estimate Comparisons
Table 13: Estimated Expectation and Variance of Difference between log of Observed Measurements and log of True Measurements over the Number of Spot Visits Made Per Year from Simulation Test
List of Figures
Figure 1: Frequency Distribution of Mine-Year Exposure Dose Estimates by Year
Figure 2: Frequency Distribution of Mine-Year Exposure Dose Estimates by Year and State
Figure 3: Cumulative Frequency Distribution of First Exposure to Radon by Year
Figure 4: Distribution of Miners Exposed to Radiation in Underground Mines by Month
Figure 5: Flow Chart of Previous Studies Done on the ORNL Data
Figure 6: Pathway for a Malignant Cell Described by TSCE Model
Figure 7: Conditional Mean Expected Radiation Exposure Dose from 1950 to 1969 by Month ($E(X \mid Z)$)
Figure 8: Mean Coefficient of Variation on Radiation Exposure Dose from 1950 to 1969 by Month
Figure 9: Comparison of Total Expected Dose between New Estimates and the Recomputed PHS Estimates during 1950 to 1969
Figure 10: Mean Total Expected Dose by Age and Study
Figure 11: Mean Coefficient of Variation Comparison with Mine-Year Measurement Data with and without Guesstimates by Month
Figure 12: Lifetime EAR per WLM at Age 70 as a Function of the Duration of Exposure for Total Radon Exposure of 500 WLM
Figure 13: Lifetime EAR per WLM at Age 70 as a Function of the Duration of Exposure for Total Radon Exposure of 500 WLM with Classical Error
Figure 14: Increased Lifetime EAR per WLM at Age 70 as a Function of the Duration of Exposure for Total Radon Exposure of 500 WLM with Classical Error
Figure 15: Scatter Plot of Covariance by Product of Means
Figure 16: Scatter Plot of Variance by Square of Means
Figure 17: Comparison of the Effect of Multiplicative Error ($\sigma^2_{SA} = \sigma^2_{Ai} = 0.0025$)
(a). No shared multiplicative error
(b). Variance of shared multiplicative error equal to 0.0025 ($\sigma^2_{SM} = 0.0025$)
(c). Variance of shared multiplicative error equal to 0.25 ($\sigma^2_{SM} = 0.25$)
Figure 18: Comparison of the MCML Confidence Interval to the Distribution of the Fully Parametric Bootstrap $\hat\beta_r$ when $\sigma^2_{SM} = 0.0025$
(a). Plot of likelihoods from the MCML method
(b). Distribution of $\hat\beta_r$ from the fully parametric bootstrap method
Figure 19: Comparison of the MCML confidence interval to the distribution of the fully parametric bootstrap $\hat\beta_r$ when $\sigma^2_{SM} = 0.25$
(a). Plot of likelihoods from the MCML method
(b). Distribution of $\hat\beta_r$ from the fully parametric bootstrap method
Figure 20: Effect of Unshared Multiplicative Error ($\sigma^2_{SM} = \sigma^2_{SA} = \sigma^2_{Ai} = 0.0025$; $\sigma^2_{Mi} = 0.01$)
Figure 21: Log-likelihood Plots with and without the Measurement Error
Figure 22: Plot of Profile Likelihoods and Average Likelihood using the MCML Method
(a). Using First 225 Replications
(b). Using Top 1% Maximum Likelihoods (enlarged)
(c). Using Selective Replications (excludes 100 top 10% ranked likelihoods)
Figure 23: Plots of Maximum Log-likelihoods with various Dose-rate Effects using Doses from Conditional Expectation
Abstract
In occupational cohort studies, a panel of experts often creates an exposure
matrix or a dosimetry system that estimates dose histories for workers, and then
these estimates are used in disease-risk analysis. Errors in the exposure matrix that
are shared over time and/or among groups of workers have generally been ignored. We have
developed and tested two different methods (Monte-Carlo maximum likelihood (MCML) and
fully parametric bootstrap methods) to study the effect of shared uncertainties. A
simple simulation experiment showed that the MCML method agreed with the uncorrected
likelihood ratio test for small additive and small shared multiplicative error
distributions. Clear widening of confidence intervals was seen from the MCML and
the fully parametric bootstrap methods as the shared multiplicative error increased.
Although the confidence intervals widened for both methods under the large error
model, the confidence limits from the two methods disagreed. Hence, a validation analysis
was conducted using the Oak Ridge National Laboratory cohort data. We performed
multiple runs of the MCML method on new outcome data created by running the
fully parametric bootstrap method, and found that the MCML method was quite a
feasible way to correct for shared uncertainties. However, the results from the
MCML method applied to the Colorado Uranium Miners data showed that additional
work may be necessary. The results suggest that 1,000 replications created from the
complex dosimetry system of the Colorado Uranium Miners may not be sufficient to
fully capture the variability of uncertainties. In addition, true dose may need to be
sampled from a distribution conditional on both disease and input data when there
is strong reason to suspect that a dose-response relationship exists.
These comparisons between Stram and Kopecky’s SUMA method, the
MCML, and the fully parametric bootstrap method will give guidance to future use
of “complex dosimetry systems.”
Chapter 1. Introduction
1.1 Foreword
Measurement error is a common problem in assessing exposure estimates and
a common concern in epidemiologic studies. The ascertainment of input data and/or
the physical model used to estimate the individual dose may be subject to
measurement error (Stayner et al., 1999). These errors in input data can
occur due to recall bias, biological variability and laboratory errors, or inappropriate
statistical modeling (Thomas et al., 1993; Hatch and Thomas, 1993). When one
or more of these random measurement errors occur, risk estimates may be
attenuated, especially when the relative variation (the variability of
measurement error relative to the variability of true measurement; Armstrong, 1998) is large.
Furthermore, when exposure is extended over time, measurement error can lead to
such phenomena as spurious inverse dose-rate effects (Stram et al., 1999).
Measurement errors can happen in any of three types of variables (exposure,
outcome, and confounder) during their ascertainment (Armstrong, 1998).
Epidemiologic studies assess the effect of exposure on the outcome, such as smoking
on lung cancer, and try to adjust for potential confounders. In particular, radiation
epidemiology has posed many interesting problems in understanding the effects of
measurement error: attenuation of dose response in all studies, residual
confounding between early and late effects of radiation in the Atomic Bomb Study
(Sposto et al., 1991; Neriishi et al., 1991), inverse dose-rate effects in the miners
(Jablon, 1971; Gilbert, 1984), and the Cohen hypothesis (Cohen, 1995; Puskin,
2003) that low doses of radiation are protective. Many studies that assessed the
effect of radon and smoking on lung cancer used lifetime total exposure dose
estimates for their analyses (Lundin et al., 1971; Whittemore and McMillan, 1983;
Hornung and Meinhardt, 1987; Roscoe et al., 1989). Those studies were followed by
further consideration of individual temporal patterns of exposure (Lundin et al., 1971;
Hornung and Meinhardt, 1987; Thomas et al., 1994; Hornung et al., 1998; Luebeck
et al., 1999). However, all these studies accepted the individual exposure estimates
as the truth, and distortions due to measurement error have rarely been considered
carefully.
Many of these studies in radiation epidemiology involve regression-type
response models for binomial, Poisson, and survival data (Preston and Pierce, 1988;
Stram and Mizuno, 1989; Stram et al., 1990; Stram and Sposto, 1991; Sposto et al.,
1992; Stram et al., 1993; Preston et al., 2000; Preston et al., 2003). In studies of
radiation and cancer, the expected response is typically modeled as linear or
quadratic. However, the radiation dose-estimation errors are often more
homogeneous on a multiplicative scale rather than on an additive scale, and the
distribution of true doses is extremely non-normal (Pierce et al., 1990).
This research emphasizes understanding the effect of
measurement error in an occupational setting in which the dosimetry system was
based on a hierarchical model. We adapt three different approaches to better
understand the implications of assuming a realistic model for dosimetry error in this
system.
1. The shared/unshared multiplicative/additive (SUMA) error method of
Stram and Kopecky (2003).
2. Monte-Carlo Maximum Likelihood (MCML).
3. Fully parametric bootstrap.
We compare and contrast these approaches in a simulation experiment that captures
many of the features in an important and special example.
This dissertation describes in detail what has been done previously for the
miners and makes an additional correction that includes the use of a single
imputation model in fitting a TSCE model. In addition, the data corrected
with a single imputation were made publicly available.
It also describes a “complex dosimetry system,” a program that can be used
to simulate repeated dose histories for the entire Colorado Plateau Uranium Miners
cohort. This consists of a computer program that computes multivariate realizations
from the conditional distribution of the miners’ exposure histories given all the
measurement data. This computer program involves
A. Sampling random realizations from the distribution of the log of true
dose given log measured dose. This distribution is conditional
multivariate normal with means and variances described below, and it
has already been computed to give the “single imputation” of
dose (see also Chapter 3).
B. For each random realization from the full distribution of true dose
given above, exponentiating the realizations and relinking the miners’
histories to form a full multivariate random imputation of histories
given all measurement data.
As described in the paper by Stram and Kopecky (2003), this system was designed to
be used as a “black box,” i.e. it is assumed that the statistician knows nothing about
the way the dosimetry system is constructed but can run it as many times as
necessary to characterize the effects of uncertainty in dose on the statistical
estimation of the disease response relationship.
The computations are similar to those that allowed computation of the
conditional expectations and conditional variances of the log
dose histories, as will be described later. We use standard methods (e.g., Cholesky
decomposition of the variance-covariance matrix) to generate repeated realizations of
true dose rate from their conditional normal distribution. The district-level
matrices need to be computed only once; although they are large, repeated realizations are
obtained simply by multiplying the Cholesky square root of the variance-covariance
matrix by a vector of independent standard normal random variables and adding
the conditional means.
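The sampling step just described can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the dosimetry program itself: the three-element conditional mean vector and covariance matrix are hypothetical stand-ins for the district-level quantities, and the names `cholesky` and `sample_dose_history` are illustrative. The Cholesky factor is computed once and reused for every realization, which is what makes repeated sampling cheap.

```python
import math
import random


def cholesky(a):
    """Return lower-triangular L with L L^T == a (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def sample_dose_history(cond_mean, chol, rng):
    """One realization: log dose = conditional mean + L u with u ~ iid N(0, 1),
    then exponentiate to return to the dose scale."""
    n = len(cond_mean)
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    log_dose = [cond_mean[i] + sum(chol[i][k] * u[k] for k in range(i + 1))
                for i in range(n)]
    return [math.exp(v) for v in log_dose]


# Hypothetical conditional moments of log dose for three mine-years.
cond_mean = [0.5, 0.2, -0.1]
cond_cov = [[0.40, 0.30, 0.20],
            [0.30, 0.50, 0.25],
            [0.20, 0.25, 0.45]]
chol = cholesky(cond_cov)  # computed once, reused for every realization
rng = random.Random(2007)
replications = [sample_dose_history(cond_mean, chol, rng) for _ in range(1000)]
```

Each element of `replications` plays the role of one multivariate imputation of a dose history; relinking realizations to individual miners' work histories is not shown.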
Using 1,000 sets of randomly simulated realizations, a simple analysis of the
effect of shared uncertainties is conducted. In Chapter 3, we describe the process of
estimating both shared and unshared multiplicative and additive error components
using Stram and Kopecky’s SUMA method (2003). Because this simple SUMA model,
as implemented, fit poorly, the need for a different method to evaluate the shared
uncertainties was foreseen.
Therefore, a simple simulation experiment is described to contrast the results
from Stram and Kopecky’s SUMA method (2003) with those from the MCML
method and the fully parametric bootstrap method (Chapter 4). First, for the MCML
method, an initial β is estimated with a simple model in which the hazard consists
only of exposure dose and age, i.e., $\lambda(t) = \lambda_0(t)\,[1 + \beta X(t)]$, where $X(t)$ is the
expected cumulative exposure to radon by age t. A profile log-likelihood based on
$\log\{S(t)\,\lambda(t)^{D}\}$, where D is the actual disease status of a miner and S(t) is the
survival function, is computed with each realization of dose over a chosen grid of possible
values of β. At each grid point of β, profile likelihoods (maximizing out the parameters in
the baseline hazard) are calculated for each realization and averaged over the realizations.
Then $\beta_{MLE}$ is the grid point that yields the maximum of these averaged likelihoods,
and the 90% confidence limits are the values of β where
$$2\left[\ln L(\beta_{MLE}) - \ln L(\beta)\right] = 2.706,$$
the 0.90 quantile of the $\chi^2_1$ distribution ($\chi^2_{1,0.90}$).
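A minimal sketch of the MCML grid step follows, assuming the profile log-likelihoods have already been computed for each dose realization at each grid point (fitting the hazard model itself is not shown). The function `mcml_grid` and the quadratic example log-likelihood are hypothetical. Likelihoods are averaged on the natural scale through a log-mean-exp to avoid underflow, and the interval is read off as the grid points satisfying the 2.706 cutoff, which assumes the averaged log-likelihood is unimodal over the grid.

```python
import math


def log_mean_exp(values):
    """log of the arithmetic mean of exp(values), computed stably."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def mcml_grid(loglik, grid, cutoff=2.706):
    """loglik[r][g] is the profile log-likelihood for dose realization r at beta = grid[g].

    Likelihoods (not log-likelihoods) are averaged over realizations. The estimate is
    the grid point maximizing the average; the interval keeps grid points with
    2 * [ln L(beta_MLE) - ln L(beta)] <= cutoff (2.706 = 90% point of chi-square, 1 df).
    """
    avg = [log_mean_exp([row[g] for row in loglik]) for g in range(len(grid))]
    g_hat = max(range(len(grid)), key=avg.__getitem__)
    inside = [b for b, a in zip(grid, avg) if 2.0 * (avg[g_hat] - a) <= cutoff]
    return grid[g_hat], min(inside), max(inside)


grid = [g / 100 for g in range(201)]              # beta from 0 to 2
loglik = [[-2.0 * (b - 1.0) ** 2 for b in grid]]  # one hypothetical realization
beta_mle, lower, upper = mcml_grid(loglik, grid)
```

With this single hypothetical realization the estimate is 1.0 and the 90% limits fall at the grid points 0.18 and 1.82; with many realizations, each row of `loglik` would come from refitting the model with one replicated dose history.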
Second, the fully parametric bootstrap analysis also simulates D from its conditional
distribution:
$$D \sim \text{Binomial}(N = 1,\ 1 - p),$$
where
$$p = P(T > w) = \exp\left\{-\int_{w_0}^{w} \lambda_0(t)\,[1 + \beta X(t)]\,dt\right\},$$
t is age, $w_0$ is the age at entry, and w is the age at failure or censoring for a
miner, so that $1 - p$ is the probability of disease by age w. For the purpose of generating
data, follow-up is partitioned into one-year intervals with endpoints
$w_0, w_1, w_2, \ldots, w_W$; the probability of surviving until time $w_k$ is then
$$p = P(T > w_k) = P(T > w_1 \mid T > w_0)\,P(T > w_2 \mid T > w_1)\cdots P(T > w_k \mid T > w_{k-1}).$$
Then $D_i$ is regressed on observed doses to re-estimate $\beta_r$. Note that each simulation
uses different true doses $X_i$ to generate different outcomes $D_i$, but these different
outcomes $D_i$ are analyzed using the same observed doses $Z_i$ as the independent
variable in the regression. The effect of shared dosimetry errors on the distribution
of the estimates of β is examined by comparing the empirical variance of $\hat\beta_r$ to the
variance from a standard analysis ignoring dosimetry errors.
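The outcome-generation step can be sketched as below, assuming the hazard is constant within each one-year interval, so that each conditional survival term is $\exp\{-\lambda_0(1+\beta X)\}$. The baseline hazards and cumulative doses in the example are hypothetical, and the function names are illustrative. Because D indicates disease, it is drawn as 1 with probability one minus the survival probability p.

```python
import math
import random


def survival_probability(baseline_hazard, cum_dose, beta):
    """P(T > w_k) as a product of one-year conditional survival probabilities.

    baseline_hazard[k] and cum_dose[k] are lambda_0 and X for the k-th one-year
    interval (assumed constant within the interval), so
    P(T > w_k | T > w_{k-1}) = exp(-lambda_0 * (1 + beta * X) * 1 year).
    """
    p = 1.0
    for lam0, x in zip(baseline_hazard, cum_dose):
        p *= math.exp(-lam0 * (1.0 + beta * x))
    return p


def simulate_outcome(baseline_hazard, cum_dose, beta, rng):
    """Draw D ~ Binomial(1, 1 - p), where D = 1 means disease before the end of follow-up."""
    p = survival_probability(baseline_hazard, cum_dose, beta)
    return 1 if rng.random() < 1.0 - p else 0


rng = random.Random(1)
lam0 = [0.002] * 10                          # hypothetical baseline hazard per year
true_dose = [0.1 * k for k in range(1, 11)]  # one hypothetical realization of X(t)
d = simulate_outcome(lam0, true_dose, beta=0.5, rng=rng)
```

In a bootstrap replicate, `simulate_outcome` would be called for every miner with a fresh realization of true dose X, while the refit of β uses the fixed observed doses Z; that refit depends on the risk model and is not shown.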
The purpose of this simulation is to examine the accuracy and computational
feasibility of the MCML method in a setting where a complex dosimetry system has
been used and where considerable “sharing” of dose errors takes place (due to the
extensive amount of interpolation used). The comparisons between Stram and
Kopecky’s SUMA method, the MCML and the fully parametric bootstrap method
will give guidance to future use of complex dosimetry systems.
Error in assessing dose-response relationships under workplace
exposure conditions is not restricted to the Colorado Plateau Uranium Miners cohort;
hence, a different data set was also analyzed. The Nuclear Workers Study is an
international collaborative study of cancer risk among workers in the nuclear
industry to estimate the cancer risk following extended low-dose exposure to
ionizing radiation (Cardis et al., 1995; Thierry-Chef et al., 2002). These publicly
available data allow us to test the proposed methods in one participating cohort, the
Oak Ridge Nuclear workers. These nuclear workers were individually monitored
using film badges. However, the badges used were changed over time, and
uncertainties continue to exist in the performance of each type of monitoring
(Thierry-Chef et al., 2002). These monitor-specific uncertainties are again shared
errors, for which a lognormal model has already been developed (Thierry-Chef et al.,
2005). A paper, already in development, compared an MCML approach to results
obtained from ignoring shared uncertainties using the Oak Ridge Nuclear cohort data
(Stayner et al., in press). We compare this MCML result with a fully parametric
bootstrap approach since the estimation of a one-parameter excess relative risk
model for cancer risk similar to that described for the Colorado miners is of primary
importance in the Oak Ridge Nuclear cohort (Chapter 5).
However, the MCML result did not fully agree with the fully parametric
bootstrap approach on the impact of shared errors in the ORNL data; therefore,
further examination of the MCML confidence interval was necessary. To examine
the true confidence interval coverage of β from the MCML method, multiple runs of
MCML were executed (Chapter 5): an independent run of MCML was executed for
each set of new outcomes $D_r$ generated by the fully parametric
bootstrap. The validity of the MCML confidence interval was examined by
counting the fraction of times in the simulation that the MCML confidence interval
contained the true value of β.
Finally, we apply the MCML method to multiple realizations already created
from the complex dosimetry system of the Colorado Plateau Uranium Miners Cohort
in a cohort setting, instead of in the case-control setting used for the ORNL data
(Chapter 5). We utilize a disease-risk model that contains both total dose and dose-rate
effects. The MCML method then maximizes over both parameters simultaneously using a
two-dimensional grid, $\beta = (\beta_{\text{total dose}}, \beta_{\text{dose-rate}})$.
In the remainder of this chapter, we describe the necessary terminology and
notation along with an introduction to various methods to correct for measurement
error. Chapter 2 describes the background and data of the Colorado Plateau Uranium
Miners Cohort and the Oak Ridge National Laboratory (ORNL) Cohort. Chapter 3
updates the single imputation of Colorado Plateau Uranium Miner data. It also
describes the way that 1,000 replications of doses were formed for later analyses.
Chapter 4 describes the simulation experiment based on the ORNL cohort. Finally,
Chapter 5 describes applications of the MCML and the fully parametric bootstrap
methods to the Colorado Plateau Uranium Miners Cohort and the ORNL Cohort
data.
1.2 Terminology and Notation
The term “measurement error” refers to any discrepancy between the true value x
and its measured value z. Errors are classified as either systematic or random. An
error is systematic if an exposure is consistently overestimated or underestimated by a
fixed or proportional magnitude. An example of a systematic measurement error
would be a weight scale at a health club that underestimates everyone’s weight by 3 pounds.
In many settings, measurement error is typically assumed to be randomly
distributed around the true exposure, with some values overestimated and some
underestimated. The effects of systematic errors are easy to predict, whereas random
error poses a much more difficult problem.
Both systematic and random error can be further classified as differential or
nondifferential. Differential error depends on the outcome, whereas
nondifferential error does not. For example, the error is differential if a case is more
“honest” and gives a more detailed response than a control (e.g., a mother of a child
with leukemia would be more likely to admit the use of marijuana during her
pregnancy; Bhatia and Neglia, 1995). The error is nondifferential if
$P(z \mid x, y) = P(z \mid x)$, i.e., there is no recall bias, which implies
$P(y \mid x, z) = P(y \mid x)$. In this dissertation, we deal only with nondifferential
random error.
1.3 Measurement Error Components
There are two general approaches to the measurement error problem: the
“structural” and the “functional” approach. In the “structural” approach, the
unknown exposure x for each individual is regarded as a random
variable with a distribution $P(x)$ in the population of interest. The structural
measurement error problem consists of three modeling components: the disease
model $P(y \mid x)$, the measurement model $P(z \mid x)$, and the distribution of true
exposure $P(x)$ (Prentice, 1982; Thomas et al., 1993). In the “functional” approach,
on the other hand, each subject’s exposure value x is treated as a fixed but
unknown parameter to be estimated jointly with the parameters in the disease model.
While some headway on this problem has been made (Prentice, 1982; Stefanski,
1989; Nakamura, 1990; Buzas, 1995), we focus here on the structural approach.
1.4 Classical vs. Berkson Error
With the structural measurement error model, a distinction is often made
between “classical” and “Berkson” measurement errors. In a classical measurement
error problem, the measurements are randomly and independently distributed around
the true values, whereas in the Berkson model the true values are randomly distributed
around the measurements. For models of disease risk in which $E(y \mid x)$ is linear in x,
using the observed measurements z in place of true dose x yields biased (attenuated)
estimates under a classical measurement model. For Berkson errors, on the other hand,
no attenuation occurs. The conventional example of the Berkson model is an experiment
that uses a machine for delivering dose x, where the delivered doses are randomly
distributed around the “dial setting” z on the machine.
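A small simulation makes the contrast concrete. This hypothetical Python sketch (standard library only) uses normal true doses and normal errors of equal variance. Under the classical model, the least-squares slope of y on z should be close to the attenuated value $\beta\,\mathrm{Var}(x)/[\mathrm{Var}(x)+\mathrm{Var}(e)] = 1.0$ for β = 2. Under the Berkson model, it should stay close to β = 2. All parameter values and function names are illustrative.

```python
import random


def ols_slope(u, y):
    """Ordinary least-squares slope of y on u."""
    n = len(u)
    mu, my = sum(u) / n, sum(y) / n
    sxy = sum((a - mu) * (b - my) for a, b in zip(u, y))
    sxx = sum((a - mu) ** 2 for a in u)
    return sxy / sxx


def compare_error_models(beta=2.0, var_x=1.0, var_e=1.0, n=100_000, seed=1):
    rng = random.Random(seed)
    sd_x, sd_e = var_x ** 0.5, var_e ** 0.5

    # Classical: z = x + e, with e independent of the true value x.
    x = [rng.gauss(0.0, sd_x) for _ in range(n)]
    z = [xi + rng.gauss(0.0, sd_e) for xi in x]
    y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]
    classical = ols_slope(z, y)

    # Berkson: x = z + e, with e independent of the assigned value z ("dial setting").
    zb = [rng.gauss(0.0, sd_x) for _ in range(n)]
    xb = [zi + rng.gauss(0.0, sd_e) for zi in zb]
    yb = [beta * xi + rng.gauss(0.0, 1.0) for xi in xb]
    berkson = ols_slope(zb, yb)
    return classical, berkson


classical, berkson = compare_error_models()
```

The Berkson slope is unbiased but noisier, because the error $\beta e$ moves into the residual rather than into the regressor.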
1.5 Assessment of Measurement Error Distribution
There are several general types of studies that provide estimates of the error
and true dose distributions needed for analysis: validation/calibration studies,
reproducibility studies, and pathway and/or “instrumental variable” analyses. A
validation study compares a “gold standard” measurement of x against the flawed
measurement z to be used in the main study (Wacholder et al., 1993; Spiegelman et
al., 1997). It provides a direct estimate of the error distribution, but when available,
the gold standard method is generally expensive and can be applied only to a
subsample (Schatzkin et al., 2003).
A reproducibility study obtains indirect information on the error distribution by
obtaining two or more separate assessments of z on each individual. Key in a
reproducibility study is the assumption that the z’s are conditionally independent given
x. However, responses to the same question collected on the next day, for example,
may be influenced by the recollection of the previous day’s response or may contain
individual components of error that remain in each repeated measurement for the
same individual. Because of this possible correlation between the two z’s, such a
study provides only a lower bound on the misclassification probability or
measurement error variance.
In an instrumental variables analysis, a second source of information,
concerning true dose with errors that can be assumed to be independent of the error
in measured dose, is collected for some or all participants and the correlation
structure of outcome, dose, and instrumental variable is examined.
A pathway analysis derives an exposure estimate by computing doses over
multiple pathways, each of which consists of multiple steps with uncertainties (Wacholder et
al., 1993). The final dose is then computed as the expectation of the sum over pathways,
taken over the joint distribution of all the uncertainty components. Often, because of the
complexity of the dose assignment algorithm, analytic calculation of the distribution of dose
errors is not feasible. In such a situation, a Monte-Carlo simulation can be conducted,
which generates multiple dose estimates for each subject by sampling from the distributions
of the uncertainty components, specified in terms of their means and variances. The unique
feature of this approach is that it provides a dose estimate for each subject as well as a
separate uncertainty estimate for each individual’s dose. These individualized uncertainties
can then be used to correct dose-response relationships for measurement errors in a way that
essentially gives greater weight to the subjects with more precisely estimated doses. More
discussion of correction for measurement error is given in section 1.7.
1.6 Effects of Exposure Measurement Errors
There are several ways that exposure measurement errors can affect
statistical inference. They may change the observed mean and variance
structures: $E(y \mid z)$ compared to $E(y \mid x)$, and $\mathrm{Var}(y \mid z)$ compared to
$\mathrm{Var}(y \mid x)$, respectively. They may also distort the associations and interactions
between covariates and outcomes.
1.6.1 Effects on Mean Structures
Binary exposure variables
Let
$p_i = \Pr(x = i)$, the true prevalence of exposure,
$r_i = \Pr(y = 1 \mid x = i)$, the risk of disease given true exposure,
$m_{ij} = \Pr(z = j \mid x = i)$, the misclassification probabilities,
where $\sum_i p_i = 1$ and $\sum_j m_{ij} = 1$. Then, under nondifferential error, the
observed disease risks classified by z are
$$R_j = \Pr(y = 1 \mid z = j) = \sum_i \Pr(y = 1 \mid x = i)\,\Pr(x = i \mid z = j) = \sum_i r_i \frac{\Pr(z = j \mid x = i)\Pr(x = i)}{\sum_k \Pr(z = j \mid x = k)\Pr(x = k)} = \sum_i r_i \frac{p_i m_{ij}}{\sum_k p_k m_{kj}}.$$
Since $0 \le p_i m_{ij} / \sum_k p_k m_{kj} \le 1$ for all i, j, and these weights sum to one
over i, each observed risk $R_j$ is a weighted average of the true risks $r_i$. The observed
risks are therefore pulled towards one another, and the risk estimate is biased towards the null.
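The formula for $R_j$ can be evaluated directly. In this hypothetical example, the prevalence, risks, and misclassification matrix are illustrative values rather than study data: 20% of subjects are truly exposed, and the true relative risk is 5.

```python
def observed_risks(p, m, r):
    """Observed risks R_j = sum_i r_i p_i m_ij / sum_k p_k m_kj.

    p[i]    = Pr(x = i), true prevalence of exposure category i
    m[i][j] = Pr(z = j | x = i), misclassification probabilities (rows sum to 1)
    r[i]    = Pr(y = 1 | x = i), true risk of disease
    """
    n_true, n_obs = len(p), len(m[0])
    risks = []
    for j in range(n_obs):
        denom = sum(p[k] * m[k][j] for k in range(n_true))
        risks.append(sum(r[i] * p[i] * m[i][j] for i in range(n_true)) / denom)
    return risks


# Hypothetical example: 20% exposed, true relative risk 5, imperfect classification.
p = [0.8, 0.2]
m = [[0.9, 0.1],
     [0.2, 0.8]]
r = [0.01, 0.05]
R = observed_risks(p, m, r)
```

Here $R_0 \approx 0.0121$ and $R_1 \approx 0.0367$, so the observed relative risk is about 3.0 rather than 5. With an identity misclassification matrix, the observed risks reproduce the true ones.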
Continuous exposure variables
If disease risk is linear in x, $E(y \mid x) = \alpha + \beta x$, then the observable
exposure-response relationship can be derived by taking the expectation over x given z,
so that $E(y \mid z) = \alpha + \beta E(x \mid z)$. Assuming the observed and true exposures
have a linear relationship, $E(x \mid z) = a + bz$, we have
$$E(y \mid z) = \alpha + \beta E(x \mid z) = \alpha + \beta(a + bz) = (\alpha + \beta a) + \beta b z = \alpha' + \beta' z,$$
where $\alpha' = \alpha + \beta a$ and $\beta' = b\beta$. Under a classical error model
$z = x + e$, with e independent of x, b equals
$$b = \frac{\mathrm{Cov}(x, z)}{\mathrm{Var}(z)} = \frac{\mathrm{Var}(x)}{\mathrm{Var}(x) + \mathrm{Var}(e)} < 1.$$
Hence, $|\beta'| = b|\beta| < |\beta|$, and b is an attenuation factor.
1.6.2 Effect on Variance Structures
Consider a linear dose-response model $y = \alpha + \beta x + \varepsilon$, where ε is the error.
With an observed exposure z instead of true exposure x, the model can be rewritten
as
$$y = \alpha + \beta E(x \mid z) + \left\{\beta[x - E(x \mid z)] + \varepsilon\right\} = \alpha + \beta E(x \mid z) + \varepsilon^*,$$
where $\mathrm{Var}(\varepsilon^*) = \beta^2 \mathrm{Var}(x \mid z) + \mathrm{Var}(\varepsilon)$. Since
$\beta^2 \mathrm{Var}(x \mid z) > 0$ (for $\beta \ne 0$), $\mathrm{Var}(\varepsilon^*) > \mathrm{Var}(\varepsilon)$;
i.e., y given z is an overdispersed random variable, with variance larger than when true
exposure is measured.
1.6.3 Effect on Associations and Interactions between Covariates and Outcome
Suppose there are two outcome variables, $y_1$ and $y_2$, each with a linear dose-response
relationship ($E(y_k \mid x) = \alpha_k + \beta_k x$, $k = 1, 2$), that are independent given
true exposure. Then the observed covariance between $y_1$ and $y_2$ given estimated dose z is
$$\mathrm{Cov}(y_1, y_2 \mid z) = \beta_1 \beta_2 \mathrm{Var}(x \mid z).$$
Hence, the measurement error induces an artifactual association between response variables
that would be seen to be independent if true exposure x were available.
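The conditional variance and covariance results can be checked by simulation. The sketch below is a hypothetical example with illustrative parameter values. It draws true exposure for many subjects who share the same observed dose z, with $\mathrm{Var}(x \mid z) = 0.4$, $\beta_1 = 0.5$, $\beta_2 = 1.5$, and unit-variance outcome errors that are independent given x. The sample variance of $y_1$ should approach $\beta_1^2\mathrm{Var}(x \mid z) + \mathrm{Var}(\varepsilon) = 1.1$, which is the overdispersion of section 1.6.2. The sample covariance should approach $\beta_1\beta_2\mathrm{Var}(x \mid z) = 0.3$, even though $y_1$ and $y_2$ are independent given x.

```python
import random


def outcomes_given_z(beta1, beta2, var_x_given_z, mean_x_given_z=1.0, n=100_000, seed=7):
    """Simulate y1, y2 for subjects who share the same observed dose z.

    Given z, true exposure varies as x ~ N(E(x|z), Var(x|z)); y1 and y2 have
    independent unit-variance errors given x.
    """
    rng = random.Random(seed)
    sd = var_x_given_z ** 0.5
    y1, y2 = [], []
    for _ in range(n):
        x = mean_x_given_z + rng.gauss(0.0, sd)
        y1.append(beta1 * x + rng.gauss(0.0, 1.0))
        y2.append(beta2 * x + rng.gauss(0.0, 1.0))
    return y1, y2


def sample_cov(a, b):
    """Unbiased sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (n - 1)


y1, y2 = outcomes_given_z(beta1=0.5, beta2=1.5, var_x_given_z=0.4)
var_y1 = sample_cov(y1, y1)   # theory: beta1**2 * Var(x|z) + Var(eps) = 1.1
cov_y12 = sample_cov(y1, y2)  # theory: beta1 * beta2 * Var(x|z) = 0.3
```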
1.7 Existing Methods of Correcting for Measurement Errors
Despite considerable effort to minimize error when collecting data,
measurement errors cannot be fully avoided; therefore, analysis results may be
distorted and yield wrong inferences. Various methods have been proposed for
adjusting for the effect of measurement errors in assessing the relationship between
exposure and outcome (Prentice, 1982; Armstrong, 1989; Rosner et al., 1989; Rosner
et al., 1990). Such methods to correct for measurement errors include two-stage
approaches (single imputation and calibration equation methods) and structural
equations (maximum likelihood).
1.7.1 Two Stage Approach (regression substitution)
As in section 1.6.1, assume disease risk is linear in x, $E(y \mid x) = \alpha + \beta x$, and
the relationship between observed and true exposure is also linear, $x = a + bz + e$,
where $e \sim N(0, \omega^2)$. Given data such as those available in validation/calibration
studies, a and b can be estimated by simply regressing x on z, and the expected value
$\hat E(x \mid z) = \hat a + \hat b z$ can be calculated for each subject. Then
β can be estimated by substituting $E(x \mid z)$ for x, and the confidence interval can be
assessed by regressing y on $E(x \mid z)$. However, one should keep in mind that this
simple method generally yields variance estimates for β that are too small, since it ignores
errors in the estimation of x from z.
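The two stages can be sketched as follows. This is a minimal illustration with simulated, hypothetical data in which a validation sample records both x and z, while the main study uses only z. The function names are illustrative. With $x \sim N(0,1)$ and $z = x + e$, $e \sim N(0,1)$, the naive slope of y on z is attenuated to about half of the true β = 2, while regressing y on $\hat a + \hat b z$ recovers a value near 2. The standard error from the second-stage regression alone would still be too small.

```python
import random


def ols(u, y):
    """Return (intercept, slope) of the least-squares regression of y on u."""
    n = len(u)
    mu, my = sum(u) / n, sum(y) / n
    slope = (sum((a - mu) * (b - my) for a, b in zip(u, y))
             / sum((a - mu) ** 2 for a in u))
    return my - slope * mu, slope


def regression_substitution(x_val, z_val, z_main, y_main):
    """Two-stage estimate of beta.

    Stage 1: regress x on z in the validation data to get a_hat, b_hat.
    Stage 2: regress y on E(x|z) = a_hat + b_hat * z in the main study.
    """
    a_hat, b_hat = ols(z_val, x_val)
    x_hat = [a_hat + b_hat * z for z in z_main]
    _, beta_hat = ols(x_hat, y_main)
    return beta_hat


def draw(n, rng):
    """Hypothetical data: x ~ N(0, 1) and z = x + e with e ~ N(0, 1)."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = [xi + rng.gauss(0.0, 1.0) for xi in x]
    return x, z


rng = random.Random(3)
x_val, z_val = draw(20_000, rng)    # validation study: x and z both known
x_main, z_main = draw(50_000, rng)  # main study: only z is used below
y_main = [2.0 * xi + rng.gauss(0.0, 1.0) for xi in x_main]

naive = ols(z_main, y_main)[1]      # attenuated; about 1.0 in theory
corrected = regression_substitution(x_val, z_val, z_main, y_main)  # about 2.0 in theory
```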
1.7.2 Two Stage Approach (regression substitution): Correction for Standard Error and Confidence Interval
The simple method above may yield too small a variance for β, but this imprecision in estimating x given z can be corrected. Consider the special case in which both the population exposure model and the measurement error model are normal, $x \sim N(\mu_x, \sigma^2)$ and $z \mid x \sim N(x, \omega^2)$. The expected value then becomes
$$E(x \mid z) = \frac{\mu_x/\sigma^2 + z/\omega^2}{1/\sigma^2 + 1/\omega^2} = \frac{\omega^2 \mu_x + \sigma^2 z}{\sigma^2 + \omega^2} = (1 - c)\mu_x + cz,$$
where $c = \sigma^2/(\sigma^2 + \omega^2) < 1$. This yields a linear relationship between observed exposure and response, but the association is biased towards the null:
$$E(y \mid z) = \alpha + \beta E(x \mid z) = \alpha + \beta(1 - c)\mu_x + c\beta z = \alpha' + \beta' z,$$
where $\alpha' = \alpha + \beta(1 - c)\mu_x$ and $\beta' = c\beta$. Thus, one could estimate β by $\hat\beta = \hat\beta'/\hat c$ and, using the delta method, estimate the variance of $\hat\beta$ by
$$\mathrm{Var}(\hat\beta) \approx \frac{\mathrm{Var}(\hat\beta')}{\hat c^2} + \frac{\hat\beta'^2 \, \mathrm{Var}(\hat c)}{\hat c^4}.$$
In general, $\sigma^2$ and $\omega^2$ are computed from a validation or reproducibility study. For the validation-study setting, Rosner et al. (1989) proposed a method to correct for measurement error in a logistic regression analysis. They estimated c by regressing x on z, because the coefficient of z in this regression is c. From the validation subpopulation they estimated $\sigma^2$ as $\mathrm{Var}(x)$ and $\omega^2$ as $\mathrm{Var}(z - x)$. Note that the delta method, as applied here, assumes that the validation study population and the main study population are independent.
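The correction and its delta-method variance translate directly into code. The following minimal sketch assumes, as the text requires, that $\hat\beta'$ and $\hat c$ come from independent samples. The function name and the numeric inputs are hypothetical.

```python
import math

def corrected_slope(beta_prime, var_beta_prime, c_hat, var_c_hat):
    """Return (beta_hat, var_beta_hat) with beta_hat = beta'/c and the
    delta-method variance Var(beta')/c^2 + beta'^2 Var(c)/c^4.
    Assumes beta' and c were estimated from independent samples."""
    beta_hat = beta_prime / c_hat
    var_beta = var_beta_prime / c_hat**2 + beta_prime**2 * var_c_hat / c_hat**4
    return beta_hat, var_beta

# Hypothetical inputs: naive main-study slope and a validation-study estimate of c
beta_hat, var_beta = corrected_slope(beta_prime=0.30, var_beta_prime=0.01,
                                     c_hat=0.6, var_c_hat=0.0025)
se = math.sqrt(var_beta)
ci_95 = (beta_hat - 1.96 * se, beta_hat + 1.96 * se)
print(f"beta = {beta_hat:.3f}, SE = {se:.3f}, 95% CI = ({ci_95[0]:.3f}, {ci_95[1]:.3f})")
```

The second variance term reflects the uncertainty in $\hat c$. It disappears only when c is treated as known exactly.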
1.7.3 Maximum Likelihood Method
The regression substitution method, with or without correcting the variance of the parameter estimate, is clearly an ad hoc approach: different pieces of information are used in separate estimations of the different parameters. When all variables are normally distributed and linearly related to each other, maximum likelihood using a structural equations approach is much more attractive than regression substitution. The maximum likelihood method fits the model in a single stage, estimating all the parameters simultaneously from all relevant information. Its basic idea is to compute the “marginal model,” which describes the joint distribution of all the observable variables as a function of the model parameters after integrating over the unknown x's.
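Consider again the linear disease-exposure and linear observed-true exposure relationships, $y = \alpha + \beta x + \varepsilon$ and $x = a + bz + e$, respectively. Here $E(e) = 0$, $\mathrm{Cov}(x, e) = 0$, $\mathrm{Cov}(x, \varepsilon) = 0$, and $\mathrm{Cov}(\varepsilon, e) = 0$. The model is fitted by finding the parameter values for which the predicted covariance matrix Σ of the observed data is closest to the observed value C. This can be done by maximizing the multivariate normal likelihood
$$f(\mathbf{y} \mid \mathbf{z}) = \frac{1}{(2\pi)^{n/2} \det(\Sigma)^{1/2}} \exp\left\{-\frac{1}{2}(\mathbf{y} - \mathbf{z}\boldsymbol\beta)^T \Sigma^{-1} (\mathbf{y} - \mathbf{z}\boldsymbol\beta)\right\},$$
whose logarithm is
$$L = \log f(\mathbf{y} \mid \mathbf{z}) = -\frac{n}{2}\log 2\pi - \frac{1}{2}\log\det\Sigma - \frac{1}{2}\mathrm{tr}\left(\Sigma^{-1} C\right),$$
where
$$\mathbf{z} = \begin{bmatrix} 1 & z_1 \\ 1 & z_2 \\ \vdots & \vdots \\ 1 & z_n \end{bmatrix}, \qquad \boldsymbol\beta = \begin{bmatrix} \alpha \\ \beta \end{bmatrix}, \qquad C = (\mathbf{y} - \mathbf{z}\boldsymbol\beta)(\mathbf{y} - \mathbf{z}\boldsymbol\beta)^T.$$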
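The score U and the (observed) information I for the jth parameter are, respectively,
$$U_j = -\frac{1}{2}\left[\mathrm{tr}\left(\Sigma^{-1}\frac{\partial\Sigma}{\partial\beta_j}\right) - \mathrm{tr}\left(\Sigma^{-1}\frac{\partial\Sigma}{\partial\beta_j}\Sigma^{-1}C\right) + \mathrm{tr}\left(\Sigma^{-1}\frac{\partial C}{\partial\beta_j}\right)\right]$$
and
$$\begin{aligned} I_j = -\frac{\partial U_j}{\partial\beta_j} = \frac{1}{2}\Bigg[ & \mathrm{tr}\left(\Sigma^{-1}\frac{\partial^2\Sigma}{\partial\beta_j^2}\right) - \mathrm{tr}\left(\Sigma^{-1}\frac{\partial\Sigma}{\partial\beta_j}\Sigma^{-1}\frac{\partial\Sigma}{\partial\beta_j}\right) + 2\,\mathrm{tr}\left(\Sigma^{-1}\frac{\partial\Sigma}{\partial\beta_j}\Sigma^{-1}\frac{\partial\Sigma}{\partial\beta_j}\Sigma^{-1}C\right) \\ & - \mathrm{tr}\left(\Sigma^{-1}\frac{\partial^2\Sigma}{\partial\beta_j^2}\Sigma^{-1}C\right) - 2\,\mathrm{tr}\left(\Sigma^{-1}\frac{\partial\Sigma}{\partial\beta_j}\Sigma^{-1}\frac{\partial C}{\partial\beta_j}\right) + \mathrm{tr}\left(\Sigma^{-1}\frac{\partial^2 C}{\partial\beta_j^2}\right)\Bigg]. \end{aligned}$$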
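The trace form of the log-likelihood and score can be checked numerically. The sketch below covers only the simplest case, in which Σ does not depend on the mean parameters. Then $\partial\Sigma/\partial\beta_j = 0$ and the score reduces to the single term $-\frac{1}{2}\mathrm{tr}(\Sigma^{-1}\partial C/\partial\beta_j)$. The exchangeable covariance matrix and the data are hypothetical. The code compares the trace form with a finite-difference gradient of L.

```python
import numpy as np

def loglik(y, Z, beta, Sigma):
    """L = -n/2 log(2 pi) - 1/2 log det(Sigma) - 1/2 tr(Sigma^{-1} C), C = r r^T."""
    n = len(y)
    r = y - Z @ beta
    C = np.outer(r, r)
    _, logdet = np.linalg.slogdet(Sigma)
    return (-0.5 * n * np.log(2 * np.pi) - 0.5 * logdet
            - 0.5 * np.trace(np.linalg.solve(Sigma, C)))

def score_beta(y, Z, beta, Sigma):
    """U_j = -1/2 tr(Sigma^{-1} dC/dbeta_j) when Sigma does not depend on beta,
    using dC/dbeta_j = -(Z_j r^T + r Z_j^T) for residuals r = y - Z beta."""
    r = y - Z @ beta
    Sigma_inv = np.linalg.inv(Sigma)
    U = np.empty(len(beta))
    for j in range(len(beta)):
        dC = -(np.outer(Z[:, j], r) + np.outer(r, Z[:, j]))
        U[j] = -0.5 * np.trace(Sigma_inv @ dC)
    return U

rng = np.random.default_rng(2)
n = 30
z = rng.normal(size=n)
Z = np.column_stack([np.ones(n), z])          # design matrix with rows (1, z_i)
Sigma = 0.5 * np.eye(n) + 0.1                 # hypothetical exchangeable covariance
y = Z @ np.array([1.0, 2.0]) + rng.multivariate_normal(np.zeros(n), Sigma)
beta = np.array([0.8, 1.7])

eps = 1e-6
numeric = np.array([(loglik(y, Z, beta + eps * e, Sigma)
                     - loglik(y, Z, beta - eps * e, Sigma)) / (2 * eps)
                    for e in np.eye(2)])
print(score_beta(y, Z, beta, Sigma), numeric)
```

In this special case the trace expression equals the familiar generalized least squares score, $\mathbf{z}^T\Sigma^{-1}(\mathbf{y} - \mathbf{z}\boldsymbol\beta)$.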
1.8 Correcting for Measurement Error in Complex Dosimetry System with Shared Dosimetry Error
Several recent epidemiologic studies, such as the Utah Thyroid Cohort Study (Kerber et al., 1993; NCI, 1997) and the Hanford Thyroid Disease Study (Draft Report), have used a complex dosimetry system to estimate the radiation-related health effects of exposure to fallout or nuclear-plant releases. In those studies, individual doses of radioactive iodine (¹³¹I) to the thyroid gland were estimated many years after exposure. In a report by Stram and Kopecky (2003), a number of observations were made concerning the analysis of study power to detect a simple dose-response relationship and the analysis of uncertainty in estimating dose-response relationships (Stram et al., 2003). Their setting differed from the traditional measurement-error problem because errors in dosimetry were not assumed to be independent from subject to subject.
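Conventionally (e.g. in the Hanford Thyroid Disease Study), experts construct a dosimetry system to impute the total dose, $X_i$, for each individual from W. Here W is the gathered information related to exposure, such as the source of milk in the Hanford study. Because of uncertainties, dose is random rather than fixed, and some of these uncertainties are shared between subjects. These uncertainties bias the dose-response relationship towards the null. As a heuristic, Stram and Kopecky (2003) considered a simple model with both shared and unshared error components,
$$X_i = \varepsilon_{SM}\,\varepsilon_{Mi}\,Z_i + \varepsilon_{SA} + \varepsilon_{Si},$$
where $Z_i$ is the observed (estimated) dose, $\varepsilon_{SM}$ and $\varepsilon_{Mi}$ are the shared and unshared multiplicative errors, and $\varepsilon_{SA}$ and $\varepsilon_{Si}$ are the shared and unshared additive errors.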
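If we assume a model that is linear in $X_i$, $E(D_i \mid X_i) = a + bX_i$, then with observed dose $Z_i$ we have
$$E(D_i \mid Z_i) = a + b\left(\varepsilon_{SM}\varepsilon_{Mi}Z_i + \varepsilon_{SA} + \varepsilon_{Si}\right) = a^* + b^* Z_i$$
and $E(b^*) = b$. If there are no measurement errors, i.e. $\mathrm{Var}(\varepsilon) = 0$, or for small $b^*$,
$$\mathrm{Var}(\hat b^*) = \frac{\sigma^2_{D \mid Z}}{\sum_i (Z_i - \bar Z)^2} \approx \frac{\mathrm{Var}(D_i)}{N\,\mathrm{Var}(Z_i)},$$
which is the “naïve” estimate of the variance of $\hat b$.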
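The “naïve” estimate of the variance of $\hat b$ neglects the effect of shared errors. Consider instead the total variance of $\hat b^*$ over the distribution of $\varepsilon_{SM}$ and $\varepsilon_{SA}$:
$$\mathrm{Var}(\hat b^*) = \mathrm{Var}\left\{E(\hat b^* \mid \varepsilon_{SM}, \varepsilon_{SA})\right\} + E\left\{\mathrm{Var}(\hat b^* \mid \varepsilon_{SM}, \varepsilon_{SA})\right\} \cong b^2\sigma^2_{SM} + \frac{\mathrm{Var}(D_i)}{N\,\mathrm{Var}(Z_i)},$$
where $\sigma^2_{SM} = \mathrm{Var}(\varepsilon_{SM})$. Unlike the naïve term, the first term does not shrink as N grows.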
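The decomposition can be illustrated by simulation. The sketch below is a minimal Monte Carlo experiment under several stated assumptions: lognormal multiplicative errors with mean one, no additive errors ($\varepsilon_{SA} = \varepsilon_{Si} = 0$), and hypothetical values for a, b, the error spreads and the observed doses $Z_i$. Each replicate draws one shared $\varepsilon_{SM}$ for the whole cohort, as a single run of a dosimetry system would. It then regenerates X and D and refits the naïve regression of D on Z.

```python
import numpy as np

def lognormal_mean_one(rng, sd_log, size=None):
    """Multiplicative error with E = 1 (log-scale mean -sd_log^2 / 2)."""
    return rng.lognormal(mean=-0.5 * sd_log**2, sigma=sd_log, size=size)

def replicate_slopes(Z, a=0.0, b=1.0, sd_shared=0.3, sd_unshared=0.3, R=500, seed=3):
    """Return the naive slopes b*_r and their naive variances over R replicates."""
    rng = np.random.default_rng(seed)
    zc = Z - Z.mean()
    sxx = np.sum(zc**2)
    slopes, naive_vars = np.empty(R), np.empty(R)
    for r in range(R):
        eps_SM = lognormal_mean_one(rng, sd_shared)             # shared by every subject
        eps_M = lognormal_mean_one(rng, sd_unshared, Z.size)    # independent per subject
        X = eps_SM * eps_M * Z                                  # additive errors set to zero
        D = a + b * X + rng.normal(0.0, 1.0, Z.size)
        b_hat = np.sum(zc * D) / sxx                            # OLS slope of D on Z
        resid = D - D.mean() - b_hat * zc
        slopes[r] = b_hat
        naive_vars[r] = np.sum(resid**2) / (Z.size - 2) / sxx   # sigma^2_{D|Z} / sum (Z - Zbar)^2
    return slopes, naive_vars

Z = np.random.default_rng(0).uniform(0.0, 2.0, 2_000)   # hypothetical fixed observed doses
slopes, naive_vars = replicate_slopes(Z)
total_var = slopes.var(ddof=1)
sigma2_SM = np.exp(0.3**2) - 1.0                        # Var(eps_SM) for this lognormal
approx = 1.0**2 * sigma2_SM + naive_vars.mean()         # b^2 sigma_SM^2 + naive variance
print(f"empirical {total_var:.4f}, approximation {approx:.4f}, naive {naive_vars.mean():.4f}")
```

The naïve variance captures only a small fraction of the replicate-to-replicate spread. Enlarging the cohort would reduce only the naïve term.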
In the presence of shared error and an association between disease and exposure (|β| > 0), Stram and Kopecky (2003) concluded the following:
1. Ignoring shared error in the dosimetry system does not affect the asymptotic size of the test of the null hypothesis that β = 0.
2. However, sample sizes or the power of a test calculated under a specific alternative hypothesis, |β| > 0, will be incorrect if they ignore shared dosimetry error. Ignoring it yields overstated power or, equivalently, understated required sample size.
3. Ignoring shared dosimetry error will result in confidence intervals that are too narrow.
4. However, it is the upper bounds of |β| that are most affected.
1.8.1 Multiple Imputation
Increasingly, studies that investigate uncertainty in dose do so by developing a dosimetry system that gives many replications of possible dose for each subject rather than a single estimate (100 in the case of the Hanford study). Moreover, the estimates are not generated independently for each subject. Instead, each run of the dosimetry system provides a new realization of possible dose for the entire study population. Uncertainties in shared characteristics therefore result in a complex correlation structure between the dose estimates from subject to subject.
The multiple imputation method draws samples from the joint distribution of true dose X conditional on both the input data W and the full set of outcomes D. The variance of the overall regression estimate is computed as the sum of two statistics:
- the sampling variance of the replicate estimates $\hat b_r$;
- the average of the “naïve” estimates of the variance of $\hat b_r$, each computed conditionally upon $X_r$.

Here r indexes the replicates. Applying multiple imputation to data analysis with a Monte-Carlo dosimetry system is problematic. The existing dosimetry system is difficult to modify so that it produces samples of dose estimates conditional upon both the input data and the outcomes. Some discussion of this problem is given in the section below.
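The variance calculation described above can be sketched as follows. The sketch assumes that each replicate analysis returns a slope and its naïve variance. The function name and the inputs are hypothetical. Rubin's combining rules for multiple imputation additionally multiply the between-replicate term by (1 + 1/R) for R replicates, which makes little difference with 100 replicates.

```python
import numpy as np

def combine_replicates(b_hats, naive_vars):
    """Overall slope and variance from per-replicate analyses:
    variance = sampling variance of b_hat_r + mean of the naive variances."""
    b_hats = np.asarray(b_hats, dtype=float)
    naive_vars = np.asarray(naive_vars, dtype=float)
    estimate = b_hats.mean()
    between = b_hats.var(ddof=1)     # spread of b_hat_r across dose realizations X_r
    within = naive_vars.mean()       # average naive variance computed given X_r
    return estimate, between + within

est, total_var = combine_replicates([0.9, 1.1, 1.0, 1.2, 0.8], [0.01] * 5)
print(est, total_var)
```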
1.8.2 Monte-Carlo Maximum Likelihood
In principle, a simulation-based approach to maximum likelihood can be used both to approximate maximum likelihood estimates and to construct approximate full likelihood-based confidence limits from the change in the log-likelihood (Geyer, 1996). Let l(a,b) be the log-likelihood ratio for testing the null hypothesis that $(a, b) = (a_0, b_0)$. The full likelihood $f(\mathbf{D} \mid \mathbf{W})$ equals the integral $\int f(\mathbf{D}, \mathbf{X} \mid \mathbf{W})\,d\mathbf{X}$, so we have
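$$\begin{aligned} l(a,b) &= \log\frac{f(\mathbf{D}\mid\mathbf{W};a,b)}{f(\mathbf{D}\mid\mathbf{W};a_0,b_0)} = \log\frac{\int f(\mathbf{D},\mathbf{X}\mid\mathbf{W};a,b)\,d\mathbf{X}}{f(\mathbf{D}\mid\mathbf{W};a_0,b_0)} \\ &= \log\int\frac{f(\mathbf{D},\mathbf{X}\mid\mathbf{W};a,b)}{f(\mathbf{D},\mathbf{X}\mid\mathbf{W};a_0,b_0)}\,f(\mathbf{X}\mid\mathbf{D},\mathbf{W};a_0,b_0)\,d\mathbf{X} \\ &= \log\int\frac{f(\mathbf{D}\mid\mathbf{X},\mathbf{W};a,b)\,f(\mathbf{X}\mid\mathbf{W})}{f(\mathbf{D}\mid\mathbf{X},\mathbf{W};a_0,b_0)\,f(\mathbf{X}\mid\mathbf{W})}\,f(\mathbf{X}\mid\mathbf{D},\mathbf{W};a_0,b_0)\,d\mathbf{X} \\ &= \log\int\frac{f(\mathbf{D}\mid\mathbf{X};a,b)}{f(\mathbf{D}\mid\mathbf{X};a_0,b_0)}\,f(\mathbf{X}\mid\mathbf{D},\mathbf{W};a_0,b_0)\,d\mathbf{X} \\ &= \log E_{\mathbf{X}\mid\mathbf{D},\mathbf{W};a_0,b_0}\left\{\frac{f(\mathbf{D}\mid\mathbf{X};a,b)}{f(\mathbf{D}\mid\mathbf{X};a_0,b_0)}\right\} \qquad (1) \end{aligned}$$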
Suppose a total of n samples $\mathbf{X}_r$ were generated from the distribution of true dose X given both disease D and input data W,
$$f(\mathbf{X} \mid \mathbf{D}, \mathbf{W}; a_0, b_0). \qquad (2)$$
We can then approximate Equation (1) as
$$l(a,b) \approx \log\left(\frac{1}{n}\sum_{r=1}^{n}\frac{f(\mathbf{D}\mid\mathbf{X}_r;a,b)}{f(\mathbf{D}\mid\mathbf{X}_r;a_0,b_0)}\right). \qquad (3)$$
The choice of $a_0$ and $b_0$ in Equation (2) is arbitrary. The change in log-likelihood between any two choices of the parameters, $(a_1, b_1)$ vs. $(a_2, b_2)$, can be written as $l(a_2, b_2) - l(a_1, b_1)$. We can therefore choose $b_0 = 0$ to remove the conditioning on D in Equation (1), because under $b_0 = 0$ the outcomes $D_i$ are independent of $X_i$. This implies that confidence limits for the dose-response parameter b can be calculated from Equation (3) using the samples $\mathbf{X}_r$ from $f(\mathbf{X} \mid \mathbf{W})$ that the dosimetry system provides.
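Equation (3) is straightforward to evaluate once log-likelihoods are available. The ratios can span many orders of magnitude, so the sketch below works on the log scale and subtracts the largest term before exponentiating (the standard log-mean-exp technique). This is one common way to stabilize the computation and is not necessarily the arithmetic that Stram and Kopecky used. The Gaussian outcome model, the toy dose realizations and all parameter values are hypothetical. The samples are drawn from $f(\mathbf{X} \mid \mathbf{W})$, so the reference value is $b_0 = 0$.

```python
import numpy as np

def log_mean_exp(v):
    """log((1/n) * sum(exp(v))), computed stably by factoring out max(v)."""
    v = np.asarray(v, dtype=float)
    m = v.max()
    return m + np.log(np.mean(np.exp(v - m)))

def mcml_loglik_ratio(loglik_fn, dose_samples, theta, theta0):
    """Monte-Carlo estimate of l(theta) in Equation (3).
    loglik_fn(X, theta) returns log f(D | X; theta) for the observed outcomes;
    dose_samples are realizations X_r drawn with reference value theta0."""
    log_ratios = [loglik_fn(X, theta) - loglik_fn(X, theta0) for X in dose_samples]
    return log_mean_exp(log_ratios)

rng = np.random.default_rng(4)
Z = rng.uniform(0.0, 2.0, 200)                          # hypothetical estimated doses
D = 0.5 + 0.3 * Z + rng.normal(0.0, 1.0, Z.size)        # hypothetical outcomes
# 100 toy realizations from f(X | W): one shared multiplicative error per realization
samples = [Z * rng.lognormal(-0.045, 0.3) for _ in range(100)]

def gaussian_loglik(X, theta):
    """Hypothetical outcome model D ~ N(a + b X, 1)."""
    a, b = theta
    resid = D - (a + b * X)
    return -0.5 * np.sum(resid**2) - 0.5 * D.size * np.log(2.0 * np.pi)

l_hat = mcml_loglik_ratio(gaussian_loglik, samples, theta=(0.5, 0.3), theta0=(0.5, 0.0))
print(f"l(0.5, 0.3) is approximately {l_hat:.2f}")
```

When b is far from $b_0 = 0$, the log ratios spread widely across realizations and a few terms dominate the average.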
In the Colorado Plateau Uranium Miners example, an initial β is estimated with a simple model in which the hazard depends only on exposure dose and age, i.e. $\lambda(t) = \lambda_0(t)\{1 + \beta X(t)\}$, where $X(t)$ is the expected cumulative exposure to radon by age t. We initially work with a simple parametric model for the baseline hazard $\lambda_0(t)$ as an nth-order polynomial of age, i.e. $\lambda_0(t) = bt^n$, $n \ge 4$. This choice of baseline hazard is motivated by the multistage model of Armitage and Doll (1957), which gives a hazard of carcinogenesis that is polynomial in age, of fourth to seventh order (Cook, 1969). Unlike the more complex TSCE model, it has no plateau of risk at the oldest ages.

For the parametric model, a profile log-likelihood based upon $\log\{S(t)\lambda(t)^D\}$, where D is the actual disease status of a miner, is computed with each realization of dose. It is evaluated over 100 grid values chosen within three standard deviations of the initial β, to cover the range of possible confidence limits and the maximum likelihood estimate (MLE). At each grid point of β, the profile likelihoods calculated for each realization (maximizing out the parameter in the baseline hazard) are averaged. $\beta_{MLE}$ is then taken as the grid point that yields the maximum of these averaged likelihoods. Confidence limits are found from the change in the log of the averaged profile likelihoods. For example, the 90% confidence limits are the values of β where
$$2\left[\ln L(\beta_{MLE}) - \ln L(\beta)\right] = 2.706,$$
the 90th percentile of the $\chi^2_1$ distribution ($\chi^2_{1,0.90}$).
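The grid procedure can be sketched as follows. The sketch assumes the profile log-likelihoods have already been computed for every realization and grid point, as an R × G array. As in the text, the averaging is done on the likelihood scale, with the same max-subtraction used for numerical stability. The confidence set is taken as every grid point within the 2.706 cutoff, which assumes the averaged curve is unimodal. Function and argument names are hypothetical.

```python
import numpy as np

CHI2_1_090 = 2.706   # 90th percentile of chi-square with 1 df

def averaged_profile_ci(beta_grid, profile_loglik, cutoff=CHI2_1_090):
    """beta_grid: (G,) grid of beta values.
    profile_loglik: (R, G) profile log-likelihoods, one row per dose realization.
    Returns (beta_mle, lower, upper) from the log of the averaged likelihoods."""
    m = profile_loglik.max()
    log_avg = m + np.log(np.mean(np.exp(profile_loglik - m), axis=0))
    k = int(np.argmax(log_avg))
    inside = 2.0 * (log_avg[k] - log_avg) <= cutoff
    return beta_grid[k], beta_grid[inside].min(), beta_grid[inside].max()

# Toy check with a hypothetical quadratic profile: SE 0.2 around beta = 1
grid = np.linspace(0.0, 2.0, 2001)
toy = -0.5 * ((grid - 1.0) / 0.2) ** 2
mle, lower, upper = averaged_profile_ci(grid, np.vstack([toy, toy]))
print(mle, lower, upper)
```

For this quadratic toy profile the limits approach $1 \pm 0.2\sqrt{2.706}$, matching the usual Wald interval. With real dose realizations the averaged curve need not be symmetric.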
The hazard model can be extended to include other variables that may affect the risk of disease (e.g. age and smoking). These variables are accommodated in the profile likelihood procedure applied to each realization, which also yields parameter estimates for them. For models with two parameters of interest, a profile log-likelihood can be calculated over a grid of possible pairs $(\beta_{\text{total dose}}, \alpha_{\text{dose-rate}})$ to estimate both a response parameter and an inverse dose-rate parameter. One example is a “mechanistic” dose-rate model for the risk of lung cancer mortality,
$$\lambda(t) = bt^4\left\{1 + \frac{\beta_{\text{total dose}}}{\alpha_{\text{dose-rate}}}\int_0^t \left[1 - \exp\left\{-\alpha_{\text{dose-rate}}\,x(u)\right\}\right] du\right\},$$
where $x(u)$ is the exposure rate at time u. The exposure rate can be estimated from total dose and work history (NRC, 1998; Stram et al., 1999). These methods can also be applied in a partial likelihood approach, since the general form considered here is essentially a relative risk model.
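The mechanistic dose-rate model can be evaluated numerically for a given exposure-rate history. The sketch below uses the trapezoidal rule on a fine grid. The exposure history, time units and parameter values are hypothetical. Writing the integrand as $[1 - \exp\{-\alpha x(u)\}]/\alpha$ makes the $\alpha \to 0$ limit explicit: it reduces to the cumulative exposure $\int x(u)\,du$, which gives the simple linear relative risk model. When $\alpha > 0$, exposure received at high rates contributes less risk per unit.

```python
import numpy as np

def mechanistic_hazard(t, exposure_rate, b, beta_total, alpha_rate, n_grid=4001):
    """lambda(t) = b t^4 {1 + (beta/alpha) * integral_0^t [1 - exp(-alpha x(u))] du}.
    exposure_rate: callable returning x(u) for an array of times u."""
    u = np.linspace(0.0, t, n_grid)
    x = exposure_rate(u)
    if alpha_rate == 0.0:
        effective = x                                   # alpha -> 0 limit
    else:
        effective = -np.expm1(-alpha_rate * x) / alpha_rate
    integral = np.sum(0.5 * (effective[1:] + effective[:-1]) * np.diff(u))  # trapezoidal rule
    return b * t**4 * (1.0 + beta_total * integral)

def history(u):
    """Hypothetical history: exposure rate 2.0 between ages 20 and 30, zero otherwise."""
    return np.where((u >= 20.0) & (u < 30.0), 2.0, 0.0)

for alpha in (0.0, 0.1, 0.5):
    print(alpha, mechanistic_hazard(60.0, history, b=1e-8, beta_total=0.01, alpha_rate=alpha))
```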
Stram and Kopecky noted that this simpler approach works well only for b near 0. For |b| > 0, the ratio in the summand of Equation (3) becomes extremely variable. Accurate evaluation then requires a surprising and often prohibitive amount of computer time, together with special computer arithmetic techniques to preserve numerical accuracy with these highly variable summands. It is therefore best, at least numerically, to perform the simulation using the maximum likelihood estimates of a and b as $a_0$ and $b_0$ in Equation (2). Thus, if the dose-response relation is strongly significant, a means of sampling from the conditional distribution of X given both W and D is needed, just as in the multiple imputation method.
1.8.3 Fully Parametric Bootstrap Method
We use a fully parametric bootstrap rather than a simple bootstrap that randomly resamples entire individual histories, because the simple bootstrap fails to incorporate the sharing of errors. Here the term “fully parametric bootstrap” means a simulation experiment in which true doses X are generated from the dosimetry system and outcomes D are generated from the assumed model, with the model parameters set to their estimated values. The resulting distribution of $\hat\beta_r$ is used to form confidence intervals for the $\hat\beta$ observed with the real data, where r indexes the replications in the simulation experiment. The outcomes D are simulated as
$$D \sim \mathrm{Binomial}(N = 1, p),$$
where
$$p = P(T > w) = \exp\left\{-\int_{w_0}^{w}\lambda_0(t)\left[1 + \beta X(t)\right]dt\right\},$$
t is age, $w_0$ is the age at entry, and w is the age at failure or censoring for a miner. For the purpose of generating data, w can be partitioned into one-year intervals, i.e. $(w_0, w_1, w_2, \ldots, w_W) = (0, 1, 2, \ldots, W)$. The probability of surviving until time $w_k$ is then
$$p = P(T > w_k) = P(T > w_1 \mid T > w_0)\,P(T > w_2 \mid T > w_1)\cdots P(T > w_k \mid T > w_{k-1}).$$
Outcomes are therefore simulated one year at a time. If a failure occurs within the first year for an individual, the outcome simulation stops for that individual; otherwise the outcome is simulated for the second year. If a failure occurs in the second year, the simulation stops; otherwise the outcome is simulated for the third year, and so on. An alternative would be to simulate just one outcome within each risk set. Compared with that, this cohort approach automatically incorporates a “dilution effect,” in which $E(X \mid Z)$ decreases over time as higher-risk individuals are removed from later risk sets.
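The year-by-year cohort simulation can be sketched as follows. This minimal illustration assumes that cumulative doses are supplied on an integer-age grid. It also assumes the hazard $\lambda_0(t)[1 + \beta X(t)]$, with $\lambda_0(t) = b_0 t^{\text{power}}$, is held at its mid-year value within each one-year interval, so the conditional survival for the year is $\exp(-\lambda)$. The function name, argument layout and parameter values are hypothetical.

```python
import math
import numpy as np

def simulate_cohort_outcomes(X, entry, exit_age, b0, beta, power=4, rng=None):
    """Simulate failures one year at a time until failure or end of follow-up.
    X: (N, A) cumulative dose by integer age (one realization from the dosimetry system).
    entry, exit_age: integer ages at entry and at end of follow-up for each subject.
    Returns (failed, stop_age)."""
    rng = np.random.default_rng() if rng is None else rng
    n = X.shape[0]
    failed = np.zeros(n, dtype=bool)
    stop_age = np.array(exit_age, dtype=int)
    for i in range(n):
        for age in range(int(entry[i]), int(exit_age[i])):
            lam = b0 * (age + 0.5) ** power * (1.0 + beta * X[i, age])
            p_survive = math.exp(-lam)              # P(T > age + 1 | T > age)
            if rng.random() >= p_survive:           # failure occurs during this year
                failed[i] = True
                stop_age[i] = age + 1
                break                               # stop simulating this subject
    return failed, stop_age

rng = np.random.default_rng(6)
N = 20_000
X = np.zeros((N, 50))                               # hypothetical: unexposed cohort
failed, stop = simulate_cohort_outcomes(X, np.full(N, 30), np.full(N, 40),
                                        b0=0.01, beta=0.5, power=0, rng=rng)
# Expected failure fraction for a constant hazard of 0.01 per year over 10 years: 1 - exp(-0.1)
print(failed.mean())
```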
We adopt the same baseline hazard model that is implemented in the MCML method. $D_i$ is then regressed on the observed doses to re-estimate $\beta_r$. Each simulation therefore uses different true doses $X_i$ and different outcomes $D_i$ but the same observed doses $Z_i$. The effect of shared dosimetry errors on the distribution of the estimates of β is examined by comparing the empirical variance of $\hat\beta_r$ with the variance of $\hat\beta$ from a standard analysis that ignores dosimetry errors. The 90% confidence interval, for example, is calculated as the interval containing 90% of the $\hat\beta_r$.
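Once the replicate estimates $\hat\beta_r$ are available, the interval and the variance comparison are simple summaries. The sketch below assumes an equal-tailed percentile interval and a naïve variance supplied from the standard analysis. The names and toy inputs are hypothetical.

```python
import numpy as np

def percentile_interval(beta_reps, level=0.90):
    """Equal-tailed interval containing `level` of the replicate estimates."""
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(beta_reps, [tail, 1.0 - tail])
    return float(lower), float(upper)

def variance_ratio(beta_reps, naive_var):
    """Empirical variance of beta_hat_r divided by the naive (error-ignoring) variance;
    values well above 1 indicate the effect of shared dosimetry errors."""
    return float(np.var(beta_reps, ddof=1) / naive_var)

reps = np.random.default_rng(7).normal(1.0, 0.3, 1_000)   # hypothetical beta_hat_r values
print(percentile_interval(reps), variance_ratio(reps, naive_var=0.01))
```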
Chapter 2: Colorado Plateau Uranium Miners and Oak Ridge National Laboratory Cohorts
This chapter describes in more detail the available data on the employment and exposure histories of workers in the Colorado Plateau uranium mines and at the ORNL. It describes the data collection process, the errors and uncertainties associated with data collection, and previous attempts to correct for possible errors. It also describes the 1,000 publicly available replications of exposure dose for the Colorado Plateau Uranium Miners, which are used in later analyses.
2.1 Colorado Plateau Uranium Miners
For the existing exposure dose estimates for the Colorado Plateau Uranium Miners, the United States Public Health Service (PHS) estimated exposure rates in one of two ways. Where measurements existed, it averaged the measured exposure rates for each mine-year. Where there were none, it interpolated these averages to the unmeasured years and mines. These estimates were then combined with each miner's work history to produce monthly dose-rate estimates for each miner in the cohort.
The work history data and the exposure data for the Colorado Plateau Uranium Miners cohort were both provided by the National Institute for Occupational Safety and Health (NIOSH).
2.1.1 Work History Data
The miners' work histories were collected by interview at the first examination, and the mining companies provided some supplementary information. The work history data provided by NIOSH contained the following fields for each miner, in this order:
- up to 25 start-of-work months, 25 start-of-work years, 25 end-of-work months, and 25 end-of-work years;
- identification (record) number, sex, race, birth month, birth year, vital status, death month, and death year;
- lung cancer death status based on ICD code, up to two possible contributing causes of death, and certification number;
- beginning of employment month and year, number of different working periods, and end of employment month and year;
- total months worked in underground mine(s), smoking status, and year of physical examination;
- working level months (WLM), WLM from hard rock mines, and number of years worked in hard rock mines;
- 5 sequences of smoking information (smoking rate, month began smoking, and year began smoking);
- 9 dates on which certain WLM cutoffs were reached (60, 120, 240, 360, 600, 840, 1800, 3720, infinity);
- the number identifying the mine worked in each of the 25 working periods.
2.1.2 Exposure Data
The exposure data, also provided by NIOSH, contained the dose-rate measurements for each mine-year on the Colorado Plateau. They also included pit type, mine type, district, locality and mine numbers, the number of spot measurements taken in a mine-year, state number, dose rate, and year. These data included measurements recorded in mines other than underground mines.
The mine-year measurement in these exposure data is the average dose rate of radon daughters, in working levels (WL), from samples taken during spot visits by PHS inspectors. A standard sampling and counting technique for radon daughters was used, and instruments were calibrated so that the results were comparable. Missing mine-year measurements (i.e. years with no measurements from a particular mine) were imputed based on the “nearness” of measurements in time and geography (Lundin et al., 1971). The imputation followed a hierarchical classification of mines within localities, localities within districts, and districts within states. The dose rate for a missing mine-year was assigned using a combination of forward and backward extrapolation of single measurements and interpolation between two measurements:
- If a mine had only one measurement, or more than a 5-year gap between mine-year measurements, the annual average was assigned to the two years before and the two years after.
- If there were one to four years between two mine-year measurements, the average of the two adjoining mine-year measurements was assigned to the years between them.
- When this extrapolation and interpolation failed, an area average of the dose rate for that year was used.

Only actual measurements were used to estimate the area averages for a locality, district, or state. To reduce sampling variability, the locality average was used only when at least three measurements were available in the locality. If fewer than three were available, the district average was assigned. Similarly, if a district failed to meet the criterion, the state average was assigned to that district.
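The area-average fallback can be sketched as follows. This simplified illustration covers only the last step of the procedure: it assumes extrapolation and interpolation within the mine have already failed. It also assumes that locality and district identifiers are unique across the plateau, and it uses a hypothetical record layout.

```python
from statistics import mean

def area_average(year, locality, district, state, measurements, min_count=3):
    """Dose rate (WL) for a mine-year with no usable measurement in its own mine.
    measurements: actual mine-year measurements only, as dicts with keys
    'year', 'locality', 'district', 'state', 'wl' (hypothetical layout)."""
    same_year = [m for m in measurements if m["year"] == year]
    for level, value in (("locality", locality), ("district", district)):
        values = [m["wl"] for m in same_year if m[level] == value]
        if len(values) >= min_count:      # at least three measurements required
            return mean(values)
    state_values = [m["wl"] for m in same_year if m["state"] == state]
    return mean(state_values) if state_values else None
```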
2.1.3 Modification of Data in Previous Study (Stram et al., 1999)
The PHS made both the work history file and the exposure file available to Stram and his collaborators. Only Caucasian males who were employed for at least one year during 1950 to 1960 were included in the analysis. Of these 3,347 Caucasian males, 345 with ICD code 161 or 162 were classified as lung cancer cases. Only miners employed after January 1950 were included, since measurements taken before 1950 were inaccurate and sparse. In addition, only data up to December 1969 were incorporated, because dramatic reductions in measured exposure rates were observed owing to better monitoring and ventilation systems. If a miner worked in an open pit mine, his work history in that mine was eliminated from the data, i.e. a zero was used as the dose estimate for such periods. As stated previously, radon daughters were not measured directly until 1951, and the practice was not systematic until 1952. Hence, these Caucasian miners were considered in two subsets: those whose initial mining exposure began in 1950 or later (N=2,704) and those whose exposure began in 1952 or later (N=2,388).
From these modified data, two data sets were created:
- Work history data: each miner's record number, mine number, and the beginning and end dates of underground mining.
- Smoking history data: the miner's record number; birth month, year, and day; total smoking exposure; death month and year; end of employment month and year; 9 sequences of age in months when WLM cut-points were reached; 5 sequences of smoking information (beginning of smoking, smoking status, smoking rate, and end of smoking); lung cancer status; and age at the beginning and end of employment.
Stram et al. (1999) re-examined the above interpolation process by relating it to a state/locality model for the hierarchy of district, locality, mine, and year. They attempted to reconstruct the miners' dose estimates using mine-year exposures and miners' history data, which NIOSH had made publicly available. When they linked the work history file to the exposure file, they found that many miners' job history records referred to mines that had no records in the exposure file. Dr. Victor Archer, a former PHS investigator, provided additional information on these unknown mines. The unknown mine codes in the work history file were described as conglomerations or summaries of mines in which miners had worked during specific periods but whose exact location they could not recall. For example, a miner may have recalled the state, district, and general locality for a year but not the specific mine. A miner might also have recalled the state and district but not the specific locality and mine. Each such pattern of missing data was assigned a distinct mine number; we refer to these as “pseudo-mines.” The additional data provided by Dr. Archer included the geographic location of most pseudo-mines and the dose rates believed to have been used by the PHS in constructing the exposure data.
2.1.4 Guesstimates
“Guesstimates” were assigned as dose-rate measurements prior to 1951. At that time radon itself, rather than radon daughters, was sampled, because the significance of radon daughters was not yet known. The guesstimates were based on knowledge gained in 1951 and 1952 about ore bodies, ventilation practices, emanation rates from different types of ore, and radon measurements.
The extrapolation procedure of Stram et al. (1999) followed the hierarchical classification of mines within localities, localities within districts, and districts within states described in Lundin et al. (1971). Their measurement error model used a multi-level statistical model, with the same hierarchy, for all actual mine-year measurements. They then replaced the PHS imputation of missing mine-year measurements with an imputation scheme based on this multi-level model. The guesstimates were treated as actual measurements. Although they were not based directly on measurements of radon daughters, they incorporated measurements of radon levels as well as the expert opinion of the PHS investigators. Excluding the guesstimates increased the coefficient of variation of the random intercept from 62% to 80% and that of the slope parameter from 52% to 97%. Meanwhile, the basic allocation of the percent variation between mine, locality, and district remained the same, with a much larger fraction of the variation allocated to the mine level (12% to 54%).
2.1.5 Publicly Available Data
The same inclusion/exclusion criteria as in Stram et al. (1999) were adopted here. The publicly available dataset provides each miner's calculated dose estimates by calendar year and month rather than by age. It represents each miner's work history by month from 1950 to 1969. Each mine-year dose-rate estimate was divided evenly among the months of that year in which the miner worked. The measurements were used on the log scale to estimate dose rates, but the results were transformed back to the original scale (WLM) in the final data presentation.
Several additional resources are also available:
- a large data set containing the means and variances of all mine-year estimates;
- 1,000 random replications from the “complex dosimetry system,” which computes multivariate realizations from the conditional distribution of the miners' exposure histories given all the measurement data, for studying the effect of shared uncertainties with the simulation approach (these replications of radon exposure doses are represented in 5-year age intervals);
- a sampling program written in SAS and a linking program written in GAUSS.
2.1.6 Description of Subjects and Data
A total of 3,903 mine-year measurements were used to estimate individual miners' exposure doses. Of these, 3,475 (89%) were averages of actual spot-visit measurements taken by PHS inspectors each year from 1951 through 1969, and 428 (11%) were guesstimates from 1950. Figure 1 shows the frequency distribution of mine-year measurements. Before 1956, most mine-year measurements came from Colorado and Utah (Table 1 and Figure 2; 41% and 53%, respectively). Arizona had only one measurement in 1951 and 11 measurements in 1952. During the period before 1956, New Mexico had three measurements and Wyoming had none. From 1956 on, however, all states except Wyoming had relatively many mine-year measurements, with Colorado contributing the greatest proportion. Wyoming had 22 mine-year measurements, in four years: 1961, 1962, 1967, and 1968.
Figure 1 Frequency Distribution of Mine-Year Exposure Dose Estimates by Year
[Bar chart: number of mine-year measurements (y-axis, 0 to 400) by year (x-axis, 1951 to 1969).]

Table 1 Frequency Distribution of Mine-Year Exposure Dose Estimates by Year and State
(% = percentage of that year's mine-year measurements)

        Arizona       Colorado      New Mexico    Utah          Wyoming
Year    N      %      N      %      N      %      N      %      N      %
1951    1   20.0      2   40.0      0    0.0      2   40.0      0    0.0
1952   11    7.3     96   63.6      3    2.0     41   27.2      0    0.0
1953    0    0.0      0    0.0      0    0.0     56  100.0      0    0.0
1954    0    0.0      1    3.0      0    0.0     32   97.0      0    0.0
1955    0    0.0      3   75.0      0    0.0      1   25.0      0    0.0
1956   18   17.8     59   58.4      2    2.0     22   21.8      0    0.0
1957   19   12.9     79   53.7     24   16.3     25   17.0      0    0.0
1958   14   25.9      4    7.4     24   44.4     12   22.2      0    0.0
1959   20    7.1    148   52.7     40   14.2     73   26.0      0    0.0
1960   21   11.8     93   52.3     37   20.8     27   15.2      0    0.0
1961   19    5.8    222   67.3     35   10.6     50   15.2      4    1.2
1962   18    5.4    237   70.5     37   11.0     41   12.2      3    0.9
1963   18    5.7    213   67.6     22    7.0     62   19.7      0    0.0
1964   11    4.1    173   64.6     33   12.3     51   19.0      0    0.0
1965    9    3.4    184   68.7     29   10.8     46   17.2      0    0.0
1966    3    1.1    209   76.3     27    9.9     35   12.8      0    0.0
1967    2    0.8    183   68.8     25    9.4     50   18.8      6    2.3
1968    2    0.8    158   61.0     25    9.7     65   25.0      9    3.5
1969    2    1.3    107   71.8      9    6.0     31   20.8      0    0.0
Figure 2 Frequency Distribution of Mine-Year Exposure Dose Estimates by Year and State
[Bar chart: number of mine-year measurements (y-axis, 0 to 250) by year (x-axis, 1951 to 1969) for Arizona, Colorado, New Mexico, Utah, and Wyoming.]
Figure 3 Cumulative Frequency Distribution of First Exposure to Radon by Year
[Chart: cumulative number of miners (y-axis, 0 to 3,000) by year of first exposure (x-axis, 1950 to 1964); plotted values 142, 320, 496, 772, 1,177, 1,402, 1,718, 2,188, 2,328, 2,529, 2,678, 2,694, 2,712, 2,714, 2,715.]
Figure 4 Distribution of Miners Exposed to Radiation in Underground Mines by Month
There were 2,715 underground miners exposed to radiation during 1950 to 1969. More than 50% of these miners had entered the workforce by the end of 1955, and all had begun working by 1964 (Figure 3). The number of miners exposed to radiation between 1950 and 1969 peaked around 1960 (Figure 4). By May 1969, no miners were exposed to high levels of radon from working in underground mines, and all doses after 1969 have been assumed to equal zero.
2.2 Oak Ridge National Laboratory Workers
In September 1942, an isolated area in eastern Tennessee was selected as the
site for the development of full-scale production facilities for uranium separation and
for the construction of an experimental nuclear pile that would be used to produce
plutonium for research in the war effort. An air-cooled experimental pile, a chemical
separation plant and supporting laboratories were constructed at the X-10 site. This
X-10 facility was originally named the Clinton Laboratories, and then later renamed
as Oak Ridge National Laboratory (ORNL).
2.2.1 Data Collection, Policies, and Practices
Data collected at the ORNL between 1943 and 1972 are problematic for the following reasons. The ORNL began monitoring only selected workers for external radiation in 1943. Prior to November 1951, only workers entering areas of potential external radiation exposure were monitored for external doses. Monitoring policy changed several times (Watkins et al., 1994):
- In 1947, the policy changed to monitor all workers who entered a radiation area more than three times a week.
- By 1949, all workers entering the restricted area at least once a week were issued permanent film badges.
- In November 1951, all workers entering the main area were required to have a film badge.
- In September 1953, the film badge and the security badge needed for entry were combined into one.
Besides these changes in policy, there were other problems with the ORNL data. The change in the types of instrument used to monitor individuals' radiation exposure has received the most attention. From 1943 to July 1944, the ORNL monitored external radiation, primarily γ-rays, with pocket meters. The pocket meters were evaluated daily, with a minimum detectable limit of 0.02 mSv (Hart, 1966). Film badges were used from July 1944 to 1975, and thermoluminescent dosimeters from then on. The film badges were evaluated weekly until July 1956, after which quarterly monitoring was initiated. The lower limit of detection of a sensitive film badge used at the ORNL was 0.10 mSv, given an experienced technician evaluating the exposed films (Mitchell et al., 1997). The annual external dose was recorded as the sum of all readings for the year. Because of the minimum detectable limits, doses may have been missed (recorded as zero exposure) for workers with low radiation doses. This may have biased some external radiation dose estimates downward. The magnitude of recorded dose therefore reflects both the amount of radiation exposure and the monitoring and recording policies and practices of the time. More detailed descriptions of data collection, policies, and practices are given in Watkins et al. (1993).
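Recording sub-threshold readings as zero biases annual doses downward, and a small simulation can illustrate this. The weekly dose distribution below is hypothetical. Only the 0.10 mSv film-badge detection limit is taken from the text.

```python
import numpy as np

def recorded_annual_dose(readings, mdl):
    """Annual recorded dose: sum of readings, with readings below the
    minimum detectable limit (mdl) recorded as zero."""
    readings = np.asarray(readings, dtype=float)
    return np.where(readings < mdl, 0.0, readings).sum(axis=-1)

rng = np.random.default_rng(5)
# Hypothetical: 1,000 workers x 52 weekly film-badge readings (mSv)
weekly = rng.lognormal(mean=np.log(0.05), sigma=1.0, size=(1_000, 52))
true_annual = weekly.sum(axis=1)
recorded = recorded_annual_dose(weekly, mdl=0.10)
print(f"mean true {true_annual.mean():.2f} mSv, mean recorded {recorded.mean():.2f} mSv")
```

The relative loss is largest for workers whose typical weekly dose is near or below the limit. This is why low-level exposures are the most affected.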
2.2.2 Dose Adjustments
Figure 5 Flow Chart of Previous Studies Done on the ORNL Data
[Flow chart boxes: hardcopies of personal dosimetry recorded by technician(s); (electronically) recorded yearly radiation dose estimates (Watkins et al. 1994); epidemiologic analyses (mostly mortality studies); validation experiments on film badges (Thierry-Chef et al. 2001 & 2002); complex dosimetry system: bias and uncertainty factors (conditional true dose distribution given recorded dose) (Thierry-Chef et al. 2004); dose replication, MCML, and 90% CL for β̂ (Stayner et al. 2004).]
Mortality studies using the ORNL data were first reported in 1993. These studies, as well as many other studies of the association between radiation exposure and disease, treated each individual's cumulative exposure during employment, obtained by summing the yearly recorded external radiation doses, as the true dose (Gilbert et al., 1989; Wing et al., 1991; Wilkinson and Dreyer, 1991; Wing et al., 1993; Gilbert et al., 1993; Cardis et al., 1995; From et al., 1997; Richardson and Wing, 1998; Richardson and Wing, 1999; Wing et al., 2000). Because there were many changes in the instruments used to measure the doses and in the policies and practices at the facility, these yearly measurements carry considerable uncertainty. Because doses below the detection limit were recorded as zero, obtaining reliable and accurate exposure estimates is especially problematic at low exposure levels (Kerr, 1994; Wing et al., 1994).
Three different approaches have been used to correct for the possible bias in the recorded yearly external radiation doses (Watkins et al. 1994; Mitchell et al. 1993; Thierry-Chef et al., in press). Each of these dose adjustment methods produced adjusted doses that were made available for use in other studies of the relationship between disease and exposure (Watkins et al., 1994; Mitchell et al., 1997; Thierry-Chef, 2005). Figure 5 illustrates what has been done so far with the ORNL data. The correction method of Mitchell et al. (1997) is not shown in the figure, since no disease-risk analysis was performed using those adjusted doses.
To address discrepancies in the quality of the ORNL data, the Oak Ridge Institute for Science and Education (ORISE), as part of the Health and Mortality Study of the Department of Energy (DOE), conducted an extensive review and revision of the electronically available data on the ORNL cohort (Watkins et al., 1993). Hard copies of the recorded doses (daily, weekly, and quarterly) were also revisited to check the accuracy of the personal monitoring data, and personal record files and vital status files were rechecked. The adjustments to the yearly (electronically) recorded doses are described in more detail in Watkins et al. (1993). Briefly, the implementation depended on the selection of unexposed employment-years (e.g., certain job classifications, such as accountants, were excluded from the adjustment since they were unlikely to have any occupational radiation exposure), the availability of hard-copy monitoring records for doses below the detection limit, and the availability of pocket meter data for 1944-1956 to estimate missing doses (Watkins et al., 1994). The newly adjusted yearly external radiation doses for the ORNL cohort were made publicly available electronically. Studies of the ORNL data conducted both before and after the adjustment treated the available yearly doses as the true doses.
As previously mentioned, the primary concerns for the ORNL data were the changes in the type of device used to measure external radiation and the truncation (or censoring) of low doses due to minimum detectable limits, regardless of whether the device was a pocket meter or a film badge. Using the recorded dose in place of the true dose therefore introduced both bias and uncertainty, with additional uncertainty arising from applying badge doses to organs or to the whole body.
Watkins et al. (1994) and Mitchell et al. (1997) primarily dealt with the truncation introduced into the recorded dose by the minimum detection limits of the devices and by the practice of collecting measurements weekly instead of over longer periods, so that true doses were often truncated and recorded as zero. Watkins et al. (1994) addressed this systematic error by using a regression model to estimate each individual's missing dose, whereas Mitchell et al. (1997) derived a posterior distribution for an individual's true dose. The method of Mitchell et al. (1997) was applied to a small sample of hard-copy dosimetry records from 1945 to 1955. Several issues with this approach merit attention. First, the method must be applied to each employment-year of each individual, which can be extremely costly and time consuming. Second, it ignores possible correlation in dose history: a worker who received zero exposure for four consecutive years, for example, would be more likely to receive zero exposure in the fifth year than a worker who recorded high exposures for four years. In addition, measurement error may be shared by all workers whose badges were evaluated by the same inexperienced technician or who worked at the same site. Ignoring such shared uncertainty may lead to underestimation of variance (Stram and Kopecky, 2003). Both Watkins et al. (1994) and Mitchell et al. (1997) assumed that the errors in the recorded values are independent over time and across subjects; we call these unshared errors.
In contrast, Thierry-Chef et al. (2004) corrected for error shared across the entire cohort by working with bias factors found (in experimental work) to be associated with the various types of dosimeters used, rather than directly with individuals' recorded doses. They studied the behavior of the different dosimeters used in specific time periods and found a different degree of bias for each dosimeter. These studies, however, could only partly characterize the bias of any dosimeter, yielding a posterior distribution for the true bias centered around its best estimate from the experiments. Stayner et al. (in press for Radiation Research) then wrote a computer program based on the results of Thierry-Chef et al. (2004), and in our work this program was used as a "complex dosimetry system" to modify the dose estimates for all members of the cohort. Although the program implements a very simple algorithm, it is "complex" in the sense of Stram and Kopecky (1999): it generates samples of possible doses rather than a single best estimate.
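The distinction between shared and unshared error can be made concrete by simulating dose replications. In the sketch below (our illustration; the recorded doses and log-scale standard deviations are hypothetical), each replication draws one multiplicative error common to the whole cohort, such as a dosimeter bias, plus an independent error for each subject. Shared error makes different subjects' doses move together across replications, which is exactly the dependence that an unshared-error analysis ignores.

```python
import numpy as np

def simulate_dose_replicates(recorded, sd_shared, sd_unshared, n_rep, rng):
    """Return an (n_rep, n_subjects) array of multiplicative-error dose replicates.

    Each replicate draws ONE shared log-error for the whole cohort (e.g. a common
    dosimeter bias) plus an independent unshared log-error for every subject.
    """
    recorded = np.asarray(recorded, dtype=float)
    shared = rng.normal(0.0, sd_shared, size=(n_rep, 1))  # same for all subjects
    unshared = rng.normal(0.0, sd_unshared, size=(n_rep, recorded.size))
    return recorded * np.exp(shared + unshared)

rng = np.random.default_rng(2007)
recorded = [10.0, 25.0, 40.0]  # hypothetical recorded doses (mSv)

with_shared = simulate_dose_replicates(recorded, 0.5, 0.1, 20_000, rng)
unshared_only = simulate_dose_replicates(recorded, 0.0, 0.5, 20_000, rng)

# Correlation between subject 0 and subject 1 across the replications
corr_shared = np.corrcoef(with_shared[:, 0], with_shared[:, 1])[0, 1]
corr_unshared = np.corrcoef(unshared_only[:, 0], unshared_only[:, 1])[0, 1]
print(round(corr_shared, 2), round(corr_unshared, 2))  # roughly 0.96 and near 0
```

Because the shared component does not average out over subjects, the variance of any cohort-level summary stays large no matter how many workers are included.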
2.2.3 Complex Dosimetry System of the ORNL Data
The International Collaborative Study of Cancer Risk among Radiation Workers in the Nuclear Industry (the International Study), supported by the International Agency for Research on Cancer (IARC), was set up to obtain more accurate estimates of radiation-induced cancer risk following protracted low doses of ionizing radiation. A previous paper (Thierry-Chef et al., 2001) described a method for assessing the proportion of the dose in three energy ranges (<100, 100-300, and 300+ keV) using a French multi-element film dosimeter (PS-1) as a spectrometer. These ranges were chosen because the response of film dosimeters is energy dependent and varies particularly in the range 0-300 keV.
As part of the International Study, a validation study by Thierry-Chef et al. (2002) tested a selection of historical types of dosimeters to estimate errors in the recorded doses, so that these errors can be taken into account when estimating cancer risk. A sample of 10 representative dosimeter types, out of 124 different types used over time in the participating facilities, was selected to assess their responses to the energies and geometries of exposure typical of those facilities. In the experiment, the dosimeters were irradiated at known doses and their estimated doses were recorded. Error correction factors for the dosimeter responses were estimated from these experiments as the ratio of the personal dose equivalent assessed by the dosimeter to the known personal dose delivered, i.e., the dose received by the "phantom" that wore the badge in the experiment. A further complication was that the original film types used in the 1950s were no longer available, so current films were used and calibrated to derive doses. This additional source of uncertainty was ignored, and the study provided estimates of the bias factors and of their sampling variability rather than a complete characterization of, or correction for, the biases of the dosimeters. Treating the bias factors as random introduces shared dosimetry error, since many individuals used each dosimeter type.
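The ratio-based correction factors described above can be summarized on the log scale, since the bias factors are later treated as lognormal. The sketch below computes the geometric mean and geometric standard deviation of dosimeter-to-delivered dose ratios for one dosimeter type; the irradiation data are hypothetical, and the expert-panel procedure that produced the published factors involved more than this calculation.

```python
import math
import statistics

def bias_factor_summary(dosimeter_doses, delivered_doses):
    """Geometric mean and geometric SD of the ratios dosimeter dose / delivered dose."""
    log_ratios = [math.log(z / x) for z, x in zip(dosimeter_doses, delivered_doses)]
    gm = math.exp(statistics.mean(log_ratios))
    gsd = math.exp(statistics.stdev(log_ratios))  # needs at least two irradiations
    return gm, gsd

# Hypothetical irradiations of one dosimeter type at known doses (mSv)
delivered = [1.0, 2.0, 5.0, 10.0]
read_back = [1.2, 2.1, 6.0, 11.5]
gm, gsd = bias_factor_summary(read_back, delivered)
print(round(gm, 3), round(gsd, 3))
```

The GSD here describes the spread of individual ratios; the uncertainty in the estimated factor itself would be smaller, which is one reason published uncertainties draw on expert judgment as well.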
Using these error correction factors, a panel of experts then developed factors to correct the biases in the recorded doses for all dosimeters used and characterized the remaining uncertainty in the dose estimates (Thierry-Chef et al., 2005). Overall bias factors were developed for each facility and for each time period in which there was a change in the type of personal dosimeter used or in other work practices and policies that may have affected the accuracy of measurements. These factors were based partly on the variability of the bias factors observed in the validation experiments and partly on expert opinion about the dosimeters that were not directly tested.
Table 2 presents the bias factors and the uncertainty associated with each. There were four bias factors, one for each time period (1943, 1944-1952, 1953-1979, and 1980-1997) in which a change in the film badge used, or in a practice or policy at the ORNL, had a substantial impact on estimating the radiation dose. The bias factor B was defined as the ratio of the reported dosimeter dose Z to the true dose X, i.e., $B_i = Z_i / X_i$. Following the general practice of other uncertainty analyses in radiation epidemiology, each bias factor was assumed to be lognormally distributed, with log-scale mean $\log(B)$ and log-scale standard deviation $\log(K)/1.96$; that is, in terms of the geometric standard deviation (GSD), $K = \mathrm{GSD}^{1.96}$, so that the interval from $B/K$ to $B \cdot K$ covers 95% of the estimated values.
Table 2 Bias Factors and Uncertainties in the ORNL Data
Period        Bias Factor (B)    Uncertainty (K)
1943          1.59               1.53
1944-1952     1.05               1.62
1953-1979     0.93               2.43
1980-1997     0.88               1.68
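A complex dosimetry system built on Table 2 can be sketched as follows: in each replication, draw one bias factor per period from the lognormal distribution with log-scale mean log(B) and standard deviation log(K)/1.96, and divide every recorded dose from that period by the same draw (since B = Z/X). This is our illustration of the idea, not the program of Stayner et al.; the function names and the example doses are hypothetical.

```python
import math
import numpy as np

# Table 2: period -> (bias factor B, uncertainty K)
TABLE2 = {
    "1943": (1.59, 1.53),
    "1944-1952": (1.05, 1.62),
    "1953-1979": (0.93, 2.43),
    "1980-1997": (0.88, 1.68),
}

def sample_bias_factors(rng, n_rep):
    """Draw n_rep bias factors per period: log B ~ N(log B_hat, (log K / 1.96)^2)."""
    return {
        period: np.exp(rng.normal(math.log(b), math.log(k) / 1.96, size=n_rep))
        for period, (b, k) in TABLE2.items()
    }

def replicate_true_doses(recorded, periods, rng, n_rep):
    """recorded[i] is a recorded annual dose Z from period periods[i]; returns X = Z / B.

    Within a replicate, every dose from the same period is divided by the same
    bias factor, so the period's bias acts as a shared error.
    """
    factors = sample_bias_factors(rng, n_rep)
    recorded = np.asarray(recorded, dtype=float)
    b = np.column_stack([factors[p] for p in periods])  # shape (n_rep, n_doses)
    return recorded / b

rng = np.random.default_rng(1)
reps = replicate_true_doses([2.0, 5.0, 1.0], ["1944-1952", "1953-1979", "1953-1979"], rng, 10_000)
print(np.median(reps, axis=0))  # medians near Z / B_hat for each dose
```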
2.2.4 Description of Subjects and Data
The International Collaborative Study of Cancer Risk among Radiation Workers in the Nuclear Industry (the International Nuclear Workers Study), a large multinational epidemiological study, was conducted by the International Agency for Research on Cancer (IARC). The study was initiated in 1993 and includes close to 600,000 workers from 15 countries and 154 facilities (18 of them in the US). The main study population included workers who had been employed in one or more facilities for at least one year and had been monitored for external radiation exposure (x- and γ-rays). The study was restricted to workers whose doses came predominantly from higher-energy photon radiation; workers with possibly substantial doses from internal contamination or neutrons were excluded. For illustration, we restricted the data to one of the study facilities, the Oak Ridge National Laboratory (ORNL), because data from this cohort are publicly available on the DOE CEDR website (http://cedr.lbl.gov). The data used here are based on 5,345 workers who were employed at ORNL between 1943 and 1972. These workers were followed through 1990, and their vital status was ascertained from the Social Security Administration, the National Death Index, and employers. Information on the underlying cause of death and on contributory causes of death related to cancer was obtained from death certificates. There were 1,029 deaths from all causes, of which 225 were due to cancers other than leukemia.
Chapter 3 Update of Exposure Estimates for Colorado Plateau Uranium Miners
Measurement error associated with exposure estimates has merited attention in epidemiologic studies. Several studies have assessed the dose-response relationship using dose estimates from complex dosimetry systems. For example, the Utah Thyroid Cohort Study and the Hanford Thyroid Disease Study estimated individual doses of radioactive iodine delivered to the thyroid gland, and the Colorado Plateau Uranium Miners cohort estimated individual exposures to radon progeny deposited in the lungs.
Residential radon exposure has been widely recognized as a risk factor for lung cancer. However, because accurate measurements of residential exposure are difficult to obtain, risk estimates for low-level radon exposure have been extrapolated from the various underground miner cohorts (Lubin et al., 1995). The Colorado uranium miners data are therefore important for studying protracted exposure to radon, both because detailed information was recorded for individual miners over a 20-year period and because of the study's long follow-up.
Time is considered an important modifier of the exposure-response relationship for many chronic diseases, which complicates extrapolation from miners' data to residential exposures. Periods of radon exposure in mines were much shorter than typical residential exposures, and there is a 10-fold or higher risk associated with miners' radon exposure compared with residential exposure. Additionally, time can be described in many different ways, such as age and calendar time at first and last exposure, age at diagnosis, time since exposure (latency), duration of exposure, and exposure rate, so differences in exposure patterns over time are difficult to capture with common risk analysis methods. Many studies of the Colorado Plateau uranium miners cohort have not fully used the detailed temporal exposure information available in those data.
This chapter therefore describes a data set, now being made publicly available, that reconstructs miners' dose estimates for the Colorado Plateau using a model that incorporates both shared error components and temporal patterns. These data were analyzed previously by Stram et al. (1999); however, that paper did not fully use the hierarchical model that corrects for interpolation errors. In addition, updated information on mine-year estimates is now available (personal communication from Dr. Victor Archer, a former PHS investigator), which reduces the measurement error for certain miners. We therefore refined the dose reconstruction for individual miners by incorporating the updated mine-year dose rates and by fully using the hierarchical interpolation model to characterize the measurement error for the Colorado Plateau Uranium Miners cohort.
As an example of the use of the reconstructed dose estimates, we presented an updated analysis of the two-stage clonal expansion (TSCE) model proposed by Luebeck et al. (1999). We illustrated that Luebeck's results are sensitive to measurement error, especially the promotion effect that was attributed to the inverse dose-rate effect often noted by other investigators. In particular, we showed that the impact of exposure uncertainty on the estimated parameters of the TSCE model is sensitive to classical error, whereas Heidenreich et al. (2004) considered only the Berkson error structure in their analysis of the impact of measurement error in the same data.
Furthermore, we examined the results from the updated complex dosimetry system by computing the average shared and unshared multiplicative and additive error components suggested by Stram and Kopecky (2003). We sampled multiple realizations of true dose from its conditional distribution using the same multilevel model for dose error and analyzed the covariance matrix to show the effect of shared uncertainties.
3.1 Correction for Measurement Error
Here we describe the computation of $E(X \mid Z)$ and the simulation of random realizations from the distribution of $X \mid Z$ used to reconstruct the dosimetry system. These calculations were all based on the following model.
3.1.1 Model for True Dose and Imputation
Let $x_{dlmt}$ denote the log of the true average dose rate $X_{dlmt}$, measured in working level months (WLM), in year $t$ in mine $m$ within locality $l$ within district $d$. The measured values $z_{dlmt}$ (on the log scale) are written as
$$z_{dlmt} = x_{dlmt} + e_{dlmt},$$
where $e_{dlmt} \sim N(0, \gamma^2)$.
The model for the geographical and temporal variation of the true WLM exposure is a multilevel random slope and intercept model:
$$x_{dlmt} = \alpha + a_d^{(D)} + a_{d(l)}^{(L)} + a_{dl(m)}^{(M)} + \left(\beta + b_d^{(D)} + b_{d(l)}^{(L)} + b_{dl(m)}^{(M)}\right) \times t + \varepsilon_{dlmt}.$$
Here α and β are the intercept and slope of an overall linear change in log WLM levels by year. The a's and b's specify random intercepts and slopes at each level of the hierarchy. The model assumes that the intercepts and slopes at each level are uncorrelated with the intercepts and slopes at other levels and with $\varepsilon_{dlmt}$, where the residuals $\varepsilon_{dlmt}$ are normally distributed with mean zero and variance $\sigma^2$. However, the intercepts and slopes at the same level of the hierarchy may be correlated with each other.
Then
$$x_{dlmt} = \begin{bmatrix} 1 & t \end{bmatrix} \begin{bmatrix} \alpha + a_d^{(D)} + a_{d(l)}^{(L)} + a_{dl(m)}^{(M)} \\ \beta + b_d^{(D)} + b_{d(l)}^{(L)} + b_{dl(m)}^{(M)} \end{bmatrix} + \varepsilon_{dlmt}.$$
At each level, the intercepts and slopes were assumed to be random variables with mean-zero normal distributions and homogeneous variance-covariance matrices $D^{(D)}$, $D^{(L)}$, and $D^{(M)}$:
$$\begin{pmatrix} a_d^{(D)} \\ b_d^{(D)} \end{pmatrix} \sim N\left(\mathbf{0}, D^{(D)}\right), \qquad \begin{pmatrix} a_{d(l)}^{(L)} \\ b_{d(l)}^{(L)} \end{pmatrix} \sim N\left(\mathbf{0}, D^{(L)}\right), \qquad \begin{pmatrix} a_{dl(m)}^{(M)} \\ b_{dl(m)}^{(M)} \end{pmatrix} \sim N\left(\mathbf{0}, D^{(M)}\right).$$
The MLN program was used to estimate the parameters $\beta$, $\sigma^2$, $D^{(D)}$, $D^{(L)}$, and $D^{(M)}$, which were then treated as known values (Stram et al., 1998; Table 3).
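To make the nesting of shared effects concrete, the following sketch simulates log dose rates from the random slope and intercept model. It is our illustration, not the MLN or GAUSS code used in this work; the district/locality/mine layout, the year coding, and all parameter values in the usage are hypothetical. Because a district's (a, b) draw is added to every mine-year in that district, all of those mine-years share that component.

```python
import numpy as np

def simulate_log_dose_rates(structure, years, alpha, beta, cov_d, cov_l, cov_m, sigma, rng):
    """Simulate x_dlmt from the multilevel random slope and intercept model.

    structure: {district: {locality: [mine, ...]}} (hypothetical layout).
    cov_d, cov_l, cov_m: 2x2 covariance matrices of (intercept, slope) at the
    district, locality, and mine levels; sigma is the residual SD.
    """
    zero = np.zeros(2)
    x = {}
    for d, localities in structure.items():
        a_d, b_d = rng.multivariate_normal(zero, cov_d)      # shared by the district
        for l, mines in localities.items():
            a_l, b_l = rng.multivariate_normal(zero, cov_l)  # shared by the locality
            for m in mines:
                a_m, b_m = rng.multivariate_normal(zero, cov_m)
                for t in years:
                    intercept = alpha + a_d + a_l + a_m
                    slope = beta + b_d + b_l + b_m
                    x[(d, l, m, t)] = intercept + slope * t + rng.normal(0.0, sigma)
    return x

layout = {"D1": {"L1": ["M1", "M2"], "L2": ["M3"]}, "D2": {"L3": ["M4"]}}
cov = np.array([[0.25, -0.01], [-0.01, 0.003]])  # illustrative values only
sims = simulate_log_dose_rates(layout, range(5), 1.3, -0.1, cov, cov / 5, cov, 0.9,
                               np.random.default_rng(0))
```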
Table 3 Results of Fitting Multilevel Random Slope and Intercept Model by Maximum Likelihood (Stram et al. 1999)
Model parameter                                                          Estimate     Standard error
Overall intercept, $\alpha$                                              1.269¹       0.124
Overall slope, $\beta$                                                   -0.111¹      0.012
Variance of district-level intercepts, $a_d^{(D)}$                       0.252¹       0.0953
Variance of district-level slopes, $b_d^{(D)}$                           0.0029¹      0.0010
$\mathrm{Cov}(a_d^{(D)}, b_d^{(D)})$                                     -0.0144¹     0.0075
Variance of locality-level intercepts, $a_{d(l)}^{(L)}$                  0.0384²      0.0212
Variance of locality-level slopes, $b_{d(l)}^{(L)}$                      0            0.0002
$\mathrm{Cov}(a_{d(l)}^{(L)}, b_{d(l)}^{(L)})$                           0            0.0025
Variance of mine-level intercepts, $a_{dl(m)}^{(M)}$                     0.3256¹      0.0367
Variance of mine-level slopes, $b_{dl(m)}^{(M)}$                         0.0004¹      0.0005
$\mathrm{Cov}(a_{dl(m)}^{(M)}, b_{dl(m)}^{(M)})$                         -0.01851¹    0.003334
Variance of mine-year level random effects, $e_{dlmt}$                   0.8323¹      0.02542
¹ p<0.001
² p<0.006
In addition, the mine-year measurements themselves contain interpolation and extrapolation errors. These errors occurred because mines were not monitored continuously and the number of measurements taken in each mine varied. Since only natural ventilation was used in the early years, the concentration of radon daughters could fluctuate widely depending on air temperature and barometric pressure. We therefore attempted to correct for this sampling error by incorporating a model for the errors in the mine-year measurements relative to the true mine-year average dose.
3.1.2 Mine-Year Sampling Error
The imputation model estimates the true value of the exposure conditional on the observed measurements. The conditional expectation and variance of $x_{dlmt}$, $E(x_{dlmt} \mid z_1, z_2, \ldots, z_n)$ and $\mathrm{Var}(x_{dlmt} \mid z_1, z_2, \ldots, z_n)$, can be computed from the theory of the multivariate normal distribution as
$$E(x \mid \mathbf{z}) = \mu_x + \Sigma_{xz}\Sigma_{zz}^{-1}(\mathbf{z} - \mu_z),$$
$$\mathrm{Var}(x \mid \mathbf{z}) = \Sigma_{xx} - \Sigma_{xz}\Sigma_{zz}^{-1}\Sigma_{xz}^{T},$$
where $x = x_{dlmt}$ and $\mathbf{z} = (z_1, z_2, \ldots, z_n)$; $\mu_x$ and $\mu_z$ are the unconditional expectations of x and z, respectively; $\Sigma_{xx}$ and $\Sigma_{zz}$ are the unconditional variances of x and z, respectively; and $\Sigma_{xz} = \mathrm{Cov}(x, \mathbf{z})$. For more detailed calculations of the expectation and variance, see the Appendix. To produce the random realizations, we sampled from these conditional distributions on the log scale, then exponentiated and re-linked the results to the miners' work histories.
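The conditional-normal step and the sampling step can be sketched in a few lines of NumPy. This is our illustration of the formulas $E(x \mid \mathbf{z}) = \mu_x + \Sigma_{xz}\Sigma_{zz}^{-1}(\mathbf{z} - \mu_z)$ and $\mathrm{Var}(x \mid \mathbf{z}) = \Sigma_{xx} - \Sigma_{xz}\Sigma_{zz}^{-1}\Sigma_{xz}^{T}$, not the GAUSS program used in this work; the example covariances are hypothetical (one mine-year with prior variance 1.0 and two measurements with error variance 0.5).

```python
import numpy as np

def conditional_normal(mu_x, mu_z, sigma_xx, sigma_xz, sigma_zz, z):
    """Mean and covariance of x given z when (x, z) are jointly normal.

    E(x|z)   = mu_x + S_xz S_zz^{-1} (z - mu_z)
    Var(x|z) = S_xx - S_xz S_zz^{-1} S_xz^T
    """
    mu_x = np.atleast_1d(np.asarray(mu_x, dtype=float))
    sigma_xx = np.atleast_2d(np.asarray(sigma_xx, dtype=float))
    sigma_xz = np.atleast_2d(np.asarray(sigma_xz, dtype=float))  # shape (p, n)
    sigma_zz = np.asarray(sigma_zz, dtype=float)                 # shape (n, n)
    resid = np.asarray(z, dtype=float) - np.asarray(mu_z, dtype=float)
    # S_xz S_zz^{-1}, computed with a linear solve instead of an explicit inverse
    weights = np.linalg.solve(sigma_zz, sigma_xz.T).T
    return mu_x + weights @ resid, sigma_xx - weights @ sigma_xz.T

def sample_realizations(cond_mean, cond_cov, n_rep, rng):
    """Draw log-scale realizations and exponentiate them to the arithmetic scale."""
    return np.exp(rng.multivariate_normal(cond_mean, cond_cov, size=n_rep))

# Hypothetical mine-year: Var(x) = 1.0, two measurements with error variance 0.5,
# so Var(z_k) = 1.5 and Cov(z_1, z_2) = Cov(x, z_k) = 1.0.
mean, var = conditional_normal(0.0, [0.0, 0.0], 1.0, [1.0, 1.0],
                               [[1.5, 1.0], [1.0, 1.5]], [1.0, 3.0])
print(mean, var)  # approximately [1.6] and [[0.2]]
```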
Using the multilevel model, Stram et al. (1999) imputed mine-year dose rates from the average mine-year measurements and from guesstimates based on the exposure data and the work history data. The dates of work history and birth were linked to estimate the dose rate by age in years. All age calculations were done by rounding down the difference in months between the birth date and the end date of interest, divided by 30.5.
As in Stram et al. (1999), we used the same multilevel model and treated both the actual measurements and the guesstimates as observed measurements. Note that each imputation uses all the observed measurements from the district of that mine to estimate the dose rate. Of course, for a mine-year with many measurements, the imputation largely reflects the average of those measurements alone. For a pseudo-mine whose location is completely unknown (no locality or district information), the imputation was implicitly equivalent to that for a mine in a district with no measurements at any level of the hierarchy (state, district, or locality); i.e., estimates from the marginal distribution were computed. Between these two extremes are mine-years in mines with few or no measurements, for which the district and locality information were used in the imputation.
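Why the imputation moves between these two extremes can be seen in a deliberately simplified one-level version of the model (an assumption for illustration only, not the full hierarchy used here): a mine-year's true log dose rate has prior mean $\mu$ and variance $\tau^2$, and each of its $n$ measurements adds independent error with variance $\gamma^2$. The conditional-normal formulas then reduce to

$$E(x \mid z_1,\ldots,z_n) = \mu + \frac{n\tau^2}{n\tau^2 + \gamma^2}\,(\bar z - \mu), \qquad \mathrm{Var}(x \mid z_1,\ldots,z_n) = \frac{\tau^2\gamma^2}{n\tau^2 + \gamma^2}.$$

With many measurements the weight $n\tau^2/(n\tau^2+\gamma^2)$ approaches 1 and the imputation is essentially the measurement average $\bar z$; with $n = 0$ it is the marginal mean $\mu$ with variance $\tau^2$. In the full model, district and locality measurements play the role of $\mu$ for mines with few data.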
These estimators of $E(x \mid \text{all measurements})$ and $\mathrm{Var}(x \mid \text{all measurements})$ are on the log scale. We transformed them to the arithmetic scale by
$$E(X \mid Z) = \exp\left\{E(x \mid z) + \tfrac{1}{2}\mathrm{Var}(x \mid z)\right\},$$
$$\mathrm{Cov}(X_i, X_j \mid Z) = \rho_{ij} \exp\left\{E(x_i \mid z) + E(x_j \mid z) + \tfrac{1}{2}\left[\mathrm{Var}(x_i \mid z) + \mathrm{Var}(x_j \mid z)\right]\right\}, \qquad \rho_{ij} = \exp\left\{\mathrm{Cov}(x_i, x_j \mid z)\right\} - 1.$$
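The log-to-arithmetic transformation uses the standard lognormal moment identities $E(X_i) = \exp\{E(x_i) + \tfrac12 \mathrm{Var}(x_i)\}$ and $\mathrm{Cov}(X_i, X_j) = [\exp\{\mathrm{Cov}(x_i, x_j)\} - 1]\,E(X_i)E(X_j)$. A minimal NumPy sketch (our illustration; the function name is hypothetical) computes the whole mean vector and covariance matrix at once; the diagonal reproduces the familiar lognormal variance.

```python
import numpy as np

def lognormal_moments(mean_log, cov_log):
    """Arithmetic-scale mean vector and covariance matrix of X = exp(x), x ~ N(mean_log, cov_log)."""
    mean_log = np.asarray(mean_log, dtype=float)
    cov_log = np.asarray(cov_log, dtype=float)
    mean = np.exp(mean_log + 0.5 * np.diag(cov_log))   # E(X_i | Z)
    rho = np.expm1(cov_log)                            # rho_ij = exp{Cov(x_i, x_j | z)} - 1
    return mean, rho * np.outer(mean, mean)            # Cov(X_i, X_j | Z)

# Hypothetical conditional moments for two mine-years on the log scale
m, c = lognormal_moments([0.0, 0.5], [[0.2, 0.1], [0.1, 0.3]])
```

Note that `np.outer(mean, mean)` supplies exactly the factor $\exp\{E(x_i) + E(x_j) + \tfrac12[\mathrm{Var}(x_i) + \mathrm{Var}(x_j)]\}$.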
3.2 Two-Stage Clonal Expansion (TSCE) Model
One important concern raised by many epidemiologists is how the time and age dependence of cancer incidence rates resulting from various temporal patterns of carcinogenic exposure depends on the way an environmental agent induces cancer and on the interactive effects of exposures. These patterns of dependence on age and exposure have been described in relation to models of carcinogenesis in which a normal cell proceeds by multiple mutations to become a malignant cell. Armitage and Doll (1957) first introduced the multistage theory, which assumed that a single normal cell could generate a malignant tumor only after undergoing a certain number of heritable changes. The underlying assumption was that, in the absence of any specific carcinogenic insult, a normal cell must go through a specific sequence of independent cellular changes, change $i$ occurring at a constant background (spontaneous) rate $\lambda_i$, in order to become a malignant tumor. This stochastic multistage model was proposed to account for the age-dependent increase in age-specific incidence rates observed for many human carcinomas, and the Armitage and Doll model fits age-specific incidence rates that rise with the fifth to seventh power of age (Moolgavkar and Knudson, 1981). However, in the analysis of many cancer data sets, including the uranium miners, the hazard functions approach a finite asymptote at the higher exposure rates, so that the fit of the Armitage and Doll model worsens with increasing age (Moolgavkar et al., 1993). Because of this positive dependence on age, the multistage model does not fit childhood cancers, cancers of the breast, ovary, or testis, or Hodgkin's disease, which show a decreasing slope of incidence at later ages (Moolgavkar and Knudson, 1981). In addition, the multistage model depends heavily on the stage of the process and on time since exposure (Day and Brown, 1980; Brown and Chu, 1983; Brown and Chu, 1983b; Freedman and Navidi, 1989). More importantly, the Armitage and Doll model does not account for the multiplication and death of cells in any preneoplastic compartment.
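The link between the number of stages and the power of age can be illustrated with the standard small-rate approximation of the Armitage and Doll model, $h(t) \approx \lambda_1 \lambda_2 \cdots \lambda_k \, t^{k-1} / (k-1)!$, which is not derived in this chapter. On a log-log scale the hazard rises with slope $k-1$, so incidence rising with the fifth to seventh power of age corresponds to roughly six to eight stages. The stage rates below are hypothetical.

```python
import math

def armitage_doll_hazard(t, rates):
    """Approximate hazard at age t for a k-stage model with constant stage rates.

    Uses the small-rate approximation h(t) ~ (product of rates) * t**(k-1) / (k-1)!.
    """
    k = len(rates)
    return math.prod(rates) * t ** (k - 1) / math.factorial(k - 1)

def log_log_slope(t1, t2, rates):
    """Slope of log h(t) against log t between ages t1 and t2."""
    h1, h2 = armitage_doll_hazard(t1, rates), armitage_doll_hazard(t2, rates)
    return (math.log(h2) - math.log(h1)) / (math.log(t2) - math.log(t1))

rates = [1e-3] * 6  # hypothetical: six stages, each with rate 1e-3 per year
print(round(log_log_slope(40, 80, rates), 6))  # 5.0: incidence rises as age**5
```

Because the approximate hazard keeps increasing as a power of age, it cannot reproduce the flattening or declining incidence described above.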
Moolgavkar and Knudson (1981) proposed a two-stage model that incorporates the differential growth of cells and also fits data on embryonic cancers, whose age-specific incidence behaves differently with age than that of adult cancers. Moolgavkar and Knudson argued that their two-stage model is biologically reasonable, since no more than two distinct stages have been experimentally demonstrated, and that it is consistent with the development of homozygosity at a cancer gene locus.
3.2.1 Biological Rationale
The two-stage model first proposed by Moolgavkar and Knudson (1981) describes the cellular changes by which a normal stem cell becomes a malignant cell in two mutational steps. It is assumed that each cell undergoes transformation independently and that one malignant cell is sufficient to give rise to a malignant tumor. In addition, Moolgavkar and Knudson assumed that the statistical fluctuations about the mean time to tumor detection are small once a malignant cell has been generated. The two-stage clonal expansion model allows explicit consideration of the effects of a carcinogenic agent on the pathway to cancer (the initiation, transformation, and proliferation of intermediate cells), so that the results can be interpreted in terms of key biological events (Luebeck et al., 1999).
Figure 6 describes the pathway by which a normal stem cell becomes a malignant cell, as proposed by Moolgavkar and Knudson (1981). A stem cell can divide into two daughter stem cells, differentiate (or die), or divide into one normal cell and one cell that has suffered the first event and become an initiated cell. An initiated cell can divide into two daughter initiated cells, differentiate (or die), or divide into one initiated cell and one cell that suffers the second event and becomes a malignant cell, which gives rise to a malignant tumor. In general, when a stem cell divides, one daughter cell is committed to differentiation whereas the other remains a stem cell (Cairns, 1975).
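Figure 6 Pathway for a Malignant Cell Described by TSCE Model
[Diagram: normal stem cells (X) become initiated cells (I) at rate νX; initiated cells divide (I → I I) at rate α, die or differentiate (D) at rate β, and become malignant cells (M) at rate µ.]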
Moolgavkar (1983) argued that a distinction between exposure to initiators and exposure to promoters is necessary, since these two classes of carcinogens have different modes of action in carcinogenesis: initiators affect the transition rates, and promoters affect the kinetics of growth. Note that proliferation of initiated cells leads to an exponential increase in the number of intermediate cells, which in turn increases the probability that one of these initiated cells will undergo the second mutation and become a malignant cell. He also argued that if a carcinogenic agent changes the initiation rate ν, then the risk of an exposed person relative to an unexposed person remains constant over time, whereas if the exposure changes the proliferation rate (the net proliferation rate α − β − µ), then the relative risk increases with time. Moolgavkar suggested that this increase in proliferation is responsible for an increase in the probability of the critical second mutation (Moolgavkar, 1983). Using Moolgavkar's two-stage clonal expansion model, Luebeck et al. (1999) found that radon exposure affects both the initiation and the proliferation of intermediate cells, with the radon effect on promotion being the strongest and most statistically significant. Hence, they attributed the inverse dose-rate effect seen in the study to the promotional effect of radon exposure.
3.2.2 Mathematical Formulation
The mathematical formulation of the TSCE model is described in detail in Luebeck et al. (1999) and is beyond the scope of this work. The TSCE model predicts an inverse dose-rate effect if exposure is estimated to increase the proliferation rates of intermediate cells. The following nine-parameter dose-response model was used:
() () ()[] ( ) ( ) [ ] () { } t R y t S y y y t R t S y
4
2
3 2 1
exp 1 1 1877 year birth , + − − + − =
() () () ( ) [ ] ( ) [ ] {} t R g g t S I g g t R t S g
3 2 1 0
1 log 0 1 , + + > + = ,
where S(t) is the smoking rate and R(t) is the radon exposure rate at age t. Here y measures the rate of initiation of normal cells and g measures the net proliferation of initiated cells. The rate of the second rate-limiting step was assumed to be constant, m = 1, as in previous findings (Moolgavkar et al., 1993).
As mentioned previously, errors in each miner's work history and in the estimated radon exposure measurements both contribute to large errors in assessing individual exposure. It has been noted that the individual radon exposures used in all previous analyses of lung cancer risk were overestimated (Lundin, 1971). In our analysis, we used the estimate of the true dose given the observed dose, $E(X \mid Z)$, derived for radon exposure R(t) to determine the consequences of this overestimation. Classical error was also incorporated in the derivation of R(t). Note that smoking exposure S(t) was not corrected for measurement error.
We used a Fortran program written by Mark Huberman to estimate the parameters and to compare the results from the different single-imputation data sets.
3.3 Complex Dosimetry System and Shared Errors
A program to compute the conditional mean and covariance of radon exposure doses, given the observed dose measurements and the parameter estimates for the model of Stram et al. (1999), was written in the GAUSS programming language. It used the full hierarchical model described previously. These mean and covariance estimates were written to an ASCII file and then imported into SAS Version 9 to calculate estimates for all mine-years (20 years for each mine). One large permanent SAS data set (dosim.sas7bdat) containing the means and covariances for all mine-years was created.
Using the dosimetry system components (means and covariances) stored in the SAS data set dosim.sas7bdat, radon doses were simulated for each miner. This file was first linked to all the miners' work histories to identify the mine-years that needed to be simulated (mine-years in which any miner worked). Conditional means and covariances of these mine-years were extracted from the dosimetry component file. If a mine-year had no conditional mean and covariance in the dosimetry system because no information was available even at the district level, i.e., the miner worked in a "pseudo-mine," then a marginal mean and covariance were calculated.
To simulate the data, a Cholesky decomposition of the covariance matrix was computed for each district (districts are independent of each other). The FACTOR procedure in SAS Version 9 was used for this step. Because the factor matrix output by PROC FACTOR is standardized to unit variance, the standard normal random draws multiplied by this "common factor" matrix must then be multiplied by the corresponding standard deviations.
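The rescaling step can be checked with a short NumPy sketch, in which NumPy stands in for the SAS FACTOR procedure (this is our illustration, not the SAS code used). Factoring the unit-variance correlation matrix and then multiplying by the standard deviations gives exactly the Cholesky factor of the covariance matrix, so simulated draws have the intended covariance. The district covariance in the usage is hypothetical.

```python
import numpy as np

def scaled_correlation_factor(cov):
    """Cholesky factor of the correlation matrix, rescaled by the standard deviations.

    Mirrors the two-step approach: factor a unit-variance matrix, then multiply by
    the standard deviations. The result equals numpy.linalg.cholesky(cov).
    """
    cov = np.asarray(cov, dtype=float)
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)
    return sd[:, None] * np.linalg.cholesky(corr)

def simulate_district(mean, cov, n_rep, rng):
    """Draw n_rep realizations of one district's mine-year log dose rates."""
    factor = scaled_correlation_factor(cov)
    draws = rng.standard_normal((n_rep, len(mean)))
    return np.asarray(mean, dtype=float) + draws @ factor.T

# Hypothetical conditional covariance for three mine-years in one district
cov = np.array([[0.5, 0.2, 0.1], [0.2, 0.8, 0.3], [0.1, 0.3, 0.6]])
sims = simulate_district([0.0, 1.0, 2.0], cov, 100_000, np.random.default_rng(0))
```

Because districts are independent, this step can be run district by district, which keeps each matrix small.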
The simulated data were re-linked to each miner's work history to estimate the individual miner's cumulative radon exposure in 5-year age intervals. Since miners rarely worked before the age of 15, cumulative dose was calculated in 5-year intervals beginning at age 15. Similarly, no miner worked in a mine after age 75; hence, the last interval was 70-75 years of age.
The final output data describe each individual miner's radon exposure in 5-year age intervals: ages <15, 15-20, 20-25, 25-30, 30-35, 35-40, 40-45, 45-50, 50-55, 55-60, 60-65, 65-70, and 70-75 years. They also include the miner's ID, birth month and year, death month and year, and a simulated data set indicator. We have tentatively generated 1,000 data sets for interested users; in addition, we have posted the SAS macro program with the necessary data sets so that users can simulate a larger number of data sets if necessary. Linking of the miners' work history data to the simulated data to calculate cumulative exposure in 5-year age intervals was done in GAUSS Version 5.0. Users should be aware that generating 1,000 data sets of mine-year data takes approximately 30 minutes, and linking the generated data sets into 5-year age-exposure intervals takes about 5 hours. The linking step takes much longer because it must be done for each individual miner and each simulated data set: for 2,715 miners with 1,000 simulations each, the program loops over 2,715,000 miner-simulation combinations. The linking of the simulated data sets with individual miners' work histories was later improved by using Fortran instead of SAS when we converted the miners' data into one-year age-exposure intervals to estimate parameters using the MCML method (Chapter 5). This cut the time from about 5 hours to 10 minutes.
Stram and Kopecky (2003) suggested some initial exploration of the results
from complex dosimetry systems (i.e. systems that generate dose realizations,
incorporating both shared and unshared uncertainties for the entire cohort). In
particular, they suggested computing the average shared and unshared multiplicative
and additive error components as follows:
(1). Compute the sample variances, $V_i$, and covariances, $C_{ij}$, for subjects $i$ and $j$ ($i \neq j$) over the dose replications.
(2). Regress these variables on the squared means and the products of mean doses, respectively.
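The two steps can be sketched as below. This is a hypothetical Python illustration, not the code used in this study: it simulates replicate doses from a SUMA error model with made-up variance components, computes $V_i$ and $C_{ij}$ over replications, and fits both ordinary least squares regressions. It assumes mean-one multiplicative errors, $X_i = \varepsilon_{SA} + \varepsilon_{Ai} + \varepsilon_{SM}\varepsilon_{Mi} Z_i$, under which $E[C_{ij}] = \sigma^2_{SA} + \sigma^2_{SM}\mu_i\mu_j$ and $E[V_i] = \sigma^2_{SA} + \sigma^2_A + [(1+\sigma^2_{SM})(1+\sigma^2_M) - 1]\mu_i^2$, where $\mu_i$ is subject $i$'s mean dose.

```python
import numpy as np

rng = np.random.default_rng(1)
n, R = 200, 1000                                   # subjects, dose replications
z = rng.lognormal(mean=3.0, sigma=0.5, size=n)     # hypothetical "observed" doses
s2_SA, s2_A, s2_SM, s2_M = 0.0, 25.0, 0.01, 0.1    # hypothetical variance components

# R replications of X_i = eps_SA + eps_Ai + eps_SM * eps_Mi * z_i
eps_SM = 1 + np.sqrt(s2_SM) * rng.standard_normal((R, 1))  # one draw per replication, shared by all
eps_SA = np.sqrt(s2_SA) * rng.standard_normal((R, 1))
eps_M = 1 + np.sqrt(s2_M) * rng.standard_normal((R, n))
eps_A = np.sqrt(s2_A) * rng.standard_normal((R, n))
X = eps_SA + eps_A + eps_SM * eps_M * z            # shape (R, n)

mu = X.mean(axis=0)                                # mean dose per subject
S = np.cov(X, rowvar=False)                        # n x n covariance over replications
V = np.diag(S)                                     # step (1): variances V_i
i, j = np.triu_indices(n, k=1)
C = S[i, j]                                        # step (1): covariances C_ij, i < j

# Step (2): OLS fits (np.polyfit returns [slope, intercept] for degree 1)
slope_C, int_C = np.polyfit(mu[i] * mu[j], C, 1)
slope_V, int_V = np.polyfit(mu ** 2, V, 1)

s2_SM_hat = slope_C                                # shared multiplicative
s2_SA_hat = int_C                                  # shared additive
s2_A_hat = int_V - s2_SA_hat                       # unshared additive
s2_M_hat = (1 + slope_V) / (1 + s2_SM_hat) - 1     # unshared multiplicative
print(s2_SM_hat, s2_SA_hat, s2_A_hat, s2_M_hat)
```

With only one shared draw per replication, the precision of $\hat\sigma^2_{SM}$ is limited by the number of replications rather than the number of subjects.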
The slope term in the regression for $C_{ij}$ directly estimates the average shared multiplicative uncertainty. The other shared and unshared components are also easily calculated from the regression estimates for $C_{ij}$ and $V_i$. The effect of shared dose uncertainty on statistical power was examined using the following approximation of the relationship between the true power, $1-\beta^*$, and the assumed power, $1-\beta$, computed ignoring measurement error:

$$z_{1-\beta^*} = \frac{z_{1-\beta}}{\sqrt{1 + \sigma^2_{SM}\left(z_{1-\alpha} + z_{1-\beta}\right)^2}},$$

where $z_p$ is the $p$th quantile of the standard normal distribution and $\alpha$ is the type I error rate.
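The power approximation can be evaluated numerically as below. This Python sketch assumes the form of the approximation given above with a one-sided test; it uses only the standard library's `statistics.NormalDist`. The values of $\alpha = 0.05$ and nominal power 0.80 are illustrative, and $\sigma^2_{SM} = 2.85 \times 10^{-3}$ is the estimate reported later in this chapter, set against a much larger 0.25 for contrast.

```python
from math import sqrt
from statistics import NormalDist

def power_with_shared_error(alpha, nominal_power, s2_SM):
    """Approximate true power 1 - beta* given the nominal power 1 - beta
    computed ignoring shared multiplicative error with variance s2_SM."""
    N = NormalDist()
    z_a = N.inv_cdf(1 - alpha)            # z_{1-alpha}
    z_b = N.inv_cdf(nominal_power)        # z_{1-beta}
    z_b_star = z_b / sqrt(1 + s2_SM * (z_a + z_b) ** 2)
    return N.cdf(z_b_star)                # 1 - beta*

for s2 in (0.00285, 0.25):
    print(s2, round(power_with_shared_error(0.05, 0.80, s2), 3))
```

With $\sigma^2_{SM} = 0$ the function returns the nominal power, and the loss grows with both $\sigma^2_{SM}$ and the size of $z_{1-\alpha} + z_{1-\beta}$.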
3.4 Results
3.4.1 Single Imputation of Dose
There are three different sets of doses used here:
1. The “recomputed” unadjusted PHS estimates derived from re-linking the
miner’s work history to the mine-year file given to us by the PHS
2. The Stram et al. (1999) adjusted estimates
3. The newly adjusted estimates
Note that dose set 1 is not identical to the original PHS doses used by Luebeck et al. (1999), since additional information was incorporated to recompute the PHS estimates. In most cases, however, they are very similar to those of Stram et al. (1999).
Figure 7 Conditional Mean Expected Radiation Exposure Dose, $E(X \mid Z)$, from 1950 to 1969 by Month
The monthly mean of the “newly adjusted” imputed radiation exposure dose of the Colorado Plateau uranium miners decreased steadily over the period (Figure 7). In January 1950, 10 miners who worked in underground mines received an average of 37.3 WLM (SD=18.0 WLM). By August 1957, when the maximum number of miners (N=1,427) was exposed to radiation, the mean exposure had diminished to 11.2 WLM (SD=12.5 WLM), a decrease of 70%. In April 1969, only 25 miners of the original cohort were still exposed to radiation, at an average of 1.4 WLM (SD=0.7 WLM). Examination of the variability of dose also indicated that the dose measurements in the underground mines became much more accurate over time: although the decrease was less monotone and smooth than for the mean exposure estimates, the coefficient of variation declined steadily (Figure 8).
Figure 8 Mean Coefficient of Variation on Radiation Exposure Dose from
1950 to 1969 by Month
The comparison of total exposure doses between the new and the recomputed PHS estimates showed a linear relationship, although the correlation was only moderate (r=0.75). The estimates were dispersed quite widely around the line of slope 1: 82 of 2,698 miners (3%) had absolute residuals larger than 1,000 WLM (Figure 9). After excluding these “outliers,” the correlation improved from 0.75 to 0.90. The radiation exposure dose estimates from the previous analysis, which did not fully use the additional data, tended to be slightly smaller: mean difference 12.1 WLM (SD=485.6 WLM), median difference 21.9 WLM, range [-14205.0, 1718.2] WLM. The correlation between the old and new estimates was 0.80, and it improved dramatically to 0.99 when the 11 outliers with absolute residuals greater than 1,000 WLM were deleted. In both comparisons, the new estimates were slightly larger, but smaller when the outliers were included. The miners were most exposed between ages 19 and 51, with average WL larger than 10 (Figure 10). No miner was exposed to radiation from underground mining after age 73.
Figure 9 Comparison of Total Expected Dose between New Estimates and the
Recomputed PHS Estimates during 1950 to 1969
Figure 10 Mean Total Expected Dose by Age and Study
Figure 11 Mean Coefficient of Variation Comparison with Mine-Year
Measurement Data with and without Guesstimates by Month
To study the effect of including the guesstimates in the mine-year measurement data, the coefficients of variation (CV) of the exposure estimates calculated with and without the guesstimates were compared (Figure 11). The difference arose largely in 1950, when excluding the guesstimates increased the CV by more than 20% in most months (from 17.3% to 53.0%; Table 4). In subsequent months and years, the increase in CV was very small, ranging from 0.02% to 2.2%.
Table 4 Mean Coefficient of Variation Comparison with Mine-Year Measurement Data with and without Guesstimates in 1950
Month        Mean CV (%)   Mean CV without Guesstimates (%)   Relative Increase in CV (%)
January      115.1         176.1                              53.0
February     137.8         193.4                              40.4
March        145.1         188.9                              30.2
April        145.9         196.4                              34.6
May          168.3         209.5                              24.4
June         166.0         207.5                              25.0
July         185.6         219.9                              18.6
August       179.9         219.0                              21.7
September    188.8         222.4                              17.8
October      191.0         224.0                              17.3
November     184.1         220.2                              19.7
December     177.8         216.3                              21.7
3.4.2 Two-Stage Clonal Expansion Model
The implementation of the Moolgavkar-Knudson two-stage clonal expansion (TSCE) model used here was identical to that of Luebeck et al. (1999), incorporating both smoking and radon exposure. Table 5 compares the maximum likelihood estimates of the model parameters obtained with the various exposure estimates. Both the model fitted with the unadjusted exposure estimates (the recomputed PHS estimates) and Luebeck’s results (fitted using the original PHS doses) showed a weaker dependence of the initiation rate and a stronger dependence of the proliferation rate on radiation exposure than the models fitted with the Stram et al. (1999) adjusted and the newly adjusted exposures. Figure 12 shows the lifetime excess absolute risk (EAR) per WLM per 100,000 at age 70 as a function of the duration of exposure for a total radon exposure of 500 WLM (with exposures centered at age 40), comparing the inverse dose-rate effects under the various exposure estimates. With the recomputed PHS exposure estimates, the inverse dose-rate effect was apparent until around age 35. With the adjusted exposures (used in Stram et al., 1999), however, the inverse dose-rate effect was much smaller and persisted only until around age 25. The inverse dose-rate effect was even smaller with the newly adjusted exposure estimates.
Table 5 Comparison of Maximum Likelihood Estimates of the Model Parameters (lag time = 9 years)
Parameter                                  Luebeck et al. (1999)   Unadjusted (PHS)   Adjusted (Stram et al., 1999)   Newly Adjusted
$\psi_1 \times 10^7$ (year$^{-2}$)         0.077                   0.070              0.055                           0.057
$\psi_2$                                   5.642                   3.000              3.740                           3.329
$\psi_3$ [(cigarettes/day)$^{-2}$]         0.013                   0.023              0.023                           0.023
$\psi_4$ [(WLM/month)$^{-1}$]              0.101                   0.069              0.232                           0.171
$\gamma_0$                                 0.091                   0.114              0.121                           0.114
$\gamma_1$                                 0.485                   0.241              0.211                           0.312
$\gamma_2$                                 1.292                   1.782              2.340                           2.835
$\gamma_3$ [(WLM/month)$^{-1}$]            0.897                   0.332              0.192                           0.149
To further illustrate that the correction for measurement error causes the difference seen in Figure 12, we added classical error, assumed to follow a normal distribution with mean 0 and variance 0.25, to the newly adjusted doses and refitted the model using these estimates as well (Figure 13). In Figure 14, the baseline EARs were subtracted from each set of estimates to better compare their inverse dose-rate effects (in both Figures 13 and 14, the exposures were centered at age 40 and the lag time was 9 years). Figure 14 also shows that the estimates with classical error behaved similarly to the estimates from the PHS doses and from Luebeck et al. (1999), especially for durations longer than 20 years.
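Why added classical error changes a fitted dose-response can be illustrated with the textbook regression-dilution result for a linear model: if $W = X + U$ with $U$ independent of $X$, the OLS slope of $Y$ on $W$ is attenuated by $\lambda = \mathrm{Var}(X)/(\mathrm{Var}(X) + \mathrm{Var}(U))$. The Python sketch below is a generic illustration with made-up values (unit-variance true exposure, true slope 2, error variance 0.25); it is not the TSCE analysis, where the effect of classical error on the fitted parameters is more complex.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
x = rng.normal(0.0, 1.0, n)                 # true exposure (hypothetical scale, variance 1)
y = 2.0 * x + rng.normal(0.0, 1.0, n)       # linear outcome with true slope 2
w = x + rng.normal(0.0, np.sqrt(0.25), n)   # observed exposure with classical error, variance 0.25

slope_true = np.polyfit(x, y, 1)[0]
slope_noisy = np.polyfit(w, y, 1)[0]
lam = 1.0 / (1.0 + 0.25)                    # Var(X) / (Var(X) + Var(U))
print(slope_true, slope_noisy, 2.0 * lam)
```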
Figure 12 Lifetime EAR per WLM at Age 70 as a Function of the Duration of Exposure for Total Radon Exposure of 500 WLM (The Exposures were centered at Age 40; lag time = 9 years)
[Plot: lifetime EAR/WLM × 100,000 versus duration of exposure (years), total radon exposure = 500 WLM; curves for the recomputed PHS, Stram et al. (1999), and newly adjusted estimates.]
Figure 13 Lifetime EAR per WLM at Age 70 as a Function of the Duration of Exposure for Total Radon Exposure of 500 WLM with Classical Error (The Exposures were centered at Age 40; lag time = 9 years)
[Plot: lifetime EAR/WLM × 100,000 versus duration of exposure (years), total radon exposure = 500 WLM; curves for the recomputed PHS, Stram et al. (1999), newly adjusted, and newly adjusted with classical error (CE) estimates.]
Figure 14 Increased Lifetime EAR per WLM at Age 70 as a Function of the Duration of Exposure for Total Radon Exposure of 500 WLM with Classical Error (The Exposures were centered at Age 40; lag time = 9 years)
[Plot: increased lifetime EAR/WLM × 100,000 versus duration of exposure (years), total radon exposure = 500 WLM; curves for the recomputed PHS, Stram et al. (1999), newly adjusted, newly adjusted with classical error (CE), and Luebeck et al. estimates.]
3.4.3 Complex Dosimetry System and Shared Errors
Figure 15 Scatter Plot of Covariance by Product of Means
Table 6 Distributions of between-Miner Covariances and Products of Means
Minimum Maximum Mean Median*
Covariance -2730.65 52716.42 9.98 -0.25
Product of means 2.32 311400.46 4933.38 2357.53
* Among 300 randomly sampled miners
Table 7 Simple Linear Regression (OLS) of Covariance on Product of Means
                                   Estimate          Standard error
Intercept ($\sigma^2_{SA}$)        -4.09             0.13
Slope ($\sigma^2_{SM}$)            2.9 × 10^-3       1.50 × 10^-5
Figure 15 shows a scatter plot of the between-miner covariances of total dose, estimated over the 1,000 simulated data sets, against the products of the miners’ mean total doses. All 3,684,255 distinct pairs among the 2,715 miners were used in regressing the covariances on the products of means. Tables 6 and 7 give descriptive statistics for the covariances and products of means and the parameter estimates from the linear regression shown in Figure 15. The negative intercept, which estimates $\sigma^2_{SA}$, was interpreted as indicating no shared additive error, so $\sigma^2_{SA}$ was set to 0. Figure 16 plots the 2,715 variance estimates against the squares of the mean total cumulative dose estimates. In this regression, the intercept estimates the sum of the shared and unshared additive error variances, $\sigma^2_{SA} + \sigma^2_A$. Since the shared additive error was set to 0 in the previous analysis, the intercept (Table 9) estimates the unshared additive error variance, $\sigma^2_A = 547.15$ WLM². The slope in this analysis estimates $(1+\sigma^2_{SM})(1+\sigma^2_M) - 1$. Solving this equation with $\sigma^2_{SM} = 2.85 \times 10^{-3}$, calculated previously, gives $\sigma^2_M = 0.215$ (multiplicative error variances are dimensionless). The effect of the squared means on the variances was statistically significant (p<0.0001). Together, Figures 15 and 16 illustrate a strong multiplicative component of error; however, the unshared components appear to be far stronger than the shared components. We discuss possible implications of this finding below.
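The last step above, recovering $\sigma^2_M$ from the variance-regression slope, is simple arithmetic; it is sketched here in Python for checking. The function name is hypothetical. With the rounded slope of 0.22 reported in Table 9, the result is about 0.217, close to the reported 0.215; the small gap reflects rounding of the slope.

```python
def unshared_multiplicative_variance(slope_V, s2_SM):
    """Solve (1 + s2_SM) * (1 + s2_M) - 1 = slope_V for s2_M."""
    return (1.0 + slope_V) / (1.0 + s2_SM) - 1.0

# Rounded slope from Table 9 and sigma^2_SM from the covariance regression
s2_M = unshared_multiplicative_variance(0.22, 2.85e-3)
print(round(s2_M, 3))
```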
In the analysis of covariances and products of means, the slope was statistically significantly different from 0 (p<0.0001), which indicates multiplicative shared error; however, the estimate $\sigma^2_{SM} = 2.85 \times 10^{-3}$ was too small to have much effect on statistical power (data not shown).
Figure 16 Scatter Plot of Variance by Square of Means
Table 8 Distributions of between-Miner Variances and Squares of Means
Minimum Maximum Mean Median
Variance 0.12 113313.82 2484.12 873.69
Square of means 2.32 344744.39 8873.10 2775.14
Table 9 Simple Linear Regression (OLS) of Variances on Squares of Means
                                                        Estimate    Standard error
Intercept ($\sigma^2_{SA} + \sigma^2_A$)                547.15      72.61
Slope ($(1+\sigma^2_{SM})(1+\sigma^2_M) - 1$)           0.22        3.37 × 10^-3
3.5 Discussion
The importance of accurate exposure estimates is often understated or ignored in epidemiologic studies. In the Colorado Plateau Uranium Miners Cohort dataset provided by the PHS, exposure is represented cumulatively over specified time intervals. However, sampling error and interpolation/extrapolation error in the dose-rate estimation process may attenuate risk estimates. Brugmans et al. (2002) discussed possible explanations for why the promotion effect in radiation carcinogenesis has been overrated, but did not mention measurement error as an explanation. We have shown the effects of correcting for measurement error in the Colorado data, using the TSCE model as an example. Stram et al. (1999) attempted to improve these estimates by using a hierarchical multilevel model for their risk analysis; we further refined the estimates by using the updated mine-year measurements later provided by the PHS and an upgraded multilevel model, and we have made 1,000 replications of dose available for this cohort. These should provide better data to researchers interested in conducting studies using the exposure dose-rate estimates.
In this chapter, we have dealt with the interpolation error by adapting a multilevel model and calculating the conditional expectation and variance of the true exposure given the observed measurements under that model. We have ignored the possible Berkson error that may arise from within-mine differences in miners’ exposures. Recall that the measurements were taken during unannounced spot visits by PHS investigators. Also, job descriptions may vary from one miner to another; hence one spot-visit measurement may not accurately represent the exposure dose rates of all miners who worked in that mine. However, most of these underground mines were very small, so miners’ jobs were not specialized during this period (in many cases, all miners participated in all jobs: digging, transporting, loading, etc.; Lundin et al., 1971). Thus, we concluded that the error introduced by within-mine differences would be relatively small and could be ignored. Moreover, the true effect of such Berkson errors may be somewhat milder than that of classical error. We were able to recreate the effect of uncertainty due to classical error (see Figure 14): once classical error was incorporated, the shape of the lifetime EAR per WLM per 100,000 curve became similar to those obtained from the PHS doses and by Luebeck et al. (1999), particularly with respect to the inverse dose-rate effect. Although the effect of classical error on lifetime EAR per WLM per 100,000 for short durations is difficult to explain (the curve seems to increase for durations of 10 years or less; Figure 14), the inverse dose-rate effect was magnified for longer durations.
Finally, error may arise from inadequacies in the miners’ work histories. We handled this issue by using the marginal expectation and variance for “pseudo-mine-years” when the mine location was completely unknown. We are not completely satisfied with this approach and are uncertain how it would affect the risk estimates. Fortunately, only a small portion of the mine-years in the miners’ work history file had no state or district location. Even so, more than 50% of miners in 1950 worked in pseudo-mines; by 1956, less than 30% recalled or had proper work history records for the mines they worked in.
In addition to the possible sources of error mentioned above, uncertainties due to errors shared between measurements can prevent accurate estimation of disease risks from exposure, particularly of the standard errors of the parameter estimates. Stram and Kopecky (2003) examined the effect of shared error on power and showed that shared multiplicative error and power to detect disease risk are inversely related. In this chapter, we first performed a single imputation by adopting the parameter estimates of the multilevel model from the previous study that corrects for exposure measurement error (Stram et al., 1999). Then, to understand the effects of both unshared and shared measurement errors simultaneously, we sampled multiple realizations of true dose from its conditional distribution using the same model for dose error and analyzed the covariance matrix to show the effect of shared uncertainties. The results showed that the multiplicative shared error was very small (although statistically significant), suggesting that the effect of shared dose uncertainty on statistical power was negligible. This may be examined further as described below.
Consistent with Stram and Kopecky’s (2003) emphasis on the effect of errors in individual-level input data on the dosimetry system, we found statistically significant and large unshared multiplicative and additive errors. We have built a model that accounts for individual input data error in the repeated runs of the dosimetry system (investigated further in later chapters). In contrast to the stated view of Heidenreich et al. (2004), our model reflects the view that the uncertainties due to classical error are very important and that a rather complicated correlation structure exists between subjects, which Heidenreich et al. (2004) ignored. In any case, the focus of this study is to learn more about the impact of measurement errors on dose-response relationships. Using the TSCE model as an example of the effect of measurement error associated with the reconstruction of individual levels of radon exposure, we saw that the inverse dose-rate effect may be overstated and, therefore, that the promotional effect of radon on intermediate cells may be overemphasized in Luebeck et al. (1999). Hence, we examined our dosimetry system further by performing additional analyses of the actual lung cancer outcomes of the Colorado Uranium Miners Cohort using all 1,000 replications (Chapter 5).
Recall that the shared multiplicative error is small, but the fit of the simple SUMA model was not very good. Therefore, we performed additional analyses to determine the effect of the complex error structure in our dosimetry system, with emphasis on how to use the 1,000 replications of dose that we produced. Intuitively, one would regress the actual outcome $D_i$ on each realization $X_r$ to get a parameter estimate $\hat{b}_r$, and use the distribution of these estimates as an empirical confidence interval for $b$. However, as described later, this approach produces biased estimates, so it is not a valid way to construct the confidence interval. In the following chapters, we describe and apply a more appropriate use of the replications.
The complete dosimetry system, a simulation program and a linking program
to miners’ work histories written in Fortran, and 1,000 simulated multiple
replications for the entire Colorado Uranium Miners cohort are publicly available
(http://Chp220g.usc.edu/dosimetry).
Chapter 4 Simulation Experiment: MCML and Fully Parametric Bootstrap Methods
In occupational cohort studies, a panel of experts often creates an exposure matrix or a dosimetry system that estimates dose histories for workers, and these estimates are then used in disease-risk analysis. Errors in the exposure matrix that are shared across time and/or groups of workers are generally ignored. Stram and Kopecky (2003) introduced a method that estimates shared and unshared uncertainty components, and evaluation of the Colorado Plateau data using this model indicated that only very minor effects of shared error on power or risk estimation were likely (Chapter 3). However, the assumed linear model did not fit the Colorado data well (under a perfect model, the covariances would lie exactly on a single regression line), so a different approach may be necessary. We tested two different methods, Monte Carlo maximum likelihood (MCML) and the fully parametric bootstrap, in a simple simulation experiment to further study the effect of shared uncertainties.
4.1 Methods
A simple simulation experiment was conducted to compare the MCML method to the fully parametric bootstrap method. The simulation was conducted as follows:
1. Individual observed doses $Z_i$ were generated once from the distribution of the ORNL cohort, which we approximated as lognormal with mean 0.032 and standard deviation 0.102.
2. For each subject $i$ with $Z_i$ generated in Step 1, $D_i$ was generated under a logistic model using a single sequence of true doses $X_i$ drawn from the SUMA model:
$$X_i = \varepsilon_{SA} + \varepsilon_{Ai} + \varepsilon_{SM}\,\varepsilon_{Mi}\,Z_i,$$
where $\varepsilon_{SA}$ and $\varepsilon_{Ai}$ are shared and unshared (individual) additive errors, respectively, and $\varepsilon_{SM}$ and $\varepsilon_{Mi}$ are shared and unshared multiplicative errors, respectively.
3. An MCML confidence interval was computed using the $D_i$ from Step 2 and 1,000 dose simulations from the SUMA dosimetry system. A grid of values $\beta = (\beta_1, \beta_2, \ldots, \beta_n)$ was set up. The MCML method averaged the conditional logistic likelihoods computed for each set of $X_i$ and each $\beta_k$, and $\hat{\beta}_{MLE}$ was calculated as the value of $\beta_k$ that maximized the average likelihood.
4. Using the $\hat{\beta}_{MLE}$ estimated by MCML in Step 3, fully parametric bootstrap simulations were performed 1,000 times. Each bootstrap replication, $r$, consisted of sampling $X_{ir}$ from the dosimetry system. These $X_{ir}$ were used to generate new outcomes $D_{ir}$, and then $\hat{\beta}_r$ was computed by conditional logistic regression of $D_{ir}$ on $Z_i$. The number of times these $\hat{\beta}_r$ fell within the 90% confidence interval from the MCML method was recorded, and the 5th and 95th percentiles of the $\hat{\beta}_r$ were also recorded to yield a 90% empirical confidence interval.
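Steps 3 and 4 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the program used for the experiment: matched sets are stored as an array with the case in column 0 and its controls in the remaining columns, the standard conditional logistic likelihood for 1:m matched sets is used, likelihoods are averaged over dose replications on the log scale for numerical stability, and the MCML interval is assumed to be a likelihood-ratio interval on the grid. All function names are hypothetical.

```python
import numpy as np

def clogit_loglik(beta_grid, x):
    """Conditional logistic log-likelihood for 1:m matched sets.

    beta_grid : (K,) candidate beta values
    x         : (S, m+1) doses; column 0 is the case, columns 1..m its controls
    returns   : (K,) log-likelihood at each beta
    """
    eta = beta_grid[:, None, None] * x[None, :, :]       # (K, S, m+1)
    log_denom = np.logaddexp.reduce(eta, axis=2)         # log sum_j exp(beta * x_j)
    return (eta[:, :, 0] - log_denom).sum(axis=1)

def mcml_loglik(beta_grid, x_reps):
    """Log of the likelihood averaged over R dose replications, x_reps: (R, S, m+1)."""
    ll = np.stack([clogit_loglik(beta_grid, x) for x in x_reps])   # (R, K)
    return np.logaddexp.reduce(ll, axis=0) - np.log(len(x_reps))

def likelihood_interval(beta_grid, ll, level=0.90):
    """Grid values whose log-likelihood lies within chi2(1)/2 of the maximum."""
    crit = {0.90: 2.706, 0.95: 3.841}[level]             # chi-square(1) quantiles
    keep = beta_grid[2.0 * (ll.max() - ll) <= crit]
    return keep.min(), keep.max()

def bootstrap_summary(beta_r, mcml_ci):
    """Share of bootstrap estimates inside the MCML interval, and 5th/95th percentiles."""
    lo, hi = mcml_ci
    inside = np.mean((beta_r >= lo) & (beta_r <= hi))
    p5, p95 = np.percentile(beta_r, [5, 95])
    return inside, (p5, p95)
```

In Step 3, `x_reps` would hold the simulated dose replications arranged in the same matched-set layout, and $\hat{\beta}_{MLE}$ would be `beta_grid[np.argmax(mcml_loglik(beta_grid, x_reps))]`; in Step 4, `bootstrap_summary` compares the bootstrap estimates with that interval.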
To generate the outcome $D_i$, a simple logistic regression model with baseline
risk of 5% and odds ratio of 2 per log-Sievert was used. A cohort was simulated
until 223 cases were generated and the number of controls was truncated to equal 10
times the number of cases. Then ten controls were randomly assigned to each of the
223 risk sets.
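The cohort and risk-set generation described above can be sketched in Python as follows. This is a hypothetical illustration: the text's “odds ratio of 2 per log-Sievert” is read here as a logit coefficient of $\log 2$ per unit of the dose variable, which is an assumption, and the shared errors $\varepsilon_{SM}$ and $\varepsilon_{SA}$ are drawn once so that the whole simulated cohort shares them (a single sequence of true doses, with $\varepsilon_{Mi}$ fixed at 1). The lognormal parameters are solved from the stated mean 0.032 and standard deviation 0.102.

```python
import numpy as np

def lognormal_params(mean, sd):
    """(mu, sigma) of log Z for a lognormal Z with the given mean and SD."""
    sigma2 = np.log(1.0 + (sd / mean) ** 2)
    return np.log(mean) - sigma2 / 2.0, np.sqrt(sigma2)

def simulate_matched_sets(rng, n_cases=223, m=10, beta=np.log(2.0), p0=0.05,
                          mean=0.032, sd=0.102, s2_SM=0.0025, s2_SA=0.0025,
                          s2_Ai=0.0025, batch=5_000):
    """Return an (n_cases, m+1) array of observed doses Z; column 0 is the case.

    Outcomes come from true doses X = eps_SA + eps_Ai + eps_SM * Z (unshared
    multiplicative error fixed at 1); shared errors are drawn once per cohort.
    """
    mu, sigma = lognormal_params(mean, sd)
    a0 = np.log(p0 / (1.0 - p0))                         # baseline log-odds
    eps_SM = 1.0 + np.sqrt(s2_SM) * rng.standard_normal()
    eps_SA = np.sqrt(s2_SA) * rng.standard_normal()
    zs, ds, n_case = [], [], 0
    while n_case < n_cases:                              # grow the cohort in batches
        z = rng.lognormal(mu, sigma, batch)
        x = eps_SA + np.sqrt(s2_Ai) * rng.standard_normal(batch) + eps_SM * z
        d = rng.random(batch) < 1.0 / (1.0 + np.exp(-(a0 + beta * x)))
        zs.append(z); ds.append(d); n_case += int(d.sum())
    z, d = np.concatenate(zs), np.concatenate(ds)
    stop = np.flatnonzero(d)[n_cases - 1] + 1            # cohort ends at the last needed case
    z, d = z[:stop], d[:stop]
    controls = rng.choice(z[~d], size=n_cases * m, replace=False)  # truncate to m per case
    return np.column_stack([z[d], controls.reshape(n_cases, m)])
```

The returned array has the matched-set layout assumed by a conditional logistic likelihood, with the analysis based on the observed doses $Z_i$ rather than the true doses.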
For the SUMA model, the unshared multiplicative error was fixed at 1, i.e., no individual multiplicative error was simulated. Both $\varepsilon_{SA}$ and $\varepsilon_{Ai}$ were assumed to follow a normal distribution with mean 0 and standard deviation 0.05. The effects of the additive errors and of the shared multiplicative error on the MCML method were tested. The effect of additive errors was examined by comparing simulations with $\sigma^2_{SA} = \sigma^2_{Ai} = 0.0025$ versus $\varepsilon_{SA} = \varepsilon_{Ai} = 0$, with $\sigma^2_{SM} = 0.0025$, and the effect of shared multiplicative error was examined by comparing simulations with $\sigma^2_{SM} = 0.0025$ versus $\sigma^2_{SM} = 0$, with $\sigma^2_{SA} = \sigma^2_{Ai} = 0.0025$. In addition, the MCML method was tested with two different shared multiplicative error variances, $\sigma^2_{SM} = 0.0025$ and $\sigma^2_{SM} = 0.25$, with $\sigma^2_{SA} = \sigma^2_{Ai} = 0.0025$. The purpose of these experiments was to observe the performance of the MCML method with various magnitudes and types of measurement errors. $\varepsilon_{SM}$ was assumed to follow a normal distribution with mean 1. Furthermore, we examined the effect of unshared multiplicative error with $\sigma^2_{SM} = \sigma^2_{SA} = \sigma^2_{Ai} = 0.0025$ and $\sigma^2_{Mi} = 0.01$.
4.2 Results
Figure 17 illustrates the effect of shared multiplicative error. In the absence
of shared multiplicative error (and with very little additive errors), log-likelihood
plots for observed dose and simulated dose were almost identical (Figure 17(a)), and
small shared multiplicative error had very little effect (Figure 17(b)); with the
variance of shared multiplicative error set to 0.0025, the risk estimate from using the
observed dose was 1.25 with a “naïve” 90% confidence interval (ignoring
measurement error) equal to (0.10, 2.41) while the estimate from the MCML method
was 1.20, and its 90% confidence interval was (0.06, 2.34). Figure 17(c) examines
the effect of larger shared multiplicative error. When the variance of shared
multiplicative error was increased to 0.25 from 0.0025, the risk estimate using the
observed dose was 1.31 with “naïve” 90% confidence interval (0.16, 2.46), and the
MCML method yielded 1.01 for the estimate of beta with 90% confidence interval
(0.03, 2.90). Under the small error distribution with and
, the fully parametric bootstrap method yielded the empirical
2
0.0025
SM
σ =
22
0.0025
SA Ai
σσ ==
82
90% confidence interval (-0.11, 2.28) and 89.0% of
ˆ
r
β ’s fell within 90% confidence
interval of the MCML method. In other words, the MCML and the fully parametric
bootstrap method agreed with no measurement error.
Figure 17 Comparison of the Effect of Multiplicative Error ($\sigma^2_{SA} = \sigma^2_{Ai} = 0.0025$): MCML Log-Likelihood versus Uncorrected Log-Likelihood
[Plot panels, log-likelihood versus beta: (a) no shared multiplicative error; (b) variance of shared multiplicative error equal to 0.0025 ($\sigma^2_{SM} = 0.0025$); (c) variance of shared multiplicative error equal to 0.25 ($\sigma^2_{SM} = 0.25$).]
Figures 18(a) and 19(a) illustrate the average likelihoods from the MCML method, and Figures 18(b) and 19(b) illustrate the distribution of $\hat{\beta}_r$ from the fully parametric bootstrap method. Under the large error model, with $\sigma^2_{SM} = 0.25$ and $\sigma^2_{SA} = \sigma^2_{Ai} = 0.0025$, the empirical 90% confidence interval was (-0.41, 2.34), and 85.3% of the $\hat{\beta}_r$ fell within the 90% confidence interval of the MCML method; however, 13.4% fell below the lower limit and only 1.3% were above the upper limit of the MCML interval. Figures 18 and 19 show that as the variance of the shared multiplicative error increased, the distribution of $\hat{\beta}_r$ became more dispersed.
Figure 18 Comparison of the MCML Confidence Interval to the Distribution of the Fully Parametric Bootstrap $\hat{\beta}_r$ when $\sigma^2_{SM} = 0.0025$
[Plot panels: (a) likelihoods from the MCML method versus beta; (b) frequency distribution of $\hat{\beta}_r$ from the fully parametric bootstrap method.]
Figure 19 Comparison of the MCML Confidence Interval to the Distribution of the Fully Parametric Bootstrap $\hat{\beta}_r$ when $\sigma^2_{SM} = 0.25$
[Plot panels: (a) likelihoods from the MCML method versus beta; (b) frequency distribution of $\hat{\beta}_r$ from the fully parametric bootstrap method.]
Figure 20 illustrates the effect of unshared multiplicative error. The uncorrected log-likelihood and the MCML log-likelihood agree almost perfectly, as would be expected for a perfectly linear model, since we are essentially simulating independent Berkson error and the Berkson error model gives unbiased estimates only for a linear model. This probably implies that, although our disease-risk model is logistic rather than linear, it is very close to a linear model over the range of the dose distributions or α used here.
Figure 20 Effect of Unshared Multiplicative Error ($\sigma^2_{SM} = \sigma^2_{SA} = \sigma^2_{Ai} = 0.0025$; $\sigma^2_{Mi} = 0.01$)
[Plot: MCML and uncorrected log-likelihoods versus beta.]
4.3 Discussion
There were several observations from this simple simulation experiment. First, the MCML agrees with the uncorrected likelihood ratio test for small additive and small shared multiplicative error distributions. Second, the MCML approach and the fully parametric bootstrap approach give confidence intervals that agree with each other for the small error model, and both are close to the “naïve” confidence intervals in this case. Third, we see a clear widening of the confidence intervals from the MCML and the fully parametric bootstrap methods as the shared multiplicative error gets larger. Fourth, although the confidence intervals widen for both methods under the larger error model, their ranges disagree. In particular, the lower confidence limit for the fully parametric bootstrap method appears to be too low. We expect that correction for nondifferential measurement error should not change inference about the significance of a test that $\beta = 0$ (see Chapter 1 Section 1.8). The MCML and the uncorrected confidence limits from the observed doses are very similar between 0 and the MLE. This observation is in line with the basic idea that correction for nondifferential measurement error should not change inference about the null hypothesis of no association. On the other hand, the fully parametric bootstrap method gives much less evidence against the null hypothesis than either the uncorrected or the MCML likelihoods, which raises doubt about the validity of the approach. To test the validity of the MCML method explicitly, one should apply the MCML method multiple times, observe the distribution of confidence intervals for beta, and count the fraction of times the interval contains the true $\beta$; this is explored further in Chapter 5.
Chapter 5 Applications: MCML and Fully Parametric Bootstrap Methods
In this chapter, we apply the MCML method and the fully parametric bootstrap method to real data: the Colorado Plateau Uranium Miners cohort and the Oak Ridge National Laboratory (ORNL) cohort. The background and the data are described in detail in Chapter 2. First, we made the comparison using the ORNL cohort data, which present a much simpler setting than the Colorado Uranium Miners data. For the ORNL cohort data, the estimation of shared uncertainty in exposures using the MCML method is already in development. The data, consisting of yearly doses for each individual in the ORNL cohort, are publicly available (http://cedr.lbl.gov). Our primary interest is in the results from the fully parametric bootstrap method and in comparing these results to the MCML results. In addition, we further studied the validity of the MCML method by comparing the distribution of $\hat{\beta}_r$ from the fully parametric bootstrap method to the likelihoods from the MCML method with varying shared errors. Finally, we applied the MCML method to the Colorado Uranium Miners data, first to a simpler disease-risk model containing total radiation dose only, and then to a model containing both total dose and dose-rate effects, maximizing over both parameters simultaneously on a two-dimensional grid.
5.1 Fully Parametric Bootstrap Method on the ORNL Data
For the ORNL cohort data, the estimation of shared uncertainty in exposures using the MCML method is already in development. The initial work has been performed using a “black box” computer program that generates workers’ dose histories (i.e., a complex dosimetry system) based on publicly available work and dose histories (http://cedr.lbl.gov). Our primary interest is in the results from a fully parametric bootstrap method and in comparing these results to the MCML results. Since our main purpose in using the ORNL data is to compare the fully parametric bootstrap method with the MCML method, we acquired the data set used by the investigators who studied the MCML method with the ORNL data. This data set consisted of an ID and the yearly external radiation dose for each individual. It included 5,357 ORNL workers who were employed for at least one year between 1943 and 1972, had radiation exposure monitoring, had no substantial dose from internal contamination or neutrons, had no involvement in a radiation accident/incident, and did not receive an annual dose of 250 mSv or more in any year. They were followed through 1990. Among these workers, there were 225 cases of all cancer excluding leukemia.
For the MCML method, analyses of all solid tumor mortality were conducted using a partial likelihood method applied to a linear excess relative risk (ERR) model, which has the basic form $\lambda(t) = \alpha\,[1 + \beta X(t)]$, where $\lambda(t)$ is the hazard rate at time t and $X(t)$ is the individual's dose in sieverts cumulated up to time t, allowing for a disease lag period of 10 years. The data were stratified on sex, age, and socio-economic status to control for the potential confounding effects of these factors. Hence, these case-control risk set data were also acquired for the fully parametric bootstrap analysis.
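The partial likelihood of this linear ERR model for matched risk sets can be sketched in Python. This is a minimal illustration under assumed data structures (the `risk_sets` layout and the function name are hypothetical), not the program used by Stayner et al. or in this analysis: each risk set contributes the case's relative risk $1 + \beta X$ divided by the sum of the relative risks of all members, so the baseline $\alpha$ cancels.

```python
import numpy as np

def linear_err_partial_loglik(beta, risk_sets):
    """Log partial likelihood for the linear ERR model lambda(t) = alpha [1 + beta X(t)].

    risk_sets: list of (doses, case_index) pairs (hypothetical layout). `doses`
    holds the lagged cumulative dose (Sv) of every member of one risk set at the
    case's failure time; `case_index` is the position of the case in `doses`.
    The baseline alpha cancels within each set, so only beta appears.
    """
    total = 0.0
    for doses, case_index in risk_sets:
        rr = 1.0 + beta * np.asarray(doses, dtype=float)  # relative risk 1 + beta X
        if np.any(rr <= 0):
            return -np.inf  # outside the admissible range of the linear model
        total += np.log(rr[case_index]) - np.log(rr.sum())
    return total
```

With beta = 0 every member of a set is equally likely to be the case, so a set of size n contributes -log(n).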
A total of 500 data sets were simulated for the fully parametric bootstrap method. For each simulation, new sets of yearly dose were generated for each individual with a randomly sampled bias factor associated with each year. Note that the same sampled bias factors were applied to the entire cohort in each run of the simulation in order to produce shared dose errors. Then a case was sampled for each risk set under a multinomial distribution with probabilities proportional to each individual's hazard calculated from the newly generated dose.
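One fully parametric bootstrap replicate can be sketched as follows. The log-normal form of the yearly bias factor and its spread `sigma_shared` are assumptions for illustration (the actual dose generator is the "black box" program described above), and the risk-set layout is hypothetical. The point the sketch is meant to show is that a single factor per calendar year multiplies every worker's dose, which is what makes the errors shared, and that the case is then drawn within each risk set with probability proportional to $1 + \beta X$.

```python
import numpy as np

def simulate_fpb_outcomes(yearly_dose, risk_sets, beta, sigma_shared, rng):
    """One fully parametric bootstrap replicate (a sketch).

    yearly_dose: (n_people, n_years) array of recorded yearly doses (Sv).
    risk_sets: list of (member_indices, last_year), where last_year is the last
        calendar-year column counted after applying the 10-year lag.
    sigma_shared: hypothetical SD of a log-normal bias factor per year.
    Returns the sampled case position (within member_indices) for each risk set.
    """
    n_years = yearly_dose.shape[1]
    # One bias factor per year, applied to the entire cohort -> shared error.
    bias = rng.lognormal(mean=0.0, sigma=sigma_shared, size=n_years)
    new_dose = yearly_dose * bias  # broadcast the yearly factors over people
    cases = []
    for members, last_year in risk_sets:
        cum_dose = new_dose[members, : last_year + 1].sum(axis=1)
        hazard = 1.0 + beta * cum_dose  # alpha cancels within the risk set
        probs = hazard / hazard.sum()
        cases.append(rng.choice(len(members), p=probs))  # multinomial draw of one case
    return np.array(cases)
```

Calling the function 500 times with the same generator gives 500 outcome data sets, each with its own draw of shared bias factors.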
The fully parametric bootstrap analysis was done in a case-control setting, where (partial likelihood) relative risk regression was implemented. The partial likelihood approach eliminated the dependence on the nuisance parameter α by stratifying on variables that may affect the disease risk. Hence, the parameters of interest to be estimated were reduced to a single linear dose-response parameter β. We set β equal to 4.82, adopted from the estimate by the MCML method. With each new case-control risk set, β was re-estimated by maximum likelihood on a grid from 0 to 20 with 0.02 intervals, using the recorded doses. Also, whether each new β estimate was within the 90% confidence limits of the MCML β estimate (CL = 0.41, 13.31) was noted.
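The grid search and the coverage check can be sketched in a few lines. `grid_mle` and `within_limits` are hypothetical helper names; any one-parameter log-likelihood (for example the partial likelihood of the linear ERR model) can be passed in as a callable.

```python
import numpy as np

def grid_mle(loglik, lower=0.0, upper=20.0, step=0.02):
    """Maximize a one-parameter log-likelihood over an evenly spaced grid."""
    n = int(round((upper - lower) / step)) + 1
    grid = lower + step * np.arange(n)  # 0, 0.02, ..., 20 with the defaults
    values = np.array([loglik(b) for b in grid])
    return grid[np.argmax(values)]

def within_limits(estimate, limits=(0.41, 13.31)):
    """Was a bootstrap estimate inside the MCML 90% confidence limits?"""
    lo, hi = limits
    return lo <= estimate <= hi
```

For example, `grid_mle(lambda b: -(b - 5.38) ** 2)` returns the grid point 5.38, and `within_limits(5.38)` is true.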
There were 225 cases in the ORNL data. Time was defined as the age of the case at failure. With this time scale, 2 cases had no controls at their times of failure, which yielded 223 case-control sets. There were 17 (7.6%) cases who had zero cumulative lifetime exposure, and the maximum exposure among cases was 277.7 mSv. Mean cumulative exposure for cases was 24.05 mSv with a standard deviation of 42.9 mSv.
The mean of the beta estimates from 500 simulations was 5.72 (ERR/Sv) with a standard deviation of 4.03. This was slightly higher than the initial beta (the estimated beta from the MCML method) of 4.82. The 5th percentile value was -0.0008 and the 95th percentile was 13.25 (compared to the MCML 90% CI of (0.41, 13.31)). Only 86.6% of the estimated betas from the fully parametric bootstrap method were within the 90% CL of the MCML beta estimate, and the 90% empirical confidence interval from the FPB method was wider towards the null (43 out of 500 (8.6%) beta estimates fell below the MCML lower confidence limit). Table 10 shows the summary of the result.
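The summary quantities above (mean, standard deviation, 5th and 95th percentiles, and counts relative to the MCML limits) can be computed from the vector of bootstrap estimates as sketched below. The function name is hypothetical, and NumPy's default linear interpolation for percentiles is an assumption, since the percentile definition used here is not stated.

```python
import numpy as np

def summarize_bootstrap(estimates, mcml_limits=(0.41, 13.31)):
    """Summaries of FPB slope estimates relative to the MCML 90% limits."""
    est = np.asarray(estimates, dtype=float)
    lo, hi = mcml_limits
    return {
        "mean": est.mean(),
        "sd": est.std(ddof=1),  # sample standard deviation
        "empirical_90": (np.percentile(est, 5), np.percentile(est, 95)),
        "n_within": int(np.sum((est >= lo) & (est <= hi))),
        "n_below": int(np.sum(est < lo)),
        "n_above": int(np.sum(est > hi)),
    }
```

With the counts reported above, 433/500 = 86.6% fell within the limits and 43/500 = 8.6% below; by subtraction, 24 estimates (4.8%) lay above the upper limit.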
Table 10 Summary of Beta Estimate from the Fully Parametric Bootstrap Method over 500 Simulations
Method                          Estimate    90% CL            Count (%)*
MCML method (Stayner et al.)    4.82        0.41, 13.31       NA
FPB                             5.72**      -0.0009, 13.25    433 (86.6)
* Number of times a beta estimate fell within the 90% CI of the MCML method
** Mean of 500 estimates
The mean and median of the risk estimates from the fully parametric bootstrap method were similar to the estimate from the MCML method. The 90% CL of the MCML method contained less than 90% of the betas estimated by the FPB method. Similar to the simulation experiment, the CL from the FPB did not have a larger upper bound than the CI from the MCML. Also, note that using the recorded dose estimates yielded an estimate for the slope β of 5.38 with a 90% confidence interval (CI) of 0.54 to 12.58 [34], and the 90% empirical interval of the FPB method had a smaller lower bound than this interval as well.
As we have seen in the simulation experiment, there was a clear widening of the confidence intervals from both the MCML and the fully parametric bootstrap methods, and the ranges of the confidence intervals disagreed. The lower confidence limit for the fully parametric bootstrap method was too small; i.e. it failed to be consistent with the view that correcting for nondifferential dosimetry error should not make a positive result using observed doses no longer significant. (Note again that the MCML and observed dose analyses both gave a 90% CI that did not overlap 0.) Therefore, in the next section we studied the validity of the MCML method by re-applying the method to each of the 500 simulated realizations created from the FPB method, calculating the true coverage and examining the distribution of the 500 confidence intervals calculated from the MCML.
5.2 Validation of the MCML Method
The results from the fully parametric bootstrap method and the MCML method did not fully agree on the impact of shared errors in the ORNL data; therefore, further examination of the MCML confidence interval is necessary. A precise confidence limit for a dose-response coefficient cannot be given if data are reconstructed by a "complex dosimetry system." If the disease outcome in the ORNL data was a random occurrence, i.e. one of many possibilities due to unknown or unaccounted factors, then the confidence interval found using the MCML method by Stayner et al. is one of many possibilities as well. Hence, in order to examine the true confidence interval coverage of β from the MCML method, multiple runs of MCML were executed. For each set of new outcomes $D_r$ generated from running the fully parametric bootstrap, an MCML run provides a confidence interval for the true risk estimate. The validity of the confidence interval is examined by counting the fraction of times in the simulation that the MCML confidence interval contains the true value of β.
As described in the previous section, a total of 500 data sets were simulated for the fully parametric bootstrap method. For each simulation, a new set of yearly doses $X_{ir}$ was generated for each individual with a randomly sampled bias factor that applied to the entire cohort. Then a case was sampled for each risk set under a multinomial distribution with probabilities proportional to the hazard recalculated from the newly generated dose, giving new outcomes $D_{ir}$. Hence, this process produced 500 new case-control data sets.
Each set of $D_{ir}$ was then used to perform the MCML method to calculate $\hat{\beta}_{\mathrm{MLE},r}$ and its confidence interval. The parametric form of the model chosen for the evaluation was an additive (linear) relative risk model, as in the fully parametric bootstrap model,
$\lambda_r(t) = \alpha_r\,[1 + \beta_r X_r(t)]$,
for r = 1, 2, …, 500, where $\lambda_r(t)$ is the hazard rate at time t, $\alpha_r$ is the background hazard rate at time t (not specified by the Cox model), $\beta_r$ is the slope parameter, and $X_r(t)$ is the individual's dose in sieverts cumulated up to time t, allowing for a lag of 10 years between exposure and cancer induction.
As before, the profile partial likelihood was calculated over a grid from 0 to 20 with 0.02 intervals. The profile likelihood was averaged over the sampled dose sets at each grid point, and the MLE of the rth replication, $\hat{\beta}_{\mathrm{MLE},r}$, was then taken to be the value of β for which the averaged profile likelihood $L_r(\beta)$ was at its maximum. Again, the 90% confidence bounds were the values of β for which the difference between $-2\ln[L_r(\beta)]$ and $-2\ln[L_r(\hat{\beta}_{\mathrm{MLE},r})]$ was 2.7055 (the 90th percentile of the chi-square distribution with one degree of freedom).
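The MCML step of averaging likelihoods over sampled dose sets, followed by the 2.7055 likelihood-ratio cutoff, can be sketched as below. The array layout and function names are assumptions for illustration. Note that it is the likelihoods, not the log-likelihoods, that are averaged, so the computation is done on the log scale with the maximum subtracted to avoid overflow.

```python
import numpy as np

CHI2_1DF_90 = 2.7055  # 90th percentile of the chi-square distribution, 1 df

def averaged_loglik(loglik_by_rep):
    """Log of the mean likelihood across dose replications, per grid point.

    loglik_by_rep: (n_reps, n_grid) profile log-likelihoods, one row per
    sampled dose set. Subtracting the column maximum before exponentiating
    keeps exp() from overflowing.
    """
    l = np.asarray(loglik_by_rep, dtype=float)
    m = l.max(axis=0)
    return m + np.log(np.mean(np.exp(l - m), axis=0))

def mle_and_lr_bounds(grid, avg_loglik, cutoff=CHI2_1DF_90):
    """Grid MLE and the likelihood-ratio bounds where -2 * (drop in log L) <= cutoff.

    Assumes a unimodal curve whose 90% bounds lie inside the grid.
    """
    i_max = int(np.argmax(avg_loglik))
    deviance = -2.0 * (avg_loglik - avg_loglik[i_max])
    inside = grid[deviance <= cutoff]
    return grid[i_max], inside.min(), inside.max()
```

For a quadratic log-likelihood centred at 5 with standard error 2, the bounds on the 0.02 grid are 1.72 and 8.28, i.e. about 5 ± 1.645 × 2.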
The mean of the 500 $\hat{\beta}_r$ estimated under the MCML method (with 1,000 simulations for each run) was 5.84. This is still higher than the estimated beta from the MCML method by Stayner et al. and even higher than the mean beta estimate from the fully parametric bootstrap method (4.82 and 5.72, respectively). The 5th percentile value was close to 0 (but greater than 0) and the 95th percentile value was 13.95. Of the 500 $\hat{\beta}_r$ estimates, 38 (7.6%) fell below the lower MCML confidence limit of 0.41, and 29 (5.8%) were greater than the upper MCML confidence limit of 13.31. Of the 500 confidence limits calculated using the MCML, 438 (87.6%) contained the true beta used, β = 4.82 (Table 11); 6.8% and 5.6% of the CLs were below and above the true beta, respectively.
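Classifying each of the 500 MCML intervals relative to the true β, as tabulated in Table 11, can be sketched as follows. The helper name is hypothetical, and we assume that a CL counted as "below" has its upper limit under the true value and one counted as "above" has its lower limit over it.

```python
import numpy as np

def classify_intervals(lower, upper, true_beta=4.82):
    """Count intervals entirely below, containing, and entirely above true_beta."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    below = int(np.sum(upper < true_beta))  # whole interval under the true value
    above = int(np.sum(lower > true_beta))  # whole interval over the true value
    contained = lower.size - below - above
    return below, contained, above
```

Dividing the three counts by the number of intervals gives the percentages in Table 11 (34, 438 and 28 out of 500).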
Table 11 Summary of MCML Validity Test
                              Frequency   Percent
CL below true β = 4.82            34         6.8
CL contained true β = 4.82       438        87.6
CL above true β = 4.82            28         5.6
This experiment showed that the MCML method was quite a feasible way to correct for possible shared uncertainties, at least in this data set. Although slightly more than 5% of the CLs fell below the true β, none of them included 0, which is appropriate as explained above. We saw that slightly less than 90% of the CLs contained the true β; this may be due to simulating only 500 case-control outcome data sets and/or sampling only 1,000 doses for each MCML run. If we were to simulate more outcome data and/or sample 10,000 data sets, as Stayner et al. did for β estimation using the MCML, we might get a better sense of the feasibility of MCML. However, one should keep in mind that simulating 500 outcome data sets and sampling 1,000 doses for each MCML run, as we did here, was already very time intensive.
5.3 MCML Method on the Colorado Plateau Uranium Miners Data
We wrote a computer program that simulates repeated dose histories for the entire Colorado Plateau Uranium Miners cohort to develop a "complex dosimetry system" (Chapter 3). This "complex dosimetry system" computed multivariate realizations from the conditional distribution of the miners' exposure histories given all the measurement data. The program sampled random realizations from the distribution of the log of true dose given log measured dose; then, for each random realization from the full distribution of true dose given observed data, it exponentiated the realizations and re-linked the miners' work histories to form a full multivariate random imputation of histories given all measurement data.
This "complex dosimetry system" used a standard method (e.g. Cholesky decomposition of the variance covariance matrix) to generate the repeated realizations of true dose from their conditional normal distribution. The matrices at the district level were computed once, and repeated realizations were obtained by post-multiplying the Cholesky square root of the variance covariance matrix with random independent standard normal variables and adding the conditional means.
With these 1,000 simulated data sets, shared and unshared uncertainty components were estimated following the method from Stram and Kopecky (2003).
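The Cholesky-based sampling described for the complex dosimetry system can be sketched as follows. We assume that the conditional mean vector and covariance matrix of log true dose given log measured dose have already been computed (here they are simply inputs), and the function name is hypothetical. Row vectors of standard normals are post-multiplied by the transpose of the lower-triangular factor, which is equivalent to $L z$ for column vectors.

```python
import numpy as np

def sample_true_dose(cond_mean, cond_cov, n_reps, rng):
    """Draw replicate true-dose vectors from a conditional normal on the log scale.

    cond_mean: (k,) conditional mean of log true dose given log measured dose.
    cond_cov: (k, k) conditional covariance (computed once, e.g. per district).
    Returns an (n_reps, k) array of doses on the original scale.
    """
    chol = np.linalg.cholesky(cond_cov)                 # lower-triangular L, cov = L L^T
    z = rng.standard_normal((n_reps, cond_mean.size))   # independent N(0, 1) draws
    log_dose = cond_mean + z @ chol.T                   # each row ~ N(mean, cov)
    return np.exp(log_dose)                             # back to the dose scale
```

Because the factorization is done once, each additional replicate costs only one matrix product, which is what makes 1,000 replicates practical.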
Although significant, the shared multiplicative error appeared too small to have much impact on the standard errors of the dose-response parameter estimate. However, as seen in Figures 15 and 16 in Chapter 3, the linear model implemented to estimate the shared uncertainty components did not produce a good fit (a perfect model would lay the covariances exactly on a single regression line). Therefore, a different approach may be necessary.
Hence, we applied the MCML method to multiple realizations already created from the complex dosimetry system of the Colorado Plateau Uranium Miners Cohort in a cohort setting (instead of a case-control setting as in the ORNL data). First, we utilized a disease-risk model that contains total dose only. Then the MCML method was applied to a model that contains both total dose and dose-rate effects. The MCML maximized both parameters simultaneously using a 2-dimensional grid, $\beta = (\beta_{\text{total dose}}, \alpha_{\text{dose-rate}})$.
5.3.1 Total Dose Risk on the Colorado Plateau Uranium Miners Data
Linking the 1,000 already simulated data sets and the miners' work history data set, the simulated doses of individual miners were first converted to average yearly dose in one-year age intervals. These data were then fitted to estimate $\beta_{\text{total dose}}$ with a simple model in which the hazard consists only of exposure dose and age, i.e. $\lambda(t) = \lambda_0(t)\,[1 + \beta X(t)]$, where $X(t)$ is the expected cumulative exposure to radon by age t. We initially set the baseline hazard $\lambda_0(t)$ proportional to the fourth power of age, i.e. $\lambda_0(t) = b\,t^4$. The conditional expected dose was first fitted into this model to estimate initial values of b and β.
First, a grid of b and β values was selected extending up to approximately three standard errors away from the initial estimates, i.e. a total of 40,000 combinations of b and β. Second, a profile likelihood was calculated at each set of b and β for each replication. Third, the 1,000 profile likelihoods were averaged at each set of b and β. The MLE was then chosen to be the combination of b and β that yielded the maximum likelihood.
Figure 21 Log-likelihood Plots of MCML and Single Imputation (log-likelihood against beta, 0 to 1.00; curves: MCML and E(X|Z))
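The three-step grid procedure can be sketched for a cohort likelihood. This sketch assumes a Poisson likelihood for one-year age intervals (one record per miner per year of age, person-time 1 year), a common way to fit such a hazard to grouped cohort data but an assumption here; function names are hypothetical. For each (b, β) pair the likelihood is averaged over replications, and the pair with the largest average is taken as the MLE.

```python
import numpy as np

def cohort_loglik(b, beta, age, cum_dose, deaths):
    """Poisson log-likelihood (up to a constant) for lambda(t) = b t^4 [1 + beta X(t)].

    One record per miner per one-year age interval, so person-time is 1 year.
    """
    rate = b * age**4 * (1.0 + beta * cum_dose)
    return float(np.sum(deaths * np.log(rate) - rate))

def mcml_grid_search(b_grid, beta_grid, age, dose_reps, deaths):
    """Average the likelihood over dose replications at each (b, beta) pair.

    Returns the grid pair with the largest averaged likelihood and the
    averaged log-likelihood surface.
    """
    avg = np.empty((len(b_grid), len(beta_grid)))
    for i, b in enumerate(b_grid):
        for j, beta in enumerate(beta_grid):
            ll = np.array([cohort_loglik(b, beta, age, x, deaths) for x in dose_reps])
            m = ll.max()
            avg[i, j] = m + np.log(np.mean(np.exp(ll - m)))  # log of mean likelihood
    i, j = np.unravel_index(np.argmax(avg), avg.shape)
    return b_grid[i], beta_grid[j], avg
```

Taking, at each β, the maximum of `avg` over the b axis gives the kind of profile curve plotted in Figure 21.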
Using the expected adjusted dose (single imputation as in Chapter 2), the estimate of β was 0.38 ERR/100 WLM, whereas the MCML method yielded an estimate of 0.44 ERR/100 WLM. This difference was less than half a standard error of β. The slope for the adjusted dose was somewhat in between the estimates from unadjusted and adjusted dose in Stram et al. (0.28 and 0.44 ERR/100 WLM, respectively), and the estimate from the MCML method was the same as that for the adjusted dose in Stram et al. At each β, the maximum log-likelihood over the entire grid of b was noted and plotted. Figure 21 shows a plot of the expected dose log-likelihood and the average log-likelihood from the MCML method, i.e. without and with adjusting for shared measurement error. The two log-likelihood plots do not seem to differ much. As noted earlier, the estimate from the expected dose (without measurement error) was slightly smaller than the estimate from the MCML method (with measurement error). More importantly, the confidence limit (the width of the curve) does not appear to be any wider after adjusting for the measurement error; the curve that adjusts for measurement error is simply shifted upward slightly. This may indicate the presence of additive error but no multiplicative error. It is also possible that a few replications that were simulated out of the norm drive the results.
Figure 22(a) plots the first 225 profile likelihoods calculated from 1,000 replications, and Figure 22(b) plots the profile likelihoods with the top 1% maximum likelihoods, along with the average likelihood from the MCML method (thick black line). Because of the very strong dose exposure in these data, we expected that there might be problems in convergence of the MCML (see Section 1.8, page 19). As foreseen, a few replications seem to drive the average likelihood: only 8 replications yielded a maximum likelihood bigger than that of the average likelihood. Figure 22(c) shows the selected profile likelihood curves, excluding the curves that yielded the top 10% of maximum likelihoods (the largest being the 90th percentile and the second largest being the 80th percentile). The thick red line represents the average likelihood of the profile likelihoods that exclude the top 10%. This average likelihood falls between the profile likelihoods whose maximum likelihoods are at the 80th and 70th percentiles.
Figure 22 Plot of Profile Likelihoods and Average Likelihood using the MCML Method
(a). Using First 225 Replications
(b). Using Top 1% Maximum Likelihoods (enlarged)
(c). Using Selective Replications (excludes the 100 top 10% ranked likelihoods)
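Why a handful of replications can dominate the MCML average is easiest to see from the weights implied by averaging likelihoods: replication r enters with weight proportional to the exponential of its log-likelihood, so a curve whose maximum is a few log-likelihood units above the others swamps the rest. The sketch below computes these weights and a trimmed average that drops the top-ranked fraction, in the spirit of Figure 22(c). Ranking by each curve's maximum is our reading of the text, and the function names are hypothetical.

```python
import numpy as np

def replication_weights(max_loglik):
    """Share of the averaged likelihood contributed by each replication."""
    l = np.asarray(max_loglik, dtype=float)
    w = np.exp(l - l.max())  # subtract the maximum to avoid overflow
    return w / w.sum()

def trimmed_average_loglik(loglik_by_rep, trim_fraction=0.10):
    """Log of the mean likelihood after dropping the top-ranked replications.

    Replications are ranked by the maximum of their profile log-likelihood
    curve; the top `trim_fraction` of them is excluded before averaging.
    """
    l = np.asarray(loglik_by_rep, dtype=float)   # shape (n_reps, n_grid)
    order = np.argsort(l.max(axis=1))            # ascending by curve maximum
    n_keep = l.shape[0] - int(round(trim_fraction * l.shape[0]))
    kept = l[order[:n_keep]]
    m = kept.max(axis=0)
    return m + np.log(np.mean(np.exp(kept - m), axis=0))
```

For three replications with maximum log-likelihoods 0, 0 and 10, the third already carries more than 99.9% of the weight.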
Hence, 1,000 replications generated from the "complex dosimetry" of the Colorado Plateau Uranium Miners Cohort may not be sufficient to accommodate a possible existence of shared error. Perhaps 10,000 replications are needed, as in Stayner et al., to detect and adjust for possible shared error. However, it is worth noting that this would be very costly, considering that it was already very time intensive with 1,000 replications to generate the replications, convert them into the proper format for parameter estimation (from mine-year format to age-interval format for each miner in each replication), and then perform a grid search for b and β.
5.3.2 Total Dose and Dose-rate Risks on the Colorado Plateau Uranium
Miners Data
The hazard model can be extended to include more variables that may have an effect on the disease risk. The effect of these variables can be estimated in the profile likelihood procedure applied to each realization. For models with two parameters of interest, for example, a profile log-likelihood over a grid of possible parameter sets, i.e. $\beta = (\beta_{\text{total dose}}, \alpha_{\text{dose-rate}})$, can be calculated to estimate both a dose-response and an inverse dose-rate parameter. For simplicity, we adapted an empirical dose-rate model for risk of lung cancer mortality as
$\lambda(t) = b\,t^4\,[1 + \beta_{\text{total dose}}\,X(t)\,\exp\{\alpha_{\text{dose-rate}}\,I(dr(t) > 15)\}]$,
where $X(t)$ is the exposure dose up to age t and $I(\cdot)$ is the indicator function. The calculation of $dr(t)$ needs a bit of attention: it was calculated as the cumulative exposure dose $X(t)$ divided by the number of previous years in which a miner had any radiation exposure up to age t. Hence, if a miner had a gap in working mine-years, his dose-rate would equal his average dose-rate over all previously employed years. And if he resumed working the following year but had very little exposure, his dose-rate for that year might be smaller than the previous year's dose-rate.
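The dose-rate definition above can be sketched for one miner's sequence of yearly exposures. Counting the current year among the "previous" years is an assumption, as is setting the dose-rate to 0 before the first exposed year; the function name is hypothetical. The example reproduces the two behaviors described: a gap year keeps the previous average, and a low-exposure year pulls it down.

```python
import numpy as np

def empirical_dose_rate(yearly_exposure):
    """dr(t) for one miner: cumulative exposure / number of exposed years so far.

    yearly_exposure: exposure received in each successive one-year age interval.
    Years with zero exposure (gaps) do not increase the denominator.
    """
    x = np.asarray(yearly_exposure, dtype=float)
    cum_exposure = np.cumsum(x)
    exposed_years = np.cumsum(x > 0)  # running count of years with any exposure
    return np.divide(cum_exposure, exposed_years,
                     out=np.zeros_like(cum_exposure), where=exposed_years > 0)
```

For yearly exposures [0, 10, 20, 0, 2] it returns [0, 10, 15, 15, 10.67]: the gap year keeps the rate at 15, and the light year lowers it to 32/3.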
The MCML method was applied to this empirical dose-rate model. Note that when $\alpha_{\text{dose-rate}} = 0$, the model reduces to the relative risk model that contains total dose only. A grid of $\alpha_{\text{dose-rate}}$ was chosen between -1.00 and 0, since the parameter estimates were -0.96 and -0.48 using unadjusted and adjusted dose, respectively (according to the results given by Stram et al.). For each $\alpha_{\text{dose-rate}}$, the MCML method was performed to estimate the MLE on a grid of b and β, as done previously with the total dose only model, and its likelihood was noted. As before, the combination of b and $\beta = (\beta_{\text{total dose}}, \alpha_{\text{dose-rate}})$ that gave the largest likelihood yields the MLE.
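The outer loop over $\alpha_{\text{dose-rate}}$ wrapped around the inner (b, β) grid can be sketched as below. This again assumes a Poisson likelihood for one-year age intervals and a precomputed 0/1 indicator of dr(t) > 15 for each dose replication (the indicator differs between replications because the doses do); names and data layout are hypothetical.

```python
import numpy as np

def loglik_dose_rate(b, beta, alpha, age, cum_dose, high_rate, deaths):
    """Poisson log-likelihood for lambda(t) = b t^4 [1 + beta X(t) exp{alpha I(dr(t) > 15)}]."""
    err = beta * cum_dose * np.exp(alpha * high_rate)  # excess relative risk
    rate = b * age**4 * (1.0 + err)
    return float(np.sum(deaths * np.log(rate) - rate))

def mcml_dose_rate_search(alpha_grid, b_grid, beta_grid, age, reps, deaths):
    """Outer loop over alpha, inner grid over (b, beta); likelihoods averaged over reps.

    reps: list of (cum_dose, high_rate) array pairs, one pair per dose
    replication, where high_rate is the 0/1 indicator I(dr(t) > 15).
    Returns (best averaged log-likelihood, (b, beta, alpha)).
    """
    best_ll, best_par = -np.inf, None
    for alpha in alpha_grid:
        for b in b_grid:
            for beta in beta_grid:
                ll = np.array([loglik_dose_rate(b, beta, alpha, age, x, h, deaths)
                               for x, h in reps])
                m = ll.max()
                avg = m + np.log(np.mean(np.exp(ll - m)))  # log of mean likelihood
                if avg > best_ll:
                    best_ll, best_par = avg, (b, beta, alpha)
    return best_ll, best_par
```

Setting `alpha_grid` to contain 0 lets the same search recover the total-dose-only model as a special case.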
Figure 23 Plots of Maximum Log-likelihoods with various Dose-rate Effects using Doses from Conditional Expectation (log likelihood, about -1460 to -1446, against total dose per 100 WLM, 0 to 1.20; curves: dose-rate = 0 and dose-rate = -0.31)
Table 12 Dose-rate Parameter Estimate Comparisons
                                  Total dose β       Dose-rate α
                                  (per 100 WLM)      (15+ WL)       P-value†
Corrected dose (Stram et al.)         0.60              -0.48          0.02
New corrected dose                    0.56              -0.31          0.10
MCML method                           0.59              -0.35          0.06
Uncorrected dose (Stram et al.)       0.59              -0.96         <0.0001
† P-value tests the significance of the dose-rate parameter α (15+ WL compared to ≤ 15 WL)
We first estimated parameters using doses from the conditional expectation, i.e. the single imputation. Figure 23 plots log-likelihoods of the conditional expectation with and without the dose-rate effect. These log-likelihoods were plotted by total dose parameter given the MLE of the age effect b. The MLE for $\alpha_{\text{dose-rate}}$ was -0.31 (p<0.10), which was smaller in magnitude than the two estimates using the PHS unadjusted doses and the doses adjusted by Stram et al. (-0.96 and -0.48, respectively). Table 12 compares results of parameter estimation under an empirical disease-risk model that contains both total dose and dose-rate, using the uncorrected and the corrected dose by Stram et al., the dose corrected by conditional expectation, and the MCML method on 1,000 replications sampled from the complex dosimetry of the Colorado Plateau Uranium Miners data. The MCML method yielded the parameter estimate for dose-rate as $\alpha_{\text{dose-rate}} = -0.35$ (p=0.06). This estimate was substantially smaller in magnitude than that from the uncorrected dose and still smaller than that from the dose corrected by Stram et al. However, the dose from the single imputation seems to underestimate the dose-rate effect compared to the MCML (-0.31 compared to -0.35).
We did not see much difference in dose parameter estimates after incorporating additional information on miners' work history, using the complex dosimetry to incorporate shared uncertainties, and then using the MCML method. We did see, however, a decrease in the dose-rate effect in our analyses compared to the results given by Stram et al. On the other hand, the MCML method did not further reduce the dose-rate effect estimate compared to the estimate from the single imputation. As we have seen in the case of the total dose only model, the MCML method seems to be fairly reliable for "weak" dose exposure but may be problematic for a strong dose-response. However, it may be that it is the unshared error that makes the MCML so variable, and if only unshared error is present, as in this case with the Colorado Plateau Uranium Miners data, then use of $E(X|W)$ probably is good enough in most cases.
Chapter 6 Conclusions and Future Research
Measurement error is ubiquitous in epidemiological studies that make inference on the risk that an exposure poses for a disease. Measurement error can potentially distort results to yield incorrect inferences about disease risk. Unfortunately, measurement errors cannot be fully avoided despite our best effort to minimize them. In occupational settings, when a group of workers share a common environment, a shared measurement error may arise when collecting information on a commonplace exposure. Many previous studies introduced methods to deal with possible measurement errors; however, correcting for shared uncertainties has not been studied well. In this dissertation, we have investigated two methods (MCML and the fully parametric bootstrap method) to correct for possible shared measurement error.
First, in Chapter 1, we introduced the necessary terminology and notation related to measurement error. We also described existing methods that correct for measurement error, such as maximum likelihood and two-stage methods, and gave a brief description of the MCML and the fully parametric bootstrap methods.
In Chapter 2, we described the history of the data (the Colorado Plateau Uranium Miners and the Oak Ridge National Laboratory Cohort) used to perform the MCML and the fully parametric bootstrap methods. In the Colorado Plateau Uranium Miners data, apart from an obvious shared measurement error that may exist due to shared mine-year dose measurements by miners who worked in the same mine during the same time period, missing mine-year measurements (i.e. years in which no spot visit measurement was made by the PHS in a particular mine) were imputed based on "nearness" of measurements in time and geography. In the ORNL data, shared uncertainties may arise from sharing the same type of monitoring device and/or a policy change in dose collection methods, i.e. weekly before July 1956, then quarterly.
In Chapter 3, we updated the expected conditional radiation exposure dose of the Colorado Plateau Uranium Miners using the hierarchical model. Although the hierarchical model was first described in Stram et al., we applied the model at the district level rather than at the locality level as in Stram et al. In addition, this updated dose differed from the previous computation of Stram et al. in that we used additional information on miners' work history. Dr. Victor Archer, a former PHS investigator, provided useful information regarding unidentified mines (pseudo-mines) that some miners worked in. He also provided information on the geographic location of these pseudo-mines and their dose-rates. These dose-rates were believed to have been used by the PHS in constructing the exposure data. We then used these reconstructed doses to present an updated analysis of the two-stage clonal expansion (TSCE) model proposed in Luebeck et al. (1999). We demonstrated that part of the promotion effect found in Luebeck et al. is likely to be due to the inverse dose-rate effect. In particular, we showed that the impact of exposure uncertainty on the estimated parameters of the TSCE model is sensitive to classical error. In addition, we updated the complex dosimetry system and examined the results by computing the average shared and unshared multiplicative and additive error components suggested by Stram and Kopecky (2003).
In Chapter 4, we conducted a simple simulation experiment to study the feasibility of the MCML method compared to the fully parametric bootstrap method. We first generated observed dose under the distribution of the ORNL cohort data. Then we created the outcome data (case-control setting) using the generated observed dose under Stram and Kopecky's SUMA model. Then we generated multiple samples to estimate the total dose effect using the MCML method. It is important to note that the same sampled bias factors were applied to the entire cohort in each run of the simulation in order to produce shared dose errors. Using the total dose estimate from the MCML method, multiple samples were generated again under the complex dosimetry system. For each run of complex dosimetry that created a correlated dose structure for the entire cohort, the outcome data were regenerated to give a new case-control risk data set. Then the total dose effect was computed by conventional conditional logistic regression of disease outcome on observed dose. The number of times these estimates fell within the 90% confidence interval from the MCML method was noted. The 90% empirical confidence interval was calculated using the 5th and 95th percentile values of the estimates.
Lastly, in Chapter 5, we applied the MCML and the fully parametric bootstrap method to real data: the Colorado Plateau Uranium Miners and the ORNL cohort data. We first applied the fully parametric bootstrap method to the ORNL data, since these data are in a simpler setting and the estimation of a one-parameter excess relative risk model for cancer risk is similar to that described for the Colorado miners. The results obtained from the fully parametric bootstrap method were compared to an MCML approach that was already in development. However, the MCML results did not fully agree with the fully parametric bootstrap approach on the impact of shared errors in the ORNL data. Hence, we executed multiple runs of MCML to further examine the true confidence interval coverage of β from the MCML method. For each set of new outcomes generated from running the fully parametric bootstrap method on the ORNL data, an independent run of MCML was executed and its confidence interval was examined. With these multiple MCML confidence intervals calculated, the validity of the MCML confidence interval was examined by counting the fraction of times in the simulation that the MCML confidence interval contained the true value of the total dose effect.
Also in Chapter 5, we applied the MCML method to multiple realizations already created from the complex dosimetry system of the Colorado Plateau Uranium Miners Cohort (Chapter 3) in a cohort setting, instead of the case-control setting used for the ORNL data (Chapter 4 and the earlier section of Chapter 5). We utilized a disease-risk model that contains both total dose and dose-rate effects. The MCML then maximized both parameters simultaneously over a grid of $\beta = (\beta_{\text{total dose}}, \alpha_{\text{dose-rate}})$. The maximum likelihood estimate was taken to be the combination that jointly maximized the likelihood over $\beta_{\text{total dose}}$ and $\alpha_{\text{dose-rate}}$ (along with the nuisance parameter b, i.e. the age at exposure effect).
With the TSCE model describing the disease pathway and Stram and Kopecky's SUMA model providing estimates of the various uncertainty components, we have shown that the refined single dose imputation of the Colorado Uranium Miners data, which incorporates additional information and fully utilizes the hierarchical model, by itself made an impact on the disease-risk analysis. Using the TSCE model as an example to see the effect of measurement error associated with the reconstruction of individual levels of radon exposure, we saw that the inverse dose-rate effect may be overstated and the promotional effect of radon on the intermediate cells is overemphasized in Luebeck et al. (1999). Under the SUMA model, we saw that the shared multiplicative error is small but the fit of the simple SUMA model was not very good.
Therefore, we implemented the MCML method and the fully parametric bootstrap method to further examine the effect of shared measurement error. We saw that the results from the MCML method and the fully parametric bootstrap method did not quite agree (using the ORNL data). Hence, we investigated the feasibility of the MCML method by performing multiple runs of MCML on the newly created outcome data from running the fully parametric bootstrap method, and we saw that the MCML method was quite a feasible way to correct for possible shared uncertainties, although the coverage of the confidence intervals in the validity test of the MCML method was slightly below nominal (87.6% instead of 90%).
However, the results from the MCML method applied to the Colorado Uranium Miners data showed that additional work is needed. The results suggest that 1,000 replications created from the complex dosimetry system of the Colorado Uranium Miners may not be sufficient to fully capture the variability of uncertainties. Recall from Chapter 1, Section 1.8.2, that we removed the conditioning on the outcome so that the outcome is independent of true exposure. However, Stram and Kopecky observed that when there is in fact a strong dose-response relationship, we would need to sample true dose given both disease and input data. On the other hand, it may be that it is unshared error that makes the MCML so variable; hence, as in the case of the Colorado Plateau Uranium Miners data, where only the unshared error is present, use of the conditional expectation probably is good enough.
In this dissertation, we studied the effect of shared uncertainty that may exist in epidemiological studies. We also introduced two methods, the MCML and the fully parametric bootstrap method, to correct for the shared uncertainty, and examined the accuracy and computational feasibility of the MCML method in a setting where a complex dosimetry system has been used and where considerable "sharing" of dose errors takes place. We saw that the MCML method is a viable way to correct for shared measurement error. However, a large number of replications need to be sampled from a complex dosimetry system to fully capture its effect when the uncertainty distribution is quite variable, and true dose needs to be sampled from a distribution conditional on both disease and input data when there is a strong prior belief that a dose-response relationship exists. One should keep in mind that both of these requirements may be very time intensive and computationally challenging.
These comparisons between Stram and Kopecky's SUMA method, the MCML and the fully parametric bootstrap method will give guidance to future use of "complex dosimetry systems."
References
Armitage P, Doll R. Stochastic models for carcinogenesis in relation to the age distribution of human cancer. British Journal of Cancer. 11:161-169, 1957.
Armstrong B. The Effects of Measurement Errors on Relative Risk Regression. American Journal of Epidemiology. 132(6):1176-1184, 1990.
Armstrong B. Effect of Measurement Error on Epidemiological Studies of Environmental and Occupational Exposure. Occupational and Environmental Medicine. 55:651-656, 1998.
Armstrong BG, Whittemore AS, Howe GR. Analysis of case-control data with covariate measurement error: application to diet and colon cancer. Statistics in Medicine. 8:1151-1163, 1989.
Bhatia S, Neglia JP. Epidemiology of childhood acute myelogenous leukemia. Journal of Pediatric Hematology/Oncology. 17(2):94-100, 1995.
Brown CC, Chu KC. Implications of the multistage theory of carcinogenesis applied to occupational arsenic exposure. Journal of the National Cancer Institute. 70:455-463, 1983.
Brown CC, Chu KC. A new method for the analysis of cohort studies: Implications of the multistage theory of carcinogenesis applied to occupational arsenic exposure. Environmental Health Perspectives. 50:293-308, 1983b.
Brugmans MJP, Bijwaard H, Leenhouts HP. The Overrated Role of 'Promotion' in Mechanistic Modelling of Radiation Carcinogenesis. Journal of Radiological Protection. 22:A75-A79, 2002.
Buzas JS, Stefanski LA. A Note on Corrected Score Estimation. Statistics & Probability Letters. 1995.
Cairns J. Mutation selection and the natural history of cancer. Nature. 255:197-200, 1975.
Cardis E, Gilbert ES, Carpenter L, Howe G, Kato I, Armstrong BK, Beral V, Cowper G, Douglas A, Fix J, et al. Effects of low doses and low dose rates of external ionizing radiation: cancer mortality among nuclear industry workers in three countries. Radiation Research. 142(2):117-132, 1995.
114
Cohen BL. Test of the Linear No-threshold Theory of Radiation Carcinogenesis for Inhaled Radon Decay Products. Health Physics. 68:157-174, 1995.
Cook PJ, Doll R, Fellingham SA. A mathematical model for the age distribution of cancer in man. International Journal of Cancer. 4:93-112, 1969.
Day NE, Brown CC. Multistage models and primary prevention of cancer. Journal of National Cancer Institute. 64:977-989, 1980.
Freedman DA, Navidi WC. Multistage models for carcinogenesis. Environmental Health Perspectives. 81:169-188, 1989.
Frome EL, Cragle DL, Watkins JP, Wing S, Shy CM, Tankersley WG, West CM. A mortality study of employees of the nuclear industry in Oak Ridge, Tennessee. Radiation Research. 148(1):64-80, 1997.
Geyer C. Estimation and optimization of functions. In: Markov Chain Monte Carlo in Practice. W.R. Gilks, S.T. Richardson, and D.J. Spiegelhalter, Editors. Chapman and Hall. p. 241-258, 1996.
Gilbert ES. Some Effects of Random Dose Measurement Errors on the Analysis of Atomic Bomb Survivor Data. Radiation Research. 98:591-605, 1984.
Gilbert ES, Fry SA, Wiggs LD, Voelz GL, Cragle DL, Petersen GR. Analyses of combined mortality data on workers at the Hanford Site, Oak Ridge National Laboratory, and Rocky Flats Nuclear Weapons Plant. Radiation Research. 120(1):19-35, 1989.
Gilbert ES, Cragle DL, Wiggs LD. Updated Analyses of Combined Mortality Data for Workers at the Hanford Site, Oak Ridge National Laboratory, and Rocky Flats Weapons Plant. Radiation Research. 136:408-421, 1993.
Hart JC. A progress report dealing with the derivation of dose data from ORNL personnel exposure records applicable to the Mancuso study. Technical Report ORNL/M-2614, Oak Ridge National Laboratory, Oak Ridge, TN, 1966.
Hatch M, Thomas D. Measurement Issues in Environmental Epidemiology. Environmental Health Perspectives Supplements. 101(4):49-57, 1993.
Heidenreich WF, Luebeck EG, Moolgavkar SH. Effects of Exposure Uncertainties in the TSCE Model and Application to the Colorado Miners Data. Radiation Research. 161:72-81, 2004.
Hornung RW, Meinhardt TJ. Quantitative risk assessment of lung cancer in U.S. uranium miners. Health Physics. 52(4):417-30, 1987.
Hornung RW, Deddens JA, Roscoe RJ. Modifier of Lung Cancer Risk in Uranium Miners from the Colorado Plateau. Health Physics. 74:12-21, 1998.
Jablon G. Atomic Bomb Radiation Dose Estimate at ABCC. Technical Report 23-71. Atomic Bomb Casualty Commission, Hiroshima, 1971.
Kerber RA, Till JE, Simon SL, Lyon JL, Thomas DC, Preston-Martin S, Rallison ML, et al. A cohort study of thyroid disease in relation to fallout from nuclear weapons testing. JAMA. 270:2076-82, 1993.
Kerr GD. Missing dose from mortality studies of radiation effects among workers at Oak Ridge National Laboratory. Health Physics. 66(20):206-208, 1994.
Lubin JH, Boice JD Jr, Samet JM. Errors in exposure assessment, statistical power and the interpretation of residential radon studies. Radiation Research. 144:329-341, 1995.
Luebeck EG, Heidenreich W, Hazelton W, Paretzke HG, Moolgavkar SH. Biological Based Analysis of the Data for the Colorado Uranium Miners Cohort: Age, Dose and Dose-Rate Effects. Radiation Research. 152:339-351, 1999.
Lundin F, Wagoner J, Archer V. Radon Daughter Exposure and Respiratory Cancer: Quantitative and Temporal Aspects. Report from the Epidemiological Study of United States Uranium Miners, 1971.
Mitchell TJ, Ostrouchov G, Frome EL, Kerr GD. A Method for Estimating Occupational Radiation Dose to Individuals, Using Weekly Dosimetry Data. Radiation Research. 147:195-207, 1997.
Moolgavkar SH, Knudson Jr AG. Mutation and Cancer: A Model for Human Carcinogenesis. Journal of National Cancer Institute. 66:1037-1052, 1981.
Moolgavkar S, Luebeck EG, Krewski D, Zielinski J. Radon, Cigarette Smoke, and Lung Cancer: A Re-analysis of the Colorado Plateau Uranium Miners' Data. Epidemiology. 4(3):204-217, 1993.
Moolgavkar SH. Model for human carcinogenesis: action of environmental agents. Environmental Health Perspectives. 50:285-291, 1983.
Nakamura T. Corrected Score Functions for Errors-in-variables Models: Methodology and Application to Generalized Linear Models. Biometrika. 77:127-137, 1990.
NCI. Estimated Exposures and Thyroid Doses Received by the American People from Iodine-131 in Fallout Following Nevada Atmospheric Nuclear Bomb Tests: A Report from the National Cancer Institute. National Cancer Institute, Washington DC, 1997.
Neriishi K, Stram DO, Vaeth M, Mizuno S, Akiba S. The Observed Relationship Between The Occurrence Of Acute Radiation Effects And Leukemia Mortality Among A-Bomb Survivors. Radiation Research. 125(2):206-13, 1991.
NRC. Health effects of exposure to radon: BEIR VI. National Academy Press, Washington, DC, 1998.
Pierce DA, Stram DO, Vaeth M. Allowing for Random Errors in Radiation Dose Estimates for the Atomic Bomb Survivor Data. Radiation Research. 123:275-284, 1990.
Prentice RL. Covariate measurement errors and parameter estimation in a failure time regression model. Biometrika. 69(2):331-342, 1982.
Preston DL, Pierce DA. The effect of changes in dosimetry on cancer mortality risk estimates in the atomic bomb survivors. Radiation Research. 114(3):437-66, 1988.
Preston DL, Pierce DA, Shimizu Y. Age-time patterns for cancer and noncancer excess risks in the atomic bomb survivors. Radiation Research. 154(6):733-4; discussion 734-5, 2000.
Preston DL, Shimizu Y, Pierce DA, Suyama A, Mabuchi K. Studies of mortality of atomic bomb survivors. Report 13: Solid cancer and noncancer disease mortality: 1950-1997. Radiation Research. 160(4):381-407, 2003.
Puskin JS. Smoking as a Confounder in Ecologic Correlations of Cancer Mortality Rates with Average County Radon Levels. Health Physics. 84(4):526-32, 2003.
Richardson DB, Wing S. Methods for investigating age differences in the effects of prolonged exposures. Am J Ind Med. 33(2):123-130, 1998.
Richardson DB, Wing S. Radiation and Mortality of Workers at Oak Ridge National Laboratory: Positive Association for Doses Received at Older Ages. Environmental Health Perspectives. 107:649-656, 1999.
Roscoe RJ, Steenland K, Halperin WE, Beaumont JJ, Waxweiler RJ. Lung cancer mortality among nonsmoking uranium miners exposed to radon daughters [comment]. JAMA. 262(5):629-33, 1989.
Rosner B, Willett WC, Spiegelman D. Correction of logistic regression relative risk estimates and confidence intervals for systematic within-person measurement error. Statistics in Medicine. 8(9):1051-69; discussion 1071-3, 1989.
Rosner B, Spiegelman D, Willet WC. Correction of logistic regression relative risk estimates and confidence intervals for measurement error: the case of multiple covariates measured with error. American Journal of Epidemiology. 132(4):734-745, 1990.
Schatzkin A, Kipnis V, Carroll RJ, Midthune D, Subar AF, Bingham S, Schoeller DA, Troiano RP, Freedman LS. A Comparison Of A Food Frequency Questionnaire With A 24-Hour Recall For Use In An Epidemiological Cohort Study: Results From The Biomarker-Based Observing Protein And Energy Nutrition (OPEN) Study. International Journal of Epidemiology. 32:1054-1062, 2003.
Spiegelman D, Schneeweiss S, McDermot A. Measurement error correction for logistic regression models with an "alloyed gold standard". American Journal of Epidemiology. 145(2):184-96, 1997.
Sposto R, Stram DO, Awa AA. An Estimate Of The Magnitude Of Random Errors In The DS86 Dosimetry From Data On Chromosome Aberrations And Severe Epilation. Radiation Research. 128(2):157-69, 1991.
Sposto R, Preston DL, Shimizu Y, Mabuchi K. The effect of diagnostic misclassification on non-cancer and cancer mortality dose response in A-bomb survivors. Biometrics. 48(2):605-17, 1992.
Stayner L, Bailer J, Smith R, et al. Sources of Uncertainty in Dose-Response Modeling of Epidemiological Data for Cancer Risk Assessment. Annals of the New York Academy of Sciences. 895:212-222, 1999.
Stayner L, Vrijheid M, Cardis E, Stram D, Deltour I, Gilbert SJ, Howe G. Monte Carlo maximum likelihood methods for estimating uncertainty arising from shared errors in exposures in occupational cohort studies. Radiation Research (in press).
Stefanski LA. Unbiased Estimation of a Nonlinear Function of a Normal Mean with Application to Measurement Error Models. Communications in Statistics, Series A. 18:4335-4358, 1989.
Stram D, Langholz B, Huberman M, Thomas D. Correcting for Exposure Measurement Error in a Reanalysis of Lung Cancer Mortality for the Colorado Plateau Uranium Miners Cohort. Health Physics. 77(3):265-275, 1999.
Stram DO, Mizuno S. Analysis of the DS86 atomic bomb radiation dosimetry methods using data on severe epilation. Radiation Research. 117(1):93-113, 1989.
Stram DO, Akiba S, Neriishi K, Stevens RG, Hosoda Y. Smoking and serum proteins in atomic-bomb survivors in Japan. American Journal of Epidemiology. 131(6):1038-45, 1990.
Stram DO, Sposto R. Recent uses of biological data for the evaluation of A-bomb radiation dosimetry. Journal of Radiation Research. 32 Suppl:122-35, 1991.
Stram DO, Sposto R, Preston D, Abrahamson S, Honda T, Awa AA. Stable chromosome aberrations among A-bomb survivors: an update. Radiation Research. 136(1):29-36, 1993.
Stram DO, Kopecky KJ. Power and uncertainty analysis of epidemiological studies of radiation-related disease risk in which dose estimates are based on a complex dosimetry system: some observations. Radiation Research. 160(4):408-17, 2003.
Stram D, Langholz B, Thomas D. Measurement Error Correction of Lung Cancer Risk Estimates in the Colorado Plateau Cohort. Part I: Dosimetry Analysis. Technical Report No. 134. Division of Biostatistics, Department of Preventive Medicine, School of Medicine, University of Southern California, 1998.
Thierry-Chef I, Pernicka F, Marshall M, Cardis E, Andreo P. Study of a selection of 10 historical types of dosemeter: variation of the response to Hp(10) with photon energy and geometry of exposure. Radiation Protection Dosimetry. 102(2):101-113, 2002.
Thierry-Chef I, Bermann F, Cardis E, Fix JJ, Gilbert ES, Hacker C, Heinmiller B, Marshall M, Ohshima S, Pernicka F, Pearce MS, Utterback D. Study of Errors in Dosimetry within the International Collaborative Study of Cancer Risk among Radiation Workers in the Nuclear Industry IARC. Technical Report, International Agency for Research on Cancer (IARC), Lyon. In press, 2005.
Thierry-Chef I, Cardis E, Ciampi A, Delacroix D, Marshall M, Amoros E, Bermann F. Method to Assess Predominant Energies of Exposure in a Nuclear Research Centre Saclay (France). Radiation Protection Dosimetry. 94(3):215-225, 2001.
Thomas D, Stram D, Dwyer J. Exposure Measurement Error: Influence on Exposure-Disease Relationships and Methods of Correction. Annual Review of Public Health. 14:69-93, 1993.
Thomas D, Pogoda J, Langholz B, Mack W. Temporal Modifiers of the Radon-smoking Interaction. Health Physics. 66(3):257-262, 1994.
Wacholder S, Armstrong B, Hartge P. Validation studies using an alloyed gold standard. American Journal of Epidemiology. 137(11):1251-1258, 1993.
Watkins JP, Reagan JL, Cragle DL, Frome EL, West CM, Crawford-Brown D, Tankersley WG. Collection, validation, and description of data for the Oak Ridge Nuclear facilities mortality study. Technical Report ORISE 93/J-42, Oak Ridge Institute for Science and Education, Oak Ridge, TN, 1993.
Watkins JP, Cragle DL, Frome EL, West CM, Crawford-Brown DJ, Tankersley WG. Adjusting external doses from the ORNL and Y-12 facilities for the Oak Ridge Nuclear Facilities mortality study. Technical Report ORISE 94/G-34, Oak Ridge Institute for Science and Education, Oak Ridge, TN, 1994.
Whittemore A, McMillan A. Lung Cancer Mortality among US Uranium Miners: A Reappraisal. Journal of National Cancer Institute. 71:489-499, 1983.
Wilkinson GS, Dreyer NA. Leukemia among nuclear workers with protracted exposure to low-dose ionizing radiation. Epidemiology. 2(4):305-309, 1991.
Wing S, Shy CM, Wood JL, Wolf S, Cragle DL, Frome EL. Mortality among workers at Oak Ridge National Laboratory. Evidence of radiation effects in follow-up through 1984. JAMA. 265(11):1397-1402, 1991.
Wing S, Shy CM, Wood JL, Cragle DL, Tankersley W, Frome EL. Job Factors, Radiation and Cancer Mortality at Oak Ridge National Laboratory: Follow-up Through 1984. American Journal of Industrial Medicine. 23:265-279, 1993.
Wing S, West CM, Wood JL, Tankersley W. Recording of external radiation exposures at Oak Ridge National Laboratory: implications for epidemiological studies. J Expo Anal Environ Epidemiol. 4(1):83-93, 1994.
Wing S, Richardson D, Wolf S, Mihlan G, Crawford-Brown D, Wood J. A case control study of multiple myeloma at four nuclear facilities. Ann Epidemiol. 10(3):144-153, 2000.
Appendix A

A given mine-year may have been measured several times, but the specific measurements $Z_{dlmti}$, $i = 1, \ldots, N_{dlmt}$, taken from the unannounced spot visits were unavailable. Instead, only an average of these measurements over a year, $Z_{dlmt\cdot}$, was saved. Hence, when we analyzed the data on the log scale, information was lost because we used the log of the average values rather than the average of the log values. Therefore, the following was attempted. We observed (see below) that the arithmetic average of lognormal random variables might often be reasonably well approximated by a lognormal random variable with appropriate mean and variance. Then

$$E(Z_{dlmti}) = \exp\left(x_{dlmt} + \tfrac{1}{2}\gamma^2\right) \quad\text{and}\quad \operatorname{Var}(Z_{dlmti}) = \exp\left(2x_{dlmt} + \gamma^2\right)\left[\exp(\gamma^2) - 1\right].$$

Hence,

$$E(Z_{dlmt\cdot}) = \exp\left(x_{dlmt} + \tfrac{1}{2}\gamma^2\right) \quad\text{and}\quad \operatorname{Var}(Z_{dlmt\cdot}) = \frac{1}{N_{dlmt}}\exp\left(2x_{dlmt} + \gamma^2\right)\left[\exp(\gamma^2) - 1\right],$$

where $N_{dlmt}$ is the number of spot visits made in that year at district $d$, locality $l$, and mine $m$. This mean and variance are the same as those of a lognormal random variable whose log has mean

$$x_{dlmt} + \frac{\gamma^2}{2} - \frac{1}{2}\log\left[1 + \frac{\exp(\gamma^2) - 1}{N_{dlmt}}\right]$$

and variance

$$\log\left[1 + \frac{\exp(\gamma^2) - 1}{N_{dlmt}}\right],$$

respectively. Therefore, in order to center $z_{dlmt} = \log(Z_{dlmt\cdot})$ about $x_{dlmt}$, we subtracted $\frac{\gamma^2}{2} - \frac{1}{2}\log\left[1 + \frac{\exp(\gamma^2) - 1}{N_{dlmt}}\right]$. Hence, we defined

$$z_{dlmt} = \log(Z_{dlmt\cdot}) - \frac{\gamma^2}{2} + \frac{1}{2}\log\left[1 + \frac{\exp(\gamma^2) - 1}{N_{dlmt}}\right].$$

In addition, approximately 10% of the measured average doses $Z_{dlmt\cdot}$ were coded as equal to zero. Hence, to avoid taking the log of zero, 0.3 WLM (the current standard) was first added to each $Z_{dlmt\cdot}$.
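The moment-matching and recentring steps above can be sketched in a few lines of Python, together with a small Monte Carlo check in the spirit of the simulation test summarized in Table 13. This is a sketch under assumptions, not the code used in the dissertation. The function names are hypothetical. The simulation draws each spot-visit measurement as a lognormal with log-mean $x$ and log-variance $\gamma^2$. The `offset` argument is a hypothetical stand-in for the 0.3 WLM added before taking logs, and with a nonzero offset the recentring is only approximate. One observation: the explicit $\operatorname{Var}(z_t)$ column of Table 13 is reproduced to four decimals by $\log[1 + (e^{\gamma^2} - 1)/N]$ with $\gamma^2 = 0.25$. This suggests, although the text does not state it, that the simulation used $\gamma = 0.5$.

```python
import math
import random


def matched_lognormal_params(x, gamma2, n_visits):
    """Log-scale mean and variance of the lognormal whose first two moments
    match the average of n_visits iid lognormals with log-mean x and
    log-variance gamma2."""
    log_var = math.log1p(math.expm1(gamma2) / n_visits)  # log[1 + (e^g - 1)/N]
    log_mean = x + gamma2 / 2.0 - log_var / 2.0
    return log_mean, log_var


def recentred_log(z_avg, gamma2, n_visits, offset=0.0):
    """z = log(Z_avg + offset) - gamma2/2 + (1/2) log[1 + (e^gamma2 - 1)/N].

    offset is a hypothetical parameter for the constant added before logs."""
    return (math.log(z_avg + offset) - gamma2 / 2.0
            + 0.5 * math.log1p(math.expm1(gamma2) / n_visits))


def simulate_difference(x, gamma2, n_visits, n_sims, seed=0):
    """Monte Carlo mean and variance of z - x when Z_avg averages n_visits lognormals."""
    rng = random.Random(seed)
    sigma = math.sqrt(gamma2)
    diffs = []
    for _ in range(n_sims):
        z_avg = sum(rng.lognormvariate(x, sigma) for _ in range(n_visits)) / n_visits
        diffs.append(recentred_log(z_avg, gamma2, n_visits) - x)
    mean = sum(diffs) / n_sims
    var = sum((d - mean) ** 2 for d in diffs) / (n_sims - 1)
    return mean, var


if __name__ == "__main__":
    for n in (1, 3, 5, 10, 50, 100, 200, 300):
        _, explicit_var = matched_lognormal_params(0.0, 0.25, n)
        mc_mean, mc_var = simulate_difference(0.0, 0.25, n, 1000, seed=n)
        print(f"N={n:3d}  explicit Var={explicit_var:.4f}  "
              f"MC mean={mc_mean:+.4f}  MC Var={mc_var:.4f}")
```

For $N = 1$ the matched parameters reduce to the original ones ($\log$-mean $x$, $\log$-variance $\gamma^2$), which is a quick sanity check on the algebra.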
Table 13. Estimated expectation and variance of the difference between the log of the observed measurements and the log of the true measurements, by number of spot visits made per year, from the simulation test

Spot visits per year | $E(\hat{z}_t - x_t)$ | $\operatorname{Var}(\hat{z}_t - x_t)$ | $\operatorname{Var}(z_t)$
1   | -0.0054 | 0.2624 | 0.2500
3   |  0.0110 | 0.0894 | 0.0905
5   | -0.0009 | 0.0527 | 0.0553
10  | -0.0003 | 0.0263 | 0.0280
50  | -0.0024 | 0.0056 | 0.0057
100 |  0.0011 | 0.0029 | 0.0028
200 |  0.0008 | 0.0014 | 0.0014
300 |  0.0010 | 0.0010 | 0.0009

A quick simulation test was done to validate the above approximation of the arithmetic average of lognormals by a lognormal. The expectation and variance of $x_{dlmt}$ were adapted from the technical report by Stram et al. (1998). The simulations were performed over various numbers of spot visits per year. The estimated expectation of the difference between the redefined log measurement $z_{dlmt}$ and the true log measurement $x_{dlmt}$ was close to zero after only 5 visits per year (Table 13). Furthermore, the estimated variance was shown to be almost equivalent to the explicit variance of $z_{dlmt}$.

Expectation and Variance of Lognormal Distribution
Let $x_1$ and $x_2$ be bivariate normal with means $\mu_i$ and variances $\sigma_i^2$, $i = 1, 2$. If $X_i = e^{x_i}$, then $X_i$ has a lognormal distribution with

$$E(X_i) = \exp\left(\mu_i + \tfrac{1}{2}\sigma_i^2\right) \quad\text{and}\quad \operatorname{Var}(X_i) = \exp\left(2\mu_i + \sigma_i^2\right)\left[\exp(\sigma_i^2) - 1\right].$$

Since $\operatorname{Cov}(X_1, X_2) = E(X_1X_2) - E(X_1)E(X_2)$, we need to calculate

$$E(X_1X_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{x_1 + x_2} f(x_1, x_2)\, dx_1\, dx_2,$$

where $f(x_1, x_2)$ is the joint density of $x_1$ and $x_2$. This is the moment generating function of the bivariate normal distribution of $(x_1, x_2)$ evaluated at $(1, 1)$. Then

$$E\left(e^{x_1 + x_2}\right) = \exp\left\{\mu_1 + \mu_2 + \tfrac{1}{2}\left(\sigma_1^2 + 2\rho\sigma_1\sigma_2 + \sigma_2^2\right)\right\},$$

where $\rho$ is the correlation between $x_1$ and $x_2$. Therefore,

$$\operatorname{Cov}(X_1, X_2) = \exp\left\{\mu_1 + \mu_2 + \tfrac{1}{2}\left(\sigma_1^2 + \sigma_2^2\right)\right\}\left[\exp(\rho\sigma_1\sigma_2) - 1\right].$$
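As a numerical check of the covariance formula above, the following sketch compares the closed form with a Monte Carlo estimate. The function names are hypothetical, and correlated normals are generated by the standard construction $x_2 = \mu_2 + \sigma_2(\rho u + \sqrt{1 - \rho^2}\, v)$ with independent standard normals $u$ and $v$.

```python
import math
import random


def lognormal_cov(mu1, mu2, s1, s2, rho):
    """Cov(X1, X2) for X_i = exp(x_i), with (x1, x2) bivariate normal,
    means mu_i, standard deviations s_i and correlation rho."""
    return math.exp(mu1 + mu2 + 0.5 * (s1 ** 2 + s2 ** 2)) * math.expm1(rho * s1 * s2)


def lognormal_cov_mc(mu1, mu2, s1, s2, rho, n, seed=0):
    """Sample covariance of exp(x1), exp(x2) from n simulated bivariate normal pairs."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x1 = mu1 + s1 * u
        x2 = mu2 + s2 * (rho * u + math.sqrt(1.0 - rho ** 2) * v)  # corr(x1, x2) = rho
        pairs.append((math.exp(x1), math.exp(x2)))
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    return sum((a - m1) * (b - m2) for a, b in pairs) / (n - 1)


if __name__ == "__main__":
    print(lognormal_cov(0.0, 0.0, 0.3, 0.4, 0.5))
    print(lognormal_cov_mc(0.0, 0.0, 0.3, 0.4, 0.5, 50_000, seed=1))
```

Setting $x_1 = x_2$ (that is, $\mu_1 = \mu_2$, $\sigma_1 = \sigma_2$, $\rho = 1$) recovers the lognormal variance $\exp(2\mu + \sigma^2)[\exp(\sigma^2) - 1]$. Setting $\rho = 0$ gives zero covariance, as expected for independent variables.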
Expectation and Variance in Hierarchical Model
As noted previously, the measured values are written as

$$z_{dlmt} = x_{dlmt} + e_{dlmt} = \left(\alpha + a^{(D)}_{d} + a^{(L)}_{(d)l} + a^{(M)}_{(dl)m}\right) + \left(\beta + b^{(D)}_{d} + b^{(L)}_{(d)l} + b^{(M)}_{(dl)m}\right) t + \varepsilon_{dlmt} + e_{dlmt},$$

with $\varepsilon_{dlmt} \sim N(0, \sigma^2)$, and $e_{dlmt}$ has known variance $\gamma^2_{dlmt}$. Hence,

$$x_{dlmt} = \alpha + a^{(D)}_{d} + a^{(L)}_{(d)l} + a^{(M)}_{(dl)m} + \left(\beta + b^{(D)}_{d} + b^{(L)}_{(d)l} + b^{(M)}_{(dl)m}\right) t + \varepsilon_{dlmt}.$$

This yields

$$\mu_z = E(z \mid t) = E(x + e \mid t) = E(x \mid t) = \mu_x.$$

Note that the $t$'s are different when calculating the expectations of $x$ and $z$. The $t$ in $x_t$ refers to the time when the exposure dose measurements were taken in a mine-year, whereas the $t$ in $z_t$ refers to the time when a miner worked in an underground mine that contributed to an individual dose estimate. This yields the conditional expectation of the true measurement given the observed measurement as

$$E(x \mid z) = \mu_x + \Sigma_{xz}\Sigma_{zz}^{-1}(z - \mu_z).$$
Since $x_{dlmt}$ can also be written as

$$x_{dlmt} = \mathbf{t}^{\top}\left[\begin{pmatrix}\alpha\\ \beta\end{pmatrix} + \begin{pmatrix}a^{(D)}_{d}\\ b^{(D)}_{d}\end{pmatrix} + \begin{pmatrix}a^{(L)}_{(d)l}\\ b^{(L)}_{(d)l}\end{pmatrix} + \begin{pmatrix}a^{(M)}_{(dl)m}\\ b^{(M)}_{(dl)m}\end{pmatrix}\right] + \varepsilon_{dlmt}, \qquad \mathbf{t} = (1, t)^{\top},$$

the covariance of $x$ forms a nested block matrix as follows:

$$\operatorname{Cov}(x_{dlmt}, x_{d'l'm't'}) = \mathbf{t}^{\top}\left[\mathbf{D}^{(D)}\,1\{d = d'\} + \mathbf{D}^{(L)}\,1\{d = d', l = l'\} + \mathbf{D}^{(M)}\,1\{d = d', l = l', m = m'\}\right]\mathbf{t}' + \sigma^2\,1\{d = d', l = l', m = m', t = t'\},$$

where $\mathbf{t}' = (1, t')^{\top}$ and $\mathbf{D}^{(D)}$, $\mathbf{D}^{(L)}$, and $\mathbf{D}^{(M)}$ are the $2 \times 2$ covariance matrices of the district, locality, and mine random effects $(a, b)^{\top}$. Denote this covariance by $\Sigma$. Similarly, the covariance of $z$ is

$$\operatorname{Cov}(z_{dlmt}, z_{d'l'm't'}) = \Sigma + V, \qquad V = \gamma^2_{dlmt}\,1\{d = d', l = l', m = m', t = t'\},$$

and the covariance of $x$ and $z$ is

$$\operatorname{Cov}(x_{dlmt}, z_{d'l'm't'}) = \Sigma.$$
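The nested covariance and the resulting conditional expectation $E(x \mid z)$ can be sketched in pure Python. This is a hypothetical illustration, not the dissertation's code, and it makes assumptions beyond what is written above. The random effects are assumed to have mean zero, so that $\mu_x = \mu_z = \alpha + \beta t$. The measurement error $e$ is assumed independent of $x$, so that $\Sigma_{xz} = \Sigma$ and $\Sigma_{zz} = \Sigma + V$. For simplicity, $x$ and $z$ are indexed by the same records with a common $t$, whereas the text notes that their time indices can differ. The helper names and the small Gaussian-elimination solver are illustrative choices.

```python
def quad_form(u, m, v):
    """u' M v for 2-vectors u, v and a 2 x 2 matrix M given as nested lists."""
    return sum(u[i] * m[i][j] * v[j] for i in range(2) for j in range(2))


def cov_x(r1, r2, d_dist, d_loc, d_mine, sigma2):
    """Nested-block covariance of x for records (d, l, m, t): district,
    locality-within-district and mine-within-locality random effects,
    plus sigma^2 when the records coincide."""
    (d1, l1, m1, t1), (d2, l2, m2, t2) = r1, r2
    u, v = (1.0, t1), (1.0, t2)
    c = 0.0
    if d1 == d2:
        c += quad_form(u, d_dist, v)
        if l1 == l2:
            c += quad_form(u, d_loc, v)
            if m1 == m2:
                c += quad_form(u, d_mine, v)
                if t1 == t2:
                    c += sigma2
    return c


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(aug[i][k]))
        aug[k], aug[p] = aug[p], aug[k]
        for i in range(k + 1, n):
            f = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= f * aug[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def conditional_mean(records, z, gamma2, alpha, beta, d_dist, d_loc, d_mine, sigma2):
    """E(x | z) = mu + Sigma (Sigma + V)^{-1} (z - mu), with V = diag(gamma2)."""
    n = len(records)
    mu = [alpha + beta * r[3] for r in records]
    sig = [[cov_x(a, b, d_dist, d_loc, d_mine, sigma2) for b in records] for a in records]
    szz = [[sig[i][j] + (gamma2[i] if i == j else 0.0) for j in range(n)] for i in range(n)]
    w = solve(szz, [zi - mi for zi, mi in zip(z, mu)])
    return [mu[i] + sum(sig[i][j] * w[j] for j in range(n)) for i in range(n)]


if __name__ == "__main__":
    recs = [(1, 1, 1, 0.0), (1, 1, 1, 1.0), (1, 1, 2, 0.0), (1, 2, 1, 1.0), (2, 1, 1, 0.0)]
    D = [[0.2, 0.0], [0.0, 0.05]]
    print(conditional_mean(recs, [0.5, 1.4, 0.2, 1.1, -0.3], [0.1] * 5,
                           0.0, 1.0, D, D, D, 0.3))
```

Two limiting cases help check the construction. With no measurement error ($V = 0$), $E(x \mid z)$ reproduces $z$. With very large $\gamma^2$, it shrinks back to the mean $\alpha + \beta t$.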
Abstract
In occupational cohort studies, a panel of experts often creates an exposure matrix or a dosimetry system that estimates dose histories for workers, and these estimates are then used in disease-risk analysis. Errors in the exposure matrix that were shared over time and/or among a group of workers have generally been ignored. We have developed and tested two methods, Monte Carlo maximum likelihood (MCML) and a fully parametric bootstrap method, to study the effect of shared uncertainties. A simple simulation experiment showed that the MCML agreed with the uncorrected likelihood ratio test for small additive and small shared multiplicative error distributions. Clear widening of confidence intervals was seen with both the MCML and the fully parametric bootstrap methods as the shared multiplicative error increased. Although the confidence intervals widened for both methods under the large-error model, the ranges of the confidence intervals disagreed. Hence, a validation analysis was conducted using the Oak Ridge National Laboratory cohort data. We performed multiple runs of the MCML method on new outcome data generated by the fully parametric bootstrap method and found that the MCML method was a feasible way to correct for shared uncertainties. However, the results from applying the MCML method to the Colorado Uranium Miners data showed that additional work may be necessary. The results suggest that 1,000 replications created from the complex dosimetry system of the Colorado Uranium Miners may not be sufficient to fully capture the variability of the uncertainties. In addition, true dose may need to be sampled from a distribution conditional on both disease and input data when there is strong reason to believe that a dose-response relationship exists.
Asset Metadata
Creator: Johnson, Terri Kang (author)
Core Title: Correcting for shared measurement error in complex dosimetry systems
School: Keck School of Medicine
Degree: Doctor of Philosophy
Degree Program: Biostatistics
Publication Date: 06/07/2007
Defense Date: 03/22/2007
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: Colorado Plateau Uranium Miners, complex dosimetry system, monte carlo maximum likelihood, OAI-PMH Harvest, Oak Ridge National Laboratory, parametric bootstrap, radiation risk, shared error
Language: English
Advisor: Stram, Daniel O. (committee chair), Gilliland, Frank D. (committee member), Langholz, Bryan (committee member), Tavaré, Simon (committee member), Thomas, Duncan (committee member)
Creator Email: tkang@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-m519
Unique identifier: UC1339138
Identifier: etd-Johnson-20070607 (filename), usctheses-m40 (legacy collection record id), usctheses-c127-499970 (legacy record id), usctheses-m519 (legacy record id)
Legacy Identifier: etd-Johnson-20070607.pdf
Dmrecord: 499970
Document Type: Dissertation
Rights: Johnson, Terri Kang
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Repository Name: Libraries, University of Southern California
Repository Location: Los Angeles, California
Repository Email: cisadmin@lib.usc.edu