NON-PARAMETRIC MULTIVARIATE REGRESSION HYPOTHESIS TESTING
by
Shanshan Xu
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(MATHEMATICS)
December 2012
Copyright 2012 Shanshan Xu
To my parents.
Acknowledgments
My deepest gratitude is to my academic advisor, Dr. Rand Wilcox, for teaching me much of the background for this thesis, suggesting the direction of research, and giving substantial help along the way. His patience and support helped me overcome many crises and finish this dissertation. I am also deeply grateful to my co-advisor, Dr. Sergey Lototsky, who has always been there to give guidance and advice.
My graduate career also benefited greatly from the excellent teaching of many of the mathematics faculty at the University of Southern California. I am also thankful to all graduate students and staff in the mathematics department at USC for their support and encouragement.
Most importantly, I would like to express my heartfelt gratitude to my family for their constant love and concern.
Finally, I appreciate the financial support from the graduate school and the technical support from HPCC at USC.
Table of Contents
Dedication
Acknowledgments
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
Chapter 2: Methodology
  2.1 Minimum Covariance Determinant Estimator
  2.2 Method 1
  2.3 Method 2
  2.4 Method 3
Chapter 3: Simulation
  3.1 Simulation Design
  3.2 Simulation Results
    3.2.1 Method 1
    3.2.2 Method 2
    3.2.3 Method 3
Chapter 4: Conclusion
Bibliography
List of Tables
3.1 Some properties of the g-and-h distribution
3.2 Choices for λ(X_i)
3.3 Method 1, VP1 (homoscedasticity), ρ_x = 0, ρ_y = 0
3.4 Method 1, VP1, ρ_x = 0.5, ρ_y = 0
3.5 Method 1, VP1, ρ_x = 0, ρ_y = 0.5
3.6 Method 1, VP1, ρ_x = 0.5, ρ_y = 0.5
3.7 Method 1, VP2, ρ_x = 0, ρ_y = 0
3.8 Method 1, VP3, ρ_x = 0, ρ_y = 0
3.9 Method 1, VP4, ρ_x = 0, ρ_y = 0
3.10 Method 1, VP5, ρ_x = 0, ρ_y = 0
3.11 Method 2, VP2, ρ_x = 0, ρ_y = 0
3.12 Method 2, VP2, ρ_x = 0.8, ρ_y = 0
3.13 Method 2, VP2, ρ_x = 0, ρ_y = 0.8
3.14 Method 2, VP2, ρ_x = 0.8, ρ_y = 0.8
3.15 Method 2, VP3, ρ_x = 0, ρ_y = 0
3.16 Method 2, VP3, ρ_x = 0.8, ρ_y = 0
3.17 Method 2, VP3, ρ_x = 0, ρ_y = 0.8
3.18 Method 2, VP3, ρ_x = 0.8, ρ_y = 0.8
3.19 Method 2, VP4, ρ_x = 0, ρ_y = 0
3.20 Method 2, VP4, ρ_x = 0.8, ρ_y = 0
3.21 Method 2, VP4, ρ_x = 0, ρ_y = 0.8
3.22 Method 2, VP4, ρ_x = 0.8, ρ_y = 0.8
3.23 Method 2, VP5, ρ_x = 0, ρ_y = 0
3.24 Method 2, VP5, ρ_x = 0.8, ρ_y = 0
3.25 Method 2, VP5, ρ_x = 0, ρ_y = 0.8
3.26 Method 2, VP5, ρ_x = 0.8, ρ_y = 0.8
3.27 Method 3, VP1, ρ_x = 0, ρ_y = 0
3.28 Method 3, VP2, ρ_x = 0, ρ_y = 0
3.29 Method 3, VP3, ρ_x = 0, ρ_y = 0
3.30 Method 3, VP4, ρ_x = 0, ρ_y = 0
3.31 Method 3, VP5, ρ_x = 0, ρ_y = 0
List of Figures
2.1 An example of heteroscedasticity
2.2 MCD is less affected by the outliers
Abstract
We introduce three nonparametric multivariate methods for testing the elements of the regression matrix. We investigate the finite-sample performance and robustness of these methods under non-normality and heteroscedasticity. Our simulation results show that Method 1 performs well when the error term has a non-Gaussian distribution and there is homoscedasticity. Method 2 performs well when there is heteroscedasticity, but is conservative. Method 3 modifies the second method on the side of conservativeness.
Chapter 1
Introduction
Let {(x_i, y_i)}, i = 1,...,n, be a multivariate sample of size n, where x_i = (x_{i1},...,x_{ip})^t is a p-variate predictor and y_i = (y_{i1},...,y_{iq})^t is a q-variate response. A general objective is understanding how y_i is related to the p predictors. The most common assumption is that y_i is a linear combination of the p predictors plus an error term:

    y_i = B^t x_i + ε_i,    (1.1)

where

    B = | β_11 ... β_1q |
        |  .   ...  .   |
        | β_p1 ... β_pq |

is the slope matrix. We are interested in testing

    H_0: B = 0.    (1.2)
One way of doing this is to carry out a separate univariate hypothesis test for each element β_jk of the slope matrix. However, by using multiple significance tests, the chance of false positive findings increases with the number of tests; for example, with p = 3 and q = 2 there are pq = 6 tests, and if they were independent the probability of at least one false positive at the .05 level would be 1 − 0.95^6 ≈ 0.26. In addition, information about the correlation structure is lost by analyzing each response variable separately. Therefore, a multivariate technique is needed.
Most classical multivariate methods assume that the error term satisfies

    E(ε_i) = 0,    VAR(ε_i) = σ²    (homoscedasticity),

and that the ε_i are independent and identically distributed with a normal distribution.
If the error term ε_i has variance σ_i², and σ_i² ≠ σ_j² for some i ≠ j, then the model is said to be heteroscedastic, which is a more accurate description of many real data sets. Heteroscedasticity can result in relatively low efficiency when using the ordinary least squares estimator, meaning that the estimator of B can have a relatively large standard error, so the results of hypothesis tests can be misleading.
Our main goal is to construct hypothesis testing methods that perform well when the error term has a non-normal distribution or when there is heteroscedasticity. Another goal is to avoid the harmful effects of outliers, including poor efficiency. Classical regression methods, such as OLS (ordinary least squares), do not perform well in these cases. Therefore we need an alternative, more robust estimator of the slope. The weighted minimum covariance determinant (MCD) estimator proposed in Rousseeuw et al. (2004) [1] is used here.
To test the slope, three methods were studied. Method 1 is based on the wild bootstrap technique and the data depth method. The idea is to see how deeply the test estimator is located in the cluster of estimators acquired from bootstrap samples. Method 1 performs well when the error term has a non-Gaussian distribution and there is homoscedasticity. Method 2 combines the MCD estimator with a bootstrap estimate of standard errors, followed by a Hotelling-type test. This method performs well when there is heteroscedasticity, but is conservative. Method 3 uses a refined bootstrap technique to decrease the overlap in resampling, and it modifies the second method on the side of conservatism.
Chapter 2
Methodology
2.1 Minimum Covariance Determinant Estimator
Let (X_i, Y_i), i = 1,...,n, be a multivariate sample of size n, and suppose X_i = (x_{i1},...,x_{ip})^t is a p-variate predictor and Y_i = (y_{i1},...,y_{iq})^t is a q-variate response. The linear regression model is given by

    Y_i = B^t X_i + α + ε_i,    (2.1)
where α is the intercept and ε_i is the error term. The least squares regression estimator is by far the most widely used modeling method, defined by

    B̂ = Σ̂_xx^{-1} Σ̂_xy,    (2.2)

    α̂ = μ̂_y − B̂^t μ̂_x,    (2.3)

    Σ̂_ε = Σ̂_yy − B̂^t Σ̂_xx B̂.    (2.4)
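To make (2.2)-(2.4) concrete, here is a minimal numpy sketch of the classical estimator computed from the partitioned joint sample covariance; the function name and layout are our own, not part of the original text.

```python
import numpy as np

def mv_ols(X, Y):
    """Multivariate least squares via the joint covariance of (X, Y):
    B = S_xx^{-1} S_xy, alpha = mu_y - B' mu_x, as in (2.2)-(2.4)."""
    p = X.shape[1]
    S = np.cov(np.hstack([X, Y]), rowvar=False)  # joint (p+q) covariance
    B = np.linalg.solve(S[:p, :p], S[:p, p:])    # slope matrix, p x q
    alpha = Y.mean(axis=0) - B.T @ X.mean(axis=0)
    S_eps = S[p:, p:] - B.T @ S[:p, :p] @ B      # residual covariance (2.4)
    return B, alpha, S_eps
```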
However, there are several problems with this approach. The breakdown point of OLS is only 1/n. This means that a single outlier can destroy the estimator, in the sense that the estimate can be made arbitrarily large or small. Furthermore, this method is optimal for the Gaussian distribution, but loses a great deal of its statistical efficiency for even slightly non-Gaussian errors. Low efficiency also arises when the error term has a normal distribution but is heteroscedastic; see Wilcox (2005). An example in Rousseeuw et al. (2004) showed that classical multivariate regression did not detect leverage points as regression outliers because they have small residual distance; the least squares multivariate regression fit was clearly influenced by such a leverage point.
Figure 2.1: An example of heteroscedasticity

Heteroscedasticity may be caused by measurement error. It may also arise from a violation of the model assumptions. Therefore, we need alternative regression methods that have a higher breakdown point and remain highly efficient when the error term is heteroscedastic. Rousseeuw (1983) proposed a robust method for multivariate regression based on the minimum covariance determinant (MCD) estimate of the joint (x, y) variables.
Let Z_i = (X_i, Y_i), so Z_1,...,Z_n ∈ R^{p+q}, and let ⌈n/2⌉ ≤ h ≤ n. First select a subset Z_{i_j}, j = 1,...,h, whose covariance matrix has the smallest determinant among all subsets of size h. The MCD estimator of the center is defined to be

    t_n = (1/h) Σ_{j=1}^{h} Z_{i_j},    (2.5)

and the scatter estimate is

    C_n = c_α c_n (1/h) Σ_{j=1}^{h} (Z_{i_j} − t_n)(Z_{i_j} − t_n)^t,    (2.6)

where c_α is a consistency factor and c_n is a small sample correction factor (see Pison et al. 2002).

Figure 2.2: MCD is less affected by the outliers

Regressions of all possible splits into X and Y variables can be carried out once the MCD of the joint (X, Y) has been computed.
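For computation, the FAST-MCD algorithm of Rousseeuw and Van Driessen (1999) is implemented in standard software. A minimal sketch using scikit-learn's MinCovDet follows; note that scikit-learn's estimator applies its own reweighting step by default, so this is a stand-in for, not the exact estimator used in, this thesis.

```python
import numpy as np
from sklearn.covariance import MinCovDet

def mcd_regression(X, Y):
    """Fit the MCD to the joint (X, Y) sample, then read the slope and
    intercept off the partitioned robust scatter, as in (2.2)-(2.3)."""
    p = X.shape[1]
    mcd = MinCovDet().fit(np.hstack([X, Y]))
    t_n, C_n = mcd.location_, mcd.covariance_      # center and scatter
    B = np.linalg.solve(C_n[:p, :p], C_n[:p, p:])  # robust slope, p x q
    alpha = t_n[p:] - B.T @ t_n[:p]
    return B, alpha
```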
One way to judge the robustness of an estimator is the breakdown point, meaning the proportion of arbitrarily large or small observations an estimator can handle before giving an arbitrarily large result. The breakdown point of the MCD estimator is approximately ε* = (n − h)/n. The highest possible breakdown value is achieved when h = ⌈(n + p + q + 1)/2⌉ ≈ n/2, where ε* ≈ .5. It has been shown that the MCD can have low efficiency.
To increase the efficiency, Rousseeuw et al. (2004) proposed LRMCD, a reweighted regression estimator obtained by reweighting the location and the regression.

Reweighting the location and scatter with nominal trimming proportion δ_L gives

    t_n^L = Σ_{i=1}^{n} w(d²(Z_i)) Z_i / Σ_{i=1}^{n} w(d²(Z_i))    (2.7)

and

    C_n^L = d_L Σ_{i=1}^{n} w(d²(Z_i)) (Z_i − t_n^L)(Z_i − t_n^L)^t / Σ_{i=1}^{n} w(d²(Z_i)),    (2.8)

where d_L is a consistency factor and d(Z_i) = ((Z_i − t_n)^t C_n^{-1} (Z_i − t_n))^{1/2} is the robust distance of observation Z_i based on the initial MCD estimates (t_n, C_n) in (2.5) and (2.6). The weights are computed as w(d²(Z_i)) = I(d²(Z_i) ≤ q_L), where q_L = χ²_{p+q, 1−δ_L}, and it is customary to take δ_L = .025 (Rousseeuw and Van Driessen 1999).

Now we can compute the multivariate regression estimates B̂^L, α̂^L, and Σ̂^L from (2.2), (2.3), (2.4) based on the reweighted location and scatter (t_n^L, C_n^L), and get residuals r_i^L = Y_i − (B̂^L)^t X_i − α̂^L.
By weighting the residuals r_i^L, the reweighted regression estimators, LRMCD, are defined as

    T̂_n^{LR} = ( Σ_{i=1}^{n} w(d²(r_i^L)) u_i u_i^t )^{-1} Σ_{i=1}^{n} w(d²(r_i^L)) u_i y_i^t    (2.9)

and

    Σ̂^{LR} = d_r Σ_{i=1}^{n} w(d²(r_i^L)) r_i^{LR} (r_i^{LR})^t / Σ_{i=1}^{n} w(d²(r_i^L)),    (2.10)

where T̂_n^{LR} = ((B̂^{LR})^t, α̂^{LR})^t and u_i = (x_i^t, 1)^t. Here d(r_i^L) = ((r_i^L)^t (Σ̂^L)^{-1} r_i^L)^{1/2} and w(d²(r_i^L)) = I(d²(r_i^L) ≤ q_r), where q_r = χ²_{q, 1−δ_r}.
It has been shown that LRMCD has better efficiency than the MCD while maintaining a high breakdown value, and that a breakdown value of about .25 gives a good compromise between efficiency and breakdown. LRMCD is used in the following three methods.
2.2 Method 1
The first method combines the bootstrap technique and data depth methods.
The critical idea of the bootstrap is that, rather than making perhaps unrealistic assumptions about the distribution associated with the population, it may be better to draw conclusions about the characteristics of the population strictly from the sample at hand. Bootstrapping involves resampling the data with replacement many times in order to generate an empirical estimate of the entire sampling distribution of a statistic.
To test H_0: B_ij = 0, i = 1,...,p, j = 1,...,q:

Step 1: Find the LRMCD regression estimator β̂_0 = (B̂_{1,1},...,B̂_{1,q}, B̂_{2,1},...,B̂_{2,q},...,B̂_{p,1},...,B̂_{p,q}) for the initial sample (X_i, Y_i), i = 1,...,n.

Step 2: Use the fixed-X bootstrap to resample the response variable and generate (X_i, Y_i*).

Note: The bootstrap used here is the fixed-X bootstrap, which resamples Y only. It is practical in cases where the observations are meant to represent a larger "population" but are not literally sampled from that population. It therefore makes sense to think of the observations used in the regression as fixed with respect to replication of the study. The response values, however, are random, because of the error component of the model.

Step 3: Apply the LRMCD estimator to (X_i, Y_i*) and get β̂*.

Step 4: Repeat this process B times, resulting in a cluster of bootstrap estimators.

Step 5: Under the null hypothesis, set (0, 0,..., 0)_{1×pq} as the center of the cluster, and measure depth, or outlyingness, by a center-outward ordering of the cluster: compute the Mahalanobis distance of each β̂* from the center, and count J, the number of bootstrap estimates whose distance from the center exceeds that of β̂_0.

Step 6: Get the p-value as p = J/B.

The idea of this limiting p-value is given by Liu and Singh (1997) [3].
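A minimal sketch of Method 1 follows. It assumes a user-supplied function lrmcd_slopes(X, Y) returning the vectorized LRMCD slope estimate (not shown here), and, since the fixed-X resampling scheme is not spelled out above, it resamples residuals about the null intercept-only fit; both are assumptions of ours.

```python
import numpy as np

def method1_pvalue(X, Y, lrmcd_slopes, B_boot=1000, rng=None):
    """Fixed-X bootstrap with a Mahalanobis center-outward ordering:
    p = J/B, where J counts bootstrap slope vectors lying farther
    from the null center 0 than the observed estimate."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    beta0 = np.asarray(lrmcd_slopes(X, Y))
    resid = Y - Y.mean(axis=0)                # residuals under H0: B = 0
    boots = np.empty((B_boot, beta0.size))
    for b in range(B_boot):
        idx = rng.integers(0, n, n)           # resample residuals only,
        Ystar = Y.mean(axis=0) + resid[idx]   # keeping X fixed
        boots[b] = lrmcd_slopes(X, Ystar)
    Sinv = np.linalg.pinv(np.cov(boots, rowvar=False))
    d = np.einsum('bi,ij,bj->b', boots, Sinv, boots)  # distances from 0
    d0 = beta0 @ Sinv @ beta0
    return np.mean(d > d0)                    # p = J/B
```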
Tables 3.3-3.6 show estimates of the Type I error probability under different correlation matrices of X and Y. From the results, changing the correlation does not make much difference, and Method 1 is robust when both X_i and ε_i have skewed and heavy-tailed distributions. Tables 3.7-3.11 show that Method 1 performs poorly under heteroscedasticity.
2.3 Method 2
Step 1: Find the LRMCD regression estimator β̂_0 = (B̂_{1,1},...,B̂_{1,q}, B̂_{2,1},...,B̂_{2,q},...,B̂_{p,1},...,B̂_{p,q}) for the initial sample (X_i, Y_i), i = 1,...,n.

Step 2: Use the bootstrap to resample pairs of observations and generate (X_i*, Y_i*).

Note: Here we resample both X_i and Y_i.

Step 3: Apply the LRMCD estimator to (X_i*, Y_i*) and get β̂*.

Step 4: Repeat this process B times, resulting in β̂*_1, ..., β̂*_B.

Step 5: Assuming β̂ has a Hotelling T²_{pq, n−2} distribution, find the p-value of β̂_0.
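A sketch of Method 2 under the same lrmcd_slopes assumption as before; the p-value uses the standard identity T²(d, m) = (dm/(m − d + 1)) F(d, m − d + 1) to evaluate the Hotelling-type reference distribution.

```python
import numpy as np
from scipy.stats import f as f_dist

def method2_pvalue(X, Y, lrmcd_slopes, B_boot=1000, rng=None):
    """Pairs bootstrap estimate of the covariance of the LRMCD slopes,
    followed by a Hotelling-type T^2 test of H0: B = 0."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    beta0 = np.asarray(lrmcd_slopes(X, Y))
    boots = np.empty((B_boot, beta0.size))
    for b in range(B_boot):
        idx = rng.integers(0, n, n)            # resample (x_i, y_i) pairs
        boots[b] = lrmcd_slopes(X[idx], Y[idx])
    S = np.cov(boots, rowvar=False)            # bootstrap slope covariance
    T2 = beta0 @ np.linalg.pinv(S) @ beta0
    d, m = beta0.size, n - 2                   # T^2_{pq, n-2} reference
    F = T2 * (m - d + 1) / (d * m)
    return f_dist.sf(F, d, m - d + 1)
```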
For Method 2, we tested different kinds of heteroscedasticity by using the different λ(X) functions shown in Table 3.2. Tables 3.11-3.26 show that it performs well, except for being too conservative with VP4.
2.4 Method 3
The simple bootstrap estimate of prediction error is not entirely satisfactory. Each bootstrap sample is drawn from the same original sample, so the original sample is not really a pure test sample. This overlap makes the test method conservative. To decrease the overlap effect, note that for any given bootstrap sample, about a third of the observations will by chance not be selected; with a sufficient number of bootstrap samples, any given observation will likely be left out several times. The refined bootstrap technique below pushes the test method away from the conservative side.
Step 1: From the initial sample (X_i, Y_i), i = 1,...,n, randomly choose a 70% portion of it, (X'_i, Y'_i), i = 1,...,m, where m = 0.7n.

Step 2: Find the LRMCD regression estimator β̂_0 = (B̂_{1,1},...,B̂_{1,q}, B̂_{2,1},...,B̂_{2,q},...,B̂_{p,1},...,B̂_{p,q}) for (X'_i, Y'_i), i = 1,...,m.

Step 3: Use the bootstrap to resample the entire initial sample (X_i, Y_i), i = 1,...,n, and generate (X_i*, Y_i*).

Note: Here (X_i*, Y_i*) overlaps less with the sample used in Step 2 to estimate β̂_0, since it contains the 30% of the initial sample that was not selected.

Step 4: Apply the LRMCD estimator to (X_i*, Y_i*) and get β̂*.

Step 5: Repeat this process B times, resulting in β̂*_1, ..., β̂*_B.

Step 6: Assuming β̂ has a Hotelling T²_{pq, n−2} distribution, find the p-value of β̂_0.
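Method 3 changes only Steps 1-2 of Method 2; a sketch, with the same lrmcd_slopes assumption as before:

```python
import numpy as np
from scipy.stats import f as f_dist

def method3_pvalue(X, Y, lrmcd_slopes, B_boot=1000, frac=0.7, rng=None):
    """As method2_pvalue, but the reference estimate is fit on a random
    70% subsample, so each full-sample bootstrap draw overlaps less
    with the data that produced it."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    sub = rng.choice(n, size=int(frac * n), replace=False)
    beta0 = np.asarray(lrmcd_slopes(X[sub], Y[sub]))  # Steps 1-2
    boots = np.empty((B_boot, beta0.size))
    for b in range(B_boot):
        idx = rng.integers(0, n, n)            # Steps 3-5: bootstrap all n
        boots[b] = lrmcd_slopes(X[idx], Y[idx])
    T2 = beta0 @ np.linalg.pinv(np.cov(boots, rowvar=False)) @ beta0
    d, m = beta0.size, n - 2                   # Step 6: T^2_{pq, n-2}
    F = T2 * (m - d + 1) / (d * m)
    return f_dist.sf(F, d, m - d + 1)
```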
The comparison between Tables 3.19-3.22 and Table 3.30 shows that Method 3 improves on Method 2 under VP4.
Chapter 3
Simulation
3.1 Simulation Design
To generate the skewed and long-tailed distributions of both X and Y, the g-and-h distribution (Hoaglin, 1985) is used. It is a transformation of the standard normal distribution governed by two adjustable coefficients g and h. More precisely, if Z is a standard normal random variable, then X is defined to be

    X = ((exp(gZ) − 1)/g) exp(hZ²/2).    (3.1)

When g = 0,

    X = Z exp(hZ²/2).    (3.2)

The case g = h = 0 corresponds to a standard normal distribution. When g = 0, the distribution is symmetric, and the tail of the distribution gets heavier as h gets larger.
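A minimal sketch of drawing g-and-h variates via (3.1)-(3.2); the helper name rgh is our own.

```python
import numpy as np

def rgh(g, h, size, rng=None):
    """Transform standard normal Z into a g-and-h variate; g controls
    skewness and h controls tail weight, per (3.1)-(3.2)."""
    rng = np.random.default_rng(rng)
    Z = rng.standard_normal(size)
    if g == 0:
        return Z * np.exp(h * Z**2 / 2)                  # (3.2)
    return (np.expm1(g * Z) / g) * np.exp(h * Z**2 / 2)  # (3.1)
```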
Table 3.1: Some properties of the g-and-h distribution.

  g     h     κ_1    κ_2
  0.0   0.0   0.00   3.0
  0.0   0.2   0.00   21.46
  0.0   0.5   0.00   --
  0.2   0.0   0.61   3.68
  0.5   0.0   1.75   8.9
  0.2   0.2   2.81   155.98
  0.5   0.5   --     --

(Entries marked -- correspond to moments that are not defined.)
Table 3.1 summarizes some properties of these g-and-h distributions, where skewness is measured with κ_1 = μ_3/μ_2^{1.5} and kurtosis with κ_2 = μ_4/μ_2², where μ_k = E(X − μ)^k.

For each method, the X_i were generated from g-and-h distributions with the combinations standard normal (g = 0, h = 0), asymmetric light-tailed (g = 0.5, h = 0), symmetric heavy-tailed (g = 0, h = 0.5), and asymmetric heavy-tailed (g = 0.5, h = 0.5). The error term ε_i was also independently generated from these four g-and-h distributions.
Correlations among the x values were formed as follows. Let R be the desired correlation matrix and form the Cholesky decomposition U^t U = R, where U^t is the transpose of U. Then XU produces an n × p matrix of data that has population correlation matrix R. Correlations among the error terms were formed in a similar manner. Here we tested

    R = | 1  ρ ... ρ |
        | ρ  1 ... ρ |
        | .  .  ... . |
        | ρ  ρ ... 1 |

where ρ = 0, 0.5, 0.8 for both X_i and ε_i.
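A sketch of this Cholesky construction, reusing the rgh helper above:

```python
import numpy as np

def correlated_gh(n, p, rho, g, h, rng=None):
    """n x p matrix of g-and-h data whose population correlation matrix
    has 1 on the diagonal and rho off the diagonal."""
    rng = np.random.default_rng(rng)
    R = np.full((p, p), rho) + (1 - rho) * np.eye(p)
    U = np.linalg.cholesky(R).T       # upper triangular with U'U = R
    X = rgh(g, h, (n, p), rng)        # independent g-and-h columns
    return X @ U
```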
To test whether our hypothesis testing methods perform well in heteroscedastic cases, we use the following model instead of (2.1):

    Y_i = B^t X_i + α + λ(X_i) ε_i.    (3.3)

Heteroscedasticity was generated using the five choices of λ(X_i) listed in Table 3.2 (a data-generation sketch follows the table).
Table 3.2: Choices for λ(X_i)

  VP     λ(X_i)
  VP 1   λ(X_i) = 1
  VP 2   λ(X_i) = sqrt(|X_i|)
  VP 3   λ(X_i) = |X_i|
  VP 4   λ(X_i) = 1 + 2/(|X_i| + 1)
  VP 5   λ(X_i) = |X_i + 1|
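To make the design concrete, here is a sketch of generating null data (B = 0, α = 0) from model (3.3) under the variance patterns of Table 3.2. Since the text leaves the p > 1 case of λ(X_i) implicit, we let λ depend on the first predictor only; that choice is an assumption of ours.

```python
import numpy as np

# Variance patterns of Table 3.2; x is the first predictor (our choice).
VP = {
    1: lambda x: np.ones_like(x),
    2: lambda x: np.sqrt(np.abs(x)),
    3: lambda x: np.abs(x),
    4: lambda x: 1 + 2 / (np.abs(x) + 1),
    5: lambda x: np.abs(x + 1),
}

def null_data(n, p, q, vp, g, h, rng=None):
    """Simulate (X, Y) under H0: B = 0 from (3.3): Y_i = lambda(X_i) eps_i,
    with g-and-h predictors and errors (uses rgh from above)."""
    rng = np.random.default_rng(rng)
    X = rgh(g, h, (n, p), rng)
    eps = rgh(g, h, (n, q), rng)
    lam = VP[vp](X[:, 0])             # heteroscedastic scale per case
    return X, lam[:, None] * eps
```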
Here we focus on small-sample performance, so the sample size is chosen to be 40 or 60. We examined every possible combination of (X_i, ε_i) from the four types of g-and-h distributions, along with the different correlation matrices. The actual Type I error probability, when testing at the α = .05 level, is estimated by the proportion of p-values less than .05 based on 1,000 replications.
3.2 Simulation Results
3.2.1 Method 1
We tested VP1-VP5 with p = 3, q = 2, sample size n = 40, and B = 1000 bootstrap resamples. Method 1 performs well under VP1 (homoscedasticity), and the correlation matrix does not make much difference. We tested VP2-VP5 with ρ_x = 0 and ρ_y = 0.
Table 3.3: Method 1, VP1 (homoscedasticity), ρ_x = 0, ρ_y = 0

                        g_x=0    g_x=0.5   g_x=0     g_x=0.5
                        h_x=0    h_x=0     h_x=0.5   h_x=0.5
  g_y=0,   h_y=0        0.045    0.051     0.051     0.050
  g_y=0.5, h_y=0        0.041    0.044     0.042     0.040
  g_y=0,   h_y=0.5      0.012    0.011     0.031     0.028
  g_y=0.5, h_y=0.5      0.017    0.050     0.031     0.040

Table 3.4: Method 1, VP1, ρ_x = 0.5, ρ_y = 0

                        g_x=0    g_x=0.5   g_x=0     g_x=0.5
                        h_x=0    h_x=0     h_x=0.5   h_x=0.5
  g_y=0,   h_y=0        0.046    0.047     0.050     0.051
  g_y=0.5, h_y=0        0.049    0.045     0.047     0.045
  g_y=0,   h_y=0.5      0.018    0.030     0.029     0.027
  g_y=0.5, h_y=0.5      0.017    0.021     0.035     0.042

Table 3.5: Method 1, VP1, ρ_x = 0, ρ_y = 0.5

                        g_x=0    g_x=0.5   g_x=0     g_x=0.5
                        h_x=0    h_x=0     h_x=0.5   h_x=0.5
  g_y=0,   h_y=0        0.042    0.055     0.055     0.047
  g_y=0.5, h_y=0        0.047    0.048     0.005     0.048
  g_y=0,   h_y=0.5      0.020    0.028     0.020     0.035
  g_y=0.5, h_y=0.5      0.021    0.026     0.034     0.042

Table 3.6: Method 1, VP1, ρ_x = 0.5, ρ_y = 0.5

                        g_x=0    g_x=0.5   g_x=0     g_x=0.5
                        h_x=0    h_x=0     h_x=0.5   h_x=0.5
  g_y=0,   h_y=0        0.046    0.052     0.047     0.047
  g_y=0.5, h_y=0        0.048    0.044     0.047     0.049
  g_y=0,   h_y=0.5      0.023    0.030     0.029     0.030
  g_y=0.5, h_y=0.5      0.019    0.027     0.034     0.040

Table 3.7: Method 1, VP2, ρ_x = 0, ρ_y = 0

                        g_x=0    g_x=0.5   g_x=0     g_x=0.5
                        h_x=0    h_x=0     h_x=0.5   h_x=0.5
  g_y=0,   h_y=0        0.114    0.113     0.135     0.132
  g_y=0.5, h_y=0        0.104    0.092     0.101     0.102
  g_y=0,   h_y=0.5      0.037    0.055     0.055     0.068
  g_y=0.5, h_y=0.5      0.042    0.057     0.058     0.068

Table 3.8: Method 1, VP3, ρ_x = 0, ρ_y = 0

                        g_x=0    g_x=0.5   g_x=0     g_x=0.5
                        h_x=0    h_x=0     h_x=0.5   h_x=0.5
  g_y=0,   h_y=0        0.194    0.219     0.196     0.183
  g_y=0.5, h_y=0        0.154    0.195     0.151     0.148
  g_y=0,   h_y=0.5      0.092    0.093     0.091     0.081
  g_y=0.5, h_y=0.5      0.088    0.090     0.069     0.082

Table 3.9: Method 1, VP4, ρ_x = 0, ρ_y = 0

                        g_x=0    g_x=0.5   g_x=0     g_x=0.5
                        h_x=0    h_x=0     h_x=0.5   h_x=0.5
  g_y=0,   h_y=0        0.031    0.020     0.022     0.020
  g_y=0.5, h_y=0        0.023    0.025     0.023     0.015
  g_y=0,   h_y=0.5      0.012    0.022     0.021     0.024
  g_y=0.5, h_y=0.5      0.013    0.017     0.026     0.023

Table 3.10: Method 1, VP5, ρ_x = 0, ρ_y = 0

                        g_x=0    g_x=0.5   g_x=0     g_x=0.5
                        h_x=0    h_x=0     h_x=0.5   h_x=0.5
  g_y=0,   h_y=0        0.137    0.183     0.165     0.167
  g_y=0.5, h_y=0        0.104    0.162     0.128     0.129
  g_y=0,   h_y=0.5      0.066    0.096     0.077     0.063
  g_y=0.5, h_y=0.5      0.048    0.075     0.064     0.074
3.2.2 Method 2
We tested Method 2 under VP2-VP5 with sample size n = 60, p = 3, and q = 2. This method is too conservative under VP4, which implies it may have a large Type II error.
Table 3.11: Method 2, VP2, ρ_x = 0, ρ_y = 0

                        g_x=0    g_x=0.5   g_x=0     g_x=0.5
                        h_x=0    h_x=0     h_x=0.5   h_x=0.5
  g_y=0,   h_y=0        0.020    0.016     0.021     0.030
  g_y=0.5, h_y=0        0.015    0.016     0.018     0.022
  g_y=0,   h_y=0.5      0.011    0.010     0.007     0.007
  g_y=0.5, h_y=0.5      0.011    0.006     0.007     0.011

Table 3.12: Method 2, VP2, ρ_x = 0.8, ρ_y = 0

                        g_x=0    g_x=0.5   g_x=0     g_x=0.5
                        h_x=0    h_x=0     h_x=0.5   h_x=0.5
  g_y=0,   h_y=0        0.017    0.009     0.018     0.023
  g_y=0.5, h_y=0        0.008    0.014     0.017     0.019
  g_y=0,   h_y=0.5      0.008    0.005     0.008     0.010
  g_y=0.5, h_y=0.5      0.010    0.003     0.007     0.007

Table 3.13: Method 2, VP2, ρ_x = 0, ρ_y = 0.8

                        g_x=0    g_x=0.5   g_x=0     g_x=0.5
                        h_x=0    h_x=0     h_x=0.5   h_x=0.5
  g_y=0,   h_y=0        0.016    0.016     0.019     0.020
  g_y=0.5, h_y=0        0.012    0.019     0.013     0.021
  g_y=0,   h_y=0.5      0.009    0.008     0.008     0.013
  g_y=0.5, h_y=0.5      0.006    0.009     0.007     0.014

Table 3.14: Method 2, VP2, ρ_x = 0.8, ρ_y = 0.8

                        g_x=0    g_x=0.5   g_x=0     g_x=0.5
                        h_x=0    h_x=0     h_x=0.5   h_x=0.5
  g_y=0,   h_y=0        0.016    0.009     0.024     0.021
  g_y=0.5, h_y=0        0.007    0.008     0.012     0.016
  g_y=0,   h_y=0.5      0.007    0.005     0.010     0.010
  g_y=0.5, h_y=0.5      0.007    0.002     0.007     0.006

Table 3.15: Method 2, VP3, ρ_x = 0, ρ_y = 0

                        g_x=0    g_x=0.5   g_x=0     g_x=0.5
                        h_x=0    h_x=0     h_x=0.5   h_x=0.5
  g_y=0,   h_y=0        0.034    0.029     0.034     0.056
  g_y=0.5, h_y=0        0.018    0.035     0.031     0.030
  g_y=0,   h_y=0.5      0.022    0.021     0.015     0.017
  g_y=0.5, h_y=0.5      0.020    0.010     0.011     0.017

Table 3.16: Method 2, VP3, ρ_x = 0.8, ρ_y = 0

                        g_x=0    g_x=0.5   g_x=0     g_x=0.5
                        h_x=0    h_x=0     h_x=0.5   h_x=0.5
  g_y=0,   h_y=0        0.020    0.020     0.023     0.033
  g_y=0.5, h_y=0        0.019    0.022     0.021     0.030
  g_y=0,   h_y=0.5      0.017    0.013     0.015     0.015
  g_y=0.5, h_y=0.5      0.015    0.013     0.008     0.017

Table 3.17: Method 2, VP3, ρ_x = 0, ρ_y = 0.8

                        g_x=0    g_x=0.5   g_x=0     g_x=0.5
                        h_x=0    h_x=0     h_x=0.5   h_x=0.5
  g_y=0,   h_y=0        0.030    0.030     0.028     0.043
  g_y=0.5, h_y=0        0.017    0.018     0.031     0.024
  g_y=0,   h_y=0.5      0.019    0.014     0.015     0.012
  g_y=0.5, h_y=0.5      0.009    0.010     0.014     0.019

Table 3.18: Method 2, VP3, ρ_x = 0.8, ρ_y = 0.8

                        g_x=0    g_x=0.5   g_x=0     g_x=0.5
                        h_x=0    h_x=0     h_x=0.5   h_x=0.5
  g_y=0,   h_y=0        0.022    0.013     0.022     0.030
  g_y=0.5, h_y=0        0.016    0.018     0.021     0.025
  g_y=0,   h_y=0.5      0.019    0.006     0.011     0.013
  g_y=0.5, h_y=0.5      0.009    0.009     0.007     0.011

Table 3.19: Method 2, VP4, ρ_x = 0, ρ_y = 0

                        g_x=0    g_x=0.5   g_x=0     g_x=0.5
                        h_x=0    h_x=0     h_x=0.5   h_x=0.5
  g_y=0,   h_y=0        0.001    0         0         0.006
  g_y=0.5, h_y=0        0.001    0.002     0.002     0.004
  g_y=0,   h_y=0.5      0.003    0.003     0         0.001
  g_y=0.5, h_y=0.5      0.006    0.002     0.001     0.001

Table 3.20: Method 2, VP4, ρ_x = 0.8, ρ_y = 0

                        g_x=0    g_x=0.5   g_x=0     g_x=0.5
                        h_x=0    h_x=0     h_x=0.5   h_x=0.5
  g_y=0,   h_y=0        0.002    0.002     0         0.002
  g_y=0.5, h_y=0        0.005    0.003     0.001     0.001
  g_y=0,   h_y=0.5      0.006    0.004     0         0
  g_y=0.5, h_y=0.5      0.005    0.005     0         0.002

Table 3.21: Method 2, VP4, ρ_x = 0, ρ_y = 0.8

                        g_x=0    g_x=0.5   g_x=0     g_x=0.5
                        h_x=0    h_x=0     h_x=0.5   h_x=0.5
  g_y=0,   h_y=0        0.003    0         0.001     0.002
  g_y=0.5, h_y=0        0.002    0.002     0.001     0.002
  g_y=0,   h_y=0.5      0.003    0.003     0         0.001
  g_y=0.5, h_y=0.5      0.005    0.001     0         0

Table 3.22: Method 2, VP4, ρ_x = 0.8, ρ_y = 0.8

                        g_x=0    g_x=0.5   g_x=0     g_x=0.5
                        h_x=0    h_x=0     h_x=0.5   h_x=0.5
  g_y=0,   h_y=0        0.001    0.003     0.001     0.002
  g_y=0.5, h_y=0        0.005    0.003     0.001     0.001
  g_y=0,   h_y=0.5      0.002    0.003     0         0.001
  g_y=0.5, h_y=0.5      0.003    0.005     0.001     0.002

Table 3.23: Method 2, VP5, ρ_x = 0, ρ_y = 0

                        g_x=0    g_x=0.5   g_x=0     g_x=0.5
                        h_x=0    h_x=0     h_x=0.5   h_x=0.5
  g_y=0,   h_y=0        0.021    0.018     0.023     0.037
  g_y=0.5, h_y=0        0.013    0.017     0.018     0.020
  g_y=0,   h_y=0.5      0.012    0.006     0.013     0.012
  g_y=0.5, h_y=0.5      0.007    0.005     0.007     0.012

Table 3.24: Method 2, VP5, ρ_x = 0.8, ρ_y = 0

                        g_x=0    g_x=0.5   g_x=0     g_x=0.5
                        h_x=0    h_x=0     h_x=0.5   h_x=0.5
  g_y=0,   h_y=0        0.019    0.011     0.018     0.028
  g_y=0.5, h_y=0        0.011    0.011     0.023     0.014
  g_y=0,   h_y=0.5      0.010    0.005     0.008     0.012
  g_y=0.5, h_y=0.5      0.010    0.002     0.003     0.011

Table 3.25: Method 2, VP5, ρ_x = 0, ρ_y = 0.8

                        g_x=0    g_x=0.5   g_x=0     g_x=0.5
                        h_x=0    h_x=0     h_x=0.5   h_x=0.5
  g_y=0,   h_y=0        0.014    0.014     0.021     0.028
  g_y=0.5, h_y=0        0.012    0.015     0.016     0.016
  g_y=0,   h_y=0.5      0.008    0.007     0.005     0.010
  g_y=0.5, h_y=0.5      0.007    0.005     0.006     0.007

Table 3.26: Method 2, VP5, ρ_x = 0.8, ρ_y = 0.8

                        g_x=0    g_x=0.5   g_x=0     g_x=0.5
                        h_x=0    h_x=0     h_x=0.5   h_x=0.5
  g_y=0,   h_y=0        0.018    0.009     0.019     0.024
  g_y=0.5, h_y=0        0.008    0.011     0.018     0.016
  g_y=0,   h_y=0.5      0.012    0.003     0.005     0.011
  g_y=0.5, h_y=0.5      0.010    0.002     0.006     0.010
3.2.3 Method 3
We tested Method 3 under VP1-VP5 with sample size n = 60, p = 3, q = 2, ρ_x = 0, and ρ_y = 0. Method 3 performs well under VP4, but appears to be too liberal under VP2-VP3 and VP5.
Table 3.27: Method 3, VP1, ρ_x = 0, ρ_y = 0

                        g_x=0    g_x=0.5   g_x=0     g_x=0.5
                        h_x=0    h_x=0     h_x=0.5   h_x=0.5
  g_y=0,   h_y=0        0.054    0.041     0.041     0.050
  g_y=0.5, h_y=0        0.050    0.040     0.034     0.046
  g_y=0,   h_y=0.5      0.057    0.046     0.042     0.051
  g_y=0.5, h_y=0.5      0.064    0.050     0.042     0.054

Table 3.28: Method 3, VP2, ρ_x = 0, ρ_y = 0

                        g_x=0    g_x=0.5   g_x=0     g_x=0.5
                        h_x=0    h_x=0     h_x=0.5   h_x=0.5
  g_y=0,   h_y=0        0.052    0.062     0.071     0.088
  g_y=0.5, h_y=0        0.054    0.064     0.074     0.081
  g_y=0,   h_y=0.5      0.054    0.066     0.065     0.074
  g_y=0.5, h_y=0.5      0.058    0.060     0.060     0.069

Table 3.29: Method 3, VP3, ρ_x = 0, ρ_y = 0

                        g_x=0    g_x=0.5   g_x=0     g_x=0.5
                        h_x=0    h_x=0     h_x=0.5   h_x=0.5
  g_y=0,   h_y=0        0.075    0.095     0.098     0.078
  g_y=0.5, h_y=0        0.083    0.094     0.105     0.102
  g_y=0,   h_y=0.5      0.075    0.069     0.078     0.089
  g_y=0.5, h_y=0.5      0.069    0.077     0.078     0.080

Table 3.30: Method 3, VP4, ρ_x = 0, ρ_y = 0

                        g_x=0    g_x=0.5   g_x=0     g_x=0.5
                        h_x=0    h_x=0     h_x=0.5   h_x=0.5
  g_y=0,   h_y=0        0.034    0.029     0.036     0.042
  g_y=0.5, h_y=0        0.025    0.035     0.030     0.045
  g_y=0,   h_y=0.5      0.028    0.034     0.044     0.039
  g_y=0.5, h_y=0.5      0.027    0.036     0.039     0.042

Table 3.31: Method 3, VP5, ρ_x = 0, ρ_y = 0

                        g_x=0    g_x=0.5   g_x=0     g_x=0.5
                        h_x=0    h_x=0     h_x=0.5   h_x=0.5
  g_y=0,   h_y=0        0.063    0.063     0.083     0.095
  g_y=0.5, h_y=0        0.057    0.064     0.082     0.079
  g_y=0,   h_y=0.5      0.052    0.065     0.063     0.078
  g_y=0.5, h_y=0.5      0.045    0.047     0.063     0.062
Chapter 4
Conclusion
In this study, we investigated three methods under violations of normality and homoscedasticity with small sample sizes. None of these methods performs well universally across all cases.
Method 1 is robust when both X_i and ε_i have skewed and heavy-tailed distributions, but Tables 3.7-3.11 show that Method 1 performs poorly under heteroscedasticity. For Method 2, we tested different kinds of heteroscedasticity by using the different λ(X) functions shown in Table 3.2, VP2 through VP5. It appears to be conservative in general, but too conservative with VP4. In Method 3, we refined the bootstrap resampling technique, and it provides an acceptable Type I error probability with VP4.
Therefore, the recommended testing method should be chosen based on the properties of the heteroscedasticity at hand.
Bibliography

[1] Rousseeuw, P. J., Van Aelst, S., Van Driessen, K., Agulló, J. (2004). Robust multivariate regression. Technometrics, Vol. 46, No. 3 (Aug. 2004), pp. 293-305.

[2] Roelant, E., Van Aelst, S., Willems, G. (2009). The minimum weighted covariance determinant estimator. Metrika, 70, pp. 177-204.

[3] Liu, R. Y., Singh, K. (1997). Notions of limiting p values based on data depth and bootstrap. Journal of the American Statistical Association, Vol. 92, No. 437 (Mar. 1997), pp. 266-277.

[4] Liu, R. Y., Parelius, J. M., Singh, K. (1999). Multivariate analysis by data depth: descriptive statistics, graphics and inference. The Annals of Statistics, Vol. 27, No. 3, pp. 783-858.

[5] Efron, B. (1982). The Jackknife, the Bootstrap and Other Resampling Plans. Philadelphia: SIAM.

[6] Hoaglin, D. C. (1985). Summarizing shape numerically: the g-and-h distributions. In Exploring Data Tables, Trends and Shapes. Wiley.

[7] Butler, R. W., Davies, P. L., Jhun, M. (1993). Asymptotics for the minimum covariance determinant estimator. The Annals of Statistics, Vol. 21, No. 3 (Sep. 1993), pp. 1385-1400.

[8] Rousseeuw, P. J. (1983). Multivariate estimation with high breakdown point. In Mathematical Statistics and Applications B (W. Grossmann, G. Pflug, I. Vincze and W. Wertz, eds.), pp. 283-297. Reidel, Dordrecht.

[9] Lopuhaä, H. P., Rousseeuw, P. J. (1991). Breakdown points of affine equivariant estimators of multivariate location and covariance matrices. The Annals of Statistics, 19, pp. 229-248.

[10] Mardia, K. V., Kent, J. T., Bibby, J. M. (1995). Multivariate Analysis. Academic Press Ltd., London.

[11] Pison, G., Van Aelst, S., Willems, G. (2002). Small sample corrections for LTS and MCD. Technical Report, Univ. of Antwerp.

[12] Rousseeuw, P. J., Van Driessen, K. (1999). A fast algorithm for the minimum covariance determinant estimator. Technometrics, Vol. 41, No. 3 (Aug. 1999).