Page 21 
Proposition 2.2.1. If we apply the least absolute deviations (LAD) criterion to the above sample $Y$, then we get $\hat{\mu} = \operatorname{median}\{Y\}$.

Proof. Under the LAD criterion, we minimize the sum of absolute errors:

$$\mathrm{SAE} = \sum_{i=1}^{n} |\epsilon_i| = \sum_{i=1}^{n} |y_i - \mu|$$

Wherever $\mu$ differs from every $y_i$, the derivative with respect to $\mu$ is

$$(\mathrm{SAE})' = -\sum_{i=1}^{n} \operatorname{sign}(y_i - \mu),$$

so the SAE is minimized where the number of observations above $\mu$ balances the number below it. The solution can be written as:

$$\hat{\mu} = \begin{cases} y_{\left(\frac{n+1}{2}\right)} & \text{if } n \text{ is odd,} \\ x, \text{ where } y_{\left(\frac{n}{2}\right)} < x < y_{\left(\frac{n+2}{2}\right)} & \text{if } n \text{ is even,} \end{cases}$$

where $y_{(i)}$ are the order statistics of $Y$ ($y_{(1)} < y_{(2)} < \cdots < y_{(n)}$). The LAD solution is unique if and only if the sample size is odd. We can also see that the error term has the property $\operatorname{median}(\{\epsilon_i\}) = 0$.

If we separate the sample $Y$ by one factor (e.g. marital status) into two levels (married and single), we can label the sample as $Y = \{y_{11}, y_{12}, \cdots, y_{1k_1}, y_{21}, y_{22}, \cdots, y_{2k_2}\}$, where $k_1 + k_2 = n$; $y_{1i}$ is the measured value from a married person and $y_{2i}$ is the measured value from a single person. We can decompose the sample as:

$$y_{ij} = \mu + \alpha_i + \epsilon_{ij}$$

That is, each value $y_{ij}$ is the sum of the global term ($\mu$), the deviation due to the marital-status classification ($\alpha_i$), and the random error term $\epsilon_{ij}$.
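As a numerical check of Proposition 2.2.1, the following minimal Python sketch evaluates the sum of absolute errors around the middle order statistics. The function names `sae` and `lad_location` and the two small samples are illustrative assumptions and do not come from the thesis.

```python
import statistics


def sae(mu, ys):
    """Sum of absolute errors of the sample ys around a candidate location mu."""
    return sum(abs(y - mu) for y in ys)


def lad_location(ys):
    """Return the two middle order statistics (lo, hi) that bracket the LAD solution.

    For odd n both values equal y_((n+1)/2), so the solution is unique.
    For even n they are y_(n/2) and y_((n+2)/2); any x between them
    gives the same SAE.
    """
    s = sorted(ys)
    n = len(s)
    if n % 2 == 1:
        m = s[(n - 1) // 2]  # 0-based index of y_((n+1)/2)
        return m, m
    return s[n // 2 - 1], s[n // 2]


# Hypothetical samples chosen only for illustration.
odd_sample = [2.0, 7.0, 3.0, 10.0, 4.0]
even_sample = [2.0, 7.0, 3.0, 10.0]

# Odd n: the unique minimizer is the sample median.
odd_lo, odd_hi = lad_location(odd_sample)
odd_residuals = [y - odd_lo for y in odd_sample]

# Even n: the SAE is flat between the two middle order statistics.
even_lo, even_hi = lad_location(even_sample)
even_mid = (even_lo + even_hi) / 2

print("odd sample:", odd_lo, sae(odd_lo, odd_sample))
print("even sample:", (even_lo, even_hi), sae(even_mid, even_sample))
print("median of residuals (odd sample):", statistics.median(odd_residuals))
```

Moving the candidate location away from the median in the odd sample, or outside the middle interval in the even sample, increases the SAE, which matches the uniqueness statement in the proof.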
Object Description
Title  Analysis of robustness and residuals in the Affymetrix gene expression microarray summarization 
Author  Ge, Huanying 
Author email  hge@usc.edu 
Degree  Master of Science 
Document type  Thesis 
Degree program  Statistics 
School  College of Letters, Arts and Sciences 
Date defended/completed  20080701 
Date submitted  2008 
Restricted until  Restricted until 19 June 2010. 
Date published  20100619 
Advisor (committee chair)  Li, Lei M. 
Advisor (committee member)  Goldstein, Larry M.; Chen, Liang 
Abstract  DNA microarrays have been widely used in the field of functional genomics. Estimating gene expression from microarray data is a statistical problem to which considerable effort has been devoted. In this study, we focus on the summarization step of Affymetrix microarray preprocessing. We apply Least Absolute Deviation (LAD) regression to estimate the probe- and treatment-specific effects in the widely used two-factor model. Median polish can be used to approximate LAD regression in this two-factor summarization model. We show that the LAD estimator is robust in the sense that it has bounded influence, where the bound is strongly associated with the RNA concentration. Furthermore, we calculate the influence bound and the standard error, which are used as measures of accuracy for the log-ratio estimate. 
Keyword  LAD; microarray; robustness 
Language  English 
Part of collection  University of Southern California dissertations and theses 
Publisher (of the original version)  University of Southern California 
Place of publication (of the original version)  Los Angeles, California 
Publisher (of the digital version)  University of Southern California. Libraries 
Type  texts 
Legacy record ID  uscthesesm1278 
Contributing entity  University of Southern California 
Rights  Ge, Huanying 
Repository name  Libraries, University of Southern California 
Repository address  Los Angeles, California 
Repository email  cisadmin@lib.usc.edu 
Filename  etdGe20080619 
Archival file  uscthesesreloadpub_Volume32/etdGe20080619.pdf 