Genomic variant analysis in whole blood of recurrent early-onset major depression patients and differential expression in BA25 of suicide completers with major depression
(USC Thesis Other)
GENOMIC VARIANT ANALYSIS IN WHOLE BLOOD OF RECURRENT EARLY-ONSET MAJOR DEPRESSION PATIENTS AND DIFFERENTIAL EXPRESSION IN BA25 OF SUICIDE COMPLETERS WITH MAJOR DEPRESSION

by Emily Anne Chen

A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulfillment of the Requirements of the Degree DOCTOR OF PHILOSOPHY (GENETIC, MOLECULAR AND CELLULAR BIOLOGY)

August 2015

Copyright 2015 Emily Anne Chen

Dedication

This dissertation is dedicated to my parents for their unwavering love and support.

Acknowledgements

First and foremost, I would like to thank my mentor Dr. James Knowles. Your knowledge and guidance have helped me in more ways than I can name. The time and effort you have invested in me is immensely appreciated. Thank you for your determination and passion for this field, for through it you have helped me to understand the critical importance of psychiatric research.

I would also like to thank Dr. Oleg Evgrafov. Your acumen and critical thinking will always be appreciated. Thank you for all the times you have imparted your knowledge and pointed me in the right direction.

Thank you to my committee members Dr. Carlos Pato and Dr. Zoltan Tokes. Your helpful input and attention to detail have guided me solidly through the years. I am grateful to you for your time and words of wisdom.

I also wish to thank all the past and present members of the Knowles lab. I will miss your camaraderie. Thank you as well to my closest friends over these last years, especially Trisha Staab and Regina Tompkins. Words cannot express what your friendship means to me. I also would like to thank Erik Andersen for his support and patience throughout the years. My gratitude is immeasurable.

Last but certainly not least, I would like to thank my parents, my sister, and my grandmother for always believing in me and for their unconditional love. Without you, I would not be where I am today.
Table of Contents

Dedication
Acknowledgements
List of Figures
List of Tables
Abstract

Chapter 1: Introduction
1.1 Background and significance of Major Depressive Disorder
1.2 Antidepressants
1.3 Heritability
1.4 Next-generation sequencing technology

Chapter 2: Genomic Investigation of Recurrent Early-Onset Major Depressive Disorder
2.1 Introduction
2.1.1 Preliminary studies
2.2 Materials and Methods
2.2.1 The Genome Analysis Toolkit
2.2.2 Custom enrichment and sequencing of chromosome 15q25-26
2.2.3 Exome sequencing of Major Depressive Disorder cases
2.3 Results
2.3.1 MFGE8 variant in targeted sequencing cases
2.3.2 Novel KIF23 splice site variant in exome cases
2.4 Discussion
2.5 Acknowledgements

Chapter 3: Differential Expression in Human Brains of Depressed Suicide Completers
3.1 Introduction
3.2 Materials and Methods
3.2.1 RIN-dependent methods for library preparation
3.2.2 Genome and Transcriptome Free Analysis of RNA
3.2.3 Dissection of brains and preparation of samples
3.3 Results
3.3.1 Dysregulation in BA25 strongly associated with depression
3.3.2 GZMK, HOXB3, PROX1 are dysregulated in BA25
3.3.3 BTBD6 upregulated in BA25 with 82 eQTLs in LD with trending variant
3.3.4 GDNF downregulated in BA25
3.3.5 Type I interferon signaling gene OAS1 downregulated in BA25
3.3.6 FDR-corrected significant gene RP4-539M6.19 downregulated in BA25
3.4 Discussion
3.5 Acknowledgements

Chapter 4: Merging of genetic and transcriptomic approaches
4.1 TMED9 downregulated in BA25 and contains intronic insertion in exome cases
4.2 Discussion

Chapter 5: Summary and Future Directions
5.1 Genomic variants in Recurrent Early-Onset Major Depressive Disorder
5.2 Differential expression in BA25 of depressed suicide completers
5.3 Genetic and transcriptomic overlap in depression
5.4 Conclusion

References

List of Figures

1.1 Illumina HiSeq 2000 flow cell
1.2 Illumina instrumentation
2.1 Genome scan results of GenRED cases
2.2 Linkage fine mapping of chromosome 15q
2.3 Nonparametric LOD scores for 18 largest linked GenRED families
2.4 Agilent 2100 TapeStation
2.5 Covaris S2 sonicator
2.6 Chromosome 15q targeted region
2.7 Adult human MFGE8 ISH in brain
2.8 MFGE8 expression in human brain development
2.9 Principal Components Analysis of 96 cases and 168 controls
2.10 Principal Components Analysis of 96 exomes, 168 controls, 4 African-American cases, and 7 African-American controls
2.11 KIF23 expression in human brain development
2.12 Single representative case and control with novel KIF23 splice site variant
3.1 Heat degradation of intact RNA into three successively lower quality samples
3.2 Effect of RNA integrity on gene expression correlations with untreated RNA using the same sample preparation protocol
3.3 Effect of RNA integrity on gene expression correlations between different sample preparation protocols
3.4 Genome and Transcriptome Free Analysis of RNA
3.5 Lateral and medial surfaces of human brain
3.6 NuGEN Ovation RNA-Seq FFPE System protocol
3.7 Illumina TruSeq DNA Sample Preparation kit v2 protocol
3.8 Principal Components Analysis of four brain regions
3.9 Number of nominally significant differentially expressed genes with fold change of 2 in brain regions
3.10 Benjamini-Hochberg adjusted p-values of top canonical IPA pathways
3.11 Top canonical IPA pathways in BA25
3.12 BA25 nervous system networks with interferon signaling genes connected
3.13 Principal Components Analysis of 18 MDD cases and 17 controls
3.14 Nervous system network of BA25 dysregulated genes
3.15 GZMK expression in human brain development
3.16 HOXB3 expression in human brain development
3.17 PROX1 expression in human brain development
3.18 BTBD6 expression in human brain development
3.19 GDNF expression in human brain development
3.20 OAS1 expression in human brain development
3.21 RP4-539M6.19 expression in human brain development
3.22 Genes connected to RP4-539M6.19

List of Tables

2.1 Number of DNA-Seq cases and controls with each genotype
3.1 R between degraded and intact sample for each protocol and total reads for each sample
3.2 University of California Irvine/Davis Brain Depository brain details
3.3 National Institute of Mental Health Brain Collection Core brain details
3.4 Eight Type I IFN signaling genes from Mostafavi et al. connected to merged networks
3.5 Genes within ± 300 kb of a GWAS SNP
4.1 Overlap in genes between DNA-Seq variation and RNA-Seq dysregulation

Abstract

Major depressive disorder (MDD) is one of the most challenging psychiatric disorders to study. It is a polygenic disorder that has a lifetime incidence of 10-15%, with women affected twice as often as men, for unknown reasons (Bierut, Heath et al. 1999). The high lifetime prevalence of this non-infectious but debilitating disorder necessitates progress in this field so that better therapeutic drugs can be developed to more effectively treat patients. Genome-wide association studies (GWASs) on major depression have yet to produce significant findings. This is likely because the effects of individual loci are small and/or because the heterogeneity of the disease is high. The involvement of non-genetic factors such as environmental variables is an additional complicating factor. As a result, the deciphering of the genetic architecture is difficult. While both genes and the environment are factors in the development of MDD, this project will focus on the genetics of the disorder. The long-term objective of this project is to understand the genetic basis of MDD and it is our goal to accomplish this by using next-generation sequencing to discover the genes that play a role in the development of this disorder. We will approach this problem from both genomic and transcriptomic angles to combine the study into one integrated analysis. It is hoped that the study of the genome and transcriptome will elucidate our understanding of the cause of MDD. No valid biological markers or clear-cut etiologic pathways have been discovered yet, so there is currently no method to predict or biologically test for this disease.
The primary variant of interest from the genomic sequencing results resides in a natural splice site in KIF23, a gene involved in microtubule-dependent movement of organelles within the cell. An intronic insertion in TMED9, which is involved in vesicular protein trafficking, was found in the majority of cases and was also identified as a potential predisposing variant. From the transcriptomic results, hundreds of genes were found to be dysregulated. However, a select few were of interest due to their distinct biological roles. GZMK, HOXB3, PROX1, BTBD6, GDNF, OAS1, RP4-539M6.19, and TMED9 may each have an impact on the disease state through their dysregulation in brain tissue, namely in the BA25 region.

Chapter 1: Introduction

1.1 Background and significance of Major Depressive Disorder

MDD is a complex mood disorder that has been referred to as the epidemic of our time (Savitz and Drevets 2009). It is characterized by a period of at least two weeks with at least five of the following nine symptoms: depressed mood; anhedonia; depressed appetite or significant unexpected weight loss; insomnia or hypersomnia; motor agitation or retardation; fatigue or loss of energy; feelings of worthlessness or excessive/inappropriate guilt; the inability to think or concentrate (or indecisiveness); or recurrent thoughts of death, recurrent suicidal ideation, or a suicide attempt or plan. Either depressed mood or anhedonia must be one of the symptoms (APA 2013). MDD is the leading cause of medical disability for individuals ages 14 to 44 (Stewart, Ricci et al. 2003) and approximately 14.8 million American adults are affected (Kessler, Chiu et al. 2005). The cost of this disorder, measured both in lost productivity and medical expenses, was $83 billion per year as of 2000 (Greenberg, Kessler et al. 2003). MDD is considered to be a major suicide precursor by the World Health Organization. Its lethality is quantifiable by more than 35,000 people in the U.S.
and more than a million people worldwide completing suicide annually (Joiner 2011). This translates to one death every 40 seconds and a predicted one death every 20 seconds by 2020. It is estimated that approximately half of all suicides from MDD are preventable, and the numbers underscore the importance of developing proper treatments. While not everyone who completes suicide suffered from depression while living, and not everyone who is depressed will complete suicide, there are both shared and distinct neurobiological risk factors between depression and suicide, which will need to be deciphered by researchers.

1.2 Antidepressants

To treat a disorder, we need to understand its etiology. For MDD, however, the etiology is unknown. There are many theories as to the cause of the disorder, with the main theory arising from the serendipitous discovery of the first antidepressant. In the 1950s, iproniazid, a drug originally intended to treat tuberculosis, was discovered to have mood-elevating effects (White, Walline et al. 2005). This discovery led researchers to focus on mood-related neurotransmitters such as serotonin as they relate to the development of depression. The most common theory of why depression arises is that serotonin levels are low in depressed individuals. There are three problems with this theory. The first problem involves timing. If low serotonin levels are what make an individual feel depressed, then antidepressants that increase synaptic levels of serotonin should alleviate depression symptoms immediately. However, antidepressants can sometimes take more than 4 weeks for patients to start showing a response (Blier 2009). The second problem concerns whether or not the drugs actually work. Any one antidepressant leads to remission in about a third of patients, while the other two thirds of individuals spend the majority of their time shuffling from one prescription to another, trying to find the optimal drug (Sussman 2010).
The largely heterogeneous genetic makeup of individuals may also be responsible for the phenotype of the group of patients known as “treatment-resistant” patients, who do not respond to any combination of treatments, including electroconvulsive therapies (Schlaepfer, Cohen et al. 2007). The third problem is a lack of evidence that low serotonin levels are responsible for depression. It was discovered that when serotonin was depleted in a normal individual, depression did not develop (Delgado, Price et al. 1994). Similarly, when serotonin was depleted in a depressed individual, the pre-existing depression did not worsen. Overall, studies reporting a linear relationship between serotonin levels and depression have not been consistently replicated. The antidepressants currently on the market leave the majority of patients with residual symptoms and serious side effects. In patients with mild or moderate depression, it has been shown that antidepressants have little to no benefit (Fournier, DeRubeis et al. 2010). It is in patients with severe depression that the pharmacologic effect is superior to that of placebos. So while it is possible that serotonin may be one of the predisposing factors, it appears that low levels are not causative of depression.

1.3 Heritability

Although the etiology of MDD is unknown, what is known is that it is heritable. Twin studies have shown the heritability to be about 40% (Shi, Potash et al. 2010), highlighting the existence of a genetic aspect to this disease. Monozygotic and dizygotic twins are used because it is assumed that pairs of twins are likely to be exposed to the same environmental factors up to the appearance of the first depressive episode. Therefore, any differences in phenotypes should reflect the action of genetic variation.
The relative risk of major depression to first-degree relatives is 2-3, with an increased risk when the probands have recurrent episodes or experience an age of onset in the 30s or earlier (Holmans, Weissman et al. 2007). Relative risk refers to the ratio of the likelihood of a family member of a proband developing depression to the likelihood of controls in the general population developing depression. Environmental factors also play a role. The complex background of life variables ranges from stress to traumatic events such as abuse or death. The potential experiences are unlimited, making gene by environment studies difficult. Because the human genome contains a finite amount of data, we can focus on finding the genes that predispose an individual to develop depression.

1.4 Next-generation sequencing technology

The pathway by which we start to delve into the analysis of the human genome is through next-generation sequencing of the genomic DNA (gDNA). The transcriptome can also be studied by sequencing the complementary DNA (cDNA). Using an Illumina cBot, the libraries consisting of the regions of interest are turned into clonal clusters that are bound to an Illumina HiSeq 2000 flow cell (Figure 1.1). The flow cell has eight channels for multiple samples. The inside surface of the glass flow cell is coated with a gel matrix that has cross-links of specific adaptors that are complementary to the oligonucleotides on the libraries. After cluster generation, every cluster is ~1,000 copies of an identical template. The flow cell containing hundreds of portions of the same genome is then loaded onto the Illumina HiSeq 2000 for paired-end or single-end sequencing (Figure 1.2). Paired-end reads are generated by including two different sequencing primer sites attached to both adaptor ends, so that the insert can be sequenced from both ends of the fragment. Single-end reads are sequenced in one direction only.
The HiSeq 2000 uses Illumina’s Sequencing by Synthesis (SBS) technology to perform parallel sequencing, one base at a time. A solution of fluorescent, reversibly-terminated nucleotides is introduced into the channels of the flow cell. When the base has been incorporated at the end of the primer sequence, a laser is activated to excite the fluorescently-labeled dye. A series of four images is then taken in order to determine the incorporated base. A cleavage reagent is added to remove the fluorescently-labeled dye and the blocking group that was previously added, so that the next base can be added to the fragment. This process of chemistry, laser excitation, and imaging represents 1 cycle in the SBS technology. This process is generally repeated 101 times to constitute a read of 101 base pairs.

Figure 1.1 Illumina HiSeq 2000 flow cell. The flow cell contains eight channels for loading of multiple samples. The inside surface of the glass flow cell is coated with a gel matrix that has cross-links of specific adaptors that are complementary to the oligonucleotides on the libraries.

Figure 1.2 Illumina instrumentation. In-house Illumina cBots (top field) and two Illumina HiSeq 2000 sequencers and one Illumina HiSeq 2500 sequencer (bottom field).

Chapter 2: Genomic Investigation of Recurrent Early-Onset Major Depressive Disorder

2.1 Introduction

Neighboring genes on a chromosome tend to be inherited together during meiosis. The closer the loci are to each other (known as being “linked”), the greater the chance is that they will not separate during chromosomal crossover. This concept is used in hereditary research. If depression is passed on to offspring along with specific markers, then we can conclude that the disease loci are found close to the marker loci.
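The idea of linkage can be made concrete with a two-point LOD score, which compares the likelihood of the observed recombinants at a recombination fraction theta with the likelihood under free recombination (theta = 0.5). The sketch below is a textbook illustration only, assuming phase-known, fully informative meioses; it is not the nonparametric or Z likelihood ratio analysis used in the linkage studies discussed in this chapter, and the function name and example counts are hypothetical.

```python
import math


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score for phase-known meioses (textbook form).

    LOD(theta) = log10[ theta^R * (1 - theta)^(N - R) / 0.5^N ]
    where R is the number of recombinant meioses out of N informative meioses.
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")

    non_recombinants = meioses - recombinants
    # Log-likelihood if marker and disease locus recombine at rate theta.
    log_linked = (recombinants * math.log10(theta)
                  + non_recombinants * math.log10(1.0 - theta))
    # Log-likelihood if the two loci are unlinked (theta = 0.5).
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


if __name__ == "__main__":
    # Hypothetical counts: 1 recombinant observed in 10 informative meioses.
    for theta in (0.05, 0.1, 0.2, 0.5):
        print(f"theta={theta:.2f}  LOD={two_point_lod(1, 10, theta):.3f}")
```

A LOD score of 0 at theta = 0.5 reflects no evidence either way; positive values at small theta favor the marker lying close to the disease locus.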
In determining the disease loci of recurrent early-onset major depressive disorder (RE-MDD), we look for variants that segregate in cases within families (or singleton cases) and that are either absent or seen at a low frequency in controls. Previously completed studies have used microsatellites and single nucleotide polymorphisms (SNPs) as marker loci in linkage families and have discovered chromosomal regions of interest as a result.

2.1.1 Preliminary studies

A genome-wide linkage scan known as the Genetics of Recurrent Early-Onset Major Depression (GenRED) study (coordinated by Doug Levinson at Stanford and in conjunction with James Knowles and other sites) contains families with multiple cases of RE-MDD. Initial results showed suggestive evidence for linkage on chromosome 15q. After fine-mapping, genome-wide significant evidence for linkage was discovered on chromosome 15q25.3-26.2 (Levinson, Evgrafov et al. 2007).

The GenRED study consists of 656 families with multiple cases of RE-MDD (depression onset before age 31 in probands and before age 41 in relatives). The probands that were enrolled were required to have at least two major depressive episodes in their lifetimes that continued past age 18 or one episode that lasted 3 or more years that resulted in major role impairment (Holmans, Weissman et al. 2007). All of the individuals in the families were genome-wide linkage scanned using 418 microsatellite DNA markers on every chromosome at a mean spacing of 9 centimorgans (cM) to find the locations where affected individuals inherited the same sequence variants more often than was expected by chance. In the first wave of the study (297 families), the researchers found suggestive evidence for linkage on chromosome 15q using the multipoint linkage analysis program ALLEGRO. This corresponded with a Z likelihood ratio score of 4.31 for the European families and 3.93 for all ethnicities (Kong and Cox 1997).
In the first and second waves combined (656 families), genome-wide suggestive linkage was found on chromosome 15q25-26 with a Z likelihood ratio score of 3.43 for the European families (631 families) and 3.05 for all ethnicities (Figure 2.1). Using 88 SNPs, the first wave of European families was fine-mapped in the region of chromosome 15q25-26 (corresponding to base pairs 87,270,232 to 97,270,232 on hg19) and genome-wide significant evidence for linkage was observed (Levinson, Evgrafov et al. 2007). In a second wave, another group of European families was fine-mapped for the same region and replicative support for linkage was observed, but it was not significant by itself. When the two waves were combined for the 631 European families, genome-wide significant evidence was achieved again, corresponding to a Z likelihood ratio score of 4.69 (Figure 2.2).

Two other independent studies have supported this linkage finding on chromosome 15q. A study of Utah pedigrees found their greatest evidence for linkage in males at 121 deCODE cM (LOD=2.88), which was 12.4 cM from the GenRED fine-mapping peak at 109.6 cM (Camp, Lowry et al. 2005). The Depression Network (DENT) study discovered their fifth largest peak to be at 88.2 deCODE cM (LOD=1.14) (McGuffin, Knight et al. 2005).

Based on these results, our aim was to look further at chr15q25-26 using next-generation sequencing to find possible disease loci in that region that might contribute to depression susceptibility. We sequenced chr15q25-26 in one set of samples and the whole exome and untranslated region (UTR) in a larger set of samples.

Figure 2.1 Genome scan results of GenRED cases. (Holmans, Weissman et al. 2007). The Z likelihood ratio scores are plotted throughout the genome.

Figure 2.2 Linkage fine mapping of chromosome 15q. (Levinson, Evgrafov et al. 2007). The Z likelihood ratio scores are plotted throughout the genome for wave 1, wave 2, and all families.
Information content and physical map are shown as well.

2.2 Materials and Methods

2.2.1 The Genome Analysis Toolkit

For all DNA-Sequencing (DNA-Seq) in this project, the pipeline was as follows. The DNA libraries were sequenced on the Illumina HiSeq 2000 using single-end 101 base pair reads. CASAVA v1.8.4 was used to create FASTQ files from the sequencing data, Burrows-Wheeler Aligner (BWA) was used to create BAM files containing the sequence alignment data, and Picard was used to sort the alignments and mark any duplicates that may have arisen from library preparation. Next, the Genome Analysis Toolkit (GATK), developed by the Broad Institute, was used to create Variant Call Format (VCF) files (McKenna, Hanna et al. 2010). Within this pipeline, GATK takes the aligned reads derived from the sequencer's FASTQ files and produces high-quality variant calls. There are various tools within GATK that allow for the analysis of high-throughput sequencing data. Coverage calculators and SNP callers are particularly useful in the analysis of large datasets. For the purposes of analyzing our dataset, five tools within GATK were used: 1) IndelRealigner to locally realign reads that may have misaligned because of the presence of indels, 2) BaseRecalibrator to recalibrate the base quality score based on numerous covariates that the user dictates (e.g. machine cycle, nucleotide context), 3) HaplotypeCaller to call SNPs and indels, 4) GenotypeGVCFs to merge gVCF files produced by HaplotypeCaller into a single combined VCF file, and 5) Variant Quality Score Recalibrator to assign a probability to each call, allowing for the most accurate calls possible in the final VCF file.

2.2.2 Custom enrichment and sequencing of chromosome 15q25-26

Of the top 18 GenRED families with the highest linkage to chr15q25-26, the 15 families with the highest nonparametric LOD (NPL) score were selected (Figure 2.3).
One individual from each family who was maximally Identical by Descent 1 (IBD1) was then selected in order to look for genes in that region that contribute to disease susceptibility. IBD is a term that describes matching segments of DNA between two people that have been inherited from the same common ancestor. Six screened European-American controls over the age of 50 who did not endorse any symptoms of MDD or tobacco use were then chosen. Using the SureSelect Target Enrichment System, all of these cases and controls were sequenced on chromosome 15 between base pairs 87,270,232 and 97,270,232 on hg19 at a depth of ~325X.

The workflow consisted of shearing the gDNA on a Covaris S2 instrument and then constructing libraries for each sample. Libraries were created by blunting the DNA ends, adding an A-tail and adaptors, size selecting for approximately 300 base pair fragments, and then amplifying. The DNA fragments from the libraries were selectively hybridized to the target-specific baits for ~44 hours at 65°C using the SureSelect protocol. Streptavidin magnetic beads were used to isolate the region of interest through the binding of the biotinylated oligonucleotides and DNA fragments. The captured DNA was then eluted off the beads with high salt and then amplified. The target-enriched region was clustered on the Illumina cBot and then DNA-Seq was performed with paired-end 101 base reads on an Illumina HiSeq 2000 sequencer. The bases were called in real time and then CASAVA was used to create FASTQ files, followed by quality control measures. The reads were mapped to hg18 using BWA. Duplicates were removed and the variants were called using SAMtools. Lastly, annotation was performed with Sequence Variant Analyzer (SVA) (hg18, dbSNP128).

Figure 2.3 Nonparametric LOD scores for 18 largest linked GenRED families. The 18 families are split into two panels of 9 families each for clarity.
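As a rough sanity check on the sequencing depth quoted above, mean depth over a target can be approximated as (number of reads × read length × on-target fraction) / target size. The Python sketch below applies that standard approximation to the stated target interval and 101-base reads; it is an illustration under simplifying assumptions (uniform coverage, no duplicate or mapping losses), the function names are hypothetical, and the on-target fraction is left as a parameter rather than taken as a measured value.

```python
import math

# Target interval on chromosome 15 as stated in the text (hg19 coordinates).
TARGET_START = 87_270_232
TARGET_END = 97_270_232
TARGET_SIZE = TARGET_END - TARGET_START  # 10,000,000 bp

READ_LENGTH = 101  # bases per read, as stated in the text


def mean_depth(n_reads: int, read_length: int, target_size: int,
               on_target_fraction: float = 1.0) -> float:
    """Approximate mean depth, assuming reads are spread evenly over the target."""
    return n_reads * read_length * on_target_fraction / target_size


def reads_needed(depth: float, read_length: int, target_size: int,
                 on_target_fraction: float = 1.0) -> int:
    """Approximate number of reads required to reach a given mean depth."""
    return math.ceil(depth * target_size / (read_length * on_target_fraction))


if __name__ == "__main__":
    ideal = reads_needed(325, READ_LENGTH, TARGET_SIZE)
    print(f"Target size: {TARGET_SIZE:,} bp")
    print(f"Reads for ~325X if every read were on target: {ideal:,}")
    # A lower on-target fraction (illustrative value) raises the requirement.
    print(f"Reads for ~325X at 80% on target: "
          f"{reads_needed(325, READ_LENGTH, TARGET_SIZE, 0.8):,}")
```

For paired-end data each read of a pair counts separately in this approximation, so the number of sequenced fragments would be roughly half the read count.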
2.2.3 Exome sequencing of Major Depressive Disorder cases

To determine if the variants from the targeted sequencing results are true positives, data from an increased number of samples was necessary. We decided to sequence 96 additional GenRED cases. Based on the number of samples planned for sequencing as well as the prices of various library preparation kits, it was more cost-efficient to sequence the entire exome rather than only the ~10Mb region of interest on chr15q. The Illumina TruSeq DNA Sample Prep v2 kit followed by the Roche NimbleGen SeqCap EZ Exome+UTR capture kit were the protocols selected.

From the first wave of 484 Caucasian GenRED families, the 15 highest linked families were excluded as these were already screened for the region of interest. The genotypes at 883 SNPs for these families were input into PLINK in order to estimate pairwise IBD for each pair of affected full siblings (Purcell, Neale et al. 2007). The results were ranked by PI-HAT, where PI-HAT = P(IBD=2) + 0.5 × P(IBD=1), a variable that represents the degree of relatedness between pairs. One sibling was arbitrarily selected from each of the 96 affected pairs of full siblings with the highest PI-HAT (ranging from 0.9985 to 0.8261).

Sequencing data for the controls was obtained through the Genomic Psychiatry Cohort (GPC). From 364 Caucasian controls, those who identified as Hispanic/Latino were removed in order to maintain as ethnically homogeneous a sample set as possible. Individuals who answered yes to any of 4 depression endorsement questions were also removed. The questions were as follows:
1) Have you ever felt depressed, down, sad, or blue for most of the day, nearly every day for 2 weeks or more?
2) Did you ever have a period of 2 weeks or more when you lost most or all interest in your normal activities?
3) During that same time (either 1 or 2), did you also have feelings of worthlessness, or feel too much guilt, or spend a lot of time thinking about death or dying, or have thoughts about hurting or killing yourself?
4) During that same time (either 1 or 2), did you experience a significant change in your appetite, have unplanned weight gain or loss, experience changes in your normal sleep pattern, or have difficulties concentrating?

From this filtering, 168 controls remained. They were whole genome sequenced at ~30X as part of the larger GPC sequencing project.

The gDNA for each of the 96 case samples was run on the Agilent 2200 TapeStation to obtain their concentrations (Figure 2.4). For each sample, 1 ug of gDNA was sheared to ~300 base pairs using the Covaris S2 (Figure 2.5). The sheared DNA was input into the Illumina TruSeq DNA Sample Prep v2 kit, where the protocol consists of end repair, 3’ adenylation, adapter ligation, and amplification. These DNA libraries were then input into the Roche NimbleGen SeqCap EZ Exome+UTR capture kit, which begins with the hybridization of the sequencing library to the exome+UTR oligonucleotide pool, followed by bead capture and washing, and ends with amplification. These 96 exome+UTR samples were sequenced using single-end 101 base pair reads on the Illumina HiSeq 2000. Coming off the sequencer, the DNA pipeline was as in Section 2.2.1 and a VCF file was produced. Using the GATK tool SelectVariants, the VCF file for the control data was filtered to keep only the variants present in the exome+UTR region. The two files were merged using the GATK tool GenotypeGVCFs.

Figure 2.4 Agilent 2200 TapeStation. This system performs quantification of RNA, DNA and protein by retrieving concentration and RIN (for RNA samples).

Figure 2.5 Covaris S2 sonicator.
This ultrasonicator utilizes sound waves to shear DNA, RNA, and chromatin as well as homogenize tissues, lyse cells, dissolve compounds, and micronize particles according to user-specified sizes.

2.3 Results

2.3.1 MFGE8 variant in targeted sequencing cases

The on-target rate of probe hybridization was ~82%. Using SVA, the captured region was visually confirmed to be correct (Figure 2.6). Integrative Genomics Viewer (IGV) was used to inspect each SNP manually to see if it was correctly called, removing the ones that appeared to be low quality bases (Robinson, Thorvaldsdóttir et al. 2011). The remaining nonsynonymous SNPs (nsSNPs) were filtered by absence in dbSNP128, keeping only novel SNPs. Variants that were present in controls were omitted. Three different protein prediction programs, SIFT, PolyPhen-2, and SVA, were used to determine which SNPs are predicted to be deleterious (Ng and Henikoff 2001, Adzhubei, Schmidt et al. 2010, Ge, Ruzzo et al. 2011). If at least one program returned a result of “damaging,” that SNP was retained in the analysis. The last filtering method was segregation within pedigrees. This was done by determining whether or not the SNP is on the shared chromosome. Forward and reverse primers were designed for each SNP and then the PCR product of the variant was capillary sequenced (GENEWIZ) for each individual in the family in which that SNP was initially observed.

Figure 2.6 Chromosome 15q targeted region. The targeted region of ~87Mb to ~98Mb on chromosome 15q (hg19) was captured. The mapped read depth was ~325X.

If a SNP was found to be on the non-shared chromosome, it would most likely not be involved in the transmission of the disorder to affected individuals. Fourteen nsSNPs remained, with none having perfect segregation. The three best candidates were the genes MFGE8, WDR93 and MCTP2.
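The automated part of the filtering cascade described in this section (nonsynonymous, absent from dbSNP, absent from controls, and called damaging by at least one prediction tool) can be sketched in Python. This is only an illustration under stated assumptions: the `Variant` record, its field names, and the four example variants are hypothetical and are not taken from the study data, and the manual IGV review and the pedigree segregation step are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Variant:
    name: str                      # hypothetical label, not a real variant ID
    nonsynonymous: bool
    in_dbsnp: bool                 # True means already catalogued, so not novel
    seen_in_controls: bool
    predictions: dict = field(default_factory=dict)  # tool name -> call


def passes_filters(v: Variant) -> bool:
    """Apply the automated filters in the order given in the text."""
    if not v.nonsynonymous:
        return False
    if v.in_dbsnp:                 # keep only novel SNPs
        return False
    if v.seen_in_controls:         # omit variants present in controls
        return False
    # Retain the SNP if at least one program calls it damaging.
    return any(call == "damaging" for call in v.predictions.values())


# Toy records (assumed values) exercising each filter.
variants = [
    Variant("var_A", True, False, False,
            {"SIFT": "damaging", "PolyPhen-2": "benign", "SVA": "benign"}),
    Variant("var_B", True, True, False, {"SIFT": "damaging"}),
    Variant("var_C", True, False, True, {"SIFT": "damaging"}),
    Variant("var_D", True, False, False,
            {"SIFT": "tolerated", "PolyPhen-2": "benign"}),
]

kept = [v.name for v in variants if passes_filters(v)]
print(kept)
```

Only the first toy record survives all four filters; the others fail on dbSNP presence, presence in controls, and lack of a damaging call, respectively.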
MFGE8 (milk fat globule-EGF factor 8) is of particular interest because of its consistent intolerance prediction across all three protein prediction programs as well as its high segregation. This gene is involved in the maintenance of intestinal epithelial homeostasis and helps with mucosal healing as well as phagocytic removal of apoptotic cells (Fricker, Neher et al. 2012). An MFGE8 variant changing a leucine to a proline was observed in 5 out of 6 affected individuals in one family. Another MFGE8 variant changing a phenylalanine to a valine was observed in 3 out of 5 affected individuals in a different family. An in situ hybridization (ISH) image from BrainSpan: Atlas of the Developing Human Brain shows that MFGE8 is expressed in layer 3 pyramidal neurons in the dorsolateral prefrontal cortex (DLPFC) (Figure 2.7, BrainSpan 2011), and it is also expressed throughout development (Figure 2.8).

Figure 2.7 Adult human MFGE8 ISH in brain. In adults, MFGE8 is highly expressed in layer 3 pyramidal neurons in the DLPFC.

Figure 2.8 MFGE8 expression in human brain development (BrainSpan). Top field is sorted from 8 weeks postconception to 40 years in adulthood. Second field represents brain regions. Third field represents gene expression for MFGE8. Also shown are the 28 most highly correlated genes and their gene expressions per region and developmental stage.

2.3.2 Novel KIF23 splice site variant in exome cases

A larger sample set was exome sequenced in an attempt to confirm and replicate data from the targeted sequencing results. To this end, 96 cases and 168 controls were selected, libraries were prepared, and the samples were sequenced. Principal components analysis (PCA) is a covariance tool that transforms the data set to the principal components in order to retain the data characteristics that contribute most to the variance (Yeung and Ruzzo 2001). PCA was performed in Partek Genomics Suite on the case and control exome+UTR variant normalized counts (Figure 2.9).
Six cases clustered away from the core of the samples. To determine whether this was due to errors in the recording of ethnicity, eleven African-American samples were included in the PCA to see if the six exome+UTR cases grouped with them (Figure 2.10). In fact, the African-American samples clustered away from the exome+UTR cases and controls, pulling five of those six cases back into the core set of samples. While it was concluded that these were not African-American samples mislabeled as Caucasian, they were different enough from the majority of the samples that they were removed from the analysis.

For the final 90 cases and 168 controls, the VCF file consisted of over 184,000 variants, which needed to be filtered down to a more manageable number, retaining only the potentially interesting variants. The percentage of samples for each genotype was first determined. The variants with at least a 60% difference between cases and controls for each genotype were retained. The resulting 175 variants consisted of 30 in the untranslated region, 25 in the exonic region, 17 in the intergenic region, and 103 in the intronic region. With the SKAT-O package, which tests for association between variants and phenotypes, the beta(1,25)-weighted p-values for 9 of these variants fell below 0.05 (Lee, Emond et al. 2012). Twenty-four p-values were above 0.05, and the remaining were indeterminate. Those variants of the 175 total variants with a strong disparity between number in cases and controls were retained, leaving 10 intronic variants. One of these variants, an insertion in the majority of the cases, resides within a natural/authentic splice site at chr15:69715488 (hg19) between exons 6 and 7a of KIF23 (Figure 2.11, Figure 2.12), a gene involved in microtubule-dependent transport of organelles in cells as well as movement of chromosomes during cell division (“KIF23 Gene”).
The allele frequencies at this locus are 27% C and 73% deletion in the 1000 Genomes Project. The genotype of 72 of the 90 cases was a novel homozygous T/T (homozygous being defined as at least 80% T in the reads) and 18 were heterozygous T/deletion (less than 80% T). Of the 168 controls, 165 were heterozygous T/deletion and 3 were homozygous T/T (Table 2.1). The chi-squared p-value for this distribution was less than 0.001. Capillary sequencing of two homozygous T/T samples confirmed the variant was an accurate call and not the result of a sequencing error. In addition, KIF23 is a gene with a Residual Variation Intolerance Score (RVIS) of -0.11 (34.18%), indicative of intolerance to variation (Petrovski, Wang et al. 2013).

Figure 2.9 Principal Components Analysis of 96 cases and 168 controls. Cases are in blue and controls are in green. Six cases cluster away from the core (far left sample superimposed on top of another sample).

Figure 2.10 Principal Components Analysis of 96 exomes, 168 controls, 4 African-American cases, and 7 African-American controls. 90 exome+UTR cases are in blue and 168 Caucasian controls are in green. African-American cases are in fuchsia and African-American controls are in aqua.

            Homozygous T/T    Heterozygous T/del
Cases       72/90             18/90
Controls    3/168             165/168

Table 2.1 Number of DNA-Seq cases and controls with each genotype. The criterion for homozygosity was at least 80% of the reads and for heterozygosity, below 80%.

Figure 2.11 KIF23 expression in human brain development (BrainSpan). Top field is sorted from 8 weeks postconception to 40 years in adulthood. Second field represents brain regions. Third field represents gene expression for KIF23. Also shown are the 28 most highly correlated genes and their gene expressions per region and developmental stage.

The natural acceptor and donor splice sites are the default sites for removal of introns so exons can be joined together.
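As a brief aside on the genotype counts in Table 2.1, the reported chi-squared result can be checked with a short Python sketch. It is an illustration under assumptions: the text does not say whether a continuity correction was applied or which software was used, so this is a plain Pearson chi-squared test on the 2x2 table with 1 degree of freedom, and the function name is hypothetical.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic and p-value for the table [[a, b], [c, d]].

    No continuity correction is applied (an assumption; the source does
    not specify one).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts from Table 2.1: cases 72 T/T and 18 T/del, controls 3 T/T and 165 T/del.
chi2, p_value = chi_squared_2x2(72, 18, 3, 165)
print(f"chi2 = {chi2:.1f}, p < 0.001: {p_value < 0.001}")
```

Under these assumptions the statistic is far above the 1-degree-of-freedom critical value for p = 0.001 (about 10.83), which agrees with the p-value reported in the text.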
The donor site is located at the 5’ end of the intron and the acceptor site is at the 3’ end of the intron. Small nuclear ribonucleoproteins bind to various sites along the introns. U1 binds to an invariant GU dinucleotide at the beginning of the 5’ splice site. U2 binds to a highly conserved A base in the branchpoint, which is just upstream of a polypyrimidine tract. Downstream of the polypyrimidine tract is the 3’ splice site, the last two bases of which are an invariant AG. U4, U6, U5, and multiple small associated proteins also bind, allowing the 5’ splice site to attach to the branchpoint. A lariat structure is then formed, which is cleaved from the 3’ splice site so the exons may join through ATP hydrolysis. When mutations occur in these natural splice sites, cryptic splice sites are then usually employed. Therefore, the variant in this natural acceptor splice site is highly suggestive of potential altered splicing, which may be partially responsible for the disease phenotype.

Figure 2.12 Single representative case and control with novel KIF23 splice site variant. Screenshot from GoldenHelix GenomeBrowse of representative case and control BAM files zoomed in to region of interest. The chr15:69715488 (hg19) locus is 10 base pairs from the start of exon 7a. The genotype of the case is homozygous T/T (top field) and the genotype of the control is heterozygous T/deletion (bottom field). Long blue and green bars are the reads mapping to each strand.

2.3 Discussion

Genes are structured in such a way that exons are separated by introns, which must be removed following transcription. This specific architecture denotes the importance of accurate splicing. Mutations that reside in natural splice sites often have deleterious, disease-causing effects. It has been estimated that 15% of disease-causing point mutations result in splicing defects (Krawczak, Reiss et al. 1992).
The novel splice junctions that arise are responsible for the creation of proteins different from those that would be produced from the primary transcript. The novel protein may not be sufficient to deter the onset of disease. As a result, mutations in splice sites such as the one in KIF23 are strong candidates for being predisposition variants to disease.

2.4 Acknowledgements

This research was supported by funds from the Della Martin Chair in Psychiatry and Neuroscience. We thank Dr. Douglas Levinson for conception of the original study and for provision of the samples as well as the genotyping data. We thank Jennifer Herstein for computing contributions.

Chapter 3: Differential Expression in Human Brains of Depressed Suicide Completers

3.1 Introduction

Due to the evident expression of depression in the central nervous system, we are interested in using RNA-Sequencing (RNA-Seq) to examine the global gene expression profiles in the brains of individuals who suffered from the disorder compared to those individuals who did not. Studying transcriptional variation allows for the detection of genes that exhibit differential expression across disease states, in this case, in the presence or absence of depression. Significant dysregulation of certain genes or more heavily dysregulated brain regions may point us towards the etiology of the disorder. In this case, the phenotype studied is depression with suicide completion. While depression and suicide do have distinct neurobiological markers, the two phenotypes will be treated in combination here. Other studies can be done in the future to separate the risk factors between them.

3.2 Materials and Methods

3.2.1 RIN-dependent methods for library preparation (Chen, Souaiaia et al. 2014)

RNA is highly prone to degradation, so when working with tissues, it is imperative that steps be taken to keep the nucleic acid as intact as possible.
If RNA quality is still low despite these efforts, other accommodations may be possible. Using RNA-Seq results, we compared gene expression of heat-degraded RNA samples to the expression profiles of the corresponding high-quality starting samples (Chen, Souaiaia et al. 2014). We constructed a pool of 20 ug of high-quality total RNA (RNA Integrity Number (RIN) 9.4, measured using the Agilent 2100 Bioanalyzer) using a Zymo Research Direct-zol RNA MiniPrep kit from neural progenitor cell lines made from 4 individuals (Evgrafov, Wrobel et al. 2011). The RIN algorithm analyzes the electrophoretic trace to determine if any degradation products exist, with a score of 10 representing RNA that is completely intact. Using heat (60 minutes at 60°C, followed by 6, 20 and 30 minutes at 90°C), this pool was degraded to RINs of 7.4, 5.3, and 4.5 (Opitz, Salinas-Riester et al. 2010) (Figure 3.1).

Figure 3.1 Heat degradation of intact RNA into three successively lower quality samples. High-quality RNA (RIN 9.4) was heat-degraded for 60 minutes at 60°C and for an additional 6-30 minutes at 90°C to achieve lower quality RNA samples with RINs of 7.4, 5.3, and 4.5.

Sequencing libraries were then created using three different protocols. In the first protocol, poly-A RNA was purified from 1 ug of total RNA with oligonucleotide-dT beads, fragmented using divalent cations, turned into cDNA and then ultimately, into sequencing libraries (TruSeq RNA Sample Preparation kit v2). In the second protocol, ribosomal RNA was removed from 1 ug of total RNA with the Epicentre Ribo-Zero rRNA Removal kit and libraries were created in the same manner as in the first protocol, but without the poly-A selection step. In the third protocol, cDNA was created from 200 ng of starting total RNA using the NuGEN Ovation RNA-Seq FFPE System, then sheared to 300 base pairs with a Covaris S2, followed by library construction using the TruSeq DNA Sample Preparation kit v2.
Each sample library was sequenced with 4.5-60 million 101 base pair single-end reads on an Illumina HiSeq 2000. The reads were uniquely mapped with three or fewer mismatches using PerM (Chen, Souaiaia et al. 2009) to GENCODE v17. For each protocol, Pearson's pairwise correlation coefficients (denoted by the letter R) were calculated between the degraded and high-quality sample across the HUGO genes that contained at least one read alignment in either sample. R was calculated on log-transformed counts, taking the log of (reads plus an offset of 1) (Figure 3.2). All three protocols performed well at RIN 7.4 (R = 0.958 to 0.984, s.e. = 0.001 to 0.002) (Table 3.1). As RNA quality decreased (RINs 5.3 and 4.5), protocol #1 produced data with lower correlations of gene expression to the intact sample (R = 0.533 and 0.366, s.e. = 0.005). In contrast, both protocols #2 and #3 performed relatively well as RNA quality decreased (R = 0.951 to 0.967, s.e. = 0.002), with protocol #3 performing slightly better. For each RIN, R was also calculated between the reads from each pair of protocols. The reads from the two best methods (Protocol #2 and Protocol #3) maintained high correlations regardless of decreased sample quality (R = 0.845 to 0.879, s.e. = 0.003). For Protocol #1, there was a drop in read correlation to both Protocol #2 and Protocol #3 as RIN decreased (Figure 3.3). To confirm the accuracy of the mapper, all the samples were mapped using TopHat v1.4.0 (Trapnell, Pachter et al. 2009) to GENCODE v17. The resulting BAM files were run through HTSeq v0.6.1 (Anders, Pyl et al. 2014) to obtain uniquely mapped read counts. Essentially the same results were obtained as with PerM. In addition, to rule out any bias from differences in numbers of reads, all of the samples were downsampled to 4.5 million reads, and the results were again essentially the same.

The poor performance of protocol #1 at lower RINs can likely be explained by the poly-A selection step.
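The correlation measure used in this comparison, Pearson's R on log-transformed read counts with an offset of 1, restricted to genes with at least one read in either sample, can be sketched in Python as follows. This is a minimal illustration, not the original analysis code: the function names and the toy counts are assumptions, and the natural log is used because the text does not state a base (Pearson's R is unchanged by the choice of base).

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def expression_correlation(intact_counts, degraded_counts):
    """R between two samples on log(count + 1), per-gene counts in the same order."""
    # Keep genes with at least one read alignment in either sample.
    pairs = [(a, b) for a, b in zip(intact_counts, degraded_counts)
             if a > 0 or b > 0]
    log_intact = [math.log(a + 1) for a, _ in pairs]
    log_degraded = [math.log(b + 1) for _, b in pairs]
    return pearson_r(log_intact, log_degraded)


# Toy per-gene read counts (assumed values, not study data).
intact = [0, 10, 100, 1000]
degraded = [0, 12, 90, 1100]
r = expression_correlation(intact, degraded)
print(round(r, 3))
```

The offset of 1 keeps genes with zero reads in one sample defined on the log scale, while the first gene, with no reads in either sample, is dropped before the correlation is computed.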
As RNA becomes more degraded, less full-length poly-A RNA is recovered, which leads to a cDNA library that is increasingly 3’ biased. This observation is supported by the 5’ to 3’ read distribution of each library. Those from protocols #2 and #3 remain the same with decreasing RIN, while the distribution for samples from protocol #1 is greatly 3’ biased by RIN 4.5. These data show that the results of RNA-Seq are influenced by RNA quality with a widely used cDNA/sequencing library construction protocol. However, this problem can be avoided with alternative protocols, allowing samples with a wider range of RNA qualities to be used, facilitating the investigation of disease tissues, which often can have lower RINs.

Figure 3.2 Effect of RNA integrity on gene expression correlations with untreated RNA using the same sample preparation protocol. For each protocol, plotted against RIN are the Pearson's pairwise correlation coefficients (R) between the degraded and high-quality sample across the genes that contained at least one read alignment in either sample. R was calculated by taking the log of (reads plus an offset of 1). The precipitous drop in R for the Illumina TruSeq RNA Sample Preparation kit is likely due to the poly-A selection used in the kit.

Figure 3.3 Effect of RNA integrity on gene expression correlations between different sample preparation protocols. For each RIN, plotted are R values between the reads from each pair of protocols. The reads from the two best methods (Protocol #2 and Protocol #3) maintained high correlations regardless of decreased sample quality.

Protocol    RIN    R to RIN 9.4 sample    Total Reads
NuGEN Ovation RNA-Seq FFPE System + Illumina TruSeq DNA Sample Preparation
            9.4    1                      37,959,903
            7.4    0.971                  51,131,950
            5.3    0.967                  32,285,466
            4.5    0.955                  58,476,532
Epicentre RiboZero rRNA Removal Kit + Illumina TruSeq DNA Sample Preparation
            9.4    1                      25,339,734
            7.4    0.958                  59,722,682
            5.3    0.956                  48,134,260
            4.5    0.951                  50,650,245
Illumina TruSeq RNA Sample Preparation
            9.4    1                      59,509,601
            7.4    0.984                  35,262,055
            5.3    0.533                  20,939,261
            4.5    0.366                  4,531,318

Table 3.1 R between degraded and intact sample for each protocol and total reads for each sample. Shown are values of R for each correlation of degraded sample to high-quality sample as well as the total number of reads for each sample.

3.2.2 Genome and Transcriptome Free Analysis of RNA

Genome and Transcriptome Free Analysis of RNA (GT-FAR) is an RNA-Seq pipeline composed of various tools that allows for the analysis of RNA-Seq data (Figure 3.4, “Pegasus GT-FAR”). Primarily, GT-FAR can perform alignment, quantification, differential expression, and variant calling. Unlike most other RNA-Seq analysis programs, a reference genome and gene model are optional with this software package. The first step involves the retention of only high quality reads and the trimming of adaptors. For this dataset, in the second step, the reads were mapped to all gene models in the GENCODE 22 reference transcriptome and to the hg19 reference genome. In the third step, a gapped alignment was performed to determine the presence of any novel or canonical genes and splice junctions. Reads that aligned to ribosomal RNA, mitochondrial RNA, pre-mRNA, and intergenic regions were removed. This analysis tool quantifies expression levels to gene models, ultimately producing both uniquely and ambiguously aligning reads. For our purposes, only uniquely aligning exonic, known junction, novel junction, intronic junction, and intronic reads in the form of counts were retained.

Figure 3.4 Genome and Transcriptome Free Analysis of RNA. For the data in this project, the raw reads are input and quality check steps are conducted so that only high quality reads are retained. Adapters are trimmed, mitochondrial- and ribosomal-aligning reads are removed, and exome-aligned reads are selected in an iterative fashion.
The final step is the production of uniquely and ambiguously aligning reads in the form of counts to the user-specified gene model.

3.2.3 Dissection of brains and preparation of samples

Seven postmortem brain samples were obtained from the University of California at Irvine/Davis Brain Repository through a process approved by the Institutional Review Board (Table 3.2). The next-of-kin of the deceased provided written informed consent. Three of the brains were from suicide completers and four were from matched psychiatrically normal controls who died by natural or accidental causes. Thorough record examination was conducted for the samples, including the coroner’s findings, psychiatric records, toxicology results, neuropathological tests (to exclude samples with significant pathological features such as cerebrovascular disease), and interviews of family members (Sequeira, Morgan et al. 2012). All brains were screened for agonal factors (e.g. coma, hypoxia, seizures) since these conditions are known to affect the integrity of RNA and subsequent gene expression profiles (Tomita, Vawter et al. 2004). Validated psychological autopsy procedures (Kelly and Mann 1996) were used to diagnose the suicide victims with MDD. None of the subjects had any history of drug or alcohol dependence. Ages ranged from 36 to 52 years.

Brain dissection steps were performed on dry ice at the University of California at Irvine. After inspection and photographing, the brainstem and cerebellum were separated from the cerebrum (Sequeira, Morgan et al. 2012). Coronal slices were made through the cerebrum. Brodmann areas were dissected from each slice. The brain regions were placed in cryogenic vials and stored in -80°C freezers. The brain regions were pulverized in a fume hood with mortars and pestles in liquid nitrogen. The tissue was ground to a fine powder and stored in the same vial in which the initial whole region was kept.
Total RNA was extracted from the tissue using the mirVana miRNA Isolation Kit. The total RNA was analyzed for RIN and concentration using on-chip capillary electrophoresis with the Agilent 2100 Bioanalyzer. A cut-off value of RIN ≥ 3.95 has been proposed as the threshold to be used for human brain tissue (Weis, Llenos et al. 2007). In our group of samples, those with RIN ≤ 5.0 were excluded, as were those with 18S and 28S peaks that were not clearly distinguishable on the electropherogram. Three regions were selected for sequencing based on literature implicating them in mood disorders: Brodmann Area 25 (BA25), DLPFC (BA8/9), and nucleus accumbens (NAc, BA34) (Amen, Prunella et al. 2009, "Brodmann area," Figure 3.5). Significantly low regional cerebral blood flow in BA25 has been shown to be associated with depression (Skaf, Yamada et al. 2002). It has also been seen through functional neuroimaging that there is abnormally decreased perfusion in the DLPFC of suicidal individuals during both resting states and cognitive activation, and increased activity in cases of recovered depression (Koenigs and Grafman 2009, Desmyter, van Heeringen et al. 2011). Also, recent evidence has indicated that BA25 and NAc may be involved in depression, as electrical deep brain stimulation to these regions has antidepressant-like effects (Mayberg, Lozano et al. 2005, Bewernick, Kayser et al. 2012).

Figure 3.5 Lateral and medial surfaces of human brain (top and bottom, respectively). Regions are numbered as Brodmann areas.

The mean RIN was 6.7 in BA25 (ranging from 6.0 to 8.2), 6.6 in DLPFC (ranging from 5.1 to 8.4), and 6.4 in NAc (ranging from 5.1 to 8.8). There were a total of 39 samples for all seven brains, with multiple levels per region. NAc was split into its two regions, shell (NACs) and core (NACc), at the analysis stage. For each of the samples, cDNA was created using the NuGEN Ovation RNA-Seq FFPE System (Figure 3.6).
Sequencing libraries were made from the cDNA using the Illumina TruSeq DNA Sample Preparation kit (Figure 3.7). Specific index tags were added to each library. Seven multiplexed libraries were pooled per flow cell lane, and the final TruSeq libraries were clustered using Illumina’s cBot. RNA-Seq was performed with single-end 101 base reads on an Illumina HiSeq 2000 sequencer.

Figure 3.6 NuGEN Ovation RNA-Seq FFPE System protocol. Oligo-dT and random primers are used to bind the RNA, followed by reverse transcription into cDNA. NuGEN’s SPIA technology creates single-stranded SPIA product. Random nonamers then create double-stranded cDNA product.

Figure 3.7 Illumina TruSeq DNA Sample Preparation kit v2 protocol. This library preparation protocol consists of end repair, 3’ adenylation, adapter ligation, and amplification.

The raw reads were input into GT-FAR to obtain uniquely aligning, exonic read counts. These data were then imported into Partek Genomics Suite and PCA was performed on normalized counts to analyze the expression profiles in hopes of identifying predominant patterns in MDD as it may relate to region-specific gene expression. Differential expression analysis was conducted to determine the genes that are differentially expressed between samples. The normalization procedure has been found to have the largest influence on differential expression detection (Bullard, Purdom et al. 2010). Although RPKM values are intrinsically normalized and often used, normalization for each sample was performed by the gene found at the upper-quartile (75th percentile) mark, after exclusion of genes with no expression (Mortazavi, Williams et al. 2008). The false discovery rate (FDR) correction was applied to control for the expected proportion of false discoveries (Benjamini and Hochberg 1995).
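The two numerical steps named above, upper-quartile normalization and the Benjamini-Hochberg FDR adjustment, can be sketched in Python as follows. This is a hedged illustration rather than the Partek Genomics Suite implementation: the function names and toy inputs are hypothetical, and the percentile interpolation (`method="inclusive"` in `statistics.quantiles`) is an assumption, since the text does not say how the 75th percentile was computed.

```python
import statistics


def upper_quartile_normalize(counts):
    """Divide one sample's counts by the 75th percentile of its nonzero counts."""
    expressed = [c for c in counts if c > 0]   # exclude genes with no expression
    # Assumed interpolation rule; other percentile definitions differ slightly.
    q3 = statistics.quantiles(expressed, n=4, method="inclusive")[2]
    return [c / q3 for c in counts]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy inputs (assumed values, not study data).
normalized = upper_quartile_normalize([0, 1, 2, 3, 4, 5])
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(normalized)
print(adjusted)
```

Scaling by the upper quartile of expressed genes, rather than by total reads, makes the normalization less sensitive to a few very highly expressed genes, which is the motivation given by Bullard et al. for this approach.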
The p-values and fold-changes were generated from Partek Genomics Suite and then ranked.

Sample ID    Affection status    Gender    Race         Age
1744         Suicide             M         Caucasian    37
1758         Suicide             F         Asian        38
2107         Suicide             M         Caucasian    36
2035         Control             M         Caucasian    48
2089         Control             M         Caucasian    52
2266         Control             F         Caucasian    53
3017         Control             M         Caucasian    38

Table 3.2 University of California Irvine/Davis Brain Repository brain details. Listed are sample ID, affection status, gender, race, and age for the cases and controls from sample set 1.

Based on the results of the brains obtained from the University of California at Irvine/Davis Brain Repository, it was decided to obtain additional brains and sequence RNA (cDNA) from the BA25 region. We acquired from the National Institute of Mental Health (NIMH) Human Brain Collection Core the BA25 region from 18 depressed suicide completers and 17 psychiatrically normal controls (Table 3.3). The region was pulverized and the RNA was extracted in the same manner as for the previous brains using the mirVana miRNA Isolation Kit. The libraries were prepared as they were for the other samples using the NuGEN Ovation RNA-Seq FFPE System followed by the Illumina TruSeq DNA Sample Preparation kit. Sequencing was performed using 101 base pair reads on the Illumina HiSeq 2000. The downstream RNA-Seq pipeline was also the same in its method of count production. The differential expression analysis was performed with DESeq (Anders and Huber 2010). The p-values and fold-changes generated for the genes were ranked.
Sample ID    Affection Status    Gender    Race         Age
1387         Suicide             M         Caucasian    55
1489         Suicide             M         Caucasian    52
1508         Suicide             M         Caucasian    45
1930         Suicide             M         Caucasian    56
2024         Suicide             M         Caucasian    39
2554         Suicide             M         Caucasian    38
2578         Suicide             M         Caucasian    41
2608         Suicide             M         Caucasian    32
1928         Suicide             M         Caucasian    25
1983         Suicide             M         Caucasian    25
2032         Suicide             M         Caucasian    50
2459         Suicide             F         Caucasian    20
2468         Suicide             M         Caucasian    26
2604         Suicide             M         Caucasian    31
1551         Suicide             F         Caucasian    52
1763         Suicide             F         Caucasian    52
2307         Suicide             F         Caucasian    48
2547         Suicide             F         Caucasian    52
2052         Control             M         Caucasian    40
2063         Control             M         Caucasian    18
2080         Control             M         Caucasian    36
2084         Control             M         Caucasian    48
2257         Control             M         Caucasian    59
2322         Control             M         Caucasian    63
2333         Control             F         Caucasian    63
2371         Control             M         Caucasian    23
2454         Control             F         Caucasian    60
2521         Control             M         Caucasian    51
2557         Control             M         Caucasian    40
2582         Control             M         Caucasian    41
2629         Control             M         Caucasian    57
2638         Control             M         Caucasian    46
2301         Control             M         Caucasian    37
2534         Control             M         Caucasian    52
2542         Control             M         Caucasian    54

Table 3.3 National Institute of Mental Health Brain Collection Core brain details. Listed are sample ID, affection status, gender, race, and age for the cases and controls from sample set 2.

3.3 Results

3.3.1 Dysregulation in BA25 strongly associated with depression

The exonic, protein-coding, non-ribosomal reads from the brains acquired from the University of California at Irvine/Davis Brain Repository were analyzed for differential expression with Partek Genomics Suite to obtain nominally significant differentially expressed genes (p-value < 0.5 and fold change of at least ±2). Data were analyzed with the use of QIAGEN’s Ingenuity Pathway Analysis (IPA, QIAGEN Redwood City, www.qiagen.com/ingenuity).
Figure 3.8 Principal Components Analysis of four brain regions. Sample set 1. BA25 (top left), DLPFC (top right), NACs (bottom left), NACc (bottom right). Cases are in red and controls are in blue. Of the 4 regions, BA25 shows the clearest separation between cases and controls.

Figure 3.9 Number of nominally significant differentially expressed genes with fold change of 2 in brain regions. There were 5,034 differentially expressed genes from BA25. The other three regions had lower numbers of differentially expressed genes: DLPFC=250, NACs=636, and NACc=182. Bars show the number of downregulated and upregulated DE genes for each region.

Figure 3.10 Benjamini-Hochberg adjusted p-values of top canonical IPA pathways. Shown are the top three pathways’ B-H-adjusted p-values for each of the four brain regions. BA25 has, by far, the lowest p-values. The p-value associated with each pathway represents the likelihood that the association between a set of genes in the analysis and a particular pathway is random. Since the -log(B-H p-value) is plotted, the longer the bar is, the more likely it is that the association is not random and the more significant it is.

We found BA25 to be substantially different from the other regions at multiple levels of analysis:
1) PCA of BA25 showed a clear separation between cases and controls, while the other three regions displayed intermingling among the cases and controls (Figure 3.8).
2) There were 5,034 nominally differentially expressed genes with fold change of at least 2 from BA25, whereas the other regions had much lower numbers of differentially expressed genes: DLPFC=250, NACs=636, and NACc=182. We do not think this is due to one sample driving the results because removal of any one sample did not greatly change the results (Figure 3.9).
3) BA25 showed substantially lower Benjamini-Hochberg (BH)-adjusted p-values for the top canonical pathways in IPA (i.e., for the top pathway: BA25=5.63E-07, DLPFC=2.09E-01, NACs=8.92E-01, NACc=6.72E-01) (Figure 3.10). 4) Genetic association between MDD and the interferon α/β signaling pathway genes has been postulated (Mostafavi, Battle, et al. 2014). Twenty nominally significant genes in the interferon α/β signaling pathway were reported in that study. For BA25, the four top pathways in IPA were immune-related: Agranulocyte Adhesion and Diapedesis, Role of Cytokines in Mediating Communication between Immune Cells, Granulocyte Adhesion and Diapedesis, and Differential Regulation of Cytokine Production in Intestinal Epithelial Cells by IL-17A and IL-17F (Figure 3.11). Due to this immune function relationship with depression, we attempted to connect as many of the top 20 interferon α/β signaling pathway genes as possible to the merged nervous system networks from IPA. Four of these 20 connect to multiple differentially expressed genes in BA25, but no connections were observed for the other brain regions (Figure 3.12, Table 3.4).

Figure 3.11 Top canonical IPA pathways in BA25. Shown are the top six canonical pathways out of IPA for the dysregulated genes in BA25. Of note, the first four pathways are immune-related. There has been research detailing a connection between major depression and the immune system. The orange ratio represents the number of input genes that participate in a given pathway divided by the total number of genes associated with that pathway.

5) Within the IPA nervous system networks, there existed eight differentially expressed genes from BA25 that lay within 300 base pairs of a SNP with p-value ≤ 10⁻⁶ in a mega-analysis of GWASs of MDD (Major Depressive Disorder Working Group of the Psychiatric GWAS Consortium 2013).
DLPFC contained one such differentially expressed gene, NACs contained two, and NACc did not contain any, given the same conditions.

Figure 3.12 BA25 nervous system networks with interferon signaling genes connected. Networks 1, 5, and 19, with scores of 34, 29, and 25, respectively. The pink genes are the interferon genes that were connected. The gray genes are the nervous system networks out of IPA.

Gene    Mostafavi p-val   BA25 p-val (fold change)
IRF7    1.64E-03          3.55E-03 (3.18)
IRF9    1.07E-02          3.55E-03 (3.18)
ISG15   3.78E-03          3.55E-03 (3.19)
PTPN6   1.26E-03          4.31E-02 (2.53)
OAS2    4.16E-03          n/a
IFIT2   7.30E-03          n/a
IFIT3   3.78E-04          n/a
IRF8    6.93E-03          n/a

Table 3.4 Eight Type I IFN signaling genes from Mostafavi et al. connected to merged networks. P-values from Mostafavi et al. and p-value with fold change from BA25 data. Four of the genes from the Mostafavi article were represented in our BA25 data (IRF7, IRF9, ISG15, and PTPN6). All eight genes listed could be connected to the merged networks.

We were intrigued by these results from BA25 (the region targeted by the Mayberg group in deep brain stimulation studies), but were concerned that our results may be due to a small sample size. To address this concern, we sequenced an additional 35 BA25 samples from the NIMH Brain Collection Core. Increasing our sample size may allow us to determine if our preliminary results are significant findings by providing greater power to our study. The preparation of the samples is described in Section 3.2.3. PCA using Partek Genomics Suite of the normalized counts from the 35 BA25 samples from the NIMH shows a relatively robust separation between the 18 cases and 17 controls (Figure 3.13). For this set of data, there were 918 nominally differentially expressed downregulated genes and 125 nominally differentially expressed upregulated genes.

Figure 3.13 Principal Components Analysis of 18 MDD cases and 17 controls. Sample set 2. Cases are in red and controls are in blue.
A relatively robust separation is seen between cases and controls.

3.3.2 GZMK, HOXB3, PROX1 are dysregulated in BA25

(For the rest of this chapter where differential expression is discussed, the data are from the NIMH brain set.)

Individual GWASs of major depression have had limited success. This may be due to multiple factors. The phenotypic heterogeneity and complex architecture of the disorder likely contribute to the lack of strong GWAS signals. In addition, there are various subtypes of depression, and many GWASs fail to separate these in their analyses, which may contribute to the inability to identify robust findings. It is also becoming apparent that the effects of common disease loci are smaller than expected, necessitating sample sets much larger than those that currently exist. A mega-analysis of nine discovery studies consisting of 1.2 million autosomal and X chromosomal SNPs using 9,240 MDD cases and 9,519 controls of European ancestry was performed (Major Depressive Disorder Working Group of the Psychiatric GWAS Consortium 2013). The replication phase consisted of the genotyping of 554 SNPs using 6,783 MDD cases and 50,695 controls. In both the discovery and replication studies, no loci achieved genome-wide significance.

When all the dysregulated nominally significant genes in BA25 were input, the first network (score of 39) produced in IPA corresponded to a nervous system network (Figure 3.14). Within this network, three GWAS SNPs were located within ±300 kb of a dysregulated gene in BA25 (Table 3.5). These genes are GZMK, HOXB3, and PROX1 (Figure 3.15, Figure 3.16, Figure 3.17). GZMK encodes a serine protease that is located within the cytoplasmic granules of lymphocytes ("GZMK Gene"). HOXB3 and PROX1 are both members of the homeobox transcription factor family. HOXB3 codes for a protein involved in regulating morphogenesis ("HOXB3 Gene").
PROX1 may be involved in early central nervous system development as well as the regulation of undifferentiated neurons ("PROX1 Gene").

Figure 3.14 Nervous system network of BA25 dysregulated genes. First network in IPA. Score of 39. Red represents upregulated genes and green represents downregulated genes.

Figure 3.15 GZMK expression in human brain development. (BrainSpan). Top field is sorted from 8 weeks postconception to 40 years in adulthood. Second field represents brain regions. Third field represents gene expression for GZMK. Also shown are the 28 most highly correlated genes and their gene expressions per region and developmental stage.

Figure 3.16 HOXB3 expression in human brain development. (BrainSpan). Top field is sorted from 8 weeks postconception to 40 years in adulthood. Second field represents brain regions. Third field represents gene expression for HOXB3. Also shown are the 28 most highly correlated genes and their gene expressions per region and developmental stage.

Figure 3.17 PROX1 expression in human brain development. (BrainSpan). Top field is sorted from 8 weeks postconception to 40 years in adulthood. Second field represents brain regions. Third field represents gene expression for PROX1. Also shown are the 28 most highly correlated genes and their gene expressions per region and developmental stage.

Gene    p-val for GWAS SNP ±300 kb from gene   BA25 p-val (fold change)
GZMK    3.50E-04                               3.30E-03 (-2.22)
HOXB3   1.04E-06                               1.55E-02 (-2.26)
PROX1   4.11E-03                               4.69E-02 (1.29)

Table 3.5 Genes within ±300 kb of a GWAS SNP. Within the sole nervous system network in IPA from the list of differentially expressed genes, three GWAS SNPs were located within ±300 kb of a dysregulated gene in BA25 (Major Depressive Disorder Working Group of the Psychiatric GWAS Consortium 2013).

3.3.3 BTBD6 upregulated in BA25 with 82 eQTLs in LD with trending variant

BTBD6 is a gene that codes for a BTB/POZ domain-containing protein (Figure 3.18).
It is involved in late neuronal development as well as the relocation of the transcriptional repressor Plzf (promyelocytic leukemia zinc finger) from the nucleus to the cytoplasm (Sobieszczuk, Poliakov et al. 2010). BTBD6 targets Plzf for ubiquitination and degradation, thereby preventing neurogenesis from being inhibited when Plzf is overexpressed. In BA25, the BTBD6 p-value was 3.79E-02 and it was upregulated 1.71-fold in cases. In the GWAS mega-analysis, BTBD6 is within ±300 kb of a SNP with a p-value of 7.121E-06 (rs1008628). In GTEx Portal v4, 82 expression quantitative trait loci (eQTLs) for BTBD6 were found to be in linkage disequilibrium (LD) with rs1008628 (given an r² threshold of 0.8). eQTLs are locations in the genome where the genotype significantly affects gene expression. The p-values for the 82 eQTLs range from 7.20E-08 to 1.70E-06. The r² values range from 0.82 to 1.00, indicating strong LD between each eQTL and rs1008628.

Figure 3.18 BTBD6 expression in human brain development. (BrainSpan). Top field is sorted from 8 weeks postconception to 40 years in adulthood. Second field represents brain regions. Third field represents gene expression for BTBD6. Also shown are the 28 most highly correlated genes and their gene expressions per region and developmental stage.

3.3.4 GDNF downregulated in BA25

GDNF is a gene that encodes glial cell-derived neurotrophic factor, which has been shown to protect central nervous system neurons and foster their growth and differentiation (Kotyuk, Keszler et al. 2013; Figure 3.19). It is a member of the transforming growth factor beta superfamily. It binds to GDNF-family receptor α1 and forms a complex with the receptor tyrosine kinase. Various studies involving GDNF in rat and monkey models have been performed that show promise for its use as a therapy (Deng, Liang et al. 2013, Kordower, Emborg et al. 2000). In 2003, Dr.
Steven Gill led a small human clinical trial in which scientists in Bristol developed and placed catheters with GDNF solution in the putamen, an area known to be degenerated in patients with Parkinson's disease, of five individuals with moderate disease (Lang, Gill et al. 2006). The implanted catheters were connected to a pump implanted in the abdominal wall, which delivered the solution continuously into the brain region at a fixed daily dose that increased at eight-week intervals, ending with a five-week period without administration. The patients all reported significant movement and motor improvement after one year of treatment. One patient in particular, a 62-year-old man, reported a marked improvement in his quality of life over the 43 months post-treatment (Patel and Gill, 2007). Regretfully, he succumbed to a heart attack unrelated to the trial approximately ten years later. An autopsy revealed sprouting of neuronal fibers in the right side of the putamen, which is the area where the catheter tip had been placed, indicative of healing in that region of the brain. The left side of the putamen, which did not receive the GDNF solution, showed an area of dopamine fibers five times smaller than that on the right side. These results are suggestive of the neuroprotective effects of GDNF in the brain.

In addition, a meta-analysis of 12 studies (526 cases and 502 controls) that looked at blood GDNF levels in MDD patients concluded that blood GDNF levels were lower in patients with MDD than in controls (Lin and Tseng, 2015). These results were maintained after a leave-one-out analysis was performed to ensure that no single study was driving the results. The p-value from the meta-analysis was 1.10E-03 with an effect size (also known as correlation) between GDNF expression and depression of -0.62 (negative corresponding to downregulation in cases), supporting the use of blood GDNF as a potential depression biomarker.
The p-value in our BA25 data was 5.29E-02 and the gene was downregulated 1.588-fold in cases.

Figure 3.19 GDNF expression in human brain development. (BrainSpan). Top field is sorted from 8 weeks postconception to 40 years in adulthood. Second field represents brain regions. Third field represents gene expression for GDNF. Also shown are the 28 most highly correlated genes and their gene expressions per region and developmental stage.

3.3.5 Type I interferon signaling gene OAS1 downregulated in BA25

One of the top 20 nominally significant Type I interferon signaling genes from Mostafavi et al. is OAS1, which encodes 2'-5' oligoadenylate synthetase 1 (Figure 3.20). OAS1 is generally expressed at low levels and type I interferons induce its upregulation (Sadler and Williams 2008). The protein is in its inactive monomeric form while in the cytoplasm. When viral double-stranded RNA activates it, oligomerization into its tetrameric form occurs, and this allows for the synthesis of 2'-5' oligoadenylates. This causes ribonuclease L to be activated, which is then capable of cleaving both cellular and viral RNAs. In Mostafavi et al., the p-value for OAS1 was 2.52E-04, and the gene was upregulated in whole blood from cases. It was the 15th gene when ranked from lowest to highest p-value. In our BA25 data, the p-value was 1.06E-02 and the gene was downregulated 1.718-fold in cases. It was ranked 159th from lowest to highest p-value.

Figure 3.20 OAS1 expression in human brain development. (BrainSpan). Top field is sorted from 8 weeks postconception to 40 years in adulthood. Second field represents brain regions. Third field represents gene expression for OAS1. Also shown are the 28 most highly correlated genes and their gene expressions per region and developmental stage.
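Ranking genes from lowest to highest nominal p-value, as done for OAS1 above, is also the first step of the Benjamini-Hochberg adjustment cited for the FDR-corrected values in this chapter (Benjamini and Hochberg 1995). The following is a minimal sketch of that step-up procedure together with the -log10 transform used when plotting adjusted p-values. It is an illustration of the published method under stated assumptions, not the code used by Partek Genomics Suite or IPA; the function names and the example p-values are hypothetical.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    For the p-value with rank k (1 = smallest) among m tests, the raw
    adjustment is p * m / k. Walking from the largest rank down and keeping
    a running minimum makes the adjusted values monotone in rank.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def neg_log10(p):
    """Transform a p-value so that smaller p-values give larger values."""
    return -math.log10(p)


# Made-up nominal p-values for illustration only; these are not study data.
nominal = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(nominal)
```

Because the adjustment multiplies each p-value by the number of tests divided by its rank, a gene tested alongside tens of thousands of others needs a very small nominal p-value to remain below 0.05 after correction.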
3.3.6 FDR-corrected significant gene RP4-539M6.19 downregulated in BA25

When sample sets 1 and 2 were combined for a total of 20 MDD-suicide cases and 20 controls, a novel gene survived FDR correction. RP4-539M6.19 on chromosome 22 is an uncharacterized protein-coding gene (Figure 3.21). Its FDR-corrected p-value was 4.28E-02 and its fold change was -3.75472. Its nominal p-value was 8.73E-07. Inputting this gene into GENEMANIA produced a network showing that this gene interacts with CLVS1, CLVS2, and TTPA through physical interactions, co-expression, and shared protein domains (Warde-Farley, Donaldson et al. 2010; Figure 3.22). CLVS1, CLVS2, and TTPA are involved in phosphatidylinositol phosphate binding at an FDR-corrected p-value of 5.61E-01, making it possible that the correlated novel gene is involved in this function as well. Phosphatidylinositol signaling is a pathway that is overstimulated in the platelets of depressed patients (Dwivedi and Pandey 2009). Lithium inhibits the enzyme inositol monophosphatase (IMPase), a member of the phosphodiesterase family of enzymes (Brown and Tracy 2013). This is an essential enzyme in the phosphatidylinositol signaling pathway, affecting various functions from cell growth to secretion and apoptosis. Inhibiting IMPase has been known to be effective in treating bipolar disorder (Singh, Halliday et al. 2013). In addition, long-term lithium maintenance is known to greatly lower the risk of suicide in individuals who suffer from mood disorders (Baldessarini, Tondo et al. 2003). As much as a 93% difference in risk of suicidal acts between affected individuals without lithium treatment and those on treatment has been recorded, bringing the risk levels for those on treatment close to those of the general population. As a result of these connections, RP4-539M6.19 may be a gene involved in the pathway of depression development.

Figure 3.21 RP4-539M6.19 expression in human brain development. (BrainSpan).
Top field is sorted from 8 weeks postconception to 40 years in adulthood. Second field represents brain regions. Third field represents gene expression for RP4-539M6.19. Also shown are the 28 most highly correlated genes and their gene expressions per region and developmental stage.

Figure 3.22 Genes connected to RP4-539M6.19. Screenshot from GENEMANIA of gene connections made to RP4-539M6.19. Highlighted in yellow are CLVS1, CLVS2, and TTPA, which are known to be involved in phosphatidylinositol phosphate binding, a pathway where lithium inhibits one of its enzymes. Lithium is also known to reduce suicide risk.

3.4 Discussion

Various factors have pointed to dysregulation in the BA25 brain region being implicated in depression. There were differences in results between sample set 1 and sample set 2 that likely have to do with the analysis size. However, there may also have been batch effects that are difficult to accommodate. We concluded that the greater number of samples from the NIMH gave a more accurate differential expression dataset. To this end, the sequencing of BA25 in MDD-suicide cases and controls has pointed to the dysregulation of multiple genes of interest. GZMK, HOXB3, PROX1, BTBD6, GDNF, OAS1, and RP4-539M6.19 are the candidate genes that have resulted from this transcriptome study. These genes from this dataset are likely a subset of the entire group of additional genes involved in the predisposition to depression. Different combinations of dysregulated genes may be present in different affected individuals, an additional likely reason why depression has been a difficult disorder for scientific advances.

3.5 Acknowledgements

This work was supported by funds from the Della Martin Chair in Psychiatry and Neuroscience as well as a research grant from the NHGRI (HG006531). We thank Dr. William Bunney, Dr. Fabio Macciardi, Dr. Steven Potkin, Dr. Theodorus Van Erp, Dr. Marquis Vawter, and Dr. Barbara Lipska for access to the samples.
We thank Dr. James Fallon for neuroanatomical expertise. We thank Dr. Firoza Mamdani, Dr. Adolfo Sequeira, and Dr. Harker Rhodes for dissection work. We thank Dr. Tade Souaiaia, Dr. Federica Torri, and Jennifer Herstein for computing contributions.

Chapter 4: Merging of genetic and transcriptomic approaches

4.1 TMED9 downregulated in BA25 and contains intronic insertion in exome cases

In looking at the overlap between the RE-MDD cases and the MDD-suicide completers as well as their respective controls, we discovered a variant in a gene of interest. For the DNA-Seq data in RE-MDD cases and healthy controls, only exonic blood variants with at least a 60% difference between cases and controls for each genotype were kept. Additionally, the only variants retained were those with a clear high burden in one genotype only in cases versus controls. For the RNA-Seq data in BA25 MDD-suicide cases and healthy controls, only the differentially expressed genes with a p-value of at most 0.05 were retained. Genes in both sets of data were matched by variants in the genomic analysis and p-value with fold change in the differential expression analysis. Two genes resulted from this filtering: MUC5B and TMED9 (Table 4.1). The variation in the MUC5B gene was exonic and synonymous and the RVIS score was 16.524 (99.98), indicative of being functional and not damaging, so this result was removed. The variation in the TMED9 gene was an intronic deletion in the controls and the RVIS score was -0.029 (51.40). This gene is involved in vesicular protein trafficking, particularly in the early secretory pathway ("TMED Gene"). The variant is located 12 base pairs from the exon, but is not located within the natural donor splicing site. The majority of the cases (82/90) were homozygous CG, while all of the controls were homozygous C. The allele frequencies in 1000G are 33% CG and 67% C.
Therefore, this variation may be treated as an insertion in the cases, if considering population allele frequencies. TMED9 was downregulated 1.569-fold in the BA25 MDD-suicide samples and had a p-value of 3.69E-02. Lastly, the SKAT-O beta(1,25)-weighted p-value for TMED9 was 0.00766.

Gene                         MUC5B                TMED9
DNA-Seq RVIS                 16.524 (99.98)       -0.029 (51.40)
DNA-Seq Chr:bp (hg19)        Chr11:1258217        Chr5:177019687
DNA-Seq Type                 exonic, synonymous   intronic, insertion (cases)
DNA-Seq Ref                  T                    CG
DNA-Seq Alt                  G                    C
DNA-Seq Cases homo ref       4                    82
DNA-Seq Cases het            85                   4
DNA-Seq Cases homo alt       0                    0
DNA-Seq Controls homo ref    163                  0
DNA-Seq Controls het         5                    0
DNA-Seq Controls homo alt    0                    168
RNA-Seq p-val                1.24E-02             3.69E-02
RNA-Seq fold change          -2.462               -1.569

Table 4.1 Overlap in genes between DNA-Seq variation and RNA-Seq dysregulation. Two genetic variants in the DNA-Seq analysis of RE-MDD cases and healthy controls were discovered to be variants present in dysregulated genes in the RNA-Seq BA25 data. For the genomic data, shown are the two variants in the RE-MDD data along with their RVIS score, location, type of variation, reference and alternate allele, and genotypes in cases and controls. For the transcriptomic data, shown are the p-values and fold changes for the genes.

4.2 Discussion

Genetic and transcriptomic variation can cause disease in different ways. There may exist a genetic variant that does not cause gene dysregulation. In this case, the mutation still permits mRNA and protein to be made, but the protein perhaps is truncated and not functional. There may be dysregulation in gene expression but no mutation in the gene. This could be due to a variant in the promoter region of the gene. Up until this point, RE-MDD was studied from a genomic standpoint and MDD with suicide was investigated from a transcriptomic avenue. However, there is a third way of disease occurring genetically, which is through a variant in a protein-coding gene causing dysregulation in that gene.
The variant in TMED9 and the dysregulation of the gene allow us to classify this as a potential candidate variant and gene in depression susceptibility that warrants further examination through sequencing of additional samples.

Chapter 5: Summary and Future Directions

5.1 Genomic variants in Recurrent Early-Onset Major Depressive Disorder

GWASs are generally employed to look for more common variants, and linkage studies are used to look for rare variants. There have been multiple GWASs to date that consist of thousands of cases and controls, but they have yet to produce significant findings. While our samples are from linkage families, it may be that our sample size of 90 cases and 168 controls is not large enough to ascertain the appropriate predisposition variants. The original GenRED study that pointed to region chr15q consisted of 297 families in the first wave and contained 656 families total. Although we focused on the exome and UTR rather than genotype-specific SNPs in the whole genome, our 90 cases may have fallen short of the number of samples necessary to properly interrogate the exome and UTR. The variant in the natural acceptor splice site in KIF23 may have an integral effect on the proper splicing of that gene. However, increasing the number of both cases and controls in the analysis is essential to replicating and confirming this finding as well as discovering additional predisposition variants. Lastly, RNA from the controls is accessible to our group. Therefore, in the future, RNA sequencing on the 3 homozygous T controls and a few of the heterozygous T/deletion controls will be performed. The RNA-Seq results will allow visualization of both canonical and novel splice junctions, thereby elucidating whether the homozygous T genotype at the splice site locus is indeed causing altered splicing.
5.2 Differential expression in BA25 of depressed suicide completers

Postmortem brains from depressed suicide completers are not nearly as accessible as blood samples from living depressed patients. While more samples give greater power, we think our brain sample set is of a reasonable size to provide strong differential expression data. However, despite a sufficient number of brain specimens, many variables can still affect final RNA composition. The postmortem interval (between 12 and 61.5 for our samples), pH, presence of alcohol or drugs, and various other factors can all combine to mask what would be considered true gene expression. It is possible to conduct analyses with these confounding variables accounted for, but doing so would greatly raise the noise level in our sample set. As such, we decided against adjusting for these variables. Many genes were discovered to be differentially expressed in our data. We selected our genes of interest based on literature and network analyses and believe we have discovered strong candidate genes. Replication studies need to be performed to confirm that GZMK, HOXB3, PROX1, BTBD6, GDNF, OAS1, and RP4-539M6.19 are differentially expressed between cases and controls. In addition, DNA-Seq may be performed on these BA25 samples in order to determine the presence of possible eQTLs that may be responsible for gene dysregulation.

5.3 Genetic and transcriptomic overlap in depression

To confirm both the intronic variant in TMED9 in the recurrent early-onset major depression cases and its downregulation in the BA25 region of suicide completers with MDD, additional samples of both blood and ideally brain tissue should be acquired and sequenced.

5.4 Conclusion

Major depression is a difficult disease to study, being highly heterogeneous in nature, both genetically and non-genetically. Many studies combine multiple subtypes of depression into one phenotype, making significant findings elusive.
Ideally, studies should comprise distinct subtypes, but the appropriate number of samples would be difficult to achieve and the number itself hard to estimate. Compared to other psychiatric diseases with high heritability such as schizophrenia, for which strong SNP associations have been discovered, research on major depression has been extensive but has stagnated. The research detailed here lists potential genomic variants that may be involved in the RE-MDD subtype as well as differentially expressed genes that may be predisposing to MDD with suicide. While these are two subtypes of depression, it may be possible to combine the two datasets. Here, doing so led to one gene (TMED9) and a variant within it. Overall, depression is a debilitating disease that needs far greater etiological elucidation than exists currently. It is becoming apparent that current technologies may not be sufficient to achieve this goal. Studies of subtypes of depression and larger numbers of more stringently diagnosed individuals are possible avenues through which to move forward with robust findings of the predisposition variants and genes involved.

Overall, mental illness is a challenging field to study. Depression is an affliction no less serious than many physical disorders. However, the stigma associated with it may prevent affected individuals from seeking treatment, including participating in clinical trials. This cultural stigma is a barrier to progress. The individuals who suffer often feel a lack of support and perhaps even judgment. Additionally, the number of people afflicted around the world is a sign that the stigma is making us sicker. Although there is much research being done, advances are being made much too slowly. At an individual level, raising awareness that depression is not a weakness or a personal failing is one way that may speed up scientific and medical progress by propelling people who need help to seek it.
This may be the start that is needed to bring about the long overdue curative research.

References

The 1000 Genomes Project Consortium. (2012). "An integrated map of genetic variation from 1,092 human genomes." Nature 491:56-65.
Adzhubei, I.A., S. Schmidt, et al. (2010). "A method and server for predicting damaging missense mutations." Nat Methods 7(4):248-9.
Amen, D. G., J. R. Prunella, et al. (2009). "A comparative analysis of completed suicide using high resolution brain SPECT imaging." J Neuropsychiatry Clin Neurosci 21(4): 430-439.
Anders, S. and W. Huber (2010). "Differential expression analysis for sequence count data." Genome Biology 11:R106.
Anders S., P.T. Pyl, et al. (2014). "HTSeq - A Python framework to work with high-throughput sequencing data." bioRxiv preprint.
American Psychiatric Association. (2013). Diagnostic and Statistical Manual of Mental Disorders: DSM-5. Washington, D.C.: American Psychiatric Association.
Baldessarini, R. J., L. Tondo, et al. (2003). "Lithium treatment and suicide risk in major affective disorders: update and new findings." J Clin Psychiatry 64(5):44-52.
Benjamini, Y. and Y. Hochberg (1995). "Controlling the false discovery rate: a practical and powerful approach to multiple testing." J.R. Statist. Soc. 57(1): 289-300.
Bewernick, B. H., S. Kayser, et al. (2012). "Long-term effects of nucleus accumbens deep brain stimulation in treatment-resistant depression: evidence for sustained efficacy." Neuropsychopharmacology 37(9): 1975-1985.
Bierut, L. J., A. C. Heath, et al. (1999). "Major depressive disorder in a community-based twin sample: are there different genetic and environmental contributions for men and women?" Arch Gen Psychiatry 56(6): 557-563.
Blier, P. (2009). "Optimal use of antidepressants: When to act?" J Psychiatry Neurosci 34(1):80.
BrainSpan: Atlas of the Developing Human Brain [Internet]. Funded by ARRA Awards 1RC2MH089921-01, 1RC2MH090047-01, and 1RC2MH089929-01. © 2011.
Available from: http://developinghumanbrain.org.
"Brodmann area." Wikipedia: The Free Encyclopedia. Wikimedia Foundation, Inc., 7 Jan. 2015. Web. 18 May 2015. <http://en.wikipedia.org/wiki/Brodmann_area>
Brown, K. M. and D. K. Tracy. (2013). "Lithium: the pharmacodynamic actions of the amazing ion." Ther Adv Psychopharmacol 3(3):163-176.
Bullard, J. H., E. Purdom, et al. (2010). "Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments." BMC Bioinformatics 11(1): 94.
Chen, E. A., T. Souaiaia, et al. (2014). "Effect of RNA integrity on uniquely mapped reads in RNA-Seq." BMC Research Notes 7:753.
Chen, Y., T. Souaiaia, et al. (2009). "PerM: efficient mapping of short sequencing reads with periodic full sensitive spaced seeds." Bioinformatics 25(19): 2514-2521.
Delgado, P. L., L. H. Price, et al. (1994). "Serotonin and the neurobiology of depression." Arch Gen Psychiatry 51(11): 865-874.
Deng, X., Y. Liang, et al. (2013). "Co-transplantation of GDNF-overexpressing neural stem cells and fetal dopaminergic neurons mitigates motor symptoms in a rat model of Parkinson's disease." PLoS One 8(12):e80880.
Desmyter, S., C. van Heeringen, et al. (2011). "Structural and functional neuroimaging studies of the suicidal brain." Progress in Neuro-Psychopharmacology and Biological Psychiatry 35(4): 796-808.
Dwivedi, Y., G. N. Pandey. (2009). "Pharmacological characterization of inositol 1,4,5-tris phosphate receptors in human platelet membranes." Cardiovasc Psychiatry Neurol 618586.
Evgrafov, O. V., B. B. Wrobel, et al. (2011). "Olfactory neuroepithelium-derived neural progenitor cells as a model system for investigating the molecular mechanisms of neuropsychiatric disorders." Psychiatr Genet 21(5):217-228.
Fournier, J. C., R. J. DeRubeis, et al. (2010). "Antidepressant drug effects and depression severity: a patient-level meta-analysis." JAMA 303(1): 47-53.
Fricker, M., J. J. Neher, et al. (2012). "MFGE-8 mediates primary phagocytosis of viable neurons during neuroinflammation." Journal of Neuroscience 32(8): 2657-2666.
Ge, D., E. K. Ruzzo, et al. (2011). "SVA: software for annotating and visualizing sequenced human genomes." Bioinformatics 27(14):1998-2000.
Greenberg, P. E., R. C. Kessler, et al. (2003). "The economic burden of depression in the United States: how did it change between 1990 and 2000?" J Clin Psychiatry 64(12): 1465-1475.
GTEx Portal. The Broad Institute of MIT and Harvard. Web. 18 May 2015. <www.gtex.portal.org>
"GZMK Gene." GeneCards. Web. 18 May 2015. <http://www.genecards.org//cgi-bin/carddisp.pl?gene=GZMK>
Holmans, P., M. M. Weissman, et al. (2007). "Genetics of recurrent early-onset major depression (GenRED): final genome scan report." Am J Psychiatry 164(2): 248-258.
"HOXB3 Gene." GeneCards. Web. 18 May 2015. <http://www.genecards.org/cgi-bin/carddisp.pl?gene=HOXB3>
Isometsa, E. T. and J. K. Lonnqvist (1998). "Suicide attempts preceding completed suicide." British Journal of Psychiatry 173: 531-535.
Joiner, T. (2011). Myths about Suicide, Harvard University Press.
Kelly, T. M. and J. J. Mann (1996). "Validity of DSM-III-R diagnosis by psychological autopsy: a comparison with clinician ante-mortem diagnosis." Acta Psychiatr Scand 94(5): 337-343.
Kessler, R. C., W. T. Chiu, et al. (2005). "Prevalence, severity, and comorbidity of 12-month DSM-IV disorders in the national comorbidity survey replication." Arch Gen Psychiatry 62(6): 617-627.
"KIF23 Gene." GeneCards. Web. 18 May 2015. <http://www.genecards.org/cgi-bin/carddisp.pl?gene=KIF23>
Koenigs, M. and J. Grafman (2009). "The functional neuroanatomy of depression: Distinct roles for ventromedial and dorsolateral prefrontal cortex." Behavioural Brain Research 201(2): 239-243.
Kong, A. and N. J. Cox (1997). "Allele-sharing models: LOD scores and accurate linkage tests." Am. J. Hum. Genet. 61(5): 1179-1188.
Kordower, J.H., M.E. Emborg, et al. (2000). "Neurodegeneration prevented by lentiviral vector delivery of GDNF in primate models of Parkinson's disease." Science 290(5492):767-73.
Kotyuk, E., G. Keszler, et al. (2013). "Glial cell line-derived neurotrophic factor (GDNF) as a novel candidate gene of anxiety." PLoS One 8(12):e80613.
Krawczak, M., J. Reiss, et al. (1992). "The mutational spectrum of single base-pair substitutions in mRNA splice junctions of human genes: causes and consequences." Hum Genet 90(1-2):41-54.
Lang, A. E., S. Gill, et al. (2006). "Randomized controlled trial of intraputamenal glial cell line-derived neurotrophic factor infusion in Parkinson disease." Ann Neurol 59(3):459-66.
Lee, S., M. J. Emond, et al. (2012). "Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies." Am J Hum Genet 91(2):224-237.
Levinson, D. F., O. V. Evgrafov, et al. (2007). "Genetics of recurrent early-onset major depression (GenRED): significant linkage on chromosome 15q25-26 after fine mapping with single nucleotide polymorphism markers." Am J Psychiatry 164(2): 259-264.
Lin, P. Y., P. T. Tseng. (2015). "Decreased glial cell line-derived neurotrophic factor levels in patients with depression. A meta-analytic study." J Psychiatr Res 63:20-7.
Major Depressive Disorder Working Group of the Psychiatric GWAS Consortium. (2013). "A mega-analysis of genome-wide association studies for major depressive disorder." Molecular Psychiatry 18(4):497-511.
Mayberg, H.S., A. M. Lozano, et al. (2005). "Deep brain stimulation for treatment-resistant depression." Neuron 45(5):651-60.
McGuffin, P., J. Knight, et al. (2005). "Whole genome linkage scan of recurrent depressive disorder from the depression network study." Human Molecular Genetics 14(22): 3337-3345.
McKenna, A., M. Hanna, et al. (2010). "The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data." Genome Research 20(9):1297-1303.
Mortazavi, A., B. A. Williams, et al. (2008). "Mapping and quantifying mammalian transcriptomes by RNA-Seq." Nature Methods 5(7): 621-628. Mostafavi, S., A. Battle, et al. (2014). "Type I interferon signaling genes in recurrent major depresssion: increased expression detected by whole-blood RNA sequencing." Molecular Psychiatry 12:1267-74. Ng, P.C. and S. Henikoff. (2001). "Predicting deleterious amino acid substitutions." Genome Res 11(5):863-74. Opitz, L., G. Salinas-Riester, et al. (2010). "Impact of RNA degradation on gene expression profiling." BMC Med Genomics 3:36. Patel, N. K. and S. S. Gill (2007). "GDNF delivery for Parkinson's disease." Acta Neurochir Suppl 97(Pt 2):135-54. Partek® Genomics Suite® software, version 6.6 Copyright ©; 2014 Partek Inc., St. Louis, MO, USA. 94 "Pegasus GT-FAR." USC Genomics. Decoding the Genome on the Cloud. Web. 18 May 2015. <http://genomics.isi.edu/gtfar> Petrovski, S., Q. Wang, et al. (2013). "Genic intolerance to functional variation and the interpretation of personal genomes." PLoS Genetics 9(8):e1003709. "PROX1 Gene." GeneCards. Web. 18 May 2015. <http://www.genecards.org/cgi- bin/carddisp.pl?gene=PROX1> Purcell, S., B. Neale, et al. (2007). "PLINK: a toolset for whole-genome association and population-based linkage anaylsis." Am J Hum Genet 81(3):559-75. Robinson, J. T., H. Thorvaldsdóttir, et al. (2011). "Integrative Genomics Viewer." Nat Biotechnol 29(1):24-26. Sadler, A. J. and B. R. G. Williams. (2008). "Interferon-inducible antiviral effectors." Nature Revievws Immunology 8(7):559-568. Savitz, J. and W. C. Drevets (2009). "Bipolar and major depressive disorder: Neuroimaging the developmental-degenerative divide." Neuroscience & Biobehavioral Reviews 33(5): 699-771. Schlaepfer, T. E., M. X. Cohen, et al. (2007). "Deep brain stimulation to reward circuitry alleviates anhedonia in refractory major depression." Neuropsychopharmacology 33(2): 368-377. Sequeira, A., L. Morgan, et al. (2012). 
"Gene expression changes in the prefrontal cortex, anterior cingulate cortex and nucleus accumbens of mood disorders subjects that committed suicide." PLoS ONE 7(4): e35367. 95 Shi, J., J. B. Potash, et al. (2010). "Genome-wide association study of recurrent early-onset major depressive disorder." Molecular Psychiatry 16(2): 193-201. Skaf, C. R., A. Yamada, et al. (2002). "Psychotic symptoms in major depressive disorder are associated with reduced regional cerebral blood flow in the subgenual anterior cingulate cortex: a voxel-based single photon emission computed tomography (SPECT) study." Journal of Affective Disorders 68(2-3): 295-305. Singh, N., A. C. Halliday, et al. (2013). "A safe lithium mimetic for bipolar disorder." Nat Commun 4:1332. Sobieszczuk, D. F., A. Poliakov, et al. (2010). "A feedback loop mediated by degradation of an inhibitor is required to initiate neuronal differentiation." Genes Dev 24(2):206-18. Stewart, W. F., J. A. Ricci, et al. (2003). "Lost productive time and cost due to common pain conditions in the US workforce." JAMA 290(18): 2443-2454. Sussman, N. (2010). "Reducing the trial and error factor in antidepressant treatment." Primary Psychiatry 17(5): 16-18. "TMED9 Gene." GeneCards. Web. 18 May 2015. < http://www.genecards.org/cgi- bin/carddisp.pl?gene=TMED9> Tomita, H., M. P. Vawter, et al. (2004). "Effect of agonal and postmortem factors on gene expression profile: quality control in microarray analyses of postmortem human brain." Biological Psychiatry 55(4): 346-352. Trapnell, C., L. Pachter et al. (2009). "TopHat: discovering splice junctions with RNA-Seq." Bioinformatics 25(9):1105-1111. 96 Weis, S., I. C. Llenos, et al. (2007). "Quality control for microarray analysis of human brain samples: The impact of postmortem factors, RNA characteristics, and histopathology." Journal of Neuroscience Methods 165(2): 198-209. Warde-Farley, D., S. L. Donaldson, et al. (2010). 
"The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function." Nucleic Acids Res 38:W214-W220. White, K. J., C. C. Walline, et al. (2005). "Serotonin transporters: implications for antidepressant drug development." AAPS Journal 7(2): 421-433. Yeung, K. Y. and W. L. Ruzzo (2001). "Principal component analysis for clustering gene expression data." Bioinformatics 17(9): 763-774.
Abstract
Major depressive disorder (MDD) is one of the most challenging psychiatric disorders to study. It is a polygenic disorder with a lifetime incidence of 10-15%, and women are affected twice as often as men for unknown reasons (Bierut, Heath et al. 1999). The high lifetime prevalence of this non-infectious but debilitating disorder makes progress in the field necessary so that better therapeutic drugs can be developed to treat patients more effectively.
Genome-wide association studies (GWASs) of major depression have yet to produce significant findings, likely because the effects of individual loci are small, because the disease is highly heterogeneous, or both. Non-genetic factors such as environmental variables further complicate the picture, making the genetic architecture difficult to decipher. While both genes and the environment contribute to the development of MDD, this project focuses on the genetics of the disorder. Its long-term objective is to understand the genetic basis of MDD by using next-generation sequencing to discover the genes that play a role in its development. We approach the problem from both genomic and transcriptomic angles and combine them into one integrated analysis, in the hope that studying the genome and transcriptome together will clarify the cause of MDD. No valid biological markers or clear-cut etiologic pathways have yet been discovered, so there is currently no method to predict or biologically test for this disease.
The primary variant of interest from the genomic sequencing results resides in a natural splice site in KIF23, a gene involved in microtubule-dependent movement of organelles within the cell. An intronic insertion in TMED9, a gene involved in vesicular protein trafficking, was present in the majority of cases and was also identified as a potential predisposing variant.
From the transcriptomic results, hundreds of genes were found to be dysregulated, but a select few were of particular interest because of their distinct biological roles. GZMK, HOXB3, PROX1, BTBD6, GDNF, OAS1, RP4-539M6.19, and TMED9 may each affect the disease state through their dysregulation in brain tissue, specifically in the BA25 region.
Asset Metadata
Creator
Chen, Emily Anne (author)
Core Title
Genomic variant analysis in whole blood of recurrent early-onset major depression patients and differential expression in BA25 of suicide completers with major depression
School
Keck School of Medicine
Degree
Doctor of Philosophy
Degree Program
Genetic, Molecular and Cellular Biology
Publication Date
07/14/2015
Defense Date
06/02/2015
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
BA25, brain, major depression, OAI-PMH Harvest, psychiatric, Suicide
Format
application/pdf (imt)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Pato, Carlos (committee chair), Knowles, James A. (committee member), Tokes, Zoltan A. (committee member)
Creator Email
emilyach@med.usc.edu,emilyannechen@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c3-592528
Unique identifier
UC11302041
Identifier
etd-ChenEmilyA-3592.pdf (filename),usctheses-c3-592528 (legacy record id)
Legacy Identifier
etd-ChenEmilyA-3592.pdf
Dmrecord
592528
Document Type
Dissertation
Rights
Chen, Emily Anne
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA