Close
About
FAQ
Home
Collections
Login
USC Login
Register
0
Selected
Invert selection
Deselect all
Deselect all
Click here to refresh results
Click here to refresh results
USC
/
Digital Library
/
University of Southern California Dissertations and Theses
/
Genetic architecture underlying variation in different traits in the Pacific oyster Crassostrea gigas
(USC Thesis Other)
Genetic architecture underlying variation in different traits in the Pacific oyster Crassostrea gigas
PDF
Download
Share
Open document
Flip pages
Contact Us
Contact Us
Copy asset link
Request this asset
Transcript (if available)
Content
i GENETIC ARCHITECTURE UNDERLYING VARIATION IN DIFFERENT TRAITS IN THE PACIFIC OYSTER CRASSOSTREA GIGAS by Xiaoshen Yin __________________________ A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (MARINE BIOLOGY AND BIOLOGICAL OCEANOGRAPHY) August 2018 Copyright 2018 Xiaoshen Yin ii Acknowledgements This dissertation would not have been possible without the advice and the assistance from many people. First and foremost, I would like to truly thank my advisor, Dennis Hedgecock, for his understanding, patience, encouragement, support, inspiration and instruction. I would also thank him for starting setting up those valuable partially inbred lines of Pacific oysters since 1996, when I just finished my first year in elementary school and did not really expect to do a Ph.D. in the future. I will always remember those up to three-hour weekly discussions with him, which not only equip me with essential background knowledge in genetics, but also show me how to do good science as a young scientist. It has been an invaluable experience to work with Dennis for nearly five years and I could not have imagined a better advisor than him. Also, I would like to thank Alberto Arias Perez, the previous postdoctoral researcher in the Hedgecock lab. Alberto has helped me to find answers for countless questions and to fix all kinds of problems since I started my Ph.D. program at USC, which has set a solid foundation for my Ph.D. work. Meanwhile, I would like to thank my screening and guidance committee members, who have added important tools to my toolkits for doing independent research, and my dissertation committee members, whose valuable advice and critiques have made me think critically and deeply. Furthermore, I would like to thank all staff in Marine Biology and Biological Oceanography program and at Wrigley Marine Science Center for their generous help and support in my research. Moreover, I would like to thank all of my excellent friends. Last but not the least, I would like to deeply thank my family for being always with me, no matter where I go and no matter what I do. iii Table of Contents Acknowledgements ......................................................................................................................... ii Table of Contents ........................................................................................................................... iii List of Tables ................................................................................................................................ vii List of Figures ................................................................................................................................ ix Abstract .......................................................................................................................................... xi Introduction ..................................................................................................................................... 1 1. Background to Questions Studied ........................................................................................ 1 1.1 Fitness-related traits and classical vs. balanced views of genetic diversity ................. 1 1.2 Type III survivorship and distortions of Mendelian segregation ratios ........................ 2 1.3 Inbreeding depression and heterosis ............................................................................. 3 1.4 Sex polymorphism and sex determination.................................................................... 5 1.5 Questions to study ........................................................................................................ 6 2. Approaches to Constructing Genetic Architecture underlying Phenotypic Variation ......... 7 2.1 Linkage mapping .......................................................................................................... 8 2.2 Quantitative Trait Locus (QTL) mapping .................................................................... 9 3. Pedigrees and Rearing of Six, F2 Interrelated Families ..................................................... 11 References ................................................................................................................................. 14 Chapter 1 ....................................................................................................................................... 18 An Application of a Bayesian Hierarchical Model in Analyzing Crossbreeding Trials to Increase the Yield of the Pacific Oyster Crassostrea gigas ........................................................................ 18 Abstract ..................................................................................................................................... 18 1. Introduction ........................................................................................................................ 19 2. Materials and Methods ....................................................................................................... 21 2.1 Data on yield from diallel crosses .............................................................................. 21 2.2 Statistical models ........................................................................................................ 22 2.3 Diallel cross simulation .............................................................................................. 24 2.4 Parent-line ranking ..................................................................................................... 26 2.5 Double-cross hybrid parent selection ......................................................................... 27 3. Results ................................................................................................................................ 28 iv 3.1 Predictions by GLM and Bayesian methods in diallel cross analysis ........................ 28 3.2 Selection of superior parent lines for F1 and double-cross hybrids ............................ 28 4. Discussion .......................................................................................................................... 35 4.1 Comparison of GLM and Bayesian analysis .............................................................. 35 4.2 Line-specific GCA ranking and parent line selection ................................................ 35 4.3 Double-cross hybrids .................................................................................................. 37 References ................................................................................................................................. 40 Chapter 2 ....................................................................................................................................... 42 High-Density Linkage Maps Constructed with SNPs from Genotyping by Sequencing for the Pacific Oyster Crassostrea gigas .................................................................................................. 42 Abstract ..................................................................................................................................... 42 1. Introduction ........................................................................................................................ 43 2. Materials and Methods ....................................................................................................... 47 2.1 Mapping families ........................................................................................................ 47 2.2 Genotyping-by-Sequencing, GATK analysis, SNP discovery and parentage analysis 47 2.3 Linkage analysis ......................................................................................................... 50 2.4 Consensus map construction ...................................................................................... 52 3. Results ................................................................................................................................ 52 3.1 Numbers of high quality SNP markers ....................................................................... 52 3.2 Map length, map density and genome coverage ......................................................... 52 3.3 Comparisons of RG, ML and LEP maps .................................................................... 54 3.4 A consensus map ........................................................................................................ 56 4. Discussion .......................................................................................................................... 59 4.1 Changes in numbers of markers from GBS to linkage mapping ................................ 59 4.2 Map length, map density and genome coverage ......................................................... 59 4.3 Comparisons of RG, ML and LEP maps .................................................................... 60 4.4 A consensus map ........................................................................................................ 61 4.5 Scaffold misassembly in the C. gigas genome ........................................................... 61 References ................................................................................................................................. 63 Chapter 3 ....................................................................................................................................... 68 Mapping Genetic Factors Determining Type III Survivorship in the Pacific Oyster Crassostrea gigas .............................................................................................................................................. 68 v Abstract ..................................................................................................................................... 68 1. Introduction ........................................................................................................................ 69 2. Materials and Methods ....................................................................................................... 72 2.1 Biological materials .................................................................................................... 72 2.2 Data collection ............................................................................................................ 72 2.3 Data analyses .............................................................................................................. 73 3. Results ................................................................................................................................ 79 3.1 Distortions of Mendelian segregation ratios ............................................................... 79 3.2 The number, position, magnitude of gene effect and mode of gene action of vQTL . 79 3.3 Genetic models of vQTL ............................................................................................ 79 4. Discussion .......................................................................................................................... 91 4.1 Numbers of scored and distorted markers .................................................................. 91 4.2 Detecting and localizing vQTL .................................................................................. 92 4.3 Magnitudes of gene effects and modes of gene action ............................................... 93 4.4 Improvements achieved by adopting high-density linkage maps ............................... 95 4.5 Comparisons of vQTL across six families ................................................................. 98 References ................................................................................................................................. 99 Chapter 4 ..................................................................................................................................... 102 Uncovering the Genetic Bases of Growth Heterosis in the Pacific Oyster Crassostrea gigas ... 102 Abstract ................................................................................................................................... 102 1. Introduction ...................................................................................................................... 103 2. Materials and Methods ..................................................................................................... 106 2.1 Biological materials .................................................................................................. 106 2.2 Data collection .......................................................................................................... 106 2.3 Data analyses ............................................................................................................ 107 3. Results .............................................................................................................................. 112 3.1 Identifying “Parent-line” and “Hybrid” genotypes in the F2 families ...................... 112 3.2 Principal component analysis ................................................................................... 114 3.3 Growth QTL ............................................................................................................. 117 3.4 Genetic models underlying growth QTL .................................................................. 124 3.5 Genotype-by-environment interaction ...................................................................... 126 4. Discussion ........................................................................................................................ 127 vi 4.1 Genetic bases of gQTL for yield heterosis ............................................................... 127 4.2 Genotype-by-environment interaction ...................................................................... 129 4.3 Biological interpretation of principal components ................................................... 131 References ............................................................................................................................... 138 Chapter 5 ..................................................................................................................................... 142 Sex-specific Mortality Is a Confounding Factor in Exploring Sex Determination in the Pacific Oyster Crassostrea gigas ............................................................................................................ 142 Abstract ................................................................................................................................... 142 1. Introduction ...................................................................................................................... 143 2. Materials and Methods ..................................................................................................... 145 2.1 Biological materials and data collection ................................................................... 145 2.2 Data analyses ............................................................................................................ 146 3. Results .............................................................................................................................. 147 4. Discussion ........................................................................................................................ 152 References ............................................................................................................................... 155 Conclusions ................................................................................................................................. 157 References ............................................................................................................................... 161 vii List of Tables Table 1. Summary of diallel crosses ............................................................................................. 22 Table 2. Relationships among GLM-predicted, Bayesian-predicted, and observed yields in analyses of 5×5 complete diallel crosses ...................................................................................... 29 Table 3. Predictions of yield by Bayesian analysis for existing families in incomplete diallel crosses ........................................................................................................................................... 30 Table 4. The number of mismatches in top three parent lines in simulated complete (CPLT) and incomplete (R and RRC) diallel crosses ....................................................................................... 30 Table 5. Numbers of markers at key steps from GBS to final linkage maps for six families ...... 53 Table 6. Numbers of markers on and lengths of final linkage maps for six mapping families .... 54 Table 7. Coefficients of correlation between rank orders of markers on Regression and Maximum Likelihood maps .......................................................................................................... 55 Table 8. Coefficients of correlation between rank orders of markers on Regression and Lep- MAP2 maps .................................................................................................................................. 55 Table 9. Lengths of Regression (RG), Maximum Likelihood (ML) and Lep-MAP2 (LEP) maps (in cM) .......................................................................................................................................... 56 Table 10. Numbers of markers on and lengths of consensus maps .............................................. 57 Table 11. Comparisons of average marker spacings, map lengths and numbers of markers between previous and current linkage maps ................................................................................. 60 Table 12. Number of scaffolds that map to one or more than one linkage group, by number of SNPs per scaffold .......................................................................................................................... 62 Table 13. Segregation results for six F2 families .......................................................................... 81 Table 14. Viability QTL in six F2 families ................................................................................... 82 viii Table 15. Genetic models of 65 vQTL in six F2 families ............................................................. 88 Table 16. Summary of genetic models underlying vQTL ............................................................ 90 Table 17. Chi-square tests of independence of genetic similarity across four pairs of F1, phased haplotypes for all linkage groups in six, F2 families ................................................................... 113 Table 18. Phenotypic variance explained by Principal Component Analysis (PCA) ................. 115 Table 19. QTL of (A) principal components and (B) individual size- and growth-related traits 120 Table 20. Genetic models for gQTL ........................................................................................... 125 Table 21. Post hoc comparisons of monthly means following one-way ANOVA of monthly temperatures for July-October in 2012 ....................................................................................... 126 Table 22. ANOVA testing genotype-by-environment interaction .............................................. 127 Table 23. Dates of collecting data on live weights (lw) of Pacific oysters ................................. 134 Table 24. Criteria for assigning genetic models to growth QTL ................................................ 134 Table 25. Genetic models of QTL for size- and growth-related traits (trait-QTL)..................... 135 Table 26. Genetic models of QTL for principal components (PC-QTL) of size- and growth- related traits ................................................................................................................................. 137 Table 27. Numbers of individuals in four categories of sexual differentiation in six F2 families ..................................................................................................................................................... 148 Table 28. Potential sex QTL in F2 families................................................................................. 148 Table 29. Individuals cross-classified by sex and genotype for each potential sQTL ................ 149 ix List of Figures Figure 1. Pedigree information. .................................................................................................... 13 Figure 2. Bayesian-predicted vs GLM-predicted line-specific general combining abilities. ....... 31 Figure 3. Difference in rank of line-specific GCA between simulated complete (CPLT) and incomplete (R and RRC) diallel crosses. ...................................................................................... 32 Figure 4. Histograms of correlation coefficients (r 2 ) of predicted yields between simulated complete and incomplete diallel crosses for all, existing and missing families in diallel crosses.34 Figure 5. Correlations between rank orders of markers on Regression map (RG map) and consensus map made using LPmerge (LP map). .......................................................................... 58 Figure 6. Chi-square tests for testing genetic effects for vQTL1 in family 23×31. ...................... 75 Figure 7. Six genetic models of vQTL.......................................................................................... 78 Figure 8. Viability QTL in six F2 families. ................................................................................... 87 Figure 9. Viability QTL13 in family 23×31. ................................................................................ 97 Figure 10. Expected genetic similarity of parent-line and hybrid haplotypes. ........................... 110 Figure 11. Chi-square test of independence of genetic similarity across four pairs of F1, phased haplotypes for LG1 in family 23×31. ......................................................................................... 114 Figure 12. Distribution of genetic similarity between haplotypes in parent-line and hybrid genotypes. ................................................................................................................................... 114 Figure 13. Loadings of traits on principal components. ............................................................. 116 Figure 14. LRT profiles for PC1 across six F2 families. ............................................................. 119 Figure 15. Genetic models of yield-related QTL. ....................................................................... 129 Figure 16. Growth curve of log-transformed live weights in 23×31. ......................................... 130 Figure 17. An example trait-QTL with significant genotype-by-environment interaction. ....... 131 x Figure 18. LRT profiles for QTL mapping of sex as an ordinal trait. ........................................ 150 Figure 19. Correlation between LRTs for QTL mapping of sex as an ordinal and a continuous trait. ............................................................................................................................................. 151 xi Abstract As a species that has been introduced from Asia to all continents but Antarctica, the Pacific oyster Crassostrea gigas is of high economic value and great scientific significance. Farmed Pacific oysters play crucial roles in satisfying rapidly growing needs for wholesome, environmentally friendly, and economically viable seafood. More importantly, many interesting phenomena, including type III survivorship, Mendelian segregation distortion, inbreeding depression, heterosis, sex polymorphism, and sex reversal, widely observed in marine invertebrates like the Pacific oyster, are believed to be at least partially genetically determined. The availability of pedigreed lines and a sequenced genome enable the Pacific oyster to be a great model for exploring genetic causes for these phenomena in marine invertebrates. In order to produce high-yielding oyster seeds, we usually conduct diallel analysis, which can decompose the yield variance into genetic components, general combining ability (GCA), specific combining ability (SCA) and reciprocal effect (R), to select elite parent lines. Since the traditional Generalized Linear Model with fixed effects (i.e. GLM) cannot accommodate unanticipated missing information in a diallel, I use a Bayesian hierarchical model (i.e. Bayesian analysis) to resolve issues arising in incomplete diallel analysis. My study shows that the Bayesian hierarchical model can effectively deal with missing information, accurately predict genetic components and yield, and properly pick superior parent lines for commercial production. In addition to being a commercially important species, the Pacific oyster is also a model organism for studies on marine invertebrates because it has substantial genetic and genomic resources for elucidating the genetic bases of high early mortality, yield heterosis and sex determination. This dissertation aims to determine the genetic architectures underlying variation xii in viability, growth and sex by conducting quantitative trait locus (QTL) mapping on six, inter- related F2 families of Pacific oysters. I construct linkage maps based on single nucleotide polymorphisms (SNPs) detected by Illumina sequencing of reduced-representation genomic libraries (genotyping-by-sequencing, GBS). These high-density linkage maps have an average marker spacing of 0.49 cM and provide frameworks for QTL mapping within each family. Building upon previous discoveries, this dissertation has three major findings. First, viability QTL (vQTL) mapping with high-density linkage maps permits testing of specific models of vQTL gene action, narrows vQTL peaks arising from single, recessive deleterious mutations, and teases apart multiple deleterious mutations under broad vQTL peaks. Additionally, growth QTL (gQTL) mapping reveals that dominant and overdominant gene effects contribute to yield heterosis in the Pacific oyster. While positive dominance and overdominance support the two, classical, alternative hypotheses for heterosis, dominance and overdominance, cases of negative dominance suggest that a more complex genetic architecture is needed to explain heterosis. Along with genetic effects discovered from gQTL mapping, significant genotype-by-temperature interactions are detected at certain gQTL, especially for inbred, parent-line genotypes, supporting Lerner’s (1954) idea of genetic homeostasis at the locus level. Finally, through sex QTL (sQTL) mapping, we detect one potential candidate QTL that may account for sex determination in the Pacific oyster. However, it is of great necessity to discriminate sex-determining genes from sex- specific viability genes in order to validate potential sQTL. By illuminating genetic contributions to variation in fitness-related traits, including survival, growth (i.e. yield), and sex, in the Pacific oyster at the organismal and family levels using powerful QTL mapping and Bayesian approaches, this dissertation not only sheds light on xiii evolution and biological conservation in the ocean but also promotes sustainable shellfish aquaculture. 1 Introduction 1. Background to Questions Studied 1.1 Fitness-related traits and classical vs. balanced views of genetic diversity As a species that has been introduced from Asia to all continents but Antarctica, the Pacific oyster Crassostrea gigas is of high economic value and great scientific significance (Mann, 1979). Being identified as “the best of the best” seafood, farmed Pacific oysters play crucial roles in promoting sustainable aquaculture, which produces wholesome, environmentally friendly, and economically viable seafood. In order to keep pace with rapidly growing human consumption, the efficiency of oyster farming needs to be greatly improved through scientific approaches. More importantly, many interesting phenomena, including Mendelian segregation distortion, inbreeding depression and heterosis, and sex reversal, have been widely observed in the Pacific oyster. The availability of pedigreed lines and a sequenced genome enable the Pacific oyster to be a great model for exploring genetic causes for these phenomena in marine invertebrates in order to shed light on evolution and biological conservation in the ocean (Hedgecock et al., 1995; Curole et al., 2010; Plough and Hedgecock, 2011; Plough, 2012; Zhang et al., 2012). Accordingly, both economic and scientific values urge us to uncover how genetic diversity contributes to phenotypic variation for fitness-related traits, including survival, growth (i.e. yield), and sex determination, in the Pacific oyster. There exist two views of genetic diversity (i.e. classical and balanced) in evolution of fitness-related traits. The classical model assumes that, in populations of a species, genetically identical homozygotes of normal or wild-type genes show the highest adaptedness in a constant and uniform environment (Dobzhansky, 1970). However, a constant and uniform environment can hardly be achieved in nature in the real world, casting doubt on the classical model 2 (Dobzhansky, 1970). In contrast, balancing selection argues that genetic diversity is needed for a population to cope with spatially and temporally changing environments (Dobzhansky, 1970). Among different types of balancing selection, heterozygotes show a higher fitness than corresponding homozygotes in heterotic balance while the best adapted genotypes differ among environments in diversifying selection (Dobzhansky, 1970). Genetic diversity is the ultimate source of variation in fitness-related traits observed in nature and thus sets the foundation for uncovering the genetic determinants of phenotypic variation in controlled crosses. 1.2 Type III survivorship and distortions of Mendelian segregation ratios Many marine invertebrates, like the Pacific oyster, have high fecundity and high mortality during early life stages (type III survivorship). Both environmental (i.e. nutrition, temperature, desiccation, pCO2, salinity, water motion and solar radiation) and genetic factors are believed to be major drivers of the high early mortality in marine invertebrates (Phillips 2002; Parker et al., 2010; Plough and Hedgecock 2011; Plough et al., 2016). The first hints of genetic inviability in marine invertebrates were widespread reports of distortions of Mendelian segregation ratios in bivalves, including clams, mussels and oysters (Wada, 1975; Wilkins 1976; Beaumont et al., 1983; Gaffney and Scott, 1984; Foltz, 1986; Thiriot-Quiévreux et al., 1992; Hu et al., 1993; McGoldrick and Hedgecock, 1997; Launey and Hedgecock, 2001; Plough and Hedgecock, 2011; Plough et al., 2016). Genetic bases of Mendelian segregation distortion have been broadly investigated (Launey and Hedgecock, 2001; Plough and Hedgecock, 2011; Plough, 2012; Plough et al., 2016). Marker-based analyses find significant Mendelian segregation distortion in spat but not early trochophore larvae and ascribe the departure from Mendelian expectation to selection against identical-by-descent (IBD) recessive deleterious mutations (Launey and Hedgecock, 2001). 3 Subsequent viability quantitative trait locus (vQTL) mappings demonstrate 50% of vQTL expressed during metamorphosis, reveal mortality arising from both recessive and partially recessive alleles, and detect Mendelian segregation distortion in both inbred and random-bred families (Plough and Hedgecock, 2011; Plough et al., 2016). These previous discoveries verify the partially genetically determined type III survivorship in the Pacific oyster and suggest high genetic loads maintained by high mutation rates as the most likely cause for genotype deficiencies. Widely observed deficiencies of homozygous genotypes imply inbreeding depression and raise the question of explanations for inbreeding depression and its converse phenomenon, heterosis (Launey and Hedgecock, 2001; Plough and Hedgecock, 2011; Plough, 2012; Plough et al., 2016). 1.3 Inbreeding depression and heterosis Inbreeding depression, the reduction in survival and fertility of offspring produced by related parents, and its converse phenomenon, heterosis, the better performance of hybrid offspring than their inbred parents, have been widely observed in animals and plants in both natural and domesticated populations (Darwin, 1877; Charlesworth and Willis, 2009). Deficiencies in homozygous genotypes in Mendelian segregation distortion demonstrate that inbreeding depression in fitness-related traits is a common phenomenon in the Pacific oyster (Launey and Hedgecock, 2001; Plough and Hedgecock, 2011; Plough, 2012; Plough et al., 2016). Likewise, heterosis in fitness-related traits has been broadly discovered in bivalve molluscs since 1978 (Singh and Zouros, 1978; Hedgecock et al., 1995; Hedgecock and Davis, 2007). Since heterosis is pervasive in plants, especially crops, and animals, it has been widely exploited in plant and animal breeding. Hedgecock et al. (1995) confirm the existence of non- 4 additive components of genetic variance in larval mortality, larval size and juvenile size in the Pacific oyster. Hedgecock and Davis (2007) demonstrate that non-additive genetic components, specific combining ability (SCA) and reciprocal effect (R), make larger contributions to yield than general combining ability (GCA) does. These previous studies suggest crossbreeding of intentionally inbred lines as an important way to improve the production of farmed Pacific oysters (Lannan, 1980; Hedgecock et al., 1995; Hedgecock and Davis, 2007). Along with the practical application of heterosis, its genetic causes have been extensively studied. So far, three major genetic hypotheses, including overdominance, dominance and epistasis, have been proposed to explain heterosis and its converse phenomenon, inbreeding depression (Bruce, 1910; Crow, 1948; Stuber et al., 1992; Xiao et al., 1995; Cockerham and Zeng, 1996; Crow, 1998; Birchler et al., 2003, 2005, 2006; Charlesworth and Willis, 2009; Birchler et al., 2010). In the overdominance hypothesis, the lower fitness upon inbreeding is caused by the loss of interactions between heterozygous alleles that contribute to a higher fitness (Crow, 1998; Birchler et al., 2006; Charlesworth and Willis, 2009). In the dominance hypothesis, F1 hybrids are superior to inbred parent lines because deleterious recessive mutations inherited from one homozygous parent can be masked by wild-type alleles from the other parent (Crow, 1998; Birchler et al., 2006; Charlesworth and Willis, 2009). In the epistasis hypothesis, the better performance in hybrid offspring is ascribed to interactions between different genes. While each of these hypotheses can partially explain heterosis, the debate over the two classical, alternative hypotheses, dominance and overdominance, has lasted for more than a hundred years without being resolved (Crow, 1998). Studies on controlled crosses support epistasis as a potential explanation for heterosis by demonstrating both positive and negative heterosis in growth in the Pacific oyster (Hedgecock et 5 al., 1995). Additionally, Mendelian segregation distortion arising from deleterious recessive mutations supports the hypothesis of dominance (Launey and Hedgecock, 2001; Plough and Hedgecock, 2011; Plough, 2012). Furthermore, transcriptomic analyses show overdominant, underdominant, dominant and partially dominant patterns of gene expression in hybrids relative to inbred parents (Hedgecock et al., 2007). While these previous studies have greatly enhanced our understanding of heterosis in fitness-related traits, an investigation at the genomic level is indispensable in order to illuminate the genetic bases of heterosis in marine bivalves. 1.4 Sex polymorphism and sex determination In addition to the XX and XY system and the ZW and ZZ system in higher organisms, sex determination mechanisms also include haplodiploid systems, temperature-determined systems, hermaphroditic systems, parthenogenetic development, self-fertilization and alternation of generations (Carlson, 2013). While both genetic and environmental factors may impact sex determination, the genetic-control mechanism for sex determination is poorly understood in many organisms. Molluscs, like the Pacific oyster, are suitable for studies on sex evolution and determination, due to their various sex determination mechanisms (Yusa, 2007). By examining gonad maturation, sexual differentiation and sex change, early studies show that oysters can seasonally alternate sex (i.e. mature as male first and then switch to female), stay hermaphroditic, or remain dioecious (Coe, 1932a, 1932b, 1934, 1936, 1938, 1943). Moreover, environmental factors, such as temperature, nutrients and stress, are found to play a role in sex determination in oysters (Coe, 1932a, 1932b, 1934, 1936, 1938, 1943; Chávez-Villalba et al., 2011; Joyce et al., 2013; Santerre et al., 2013). In addition to environmental factors, previous studies suggest that genetic factors may also be involved in sex determination (Coe, 1932b; Needler, 1941; Coe, 1943; Chávez-Villalba et al., 2011). 6 A three-locus model for sex determination in oysters is first proposed by Haley (1977, 1979), in which each locus has two forms of alleles with m for maleness and f for femaleness and sex was determined by the ratio of m:f. Then, a one-locus genetic model, conforming with XX/XY system, is constructed by Guo et al. (1998), in which a dominant male allele, M, and a protandric female allele, F, determine sex in the Pacific oyster with MF being true males and FF being protandric females. Subsequently, Hedrick and Hedgecock (2010) construct a three- genotype model with MM individuals being male, FF individuals being female, and FM individuals being protandric. While these genetic models have shed light on sex determination in oysters, it is still possible that none of them is perfect. As suggested by Guo et al. (1998) and Hedgecock and Hedrick (2010), in order to figure out the genetic bases of sex determination in oysters, an examination on a large number of marker loci in controlled crosses of oysters would be necessary. 1.5 Questions to study Questions at the Family Level Using a Bayesian Hierarchical Model Diallel crosses are used to evaluate what genetic components, including general combining ability (GCA), specific combining ability (SCA) and reciprocal effect (R), contribute to variation in growth (i.e. yield). We apply a Bayesian hierarchical model (Lenarcic et al., 2012) to analyze data from diallel crosses, which take a set of inbred parents to produce offspring from all possible mating pairs, to answer the question: (i) Can the Bayesian hierarchical model better deal with missing information, effectively decompose variance in growth (i.e. yield) into genetic components, and facilitate the selection of elite parent lines for further crossing? (Ch.1) 7 Questions at the Organismal Level Using Quantitative Trait Locus (QTL) Mapping While genetic architecture underlying variation in viability, growth and sex have been extensively explored in the Pacific oyster, several questions still remain unresolved. First, due to the relatively smaller number of markers involved in viability analysis, previous studies can hardly identify more accurate genomic locations and genetic models accounting for genetic inviability. Additionally, while many studies have successfully detected QTL of growth-related traits in marine bivalves, they are unable to uncover genetic mechanisms accounting for these QTL and thus leave the genetic architecture underling growth heterosis incomplete. Finally, sex determination mechanisms are mostly constructed according to family-specific sex ratios, but rarely studied at the genomic level. Therefore, through QTL mapping with high-density linkage maps (constructed in Ch.2), we aim to answer three major questions: (ii) Can vQTL peaks be more narrowly localized in order to facilitate the identification of candidate genes and can specific genetic models of vQTL be constructed in order to better explain genotype-dependent early mortality? (Ch.3) (iii) What genetic causes can account for growth heterosis at the organismal level? (Ch.4) (iv) What genetic models can explain sex determination? (Ch.5) 2. Approaches to Constructing Genetic Architecture underlying Phenotypic Variation Numerous previous studies have demonstrated the contribution of genetic factors to phenotypic variation, so it is essential to uncover genetic bases of variation in fitness-related traits, including survival, growth and sex determination (Hedgecock et al., 1995; Guo et al., 1998; Launey and Hedgecock, 2001; Hedgecock and Davis, 2007; Hedrick and Hedgecock, 2010; Plough and Hedgecock, 2011; Plough, 2012; Plough et al., 2016). Genetic architecture 8 underlying variation in fitness-related traits refers to the number, the genomic location, the magnitude of gene effects and the mode of gene actions of genetic factors contributing to phenotypic variation. Approaches to revealing genetic mechanisms of phenotypic variation include Mendelian inheritance analysis, traditional quantitative genetic methods, genome-wide association study (GWAS), and quantitative trait locus (QTL) mapping (Hedgecock et al., 1995; Hedgecock and Davis, 2007; Griffiths, 2012; Ge et al., 2015). However, Mendelian inheritance analysis are not very powerful in studying traits with complex inheritance; traditional quantitative genetic methods mainly focus on variance in a certain trait among controlled crosses without revealing specific mechanisms; GWAS may not be suitable for studies on the Pacific oyster due to the large sample size and high cost. Therefore, my study applies QTL mapping in constructing the genetic architecture underlying complex fitness-related traits, survival, growth and sex determination, in the Pacific oyster. 2.1 Linkage mapping A linkage map indicates the relative locations of genes or genetic markers on a chromosome; it is a statistical construct based on recombination frequencies between genes or genetic markers (Griffiths, 2012). Linkage mapping, which can be conducted using amplified fragment length polymorphism (AFLP) markers, randomly amplified polymorphic DNA (RAPD) markers, microsatellite DNA markers (microsatellites) and single nucleotide polymorphism markers (SNPs), is an indispensable step for QTL mapping and marker-assisted selection (Lander and Botstein 1989; Doerge 2002; Hu and Xu 2009). Three linkage maps have been constructed for the Pacific oyster. The first AFLP-based linkage map (i.e. the first-generation linkage map) constructed by Li and Guo (2004) has an average marker spacing of 8.8 and 9.5 cM for male and female maps, but is not readily 9 transferred to other families because it depends on the restriction enzyme digestion at genomic sites that are not known or annotated. The issue of poor transfer is resolved by the second linkage map based on microsatellites (i.e. the first-generation linkage map) constructed by Hubert and Hedgecock (2004). The microsatellite-based linkage map, with an average marker spacing of 8.0 cM and 10.4 cM for male and female maps, has a low density of markers. Improved upon the microsatellite-based linkage map, the third microsatellite- and SNP-based linkage map (i.e. the second-generation linkage map) constructed by Hedgecock et al. (2015) has a grand mean length of 588 cM and an average marker spacing of 1.0 cM. The second-generation linkage map is nearly 10× denser than the first-generation linkage maps. Improving upon these previous linkage maps for the Pacific oyster, I construct third- generation high-density linkage maps based on SNPs from genotyping-by-sequencing (GBS) for six, interrelated F2 families of Pacific oysters (sire×dam) (i.e. 23×31, 23×40, 31×23, 40×92, 47×92 and 92×40), using Regression (RG maps), Maximum Likelihood (ML maps) and Lep- MAP2 (LEP maps) methods (Ooijen 2011; Rastas et al., 2015). RG maps, with an average marker spacing of 0.49 cM, are two times denser than the second-generation linkage map and are taken as final, framework maps for QTL analyses. 2.2 Quantitative Trait Locus (QTL) mapping Quantitative traits refer to traits having complex inheritance controlled by both environmental factors and multiple loci (Griffiths, 2012). A locus determining variation in a quantitative trait is defined as a quantitative trait locus (QTL) (Griffiths, 2012). Genetic architecture contributing to variation in complex fitness-related traits can be constructed using QTL mapping and follow-up genetic analyses (Griffiths, 2012). 10 Basically, QTL mapping can detect significant loci responsible for variation in a trait by examining whether the correlation between segregating phenotypes and genotypes within a F2 or backcross family is significant at a locus (Griffiths, 2012). Ideally, we can generate a hybrid F1 family by crossing two purely inbred parents differing in phenotype(s) of interest. Then, we establish a F2 family through full-sibling crossing between individuals in the F1 family or set up a backcross family by backcrossing the F1 family with one of its parents. With genotypic and phenotypic data collected from the F2 or backcross family, we can detect QTL by testing whether the correlation between segregating genotypes and phenotypes in the F2 or backcross family is significant at a locus (Griffiths, 2012). A locus can be identified as a QTL if the LOD score, which is the log10 of odds (i.e. the ratio of the probability of observing the data given that there is a QTL to the probability of observing the data given that there is no QTL), exceeds a threshold. Typically, the significane threshold is determined by permutation of the phenotypic data among individuals (Churchill and Doerge, 1994). In the case of mapping vQTL explaining distortions of Mendelian segregation ratios, there are no phenotypic data on viability to permute, so an approximation of the significance threshold is available (Piepho, 2001). We conduct QTL mapping on fitness-related traits using PROC QTL developed by Hu and Xu (2009). In running PROC QTL, we encountered a “Floating Point Zero Divide” error in all six families, which likely arises from the small spacing between markers, causing the program to stop (Z. Hu, University of California-Riverside Department of Botany and Plant Sciences, personal communication). We could rectify these errors by removing markers at the point in the data where the error was generated. In order to set the significance threshold for detecting QTL through permutation, we first need to permutate 1000 datasets and then run PROC QTL on each of them (Churchill and Doerge, 1994). Even though a linkage map is determined for running 11 PROC QTL on observed datasets by removing problematic markers for each family, this linkage map does not work for each permutated dataset. Thus, we have to permutate and run PROC QTL on more than 1000 datasets for setting the significance threshold, taking about two days for each trait in each family. Due to this error, it proved unrealistic to determine significance thresholds, even for measured traits, by analyzing1000 permutated datasets using PROC QTL. Therefore, significance thresholds for QTL of all fitness-related traits are approximated according to the method of Piepho (2001). 3. Pedigrees and Rearing of Six, F2 Interrelated Families In our study, QTL mapping was done for viability, growth and sex in six, interrelated F2 families (sire×dam), 23×31, 23×40, 31×23, 40×92, 47×92 and 92×40. Compared to the ideal experimental design of QTL mapping on F2 crosses, we use partially, instead of purely, inbred parent lines to set up F2 families. G0 families 23, 31, 47 and 90 were established using wild- caught parents by the Molluscan Broodstock Program (MBP) in 1996 (Langdon et al., 2003). G0 family 40 was established using wild-caught parents at the Taylor Shellfish Farms hatchery in 2001. After one or two generations of inbreeding, five partially inbred families, 23 (f=0.25), 31 (f=0.25), 40 (f=0.25), 47 (f=0.375) and 92 (f=0.375), were among seven parent lines used for a diallel cross at the Taylor hatchery in 2009. In May 2011, adults from 14 of the F1 hybrid families produced by this diallel cross became parents of F2 families by brother-sister crosses. These F2 families were reared in Thorndyke Bay, WA to the point at which individuals could be tagged and followed. The numbers of individuals per family cannot be interpreted as measures of “success” (i.e. survival). By June 2012, only six F2 families had sufficient numbers remaining for the study, 23×31 (f=0.31), 23×40 (f=0.31), 31×23 (f=0.31), 40×92 (f=0.32), 47×92 (f=0.33) and 92×40 (f=0.33). In June or July of 2012, when individual oysters were large enough, we glued 12 shellfish tags with a unique three-digit number to each individual, color-coded by family. We reared approximately 50 individuals from each family separately in suspended, rotating, cylindrical seed cages (cylinders 701-726 for six F2 families). We named an oyster with Tag ID- Cylinder ID-Family ID. For example, 000-706-23×31 represents an oyster from family 23×31 that was grown in cylinder 706 and tagged with the number 000. Among the six F2 families, families 23×31 and 31×23 and families 40×92 and 92×40 are two pairs of reciprocal families. Families 23×31 and 23×40 share the same paternal grandparent, families 23×40 and 92×40 share the same maternal grandparent, and families 40×92 and 47×92 share the same maternal grandparent. Detailed pedigree information is shown in Figure 1. With these six, interrelated F2 families, we determine the genetic architecture underlying variation in viability, growth and sex in the Pacific oyster. 13 1996 2001 2003 2004 2006 2009 201 1 23 31 23 31 31 23 40 23 40 G 0 G 1 F 1 F 2 ♂ ♀ ♂ ♀ ♂ ♀ ♂ ♀ ♂ ♀ 92 47 40 92 92 40 47 92 G 0 Wild Wild f = 0.32 f = 0.31 f = 0.33 G 0 G 1 G 2 Figure 1. Pedigree information. 14 References Beaumont, A., Beveridge, C., Budd, M., 1983. Selection and heterozygosity within single families of the mussel Mytilus edulis (L.). Marine Biology Letters 4, 151–161. Birchler, J.A., Auger, D.L., Riddle, N.C., 2003. In search of the molecular basis of heterosis. Plant Cell 15, 2236–2239. Birchler, J.A., Riddle, N.C., Auger, D.L., Veitia, R.A., 2005. Dosage balance in gene regulation: biological implications. Trends in Genetics 21, 219–226. Birchler, J.A., Yao, H., Chudalayandi, S., 2006. Unraveling the genetic basis of hybrid vigor. Proc Natl Acad Sci U S A 103, 12957–12958. Birchler, J.A., Yao, H., Chudalayandi, S., Vaiman, D., Veitia, R.A., 2010. Heterosis. The Plant Cell 22, 2105–2112. Bruce, A.B., 1910. The Mendelian theory of heredity and the augmentation of vigor. Science, New Series 32, 627–628. Carlson, E.A., 2013. The 7 sexes: biology of sex determination. Indiana University Press, Bloomington. Charlesworth, D., Willis, J.H., 2009. The genetics of inbreeding depression. Nat Rev Genet 10, 783–796. Chávez-Villalba, J., Soyez, C., Huvet, A., Gueguen, Y., Lo, C., Moullac, G.L., 2011. Determination of gender in the pearl oyster Pinctada margaritifera. Journal of Shellfish Research 30, 231–240. Churchill, G.A., Doerge, R.W., 1994. Empirical threshold values for quantitative trait mapping. Genetics 138, 963–971. Cockerham, C.C., Zeng, Z.B., 1996. Design III with marker loci. Genetics 143, 1437–1456. Coe, W.R., 1932a. Histological basis of sex changes in the American oyster (Ostrea virginica). Science 76, 125–7. Coe, W.R., 1932b. Sexual phases in the American oyster (Ostrea virginica). Biol Bull 63, 419– 441. Coe, W.R., 1934. Alternation of sexuality in oysters. The American Naturalist 68, 236–251. Coe, W.R., 1936. Environment and sex in the oviparous oyster Ostrea virginica. Biological Bulletin 71, 353–359. Coe, W.R., 1938. Primary sexual phases in the oviparous oyster (Ostrea virginica). Biological Bulletin 74, 64–75. Coe, W.R., 1943. Sexual differentiation in mollusks. I. Pelecypods. The Quarterly Review of Biology 18, 154–164. 15 Crow, J.F., 1948. Alternative hypotheses of hybrid vigor. Genetics 33, 477–487. Crow, J.F., 1998. 90 years ago: the beginning of hybrid maize. Genetics 148, 923–928. Curole, J.P., Meyer, E., Manahan, D.T., Hedgecock, D., 2010. Unequal and genotype-dependent expression of mitochondrial genes in larvae of the Pacific oyster Crassostrea gigas. Biol. Bull. 218, 122–131. Darwin, C., 1877. The effects of cross and self fertilisation in the vegetable kingdom. New York. Dobzhansky, T., 1970. Genetics of the evolutionary process. Columbia University Press. Doerge, R.W., 2002. Mapping and analysis of quantitative trait loci in experimental populations. Nat. Rev. Genet. 3, 43–52. Mann R., 1979. Exotic species in mariculture. Cambridge: MIT Press. pp. 363. Foltz, D.W., 1986. Null alleles as a possible cause of heterozygote deficiencies in the oyster Crassostrea virginica and other bivalves. Evolution 40, 869–870. Gaffney, P.M., Scott, T.M., 1984. Genetic heterozygosity and production traits in natural and hatchery populations of bivalves. Aquaculture 42, 289–302. Ge, J., Li, Q., Yu, H., Kong, L., 2015. Mendelian inheritance of golden shell color in the Pacific oyster Crassostrea gigas. Aquaculture 441, 21–24. Gosselin, L.A., Qian, P.Y., 1997. Juvenile mortality in benthic marine invertebrates. Mar. Ecol. Prog. Ser. 146, 265–282. Griffiths, A.J.F., 2012. Introduction to genetic analysis, 10 th ed. New York, Basingstoke. Haley, L.E., 1977. Sex determination in the American oyster. J. Hered. 68, 114–116. Haley, L.E., 1979. Genetics of sex determination in the American oyster. Proc. Nat. Shellfish. Assoc. 69, 54–57. Guo, X.M., Hedgecock, D., Hershberger, W.K., Cooper, K., Allen, S.K., 1998. Genetic determinants of protandric sex in the Pacific oyster, Crassostrea gigas Thunberg. Evolution 52, 394. Hedgecock, D., Davis, J.P., 2007. Heterosis for yield and crossbreeding of the Pacific oyster Crassostrea gigas. Aquaculture, Supplement: Genetics in Aquaculture IX 272, Supplement 1, S17–S29. Hedgecock, D., Lin, J.-Z., DeCola, S., Haudenschild, C.D., Meyer, E., Manahan, D.T., Bowen, B., 2007. Transcriptomic analysis of growth heterosis in larval Pacific oysters (Crassostrea gigas). Proc Natl Acad Sci U S A 104, 2313–2318. Hedgecock, D., McGoldrick, D.J., Bayne, B.L., 1995. Hybrid vigor in Pacific oysters: an experimental approach using crosses among inbred lines. Aquaculture, Genetics in Aquaculture V Proceedings of the Fifth International Symposium on Genetics in Aquaculture 137, 285–298. 16 Hedgecock, D., Shin, G., Gracey, A.Y., Den Berg, D.V., Samanta, M.P., 2015. Second- generation linkage maps for the Pacific oyster Crassostrea gigas reveal errors in assembly of genome scaffolds. G3-Genes Genomes Genet. 5, 2007–2019. Hedrick, P.W., Hedgecock, D., 2010. Sex determination: genetic models for oysters. J. Hered. 101, 602–611. Hu, Y.-P., Lutz, R.A., Vrijenhoek, R.C., 1993. Overdominance in early life stages of an American oyster strain. J. Hered. 84, 254–258. Hu, Z., Xu, S., 2009. PROC QTL—A SAS procedure for mapping quantitative trait loci. Int. J. Plant Genomics 2009: 1–3. Hubert, S., Hedgecock, D., 2004. Linkage maps of microsatellite DNA markers for the Pacific oyster Crassostrea gigas. Genetics 168, 351–362. Joyce, A., Holthuis, T.D., Charrier, G., Lindegarth, S., 2013. Experimental effects of temperature and photoperiod on synchrony of gametogenesis and sex ratio in the European oyster Ostrea edulis (L.). Journal of Shellfish Research 32, 447–458. Lander, E.S., Botstein, D., 1989. Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121, 185–199. Langdon, C., Evans, F., Jacobson, D., Blouin, M., 2003. Yields of cultured Pacific oysters Crassostrea gigas Thunberg improved after one generation of selection. Aquaculture 220, 227– 244. Lannan, J.E., 1980. Broodstock management of Crassostrea gigas: I. Genetic and environmental variation in survival in the larval rearing system. Aquaculture 21, 323–336. Launey, S., Hedgecock, D., 2001. High genetic load in the Pacific oyster Crassostrea gigas. Genetics 159, 255–265. Lenarcic, A.B., Svenson, K.L., Churchill, G.A., Valdar, W., 2012. A general Bayesian approach to analyzing diallel crosses of inbred strains. Genetics 190, 413–435. Li, L., Guo, X., 2004. AFLP-based genetic linkage maps of the Pacific oyster Crassostrea gigas Thunberg. Mar. Biotechnol. 6, 26–36. McGoldrick, D.J., Hedgecock, D., 1997. Fixation, segregation and linkage of allozyme loci in inbred families of the Pacific oyster Crassostrea gigas (Thunberg): Implications for the causes of inbreeding depression. Genetics 146, 321–334. Needler, A.B., 1941. Sex reversal in individual oysters. J. Fish. Res. Bd. Can. 5b, 361–364. Ooijen, J.W.V., 2011. Multipoint maximum likelihood mapping in a full-sib family of an outbreeding species. Genetics Research 93, 343–349. 17 Parker, L.M., Ross, P.M., O’Connor, W.A., 2010. Comparing the effect of elevated pCO2 and temperature on the fertilization and early development of two species of oysters. Mar Biol 157, 2435–2452. Phillips, N.E., 2002. Effects of nutrition-mediated larval condition on juvenile performance in a marine mussel. Ecology 83, 2562–2574. Piepho, H.P., 2001. A quick method for computing approximate thresholds for quantitative trait loci detection. Genetics 157, 425–432. Plough, L.V., 2012. Environmental stress increases selection against and dominance of deleterious mutations in inbred families of the Pacific oyster Crassostrea gigas. Mol. Ecol. 21, 3974–3987. Plough, L.V., Hedgecock, D., 2011. Quantitative trait locus analysis of stage-specific inbreeding depression in the Pacific oyster Crassostrea gigas. Genetics 189, 1473–1486. Plough, L.V., Shin, G., Hedgecock, D., 2016. Genetic inviability is a major driver of type III survivorship in experimental families of a highly fecund marine bivalve. Mol. Ecol. 25, 895–910. Rastas, P., Calboli, F.C.F., Guo, B., Shikano, T., Merilä, J., 2015. Construction of ultradense linkage maps with Lep-MAP2: stickleback F2 recombinant crosses as an example. Genome Biol Evol 8, 78–93. Santerre, C., Sourdaine, P., Marc, N., Mingant, C., Robert, R., Martinez, A.-S., 2013. Oyster sex determination is influenced by temperature - first clues in spat during first gonadic differentiation and gametogenesis. Comp. Biochem. Physiol. Part A: Mol. Integr. Physiol. 165, 61–69. Singh, S.M., Zouros, E., 1978. Genetic variation associated with growth rate in the American oyster (Crassostrea virginica). Evolution 32, 342–353. Stuber, C.W., Lincoln, S.E., Wolff, D.W., Helentjaris, T., Lander, E.S., 1992. Identification of genetic factors contributing to heterosis in a hybrid from two elite maize inbred lines using molecular markers. Genetics 132, 823–839. Thiriot-Quiévreux, C., Pogson, G.H., Zouros, E., 1992. Genetics of growth rate variation in bivalves: aneuploidy and heterozygosity effects in a Crassostrea gigas family. Genome 35, 39– 45. Wada, K.T., 1975. Electrophoretic variants of leucine aminopeptidase of the Japanese pearl oyster Pinctada fucata (Gould). Bull. Natl. Pearl Res. Lab. 19, 2152–2156. Xiao, J., Li, J., Yuan, L., Tanksley, S.D., 1995. Dominance is the major genetic basis of heterosis in rice as revealed by QTL analysis using molecular markers. Genetics 140, 745–754. Yusa, Y., 2007. Causes of variation in sex ratio and modes of sex determination in the Mollusca—an overview. American Malacological Bulletin 23, 89–98. Zhang, G., Fang, X., Guo, X., Li, L., Luo, R., et al., 2012. The oyster genome reveals stress adaptation and complexity of shell formation. Nature 490, 49–54. 18 Chapter 1 An Application of a Bayesian Hierarchical Model in Analyzing Crossbreeding Trials to Increase the Yield of the Pacific Oyster Crassostrea gigas Abstract Identifying elite inbred parent lines that produce high-performing hybrid Pacific oyster seed requires diallel or factorial test crosses among lines, each acting as both a male and a female parent. Previously, we used the generalized linear model with fixed effects (i.e. GLM) to partition variance in yield, among hybrid families produced by a diallel cross, into causal genetic components—principally, general combining ability (GCA), specific combining ability (SCA), and reciprocal effect (R). However, GLM is extremely sensitive to missing information (i.e. missing families in a diallel cross), which arises, for example, from variation in reproductive success of parent lines. To resolve this issue, we apply a Bayesian hierarchical model (i.e. Bayesian analysis) in analyzing diallel crosses of Pacific oysters. Our study suggests that the correlation between Bayesian analysis-predicted and observed yields is high, with coefficients of correlation above 0.99, for existing offspring, regardless of diallel completeness. Additionally, in the complete diallel analysis, GLM- and Bayesian analysis-based line-specific GCA rankings are consistent for parent lines worthy of further consideration. Finally, we test the reliability of the parent line selection for double-cross hybrids that are generated from crosses of two unrelated hybrid parents. In general, the Bayesian analysis-based parent line selection for double-cross hybrids becomes less accurate when non-parental lines (i.e. the four hybrid parents used to 19 predict the yield of double-cross hybrids) are missing in diallel crosses. However, this issue can be resolved by being cautious with estimates from missing non-parental families. Therefore, Bayesian analysis is powerful in selecting superior parent lines for producing high-yielding, hybrid, Pacific oyster seed. 1. Introduction The Pacific oyster Crassostrea gigas shows remarkable heterosis (hybrid vigor) for yield and its underlying components, growth and survival (Hedgecock et al., 1995; Launey and Hedgecock, 2001; Pace et al., 2006; Hedgecock and Davis, 2007; Plough and Hedgecock, 2011; Plough et al., 2016). Yield of F1 hybrids significantly exceeded that of the better-yielding parent in 16 of 22 cases (Hedgecock and Davis, 2007), meeting Griffing’s (1990) operational definition of heterosis, i.e. potence, ℎ 𝑝 = 𝑄 𝐿 ⁄ > 1 . 0, where Q is twice the deviation of a hybrid from the mid-parent value and L is the absolute difference between the parent trait-values. The widespread phenomenon of “better-parent” heterosis in crosses of inbred lines of the Pacific oyster, when evaluated in aquaculture systems, suggests that production of hybrid cultivars by crossbreeding has great potential to improve oyster yield (Hedgecock and Davis, 2007). In practice, diallel crosses rarely produce the next generation of all parental lines and all possible F1 hybrids due to reproductive failure or environmental factors (Hedgecock and Davis, 2007), and, even if they did, the calculation of potence for individual crosses is not nearly as useful as calculation of general combining ability (GCA, the additive contribution of a line to the performance of its offspring), specific combining ability (SCA, the deviation in the performance of a hybrid combination from the expected performance based on GCA of lines involved), and reciprocal effect (R), for the full set of crosses (Griffing, 1956). Estimates of these components of yield-variance allow selection of superior parent lines for commercial production of hybrid 20 oysters. Hedgecock and Davis (2007) showed that yields of the Pacific oyster increase with GCA, as expected (Langdon et al., 2003; de Melo et al., 2016) but that non-additive genetic components of yield variance, SCA and R, make substantial contributions to yield that are often larger than the contribution of GCA. This study also shows that application of the diallel cross to the Pacific oyster, using cages of full-siblings as the unit of replication and measurement, is an effective means for estimating components of yield-variance and for generating information useful for a commercial crossbreeding program. However, Hedgecock and Davis (2007) also revealed limitations of the generalized linear model with fixed effects (i.e. GLM), as implemented in DIALLEL-SAS05 (Zhang et al., 2005), for analyzing data from diallel crosses with missing data. Solutions to the problem of missing data included dropping parent lines, resulting in loss of power, and collapsing incomplete diallel crosses to complete, pseudo-partial diallel crosses, diminishing the scope of inference about SCA and R. Here, we compare previous results obtained from GLM with results from an alternative, Bayesian hierarchical model for analyzing data from diallel crosses (i.e. Bayesian analysis) (Lenarcic et al., 2012). The Bayesian analysis models additive, heterotic, epistatic, and parent-of- origin effects, while overcoming the difficulties posed by incomplete or imbalanced data and by data outliers. We find correspondences between the effects specified in the Bayesian hierarchical model and the classical components of yield-variance. We, then, further explore the reliability of estimates of GCA, SCA and R provided by Bayesian analysis of incomplete diallel crosses through simulation with two patterns of missing data, random loss of families and random loss of parent-line representation, mimicking the commonly observed reproductive failure of certain parent lines in diallel crosses of oysters (Lannan, 1980; Hedgecock and Davis, 2007). Finally, we focus on the reliability, in the face of missing data, of (1) ranking parent lines to identify those 21 worthy of further consideration or production of single-cross hybrids and (2) identifying F1 hybrids for production of double-cross hybrids. Jones (1918, 1922) promoted the use of double- cross hybrids in the early days of maize breeding, as a way of circumventing the handicap of producing F1 hybrid seed on an inbred parent; double-cross hybrids likely present the same advantage for the Pacific oyster, since inbred oysters are small and produce inadequate numbers of eggs for commercial hatchery production. As noted above, diallel crosses rarely produce the next generation of the parental lines, owing to inbreeding depression (Lannan, 1980; Launey and Hedgecock, 2001; Evans et al., 2004; Plough and Hedgecock, 2011). Since our main interest, here, is in the practical application of information from diallel crosses to the crossbreeding of superior hybrid cultivars, we focus on Griffing’s (1956) Method 3 for analyzing a diallel cross that yields data on all p(p−1) F1 hybrids from p parent lines. In this partial diallel, without inbred parents, we can estimate the extra- nuclear effects, R, causing differences between reciprocal hybrids. 2. Materials and Methods 2.1 Data on yield from diallel crosses Data on yield were collected from six diallel crosses, which are named by birth year and experiment number (e.g. 01x1 stands for the first experimental cross set up in 2001; Table 1). Analyses of data from four of these crosses were reported previously by Hedgecock and Davis (2007); data from the 12x1 and 12x2 crosses have not been analyzed previously. Inbred parent lines are named by a number, which is an abbreviation of the full family name described by Hedgecock and Davis (2007). First generation hybrid offspring (F1 families) are given a sire×dam name (e.g. 2×10 stands for offspring with paternal parent from family 2 and maternal parent from family 10). 22 Table 1. Summary of diallel crosses Diallel cross Phases Cross codes # parent lines Parent lines A) Incomplete diallel crosses 01x1 III, IV 01x1-III, -IV 6 2, 10, 35, 38, 46, 51 01x4 III, IV 01x4-III, -IV 7 9, 28, 33, 35, 41, 46, 53 03x6 II, III 03x6-II, -III 9 20, 26, 35, 36, 45, 47, 51, 52, 92 03x8 IV 03x8-IV 7 3, 9, 19, 21, 40, 61, 94 12x1 III 12x1-III 10 8, 15, 19, 20, 21, 24, 32, 33, 34, 35 12x2 III 12x2-III 8 7-39, 16, 5, 26, 2-39, 30, 36, 45 B) Complete diallel crosses 01x1 III, IV 01x1-IIIC, -IVC 5 2, 35, 38, 46, 51 03x6 II, III 03x6-IIC, -IIIC 5 26, 36, 45, 47, 92 For three diallel crosses, yield (biomass) data were obtained at different phases in the production cycle, as described by Hedgecock and Davis (2007). Phase II is indoor replicated nursery culture in upwelling tubes; Phase III is outdoor replicated nursery culture in suspended, rotating seed cages; Phase IV is final grow out of adults in on-bottom cages. The phenotype analyzed in this study was the mean live weight per individual oyster in a rearing unit at the end of a given phase of culture (i.e. total live weight of oysters per cage or tube divided by the count of live oysters). Since survival was generally high for all juveniles, mean live weight is a measure of biomass yield in this study. Altogether, we analyzed nine partial or incomplete diallel crosses and four complete diallel crosses (see Table 1 for diallel cross names, the number and names of parent lines, and culture phases for yield data). Complete diallel crosses were embedded within larger, incomplete diallel crosses. 2.2 Statistical models Traditionally, diallel crosses are analyzed by GLM (Griffing, 1956), Yijk = gi + gj + sij + rij + eijk, (1) 23 where gi is general (additive) combining ability (GCA) of paternal parent i, gj is GCA of maternal parent j, sij is specific (non-additive) combining ability (SCA) of hybrid i×j , rij is reciprocal effect (R) when reciprocal hybrids are included, accounting for differences between reciprocal hybrids i×j and j×i, and eijk is experimental error. In this study, diallel crosses are analyzed using a subset of the full Bayesian hierarchical model proposed by Lenarcic et al. (2012). For inbred families (j=k), Yjki = µ + aj + mj + ak – mk + bj=k + βinbred + ei, (2) where µ is grand mean of live weight per individual oyster in a diallel cross, aj and ak are additive (dosage) effects of parent lines, mj and mk are maternal effects of parent lines, bj=k is family-specific inbreeding penalty, βinbred is fixed inbreeding depression, and ei is error for an individual oyster i in the cross of maternal parent j × paternal parent k. Initial analyses of data from diallel crosses, which included inbred parents, produced some negative estimates of yield for inbred parent lines; given our primary interest in hybrid families, we exclude inbred families from further analyses. For hybrid families (j≠k), Yjki = µ + aj + mj + ak – mk + vjk + wjk + ei, (3) where µ, aj, ak, mj and mk are defined as above, except j≠k, vjk is family-specific symmetric effect for cross j×k (dam×sire), wjk is family-specific asymmetric effect for cross j×k (dam×sire), and ei is error for an individual oyster i. For reciprocal families, vjk = vkj and wjk = - wkj. Biologically, each component in the Bayesian hierarchical model has its own meaning (Lenarcic et al., 2012). Grand mean, µ, represents the overall average yield in oysters produced by a given diallel cross and reared in the same environment; µ can be viewed as a fixed yield possessed by every individual in this diallel cross. Additive effects are the (dosage) contribution 24 from paternal and maternal parents, as deviations from the grand mean. The maternal effect is a reflection of parent-of-origin effects due to the sex of an individual, which can be caused by factors including mitochondrial effects, uterine environments (in mammals), or state of reproductive maturation (in oysters) (Lannan, 1980). Family-specific symmetric and asymmetric effects are departures from the grand mean caused by interactions between pairs of parent lines. The symmetric effect is only dependent on the specific pair of parent lines, irrespective of which one serves as paternal or maternal parent, whereas the asymmetric effect results from differences between reciprocal crosses of the same pair of parents. The genetic components, GCA, SCA, and R, in GLM correspond to analogous terms in the Bayesian hierarchical model, according to their biological interpretations. GCA in GLM is equal to the parental additive effects in the Bayesian hierarchical model. For inbred families, SCA corresponds to the family-specific inbreeding penalty and the fixed inbreeding depression; for hybrid families, SCA corresponds to the family-specific symmetric effect. R corresponds to a combination of maternal and family-specific asymmetric effects for hybrid families. These genetic effects and the average live weights of families were predicted, using the output from Bayesian analysis (Bayesian output) carried out with BayesDiallel by R 3.0.2 (Lenarcic et al., 2012). 2.3 Diallel cross simulation To test the performance of Bayesian analysis, we simulated data from diallel crosses. According to equations (2) and (3), the simulated live weight is the sum of simulated genetic effects and simulated µ. For inbred families (j=k), Yjki (sim) = µ (sim) + aj (sim) + mj (sim) + ak (sim) – mk (sim) + bj=k (sim) + βinbred (sim); (4) for hybrid families (j≠k), 25 Yjki (sim)= µ (sim)+ aj (sim) + mj (sim) + ak (sim) – mk (sim) + vjk (sim) + wjk (sim); (5) where sim stands for “simulated” and µ, aj, mj, ak, mk, bj=k, βinbred, vjk and wjk are defined as above in equations (2) and (3). The live weight simulated here, Yjki (sim), is the mean live weight per individual cage. We therefore simulated mean live weight per individual per cage directly, rather than simulating live weights for all individuals grown in one cage first and then averaging them to get the mean live weight of full-siblings in a cage. Since yield should be positive, we took the absolute value of negative Y jki as the yield for corresponding cages. Genetic effects (i.e. a, m, v, w, bj=k, βinbred) and µ in equations (2) and (3) are assumed to be normally distributed, so two parameters, mean and standard deviation, are needed to simulate µ and genetic effects for each parent line, the sum of which is simulated live weight. Two factors need to be considered in diallel cross simulation, the phase when data on live weight are collected (i.e. Phase) and the pattern of missing (or present) families (i.e. Pattern). Mean and standard deviation for µ and genetic effects of seven inbred parent lines in Bayesian outputs for 12x1-III and 03x8-IV were used as mean and standard deviation to simulate µ and genetic effects. Live weights of Pacific oysters in 7×7 diallel crosses at juvenile (simulated based on 12x1-III Bayesian output) and adult (simulated based on 03x8-IV Bayesian output) life stages were generated. Since we are not interested in inbred families due to inbreeding depression, we removed all inbred families from simulated 7×7 diallel crosses. In total, we simulated 1000 complete diallel crosses (coded as CPLT) for juvenile and adult life stages, respectively, each consisting of 42 hybrid families. Assuming that each family is reared in ten cages, we simulated ten mean live weights (i.e. cage-average live weights) for each family, so 2000 CPLT diallel crosses, each consisting of 420 cages of Pacific oysters, were generated. Two types of missing-family patterns are usually observed in practice, diallel crosses with randomly 26 missing families (coded as R) and diallel crosses with missing rows or columns due to the loss of all hybrid families sharing a common parent line (coded as RRC). For simulated R diallel crosses, 21 hybrid families (50% of hybrid families) were randomly removed from CPLT diallel crosses, generating incomplete diallel crosses consisting of 21 hybrid families. For simulated RRC diallel crosses, a total of three rows or columns were randomly removed, producing incomplete diallel crosses consisting of 24 or 26 hybrid families. In total, there are four Phase- by-Pattern combinations of simulated diallel crosses (i.e. Juvenile-R, Juvenile-RRC, Adult-R, and Adult-RRC). Since 1000 CPLT diallel crosses were simulated for each life stage, 1000 different 7×7 diallel crosses were generated under each Phase-by-Pattern combination by randomly removing different families (for R diallel crosses) and different rows or columns (for RRC diallel crosses) from CPLT diallel crosses. In total, 4000 incomplete diallel crosses were simulated, among which R and RRC diallel crosses consist of 210 and 240 (or 260) cages, respectively. Each simulated diallel cross was coded as Phase-Pattern-Number. For example, Juvenile-CPLT-I stands for the first complete simulated diallel cross based on 12x1-III Bayesian output, Juvenile-R-II stands for the second R diallel cross simulated based on 12x1-III Bayesian output, and Adult-RRC-III stands for the third diallel cross simulated based on 03x8-IV Bayesian output. All simulated diallel crosses, CPLT, R and RRC, and the code in R for simulating diallel- cross data are available upon request. 2.4 Parent-line ranking Superior parent lines are selected according to line-specific GCA, which can be directly obtained from both Bayesian and GLM outputs (Griffing, 1956). For each complete diallel cross (01x1-IIIC, 01x1-IVC, 03x6-IIC, 03x6-IIIC), line-specific GCA estimates from GLM and Bayesian analysis were compared. For each simulated diallel cross, Bayesian analysis was 27 conducted to estimate GCA. We selected the top three parent lines in each simulated diallel cross, according to line-specific GCA (Griffing, 1956), and then compared between simulated complete and incomplete diallel crosses of the same Phase-by-Number combination. If a parent line is selected as a top one in simulated incomplete diallel cross, but not in simulated complete diallel cross, then this parent line is counted as a mismatch of top parent lines between simulated complete and incomplete diallel crosses. The number of mismatches in top parent lines between simulated complete and incomplete diallel crosses was counted. While Griffing (1956) takes the variance of GCA and SCA into consideration, when selecting superior parent lines, only GCA was used for parent line selection here. Meanwhile, to test how the number of missing families influences GCA ranking, the difference in ranks of GCA for a parent line between simulated complete and incomplete diallel crosses of the same Phase-by-Number combination was calculated and compared against the number of missing families for the corresponding parent line. 2.5 Double-cross hybrid parent selection Double-cross hybrids are generated from crosses of two unrelated hybrid parents that do not share a common parent. The yield of double-cross hybrids can be predicted based on the yield of hybrid families in a diallel cross according to Method B of Jenkins (1934), in which Y(A×B)×(C×D) = (YA×C + YA×D + YB×C + Y B×D) ÷ 4, (6) where Y(A×B)×(C×D) is the predicted yield of a double-cross hybrid AB×CD and YA×C, YA×D, YB×C and YB×D are the observed yields of hybrid families A×C, A×D, B×C and B×D, respectively. Families A×B and C×D are called parental lines because they serve as parents for the double- cross hybrid; families A×C, A×D, B×C and B×D are called non-parental lines, because they are only used for predicting the yield of the double-cross hybrid and do not serve as parents. 28 We need to accurately estimate the yield of missing non-parental lines, which is used for predicting double-cross hybrid yield, in order to correctly pick double-cross hybrid parents. We tested the accuracy of predicted yields of non-parental lines using r 2 , the coefficient of correlation between predicted yields in simulated complete and incomplete (R or RRC) diallel crosses, for missing, existing and all families in a diallel cross. 3. Results 3.1 Predictions by GLM and Bayesian methods in diallel cross analysis In analyses of complete diallel crosses, correlations between observed yield and predicted yield from GLM and Bayesian analysis are high for all hybrid families in four, 5×5 complete diallel crosses, with correlation coefficients above 0.99 for both methods (Table 2). In analyses of incomplete diallel crosses, even though as many as 67% of families are missing, the correlation coefficient, r 2 , for linear regressions of Bayesian-predicted yield on observed yield for existing families ranges from 0.968 to 1 (Table 3). Predictions of line-specific GCA ranking by GLM and Bayesian analysis are consistent for almost all parent lines in four, 5×5 complete diallel crosses. The exceptions are predicted line- specific GCA for parents 38 and 51 in 01x1-IVC and parent 45 in 03x6-IIIC (Fig. 2). 3.2 Selection of superior parent lines for F1 and double-cross hybrids An average number of 0.76, 0.8, 0.24 and 0.45 out of three top parent lines are incorrectly selected for diallel crosses Juvenile-R, Juvenile-RRC, Adult-R and Adult-RRC, respectively (Table 4). The difference in ranks of line-specific GCA for a parent line between simulated complete and incomplete diallel crosses does not seem to depend heavily on the number of missing families for the parent line (Fig. 3). 29 In order to select elite parents for double-cross hybrids, we need to evaluate whether predicted yields for non-parental lines by Bayesian analysis are reliable. Correlation coefficients of predicted yields between simulated complete and incomplete diallel crosses are above 0.84 for existing families, which are higher than those for missing or all families in diallel crosses (Fig. 4). Predictions on yields for missing families are poor except for simulated R diallel crosses at adult stage, with correlation coefficients clustering around 0.85 (Fig. 4). Table 2. Relationships among GLM-predicted, Bayesian-predicted, and observed yields in analyses of 5×5 complete diallel crosses Regression equation a r 2 Bayesian-predicted versus GLM-predicted yield (x: Bayesian-predicted yield; y: GLM-predicted yield) 01x1-IIIC y = 1.0304x + 0.0352 0.9976 01x1-IVC b y = 1.0206x – 77.394 0.9963 03x6-IIC y = 0.9928x – 0.0161 0.9993 03x6-IIIC y = 1.3416x – 1.7971 0.9904 GLM-predicted versus observed yield (x: observed yield; y: GLM-predicted yield) 01x1-IIIC y = 1.0133x + 0.0561 0.9974 01x1-IVC y = 0.9917x + 8.3825 0.9975 03x6-IIC y = x - 0.061 1 03x6-IIIC y = x - 0.0375 1 Bayesian-predicted versus observed yield (x: observed yield; y: Bayesian-predicted yield) 01x1-IIIC y = 0.9833x + 0.0204 0.9997 01x1-IVC y = 1.0292x – 86.783 0.999 03x6-IIC y = 1.0066x – 0.0004 0.9993 03x6-IIIC y = 0.7383x + 1.3485 0.9904 a Data on which Bayesian analysis is conducted are live weight per individual (g/individual). b Observed and predicted yields for diallel cross 01x1-IV are g/cage. 30 Table 3. Predictions of yield by Bayesian analysis for existing families in incomplete diallel crosses Diallel cross # families attempted # families obtained Observed yield (g/individual) (mean ± s.d.) Predicted yield (g/individual) (mean ± s.d.) Regression equation a r 2 01x1-III 30 25 1.17±0.24 1.17±0.22 y = 0.98x+0.02 1 01x1-IV b 30 25 3214±475 3193±424 y = 1.04x-117.68 0.998 01x4-III 42 21 0.95±0.17 0.98±0.14 y = 0.97x + 0.03 1 01x4-IV 42 21 25.55±4.44 25.62±3.25 y = 0.81x+4.89 0.968 03x6-II 72 44 0.06±0.02 0.06±0.01 y = x-0.0001 0.999 03x6-III 72 44 5.18±0.92 5.18±0.57 y = 0.77x+1.19 0.984 03x8-IV 42 26 60.04±8.43 60.25±6.74 y = 0.94x+3.77 0.996 12x1-III 90 29 0.82±0.27 0.84±0.18 y = 0.96x+0.03 1 12x2-III 56 25 1.75±0.46 1.79±0.34 y = 0.96x + 0.07 0.999 a Linear regressions of Bayesian-predicted yield [dependent variable, y] on observed yield [independent variable, x] for existing families in incomplete diallel crosses. b Observed and predicted yields for diallel cross 01x1-IV are g/cage. Table 4. The number of mismatches in top three parent lines in simulated complete (CPLT) and incomplete (R and RRC) diallel crosses CPLT vs Juvenile-R CPLT vs Juvenile-RRC CPLT vs Adult-R CPLT vs Adult-RRC # mismatches per simulation # simulations 1 560 580 237 409 2 100 102 0 21 3 1 5 0 0 Total # mismatches a 763 799 237 451 Total # parent lines to keep b 3000 3000 3000 3000 Average # mismatches c 0.76 0.8 0.24 0.45 a Total # mismatches: the total number of mismatches in top three parent lines between simulated complete and incomplete diallel crosses across 1000 simulated diallel crosses of the same Phase- by-Pattern combination. b Total # parent lines to keep: the total number of top three parent lines that need to be kept across 1000 simulated diallel crosses of the same Phase-by-Pattern combination. c Average # mismatches = Total # mismatches / 1000 simulations. 31 -0.3 -0.2 -0.1 0 0.1 0.2 -0.25 -0.15 -0.05 0.05 0.15 01x1-IIIC 0 0.02 0.04 0.06 0.08 -0.3 -0.2 -0.1 0 0.1 0.2 01x1-IVC -1 -0.5 0 0.5 1 -0.4 -0.2 0 0.2 0.4 0.6 03x6-IIC -0.8 -0.5 -0.2 0.1 0.4 0.7 1 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 03x6-IIIC GLM-predicted Bayesian-predicted 45 51 38 Figure 2. Bayesian-predicted vs GLM-predicted line-specific general combining abilities. 32 Figure 3. Difference in rank of line-specific GCA between simulated complete (CPLT) and incomplete (R and RRC) diallel crosses. Number of missing families per parent line is the number of missing families among all hybrid families sharing one common parent line. CPLT-R: the rank of line-specific GCA for a parent line in diallel cross CPLT minus the rank of line-specific GCA for the same parent line in diallel cross R. CPLT-RRC: the rank of line-specific GCA for a parent line in diallel cross CPLT minus the rank of line-specific GCA for the same parent line in diallel cross RRC. -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 0 1 2 3 4 5 6 7 8 9 10 11 12 Difference in parent ranks between complete and incomplete diallels Number of missing families per parent line Juvenile CPLT-R CPLT-RRC -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 0 1 2 3 4 5 6 7 8 9 10 11 12 Difference in parent ranks between complete and incomplete diallels Number of missing families per parent line Adult CPLT-R CPLT-RRC 33 0 20 40 60 80 100 120 140 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 # simulations Adult R All 0 30 60 90 120 150 180 210 240 270 300 330 360 390 420 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 Adult R Existing 0 10 20 30 40 50 60 70 80 90 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 Adult R Missing 0 10 20 30 40 50 60 70 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 # simulations Adult RRC All 0 100 200 300 400 500 600 700 800 900 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 Adult RRC Existing 0 5 10 15 20 25 30 35 40 45 50 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 Adult RRC Missing r 2 34 Figure 4. Histograms of correlation coefficients (r 2 ) of predicted yields between simulated complete and incomplete diallel crosses for all, existing and missing families in diallel crosses. Juvenile and Adult represent diallel crosses simulated based on 12x1-III and 03x8-IV Bayesian outputs, respectively; R and RRC stand for two missing patterns; All, Existing and Missing indicate correlation coefficients of predicted yields between simulated complete and incomplete diallel crosses for all, existing and missing families, respectively. 0 10 20 30 40 50 60 70 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 # simulations Juvenile R All 0 100 200 300 400 500 600 700 800 900 1000 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 Juvenile R Existing 0 10 20 30 40 50 60 70 80 90 100 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 Juvenile R Missing 0 10 20 30 40 50 60 70 80 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 # simulations Juvenile RRC All 0 100 200 300 400 500 600 700 800 900 1000 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 Juvenile RRC Existing 0 20 40 60 80 100 120 140 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 Juvenile RRC Missing r 2 35 4. Discussion 4.1 Comparison of GLM and Bayesian analysis Diallel analysis is very important in partitioning variance in a trait into the genetic components of general combining ability (GCA), specific combining ability (SCA) and reciprocal effect (R). In our study of complete diallel crosses, more than 99% of the variance in observed yields at Phases II to IV can be explained by predicted yields from GLM and Bayesian analysis (Table 2). This indicates that both GLM and Bayesian analysis are powerful tools for complete diallel analysis, regardless of which life stage furnishes the yield data. However, in practice, unanticipated missing information makes it hard to extract information from a diallel cross. While GLM can only easily deal with complete diallel or half diallel crosses, it cannot deal well with missing data. Bayesian analysis, on the other hand, is less restricted by the completeness of a diallel cross. As our simulations show, even when more than half of families are lost from a diallel cross, Bayesian analysis can still effectively estimate the genetic components GCA, SCA and R, and accurately predict yield for existing families (Table 3). Thus, Bayesian analysis is highly tolerant of missing information and a better method than GLM for diallel analysis. 4.2 Line-specific GCA ranking and parent line selection We selected top parent lines according to their line-specific GCA (Griffing, 1956). Comparing ranks of line-specific GCA estimates from GLM and Bayesian analysis, we find that the two methods yield consistent rankings for most parent lines in four, 5×5 complete diallel crosses. The few exceptions are parent lines having negative GCA effects (38 and 51 in 01x1- IVC and 45 in 03x6-IIIC; Fig. 2), which would not receive further consideration in any case. Therefore, Bayesian analysis provides reliable information for selecting superior parent lines in complete diallel crosses. 36 At least two out of three top parent lines can be correctly selected according to line- specific GCA predicted by Bayesian analysis regardless of missing pattern or life stage, suggesting that the Bayesian hierarchical model is reliable in selecting superior parent lines for setting up F1 hybrids (Table 4). Parent line selection is more accurate for R than for RRC diallel crosses, especially at Adult stage, because less information is available for a specific parent line when entire rows or columns in diallel crosses are missing due to factors such as reproductive failure (Table 4). In contrast, randomly missing families due to environmental factors does not appear to be a severe issue in analyzing diallel crosses and picking superior parent lines for F1 hybrids. For each missing pattern, parent line selection is more reliable for the Adult than for the Juvenile stage (Table 4). The better performance of Bayesian analysis-based parent line selection for Adult diallel cross is supported by the smaller difference in ranks of line-specific GCA between simulated complete and incomplete diallel crosses at the Adult stage (Fig. 3). We find that the number of missing families involving a parent line do not greatly affect the ranking accuracy of line-specific GCA. In general, the same number of missing families for a parent line results in a greater difference in line-specific GCA ranking between simulated complete and incomplete diallel crosses for juveniles than for adults (Fig. 3). Life stage appears to make a difference in reliability of line-specific GCA ranking, but this may be an artifact of the different signal-to-noise ratios (i.e. variance among families in a diallel cross divided by variance within families in a diallel cross) between simulated Juvenile and Adult diallel crosses. Mean signal-to- noise ratios for diallel crosses Juvenile-R, Juvenile-RRC, Adult-R and Adult-RRC are 1.66, 1.61, 6.49 and 6.15, respectively. This suggests that the accuracy of top parent line selection may be lowered by a smaller signal-to-noise ratio of yield data collected from a diallel cross. Therefore, 37 the signal-to-noise ratio of oyster yields should be increased as much as possible by setting up more replicates for a family in practice, in order to correctly pick top parent lines for F1 hybrids. 4.3 Double-cross hybrids Since inbred oysters are small and produce inadequate numbers of eggs, we want to set up double-cross hybrid families with two unrelated hybrid parents to resolve the issue of producing F1 hybrid families on inbred parents. We select elite parents for double-cross hybrids according to the performance of double-cross hybrids, which can be estimated with yields of F1 non- parental lines (Jenkins, 1934). Analyses on both real and simulated incomplete diallel crosses demonstrate that the Bayesian hierarchical model can accurately predict the yield of existing families (Table 3, Fig. 4). Therefore, we are confident in the parents we select for double-cross hybrids when all non-parental lines are present in a diallel cross and thus the double-cross hybrid yield can be accurately estimated. In practice, some non-parental lines are lost in diallel crosses. In this case, we need to be cautious in selecting superior parents for double-cross hybrids. The prediction on yields of the Adult stage appear to be better than that of the Juvenile stage; at the Adult stage, the yield is more accurately estimated for R diallel cross than for RRC diallel cross (Fig. 4). This suggests that the selection of superior parents for double-cross hybrids is the most reliable for R diallel cross at Adult stage because, with a more accurately estimated yield of F1 hybrids, double-cross hybrid yields can be better predicted. Again, the better Bayesian analysis-based prediction at the Adult stage may be driven by a larger signal-to-noise ratio of yield data collected at later life stage. Thus, as the parent line selection for F1 hybrids, the design of diallel crosses should aim to increase the signal-to-noise ratio of yield data. Like that for F1 hybrids, the parent line selection for double-cross hybrids appear to be more reliable in incomplete diallel crosses with 38 environmental factor-driven randomly missing families than in those with reproductive failure- driven missing families. The poorer prediction on the yield of double-cross hybrids for RRC diallel crosses is largely driven by the inaccurately estimated yield of missing non-parental families, which, in turn, is caused by the overestimate or underestimate on the genetic components of inbred parent lines. While the estimation on F1 yields for randomly missing families at later life stage (i.e. with a larger signal-to-noise ratio) appears to be better than that for reproductive failure-driven missing families at early life stage, the overall prediction on yields for missing families in diallel crosses does not seem to be very reliable for selecting parents for double-cross hybrids (Fig. 4). The correlation coefficients of predicted yields between simulated complete and incomplete diallel crosses are mostly around or below 0.5, meaning that the double-cross hybrid yield may not be accurately estimated especially as the number of missing non-parental lines increases (Fig. 4). However, in practice, this issue could be resolved by being cautious with the yield estimated with missing non-parental families when we pick parents to produce double-cross hybrids. More importantly, in our study, we assume that 50% of families are missing in a simulated diallel cross, but usually more families can survive in reality. On one hand, availability of more unrelated hybrid families in a diallel cross can generate a sufficient number of double- cross hybrid families without any non-parental line missing. On the other hand, a larger number of surviving families in a diallel cross can decrease the number of missing non-parental lines for potential double-cross hybrids, thus improving the accuracy of estimates on the double-cross hybrid yield. Even if such a high proportion (i.e. 50%) of families are missing in practice, we usually reduce the dimension of diallel crosses for analysis, in which case the accuracy of 39 predicted yields can be improved. Therefore, compared to our simulated study, selecting parents for setting up double-cross hybrids in practice should be more reliable. 40 References de Melo, C.M.R., Durland, E., Langdon, C., 2016. Improvements in desirable traits of the Pacific oyster, Crassostrea gigas, as a result of five generations of selection on the West Coast, USA. Aquaculture 460, 105–115. Evans, F., Matson, S., Brake, J., Langdon, C., 2004. The effects of inbreeding on performance traits of adult Pacific oysters (Crassostrea gigas). Aquaculture 230, 89–98. Griffing, B., 1956. Concept of general and specific combining ability in relation to diallel crossing systems. Aust. Jnl. of Bio. Sci. 9, 463–493. Griffing, B., 1990. Use of a controlled-nutrient experiment to test heterosis hypotheses. Genetics 126, 753–767. Hedgecock, D., Davis, J.P., 2007. Heterosis for yield and crossbreeding of the Pacific oyster Crassostrea gigas. Aquaculture, Supplement: Genetics in Aquaculture IX 272, Supplement 1, S17–S29. Hedgecock, D., McGoldrick, D.J., Bayne, B.L., 1995. Hybrid vigor in Pacific oysters: an experimental approach using crosses among inbred lines. Aquaculture, Genetics in Aquaculture V Proceedings of the Fifth International Symposium on Genetics in Aquaculture 137, 285–298. Jenkins, M.T., 1934. Methods of estimating the performance of double crosses in corn. Agronomy Journal 26, 199–204. Jones, D.F., 1918. The effects of inbreeding and crossbreeding upon development. Conn. Agric. Exp. Stn. Bull. 107. 100 pp. Jones, D.F., 1922. The productiveness of single and double first generation corn hybrids. J. Am. Soc. Agron. 14: 242–252. Langdon, C., Evans, F., Jacobson, D., Blouin, M., 2003. Yields of cultured Pacific oysters Crassostrea gigas Thunberg improved after one generation of selection. Aquaculture 220, 227– 244. Lannan, J.E., 1980. Broodstock management of Crassostrea gigas: I. Genetic and environmental variation in survival in the larval rearing system. Aquaculture 21, 323–336. Launey, S., Hedgecock, D., 2001. High genetic load in the Pacific oyster Crassostrea gigas. Genetics 159, 255–265. Lenarcic, A.B., Svenson, K.L., Churchill, G.A., Valdar, W., 2012. A general Bayesian approach to analyzing diallel crosses of inbred strains. Genetics 190, 413–435. Pace, D.A., Marsh, A.G., Leong, P.K., Green, A.J., Hedgecock, D., Manahan, D.T., 2006. Physiological bases of genetically determined variation in growth of marine invertebrate larvae: A study of growth heterosis in the bivalve Crassostrea gigas. Journal of Experimental Marine Biology and Ecology 335, 188–209. 41 Plough, L.V., Hedgecock, D., 2011. Quantitative trait locus analysis of stage-specific inbreeding depression in the Pacific oyster Crassostrea gigas. Genetics 189, 1473–1486. Plough, L.V., Shin, G., Hedgecock, D., 2016. Genetic inviability is a major driver of type III survivorship in experimental families of a highly fecund marine bivalve. Mol. Ecol. 25, 895–910. Zhang, Y., Kang, M.S., Lamkey, K.R., 2005. DIALLEL-SAS05: a comprehensive program for Griffing’s and Gardner-Eberhart analyses. Agron. J. 97, 1097–1106. 42 Chapter 2 High-Density Linkage Maps Constructed with SNPs from Genotyping by Sequencing for the Pacific Oyster Crassostrea gigas Abstract The Pacific oyster is an excellent model to explore genetic causes for various phenomena in marine molluscs and has been widely studied in breeding programs. Linkage mapping is an indispensable step for QTL mapping and marker-assisted selection, which are important tools for genetic analysis and breeding programs. Three linkage maps have been constructed with amplified fragment length polymorphism (AFLP) markers, microsatellite DNA markers and single nucleotide polymorphism (SNP) markers for the Pacific oyster. However, the low density of linkage maps make them less powerful in genetic analysis; moreover, the inability to compare these linkage maps across multiple studies makes them less reliable. In this study, we construct high-density linkage maps with SNPs, generated from Illumina sequencing (genotyping-by- sequencing, GBS), for six interrelated F2 families, using JoinMap 4.1 and Lep-MAP2. Newly constructed linkage maps are denser than the first- and second-generation linkage maps, with map lengths ranging from 454.6 cM to 589.7 cM and an average marker spacing of 0.35 cM to 0.63 cM. Linkage maps constructed using Regression (RG map), Maximum Likelihood (ML map) and Lep-MAP2 (LEP map) methods are highly consistent. Then, a consensus map is constructed with linkage maps from six families using MergeMap (MM map) and LPmerge (LP map). A total number of 5,385 markers are involved in consensus maps, generating a MM map of 1,234.4 cM and a LP map of 646.07 cM. Lengths of RG and LP maps are closer to the estimated length of the C. gigas genome, so they are taken as the final maps. Since linkage and 43 consensus maps constructed for different families using different methods are highly consistent, the maps constructed in this study are reliable for being applied in future studies. 1. Introduction As a species that has been introduced from Asia to all continents but Antarctica (Mann, 1979), the Pacific oyster Crassostrea gigas is of high scientific and commercial value. Many interesting phenomena, including distortions of Mendelian segregation ratios, heterosis in growth, polygenic sex determination and sex reversal, and variation in shells, have been observed in the Pacific oyster and other bivalve molluscs (Hedrick and Muona, 1990; Guo et al., 1998; Hedgecock and Davis, 2007; Hedrick and Hedgecock, 2010; Plough and Hedgecock, 2011; Ge et al., 2014, 2015a; Plough, 2016). The availability of a C. gigas genome (Zhang et al., 2012) and pedigreed lines (Curole and Hedgecock, 2007) make the Pacific oyster an excellent model to reveal potential causes for these phenomena. Among methods available for genetic analyses (i.e. Mendelian inheritance analysis, traditional quantitative genetic methods, quantitative trait locus mapping, genome-wide association analysis), quantitative trait locus (QTL) mapping is one of the most cost-efficient and powerful tools for uncovering genetic mechanisms underlying mortality, growth, sex determination and shell variation in marine species. Furthermore, the Pacific oyster has become one of the most commercially important species and thus many breeding programs have been conducted on it (Hershberger et al., 1984; Langdon et al., 2003; Dégremont et al., 2010). Marker-assisted selection plays crucial roles in breeding programs to improve the yield of Pacific oysters. Linkage mapping is an indispensable step for QTL mapping and marker-assisted selection (Lander and Botstein, 1989; Doerge, 2002; Hu and Xu, 2009). Linkage maps have been constructed for many species in molluscs (i.e. oyster, scallop, mussel, clam, abalone), using 44 amplified fragment length polymorphism (AFLP) markers (Yu and Guo, 2003, 2006; Li et al., 2005; Liu et al., 2006, 2017; Wang et al., 2007; Lallias et al., 2007a, b; Qin et al., 2007; Xu et al., 2008; Yuan et al., 2010; Shi et al., 2010; Guo et al., 2012; Ge et al., 2014, 2015b), randomly amplified polymorphic DNA (RAPD) markers (Liu et al., 2006), microsatellite DNA markers (Yu and Guo, 2003, 2006; Hubert and Hedgecock, 2004; Liu et al., 2006, 2017; Baranski et al., 2006; Wang et al., 2007; Qin et al., 2007; Lallias et al., 2007b; Xu et al., 2008; Zhan et al., 2009; Yuan et al., 2010; Sauvage et al., 2010; Li et al., 2012a, b; Petersen et al., 2012; Zhong et al., 2014; Bai et al., 2015, 2016; Hedgecock et al., 2015), and single nucleotide polymorphism (SNP) markers (Sauvage et al., 2010; Jones et al., 2013; Shi et al., 2014; Li and He, 2014; Zhong et al., 2014; Hedgecock et al., 2015; Harrang et al., 2015; Ren et al., 2016; Wang et al., 2016; Bai et al., 2016). With linkage maps, these studies perform genetic analyses on a variety of ecologically and economically important traits in molluscs, including growth, sex, shell color and shape, resistance to disease and resistance to mortality. Three linkage maps have been constructed for the Pacific oyster so far, which are mainly based on AFLP markers, microsatellite DNA markers and SNPs, respectively (Li and Guo, 2004; Hubert and Hedgecock, 2004; Hedgecock et al., 2015). The first published linkage map for the Pacific oyster was constructed by Li and Guo (2004) using AFLPs; their male map consists of 10 linkage groups with an average marker spacing of 8.8 cM and their female map consists of 11 linkage groups with an average marker spacing of 9.5 cM (Li and Guo, 2004). While AFLP- based linkage maps provide an essential basis for genetic analyses in the Pacific oyster, the poor transfer of AFLPs across crosses and populations restricts the application of these maps in practice (Li et al., 2003). Shortly after this AFLP-based linkage map was published, a linkage map based on microsatellite DNA markers was constructed for the Pacific oyster by Hubert and 45 Hedgecock (2004). The male linkage map includes 11 linkage groups with an average marker spacing of 8.0 cM and the female linkage map includes 12 linkage groups with an average marker spacing of 10.4 cM (Hubert and Hedgecock, 2004). Alignment of microsatellite DNA marker-based male and female linkage maps indicates 10 linkage groups, which is consistent with the number of haploid chromosomes observed in the Pacific oyster (Hubert and Hedgecock, 2004). The easy transfer of microsatellite DNA makers across families makes it possible to apply microsatellite DNA marker-based linkage maps widely in genetic analyses, such as QTL mapping (Plough and Hedgecock, 2011; Plough, 2012; Plough et al., 2016). However, the low density of microsatellite DNA marker-based linkage maps may make the power of detecting genetic factors and mechanisms limited. To improve linkage maps of the Pacific oyster, Hedgecock et al. (2015) constructed second-generation linkage maps for five families, using SNPs and microsatellite DNA markers. Consisting of 10 linkage groups, the second-generation linkage map has a grand mean length of 588 cM and an average marker spacing of 1.0 cM (Hedgecock et al., 2015). Meanwhile, the second-generation linkage map suggests widespread errors in scaffold assemblies of the C. gigas genome and ascribes difficulties in linkage mapping to Mendelian segregation distortion, uni-parentally segregating markers and genotyping errors (Hedgecock et al., 2015). With the advent of high-throughput sequencing technology, the generation of a large number of markers has become feasible and more cost efficient for non-model organisms, making it easier to construct high-density linkage maps for marine species such as the Pacific oyster. Recently, a high-density linkage map has been constructed for a hybrid family of C. gigas and C. angulata (Wang et al., 2016), using SNPs generated from genotyping-by-sequencing (GBS; Bentley et al., 2008; Elshire et al., 2011; Wang et al., 2016). In this study, imputation was 46 performed on uni-parentally segregating markers to correct genotyping errors and fill missing genotype values, which are found to be common problems of high-throughput sequencing such as GBS (Ward et al., 2013; Wang et al., 2016). However, the failure of Wang et al. (2016) to identify the SNPs used for their linkage map makes it hard to compare their and previous linkage maps. While previous studies provide valuable genetic maps for the Pacific oyster, linkage maps and consensus maps in these studies are mostly constructed using only one method or software. This makes it hard to compare linkage or consensus maps made using different methods to evaluate the reliability of them. Here, we construct high-density linkage maps based on SNPs from GBS for six interrelated F2 families (sire×dam) of Pacific oysters (i.e. 23×31, 23×40, 31×23, 40×92, 47×92 and 92×40), using Regression (RG maps), Maximum Likelihood (ML maps) and Lep-MAP2 (LEP maps) methods (Ooijen, 2011; Rastas et al., 2015). Then, we make consensus maps by including single RG maps from all six families using MergeMap (MM map) and LPmerge (LP map) (Wu et al., 2011; Endelman and Plomion, 2014). RG maps and LP maps are selected as final linkage maps and consensus maps, respectively. Final linkage maps in this study, with an average marker spacing of 0.49 cM, are denser than previous linkage maps. Finally, we make comparisons between (1) linkage maps constructed with Regression, Maximum Likelihood and Lep-MAP2 methods, (2) consensus maps made with MergeMap and LPmerge, and (3) LP maps and corresponding single RG maps for constructing consensus maps. Since all methods yield maps with highly correlated marker orders, with only a few exceptions, the linkage and consensus maps constructed in our study are reliable. 47 2. Materials and Methods 2.1 Mapping families We used six, interrelated F2 families (sire×dam), 23×31, 23×40, 31×23, 40×92, 47×92 and 92×40, for linkage mapping. Each F2 family was derived from an intercross of full-sib F1 hybrids, which, in turn, were produced by crosses of partially inbred lines 23, 31, 40, 47 and 92 in 2009. We made these F2 families in May and June of 2011 and reared them through the summer of 2012 in Thorndyke Bay, WA. Detailed pedigree and rearing information is described and shown in 3. Pedigrees and Rearing of Six, F2 Interrelated Families and Figure 1 in Introduction. 2.2 Genotyping-by-Sequencing, GATK analysis, SNP discovery and parentage analysis The six F2 families were harvested in September or October of 2012 and shipped to the University of Southern California, where adductor muscle tissue was dissected and preserved in 70% ethanol for later DNA extraction. Sixty-four out of 1,488 tagged oysters were lost from June (or July for 47×92) to October (or September for 47×92 and 92×40), so we ended up with 1,424 F2 progeny for genotyping. We obtained genotype data using genotyping-by-sequencing (GBS) and the follow-up bioinformatics analyses guided by Genome Analysis Toolkit (GATK, https://www.broadinstitute.org/gatk/) (Bentley et al., 2008; Elshire et al., 2011). GBS involved two steps, library preparation and sequencing. To construct libraries for sequencing, we first extracted DNA from the 12, F1 hybrid parents and a total of 1,424, F2 hybrid progeny, following the DNeasy 96 Procotol, Purification of Total DNA from Animal Tissues (Qiagen, https://www.qiagen.com/us/shop/sample-technologies/dna/genomic-dna/dneasy-blood- and-tissue-kit/#resources), with minor modifications. We checked the quality of the extracted DNA by agarose gel electrophoresis, quantified DNA in each sample, following the protocol of 48 Quant-iT TM PicoGreen® dsDNA Assay Kit, and diluted or concentrated the sample DNA to a working concentration of 10 ng/µl. To make libraries for sequencing, we digested 100 ng of extracted genomic DNA with the restriction enzyme ApoI, in buffer, at 50 ºC for 2 h, then at 80 ºC for 20 m. We ligated common and barcoded adapters (designed with http://www.deenabio.com/services/gbs-adapters) to genomic DNA by incubating the mixture of digested genomic DNA, T4 DNA ligase (NEB # M0202L), H2O, and 10 T4 DNA ligase reaction buffer at 22 ºC for 60 m, 65 ºC for 30 m and 4 ºC for cooling. We then pooled and cleaned the ligated products from different samples. We amplified the pooled products in 2 µl DNA template, 21 µl H2O, 25 µl NEB 2 Taq Master mix (NEB # M0270S), and 2 µl Primer mix, using the PCR program, 5 m at 72 ºC, 30 s at 98 ºC, 14 cycles (10 s at 98 ºC, 30 s at 65 ºC, 30 s at 72 ºC), 5 m at 72 ºC, and holding at 4 ºC. We then purified the amplified products by agarose gel electrophoresis, following protocols of QIAquick® PCR Purification Kit, and extracted the DNA fragments, following the protocol of MinElute® Gel Extraction Kit. Once libraries were constructed, we checked their size distributions on an Agilent Bioanalyzer and quantified their concentration using NanoDrop™ 2000 spectrophotometer. In all, we constructed 18 libraries for the six families of Pacific oysters and sent them to the University of Southern California (USC) Genome & Cytometry Core for sequencing. Seven libraries for family 23 31 were constructed with 48 individuals and the remaining libraries were constructed with 96 individuals; each library was sequenced in a single lane on an Illumina HiSeq instrument. We processed the GBS data on the Linux clusters of the University of Southern California’s Center for High-Performance Computing. Sequences that matched a barcode (one mismatch allowed), followed by the ApoI remnant site (one mismatch allowed), were assigned to the corresponding sample using a custom script in Python. The script also truncated reads having 49 a full cut site or the beginning of the common adapter. We aligned reads to the C. gigas genome (Zhang et al., 2012), using the Burrows-Wheeler alignment tool (BWA v0.7.8, MEM algorithm), and processed alignments with the GATK software package v3.3.0 (McKenna et al., 2010) for local realignment around indels and base quality score recalibration. Variant and genotype calling were done with the GATK HaplotypeCaller tool and refined by variant quality score recalibration with the same software (Van der Auwera et al., 2013). To carry out this process, a training set was prepared, using replicated individuals. Data from those individuals were filtered for the number of reads (minimum of 15), genotyping quality (minimum of 20), and sites, for which replicates had the same genotype and allelic balance was between 0.35 and 0.65. After processing data with GATK, we filtered the results using VCFtools v0.1.12b. We removed indels and kept biallelic sites with genotypes having a minimum genotyping quality of 30 and a minimum of five reads per site per individual. We excluded sites genotyped for less than 80% of individuals and individuals with data for less than 70% of sites. After excluding sites with missing data for parents, sites that were homozygous in both parents, and sites at which offspring had monomorphic or unexpected genotypes, we coded the remaining genotypes into JoinMap format. Genotyping error rate was 2-3%. We named each SNP locus by its location on the C. gigas genome (scaffold number and nucleotide position). For example, 1694- G204948A stands for nucleotide 204,948 in scaffold 1694 with G as the reference allele and A as the alternate allele. Pedigree of all well genotyped progeny oysters (n=1,166) was confirmed by the relatedness analysis (Manichaikul et al., 2010) and CERVUS (Marshall et al., 1998) and 125 individuals, which did not match their parents or siblings, were removed. In total, we collected genotypes for 12, parent oysters from six F1 families and 1,041 progeny oysters from six, F2 50 families (i.e. 23×31: n=208; 23×40: n=181; 31×23: n=142; 40×92: n=258; 47×92: n=127; 92×40: n=125) for linkage mapping. 2.3 Linkage analysis Linkage analyses were conducted with JoinMap 4.1 (Ooijen, 2011), using the cross- pollinated (CP) coding of genotypes to accommodate different mating types. We observed three mating types in F2 families, hk×hk, lm×ll and nn×np, among which hk×hk represented bi- parentally segregating markers, while lm×ll and nn×np represented maternally and paternally segregating markers, respectively. To correct genotyping errors and fill missing genotype information, uni-parentally segregating markers were imputed using the program Maskov according to Ward et al. (2013). We pooled imputed, uni-parentally and raw, bi-parentally segregating markers together for linkage mapping. We excluded loci with fewer than 10% of individuals genotyped per family. To increase the efficiency of linkage mapping, we retained only one marker from each set of identical markers identified by JoinMap. Markers were grouped, using the independence LOD, for 23×31, 23×40, 31×23, 40×92, 47×92 and 92×40; LOD ranges for the six families were 2-26, 2-22, 2-10, 2-26, 2-20 and 2-10, respectively. We used both Regression (RG) and Maximum Likelihood (ML) mapping methods to construct linkage maps. For the RG method, we used the Kosambi mapping function, with maximum recombination frequency of 0.4, minimum LOD of 1.0, and goodness-of-fit jump threshold for removing loci of 5.0. All other parameters were set to default values. We used only the first- or second-round regression linkage maps. For the ML method, which uses Haldane mapping units, we set chain length to 5,000, length of burn-in chain to 20,000, number of Monte Carlo EM cycles to 10, and chain length per Monte Carlo EM cycle to 5,000. All other parameters were set to default values. The parameter used to determine 51 whether a locus fits well between its neighboring loci is nearest-neighbor fit (cM), with a larger value indicating a poor fit (Ooijen, 2011). We constructed a map for each linkage group, using the RG method. If the largest nearest- neighbor fit was greater than 5 cM for any marker, we excluded that marker and constructed another RG map. We repeated this process until the largest nearest-neighbor fit was smaller than 5 cM and termed the result the initial RG map. Using the markers on this initial RG map, we then constructed a ML map. If nearest-neighbor fit for any marker was larger than 5 cM, we removed that marker and reconstructed both RG and ML maps. We repeated this iterative process until the largest nearest-neighbor fits were less than 5 cM for RG and ML maps. We evaluated the consistency between RG and ML maps by r 2 , the correlation coefficient for linear regression of marker rank orders on the ML map against marker rank orders on the RG map. When r 2 was less than 0.95, we removed markers with large nearest-neighbor fits or markers with inconsistent positions between RG and ML maps. We considered RG and ML maps for a linkage group consistent when r 2 reached at least 0.95; we then took the RG map as the final linkage map for that linkage group. Linkage groups were numbered according to the second- generation linkage map in Hedgecock et al. (2015), by matching the scaffold numbers of markers on our final linkage maps with those on the maps in the previous study. To evaluate the reliability of linkage mapping by JoinMap 4.1, the same set of markers on the final JoinMap linkage maps were used to construct linkage maps using Lep-MAP2 (Rastas et al., 2015). We compared the lengths and marker rank orders of linkage maps constructed by both programs, using linear regression. Genome coverage was estimated, according to (1), GC = 1- e -2*dn/L (Bishop et al., 1983), (1) 52 where d is the average distance between neighboring markers and n is the total number of markers on ten linkage groups in each family. The total length of a linkage group is estimated by adding twice the average distance between neighboring markers to the map length of the corresponding linkage group. L is sum of total lengths of ten linkage groups in a family. 2.4 Consensus map construction We made a consensus map by merging the final RG linkage maps from all six F2 families, using MergeMap (MM map; Wu et al., 2011) and LPmerge (LP map; Endelman and Plomion, 2014). We compared single maps with MM and LP maps and calculated lengths of MM and LP maps to decide which consensus map to keep. 3. Results 3.1 Numbers of high quality SNP markers Numbers of high quality SNP markers generated from GBS range from 4,615 to 13,885 over the six families (Table 5). On average, 65% of these markers were input to JoinMap (range, 3,374 to 7,982), of which nearly a third (32.5%) had identical genotypes; we retained one marker of each identical group for linkage analysis but brought back identical markers if their representative remained on the final RG map. The number of markers placed on final RG maps ranges from 794 to 1,351 or from 16% to 34% of the markers input to JoinMap for each family (Table 5). 3.2 Map length, map density and genome coverage Across six families, total lengths of genetic maps range from 454.6 cM to 589.7 cM, with an average of 522.4 cM; average marker spacing ranges from 0.35 cM to 0.63 cM, with a grand average of 0.49 cM (Table 6). Across six families, average lengths on individual linkage groups 53 range from 43.8 cM to 59.2 cM, while average numbers of markers per linkage group range from 77 to 155 (Table 6). Genome coverage averages 0.862 across the six families (Table 6). Table 5. Numbers of markers at key steps from GBS to final linkage maps for six families Step 23×31 23×40 31×23 40×92 47×92 92×40 SNPs from GBS 13,885 5,783 4,928 7,782 8,096 4,615 SNPs input to JoinMap 7,982 3,966 3,374 4,551 4,660 3,744 Identical SNPs 2,827 1,759 978 1,192 1,181 1,292 Ungrouped uni-parentally segregating SNPs 63 64 57 41 31 81 Ungrouped bi-parentally segregating SNPs 262 189 160 210 300 226 Ungrouped SNPs in small linkage groups - - 21 - 92 - SNPs with <10% individuals genotyped 1,421 371 640 827 1,132 420 SNPs on final Regression maps 1,272 1,351 1,123 991 794 1,138 54 Table 6. Numbers of markers on and lengths of final linkage maps for six mapping families 3.3 Comparisons of RG, ML and LEP maps Coefficients of correlation between rank orders of markers on RG and ML maps range from 0.984 to 0.999, as expected, since we removed ill-fitting and inconsistent markers to achieve agreement between these methods (Table 7). Coefficients of correlation between rank orders of markers on RG and LEP maps range from 0.768 to 0.996, with five correlation coefficients lower than 0.9 (Table 8, shaded cells). Across six families, genome lengths of RG, Linkage Length, Family Average Average group No. markers 23×31 23×40 31×23 40×92 47×92 92×40 length no. markers 1 Length (cM) 63.37 47.95 51 68.34 63.24 61.34 59.2 No. markers 195 172 108 142 77 102 132.7 2 Length (cM) 53.79 51.54 61.67 65.72 30.95 59.83 53.9 No. markers 115 68 72 97 59 75 81.0 3 Length (cM) 67.3 56.38 48.64 65.45 43.18 55.51 56.1 No. markers 208 206 125 139 78 174 155.0 4 Length (cM) 58.55 49.23 54.93 42.65 61.28 75 56.9 No. markers 112 129 116 82 94 138 111.8 5 Length (cM) 31.25 44.43 57.5 59.27 67.15 49.42 51.5 No. markers 90 111 85 97 82 96 93.5 6 Length (cM) 28.83 33.76 61.79 58.93 44.87 66.18 49.1 No. markers 156 109 184 144 86 122 133.5 7 Length (cM) 54.34 49.78 62.74 57.91 49.1 46.53 53.4 No. markers 100 199 90 107 94 80 111.7 8 Length (cM) 16.31 39.6 63.04 45.68 53.23 44.71 43.8 No. markers 44 118 100 44 89 87 80.3 9 Length (cM) 44.11 61.32 65.31 19.38 56.66 71.94 53.1 No. markers 108 84 89 41 59 82 77.2 10 Length (cM) 36.71 32.3 59.18 57.39 27.46 59.27 45.4 No. markers 144 155 154 98 76 182 134.8 Sum of lengths 454.6 466.3 585.8 540.7 497.1 589.7 522.4 Total no. of markers 1,272 1,351 1,123 991 794 1,138 1111.5 Average spacing (cM) 0.360 0.348 0.526 0.551 0.634 0.523 Genome coverage 0.863 0.863 0.862 0.862 0.861 0.862 55 ML and LEP maps range from 461.8-600.2 cM, 813.3-1,127.9 cM and 576.6-753.3 cM, respectively (Table 9). Table 7. Coefficients of correlation between rank orders of markers on Regression and Maximum Likelihood maps Linkage group 23×31 23×40 31×23 40×92 47×92 92×40 1 0.999 0.997 0.999 0.996 0.997 0.997 2 0.997 0.998 0.997 0.997 0.984 0.994 3 0.998 0.998 0.999 0.996 0.999 0.997 4 0.997 0.998 0.997 0.990 0.994 0.994 5 0.996 0.999 0.997 0.992 0.998 0.985 6 0.993 0.999 0.999 0.996 0.998 0.997 7 0.997 0.998 0.996 0.997 0.998 0.995 8 0.987 0.997 0.997 0.997 0.997 0.997 9 0.996 0.999 0.998 0.990 0.999 0.998 10 0.996 0.997 0.997 0.995 0.999 0.997 Table 8. Coefficients of correlation between rank orders of markers on Regression and Lep-MAP2 maps Linkage group 23×31 23×40 31×23 40×92 47×92 92×40 1 0.990 0.977 0.973 0.989 0.996 0.989 2 0.910 0.768 0.903 0.965 0.888 0.936 3 0.995 0.972 0.943 0.991 0.994 0.984 4 0.980 0.957 0.960 0.812 0.986 0.992 5 0.984 0.977 0.993 0.973 0.995 0.971 6 0.967 0.966 0.996 0.991 0.995 0.994 7 0.988 0.991 0.984 0.969 0.992 0.993 8 0.934 0.936 0.907 0.969 0.975 0.989 9 0.891 0.897 0.977 0.860 0.988 0.987 10 0.921 0.970 0.986 0.994 0.911 0.993 56 Table 9. Lengths of Regression (RG), Maximum Likelihood (ML) and Lep-MAP2 (LEP) maps (in cM) 23×31 23×40 31×23 Linkage group RG map ML map LEP map RG map ML map LEP map RG map ML map LEP map 1 63.4 139.7 69.8 47.9 118.9 73.8 51.0 130.7 79.4 2 53.8 132.8 69.8 51.5 112.2 91.1 61.7 99.7 77.6 3 67.3 159.1 82.9 56.4 109.8 70.0 48.6 96.3 62.2 4 58.6 120.8 75.3 49.2 70.6 49.9 54.9 114.2 71.9 5 31.3 83.1 51.6 44.4 73.0 49.7 57.5 88.0 66.9 6 28.8 103.8 45.5 33.8 74.3 57.1 61.8 116.9 72.7 7 54.3 92.5 54.0 49.8 97.0 61.5 62.7 111.1 71.8 8 16.3 28.7 19.9 39.6 65.5 93.0 63.0 85.8 80.6 9 44.1 93.2 56.8 61.3 77.9 67.2 65.3 158.6 89.0 10 36.7 71.1 42.0 32.3 59.7 48.7 59.2 106.6 67.7 Sum 454.6 1025.0 567.6 466.3 858.9 662.0 585.8 1108.0 740.0 Genome length 461.8 1041.2 576.6 473.2 871.7 671.8 596.3 1127.9 753.3 40×92 47×92 92×40 Linkage group RG map ML map LEP map RG map ML map LEP map RG map ML map LEP map 1 68.3 105.6 69.6 63.2 90.1 75.4 61.3 74.6 60.6 2 65.7 92.7 80.1 30.9 54.1 43.2 59.8 78.0 59.9 3 65.4 93.0 64.7 43.2 97.6 56.7 55.5 90.7 67.1 4 42.6 102.5 58.2 61.3 108.7 69.5 75.0 96.4 92.3 5 59.3 145.8 76.9 67.2 99.7 84.0 49.4 61.2 41.7 6 58.9 112.7 73.0 44.9 84.7 63.9 66.2 87.6 67.4 7 57.9 88.5 67.9 49.1 98.1 61.4 46.5 65.8 48.9 8 45.7 52.0 47.6 53.2 84.3 68.4 44.7 61.9 47.6 9 19.4 32.8 24.1 56.7 89.1 65.9 71.9 91.8 68.5 10 57.4 81.5 61.8 27.5 63.5 39.3 59.3 91.2 64.7 Sum 540.7 907.1 624.0 497.1 869.9 627.7 589.7 799.2 618.8 Genome length 551.7 925.6 636.7 509.8 892.0 643.7 600.2 813.3 629.7 3.4 A consensus map Consensus maps constructed using MergeMap and LPmerge include the same number of markers, 5,385, but differ in length, the MM map being 1,234.4 cM and the LP map being 646.07 cM (Table 10). Correlations between rank orders of markers on MM and LP maps are 57 high with correlation coefficients ranging from 0.918 to 0.991 (Table 10). Comparisons between LP map and single RG maps show that coefficients of correlation between rank orders of markers on single RG and consensus maps are high, ranging from 0.9997 to 0.9999 across six families (Fig. 5). Table 10. Numbers of markers on and lengths of consensus maps Linkage No. of Map length (cM) Coefficients of correlation between rank group markers MM LP orders of markers on MM and LP maps 1 642 129.96 68.64 0.984 2 406 140.09 67.45 0.918 3 719 151.05 67.4 0.988 4 524 140.45 75.93 0.985 5 467 106.71 59.15 0.991 6 678 97.67 50.45 0.984 7 572 109.42 58.45 0.989 8 389 113.99 67.16 0.933 9 353 111.57 72.26 0.984 10 635 133.49 59.18 0.966 Total 5,385 1,234.4 646.07 58 Rank order of markers on LP map Figure 5. Correlations between rank orders of markers on Regression map (RG map) and consensus map made using LPmerge (LP map). x: rank order of markers on LP map; y: rank order of markers on RG map. y = 0.9987x + 1.8452 r² = 0.9997 0 200 400 600 800 1000 1200 1400 0 200 400 600 800 1000 1200 1400 23×31 y = 1.0002x + 0.3835 r² = 0.9997 0 200 400 600 800 1000 1200 1400 0 200 400 600 800 1000 1200 1400 23×40 y = 1.0015x + 0.2361 r² = 0.9997 0 200 400 600 800 1000 1200 0 200 400 600 800 1000 1200 31×23 y = x + 0.7582 r² = 0.9997 0 200 400 600 800 1000 1200 0 200 400 600 800 1000 1200 40×92 y = x + 0.6252 r² = 0.9999 0 150 300 450 600 750 900 0 150 300 450 600 750 900 47×92 y = 1.0015x + 0.0642 r² = 0.9997 0 200 400 600 800 1000 1200 0 200 400 600 800 1000 1200 92×40 Rank order of markers on RG map 59 4. Discussion 4.1 Changes in numbers of markers from GBS to linkage mapping Through GBS, we are able to generate a large number of SNPs, which sets the foundation for constructing reliable high-density linkage maps in the Pacific oyster. If we define markers generated from GBS, input to JoinMap, and kept on final RG maps as markers at stage 1, 2 and 3, respectively, then almost one third of markers are removed from one stage to next on average. At some loci where high quality markers from GBS are located, we find missing parental genotypes, monomorphic progeny genotypes and unexpected progeny genotypes (compared with parental genotypes) and thus remove these loci. Only the remaining markers are input to JoinMap for linkage mapping, among which some identical loci are detected and only one representative of identical loci is kept for linkage mapping. Some groupings in JoinMap contain a small number of markers (i.e. fewer than 23 markers, most groupings have only one or two markers) and no assignment of these markers to other linkage groups is successful, so markers in these groupings are ungrouped and thus excluded (Table 5). To construct high-quality linkage maps, we removed markers having high nearest-neighbor fit or causing inconsistencies between ML and RG maps. Ultimately, markers on final RG maps account for 14.8% and 23.6% of markers generated from GBS and input to JoinMap, respectively. The high degree of marker removal suggests that, to obtain high quality linkage maps, a large number of high quality SNPs are needed because most of them may be removed during linkage mapping. 4.2 Map length, map density and genome coverage Compared to second-generation linkage maps constructed by Hedgecock et al. (2015), our linkage maps have similar map lengths but more markers, making our linkage maps denser than previous ones (Table 11). While our linkage maps are denser than previous ones, genome 60 coverage is similar in our and previous linkage maps (~0.86; cf. Table 6 here with Table 2 in Hedgecock et al. (2015)), suggesting that map density does not significantly affect genome coverage. Table 11. Comparisons of average marker spacings, map lengths and numbers of markers between previous and current linkage maps Average spacing a Map length b No. of markers c Map 1 d Map 2 e Map 1 Map 2 Map 1 Map 2 Mean 0.49 1.04 52.24 59.18 111.15 58.42 Variance 0.01 0.05 24.43 151.12 752.45 982.05 t statistic -5.08 -1.51 9.16 df 6 9 9 p-value 0.002 0.164 0.0000074 a Mean of average spacing is the grand mean of average spacings across six families (cM per family). b Mean of map length is the grand mean of average map lengths across ten linkage groups (cM per family per linkage group). c Mean of No. of markers is grand mean of the average number of markers across ten linkage groups (No. of markers per family per linkage group). d Map 1 are linkage maps constructed in this study. e Map 2 are linkage maps constructed in Hedgecock et al. (2015). 4.3 Comparisons of RG, ML and LEP maps ML maps have the maximum map lengths (799-1,108 cM) while RG maps have the minimum map lengths (455-590 cM), with the length of LEP maps in between (568-740 cM) (Table 9). Grand average genome lengths estimated across six families for RG, LEP and ML maps are 532 cM, 652 cM and 945 cM, respectively. The genome length estimated for RG maps is closest to the length of the C. gigas genome, 545 Mb estimated from k-mer analysis, 559 Mb estimated from the final assembly of the C. gigas genome and 637 Mb estimated from flow cytometry (Zhang et al., 2012). Thus, RG maps are taken as the final linkage map. While RG and ML maps are highly consistent for all ten linkage groups in six families, inconsistencies in marker rank orders exist between RG and LEP maps in five out of 60 linkage 61 groups (10 linkage groups×6 families=60 linkage groups). However, correlation coefficients between rank orders of markers on RG maps and those five LEP maps are above 0.77. Thus, rank orders of markers on RG, ML and LEP maps are consistent in general. Thus, final RG maps we obtain in this study are of high quality and reliable. 4.4 A consensus map MM and LP maps are highly consistent with coefficients of correlation between rank orders of markers on MM and LP maps ranging from 0.918 to 0.991 (Table 10). Since the total length of LP map, 646.07 cM, is closer to the estimated length of the C. gigas genome, we decided to take LP map as the final consensus map. Also, rank orders of markers on LP map and corresponding single RG maps are highly consistent across all six families, with correlation coefficients of marker rank orders between LP and single RG maps for each family ranging from 0.9997 to 0.9999 (Fig. 5). This indicates that the consensus map, LP map, constructed in this study is reliable and can clearly show relative positions of loci on the C. gigas genome. 4.5 Scaffold misassembly in the C. gigas genome Numbers of SNPs on a scaffold range from one to thirty-four, with an average of about five SNPs per scaffold (Table 12). Fewer scaffolds are found to include more SNPs, with 90% of scaffolds with ten or fewer SNPs (Table 12). Almost 53% of scaffolds with more than one SNPs are assigned to up to six different linkage groups, suggesting errors in the assembly of the C. gigas genome (Zhang et al., 2012). The detection of assembly errors in the C. gigas genome is consistent with discoveries from second-generation linkage maps by Hedgecock et al. (2015). Additionally, according to Hedgecock et al. (2015), scaffolds are more likely to be incorrectly assembled as their lengths increase. Due to the misassembly of scaffolds, contigs are likely to be incorrectly ordered within scaffolds and thus genes may be inaccurately localized on the C. gigas 62 genome (Hedgecock et al., 2015). Therefore, the misassembly of genome scaffolds poses a challenge for pinpointing candidate genes under QTL for variation in important traits such as survival and growth in the Pacific oyster. Table 12. Number of scaffolds that map to one or more than one linkage group, by number of SNPs per scaffold No. of SNPs Number of linkage groups to which scaffolds map per scaffold 1 2 3 4 5 6 Sum 1 280 0 0 0 0 0 280 2 171 27 0 0 0 0 198 3 120 28 3 0 0 0 151 4 74 19 9 0 0 0 102 5 57 16 4 0 0 0 77 6 35 13 10 1 0 0 59 7 27 12 6 1 0 0 46 8 20 12 10 2 0 0 44 9 15 16 7 1 1 0 40 10 16 1 1 3 0 0 21 11 7 9 6 3 1 0 26 12 8 8 0 1 2 0 19 13 1 4 2 0 0 0 7 14 3 3 3 1 0 0 10 15 0 3 5 2 1 0 11 16 2 4 1 0 0 1 8 17 0 1 0 2 0 0 3 18 0 2 1 1 0 0 4 19 1 0 1 1 0 0 3 20 2 4 0 1 0 0 7 21 1 1 1 2 1 0 6 22 0 0 0 1 0 0 1 23 1 2 3 0 1 0 7 24 0 0 1 1 0 0 2 26 0 0 0 0 1 0 1 27 0 0 1 0 0 0 1 30 0 1 0 0 0 0 1 33 0 1 0 0 0 0 1 34 0 0 0 0 1 0 1 Sum 841 187 75 24 9 1 1137 63 References Bai, Z., Han, X., Luo, M., Lin, J., Wang, G., Li, J., 2015. Constructing a microsatellite-based linkage map and identifying QTL for pearl quality traits in triangle pearl mussel (Hyriopsis cumingii). Aquaculture 437: 102–110. Bai, Z., Han, X., Liu, X., Li, Q., Li, J., 2016. Construction of a high-density genetic map and QTL mapping for pearl quality-related traits in Hyriopsis cumingii. Sci. Rep. 6: 32608. Baranski, M., Loughnan, S., Austin, C.M., Robinson, N., 2006. A microsatellite linkage map of the blacklip abalone, Haliotis rubra. Anim. Genet. 37: 563–570. Bentley, D.R., Balasubramanian, S., Swerdlow, H.P., Smith, G.P., Milton J., et al., 2008. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456: 53–59. Curole, J.P., Hedgecock, D., 2007. Chapter 29. Bivalve genomics: Complications, challenges, and future perspectives. Aquaculture Genome Technologies, Liu, Z. (editor), Blackwell Publishing, Ames, Iowa, pp. 525-543. Dégremont, L., Bédier, E., Boudry, P., 2010. Summer mortality of hatchery-produced Pacific oyster spat (Crassostrea gigas). II. Response to selection for survival and its influence on growth and yield. Aquaculture 299: 21–29. Doerge, R.W., 2002. Mapping and analysis of quantitative trait loci in experimental populations. Nat. Rev. Genet. 3: 43–52. Elshire, R.J., Glaubitz, J.C., Sun, Q., Poland, J.A., Kawamoto, K., Buckler, E.S., Mitchell, S.E., 2011. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLOS ONE 6, e19379 Endelman, J.B., Plomion, C., 2014. LPmerge: an R package for merging genetic maps by linear programming. Bioinformatics 30: 1623–1624. Ge, J., Li, Q., Yu, H., Kong, L., 2014. Identification and mapping of a SCAR marker linked to a locus involved in shell pigmentation of the Pacific oyster (Crassostrea gigas). Aquaculture 434: 249–253. Ge, J., Li, Q., Yu, H., Kong, L., 2015a. Mendelian inheritance of golden shell color in the Pacific oyster Crassostrea gigas. Aquaculture 441: 21–24. Ge, J., Li, Q., Yu, H., Kong, L., 2015b. Identification of single-locus PCR-based markers linked to shell background color in the Pacific oyster (Crassostrea gigas). Mar. Biotechnol. 17: 655– 662. Guo, X., Hedgecock, D., Hershberger, W.K., Cooper, K., Allen, S.K., 1998. Genetic determinants of protandric sex in the Pacific oyster, Crassostrea gigas Thunberg. Evolution 52: 394–402. 64 Guo, X., Li, Q., Wang, Q.Z., Kong, L.F., 2012. Genetic mapping and QTL analysis of growth- related traits in the Pacific oyster. Mar. Biotechnol. N. Y. N 14: 218–226. Harrang, E., Heurtebise, S., Faury, N., Robert, M., Arzul, I., Lapègue, S., 2015. Can survival of European flat oysters following experimental infection with Bonamia ostreae be predicted using QTLs? Aquaculture 448: 521–530. Hedgecock, D., Davis, J.P., 2007. Heterosis for yield and crossbreeding of the Pacific oyster Crassostrea gigas. Aquaculture 272, Supplement 1: S17–S29. Hedgecock, D., Shin, G., Gracey, A.Y., Den Berg, D.V., Samanta, M.P., 2015. Second- generation linkage maps for the Pacific oyster Crassostrea gigas reveal errors in assembly of genome scaffolds. G3-Genes Genomes Genet. 5: 2007–2019. Hedrick, P.W., Muona, O., 1990. Linkage of viability genes to marker loci in selfing organisms. Heredity 64: 67–72. Hedrick, P.W., Hedgecock, D., 2010. Sex determination: genetic models for oysters. J. Hered. 101: 602–611. Hershberger, W.K., Perdue, J.A., Beattie, J.H., 1984. Genetic selection and systematic breeding in Pacific oyster culture. Aquaculture 39: 237–245. Hu, Z., Xu, S., 2009. PROC QTL—A SAS procedure for mapping quantitative trait loci. Int. J. Plant Genomics 2009: 1–3. Hubert, S., Hedgecock, D., 2004. Linkage maps of microsatellite DNA markers for the Pacific oyster Crassostrea gigas. Genetics 168: 351–362. Jones, D.B., Jerry, D.R., Khatkar, M.S., Raadsma, H.W., Zenger, K.R., 2013. A high-density SNP genetic linkage map for the silver-lipped pearl oyster, Pinctada maxima: a valuable resource for gene localisation and marker-assisted selection. BMC Genomics 14: 810. Lallias, D., Lapègue, S., Hecquet, C., Boudry, P., Beaumont, A.R., 2007a. AFLP-based genetic linkage maps of the blue mussel (Mytilus edulis). Anim. Genet. 38: 340–349. Lallias, D., Beaumont, A.R., Haley, C.S., Boudry, P., Heurtebise, S., Lapègue, S., 2007b. A first- generation genetic linkage map of the European flat oyster Ostrea edulis (L.) based on AFLP and microsatellite markers. Anim. Genet. 38: 560–568. Lander, E.S., Botstein, D., 1989. Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121: 185–199. Langdon, C., Evans, F., Jacobson, D., Blouin, M., 2003. Yields of cultured Pacific oysters Crassostrea gigas Thunberg improved after one generation of selection. Aquaculture 220: 227– 244. Li, Y., Byrne, K., Miggiano, E., Whan, V., Moore, S., Keys, S., Crocos, P., Preston, N., Lehnert, S., 2003. Genetic mapping of the kuruma prawn Penaeus japonicus using AFLP markers. Aquaculture 219: 143–156. 65 Li, L., Guo, X., 2004. AFLP-based genetic linkage maps of the Pacific oyster Crassostrea gigas Thunberg. Mar. Biotechnol. N. Y. N 6: 26–36. Li, L., Xiang, J.H., Liu, X., Zhang, Y., Dong, B., Zhang, X., 2005. Construction of AFLP-based genetic linkage map for Zhikong scallop, Chlamys farreri Jones et Preston and mapping of sex- linked markers. Aquaculture 245: 63–73. Li, H., Liu, X., Zhang, G., 2012a. Development and linkage analysis of 104 new microsatellite markers for bay scallop (Argopecten irradians). Mar. Biotechnol. 14: 1–9. Li, H., Liu, X., Zhang, G., 2012b. A consensus microsatellite-based linkage map for the hermaphroditic bay scallop (Argopecten irradians) and its application in size-related QTL analysis. PLOS ONE 7: e46926. Li, Y., He, M., 2014. Genetic mapping and QTL analysis of growth-related traits in Pinctada fucata using restriction-site associated DNA sequencing. PLOS ONE 9: e111707. Liu, X., Liu, X., Guo, X., Gao, Q., Zhao, H., Zhang, G., 2006. A preliminary genetic linkage map of the Pacific abalone Haliotis discus hannai Ino. Mar. Biotechnol. 8: 386–397. Liu, B., Teng, S., Shao, Y., Chai, X., Xiao, G., Fang, J., Zhang, J., Wang, C., 2017. A genetic linkage map of blood clam (Tegillarca granosa) based on simple sequence repeat and amplified fragment length polymorphism markers. J. Shellfish Res. 36: 31–40. Mann, R., 1979. Exotic species in mariculture. Cambridge: MIT Press. pp. 363. McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A., Garimella, K., Altshuler, D., Gabriel, S., Daly, M., DePristo, M.A., 2010. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20: 1297–1303. Ooijen, J.W.V., 2011. Multipoint maximum likelihood mapping in a full-sib family of an outbreeding species. Genet. Res. 93: 343–349. Petersen, J.L., Baerwald, M.R., Ibarra, A.M., May, B., 2012. A first-generation linkage map of the Pacific lion-paw scallop (Nodipecten subnodosus): Initial evidence of QTL for size traits and markers linked to orange shell color. Aquaculture 350: 200–209. Plough, L.V., Hedgecock, D., 2011. Quantitative trait locus analysis of stage-specific inbreeding depression in the Pacific oyster Crassostrea gigas. Genetics 189: 1473–1486. Plough, L.V., 2012. Environmental stress increases selection against and dominance of deleterious mutations in inbred families of the Pacific oyster Crassostrea gigas. Mol. Ecol. 21: 3974–3987. Plough, L.V., Shin, G., Hedgecock, D., 2016. Genetic inviability is a major driver of type III survivorship in experimental families of a highly fecund marine bivalve. Mol. Ecol. 25: 895– 910. Plough, L.V., 2016. Genetic load in marine animals: a review. Curr. Zool. 62: 567–579. 66 Qin, Y., Liu, X., Zhang, H., Zhang, G., Guo, X., 2007. Genetic mapping of size-related quantitative trait loci (QTL) in the bay scallop (Argopecten irradians) using AFLP and microsatellite markers. Aquaculture 272: 281–290. Rastas, P., Calboli, F.C.F., Guo, B., Shikano, T., Merilä, J., 2015. Construction of ultradense linkage maps with Lep-MAP2: Stickleback F2 recombinant crosses as an example. Genome Biol. Evol. 8: 78–93. Ren, P., Peng, W., You, W., Huang, Z., Guo, Q., Chen, N., He, P., Ke, J., Guo, J., Ke, C., 2016. Genetic mapping and quantitative trait loci analysis of growth-related traits in the small abalone Haliotis diversicolor using restriction-site-associated DNA sequencing. Aquaculture 454: 163– 170. Sauvage, C., Boudry, P., de Koning, D.J., Haley, C.S., Heurtebise, S., Lapègue, S., 2010. QTL for resistance to summer mortality and OsHV-1 load in the Pacific oyster (Crassostrea gigas). Anim. Genet. 41: 390–399. Shi, Y., Guo, X., Gu, Z., Wang, A., Wang, Y., 2010. Preliminary genetic linkage map of the abalone Haliotis diversicolor Reeve. Chin. J. Oceanol. Limnol. 28: 549–557. Shi, Y., Wang, S., Gu, Z., Lv, J., Zhan, X., Yu, C., Bao, Z., Wang, A., 2014. High-density single nucleotide polymorphisms linkage and quantitative trait locus mapping of the pearl oyster, Pinctada fucata martensii Dunker. Aquaculture 434: 376–384. Van der Auwera, G.A., Carneiro, M.O., Hartl, C., Poplin, R., Del Angel, G., Levy-Moonshine, A., Jordan, T., Shakir, K., Roazen, D., Thibault, J., Banks, E., Garimella, K.V., Altshuler, D., Gabriel, S., DePristo, M.A., 2013. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr. Protoc. Bioinforma. 43: 11.10.1-33. Wang, L., Song, L., Zhang, H., Gao, Q., Guo, X., 2007. Genetic linkage map of bay scallop, Argopecten irradians irradians (Lamarck 1819). Aquac. Res. 38: 409–419. Wang, J., Li, L., Zhang, G., 2016. A high-density SNP genetic linkage map and QTL analysis of growth-related traits in a hybrid family of oysters (Crassostrea gigas×Crassostrea angulata) using genotyping-by-sequencing. G3-Genes Genomes Genet. 6: 1417–1426. Ward, J. A., Bhangoo, J., Fernández-Fernández, F., Moore, P., Swanson, J., Viola, R., Velasco, R., Bassil, N., Weber, C.A., Sargent, D.J., 2013. Saturated linkage map construction in Rubus idaeususing genotyping by sequencing and genome-independent imputation. BMC Genomics 14: 2. Wu, Y., Close, T.J., Lonardi, S., 2011. Accurate construction of consensus genetic maps via integer linear programming. IEEE/ACM Trans. Comput. Biol. Bioinform. 8: 381–394. Xu, K., Li, Q., Kong, L., Yu, R., 2008. A first-generation genetic map of the Japanese scallop Patinopecten yessoensis-based AFLP and microsatellite markers. Aquac. Res. 40: 35–43. Yu, Z., Guo, X., 2003. Genetic linkage map of the eastern oyster Crassostrea virginica Gmelin. Biol. Bull. 204: 327–338. 67 Yu, Z., Guo, X., 2006. Identification and mapping of disease-resistance QTLs in the eastern oyster, Crassostrea virginica Gmelin. Aquaculture 254: 160–170. Yuan, T., He, M., Huang, L., Hu, J., 2010. Genetic linkage maps of the noble scallop Chlamys nobilis Reeve based on AFLP and microsatellite markers. J. Shellfish Res. 29: 55–62. Zhan, A., Hu, J., Hu, X., Hui, M., Wang, M., Peng, W., Huang, X., Wang, S., Lu, W., Sun, C., Bao, Z., 2009. Construction of microsatellite-based linkage maps and identification of size- related quantitative trait loci for Zhikong scallop (Chlamys farreri). Anim. Genet. 40: 821–831. Zhang, G., Fang, X., Guo, X., Li, L., Luo, R., et al., 2012. The oyster genome reveals stress adaptation and complexity of shell formation. Nature 490: 49–54. Zhong, X., Li, Q., Guo, X., Yu, H., Kong, L., 2014. QTL mapping for glycogen content and shell pigmentation in the Pacific oyster Crassostrea gigas using microsatellites and SNPs. Aquac. Int. 22: 1877–1889. 68 Chapter 3 Mapping Genetic Factors Determining Type III Survivorship in the Pacific Oyster Crassostrea gigas Abstract Many marine invertebrates, like the Pacific oyster, have high fecundity and high mortality during early life stages (type III survivorship). The first hints that high early mortality in marine molluscs might have a genetic basis were widespread observations of distortions of Mendelian segregation ratios. Later, multigenerational studies revealed genetic determinants of Mendelian segregation distortion and provided low-density genetic maps of their number and distribution in the genome. In this study, we conduct viability quantitative trait locus (vQTL) mapping on six interrelated F2 families with high-density linkage maps consisting of single nucleotide polymorphism (SNP) markers generated from genotyping-by-sequencing (GBS). We detect 74 vQTL with a range of 9-14 vQTL in each family, which is consistent with previous studies. Genotype-dependent mortality, resulting from all vQTL, ranges from 98.23% to 99.96% across six families. Sixty-five of 74 vQTL in six families are assigned to six genetic models caused by different numbers and types of mutations; 52% of these vQTL are shown to be caused by a single recessive mutation, suggesting identical-by-descent (IBD) recessive deleterious alleles as the major driver of type III survivorship. The vQTL peaks attributed to single recessive mutations tend to be sharp, indicating that, compared to low-density linkage maps, high-density linkage maps can improve the accuracy of vQTL mapping by more narrowly localizing vQTL peaks caused by IBD recessive deleterious alleles. Finally, high-density linkage maps are 69 effective in teasing apart multiple deleterious mutations under broad vQTL peaks and their genetic effects, especially when grandparental haplotypes can be determined. 1. Introduction Many marine invertebrates, like the Pacific oyster, have high fecundity and high mortality during early life stages (type III survivorship). While environmental factors, including nutrition, temperature, desiccation, pCO2, salinity, water motion and solar radiation, are believed to be major drivers of the high early mortality in marine invertebrates, the contribution from genotype- dependent inviability should not be ignored (Phillips, 2002; Parker et al., 2010; Plough and Hedgecock, 2011; Plough et al., 2016). The first hints of genotype-dependent mortality in marine invertebrates were widespread reports of distortions of Mendelian segregation ratios in bivalves including clams, mussels and oysters (Beaumont et al., 1983; Gaffney and Scott, 1984; Foltz, 1986; Thiriot-Quiévreux et al., 1992; Hu et al., 1993; McGoldrick and Hedgecock, 1997; Launey and Hedgecock, 2001; Plough and Hedgecock, 2011; Plough et al., 2016). With observed deviations in genotype counts from expected Mendelian segregation ratios of inheritance, mechanisms underlying genotype-dependent inviability can be uncovered and the magnitude of genetic load can be estimated. A better understanding of genetic inviability and genetic load can illuminate potential causal factors for the high early mortality, an ecologically and economically important phenomenon, in marine invertebrates. By comparing segregation ratios at microsatellite DNA markers across eight F2 or F3 families of Pacific oysters from classical crossbreeding experiments, Launey and Hedgecock (2001) found a strong selection against identical-by-descent (IBD) marker homozygotes and estimated the minimum number of highly deleterious mutations carried by wild Pacific oysters in each family to be 8-14. Following this molecular marker-based analysis, Plough and Hedgecock 70 (2011) conducted quantitative trait locus (QTL) mapping on two unrelated F2 inbred families at five life stages, in order to identify the number, genomic location, magnitude of gene effect, mode of gene action and temporal expression of genetic factors affecting viability in the Pacific oyster. This first study on viability QTL (vQTL) mapping in the Pacific oyster discovered 14-15 vQTL, at which recessive or partially recessive alleles were selected against (Plough and Hedgecock, 2011). Meanwhile, this study examined the stage-specific expression of vQTL for the first time in marine bivalves and identified 50% of vQTL to be expressed during metamorphosis (Plough and Hedgecock, 2011). Using a similar set of methods, Plough (2012) further explored effects of genotype-by-environment interactions on the early mortality in Pacific oysters and showed a stronger selection against deleterious alleles under stressful environments. To investigate whether randomly bred offspring carry the same genotype-dependent mortality as that in inbred crosses, Plough et al. (2016) did gene mapping on four randomly bred families from wild-caught parents. Surprisingly, they discovered a similar number of vQTL and deleterious alleles (i.e. 8-12 vQTL and 11-19 deleterious alleles per family), obtained roughly the same estimate of genotype-dependent mortality (i.e. 97.9–99.8%), and suggested genetic inviability as a driver of type III survivorship (Plough et al., 2016). While these previous studies shed light on genetic causes for Mendelian segregation distortion and reveled genetic determinants of type III survivorship, they may have been limited by the availability of low-density linkage maps. Except for Plough et al. (2016), these studies are based on a relatively small number of microsatellite DNA markers. The molecular marker-based analysis by Launey and Hedgecock (2001) studied only nineteen microsatellite DNA markers. An average of 48.5 and 43.5 microsatellite DNA markers were used to construct linkage maps in studies by Plough and Hedgecock (2011) and Plough (2012), respectively. Each of the four 71 vQTL maps constructed for randomly bred offspring by Plough et al. (2016) comprised, on average, 23 microsatellites and 21.5 SNPs. Estimates on the magnitude of genetic load and the mode of gene action may be biased by low-density linkage maps. For inbred families, the mode of gene action has to be tested with a two-locus selection model of Hedrick and Muona (1990), which can only accommodate IBD marker homozygotes, and the number of vQTL may be underestimated without referring to stage-specific data (Plough and Hedgecock, 2011; Plough, 2012). Likewise, for randomly bred offspring, the number of deleterious mutations can only be estimated with genetic effects identified by contingency chi-square tests at vQTL, rather than the genetic model of each vQTL (Plough et al., 2016). Here, we conduct vQTL mapping with high-density linkage maps in six, interrelated F2 families. With the advent of high-throughput sequencing technologies, the generation of large datasets of genome-wide markers has become feasible even for non-model organisms like the Pacific oyster. In our study, we construct high-density linkage maps using SNPs generated by genotyping-by-sequencing (GBS; Bentley et al., 2008; Elshire et al., 2011), generating linkage maps with 794-1,351 markers per family. By conducting vQTL mapping with more than 10× denser linkage maps (relative to the low-density linkage maps used in previous studies), we identify a similar number of vQTL per family and obtain a similar estimate of genotype- dependent mortality. Furthermore, we propose genetic models for vQTL, showing that, with high-density linkage maps, we are able to more narrowly localize vQTL caused by IBD recessive mutations and tease apart multiple deleterious alleles and their genetic effects under broad vQTL peaks. 72 2. Materials and Methods 2.1 Biological materials Our study was based on six, interrelated F2 families (sire×dam), 23×31, 23×40, 31×23, 40×92, 47×92 and 92×40. Each F2 family was derived from an intercross of full-sib F1 hybrids, which, in turn, were produced by crosses of partially inbred lines 23, 31, 40, 47 and 92 in 2009. We made these F2 families in May and June of 2011 and reared them through the summer of 2012 in Thorndyke Bay, WA. Detailed pedigree and rearing information is described and shown in 3. Pedigrees and Rearing of Six, F2 Interrelated Families and Figure 1 in Introduction. 2.2 Data collection The six F2 families were harvested in September or October of 2012 and shipped to the University of Southern California, where adductor muscle tissue was dissected and preserved in 70% ethanol for later DNA extraction. Sixty-four out of 1,488 tagged oysters (i.e. F2 progeny) were lost from June (or July for 47×92) to October (or September for 47×92 and 92×40), indicating that the mortality did not occur during the second summer post fertilization. We determined genotypes of oysters from the six, F1 hybrid families (parent genotypes) and the six F2 families (progeny genotypes) through genotyping-by-sequencing (GBS) and bioinformatics analyses guided by Genome Analysis Toolkit (GATK, https://www.broadinstitute.org/gatk/) (Bentley et al., 2008; Elshire et al., 2011). Details on GBS (library preparation and sequencing) and GATK analysis are described in Chapter 2. Finally, we conducted vQTL analysis with the same set of 12 parent oysters from six, F1 families and 1,041 progeny from six, F2 families (i.e. 23×31: n=208; 23×40: n=181; 31×23: n=142; 40×92: n=258; 47×92: n=127; 92×40: n=125) that were used in linkage mapping in Chapter 2. 73 2.3 Data analyses 2.3.1 Analysis of Mendelian segregation ratios Since the grandparents of families in this study were not completely inbred, there were three possible cross types in progeny (F2 families), ab×ab, aa×ab and ab×aa. In the absence of viability selection, progeny genotype proportions should conform to expected Mendelian segregation ratios (aa:ab:bb=1:2:1 or aa:ab=1:1). We tested whether progeny genotype proportions deviate from an expected Mendelian segregation ratio, using goodness-of-fit chi- square tests, with the significance threshold adjusted by Bonferroni sequential correction for multiple simultaneous tests for each cross type (Rice, 1989). Since cross types aa×ab and ab×aa had the same expected Mendelian segregation ratio, they were taken as one cross type in the adjustment of significance thresholds. Numbers of markers showing Mendelian segregation distortion (i.e. distorted markers) at α=0.05 and at the adjusted-α level were reported. Since some markers were removed from linkage maps for viability QTL (vQTL) mapping (see 2.3.2 for detail), we only conducted Mendelian segregation distortion analysis on markers that were kept for vQTL mapping. 2.3.2 Viability QTL mapping We constructed sixty high-density linkage maps for the six, F2 families, according to methods in Chapter 2, using JoinMap 4.1. We then mapped vQTL based on a viability QTL model developed by Luo and Xu (2003), which is implemented in PROC QTL, a user-defined procedure in SAS (version 9.4). Inputs for vQTL mapping included phased parent genotypes, progeny genotypes, and linkage maps. In running PROC QTL, we encountered a “Floating Point Zero Divide” error in all six families, which caused the program to stop and were likely caused by the small spacing between markers (Z. Hu, University of California-Riverside Department of 74 Botany and Plant Sciences, personal communication); we rectified these errors by removing markers at the point in the data where the error was generated. Since there is no phenotype for viability, PROC QTL generates a random dummy variable for each individual, as the trait for vQTL mapping. Genomes were scanned in 1-cM increments using the Maximum Likelihood method with distortion analysis. Since original parent lines were not completely inbred, we set the mating type of each F2 family to the four-way cross, AB×CD. A likelihood ratio test (LRT) statistic and estimated proportions of progeny genotypes (i.e. AC, AD, BC, BD) were generated for each increment on the genome. We approximated genome-wise and chromosome-wise QTL significance thresholds, at the α=0.05 level, by the method of Piepho (2001). When more than one QTL peak appeared on a linkage group, we identified separate vQTL peaks only when the LRT statistic between two QTL peaks fell by at least 4.6 (~1 LOD) (Lander and Botstein, 1989). If a vQTL included multiple QTL peaks, the position on the genome with the highest LRT value was taken as the location for that vQTL. 2.3.3 Mortality, genetic effects and genetic models of vQTL Genotype-dependent mortality can cause distortions of Mendelian segregation ratios at vQTL. We calculated the average relative survival, 𝑆 ̅ , at a vQTL according to (1), 𝑆 ̅ = ( 𝑤 11 / 𝑤 𝑚 𝑎𝑥 + 𝑤 12 / 𝑤 𝑚 𝑎𝑥 + 𝑤 21 / 𝑤 𝑚 𝑎𝑥 + 𝑤 22 / 𝑤 𝑚 𝑎𝑥 ) 4 = 1 4 𝑤 𝑚 𝑎𝑥 (1), where w11, w12, w21 and w22 are proportions of progeny genotypes, AC, AD, BC and BD at a vQTL, respectively, and wmax is the proportion of the most frequent genotype at a vQTL. Then, we calculated genotype-dependent mortality according to (2), M = 1 − ∏ 𝑆 ̅ 𝑖 𝑛 𝑖 = 1 (2), 75 where ∏ 𝑆 ̅ 𝑖 𝑛 𝑖 = 1 is the product of the average relative survival of all independent vQTL in a family. We defined independent vQTL as vQTL on different linkage groups or separated by at least 50 cM, if on a single linkage group. We tested the fitness effects of individual paternal alleles (the sire effect), maternal alleles (the dam effect), and interaction between parental alleles, using contingency chi-square tests with the significance threshold adjusted by Bonferroni sequential correction for multiple simultaneous tests (Fig. 6) (Foltz, 1986). Since a position with the highest LRT value under a vQTL was used to represent that vQTL, we conducted contingency chi-square tests on the position with the highest LRT value for each vQTL. Counts of individuals under each category were calculated by multiplying estimated frequencies of progeny genotypes from PROC QTL by the sample size of a family for each position being tested. Significant genetic effects at α=0.05 and adjusted-α levels were reported. C D ∑row Effect Chi-square A 4 60 64 Paternal 30.35 B 80 64 144 ∑d- Maternal 7.97 ∑col 84 124 208 140 Interaction 24.50 ∑d+ 68 Total, 3 d.f. 62.82 Figure 6. Chi-square tests for testing genetic effects for vQTL1 in family 23×31. To determine the genetic bases underlying vQTL, we constructed six genetic models to explain the various patterns of distortions of Mendelian segregation ratios observed (Fig. 7). In Model 1, segregation distortion is caused by a single recessive deleterious mutation, resulting in frequencies of progeny genotypes, such as AC<<AD=BC=BD. In Model 2, segregation distortion is caused by two recessive deleterious mutations, resulting in frequencies of progeny genotypes, such as AC≤BD<<AD=BC, when two mutations are on different chromosomes 76 (Model 2 (1)), and AC≤AD<<BC=BD, when two mutations are on the same chromosome in one parent (Model 2 (2)). In Model 3, segregation distortion is caused by one recessive deleterious mutation and one additive deleterious mutation, resulting in frequencies of progeny genotypes, such as AC<<AD=BD<<BC, when one recessive mutation is in one parent and two mutations are on different chromosomes of the other parent (Model 3 (1)), and AC<AD<<BD<<BC, when recessive and additive mutations are on the same chromosome in one parent (Model 3 (2)). In Model 4, segregation distortion is caused by one additive deleterious mutation, resulting in frequencies of progeny genotypes, such as AD<<AC=BD<<BC, when the same additive mutation is on one chromosome of each parent (Model 4 (1)), and AC=AD<<BC=BD, when only one additive mutation is on a chromosome of one parent (Model 4 (2)). In Model 5, Mendelian segregation distortion is caused by two additive deleterious mutations, resulting in frequencies of progeny genotypes, such as BD<<BC<<AD<<AC. In Model 6, Mendelian segregation distortion is caused by one dominant deleterious mutation, resulting in frequencies of progeny genotypes, such as AC=AD=BC<<BD. The array of genotype frequencies listed for each model is just an example of all cases in this model (Fig. 7). For example, for Model 1, any of the genotypes, AC, AD, BC and BD, could be the IBD homozygote having the lowest survival, but here we only list AC<<AD=BC=BD as an example. The six genetic models are mutually exclusive, meaning that a vQTL can only be assigned to one of them. For each vQTL, we first conducted goodness-of-fit chi-square tests on numbers of individuals of each genotype, AC, AD, BC and BD, which were calculated by multiplying estimated frequencies of progeny genotypes from PROC QTL by the sample size of a family for each position being tested, at Bonferroni adjusted-α level. Then, we determined the relationship among frequencies of AC, AD, BC and BD, according to the goodness-of-fit chi-square test on numbers of individuals in 77 each genotype. Finally, we assigned each vQTL to a genetic model according to the relationship among frequencies of its four genotypes, AC, AD, BC and BD. We estimated the number of viability mutations per vQTL per parent from both genetic effects and genetic models for vQTL. We counted sire, dam and interaction effects as evidence for one, one, and two mutations at a vQTL for the two parents, respectively. From the genetic models, we counted Model 4 (2) as evidence for one mutation; Models 1, 4 (1) and 6 as evidence for two copies of the same mutation; Model 5 as evidence for two mutations; Model 3 (1) as evidence for three mutations; and Models 2 and 3 (2) as evidence for two copies of two different mutations. The number of mutations per vQTL per parent was calculated by taking the average number of mutations across all vQTL in a family and then dividing this average by two. 78 Model 1: 1 recessive mutation Model 2: 2 recessive mutations AC<<AD=BC=BD (1) AC ≤BD<<AD=BC (2) AC ≤AD<<BC=BD (AD<<AC=BC=BD) (BC<<AC=AD=BD) (BD<<AC=AD=BC) Model 3: 1 recessive mutation + 1 additive mutation Model 5: 2 additive mutations (-: recessive; x: additive) (1) AC<<AD=BD<<BC (2) AC<AD<<BD<<BC BD<<BC<<AD<<AC Model 4: 1 additive mutation Model 6: 1 dominant mutation (1) AD<<AC=BD<<BC (2) AC=AD<<BC=BD AC=AD=BC<<BD Figure 7. Six genetic models of vQTL. The array of genotype frequencies listed is an example of all cases in a genetic model. A B - C D - x A B - x C D - A B - x x C D - A B - x C D - A B - x x C D - x A B * C D A B x x C D A B x C D A B + C D + 79 3. Results 3.1 Distortions of Mendelian segregation ratios Numbers of scored markers for vQTL mapping in six F2 families range from 606 to 982, among which distorted markers account for 54% to 92% at α=0.05 and 26% to 62% at the adjusted-α level (Table 13). Compared to markers of mating types aa×ab and ab×aa, more markers of the mating type ab×ab are found in scored markers (Table 13). Likewise, among markers distorted at α=0.05, the proportion of markers of mating type ab×ab is higher than markers of mating types aa×ab and ab×aa (Table 13). 3.2 The number, position, magnitude of gene effect and mode of gene action of vQTL We discover nine to fourteen vQTL per family. The vQTL are widely distributed across different linkage groups but inconsistent in location across the six families (Table 14, Fig. 8). Based on PROC QTL-estimated genotype frequencies, we identify seven types of genetic effects underlying 74 vQTL in six families (Table 14). In addition to a single sire, dam or interaction effect, combinations of sire and dam effects, sire and interaction effects, dam and interaction effects, and sire, dam and interaction effects are also potential causes for vQTL (Table 14). While the combination of sire, dam and interaction effects is the major factor underlying vQTL at α=0.05, the interaction effect causes most vQTL significant at the adjusted-α level (Table 14). Based on genetic effects, we estimate the number of mutations per vQTL per parent to be 0.94- 1.46 in six families (Table 14). Meanwhile, with PROC QTL-estimated genotype frequencies, we estimate genotype-dependent mortalities to be 98.23%-99.96% in six families (Table 14). 3.3 Genetic models of vQTL According to the relationship between numbers of individuals of each genotype, AC, AD, BC and BD determined by goodness-of-fit chi-square tests, we assign 65 out of 74 vQTL to six 80 genetic models, with genetic models successfully identified for 88% of all vQTL (Table 15, Fig. 7). Among 65 vQTL, 52.3% and 27.7% of them are assigned to Model 1 and Model 2, respectively, in which recessive mutations are causes for distortions of Mendelian segregation ratios (Table 16, Fig. 7). Based on genetic models, the estimated number of mutations per vQTL per parent ranges from 1.08 to 1.6 in six families (Table 14). 81 Table 13. Segregation results for six F2 families Family Cross type Markers scored Distorted markers % distorted markers 23×31 aa×ab 39 20 (3) 51.3% (7.7%) a ab×aa 20 5 (1) 25% (5%) ab×ab 923 878 (391) 95.1% (42.4%) Total 982 903 (395) 92% (40.2%) 23×40 aa×ab 119 69 (52) 58% (43.7%) ab×aa 188 151 (97) 80.3% (51.6%) ab×ab 299 271 (227) 90.6% (75.9%) Total 606 491 (376) 81% (62%) 31×23 aa×ab 85 49 (18) 57.6% (21.2%) ab×aa 90 53 (9) 58.9% (10%) ab×ab 598 428 (223) 71.6% (37.3%) Total 773 530 (250) 68.6% (32.3%) 40×92 aa×ab 98 41 (19) 41.8% (19.4%) ab×aa 49 22 (1) 44.9% (2%) ab×ab 593 567 (368) 95.6% (62.1%) Total 740 630 (388) 85.1 (52.4%) 47×92 aa×ab 24 0 (0) 0% (0%) ab×aa 17 4 (0) 23.5% (0%) ab×ab 574 458 (244) 79.8% (42.5%) Total 615 462 (244) 75.1% (39.7%) 92×40 aa×ab 62 19 (3) 30.6% (4.8%) ab×aa 76 22 (7) 28.9% (9.2%) ab×ab 521 315 (159) 60.5% (30.5%) Total 659 356 (169) 54% (25.6%) a Numbers in parentheses are test results adjusted for simultaneous multiple testing at Bonferroni adjusted-α level. 82 Table 14. Viability QTL in six F2 families Family Viability Linkage Location LRT Marker a Cross PROC QTL-estimated genotype frequency Genetic Genetic wmax c locus group (cM) type AC AD BC BD effects b model 23×31 vQTL1 1 0.0 80.8 1858-A214794T AB×CD 0.020 0.289 0.382 0.309 S,D,I (S,I) Model 1 0.3824 vQTL2 1 37.9 27.8 AB×CD 0.102 0.281 0.303 0.314 S,D,I (S) Model 1 0.3140 vQTL3 2 14.8 99.5 AB×CD 0.000 0.336 0.381 0.282 S,D,I (S,D,I) Model 1 0.3813 vQTL4 2 34.9 32.3 41164-C72308T AB×CD 0.101 0.253 0.305 0.341 S,D (S) Model 1 0.3413 vQTL5 4 58.6 107.9 1631-A114958T AB×CD 0.341 0.005 0.333 0.321 S,D,I (S,D,I) Model 1 0.3413 vQTL6 5 2.2 102.3 759-G98514T AB×CD 0.328 0.303 0.369 0.000 S,D,I (S,D,I) Model 1 0.3691 vQTL7 5 21.8 42.1 1293-C217703T AB×CD 0.314 0.302 0.307 0.077 S,D,I (S,D,I) Model 1 0.3139 vQTL8 6 3.1 22.6 1836-T655659G AB×CD 0.354 0.202 0.144 0.300 I (I) 0.3537 vQTL9 7 15.4 48.7 AB×CD 0.059 0.271 0.397 0.273 S,I (S,I) Model 3 (1) 0.3974 vQTL10 7 51.8 23.4 1391-A191293T AB×CD 0.135 0.215 0.300 0.350 S (S) 0.3504 vQTL11 9 37.0 106.7 917-C2294A AB×CD 0.010 0.252 0.341 0.397 S,D,I (S,D) Model 1 0.3973 vQTL12 10 1.1 78.3 42794-G8178T AB×CD 0.192 0.370 0.390 0.048 D,I (I) Model 2 (1) 0.3900 vQTL13 10 20.5 130.9 343-C440670A AB×CD 0.187 0.416 0.391 0.005 S,D,I (I) Model 2 (1) 0.4164 # mutations per vQTL per parent 1.27 1.23 Genotype-dependent mortality 0.9996 23×40 vQTL1 1 0.0 98.3 416-G16019T AB×CD 0.331 0.000 0.341 0.329 S,D,I (S,D,I) Model 1 0.3408 vQTL2 1 19.9 52.5 169-G252822T AB×CD 0.328 0.050 0.304 0.318 S,D,I (S,D,I) Model 1 0.3285 vQTL3 1 43.7 68.3 713-T165190A AB×CD 0.312 0.039 0.409 0.240 S,D (S,D) Model 1 0.4088 vQTL4 2 48.0 104.8 925-G106550A AB×CD 0.225 0.421 0.353 0.000 S,D,I (S,I) Model 1 0.4215 vQTL5 3 8.6 24.8 1584-A347927G AB×CD 0.128 0.204 0.304 0.364 S (S) 0.3642 vQTL6 3 21.6 30.3 309-C72293T AB×CD 0.160 0.171 0.241 0.428 S,D,I (S) Model 6 0.4279 vQTL7 4 30.1 21.7 42422-C55681T AB×CD 0.320 0.343 0.199 0.138 S (S) Model 4 (2) 0.3426 vQTL8 5 36.0 44.5 43038-G68303A AB×CD 0.235 0.437 0.091 0.237 S,D (S,D) Model 4 (1) 0.4369 vQTL9 6 3.5 65.2 36674-G15754A AB×CD 0.344 0.392 0.044 0.220 S,D (S) Model 2 (2) 0.3918 vQTL10 7 18.6 140.1 AB×CD 0.568 0.259 0.173 0.000 S,D (S,D) Model 4 (1) 0.5680 Table continues next page. 83 Family Viability Linkage Location LRT Marker a Cross PROC QTL-estimated genotype frequency Genetic Genetic wmax c locus group (cM) type AC AD BC BD effects b model vQTL11 7 41.2 115.0 44-G54908A AB×CD 0.536 0.302 0.138 0.024 S,D (S,D) Model 5 0.5356 vQTL12 9 34.7 28.7 AB×CD 0.123 0.300 0.183 0.394 S,D (D) Model 4 (2) 0.3943 vQTL13 10 2.3 50.5 AB×CD 0.137 0.442 0.314 0.108 S,I (I) Model 2 (1) 0.4417 vQTL14 10 19.3 100.6 950-G656572A AB×CD 0.249 0.404 0.342 0.006 S,D,I (S,I) Model 1 0.4041 # mutations per vQTL per parent 1.04 1.08 Genotype-dependent mortality 0.9996 31×23 vQTL1 1 22.7 21.4 82-A100653G AB×CD 0.218 0.429 0.123 0.229 S,D (S,D) Model 6 0.4294 vQTL2 2 5.1 63.4 34574-T11514A 0.280 0.317 0.393 0.010 S,D,I (D,I) Model 1 0.3928 vQTL3 3 0.0 27.8 42530-T45430C 0.211 0.310 0.381 0.098 D,I (I) 0.3813 vQTL4 3 20.4 89.5 787-A201113C 0.154 0.384 0.462 0.000 D,I (I) Model 2 (1) 0.4616 vQTL5 3 27.9 116.3 0.022 0.479 0.463 0.035 I (I) Model 2 (1) 0.4795 vQTL6 3 33.5 122.0 39946-T13350C 0.007 0.498 0.410 0.085 D,I (I) Model 2 (1) 0.4983 vQTL7 4 46.9 73.6 1525-G81705T AB×CD 0.294 0.319 0.380 0.007 S,D,I (D,I) Model 1 0.3802 vQTL8 6 37.5 21.1 43034-T155315A AB×CD 0.304 0.197 0.372 0.127 D (D) 0.3722 vQTL9 7 19.5 33.7 1469-G309637A AB×CD 0.085 0.158 0.462 0.296 S,I (S) 0.4616 vQTL10 7 46.5 78.1 64-T26204C AB×CD 0.014 0.187 0.362 0.437 S,D (S) Model 2 (2) 0.4366 vQTL11 9 4.9 75.6 AB×CD 0.000 0.352 0.352 0.296 S,D,I (S,D,I) Model 1 0.3521 vQTL12 10 3.9 28.1 1009-G834487C AB×CD 0.380 0.233 0.296 0.092 S,D (D) Model 1 0.3803 vQTL13 10 30.5 80.9 AB×CD 0.260 0.294 0.446 0.000 D,I (D,I) Model 1 0.4459 # mutations per vQTL per parent 1.04 1.40 Genotype-dependent mortality 0.9823 40×92 vQTL1 1 12.7 37.1 AB×CD 0.325 0.362 0.192 0.122 S (S) Model 2 (2) 0.3615 vQTL2 1 53.7 40.3 1559-A50276G AB×CD 0.213 0.423 0.218 0.147 S,D,I (S,I) Model 6 0.4227 vQTL3 2 63.3 122.4 756-G216199A AB×CD 0.281 0.377 0.342 0.000 S,D,I (S,D,I) Model 1 0.3767 vQTL4 3 56.7 67.8 AB×CD 0.311 0.376 0.059 0.254 S,D,I (S,D) Model 1 0.3758 Table continues next page. 84 Family Viability Linkage Location LRT Marker a Cross PROC QTL-estimated genotype frequency Genetic Genetic wmax c locus group (cM) type AC AD BC BD effects b model vQTL5 4 28.3 20.3 1843-T53048C AB×CD 0.204 0.288 0.349 0.160 I (I) 0.3485 vQTL6 6 9.0 141.3 1599-A736309T AB×CD 0.004 0.258 0.374 0.364 S,D,I (S,D,I) Model 1 0.3742 vQTL7 6 33.5 162.9 1333-C234270T AB×CD 0.004 0.197 0.467 0.332 S,I (S,I) Model 3 (2) 0.4674 vQTL8 7 13.3 87.8 43742-A254143T AB×CD 0.042 0.312 0.408 0.238 S,I (S,I) Model 1 0.4079 vQTL9 7 39.0 81.9 AB×CD 0.032 0.341 0.310 0.317 S,D,I (S,D,I) Model 1 0.3405 vQTL10 7 55.9 145.8 1014-A226609T AB×CD 0.000 0.395 0.280 0.325 S,D,I (S,D,I) Model 1 0.3948 vQTL11 8 0.5 142.5 338-G18908C AB×CD 0.296 0.352 0.352 0.000 S,D,I (S,D,I) Model 1 0.3518 vQTL12 9 18.2 285.4 39712-C33111T AB×CD 0.000 0.065 0.745 0.190 S,D,I (S,D,I) Model 3 (2) 0.7453 vQTL13 10 24.5 37.5 1788-C461361T AB×CD 0.188 0.142 0.264 0.405 S,I (S) 0.4053 vQTL14 10 57.2 98.3 40050-G17260A AB×CD 0.308 0.031 0.376 0.285 S,D,I (S,D) Model 1 0.3759 # mutations per vQTL per parent 1.46 1.25 Genotype-dependent mortality 0.9996 47×92 vQTL1 1 37.9 40.3 AB×CD 0.256 0.266 0.429 0.049 D,I (D,I) Model 1 0.4291 vQTL2 2 23.6 103.5 756-A216191T AB×CD 0.000 0.567 0.129 0.304 D,I (D,I) 0.5671 vQTL3 3 26.7 29.4 257-C439776T AB×CD 0.071 0.334 0.334 0.261 S,D,I (I) Model 1 0.3343 vQTL4 4 57.7 73.6 1795-T46848C AB×CD 0.398 0.301 0.301 0.000 S,D,I (S,D) Model 1 0.3982 vQTL5 6 28.6 36.5 79-C932044T AB×CD 0.150 0.381 0.375 0.094 I (I) Model 2 (1) 0.3806 vQTL6 7 14.6 71.7 389-C112915A AB×CD 0.205 0.394 0.394 0.008 S,D,I (I) Model 2 (1) 0.3936 vQTL7 7 39.1 28.3 AB×CD 0.149 0.370 0.370 0.112 I (I) Model 2 (1) 0.3698 vQTL8 8 4.1 77.1 41700-C124063T AB×CD 0.240 0.380 0.380 0.000 S,D,I (I) Model 1 0.3798 vQTL9 9 3.9 85.8 AB×CD 0.165 0.417 0.417 0.000 I (I) Model 2 (1) 0.4174 vQTL10 10 7.0 79.1 AB×CD 0.000 0.400 0.400 0.199 S,D,I (I) Model 2 (1) 0.4003 vQTL11 10 24.5 101.3 962-T6532C AB×CD 0.039 0.457 0.457 0.047 I (I) Model 2 (1) 0.4567 # mutations per vQTL per parent 1.09 1.6 Genotype-dependent mortality 0.9996 Table continues next page. 85 Family Viability Linkage Location LRT Marker a Cross PROC QTL-estimated genotype frequency Genetic Genetic wmax c locus group (cM) type AC AD BC BD effects b model 92×40 vQTL1 1 32.6 36.1 AB×CD 0.181 0.364 0.385 0.070 I (I) Model 2 (1) 0.3850 vQTL2 2 43.3 31.1 1-C182642T AB×CD 0.087 0.441 0.232 0.240 D,I (D,I) Model 3 (1) 0.4409 vQTL3 4 20.7 21.7 364-T636690A AB×CD 0.351 0.352 0.128 0.169 S (S) Model 4 (2) 0.3520 vQTL4 4 69.9 21.7 C34600-G19284A AB×CD 0.405 0.396 0.151 0.049 S (S) Model 2 (2) 0.4047 vQTL5 5 3.5 40.7 AB×CD 0.326 0.371 0.042 0.261 S,D (S) Model 1 0.3706 vQTL6 6 3.3 25.1 544-A23864G AB×CD 0.340 0.080 0.272 0.308 D,I (I) Model 1 0.3400 vQTL7 7 18.5 133.4 AB×CD 0.007 0.492 0.492 0.009 I (I) Model 2 (1) 0.4923 vQTL8 9 3.8 66.0 1674-G159C AB×CD 0.252 0.365 0.374 0.008 S,D,I (I) Model 1 0.3742 vQTL9 10 23.7 67.1 AB×CD 0.000 0.361 0.311 0.328 S,D,I (D,I) Model 1 0.3607 # mutations per vQTL per parent 0.94 1.33 Genotype-dependent mortality 0.9994 a Markers are only shown for vQTL at which a marker is located at the position with the highest LRT. b Genetic effects in parentheses are test results adjusted for simultaneous multiple testing at Bonferroni adjusted-α level and the number of mutations per vQTL per parent were calculated based on genetic effects at Bonferroni adjusted-α level. c Bold numbers are the higher wmax for vQTL that are on the same linkage group but not 50 cM apart and are used to calculate genotype-dependent mortality. 86 87 Figure 8. Viability QTL in six F2 families. 88 Table 15. Genetic models of 65 vQTL in six F2 families Family vQTL LG Count p-value a Genetic model AC AD BC BD AD=BC=BD AC=(AD+BC+BD)/3 23×31 vQTL-1 1 4 60 80 64 0.215 5.45E-14 Model 1 23×31 vQTL-2 1 21 58 63 65 0.818 6.83E-06 Model 1 23×31 vQTL-3 2 0 70 79 59 0.217 8.32E-17 Model 1 23×31 vQTL-4 2 21 53 63 71 0.251 5.96E-06 Model 1 23×31 vQTL-11 9 2 53 71 83 0.035 2.18E-15 Model 1 31×23 vQTL11 9 0 50 50 42 0.641 5.99E-12 Model 1 40×92 vQTL-6 6 1 66 97 94 0.039 9.49E-20 Model 1 40×92 vQTL-8 7 11 80 105 61 0.003 1.27E-13 Model 1 40×92 vQTL-9 7 8 88 80 82 0.817 4.69E-15 Model 1 40×92 vQTL-10 7 0 102 72 84 0.563 1.80E-20 Model 1 47×92 vQTL-3 3 9 42 42 33 0.476 1.28E-05 Model 1 92×40 vQTL-9 10 0 45 39 41 0.788 1.08E-10 Model 1 AC AD BC BD AC=BC=BD AD=(AC+BC+BD)/3 23×31 vQTL-5 4 71 1 69 67 0.939 4.41E-16 Model 1 23×40 vQTL-1 1 60 0 62 59 0.977 8.01E-15 Model 1 23×40 vQTL-2 1 59 9 55 58 0.916 2.95E-09 Model 1 23×40 vQTL-3 1 57 7 74 43 0.018 2.52E-10 Model 1 40×92 vQTL-14 10 80 8 97 73 0.167 3.08E-15 Model 1 92×40 vQTL-6 6 42 10 34 38 0.625 4.60E-05 Model 1 AC AD BC BD AC=AD=BD BC=(AC+AD+BD)/3 40×92 vQTL-4 3 80 97 15 66 0.047 2.32E-11 Model 1 92×40 vQTL-5 5 41 46 5 33 0.306 2.45E-07 Model 1 AC AD BC BD AC=AD=BC BD=(AC+AD+BC)/3 23×31 vQTL-6 5 68 63 77 0 0.497 8.44E-17 Model 1 23×31 vQTL-7 5 65 63 64 16 0.977 8.02E-08 Model 1 23×40 vQTL-4 2 41 76 64 0 0.005 8.01E-15 Model 1 23×40 vQTL-14 10 45 73 62 1 0.036 4.22E-14 Model 1 31×23 vQTL2 2 40 45 56 1 0.240 6.17E-11 Model 1 31×23 vQTL7 4 42 45 54 1 0.432 3.12E-11 Model 1 31×23 vQTL12 10 54 33 42 13 0.078 6.32E-05 Model 1 31×23 vQTL13 10 37 42 63 0 0.015 5.99E-12 Model 1 40×92 vQTL-3 2 73 97 88 0 0.163 1.84E-20 Model 1 40×92 vQTL-11 8 76 91 91 0 0.454 1.80E-20 Model 1 Table continues next page. 89 Family vQTL LG Count p-value a Genetic model AC AD BC BD AC=AD=BC BD=(AC+AD+BC)/3 47×92 vQTL-1 1 33 34 54 6 0.023 5.75E-07 Model 1 47×92 vQTL-4 4 51 38 38 0 0.300 7.70E-11 Model 1 47×92 vQTL-8 8 31 48 48 0 0.085 7.70E-11 Model 1 92×40 vQTL-8 9 32 46 47 1 0.174 5.69E-10 Model 1 AC AD BC BD AC=BC=BD AD=(AC+BC+BD)/3 40×92 vQTL-2 1 55 109 56 38 0.120 2.40E-06 Model 6 AC AD BC BD AC=AD=BC BD=(AC+AD+BC)/3 23×40 vQTL-6 3 29 31 44 77 0.165 4.96E-05 Model 6 AC AD BC BD AC=BC=BD AD=(AC+BC+BD)/3 31×23 vQTL1 1 31 61 17 33 0.079 2.95E-04 Model 6 AC AD BC BD AC=BC AC=BD AD=BC 23×31 vQTL-13 10 39 87 81 1 1.12E-04 1.87E-09 0.686 Model 2 (1) 23×40 vQTL-13 10 25 80 57 20 3.83E-04 0.435 0.048 Model 2 (1) AC AD BC BD AC=AD AC=BD BC=BD 23×40 vQTL-9 6 62 71 8 40 0.455 0.026 4.47E-06 Model 2 (2) AC AD BC BD AC=AD AD=BC BC=BD 23×40 vQTL-10 7 103 47 31 0 4.66E-06 0.081 2.113E-08 Model 4 (1) 23×40 vQTL-11 7 97 55 25 4 6.07E-04 8.97E-04 1.23E-04 Model 5 92×40 vQTL-4 4 51 49 19 6 0.913 2.09E-04 0.011 Model 2 (2) AC AD BC BD AC=AD AC=BD AD=BC 23×31 vQTL-12 10 40 77 81 10 0.001 2.20E-05 0.737 Model 2 (1) 31×23 vQTL4 3 22 55 66 0 1.86E-04 2.92E-06 0.312 Model 2 (1) 31×23 vQTL5 3 3 68 66 5 1.43E-14 0.516 0.844 Model 2 (1) 31×23 vQTL6 3 1 71 58 12 1.82E-16 0.002 0.271 Model 2 (1) 31×23 vQTL10 7 2 27 51 62 4.26E-06 6.11E-14 0.005 Model 2 (2) 47×92 vQTL-5 6 19 48 48 12 3.50E-04 0.209 0.946 Model 2 (1) 47×92 vQTL-6 7 26 50 50 1 0.006 1.55E-06 1 Model 2 (1) 47×92 vQTL-7 7 19 47 47 14 0.001 0.410 1 Model 2 (1) 47×92 vQTL-9 9 21 53 53 0 1.95E-04 4.66E-06 1 Model 2 (1) 92×40 vQTL-1 1 23 45 48 9 0.006 0.013 0.786 Model 2 (1) 92×40 vQTL-7 7 1 62 62 1 1.56E-14 0.882 1 Model 2 (1) Table continues next page. 90 Family vQTL LG Count p-value a Genetic model AC AD BC BD AC=AD AD=BD BC=BD 23×31 vQTL-9 7 12 56 83 57 1.07E-07 0.961 0.029 Model 3 (1) 40×92 vQTL-7 6 1 51 121 86 4.50E-12 0.003 0.015 Model 3 (2) 40×92 vQTL-12 9 0 17 192 49 4.16E-05 7.46E-05 2.63E-20 Model 3 (2) 92×40 vQTL-3 4 44 44 16 21 0.988 0.005 0.399 Model 4 (2) AC AD BC BD AC=AD AC=BC BC=BD 23×40 vQTL-7 4 58 62 36 25 0.714 0.023 0.159 Model 4 (2) 40×92 vQTL-1 1 84 93 49 31 0.476 0.003 0.045 Model 2 (2) AC AD BC BD AC=BC AD=BD BC=BD 92×40 vQTL-2 2 11 55 29 30 0.004 0.007 0.884 Model 3 (1) AC AD BC BD AC=BC AD=BC AD=BD 23×40 vQTL-12 9 22 54 33 71 0.143 0.023 0.129 Model 4 (2) AC AD BC BD AC=BC AC=BD AD=BD 23×40 vQTL-8 5 42 79 17 43 0.001 0.961 0.001 Model 4 (1) AC AD BC BD AC=BD AD=BC AD=BD 47×92 vQTL-10 10 0 51 51 25 4.82E-07 1 0.003 Model 2 (1) 47×92 vQTL-11 10 5 58 58 6 0.763 1 8.03E-11 Model 2 (1) a The p-value is from goodness-of-fit chi-square tests on the relationship between numbers of individuals of different genotypes. Table 16. Summary of genetic models underlying vQTL Genetic model No. of vQTL % vQTL Model 1 34 52.3% Model 2 (1) 14 27.7% Model 2 (2) 4 Model 3 (1) 2 6.2% Model 3 (2) 2 Model 4 (1) 2 7.7% Model 4 (2) 3 Model 5 1 1.5% Model 6 3 4.6% Total 65 91 4. Discussion 4.1 Numbers of scored and distorted markers Scored markers are found to have three mating types (i.e. aa×ab, ab×aa and ab×ab) because grandparental lines used to set up the F1 hybrid families (i.e. parental lines) were not completely inbred (Fig. 1, Introduction). Total numbers of scored markers for vQTL mapping in each family range from 606 to 982 (Table 13), meaning that linkage maps used for vQTL mapping in this study are at least 10× denser than those used previously (Plough and Hedgecock, 2011; Plough, 2012; Plough et al., 2016). Family 23×31 has a higher number of markers because we sequenced this family in libraries of 48 individuals, rather than in libraries of 96 individuals, as we did for the other five families. Unlike previous high-density C. gigas linkage maps, most of which include more markers segregating from only one parent, a higher proportion of bi-parentally segregating markers are found among all scored markers in this study (Hedgecock et al., 2015; Wang et al., 2016). This accords with the fact that regression (RG) and maximum likelihood (ML) maps are more consistent when more bi-parentally segregating markers are included (Hedgecock et al., 2015). The percentage of scored markers with deviations from expected Mendelian segregation ratios are 54-92% and 26-62% at the nominal α=0.05 and Bonferroni adjusted-α levels, respectively (Table 13). Previous studies on marine bivalves have shown proportions of Mendelian segregation distortion ranging from ~15-50%, with a maximum higher than 75% (Gaffney and Scott, 1984; Mallet et al., 1985; Foltz, 1986; Beaumont et al., 1988; Beaumont, 1991; Hu et al., 1993; Hu and Foltz, 1996; McGoldrick and Hedgecock, 1997; Bierne et al., 1998; Lallias et al., 2007a, b, 2009; Sauvage et al., 2010; Harrang et al., 2015; Plough, 2016). Proportions of distorted markers at Bonferroni adjusted-α level in this study fall in the range 92 reported in previous studies. However, compared to vQTL mapping with low-density linkage maps (e.g. Plough and Hedgecock, 2011), a higher proportion of markers are distorted at α=0.05 and a wider range of distorted marker proportions is observed at Bonferroni adjusted-α level in our study. The reduction in proportions of distorted markers from α=0.05 to Bonferroni adjusted- α levels ranges from ~19-52%. Extremely high proportions of distorted markers at α=0.05 may be generated by an increased Type I error in multiple simultaneous tests, while proportions of Mendelian segregation distortion at Bonferroni adjusted-α level may be overcorrected. Thus, numbers and proportions of distorted markers at α=0.05 and Bonferroni adjusted-α levels are both reported for reference here. 4.2 Detecting and localizing vQTL The number of vQTL ranges from nine to fourteen in six families, which is consistent with the 14-15 vQTL identified in Plough and Hedgecock (2011) and the 8-12 vQTL found in Plough et al. (2016). Thus, the high-density linkage maps used in this study did not detect more vQTL than what was detected by low-density linkage maps and previous linkage-map based estimates of genetic load are substantiated by this study. The cumulative genetic inviability calculated across all independent vQTL in each family ranges from 98.23-99.96%, which agrees with previous estimates (Plough and Hedgecock, 2011; Plough et al., 2016). In each family, vQTL are distributed across eight (23×31, 31×23, 92×40) or nine (23×40, 40×92, 47×92) out of ten linkage groups. Distributions of vQTL on the C. gigas genome differ across families and no consistent pattern of vQTL distribution is detected even for reciprocal families (23×31 versus 31×23, 40×92 versus 92×40). The wide distribution of vQTL across different linkage groups also accords with previous findings (Plough and Hedgecock, 2011; Plough, 2012; Plough et al., 2016). 93 4.3 Magnitudes of gene effects and modes of gene action Use of high-density linkage maps does improve the resolution of vQTL and classification of their gene actions compared to previous vQTL mapping with low-density maps. With high- density linkage maps, PROC QTL-estimated genotype frequencies are available for every 0.5 cM under each vQTL peak on average, making it possible to identify causal factors for vQTL at a fine scale. Fitting of data to six genetic models for explaining the genetic bases of vQTL suggests, first, that viability mutations are common enough to overlap in their effects on linked markers and, second, that these mutations show recessive, additive, and dominant modes of gene action. Of 65 vQTL fit to models, 80% are caused by one or two recessive mutations and 6% of vQTL arise from one recessive and one additive mutations, indicating that most deleterious mutations causing mortality are recessive. A single recessive mutation contributes to more than half of all vQTL, showing that IBD deleterious alleles are the major driver of the high early mortality in inbred families of Pacific oysters, as found previously (Launey and Hedgecock, 2001; Plough and Hedgecock, 2011). Three vQTL are caused by one dominant mutation, implying that highly deleterious dominant viability mutations arise occasionally. We would expect such mutations to be quickly eliminated from populations by natural selection. In addition to whether a mutation is recessive, additive, or dominant (i.e. the type of mutations), how many mutations are present (i.e. the number of mutations) and on which chromosomes these mutations are located (i.e. the location of mutations) also matter. Models 2, 3 and 4 all have two sub-models resulting in different frequencies of four genotypes. Sub-models under Models 3 and 4 and Model 2 demonstrate effects of the number and location of deleterious alleles on observed genotype frequencies at vQTL, respectively. Compared to vQTL mapping with low-density linkage maps, from which one can only roughly estimate the type and relative 94 position of deleterious mutations from the selection coefficient (s), the dominance deviation (h) and the recombination distance (c) in a two-locus selection model, vQTL mapping with high- density linkage maps can clearly determine the type, number and location of deleterious alleles causing Mendelian segregation distortion. Therefore, high-density linkage maps are more powerful in uncovering genetic mechanisms underlying the high early mortality in the Pacific oyster. Seven types of genetic effects are identified with contingency chi-square tests. A combination of sire, dam and interaction effects and a single interaction effect account for the most vQTL at α=0.05 and Bonferroni adjusted-α levels, respectively, suggesting that both paternal and maternal parents contribute genetic factors to the high early mortality of offspring. A correspondence can be established between some genetic models and genetic effects, which can verify genetic models (Models 2, 3, 4, 5) biologically and help trace the source of a deleterious allele. Viability QTL assigned to Model 2 (1) all show interaction effects because parents carry identical deleterious mutations in the same way and each of four chromosomes contribute one deleterious allele. Sire effects are detected at vQTL caused by Model 2 (2) because two deleterious alleles are located on one chromosome in one parent. Interaction effects plus sire, dam or sire and dam effects are found for Model 3 because parents carry different mutations and thus contribute to Mendelian segregation distortion differently. Likewise, vQTL assigned to Model 4 (1) and Model 4 (2) are found to be affected by the combination of sire and dam effects and sire/dam effects, respectively, because each parent in Model 4 (1) contributes one deleterious allele and only one parent in Model 4 (2) carries a mutation. Sources of chromosomes carrying a mutation can be determined by whether significant sire or dam effects 95 are detected. Similarly, both sire and dam effects appear significant at vQTL caused by Model 5 because both parents contribute deleterious mutations. With genetic effects and genetic models, average numbers of mutations are calculated. The number of mutations is estimated for each parent at each vQTL rather than for each parent because we are unable to assign some vQTL to any genetic models (Table 14). If we assume the average number of mutations per vQTL per parent holds true for vQTL without a genetic model, average numbers of mutations carried by a parent estimated from genetic effects and genetic models across six families are 14 and 16, respectively, which are within the range of previous estimates (Launey and Hedgecock, 2001; Plough et al., 2016). Since genetic models clearly show the number of mutations on each chromosome for both parents, the average number of mutations in each family estimated with genetic models is more accurate. 4.4 Improvements achieved by adopting high-density linkage maps With high-density linkage maps, vQTL peaks are more narrowly localized in our study compared to previous studies (cf. Fig. 8 here with Fig. 1 in Plough and Hedgecock (2011), Fig. 2 in Plough (2012) and Fig. 1 in Plough et al. (2016)). The vQTL peaks caused by single recessive deleterious mutations tend to be sharp (Table 14, Fig. 8). On the basis of this, the number of candidate genes responsible for mortality arising from IBD recessive deleterious alleles to test will be greatly reduced. In our case, genomic regions covered by sharp vQTL peaks resulting from Model 1 ranges from 0.4 to 16.2 cM, with an average of 8.2 cM. If we assume genes are evenly distributed on the genome, then the gene density is estimated to be ~52 genes per cM (28,027 genes/545 cM) in the Pacific oyster (Zhang et al., 2012). Thus, an average of 426 candidate genes need to be tested under each vQTL. However, chances are that the number of 96 candidate genes for further tests are overestimated because, more practically, it may be reasonable to only test genes located at positions with super high LRT. In addition to better localizing vQTL caused by IBD recessive deleterious mutations, high- density linkage maps make it possible to tease apart multiple deleterious mutations and their genetic effects under broad peaks. For example, vQTL13 in family 23×31 has a broad peak with a left shoulder (Fig. 8, Fig. 9). When checking genotype frequencies under this vQTL peak, genotypes aa and bb become more and less frequent, respectively, from position 430 cM to position 448 cM (Fig. 9). This demonstrates that genotypes aa and bb are selected against at position ~432 cM and position ~446 cM, respectively, suggesting that two recessive deleterious alleles located at two positions cause this vQTL peak (Fig. 9). The frequency of aa is higher than that of bb in general, meaning that the selection against bb is stronger than that against aa (Fig. 9). While vQTL mapping with low-density linkage maps also generates broad vQTL peaks, the specific genetic mechanisms underlying these vQTL cannot be determined as readily. 97 Figure 9. Viability QTL13 in family 23×31. 40 60 80 100 120 140 415 425 435 445 455 LRT cM A 0.16 0.17 0.18 0.19 0.20 0.21 430 435 440 445 450 Genotype frequency of aa cM B aa 0.00 0.01 0.02 0.03 0.04 0.05 430 435 440 445 450 Genotype frequency of bb cM C bb 98 4.5 Comparisons of vQTL across six families Our study is based on six, interrelated F2 families, among which there are two pairs of reciprocal families (23×31 versus 31×23 and 40×92 versus 92×40), so we were expecting to see common vQTL shared by related families. However, no pattern in numbers or distributions of vQTL across different families is detected (Table 14). This holds true even for reciprocal families. Likewise, we are unable to identify any common vQTL shared by multiple families. While the inconsistency in numbers/distributions of vQTL across six families and the lack of common vQTL shared by multiple families suggest a large load of deleterious mutations in the Pacific oyster, the insufficient power to detect common vQTL should not be ignored. We were unable to obtain genotype information on grandparental lines, which makes it difficult to trace deleterious alleles causing vQTL back to a common grandparent. In addition, the grandparental lines used to set up F2 families in this study undergo only one or two generations of inbreeding, meaning that they are not purely inbred (Fig. 1, Introduction). Due to the low degree of inbreeding, chromosomes and alleles inherited by different F2 offspring from their common ancestors may not even be identical. In this case, there is limited ability to detect common vQTL across families. Therefore, the power to detect the relationship of vQTL across multiple related families may be improved by including genotypic data on grandparental lines in analyses and increasing the inbreeding coefficient of founders in future studies. 99 References Beaumont, A.R., Beveridge, C.M., Budd, M.D., 1983. Selection and heterozygosity within single families of the mussel Mytilus-Edulis (L.). Mar. Biol. Lett. 4: 151–161. Beaumont, A.R., Beveridge, C.M., Barnet, E.A., Budd, M.D., Smyth-Chamosa, M., 1988. Genetic studies of laboratory reared Mytilus edulis. I. Genotype specific selection in relation to salinity. Heredity 61: 389-400. Beaumont, A.R., 1991. Genetic studies of laboratory reared mussels, Mytilus edulis: heterozygote deficiencies, heterozygosity and growth. Biol. J. Linn. Soc. 44: 273–285. Bentley, D.R., Balasubramanian, S., Swerdlow, H.P., Smith, G.P., Milton, J., et al., 2008. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456: 53–59. Bierne, N., Launey, S., Naciri-Graven, Y., Bonhomme, F., 1998. Early effect of inbreeding as revealed by microsatellite analyses on Ostrea edulis larvae. Genetics 148: 1893–1906. Elshire, R.J., Glaubitz, J.C., Sun, Q., Poland, J.A., Kawamoto, K., Buckler, E.S., Mitchell, S.E., 2011. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLOS ONE 6, e19379 Foltz, D.W., 1986. Null alleles as a possible cause of heterozygote deficiencies in the oyster Crassostrea virginica and other bivalves. Evolution 40: 869–870. Gaffney, P.M., Scott, T.M., 1984. Genetic heterozygosity and production traits in natural and hatchery populations of bivalves. Aquaculture 42: 289–302. Harrang, E., Heurtebise, S., Faury, N., Robert, M., Arzul, I., Lapègue, S., 2015. Can survival of European flat oysters following experimental infection with Bonamia ostreae be predicted using QTLs? Aquaculture 448: 521–530. Hedgecock, D., Davis, J.P., 2007. Heterosis for yield and crossbreeding of the Pacific oyster Crassostrea gigas. Aquaculture 272, Supplement 1: S17–S29. Hedgecock, D., Shin, G., Gracey, A.Y., Den Berg, D.V., Samanta, M.P., 2015. Second- generation linkage maps for the Pacific oyster Crassostrea gigas reveal errors in assembly of genome scaffolds. G3-Genes Genomes Genet. 5: 2007–2019 Hedrick, P.W., Muona, O., 1990. Linkage of viability genes to marker loci in selfing organisms. Heredity 64: 67–72. Hu, Y.-P., Lutz, R.A., Vrijenhoek, R.C., 1993. Overdominance in early life stages of an American oyster strain. J. Hered. 84: 254–258. Hu, Y.P., Foltz, D.W., 1996. Genetics of scnDNA polymorphisms in juvenile oysters (Crassostrea virginica). 1. Characterizing the inheritance of polymorphisms in controlled crosses. Mol. Mar. Biol. Biotechnol. 5: 123–129. 100 Lallias, D., Lapègue, S., Hecquet, C., Boudry, P., Beaumont, A.R., 2007a. AFLP-based genetic linkage maps of the blue mussel (Mytilus edulis). Anim. Genet. 38: 340–349. Lallias, D., Beaumont, A.R., Haley, C.S., Boudry, P., Heurtebise, S., Lapègue, S., 2007b. A first- generation genetic linkage map of the European flat oyster Ostrea edulis (L.) based on AFLP and microsatellite markers. Anim. Genet. 38: 560–568. Lallias, D., Gomez-Raya, L., Haley, C.S., Arzul, I., Heurtebise, S., Beaumont, A.R., Boudry, P., Lapègue, S., 2009. Combining two-stage testing and interval mapping strategies to detect QTL for resistance to bonamiosis in the European flat oyster Ostrea edulis. Mar. Biotechnol. 11: 570– 584. Lander, E.S., Botstein, D., 1989. Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121: 185–199. Langdon, C., Evans, F., Jacobson, D., Blouin, M., 2003. Yields of cultured Pacific oysters Crassostrea gigas Thunberg improved after one generation of selection. Aquaculture 220: 227– 244. Launey, S., Hedgecock, D., 2001. High genetic load in the Pacific oyster Crassostrea gigas. Genetics 159: 255–265. Luo, L., Xu, S., 2003. Mapping viability loci using molecular markers. Heredity 90: 459–467. Mallet, A.L., Zouros, E., Gartner-Kepkay, K.E., Freeman, K.R., Dickie, L.M., 1985. Larval viability and heterozygote deficiency in populations of marine bivalves: evidence from pair matings of mussels. Mar. Biol. 87: 165–172. Manichaikul, A., Mychaleckyj, J.C., Rich, S.S., Daly, K., Sale, M., Chen, W., 2010. Robust relationship inference in genome-wide association studies. Bioinformatics 26: 2867–2873. Marshall, T.C., Slate, J., Kruuk, L.E., Pemberton, J.M., 1998. Statistical confidence for likelihood-based paternity inference in natural populations. Mol. Ecol. 7: 639–655. McGoldrick, D.J., Hedgecock, D., 1997. Fixation, segregation and linkage of allozyme loci in inbred families of the Pacific oyster Crassostrea gigas (Thunberg): Implications for the causes of inbreeding depression. Genetics 146: 321–334. Parker, L.M., Ross, P.M., O’Connor, W.A., 2010. Comparing the effect of elevated pCO2 and temperature on the fertilization and early development of two species of oysters. Mar. Biol. 157: 2435–2452. Phillips, N.E., 2002. Effects of nutrition-mediated larval condition on juvenile performance in a marine mussel. Ecology 83: 2562–2574. Piepho, H.P., 2001. A quick method for computing approximate thresholds for quantitative trait loci detection. Genetics 157: 425–432. Plough, L.V., Hedgecock, D., 2011. Quantitative trait locus analysis of stage-specific inbreeding depression in the Pacific oyster Crassostrea gigas. Genetics 189: 1473–1486. 101 Plough, L.V., 2012. Environmental stress increases selection against and dominance of deleterious mutations in inbred families of the Pacific oyster Crassostrea gigas. Mol. Ecol. 21: 3974–3987. Plough, L.V., Shin G., Hedgecock D., 2016. Genetic inviability is a major driver of type III survivorship in experimental families of a highly fecund marine bivalve. Mol. Ecol. 25: 895– 910. Plough, L.V., 2016. Genetic load in marine animals: a review. Curr. Zool. 62: 567–579. Rice, W.R., 1989. Analyzing tables of statistical tests. Evolution 43: 223–225. Sauvage, C., Boudry, P., de Koning, D.-J., Haley, C.S., Heurtebise, S., Lapègue, S., 2010. QTL for resistance to summer mortality and OsHV-1 load in the Pacific oyster (Crassostrea gigas). Anim. Genet. 41: 390–399. Thiriot-Quiévreux, C., Pogson, G.H., Zouros, E., 1992. Genetics of growth rate variation in bivalves: aneuploidy and heterozygosity effects in a Crassostrea gigas family. Genome 35: 39– 45. Wang, J., Li, L., Zhang, G., 2016. A high-density SNP genetic linkage map and QTL analysis of growth-related traits in a hybrid family of oysters (Crassostrea gigas×Crassostrea angulata) using genotyping-by-sequencing. G3-Genes Genomes Genet. 6: 1417–1426. Zhang, G., Fang, X., Guo, X., Li, L., Luo, R., et al., 2012. The oyster genome reveals stress adaptation and complexity of shell formation. Nature 490: 49–54. 102 Chapter 4 Uncovering the Genetic Bases of Growth Heterosis in the Pacific Oyster Crassostrea gigas Abstract Heterosis (hybrid vigor) has been widely observed in plants and animals, but debate over its genetic causes has lasted for more than a hundred years without resolution. Previous analyses of experimental crosses of inbred lines demonstrated growth heterosis in the Pacific oyster (Hedgecock and Davis, 2007) but did not resolve long-standing alternative genetic hypotheses, dominance and overdominance, for heterosis. To elucidate the genetic mechanisms of growth heterosis, we conducted quantitative trait locus (QTL) mapping on six, interrelated F2 families of Pacific oysters, using single nucleotide polymorphisms (SNPs) detected by Illumina sequencing of reduced-representation genomic libraries (genotyping-by-sequencing, GBS). High-density linkage maps with an average interval between SNPs of 0.49 cM provide frameworks for QTL mapping within each family. A suite of size- and growth-related phenotypes derived from five, monthly live weights measured on more than 1000 individually tagged oysters during their second summer post fertilization. We first discriminate parent-line and hybrid genotypes by comparing genetic similarities among the four, phased F1 parent haplotypes established by linkage mapping. Then, we determine the genetic architecture of growth heterosis by comparing phenotypes corresponding to the parent-line and hybrid genotypes that segregate in F2 families at each QTL. We detected 44 and 27 QTL for size- and growth-related traits (trait-QTL) and principal components (PC-QTL), respectively. Among 44 trait-QTL, 28 and 16 contribute to individual variation in single (i.e. minor trait-QTL) and multiple (i.e. major trait-QTL) traits, 103 respectively. By removing redundant overlapping trait-QTL and PC-QTL, we obtain 49 growth QTL (gQTL) consisting of remaining trait-QTL and PC-QTL. Twenty-eight out of 49 gQTL show overdominant, dominant, and additive gene effects, but only non-additive effects can explain growth heterosis. Results provide evidence for the two, classical, alternative hypotheses for heterosis, overdominance and dominance; however, cases of negative dominance require genetic architectures that are more complex than the classical hypotheses. The rate of log- transformed live-weight increase from September to October is smaller than that from July to September, indicating that growth slows in response to lower fall temperatures. Interestingly, genotypes of superior individuals differ under low and high temperatures, suggesting genotype- by-environment interaction. 1. Introduction Heterosis, first perceived by Darwin (1877) and then reported and introduced by Shull (1908, 1914), refers to the phenomenon that progeny of two inbred parental lines perform better (i.e. exhibit greater biomass or yield, speed of development, or fertility) than the best parent (Shull, 1948). Commonly observed in plants, especially crops, and animals, heterosis has been widely exploited in plant and animal breeding. The genetic basis of heterosis has been broadly explored since then. So far, genetic explanations for heterosis include dominance, the complementation of deleterious recessive mutations by dominant alleles in hybrids, overdominance, the superiority caused by interactions between alleles in heterozygous states, and epistasis, the interactions between different loci affecting a trait (Bruce, 1910; Crow, 1948, 1998; Stuber et al., 1992; Xiao et al., 1995; Cockerham and Zeng, 1996; Birchler et al., 2003, 2005, 2006, 2010). Even though genetic causes for heterosis have been extensively investigated 104 since it was proposed, debate over two, classical, alternative hypotheses, dominance and overdominance, has lasted for over a hundred years without reaching a consensus. Heterosis in fitness-related traits was first reported in bivalve molluscs by Singh and Zouros (1978) and then demonstrated by Hedgecock et al. (1995) and Hedgecock and Davis (2007) in the Pacific oyster. Hedgecock et al. (1995) discovered both positive and negative heterosis in growth, suggesting epistasis as a potential explanation for heterosis. Meanwhile, studies on Mendelian segregation distortion revealed a high genetic load of recessive deleterious mutations in the Pacific oyster, supporting the hypothesis of dominance (Launey and Hedgecock, 2001; Plough and Hedgecock, 2011; Plough, 2012). A subsequent transcriptomic analysis showed overdominant, underdominant, dominant and partially dominant patterns of gene expression in hybrids relative to inbred parents (Hedgecock et al., 2007). Additionally, Pace et al. (2006) found that physiological mechanisms, such as higher feeding rate and more efficient metabolism, contribute to growth heterosis in oyster larvae. Previous studies have greatly enhanced our understanding of heterosis in fitness-related traits by illuminating genetic and physiological mechanisms at genomic, transcriptomic, metabolic and individual levels. However, genetic causes for growth heterosis in adult bivalve molluscs have rarely been studied at the genomic level. Quantitative trait locus (QTL) mapping is a powerful genomic approach to constructing genetic architecture underlying variation in fitness-related traits (Stuber et al., 1992; Xiao et al., 1995; Li et al., 2001; Luo et al., 2001; Plough and Hedgecock, 2011; Plough, 2012; Plough et al., 2016). QTL mapping has been widely conducted on growth-related traits in bivalve molluscs including oyster, scallop, abalone and clam (Qin et al., 2007; Baranski et al., 2008; Zhan et al., 2009; Guo et al., 2012; Petersen et al., 2012; Li et al., 2012; Li and He, 2014; Ren et al., 2016; 105 Wang et al., 2016; Nie et al., 2017; Niu et al., 2017). However, while these studies have successfully detected genomic regions accounting for individual growth variation in molluscs, few of them have explored the genetic bases underlying growth QTL (gQTL). Here, improving upon these previous studies, we conduct QTL mapping on six, interrelated F2 families of Pacific oysters, using single nucleotide polymorphisms (SNPs) detected by genotyping-by-sequencing (GBS). The phenotypes mapped are a suite of size- and growth-related traits derived from five, monthly live weights measured on more than 1000 individually tagged oysters during their second summer post fertilization. After detecting QTL for size- and growth-related traits, we determine the genetic architecture of growth heterosis by comparing phenotypes corresponding to the F1 hybrid and parent-line genotypes that segregate in the F2 families at each QTL. We first discriminate parent-line and hybrid genotypes, by comparing genetic similarities among the four, phased F1 parent haplotypes established by linkage mapping. In many cases, parent-line genotypes determined by genetic similarity conform to patterns of mortality associated with identical-by-descent (IBD) recessive deleterious mutations discovered in viability QTL mapping (Chapter 3). Then, we fit each QTL to a genetic model according to results from ANOVA and a series of post hoc tests on trait values corresponding to hybrid and parent-line genotypes. Our intent, in fitting these genetic models, is to reveal the gene action at gQTL and to test the long-standing explanations for heterosis, dominance and overdominance. Meanwhile, the rate of log-transformed live-weight increase in July and August is greater than that in September and October. The temperature in July and August is higher than that in September and October, suggesting that the growth of Pacific oysters slows in response to the 106 decreasing temperature. Further tests show that genotypes of fast-growing individuals differ under low and high temperatures, implying genotype-by-environment interaction. 2. Materials and Methods 2.1 Biological materials Our study was based on six, interrelated F2 families (sire×dam), 23×31, 23×40, 31×23, 40×92, 47×92 and 92×40. Each F2 family was derived from an intercross of full-sib F1 hybrids, which, in turn, were produced by crosses of partially inbred lines 23, 31, 40, 47 and 92 in 2009. We made these F2 families in May and June of 2011 and reared them through the summer of 2012 in Thorndyke Bay, WA. Detailed pedigree and rearing information is described and shown in 3. Pedigrees and Rearing of Six, F2 Interrelated Families and Figure 1 in Introduction. 2.2 Data collection We took monthly measurements of live weights (lw) for oysters in each family (i.e. lw1, lw2, lw3, lw4 and lw5) on dates indicated in Table 23 and harvested six F2 families in September or October of 2012. Only four monthly live weights were measured for family 47×92 because this family did not get large enough for being tagged and weighed until July. We determined genotypes of F1 hybrid parent oysters and of progeny from the six, F2 families, using genotyping-by-sequencing (GBS) and bioinformatics analyses guided by Genome Analysis Toolkit (GATK, https://www.broadinstitute.org/gatk/) (Bentley et al., 2008; Elshire et al., 2011). Details on GBS (library preparation and sequencing) and GATK analysis were described in Chapter 2. Finally, we conducted gQTL mapping with the same set of 12 parent oysters from six, F1 families and 1,041 progeny from six, F2 families (i.e. 23×31: n=208; 23×40: n=181; 31×23: n=142; 40×92: n=258; 47×92: n=127; 92×40: n=125) that were used in linkage mapping in Chapter 2. 107 Temperature was recorded every 30 min from 12:00 pm on July 2, 2012 to 00:00 am on October 19, 2012 at an approximate +1.0 ft tidal elevation for Thorndyke Bay, WA, where the six F2 families were reared. The daily highest, lowest and mean temperatures were determined and compared among July, August, September and October. 2.3 Data analyses 2.3.1 Measured and derived size and growth phenotypes We first took logarithms (base 10) of the five monthly live weights to generate five, size- at-age measures, lglw1, lglw2, lglw3, lglw4 and lglw5. Then, by subtracting the log-transformed live weight at one time point from that at the next time point, we derived four monthly growth gains between adjacent live weights, lglw21, lglw32, lglw43 and lglw54. We also derived a fifth growth gain between the first and fourth live weight measures, lglw41. Finally, we obtained deviations in intercepts and slopes of growth curves fitted to log-transformed live weights. We fitted a growth curve to live weights of each individual in a family to obtain individual growth curves. Similarly, we fitted a growth curve to live weights of all individuals in a family to get a family-specific grand growth curve. For each individual in a specific family, we used the deviation between intercepts and slopes of corresponding individual and grand growth curves. Since size- and growth-related traits correlate with each other, we conducted principal component analysis (PCA) on the foregoing measured and derived traits, with sex as a supplementary variable, in order to reduce redundancy in QTL mapping. Alongside the measured and derived traits, we used the first four principal components (i.e. PC1, PC2, PC3 and PC4) from this PCA for QTL mapping. 108 2.3.2 Growth QTL mapping Using JoinMap 4.1, we constructed 60, high-density linkage maps, with an average marker spacing of 0.49 cM, for the six, F2 families, according to methods described in Chapter 2. We mapped gQTL using the QTL model developed by Hu and Xu (2009), which is implemented in PROC QTL, a user-defined procedure in SAS (version 9.4). Inputs for gQTL mapping included phased parent genotypes, unphased progeny genotypes, linkage maps, and the progeny size and growth phenotypes described above. In running PROC QTL, we encountered the same errors as described in Chapter 3 and rectified these errors by removing markers at the point in the data where the error was generated. Genomes were scanned in 1-cM increments using the Maximum Likelihood method with distortion analysis. Since original parent lines were not completely inbred, we set the mating type of each F2 family to the four-way cross, AB×CD. A likelihood ratio test (LRT) statistic was generated for each increment on the genome. We approximated genome-wise and chromosome-wise QTL significance thresholds, at the α=0.05 level, by the method of Piepho (2001). When more than one QTL peak appeared on a linkage group, we identified separate gQTL peaks only when the LRT statistic between two QTL peaks fell by at least 4.6 (~1 LOD) (Lander and Botstein, 1989). If a gQTL included multiple QTL peaks, the position on the genome with the highest LRT value was taken as the location for that gQTL. 2.3.3 Genetic models underlying growth QTL After detecting QTL for both measured and derived size- and growth-related traits, we identified genetic models underlying them. In PROC QTL, cross type of F1 hybrid family was set as AB×CD, so a possible example of inbred parent lines could be AC and BD (Fig. 10). Through full-sibling crossing within a F1 family, four genotypes, AC, AD, BC and BD, should segregate in F2 families. In the example shown in Fig. 10, AC and BD are parent-line genotypes while AD 109 and BC are hybrid genotypes. Since we do not have genotypic data on inbred parent lines, we first discriminated parent-line and hybrid genotypes segregating in the six, F2 families, by comparing genetic similarities among the four, phased F1 parent haplotypes established by linkage mapping. For each pair of phased F1 parent haplotypes being compared in each linkage group, we calculated genetic similarity as the number of loci with identical alleles divided by the total number of loci. We reasoned that the four, phased F1 haplotypes should comprise two pairs of haplotypes derived from the two grandparents (inbred parent lines) and that the haplotypes derived from one grandparent should be more similar to each other than to either of the haplotypes derived from the other grandparent (Fig. 10). 110 We then conducted ANOVA on the corresponding trait for each gQTL to estimate parameter values and to obtain p-values for analyzing genetic mechanisms. ANOVA models for trait-QTL and PC-QTL are phenotype = genotype + sex + genotype sex and phenotype = genotype, respectively. For ANOVA on size or growth traits, we removed hermaphroditic and undifferentiated oysters in some families because not all four genotypes are present in these two A C D B A B C D A C B D A D B C Parent-line Genotypes (High Haplotype Similarity) Hybrid Genotypes (Low Haplotype Similarity) Inbred Parent Lines (Parent-line Genotypes) F1 Family F2 Family Figure 10. Expected genetic similarity of parent-line and hybrid haplotypes. 111 categories and this made the genotype-by-sex interaction inestimable. For ANOVA on principal components, we did not include sex as a cofactor, because the PCA already included sex as a supplementary variable or cofactor. From ANOVA, we obtained the absolute difference between the mean trait values corresponding to two parent-line genotypes (L), calculated twice the deviation in the trait value of a hybrid genotype from the mean trait value of the two parent-line genotypes (Q), and estimated potence as hp = Q/L (Griffing, 1990; Hedgecock et al., 1995). We then conducted a series of post hoc tests of whether the trait value corresponding to a hybrid genotype was significantly different from those of the two parent-line genotypes or the mid- parent value (Hedgecock et al., 2007). Finally, according to the results of these post hoc tests (Table 24), we assigned each gQTL to one of seven genetic models, overdominance (OD), underdominance (UD), dominance+ (D+), dominance- (D-), partial dominance+ (PD+), partial dominance- (PD-) and additive (ADD) effects. 2.3.4 Genotype-by-environment interaction analysis To test genotype-by-environment interaction, we first compared the average, highest and lowest daily temperatures at Thorndyke Bay, WA of each month from July 2 to October 18 of 2012, using one-way ANOVA with month as independent variable and temperature as dependent variable. Then, we calculated the log-transformed live-weight increase per day (i.e. rate of gain in log-transformed live weight) between the second and fourth measurements (i.e. lglw42) and between the fourth and fifth measurements (i.e. lglw54). Using one-way ANOVA, we then tested whether growth rates during these two time intervals were independent of genotype. 112 3. Results 3.1 Identifying “Parent-line” and “Hybrid” genotypes in the F2 families Genetic similarity is strongly dependent on the pair of phased F1 parent haplotypes being compared in 58 out of 60 chi-square tests across ten linkage groups in six F2 families (i.e. 1 chi- square test per linkage group×10 linkage groups×6 F2 families) (Table 17). LG1 in 23×31 illustrates this dependence (Fig. 11). Since partially inbred parent lines are used to set up F1 hybrid family, the pair of haplotypes of high genetic similarity should come from the same inbred parent line. With this logic, we are able to determine parent-line genotypes for 56 out of 60 linkage groups. In LG3 and LG4 in 23×40, genetic similarity is independent of the pair of F1 parent haplotypes being compared (Table 17, shaded cells). Even though LG5 in 23×40 and LG1 in 31×23 (Table 17, shaded cells) show dependence of genetic similarity on pairs of F1 parent haplotypes, the pattern is not two similar and two dissimilar pairwise haplotype comparisons, as in Fig. 11, but three similar comparisons and one dissimilar comparison. Thus, we cannot identify parent-line genotypes for LG3, LG4 and LG5 in 23×40 and LG1 in 31×23. Genetic similarity for parental (i.e. parent-line genotypes) and non-parental (i.e. hybrid genotypes) pairs of haplotypes, classified by the chi-square tests, is well separated by 0.5 (Fig. 12), suggesting that the method we use to determine parent-line genotype is reliable. 113 Table 17. Chi-square tests of independence of genetic similarity across four pairs of F1, phased haplotypes for all linkage groups in six, F2 families 23×31 a 23×40 31×23 40×92 47×92 92×40 LG1 p-value 6.02E-82 2.74E-32 1.00E-03 3.20E-60 1.66E-53 1.13E-44 # markers 135 75 60 111 64 61 LG2 p-value 2.76E-70 6.22E-16 1.16E-28 3.7866E-14 6.22E-12 6.78E-07 # markers 91 42 51 66 30 30 LG3 p-value 1.45E-85 0.651 1.17E-68 2.88E-62 5.36E-63 2.35E-30 # markers 176 88 93 109 73 92 LG4 p-value 1.05E-59 0.471 8.48E-17 2.33E-52 1.39E-78 7.98E-42 # markers 102 65 70 75 91 88 LG5 p-value 1.20E-66 7.48E-04 1.36E-65 1.07E-14 4.80E-27 3.16E-05 # markers 83 59 76 57 65 41 LG6 p-value 1.10E-112 2.39E-18 1.93E-74 3.63E-108 6.31E-47 6.25E-54 # markers 144 41 125 131 73 88 LG7 p-value 6.74E-55 4.10E-53 1.77E-62 5.54E-29 1.18E-71 1.55E-59 # markers 78 114 80 71 83 69 LG8 p-value 3.39E-30 3.11E-14 1.39E-04 3.44E-36 1.55E-59 7.98E-29 # markers 37 67 38 42 69 56 LG9 p-value 2.39E-67 5.09E-09 3.42E-68 3.39E-30 2.01E-47 1.57E-38 # markers 80 45 79 37 55 62 LG10 p-value 5.85E-91 5.12E-36 3.80E-53 1.79E-50 6.13E-57 9.84E-67 # markers 113 70 94 89 66 133 a For each linkage group, genetic similarity is compared for four pairs of F1, phased haplotypes (i.e. A vs. C, A vs. D, B vs. C, B vs. D, where A and B (C and D) represent two phased haplotypes from the F1 paternal (maternal) parent). Shaded cells indicate that (1) according to chi-square tests, the genetic similarity between two F1, phased haplotypes does not differ significantly across four pairs of F1, phased haplotypes for LG3 and LG4 in family 23×40 at α=0.05, and (2) the pattern is not two similar and two dissimilar pairwise haplotype comparisons for LG5 in 23×40 and LG1 in 31×23, even though they show dependence of genetic similarity on pairs of F1, phased haplotypes. 114 Figure 11. Chi-square test of independence of genetic similarity across four pairs of F1, phased haplotypes for LG1 in family 23×31. A and B (C and D) stand for two phased haplotypes from the F1 paternal (maternal) parent. “Identical” and “Non-identical” stand for numbers of loci with identical and non-identical alleles between the two F1, phased haplotypes being compared. Figure 12. Distribution of genetic similarity between haplotypes in parent-line and hybrid genotypes. Parent-line and hybrid categories are determined by significant associations in the contingency chi-square tests of Table 17; we cannot determine parent-line genotypes for LG3, LG4 and LG5 in 23×40 and LG1 in 31×23, so these 16 haplotype comparisons are designated “undetermined”. 3.2 Principal component analysis The first four principal components account for over 96% of variance in size- and growth- related traits (i.e. log-transformed live weights, increments between adjacent log-transformed 0 10 20 30 40 50 60 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 Number of Comparisons Genetic Similarity Parent-Line Hybrid Undetermined Identical Non-identical Total A vs. C 135 0 135 Chi-square statistic 379.53 A vs. D 12 123 135 d.f. 3 B vs. C 11 124 135 p-value < 0.00001 B vs. D 112 23 135 Total 270 270 540 115 live weights, deviations in intercepts and slopes of growth curves fitted to log-transformed live weights), around 70% of which is explained by PC1 alone (Table 18). Table 18. Phenotypic variance explained by Principal Component Analysis (PCA) Cumulative percentage of variance PC1 a PC2 PC3 PC4 23×31 67.75 83.64 91.06 96.05 23×40 71.34 87.42 93.83 97.67 31×23 62.87 84.56 93.00 96.01 40×92 71.87 85.03 92.62 96.71 47×92 58.79 85.74 94.09 98.08 92×40 67.09 83.93 91.53 97.77 a PC1-PC4 stand for the first four principal components from PCA. The loadings or correlations of size- and growth-related, measured and derived traits on the first four principal components (Fig. 13) reveal consistent patterns of correlation among traits. PC1 correlates strongly and positively with size-at-age and strongly and negatively with monthly gains and growth slope. PC2 correlates positively with most traits, especially with lglw21 and lglw41. In all families but 47 92, which started smaller than the others did, PC3 correlates strongly and positively with lglw54 and moderately and negatively with lglw21. In family 4792, PC3 correlates negatively with lglw54, rather than positively as in the other families. Correlations of traits with PC4 vary to some extent across families. PC4 correlates moderately and negatively with the initial monthly weight gain, in four of the five families having this trait (again, family 47 92 was not tagged and measured until July), and with the final monthly gain in three of those families. The exceptional family, 23 31, shows the same pattern but with opposite signs, i.e. positive correlations of PC4 with initial and final monthly gains and negative correlations with gains during mid-summer months of maximal growth. 116 PC1 PC2 PC3 PC4 PC1 PC2 PC3 PC4 23×31 lglw1 0.99 -0.05 0.08 -0.01 23×40 lglw1 0.99 0.02 0.11 0.07 lglw2 0.99 0.10 -0.02 0.09 lglw2 0.97 0.21 0.01 -0.01 lglw3 0.96 0.26 -0.04 0.01 lglw3 0.93 0.37 0.00 0.05 lglw4 0.90 0.41 0.01 -0.08 lglw4 0.84 0.51 0.08 0.15 lglw5 0.86 0.47 0.12 -0.04 lglw5 0.79 0.57 0.18 0.10 lglw21 -0.35 0.68 -0.45 0.46 lglw21 -0.38 0.74 -0.43 -0.33 lglw32 -0.74 0.47 -0.05 -0.36 lglw32 -0.88 0.33 -0.03 0.18 lglw43 -0.78 0.36 0.20 -0.30 lglw43 -0.82 0.28 0.27 0.32 lglw54 -0.38 0.37 0.77 0.33 lglw54 -0.59 0.24 0.66 -0.39 lglw41 -0.76 0.63 -0.13 -0.08 lglw41 -0.82 0.56 -0.11 0.05 intercept 0.91 0.27 0.02 -0.14 intercept 0.95 0.13 0.02 -0.11 slope -0.93 -0.17 -0.01 0.14 slope -0.96 0.18 0.01 0.10 31×23 lglw1 0.99 0.02 0.09 0.08 40×92 lglw1 0.99 -0.01 0.05 0.04 lglw2 0.98 0.18 -0.03 0.03 lglw2 0.98 0.16 -0.06 -0.07 lglw3 0.92 0.38 -0.06 0.04 lglw3 0.94 0.30 -0.09 0.03 lglw4 0.84 0.52 0.01 0.14 lglw4 0.88 0.43 -0.08 0.15 lglw5 0.79 0.59 0.12 0.09 lglw5 0.85 0.51 0.06 0.12 lglw21 -0.35 0.70 -0.53 -0.26 lglw21 -0.50 0.64 -0.42 -0.39 lglw32 -0.66 0.64 -0.08 0.05 lglw32 -0.83 0.32 -0.05 0.35 lglw43 -0.76 0.39 0.29 0.35 lglw43 -0.86 0.16 0.09 0.34 lglw54 -0.30 0.46 0.77 -0.32 lglw54 -0.25 0.48 0.82 -0.19 lglw41 -0.69 0.70 -0.15 0.04 lglw41 -0.86 0.47 -0.18 0.09 intercept 0.92 0.24 0.02 -0.11 intercept 0.94 0.14 -0.01 0.13 slope -0.94 0.08 0.03 0.10 slope -0.95 0.13 0.01 -0.06 47×92 NA - - - - 92×40 lglw1 0.99 -0.03 0.01 0.09 lglw2 0.97 0.22 0.03 -0.02 lglw2 0.98 0.16 -0.03 -0.05 lglw3 0.91 0.37 0.10 0.10 lglw3 0.95 0.30 -0.06 -0.02 lglw4 0.79 0.59 0.15 -0.02 lglw4 0.89 0.42 -0.10 0.15 lglw5 0.75 0.65 0.03 -0.01 lglw5 0.87 0.48 0.02 0.15 NA - - - - lglw21 -0.31 0.75 -0.14 -0.55 lglw32 -0.72 0.46 0.25 0.45 lglw32 -0.78 0.46 -0.10 0.15 lglw43 -0.71 0.56 0.11 -0.41 lglw43 -0.70 0.29 -0.12 0.60 lglw54 -0.09 0.55 -0.82 0.09 lglw54 -0.14 0.39 0.91 0.03 lglw42 -0.80 0.57 0.20 0.00 lglw41 -0.73 0.66 -0.16 0.03 intercept 0.88 0.25 0.08 -0.05 intercept 0.97 0.20 -0.03 0.07 slope -0.68 0.72 -0.01 -0.04 slope -0.99 0.06 0.01 -0.02 Figure 13. Loadings of traits on principal components. Green and red indicate high and low values, respectively. 117 3.3 Growth QTL Since genome-wise thresholds are too high, we identify trait-QTL and PC-QTL using chromosome-wise thresholds (Fig. 14, example LRT profiles for PC1). We detect 44 QTL for individual traits (i.e. trait-QTL) and 27 QTL for principal components (i.e. PC-QTL), respectively (Table 19). Among the 44 trait-QTL, 28 and 16 contribute to individual variation in single (i.e. minor trait-QTL) and multiple (i.e. major trait-QTL) traits, respectively, indicating that some QTL play a role at a specific time point or interval while others affect the growth process (Table 25). Both trait- and PC-QTL are broadly distributed across all ten linkage groups in the six, F2 families (Table 19). Sixteen PC-QTL overlap with 21 trait-QTL; for example, PC- QTL at markers 226-C160617T and 43794-T116068A overlap with trait-QTL at markers 22- G563443A and 867-C359681T (Table 19). In some cases, two or more trait-QTL are under a single PC-QTL (Table 19). In order to facilitate analysis and interpretation, we combine trait- QTL and PC-QTL. For overlapping trait-QTL and PC-QTL, we take the PC-QTL as the representative gQTL, because the principal components account for correlation among traits. Twenty-one out of 44 trait-QTL overlap with 16 out of 27 PC-QTL, resulting in a total number of 49 non-overlapping gQTL for both size- and growth-related traits and principal components (Table 19). 118 0 5 10 15 20 25 30 LRT cM 23 ×31 1 2 3 4 5 6 7 8 9 10 0 5 10 15 20 25 30 LRT cM 23×40 0 5 10 15 20 25 30 LRT cM 31×23 0 5 10 15 20 25 30 LRT cM 40×92 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10 119 Figure 14. LRT profiles for PC1 across six F2 families. Black lines indicate chromosome-wise thresholds. 0 5 10 15 20 25 30 LRT cM 47×92 0 5 10 15 20 25 30 LRT cM 92×40 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10 120 Table 19. QTL of (A) principal components and (B) individual size- and growth-related traits No. A. Principal components B. Individual traits gQTL Nearest marker Family LG a Position Trait LRT b Nearest marker Family LG Position Trait 1 855-C67410T 23×31 7 2.4 lglw1, intercept, slope 2 4-G566200T 23×31 7 54.3 PC4 16.6 3 2442-C17818T 23×31 8 5.4 lglw54 248-T61677C c 23×31 8 5.5 PC4 13.3 4 1326-T27028C 23×40 1 16.7 lglw21 5 383-A406938T 23×40 2 0.0 lglw2, lglw3, lglw4, intercept 6 2680-G86433C 23×40 2 14.1 lglw1, lglw2, lglw3, lglw4, lglw5, intercept, lglw43, slope 7 2160-C34067A 23×40 2 20.9 lglw32 8 1032-A12571G 23×40 2 38.1 PC2 16.2 9 2037-G130694A 23×40 3 0.7 lglw54 10 852-A154456T 23×40 5 27.9 PC2 17.8 11 1440-C19955A 23×40 5 35.7 lglw54 12 226-C160617T 23×40 6 1.8 PC2 26.1 d 1508-G563443A 23×40 6 6.4 lglw4, lglw5 43794-T116068A 23×40 6 18.2 PC4 13.3 3280-C359681T 23×40 6 31.7 lglw43 13 1514-G19562A 23×40 8 21.4 PC3 14.1 Table continues next page. 121 No. A. Principal components B. Individual traits gQTL Nearest marker Family LG a Position Trait LRT b Nearest marker Family LG Position Trait 14 1226-T17326A 23×40 9 0.6 lglw2, lglw3, lglw4, lglw5 15 1065-C47209G 23×40 9 4.0 lglw54 16 2281-C396908T 23×40 9 48.9 lglw21 17 1010-G97009A 31×23 5 42.7 lglw4, lglw5 18 376-C157962T 31×23 7 37.9 PC2 20.0 691-T1228701C 31×23 7 41.8 lglw3, lglw4, lglw5 19 3124-C96629T 31×23 8 12.2 lglw54 20 1603-C117762T 31×23 8 58.0 PC2 26.8 21 472-C319612T 31×23 8 63.0 PC1 14.9 765-C496838T 31×23 8 63.0 lglw4 22 1952-T707530A 40×92 1 6.5 lglw21 2100-C230706A 40×92 1 9.8 lglw43 40-T150431A 40×92 1 14.1 PC2 25.0 23 1789-G15881A 40×92 1 41.8 lglw5 42568-T166943G 40×92 1 45.8 PC1 20.3 11703-T166943G 40×92 1 45.8 lglw1, lglw2, lglw3, lglw4, lglw5, intercept, lglw43, slope 24 3095-A119169T 40×92 1 54.4 lglw43 1245-T456306C 40×92 1 62.9 lglw1, lglw2, lglw3, lglw4, lglw5, intercept, lglw32, slope 1797-G44751A 40×92 1 68.3 PC1 20.1 25 1213-G717809A 40×92 3 0.9 PC4 21.1 26 602-C35584T 40×92 4 20.7 PC1 18.9 1122-C35584T 40×92 4 20.7 lglw1, lglw2, lglw3, lglw4, intercept, lglw32, lglw43, slope 27 36674-G15754A 40×92 6 9.3 PC2 19.2 Table continues next page. 122 No. A. Principal components B. Individual traits gQTL Nearest marker Family LG a Position Trait LRT b Nearest marker Family LG Position Trait 28 672-T19370G 40×92 7 1.6 lglw21 29 1767-T365766G 40×92 7 10.2 lglw1, lglw32, slope 30 1605-A587549G 40×92 8 45.0 PC2 19.2 499-A417211C 40×92 8 45.4 lglw5 1605-C552798T 40×92 8 45.7 lglw3, lglw4, intercept 31 294-G127027T 40×92 10 3.9 PC3 32.7 294-G126988T 40×92 10 4.1 lglw54 32 145-G707529A 47×92 1 3.4 PC1 16.1 145-G707529A 47×92 1 3.4 lglw2, lglw32, slope 33 169-G252822T 47×92 1 12.2 lglw4 34 1493-C71962T 47×92 1 46.6 slope 35 32014-G830A 47×92 3 43.2 lglw43 36 53-T172362A 47×92 5 14.7 lglw43, slope 53-G172375A 47×92 5 14.8 PC2 33.9 37 42918-C61178T 47×92 5 46.9 lglw54 38 481-C1089150T 47×92 7 37.2 PC3 17.2 39 1189-C373681T 47×92 8 18.0 lglw32 40 39712-T33123G 47×92 9 31.2 lglw43 41 42184-G42774A 92×40 2 55.1 PC2 14.0 42 1879-C62634T 92×40 3 6.1 PC3 19.2 43 1352-G435942A 92×40 3 22.5 lglw4 44 1249-G450639T 92×40 6 21.0 PC1 16.3 1249-G450639T 92×40 6 21.0 lglw4 Table continues next page. 123 No. A. Principal components B. Individual traits gQTL Nearest marker Family LG a Position Trait LRT b Nearest marker Family LG Position Trait 45 43804-A149128T 92×40 6 41.1 PC2 18.1 43804-A183435G 92×40 6 41.6 lglw5 1574-C395484T 92×40 6 55.6 lglw43 46 352-G7266A 92×40 7 13.2 lglw2, lglw3, lglw4, lglw5, intercept 47 38546-A48525C 92×40 8 0.4 PC4 14.0 48 1674-G159C 92×40 9 3.8 PC3 19.2 1674-G159C 92×40 9 3.8 lglw54 49 492-C631776T 92×40 10 6.2 lglw4, lglw5 a LG: linkage group. b LRT are only shown for PC-QTL as an example. c Boxes contain overlapping QTL for principal components and individual traits. d Bold numbers indicate LRT above the genome-wise threshold. 124 3.4 Genetic models underlying growth QTL Based on ANOVA, followed by post hoc contrasts among genotypes, we are able to identify genetic models for 24 trait-QTL and 16 PC-QTL, respectively (Tables 25, 26). The trait- QTL arise from both additive (ADD) and non-additive effects; non-additive effects include overdominance (OD), underdominance (UD), dominance+ (D+) and dominance- (D-; Table 25). Models of UD and D- effects are only associated with trait-QTL for monthly gains or slope deviations (Table 25). The PC-QTL also arise from a variety of effects, ADD, OD, D+, D-, and partial D+ (Table 26). Nearly all PC-QTL for PC1 fit models of OD and D+ gene action, with the single exception of the gQTL 26 on linkage group 4 of family 40 92, which shows a D- effect (Table 26). Ruling out redundant genetic models for overlapping trait-QTL, we find 11 models of OD, nine models of D+, five models of D-, and three models of ADD effects, accounting for 28 gQTL altogether (Table 20). 125 Table 20. Genetic models for gQTL No. QTL Trait Genetic gQTL Nearest Marker Family-LG-POS a model b Yield-related QTL 5 1350-A406938T 23×40-LG2-P0 lglw2-4, intercept D+ 6 40822-G86433C 23×40-LG2-P14 lglw1-5, intercept, lglw43, slope OD 8 1032-A12571G 23×40-LG2-P38 PC2 D- 12 226-C160617T 23×40-LG6-P1 PC2 D+ 18 211-T1228701C 31×23-LG7-P41 lglw3-5 OD 22 40-T150431A 40×92-LG1-P14 PC2 OD 23 42568-T166943G 40×92-LG1-P45 PC1 OD 24 1797-G44751A 40×92-LG1-P68 PC1 OD 26 602-C35584T 40×92-LG4-P20 PC1 D- 29 1767-T365766G 40×92-LG7-P10 lglw1, lglw32, slope OD 32 145-G707529A 47×92-LG1-P3 PC1 D+ 33 169-G252822T 47×92-LG1-P12 lglw4 OD 36 53-G172375A 47×92-LG5-P14 PC2 D+ 41 42184-G42774A 92×40-LG2-P55 PC2 D+ 43 1352-G435942A 92×40-LG3-P22 lglw4 D+ 44 1249-G450639T 92×40-LG6-P20 PC1 OD 45 43804-A149128T 92×40-LG6-P41 PC2 OD Non-yield-related QTL 2 4-G566200T 23×31-LG7-P54 PC4 OD 3 626-C17818T 23×31-LG8-P5 lglw54 D- 4 403-T27028C 23×40-LG1-P16 lglw21 ADD 7 1485-C34067A 23×40-LG2-P20 lglw32 D- 13 1514-G19562A 23×40-LG8-P21 PC3 ADD 16 115-C396908T 23×40-LG9-P48 lglw21 OD 19 117-C96629T 31×23-LG8-P12 lglw54 D+ 25 1213-G717809A 40×92-LG3-P0 PC4 ADD 28 672-T19370G 40×92-LG7-P1 lglw21 D+ 31 294-G127027T 40×92-LG10-P3 PC3 D+ 42 1879-C62634T 92×40-LG3-P6 PC3 D- a Family-LG-POS: Family-Linkage Group-Position of nearest markers of gQTL. b OD: overdominance; D+: dominance+; D-: dominance-; ADD: additive. 126 3.5 Genotype-by-environment interaction Our analysis shows that, from July 2 to October 18 of 2012, the average, highest and lowest daily temperature in September is lower than that in July and August, but higher than that in October (Table 21). The average, highest and lowest daily temperatures in July and August are the same (Table 21). The difference between growth rates during the second and fourth measurements (i.e. July and August) and during the fourth and fifth measurements (i.e. September, or September and October) differs by genotype at 10 and 18 out of 44 trait-QTL, at the significance levels of ɑ=0.05 and ɑ=0.1, respectively (Table 22). Table 21. Post hoc comparisons of monthly means following one-way ANOVA of monthly temperatures for July-October in 2012 Average daily p-value of pair-wise comparison Least squares temperature July August September October mean ( ℃) July 0.7585 <.0001 <.0001 15.40 August 0.7585 <.0001 <.0001 15.34 September <.0001 <.0001 <.0001 13.12 October <.0001 <.0001 <.0001 11.11 The highest daily p-value of pair-wise comparison Least squares temperature July August September October mean ( ℃) July 0.7405 <.0001 <.0001 21.44 August 0.7405 <.0001 <.0001 21.14 September <.0001 <.0001 0.0026 15.11 October <.0001 <.0001 0.0026 11.94 The lowest daily p-value of pair-wise comparison Least squares temperature July August September October mean ( ℃) July 0.923 0.0001 <.0001 12.97 August 0.923 <.0001 <.0001 13.01 September 0.0001 <.0001 <.0001 11.45 October <.0001 <.0001 <.0001 9.30 127 Table 22. ANOVA testing genotype-by-environment interaction No. gQTL trait-QTL (nearest marker) Family Linkage group Position p-value 1 372-C67410T 23×31 7 2.428 0.0369 6 40822-G86433C 23×40 2 14.089 0.0191 12 22-G563443A 23×40 6 6.372 0.012 16 115-C396908T 23×40 9 48.87 0.0052 22 145-T707530A 40×92 1 6.546 0.0796 22 1164-C230706A 40×92 1 9.75 0.0811 23 40980-G15881A 40×92 1 41.834 0.0087 23 42568-T166943G 40×92 1 45.786 0.0122 24 203-T456306C 40×92 1 62.94 0.0868 26 602-C35584T 40×92 4 20.696 0.0026 32 145-G707529A 47×92 1 3.412 0.0654 35 32014-G830A 47×92 3 43.176 0.0917 40 39712-T33123G 47×92 9 31.231 0.0846 44 1249-G450639T 92×40 6 20.964 0.0273 45 43804-A183435G 92×40 6 41.641 0.0934 45 1574-C395484T 92×40 6 55.615 0.0028 46 352-G7266A 92×40 7 13.187 0.0611 48 1674-G159C 92×40 9 3.784 0.0354 4. Discussion 4.1 Genetic bases of gQTL for yield heterosis In order to determine the genetic bases of heterosis, we first perform one-way ANOVA, with genotype and phenotype as independent and dependent variables, respectively, on each gQTL. After detecting significant associations between genotype and phenotype at all 49 gQTL by ANOVA, we conduct a series of pre-planned post hoc contrasts to fit 49 gQTL to mutually exclusive genetic models (Table 24). Since monthly weights, the intercept of the growth curve, PC1 and PC2 either measure or are highly correlated with variation in biomass, we classify QTL for these traits as yield-related gQTL; all others are non-yield-related gQTL. 128 We successfully determined genetic models for 28 out of 49 gQTL, which show OD, D+, D- and ADD effects, respectively, among which ADD effects cannot explain heterosis (Table 20). Nine and six out of 17 yield-related gQTL are caused by OD and D+, respectively (Table 20, Fig. 15), supporting the two, classical, alternative hypotheses for heterosis, dominance and overdominance (Bruce, 1910; Crow, 1948, 1998; Xiao et al., 1995; Birchler et al., 2003, 2005, 2006, 2010). However, two yield-related QTL, gQTL26 for PC1 and gQTL8 for PC2, arise from D-, implying that genetic architectures for yield heterosis are more complex than the classical hypotheses (Table 20, Fig. 15). On one hand, the case of D- can be explained by epistasis, meaning that yield-related gQTL showing D- contribute to heterosis by interacting with other genetic factors (Cockerham and Zeng, 1996; Birchler et al., 2006). On the other hand, yield- related gQTL resulting from D- may be involved in metabolic processes negatively correlated with yield. For example, previous studies have shown that, at the physiological level, protein metabolism contributes to variation in growth in bivalve molluscs (Hawkins et al., 1986; Pace et al., 2006; Meyer and Manahan 2010). Therefore, gQTL showing D- may involve genes participating in protein degradation, and thus, result in higher yield corresponding to hybrid genotypes. None of yield-related gQTL is caused by ADD effect, indicating that additive effects do not substantially contribute to variation in yield. Our finding is consistent with previous studies (Hedgecock et al., 1995; Hedgecock and Davis, 2007) in that non-additive genetic components of yield variance, specific combining ability and reciprocal effect, make larger contributions to yield than additive, general combining ability. 129 Figure 15. Genetic models of yield-related QTL. D+: dominance of the allele from the high- performing parent; D-: dominance of the allele from the low-performing parent; OD: overdominance. 4.2 Genotype-by-environment interaction The growth curve of log-transformed live weights shows that oysters grow at two different rates from the first to the fourth (i.e. June to August) and from the fourth to the fifth (i.e. September to October) measurements except for 47×92, which only has four monthly measured live weights (Fig. 16, growth curve of 23×31 as an example). Temperature in July and August is consistently higher than that in September and October. The rate of log-transformed live-weight increase from September to October is smaller than that from July to September, indicating that growth slows in response to lower fall temperatures and suggesting temperature as a major driver of differential growth (Table 21, Fig. 16). Differences between gains in live weights in July and August and in September and October are genotype-dependent at 10 and 18 (out of 44) trait- QTL, at the significance levels of ɑ=0.05 and ɑ=0.1, respectively (Table 22). Genotypes corresponding to fast-growing oysters differ under low and high temperatures (Fig. 17), indicating genotype-by-temperature interaction. 6 (35%) 2 (12%) 9 (53%) D+ D- OD 130 At 12 out of 18 trait-QTL showing significant genotype-by-environment interaction, the decreasing temperature has a larger negative effect on growth of parent-line genotype, indicating that hybrids can better cope with environmental stresses. Our discovery is consistent with the stronger selection against deleterious mutations under stressful environment demonstrated by Plough (2012), implying that inbreeding depression may be greater in harsher environments (Hedrick and Kalinowski, 2000). Meanwhile, this supports Lerner’s (1954) idea of genetic homeostasis that the ability of heterozygotes to buffer against environmental changes may be lost upon inbreeding, leading to more deviant phenotypes. Figure 16. Growth curve of log-transformed live weights in 23×31. -0.75 -0.58 -0.41 -0.24 -0.07 0.1 0.27 0.44 0.61 0.78 0.95 -3 7 17 27 37 47 57 67 77 87 97 107 117 127 Log-transformed live weight Day 131 Figure 17. An example trait-QTL with significant genotype-by-environment interaction. AC and BD are parent-line genotypes. 4.3 Biological interpretation of principal components In our study, in addition to log-transformed live weights, gains between adjacent log- transformed live weights, and deviations in intercept and slope of growth curves fitted to log- transformed live weights, we also include increase between the first and fourth measured log- transformed live weights (i.e. lglw41=lglw4-lglw1) in PCA. We include lglw41 because log- transformed live weights from the first to the fourth measurements and from the fourth to the fifth measurements fit two linear regression lines with different slopes (Fig. 16). We use the first four principal components (i.e. PC1, PC2, PC3 and PC4) as phenotypes for QTL mapping, because they explain over 96% of variance in the 12, size- and growth-related 0.0031 0.0033 0.0035 0.0037 0.0039 0.0041 0.0043 0.0045 0.0047 0.0049 0.01 0.0105 0.011 0.0115 0.012 0.0125 0.013 0.0135 Increase in log-transformed live weight per day gQTL16 on LG9 in 23×40 AC AD BC BD July 4-August 28, 2012 August 29-October 2, 2012 132 traits included in the PCA (Table 18, Fig. 13). Leaving aside family 47×92, which started smaller and lacked size information for June, patterns of principal components are remarkably consistent across the remaining five families (Fig. 13). Each principal component has its own biological interpretation. Highly correlated with each of the monthly live weights and with the intercept of the linear fit to log-transformed monthly weights, PC1 represents variation in size-at-age of F2 progeny and correlates positively with yield in biomass at the end of the growth period. PC1 correlates strongly and negatively with all monthly weight gains and with the slope of the growth curve, indicating an inverse relationship between size (yield) and growth. Smaller oysters grow faster than larger oysters but never catch up to the greater biomass of the larger oysters. PC2 correlates positively with nearly all traits, especially with gain in the first month and over the interval from June through September, suggesting that this component captures variance associated with maximum growth during the warmest months and ultimately with yield. In all families but 47 92, PC3 correlates strongly and positively with September weight gain and moderately and negatively with June weight gain. Thus, PC3 mainly captures variation in growth under the declining seawater temperatures of September, along with a moderately negative correlation between final and initial monthly weight gains. For family 47 92, PC3 correlates negatively with the last monthly gain, rather than positively as in other families. The loadings of traits on PC4 have similar patterns but different signs across the five families first weighed in June. PC4 tends to be negatively associated with weight gains in June and September and positively associated with weight gains in July and August. In family 23 31, the signs of these correlations are reversed. Temperatures in September and October are more than 2 ℃ lower than temperatures in July and August. Thus, PC4 demonstrates that growth rate of fast-growing 133 oysters under one temperature tends to slow down under the other, supporting genotype-by- temperature interaction. 134 Supplementary Information Table 23. Dates of collecting data on live weights (lw) of Pacific oysters Family lw1 lw2 lw3 lw4 lw5 23×31 6/1/2012 (299) a 7/3/2012 (287) 7/31/2012 (279) 8/28/2012 (279) 10/4/2012 (278) 23×40 6/1/2012 (230) 7/4/2012 (225) 7/31/2012 (224) 8/28/2012 (224) 10/2/2012 (224) 31×23 6/1/2012 (220) 7/4/2012 (213) 7/31/2012 (212) 8/29/2012 (210) 10/9/2012 (210) 40×92 6/1/2012 (400) 7/3/2012 (394) 7/31/2012 (393) 8/28/2012 (391) 10/18/2012 (391) 47×92 NA 7/3/2012 (167) 7/31/2012 (162) 8/29/2012 (162) 9/25/2012 (162) 92×40 6/1/2012 (172) 7/3/2012 (170) 7/31/2012 (164) 8/29/2012 (161) 9/27/2012 (160) a Numbers in parentheses indicate the sample sizes of tagged oysters weighed. Table 24. Criteria for assigning genetic models to growth QTL a Genetic model hp Hybrid=Inbred1 b Hybrid=Inbred2 Hybrid=Mid-Parent Overdominance >1 ≤0.05 c Underdominance <-1 ≤0.05 Dominance+ ~1 >0.05 ≤0.05 Dominance- ~-1 >0.05 ≤0.05 Partial Dominance+ 0<hp<1 ≤0.05 ≤0.05 Partial Dominance- -1<hp<0 ≤0.05 ≤0.05 Additive 0 ≤0.05 ≤0.05 >0.05 a From Hedgecock et al. (2007). b Inbred1 and Inbred2 are worse and better inbred parents, respectively. c Numbers in columns “Hybrid=Inbred1”, “Hybrid=Inbred2” and “Hybrid=Mid-Parent” are p- values for pre-planned post hoc contrasts to fit 49 gQTL to mutually exclusive genetic models, overdominance, underdominance, dominance+, dominance-, partial dominance+, partial dominance- and additive gene effects. 135 Table 25. Genetic models of QTL for size- and growth-related traits (trait-QTL) No. trait-QTL Genetic model gQTL Nearest marker Family-LG-POS a Trait Hybrid 1 Hybrid 2 3 626-C17818T 23×31-LG8-P5 lglw54 D- b (p=0.05) c 4 403-T27028C 23×40-LG1-P16 lglw21 ADD 5 1350-A406938T 23×40-LG2-P0 lglw2 D+ lglw3 D+ lglw4 OD intercept D+ 6 40822-G86433C 23×40-LG2-P14 lglw1 D+ lglw2 OD lglw3 OD lglw4 OD lglw5 OD intercept OD lglw43 D- D- slope D- 7 1485-C34067A 23×40-LG2-P20 lglw32 D- D- 12 22-G563443A 23×40-LG6-P6 lglw5 ADD (p=0.08, 0.08) 12 867-C359681T 23×40-LG6-P31 lglw43 D+ 16 115-C396908T 23×40-LG9-P48 lglw21 OD 18 211-T1228701C 31×23-LG7-P41 lglw3 D+ OD lglw4 OD OD lglw5 OD OD 19 117-C96629T 31×23-LG8-P12 lglw54 D+ (p=0.06) D+ 22 145-T707530A 40×92-LG1-P6 lglw21 D+ 23 40980-G15881A 40×92-LG1-P41 lglw5 OD OD 23 42568-T166943G 40×92-LG1-P45 lglw1 OD lglw2 OD lglw3 OD lglw4 D+ OD lglw5 OD OD intercept OD lglw43 D- slope UD 24 1327-A119169T 40×92-LG1-P54 lglw43 D- 24 203-T456306C 40×92-LG1-P62 lglw1 D+ D+ (p=0.07) lglw2 OD D+ Table continues next page. 136 No. trait-QTL Genetic model gQTL Nearest marker Family-LG-POS a Trait Hybrid 1 Hybrid 2 lglw3 OD OD lglw4 D+ OD lglw5 D+ OD intercept D+ D+ lglw32 D- slope D- D- (p=0.05) 28 672-T19370G 40×92-LG7-P1 lglw21 D+ 29 1767-T365766G 40×92-LG7-P10 lglw1 OD lglw32 D- slope UD 32 145-G707529A 47×92-LG1-P3 lglw2 D+ D+ (p=0.08) lglw32 D- slope D- 33 169-G252822T 47×92-LG1-P12 lglw4 OD OD 36 53-T172362A 47×92-LG5-P14 lglw43 D+ slope D+ 43 1352-G435942A 92×40-LG3-P22 lglw4 D+ 44 1249-G450639T 92×40-LG6-P20 lglw4 D+ 45 43804-A183435G 92×40-LG6-P41 lglw5 OD 45 1574-C395484T 92×40-LG6-P55 lglw43 UD a Family-LG-POS: Family-Linkage Group-Position of nearest markers of trait-QTL. b OD: overdominance; UD: underdominance; D+: dominance+; D-: dominance-; ADD: additive. c The p-value in parentheses is significant at ɑ=0.1. 137 Table 26. Genetic models of QTL for principal components (PC-QTL) of size- and growth- related traits No. PC-QTL Trait Genetic model gQTL Nearest marker Family-LG-POS a Hybrid 1 Hybrid 2 2 4-G566200T 23×31-LG7-P54 PC4 OD b OD 8 1032-A12571G 23×40-LG2-P38 PC2 D- D- 12 226-C160617T 23×40-LG6-P1 PC2 PD+ D+ 13 1514-G19562A 23×40-LG8-P21 PC3 ADD 22 40-T150431A 40×92-LG1-P14 PC2 OD 23 42568-T166943G 40×92-LG1-P45 PC1 OD OD 24 1797-G44751A 40×92-LG1-P68 PC1 OD D+ (p=0.09) c 25 1213-G717809A 40×92-LG3-P0 PC4 ADD 26 602-C35584T 40×92-LG4-P20 PC1 D- D- 31 294-G127027T 40×92-LG10-P3 PC3 D+ D+ (p=0.06) 32 145-G707529A 47×92-LG1-P3 PC1 D+ 36 53-G172375A 47×92-LG5-P14 PC2 D+ 41 42184-G42774A 92×40-LG2-P55 PC2 D+ D- (p=0.07) 42 1879-C62634T 92×40-LG3-P6 PC3 D- D- 44 1249-G450639T 92×40-LG6-P20 PC1 OD OD 45 43804-A149128T 92×40-LG6-P41 PC2 OD a Family-LG-POS: Family-Linkage Group-Position of nearest markers of PC-QTL. b OD: overdominance; UD: underdominance; D+: dominance+; D-: dominance-; PD+: partial dominance+; ADD: additive. c The p-value in parentheses is significant at ɑ=0.1. 138 References Baranski, M., Rourke, M., Loughnan, S., Hayes, B., Austin, C., Robinson, N., 2008. Detection of QTL for growth rate in the blacklip abalone (Haliotis rubra Leach) using selective DNA pooling. Anim. Genet. 39: 606–614. Bentley, D.R., Balasubramanian, S., Swerdlow, H.P., Smith, G.P., Milton, J., et al., 2008. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456: 53–59. Birchler, J.A., Auger, D.L., Riddle, N.C., 2003. In search of the molecular basis of heterosis. Plant Cell 15: 2236–2239. Birchler, J.A., Riddle, N.C., Auger, D.L., Veitia, R.A., 2005. Dosage balance in gene regulation: biological implications. Trends Genet. 21: 219–226. Birchler, J.A., Yao, H., Chudalayandi, S., 2006. Unraveling the genetic basis of hybrid vigor. Proc. Natl. Acad. Sci. 103: 12957–12958. Birchler, J.A., Yao, H., Chudalayandi, S., Vaiman, D., Veitia, R.A., 2010. Heterosis. Plant Cell 22: 2105–2112. Bruce, A.B., 1910. The Mendelian theory of heredity and the augmentation of vigor. Science 32: 627–628. Cockerham, C.C., Zeng, Z.B., 1996. Design III with marker loci. Genetics 143: 1437–1456. Crow, J.F., 1948. Alternative hypotheses of hybrid vigor. Genetics 33: 477–487. Crow, J.F., 1998. 90 years ago: the beginning of hybrid maize. Genetics 148: 923–928. Darwin, C., 1877. The effects of cross and self fertilisation in the vegetable kingdom. New York. Elshire, R.J., Glaubitz, J.C., Sun, Q., Poland, J.A., Kawamoto, K., Buckler, E.S., Mitchell, S.E., 2011. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLOS ONE 6: e19379. Griffing, B., 1990. Use of a controlled-nutrient experiment to test heterosis hypotheses. Genetics 126: 753–767. Guo, X., Li, Q., Wang, Q.Z., Kong, L.F., 2012. Genetic mapping and QTL analysis of growth- related traits in the Pacific oyster. Mar. Biotechnol. 14: 218–226. Hawkins, A.J.S., Bayne, B.L., Day, A.J., 1986. Protein turnover, physiological energetics and heterozygosity in the blue mussel, Mytilus edulis: the basis of variable age-specific growth. Proc R Soc Lond B 229: 161–176. Hedgecock, D., McGoldrick, D.J., Bayne, B.L., 1995. Hybrid vigor in Pacific oysters: an experimental approach using crosses among inbred lines. Aquaculture 137: 285–298. 139 Hedgecock, D., Davis, J.P., 2007. Heterosis for yield and crossbreeding of the Pacific oyster Crassostrea gigas. Aquaculture 272, Supplement 1: S17–S29. Hedgecock, D., Lin, J.-Z., DeCola, S., Haudenschild, C.D., Meyer, E., Manahan, D.T., Bowen, B., 2007. Transcriptomic analysis of growth heterosis in larval Pacific oysters (Crassostrea gigas). Proc. Natl. Acad. Sci. 104: 2313–2318. Hedrick, P.W., Kalinowski, S.T., 2000. Inbreeding depression in conservation biology. Annu. Rev. Ecol. Syst. 31: 139–162. Hu, Z., Xu, S., 2009. PROC QTL—A SAS procedure for mapping quantitative trait loci. Int. J. Plant Genomics 2009: 1–3. Lander, E.S., Botstein, D., 1989. Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121: 185–199. Langdon, C., Evans, F., Jacobson, D., Blouin, M., 2003. Yields of cultured Pacific oysters Crassostrea gigas Thunberg improved after one generation of selection. Aquaculture 220: 227– 244. Launey, S., Hedgecock, D., 2001. High genetic load in the Pacific oyster Crassostrea gigas. Genetics 159: 255–265. Lerner, I.M., 1954. Genetic homeostasis. Edinburgh: Oliver and Boyd. Li, Z.-K., Luo, L.J., Mei, H.W., Wang, D.L., Shu, Q.Y., Tabien, R., Zhong, D.B., Ying, C.S., Stansel, J.W., Khush, G.S., Paterson, A.H., 2001. Overdominant epistatic loci are the primary genetic basis of inbreeding depression and heterosis in rice. I. Biomass and grain yield. Genetics 158: 1737–1753. Li, H., Liu, X., Zhang, G., 2012. A consensus microsatellite-based linkage map for the hermaphroditic bay scallop (Argopecten irradians) and its application in size-related QTL analysis. PLOS ONE 7: e46926. Li, Y., He, M., 2014. Genetic mapping and QTL analysis of growth-related traits in Pinctada fucata using restriction-site associated DNA sequencing. PLOS ONE 9: e111707. Luo, L.J., Li, Z.-K., Mei, H.W., Shu, Q.Y., Tabien, R., Zhong, D.B., Ying, C.S., Stansel, J.W., Khush, G.S., Paterson, A.H., 2001. Overdominant epistatic loci are the primary genetic basis of inbreeding depression and heterosis in rice. II. Grain yield components. Genetics 158: 1755– 1771. Manichaikul, A., Mychaleckyj, J.C., Rich, S.S., Daly, K., Sale, M., Chen, W., 2010. Robust relationship inference in genome-wide association studies. Bioinformatics 26: 2867–2873. Marshall, T.C., Slate, J., Kruuk, L.E., Pemberton, J.M., 1998. Statistical confidence for likelihood-based paternity inference in natural populations. Mol. Ecol. 7: 639–655. Meyer, E., Manahan, D.T., 2010. Gene expression profiling of genetically determined growth variation in bivalve larvae (Crassostrea gigas). J. Exp. Biol. 213: 749–758. 140 Nie, H., Yan, X., Huo, Z., Jiang, L., Chen, P., Liu, H., Ding, J., Yang, F., 2017. Construction of a high-density genetic map and quantitative trait locus mapping in the Manila clam Ruditapes philippinarum. Sci. Rep. 7: 229. Niu, D., Du, Y., Wang, Z., Xie, S., Nguyen, H., Dong, Z., Li, J., 2017. Construction of the first high-density genetic linkage map and analysis of quantitative trait loci for growth-related traits in Sinonovacula constricta. Mar. Biotechnol. 19: 488–496. Pace, D.A., Marsh, A.G., Leong, P.K., Green, A.J., Hedgecock, D., Manahan, D.T., 2006. Physiological bases of genetically determined variation in growth of marine invertebrate larvae: A study of growth heterosis in the bivalve Crassostrea gigas. J. Exp. Mar. Biol. Ecol. 335: 188– 209. Petersen, J.L., Baerwald, M.R., Ibarra, A.M., May, B., 2012. A first-generation linkage map of the Pacific lion-paw scallop (Nodipecten subnodosus): Initial evidence of QTL for size traits and markers linked to orange shell color. Aquaculture 350: 200–209. Piepho, H.P., 2001. A quick method for computing approximate thresholds for quantitative trait loci detection. Genetics 157: 425–432. Plough, L.V., Hedgecock, D., 2011. Quantitative trait locus analysis of stage-specific inbreeding depression in the Pacific oyster Crassostrea gigas. Genetics 189: 1473–1486. Plough, L.V., 2012. Environmental stress increases selection against and dominance of deleterious mutations in inbred families of the Pacific oyster Crassostrea gigas. Mol. Ecol. 21: 3974–3987. Plough, L.V., Shin, G., Hedgecock, D., 2016. Genetic inviability is a major driver of type III survivorship in experimental families of a highly fecund marine bivalve. Mol. Ecol. 25: 895– 910. Qin, Y., Liu, X., Zhang, H., Zhang, G., Guo, X., 2007. Genetic mapping of size-related quantitative trait loci (QTL) in the bay scallop (Argopecten irradians) using AFLP and microsatellite markers. Aquaculture 272: 281–290. Ren, P., Peng, W., You, W., Huang, Z., Guo, Q., Chen, N., He, P., Ke, J., Gwo, J., Ke, C., 2016. Genetic mapping and quantitative trait loci analysis of growth-related traits in the small abalone Haliotis diversicolor using restriction-site-associated DNA sequencing. Aquaculture 454: 163– 170. Shull, G.H., 1908. The composition of a field of maize. Am. Breeders Assoc. Rep. 4: 296–301. Shull, G.H., 1914. Duplicate genes for capsule-form in Bursa bursa-pastoris. Zeitschr. Ind. Abstam. u. Vererbungs. 12: 97–149. Shull, G.H., 1948. What is “heterosis”? Genetics 33: 439–446. Singh, S.M., Zouros, E., 1978. Genetic variation associated with growth rate in the American oyster (Crassostrea virginica). Evolution 32: 342–353. 141 Stuber, C.W., Lincoln, S.E., Wolff, D.W., Helentjaris, T., Lander, E.S., 1992. Identification of genetic factors contributing to heterosis in a hybrid from two elite maize inbred lines using molecular markers. Genetics 132: 823–839. Wang, J., Li, L., Zhang, G., 2016. A high-density SNP genetic linkage map and QTL analysis of growth-related traits in a hybrid family of oysters (Crassostrea gigas×Crassostrea angulata) using genotyping-by-sequencing. G3-Genes Genomes Genet. 6: 1417–1426. Xiao, J., Li, J., Yuan, L., Tanksley, S.D., 1995. Dominance is the major genetic basis of heterosis in rice as revealed by QTL analysis using molecular markers. Genetics 140: 745–754. Zhan, A., Hu, J., Hu, X., Hui, M., Wang, M., Peng, W., Huang, X., Wang, S., Lu, W., Sun, C., Bao, Z., 2009. Construction of microsatellite-based linkage maps and identification of size- related quantitative trait loci for Zhikong scallop (Chlamys farreri). Anim. Genet. 40: 821–831. 142 Chapter 5 Sex-specific Mortality Is a Confounding Factor in Exploring Sex Determination in the Pacific Oyster Crassostrea gigas Abstract Sex determination, explained by a diversity of mechanisms in different species, is an interesting topic to biologists. Molluscs are suitable for studying sex evolution and determination, among which the Pacific oyster is a great model, owing to its diversity of sexual forms as well as abundant biological and genomic resources available. Previous studies reveal that both genetic elements and environmental factors, such as temperature and stress, play crucial roles in sex determination in oysters. Two- and three-genotype models have been proposed by Guo et al. (1998) and Hedrick and Hedgecock (2010) to explain sex determination in the Pacific oyster. While these models have greatly enhanced our understanding of sex determination, they may be imperfect. As suggested by Guo et al. (1998) and Hedrick and Hedgecock (2010), an examination on a large number of marker loci in controlled crosses of oysters would be necessary to uncover the genetic bases of sex determination in oysters. With high-density linkage maps of an average marker spacing, 0.49 cM, our study conducts QTL mapping on sex in six, interrelated F2 families of Pacific oysters, using single nucleotide polymorphisms (SNPs) detected by genotyping-by-sequencing (GBS). We detect six potential sex QTL (sQTL), with four in family 23×40 and two in family 47×92, among which one appears a true sQTL driving sex determination. However, we are uncertain about whether the other five QTL arise from sex- determining genes or sex-specific mortality. Therefore, it is important to discriminate sex- 143 determining genes and sex-specific viability in order to construct genetic models for sex determination in species with Mendelian segregation distortion. 1. Introduction Sex determination is of great interest to biologists because of its diverse mechanisms, which, in addition to the XX and XY system and the ZW and ZZ system in higher organisms, also include haplodiploid systems, temperature-determined systems, hermaphroditic systems, parthenogenetic development, self-fertilization, alternation of generations, etc. (Carlson, 2013). While both genetic and environmental factors are known to impact sex determination, the genetic-control mechanism for sex determination has not been well identified in many organisms. Characterized by a high diversity of sex determination mechanisms, molluscs are suitable for studies on evolution of sex and its determination (Yusa, 2007). The Pacific oyster is a great model for studies of this kind, owing to its diversity of sexual forms (i.e. coexistence of dioecy, protandry, and hermaphroditism) as well as abundant biological and genomic resources available (e.g. pedigreed lines, a sequenced C. gigas genome). Sex differentiation can start as early as around 34-44 days post fertilization (dpf) in oysters depending on temperature (Santerre et al., 2013). Early studies of sex determination in oysters focused on the examination and description of their gonad maturation, sexual differentiation, and sex change. Oysters can seasonally alternate sex (i.e. mature as male first and then switch to female), stay hermaphroditic, or remain dioecious, and sex determination in oysters can be influenced by environmental factors including temperature, nutritive conditions, etc. (Coe, 1932a, 1932b, 1934, 1936, 1938, 1943). Moreover, recent studies reveal that environmental factors including temperature and stress play a crucial role in sex determination in oysters (Chávez-Villalba et al., 2011; Joyce et al., 2013; Santerre et al., 2013). Meanwhile, some of these 144 studies suggest that, in addition to environmental factors, genetic factors may also be involved in sex determination (Coe, 1932b; Needler, 1941; Coe, 1943; Chávez-Villalba et al., 2011). The first genetic model for sex determination in oysters is a three-locus model proposed by Haley (1977; 1979), in which each locus has two forms of alleles with m for maleness and f for femaleness and sex is determined by the ratio of m:f. However, this model is imperfect due to the small number of families (i.e. only 5 families) on which it was based and the inability to describe sex reversal effectively. Then, a one-locus genetic model for sex determination, conforming with XX/XY system, was constructed by Guo et al. (1998), in which a dominant male allele, M, and a protandric female allele, F, determine sex in the Pacific oyster with MF being true males and FF being protandric females. The two-genotype model suggests strong paternal effects in sex determination, a small number of genes controlling sex ratios, and impacts of secondary genes and/or environmental factors on the sexual change of FF protandric females (Guo et al., 1998). However, the two-genotype model was unable to account for heterogeneity of sex ratios in half- sib families with the same paternal but different maternal parents, motivating the proposal of a three-genotype model with MM individuals being male, FF individuals being female, and FM individuals being protandric (Hedrick and Hedgecock, 2010). While these previous studies have greatly improved our understanding of sex determination in oysters, it is still possible that none of those models is perfect, as suggested by Hedrick and Hedgecock (2010). Meanwhile, it has also been recommended that, in order to figure out the genetic bases underlying sex determination in oysters, an examination on a large number of marker loci in controlled crosses of oysters would be necessary (Guo et al., 1998; Hedrick and Hedgecock, 2010). 145 Here, we conduct QTL mapping on sex in six, interrelated F2 families of Pacific oysters, using single nucleotide polymorphisms (SNPs) detected by genotyping-by-sequencing (GBS). High-density linkage maps with an average marker spacing of 0.49 cM provide frameworks for sex QTL (sQTL) mapping. We detect six potential sex-determining QTL, four in one family and two in another. 2. Materials and Methods 2.1 Biological materials and data collection Our study was based on six, interrelated F2 families (sire×dam), 23×31, 23×40, 31×23, 40×92, 47×92 and 92×40. Detailed pedigree and rearing information is described and shown in 3. Pedigrees and Rearing of Six, F2 Interrelated Families and Figure 1 in Introduction. The six F2 families were harvested in late September or early October of 2012 and shipped to the University of Southern California, where adductor muscle tissue was dissected and preserved in 70% ethanol for later DNA extraction. Sex was determined by microscopic examination of gonadal biopsies. We subsequently determined genotypes of oysters from the F1 hybrid parents and the six F2 families through genotyping-by-sequencing (GBS) and bioinformatics analyses guided by Genome Analysis Toolkit (GATK, https://www.broadinstitute.org/gatk/) (Bentley et al., 2008; Elshire et al., 2011). Details on GBS (library preparation and sequencing) and GATK analysis are described in Chapter 2. Finally, we conducted sQTL mapping with the same set of 12 parent oysters from six, F1 families and 1,041 progeny from six, F2 families (i.e. 23×31: n=208; 23×40: n=181; 31×23: n=142; 40×92: n=258; 47×92: n=127; 92×40: n=125) that were used in linkage mapping in Chapter 2. 146 2.2 Data analyses We constructed sixty high-density linkage maps for the six, F2 families, according to methods in Chapter 2, using JoinMap 4.1. We then mapped sQTL based on a QTL model developed by Hu and Xu (2009), which is implemented in PROC QTL, a user-defined procedure in SAS (version 9.4). Inputs for sQTL mapping included phased parent genotypes, unphased progeny genotypes, linkage maps and sex of F2 progeny. We found male, female, hermaphroditic and undifferentiated oysters in F2 families, but we used only male and female individuals in QTL mapping (23×31: n=138; 23×40: n=150; 31×23: n=86; 40×92: n=192; 47×92: n=99; 92×40: n=71). We coded male and female as 1 and 2, respectively, and treated sex as an ordinal trait in PROC QTL (Hu and Xu, 2009). Since distortion analysis cannot be included in QTL mapping on ordinal traits, we subsequently treated sex as a continuous trait in PROC QTL, so that we could use the distortion analysis option simultaneously. Genomes were scanned in 1-cM increments using the Maximum Likelihood method with distortion analysis off and on for sex as an ordinal or a continuous trait, respectively. We then mapped viability QTL (vQTL) by sex (i.e. male and female) based on a viability QTL model developed by Luo and Xu (2003). Genomes were scanned in 1-cM increments using the Maximum Likelihood method with distortion analysis. Since original parent lines were not completely inbred, we set the mating type of each F2 family to the four-way cross, AB×CD. A likelihood ratio test (LRT) statistic was generated for each increment on the genome. In running PROC QTL, we encountered the same errors as described in Chapter 3 and rectified these errors by removing markers at the point in the data where the error was generated. We approximated genome-wise and chromosome-wise QTL significance thresholds, at the α=0.05 level, by the method of Piepho (2001). When more than one QTL peak appeared on a linkage group, we identified separate QTL peaks only when the LRT statistic 147 between two QTL peaks fell by at least 4.6 (~1 LOD) (Lander and Botstein, 1989). If a QTL included multiple QTL peaks, the position on the genome with the highest LRT value was taken as the location for that QTL. We discriminated parent-line and hybrid genotypes according to the method described in Chapter 4. 3. Results We find male, female, hermaphroditic and undifferentiated oysters in each family, except for family 92×40, which does not have hermaphroditic individuals (Table 27). Families 23×31, 23×40, 31×23 and 47×92 only have a few hermaphrodites (23×31: n=2; 23×40: n=1; 31×23: n=3; 47×92: n=1), while family 40×92 has 17 (Table 27). We detect six potential sQTL in only two families, four in family 23×40 and two in family 47×92 (Table 28, Table 29, Fig. 18). Four potential sQTL are widely distributed across 10 linkage groups in family 23×40, while two potential sQTL are clustering on LG8 in family 47×92 (Table 28, Fig. 18). LRTs from QTL mapping on sex as an ordinal trait (i.e. distortion analysis off) and as a continuous trait (i.e. distortion analysis on) in PROC QTL are highly consistent, with r 2 , the coefficient of correlation between the two LRT profiles, being 0.97 for 23×40 and 0.99 for 47×92 (Fig. 19). According to vQTL mapping by sex, there is a stronger selection against females at QTL1 and 2 for family 23×40 (Table 28). In contrast, the selection against males is greater at QTL5 and 6 in family 47×92 (Table 28). The LRT for distortion, when mapping QTL for sex as a continuous trait, is high at nearly all QTL, except for QTL6, where it is only 6.35 (Table 28). 148 Table 27. Numbers of individuals in four categories of sexual differentiation in six F2 families A. Family 23×31 23×40 31×23 40×92 47×92 92×40 Total Male 48 70 28 109 37 8 300 Female 90 80 58 83 62 63 436 Hermaphroditic 2 1 3 17 1 0 24 Undifferentiated 68 30 52 48 26 47 271 Total 208 181 141 257 126 118 1031 B. Whether the proportions of four states of sexual differentiation depend on families C. Whether the male:female ratio depends on families Chi-square statistic d.f. p-value Chi-square statistic d.f. p-value 109.79 15 < 0.001 53.03 5 < 0.001 Table 28. Potential sex QTL in F2 families No. Family Nearest marker LG a Position sQTL b sQTL c vQTL by sex d QTL LRT_ordinal LRT_cont LRT_dist LRT_dist_sire LRT_dist_dam 1 23×40 1324-A77504T 1 9.64 15.80 15.59 15.81 19.77 86.60 2 23×40 370-G1054356A 5 20.09 15.19 15.79 22.80 6.35 33.36 3 23×40 753-G159486A 6 1.39 18.34 15.63 52.95 4 23×40 43-G608294A 10 18.81 28.42 30.06 78.13 5 47×92 241-T910750G 8 10.71 16.52 15.96 14.56 19.39 6.35 6 47×92 1189-C373681T 8 18.02 21.70 21.36 6.35 50.53 18.65 a LG: linkage group. b sQTL: sex QTL mapping; LRT_ordinal: LRT of QTL mapping on sex as an ordinal trait. c sQTL: sex QTL mapping; LRT_cont: LRT of QTL mapping on sex as a continuous trait; LRT_dist: LRT of distortion analysis from QTL mapping on sex as a continuous trait. d vQTL by sex: viability QTL mapping by sex; LRT_dist_sire and LRT_dist_dam are LRTs of vQTL mapping by sex for males and females, respectively. 149 Table 29. Individuals cross-classified by sex and genotype for each potential sQTL QTL1 Male Female Total QTL2 (exp) b Male Female Total AC 15 21 36 AC 19 43 62 AD a 15 3 18 AD 15 19 34 BC 21 22 43 BC 27 10 37 BD 16 32 48 BD 9 8 17 Total 67 78 145 Total 70 80 150 QTL3 Male Female Total QTL5 (exp) Male Female Total AC 17 19 36 AC 28 15 43 AD 22 37 59 AD 4 22 26 BC 8 0 8 BC 2 9 11 BD 18 13 31 BD 0 15 15 Total 65 69 134 Total 34 61 95 QTL4 Male Female Total QTL6 (exp) Male Female Total AC 29 7 36 AC 3 18 21 AD 30 35 65 AD 30 25 55 BC 8 31 39 BC 1 2 3 BD 0 2 2 BD 1 14 15 Total 67 75 142 Total 35 59 94 a Shaded cells indicate parent-line genotypes. b The “exp” indicates that the number of individuals in each category is estimated from genotype frequencies from vQTL mapping by sex. 150 Figure 18. LRT profiles for QTL mapping of sex as an ordinal trait. Red line indicates chromosome-wise QTL significance thresholds. 0 5 10 15 20 25 30 LRT cM 23×40 0 5 10 15 20 25 30 LRT cM 47×92 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10 151 Figure 19. Correlation between LRTs for QTL mapping of sex as an ordinal and a continuous trait. LRT_ordinal and LRT_continuous stand for LRTs for QTL mapping on sex as ordinal and continuous traits, respectively. y = 0.9845x + 0.3057 r² = 0.966 0 5 10 15 20 25 30 35 0 5 10 15 20 25 30 35 LRT_ordinal LRT_continuous 23×40 y = 1.0104x + 0.0422 r² = 0.9865 0 5 10 15 20 25 0 5 10 15 20 25 LRT_ordinal LRT_continuous 47×92 152 4. Discussion Sex ratios are different across six F2 families (Table 27; p-value < 0.001). We find more female than male oysters in nearly all families, except for 40×92, which has more male individuals (Table 27). Since no sex information is available for undifferentiated individuals, we exclude them from sQTL mapping. Also, each category of an ordinal trait needs to account for at least 5% of the total sample size in order to be included in QTL mapping (Hu and Xu, 2009), so we remove hermaphrodites from analysis, except for family 40×92, in which hermaphrodites account for 6.6% (Table 27). Correlations between LRTs from QTL mapping of sex as an ordinal trait (i.e. distortion analysis off) and a continuous trait (i.e. distortion analysis on) in PROC QTL are highly consistent (Fig. 19). Even though distortion analysis is turned off when sex is mapped as an ordinal trait, the mapping result is not affected much, suggesting that distortions of Mendelian segregation ratios may not be a serious problem in QTL mapping of ordinal traits. Also, it may not make much of a difference to analyze a discrete trait as a categorical or continuous variable in QTL mapping. The high consistency between mapping sex as ordinal and continuous traits indicates that the QTL mapping result in our study is reliable. In order to fit potential sQTL to genetic models, we count the number of individuals in each genotype-by-sex category (Table 29). Since original parent lines used to set up F1 hybrid parents (AB×CD) are incompletely inbred, we expect four genotypes to segregate in the F2 progeny, AC, AD, BC and BD (Table 29). However, we are unable to discriminate the two hybrid genotypes for QTL2, 5 and 6, so we estimate the number of individuals in each category according to genotype frequencies from vQTL mapping by sex. 153 Five (i.e. QTL1, 2, 3, 4 and 5) out of six potential sQTL overlap with vQTL (refer to Table 14 in Chapter 3). Viability QTL mapping by sex shows sex-specific selection at three of these five potential sQTL, QTL1, 2 and 5 (Table 28). LRT from vQTL mapping by sex is unavailable for QTL3 and QTL4 because they were removed in order to resolve the error arising from small marker spacings, which causes PROC QTL to stop (Table 28). LRTs from distortion analysis in QTL mapping of sex as a continuous trait range from 14.56 to 78.13 for QTL1-5, close to or above the chromosome- or genome-wise threshold for detecting vQTL, suggesting that sex- specific mortality may also contribute to deviations in sex ratios at these QTL (Table 28). Given the overlap between vQTL and potential sQTL and the unequal selection against males and females, it is hard for us to determine whether the distortion of sex ratios at these loci is caused by sex-determining genes or arises from sex-specific viability. Also, there is no consistent pattern in sex ratio distortion across four genotypes at a QTL or across these five QTL (Table 29). Therefore, even though we observe male:female sex ratios deviating from 1:1 in some genotypes at these five QTL, we are unable to tell whether QTL1-5 are sex-determining genetic elements or to fit them to any genetic sex-determining models. Unlike the five QTL, QTL6 does not overlap with any vQTL detected by QTL mapping on all male, female and undifferentiated oysters (refer to Table 14 in Chapter 3). Meanwhile, LRT for distortion analysis from mapping sex as a continuous trait is 6.35, well below the chromosome- or genome-wise threshold for detecting vQTL when only male and female oysters are included in analysis (Table 28). Although LRTs for vQTL mapping by sex differ between males and females (Table 28), the different LRTs seem to be mainly driven by the absence of males in genotypes AC and BD at QTL6 (Table 29). Therefore, it is of high possibility that QTL6 is a potential sQTL arising from sex-determining genetic factors, instead of sex-specific 154 mortality. At this sQTL, males are nearly absent from parent-line genotypes (i.e. homozygotes) AC and BD, while males and females are almost in equal numbers in hybrid genotypes AD and BC. However, we are unable to fit this sQTL to either of the two- or three-genotype models for sex determination proposed by Guo et al. (1998) and Hedrick and Hedgecock (2010), based on studies of family sex-ratios. In our study, we detect six potential sQTL in only two out of six F2 families. This could be caused by the small sample size for sQTL mapping, ranging from 71 to 192 in families in which we failed to detect sQTL, or by extremely uneven male:female ratios (i.e. ~1:8 in 92×40). Therefore, a larger sample size may be needed to improve the QTL detection power in future studies. Also, the sex ratio of progeny produced by the F2 female may be a better phenotype for sQTL mapping (Alexander et al., 2015). More importantly, it is a great challenge to discriminate two possible factors contributing to a potential sQTL, sex-determining genes and sex-specific viability, for species with Mendelian segregation distortion. A potential approach to resolving this issue is to compare vQTL detected with samples taken post metamorphosis but before sex differentiation (i.e. 34-44 dpf) to potential sQTL detected when oysters get mature. We should be able to eliminate sex-specific mortality from sex determination if potential sQTL do not overlap with post-metamorphic vQTL. 155 References Alexander, H.J., Richardson, J.M.L., Edmands, S., Anholt, B.R., 2015. Sex without sex chromosomes: genetic architecture of multiple loci independently segregating to determine sex ratios in the copepod Tigriopus californicus. J. Evol. Biol. 28: 2196–2207. Bentley, D.R., Balasubramanian, S., Swerdlow, H.P., Smith, G.P., Milton, J., et al., 2008. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456: 53–59. Carlson, E.A., 2013. The 7 sexes: biology of sex determination. Indiana University Press, Bloomington. Chávez-Villalba, J., Soyez, C., Huvet, A., Gueguen, Y., Lo, C., Moullac, G.L., 2011. Determination of gender in the pearl oyster Pinctada margaritifera. J. Shellfish Res. 30: 231– 240. Coe, W.R., 1932a. Histological basis of sex changes in the American oyster (Ostrea virginica). Science 76: 125–127. Coe, W.R., 1932b. Sexual phases in the American oyster (Ostrea virginica). Biol. Bull. 63: 419– 441. Coe, W.R., 1934. Alternation of sexuality in oysters. Am. Nat. 68: 236–251. Coe, W.R., 1936. Environment and sex in the oviparous oyster Ostrea virginica. Biol. Bull. 71: 353–359. Coe, W.R., 1938. Primary sexual phases in the oviparous oyster (Ostrea virginica). Biol. Bull. 74: 64–75. Coe, W.R., 1943. Sexual differentiation in mollusks. I. Pelecypods. Q. Rev. Biol. 18: 154–164. Elshire, R.J., Glaubitz, J.C., Sun, Q., Poland, J.A., Kawamoto, K., Buckler, E.S., Mitchell, S.E., 2011. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLOS ONE 6: e19379. Guo, X.M., Hedgecock, D., Hershberger, W.K., Cooper, K., Allen, S.K., 1998. Genetic determinants of protandric sex in the Pacific oyster, Crassostrea gigas Thunberg. Evolution 52, 394-402. Haley, L., 1977. Sex determination in the American oyster. J. Hered. 68: 114–116. Haley, L., 1979. Genetics of sex determination in the American oyster. Proc. Natl. Shellfish. Assoc. 69: 54–57. Hedrick, P.W., Hedgecock, D., 2010. Sex determination: genetic models for oysters. J. Hered. 101: 602–611. 156 Hu, Z., Xu, S., 2009. PROC QTL—A SAS procedure for mapping quantitative trait loci. Int. J. Plant Genomics 2009: 1–3. Joyce, A., Holthuis, T.D., Charrier, G., Lindegarth, S., 2013. Experimental effects of temperature and photoperiod on synchrony of gametogenesis and sex ratio in the European oyster Ostrea edulis (L.). J. Shellfish Res. 32: 447–458. Lander, E.S., Botstein, D., 1989. Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121: 185–199. Luo, L., Xu, S., 2003. Mapping viability loci using molecular markers. Heredity 90: 459–467. Manichaikul, A., Mychaleckyj, J.C., Rich, S.S., Daly, K., Sale, M., Chen, W., 2010. Robust relationship inference in genome-wide association studies. Bioinformatics 26: 2867–2873. Marshall, T.C., Slate, J., Kruuk, L.E., Pemberton, J.M., 1998. Statistical confidence for likelihood-based paternity inference in natural populations. Mol. Ecol. 7: 639–655. Needler, A.B., 1941. Sex reversal in individual oysters. J. Fish. Res. Board Can. 5b: 361–364. Piepho, H.P., 2001. A quick method for computing approximate thresholds for quantitative trait loci detection. Genetics 157: 425–432. Plough, L.V., Hedgecock, D., 2011. Quantitative trait locus analysis of stage-specific inbreeding depression in the Pacific oyster Crassostrea gigas. Genetics 189: 1473–1486. Santerre, C., Sourdaine, P., Marc, N., Mingant, C., Robert, R., Martinez, A.S., 2013. Oyster sex determination is influenced by temperature - first clues in spat during first gonadic differentiation and gametogenesis. Comp. Biochem. Physiol. A. Mol. Integr. Physiol. 165: 61–69. Yusa, Y., 2007. Causes of variation in sex ratio and modes of sex determination in the mollusca—an overview. Am. Malacol. Bull. 23: 89–98. 157 Conclusions Using genomic and statistical approaches, my dissertation enhances our understanding of genetic effects on variation in fitness-related traits of the Pacific oyster Crassostrea gigas, at the organismal and family levels. My study starts with analyzing diallel crosses of partially inbred parent lines, in order to decompose the yield variance among F1 families into different genetic components, GCA, SCA and R, using an improved linear model that employs a Bayesian approach to accommodate missing data (Lenarcic et al., 2012). Going from the organismal to the genomic level, my study, either in depth or partially, reveals genetic mechanisms of broadly observed interesting phenomena in marine invertebrates, including Mendelian segregation distortion, heterosis and sex determination, through QTL mapping (Hu and Xu, 2009) based on high-density linkage maps with an average marker spacing of 0.49 cM. In Chapter 1, I show that a Bayesian hierarchical model can well accommodate unavoidable missing information, thus outperforming the traditional GLM in estimating genetic components and yield of Pacific oysters in practice. These accurately predicted genetic components and yields are important inputs for ranking and selecting parent lines. Elite parent lines identified according to predictions from the Bayesian hierarchical model are reliable for setting up high-yielding hybrid families, when variance among families in a diallel cross is high relative to variance within families, and for producing superior double-cross hybrids, when hybrid families used to predict the yield of double-cross hybrids are all present. In practice, in order to correctly select parent lines for hybrid families, it is helpful to increase variance among families relative to that within families by including more replicates for each family in a diallel cross. Moreover, identification of parents of double-cross hybrids should be more reliable in 158 reality, because a higher proportion of families should usually survive than in simulated diallel crosses, in which only half of families were present. In Chapters 2 through 5, I develop and apply methods for mapping genes underlying phenotypic variation in viability, growth and sex at the genomic level. I first construct high- density linkage maps, with an average marker spacing of 0.49 cM, which provides frameworks for QTL mapping. Compared to previous marker-based analysis (Launey and Hedgecock, 2001) and low-density QTL mapping (Plough and Hedgecock, 2011; Plough, 2012; Plough et al., 2016) on viability, QTL mapping based on 10× denser linkage maps in this study enables the construction of six specific genetic models. With these six genetic models involving additive, recessive and dominant deleterious mutations, the driver of type III survivorship and Mendelian segregation distortion is more precisely identified. Furthermore, high-density QTL mapping narrows vQTL peaks caused by single recessive deleterious mutation, greatly reducing the number of candidate genes responsible to test. Moreover, high-density QTL mapping is effective in teasing apart multiple deleterious mutations and their genetic effects under broad vQTL peaks. In spite of the improvements achieved by high-density QTL mapping in this study, two questions remain unexplored. First, even though vQTL arising from single recessive deleterious mutation have been more accurately pinpointed, we are still unable to identify candidate genes under narrow vQTL because of scaffold assembly errors in the C. gigas genome. Second, we are unable to trace the origin of mutations in F2 offspring causing vQTL back to the founders, as Launey and Hedgecock (2001) did, because we lack genotypic data on the inbred parent lines. This issue may be resolved by genotyping grandparents in future studies. Lack of genotypic data on inbred parent lines raised another challenge in gQTL analyses, which is to discriminate parent-line and hybrid genotypes for revealing genetic mechanisms of 159 heterosis. Fortunately, parent-line genotypes are determined according to genetic similarity between phased haplotypes from linkage mapping and confirmed by Mendelian segregation distortion arising from identical-by-descent recessive deleterious mutations. By comparing trait values between parent-line and hybrid genotypes at each QTL, we ascribe nine, six and two out of 17 yield QTL to overdominance, positive dominance and negative dominance, respectively. Only non-additive effects can account for variation in yield, supporting the larger contribution of non-additive genetic components, SCA and R, to yield than additive GCA, at the gene level. The observation of positive dominance and overdominance support the two, classical, alternative hypotheses of heterosis, while the case of negative dominance requires more complex genetic mechanisms to explain heterosis. In addition to genetic effects, temperature and its interaction with genotype also affect the growth of Pacific oysters. The growth rate of parent-line genotypes is lowered the most by the decreasing temperature from July and August to September and October, supporting Lerner’s (1954) idea of genetic homeostasis that the ability of heterozygotes to buffer against environmental changes may be lost upon inbreeding. While this study reveals genetic causes for heterosis, genetic mechanisms of differential growth between reciprocal families remain unexplained due to the inability to confirm identical-by-descent alleles. This issue may be resolved by genotyping grandparents, analyzing microhaplotypes as markers, or increasing the inbreeding levels of parent lines. Compared to vQTL and gQTL mapping, sQTL mapping seems to be less effective in unraveling the genetic mechanisms of sex determination. Through sQTL mapping, we identify six potential sex-determining QTL, five of which overlap with vQTL, suggesting sex-specific viability as the other potential factor accounting for sex ratio distortions. While the remaining one appears to be a good candidate sQTL, it still needs to be validated by excluding the effect of 160 sex-specific viability. Therefore, it remains a great challenge to discriminate two possible factors contributing to distortions of sex ratios at a potential sQTL, sex-determining genes and sex- specific viability, especially for species with Mendelian segregation distortion. Overall, QTL mapping and the Bayesian hierarchical model are important methods for elucidating genetic mechanisms of variation in fitness-related traits at the organismal and family levels. By illuminating genetic causes for variation in fitness-related traits, survival, growth and sex, in the Pacific oyster with the genomic and statistical approaches, this dissertation can not only shed light on evolution and biological conservation in the ocean, but also promote sustainable shellfish aquaculture. According to Hedgecock and Davis (2007), reciprocal effects significantly contribute to variation in yield among families, but cannot be well explained by maternal effect alone. This implies that nuclear-cytoplasmic interactions may play crucial roles in growth (i.e. yield) in the Pacific oyster. This study was planning to explore nuclear- cytoplasmic interactions by first comparing QTL, especially for growth (i.e. yield) across families. However, it is a pity that we are unable to compare genetic architectures of QTL across six, interrelated F2 families due to the lack of data on and the low inbreeding coefficient of inbred parent lines. For future studies, genetic analyses on fitness-related traits in marine invertebrates would be more powerful if microhaplotypes are analyzed as markers, genotypes of inbred parent lines are known, the assembly of the C. gigas genome is improved, and inbreeding coefficients of inbred parent lines are increased. 161 References Hedgecock, D., Davis, J.P., 2007. Heterosis for yield and crossbreeding of the Pacific oyster Crassostrea gigas. Aquaculture, Supplement: Genetics in Aquaculture IX 272, Supplement 1, S17–S29. Hu, Z., Xu, S., 2009. PROC QTL—A SAS procedure for mapping quantitative trait loci. Int. J. Plant Genomics 2009: 1–3. Launey, S., Hedgecock, D., 2001. High genetic load in the Pacific oyster Crassostrea gigas. Genetics 159, 255–265. Lenarcic, A.B., Svenson, K.L., Churchill, G.A., Valdar, W., 2012. A general Bayesian approach to analyzing diallel crosses of inbred strains. Genetics 190, 413–435. Lerner, I.M., 1954. Genetic homeostasis. Edinburgh: Oliver and Boyd. Plough, L.V., Hedgecock, D., 2011. Quantitative trait locus analysis of stage-specific inbreeding depression in the Pacific oyster Crassostrea gigas. Genetics 189, 1473–1486. Plough, L.V., 2012. Environmental stress increases selection against and dominance of deleterious mutations in inbred families of the Pacific oyster Crassostrea gigas. Mol. Ecol. 21, 3974–3987. Plough, L.V., Shin, G., Hedgecock, D., 2016. Genetic inviability is a major driver of type III survivorship in experimental families of a highly fecund marine bivalve. Mol. Ecol. 25, 895–910.
Abstract (if available)
Abstract
As a species that has been introduced from Asia to all continents but Antarctica, the Pacific oyster Crassostrea gigas is of high economic value and great scientific significance. Farmed Pacific oysters play crucial roles in satisfying rapidly growing needs for wholesome, environmentally friendly, and economically viable seafood. More importantly, many interesting phenomena, including type III survivorship, Mendelian segregation distortion, inbreeding depression, heterosis, sex polymorphism, and sex reversal, widely observed in marine invertebrates like the Pacific oyster, are believed to be at least partially genetically determined. The availability of pedigreed lines and a sequenced genome enable the Pacific oyster to be a great model for exploring genetic causes for these phenomena in marine invertebrates. ❧ In order to produce high-yielding oyster seeds, we usually conduct diallel analysis, which can decompose the yield variance into genetic components, general combining ability (GCA), specific combining ability (SCA) and reciprocal effect (R), to select elite parent lines. Since the traditional Generalized Linear Model with fixed effects (i.e. GLM) cannot accommodate unanticipated missing information in a diallel, I use a Bayesian hierarchical model (i.e. Bayesian analysis) to resolve issues arising in incomplete diallel analysis. My study shows that the Bayesian hierarchical model can effectively deal with missing information, accurately predict genetic components and yield, and properly pick superior parent lines for commercial production. ❧ In addition to being a commercially important species, the Pacific oyster is also a model organism for studies on marine invertebrates because it has substantial genetic and genomic resources for elucidating the genetic bases of high early mortality, yield heterosis and sex determination. This dissertation aims to determine the genetic architectures underlying variation in viability, growth and sex by conducting quantitative trait locus (QTL) mapping on six, inter-related F₂ families of Pacific oysters. I construct linkage maps based on single nucleotide polymorphisms (SNPs) detected by Illumina sequencing of reduced-representation genomic libraries (genotyping-by-sequencing, GBS). These high-density linkage maps have an average marker spacing of 0.49 cM and provide frameworks for QTL mapping within each family. Building upon previous discoveries, this dissertation has three major findings. First, viability QTL (vQTL) mapping with high-density linkage maps permits testing of specific models of vQTL gene action, narrows vQTL peaks arising from single, recessive deleterious mutations, and teases apart multiple deleterious mutations under broad vQTL peaks. Additionally, growth QTL (gQTL) mapping reveals that dominant and overdominant gene effects contribute to yield heterosis in the Pacific oyster. While positive dominance and overdominance support the two, classical, alternative hypotheses for heterosis, dominance and overdominance, cases of negative dominance suggest that a more complex genetic architecture is needed to explain heterosis. Along with genetic effects discovered from gQTL mapping, significant genotype-by-temperature interactions are detected at certain gQTL, especially for inbred, parent-line genotypes, supporting Lerner’s (1954) idea of genetic homeostasis at the locus level. Finally, through sex QTL (sQTL) mapping, we detect one potential candidate QTL that may account for sex determination in the Pacific oyster. However, it is of great necessity to discriminate sex-determining genes from sex-specific viability genes in order to validate potential sQTL. ❧ By illuminating genetic contributions to variation in fitness-related traits, including survival, growth (i.e. yield), and sex, in the Pacific oyster at the organismal and family levels using powerful QTL mapping and Bayesian approaches, this dissertation not only sheds light on evolution and biological conservation in the ocean but also promotes sustainable shellfish aquaculture.
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
PDF
Genome-wide analysis of genetic load and larval mortality in a highly fecund marine invertebrate, the Pacific Oyster Crassostrea gigas
PDF
Phylogeography, reproductive isolation, and the evolution of sex determination mechanisms in the copepod Tigriopus californicus
PDF
Stage-specific transcriptomic analyses of reproductive tissue in the Pacific oyster, Crassostrea gigas
PDF
Complex mechanisms of cryptic genetic variation
PDF
Molecular and behavioral mechanisms of circatidal biological rhythms in intertidal mollusks
PDF
Sex differences in aging and the effects of mitochondria
PDF
Dynamics of protein metabolism in larvae of marine invertebrates
PDF
The evolution of gene regulatory networks
PDF
Disentangling the ecology of bacterial communities in cnidarian holobionts
PDF
Physiological strategies of resilience to environmental change in larval stages of marine invertebrates
PDF
The environmental and genetic determinants of cleft lip and palate in the global setting
Asset Metadata
Creator
Yin, Xiaoshen
(author)
Core Title
Genetic architecture underlying variation in different traits in the Pacific oyster Crassostrea gigas
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Biology (Marine Biology and Biological Oceanography)
Publication Date
07/18/2019
Defense Date
04/23/2018
Publisher
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
genetics,linkage map,OAI-PMH Harvest,Pacific oyster Crassostrea gigas,quantitative trait locus mapping
Format
application/pdf
(imt)
Language
English
Contributor
Electronically uploaded by the author
(provenance)
Advisor
Hedgecock, Dennis (
committee chair
), Conti, David (
committee member
), Dean, Matthew (
committee member
), Edmands, Suzanne (
committee member
)
Creator Email
xiaoshey@usc.edu,xyin.aqua@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c89-18956
Unique identifier
UC11671948
Identifier
etd-YinXiaoshe-6420.pdf (filename),usctheses-c89-18956 (legacy record id)
Legacy Identifier
etd-YinXiaoshe-6420.pdf
Dmrecord
18956
Document Type
Dissertation
Format
application/pdf (imt)
Rights
Yin, Xiaoshen
Type
texts
Source
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Tags
genetics
linkage map
Pacific oyster Crassostrea gigas
quantitative trait locus mapping