Close
About
FAQ
Home
Collections
Login
USC Login
Register
0
Selected
Invert selection
Deselect all
Deselect all
Click here to refresh results
Click here to refresh results
USC
/
Digital Library
/
University of Southern California Dissertations and Theses
/
Understanding genetics of traits critical to the domestication of crops using Mixed Linear Models
(USC Thesis Other)
Understanding genetics of traits critical to the domestication of crops using Mixed Linear Models
PDF
Download
Share
Open document
Flip pages
Contact Us
Contact Us
Copy asset link
Request this asset
Transcript (if available)
Content
Understanding genetics of traits critical to the domestication of crops using Mixed Linear
Models
By
Anupam Singh
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(MOLECULAR BIOLOGY)
May 2022
Copyright 2022 Anupam Singh
ii
Dedication
To my family. My grandparents Kedar and Pano. My parents Indradeo and Baby. My
brothers Saurabh, Subhash. My sisters Roopam and Virginia. My babies Yuven, Indira and
Samarth. My love Navjot.
iii
Acknowledgments
This journey of PhD is incomplete without the training provided by my advisor Dr.
Sergey Nuzhdin. Thank you, Sergey, for teaching me the essential skill of “thinking”, the most
valuable skill for life. My deepest gratitude to all collaborators, specifically Dr. Douglas Cook,
Dr. Peter Chang, and Dr. Allison Shultz for providing me with the essential knowledge and
resources for a successful PhD. I want to extend my regards to my committee members Dr. Ian
Ehrenreich, Dr. Matthew Pratt and Dr. Fengzhu Sun for their support and guidance towards
completing my Ph.D. I am lucky to have found my students Maggie and Ania who have enriched
my experience in the time spent at USC.
The department, faculty, and staff of molecular and computational biology hold a special
place in my heart. They welcomed me from day one and have been a home away from home. My
lab/life mates Gary, Vasantika, Chris, Levi, Katrina, thank you for keeping me grounded,
uplifting me, offering car rides, coffees, and most importantly keeping the fun at work alive.
Beyond classwork, I extend my gratitude to Joint Educational Project and DJ Kast for enriching
my Ph.D. experience beyond the thesis and giving me the platform to give back to the
community via the cancer education program.
I am grateful to my parents for providing me with the support and opportunity to fulfill
my dream education. The decision of pursuing a Ph.D. for me wouldn’t be fulfilled without the
backing of my husband Navjot and the family. I would like to give a shout out to my brother
Saurabh and my sister Roopam for rooting for me throughout. Thanks to my friends Pooja,
Santosh, Keerthi, Sanjeev, Carmen, Maddy and Marika, for always saying “you got this”.
iv
Table of Contents
Dedication ...................................................................................................................................... ii
Acknowledgments ........................................................................................................................ iii
Table of Contents ......................................................................................................................... iv
List of Tables ............................................................................................................................... vii
List of Figures ............................................................................................................................. viii
Abstract ........................................................................................................................................ xii
Chapter 1: Introduction ............................................................................................................... 1
1.1 Domestication of crops ......................................................................................................... 1
1.2 Genomic tools in aiding crop domesticate: GWAS and GS ................................................. 2
1.3 Studying flowering time in wild Cicer.................................................................................. 3
1.4 Non-linear regression models for predicting flowering time ................................................ 4
1.5 Nested Association Mapping (NAM) populations, an experiment to expand genetic
diversity in the cultivars .............................................................................................................. 4
1.6 GWAS and Genome Selection in Kelp ................................................................................. 7
Chapter 2: Assessing effects of genetics and environment on flowering time in wild Cicer
across different environments ...................................................................................................... 9
2.1 Introduction ........................................................................................................................... 9
2.2 Methods............................................................................................................................... 11
2.2.1 Dataset of wild chickpea accessions ............................................................................ 11
2.2.2 Experimental design and phenotype collection ........................................................... 11
2.2.3 Statistical analysis ........................................................................................................ 12
2.2.4 GWAS .......................................................................................................................... 12
2.3 Results ................................................................................................................................. 14
2.3.1 Phenotype analysis ....................................................................................................... 14
2.3.2 ANOVA results ............................................................................................................ 18
v
2.3.3 Principal Component Analysis .................................................................................... 18
2.3.4 GWAS results .............................................................................................................. 19
2.4 Discussion ........................................................................................................................... 21
2.5 Conclusion .......................................................................................................................... 22
Chapter 3: Non-linear regression models for time to flowering in wild chickpea combine
genetic and climatic factors [30] ................................................................................................ 23
3.1 Background ......................................................................................................................... 23
3.2 Materials and methods ........................................................................................................ 26
3.2.1 Dataset of wild chickpea accessions ............................................................................ 26
3.2.2 Regression model for time to flowering ...................................................................... 30
3.2.3 Analytic form of control function ................................................................................ 32
3.2.4 Statistical tests .............................................................................................................. 33
3.2.5 Software tools .............................................................................................................. 34
3.3 Results ................................................................................................................................. 34
3.3.1 Model with interactions between climatic factors and locations ................................. 35
3.3.2 Basic flowering time models for locations .................................................................. 36
3.3.3 Analysis of the climatic factor effect on phenotype .................................................... 37
3.3.4 Flowering time model with climatic factor-by-genotype interaction .......................... 44
3.4 Discussion ........................................................................................................................... 49
3.5 Conclusions ......................................................................................................................... 52
Chapter 4: Uncovering genetic basis of critical domestication traits in Cicer hybrids using
Nested Association Mapping ...................................................................................................... 53
4.1 Introduction ......................................................................................................................... 53
4.2 Methods............................................................................................................................... 55
4.2.1 NAM crossing scheme, selection of genotypes ........................................................... 55
4.2.2 Field setup, phenotyping of different traits .................................................................. 56
4.2.3 DNA extraction and sequencing .................................................................................. 57
4.2.4 SNP calling and selection ............................................................................................ 58
4.2.5 SNP imputation ............................................................................................................ 58
4.2.6 GWAS method ............................................................................................................. 59
4.2.7 Candidate gene identified from GWAS ....................................................................... 61
4.2.8 MSA analysis/ ITASSER to infer amino acid changes/ structural changes in genes
inferred from GWAS ............................................................................................................ 61
4.3 Results ................................................................................................................................. 63
4.3.1 NAM populations crossing scheme ............................................................................. 63
4.3.2 Phenotype results ......................................................................................................... 65
4.3.3 Combined population analysis in GWAS .................................................................... 70
4.3.4 Genes identified around the GWAS hits ...................................................................... 70
vi
4.3.5 Correlated phenotypes have the same genomic region associated with them in the
GWAS analysis ..................................................................................................................... 73
4.3.6 Introduction of wild allele increases yield ................................................................... 75
4.3.7 Cost of domestication: the tradeoff between yield, leaf size, and leaf necrosis .......... 75
4.3.8 Effect of the environment of origin of the wild NAM parents on shattering in the F2s
............................................................................................................................................... 77
4.3.9 Gene level phylogeny shows signs of introgression of C. echi into C. ari .................. 80
4.4 Discussion ........................................................................................................................... 84
4.5 Conclusion .......................................................................................................................... 85
Chapter 5: Breeding superior kelp using GWAS and genomic selection models ................. 87
5.1 Introduction ......................................................................................................................... 87
5.2 Methods............................................................................................................................... 88
5.2.1 Phenotype collection .................................................................................................... 88
5.2.2 DNA sequencing and SNP calling ............................................................................... 89
5.2.3 GWAS .......................................................................................................................... 89
5.2.4 Genome Selection ........................................................................................................ 91
5.3 Results ................................................................................................................................. 92
5.3.1 Phenotype results ......................................................................................................... 92
5.3.2 GWAS results .............................................................................................................. 95
5.3.3 Genome selection results ............................................................................................. 97
5.4 Discussion ........................................................................................................................... 98
5.5 Conclusion .......................................................................................................................... 99
Chapter 6: Conclusion .............................................................................................................. 100
Reference List ............................................................................................................................ 101
Appendix A ................................................................................................................................ 113
Appendix ................................................................................................................................. 113
A.1 Non-linear regression models for time to flowering in wild chickpea combine genetic and
climatic factors[30] ................................................................................................................. 113
A.2 Genotyping and lipid profiling of 601 cultivated sunflower lines reveals novel genetic
determinants of oil fatty acid content [138] ............................................................................ 115
vii
List of Tables
Table 2.1: Study design of FTM data collection........................................................................... 15
Table 3.1: Statistically significant differences in effects of climatic factors and their
combinations on plant genotype. The pair-wise comparisons were performed with Mann-
Whitney-Wilcoxon test. DL- day lengthh, TEMP – temperature, P – precipitation; REF –
reference allele, ALT – alternative allele for polymorphic site .................................................... 47
Table 4.1: List of phenotypes collected in the NAM populations. The traits can be divided into
two broad categories: Architectural and Biomass and both of which have continuous and
categorical traits. ........................................................................................................................... 66
Table 4.2: List of genes found around the GWAS hits, for all the traits analyzed ....................... 72
viii
List of Figures
Figure 1.1: Mapping populations and their comparison ................................................................. 5
Figure 2.1: Distribution of averaged FTM across all sites. Y-axis is the number of days of FTM
and on the X-axis are all the sites where the plants were grown. Average flowering time in plants
sown in winter is higher compared to those that were sown in Spring and summertime ............. 16
Figure 2.2: Distribution of averaged FTM across all sites. The average of FTM in each site
across all years and treatments is shown in the figure. Y-axis is the number of days of FTM and
on the X-axis are all the sites where the plants were grown. Plants were grown in higher
elevation (like Mt. Barker and Ankara) flower later when compared to plants in lower elevation.
....................................................................................................................................................... 16
Figure 2.3: Pairwise spearman correlation of FTM in all sites of Turkey and Australia. Pairwise
Pearson correlation of FTM across all environments ((environment = Site + year + treatment,
example: Ank_15_TOS2) of data collection using R package (PerformanceAnalytics).
Significances associated with p-values( ***=0.001, **=0.05, *=0.1) ......................................... 17
Figure 2.4: Principal Component Analysis (PCA) of ret. Component analysis of all the ret
genotypes is shown in the figure. Y-axis is PC 2 values and X-axis is PC 1 values. Each blue dot
in the graph is a genotype. The separate clustering of genotypes indicates a high population
structure between genotypes. ........................................................................................................ 19
Figure 2.5: Manhattan plot and QQ plot showing the GWAS of FTM in Urfa_2014_TOS1 site of
Turkey. Mixed model results of GWAS for FTM in Urfa_2014_TOS1. Manhattan plot: X-axis is
the chromosomes in the genome of rets. Y-axis is the -log(P) values associated with each
genomic marker in the mixed model. QQplot: X-axis is the expected -lop(p) values of a null
model and Y-axis is the -log(P) associated with the genetic markers. There is a clean peak of
genomic markers associated with FTM on chromosome 4 in this model. .................................... 20
Figure 2.6: Manhattan plot and QQ plot showing the GWAS of FTM in Urfa_2014_TOS2 site of
Turkey. Mixed model results of GWAS for FTM in Urfa_2014_TOS2. Manhattan plot: X-axis is
the chromosomes in the genome of rets. Y-axis is the -log(P) values associated with each
genomic marker in the mixed model. QQplot: X-axis is the expected -lop(p) values of a null
model and Y-axis is the -log(P) associated with the genetic markers. There is no significant
association between genomic markers and FTM in this model. ................................................... 20
Figure 3.1:Distribution of time to flowering for the whole dataset. The range for time to
flowering is from 64 to 221 day .................................................................................................... 28
Figure 3.2: Analysis of climatic factor effects on phenotype. Box plots of climatic factor
influence estimators calculated for model ensembles and for each location as a finite difference
ix
approximation of the partial derivative of a regression function (1) in respect to the factor. Each
box covers two quantiles from 25 to 75% of influence’s variation with a horizontal line at
median value of the estimated influence. Empty circles represent outliers. Boxes located higher
than zero mark on vertical axis represent a positive influence of a factor on flowering time. In
this case increasing the factor speeds up flowering. Other boxes represent an opposite case.
“DL”, “TEMP”, “P” and “U” correspond to factors related to day length, temperature,
precipitation and humidity, respectively ....................................................................................... 39
Figure 3.3: Results of pair-wise comparisons of day length influence on flowering time. Mann-
Whitney-Wilcoxon test was applied to compare the means of day length influence estimators for
locations. Statistically significant differences in means are shown as red color gradation, cells
with statistically non-significant comparisons are left blank ........................................................ 40
Figure 3.4: Results of pair-wise comparisons of temperature influence on flowering time. Mann-
Whitney-Wilcoxon test was applied to compare the means of temperature influence estimators
for locations. Statistically significant differences in means are shown as red color gradation, cells
with statistically non-significant comparisons are left blank ........................................................ 41
Figure 3.5: Results of pair-wise comparisons of precipitation influence on flowering time. Mann-
Whitney-Wilcoxon test was applied to compare the means of precipitation influence estimators
for locations. Statistically significant differences in means are shown as red color gradation, cells
with statistically non-significant comparisons are left blank ........................................................ 42
Figure 3.6: Results of pair-wise comparisons of humidity influence on flowering time. Mann-
Whitney-Wilcoxon test was applied to compare the means of humidity influence estimators for
locations. Statistically significant differences in means are shown as red color gradation, cells
with statistically non-significant comparisons are left blank ........................................................ 43
Figure 3.7: Correlations of mean annual precipitation (mean_annual_prec) with allele frequency
of the 6 GWAS SNPs calculated for 15 populations of the wild chickpeas (shown for
completeness). Allele frequency of SNPs 1, 4,5 and 6 are correlated with mean annual
precipitation. The allele frequencies have a linear relationship at each of these significant SNPs,
showing that these alleles are nearly fixed in the population in regions with high mean annual
precipitation .................................................................................................................................. 48
Figure 4.1: Schematic representation of the NAM crossing scheme. An elite cultivar is crossed
with 19 wild species to create F1s. The F1s are selfed to create 19 F2 populations. The
chromosomes show a mosaic of recombination breakpoints at F2 generation. A total of 1942
individuals are genotyped in the NAM population ....................................................................... 64
Figure 4.2: Illustration representing the traits collected in NAM populations listed in ............... 65
Figure 4.3: Pairwise correlation of 8 continuous traits Plant height (PH), Plant lateral width
(PLAW) Plant longitudinal width (PLOW), Average seed mass (ASM), Total plant biomass
x
(TPB), yield, shattering index (SI) and Plant volume (PV). Significance level associated with the
pearson correlation value is "***" p-value is < 0.001, "**" if the p-value is < 0.01, "*" p-value
is < 0.05, "." p-value is < 0.10, .................................................................................................... 68
Figure 4.4: Relationship between PGH and PH in a and SI in b. Plants with lower fertility shatter
more. Plants with high fertility have yield. ................................................................................... 69
Figure 4.5: Relationship between PF and SI in a and Yield in b. Yield and PF statistically
significant between plants with low, medium and high fertility. .................................................. 69
Figure 4.6: Relationship between Yield and NL in a and Yield and PLS in b. Plants with no leaf
necrosis have statistically significantly higher yield. Similarly, plants with bigger leaves have
higher yield. .................................................................................................................................. 70
Figure 4.7: Analysis of yield in context with other traits. a) Correlation between Average seed
mass (ASM), Total plant biomass (TPB), Yield, Plant volume (PV), Plant lateral width (PLAW),
Plant longitudinal width (PLOW) Plant. b) Manhattan plots showing the SNPs tested in the
FARMCPU model for the traits shown in a). The 35mb LD block has a significant SNP
associated with these traits. The genome-wide significance is drawn after the Bonferroni
correction at 0.05 (0.05/number of tests). c) Violin plot showing the distribution of yield based
on the most significant genomic window. The position on chromosome 5 has a reference allele as
A and the alternate allele is G. d) bar plot showing the distribution of the SNP from c) in
cultivated (Cicer arietinum), closely related wild (Cicer reticulatum) and their common ancestor
(Cicer echinospernum) .................................................................................................................. 74
Figure 4.8: Tradeoff between yield, leaf size, and necrosis. a) Manhattan plots showing the SNPs
tested in the FARMCPU model for yield, plant leaf size (PLS), and Necrotic leaves (NL). The
genome-wide significance is drawn after the Bonferroni correction at 0.05 (0.05/number of
tests). b) Distribution of alleles on chromosome 5 at position 35631842 for b) PLS and c) NL.
Colors represent the change in leaf morphology based on the change in allele in b and change in
leaf morphology in c. .................................................................................................................... 77
Figure 4.9: a)Manhattan plots showing the SNPs tested in the FARMCPU model for shattering
index(SI). The genome-wide significance is drawn after the Bonferroni correction at 0.05
(0.05/number of tests). b) Violin plot showing the distribution of shattering index on the Y axis
and x-axis shows the allelic distribution of the most significant GWAS SNP from a) in all the
NAM populations combined. The position (2044973) of the most significant SNP on
Chromosome 6(Ca6) has C as the reference allele. c) It is an extension of b. It shows the
distribution of Ca6_2044973 in each of the NAM populations separately. d) Is the distribution of
most significant SNP for yield in all the NAM populations. ........................................................ 79
Figure 4.10: Structure plot from [2]. The Structure plot is overlaid on the map of Turkey. 19 of
the C. ret NAM parents are divided into 8 genetic groups represented by different colors. This
grouping is used to cluster the NAM parents by drawing circles around the sites from which
xi
these parents were collected in Turkey. Each circle has a number associated with it to denote its
group number. The structural grouping is used a proxy to cluster the NAM families. ................ 80
Figure 4.11: a) phylogenetic analysis of genes around GWAS hits. b) Violin plot showing the
distribution of SI in the C.echi NAM populations. C) distribution of yield in C.echi NAM
populations. ................................................................................................................................... 83
Figure 4.12: I-TASSER of disease resistance response 206 protein. The dirigent domain that
confers protein functionality in lignificition is in blue. The remaining portion of the protein is
shown in green. Vernolic acid, the proposed ligand is shown in orange[132] ............................. 83
Figure 5.1: A summary plot showing the correlation of the four traits collected for giant kelp
sporophytes: Blade weight, Stipe weight, plant biomass, and individual count of blades on the
sporophytes across 4 different populations. The first column of the plot also shows the total
number of each replicate (numbered from 1 to 5) collected across all the genotypes and column 2
shows a population-specific number of genotypes grown on the farm. ....................................... 93
Figure 5.2: a) Illustration of one kelp plant. b) count of survival of plants across all four
populations. Survival ratio is lowest for Los Angeles population. ............................................... 94
Figure 5.3: GWAS results of blade weight, plant biomass, and stipe weight. On left is the
manhattan plot showing markers on all the scaffolds (on x-axis) tested in the GWAS model. -
log10(p) value associated with each marker is plotted (on Y-axis) with a solid line showing
genome-wide significance for the P values using Bonferroni correction. Right: Quantile Quantile
(QQ) plot showing the deviation of the markers tested (in blue) against the null (red line) in the
GWAS model There are multiple genomic loci significantly associated with plant balde weight.
....................................................................................................................................................... 96
Figure 5.4: Violin plot showing the distribution of the allele at the most significant SNP. X-axis
shows two alleles. Y-axis the ........................................................................................................ 97
Figure 5.5: Scatter plot representing the correlation between average plant biomass collected
from the empirical / farm data from 2019 for each of the genotypes and predicted values for the
same trait using gBLUP. The highlighted red dot indicates the average plant biomass calculated
from the multiple regression model after fixing those 7 SNPs in the population. ........................ 98
xii
Abstract
Domestication of crops on land has helped meet the world’s growing food demand.
Genomic resources available for different crops have enabled the switch from traditional
phenotype-based domestication methods to more genetic approaches that are helping fasten the
selection process. Long-term selection in many plants has been shown to have enhanced its
agronomic potential but also caused a deterioration of its genetic diversity. One instance of that
situation is chickpeas. This thesis focuses on increasing genetic diversity in chickpeas using
nested association mapping (NAM) by the introgression of wild material into the elite cultivar.
Taking lessons from domestication on land, the second aspect of this thesis is focused on
the domestication of giant kelp. Popular genomic methods like genome-wide association studies
(GWAS) and genome selection (GS) have been used to propose genotypes that can advance
breeding of kelp and meet the growing demand for this economically valuable product
1
Chapter 1: Introduction
This thesis consists of 4 chapters. One has been published, two are in the preparation
stage with the last chapter setting a framework for publishing future work on genotype and
phenotype map in green algae. The following sections will introduce the concepts that introduce
the vocabulary of each chapter presented in this dissertation.
1.1 Domestication of crops
Domestication provides most of the food that we eat today. It’s the most important
development in the history of humans in the last 1300 years. The Fertile Crescent and parts of
China are known to be the two earliest centers of plant domestication [1]. Chickpea (Cicer
arietinum), a cultivated pulse belonging to Family Fabaceae was domesticated in the fertile
crescent about 10000 years ago [2]. Domestication in chickpeas favored the selection of key
traits also called domestication syndrome [3] like upright growth habit, less seed shattering,
reduced seed dormancy, and seed coat texture. Deconstructing the biology behind these traits can
help understand the relationship between traits and the tradeoff between phenotypic selection and
genetic resilience due to domestication, particularly in chickpeas. The next two chapters
presented in this thesis are based on the concept described above.
With the ever-growing hunger in the world, domestication has been adopted from
terrestrial systems and being applied to the marine world. A vital component of the marine
2
ecosystem is seaweed. Biomass of seaweed is used in the food industry and provides a source for
myriad chemicals, biofuels, cattle feed, etc that ease the burden on plant products [4]. Breeding
of selective macroalgal species is a recent phenomenon and lack of standard cultivars is causing
inconsistency in the production of the desired phenotypes [5]. Unlike farming on land where
farmers have enough resources to produce uniform crops, the macroalgal industry needs the
appropriate resources to cultivate desired biomass of the seaweed. My project is focused on
generating domesticates for one such macroalgal species Macrocystis pyrifera a.k.a giant kelp
using genomic tools.
1.2 Genomic tools in aiding crop domesticate: GWAS and GS
In the era of sequencing, genomic data-assisted methods are widely used to enhance traits
in existing cultivars or domesticate a new crop. Both Genome-Wide Association Study (GWAS)
and Genome Selection (GS) methods are at the center of the methods utilized to do so [6]. These
methods sometimes aid the traditional phenotype selection approach used for thousands of years
in domesticating plants.
Marker-assisted selection (MAS) utilizes GWAS and quantitative trait mapping loci
(QTL) approaches to speed up the selection process compared to the pedigree-based methods.
These methods are well suited for traits that are governed by few genes and are simple. As the
complexity of the traits increases, MAS becomes less suitable. GWAS has been used extensively
to identify genes governing key traits. I have implemented GWAS in all the chapters to infer the
genetic underpinnings of multiple traits.
3
In scenarios where multiple loci of small effect change a trait value, the scale of the
experiment has to be amplified exponentially to have enough data points to implement the MAS
methods. Genetic Background and epistatic interactions between genes make MAS difficult. GS
was introduced in 2001 by Meuwissen and colleagues to improve the breeding efficiency of
complex traits. In this breeding schema, DNA markers from the entire genome are used to
predict genotypes that would be most valuable to be parents of the offspring of the next
generation [7]. In the 5th chapter of this dissertation, I implement GS on genomic data of giant
kelp to choose parents that can be used in breeding superior offspring in the following
generation.
1.3 Studying flowering time in wild Cicer
Chapter 2 provides an analysis of flowering time data collected in two years for two wild
Cicer species: Cicer reticulatum (ret) and Cicer echinospermum (echi) in multiple locations of
Turkey and Australia.
Flowering time is especially important for adaptation and yield of chickpea in different
environments. Apart from being an easily monitored trait, it serves as an indicator for the plant's
successful podding time and maturity. Temperature, photoperiod, amount of moisture in the soil,
altitude, and the latitude of the region in which chickpea is grown cause alteration in this highly
variable trait of flowering. Studies have identified markers associated with flowering genes
(including efl-1 and efl-2) in the cultivated species, Cicer arietinum (ari)[8]–[10]. In this study,
4
we aim to identify many more genes and markers from two wilds (ret and echi) using whole-
genome sequencing of 249 genotypes.
Much of this project is based on data analysis using statistical methods like linear and
nonlinear modeling, analysis of variance (ANOVA), and correlation.
1.4 Non-linear regression models for predicting flowering time
Predicting flowering time is critical to a successful cropping season in flowering plants.
Existing models predicting plant phenotypes require extensive knowledge on the traits for
efficiently predicting them. A large amount of capital goes into designing studies to quantify
these phenotypes. Chapter 3 of the thesis describes a non-linear model that uses a mathematical
framework to predict flowering time in chickpeas using phenotype and existing climate data of
the environment of origin of the genotypes. I am second on the author list as I contributed to the
phenotype analysis, GWAS, regression analysis of flowering time in the dataset that was used to
construct the non-linear model. I have also contributed to the writing of the abstract,
introduction, methods, discussion, conclusion sections of the manuscript.
1.5 Nested Association Mapping (NAM) populations, an experiment to
expand genetic diversity in the cultivars
Understanding the genetic basis of traits critical to domestication is key to performing
any genetic manipulation aimed at improving them in cultivars. Different approaches have been
5
developed to dissect the genetic architecture of traits in plants and each of those methods come
with their advantages and disadvantages. Mapping populations are created by crossing two
genetically diverse populations [11] producing novel recombination blocks for important crop
phenotypes. While this biparental design provides high power to detect quantitative trait loci
(QTLs), it captures a narrow genetic diversity. Despite the shortcomings of a biparental design, it
is widely used [12], [13].
Figure 1.1: Mapping populations and their comparison
Mapping populations and their comparison. ‘x’s in black color denote crosses between parents
and circled ‘x’s denote self-mating until inbred. Dots represent many other individuals in the
population or family [14].
Multi-parent populations are the new age crossing scheme used to detect genetic regions
associated with traits by using a wide array of genetically diverse founders that are crossed to
produce progeny with high recombination. Nested association mapping (NAM) and multi-
founder advanced generation inter-cross (MAGIC) are two types of multi parental population
6
(MPP) design. The systematic crossing maximizes the allelic richness and increases the genetic
diversity in the advanced genotypes.
Populations of recombinant inbred lines (RIL) are generated from one biparental cross
that produces progeny with a mosaic of parental haplotypes. This makes bigger recombination
blocks that limit mapping resolution. NAM is a type of RIL with multiple families sharing one
common parent (shown in black) (Figure 1.1). This design provides improved resolution with a
high number of alleles represented. MAGIC populations are often derived from 8 or 16 parental
lines (we only draw four here for the sake of space; crossing schemes can be more complex than
shown here). Like NAM populations, MAGIC populations have superior resolution and higher
allelic richness in comparison to RIL populations. Association panels represent a snapshot of
natural variation from an existing population with historical recombination events and
accumulated mutations. As a result, they have greater recombination and allelic diversity than
the other three types of populations. Natural populations have an inherent population structure
that can pose difficulty in running the association tests [14].
NAM design was first adopted in maize and has been applied to multiple crops since it
was first introduced. This involves crossing of multiple founders with a single cultivar or inbred
line [15]. This generates thousands of progenies in each of the NAM families which helps
understand the genetic basis of traits but segregation distortion and high levels of recombination
in each family can limit the QTL mapping of the traits.
In chapter 4 of the dissertation, we used NAM design to address two main questions in
chickpeas: first to increase genetic diversity in the cultivated chickpeas by using NAM
populations, and breeding it to an advanced stage, and second, to dissect the genetic basis of
traits critical to domestication.
7
Domestication is known to reduce genetic diversity in cultivated species and cultivars of
chickpeas are estimated to have experienced a loss in genetic diversity of 93%. Populations of
wild relatives of chickpea carry genetic variation that helps them adapt to their native
environment[16]. Wettberg et.al.[2], suggested the introgression of the wild sister species from
the Cicer genus into the cultivated, to bring back this lost diversity and breed climate adaptable
cultivars. To gather this natural diversity, Wettberg et.al., collected wild relatives of the Cicer
genus, from the ‘fertile crescent’ (former Mesopotamia: source of several prominent crop
species, including chickpea) now in Turkey. This collection assembled a germplasm that
captured genetic variation caused by a range of climatic factors like drought, heat, and cold stress
[2].
To expand the genetic diversity in the cultivated chickpea while maintaining its desired
agronomic traits, it's important to first understand the genetic basis of those traits. Multi parental
population approaches have been adopted to understand the genetic basis of important
phenotypic traits. In chapter 4, I discuss, Nested Association Mapping (NAM) experiment, a
multi parental population created by crossing one cultivated (ari) with 19 different wild (ret)
parents, to understand the genetic basis of seminal domestication traits in chickpea.
1.6 GWAS and Genome Selection in Kelp
Macrocystis pyrifera, commonly known as giant kelp, was considered the foundation
species by Darwin [4] also considered an ecological engineer that increases both species richness
and food-web complexity. All the giant kelp grown today is in its wild form and there is a need
to domesticate the giant kelp. In chapter 5, I set a framework for domesticating giant kelp using
8
GWAS and GS methods. Numerous putative loci have been identified to be associated with plant
biomass and chlorophyll content in the giant kelp population.
9
Chapter 2: Assessing effects of genetics and environment on flowering
time in wild Cicer across different environments
Anupam Singh
1
, Peter L. Chang
1,2
, Bilal Aydin
3
, Ahmet Cakmak
3
, Noelia Carrasquila-Garcia
2
,
Christiane Ludwig
4
, Kelley Whisson
4
, Abdulkadir Aydogan
5
, Abdullah Kahraman
3
, Jens
Berger
4
, Douglas R. Cook
2
, Sergey V. Nuzhdin
1
1
Department of Biological Sciences, University of Southern California, Los Angeles, CA, 90089,
USA,
2
Department of Plant Pathology, University of California, Davis, CA, 95616, USA,
3
Department of Field Crops, Faculty of Agriculture, Harran University, Şanlıurfa, Turkey,
4
Commonwealth Scientific and Industrial Research Organization (CSIRO), Agriculture and
Food, Perth, WA, Australia,
5
Central Research Institute for Field Crops (CRIFC), Ankara,
Turkey. In prep
2.1 Introduction
Adaptation of commercial crops to the rapidly changing climate is of major importance
[17]. Chickpea (Cicer (C.) arietinum L.), ranks second in the most cultivated commercial grain
legume crop and belongs to the family of Fabaceae (ICARDA). It has 20% higher protein
content is consumed in many parts of the world[18][19]. Its domestication is believed to have
happened in Southeastern Turkey [20] and since then it has been adapted to various
environmental and climatic conditions across the globe from subtropical conditions in South
Asia and East Africa to Northern regions of temperate North America. The time duration for
chickpea to reach its reproductive phase is often limited by changing temperatures, rainfall
patterns, day length, or competition for space by other crops in rotation [3], [21]. The beginning
of the reproductive phase in plants is determined by the onset of flowering. Flowering time
(FTM) is critical to the plant life cycle and varies in response to seasonal cues, making it a highly
plastic phenotype [8]. Natural selection can sync FTM to favorable environmental conditions
making it adaptable to heterogeneous landscapes [22].
10
Crop duration and FTM are the most important trait for the adaptation of chickpeas to
any given environment. Vernalization (exposure to cold) and photoperiod (day length) are the
two main environmental factors that regulate and ensure that flowering occurs when seasonal
conditions are most likely to be conducive to seed set and maturation [23]. Historical
domestication (~10000 years ago) in chickpeas has resulted in reduced vernalization response
and increased photoperiod sensitivity enabling the expansion of domestic chickpea cultivation
into regions with damp winters and shorter growing seasons. This has enabled spring sowing,
permitting the entire plant life cycle to be completed before the end of summer and thus avoiding
pathogenic infections like ascochyta blight. In the cultivated chickpea, domestication, coupled
with widespread adoption, and changes in modern cropping season has depleted the genetic
diversity extensively[9], [24], [25].
This study describes the flowering time phenotype data collected in Turkey and Australia
under five different treatment conditions in three years and the results for a genome-wide
association study that identified putative locis associated with it. Seeds were planted in Turkey
and Australia under five different treatment conditions in three years. On testing the significance
of variables associated with the experimental design, we found the site, year, and treatment to be
important factors that affect the time to fir time to first flowering. In a genome-wide association
analysis for flowering time in the Mt. Barker site of Australia, we observe numerous hits,
identifying potential markers associated with flowering time in the wild Cicer ancestry of
chickpea.
11
2.2 Methods
2.2.1 Dataset of wild chickpea accessions
The flowering time phenotype is collected for two wild chickpea species ret and echi.
Accessions were collected at 21 sites in five regions in Turkey by von Wettberg et al [2]. These
wild accessions were planted in climatically distinct sites in Turkey (Sanliurfa and Ankara,
autumn and spring-sowing) and Australia (Floreat, near Perth, WA, and Mt.Barker, WA). Being
grown in contrasting environments the phenotype data on time to flowering is highly diverse.
2.2.2 Experimental design and phenotype collection
Flowering time in the wild Cicer was phenotyped by GRDC (Australian agricultural
funder) in two climatically distinct sites in Turkey (Sanliurfa and Ankara) and two sites in
Western Australia (Floreat, near Perth and Mt. Barker) over two seasons. We have sequencing
data available for 228 of the 253 phenotyped wild samples. The genotypes were assigned to
replicated blocks; in two planting times at each site, to examine differences in responsiveness to
seasonal cues such as temperature and day length across these sites. The Floreat site in Australia
also included a well-watered and water-restricted (drought) treatment. In Turkey, Sanliurfa is
warmer on average than native range sites as it is at a lower elevation, while Ankara has a much
more temperate climate than the native range, with colder winters and cooler, wetter summers.
The two Australian sites have a Mediterranean climate but differ in day length and seasonal
temperatures, and Mt Barker is located much farther South than Floreat.
12
2.2.3 Statistical analysis
Principal component analysis was performed using the SNPRelate package in R [26].
Pairwise correlation analysis between the sites was performed using PerformanceAnalytics
package [27] implemented in R.
2.2.4 GWAS
GWAS was performed using the Fixed and random model Circulating Probability
Unification (FarmCPU) method implemented in R[28], [29] .
The FARMCPU method is a Multiple Loci Linear Mixed Model (MLMM) divided into
two parts: Fixed Effect Model (FEM) and a Random Effect Model (REM) and they are used
iteratively. In FEM markers are tested one at a time, and false positives are controlled by using
multiple associated markers as covariates. The overfitting problem in FEM is circumvented in
REM by defining the kinship using the associated markers. At each iteration, the P values are
unified for both testing markers and the associated markers.
In FEM, each of the tests markers are tested one at a time and a set of pseudo
Quantitative Trait Nucleotides (QTNs) are used as covariates. The model can be written as:
(1)
In Equation 1, yi is the phenotype observation of the ith individual; Mi1, Mi2,…, Mit
represent the genotypes of t pseudpseudo-QTNso QTNs, initiated as an empty set; b1, b2, …, bj
13
are the effects of the corresponding pseudo QTNs; Sij is the genotype of the ith individual and
jth genetic marker; the corresponding effect of the jth genetic marker is dj; ei represents the
residuals having a distribution with zero mean and variance of .
In equation 1, each of the markers being tested receives a P value except for those that are
designated as pseudo QTNs are used as covariates. In the beginning, these pseudo QTN markers
are assigned P value of “NA" (Not Available). When each pseudo QTN is examined for each of
the testing markers, most significant P value replaces the “NA” for that pseudo QTN and it
becomes the P value of its corresponding marker.
The above step generates a P value for every marker. The P values and the associated
marker map are used to update the selection of pseudo QTNs by using the SUPER algorithm[24]
in a REM as follow:
(2)
where yi and ei are same as in Eq (1) and ui is the total genetic effect of the i
th
individual.
The expectations of total genetic effects for the individuals’ are zeros. The variance and
covariance matrix of the individuals’ total genetic effects is , where unknown
genetic variance is and kinship obtained from the pseudo QTNs K .
The pseudo QTNs in the FEM, Eq (1) are replaced by the set of pseudo QTNs that
maximizes the likelihood of the REM, Eq (2). When no change occurs in the estimated set of
pseudo QTNs occurs, this iteration stops [29].
Population structure in Australia was handled by using 4 PCs whereas for the genotypes
in turkey, the GWAS hits have a better model fit when PCs are not considered in the model.
14
The genome-wide significance threshold for the P values for each of the markers was
calculated using alpha = 0.01 and multiple tests correction was performed using the Bonferroni
correction.
2.3 Results
2.3.1 Phenotype analysis
FTM (phenotype) was collected for 191 genotypes of rets by GRDC (Australian
agricultural funder) in two climatically distinct sites in Turkey (Sanliurfa and Ankara) and two
sites in Western Australia (Floreat (Flor), near Perth and Mt. Barker (MB)) over two seasons.
Table 1 shows the study design of phenotype in these sites. Each genotype was assigned to
replicated blocks; in two planting times at each site, to examine differences in responsiveness to
seasonal cues such as temperature and day length across these sites. Flor site in Australia also
included a well-watered (ww) and water-restricted (drought) treatment. In Turkey, Sanliurfa
(urfa) is warmer on average than native range sites as it is at a lower elevation, while Ankara
(Ank) has a much more temperate climate than the native range, with colder winters and cooler,
wetter summers. The two Australian sites have a Mediterranean climate but differ in day length
and seasonal temperatures, and MB is located much farther South than Flor. Figure 2.1 is the
boxplot of averaged FTM in each site across all years. On plotting the flowering time across all
environments (Figure 2.2), we see that plants grown on higher elevations (like Mt. Barker and
Ankara) flower later when compared to plants that are grown in lower elevations. Figure 2.3
15
shows the results of pairwise Pearson correlation of FTM in all environments (environment =
Site+year+treatment, example: Ank_15_TOS2). The FTMs between each environment are
significantly correlated except for one environment, Ank in the year 2015 (Ank_15_TOS2 = 2
genotypes) due to less representation of the genotypes.
Table 2.1: Study design of FTM data collection
The FTM data study design is shown in the table. The column country shows the two countries
FTM was collected. Site_yr: is the site in the two countries in three different years (Example:
Ank_14 is site Ankara in 2014 ). The treatment column shows the type of soil condition in which
the plants were grown. Sow date is the date in which the seeds
16
Figure 2.1: Distribution of averaged FTM across all sites. Y-axis is the number of days of FTM
and on the X-axis are all the sites where the plants were grown. Average flowering time in plants
sown in winter is higher compared to those that were sown in Spring and summertime
Figure 2.2: Distribution of averaged FTM across all sites. The average of FTM in each site
across all years and treatments is shown in the figure. Y-axis is the number of days of FTM and
on the X-axis are all the sites where the plants were grown. Plants were grown in higher
elevation (like Mt. Barker and Ankara) flower later when compared to plants in lower elevation.
17
Figure 2.3: Pairwise spearman correlation of FTM in all sites of Turkey and Australia. Pairwise
Pearson correlation of FTM across all environments ((environment = Site + year + treatment,
example: Ank_15_TOS2) of data collection using R package (PerformanceAnalytics).
Significances associated with p-values( ***=0.001, **=0.05, *=0.1)
18
2.3.2 ANOVA results
Considering the replicated block design we looked at the effect of each site, year, and
country in which the genotypes were planted, on the phenotype(flowering) and all of these
factors were significant. We analyzed the phenotype distribution and the correlation of
phenotypes at each site. The phenotypes at each site across different years seem to be tightly
correlated except for site Ankara in Turkey. Ankara treatments in Ankara in the year 2015 have
less representation of the genotypes (Ank_15_TOS2 = 2 genotypes). Since the genotypes were
planted in different environments, we tested the effect of G(genotype) and GXE (genotype by
environment) effect on the flowering time in different years.
ANOVA results for G and GXE are:
Analysis of Variance Table
Response: Y
Df Sum Sq Mean Sq F value Pr(>F)
ENV 11 5902545 536595 7521.0661 < 2.2e-16 ***
REP(ENV) 28 1998 71 5.8212 < 2.2e-16 ***
GEN 191 66714 349 28.4989 < 2.2e-16 ***
ENV:GEN 903 39121 43 3.5348 < 2.2e-16 ***
Residuals 2451 30040 12
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
2.3.3 Principal Component Analysis
Whole-genome sequencing data is available for 140 of the 191 phenotyped individuals.
Figure 2.4 shows the results of the principal component analysis (PCA) of all the individuals of
19
the rets. The population structure between genotypes in ret species is high due to the high genetic
diversity between the individuals.
Figure 2.4: Principal Component Analysis (PCA) of ret. Component analysis of all the ret
genotypes is shown in the figure. Y-axis is PC 2 values and X-axis is PC 1 values. Each blue dot
in the graph is a genotype. The separate clustering of genotypes indicates a high population
structure between genotypes.
2.3.4 GWAS results
For GWAS, a mixed linear model (MLM) was fit to FTM (response variable) collected in
each environment separately, genomic markers (predictors) and principal components (PCs as
covariates) using GAPIT. Figure 5 and 6 shows the Manhattan plot and Quantile-Quantile (QQ)
plot for the association of FTM in the Urfa site of Turkey in two seasons. There are putative
20
genomic markers on chromosome 4, from the GWAS shown in Figure 5, follow-up study will be
done to confirm their position and functional consequence (if any) in the genome.
Figure 2.5: Manhattan plot and QQ plot showing the GWAS of FTM in Urfa_2014_TOS1 site of
Turkey. Mixed model results of GWAS for FTM in Urfa_2014_TOS1. Manhattan plot: X-axis is
the chromosomes in the genome of rets. Y-axis is the -log(P) values associated with each
genomic marker in the mixed model. QQplot: X-axis is the expected -lop(p) values of a null
model and Y-axis is the -log(P) associated with the genetic markers. There is a clean peak of
genomic markers associated with FTM on chromosome 4 in this model.
Figure 2.6: Manhattan plot and QQ plot showing the GWAS of FTM in Urfa_2014_TOS2 site of
Turkey. Mixed model results of GWAS for FTM in Urfa_2014_TOS2. Manhattan plot: X-axis is
the chromosomes in the genome of rets. Y-axis is the -log(P) values associated with each
genomic marker in the mixed model. QQplot: X-axis is the expected -lop(p) values of a null
model and Y-axis is the -log(P) associated with the genetic markers. There is no significant
association between genomic markers and FTM in this model.
21
2.4 Discussion
FTM is critical for deciding fertility and yield in plants. Important abiotic factors
affecting flowering in plants are photoperiod, temperature, moisture, and vernalization. Genetic
composition of the plants is key to determining their time of flowering. Domestication in
chickpeas has lead to a reduced response to vernalization, and shorter flowering time. Wild
relatives of chickpeas show a varying degree of FTM based on the geography of the area in
which it's grown, hinting at the strong gene by environment interaction. This study aimed to
identify genetic locis that govern FTM in the wild Cicer and decouple the effect of environment
on the trait. While we have been successful in looking at the data and establishing a clear effect
of environmental effect on the trait, GWAS has not helped us identify the causal loci associated
with FTM in the wild Cicer.
The dataset on hand is rich because it is collected across varying environments across
multiple years, this study lacks power for a GWAS for three main reasons. First, the genotypes
used in the GWAS are highly diverse. Wonvettberg et al. Showed that the wild rets could be
stratified into separate genotypic structural groups. Handling this populational structure, given
the total sample size, is difficult. Second, the number of observations for the dependent variable
(FTM) in the model is low and the genotype data available is very dense as the sequencing
platform used is whole genome sequencing (WGS). This makes the GWAS analysis highly
underpowered. Third, the experimental design is incomplete. The genotypes used in one field site
are not replicated fully in the consecutive year and site across all the environments.
22
2.5 Conclusion
Domestication of a crop is heavily dependent on its ability to adapt to its environment of
cultivation. Apart from traits of agronomic interest, FTM holds great value. Delineating the
genetic pathway and the genes of prime importance in FTM of wild chickpeas can be immensely
helpful in understanding the genetic architecture of the trait. In this study, we found that there is
a significant amount of gene by environment interaction in the FTM of wild Cicer. There is a
statistically significant pairwise correlation between sites across different years. There is an
observed trend in FTM in plants that were sown at higher altitudes vs lower. Plants that were
grown on higher elevation flowered later than those that are planted in the lower elevation.
Similarly, plants that were sown in winter flowered later than those that were sown in spring or
summer.
23
Chapter 3: Non-linear regression models for time to flowering in wild
chickpea combine genetic and climatic factors [30]
3.1 Background
Chickpea (Cicer arietinum L.), is the second most cultivated grain legume crop, grown in
more than 50 countries of the world (ICARDA). Chickpea, which was originally domesticated in
Southeastern Turkey, has been adapted to various environmental and climatic conditions across
the globe from subtropical conditions in South Asia and East Africa to Northern regions of
temperate North America. The time duration for chickpea to reach its reproductive phase is often
limited by changing temperatures, rainfall pattern, daylength or competition for use of land by
other crops in rotation [31], [32]. In Mediterranean and temperate regions, chickpeas are sown in
spring where the day length and temperature increase towards the reproductive period, while in
subtropical regions (Center and Southern India, Ethiopia, Queensland Australia) it is planted in
the start of the dry season after the monsoonal rainy season when daylengths tend to be shorter
and temperatures cooler. In the more temperate northern parts of India, the reproductive phase of
spring-planted chickpea coincides with decreasing temperature and day length, whereas in the
southern and the central parts of the country it falls within terminal drought (the end of the dry
season) [21], [33]. Hence, chickpea breeding has focused on developing varieties differing in
their growth duration to be able to adapt to different latitudes and sowing regimes [21], [23],
[34]. To achieve consistent yield, crop duration must closely match the available growing season
[35]. Chickpea cultivars and landraces become increasingly temperature responsive as from the
Mediterranean through northern, central and southern India, because these disparate origins have
selected for contrasting phenological regulators [21]. This information is invaluable for modeling
24
crop performance. For example, Vadez et al. (2012, 2013) [36]–[38] considered climatic factors
like expected rainfall to predict performance of chickpeas in different geographical locations.
Several successful plant models like SSM [37], [39], DSSAT [40]–[44], APSIM [45] and
others [46], [47] have been developed for legumes. These models use differential equations to
describe biophysical and biochemical processes like photosynthesis, water uptake etc. and
account for impact of genotype, soil, weather and economic factors. The influence of weather
conditions is assessed using concepts like Heat Unit Index (HUI) [47], Crop Heat Units (CHI),
Degree Days (DD), Biological Days (BD) [36] – all of them quantitatively characterizing the rate
of progression to the next phenological phase on a daily basis. Both DD and BD could depend on
temperature, water content and photoperiod. This formalism was applied to develop individual
models for important crops. For example, DSSAT was used to simulate growth and yield in
soybean [48] and chickpea [49] among several other crops [50]–[53]. For more than three
decades these models have been applied in research projects of different countries. The SSM
model was successfully tested using independent data from a wide range of growth [37], [39] and
environmental conditions including Iran [54] and water deficit in India [38]. Considerable
manipulations are required to adapt the DSSAT model to new environments and cultivars [55]–
[59], limiting the utilization of these models. As varieties are constantly changing because of
new releases that can cope with emerging pathogens and pests, as well as shifting consumer
demands, the need for flexible models that can adjust to new varieties is high.
In an era of rapidly advancing genomic technologies and approaches, updated modeling
approaches that can be tailored to genotype-specific effects are essential. Next generation
sequencing and high throughput genotyping lead to identification of thousands of molecular
markers (SSR, SNP, STMS, ESTs, CISP, DArT) [60] making it possible to construct chickpea
25
genetic maps [61], [62] and ultimately to dissect the effect of different loci on key traits like
flowering time. A combination of Sanger, 454/FLX and Illumina reads have been used to
generate in transcriptome and genome assemblies for chickpea [61], [63]–[65].
Due to these advances in sequencing technologies and data acquisition, the genome-wide
association study (GWAS) has become an important approach to understand the genetics of
natural variation and traits of agricultural importance. Recent examples of GWAS in
agriculturally important plants include identification of photoperiodic flowering time genes in
sorghum [60], frost tolerance genes in barley (Hordeum vulgare L.) [61]; leaf architecture [62]
and resistance to southern leaf blight genes in maize [63] as well as several agricultural traits in
rice[65], to name a few. To extend GWAS to the analysis of genotype-by-environment (G×E)
interactions bioclimatic variables can be used as a GWAS phenotype. Association between
bioclimatic variables at a site of an accession’s origin and SNPs can indicate climatic adaptation
[66]. While GWAS is a good method to identify genomic regions associated with important
traits, typical GWAS designs require controlled planting of replicated accessions. This can
quickly become logistically daunting and expensive across many sites.
Crop models may complement GWAS approaches by accounting for the influence of
environmental factors[43]. However the models developed in the pre-genomic era considered
genotype influence at best as a set of given “genetic coefficients” that do not correspond to actual
genes [67]. Therefore these models were unable to simulate gene-by-environment interactions,
thereby limiting their utility in predicting phenological characteristics of cultivars across
different geographical locations and genotypes [36]. Mathematical models and tools that
combine genetic and climate data to predict agronomic traits will greatly benefit breeders by
26
simulating the performance of any given well-characterized genotype in any given well-
characterized environment [68], [69].
While some of the current crop models consider the influence of local environmental
conditions and others global climate changes for locally grown varieties, here, we built a new
model using the flowering time of two species of wild chickpeas (Cicer reticulatum L. and C.
echinospermum) collected at 21 different sites in Turkey and grown in 4 distinct environmental
conditions. We further use our model to test two important hypotheses. Firstly, we propose that
besides local environmental factors, chickpea phenology may be strongly predicted by accession
geographic origin. Secondly, as the adaptation to specific environments is blueprinted in
genomes, the effects of genes on flowering time may be conditioned on environmental factors.
We check these hypotheses by statistical modeling of chickpea responses to climate change
scenarios conditional on geographic site of origin and genotype.
3.2 Materials and methods
3.2.1 Dataset of wild chickpea accessions
The dataset consists of wild chickpea (Cicer reticulatum L. and Cicer echinospermum).
Accessions were collected at 21 sites in five regions in Turkey by von Wettberg et al. [2]. These
wild accessions were planted in climatically distinct sites in Turkey (Sanliurfa and Ankara,
autumn and spring-sowing) and Australia (Floreat, near Perth, WA and Mt.Barker, WA). Being
grown in contrasting environments the phenotype data on time to flowering is highly diverse.
The distribution of time to flowering for the whole dataset is shown in Figure 3.1. The time to
27
flowering ranges from 64 to 221 day. Details on the phenotyping experiment and its subsequent
analysis will be presented in the future manuscripts (Berger, J.: Analysis of phenotyping of wild
chickpea in diverse environments, in preparation).
28
Figure 3.1:Distribution of time to flowering for the whole dataset. The range for time to
flowering is from 64 to 221 day
29
Climatic data were downloaded from NNDC Climate Data on-line [70]. The summary of
agroclimatic factors as well as results of testing their correlation with flowering time are given in
Additional file 1: Table S2 and S3, respectively.
A companion paper studying the genetic association of flowering time in one of the wild
chickpeas (Cicer reticulatum L.) has identified six suggestive polymorphic sites associated with
flowering time (Singh, A.: Genome-wide association studies in wild chickpea, in preparation).
These SNPs were identified as the best SNPs after running a mixed linear model (MLM) in
TASSEL, which associated flowering time (phenotype) with the genotypes using
site/year/season as a factor to account for their effect on phenotype. Additional file 1: Table S6
presents number of times the reference allele for a SNP associated with flowering time is present
in plant genotypes. To access genotype-environment interactions we group plants into 18 groups
– one for each alternative (ALT) and reference (REF) allele combination (ALT/ALT, REF/ALT
and REF/REF) – for each SNP and built a model (1) for each group.
30
3.2.2 Regression model for time to flowering
We model a time period from sowing to flowering as a linear combination of N control
functions Fn, n=0,…,N−1 of agroclimatic factors. Thus, the model takes the form (1)
𝑦 𝑖 = 𝛽 0+∑ 𝑛 =0𝑁 −1𝛽 𝑛 +1⋅ 𝐹 𝑛 ( 𝐗 𝑖 )+ 𝜀 𝑖 𝑖 =0,…, 𝐼 −1yi= β0+ ∑ n= 0N − 1 β n + 1⋅Fn ( X i ) + εii=0,
… ,I − 1
(1)
where yi is modeled phenotype (time from sowing to flowering) for each plant i from a
group of the size I, βn are coefficients, n=0,…,N, that are to be found to minimize the
discrepancy between data and model, Xi is a vector of agroclimatic factors and εi is a standard
error. The number of coefficients is N+1 because β0 is an intercept.
In comparison with previous models in our approach control functions Fn are
automatically composed in analytic form from the expressions of climatic factors. Thus, a wider
range of non-linear dependencies between the phenotype and factors is explored (see “Analytic
form of control function” section).
To study the adaptation to environment of origin we represent collection sites as L=21
binary variables, where l=1,…,L enumerates locations: Baristepe1, Baristepe2, Baristepe3,
Beslever, Cermik, Cudi, Cudi2, Dereici, Destek, Egil, Gunasan, Kalkan, Karabahce, Kayatepe,
Kesentas, Ortanca, Oyali, Sarikaya, Savur1, Sirnak1, Siv-Diyar (see column 1 in Additional file
1: Table S1). For each plant enumerated with i=0,…,I−1 one of the L variables 𝑑 𝑙𝑖 dil takes the
value ’1’ to indicate collection site and others are ’0’. The interaction between control function
and location is modeled by an additional term in the regression function that has the form of a
31
weighted sum of N·L pairwise products of control functions Fn and each binary site variable
𝑑 𝑙𝑖 dil.
Consequently, a model with information about a collection site takes the form (2).
𝑦 𝑖 = 𝛽 0+∑ 𝑛 =0𝑁 −1𝛽 𝑛 +1⋅ 𝐹 𝑛 ( 𝐗 𝑖 )+∑ 𝑛 =0𝑁 −1∑ 𝑙 =1𝐿 𝜁 𝑙 ⋅ 𝑁 + 𝑛 ⋅ 𝐹 𝑛 ( 𝐗 𝑖 ) ⋅ 𝑑 𝑙𝑖 + 𝜀 𝑖 yi= β0+ ∑ n = 0N − 1 β
n+ 1⋅Fn ( X i) + ∑ n= 0N − 1 ∑ l = 1Lζ l ⋅ N + n⋅ Fn( X i ) ⋅ dil + εi
(2)
where in addition to notations used in (1) new regression coefficients ζl·N+n define the
influence of function Fn of climatic factors on phenotype of plants collected at site l so that
condition ζl·N+n≠0 points on plant adaptation to the site. As a result, this model makes it
possible to regress a range of climatic variables describing the phenotyping site (e.g. day length,
temperature, precipitation etc.) independently for each of our 21 collection sites.
We denote K number of SNP and J=3 combinations of alternative (ALT) and reference
(REF) alleles ALT/ALT, ALT/REF and REF/REF by 0, 1, and 2, respectively. Then to include
GWAS results into the model we define J·K groups of plants so that members of the same group
have the same combination of alleles in one of the SNP positions. Thus we define a matrix D
with the number of rows equal to the number of plants I and J·K columns. Then, the elements of
matrix D are defined by (3). Thus, the form of the regression function adapts to the allele
combination of a plant by changing the weights of control functions.
𝑑 3𝑘 + 𝑗𝑖 ={10 if in plant 𝑖 the combination for SNP 𝑘 is 𝑗 otherwisedi3k+j={1 if
in plant i the combination for SNP k is j0 otherwise
(3)
Consequently, a model with genetic information takes the form (4).
𝑦 𝑖 = 𝛽 0+∑ 𝑛 =0𝑁 −1𝛽 𝑛 +1⋅ 𝐹 𝑛 ( 𝐗 𝑖 )+∑ 𝑛 =0𝑁 −1∑ 𝑘 =0𝐾 −1∑ 𝑗 =0𝐽 −1𝜌 (3𝑘 + 𝑗 ) 𝑁 + 𝑛 ⋅ 𝐹 𝑛 ( 𝐗 𝑖 ) ⋅ 𝑑 3𝑘 + 𝑗𝑖 + 𝜀 𝑖 yi=
β0+ ∑ n= 0N − 1βn+ 1⋅F n( X i) + ∑ n= 0N − 1∑ k = 0K− 1∑ j = 0J − 1ρ ( 3 k + j ) N + n⋅ Fn ( X i) ⋅
di3k + j + εi
(4)
where in addition to notations used in (1) new regression coefficients ρ(3k+j)N+n, define
the effect of genotype-by-climatic factor interaction.
32
3.2.3 Analytic form of control function
In previous studies different forms of dependencies between phenotype and climatic
factors have been considered [71]–[76]. For example, “segmented”, “beta”, “quadratic” and
“dent-like” functions were considered in [37]. A product of quadratic functions of day length and
mean temperature was used in iterative regression analysis (IRA) [77] to characterize a
developmental speed per day. An interphase speed was calculated as a product of the effects of
day length, water deficit and temperature in [78].
We propose a more general approach, in which the analytic form of a control function
together with regression coefficients and a set of predictors are inferred automatically by
stochastic minimization of the deviation of the model output from data. We use a combination of
Grammatical Evolution (GE) [79], [80], LASSO [81] and Differential Evolution Entirely Parallel
(DEEP) [82], [83] method to recover analytic form of Fn, find regression coefficients and
determine the set of climatic factors, respectively [84]. Differential Evolution was proposed by
Storn and Price in 1995 [85] as a heuristic stochastic optimization method. DEEP was developed
by us for application in the field of bioinformatics [83]. It includes several recently proposed
enhancements [83], [86].
In GE, the analytic function form is built by decoding the sequence called “word” of L
integers called codons. Decoding is performed according to simple rules of substitution that
establish a correspondence between codons and either an elementary arithmetic operation: ‘+‘, ‘-
‘, ‘*‘, ‘/‘, or expression: X, (X - Const), 1/(X - Const), where X is a name of a predictor and
Constis some constant number. To make estimation of regression coefficients with LASSO
33
method reliable we performed 4-fold cross-validation at this stage so that the model was build
using 75% of samples for training and the rest 25% was used for evaluation of the model.
3.2.4 Statistical tests
We used standard statistical techniques for hypothesis testing implemented in R system
for statistical computing [87]. We used multiple-way analysis of variance (MANOVA) with the
Pillai test statistic [88] and ANOVA with the Fisher test statistic to check for significance in the
difference of effects of climatic factors on phenotype between locations and genotypes. For
pairwise comparison of the influence of climatic factors on phenotype between genotypes and
locations we applied the Wilcoxon-Mann-Whitney test.
The Spearman’s rank correlation was used to estimate correlation between allele
frequencies and climatic factors at primary collection sites (geographic sites of origin).
34
3.2.5 Software tools
Although a few Grammatical Evolution (GE) implementations are freely available (see
e.g. [80], [89]) they either lack a specific set of expressions or show low performance in
experimental runs due to interpreted language (data not shown). Consequently a decision was
made to implement GE in C++ using Armadillo [90], mlpack [91], HDF5 [92], HighFive [93]
and Qt [94] as these packages provide efficient matrix operations, the LASSO method, data
input-output and utility functions, respectively. The code is open-source on GitLab [95].
TASSEL (Trait Analysis by aSSociation, Evolution and Linkage) [96] was developed in
Java, and is compatible with multiple operating systems (Windows, Linux and Mac OS).
TASSEL can implement several different GWAS models like general linear model (GLM) and
MLM using a GUI or command line version of the software.
3.3 Results
We first performed ANOVA test to check for differences in mean time to flowering
between accessions collected at different sites in Turkey to demonstrate that flowering time is an
adaptive trait in chickpea. The difference in means was significant with criterion value F=2.003
and p=0.005<0.05.
35
3.3.1 Model with interactions between climatic factors and locations
Next, to estimate the effect of interaction between climatic factors at phenotyping sites
and sampling locations in Turkey on flowering time we built a model (2). A series of numerical
experiments were performed, as several runs are needed to obtain a reliable solution with
stochastic optimization. By several trial-and-error attempts (data not shown) it was established
that the number of control functions N=12 and the length of the “word” L=5 were the best
parameters for the model. The population size for DEEP was set to 500.
We obtained several solutions with coefficient of determination >0.85 and different
analytic forms of the control functions (data not shown). We selected the model (5) as it reflects
the influence of effects of day length, temperature, humidity and precipitation in the phenotyping
environment and has coefficient of determination R2=0.97.
𝚃𝚃𝙵 =59.49+74.95 𝐷 𝑚 𝑖 𝑛𝑥 10+19.83/( 𝑇 𝑚 𝑖 𝑛𝑥 5−0.03)−1.98 𝑃 𝑚 𝑒 𝑎 𝑛𝑥 10−53.18( 𝐷 𝑚 𝑒 𝑎 𝑛𝑥 50
+1/( 𝑈 𝑚 𝑒 𝑎 𝑛𝑥 10−15−23.31))−13.04 𝐷 𝑚 𝑒 𝑎 𝑛𝑥 10−15−(0.05 ⋅ 𝙱𝚊𝚛𝚒𝚜𝚝𝚎𝚙 𝚎𝟷 +0.12 ⋅ 𝙱𝚊𝚛𝚒𝚜
𝚝𝚎𝚙𝚎𝟹 +0.29 ⋅ 𝙱𝚎 𝚜𝚕 𝚎𝚟 𝚎 𝚛 +0.31 ⋅ 𝙳𝚎 𝚛𝚎 𝚒𝚌 𝚒 +0.45 ⋅ 𝙺𝚊 𝚢𝚊 𝚝𝚎 𝚙𝚎 +0.03 ⋅ 𝙺𝚎𝚜𝚎𝚗𝚝𝚊𝚜
+0.74 ⋅ 𝚂𝚒𝚟 − 𝙳 𝚒𝚢 𝚊𝚛 +0.01 ⋅ 𝚂𝚊 𝚛𝚒 𝚔𝚊 𝚢𝚊 +0.10 ⋅ 𝚂𝚒𝚛𝚗𝚊𝚔𝟷 +0.20 ⋅ 𝙾𝚢𝚊𝚕𝚒 ) ⋅( 𝐷 𝑚 𝑒 𝑎 𝑛𝑥 5
0+1/( 𝑈 𝑚 𝑒 𝑎 𝑛𝑥 10−15−23.31))+(0.03 ⋅ 𝙲𝚞𝚍𝚒𝟸 +0.16 ⋅ 𝙳𝚎𝚜𝚝𝚎𝚔 +0.09 ⋅ 𝙶𝚞 𝚗𝚊 𝚜𝚊 𝚗 +0.46 ⋅
𝙺𝚎𝚜𝚎𝚗𝚝𝚊𝚜 +0.43 ⋅ 𝙾𝚢 𝚊𝚕 𝚒 +0.28 ⋅ 𝚂𝚒 𝚛 𝚗𝚊 𝚔𝟷 ) ⋅ 𝐷 𝑚 𝑖 𝑛𝑥 10+(0.41 ⋅ 𝙲𝚞𝚍𝚒 +0.05 ⋅ 𝙺𝚊 𝚛𝚊 𝚋 𝚊𝚑𝚌𝚎 −0.003 ⋅ 𝙺𝚎 𝚜𝚎 𝚗𝚝 𝚊𝚜 +0.02 ⋅ 𝙾𝚢 𝚊 𝚕𝚒 −0.12 ⋅ 𝚂𝚒𝚛𝚗𝚊𝚔𝟷 ) ⋅ 𝐷 𝑚 𝑒 𝑎 𝑛𝑥 10−15,TTF=59.
49 + 74 .95D x10min+ 19 .83/ ( Tx5min− 0.03 ) − 1.98 P x10me a n− 53 .18( D x50me a n
+ 1/ ( Ux10− 15 me a n− 23 .31) ) − 13 .04D x10− 15 me a n− ( 0.05 ⋅ B a r is t e pe 1+ 0.12 ⋅ Ba r
is t e pe 3+ 0.29 ⋅ B e s l e v e r + 0.31 ⋅ D e r e ic i+ 0.45 ⋅ Ka y a t e pe + 0.03 ⋅ K e s e n t a s + 0.74 ⋅ S iv −
Diy a r + 0.01 ⋅ S a r i k a y a + 0.10 ⋅ S ir n a k 1+ 0.20 ⋅ O y a l i) ⋅ ( D x50me a n+ 1/ ( Ux10− 15 me a n
− 23 .31) ) + ( 0.03 ⋅ C u di2+ 0.16 ⋅ D e s t e k + 0.09 ⋅ Gu na s a n+ 0.46 ⋅ K e s e n t a s + 0.43 ⋅ O y a l i
+ 0.28 ⋅ S ir na k 1 ) ⋅ D x10min+ ( 0.41 ⋅ C u di+ 0.05 ⋅ Ka r a b a h c e − 0.00 3⋅Ke s e n t a s + 0.02 ⋅ O
y a l i− 0.12 ⋅ S ir na k 1) ⋅ D x10− 15 me a n,
(5)
where 𝐷 𝑚 𝑖 𝑛 𝑥 10Dx10min, 𝐷 𝑚 𝑒 𝑎 𝑛 𝑥 50Dx50mean, 𝐷 𝑚 𝑒 𝑎 𝑛 𝑥 10−15 D x 10− 15m e an denote
minimum day length over 10 days after sowing, mean day length over a period of 50 days and
36
from 10 to 15 days after sowing, respectively, 𝑇 𝑚 𝑖 𝑛 𝑥 5Tx5min denotes minimum temperature
over 5 days after sowing, 𝑈𝑚 𝑒 𝑎 𝑛 𝑥 10−15 U x 10− 15m e an denotes mean relative humidity over an
interval from 10 to 15 days after sowing and 𝑃 𝑚 𝑒 𝑎 𝑛 𝑥 10Px10mean denotes average precipitation
over 10 days after sowing.
The analysis of relative difference in the sum of squares for a model with and without a
term describing an interaction between climatic factor at the phenotyping site and the accession
geographic site of origin allows us to conclude that sampling collection site-by-phenotyping
environment interaction accounts for about 14.7% of variation in time period from sowing to
flowering.
We found that day length-by-collection site interaction is important for locations
Baristepe3, Cudi, Cudi2, Destek, Gunasan, Karabahce, and both day length and humidity-by-
collection site interaction are important for Baristepe1, Beslever, Dereici, Kayatepe, Kesentas,
Oyali, Siv-Diyar, Sarikaya, and Sirnak1 sampling sites. There were no interactions between
climatic factors and collection sites in Baristepe2, Cermik, Egil, Kalkan, Ortanca and Savur1.
3.3.2 Basic flowering time models for locations
To analyze how climatic factors at phenotyping sites affect flowering time of plants
collected at different locations we built basic models (1) for groups of plants sampled at each
location separately. We present selected models with the highest coefficients of determination
(R2) between simulated and observed flowering time for each group. Due to the stochastic nature
of the procedure ten runs were performed with the same algorithmic parameters using different
37
seeds for the random number generator to obtain an ensemble of models. Various factors and
their combinations were selected as predictors by stochastic optimization.
Consequently, the effect of phenotyping environment day length, temperature,
precipitation, humidity and their pairwise combinations on flowering time for plant groups was
estimated with a coefficient of determination averaged over the ensemble of models, taking into
account only terms dependent on the factor in question. The resulting coefficient values for each
factor and factor combination are presented for all collection sites in Additional file 1: Figure
S3–S7.
We compared the mean effect of climatic factors or factor combinations between
accessions from different locations in Turkey with multiple-way ANOVA (MANOVA) using the
values of coefficients of determination obtained in model runs as dependent variables and
location as an independent variable. A Pillai statistics of 1.697 with p=0.037<0.05 confirmed the
statistical significance of differences in mean influences of factors and their combinations on
phenotype between locations.
Further, we applied one-way ANOVA to test the difference in effects on flowering time
between accessions from different locations for each climatic factor or factor combination
individually. Temperature, precipitation and their combination showed significant differences in
the means of coefficient of determination values with F=3.617, p=1.806e−06 and F=2.233,
p=0.003 and F=2.038, p=0.008 respectively.
3.3.3 Analysis of the climatic factor effect on phenotype
38
We continue our analysis of effects of climatic factors on flowering time for accessions
from different locations with a pair-wise comparison method. Firstly, the direction and extent of
each factor influence on phenotype was estimated as a finite difference approximation of the
partial derivative of a regression function (1) in respect to the factor. Figure 3.2 presents the box
plots of factor influence estimators calculated for model ensembles and for each location.
It is evident that both effects of day length and temperature on flowering time are
location-dependent. For accessions collected at some locations increasing day length (e.g. Egil)
or temperature (e.g. Ortanca) speeds up the rate of flowering, while at other locations the
response to these factors is reversed (e.g. Kesentas). Surprisingly there was a consistent effect of
precipitation across all accessions whereby higher precipitation reduced the time to flowering.
This result is consistent with negative correlation between precipitation measures and time to
flowering. In comparison with precipitation the influence of humidity is opposite for most
locations: the flowering time increases with rise of humidity. The influences of factor
combinations are comparatively negligible.
Next, we compared the means of estimators of a factor’s influence on phenotype for
location pairs with a Wilcoxon-Mann-Whitney test. Statistically significant differences in means
between locations pairs are presented in Figures 3.3, 3.4, 3.5 and 3.6 for day length, temperature,
precipitation and humidity, respectively.
39
Figure 3.2: Analysis of climatic factor effects on phenotype. Box plots of climatic factor
influence estimators calculated for model ensembles and for each location as a finite difference
approximation of the partial derivative of a regression function (1) in respect to the factor. Each
box covers two quantiles from 25 to 75% of influence’s variation with a horizontal line at
median value of the estimated influence. Empty circles represent outliers. Boxes located higher
than zero mark on vertical axis represent a positive influence of a factor on flowering time. In
this case increasing the factor speeds up flowering. Other boxes represent an opposite case.
“DL”, “TEMP”, “P” and “U” correspond to factors related to day length, temperature,
precipitation and humidity, respectively
40
Figure 3.3: Results of pair-wise comparisons of day length influence on flowering time. Mann-
Whitney-Wilcoxon test was applied to compare the means of day length influence estimators for
locations. Statistically significant differences in means are shown as red color gradation, cells
with statistically non-significant comparisons are left blank
41
Figure 3.4: Results of pair-wise comparisons of temperature influence on flowering time. Mann-
Whitney-Wilcoxon test was applied to compare the means of temperature influence estimators
for locations. Statistically significant differences in means are shown as red color gradation, cells
with statistically non-significant comparisons are left blank
42
Figure 3.5: Results of pair-wise comparisons of precipitation influence on flowering time. Mann-
Whitney-Wilcoxon test was applied to compare the means of precipitation influence estimators
for locations. Statistically significant differences in means are shown as red color gradation, cells
with statistically non-significant comparisons are left blank
43
Figure 3.6: Results of pair-wise comparisons of humidity influence on flowering time. Mann-
Whitney-Wilcoxon test was applied to compare the means of humidity influence estimators for
locations. Statistically significant differences in means are shown as red color gradation, cells
with statistically non-significant comparisons are left blank
44
3.3.4 Flowering time model with climatic factor-by-genotype interaction
Different genotypes may react differently to climatic factors. Here we check this
hypothesis using the flowering time model (4) with the interaction term between climatic factors
and genotype. We identified six SNPs associated with flowering time . Here we subdivided all
the plants into 18 groups, each containing similar allele combination at one of six polymorphic
sites (see the “Regression model for time to flowering” section for more details). We further
refer to these groups as SNP groups.
Ten runs were performed with the same algorithmic parameters but different seeds for
random number generator. The model (6) with the best coefficient of determination R2=0.97 was
selected for further analysis.
𝚃𝚃𝙵 =+++−++−5.71 ⋅ 𝑇 𝑚 𝑎 𝑥 𝑥 5−10−3.87 ⋅ 𝑇 𝑚 𝑎 𝑥 𝑥 15−20−0.39 ⋅(1/( 𝐷 𝑚 𝑖 𝑛𝑥 60−293.08)+ 𝑇 𝑚𝑎
𝑥𝑥 10−15)5.42 ⋅ 𝐷 𝑠 𝑢𝑚 𝑥 10−15/( 𝐷 𝑚 𝑖 𝑛𝑥 15−20 ⋅ 𝑃 𝑚 𝑒 𝑎 𝑛𝑥 50+1)20.08 ⋅( 𝑇 𝑚 𝑒 𝑎 𝑛𝑥 5−10+( 𝑈 𝑚 𝑒 𝑎 𝑛𝑥 10−15
−0.004)/( 𝑇 𝑚 𝑖 𝑛𝑥 5−10−210.12))(0.06 ⋅ 𝚜𝚗 𝚙𝟻 𝙰𝙰 +0.49 ⋅ 𝚜𝚗 𝚙𝟹 𝚁𝚁 ) ⋅ 𝑇 𝑚 𝑎 𝑥 𝑥 5−10+0.05 ⋅ 𝚜𝚗
𝚙𝟹 𝙰𝙰 ⋅ 𝑇 𝑚 𝑎 𝑥 𝑥 15−20(0.02 ⋅ 𝚜𝚗 𝚙𝟷 𝚁𝚁 +0.002 ⋅ 𝚜𝚗 𝚙𝟸 𝙰𝙰 +0.16 ⋅ 𝚜𝚗 𝚙𝟸 𝚁𝚁 +0.09 ⋅ 𝚜𝚗 𝚙𝟹 𝙰 𝙰 +0.50 ⋅ 𝚜𝚗 𝚙𝟹 𝚁𝚁 +0.007 ⋅ 𝚜𝚗 𝚙 𝟺 𝚁𝚁 +0.14 ⋅ 𝚜𝚗 𝚙 𝟻 𝚁𝚁 +0.04 ⋅ 𝚜𝚗 𝚙 𝟼 𝚁𝚁 ) ⋅(1/( 𝐷 𝑚 𝑖 𝑛 𝑥 60−29
3.08)+ 𝑇 𝑚 𝑎 𝑥 𝑥 10−15)0.11 ⋅ 𝚜𝚗 𝚙𝟺 𝚁𝚁 ⋅ 𝐷 𝑠 𝑢𝑚 𝑥 10−15/( 𝐷 𝑚 𝑖 𝑛 𝑥 15−20 ⋅ 𝑃 𝑚 𝑒 𝑎 𝑛𝑥 50+1)0.54 ⋅ 𝚜𝚗𝚙𝟹
𝙰𝙰 ⋅ 𝑇 𝑚 𝑒 𝑎 𝑛𝑥 5−10, TTF= − 5.71 ⋅ Tx5− 10 ma x− 3. 87 ⋅ Tx15− 20 ma x− 0.39 ⋅ ( 1/ ( D x60min
− 29 3.08 ) + Tx10− 15 ma x ) + 5.42 ⋅ D x10− 15 s u m/ ( D x15− 20 min⋅ P x50me a n+ 1) + 2
0.08 ⋅ ( Tx5− 10 me a n + ( Ux10− 15 me a n− 0.00 4) / ( Tx5− 10 min− 21 0.12 ) ) + ( 0.06 ⋅ s n
p5A A + 0.49 ⋅ s np3R R ) ⋅ Tx5− 10 ma x+ 0.05 ⋅ s n p3A A ⋅ T x15− 20 ma x− ( 0.02 ⋅ s np1R R +
0.00 2⋅s np2A A + 0.16 ⋅ s np2R R + 0.09 ⋅ s np3A A + 0.50 ⋅ s np3R R + 0.00 7⋅s np4R R + 0.14 ⋅
s np5R R + 0.04 ⋅ s np6R R ) ⋅ ( 1/ ( D x60min− 29 3.08 ) + Tx10− 15 ma x) + 0.11 ⋅ s np4R R ⋅ D
x10− 15 s u m/ ( D x15− 20 min⋅ P x50me a n+ 1) + 0.54 ⋅ s np3A A ⋅ Tx5− 10 me a n,
(6)
where 𝐷 𝑚 𝑖 𝑛 𝑥 60Dx60min, 𝐷 𝑚 𝑖 𝑛 𝑥 15−20 D x 15− 20m i n denote minimum day length over
60 days after sowing and over a period from 15 to 20 day after sowing, respectively;
𝐷 𝑠 𝑢 𝑚 𝑥 10−15 D x 10− 15s um denotes sum of day lengths over a period from 10 to 15;
𝑇 𝑚 𝑎 𝑥 𝑥 15−20 T x 15− 20m ax, 𝑇𝑚 𝑎 𝑥𝑥 10−15 T x 10− 15m ax and 𝑇 𝑚 𝑎 𝑥 𝑥 5−10 T x 5− 10m ax denote
maximum temperatures over a periods from 15 to 20, from 10 to 15 and from 5 to 10 days after
45
sowing, respectively; 𝑇 𝑚 𝑒 𝑎 𝑛 𝑥 5−10 T x 5− 10m e an and 𝑇 𝑚 𝑖 𝑛 𝑥 5−10 T x 5− 10m i ndenote mean and
minimum temperatures over a period from 5 to 10 days after sowing, respectively;
𝑈𝑚 𝑒 𝑎 𝑛 𝑥 10−15 U x 10− 15m e an denotes mean humidity of an interval from 10 to 15 days after
sowing and 𝑃 𝑚 𝑒 𝑎 𝑛 𝑥 50Px50mean denotes mean precipitation of a period over 50 days after
sowing.
While SNPs were identified only in Cicer reticulanum samples from 15 collection sites
we are able to fit the model to the whole dataset giving appropriate values to the indicator
variables – the elements of matrix D (see formulae 3 and 4).
The analysis of relative difference in the sum of squares for a model with and without the
interaction terms between climatic factors and each SNP group allows us to conclude that
genotype-by-environment interaction accounts for about 17.2% of variation in time period from
sowing to flowering. All SNPs interact with temperature and day length. Additionally, SNP3
interacts with relative humidity and SNP4 interacts with precipitation.
To analyze the difference in response of SNP groups to climatic factors we built
regression models (1) for each group separately.
Due to the stochastic nature of the procedure ten runs were performed with the same
algorithmic parameters using different seeds for the random number generator to obtain an
ensemble of models. Various agroclimatic factors and their combinations were selected as
predictors by stochastic optimization.
We calculated the coefficients of determination for ensemble of models from which the
terms that do not contain a predictor of a climatic factor or a combination of factors analyzed
were excluded.
46
The multiple-way ANOVA (MANOVA) applied to the coefficient of determination as
dependent variable and SNP group membership as an independent variable showed that the
difference in mean effects of climatic factors on SNP groups is statistically significant (Pillai
satistic value 1.287, p=1.333e−09<0.05).
Next we applied one-way ANOVA to test the influence of each climatic factor
individually. The significant differences in the means of the coefficient determination values
were observed for day length, humidity and the combination of precipitation and day length
(F=2.102, p=0.009497 and F=6.642, p=5.159e−12 and F=1.904, p=0.0218 respectively).
The next step in our analysis was the pair-wise comparison of climatic factor influences
on flowering time between SNP groups. The direction and extent of each factor influence on
phenotype was estimated as a finite difference approximation of the partial derivative of a
regression function in respect to the factor.
The means of estimators of a factor influence on phenotype averaged over SNP groups
were compared with a Mann-Whitney-Wilcoxon test. As is evident from an analysis of Table 1,
climatic factors had divergent effects on genotypes with different reference alleles at five out of
six polymorphic position analyzed. As an example, for SNP1 (T →G) day length has different
effects on plants with ALT/ALT and REF/ALT, as well as REF/REF and ALT/ALT allele
combinations. Precipitation influences plants with ALT/ALT and REF/REF combinations
differently. In case of SNP2 (A →G) we found clear differences between genotypes with
ALT/ALT and REF/REF for combination of day length with either temperature or precipitation.
For SNP3 (C →T) humidity affects genotypes with ALT/ALT and REF/REF differently, day
length – temperature combination exerts different influence on ALT/ALT and ALT/REF
genotypes, as well as ALT/REF and REF/REF genotypes, day length – precipitation combination
47
shows different effects on ALT/ALT and REF/ALT genotypes. For SNP5 (C →A) there is
difference in influence of day length on REF/REF and REF/ALT, as well as REF/ALT and
ALT/ALT genotypes. In addition, precipitation also affects differently ALT/REF and REF/REF
genotypes. Different effects of day length on ALT/ALT and REF/ALT genotypes is evident for
SNP6 (A →G).
Table 3.1: Statistically significant differences in effects of climatic factors and their
combinations on plant genotype. The pair-wise comparisons were performed with Mann-
Whitney-Wilcoxon test. DL- day lengthh, TEMP – temperature, P – precipitation; REF –
reference allele, ALT – alternative allele for polymorphic site
To further understand the relationship between precipitation and the allele frequency of
the SNPs, we correlated the allele frequency of 15 populations at the putative GWAS SNPs with
the mean annual precipitation at the primary collection sites of the genotypes. Allele frequency
of the SNPs 1, 4, 5 and 6 have a linear relationship and are correlated with mean annual
precipitation. This is indicative of the alleles being fixed in the populations which are found in
the areas with high mean precipitation (see Figure 3.7). Spearman’s rank correlation between
mean annual temperature and the allele frequency of the SNPs resulted in no significant
relationship. This is indicative of 2 possible scenarios, a) the mean annual temperature value
48
might not be indicative of critical time window affecting the time of flowering in the genotypes,
b) the SNP alleles are not in genes involved in the pathways of temperature response .
Figure 3.7: Correlations of mean annual precipitation (mean_annual_prec) with allele frequency
of the 6 GWAS SNPs calculated for 15 populations of the wild chickpeas (shown for
completeness). Allele frequency of SNPs 1, 4,5 and 6 are correlated with mean annual
precipitation. The allele frequencies have a linear relationship at each of these significant SNPs,
showing that these alleles are nearly fixed in the population in regions with high mean annual
precipitation
49
3.4 Discussion
The lifecycle of chickpea is strongly determined by environmental factors. Consequently,
its phenology is likely strongly predicted by geographic origin and local phenotyping
environment, as demonstrated in domestic chickpea cultivars and landraces originating from the
Mediterranean to southern India [21]. Here we investigate this hypothesis in the wild progenitors
of chickpea by statistical modeling of chickpea responses to environment conditional on
geographic site of origin and genotype. Usually the extent of G×E interaction due to sampling
site and environmental factors is modeled by state-of-the-art techniques such as AMMI and
factorial regression or by using bioclimatic variables as a GWAS phenotype. Here we
implemented a more general solution in which the analytic form of dependencies between
predictors (climatic factors, collection sites and genotypes) and phenotype (flowering time) is
automatically inferred by a stochastic optimization technique. Apart from automation the
advantage of our approach resides in its ability to quickly examine different fits to the data and
select the optimal one. We performed model parameterization on a wild chickpea dataset
collected at 21 different locations in Turkey [2] grown in 4 different environments. GWAS
analysis of the data identified six polymorphic sites responsible for flowering time variation
independent of environmental conditions (Singh, A.: Genome-wide association studies in wild
chickpea, in preparation).
We built two types of flowering time models – for the whole dataset and for groups of
plants, that either originated from one sampling site or have similar allele combination at one of
the 6 SNP positions.
50
Using the models for the whole dataset we found that 14.7% and 17.2% of variation in
time to flowering is accounted for by interactions of climatic factors with geographic origin of
the plant and its genotype, respectively. Contrary to previous approaches that measure the
combined sensitivity of the phenotype to all environmental factors, our approach makes it
possible to identify responses to specific environmental conditions and sampling locations in
individual accessions, collection sites or SNP groups. In this case we have treated collection site
as a model parameter which describes the composite influence of geography (latitude, altitude
etc.) climate (day length, temperature) and biological interactions on phenotype. We found that
in total 15 out of 21 sampling sites interact with different climatic factors at the phenotyping site,
day length and humidity in particular. We also showed that all of six polymorphic sites identified
in GWAS interact with temperature and day length, and that SNP3 and SNP4 additionally
interact with relative humidity and precipitation respectively.
The influence of the geographic site of origin on plant phenology was further confirmed
by applying a group-oriented approach. We found that wild chickpea accessions originating from
different collection sites react differently to different environments. For example, plants
collected at Baristepe1 react differently to day length change in comparison to plants from
locations Baristepe3, Destek, Egil, Ortanca, Savur1 and Sirnak1 (see Figure 3.3).
Observing the relation between climatic factors at the site of genotype collection, we
hypothesized that there should be an association between the allele frequency of the GWAS
SNPs and climatic factors at genotype collection site. This was confirmed by strong correlations
of allele frequency with collection site mean annual precipitation in 4 of the 6 SNP groups
(Figure 3.7). Three of these four SNPs, have fixed alleles (allele frequency 1) within populations
with highest mean precipitation. This makes sense, given strong selection for climate-appropriate
51
flowering time in Mediterranean annuals, which typically flower early to avoid terminal drought
in low rainfall regions, but flower later to maximize their reproductive potential in longer season,
high rainfall environments [97]. In this context we were interested to discover that there was no
correlation of SNP allele frequencies with collection site annual mean temperature. This may be
explained by strong site and SNP interaction for phenotyping temperature whereby genotypes
collected at different locations responded differently to temperature (and precipitation). Thus,
increasing temperature led to earlier flowering in some locations (e.g. Ortanca, Savur) and later
flowering in others (e.g. Kesentas) (see Figure 3.2), that makes it impossible to reveal
dependencies between SNP frequencies and temperature with standard correlation analysis.
We were also able to demonstrate that certain environmental variables differently affect
flowering time of genotypes with different allele combinations at five out of six polymorphic
position analyzed. For example, different allele combinations at SNP1 differently react on day
length change (see Table 3.1). This is an important characteristic of the SNP that might be used
in practice.
We believe that the models we have developed here can be plugged into existing process-
based models, such as SSM, to build a new generation of crop models that predicts aspects of
crop performance based on genetic, geographic, environmental and management data. In an era
of growing genomic information, these new models are essential. Specific subroutines modeling
selected biological processes could be modified to incorporate effects on these variables without
altering other processes within the model. With Grammatical Evolution and DEEP this can be
achieved in automatic way, easing the adaptation of crop models in breeding programs around
the world.
52
3.5 Conclusions
Analyzing patterns of adaptation is a key for defining strategies to cope with GxE
interactions in breeding for either wide or specific adaptation. The phenology of adaptive traits,
like flowering time, may be strongly predicted by plant geographic origin and local
environmental factors. Here we tested this hypothesis by statistical modeling of wild chickpea
flowering time responses to different environmental conditions. Our results showed that 1)
geographic origin of a plant is indeed a good predictor of flowering time in chickpea and 2)
allele combinations at GWAS hits associated with flowering time are “environmentally
responsive”, i.e. react differently to changes in climatic factors.
Our methodology is generic and can be further applied and extended to existing crop
models.
53
Chapter 4: Uncovering genetic basis of critical domestication traits in
Cicer hybrids using Nested Association Mapping
Anupam Singh
1
, Maggie P. Bendersky
1
, Asif Zubair
1
, Noelia Carrasquila-Garcia
2
, Susan
Moenga
2
, Emily Bergmann
2
, Peter L. Chang
1,2
, Douglas R. Cook
2
, Sergey V. Nuzhdin
1
1
Department of Biological Sciences, University of Southern California, Los Angeles, CA, 90089,
USA,
2
Department of Plant Pathology, University of California, Davis, CA, 95616, USA. In
prep.
4.1 Introduction
Cicer arietinum (C.ari), commonly known as chickpea is a valued grain legume of
traditional agriculture in the Mediterranean Basin and Western Asia as well as India and
Ethiopia. Chickpea has been adapted to a subtropical or Mediterranean-type climate. It grows
almost exclusively after the rainy season on stored moisture in the soil [98]. Chickpea seeds
contain about 20% protein and thus is an important meat substitute like lentil and peas
Chickpea is considered to be one of the eight plants that was domesticated founder crops
in the Levant (i.e., the western “arc” of the “Fertile Crescent” ). These 8 crops can be divided
into three cereals (einkorn wheat Triticum monococcum, emmer wheat Triticum turgidum subsp.
dicoccum, and barley Hordeum vulgare), four pulses (lentil Lens culinaris, pea Pisum sativum,
chickpea Cicer arietinum, and bitter vetch Vicia ervilia), and a single oil and fiber crop (flax
Linum usitatissimum) [99] have been known to be the founder crops that started agriculture in
the Levant during the Pre-Pottery Neolithic (PPN) period some 11,000– 10,000 years ago.
Cicer echinospermum (C. echi) and Cicer reticulatum(C. ret) are two wild chickpeas that
are diploid self-pollinated annuals known only from southeastern Turkey. Modern chickpea
cultivar: Cicer arietinum (C. ari) is known to have been domesticated from its closely related
54
wild progenitors C.ret(insert citation). Intensive selection in the cultivars has shrunken its genetic
diversity and efforts are now in place to reintroduce the genetic variation. One common way to
address the loss of genetic diversity problem is by introgression of the wild material into the
cultivated. This method is also helpful to impart both biotic and abiotic resilience from the wilds
into the cultivar gene pool. However, this can come with the cost of a tradeoff with the desired
traits in the cultivars.
To keep the desired phenotypes in the cultivars intact while introducing the wild genome
into them, it's important to understand the genetic architecture of these traits. Hybrids created
between wild and cultivated chickpeas using a multi parental population design (MPP) like
nested association mapping (NAM) design can prove to be a step in that direction.
NAM populations utilize the strengths of bi-parental mapping populations and
association mapping and increase the precision of QTL mapping by efficiently capturing rare
alleles and enhancing the chances of identifying novel genomic loci contributing to the trait of
interest [15]. In the plant world, NAM design was initiated in the maize nested [100]and has
been successfully utilized in other crops such as rice [101], wheat [102], soybean [103],
barley[104], [105], and sorghum [106].
In this paper, we describe the NAM population developed for chickpeas using the two
closely related species C.ret and C.echi. A popular cultivated line ICCV96029 [107] is used as the
common parent which is crossed with 19 C.rets and 6 C.echis to create two separate NAM
populations. 16 important domestication traits have been phenotyped and a genome-wide
association study is performed to narrow down the genetic regions associated with the traits.
Important and well-known genes like BIG GRAIN 1-like B positively regulates auxin response
and transport pathway and has been linked to increased grain size and yield in crops such as maize
55
and rice [108]–[112], disease resistance response protein 206 (anti shattering gene)[113]–[117],
protein YELLOW LEAF 1, chloroplastic-like [118]–[121] etc.
4.2 Methods
4.2.1 NAM crossing scheme, selection of genotypes
Two NAM populations were created using three species of chickpeas C. ari, C.ret and
C.echi. For the first population, the families were created by crossing 19 males from C. ret and
the second population using 6 males from C. echi and the sampling of these genotypes are
described in [2] and they were all crossed with a common parent, ICCV96029[107]. It is an elite
cultivar possessing a double‐podding habit, conferring a yield advantage in drought‐prone and
short-season environments. It was developed as a super early flowering chickpea germplasm
from India harboring additional sources of genes for early flowering. ICCV96029 has been
shown to have better early growth vigor, a taller phenotype with more branches, and had a wider
plant canopy, giving it an advantage for utilizing available moisture more efficiently. It flowers
in about 30 days and matures in 75 to 80 days [107], [122]. The resulting two NAM populations,
one with 19 families and the other with 6 families were scored for 16 agronomically important
traits.
56
4.2.2 Field setup, phenotyping of different traits
Chickpea F2 hybrids were grown as individual plants at the University of California,
Davis in the spring and summer of 2016. They were planted in rows with 2 ft spacing between
plants within rows and 5 ft spacing between rows. All F2 progeny were grown in a single year in
the same plot.
Both quantitative and qualitative traits were scored for each individual belonging to the
19 (ariXret) and 6 (ariXechi) families. A total of 16 traits were phenotyped (table 4.1) They can
be divided into two main categories, plant architectural traits like are, plant height (PH): was
measured by taking the widest point of the plant followed by the axis perpendicular to the widest
point, plant lateral width (PLAW): this trait measured the lateral width of the f2 plant, plant
longitudinal width (PLOW): this trait measured the longitudinal width of the plant, plant growth
habit (PGH): was estimated using four values: prostrate or intermediate, upright, highly
branching or not, a this was based on the description of the growth pattern in the literature. plant
stem size (PSS): the plants stem size were categorized into normal or extra-large, plant leaf size
(PLS): the plants either had normal or extra-large leaves, yellowing leaves (YL): this trait was
scored as a yes or a no trait, Necrotic leaves (NL): this was measured as yes if the plant had
necrotic leaves or not. The second category of traits scored are Biomass related traits like Total
plant biomass (TPB), Yield, plant volume (PV): volume of water that a plant displaces, volume
was approximated assuming the shape of an ellipse: (4/3) Å~ (a Å~ b Å~ c) and density was
estimated as the ratio of biomass to volume, shattering index (SI): After bagging the plant during
the harvest phase, number of seeds that were found fallen from the plant into the bag, average
seed mass (ASM): it’s the combined average of mass of seeds that a plant produces, plant
57
fertility (PF), double pods (DP): this trait is measured if the plant had a double podding habit or
not, pod size (PS): pods in the F2 plants were either normal or extra-large.
Once the plants attained maturity, they were harvested separately. For traits like
shattering index, seed from each plant was collected steadily until they reached maturity. The
harvest date was calculated from the date of germination to the final day of harvest. At harvest,
the above-ground portion of individual plants was collected by cutting the plant at the soil level.
Plants were dried in heavy-duty brown paper bags: at least 1 week in the field, 2 weeks in a
warm greenhouse with continuous air movement and no cooling, and 1 week in a temperature
controlled (60 °C) drying room. Shattering was calculated as the ratio of mature seed released
from or remaining within pods after drying.
4.2.3 DNA extraction and sequencing
DNA was extracted from all of the F2 individuals using the Qiagen DNeasy 96 format
Plant Mini Kit (Valencia, CA, USA). Genomic DNA was then digested with HindIII and NlaIII
restriction enzymes and ligated with adapters. This results in the ‘barcode’ adapter ligating to
HindIII and the ‘common’ adapter ligating to NlaIII. Size selection followed by 14 rounds of
PCR amplification was performed to generate 100 base paired-end reads on an Illumina
HiSeq4000. These steps were performed at the University of California at Davis Genome Center
DNA Technologies Core.
58
4.2.4 SNP calling and selection
Sequenced Illumina reads were mapped to the C. ari, CDCFrontier reference genome [15]
using BWA MEM [16] with default mapping parameters. GATK pipeline was used for SNP
calling. For filtering the SNPs, GATK Best Practices recommendation was followed. SNP calls
with Mapping Quality (MQ) > 37 and Quality by Depth (QD) > 24 were retained. These metrics
ensured that both the mapping quality and genotype calls with the highest confidence were
retained. The SNPs were further filtered for MQRankSum < |2.0|, removing the ambiguity in the
Mapping Quality scores for alleles at a given locus. This stringent filtering criterion retained
about 40% of variant sites reported by GATK. This final SNP set was then filtered to retain
genomic sites with a minor allele frequency (MAF) threshold of >3% and genotype call rate
>80%.
4.2.5 SNP imputation
FSFHap Imputation in TASSEL5 was run separately for each chromosome using the
following parameters: -windowLD true -bc false -fillgaps true 0.5 -window 50 -overlap 10. 2013
reference genome (Varshney et al.) was used to determine the marker order in the F2 genome.
Each F2 individual was also assigned a 50% contribution from each parental genome and a phet
value of 0.5, as expected for F2 individuals. An average of 5000 markers were called for each of
the population. In the combined analysis of 1942 F2 individuals, we also used a set of 1560 SNPs
that contained an allele specific to the ICCV96029 domesticated cultivars.
59
For all the NAM populations, SNPs that were at intermediate allele frequency were
chosen, such that these SNPs were segregating at intermediate allele frequency (AF) in the
population and were nearly fixed and different in wild and cultivated parental species.
4.2.6 GWAS method
Genome-wide association tests were performed for the 17 traits separately. All the NAM
populations were analyzed using the Fixed and random model Circulating Probability
Unification (FarmCPU) method implemented in R[28], [29].
The FARMCPU method is a Multiple Loci Linear Mixed Model (MLMM) divided into
two parts: Fixed Effect Model (FEM) and a Random Effect Model (REM) and they are used
iteratively. In FEM markers are tested one at a time, and false positives are controlled by using
multiple associated markers as covariates. The overfitting problem in FEM is circumvented in
REM by defining the kinship using the associated markers. At each iteration, the P values are
unified for both testing markers and the associated markers.
In FEM, each of the tests markers are tested one at a time and a set of pseudo
Quantitative Trait Nucleotides (QTNs) are used as covariates. The model can be written as:
(1)
In Equation 1, yi is the phenotype observation of the ith individual; Mi1, Mi2,…, Mit
represent the genotypes of t pseudo-QTNs, initiated as an empty set; b1, b2, …, bj are the effects
of the corresponding pseud QTNs; Sij is the genotype of the ith individual and jth genetic
60
marker; the corresponding effect of the jth genetic marker is dj; ei represents the residuals having
a distribution with zero mean and variance of .
In equation 1, each of the markers being tested receives a P value except for those that are
designated as QTNs are used as covariates. In the beginning, these pseudo QTN markers are
assigned P-value of “NA" (Not Available). When each pseudo QTN is examined for each of the
testing markers, most significant P value replaces the “NA” for that pseudo QTN and it becomes
the P value of its corresponding marker.
The above step generates a P-value for every marker. The P values and the associated
marker map are used to update the selection of pseudo QTNs by using the SUPER algorithm[24]
in a REM as follow:
(2)
where yi and ei are same as in Eq (1) and ui is the total genetic effect of the i
th
individual.
The expectations of total genetic effects for the individuals’ are zeros. The variance and
covariance matrix of the individuals’ total genetic effects is , where unknown
genetic variance is and kinship obtained from the pseudo QTNs K .
The pseudo QTNs in the FEM, Eq (1) is replaced by the set of pseudo QTNs that
maximizes the likelihood of the REM, Eq (2) . When no change occurs in the estimated set of
pseudo QTNs occurs, this iteration stops [29].
The genome-wide significance threshold for the P values for each of the markers was
calculated and set using the Bonferroni test threshold set at 0.01.
Genome-wide significance in the present study was calculated as follows
61
0.01/1560 (cutOff / number of markers) = 0.000006410256, -log(0.000006410256) =5.2
The two wild NAM C.ret and C.echi populations were analyzed separately.
4.2.7 Candidate gene identified from GWAS
To identify genes in the genomic region associated with the traits, we chose a window of
10 MB approximately the size of the LD block in the F2 populations, around the GWAS hits. LD
was calculated using vcftools.
4.2.8 MSA analysis/ ITASSER to infer amino acid changes/ structural
changes in genes inferred from GWAS
Gene trees were constructed across three species C.ari, C.ret and C.echi using
phylogenetic analysis of individual genes. Gene sequences were downloaded from National
Center for Biotechnology Information (NCBI) using published genomes: ASM33114v1 for C.ari,
Besevn079_v0.3 for C.ret, and S2Drd065_v0.5 for C.echi. Respective genes from Medicago
truncatula was used as outgroup in the analysis.
After picking a gene in C. ari, MegaBLAST was used to identify orthologs in other
species using their alignment algorithm. Interspecific gene sequences involving the three Cicers
and orthologous gene sequences in Medicago truncatula were aligned using MEGAX Version
10.1.8 software package[123]. Multiple sequence alignment was conducted using the ClustalW
algorithm with default parameters. Maximum likelihood trees were constructed using Tamura-
62
Nei and Poisson algorithms. Phylogenetic consistency was evaluated via 1000 bootstrap
replications.
For inferring the role of mutations present in the Cicers, protein structural modeling was
conducted using Iterative Threading ASSEmbly Refinement (I-TASSER) using default
parameters, no templates were pre-excluded and no secondary expected secondary structure was
specified prior to modeling [124]. Three elements of I-TASSER output were used in the analysis:
secondary structure prediction, tertiary structure prediction, and ligand-residue interactions. In I-
TASSER, the confidence scores for secondary structure range from [0,9], with 9 indicating the
highest confidence. Secondary structure elements averaging ≥6 were used in the analysis.
Confidence scores for tertiary structure prediction range from [-5,2], with 2 indicating the
highest possible confidence. For this study, models with scores ≥-1.5 were considered. The
confidence scores of proposed ligand binding sites and ligands range from [0,1] with 1 indicating
the highest possible confidence. Predictions with confidence scores ≥0.1 were included. The
confidence requirement for ligand-residue interactions is more relaxed because ligand feasibility
was considered in addition to the confidence score. For example, a human leukocyte may have a
remarkably high confidence score, but low feasibility as a ligand naturally occurring in plants.
Likewise, a known plant hormone may have a lower confidence score, but is more feasible and
taken into consideration.
Protein structural modeling was conducted using I-TASSER .
63
4.3 Results
4.3.1 NAM populations crossing scheme
The crossing scheme utilized in the NAM population generated by crossing ICCV9609
was chosen as the cultivar and 19 different C. ret individuals are shown in Figure 4.1. The
resulting F2 progeny has a mosaic chromosomal pattern due to the recombination events
between the parental genetic material. In the figure, we see the cultivar parent represented by
black chromosome, and all the C.ret parents are represented using 19 distinct colors. The F1s
produced after crossing the cultivated line with the wilds are selfed to produce F2 progeny. A
total of 1942 F2 individuals were produced from this setup, and they were genotyped for the C.
ret NAM population. These individuals were then sown in fields to obtain phenotypes. We
looked at the phenotype data extensively to understand their distribution across all the
populations.
64
Figure 4.1: Schematic representation of the NAM crossing scheme. An elite cultivar is crossed
with 19 wild species to create F1s. The F1s are selfed to create 19 F2 populations. The
chromosomes show a mosaic of recombination breakpoints at F2 generation. A total of 1942
individuals are genotyped in the NAM population
65
4.3.2 Phenotype results
A total of 16 traits were scored for all the genotypes in the 19 NAM populations. These
can be divided into two main categories: architectural and biomass. Of these, 8 are continuous
and 8 are categorical traits (Figure 4.1). Statistical analysis was performed to assess the trend in
the dataset based on numerous factors. At first, a correlation analysis was performed for all the
continuous traits. The results are listed in the next section.
Figure 4.2: Illustration representing the traits collected in NAM populations listed in
table 3.1.
66
Traits analyzed Trait value
Plant architectural traits
1) Plant height (PH) Continuous
2) Plant lateral width (PLAW) Continuous
3) Plant longitudinal width (PLOW) Continuous
4) Plant growth habit (PGH)
Categorical: Prostrate, intermediate and
upright
5) Plant stem size (PSS) Categorical: Normal, Extra large
6) Plant leaf size (PLS) Categorical: Normal, Extra large
7) yellowing leaves (YL) Categorical: Yes, no
8) Necrotic leaves (NL) Categorical: Yes, no
Biomass traits
9) Total plant biomass (TPB) Continuous
10) Yield Continuous
11) Plant volume (PV) Continuous
12) shattering index (SI) Continuous
13) Average seed mass (ASM) Continuous
14) Plant fertility (PF) Categorical: low, high
15) Double pods (DP) Categorical: Yes, no
16) Pod size (PS) Categorical: Normal, Extra large
Table 4.1: List of phenotypes collected in the NAM populations. The traits can be divided into
two broad categories: Architectural and Biomass and both of which have continuous and
categorical traits.
67
4.3.2.1 Relationship between continuous traits
For the eight continuous traits PH, PLAW, PLOW, ASM, TPB, yield, SI, and PV
pairwise correlation analysis was performed for all the populations. All the traits show a strong
correlation across all the populations except SI and yield. This could be due to the way shattering
was scored. All the traits look normally distributed except for the shattering index. It shows a
bimodal distribution, and this could be because shattering tends to be more of a binary trait than
a continuous trait. SI is also scored on a scale of 0 to 100.
4.3.2.2 Relationship between continuous and categorical traits
PGH is a categorical trait with three categories. On examining the relationship between
PH and PGH in F2 Plants we see that individuals with upright phenotype are statistically
significantly taller than those with prostrate or intermediate growth habits. SI also varies based
on the PGH. Plants with upright phenotype shatter more (Figure 4.4). We also looked at the
relationship between PF, shattering, and yield. As expected, plants with lower fertility had lower
yield compared to those that had higher fertility. However, shattering seemed to be heightened in
plants with reduced fertility (Figure 4.5). We see an interesting trend when we plot PLS against
yield. Plants with bigger leaves have high yield. The picture looks similar when we plot NL and
yield. Plants with leaf necrosis have lower yield (Figure 4.6).
68
Figure 4.3: Pairwise correlation of 8 continuous traits Plant height (PH), Plant lateral width
(PLAW) Plant longitudinal width (PLOW), Average seed mass (ASM), Total plant biomass
(TPB), yield, shattering index (SI) and Plant volume (PV). Significance level associated with the
pearson correlation value is "***" p-value is < 0.001, "**" if the p-value is < 0.01, "*" p-value
is < 0.05, "." p-value is < 0.10,
69
A) B)
Figure 4.4: Relationship between PGH and PH in a and SI in b. Plants with lower fertility shatter
more. Plants with high fertility have yield.
A) B)
Figure 4.5: Relationship between PF and SI in a and Yield in b. Yield and PF statistically
significant between plants with low, medium and high fertility.
70
Figure 4.6: Relationship between Yield and NL in a and Yield and PLS in b. Plants with no leaf
necrosis have statistically significantly higher yield. Similarly, plants with bigger leaves have
higher yield.
4.3.3 Combined population analysis in GWAS
To identify genetic control of domestication traits in the chickpea NAM population,
genome-wide associations were performed using a single NAM population using the Fixed and
random model Circulating Probability Unification (FarmCPU). A window size of 10 MB was
used to identify genes enriched around the GWAS hits. The genomic window was decided based
on the LD block in the F2 population.
Multiple locis were identified for different phenotypes using GWAS (Table 4.2).
4.3.4 Genes identified around the GWAS hits
To understand the underlying genes for each of the domestication traits analyzed in the
GWAS, we considered the ~10 MB window around the most significant GWAS SNP. The
window size is a proxy of LD blocks size in a chickpea NAM population. More than 5000 genes
T−test
p < 2.2e−16
0
50
100
150
no yes
p_necrotic_leaves
yield
p_necrotic_leaves no yes
Kruskal−Wallis
p = 1.3e−07
0
50
100
150
Extra Large Extra Small Normal
p_leaf_size
yield
p_leaf_size Extra Large Extra Small Normal
NL
PLS
71
were identified around each of the genomic windows around the GWAS hit. Of the list of genes
identified, many consisted of both known and established functions that were identified in
chickpea and other plants. Most of the genes were neither functionally validated nor found to be
linked with the traits in consideration of the present study. The most significant SNP for PH was
found on chromosome 1, many genes related to the plant architecture, growth was found in
around the 10 MB window. Two main genes around the significant SNP are protein SHORT-
ROOT-like and AP2-like ethylene-responsive transcription factor PLT2. Both of those genes are
known to play a role in plant height [125], [126]. Protein YELLOW LEAF 1, chloroplastic-like
encodes Magnesium-protoporphyrin IX monomethyl ester cyclase (MPEC) an enzyme involved
in eventual chlorophyll production [119], and is essential for healthy photosynthesis and green
leaves in plants are found in the close proximity of YL phenotype GWAS hit. Mutations in this
gene cause yellowing of leaves[127]. Another gene zinc finger BED domain-containing protein
DAYSLEEPER-like is a transposase-like protein that is involved in the plant growth and
development of the plant, zinc finger BED domain-containing protein RICESLEEPER 2-like is a
transposase-like protein involved plant growth and development. Mutants show moderately
decreased height and seed yield, protein MAINTENANCE OF MERISTEMS-like, maintains
genome stability and cell division activity in meristematic cells.
For chromosome 5, 31-39 MB was found to have significant hits for plant width, fertility,
leaf size, pod size, leaf necrosis, plant biomass, Yield, biomass conversion efficiency, and
combined average mass of seed (Table 4.2).
72
Trait Chromosome Position Gene Name
Plant height (PH) 1 15397968 protein SHORT-ROOT-like
AP2-like ethylene-responsive transcription factor
PLT2
Plant growth habit (PGH) 1 15744351 CEN-like protein 2
Plant volume (PV) auxin response factor
Yellowing leaves (YL) 2 4507168 protein YELLOW LEAF 1, chloroplastic-like
Plant fertility (PF) 5 33845953 COP9 signalosome complex subunit 5a-like
pentatricopeptide repeat-containing protein
At1g20300, mitochondrial-like
Yield 5 35631842 protein GRAIN 1-like B
Necrotic leaves (NL) MLO protein homolog 1-like
Plant leaf size (PLS) 5 39350222 protein LEAFY
Total plant biomass (TPB) 5 35935020 auxin response factor 5-like
Plant lateral width
(PLAW)
AP2-like ethylene-responsive transcription factor
ANT
Plant longitudinal width
(PLOW)
ethylene-responsive transcription factor ERF114-
like
Average seed mass (ASM) 5 36291511 bidirectional sugar transporter SWEET4-like
Plant leaf size (PLS) 5 39350222 protein LEAFY
Shattering index (SI) 6 2044973 disease resistance response protein 206
homeobox protein BEL1 homolog
Double pods (DP) 6 29173408 transcription factor RAX2-like
Pod size (PS) auxin response factor 18
Plant stem size (PSS) 7 31759608 gibberellin 20 oxidase-like protein
Table 4.2: List of genes found around the GWAS hits, for all the traits analyzed
in this study.
73
For Shattering, one of the important genes identified around the hit are disease resistance
response protein 206. It is implicated in lignification, a key biological process of seed
shattering[128][128], homeobox protein BEL1 homolog protein, it is a possible ortholog of SH5,
and is established as a shattering gene. It interacts with AG-MADS box genes and negatively
regulates carpel identity. It also maintains indeterminacy of inflorescence meristem and is
required for PIN1 (auxin transport) expression in ovules[63], [129].
4.3.5 Correlated phenotypes have the same genomic region associated
with them in the GWAS analysis
Across all the 19 populations, yield shows a strong correlation with ASM, TPB, PV,
PLAW, PLOW (Figure 4.6). These phenotypic correlations translate very well into GWAS
results. All of these correlated traits have significant loci associated with them on the same 35
MB LD block on chromosome 5, indicating the combined role of these traits in dictating the
yield of the plants. Many important genes like protein BIG GRAIN 1-like B, which regulates
auxin transport positively and is involved in gravitropism, plant growth, and seed development
[109], [110]. COP9 signalosome complex subunit 5a-like protein which is a regulator of Cullin-
RING ubiquitin E3 ligases (CRLs) and CULI1 ligases (SCF complexes). SCF complexes are
involved in auxin signaling, flower development, and seed development. Mutations in the gene
result in decreased growth, seed size, and fertility [130].
74
a) b)
c) d)
Figure 4.7: Analysis of yield in context with other traits. a) Correlation between Average seed mass (ASM), Total
plant biomass (TPB), Yield, Plant volume (PV), Plant lateral width (PLAW), Plant longitudinal width (PLOW)
Plant. b) Manhattan plots showing the SNPs tested in the FARMCPU model for the traits shown in a). The 35mb LD
block has a significant SNP associated with these traits. The genome-wide significance is drawn after the Bonferroni
correction at 0.05 (0.05/number of tests). c) Violin plot showing the distribution of yield based on the most
significant genomic window. The position on chromosome 5 has a reference allele as A and the alternate allele is G.
d) bar plot showing the distribution of the SNP from c) in cultivated (Cicer arietinum), closely related wild (Cicer
reticulatum) and their common ancestor (Cicer echinospernum)
75
4.3.6 Introduction of wild allele increases yield
Assessment of the most significant SNP associated with a yield which was found on
chromosome 5 at position 35631842 revealed that having a homozygous allele at that position
has a lower median yield. The introduction of the wild allele increases the yield significantly
(Figure 4.6 C). Heterozygotes have a higher yield compared to individuals that have a
homozygous reference or wild allele. This has been seen in the azuki bean (Vigna angularis). As
a result of domestication, the seed yield reduced on a per plant basis as farmers selected for
determinate plants with larger pods and fewer large seeds per pod than its progenitor wild
relative (A. Kaga, NIAS, Japan, unpubl. res.)
On examining this exact position in the parents of the F2s and other species of
chickpeas, we confirmed that both the wild species were fixed for the alternate allele. This hints
at the tradeoff between yield and other traits as a result of domestication. While farmers selected
for yield, they chose less yield in favor of other essential traits like seed size, growth habit,
fertility.
4.3.7 Cost of domestication: the tradeoff between yield, leaf size, and
leaf necrosis
To understand why farmers chose against high yield in the cultivated, we looked at other
traits that shared the most significant SNP in the GWAS analysis. PLS, NL and yield have the
same genomic position significantly associated with the GWAS on chromosome 5. A
76
heterozygous allele at that position has overall less leaf necrosis and bigger leaf size. Both are
desirable agronomically. NL is an undesirable agronomic trait as it decreases the area of
photosynthesis for the plant. Complete drying of leaves can lead to poor vegetative growth in the
plant [131].
77
a) b)
c)
Figure 4.8: Tradeoff between yield, leaf size, and necrosis. a) Manhattan plots showing the
SNPs tested in the FARMCPU model for yield, plant leaf size (PLS), and Necrotic leaves (NL).
The genome-wide significance is drawn after the Bonferroni correction at 0.05 (0.05/number of
tests). b) Distribution of alleles on chromosome 5 at position 35631842 for b) PLS and c) NL.
Colors represent the change in leaf morphology based on the change in allele in b and change in
leaf morphology in c.
4.3.8 Effect of the environment of origin of the wild NAM parents on
shattering in the F2s
Shattering is another trait important that the domesticators cared about. Chickpea
cultivars are known to have reduced shattering, whereas their wild counterparts are known to
A/A A/G G/G
Normal
Extra Large
0 200 400 600 800 1000
A/A A/G G/G
yes
no
0 200 400 600 800 1000
78
shatter to continue their propagation. GWAS results of shattering showed a prominent peak on
chromosome 6 (Figure 4.8 a). On examining the most significant SNP associated with shattering
(Ca6_2044973) the homozygote wild allele has a higher median shattering whereas the
homozygote cultivated allele has a lower median shattering. This trend is expected and consistent
with the knowledge that domestication changed the degree of shattering. The next step was to
look at the distribution of this allele in the individual NAM populations. Under different genetic
backgrounds, the introduction of wild allele increases shattering across all the populations
however, in some populations the heterozygous wild and cultivated allele has a higher shattering
compared to the homozygous wild allele. To understand this trend further, we looked at the
geography of origin of the parents. There is a trend with elevation and shattering. NAM parents
originating from higher altitudes have less shattering i.e a homozygous wild allele has lower
shattering than a heterozygous allele.
79
Figure:
a) c)
b) d)
Figure 4.9: a)Manhattan plots showing the SNPs tested in the FARMCPU model for shattering
index(SI). The genome-wide significance is drawn after the Bonferroni correction at 0.05
(0.05/number of tests). b) Violin plot showing the distribution of shattering index on the Y axis
and x-axis shows the allelic distribution of the most significant GWAS SNP from a) in all the
NAM populations combined. The position (2044973) of the most significant SNP on
Chromosome 6(Ca6) has C as the reference allele. c) It is an extension of b. It shows the
distribution of Ca6_2044973 in each of the NAM populations separately. d) Is the distribution of
most significant SNP for yield in all the NAM populations.
80
We know that the degree of shattering in the wild varies based on its native
environmental conditions. Wild parents in the NAM population were collected from
geographical regions with varying elevations. We wanted to assess if the environment of origin
of the NAM parents played a role in the distribution of shattering across their F2 populations.
Indeed, parents that were sampled from lower elevations (Oyali) showed higher median
shattering when compared to those that were sampled from the mountains (Sirnak). This gradient
of shattering as a function elevation was observed for yield too.
Figure 4.10: Structure plot from [2]. The Structure plot is overlaid on the map of Turkey. 19 of
the C. ret NAM parents are divided into 8 genetic groups represented by different colors. This
grouping is used to cluster the NAM parents by drawing circles around the sites from which
these parents were collected in Turkey. Each circle has a number associated with it to denote its
group number. The structural grouping is used a proxy to cluster the NAM families.
4.3.9 Gene level phylogeny shows signs of introgression of C. echi into
C. ari
We performed gene-level analysis for some of the genes found around the GWAS hit for
different traits. As a result, three major phylogenetic patterns emerged. The most common was
81
the known genome level phylogeny of the three species C.echi, C.ret and C.ari where C. ari show
divergence from the two wild relatives. The second phylogenetic pattern showed a resemblance
of genes in the C.ari more closely to C.ret indicating the divergence of these two species from
their common ancestor. The third and the most unexpected phylogeny was where we saw that the
genes in C. ari grouped with C.echis. The divergence between C. ari and C.ret is estimated to be
around 10 k years and the divergence between C.ret and C.echi is estimated to be between 95-
127k years. This finding hints at the signs of introgression between C.echis and C.rets.
a)
Cicer reticulatum homeobox protein BEL1 homolog
Cicer echinospermum homeobox protein BEL1 homolog
Cicer arietinum homeobox protein BEL1 homolog
Medicago truncatula homeobox protein BEL1 homolog (ortholog)
Sequences aligned in MEGAX via CulstalW algorithm (Gap Opening Penalty: 15:00 and Gap Extension Penalty: 6.66).
Maximum likelihood trees were also constructed in MEGAX using the Tamura-Nei model validated with 1000 bootstraps.
Cicer reticulatum CEN-like protein 2
Cicer echinospermum CEN-like protein 2
Cicer arietinum CEN-like protein 2
Medicago truncatula CEN-like protein 2
Sequences aligned in MEGAX via CulstalW algorithm (Gap Opening Penalty: 15:00 and Gap Extension Penalty: 6.66).
Maximum likelihood trees were also constructed in MEGAX using the Tamura-Nei model validated with 1000 bootstraps.
Cicer reticulatum COP9 signalosome complex subunit 5a-like
Cicer echinospermum COP9 signalosome complex subunit 5a-like
Cicer arietinum COP9 signalosome complex subunit 5a-like
Medicago truncatula COP9 signalosome complex subunit 5a-like ortholog
17
Sequences aligned in MEGAX via CulstalW algorithm (Gap Opening Penalty: 15:00 and Gap Extension Penalty: 6.66).
Maximum likelihood trees were also constructed in MEGAX using the Tamura-Nei model validated with 1000 bootstraps.
Cicer arietinum ethylene-responsive transcription factor ERF114-like
Cicer echinospermum ethylene-responsive transcription factor ERF114-like
Cicer reticulatum ethylene-responsive transcription factor ERF114-like
Medicago truncatula ethylene-responsive transcription factor ERF114
82
Cicer arietinum
Cicer echinospermum disease resistance response protein 206
Cicer reticulatum disease resistance response protein 206
Medicago truncatula disease resistance response protein 206 ortholog
disease resistance response protein 206
Sequences aligned in MEGAX via CulstalW algorithm (Gap Opening Penalty: 15:00 and Gap Extension Penalty: 6.66).
Maximum likelihood trees were also constructed in MEGAX using the Tamura-Nei model validated with 1000 bootstraps.
Cicer arietinum gibberellin 20 oxidase-like protein
Cicer echinospermum gibberellin 20 oxidase-like protein
Medicago truncatula gibberellin 20 oxidase-like protein
Cicer reticulatum gibberellin 20 oxidase-like protein
Sequences aligned in MEGAX via CulstalW algorithm (Gap Opening Penalty: 15:00 and Gap Extension Penalty: 6.66).
Maximum likelihood trees were also constructed in MEGAX using the Tamura-Nei model validated with 1000 bootstraps.
Cicer arietinum protein SHORT-ROOT-like
Cicer reticulatum protein SHORT-ROOT-like
Cicer echinospermum protein SHORT-ROOT-like
Medicago truncatula protein SHORT-ROOT
Sequences aligned in MEGAX via CulstalW algorithm (Gap Opening Penalty: 15:00 and Gap Extension Penalty: 6.66).
Maximum likelihood trees were also constructed in MEGAX using the Tamura-Nei model validated with 1000 bootstraps.
Cicer arietinum AP2-like ethylene-responsive transcription factor PLT2
Cicer reticulatum AP2-like ethylene-responsive transcription factor PLT2
Cicer echinospermum AP2-like ethylene-responsive transcription factor PLT2
Medicago truncatula AP2-like ethylene-responsive transcription factor PLT2
Sequences aligned in MEGAX via CulstalW algorithm (Gap Opening Penalty: 15:00 and Gap Extension Penalty: 6.66).
Maximum likelihood trees were also constructed in MEGAX using the Tamura-Nei model validated with 1000 bootstraps.
Cicer arietinum protein LEAFY
Cicer reticulatum protein LEAFY
Cicer echinospermum protein LEAFY
Medicago truncatula protein UNIFOLIATA
Sequences aligned in MEGAX via CulstalW algorithm (Gap Opening Penalty: 15:00 and Gap Extension Penalty: 6.66).
Maximum likelihood trees were also constructed in MEGAX using the Tamura-Nei model validated with 1000 bootstraps.
Cicer arietinum MLO protein homolog 1-like
Cicer reticulatum MLO protein homolog 1-like
Cicer echinospermum MLO protein homolog 1-like
Medicago truncatula MLO protein homolog 1-like
Sequences aligned in MEGAX via CulstalW algorithm (Gap Opening Penalty: 15:00 and Gap Extension Penalty: 6.66).
Maximum likelihood trees were also constructed in MEGAX using the Tamura-Nei model validated with 1000 bootstraps.
Cicer arietinum protein BIG GRAIN 1-like B
Cicer reticulatum protein BIG GRAIN 1-like B
Cicer echinospermum protein BIG GRAIN 1-like B
Medicago truncatula protein BIG GRAIN 1-like B ortholog
Sequences aligned in MEGAX via CulstalW algorithm (Gap Opening Penalty: 15:00 and Gap Extension Penalty: 6.66).
Maximum likelihood trees were also constructed in MEGAX using the Tamura-Nei model validated with 1000 bootstraps.
Cicer arietinum auxin response factor 18
Cicer reticulatum auxin response factor 18
Cicer echinospermum auxin response factor 18
Medicago truncatula auxin response factor 10
Sequences aligned in MEGAX via CulstalW algorithm (Gap Opening Penalty: 15:00 and Gap Extension Penalty: 6.66).
Maximum likelihood trees were also constructed in MEGAX using the Tamura-Nei model validated with 1000 bootstraps.
Cicer arietinum transcription factor RAX2-like
Cicer reticulatum transcription factor RAX2-like
Cicer echinospermum transcription factor RAX2-like
Medicago truncatula transcription factor RAX2-like
Sequences aligned in MEGAX via CulstalW algorithm (Gap Opening Penalty: 15:00 and Gap Extension Penalty: 6.66).
Maximum likelihood trees were also constructed in MEGAX using the Tamura-Nei model validated with 1000 bootstraps.
83
b) c)
Figure 4.11: a) phylogenetic analysis of genes around GWAS hits. b) Violin plot showing the
distribution of SI in the C.echi NAM populations. C) distribution of yield in C.echi NAM
populations.
Figure 4.12: I-TASSER of disease resistance response 206 protein. The dirigent domain that
confers protein functionality in lignificition is in blue. The remaining portion of the protein is
shown in green. Vernolic acid, the proposed ligand is shown in orange[132]
84
4.4 Discussion
Domestication of crops is assessed by examining traits like seed size, ear-shattering in the
cereals, pod indehiscence in the pulses, and capsule indehiscence in flax. Wild plants have
smaller seeds, disarticulate their pods and ears at maturation, and shatter their seed for dispersal.
Loss of shattering is the best effective marker for recognizing domestication in grains and crops.
While selection against shattering is an early domestication trait, larger seed size is achieved at a
later stage of domestication and is known to be controlled by multiple genes.
In this study, we performed GWAS on the NAM populations of chickpea. All the 19
families generated have 1942 F2s that were analyzed in the single GWAS model. Separate
models were run for all the16 traits. While all the traits are important for the breeders, yield
holds the most value. In the analysis present, we see that the introduction of the wild allele
increases the yield in both heterozygous and homozygous wild states. This hints at the cost the
domesticators of chickpeas had to pay for domesticating the crop. Various speculations arise on
why this would be the case. One very apparent reason is the tradeoff of yield with other
important traits. We see that NL, PLS, PF could be the traits the initial farmers of chickpeas
would be cared about.
The other trait we focused on was SI. GWAS was very effective at predicting a strong
genetic locus associated with the trait. When we followed up on the history of the traits in the
wild, what we see is that F2 individuals that have homozygous allele of wild parents originating
from mountain top shatter less when compared to other allelic states. In F2s with a homozygous
allele of parents originating from the lower elevation, we see a higher shattering. This hints at the
degree of shattering in the wild in their native habitat. Plants on the mountain experience
different weather constraints to those that are on the plains with lower elevation.
85
The third finding in this study was the phylogeny differences observed between genes
that were found around the GWAS hits. Genome level phylogeny of the three Cicer species echi,
ret and ari shows echi as an outgroup (citation). When we construct phylogenetic trees for genes
taken one at a time, we see some genes that show signs of introgression in the cultivated from the
C.ehis, the common ancestor of C.ari and C.ret. While some genes in cultivars show signs of
divergence from C.ret. This gives us an idea that the domestication on the gene level was
different than the genome level. Different genes experienced varied selection pressures to reach
its current form.
With these observations in mind, we hypothesize that shattering is among the first set of
traits the domesticator looks to select against in the domesticate, and some of the important genes
necessary for shattering are shared in C.ari with the common ancestor C.echi.
4.5 Conclusion
Domestication has changed the genomic landscape of chickpeas and the reduction in its
genetic diversity is well known. This study is one step in the direction of expanding the genetic
diversity in the chickpea cultivars by first understanding the genetic basis of agronomically
important traits that were historically selected during its domestication. NAM design helps
generate crosses that can be bred for multiple generations that can help fix desirable alleles for
those traits and retain the diversity from wild relatives in the process.
Here we found that 1) the domesticators of chickpeas compromised on the overall plant
yield in lieu of selecting for reduced leaf necrosis and bigger leaf size, 2) Geography of origin of
86
the wild NAM parents dictate the degree of shattering in the F2s, 3) Phylogeny of genes in the
proximity of the GWAS hits for traits two main signatures, introgression from the common
ancestor and differentiation from the closely related wild because of domestication.
87
Chapter 5: Breeding superior kelp using GWAS and genomic selection
models
Anupam Singh
1
, Gary Molano
1
, Ania Webb
1
, Filipe Alberto
2
, Sergey V. Nuzhdin
1
1
Department of Biological Sciences, University of Southern California, Los Angeles, CA, 90089,
USA,
2
Department of Biological Sciences, University of Wisconsin-Milwaukee, Milwaukee, WI,
United States. In Prep
5.1 Introduction
Marine macroalgae like giant Kelp (Macrocystis pyrifera), produces products like alginic
acid, biofuels, feed stalk for other high-value seafood like abalone [133] that are of high
economic value. With the ever-increasing demand for fossil fuels and the concerns around their
impact on the environment, alternate sources of fuels are necessary to meet the growing need
[134][135]. Biomass generated from the marine macroalgae commonly known as seaweed is the
starting point in producing such products. Cultivation of marine organisms is different from
terrestrial farming as the additional resources required to grow plants like fertilizers and arable
land is unnecessary. biological productivity of the seaweed can sometimes produce more
biomass yield compared to most of the terrestrial plants.
Existing biomass production of commercialized macroalgae [133]involves using natural
populations and mass cultures that have not been established. Laboratory methods are in place to
generate seedlings using genetic methods like hybrid vigor to cultivate high biomass of seaweed.
Sophisticated genetic approaches to breeding need to be adopted to facilitate the large-scale
production of giant kelp. Efforts are in place to create a seed bank of genetically diverse
genotypes which can be phenotyped for important domestication traits and serve as a critical
88
resource for the breeding community. In this study we take an approach in that direction by
utilizing the genetic resources of in M. pyrifera [136] and phenotyping hybrids for important
traits leading to higher biomass productivity. We use genome-wide association study (GWAS) to
understand the genetic regions governing the traits.
Another important aspect to breeding superior kelp is that it's important to identify
genotypes that can be used for making crosses that can yield desirable plant biomass and meet
the required productivity. Genomic selection is a method helpful in predicting favorable
genotypes that when crossed could breed hybrids of higher performance.
5.2 Methods
5.2.1 Phenotype collection
The phenotype data were collected for a total of 500 F1 hybrids (sporophytes) produced by
specific crosses of female and male gametophytes from the UWM germplasm collection. 5
replicates of each were planted. The farm design consisted of one male gametophyte collected
originally from Leo Carrillo was crossed with a total of 500 females gametophytes that were
initially collected from Arroyo Quemado (Santa Barbara Channel, N=50), Leo Carrilo (East from
Santa Barbar Channel, n=350), Wrigley (Catalina Island, N=50) and Carlsbad (northern San Diego
County, N=50). The genotypes were placed on the farm in Santa Barbara in March of 2019. The
89
planted F1s were a diploid sporophyte that was attached to a 4 inch, 1mm diameter piece of string.
The replicated sporophyte was attached to the replicate string.
Three agronomic traits that were measured in the farm are stipe number, blade number,
and plant biomass. Stipe number was measured by counting the total number of stipes in each
sporophyte, number of blades were measured by counting the total number of blades on each
individual cross. Plant biomass was measured by blade number and stipe number.
5.2.2 DNA sequencing and SNP calling
Sequenced Illumina reads were mapped to the M. pyrifera draft genome (specifically the
210416_CI_03 draft genome) using hisat2 with default mapping parameters . The best practices
GATK4 pipeline was used for calling and filtering SNPs [137]. Initial filtering kept sites with a
minQ score > 30 and a minmeanDP of 5. Further filtering by max-missing rate for genotype kept
sites that were present in 95% of the individuals. The final SNP set was then pruned to keep sites
with a minor allele frequency >0.03.
5.2.3 GWAS
The Fixed and random model Circulating Probability Unification (FarmCPU) method
implemented in R was used to perform GWAS [28], [29] .
The FARMCPU method is a Multiple Loci Linear Mixed Model (MLMM) divided into
two parts: Fixed Effect Model (FEM) and a Random Effect Model (REM) and they are used
90
iteratively. In FEM markers are tested one at a time, and false positives are controlled by using
multiple associated markers as covariates. The overfitting problem in FEM is circumvented in
REM by defining the kinship using the associated markers. At each iteration, the P values are
unified for both testing markers and the associated markers.
In FEM, each of the tests markers are tested one at a time and a set of pseudo
Quantitative Trait Nucleotides (QTNs) are used as covariates. The model can be written as:
(1)
In Equation 1, yi is the phenotype observation of the ith individual; Mi1, Mi2,…, Mit
represnt the genotypes of t pseudo QTNs, initiated as an empty set; b1, b2, …, bj are the effects
of the corresponding pseudo QTNs; Sij is the genotype of the ith individual and jth genetic
marker; the corresponding effect of the jth genetic marker is dj; ei represents the residuals having
a distribution with zero mean and variance of .
In equation 1, each of the markers being tested receive a P value except for those that are
designated as pseudo QTNs are used as covariates. In the beginning, these pseudo QTN markers
are assigned P value of “NA" (Not Available). When each pseudo QTN is examined for each of
the testing markers, most significant P value replaces the “NA” for that pseudo QTN and it
becomes the P value of its corresponding marker.
The above step generates P value for every marker. The P values and the associated
marker map are used to update the selection of pseudo QTNs by using the SUPER algorithm[24]
in a REM as follow:
(2)
where yi and ei are same as in Eq (1) and ui is the total genetic effect of the i
th
individual.
The expectations of total genetic effects for the individuals’ are zeros. The variance and
91
covariance matrix of the individuals’ total genetic effects is , where unknown
genetic variance is and kinship obtained from the pseudo QTNs K .
The pseudo QTNs in the FEM, Eq (1) is replaced by the set of pseudo QTNs that
maximizes the likelihood of the REM, Eq (2) . When no change occurs in the estimated set of
pseudo QTNs occurs, this iteration stops [29].
Genome wide significance threshold for the P values generated for each of the markers
was calculated and set using the Bonferroni test threshold set at 0.01.
5.2.4 Genome Selection
The genome selection model was built using a two-step method. For the first step, we
implemented a genomic best linear unbiased prediction (gBLUP) method to make predictions for
plant biomass for each of the genotypes.
For the second step of selection process, we aimed to identify locis that can be fixed in
the population to achieve a higher plant biomass. For this, we identified all the significant
(SNPs) from the GWAS model (top 8 SNPs) and fit a stepwise regression model to test which of
these SNPs best predicted the average plant biomass data. We came up with 7 SNPs that best
predicted the phenotype and the final multiple regression model looks like:
average_plant_biomass ~ SNP1+SNP2+SNP3+SNP4+SNP5+SNP6+SNP7
(PGA_scaffold8__83_contigs__length_51047780_17740877 +
PGA_scaffold13__41_contigs__length_21129140_1435593 +
PGA_scaffold25__21_contigs__length_13377422_10710593 +
92
PGA_scaffold27__11_contigs__length_14080196_413956 +
PGA_scaffold29__94_contigs__length_14242823_2296212 +
PGA_scaffold29__94_contigs__length_14242823_5190537 +
PGA_scaffold29__94_contigs__length_14242823_5609425)
5.3 Results
5.3.1 Phenotype results
In 2019, a total of 500 genotypes of giant kelp were grown in a common garden farm
with 5 replicates for each of the genotype. Three main phenotypes, individual count, plant stipe
weight and plant weight were collected for all the 2500 plants. Plant biomass was calculated
from stipe and blade weight. Figure 5.1 represents a comprehensive picture of the relationship
between the four traits. The data has some outliers but for the most part its normally distributed.
The assumptions of normality can hold true considering the sample size of the data. The
phenotype data was highly variable and noisy across the replicates of a single genotype. The
survival of the genotypes was also variable across the replicates (Figure 5.2)
93
Figure 5.1: A summary plot showing the correlation of the four traits collected for giant kelp
sporophytes: Blade weight, Stipe weight, plant biomass, and individual count of blades on the
sporophytes across 4 different populations. The first column of the plot also shows the total
number of each replicate (numbered from 1 to 5) collected across all the genotypes and column 2
shows a population-specific number of genotypes grown on the farm.
● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ●
● ● ● ● ● ●
● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
● ●
●
●
●
●
● ●
●
● ●
● ●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ●
●
●
● ● ●
●
● ● ● ●
●
●
● ● ●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
● ●
●
●
●
● ●
●
●
●
●
●
●
●
● ●
●
● ●
●
● ● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ●
●
●
●
●
● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
● ●
●
● ●
●
●
●
●
●
●
●
● ●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
● ●
●
●
● ●
●
●
● ●
●
●
●
●
●
●
●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
● ● ●
●
● ● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ●
● ●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
● ● ●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
● ● ● ●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
● ● ●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
● ●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
● ● ● ●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
● ● ● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ● ●
●
●
●
●
●
●
●
●
● ●
● ● ●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
● ●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ●
● ●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
● ●
●
●
●
●
●
●
● ● ●
●
●
●
●
●
● ● ●
●
●
●
●
● ●
● ●
●
●
●
●
● ●
●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
● ● ●
●
●
● ●
●
●
●
● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
● ●
●
●
● ● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
● ● ●
●
●
●
●
● ● ● ●
● ● ● ●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ● ●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
● ●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
● ●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
● ●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
● ●
●
●
●
●
● ●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ●
●
●
● ●
●
● ●
●
●
● ● ●
●
● ●
●
● ●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
● ● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
● ● ●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ● ●
● ●
●
●
●
●
●
●
●
● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
● ●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
● ●
●
●
●
● ● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
● ●
●
●
● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ●
●
●
●
● ●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ●
●
●
● ●
● ●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
● ● ● ●
●
● ●
● ●
●
●
●
●
●
●
●
●
●
● ● ● ●
●
● ●
●
●
● ●
●
●
●
●
● ●
●
●
●
● ●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
● ● ●
● ● ●
●
●
●
●
●
●
●
●
● ●
●
● ● ●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
● ●
●
●
●
● ●
●
●
●
●
●
● ●
● ●
● ● ●
●
●
●
●
●
●
●
● ●
●
●
●
●
● ●
●
●
●
●
● ● ●
● ●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
● ● ●
●
●
● ● ●
●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ●
●
●
●
●
●
●
● ●
● ●
● ●
●
●
●
●
● ● ●
●
●
●
●
●
●
●
●
● ●
● ●
●
●
●
● ●
● ● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
● ●
●
●
●
● ●
●
● ●
● ●
● ●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
● ●
●
● ●
●
●
●
●
●
●
●
●
●
● ● ●
●
●
●
●
●
●
●
● ● ● ●
●
●
●
●
●
●
●
●
●
●
●
● ● ●
●
●
●
●
●
●
●
● ●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
● ●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ●
●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
● ●
●
●
●
●
● ●
●
●
● ●
●
●
●
● ●
●
●
●
● ●
●
● ●
●
●
●
●
● ●
●
●
●
● ●
●
●
●
●
● ●
● ●
●
● ●
●
●
●
●
●
● ●
●
● ●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
● ●
●
● ●
●
● ●
● ● ●
●
●
●
●
●
●
●
●
●
●
● ●
● ●
●
●
●
●
●
● ● ●
● ●
●
●
● ●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ●
●
●
●
●
●
● ● ●
●
● ●
●
●
● ●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
● ●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
● ●
●
● ● ●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ●
●
●
●
●
●
●
●
●
● ●
●
● ●
●
●
●
●
●
●
● ●
●
●
● ●
●
●
●
● ●
●
● ●
● ●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ● ●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ● ●
●
●
●
●
● ● ●
●
●
● ●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
● ●
●
●
● ● ●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
● ●
● ●
●
●
● ●
●
●
●
●
●
● ●
● ●
●
●
●
●
●
● ●
●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
● ●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ●
●
●
●
● ●
●
● ●
●
●
●
● ●
●
●
●
● ●
●
●
● ● ●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
● ●
●
●
●
● ●
●
●
●
●
●
●
●
●
● ●
●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ●
●
● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
● ●
●
● ●
●
●
●
● ●
●
●
●
●
●
●
●
●
● ● ● ●
●
● ●
●
●
●
● ● ● ●
●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
● ● ●
●
●
●
●
● ●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
● ●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ●
●
●
● ●
●
●
●
● ●
●
●
●
● ●
●
●
●
● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
● ● ●
●
●
●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
● ●
●
●
●
●
●
●
●
● ●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ● ●
●
●
●
● ● ●
●
● ●
●
●
●
● ●
●
●
●
●
● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
● ●
●
●
● ● ●
●
●
●
●
●
●
●
● ●
● ● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ● ●
● ●
●
●
●
●
● ●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ●
● ● ● ●
●
●
●
● ●
●
●
●
●
●
●
●
● ● ● ●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ●
●
● ● ●
●
●
●
●
●
● ●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
● ●
●
●
●
●
●
●
● ●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ●
● ●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
● ● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
●
● ●
● ●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ●
●
● ●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
● ●
●
●
● ● ● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
● ●
● ●
● ●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
● ● ●
●
●
●
●
● ●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
● ● ●
●
●
●
●
●
● ●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ●
● ●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
● ● ●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
● ●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
● ● ●
●
●
● ●
●
● ●
●
●
● ● ●
●
● ●
●
● ●
●
●
● ● ●
●
●
● ●
●
●
●
●
●
● ●
●
● ●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
● ●
● ●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
● ●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
● ●
●
● ●
●
●
● ●
●
● ●
●
●
● ●
● ● ●
● ● ● ●
●
●
● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ●
● ● ● ● ● ●
● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
Corr:
0.885
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
● ●
●
●
● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ●
●
●
●
● ●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ●
●
●
● ●
● ●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
● ● ● ●
●
● ●
● ●
●
●
●
●
●
●
●
●
●
● ● ● ●
●
● ●
●
●
● ●
●
●
●
●
● ●
●
●
●
● ●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
● ● ●
● ● ●
●
●
●
●
●
●
●
●
● ●
●
● ● ●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
● ●
●
●
●
● ●
●
●
●
●
●
● ●
● ●
● ● ●
●
●
●
●
●
●
●
● ●
●
●
●
●
● ●
●
●
●
●
● ● ●
● ●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
● ● ●
●
●
● ● ●
●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ●
●
●
●
●
●
●
● ●
● ●
● ●
●
●
●
●
● ● ●
●
●
●
●
●
●
●
●
● ●
● ●
●
●
●
● ●
● ● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
● ●
●
●
●
● ●
●
● ●
● ●
● ●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
● ●
●
● ●
●
●
●
●
●
●
●
●
●
● ● ●
●
●
●
●
●
●
●
● ● ● ●
●
●
●
●
●
●
●
●
●
●
●
● ● ●
●
●
●
●
●
●
●
● ●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
● ●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ●
●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
● ●
●
●
●
●
● ●
●
●
● ●
●
●
●
● ●
●
●
●
● ●
●
● ●
●
●
●
●
● ●
●
●
●
● ●
●
●
●
●
● ●
● ●
●
● ●
●
●
●
●
●
● ●
●
● ●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
● ●
●
● ●
●
● ●
● ● ●
●
●
●
●
●
●
●
●
●
●
● ●
● ●
●
●
●
●
●
● ● ●
● ●
●
●
● ●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ●
●
●
●
●
●
● ● ●
●
● ●
●
●
● ●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
● ●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
● ●
●
● ● ●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ●
●
●
●
●
●
●
●
●
● ●
●
● ●
●
●
●
●
●
●
● ●
●
●
● ●
●
●
●
● ●
●
● ●
● ●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ● ●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ● ●
●
●
●
●
● ● ●
●
●
● ●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
● ●
●
●
● ● ●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
● ●
● ●
●
●
● ●
●
●
●
●
●
● ●
● ●
●
●
●
●
●
● ●
●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
● ●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ●
●
●
●
● ●
●
● ●
●
●
●
● ●
●
●
●
● ●
●
●
● ● ●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
● ●
●
●
●
● ●
●
●
●
●
●
●
●
●
● ●
●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ●
●
● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
● ●
●
● ●
●
●
●
● ●
●
●
●
●
●
●
●
●
● ● ● ●
●
● ●
●
●
●
● ● ● ●
●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
● ● ●
●
●
●
●
● ●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
● ●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ●
●
●
● ●
●
●
●
● ●
●
●
●
● ●
●
●
●
● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
● ● ●
●
●
●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
● ●
●
●
●
●
●
●
●
● ●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ● ●
●
●
●
● ● ●
●
● ●
●
●
●
● ●
●
●
●
●
● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
● ●
●
●
● ● ●
●
●
●
●
●
●
●
● ●
● ● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ● ●
● ●
●
●
●
●
● ●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ●
● ● ● ●
●
●
●
● ●
●
●
●
●
●
●
●
● ● ● ●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ●
●
● ● ●
●
●
●
●
●
● ●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
● ●
●
●
●
●
●
●
● ●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ●
● ●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
● ● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
●
● ●
● ●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ●
●
● ●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
● ●
●
●
● ● ● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
● ●
● ●
● ●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
● ● ●
●
●
●
●
● ●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
● ● ●
●
●
●
●
●
● ●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ●
● ●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
● ● ●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
● ●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
● ● ●
●
●
● ●
●
● ●
●
●
● ● ●
●
● ●
●
● ●
●
●
● ● ●
●
●
● ●
●
●
●
●
●
● ●
●
● ●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
● ●
● ●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
● ●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
● ●
●
● ●
●
●
● ●
●
● ●
●
●
● ●
● ● ●
● ● ● ●
●
●
● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ●
● ● ●
● ●
● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
Corr:
0.285
Corr:
0.206
●
● ●
●
●
●
● ●
●
●
●
●
● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
● ●
●
●
● ● ●
●
●
●
●
●
●
●
● ●
● ● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ● ●
● ●
●
●
●
●
● ●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ●
● ● ● ●
●
●
●
● ●
●
●
●
●
●
●
●
● ● ● ●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ●
●
● ● ●
●
●
●
●
●
● ●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
● ●
●
●
●
●
●
●
● ●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ●
● ●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
● ● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
●
● ●
● ●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ●
●
● ●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
● ●
●
●
● ● ● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
● ●
● ●
● ●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
● ● ●
●
●
●
●
● ●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
● ● ●
●
●
●
●
●
● ●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ●
● ●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
● ● ●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
● ●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
● ● ●
●
●
● ●
●
● ●
●
●
● ● ●
●
● ●
●
● ●
●
●
● ● ●
●
●
● ●
●
●
●
●
●
● ●
●
● ●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
● ●
● ●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
● ●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
● ●
●
● ●
●
●
● ●
●
● ●
●
●
● ●
● ● ●
● ● ● ●
●
●
● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ●
● ● ● ● ● ●
● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
Corr:
0.991
Corr:
0.939
Corr:
0.269
Replicate Population Blade_weight Stipe_weight Individual_count plant_biomass
Replicate Population Blade_weight Stipe_weight Individual_count plant_biomass
0 20 40 60 0 20 40 60 0 20 40 60 0 20 40 60 0 20 40 60 0 50100 150 2000 50100 150 2000 50100 150 2000 50100 150 200 0 200 400 600 800 0 50 100 150 200 0 10 20 30 0 250 500 750 1000
0
100
200
300
400
0
100
200
300
0
100
200
300
0
100
200
300
0
100
200
300
0
200
400
600
800
0
50
100
150
200
0
10
20
30
0
250
500
750
1000
94
We also looked at the survival of the replicates across all four populations. Genotypes
from Los Angeles (LC) had the lowest survival to death ratio of 1.78 and San Diego (CB) had it
highest with 3.828. The ratio was 2.37 for genotypes from Santa Barbara (AQ), 2.814 for
Channel Islands (CI) Figure (5.2).
a) B)
Figure 5.2: a) Illustration of one kelp plant. b) count of survival of plants across all four
populations. Survival ratio is lowest for Los Angeles population.
95
5.3.2 GWAS results
GWAS was conducted for three traits plant blade weight, plant Biomass, and plant stipe
weight for the kelp farm data collected in the summer of 2019. A total of 243 genotypes were
analyzed in the model. The SNP data was filtered for a call rate of 80% at a given site and for a
minor allele frequency of 3% with 130363 SNPs remaining as an input into the GWAS.
Following is the qq plot and Manhattan plot showing significant hits associated with
average blade weight. GWAS identifies putative locis that are significantly associated with all
the three traits. Once the genome annotation is complete, these results will need to be followed
up with identifying genes in or around these significant SNPs to understand the genetic basis of
the three traits, specifically plant biomass.
96
Figure 5.3: GWAS results of blade weight, plant biomass, and stipe weight. On left is the
manhattan plot showing markers on all the scaffolds (on x-axis) tested in the GWAS model. -
log10(p) value associated with each marker is plotted (on Y-axis) with a solid line showing
genome-wide significance for the P values using Bonferroni correction. Right: Quantile Quantile
(QQ) plot showing the deviation of the markers tested (in blue) against the null (red line) in the
GWAS model There are multiple genomic loci significantly associated with plant balde weight.
97
On looking at the most significant loci associated with average plant biomass, the
homozygous A allele has significantly higher plant biomass when compared to a homozygous G
allele.
Figure 5.4: Violin plot showing the distribution of the allele at the most significant SNP. X-axis
shows two alleles. Y-axis the
5.3.3 Genome selection results
To select giant kelp genotypes and further the breeding program to increase plant
biomass in giant kelp, we performed a multiple-step genome selection method. The genome
selection method predicted a multifold increase in the progeny post-selection. Figure 5.5 shows
the correlation of predicted plant biomass by the genome selection model on the x-axis and
actual plant biomass value collected by genotypes in the farm on the y axis.
98
Figure 5.5: Scatter plot representing the correlation between average plant biomass collected
from the empirical / farm data from 2019 for each of the genotypes and predicted values for the
same trait using gBLUP. The highlighted red dot indicates the average plant biomass calculated
from the multiple regression model after fixing those 7 SNPs in the population.
5.4 Discussion
Domesticating kelp is essential for meeting the growing demand for macroalgae. This
project is aiming to domesticate kelp by first laying a foundation for understanding genes
governing plant biomass, the most desirable trait for kelp domestication. The common garden
setup to phenotype the plants is a common method utilized in the terrestrial system.
Phenotypes collected showed a high degree of variation between replicates. This
understanding provided us with insights to increase the replicate number in the future
phenotyping experiments. Survival of the genotypes was also another factor in the phenotyping
99
experiment. Date of planting of the sporophytes in the farm was an important factor in dictating
the survival of the plants on the farm.
This study utilized genomic models that are building blocks for the domestication
process. While phenotype-based selection is a time taking process and it takes many years of
careful breeding to establish robust traits that can be consistently produced over multiple
generations. The genomic models will benefit greatly from replicate phenotyping across multiple
years as it will help understand the gene by environment interaction of the desirable traits like
plant biomass.
5.5 Conclusion
GWAS and GS are the two most commonly and utilized new age methods for selection of traits
and parents that can breed superior crosses over generations. In this project, we performed
GWAS to identify gen genomic regions associated with three important giant kelp phenotypes,
stipe weight, blade weight, and biomass. We have identified putative loci that shows a strong
association with the trait. We also performed genome selection to identify genomic loci that
when bred into the population would yield progeny with double biomass.
100
Chapter 6: Conclusion
This dissertation presents an overview of phenotypes that have helped domestication in
terrestrial crops like chickpeas. Genetic underpinnings that have shaped the genomic landscape
in the elite cultivars have also eroded its genetic diversity over time. The chickpea team has
taken a step in the direction of expanding the existing genetic diversity in the domesticated
chickpeas by introducing the wild material into them using the popular method of nested
association mapping. With the current work, we understand the genetic locis that govern the
desired agronomic trait.
The knowledge of domestication on land can be extended to the marine system and it can
provide a good framework for domestication of agronomically important macroalgal species like
M. pyrifera a.k.a giant kelp. The GWAS and GS work presented in chapter 5 is an attempt in the
direction of domesticating kelp and providing genomic resources for kelp domesticate.
101
Reference List
[1] “J. Diamond, Evolution, consequences and future of plant and animal domestication, Nature,
418 (2002), pp. 700-707”.
[2] E. J. von Wettberg et al., “Ecology and genomics of an important crop wild relative as a
prelude to agricultural innovation,” Nat Commun, vol. 9, 2018, doi: 10.1038/s41467-018-
02867-z.
[3] H. Dempewolf, R. J. Eastwood, L. Guarino, C. K. Khoury, J. V. Müller, and J. Toll,
“Adapting Agriculture to Climate Change: A Global Initiative to Collect, Conserve, and
Use Crop Wild Relatives,” Agroecol. Sustain. Food Syst., vol. 38, no. 4, pp. 369–377, Apr.
2014, doi: 10.1080/21683565.2013.870629.
[4] J. P. van der Meer, “The Domestication of Seaweeds,” BioScience, vol. 33, no. 3, pp. 172–
176, 1983, doi: 10.2307/1309270.
[5] R. Loureiro, C. Gachon, and C. Rebours, “Seaweed cultivation: Potential and challenges of
crop domestication at an unprecedented pace,” New Phytol., vol. 206, Jan. 2015, doi:
10.1111/nph.13278.
[6] J. Spindel et al., “Genomic Selection and Association Mapping in Rice (Oryza sativa): Effect
of Trait Genetic Architecture, Training Population Composition, Marker Number and
Statistical Model on Accuracy of Rice Genomic Selection in Elite, Tropical Rice Breeding
Lines,” PLOS Genet., vol. 11, no. 2, p. e1004982, Feb. 2015, doi:
10.1371/journal.pgen.1004982.
[7] T. H. Meuwissen, B. J. Hayes, and M. E. Goddard, “Prediction of total genetic value using
genome-wide dense marker maps.,” Genetics, vol. 157, no. 4, pp. 1819–1829, Apr. 2001,
doi: 10.1093/genetics/157.4.1819.
[8] “Gaur, P. M., Samineni, S., Tripathi, S., Varshney, R. K., & Gowda, C. L. (2014). Allelic
relationships of flowering time genes in chickpea. Euphytica, 203(2), 295-308.
doi:10.1007/s10681-014-1261-7”.
[9] S. Ridge et al., “The Chickpea Early Flowering 1 (Efl1) Locus Is an Ortholog of Arabidopsis
ELF3,” Plant Physiol., vol. 175, no. 2, pp. 802–815, Oct. 2017, doi: 10.1104/pp.17.00082.
[10] V. Hegde, “Genetics of flowering time in chickpea in a semi‐arid environment,” Plant
Breed., vol. 129, pp. 683–687, Jan. 2010, doi: 10.1111/j.1439-0523.2009.01748.x.
[11] J. Ma, L. U. Wingen, S. Orford, P. Fenwick, J. Wang, and S. Griffiths, “Using the UK
reference population Avalon × Cadenza as a platform to compare breeding strategies in
elite Western European bread wheat,” Mol. Breed., vol. 35, no. 2, p. 70, Feb. 2015, doi:
10.1007/s11032-015-0268-7.
102
[12] R. Bernardo, “Molecular Markers and Selection for Complex Traits in Plants: Learning
from the Last 20 Years,” Crop Sci. - CROP SCI, vol. 48, Sep. 2008, doi:
10.2135/cropsci2008.03.0131.
[13] O. Ladejobi et al., “Maximizing the potential of multi-parental crop populations,” Appl.
Transl. Genomics, vol. 11, pp. 9–17, Dec. 2016, doi: 10.1016/j.atg.2016.10.002.
[14] J. L. Gage, B. Monier, A. Giri, and E. S. Buckler, “Ten Years of the Maize Nested
Association Mapping Population: Impact, Limitations, and Future Directions[OPEN],”
Plant Cell, vol. 32, no. 7, pp. 2083–2093, Jul. 2020, doi: 10.1105/tpc.19.00951.
[15] M. D. McMullen et al., “Genetic properties of the maize nested association mapping
population.,” Science, vol. 325, no. 5941, pp. 737–740, Aug. 2009, doi:
10.1126/science.1174320.
[16] “Hajjar, R. & Hodgkin, T. The use of wild relatives in crop improvement: a survey of
developments over the last 20 years. Euphytica 156, 1–13 (2007)”.
[17] S. McCouch et al., “Feeding the future,” Nature, vol. 499, no. 7456, pp. 23–24, Jul. 2013,
doi: 10.1038/499023a.
[18] “U.S. Department of Health and Human Services and U.S. Department of Agriculture. 2015
– 2020 Dietary Guidelines for Americans. 8th Edition. December 2015. Available at
https://health.gov/our-work/food-nutrition/previous-dietary-guidelines/2015.”
[19] T. C. Wallace, R. Murray, and K. M. Zelman, “The Nutritional Value and Health Benefits
of Chickpeas and Hummus,” Nutrients, vol. 8, no. 12, p. 766, Nov. 2016, doi:
10.3390/nu8120766.
[20] “Zohary D, Hopf M, Weiss E. 2012. Domestication of plants in the Old World: the origin
and spread of domesticated plants in Southwest Asia, Europe, and the Mediterranean Basin,
4th edn. Oxford, UK: Oxford University Press”.
[21] J. Berger, S. Milroy, N. Turner, K. Siddique, M. Imtiaz, and R. Malhotra, “Chickpea
evolution has selected for contrasting phenological mechanisms among different habitats,”
Euphytica, vol. 180, 2011, doi: 10.1007/s10681-011-0391-4.
[22] J. T. Anderson, J. H. Willis, and T. Mitchell-Olds, “Evolutionary genetics of plant
adaptation.,” Trends Genet. TIG, vol. 27, no. 7, pp. 258–266, Jul. 2011, doi:
10.1016/j.tig.2011.04.001.
[23] S. Abbo, J. Berger, and N. Turner, “Evolution of cultivated chickpea: Four bottlenecks limit
diversity and constrain adaptation,” Funct Plant Biol, vol. 30, 2003, doi: 10.1071/FP03084.
[24] Roorkiwal M, von Wettberg EJ, Upadhyaya HD, Warschefsky E, Rathore A, Varshney RK,
“Exploring Germplasm Diversity to Understand the Domestication Process in Cicer spp.
Using SNP and DArT Markers,” PLoS ONE, 2014, doi: 9(7): e102016.
103
[25] “Berger, J. et al. Ecogeography of Annual Wild Species. Crop. Sci. 43, 1076–1090 (2003)”.
[26] X. Zheng, D. Levine, J. Shen, S. M. Gogarten, C. Laurie, and B. S. Weir, “A high-
performance computing toolset for relatedness and principal component analysis of SNP
data,” Bioinforma. Oxf. Engl., vol. 28, no. 24, pp. 3326–3328, Dec. 2012, doi:
10.1093/bioinformatics/bts606.
[27] Brian G Peterson, Peter Carl, Kris Boudt, Ross Bennett, Joshua Ulrich, Eric Zivot, M
Lestel, K Balkissoon, D Wuertz, “PerformanceAnalytics: Econometric tools for
performance and risk analysis,” R Package Version, vol. 1, no. 3, 2014.
[28] A. E. Lipka et al., “GAPIT: genome association and prediction integrated tool,”
Bioinformatics, vol. 28, no. 18, pp. 2397–2399, Sep. 2012, doi:
10.1093/bioinformatics/bts444.
[29] Xiaolei Liu, Meng Huang, Bin Fan, Edward S. Buckler, Zhiwu Zhang, “Iterative Usage of
Fixed and Random Effect Models for Powerful and Efficient Genome-Wide Association
Studies,” PloS Genet., Feb. 2016, doi: https://doi.org/10.1371/journal.pgen.1005767.
[30] K. Kozlov et al., “Non-linear regression models for time to flowering in wild chickpea
combine genetic and climatic factors,” BMC Plant Biol., vol. 19, no. 2, p. 94, Mar. 2019,
doi: 10.1186/s12870-019-1685-2.
[31] E. H. Roberts, P. Hadley, and R. J. Summerfield, “Effects of temperature and photoperiod
on flowering in chickpeas (Cicer arietinum L),” Ann Bot, vol. 55, 1985, doi:
10.1093/oxfordjournals.aob.a086969.
[32] J. B. Smithson, J. A. Thompson, and R. J. Summerfield, “Chickpea (Cicer arietinum L),” in
Grain Legume Crops, R. J. Summerfield and R. E. Roberts, Eds. London: Collins, 1985.
[33] P. Singh and S. M. Virmani, “Modelling growth and yield of chickpea (Cicer arietinum L),”
Field Crop Res, vol. 46, 1996, doi: 10.1016/0378-4290(95)00085-2.
[34] H. D. Upadhyaya et al., “A genome-scale integrated approach aids in genetic dissection of
complex flowering time trait in chickpea,” Plant Mol Biol, vol. 89, 2015, doi:
10.1007/s11103-015-0377-z.
[35] R. H. Ellis et al., “Towards the reliable prediction of time to flowering in six annual crops.
v. chickpea (Cicer arietinum),” Exp Agric, vol. 30, 1994, doi:
10.1017/S0014479700024376.
[36] V. Vadez, A. Soltani, and T. R. Sinclair, “Crop simulation analysis of phenological
adaptation of chickpea to different latitudes of India,” Field Crops Res, vol. 146, 2013, doi:
10.1016/j.fcr.2013.03.005.
[37] A. Soltani, G. Hammer, B. Torabi, M. Robertson, and E. Zeinali, “Modeling chickpea
growth and development: Phenological development,” Field Crops Res, vol. 99, 2006, doi:
10.1016/j.fcr.2006.02.004.
104
[38] V. Vadez, A. Soltani, and T. R. Sinclair, “Modelling possible benefits of root related traits
to enhance terminal drought adaptation of chickpea,” Field Crops Res, vol. 137, 2012, doi:
10.1016/j.fcr.2012.07.022.
[39] A. Soltani, M. Robertson, Y. Mohammad-Nejad, and A. Rahemi-Karizaki, “Modeling
chickpea growth and development: Leaf production and senescence,” Field Crops Res, vol.
99, 2006, doi: 10.1016/j.fcr.2006.02.005.
[40] J. Jones et al., “Toward a new generation of agricultural system data, models, and
knowledge products: State of agricultural systems science,” Agric Syst, vol. 155, 2017, doi:
10.1016/j.agsy.2016.09.021.
[41] “Jones J, Antle J, Basso B, J Boote K, T Conant R, Foster I, Charles J Godfray H, Herrero
M, E Howitt R, Janssen S, Keating B, Muñoz-Carpena R, Porter C, Rosenzweig C, R.
Wheeler T. Brief history of agricultural systems modeling. 2016; 155:240–254.”.
[42] J. Jones et al., “The DSSAT cropping system model,” Eur J Agron, vol. 18, 2003, doi:
10.1016/S1161-0301(02)00107-7.
[43] K. J. Boote, J. Jones, and N. Pickering, “Potential uses and limitations of crop models,”
Agron J, vol. 88, 1996, doi: 10.2134/agronj1996.00021962008800050005x.
[44] K. J. Boote, J. Jones, J. W. White, S. Asseng, and J. I. Lizaso, “Putting mechanisms into
crop production models,” Plant Cell Env., vol. 36, 2013, doi: 10.1111/pce.12119.
[45] B. Keating et al., “An overview of APSIM, a model designed for farming systems
simulation,” Eur J Agron, vol. 18, 2003, doi: 10.1016/S1161-0301(02)00108-9.
[46] “Battisti R, Sentelhas PC, Boote KJ. Sensitivity and requirement of improvements of four
soybean crop simulation models for climate change studies in Southern Brazil. Int J
Biometeorol. 2017. https://doi.org/10.1007/s00484-017-1483-1.”, [Online]. Available:
https://doi.org/10.1007/s00484-017-1483-1
[47] J. R. Williams, C. A. Jones, J. R. Kiniry, and D. A. Spanel, “The EPIC crop growth model,”
Trans ASAE, vol. 32, 1989, doi: 10.13031/2013.31032.
[48] G. G. Wilkerson, J. Jones, K. J. Boote, K. T. Ingram, and J. W. Mishoe, “Modeling soybean
growth for crop management,” Trans Am Soc Agric Eng, vol. 26, 1983, doi:
10.13031/2013.33877.
[49] M. Roorkiwal et al., “Genome-enabled prediction models for yield related traits in
chickpea,” Front Plant Sci, vol. 7, 2016, doi: 10.3389/fpls.2016.01666.
[50] G. Hoogenboom, J. W. White, J. Jones, and K. J. Boote, “Beangro: A process-oriented dry
bean model with a versatile user interface,” Agon J, vol. 86, 1994.
[51] J. Jones, B. Keating, and C. Porter, “Approaches to modular model development,” Agric
Syst, vol. 70, 2001, doi: 10.1016/S0308-521X(01)00054-3.
105
[52] A. Wajid et al., “Simulating the interactive impact of nitrogen and promising cultivars on
yield of lentil (Lens culinaris) using CROPGRO-legume model,” Int J Agric Biol, vol. 15,
2013.
[53] M. N. Ilkaee et al., “Simulation of some of important traits in chickpea cultivars under
different sowing date using cropgro-pea model,” Int J Biosci, vol. 4, 2014.
[54] A. Soltani and T. R. Sinclair, “A simple model for chickpea development, growth and
yield,” Field Crops Res, vol. 124, 2011, doi: 10.1016/j.fcr.2011.06.021.
[55] M. Lal, K. K. Singh, G. Srinivasan, L. S. Rathore, D. Naidu, and C. N. Tripathi, “Growth
and yield responses of soybean in Madhya Pradesh, India to climate variability and
change,” Agric Meteorol, vol. 93, 1999, doi: 10.1016/S0168-1923(98)00105-1.
[56] U. Chung, Y. Kim, B. Seo, and M. Seo, “Evaluation of variation and uncertainty in the
potential yield of soybeans in South Korea using multi-model ensemble climate change
scenarios,” Agrotechnology, vol. 6, 2017.
[57] A. Mohammed, T. Tana, P. Singh, A. Molla, and A. Seid, “Identifying best crop
management practices for chickpea (Cicer arietinum L,) in northeastern Ethiopia under
climate change condition,” Agric Water Manag, vol. 194, 2017, doi:
10.1016/j.agwat.2017.08.022.
[58] D. D. Patil and H. R. Patel, “Calibration and validation of CROPGRO (DSSAT 4.6) model
for chickpea under middle GUJARAT agroclimatic region,” Int J Agric Sci, vol. 9, 2017.
[59] M. Urgaya, “Modeling the impacts of climate change on chickpea production in Adaa
Woreda (East Showa zone) in the semi-arid central rift valley of Ethiopia,” J Pet Env.
Biotechnol, vol. 7, 2016.
[60] S. U. Bhosale et al., “Association analysis of photoperiodic flowering time genes in west
and central African sorghum [Sorghum bicolor (L,) moench],” BMC Plant Biol, vol. 12,
2012, doi: 10.1186/1471-2229-12-32.
[61] A. Visioni et al., “Genome-wide association mapping of frost tolerance in barley (Hordeum
vulgare L),” BMC Genomics, vol. 14, 2013, doi: 10.1186/1471-2164-14-424.
[62] F. Tian et al., “Genome-wide association study of leaf architecture in the maize nested
association mapping population,” Nat Genet, vol. 43, 2011, doi: 10.1038/ng.746.
[63] K. L. Kump et al., “Genome-wide association study of quantitative resistance to southern
leaf blight in the maize nested association mapping population,” Nat Genet, vol. 43, 2011,
doi: 10.1038/ng.747.
[64] W. Yang et al., “Combining high-throughput phenotyping and genome-wide association
studies to reveal natural genetic variation in rice,” Nat Commun, vol. 5, 2014, doi:
10.1038/ncomms6087.
106
[65] W. B. Suwarno, K. V. Pixley, N. Palacios-Rojas, S. M. Kaeppler, and R. Babu, “Genome-
wide association analysis reveals new targets for carotenoid biofortification in maize,”
Theor Appl Genet, vol. 128, 2015, doi: 10.1007/s00122-015-2475-3.
[66] “Lasky JR, Upadhyaya HD, Ramu P, Deshpande S, Hash CT, Bonnette J, Juenger TE,
Hyma K, Acharya C, Mitchell SE, Buckler ES, Brenton Z, Kresovich S, Morris GP.
Genome-environment associations in sorghum landraces predict adaptive traits. Sci Adv.
2015; 1(6). https://doi.org/10.1126/sciadv.1400218.
http://advances.sciencemag.org/content/1/6/e1400218.full.pdf.”, [Online]. Available:
http://advances.sciencemag.org/content/1/6/e1400218.full.pdf
[67] C. Hwang et al., “Next generation crop models: A modular approach to model early
vegetative and reproductive development of the common bean (Phaseolus vulgaris L),”
Agric Syst, vol. 155, 2017, doi: 10.1016/j.agsy.2016.10.010.
[68] J. Hatfield and C. Walthall, “Meeting global food needs: Realizing the potential via
genetics x environment x management interactions,” Agron J, vol. 107, 2015, doi:
10.2134/agronj15.0076.
[69] F. Tardieu and R. Tuberosa, “Dissection and modelling of abiotic stress tolerance in
plants,” Curr Opin Plant Biol, vol. 13, 2010, doi: 10.1016/j.pbi.2009.12.012.
[70] “NNDC. Climate Data On-line.
https://www7.ncdc.noaa.gov/CDO/cdoselect.cmd?datasetabbv=GSOD. Accessed 30 Dec
2017.”, [Online]. Available:
https://www7.ncdc.noaa.gov/CDO/cdoselect.cmd?datasetabbv=GSOD
[71] G. L. Hammer et al., “Genotype-by-environment interaction in grain sorghum. II, Effects of
temperature and photoperiod on ontogeny,” Crop Sci, vol. 29, 1989, doi:
10.2135/cropsci1989.0011183X002900020029x.
[72] T. Horie, “Crop ontogeny and development,” in Physiology and Determination of Crop
Yield, K. J. Boote, J. M. Bennett, T. R. Sinclair, and G. M. Paulsen, Eds. Madison, USA:
ASA, CSSA, and SSSA, 1994.
[73] E. L. Piper, K. J. Boote, J. Jones, and S. S. Grimm, “Comparison of two phenology models
for predicting flowering and maturity date of soybean,” Crop Sci, vol. 36, 1996, doi:
10.2135/cropsci1996.0011183X003600060033x.
[74] X. Yin et al., “A model for photothermal responses of flowering in rice. i. model
description and parameterization,” Field Crops Res, vol. 51, 1997, doi: 10.1016/S0378-
4290(96)03456-9.
[75] M. J. Robertson et al., “Simulation of growth and development of diverse legume species in
apsim,” Aust J Agric Res, vol. 53, 2002, doi: 10.1071/AR01106.
[76] M. J. Robertson et al., “Environmental and genotypic control of time to flowering in canola
and Indian mustard,” Aust J Agric Res, vol. 53, 2002, doi: 10.1071/AR01182.
107
[77] D. J. Major, D. R. Johnson, J. W. Tanner, and I. C. Anderson, “Effects of daylength and
temperature on soybean development,” Crop Sci, vol. 15, 1975, doi:
10.2135/cropsci1975.0011183X001500020009x.
[78] T. Hodges and V. French, “Soybean: soybean stages modeled from temperature, daylenth
and water availability,” Agron J, vol. 77, 1985, doi:
10.2134/agronj1985.00021962007700030031x.
[79] M. O’Neill and C. Ryan, “Grammatical evolution,” EE Trans Evol Comput, vol. 5, 2001,
doi: 10.1109/4235.942529.
[80] F. Noorian, A. de Silva, and P. Leong, “gramEvol: Grammatical Evolution in R,” J Stat
Softw Artic, vol. 71, 2016.
[81] R. Tibshirani, “Regression shrinkage and selection via the LASSO,” J R Stat Soc Ser B, vol.
58, 1996.
[82] K. Kozlov and A. Samsonov, “DEEP – Differential Evolution Entirely Parallel Method for
Gene Regulatory Networks,” J Supercomput, vol. 57, 2011, doi: 10.1007/s11227-010-0390-
6.
[83] K. Kozlov, A. M. Samsonov, and M. Samsonova, “A software for parameter optimization
with differential evolution entirely parallel method,” PeerJ Comput Sci, vol. 2, 2016, doi:
10.7717/peerj-cs.74.
[84] K. Kozlov, L. Y. Novikova, I. V. Seferova, and M. G. Samsonova, “Mathematical model of
soybean development dependence on climatic factors,” Biofizika, vol. 63, 2018.
[85] “Storn R, Price K. Differential evolution – a simple and efficient heuristic for global
optimization over continuous spaces. Technical Report Technical Report TR-95-012, ICSI.
1995.”.
[86] D. Zaharie, “Parameter adaptation in differential evolution by controlling the population
diversity,” in Proc. of 4th InternationalWorkshop on Symbolic and Numeric Algorithms for
Scientific Computing, D. Petcu, Ed. Timisoara, Romania: Analele Universitatii Timisoara,
2002.
[87] R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation
for Statistical Computing, 2018.
[88] K. C. S. Pillai, “Regression shrinkage and selection via the LASSO,” Ann Math Stat, vol.
26, 1955, doi: 10.1214/aoms/1177728599.
[89] “Peter Harrington. Genetic Programming C++ Code. https://github.com/pbharrin/Genetic-
Prog. Accessed 30 Dec 2017.”, [Online]. Available: https://github.com/pbharrin/Genetic-
Prog
108
[90] C. Sanderson and R. Curtin, “Armadillo: a template-based C++ library for linear algebra,” J
Open Source Soft, vol. 1, 2016, doi: 10.21105/joss.00026.
[91] R. R. Curtin et al., “mlpack: A scalable C++ machine learning library,” J Mach Learn Res,
vol. 14, 2013.
[92] “The HDF Group. Hierarchical Data Format, Version 5. http://www.hdfgroup.org/HDF5/.
Accessed 30 Dec 2017.”, [Online]. Available: http://www.hdfgroup.org/HDF5/
[93] “The Blue Brain Project. HighFive - Header-only C++ HDF5 Interface.
https://github.com/pbharrin/Genetic-Prog.”, [Online]. Available:
https://github.com/pbharrin/Genetic-Prog
[94] “The Qt Company. Qt Library. https://www.qt.io/. Accessed 30 Dec 2017.”, [Online].
Available: https://www.qt.io/
[95] “Kozlov K. NLREG. https://gitlab.com/mackoel/nlreg. Accessed 30 Dec 2017.”, [Online].
Available: https://gitlab.com/mackoel/nlreg
[96] P. J. Bradbury, Z. Zhang, D. E. Kroon, T. M. Casstevens, Y. Ramdoss, and E. S. Buckler,
“TASSEL: software for association mapping of complex traits in diverse samples,”
Bioinformatics, vol. 23, 2007, doi: 10.1093/bioinformatics/btm308.
[97] J. Berger, D. Shrestha, and C. Ludwig, “Reproductive Strategies in Mediterranean
Legumes: Trade-Offs between Phenology, Seed Size and Vigor within and between Wild
and Domesticated Lupinus Species Collected along Aridity Gradients,” Front Plant Sci,
vol. 8, 2017, doi: 10.3389/fpls.2017.00548.
[98] J. Kumar and S. Abbo, “Genetics of flowering time in chickpea and its bearing on
productivity in semiarid environments,” in Advances in Agronomy, vol. 72, Academic
Press, 2001, pp. 107–138. doi: 10.1016/S0065-2113(01)72012-3.
[99] Daniel Zohary,Maria Hopf,Ehud Weiss, “The Origin and Spread of Domesticated Plants in
Southwest Asia, Europe, and the Mediterranean Basin,” in Domestication of Plants in the
Old World, [Online]. Available:
https://oxford.universitypressscholarship.com/view/10.1093/acprof:osobl/9780199549061.0
01.0001/acprof-9780199549061
[100] Q. Chen et al., “TeoNAM: A Nested Association Mapping Population for Domestication
and Agronomic Trait Analysis in Maize,” Genetics, vol. 213, no. 3, pp. 1065–1078, Nov.
2019, doi: 10.1534/genetics.119.302594.
[101] C. A. Fragoso et al., “Genetic Architecture of a Rice Nested Association Mapping
Population,” G3 GenesGenomesGenetics, vol. 7, no. 6, pp. 1913–1926, Jun. 2017, doi:
10.1534/g3.117.041608.
109
[102] K. W. Jordan et al., “The genetic architecture of genome-wide recombination rate
variation in allopolyploid wheat revealed by nested association mapping.,” Plant J. Cell
Mol. Biol., vol. 95, no. 6, pp. 1039–1054, Sep. 2018, doi: 10.1111/tpj.14009.
[103] A. Xavier et al., “Genome-Wide Analysis of Grain Yield Stability and Environmental
Interactions in a Multiparental Soybean Population,” G3 GenesGenomesGenetics, vol. 8,
no. 2, pp. 519–529, Feb. 2018, doi: 10.1534/g3.117.300300.
[104] A. Maurer et al., “Modelling the genetic architecture of flowering time control in barley
through nested association mapping,” BMC Genomics, vol. 16, no. 1, p. 290, Apr. 2015,
doi: 10.1186/s12864-015-1459-7.
[105] L. M. Nice et al., “Development and Genetic Characterization of an Advanced
Backcross-Nested Association Mapping (AB-NAM) Population of Wild × Cultivated
Barley,” Genetics, vol. 203, no. 3, pp. 1453–1467, Jul. 2016, doi:
10.1534/genetics.116.190736.
[106] S. Bouchet et al., “Increased Power To Dissect Adaptive Traits in Global Sorghum
Diversity Using a Nested Association Mapping Population,” Genetics, vol. 206, no. 2, pp.
573–585, Jun. 2017, doi: 10.1534/genetics.116.198499.
[107] Kumar J, Rao BV, “Super early chickpea developed at ICRISAT Asia center. Int.,”
Chickpea Pigeonpea Newsl 3:17 –18, 1996.
[108] P. Gil et al., “BIG: a calossin-like protein required for polar auxin transport in
Arabidopsis,” Genes Dev., vol. 15, no. 15, pp. 1985–1997, Aug. 2001, doi:
10.1101/gad.905201.
[109] L. Liu et al., “Activation of Big Grain1 significantly improves grain size by regulating
auxin transport in rice,” Proc. Natl. Acad. Sci., vol. 112, no. 35, pp. 11102–11107, 2015,
doi: 10.1073/pnas.1512748112.
[110] B. S. Mishra, M. Jamsheer K, D. Singh, M. Sharma, and A. Laxmi, “Genome-Wide
Identification and Expression, Protein–Protein Interaction and Evolutionary Analysis of the
Seed Plant-Specific BIG GRAIN and BIG GRAIN LIKE Gene Family,” Front. Plant Sci.,
vol. 8, 2017, doi: 10.3389/fpls.2017.01812.
[111] S.-F. Lo et al., “Rice Big Grain 1 promotes cell division to enhance organ development,
stress tolerance and grain yield,” Plant Biotechnol. J., vol. 18, no. 9, pp. 1969–1983, Sep.
2020, doi: 10.1111/pbi.13357.
[112] M. J. Milner, S. Bowden, M. Craze, and E. J. Wallington, “Ectopic expression of TaBG1
increases seed size and alters nutritional characteristics of the grain in wheat but does not
lead to increased yields,” BMC Plant Biol., vol. 21, no. 1, p. 524, Nov. 2021, doi:
10.1186/s12870-021-03294-x.
110
[113] D. Aguilar-Benitez, J. Rubio, T. Millán, J. Gil, J. V. Die, and P. Castro, “Genetic analysis
reveals PDH1 as a candidate gene for control of pod dehiscence in chickpea,” Mol. Breed.,
vol. 40, no. 4, p. 40, Apr. 2020, doi: 10.1007/s11032-020-01117-9.
[114] X. Ma et al., “Functional characterization of soybean (Glycine max) DIRIGENT genes
reveals an important role of GmDIR27 in the regulation of pod dehiscence,” Genomics, vol.
113, no. 1, Part 2, pp. 979–990, Jan. 2021, doi: 10.1016/j.ygeno.2020.10.033.
[115] D. Rau et al., “Genomic dissection of pod shattering in common bean: mutations at
nonorthologous loci at the basis of convergent phenotypic evolution under domestication of
leguminous species,” Plant J., vol. 97, Nov. 2018, doi: 10.1111/tpj.14155.
[116] T. A. Parker, L. L. de Sousa, T. de Oliveira Floriani, A. Palkovic, and P. Gepts, “Toward
the introgression of PvPdh1 for increased resistance to pod shattering in common bean,”
Theor. Appl. Genet., vol. 134, no. 1, pp. 313–325, Jan. 2021, doi: 10.1007/s00122-020-
03698-7.
[117] V. Burlat, M. Kwon, L. B. Davin, and N. G. Lewis, “Dirigent proteins and dirigent sites
in lignifying tissues,” Phytochemistry, vol. 57, no. 6, pp. 883–897, Jul. 2001, doi:
10.1016/S0031-9422(01)00117-0.
[118] Z. Sheng et al., “Yellow-Leaf 1 encodes a magnesium-protoporphyrin IX monomethyl
ester cyclase, involved in chlorophyll biosynthesis in rice (Oryza sativa L.),” PLOS ONE,
vol. 12, no. 5, p. e0177989, May 2017, doi: 10.1371/journal.pone.0177989.
[119] H. Zhang et al., “Rice Chlorina-1 and Chlorina-9 encode ChlD and ChlI subunits of Mg-
chelatase, a key enzyme for chlorophyll synthesis and chloroplast development.,” Plant
Mol. Biol., vol. 62, no. 3, pp. 325–337, Oct. 2006, doi: 10.1007/s11103-006-9024-z.
[120] X. Wang, R. Huang, and R. Quan, “Mutation in Mg-Protoporphyrin IX Monomethyl
Ester Cyclase Decreases Photosynthesis Capacity in Rice,” PloS One, vol. 12, no. 1, pp.
e0171118–e0171118, Jan. 2017, doi: 10.1371/journal.pone.0171118.
[121] D. Chen et al., “The rice LRR-like1 protein YELLOW AND PREMATURE DWARF 1
is involved in leaf senescence induced by high light.,” J. Exp. Bot., vol. 72, no. 5, pp. 1589–
1605, Feb. 2021, doi: 10.1093/jxb/eraa532.
[122] J. Kumar and B. Rao, “Registration of ICCV 96029, a Super Early and Double Podded
Chickpea Germplasm,” Crop Sci. - CROP SCI, vol. 41, Mar. 2001, doi:
10.2135/cropsci2001.412605x.
[123] S. Kumar, G. Stecher, M. Li, C. Knyaz, and K. Tamura, “MEGA X: Molecular
Evolutionary Genetics Analysis across Computing Platforms.,” Mol. Biol. Evol., vol. 35,
no. 6, pp. 1547–1549, Jun. 2018, doi: 10.1093/molbev/msy096.
[124] J. Yang and Y. Zhang, “I-TASSER server: new development for protein structure and
function predictions,” Nucleic Acids Res., vol. 43, no. W1, pp. W174–W181, Jul. 2015, doi:
10.1093/nar/gkv342.
111
[125] J. Wang et al., “Reduced Expression of the SHORT-ROOT Gene Increases the Rates of
Growth and Development in Hybrid Poplar and Arabidopsis,” PLOS ONE, vol. 6, no. 12, p.
e28878, Dec. 2011, doi: 10.1371/journal.pone.0028878.
[126] Z. Wang, X. Wu, Q. Ren, X. Chang, and R. Li, “QTL mapping for developmental
behavior of plant height in wheat (Triticum aestivum L.),” Euphytica, vol. 174, pp. 447–
458, Aug. 2010, doi: 10.1007/s10681-010-0166-3.
[127] C. Li et al., “Mutation in Mg-Protoporphyrin IX Monomethyl Ester Cyclase Causes
Yellow and Spotted Leaf Phenotype in Rice,” Plant Mol. Biol. Report., vol. 37, Aug. 2019,
doi: 10.1007/s11105-019-01152-7.
[128] H. Funatsuki et al., “Molecular basis of a shattering resistance boosting global
dissemination of soybean,” Proc. Natl. Acad. Sci. U. S. A., vol. 111, no. 50, pp. 17797–
17802, Dec. 2014, doi: 10.1073/pnas.1417282111.
[129] A. Becker, M. Bey, T. R. Bürglin, H. Saedler, and G. Theissen, “Ancestry and diversity
of BEL1-like homeobox genes revealed by gymnosperm ( Gnetum gnemon) homologs.,”
Dev. Genes Evol., vol. 212, no. 9, pp. 452–457, Oct. 2002, doi: 10.1007/s00427-002-0259-
7.
[130] D. Jin, B. Li, X.-W. Deng, and N. Wei, “Plant COP9 signalosome subunit 5, CSN5.,”
Plant Sci. Int. J. Exp. Plant Biol., vol. 224, pp. 54–61, Jul. 2014, doi:
10.1016/j.plantsci.2014.04.001.
[131] P. M. Gaur and V. K. Gour, “A gene for leaf necrosis in chickpea (Cicer arietinum L.).,”
J. Hered., vol. 91, no. 1, pp. 79–81, Feb. 2000, doi: 10.1093/jhered/91.1.79.
[132] Thompson, A.E., Dierig, D.A. , Kleiman, R., “Variation in Vernonia galamensis
flowering characteristics, seed oil and vernolic acid contents.,” Ind. Crops Prod. Vol. 3 3
Pages 175-183, 1994, [Online]. Available:
https://www.sciencedirect.com/science/article/pii/0926669094900655
[133] J. Vásquez, “Production, use and fate of Chilean brown seaweeds: Re-sources for a
sustainable fishery,” J. Appl. Phycol., vol. 20, pp. 457–467, Oct. 2008, doi:
10.1007/s10811-007-9308-y.
[134] A. H. Buschmann et al., “Chapter Six - The Status of Kelp Exploitation and Marine
Agronomy, with Emphasis on Macrocystis pyrifera, in Chile,” in Advances in Botanical
Research, vol. 71, N. Bourgougnon, Ed. Academic Press, 2014, pp. 161–188. doi:
10.1016/B978-0-12-408062-1.00006-8.
[135] D. Aitken, C. Bulboa, A. Godoy-Faundez, J. L. Turrion-Gomez, and B. Antizar-Ladislao,
“Life cycle assessment of macroalgae cultivation and processing for biofuel production,” J.
Clean. Prod., vol. 75, pp. 45–56, Jul. 2014, doi: 10.1016/j.jclepro.2014.03.080.
112
[136] M. Johansson et al., “Seascape drivers of Macrocystis pyrifera population genetic
structure in the northeast Pacific,” Mol. Ecol., vol. 24, pp. 4866–4885, Sep. 2015, doi:
10.1111/mec.13371.
[137] G. A. Van der Auwera et al., “From FastQ data to high confidence variant calls: the
Genome Analysis Toolkit best practices pipeline.,” Curr. Protoc. Bioinforma., vol. 43, no.
1110, p. 11.10.1-11.10.33, 2013, doi: 10.1002/0471250953.bi1110s43.
[138] A. I. Chernova et al., “Genotyping and lipid profiling of 601 cultivated sunflower lines
reveals novel genetic determinants of oil fatty acid content,” BMC Genomics, vol. 22, no. 1,
p. 505, Jul. 2021, doi: 10.1186/s12864-021-07768-y.
113
Appendix A
Appendix
A.1 Non-linear regression models for time to flowering in wild
chickpea combine genetic and climatic factors[30]
Background
Accurate prediction of crop flowering time is required for reaching maximal farm efficiency.
Several models developed to accomplish this goal are based on deep knowledge of plant
phenology, requiring large investment for every individual crop or new variety. Mathematical
modeling can be used to make better use of more shallow data and to extract information from it
with higher efficiency. Cultivars of chickpea, Cicer arietanum, are currently being improved by
introgressing wild C. reticulatum biodiversity with very different flowering time requirements.
More understanding is required for how flowering time will depend on environmental conditions
in these cultivars developed by introgression of wild alleles.
Results
We built a novel model for flowering time of wild chickpeas collected at 21 different sites in
Turkey and grown in 4 distinct environmental conditions over several different years and
seasons. We propose a general approach, in which the analytic forms of dependence of flowering
time on climatic parameters, their regression coefficients, and a set of predictors are inferred
automatically by stochastic minimization of the deviation of the model output from data. By
using a combination of Grammatical Evolution and Differential Evolution Entirely Parallel
method, we have identified a model that reflects the influence of effects of day length,
temperature, humidity and precipitation and has a coefficient of determination of R2=0.97.
Conclusions
We used our model to test two important hypotheses. We propose that chickpea phenology may
be strongly predicted by accession geographic origin, as well as local environmental conditions
at the site of growth. Indeed, the site of origin-by-growth environment interaction accounts for
114
about 14.7% of variation in time period from sowing to flowering. Secondly, as the adaptation to
specific environments is blueprinted in genomes, the effects of genes on flowering time may be
conditioned on environmental factors. Genotype-by-environment interaction accounts for about
17.2% of overall variation in flowering time. We also identified several genomic markers
associated with different reactions to climatic factor changes. Our methodology is general and
can be further applied to extend existing crop models, especially when phenological information
is limited.
115
A.2 Genotyping and lipid profiling of 601 cultivated sunflower lines
reveals novel genetic determinants of oil fatty acid content [138]
Background
Sunflower is an important oilseed crop domesticated in North America approximately 4000 years
ago. During the last century, oil content in sunflower was under strong selection. Further
improvement of oil properties achieved by modulating its fatty acid composition is one of the
main directions in modern oilseed crop breeding.
Results
We searched for the genetic basis of fatty acid content variation by genotyping 601 inbred
sunflower lines and assessing their lipid and fatty acid composition. Our genome-wide
association analysis based on the genotypes for 15,483 SNPs and the concentrations of 23 fatty
acids, including minor fatty acids, revealed significant genetic associations for eleven of them.
Identified genomic regions included the loci involved in rare fatty acids variation on
chromosomes 3 and 14, explaining up to 34.5% of the total variation of docosanoic acid (22:0) in
sunflower oil.
Conclusions
This is the first large-scale implementation of high-throughput lipidomic profiling to sunflower
germplasm characterization. This study contributes to the genetic characterization of Russian
sunflower collections, which made a substantial contribution to the development of sunflower as
the oilseed crop worldwide and provides new insights into the genetic control of oil composition
that can be implemented in future studies.
Abstract (if available)
Abstract
Domestication of crops on land has helped meet the world’s growing food demand. Genomic resources available for different crops have enabled the switch from traditional phenotype-based domestication methods to more genetic approaches that are helping fasten the selection process. Long-term selection in many plants has been shown to have enhanced its agronomic potential but also caused a deterioration of its genetic diversity. One instance of that situation is chickpeas. This thesis focuses on increasing genetic diversity in chickpeas using nested association mapping (NAM) by the introgression of wild material into the elite cultivar.
Taking lessons from domestication on land, the second aspect of this thesis is focused on the domestication of giant kelp. Popular genomic methods like genome-wide association studies (GWAS) and genome selection (GS) have been used to propose genotypes that can advance breeding of kelp and meet the growing demand for this economically valuable product.
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
PDF
Developing genetic tools to assist in the domestication of giant kelp
PDF
Plant genome wide association studies and improvement of the linear mixed model by applying the weighted relationship matrix
PDF
Exploring the genetic basis of quantitative traits
PDF
Ecologically responsible domestication of kelp facilitated by genomic tools
PDF
Exploring the genomic landscape of giant kelp: biotechnological implications and sustainable development
PDF
Understanding the genetics, evolutionary history, and biomechanics of the mammalian penis bone
PDF
Model selection methods for genome wide association studies and statistical analysis of RNA seq data
PDF
Understanding the genetic architecture of complex traits
PDF
Investigating the potential roles of three mammalian traits in female reproductive investment
PDF
Bayesian analysis of transcriptomic and genomic data
PDF
Diversity and dynamics of giant kelp “seed-bank” microbiomes: Applications for the future of seaweed farming
PDF
Genetic and molecular insights into the genotype-phenotype relationship
PDF
Studies in bivalve aquaculture: metallotoxicity, microbiome manipulations, and genomics & breeding programs with a focus on mutation rate
PDF
Genome sequencing and transcriptome analysis of the phenotypically plastic spadefoot toads
PDF
Genetic architectures of phenotypic capacitance
PDF
Complex mechanisms of cryptic genetic variation
PDF
Evolutionary genomic analysis in heterogeneous populations of non-model and model organisms
PDF
Evolutionary mechanisms responsible for genetic and phenotypic variation
PDF
Applications of next generation sequencing in sessile marine invertabrates
PDF
Bayesian hierarchical models in genetic association studies
Asset Metadata
Creator
Singh, Anupam
(author)
Core Title
Understanding genetics of traits critical to the domestication of crops using Mixed Linear Models
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Molecular Biology
Degree Conferral Date
2022-05
Publication Date
04/12/2022
Defense Date
02/25/2022
Publisher
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
chickpea,Cicer hybrids,domestication,giant kelp,GWAS,Linear Models,Mixed Models,nested association mapping,OAI-PMH Harvest,Regression.
Format
application/pdf
(imt)
Language
English
Contributor
Electronically uploaded by the author
(provenance)
Advisor
Nuzhdin, Sergey (
committee chair
), Ehrenreich, Ian (
committee member
), Pratt, Matthew Robert (
committee member
), Sun, Fengzhu (
committee member
)
Creator Email
anupamsi@usc.edu,anupamsingh194@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC110937262
Unique identifier
UC110937262
Legacy Identifier
etd-SinghAnupa-10502
Document Type
Dissertation
Format
application/pdf (imt)
Rights
Singh, Anupam
Type
texts
Source
20220412-usctheses-batch-922
(batch),
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu
Tags
chickpea
Cicer hybrids
domestication
giant kelp
GWAS
Linear Models
Mixed Models
nested association mapping
Regression.