Characterizing Brain Aging with Neuroimaging,
Health, and Genetic Data
Kaida Ning
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Computational Biology and Bioinformatics)
May 2020
Copyright 2020 Kaida Ning
Abstract
As our population continues to age, the number of people who experience cognitive
decline and face increased risk of neurodegenerative diseases also grows. To preserve
cognitive function and prevent aging-related diseases, it is imperative that we first
identify and understand the lifestyle, environmental or genetic factors that are
associated with brain aging. In this thesis, we explore brain magnetic resonance imaging
data, clinical data, and genetic data of both cognitively healthy subjects and patients
with Alzheimer's disease, with the goal of using multimodal data to understand brain
aging. Through studying cognitively healthy subjects, we quantified the association
between brain aging and multiple lifestyle factors and genetic factors. Through studying
subjects with Alzheimer's disease, we trained a statistical model that captures
important brain and genetic features associated with the disease and can accurately
predict disease risk in subjects with mild cognitive impairment. Our results suggest a few
potential directions for decelerating brain aging, such as providing guidelines for a
brain-friendly lifestyle and offering dementia prediction at an early stage.
Acknowledgements
First and foremost, I would like to thank my supervisor, Dr. Arthur Toga. Dr. Toga
encouraged me to explore the research projects that I was interested in, and he
always gave insightful suggestions on my research work. Without his support, I would
not have had the chance to follow my enthusiasm for studying brain aging. I also want
to thank Professor Fengzhu Sun and Professor Hosung Kim for serving as my committee
members and for their great suggestions and support of my research projects.
I am grateful to have collaborated with many talented people through my research.
Dr. Lu Zhao curated the UK Biobank data locally, on which the majority of my thesis
work depended. His expertise in processing large amounts of brain imaging data greatly
accelerated my research. Will Matloff often discussed research problems with me. He is
knowledgeable about ongoing imaging genetics research and has given me
enlightening opinions. I also want to thank Dr. Ben Duffy for offering expert knowledge
in training neural network models for brain age prediction. Additionally, I would like to
thank Dr. Meredith Franklin for helping me with statistical problems. Further, many
talented and kind LONI people helped me work through technical problems, discussed
research ideas with me, and proofread my writing.
I also feel blessed to have my husband, Bo, who was always supportive of me. Most of
my days as a PhD student were cheerful. However, I also struggled with technical
problems and everyday life problems. In those times, Bo always held my hand and
helped me out. I am also thankful to our daughter, Anda, who brought us a lot of
happiness. I learned to cherish both the time playing with her and the time doing
research. I am also thankful to my parents and mother-in-law, who supported me during
my PhD study.
My years pursuing a PhD degree were well spent with all these lovely people!
Abbreviations
AD Alzheimer's disease
AUC area under the receiver operating characteristic curve
CA chronological age
CNN convolutional neural network
MCI mild cognitive impairment
MRI magnetic resonance imaging
NN neural network
PBA predicted brain age
RBA relative brain age
SNP single nucleotide polymorphism
Table of Contents
Abstract
Acknowledgements
Abbreviations
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction and background
1.1 Aging and brain aging
1.2 Observing the brain with magnetic resonance imaging
1.3 Emerging big data for research
1.4 Imaging-based biomarkers
1.5 Statistical models for data analyses
1.6 Summary and overview of chapters
Chapter 2 Relative brain age: a biomarker derived from brain magnetic resonance imaging
2.1 Introduction
2.2 Results
2.2.1 Demographic information
2.2.2 Predicted brain age and relative brain age
2.3 Material and methods
2.3.1 Overview of UK Biobank project
2.3.2 Magnetic resonance imaging data
2.3.3 Obtaining predicted brain age and relative brain age based on structural MRI data
2.4 Discussion
Chapter 3 Association of relative brain age with lifestyle factors -- smoking and alcohol consumption
3.1 Introduction
3.2 Results
3.2.1 Previous tobacco smoking is significantly associated with relative brain age
3.2.2 Alcohol consumption is significantly associated with relative brain age
3.2.3 Additive effect of smoking and alcohol consumption in association with relative brain age
3.3 Materials and methods
3.3.1 Demographic information
3.3.2 Quantifying the association of RBA with previous tobacco smoking amount and alcohol intake amount
3.4 Discussion
Chapter 4 Association of relative brain age with lifestyle factors -- parity
4.1 Introduction
4.2 Results
4.2.1 Demographic information
4.2.2 Number of offspring and relative brain age
4.2.3 Number of offspring and cognitive function
4.3 Materials and methods
4.3.1 Demographic information
4.3.2 Study the association between number of offspring and relative brain age
4.3.3 Study the association between number of offspring and cognitive function
4.4 Discussion
Chapter 5 Improving relative brain age estimate with a convolutional neural network model and its implication on identifying genetic factors associated with brain aging
5.1 Introduction
5.2 Results
5.2.1 Predicted brain age accuracy from CNN model and regression model
5.2.2 Genetic factors associated with brain aging
5.3 Material and methods
5.3.1 Summary of samples used
5.3.2 Obtaining relative brain age based on a CNN model
5.3.3 Genetic association analyses
5.4 Discussion
Chapter 6 Predicting Alzheimer's disease risk using both neural network and regression models
6.1 Introduction
6.2 Results
6.2.1 Demographic information
6.2.2 Models’ performance in classifying AD and in predicting progression from MCI to AD
6.2.3 Important brain and SNP features used by NN model
6.2.4 Interactions among brain and genetic features captured by NN model
6.3 Materials and Methods
6.3.1 Description of ADNI subjects in the study
6.3.2 MRI brain imaging data and genotype data
6.3.3 Neural network and logistic regression models
6.3.4 Procedures for model training and testing
6.3.5 Identifying important brain and SNP features
6.3.6 Identifying interactions among features
6.4 Discussion
Chapter 7: Conclusions and future directions
7.1 Summary of findings
7.2 The pros-and-cons of NNs for analyzing big data
7.3 Identifying factors associated with brain aging and the next step
7.4 Potential of dementia onset prediction based on imaging data
7.5 Conclusion
References
List of Figures
Figure 1.1. A brain magnetic resonance imaging scan.
Figure 2.1. Procedure for training a model for calculating relative brain age and applying the model to evaluation set samples.
Figure 2.2. Relationship between chronological age, predicted brain age, and relative brain age.
Figure 3.1. Relationship between previous tobacco smoking frequency and relative brain age.
Figure 3.2. Relationship between alcohol intake frequency and relative brain age.
Figure 4.1. Procedure for studying the association between number of offspring and relative brain age using samplings.
Figure 4.2. Distribution of relative brain age predicted by model with multivariable adjustment over 500 samplings in female (left) and male (right) subjects.
Figure 4.3. Number of offspring versus response time predicted by model with multivariable adjustment in female (left) and male (right) subjects. The unit of response time is millisecond.
Figure 4.4. Number of offspring versus visual memory score in female (left) and male (right) subjects. The unit of visual memory is log(number of mistakes made in memorizing matching cards).
Figure 5.1. Association between chronological age and predicted brain age from a convolutional neural network model.
Figure 5.2. Manhattan plot for the association p-values between SNPs and relative brain age across the genome. The red line indicates the genome-wide significant threshold on p-value (i.e., 5E-8). The blue line indicates p-value of 0.05.
Figure 5.3. Regional visualization of a 2-Mb locus on Chromosome 17 where the SNPs showing genome-wide significant associations with relative brain age are located.
Figure 5.4. Regional visualization of a 1-Mb locus on Chromosome 4 where the SNPs showing genome-wide significant associations with relative brain age are located.
Figure 5.5. Comparison of genetic signals identified using RBA from a linear regression model and using RBA from a convolutional neural network model.
Figure 5.6. Procedure for obtaining RBA through a 5-fold cross-validation strategy and then carrying out RBA-SNP association.
Figure 5.7. Structure of the convolutional neural network model for predicting brain age based on MRI data. Curved arrows indicate skip connections between layers.
Figure 6.1. Structure of a neural network model with two hidden layers.
Figure 6.2. Accuracy (measured as AUC) of the best-performing neural network and logistic regression models in predicting progression from mild cognitive impairment to Alzheimer’s disease.
Figure 6.3. Strength of pairwise interactions among all the brain and genetic features used in the neural network model.
Figure 6.4. Training, validation, and testing data.
Figure 7.1. Paradigm of research on factors associated with brain aging.
List of Tables
Table 2.1. Demographic information of subjects included in the analyses.
Table 4.1. Demographic information of subjects included in the analyses for the association between parity and relative brain age.
Table 4.2. Median of coefficient estimations for number of offspring in association with relative brain age in regression model with multivariable adjustment in 500 samplings.
Table 4.3. Coefficient estimations of number of offspring in association with response time in regression model with multivariable adjustment. The unit of response time is millisecond.
Table 4.4. Coefficient estimations of number of offspring in association with visual memory score. The unit of visual memory is log(number of mistakes made in memorizing matching cards).
Table 6.1. Demographic characteristics of Alzheimer’s disease, mild cognitive impairment, and healthy control subjects.
Table 6.2. Weight of important features for classifying AD patients and CN subjects in the neural network model.
Chapter 1 Introduction and background
1.1 Aging and brain aging
In this thesis, we study aging of the brain. The brain's aging process, while different
between individuals, is associated with structural changes, declining cognitive function,
and increased risk of dementia [1-3].
The number of Americans aged 65 and over is projected to reach 80 million by
2050 [4]. As the general population gets older, the number of people who experience
cognitive decline and face increased risk of neurodegenerative diseases such as
Alzheimer's disease (AD) [5] also grows. The AD population is projected to increase from
5.8 million today to 14 million in 2050 [6]. However, brain aging may differ from
chronological aging. People with the same chronological age can have
different brain aging trajectories, different levels of brain atrophy, and different
cognitive capacities. Understanding the factors associated with brain aging can lead to
methods to intervene in the aging process and preserve cognitive function.
1.2 Observing the brain with magnetic resonance imaging
To study the brain, one needs to first observe it. Researchers and clinicians have used
magnetic resonance imaging (MRI), a non-invasive imaging technology that provides
high-resolution anatomical images, to study the human brain since the 1980s [7,8].
Figure 1.1 is an example of a brain MRI scan.
Brain MRI has been used to provide initial disease diagnoses based on lesions, bleeding,
tumors, inflammation, and other abnormalities observed in the brain [9-12]. It has also
been used to study how the volume of specific brain regions, such as the frontal lobes,
temporal lobes, and lateral ventricles, changes during normal aging [13]. Further, MRI
has been used to study accelerated aging caused by diseases such as AD, and to predict
disease risk [14-17].
Figure 1.1. A brain magnetic resonance imaging scan.
1.3 Emerging big data for research
Nowadays, brain imaging data collected for both the general population and patients
with brain diseases are growing steadily, thanks to the collaborative efforts of
researchers from different sites [18]. These large amounts of data provide researchers
with more statistical power to obtain a clearer picture of brain structural changes in
association with aging and disease risk. For example, the UK Biobank recruited ~500,000
subjects in the United Kingdom [19]. All participants have provided blood, urine, and
saliva samples and have been genotyped. About 20,000 participants had undergone MRI
scanning as of August 2018. The Alzheimer's Disease Neuroimaging Initiative (ADNI)
database (http://adni.loni.usc.edu) was established in 2004 to measure the progression
of healthy and cognitively impaired participants with brain scans, biological markers, and
neuropsychological assessments [20]. To date, about 3,000 subjects, including patients
with AD, subjects with mild cognitive impairment, and healthy controls, have been
enrolled. Many other databases for studying various diseases have also been curated,
such as the Parkinson's Progression Markers Initiative, the Autism Centers of Excellence,
and Neurodegeneration in Aging Down Syndrome (https://ida.loni.usc.edu/).
Further, researchers with different expertise have started working closely to analyze big
data from these repositories. For example, the Enhancing Neuroimaging Genetics through
Meta-Analysis (ENIGMA; http://enigma.ini.usc.edu/) project brought together researchers
with backgrounds in computer science, statistics, and genetics to understand
brain structure, function, and disease, based on brain imaging and genetic data.
1.4 Imaging-based biomarkers
The increasing amount of imaging data has led to more discoveries regarding how the
brain's health status is linked to imaging-based features or biomarkers.
Conventional biomarkers are usually based on individual brain morphometric
measurements from MRI. For example, researchers found that certain brain regions,
including the hippocampus and temporal gyrus, shrink in AD patients and may be used for
AD prediction [14,21-23]. Statistical classification models, such as support vector
machines [14-17,24,25], linear discriminant analysis [23,25], and regression models
[26-28], have been successfully trained on these measurements to obtain 'integrative'
biomarkers for classifying and predicting AD.
Recently, researchers have successfully used machine-learning methods to derive a
biomarker from imaging data that is commonly referred to as predicted brain age (PBA)
or brain age. PBA reflects the degree of aging of the brain based on its anatomical
characteristics, and is computed from brain morphology measurements across the
entire brain. Several studies used PBA and revealed that advanced brain age is
associated with Alzheimer's disease, objective cognitive impairment, and schizophrenia,
among other conditions [29-34]. In this thesis, we further derived relative brain age
(RBA), a biomarker that describes a subject's PBA relative to peers. We discuss RBA and
its applications in later chapters.
1.5 Statistical models for data analyses
Various statistical models have been trained to extract information from the data. For
example, traditional regression models and neural networks (NNs) are often used for
regression and classification tasks. They both fit the data by defining a cost function and
then optimizing the parameters in the model so that the cost function is minimized [35].
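As a minimal sketch of this idea (not the fitting procedure used in this thesis), the following Python code fits a linear regression by gradient descent on a mean squared error cost. The function name, learning rate, step count, and simulated data are all illustrative assumptions.

```python
import numpy as np

def fit_linear_regression(X, y, lr=0.1, n_steps=2000):
    """Fit y ~ X @ w + b by gradient descent on the mean squared error.

    Hypothetical helper for illustration; lr and n_steps are arbitrary choices.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_steps):
        residual = X @ w + b - y                   # prediction error per sample
        grad_w = 2.0 * X.T @ residual / n_samples  # d(cost)/dw
        grad_b = 2.0 * residual.mean()             # d(cost)/db
        w -= lr * grad_w                           # step downhill on the cost
        b -= lr * grad_b
    cost = np.mean((X @ w + b - y) ** 2)           # final value of the cost function
    return w, b, cost

# Simulated, noiseless data with known coefficients (assumed for the demo).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([1.5, -2.0]) + 0.5

w, b, cost = fit_linear_regression(X, y)
print(w, b, cost)
```

In practice, linear regression is usually solved in closed form or with a statistics package. The iterative version is shown because the same recipe of defining a cost and then optimizing the parameters carries over to NNs.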
Traditional linear and logistic regression models assume that each predictor (or feature)
contributes additively to the response variable, or to the log odds of the response
variable when it is binary. A common application of regression models is to study the
association between a specific factor of interest and brain morphometric measurements
while adjusting for factors such as age and sex [36,37]. Regression models are also used
to predict certain features or disease risk based on MRI data [38,39].
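The additive assumption can be written out explicitly. The notation here is an illustrative assumption rather than notation defined elsewhere in this thesis: y_i is a brain measurement (or a binary outcome) for subject i, f_i is the factor of interest, and x_ij are generic predictors.

```latex
% Linear regression: a factor of interest f_i, adjusted for age and sex
y_i = \beta_0 + \beta_f f_i + \beta_{\mathrm{age}}\,\mathrm{age}_i
      + \beta_{\mathrm{sex}}\,\mathrm{sex}_i + \varepsilon_i

% Logistic regression: predictors contribute additively to the log odds
\log\frac{p_i}{1 - p_i} = \beta_0 + \sum_{j=1}^{m} \beta_j x_{ij},
\qquad p_i = \Pr(y_i = 1 \mid x_{i1}, \dots, x_{im})
```

In the first form, the coefficient on f_i is the association of interest after adjustment for age and sex. Neither form contains product terms between predictors unless they are added by hand.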
In comparison, NNs have a more complicated structure than regression models, and are
able to extract complex interactions from features through transformation functions in
the layers of nodes connected within the NNs [40,41]. In recent years, NNs have led to
critical breakthroughs in modern artificial intelligence problems such as visual
recognition and speech recognition [41-44]. One class of NNs most commonly used for
analyzing imaging data is the convolutional neural network (CNN). CNNs are able to learn
features in images with convolution operations without prior knowledge of what these
features are. With increased amounts of labeled clinical imaging data, researchers are
able to train CNNs to accomplish clinical tasks. For example, Esteva et al. reported
training a CNN model with more than a hundred thousand skin images, where the model
reached dermatologist-level accuracy in classification of skin cancer images [45].
Langner et al. reported training a CNN model for predicting 'body age' based on
whole-body MRI scans, where the body age characterized the aging level of a human body
based on its structure. The mean absolute error between predicted body age and
chronological age was 2.49 years, which was more accurate than the estimate given by
experienced radiologists [46]. Researchers have also applied CNNs to processing brain
MRI data. Applications include, but are not limited to, preprocessing of MRI data,
segmentation of different brain regions, and detection and segmentation of tumors
[47,48]. Recently, Jonsson et al. trained a CNN model for predicting 'brain age' based on
brain MRI scans, where the mean absolute error between predicted brain age and
chronological age was 3.6 years [49]. We further discuss brain age prediction in
Chapter 2 and Chapter 5.
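The mean absolute error quoted for these models is the average of the absolute differences between predicted and chronological age. A minimal Python sketch follows; the function name and the four ages are made up for illustration and are not data from any of the cited studies.

```python
import numpy as np

def mean_absolute_error(predicted_age, chronological_age):
    """Average absolute difference, in years, between predicted and true age."""
    predicted_age = np.asarray(predicted_age, dtype=float)
    chronological_age = np.asarray(chronological_age, dtype=float)
    return float(np.mean(np.abs(predicted_age - chronological_age)))

# Hypothetical subjects (values invented for the example).
chronological_age = [55.0, 62.0, 70.0, 48.0]
predicted_age = [58.0, 60.0, 71.0, 44.0]

mae = mean_absolute_error(predicted_age, chronological_age)
print(mae)  # absolute errors are 3, 2, 1, 4, so the mean is 2.5
```

A lower value means predictions sit closer to chronological age on average. The measure ignores the sign of each error, so on its own it does not show whether a model systematically over- or under-estimates age.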
Since identifying good models for fitting the data is an important part of brain aging
research, we used both traditional regression models and NNs for our research
problems, and compared their performance.
1.6 Summary and overview of chapters
In this thesis, we further extended the applications of MRI to study brain aging,
including normal brain aging in cognitively healthy subjects and abnormal brain aging in
patients with Alzheimer's disease. We analyzed UK Biobank data for the former and
analyzed ADNI data for the latter. Both databases are multimodal, encompassing
MRI, genetic, lifestyle, and clinical data.
1.6.1 Studying brain aging in cognitively normal subjects
In the cognitively normal population, the aging process doesn't affect everyone in the
same way. Individuals with the same chronological age can have different brain aging
trajectories [32]. Researchers have identified various factors associated with accelerated
brain structure atrophy by investigating the MRI data along with lifestyle or disease
features of subjects. For example, compared with non-smokers, smokers have
significantly smaller grey matter volume and lower grey matter density in the frontal
regions, the occipital lobe, and the temporal lobe. Smokers also have a significantly
greater rate of atrophy in regions that show morphological abnormalities in the early
stages of AD [50-52]. It has also been reported that patients with alcohol use disorder
show decreased regional grey and white matter volumes in the medial-prefrontal and
orbitofrontal cortices. The loss of brain gray and white matter volume accelerates with
aging in chronic alcoholics [53,54].
In our study of brain aging in cognitively healthy subjects, we took into account
multidimensional aging patterns across all regions of the brain. For this purpose, we first
obtained the PBA biomarker using a regression model, and then obtained relative brain
age (RBA) based on PBA and chronological age. RBA indicates how old a subject’s PBA
appears compared to peers and is independent of age, which allows direct comparison
of brain aging level for subjects with different chronological ages (Chapter 2).
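One common way to obtain an age-independent measure of this kind is to regress PBA on chronological age and keep the residual. The Python sketch below illustrates that idea under stated assumptions: the residual-based definition, the function name, and the simulated ages are all illustrative, and the exact procedure used in this thesis is the one described in Chapter 2.

```python
import numpy as np

def relative_brain_age(pba, ca):
    """Residual of PBA after a straight-line fit on chronological age (CA).

    Hypothetical definition for illustration; positive values mean the brain
    looks older than expected for subjects of the same age.
    """
    pba = np.asarray(pba, dtype=float)
    ca = np.asarray(ca, dtype=float)
    slope, intercept = np.polyfit(ca, pba, deg=1)  # least-squares line PBA ~ CA
    expected_pba = slope * ca + intercept          # PBA expected from peers
    return pba - expected_pba

# Simulated subjects: PBA tracks CA with noise (values invented for the demo).
rng = np.random.default_rng(1)
ca = rng.uniform(45, 80, size=500)
pba = 0.7 * ca + 20 + rng.normal(scale=4, size=500)

rba = relative_brain_age(pba, ca)
corr = np.corrcoef(rba, ca)[0, 1]
print(rba.mean(), corr)
```

Because least-squares residuals are uncorrelated with the predictor, an RBA defined this way has essentially zero linear correlation with chronological age in the sample it is fitted on, which is what makes comparisons across age groups meaningful.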
Although heavy smoking and alcohol consumption are known to be associated with
accelerated brain aging in specific brain regions, the associations haven’t been well
quantified, especially when all brain regions are considered. Therefore, after obtaining
RBA, we quantified how smoking and alcohol consumption are associated with RBA
(Chapter 3).
After studying the association of RBA with smoking and alcohol consumption, we
investigated the association of RBA with parity. Previous research on the association
between brain structural change and parity reported inconclusive findings. Hoekzema et
al. reported that the volume of certain gray matter regions was reduced during
pregnancy and the reductions did not recover for at least 2 years postpartum
55
, while
others reported that the gray matter restoration process was evident within the first
few months postpartum
56,57
. Most studies on the association between brain structure
and parity had a relatively small sample size (n<100) and less than three years of
postpartum follow-up
55-57
. We hoped that by analyzing the large-scale data from the UK
Biobank, we could draw a clearer picture of the association between parity and brain
structure (Chapter 4).
Besides lifestyle habits, genetic factors are also thought to be involved in brain aging. A
recent study analyzed brain imaging data and chronological age information from twins
and suggested that the brain aging process was heritable
34
. However, the extent to
which individual genetic variants are associated with brain aging needs to be further
investigated. Further, we hypothesized that a convolutional neural network (CNN)
model may produce more accurate PBA and RBA metrics than the regression model
used in Chapter 2. Therefore, we trained a CNN model for obtaining RBA and studied
the association between genetic factors and RBA. We also compared the genetic factors
identified based on RBA derived from the CNN model and RBA from the regression
model (Chapter 5).
1.6.1 Studying brain aging in AD patients
Besides studying factors associated with normal brain aging, we also studied brain aging
in AD patients (Chapter 6). It is known that AD is associated with accelerated atrophy in
multiple brain regions, such as the hippocampus, entorhinal cortex, and temporal gyrus.
30,58
It is also known that AD risk is affected by multiple genetic factors
59
. A long-
standing question is how to best use brain morphometric and genetic data to distinguish
AD patients from cognitively normal subjects and to predict those who will progress
from mild cognitive impairment (MCI) to AD. We trained a NN model to classify AD with
both brain morphometric measurements from MRI data and genetic data. We then
assessed this model's performance in predicting progression from MCI to AD. We
further investigated this model to identify the important predictors and interactions
among the predictors.
Chapter 2 Relative brain age: a biomarker derived from
brain magnetic resonance imaging
2.1 Introduction
In this chapter, we describe how we derived the relative brain age (RBA) biomarker from brain
imaging data. RBA describes whether a person’s brain has experienced accelerated or
decelerated aging compared to peers. We use RBA to study lifestyle and genetic
factors associated with brain aging in Chapters 3-5.
Recently, researchers have successfully used machine-learning methods to derive a
biomarker that is commonly referred to as predicted brain age (PBA) or brain age based
on brain imaging data. PBA reflects the degree of aging of the brain based on its
anatomical characteristics, as computed based on brain morphology measurements
across the entire brain. PBA has been derived and used in several studies, where the
mean absolute error between PBA and chronological age (CA) was less than 5 years in
adults
29,32,34
. Further, it has been shown that advanced brain age is associated with
Alzheimer's disease, objective cognitive impairment, and schizophrenia, etc.
29-33
. Before
our research, many studies used the difference between PBA and CA (i.e., PBA - CA) to
capture the deviation of a person’s brain structural aging from the norm
60,61
.
However, due to regression dilution, PBA - CA is correlated with CA and may not be
optimal
39,62
. Therefore, we further developed the RBA metric, which is independent of CA
and indicates if a subject's brain experiences accelerated or decelerated aging compared
to peers. Besides our research, Smith et al. independently reported a method for
improving brain age delta estimation
39
. They gave statistical reasoning for the cause of
the association between PBA - CA and CA in linear regression. They also suggested
removing the association through stage 2 correction of brain age delta, which was very
similar to our RBA metric. In this chapter, we describe a method for deriving RBA using
UK Biobank data.
2.2 Results
2.2.1 Demographic information
We randomly split the data for 17,308 subjects with brain magnetic resonance imaging
into a training set (n = 5,193) and an evaluation set (n = 12,115). Table 2.1 summarizes the
demographic information for the subjects included in the training and evaluation sets.
There was no significant difference in age, gender, smoking, and alcohol consumption
between these two sets.
Table 2.1. Demographic information of subjects included in the analyses.
Data set                                     Number of subjects   Male (%)      Female (%)    Age (mean [SD], min-max)
Training data (for model training)           5,193                2,466 (47%)   2,727 (53%)   63.3 [7.4], 46.2-80.7
Evaluation data (for association analyses)   12,115               5,753 (47%)   6,362 (53%)   63.3 [7.4], 45.2-80.3
2.2.2 Predicted brain age and relative brain age
We trained a regression model that produced the predicted brain age (PBA) using
training set subjects. We observed that the difference between PBA and CA (i.e., PBA -
CA) was negatively associated with CA. The older subjects tended to have negative PBA -
CA, while the younger subjects tended to have positive PBA - CA. Therefore, after
obtaining PBA for each subject we further trained a model to calculate relative brain age
(RBA) (see methods). We then applied the trained models to the evaluation set subjects,
and further obtained PBA and RBA for the evaluation set subjects (as illustrated in
Figure 2.1). The mean absolute error (MAE) between PBA and chronological age (CA) in
the evaluation set was 3.8 years. The relationship between CA, PBA, and RBA for the
evaluation set subjects is illustrated in Figure 2.2. In the evaluation set, roughly half of
the subjects had positive RBA and half had negative RBA
in each age range (Supplementary Figure 2.2), although PBA - CA was negatively
associated with CA (Supplementary Figure 2.1).
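The sign of this association can be seen from a short derivation. As a simplifying sketch, suppose PBA is linear in CA on average, with a slope b below 1 as regression dilution would produce. The symbols a, b, and \varepsilon are introduced here for illustration only and are not part of the original analysis.

```latex
\begin{align}
\mathrm{PBA} &= a + b\,\mathrm{CA} + \varepsilon,
  \qquad 0 < b < 1, \quad \operatorname{Cov}(\varepsilon, \mathrm{CA}) = 0 \\
\mathrm{PBA} - \mathrm{CA} &= a + (b - 1)\,\mathrm{CA} + \varepsilon \\
\mathrm{RBA} &= \mathrm{PBA} - \mathrm{Expected}(\mathrm{PBA} \mid \mathrm{CA})
  = \mathrm{PBA} - (a + b\,\mathrm{CA}) = \varepsilon
\end{align}
```

Because b - 1 < 0, PBA - CA decreases as CA increases, whereas RBA reduces to the residual term, which is uncorrelated with CA under these assumptions.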
Figure 2.1. Procedure for training a model for calculating relative brain age and applying the model to evaluation
set samples.
Figure 2.2 Relationship between chronological age, predicted brain age, and relative brain age.
2.3 Material and methods
2.3.1 Overview of UK Biobank project
The UK Biobank recruited ~500,000 subjects in the United Kingdom
19
. The participants
have provided blood, urine and saliva samples. All participants have been genotyped.
The 20,000 participants who had been scanned as of August 2018 (imaging of the
brain, heart, abdomen, bones and carotid artery) were included in our study. All participants had provided
informed consent. The present analyses were conducted under data application number
25641.
2.3.2 Magnetic resonance imaging data
Details of the structural brain magnetic resonance imaging (MRI) data, such as imaging
hardware and acquisition protocols, are described elsewhere
63,64
. In our analyses,
quality-controlled structural MRI data were obtained for 21,345 subjects. We excluded
1,222 (5.7%) subjects with brain and nervous system related illness, including cognitive
impairment, neurological disorders or stroke, etc. We further excluded 2,815 (13.2%)
subjects with non-European ancestry (according to both self-reported ethnicity and
principal component analyses on the genetic data). Brain imaging data of 17,308
subjects were used in our analyses. The age range of these participants is between 45.2
years and 80.7 years.
In total, 403 brain morphometrics, including volume of cortical, subcortical and white
matter regions, thickness and surface area of cortical regions, ventricle size, intracranial
volume, etc., were obtained with FreeSurfer 6.0
65
based on the T1 MRI brain scans, with
the Desikan-Killiany atlas. FreeSurfer is documented and freely available for download
online (http://surfer.nmr.mgh.harvard.edu/).
2.3.3 Obtaining predicted brain age and relative brain age based on structural MRI
data
Predicted brain age (PBA) is a metric describing how old a person's brain appears based
on a brain scan at a single time-point. Relative brain age (RBA) is a metric indicating if a
person’s brain has experienced accelerated or decelerated aging compared to peers. It
captures the deviation of a person’s brain structural aging from the population’s normal
pace.
We trained a model for obtaining PBA and RBA based on MRI data using training set
subjects. To be specific, we randomly split the brain imaging data of 17,308 subjects into
training and evaluation sets. Our rationale for picking 30% (5,193) of the subjects as the
training set and the remaining 70% (12,115) as the evaluation set was to balance the
need for accurately training a model to predict brain age and the need for a large
number of subjects in the evaluation set for evaluating the association of RBA and the
factors of interest.
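The random 30%/70% split described above can be sketched as follows. This is a minimal illustration, not the code used in the thesis: the function name, the seed, and the use of integer subject IDs are assumptions, and rounding the training size up is an assumption chosen because it reproduces the reported counts.

```python
import math
import random


def split_subjects(subject_ids, train_fraction=0.3, seed=0):
    """Randomly split subject IDs into a training set and an evaluation set."""
    ids = list(subject_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    # Assumption: round the training size up, which matches 5,193 of 17,308.
    n_train = math.ceil(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


# Hypothetical subject IDs 0..17307 stand in for the real identifiers.
train_ids, eval_ids = split_subjects(range(17308))
print(len(train_ids), len(eval_ids))
```

Keeping the two sets disjoint means that the subjects used to fit the brain age model are never reused in the association analyses.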
The model for obtaining PBA and RBA was trained as follows. We first trained a model
for obtaining predicted brain age (PBA) based on MRI data using data of the training set
subjects. To be specific, we built a linear regression model with Lasso regularization for
predicting brain age using R package glmnet
66,67
. In the model, the chronological age
was the response variable, and 403 brain quantitative measures derived using
FreeSurfer were used as predictors. During model training, the Lasso parameter,
lambda, was selected based on an internal cross validation using glmnet. We did not do
any pre-selection on the predictors, since the training set sample size was sufficiently
large relative to the number of predictors in the model. The mean absolute error (MAE)
between PBA and chronological age in the training set was 3.5 years. We observed that
due to regression dilution
62
, the difference between PBA and CA (i.e., PBA - CA) was
negatively associated with CA. The older subjects tended to have negative PBA - CA,
while the younger subjects tended to have positive PBA - CA (Figure 2.2 and
Supplementary Figure 2.1). Therefore, after obtaining PBA for each subject, we further
calculated RBA. RBA is defined as the difference between PBA and expected PBA given a
subject’s chronological age (i.e., RBA = PBA - Expected(PBA|CA)). Here,
Expected(PBA|CA), or EPBA, was obtained by building a regression model in which
CA was the predictor and PBA was the response variable. In that way, RBA is
independent of CA. In each age range, roughly half of the subjects had
positive RBA and half had negative RBA (Figure 2.2). A subject with
positive RBA has a brain that appears older than those of peers, while a subject with
negative RBA has a brain that appears younger. Since linear operations were used to
derive RBA based on PBA and CA, the unit of RBA is years.
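The definition RBA = PBA - Expected(PBA|CA) can be illustrated with a small simulation. This is a sketch under stated assumptions, not the thesis pipeline: the data are synthetic, the slope of 0.7 and noise level are hypothetical values chosen to mimic regression dilution, and the helper names are made up for this example.

```python
import random
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Synthetic data: PBA is shrunk toward the mean age (slope < 1), as under dilution.
rng = random.Random(1)
ca_train = [rng.uniform(45, 80) for _ in range(2000)]
pba_train = [63 + 0.7 * (ca - 63) + rng.gauss(0, 4) for ca in ca_train]

# Expected(PBA | CA): regress PBA (response) on CA (predictor).
intercept, slope = fit_line(ca_train, pba_train)


def relative_brain_age(pba, ca):
    """RBA = PBA - Expected(PBA | CA), in years."""
    return pba - (intercept + slope * ca)


gap = [p - c for p, c in zip(pba_train, ca_train)]  # PBA - CA
rba = [relative_brain_age(p, c) for p, c in zip(pba_train, ca_train)]

print(pearson(ca_train, gap))  # clearly negative
print(pearson(ca_train, rba))  # zero up to floating-point error
```

In this sketch the fitted intercept and slope would then be reused unchanged to compute RBA for evaluation-set subjects, where the correlation with CA is close to zero rather than exactly zero.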
After training the model for obtaining PBA and RBA using the training set data, we
applied it to the evaluation set and carried out association analyses.
2.4 Discussion
Here we analyzed the brain imaging data collected for 17,308 UK Biobank subjects. We
derived the RBA metric using training set subjects, and further investigated the association
of RBA with smoking, alcohol intake, and genetic variants using evaluation set subjects.
In our analyses, we first calculated PBA of a subject based on structural MRI data and
then derived RBA, a metric that describes a subject's PBA relative to peers. RBA was
calculated as the difference between PBA and EPBA (i.e., RBA=PBA - EPBA; see the
methods section for details) of a person. As a comparison, in other studies where PBA
was derived based on a regression model, the difference between PBA and CA (PBA - CA,
or BrainAGE) was used to indicate the brain aging status
29-31
. We observed that due to
regression dilution, older subjects tend to have negative values of PBA - CA, while
younger subjects tend to have positive values of PBA - CA (Figure 2.2). As a comparison,
RBA was independent of CA. At all age ranges, roughly half of the subjects had positive
RBA and half of the subjects had negative RBA.
Supplementary Figures
Supplementary Figure 2.1. Relationship between chronological age and the difference
between predicted brain age and chronological age in the evaluation set.
Supplementary Figure 2.2. Relationship between chronological age and relative brain
age in the evaluation set.
Chapter 3 Association of relative brain age with lifestyle
factors -- smoking and alcohol consumption
3.1 Introduction
In Chapter 2, we derived the relative brain age (RBA) metric. In this chapter, we study the
association of RBA with smoking and alcohol consumption.
Heavy smoking and heavy alcohol drinking are among the most studied lifestyle factors
in relation to brain aging. Compared with non-smokers, smokers have significantly smaller grey
matter volume and lower grey matter density in the frontal regions, the occipital lobe,
and the temporal lobe. Further, smokers have a significantly greater rate of atrophy in
regions that show morphological abnormalities in the early stages of Alzheimer’s
disease
50-52
. It has also been reported that patients with alcohol use disorder show
decreased regional grey and white matter volumes in the medial-prefrontal and
orbitofrontal cortices. The loss of brain gray and white matter volume accelerates with
aging in chronic alcoholics
53,54
. On the other hand, studies have shown that nicotine, a
compound contained in tobacco, may improve attention and other cognitive functions
in human subjects
68,69
. It has also been reported that drinking wine may be beneficial to
the cardiovascular system, which is related to brain health
70,71
. To date, it is still unclear
how smoking and alcohol consumption are associated with brain structural aging,
especially when the morphology of all the brain regions is considered. Therefore, we
analyzed brain-imaging data collected for 17,308 UK Biobank subjects who were
cognitively normal and were of European ancestry and studied the association of
relative brain age with smoking and alcohol consumption.
3.2 Results
3.2.1 Previous tobacco smoking is significantly associated with relative brain age
Information on previous tobacco smoking frequency was collected for 11,651 of the
evaluation set subjects during the visit for MRI scan. Regression analyses adjusting for
gender and education showed that previous tobacco smoking frequency was statistically
significantly associated with RBA (ANOVA F-test p-value < 2E-16, see Figure 3.1).
Pairwise comparisons showed that the most significant difference was between those
who smoked on most or all days (with an average RBA of 0.6 years) and the rest of the
smoking frequency categories (i.e., those who abstained from smoking, just tried once
or twice, or occasionally), while there was no significant difference among the groups of
subjects who didn’t smoke on most or all days.
Figure 3.1. Relationship between previous tobacco smoking frequency and relative brain age.
Figure 3.2. Relationship between alcohol intake frequency and relative brain age.
3.2.2 Alcohol consumption is significantly associated with relative brain age
Information on current alcohol drinking frequency was collected for 11,600 of the
evaluation set subjects during the visit for MRI scan. Regression analyses adjusting for
gender and education showed that alcohol consumption frequency was statistically
significantly associated with RBA (ANOVA F-test p-value = 9E-6, see Figure 3.2). Pairwise
comparisons among groups with different alcohol consumption frequencies showed
that the strongest difference was between the group who drank alcohol on most or all
days (with an RBA of 0.4 years) and the rest of the alcohol drinking frequency categories
(i.e., those who abstained from drinking, drank at special occasions only, 1~3 times a
month, 1~2 times a week, or 3~4 times a week), while the difference among groups who
did not drink on most or all days was not significant.
3.2.3 Additive effect of smoking and alcohol consumption in association with relative
brain age
Smoking and alcohol consumption amount were positively correlated and had an
additive effect on RBA. Among the 2,327 subjects who smoked on most or all days and
did not abstain from alcohol, the correlation between the two variables was 0.08 (p-
value = 9E-5). We used a regression model with RBA as the response variable and with
smoking amount, alcohol consumption amount, sex, and education as predictors.
According to this model, each additional pack-year of smoking was associated with 0.03
years of increased RBA (p-value = 2E-8); each additional gram of alcohol consumption
per day was associated with 0.02 years of increased RBA (p-value = 6E-10). The R-
squared value of this model was 0.032. As a comparison, a model with only smoking
amount as predictor and adjusted for sex and education had an R-squared of 0.018. A
model with only alcohol consumption amount as predictor and adjusted for sex and
education had an R-squared of 0.015. We also built a regression model with an
interaction term between alcohol drinking and smoking. The interaction term was
not significant, indicating that there was insufficient evidence to support the presence of
an interaction between alcohol drinking and smoking in affecting RBA.
3.3 Materials and methods
3.3.1 Demographic information
We used the information on educational qualifications collected during the visit for MRI
scan. The qualification variable has multiple categories based on a British system. We
collapsed it into two categories indicating whether or not a subject held a college or
university degree, as used in the paper by Cox et al.
72
There was a significant association
between education and RBA (p-value = 0.009). Therefore, we also adjusted for
education when assessing the association of RBA with smoking, alcohol consumption,
and genetic variants.
We used the information of smoking history and alcohol intake status that was collected
during the visit for MRI scan. The smoking and alcohol intake frequency categories used
in our analyses were as reported in the UK Biobank questionnaire. Smoking pack-years
were defined as (number of cigarettes smoked per day / 20) multiplied by the
number of years of smoking. The alcohol intake amount was calculated as described in
the paper by Piumatti et al.
73
. Alcohol consumption per day for a specific type of drink
was calculated as the number of drinks consumed per day multiplied by the number of
grams of alcohol contained in one drink. The total amount of alcohol consumption per
day was the summation of the alcohol amount from all types of drinks. More details can
be found on the UK Biobank website (http://www.ukbiobank.ac.uk/).
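The two exposure measures defined above amount to simple arithmetic, sketched below. The function names are illustrative, and the grams-per-drink values are hypothetical placeholders, not the conversion factors used in the thesis or by Piumatti et al.

```python
def pack_years(cigarettes_per_day, years_smoked):
    """Pack-years: (cigarettes per day / 20) multiplied by years of smoking."""
    return cigarettes_per_day / 20 * years_smoked


def alcohol_grams_per_day(drinks_per_day, grams_per_drink):
    """Total daily alcohol: sum over drink types of drinks per day times grams per drink."""
    return sum(n * grams_per_drink[drink] for drink, n in drinks_per_day.items())


# Hypothetical alcohol content per drink, for illustration only.
GRAMS_PER_DRINK = {"beer": 16.0, "wine": 12.0}

print(pack_years(10, 30))  # 15.0
print(alcohol_grams_per_day({"beer": 1, "wine": 0.5}, GRAMS_PER_DRINK))  # 22.0
```

For example, half a pack per day for 30 years gives 15 pack-years, the same exposure as one pack per day for 15 years.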
3.3.2 Quantifying the association of RBA with previous tobacco smoking amount and
alcohol intake amount
We quantified the association between previous tobacco smoking amount, alcohol
intake amount, and RBA using a two-step regression model adjusting for gender and
education. We first built a linear regression model using data of 2,327 evaluation set
subjects who had previously smoked daily or almost daily and did not abstain from drinking
alcohol. We then identified subjects with large Cook's distance as potential influential
observations (i.e., subjects with Cook's distance greater than 3 times the mean Cook's
distance of all the subjects). We excluded these influential observations, fitted a second
linear regression model, and reported results based on the second regression model. In
total, data of 2,174 non-influential observations were used in the second-step
regression.
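The screening step can be sketched for the simplest case of a single predictor. This is an illustration under stated assumptions rather than the thesis code: the thesis model had several predictors and covariates, the data here are synthetic, and the function names are made up for this example. Only the rule itself (exclude observations whose Cook's distance exceeds 3 times the mean, then refit) comes from the text.

```python
import random
from statistics import mean


def cooks_distances(x, y):
    """Cook's distance of each observation in a simple linear regression y ~ x."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    p = 2  # number of fitted parameters: intercept and slope
    mse = sum(e ** 2 for e in residuals) / (n - p)
    leverages = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [e ** 2 / (p * mse) * h / (1 - h) ** 2
            for e, h in zip(residuals, leverages)]


def drop_influential(x, y, multiple=3.0):
    """Keep observations whose Cook's distance is at most `multiple` times the mean."""
    d = cooks_distances(x, y)
    cutoff = multiple * mean(d)
    keep = [i for i, di in enumerate(d) if di <= cutoff]
    return [x[i] for i in keep], [y[i] for i in keep]


# Synthetic data: a weak positive trend plus one artificial outlier.
rng = random.Random(0)
x = [rng.uniform(0, 50) for _ in range(200)]
y = [0.03 * xi + rng.gauss(0, 1) for xi in x]
x.append(50.0)
y.append(25.0)

x_kept, y_kept = drop_influential(x, y)
print(len(x), len(x_kept))
```

The second-step regression would then be fitted on `x_kept` and `y_kept` only, so that a handful of extreme observations does not dominate the estimated slopes.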
3.4 Discussion
Our analyses of smoking and RBA indicated that subjects who had smoked on most or all
days had a significantly higher RBA compared to subjects who smoked less often. That
was consistent with previous studies, which showed significantly greater rate of atrophy
in certain regions of the brains of smokers
50-52
. Our data also showed no significant
difference in RBA among the subjects who smoked occasionally, only tried once or
twice, or abstained from smoking. This suggests that the detrimental effect of smoking
on brain aging occurs mainly among those who smoked on most days.
Our analyses of alcohol intake frequency and RBA indicated that subjects who drank
daily or almost daily had a significantly higher RBA compared to those who drank less
frequently. Our finding was consistent with previous studies, which showed that heavy
alcohol consumption was detrimental to the brain
53,54,74
. We did not find significant RBA
difference among subjects who drank alcohol less frequently or those who abstained from
drinking. It has been reported that a small dose of alcohol is associated with a reduced
risk of cardiovascular disease, coronary heart disease and stroke
73,75,76
. Moreover,
cardiovascular health and brain health are related. Researchers have found that
cardiovascular risk factors like hypertension and heart disease are associated with
increased brain white matter abnormalities and brain atrophy
70,77
. Therefore, a small
amount of alcohol may have certain benefits for brain health by contributing to
cardiovascular health. Gu et al. have reported that light-to-moderate total alcohol
intake was associated with larger total brain volume in elderly subjects
78
. Nevertheless,
our results did not show an RBA difference among subjects who drank alcohol less
frequently or those who abstained from drinking. We also acknowledge that our
observation would need to be further validated using an independent data set.
Our study has some limitations. First, we used a linear regression model with LASSO to
produce PBA based on structural MRI data. More sophisticated statistical approaches
such as using principal component analyses for dimension reduction before LASSO
regression, or using neural networks may help to improve the accuracy of PBA. Also, the
combination of structural MRI and other types of brain imaging data (e.g., functional
MRI, diffusion-weighted MRI) may help to improve the accuracy of PBA. A more
accurate PBA would allow better estimation of RBA. Second, in our study, we
investigated the association of brain age with tobacco smoking and alcohol
consumption. Besides smoking and alcohol consumption, various environmental factors
may be associated with brain age. For example, physical exercise and meditation have
been reported to be associated with a lower level of brain aging
79,80
. Further, genetics also
affects brain aging
34
. Therefore, the variation of RBA that can be explained by smoking
and alcohol drinking amount was small (as reflected by the small R-squared in the
regression model for quantifying the association of RBA with smoking and alcohol
drinking amount). More studies can be done to help fully understand the factors
associated with brain age. Third, we chose to use pack-years and grams of alcohol intake
per day for assessing the smoking and drinking amount. It is worth noting that the self-
reported smoking and drinking amount may not be accurate. Further, there are
alternative measurements for assessing smoking and drinking amount, which may yield
slightly different findings
81,82
.
Chapter 4 Association of relative brain age with lifestyle
factors -- parity
4.1 Introduction
In the previous chapter, we discussed the association of RBA with smoking and alcohol
consumption, two factors adversely associated with brain aging. In this chapter, we present our
findings on the association of RBA with parity, whose association with brain aging was
unclear.
Previous research on the association between brain structural change and parity
reported inconclusive findings. Hoekzema et al. reported that the volume of certain gray
matter regions was reduced during pregnancy and the reductions did not recover for at
least 2 years postpartum
55
, while others reported that the gray matter restoration
process was evident within the first few months postpartum
56,57
. Most studies on the
association between brain structure and parity had a relatively small sample size
(n<100) and less than three years of postpartum follow-up
55-57
. To date, it is still unclear
if there are any long-term effects of parity on brain structure in the mid-to-old age
population. We hypothesized that there may be an observable association between
parity and RBA and sought to investigate this question using UK Biobank data.
Researchers have also investigated the association between parity and cognitive
function in females, though different conclusions have been found. Some studies found
that parity was associated with better episodic memory and had a protective effect
against Alzheimer's disease (AD)
59,83
. Conversely, parity has been associated with poorer
word recall and Mini-Mental State Exam scores, and with AD neural pathology
84,85
. A recent
study of approximately 10,000 male and female subjects found an association between
the number of offspring and cognitive function in later life, including memory and
executive function, and suggested that socioeconomic status largely accounted for the
association
86
.
On the other hand, having offspring leads to significant life changes in both females and
males, all of which may impact the brain. For example, among low-parity men and
women, more frequent use of alcohol and tobacco was observed
87
. Children might serve
as a 'bridge' connecting parents to more social and community activities
88
. Adult
children can provide parents with emotional and social support, as well as instrumental
support such as shopping and housework
89,90
. Modig et al. reported that having
offspring was associated with lower mortality risk in both sexes. Interestingly, the
differences in death risks between subjects with and without offspring were slightly
larger for men than for women
91
. Therefore, we hypothesized that lifestyle and
environmental factors accompanying having offspring, other than pregnancy history,
might also play a role in the association between parity and wellbeing of the brain. In
this case, an association between parity and wellbeing of the brain would be observed
in both men and women. For the purposes of our study, we extended the definition of
parity to be the number of offspring for both men and women.
4.2 Results
4.2.1 Demographic information
Brain imaging data were obtained for 6,822 women and 6,762 men. Among female
subjects, 21% were childless, 14% had one child, 44% had two children, 16% three
children, and 5% four or more children. Among male subjects, 20% were childless, 13%
had one child, 45% had two children, 17% three children, and 5% four or more children.
Table 4.1 provides summary statistics for all covariates considered in the analyses, grouped
by sex. Descriptive results for subjects with cognitive function data are shown in
Supplementary Table 4.1.
Table 4.1 Demographic information of subjects included in the analyses for the
association between parity and relative brain age.
Female (n=6822) Male (n=6762)
Number of offspring, n (%)
0 1456 (21.3%) 1351 (20%)
1 920 (13.5%) 843 (12.5%)
2 3020 (44.3%) 3074 (45.5%)
3 1117 (16.4%) 1136 (16.8%)
>=4 309 (4.5%) 358 (5.3%)
Age, mean (SD) 62.2 (7.3) 64 (7.5)
Education, n (%)
College or university degree 3091 (45.3%) 3306 (48.9%)
Other degree 3731 (54.7%) 3456 (51.1%)
BMI, n (%)
Normal 3247 (47.6%) 2227 (32.9%)
Obese 1151 (16.9%) 1238 (18.3%)
Overweight 2354 (34.5%) 3284 (48.6%)
Underweight 70 (1%) 13 (0.2%)
Household income, n (%)
Less than 18,000 986 (14.5%) 654 (9.7%)
18,000 to 30,999 2050 (30%) 1818 (26.9%)
31,000 to 51,999 2036 (29.8%) 2184 (32.3%)
52,000 to 100,000 1379 (20.2%) 1644 (24.3%)
Greater than 100,000 371 (5.4%) 462 (6.8%)
Past tobacco smoking, n (%)
Abstained from smoking 3582 (52.5%) 3068 (45.4%)
Just tried once or twice 1162 (17%) 1162 (17.2%)
Occasionally 879 (12.9%) 787 (11.6%)
On most or all days 1199 (17.6%) 1745 (25.8%)
Alcohol intake, n (%)
Abstained from drinking 413 (6.1%) 276 (4.1%)
Special occasions only 849 (12.4%) 394 (5.8%)
1~3 times a month 918 (13.5%) 640 (9.5%)
1~2 times a week 1950 (28.6%) 1777 (26.3%)
3~4 times a week 1794 (26.3%) 2291 (33.9%)
Daily or almost daily 898 (13.2%) 1384 (20.5%)
Sleep duration, n (%)
Normal 5045 (74%) 5198 (76.9%)
Short 1692 (24.8%) 1493 (22.1%)
Long 85 (1.2%) 71 (1%)
Living with others, n (%)
No 1497 (21.9%) 1049 (15.5%)
Yes 5325 (78.1%) 5713 (84.5%)
Diabetes, n (%)
No 6633 (97.2%) 6405 (94.7%)
Yes 189 (2.8%) 357 (5.3%)
Hypertension, n (%)
No 5482 (80.4%) 4760 (70.4%)
Yes 1340 (19.6%) 2002 (29.6%)
4.2.2 Number of offspring and relative brain age
The number of offspring was significantly associated with RBA in both sexes. In 500
random samplings, median ANOVA p-value for the association between number of
offspring and RBA was <0.001 for both female and male subjects. Among females,
compared with those who were childless, subjects with two offspring were estimated to
have a brain age that was 0.5 years younger, and subjects with three offspring were
estimated to have a brain age that was 0.7 years younger. Among males, subjects with
two offspring were estimated to have a brain age that was 0.6 years younger, and
subjects with three offspring were estimated to have a brain age that was 0.7 years
younger. In female subjects, a significant linear trend (p<0.001) of the association was
observed, while in male subjects a quadratic trend (p<0.001) was observed (Figure 4.2).
Table 4.2 shows the median parameter estimates for the number of offspring across the
500 samplings. No significant interaction was observed between number of offspring
and sex on RBA.
Figure 4.1. Procedure for studying the association between number of offspring and relative brain age using
samplings.
Figure 4.2. Distribution of relative brain age predicted by model with multivariable adjustment over 500 samplings
in female (left) and male (right) subjects.
Table 4.2. Median of coefficient estimations for number of offspring in association with
relative brain age in regression model with multivariable adjustment in 500 samplings.
                       Female                   Male
Number of offspring    Coefficient (95% CI)     Coefficient (95% CI)
Childless (baseline)
1 offspring            -0.21 (-0.66,0.24)       -0.46 (-0.93,0.01)
2 offspring            -0.52 (-0.87,-0.17)*     -0.62 (-0.99,-0.25)*
3 offspring            -0.72 (-1.15,-0.29)*     -0.68 (-1.13,-0.23)*
>=4 offspring          -0.69 (-1.36,-0.02)*     -0.41 (-1.06,0.24)
*p-value<0.05
95% CI is inferred from median standard error in 500 samplings
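The table footnote suggests intervals of the form coefficient plus or minus a multiple of the standard error. A minimal sketch, assuming a normal approximation with z = 1.96; the function name is illustrative, and the standard error of 0.18 is a hypothetical value chosen for illustration, not a number reported in the thesis.

```python
def confidence_interval(coefficient, standard_error, z=1.96):
    """Return a (lower, upper) normal-approximation confidence interval, rounded to 2 decimals."""
    half_width = z * standard_error
    return (round(coefficient - half_width, 2), round(coefficient + half_width, 2))


# Hypothetical standard error, for illustration only.
ci = confidence_interval(-0.52, 0.18)
print(ci)  # (-0.87, -0.17)
```

An interval that excludes zero, as in this example, corresponds to a coefficient marked with an asterisk in the table.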
4.2.3 Number of offspring and cognitive function
In female subjects, the number of offspring was statistically significantly associated with
both response time and visual memory according to regression models that adjusted for
covariates as described in the methods section (ANOVA F-test p-values < 0.001).
Compared with subjects who were childless, those with any number of offspring had
shorter response time and made fewer mistakes in the visual memory task. A non-linear
relationship was observed between number of offspring and response time and
between number of offspring and visual memory, confirmed with statistically significant
quadratic trend tests (p-value = 0.002 for response time, and p-value <0.001 for visual
memory). Figures 4.3 and 4.4 illustrate these trends, and parameter estimates for
number of offspring in the two models are listed in Tables 4.3 and 4.4, respectively.
Pairwise comparisons among parity groups showed that subjects with 2 or 3 children
and those who were childless had the largest differences in both response time and
visual memory, although for visual memory score among females the difference did not
reach statistical significance when adjusted for multiple testing.
In male subjects, the number of offspring was also significantly associated with both
response time and visual memory (ANOVA F-test p-values < 0.001). Compared with
subjects who were childless, those with offspring had shorter response time and made
fewer mistakes in the visual memory task. Similar to females, a quadratic trend existed for
the associations with both outcomes (p-value<0.001), as shown in Figures 4.3 and 4.4;
parameter estimates for number of offspring are listed in Tables 4.3 and 4.4.
Figure 4.3. Number of offspring versus response time predicted by model with multivariable adjustment in female
(left) and male (right) subjects. The unit of response time is millisecond.
Figure 4.4. Number of offspring versus visual memory score in female (left) and male (right) subjects. The unit of
visual memory is log(number of mistakes made in memorizing matching cards).
Table 4.3. Coefficient estimations of number of offspring in association with response
time in regression model with multivariable adjustment. The unit of response time is
millisecond.
Number of offspring    Female, coefficient (95% CI)    Male, coefficient (95% CI)
Childless              baseline                        baseline
1 offspring            -4.18 (-6.05, -2.31)**          -7.45 (-9.4, -5.50)**
2 offspring            -7.30 (-8.77, -5.83)**          -9.24 (-10.76, -7.71)**
3 offspring            -6.24 (-8.00, -4.47)**          -7.93 (-9.76, -6.09)**
>=4 offspring          -4.47 (-7.02, -1.93)**          -10.36 (-12.9, -7.81)**
**p-value<0.001
Table 4.4. Coefficient estimations of number of offspring in association with visual
memory score. The unit of visual memory is log(number of mistakes made in
memorizing matching cards).
Number of offspring    Female, coefficient (95% CI)    Male, coefficient (95% CI)
Childless              baseline                        baseline
1 offspring            -0.01 (-0.02, 0.00)             -0.04 (-0.05, -0.03)**
2 offspring            -0.02 (-0.03, -0.01)**          -0.06 (-0.07, -0.05)**
3 offspring            -0.03 (-0.04, -0.02)**          -0.06 (-0.07, -0.05)**
>=4 offspring          -0.02 (-0.03, 0.00)*            -0.06 (-0.07, -0.04)**
*p-value<0.05
**p-value<0.001
Regression models with integrated data from both sexes indicated significant interaction
between number of offspring and sex on response time and visual memory, where
protective effects of having offspring on cognitive function appeared to be larger in
male subjects than female subjects (p-value of interaction <0.001 for both cognitive
functions).
4.3 Materials and methods
4.3.1 Demographic information
Demographic information, including parity (i.e., number of live births for women, and
number of children fathered for men), age, education, body mass index (BMI), average
total household income, past tobacco smoking frequency, alcohol intake frequency,
sleep duration, living alone or with others, diabetes, and hypertension disease status,
was obtained for all subjects. Parity was further categorized into 5 groups: no offspring,
1 offspring, 2 offspring, 3 offspring, and >=4 offspring. We treated number of offspring
as a categorical variable rather than a continuous one for two reasons: first, the >=4 category
pooled subjects with 4, 5, or more offspring and therefore did not have a linear relationship with
the other categories; second, we hypothesized that the relationship of number of offspring
with cognitive function and relative brain age might not be linear.
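The categorization described above can be sketched as follows. This is an illustrative helper only; the function name and the string labels are assumptions, not identifiers from the original analysis (which was carried out in R).

```python
def parity_group(n_offspring):
    """Map a raw offspring count to one of the five analysis categories."""
    if n_offspring < 0:
        raise ValueError("offspring count cannot be negative")
    # Counts of 4 or more are pooled, so the top category is open-ended.
    return ">=4" if n_offspring >= 4 else str(n_offspring)


# Hypothetical counts for six subjects.
print([parity_group(n) for n in (0, 1, 2, 3, 4, 7)])
```

Because subjects with 4 and with 7 offspring receive the same label, the label cannot be placed on a numeric scale, which is the first reason given for the categorical treatment.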
4.3.2 Study the association between number of offspring and relative brain age
We investigated the association between the number of offspring and RBA by repeating
the following three-step procedure 500 times: First, we randomly split the samples into
sets A and B, each having equal size. Second, using set A, we trained a model to obtain
RBA based on MRI data and applied it to obtain RBA for set B. Third, using set B, we
examined the association between number of offspring and RBA adjusting for multiple
covariates. Repeating the procedure 500 times yields a distribution of the parameter
of interest across rounds, which shows how sensitive the result is to the
random splits used. The analysis procedure is visualized in a flowchart in Figure 4.1. We
analyzed the association in females and males separately and then combined the data
of both sexes to look for interaction between sex and number of offspring in association
with RBA. In the association analyses, linear regression with multivariable adjustment
was used. We adjusted for age, education, body mass index (BMI), average total
household income, past tobacco smoking frequency, alcohol intake frequency, sleep
duration, living alone or with others, diabetes, and hypertension. Pairwise
comparisons of number of offspring and RBA were conducted using the Scheffé test.
Statistical significance was set at α = 0.05 and all regression analyses were conducted
using the R language [67].
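The three-step procedure can be sketched in a few lines. The following is a simplified illustration on synthetic data, not the original R analysis: it assumes a single MRI-like feature, defines RBA as the residual of predicted brain age after regressing it on chronological age, replaces the five-level offspring variable with a hypothetical has-offspring indicator, and omits all covariates. All function and field names are invented for the example.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def simulate_subjects(n, seed):
    """Synthetic data only: one MRI-like feature that shrinks with age."""
    rng = random.Random(seed)
    subjects = []
    for _ in range(n):
        age = rng.uniform(45, 80)
        parent = rng.randint(0, 1)  # 1 = has offspring (hypothetical coding)
        feature = -0.05 * age + 0.3 * parent + rng.gauss(0, 0.3)
        subjects.append({"age": age, "parent": parent, "feature": feature})
    return subjects


def one_round(subjects, rng):
    """One random split: fit models on set A, estimate the association in set B."""
    shuffled = subjects[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    set_a, set_b = shuffled[:half], shuffled[half:]
    # Step 2: brain-age model (feature -> age) and age correction, both fit on A.
    a0, a1 = fit_line([s["feature"] for s in set_a], [s["age"] for s in set_a])
    pba_a = [a0 + a1 * s["feature"] for s in set_a]
    c0, c1 = fit_line([s["age"] for s in set_a], pba_a)
    # RBA in set B: predicted brain age minus its expected value given age.
    rba_b = [(a0 + a1 * s["feature"]) - (c0 + c1 * s["age"]) for s in set_b]
    # Step 3 (simplified): slope of RBA on the has-offspring indicator in B.
    _, effect = fit_line([s["parent"] for s in set_b], rba_b)
    return effect


def repeated_split(subjects, n_rounds=500, seed=0):
    """Repeat the split and return the median effect and all per-round effects."""
    rng = random.Random(seed)
    effects = [one_round(subjects, rng) for _ in range(n_rounds)]
    return statistics.median(effects), effects
```

The spread of the per-round effects plays the role of the sensitivity check described above: a narrow distribution indicates that the estimate does not depend strongly on any particular split.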
4.3.3 Study the association between number of offspring and cognitive function
We first assessed the association between cognitive function (response time and visual
memory) and number of offspring for males and females separately using regression
models. We then combined data of both sexes and used an interaction term between sex
and number of offspring to test whether the overall associations were significantly
different for males and females.
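To make the interaction term concrete, the sketch below builds one row of a design matrix with sex, offspring indicators, and their products. It is an assumed encoding for illustration (childless and female as reference levels, covariates omitted); the names are hypothetical and the original models were fitted in R.

```python
LEVELS = ("1", "2", "3", ">=4")  # childless ("0") is the reference category


def design_row(group, is_male):
    """Intercept, sex, offspring indicators, and sex-by-offspring products."""
    dummies = [1 if group == level else 0 for level in LEVELS]
    interactions = [is_male * d for d in dummies]
    return [1, is_male] + dummies + interactions


# A male subject with 2 offspring activates the sex term, the "2" indicator,
# and the corresponding interaction column.
print(design_row("2", 1))
```

Testing the interaction then amounts to asking whether the four product columns jointly improve the fit over a model without them.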
4.4 Discussion
We studied the association of number of offspring with RBA, response time, and visual
memory in the UK Biobank cohort.
Most previous studies on the long-term association between parity and wellbeing of the
brain only evaluated cognitive function [59,83,85,92]. Our study contributes new information
because we further looked into the association between parity and RBA, a biomarker of
structural aging of the brain, and observed findings that corroborated the association
between parity and cognitive function. In both sexes, subjects with any number of
offspring had younger-appearing brains than subjects with no offspring. In male subjects,
the association between parity and RBA followed a "U-shape" pattern, where subjects
with 2 or 3 offspring had younger appearing brain compared to subjects with 0, 1, or
>=4 offspring. That was similar to the association observed between parity and cognitive
function. In comparison, a linear relationship was observed between number of
offspring and RBA in females. This linear association may be explained by the hormonal
fluctuation specifically linked to women's pregnancy history and remains to be further
investigated.
We observed that in both males and females, having offspring is associated with better
visual memory and faster response time after adjusting for age, education, BMI, income,
smoking, and other factors. One possible explanation is that parenthood naturally leads
to interactions and activities with children. Moreover, children might become a 'bridge'
connecting to more social and community activities [88]. Previous studies have shown that
social interactions are protective of cognitive function [93,94]. On the other hand, child
rearing is also associated with increased financial and physical stress [95,96]. Previous
studies also showed association between parity and increased cardiovascular disease
risk and increased BMI in both sexes [97,98]. This could possibly explain our observation of a
"U-shape" association; cognitive function did not monotonically improve with increasing
number of offspring.
Further, the association between cognitive function and parity was significantly stronger
in male subjects than in female subjects. Since males do not experience the physical
process of pregnancy, our observation further suggests that lifestyle factors
accompanying having offspring may play an important role in the association between
parity and cognitive function. Our finding is corroborated by a study by Zhang et al. that
showed that single men who were childless had significantly higher rates of loneliness
and depression compared with women in comparable circumstances [99].
Relative strengths of the study are its large sample size, inclusion of both male and
female subjects, and the observed association between RBA and parity, which further
supported the association between cognitive function and parity. Our study also has a
few limitations. First, while we adjusted for a number of socioeconomic, lifestyle, and
health covariates in our model, we were not able to account for the possibility that they
may be time varying. Details of these covariates in early life could be useful for
understanding other underlying issues related to cognitive function and structural aging
of the brain. Second, the study is an observational study, so it is impossible to conclude
that having offspring is leading to improved brain health. It could also be possible that
those who have poor underlying health have fewer opportunities to have offspring.
Third, since only a small proportion of subjects had 5 or more children, we categorized
number of offspring into 0, 1, 2, 3, and >=4, as in previous studies using this variable [86,97],
and did not study the difference among those who have 4, 5, or more offspring. Fourth,
brain health is only a small part of overall health condition of the body. Although we
found that having offspring is associated with better visual memory, faster response
time, and a younger looking brain, we may not conclude that having offspring is
associated with improved wellness of the whole body.
In conclusion, we observed robust evidence that parity is associated with visual
memory, response time, as well as RBA in both sexes. Our observation suggests that
lifestyle factors associated with having offspring, likely shared by both sexes, contribute
to these associations. At the same time, we observed different detailed association
patterns within women and men, which suggest the importance of studying the
association between parity and wellbeing of the brain in the context of sex.
Supplementary items
Supplementary Table 4.1. Demographic information of subjects included in the analyses
for the association between parity and cognitive function.
Female (n=160,077) Male (n=143,119)
Number of offspring, n (%)
0 29931 (18.7%) 28808 (20.1%)
1 20621 (12.9%) 18059 (12.6%)
2 72997 (45.6%) 63598 (44.4%)
3 27989 (17.5%) 24210 (16.9%)
>=4 8539 (5.3%) 8444 (5.9%)
Age, mean (SD) 56.7 (7.9) 57.5 (8.1)
Education, n (%)
College or university degree 53736 (33.6%) 51636 (36.1%)
Other degree 106341 (66.4%) 91483 (63.9%)
BMI, n (%)
Normal 64032 (40%) 35033 (24.5%)
Obese 36119 (22.6%) 35719 (25%)
Overweight 58910 (36.8%) 72143 (50.4%)
Underweight 1016 (0.6%) 224 (0.2%)
Household income, n (%)
Less than 18,000 36161 (22.6%) 24930 (17.4%)
18,000 to 30,999 42706 (26.7%) 35129 (24.5%)
31,000 to 51,999 42046 (26.3%) 40240 (28.1%)
52,000 to 100,000 31273 (19.5%) 33775 (23.6%)
Greater than 100,000 7891 (4.9%) 9045 (6.3%)
Past tobacco smoking, n (%)
Abstained from smoking 75167 (47%) 54420 (38%)
Just tried once or twice 26613 (16.6%) 23376 (16.3%)
Occasionally 22917 (14.3%) 20014 (14%)
On most or all days 35380 (22.1%) 45309 (31.7%)
Alcohol intake, n (%)
Abstained from drinking 11034 (6.9%) 6084 (4.3%)
Special occasions only 20537 (12.8%) 8224 (5.7%)
1~3 times a month 21052 (13.2%) 12147 (8.5%)
1~2 times a week 43265 (27%) 37618 (26.3%)
3~4 times a week 36805 (23%) 41055 (28.7%)
Daily or almost daily 27384 (17.1%) 37991 (26.5%)
Sleep duration, n (%)
Normal 122086 (76.3%) 108010 (75.5%)
Short 35724 (22.3%) 33248 (23.2%)
Long 2267 (1.4%) 1861 (1.3%)
Living with others, n (%)
No 31468 (19.7%) 21886 (15.3%)
Yes 128609 (80.3%) 121233 (84.7%)
Diabetes, n (%)
No 154801 (96.7%) 133868 (93.5%)
Yes 5276 (3.3%) 9251 (6.5%)
Hypertension, n (%)
No 123179 (76.9%) 99636 (69.6%)
Yes 36898 (23.1%) 43483 (30.4%)
Chapter 5 Improving relative brain age estimate with a
convolutional neural network model and its implication on
identifying genetic factors associated with brain aging
5.1 Introduction
Besides lifestyle habits, genetic factors are also involved in brain aging. In this chapter,
we discuss the association between RBA and genetic factors.
A previous study analyzed brain imaging data and chronological age information from
twins and suggested that the brain aging process was heritable [34]. Two recent studies by
Jonsson et al. and by Ning et al. investigated the association between single nucleotide
polymorphisms (SNPs) and brain age using UK Biobank data, and both highlighted the
association between MAPT gene and brain age. Jonsson et al. used a convolutional
neural network (CNN) model for predicting brain age where the predictor was the whole
3D MRI scan [49], while Ning et al. used a linear regression model for predicting brain age,
where predictors were brain morphometric measurements from MRI [100]. These two
studies both reported mean absolute error (MAE) of around 3.5 years between the
predicted brain age and the true chronological age.
Recently, Langner et al. trained a CNN for predicting age of the body based on whole-body
MRI of about 20,000 subjects. The MAE between predicted body age and true
chronological age reached 2.5 years [46]. As a comparison, when training the CNN for
predicting brain age, Jonsson et al. used a training set of fewer than 2,000 subjects, and
reached MAE of 3.6 years [49]. We hypothesized that we might improve the brain age
prediction accuracy through training a CNN model based on imaging data, where the
training set sample size is close to 20,000. Furthermore, a more accurate predicted brain
age not only provides a more reliable estimation of brain aging status, but will also allow
identification of stronger association between genetic factors and brain age.
In this study, we trained a CNN for predicting brain age, then obtained RBA and studied
genetic factors associated with RBA. To fully utilize the data, we crossed over the
samples to train models for obtaining RBA and then studied association between SNPs
and RBA. We compared the accuracy of the predicted brain age (PBA) from the CNN
model side by side with that from a regression model. We also compared the genetic factors identified using
RBA derived from the CNN and using RBA derived from the regression model.
5.2 Results
5.2.1 Predicted brain age accuracy from CNN model and regression model
The MAE of the CNN model was 2.8 years in the evaluation set. As a comparison, the MAE of
the linear regression model trained and evaluated on the same data set was 3.7 years. The CNN model
had a significantly smaller MAE than the regression model (paired t-test p-value < 0.05).
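For reference, the two quantities used in this comparison can be sketched as follows. The code assumes that the paired t-test was applied to per-subject absolute errors of the two models, which the text does not state explicitly; the function names and the example numbers are illustrative.

```python
import math
import statistics


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and chronological age."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def paired_t_statistic(errors_a, errors_b):
    """t statistic for the mean of per-subject differences (errors_a - errors_b)."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    return statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


# Hypothetical ages for three subjects, in years.
print(mean_absolute_error([60, 70, 55], [62, 67, 55]))
```

A positive t statistic here would mean that the first model has larger errors on average than the second for the same subjects.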
We observed that when using a CNN model for predicting brain age, the metric PBA-CA
was negatively associated with CA (i.e., older subjects had negative PBA-CA while
younger subjects had positive PBA-CA; Figure 5.1). This was similar to the association
between PBA-CA and CA when PBA was derived from a linear regression model (as
shown in Chapter 2). Therefore, we derived RBA after obtaining PBA using CNN model
(RBA is described in Chapter 2), and then assessed the association between RBA and
SNPs.
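The age dependence of PBA-CA, and its removal, can be illustrated with a small example. The sketch assumes that RBA is the residual of PBA after a linear regression on CA, in line with the age adjustment described in this chapter; the exact definition is given in Chapter 2, and the function names and the four data points below are hypothetical.

```python
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def relative_brain_age(pba, ca):
    """Residual of PBA after regressing it on chronological age (CA)."""
    intercept, slope = fit_line(ca, pba)
    return [p - (intercept + slope * c) for p, c in zip(pba, ca)]


# Hypothetical subjects: predictions are pulled toward the mean age.
ca = [50, 60, 70, 80]
pba = [55, 60, 69, 74]
gap = [p - c for p, c in zip(pba, ca)]  # PBA-CA: positive when young, negative when old
rba = relative_brain_age(pba, ca)       # residuals, uncorrelated with CA
```

In this toy example the raw gap falls from +5 to -6 years across the age range, while the residuals have no linear trend in age by construction.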
Figure 5.1. Association between chronological age and predicted brain age from a convolutional neural network
model.
5.2.2 Genetic factors associated with brain aging
In total, 234 SNPs showed significant association with RBA (association p-value < 5E-8).
Figure 5.2 is the Manhattan plot showing association p-value between SNPs and RBA
across the genome. SNP rs199533, which is located in the NSF gene on chromosome 17,
showed the most significant association with RBA (p-value = 3E-12). Multiple SNPs close
to rs199533 also showed significant association with RBA, some of which are located in
the MAPT gene (Figure 5.3). Further, two SNPs on chromosome 4 showed significant
association with RBA, although they are not in any gene region (Figure 5.4).
Figure 5.2. Manhattan plot for the association p-values between SNPs and relative brain age across the genome.
The red line indicates the genome-wide significant threshold on p-value (i.e., 5E-8). The blue line indicates p-value
of 0.05.
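A Manhattan plot displays each p-value on a -log10 scale, so the two reference lines sit at fixed heights. The short sketch below computes those heights and applies the genome-wide threshold; only the rs199533 p-value comes from the text, and the other two SNP names and p-values are made up for illustration.

```python
import math

GENOME_WIDE_P = 5e-8


def neg_log10(p):
    """Height of a SNP on the Manhattan plot's vertical axis."""
    return -math.log10(p)


def significant_snps(p_values, threshold=GENOME_WIDE_P):
    """Names of SNPs whose p-value is below the threshold, sorted by name."""
    return sorted(snp for snp, p in p_values.items() if p < threshold)


p_values = {"rs199533": 3e-12, "snp_a": 4e-9, "snp_b": 2e-5}  # last two are hypothetical
print(significant_snps(p_values))
print(round(neg_log10(GENOME_WIDE_P), 2), round(neg_log10(0.05), 2))
```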
Figure 5.3. Regional visualization of a 2-Mb locus on Chromosome 17 where the SNPs showing genome-wide
significant associations with relative brain age are located.
Figure 5.4. Regional visualization of a 1-Mb locus on Chromosome 4 where the SNPs showing genome-wide
significant associations with relative brain age are located.
We also obtained RBA with a linear regression model using the same set of data and
assessed the association between RBA and SNPs. In total, 208 SNPs showed significant
association with RBA (association p-value < 5E-8). We observed that although SNPs
showing significant association with RBA from the CNN model also showed strong
association with RBA from the regression model, the association p-values were more significant
when using RBA from the CNN model (Figure 5.5). Supplementary Figure 5.1 is a
Manhattan plot for the association between SNPs and RBA derived from the linear
regression model.
Figure 5.5. Comparison of genetic signals identified using RBA from a linear regression model and using RBA from a
convolutional neural network model.
5.3 Materials and methods
5.3.1 Summary of samples used
We used 16,998 subjects who had both quality-controlled brain imaging data and
genetic data available. The quality control on brain imaging data has been described in
Chapter 2. Details of the genotyping and genotype calling procedures are described
elsewhere [101]. Quality control was done at both the SNP level and the sample level. Our quality
control on SNPs ensured that all SNPs had a missing rate less than 0.02 and passed the
Hardy-Weinberg exact test (i.e., Hardy-Weinberg equilibrium p-value >= 1E-6). Quality control
on the samples ensured that all subjects had a genotyping rate greater than 0.98, had a
heterozygosity rate within ±3 standard deviations of the mean, had matched reported gender and
genetic gender, and were of European ancestry (according to both self-reported
ethnicity and genetic ethnicity based on principal component analyses). Related
individuals (i.e., kinship coefficient >0.1) were further removed. Genotype data of
675,827 autosomal SNPs for 16,998 subjects were used for further association analyses.
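The quality-control thresholds above translate directly into filter predicates. The sketch below is illustrative only: the record fields and function names are assumptions, it does not reproduce the tools actually used, and the relatedness filter (kinship coefficient > 0.1) is omitted because it operates on pairs of subjects.

```python
import statistics


def snp_passes_qc(missing_rate, hwe_p):
    """SNP-level filter: missing rate < 0.02 and Hardy-Weinberg p-value >= 1E-6."""
    return missing_rate < 0.02 and hwe_p >= 1e-6


def heterozygosity_bounds(het_rates, n_sd=3):
    """Mean ± n_sd standard deviations of the cohort heterozygosity rates."""
    mean = statistics.fmean(het_rates)
    sd = statistics.stdev(het_rates)
    return mean - n_sd * sd, mean + n_sd * sd


def sample_passes_qc(sample, het_bounds):
    """Sample-level filter applied to one subject record (a plain dict)."""
    low, high = het_bounds
    return (
        sample["genotyping_rate"] > 0.98
        and low <= sample["het_rate"] <= high
        and sample["reported_sex"] == sample["genetic_sex"]
        and sample["european_ancestry"]
    )
```

A subject is retained only if every condition holds, so the order of the checks does not affect the result.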
5.3.2 Obtaining relative brain age based on a CNN model
5.3.2.1 Training and evaluating CNN models
We used a five-fold cross-validation strategy for training a CNN model for predicting
brain age and for doing SNP-RBA association analyses. To be specific, we randomly split
the subjects into five sets, where each set had 20% of subjects, and had the same
distribution of age and gender. We used 80% of data for training a CNN model for
predicting brain age, and then further trained a linear model to adjust PBA for the effect
of chronological age to obtain RBA (as described in Chapter 2). We then applied the
trained CNN model and linear regression model to the remaining 20% of samples and
obtained PBA and RBA of these samples. We crossed over the split samples and
obtained RBA for each set of the 20% of subjects. In the five-fold cross-validation, the
median mean absolute error (MAE) between PBA from the CNN and chronological age
was 2.7 years (range 2.5 to 3.1). We then combined RBA of all evaluation set subjects
for genetic association analyses. Figure 5.6 illustrates the procedure for obtaining RBA
through a 5-fold cross-validation strategy and then carrying out RBA-SNP association
analyses.
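The splitting and cross-over logic can be sketched as follows. This is an assumed implementation for illustration: subjects are stratified by sex and by 5-year age band (the band width is an assumption, as are all names), and model training itself is left out.

```python
import random
from collections import defaultdict


def stratified_folds(subjects, n_folds=5, seed=0):
    """Assign each subject a fold index, balancing sex and age band across folds."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for idx, subject in enumerate(subjects):
        strata[(subject["sex"], int(subject["age"] // 5))].append(idx)
    folds = [None] * len(subjects)
    offset = 0  # continue the round-robin across strata so fold sizes stay equal
    for key in sorted(strata):
        members = strata[key]
        rng.shuffle(members)
        for position, idx in enumerate(members):
            folds[idx] = (offset + position) % n_folds
        offset += len(members)
    return folds


def cross_over(folds, n_folds=5):
    """Yield (train_indices, eval_indices); every subject is evaluated exactly once."""
    for k in range(n_folds):
        train = [i for i, f in enumerate(folds) if f != k]
        evaluate = [i for i, f in enumerate(folds) if f == k]
        yield train, evaluate
```

Because each subject appears in exactly one evaluation set, the RBA values pooled for the association analysis always come from a model that did not see that subject during training.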
Figure 5.6. Procedure for obtaining RBA through a 5-fold cross-validation strategy and then carrying out RBA-SNP
association.
5.3.2.2 Details of the CNN model
We used a CNN model with ResNet structure [102] implemented in NiftyNet for predicting
brain age based on 3D MRI data (https://niftynet.io). We chose ResNet structure
because empirical evidence has shown that this structure is easy to optimize and
performs well in imaging data analyses [102]. We down-sampled the MRI data from
182×218×182 to 91×109×91 voxels before training the CNN, due to GPU memory
limitations. The ResNet model provided by NiftyNet was composed of a convolution
layer, six bottleneck layers, a fully connected layer, and an output layer. Multiple shortcut
connections were added among bottleneck layers. NiftyNet allows the user to
specify the number of bottleneck layers and the number of filters in each layer. We
specified the structure as follows. The initial convolution layer had 64 filters. There were
six bottleneck layers: bottleneck layers 1 and 2 had 128 filters;
bottleneck layers 3 and 4 had 256 filters; bottleneck layers 5 and
6 had 512 filters. The model was trained on two GPUs with a learning
rate of 0.0001. The CNN structure is illustrated in Figure 5.7.
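The effect of the down-sampling on input size, and the filter configuration, can be written out explicitly. The sketch below uses plain Python and does not call NiftyNet; the layer names are illustrative labels rather than NiftyNet identifiers, and the factor of 2 per axis is inferred from the two shapes given in the text.

```python
import math

ORIGINAL_SHAPE = (182, 218, 182)


def downsample_shape(shape, factor=2):
    """Shape after keeping one voxel out of every `factor` along each axis."""
    return tuple(dim // factor for dim in shape)


def voxel_count(shape):
    """Total number of voxels in a 3D volume."""
    return math.prod(shape)


# Filter counts as described in the text; the names are illustrative only.
LAYER_FILTERS = [
    ("initial_conv", 64),
    ("bottleneck_1", 128), ("bottleneck_2", 128),
    ("bottleneck_3", 256), ("bottleneck_4", 256),
    ("bottleneck_5", 512), ("bottleneck_6", 512),
]

small = downsample_shape(ORIGINAL_SHAPE)
reduction = voxel_count(ORIGINAL_SHAPE) // voxel_count(small)
```

Halving each axis reduces the number of input voxels eightfold, which is what makes the volumes fit in GPU memory.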
Figure 5.7. Structure of the convolutional neural network model for predicting brain age based on MRI data. Curved
arrows indicate skip connections between layers.
5.3.3 Genetic association analyses
We used the linear regression model implemented in PLINK [103] for the genotypic test, adjusting
for gender and the first three genetic principal components of ancestry, to test the
association between SNPs and RBA.
5.4 Discussion
In this study, we trained a CNN for obtaining PBA. The MAE between PBA and CA was
2.7 years. We observed that the metric PBA-CA was negatively associated with CA when using
the CNN model (Figure 5.1). This is similar to the pattern between PBA-CA and CA when
using a regression model for predicting brain age. That is probably due to the nature of
the CNN model: the last layer of the CNN model we used was essentially a linear
regression model. The statistical reason for the negative association between PBA-CA
and CA was discussed by Smith et al. [39]. Therefore, we further derived RBA after
obtaining PBA and assessed association between RBA and SNPs.
The SNP most significantly associated with RBA was located in the NSF gene. NSF encodes
an ATPase that is involved in cellular membrane fusion events, and is associated with
neuronal intranuclear inclusion disease [104]. It is also worth noting that SNPs in the MAPT
gene showed significant association with RBA. Previous studies showed that
mutations in MAPT, which encodes tau protein, are associated with dementia and
Parkinson's disease [105]. Therefore, it is likely that both the NSF and MAPT genes are functional
genes for brain aging. Further, functions of other SNPs showing significant association
with RBA remain to be investigated.
A CNN model was more accurate than a regression model in predicting brain age. The
genetic signals identified to be associated with RBA were very similar when using a CNN
model and using a regression model (Figure 5.2 and Supplementary Figure 5.1).
Nevertheless, the associations were stronger (i.e., with more significant p-value) when
using RBA derived from the CNN model than using RBA derived from the linear
regression model (Figure 5.5).
We also note that although the CNN model gave more accurate prediction of brain age
than the regression model, and therefore allowed identifying more significant
association between SNPs and RBA, the additional cost of time to train CNN models was
not trivial. For our training process, a CNN model took about two days to converge (with
two GPUs). As a comparison, training a regression model with FreeSurfer measurements
as predictors only took two minutes on one CPU.
In summary, a CNN model was able to more accurately predict brain age than a
regression model. With a more accurate PBA, and therefore a more accurate RBA, we
identified stronger association between SNPs and RBA. A more accurate PBA is likely to
lead to better understanding of the association between brain aging and other
environmental or lifestyle factors.
Supplementary Figure
Supplementary Figure 5.1. Manhattan plot for the association p-values between SNPs
and relative brain age across the genome, where RBA is derived from a linear regression
model. The red line indicates the genome-wide significant threshold on p-value (i.e., 5E-8).
The blue line indicates p-value of 0.05.
Chapter 6 Predicting Alzheimer's disease risk using both
neural network and regression models
6.1 Introduction
In previous chapters, we focused on cognitively normal subjects and studied factors
associated with 'normal' brain aging. In this chapter, we describe our research on
Alzheimer's disease risk prediction. To be specific, we used a neural network (NN) model
to classify Alzheimer's disease (AD) patients and healthy controls, and further used the NN to
predict progression from mild cognitive impairment (MCI) to AD. We also tried to
understand the 'knowledge' that the NN model learned for classifying AD.
AD is characterized by specific brain structural changes and genetic risk factors [21,22,59].
Measurements of structural changes based on MRI scans have previously been used to
classify AD patients versus cognitively normal (CN) subjects and to predict the risk of
progression from mild cognitive impairment (MCI) to AD. Statistical classification
models, such as support vector machines [14-17,24,25], linear discriminant analysis [23,25], and
regression models [26-28], have been successfully trained for these tasks. On the other hand,
AD risk is also affected by genetic variants an individual carries, which can be measured
accurately from birth [59]. Previous studies have also used genome-wide genetic
information alone to predict AD occurrence with a logistic regression (LR) model [106,107].
With the growing availability of data that includes both brain imaging and genetic data
for AD and CN subjects, researchers have combined the structural imaging data and
genetic data for these AD classification and prediction tasks [15,108,109]. Existing studies in
AD classification and prediction have relied on statistical models that primarily include
additive effects of the included structural imaging and genetic features. However, the
estimation of AD risk may be more accurate if interactions among brain and genetic
features are also included in these models [106,110,111]. To the best of our knowledge, no
research study has systematically investigated these interactions while building
statistical models for classifying AD subjects.
To capture the joint effects of brain and genetic features in AD risks as well as the
interactions among them, we chose neural network (NN) as our modeling tool (Hinton
and Salakhutdinov, 2006). Because NNs can model non-additive, interacting effects of
their inputs, they are well suited for investigating diseases
with multifactorial pathophysiology and etiology, like AD, especially as datasets of
neuroimaging and genetic data grow in volume.
While NN models have been exceptionally successful at making predictions, they are
typically applied as "black-box" tools and not used to reveal the reasoning behind the
decisions. As a result, although NNs have been applied to predict AD risks [14,112], the
important brain and genetic features and their interactions captured by the models
remain elusive. Recent advances in methods for interpretation of NN models allow
researchers to identify these salient and interacting features [113-116]. We take advantage
of similar methods to not only train a NN model for AD classification, but also to
investigate this model to identify the important predictors and interactions in the
model.
6.2 Results
6.2.1 Demographic information
In total, 138 AD patients, 225 CN subjects, and 358 MCI patients who had quality-
controlled quantitative brain structural data and genetic data were included in the
study. Among the 358 MCI patients, 166 progressed from MCI to AD during follow-up;
192 did not progress during at least 24 months of follow-up. Demographic information
of the subjects used in our study is summarized in Table 6.1.
Table 6.1. Demographic characteristics of Alzheimer’s disease, mild cognitive
impairment, and healthy control subjects.
Diagnostic Number Female|Male Age (median[min-max])
Cognitively Normal 225 113|112 74[56-90]
Alzheimer's Disease 138 60|78 75[56-91]
Mild Cognitive Impairment 358 148|210 73[55-88]
6.2.2 Models’ performance in classifying AD and in predicting progression from MCI to
AD
We trained NN models where brain morphometric and genetic data were used as
predictors for AD (Figure 6.1). NN models that included both brain morphometric and
genetic data performed significantly better than those that included either alone. Our
analyses showed that when both brain and SNP features were included as predictors,
the 100 sub-models of the best-performing NN model had a median AUC of 0.835 in
predicting MCI progression. When only SNP features or brain features were used as
predictors, the best-performing NN model had a median AUC of 0.689 and 0.820,
respectively. Performance of the 100 NN sub-models with both brain and SNP features
as predictors was significantly higher than that of the 100 NN sub-models with only
SNP features or only brain features as predictors (t-test p-value <2E-16 for both
comparisons).
Further, the best-performing NN model had a modestly but significantly higher AUC than
the best-performing LR model where both brain and SNP features were included as
predictors (the 100 sub-models of the best-performing LR model had a median AUC of
0.824 in predicting MCI progression; t-test p-value <2E-16), suggesting that the NN captured
interactions among brain and SNP features which improved the model's performance.
Figure 6.2 shows the AUC of the 100 sub-models of the best-performing NN and LR
models where only SNP features or only brain features were used as predictors and
where both features were used as predictors. We also note that random sampling of the
training and validation data affected the models' performance: the 100 sub-models of
the best NN model had an AUC between 0.811 and 0.846 in predicting MCI progression.
Both NN and LR models had high accuracy in classifying AD and CN subjects. According
to the internal validation data, the best-performing NN model with both brain and SNP
predictors had a median AUC of 0.948 while the best-performing LR model with both
brain and SNP predictors had a median AUC of 0.945.
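For readers less familiar with the metric, the AUC and its median over sub-models can be computed as sketched below. This is a generic rank-based implementation with illustrative names and made-up scores; it is not the code used to produce the reported values.

```python
import statistics


def auc(scores, labels):
    """Probability that a random positive case scores above a random negative case."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def median_auc(per_model_scores, labels):
    """Median AUC across several sub-models scored on the same subjects."""
    return statistics.median(auc(scores, labels) for scores in per_model_scores)


# Hypothetical scores from two sub-models for four subjects (1 = progressed to AD).
labels = [1, 1, 0, 0]
print(median_auc([[0.9, 0.8, 0.3, 0.1], [0.9, 0.2, 0.8, 0.1]], labels))
```

Reporting the median over 100 sub-models, as done here, summarizes performance without depending on a single random split of training and validation data.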
Figure 6.1. Structure of a neural network model with two hidden layers.
Figure 6.2. Accuracy (measured as AUC) of the best-performing neural network and logistic regression models in
predicting progression from mild cognitive impairment to Alzheimer’s disease.
6.2.3 Important brain and SNP features used by NN model
Examination of the NN model with the highest AUC in the testing data revealed brain
and SNP features that were important in the model. The volumes of the left middle
temporal gyrus, the left hippocampus, the right entorhinal cortex, the left inferior lateral
ventricle, and the right inferior parietal lobule were the five most important brain features
in the model (i.e., these features had the largest absolute weights). As for genetic
features, the APOE ε4 risk allele dosage, a major AD genetic risk factor [59,117], had the
highest weight in the NN model. Other genetic features did not have weights as large as
the aforementioned features. Table 6.2 lists the weights of the 5 most important brain
and genetic features in the best-performing NN model.
Table 6.2. Weight of important features for classifying AD patients and CN subjects in
the neural network model.
Brain features Weight Genetic features Weight
Left Middle Temporal Gyrus 0.60 APOE ε4 dosage 0.86
Left Hippocampus 0.56 rs10948363 (CD2AP) 0.29
Right Entorhinal Cortex 0.52 rs7274581 (CASS4) 0.29
Left Inferior Lateral Ventricle 0.48 rs17125944 (FERMT2) 0.24
Right Inferior Parietal Lobule 0.40 rs4147929 (ABCA7) 0.22
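Ranking features by absolute weight, as done for Table 6.2, is straightforward once a single weight per feature is available. The sketch below uses the values from the table; how a per-feature weight is derived from a network with hidden layers is not specified in this section, so the code only illustrates the ranking step, and the function name is hypothetical.

```python
BRAIN_WEIGHTS = {
    "Left Middle Temporal Gyrus": 0.60,
    "Left Hippocampus": 0.56,
    "Right Entorhinal Cortex": 0.52,
    "Left Inferior Lateral Ventricle": 0.48,
    "Right Inferior Parietal Lobule": 0.40,
}
GENETIC_WEIGHTS = {
    "APOE e4 dosage": 0.86,
    "rs10948363 (CD2AP)": 0.29,
    "rs7274581 (CASS4)": 0.29,
    "rs17125944 (FERMT2)": 0.24,
    "rs4147929 (ABCA7)": 0.22,
}


def top_features(weights, k=5):
    """Feature names ordered by decreasing absolute weight."""
    return sorted(weights, key=lambda name: abs(weights[name]), reverse=True)[:k]


print(top_features(BRAIN_WEIGHTS, k=3))
print(top_features(GENETIC_WEIGHTS, k=1))
```

Using the absolute value means that a feature with a large negative weight would rank as highly as one with an equally large positive weight.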
6.2.4 Interactions among brain and genetic features captured by NN model
Our analyses of interactions within the best-performing NN model revealed that both
brain and genetic features were involved in strong interactions. For example, the
strongest interaction captured by NN model was between the right parahippocampal
gyrus and the right lateral occipital gyrus. The second strongest interaction was between
the right banks of the superior temporal sulcus and the left posterior cingulate. The
interaction between the SNP rs10838725 and the left lateral occipital gyrus was the
third strongest interaction. Figure 6.3 shows the pairwise interaction among all the brain
and genetic features used in the NN model.
Figure 6.3. Strength of pairwise interactions among all the brain and genetic features used in the neural network
model.
6.3 Materials and Methods
6.3.1 Description of ADNI subjects in the study
We used brain imaging and genetic data from the Alzheimer's Disease Neuroimaging
Initiative (ADNI) database (http://adni.loni.usc.edu), a large dataset established in 2004
to measure the progression of healthy and cognitively impaired participants with brain
scans, biological markers, and neuropsychological assessments
20
. A goal of ADNI has
been to test whether serial magnetic resonance imaging (MRI), positron emission
tomography (PET), other biological markers, and clinical and neuropsychological
assessment can be combined to measure the progression of mild cognitive impairment
(MCI) and early Alzheimer’s disease (AD).
In total, 138 AD patients, 225 CN subjects, and 358 MCI patients who had quality-
controlled quantitative brain structural data and genetic data were included. AD
subjects were included if they maintained AD diagnosis throughout their follow-ups.
Similarly, healthy control subjects were included if they maintained healthy control
diagnosis throughout their follow-ups. We did not require 24 months of follow-up for
the AD and healthy control subjects. Only MCI subjects who stayed as MCI or progressed
to AD were considered. MCI subjects who reverted to a healthy control diagnosis were
excluded. Among the 358 MCI patients, 166 progressed from MCI to AD during follow-
up; 192 did not progress during at least 24 months of follow-up.
6.3.2 MRI brain imaging data and genotype data
Baseline imaging data were obtained using 1.5T or 3T MRI. Cortical reconstruction and
volumetric segmentation were performed using FreeSurfer
65,118
, and the resulting measurements were obtained from the
ADNI database. Subjects who had good overall segmentation and passed a visual quality
control process were used in our analyses. Based on prior knowledge of brain regions
affected by AD
21,22
, we included volume measurements for the following 16 regions as
potential predictors in our models: hippocampus, entorhinal cortex, parahippocampal
gyrus, superior temporal gyrus, middle temporal gyrus, inferior temporal gyrus,
amygdala, precuneus, inferior lateral ventricle, fusiform, posterior cingulate, superior
parietal lobule, inferior parietal lobule, caudate, banks of the superior temporal sulcus, and
lateral occipital gyrus.
ADNI subjects were genotyped on three different platforms (i.e., Illumina Human 610-
Quad, Illumina Human Omni Express and Illumina Omni 2.5M). We merged the
genotype data from the three platforms. Quality control ensured that 1) all subjects
were of European ancestry and had genotyping rate greater than 0.95; 2) all SNPs had
missing rate less than 0.05 and passed Hardy-Weinberg exact test (i.e., p-value >= 1E-6).
We further extracted the genotype of the APOE ε4 risk allele and 19 SNPs reported to be
significantly associated with AD in a previous genome-wide association study
59
. Missing
genotypes of AD-associated SNPs were imputed with IMPUTE2 using the 1000 Genomes
Project data as the reference panel
119,120
.
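The quality-control thresholds described above can be sketched as simple filters. This is an illustrative sketch only, not the pipeline used in this study: the function names, the toy genotype encoding (allele dosages, with None marking a missing call), the precomputed Hardy-Weinberg p-values, and the choice to filter subjects before SNPs are all assumptions.

```python
def call_rate(calls):
    """Fraction of non-missing genotype calls (None marks a missing call)."""
    return sum(c is not None for c in calls) / len(calls)


def qc_filter(genotypes, hwe_p, min_sample_call_rate=0.95,
              max_snp_missing=0.05, min_hwe_p=1e-6):
    """Apply the thresholds from the text to toy data.

    genotypes: dict mapping subject id -> list of dosages (one per SNP).
    hwe_p:     list of per-SNP Hardy-Weinberg p-values (assumed precomputed).
    Returns (kept_subjects, kept_snp_indices).
    """
    # 1) Keep subjects with genotyping rate greater than 0.95.
    kept_subjects = [s for s, g in genotypes.items()
                     if call_rate(g) > min_sample_call_rate]
    # 2) Keep SNPs with missing rate < 0.05 and HWE p-value >= 1E-6.
    kept_snps = []
    for j in range(len(hwe_p)):
        column = [genotypes[s][j] for s in kept_subjects]
        missing = 1.0 - call_rate(column)
        if missing < max_snp_missing and hwe_p[j] >= min_hwe_p:
            kept_snps.append(j)
    return kept_subjects, kept_snps


# Toy data (hypothetical, not ADNI genotypes)
toy = {"a": [0, 1, 2], "b": [1, None, 0], "c": [2, 2, 1]}
subjects, snps = qc_filter(toy, hwe_p=[0.5, 1e-8, 0.2])
```

In this toy example, subject "b" is dropped for a low genotyping rate and the second SNP is dropped for failing the Hardy-Weinberg threshold.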
6.3.3 Neural network and logistic regression models
We trained NN and LR models to classify AD versus healthy control subjects given brain
and SNP features as predictors. After training the models, we applied them to MCI
subjects to predict each MCI subject's risk of progression to AD.
A LR model assumes that each predictor contributes additively to the subject’s log odds
of AD (see Equation 1).
$$\log \frac{p}{1-p} = \beta_0 + \sum_{i=1}^{m} \beta_i x_i \tag{1}$$
Here $p$ is the probability that a subject has the disease; $\beta_0$ is a bias term; $\beta_i$ is the
weight of input feature $x_i$, reflecting the strength and direction of the effect of $x_i$ on $p$;
and $m$ is the number of input features.
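As a minimal sketch of Equation 1, the snippet below computes the log odds as a bias plus a weighted sum of the inputs and converts it to a probability. The function names and the toy coefficients are hypothetical and are not estimates from this study.

```python
import math


def lr_log_odds(x, beta0, beta):
    """Equation 1: log odds = beta0 + sum_i beta_i * x_i."""
    return beta0 + sum(b * xi for b, xi in zip(beta, x))


def sigmoid(z):
    """Convert log odds z into a probability p = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


# Toy values (hypothetical)
x = [1.0, -0.5]
beta0, beta = 0.2, [0.8, 0.4]
z = lr_log_odds(x, beta0, beta)  # 0.2 + 0.8*1.0 + 0.4*(-0.5)
p = sigmoid(z)
```

Taking log(p / (1 - p)) of the resulting probability recovers the log odds, which is the quantity the later importance and interaction scores differentiate.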
A NN is a network whose nodes (or “artificial neurons”) encode information with their
activation level (a real-valued number). We will refrain from referring to these nodes as
artificial neurons to avoid confusion with biological neurons. In a NN, nodes are
organized in multiple layers: an input layer, one or more hidden layers and an output
layer. Nodes in the input layer represent the brain and SNP features as predictors, nodes
in the second and third layers allow interactions among the predictors in the first layer,
and the output layer contains a single node that represents the disease risk (Figure 6.1).
More specifically, in a NN with $L$ layers, let $h_j^l$ denote the activation level of node $j$ in
the $l$-th hidden layer, and $m_l$ denote the number of hidden units in this layer. We
overload this notation to use $h_i^0$ to represent the input features $x_i$. Each hidden unit
activation is computed as the weighted sum of the nodes’ activations from the layer
below followed by a non-linear transformation function $f(x)$ (see Equation 2).
$$h_j^l = f\left(b_j^l + \sum_{i=1}^{m_{l-1}} h_i^{l-1} w_{ij}^l\right), \quad l = 1, \ldots, L-1 \tag{2}$$
Here $w_{ij}^l$ is the weight of the connection from $h_i^{l-1}$, the $i$-th node in layer $l-1$, to
$h_j^l$, the $j$-th node in layer $l$. $b_j^l$ is a bias term that regulates the overall activation
level for node $j$. The non-linear function $f$ in our model is the rectified linear function
(ReLU): $f(x) = \max(0, x)$.
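Equation 2 can be illustrated with a small forward pass through one hidden layer. This is a sketch with hypothetical names and toy weights, written in plain Python rather than the MatConvNet implementation used in the study.

```python
def relu(v):
    """Rectified linear function f(x) = max(0, x)."""
    return max(0.0, v)


def hidden_layer(h_prev, W, b):
    """Equation 2 for a single layer.

    h_prev: activations of layer l-1 (length m_{l-1}).
    W[i][j]: weight from node i in layer l-1 to node j in layer l.
    b[j]:    bias of node j in layer l.
    """
    return [relu(b[j] + sum(h_prev[i] * W[i][j] for i in range(len(h_prev))))
            for j in range(len(b))]


# Toy values (hypothetical)
h0 = [1.0, 2.0]                    # input features x_i
W1 = [[0.5, -1.0], [0.25, -1.0]]
b1 = [0.0, 0.5]
h1 = hidden_layer(h0, W1, b1)      # node 0 is active, node 1 is clipped to 0
```

The second node receives a negative weighted sum, so the ReLU sets its activation to zero; this clipping is what makes the hidden activations non-linear in the inputs.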
In the output layer, the NN predicts the log odds of AD using a weighted sum of the
hidden layer features (see Equation 3).
$$\log \frac{p}{1-p} = b^L + \sum_{i=1}^{m_{L-1}} h_i^{L-1} w_i^L \tag{3}$$
where $b^L$ and $w_i^L$ are the corresponding bias and weights for the output layer. The
weights and biases for all layers are learned from the training data
41
.
Contrasting Equation 3 with Equation 1, we see that the last layer of a NN is identical to
a LR model except that the "features" are replaced with hidden activations at layer L-1,
which are highly nonlinear functions of the input.
6.3.3.1 Shortcut connections
We additionally employ shortcut connections that connect all nodes in the input layer
directly to the output layer
102,121
. This connectivity structure can be considered as a
hybrid between LR and NN, which allows our network to not waste resources modeling
additive effects in the input features and reserve the NN for complex interactions only.
To prevent over-fitting, we used L1 regularization on the weights and early stopping for
both NN and LR models
122
. We trained both NN and LR models using the MatConvNet
package
123
with identical training protocols, which allowed a fair comparison of the two
models.
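The hybrid structure described in this section can be sketched as follows: the output log odds is the sum of the output-layer term from Equation 3 and a linear shortcut term from the inputs. The function names, argument layout, and toy weights are hypothetical, and adding the shortcut term directly to the output log odds is our reading of the description rather than a confirmed implementation detail.

```python
def relu(v):
    return max(0.0, v)


def dense(h, W, b):
    """Weighted sums b[j] + sum_i h[i] * W[i][j] for every node j."""
    return [b[j] + sum(h[i] * W[i][j] for i in range(len(h)))
            for j in range(len(b))]


def nn_log_odds(x, hidden, w_out, b_out, w_short):
    """Log odds of a NN with shortcut connections (illustrative sketch).

    hidden:  list of (W, b) pairs, one per hidden layer.
    w_out:   weights from the last hidden layer to the output node.
    w_short: shortcut weights from each input directly to the output node.
    """
    h = x
    for W, b in hidden:
        h = [relu(v) for v in dense(h, W, b)]
    deep = sum(hi * wi for hi, wi in zip(h, w_out))          # Equation 3 term
    shortcut = sum(xi * wi for xi, wi in zip(x, w_short))    # LR-like term
    return b_out + deep + shortcut


# Toy values (hypothetical)
x = [1.0, 2.0]
hidden = [([[1.0], [-1.0]], [2.0])]     # one hidden layer with one node
full = nn_log_odds(x, hidden, w_out=[0.5], b_out=0.2, w_short=[0.3, -0.1])
lr_only = nn_log_odds(x, hidden, w_out=[0.0], b_out=0.2, w_short=[0.3, -0.1])
```

When the output weights of the hidden pathway are zero, the network reduces to the LR model of Equation 1, which is why the structure can be viewed as a hybrid of LR and NN.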
6.3.4 Procedures for model training and testing
6.3.4.1 Training, validation, and testing data
Model training and validation were done using data collected for subjects with AD and
healthy controls. Model testing was carried out for subjects with MCI (Figure 6.4).
Figure 6.4. Training, validation, and testing data.
6.3.4.2 Predictors in the models
We included brain and genetic features described in Section 6.3.2 as predictors in the
models. We adjusted for age, gender, education, and the first three principal components
derived from the genetic data by including them as predictors. All predictors were
normalized to have a mean of zero and a variance of 1 across subjects.
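The normalization step can be sketched as a z-score transform applied to each predictor across subjects. The function name and toy values are hypothetical, and using the population standard deviation (so that the variance is exactly 1) is an assumption; the text does not say which variance estimator was used.

```python
import statistics


def standardize(values):
    """Rescale one predictor to mean 0 and variance 1 across subjects."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD (an assumption)
    return [(v - mean) / sd for v in values]


# Toy predictor values for four subjects (hypothetical)
z = standardize([2.0, 4.0, 6.0, 8.0])
```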
6.3.4.3 Random sampling and model training
We used a random selection of 80% of the AD and healthy control subjects for training
models, and the remaining 20% of subjects for internal validation (i.e., for selecting the
number of training iterations of NN and LR models). This random selection of samples
was repeated 100 times for each model in order to take into account variations in the
data. After training the models, we applied them to the MCI subjects to test their ability
to predict progression to AD.
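The repeated random sampling can be sketched as below. The function name, the fixed seed, and the unstratified shuffle are assumptions made for illustration; the text does not state whether the splits were stratified by diagnosis.

```python
import random


def repeated_splits(subject_ids, n_repeats=100, train_frac=0.8, seed=0):
    """Return n_repeats random (train, validation) partitions of the subjects."""
    rng = random.Random(seed)
    splits = []
    for _ in range(n_repeats):
        ids = list(subject_ids)
        rng.shuffle(ids)
        n_train = round(train_frac * len(ids))
        splits.append((ids[:n_train], ids[n_train:]))
    return splits


# 138 AD patients + 225 CN subjects = 363 subjects available for training/validation
splits = repeated_splits(range(363))
```

Each of the 100 partitions yields one trained "sub-model", and the MCI subjects are never part of either subset.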
6.3.4.4 Hyper-parameters in the NN and LR models
The hyper-parameters explored include learning rate (ranging from 1E-3 to 1E-1) and
weight decay (ranging from 1E-5 to 1). We also explored the number of hidden nodes in
the two hidden layers for NN model (ranging from 2, 4, 8, up to 64 nodes in each layer).
In total, we assessed 100 NN models with different hyper-parameter combinations,
where the hyper-parameters were randomly selected from the afore-mentioned ranges.
For each NN model with a specific hyper-parameter combination, we trained a
corresponding LR model with the same learning rate and weight decay parameter
values. Therefore, we trained 100 NN models, along with their corresponding LR
models, where each model was trained and validated using 100 sets of randomly
selected AD and healthy control subjects.
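A random search over the stated ranges might look like the sketch below. The sampling distribution is not given in the text, so log-uniform sampling of the learning rate and weight decay, the intermediate layer sizes 16 and 32, the fixed seed, and all names are assumptions.

```python
import random

HIDDEN_SIZES = [2, 4, 8, 16, 32, 64]  # 16 and 32 are assumed intermediate values


def sample_hyperparameters(rng):
    """Draw one hyper-parameter combination from the ranges in the text."""
    return {
        "learning_rate": 10 ** rng.uniform(-3, -1),   # 1E-3 to 1E-1
        "weight_decay": 10 ** rng.uniform(-5, 0),     # 1E-5 to 1
        "hidden_nodes": (rng.choice(HIDDEN_SIZES), rng.choice(HIDDEN_SIZES)),
    }


rng = random.Random(0)
configs = [sample_hyperparameters(rng) for _ in range(100)]

# The matching LR model reuses the learning rate and weight decay only.
lr_configs = [{k: c[k] for k in ("learning_rate", "weight_decay")} for c in configs]
```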
6.3.4.5 Model evaluation
Model accuracy was evaluated using the receiver operating characteristic (ROC) curve.
Since a model with a specific hyper-parameter combination was trained and validated
using 100 random subsets of AD and healthy control subjects, 100 “sub-models”
(i.e., each sub-model has its own estimates of the weight parameters) were obtained
for that model. Thus, we used the median area under the ROC curve (AUC) of these 100
sub-models to represent a model's accuracy when applying it to the internal validation data
and the testing data.
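The evaluation can be sketched with a rank-based AUC, which equals the probability that a randomly chosen positive subject receives a higher score than a randomly chosen negative subject. The function names and toy scores are hypothetical; the study's own AUC routine is not described in the text.

```python
import statistics


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparisons (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else (0.5 if p == n else 0.0)
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def model_auc(sub_model_scores, labels):
    """Median AUC across the sub-models of one hyper-parameter combination."""
    return statistics.median(auc(s, labels) for s in sub_model_scores)


# Toy example with three sub-models and four subjects (hypothetical)
labels = [1, 1, 0, 0]
sub_model_scores = [
    [0.9, 0.8, 0.3, 0.1],   # perfect ranking
    [0.9, 0.2, 0.8, 0.1],   # one misordered pair out of four
    [0.5, 0.5, 0.5, 0.5],   # uninformative scores
]
median_auc = model_auc(sub_model_scores, labels)
```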
6.3.5 Identifying important brain and SNP features
After training the NN models where both brain and SNP features were used as
predictors, we assessed the NN model with the highest accuracy in the testing data and
identified brain and SNP features that were important in the model. Within a trained NN
model, the importance of a feature was estimated with the partial derivatives method
124
: for each predictor $x_i$, we took the derivative of the predicted log odds of a
subject $s$ having AD with respect to $x_i$ and then averaged the derivative over all
subjects. The importance score of $x_i$ is
$E_s\left(\frac{\partial \log(p_s/(1-p_s))}{\partial x_i}\right)$, where $p_s$ is the predicted AD
risk of subject $s$. The importance score is computed over all 100 rounds of the
best-performing model, and the magnitude of the median score is used to represent the
importance of predictor $x_i$.
The same definition of predictor importance can be applied to the LR models. For LR
models, whose log odds are given by Equation (1), the importance score of $x_i$
evaluates to $\beta_i$. In other words, the importance of a feature is given by the
corresponding regression coefficient. This is consistent with how LR has been used to
assess predictor importance. We also note that since the predictors were normalized to
have a mean of zero and a variance of 1, the importance scores we report
do not depend on measurement scale and are comparable across predictors.
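The importance score can be sketched numerically by averaging the derivative of the log odds over subjects. Central finite differences, the function names, and the toy LR coefficients are assumptions made for illustration; the study's implementation may compute the derivatives analytically.

```python
def importance(log_odds, X, i, eps=1e-4):
    """Average over subjects of d(log odds)/d(x_i), by central differences."""
    total = 0.0
    for x in X:
        up = list(x)
        up[i] += eps
        down = list(x)
        down[i] -= eps
        total += (log_odds(up) - log_odds(down)) / (2 * eps)
    return total / len(X)


# Toy LR model (hypothetical coefficients): log odds = 0.2 + 0.8*x0 - 0.4*x1
beta0, beta = 0.2, [0.8, -0.4]


def lr(x):
    return beta0 + sum(b * v for b, v in zip(beta, x))


X = [[0.0, 1.0], [1.5, -0.5], [-1.0, 2.0]]   # three toy subjects
scores = [importance(lr, X, i) for i in range(2)]
```

For the LR model the numerical scores match the regression coefficients, in line with the closed-form result stated above. The same function can be applied to any model that maps a feature vector to log odds.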
6.3.6 Identifying interactions among features
We identified pairwise interactions among brain and SNP features by investigating the
best-performing NN and LR models on the test set. Within a trained model, the pairwise
interaction between features $x_i$ and $x_j$ is defined as
$I_{ij} = E_s\left(\frac{\partial^2 \log(p_s/(1-p_s))}{\partial x_i \, \partial x_j}\right)$,
where $p_s$ is the predicted probability that subject $s$ has AD. Note that for LR, the
interaction above has a closed-form solution, which is $I_{ij} = 0$ for all $x_i$ and $x_j$. This
serves as a sanity check that LR does not model pairwise interactions among its
predictors.
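The closed-form result for LR follows in two steps from Equation 1; the derivation below is a short worked check, assuming $i \neq j$ and using $k$ as the summation index.

```latex
\log\frac{p_s}{1-p_s} = \beta_0 + \sum_{k=1}^{m} \beta_k x_k
\;\Longrightarrow\;
\frac{\partial}{\partial x_i}\log\frac{p_s}{1-p_s} = \beta_i
\;\Longrightarrow\;
\frac{\partial^2}{\partial x_i\,\partial x_j}\log\frac{p_s}{1-p_s}
  = \frac{\partial \beta_i}{\partial x_j} = 0,
\qquad\text{so}\qquad
I_{ij} = E_s(0) = 0 .
```

Because $\beta_i$ is a constant that does not depend on any input, the mixed derivative vanishes for every subject, and so does its average.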
For NNs, $I_{ij}$ does not admit a closed-form solution and must be computed numerically,
which may introduce estimation errors. Leveraging the fact that the theoretical
interaction scores for LR models are always 0, we apply the same numerical procedure
to estimate the interaction scores for both NN and LR models, and quantify the
significance of an interaction in NN by testing how significantly its score differs from the
corresponding score in a LR model. In particular, we calculated pairwise interactions in
the 100 NN sub-models of the best-performing NN model. We also calculated pairwise
interactions in the LR model that had the same hyper-parameters as the best-
performing NN model. For a given pair of features, we compared their interaction in the
100 NN sub-models and in the 100 LR sub-models using the Wilcoxon test, and then reported
the interaction strength as -log(p-value).
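One way to compute the interaction score numerically is a central finite-difference estimate of the mixed second derivative, sketched below. The stencil, step size, function names, and toy models are assumptions; the text does not specify the numerical procedure, and the subsequent Wilcoxon comparison across sub-models is not shown.

```python
def interaction(log_odds, X, i, j, eps=1e-3):
    """Average over subjects of d^2(log odds)/(dx_i dx_j), for i != j."""
    total = 0.0
    for x in X:
        def shifted(di, dj):
            v = list(x)
            v[i] += di
            v[j] += dj
            return log_odds(v)
        total += (shifted(eps, eps) - shifted(eps, -eps)
                  - shifted(-eps, eps) + shifted(-eps, -eps)) / (4 * eps * eps)
    return total / len(X)


# Toy models (hypothetical): an additive LR-like model and one with a product term
def additive(x):
    return 0.2 + 0.8 * x[0] - 0.4 * x[1]


def with_product(x):
    return additive(x) + 0.5 * x[0] * x[1]


X = [[0.0, 1.0], [1.5, -0.5], [-1.0, 2.0]]   # three toy subjects
i_additive = interaction(additive, X, 0, 1)
i_product = interaction(with_product, X, 0, 1)
```

The additive model yields a score of approximately zero, while the model with a product term yields approximately the coefficient of that term. Applying the same estimator to both model types is what allows numerical error to be judged against the LR baseline.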
6.4 Discussion
In this study, we systematically employed NN models for classifying AD patients and CN
subjects and then investigated the ability of the trained NN models to identify important
predictors and interactions in the models.
We found that including both brain and genetic features as model predictors increased
the models' performance compared with only including either brain or SNP features in
the models. Genetic features were important for predicting MCI progression: a random
prediction usually yields an AUC around 0.5, while using genetic features as predictors
increased the median AUC of the best-performing NN models to 0.689. In comparison,
the best-performing NN models with brain features alone as predictors had better
performance in predicting MCI progression, with a median AUC of 0.820. Including both
brain and SNP features as predictors in the models further increased the models’
prediction accuracy by a moderate amount: median AUC of the best-performing NN
models reached 0.835. Although the combination of brain and SNP features achieves a higher
AUC, it is worth noting that while the SNP features are available at birth, the brain
features may not reflect the neurodegenerative changes associated with AD risks for
subjects at a younger age
2,125
.
Analyses of the trained NN models indicated that measurements of the middle temporal
gyrus, the hippocampus and the entorhinal cortex were the most important brain
features for predicting AD risks. The hippocampus plays an important role in memory
formation and is well known to be affected by AD
21,22,126
. Further, a previous study
reported that these three structures were among the ones with the largest effect size
on MCI progression
127
. As for genetic features, the APOE ε4 risk allele dosage had the
highest weight in the NN model. While GWAS had identified other genetic loci
significantly associated with AD risks
59
, the weights of those features were lower than
those of the aforementioned brain features in the model.
While the NN model performed significantly better than the LR model (p-value <2e-16),
the performance increase is modest. The median AUC of the best-performing LR model
was 0.824, while the median AUC of the best-performing NN model was 0.835. Overall, the
interaction effects were subtle compared to the additive effects. However, this modest
performance increase, which is likely a result of the NN modeling interactions among
predictors, provides evidence that there do exist interactions among the brain and
genetic features. Beyond building the NN models for classification and prediction, there
is important knowledge about disease pathophysiology to be gained in studying the
interactions within them. Existing NN models of AD risk have all been treated as black
boxes and not further investigated. For the first time in the field of imaging genetics, we
used recently developed NN analysis techniques to investigate the NN models and
identify the important interactions among the brain and genetic features that affect AD
risk.
Our novel analysis of the trained NN model revealed three strong interactions of AD risk
factors, each of which is biologically plausible, provides insight into the pathophysiology
of AD, and warrants further study. First, the NN identified a strong interaction between
the right parahippocampal gyrus and the right lateral occipital gyrus. A relevant finding
was reported by Sommer et al.
128
, who observed correlated activity in the occipital and
the parahippocampal cortex during encoding and the resulting memory trace. We would
therefore expect AD patients to have aberrant connectivity between these two regions,
either structurally or functionally as measured by diffusion weighted MRI or functional
MRI. Second, the NN identified an interaction between the right banks of the superior
temporal sulcus and the left posterior cingulate. The posterior cingulate cortex is a
central part of the default mode network in the brain and is known to have prominent
projections to the superior temporal sulcus
129
. Third, the NN found an interaction
between the SNP rs10838725 and the left lateral occipital gyrus. Recent evidence
suggests that the functional gene of this SNP with respect to AD is the SPI1 gene, which
plays a role in myeloid cell function
130
. Neuroinflammation in AD, which is mediated by
myeloid cells, is known to occur in the occipital cortex
131
. The mechanisms of
interactions among such brain and genetic features require further study.
Our study has some limitations. First, there are different ways to define MCI progression
versus non-progression. Longer follow-up time may allow us to observe more MCI
patients progressing to AD. The models’ accuracy in predicting MCI progression may
change when different definitions of progression are used and when observations from
longer follow-up are used. Second, as more GWAS are carried out, our understanding of
the SNPs and genes associated with AD will get updated. Therefore, the importance
score of the predictors and the predictor interactions should be interpreted with that in
mind and would need further validation. Third, since NN models have more parameters
than LR models, more data points are needed to train the NN models
132
. It is possible
that with a larger sample size, NN may have a more significant advantage compared to
LR models. Also, our findings on the important predictors and interactions among
predictors may get refined and updated as the sample size increases. Fourth, the NN
models with different structures may have different performance. In our analyses, we
used a structure with two hidden layers and a direct connection between the input layer
and the output layer. As a comparison, in the previous studies using NN models for
classifying AD, only two hidden layers were used, and no direct connection was built
between the input layer and the output layer
14,112
. Finally, interpreting NN models
remains an open research topic. Our definition of important predictors and interactions
was based on derivatives of the log likelihood of disease risk, while alternative
definitions
114,116,124,133
may reveal other insights into the model.
To summarize, we trained NN models for classifying AD and CN subjects, yielding models
with good performance on the task of classifying and predicting AD. Our novel analyses
of the trained NN models led to findings of important brain and genetic features and
interactions among them that affect AD risk, which can guide future research on AD
etiology. Our approach of training and investigating NN models can be particularly
valuable for understanding etiology of diseases that have multiple, interacting risk
factors.
Chapter 7: Conclusions and future directions
7.1 Summary of findings
Large-scale multimodal data have great potential to help us understand and defy brain
aging. In our research, we analyzed such data to help realize this potential.
In Chapters 2-5, we studied factors associated with brain aging in healthy subjects.
Chapter 2 introduced a process to obtain RBA, an imaging based biomarker indicating
how old a subject’s brain structure appears compared to peers, using a regression-based
method. It allowed direct comparisons among subjects with different chronological
ages. Chapter 3 focused on the association of RBA with smoking and alcohol
consumption, two factors that are typically believed to be detrimental to the brain.
We found that subjects who smoked on most or all days had brains appearing 0.6 years
older than subjects who smoked less often, while there was no significant
difference between subjects who smoked less often and
those who abstained from smoking. We also found that subjects who drank alcohol on
most or all days had brains appearing 0.4 years older than subjects who drank less
often, while there was no significant difference between subjects
who drank less often and those who abstained from drinking. Chapter 4
focused on the association of RBA with number of offspring in both males and females.
Before our study, it was unclear whether having offspring has a long-term effect on the brain.
We found that in both sexes, subjects with any number of offspring had younger-
appearing brains than subjects with no offspring. In male subjects, the association
between parity and RBA followed a "U-shape" pattern, where subjects with 2 or 3
offspring had younger-appearing brains compared to subjects with 0, 1, or >=4 offspring.
In comparison, a linear relationship was observed between number of offspring and RBA
in females. This linear association may be explained by the hormonal fluctuation
specifically linked to women's pregnancy history and remains to be further investigated.
Because the association was present in both sexes, our finding suggested that lifestyle
factors accompanying having offspring, rather than the physical process of pregnancy
experienced only by females, contribute to these
associations. After studying the association between RBA and lifestyle factors, we
studied the association between RBA and genetic factors in Chapter 5. We hypothesized
that a CNN model may produce a more accurate RBA metric than a regression model
and therefore help identify genetic factors that are significantly associated
with RBA. After obtaining RBA from a CNN model, we found that the most significant
RBA-associated SNPs were in a chromosome 17 locus, which highlighted the involvement of
the NSF and MAPT genes in brain aging. We also trained a regression model for
obtaining RBA and then identified SNPs associated with RBA from the regression model.
Although SNPs showing significant association with RBA from the CNN model also
showed strong association with RBA from the regression model, the associations
were more significant (i.e., had smaller p-values) when using RBA from the CNN model.
We concluded that a CNN model more accurately assessed RBA than a regression model
and allowed identification of more significant SNP-RBA associations.
Through literature review, we found previous studies that provide insights for the
associations of RBA with lifestyle and genetic factors. For example, we found that
subjects who smoked or consumed alcohol on most or all days had a significantly higher
RBA compared to subjects who smoked or consumed alcohol less often. That was
consistent with previous studies, which showed significantly greater rate of atrophy in
certain regions of the brains of smokers and drinkers
50-54,74
. Further, alcohol
consumption causes dehydration, which may also increase apparent brain age
134,135
. On
the other hand, there was no significant RBA difference among subjects who smoked
occasionally and those who abstained from smoking, or among subjects who drank
alcohol less frequently and those who abstained from drinking. We hypothesize that the
benefit of nicotine and wine drinking previously reported might counteract the
detrimental effect of smoking and alcohol consumption
68,69,73,75,76
. We also found that in
both females and males, parity was associated with younger brain age and improved
cognitive function. This was corroborated by previous research, which reported that
having offspring was associated with lower mortality risk in both sexes
91
. Further,
children might serve as a 'bridge' connecting parents to more social and community
activities
88
. Adult children can provide parents with emotional and social support, as
well as instrumental support such as shopping and housework
89,90
. All of these could
contribute to the overall health status of parents. We further identified SNPs in NSF and
MAPT genes that were significantly associated with RBA. Previous literature suggested
that both these genes are likely to be functional genes for brain aging. NSF encodes an
ATPase that is involved in cellular membrane fusion events and is associated with
neuronal intranuclear inclusion disease, a progressive neurodegenerative disease
104
.
MAPT gene encodes tau protein, which accumulates in Alzheimer's disease brains. Also,
mutations in MAPT are associated with dementia and Parkinson's disease
105
.
In Chapter 6, we studied brain aging for AD. To be specific, we trained NN models to
classify AD using brain morphometric measurements derived from MRI data and genetic
data. We then applied the model to MCI subjects and predicted their progression to AD.
NN models with both brain and SNP features as predictors performed significantly
better than NN models with either genetic or brain features alone in predicting
progression from MCI to AD. To be specific, median AUC was 0.835, 0.820, and 0.689 for
NN models with both features as predictors, with only brain features as predictors, and
with only genetic features as predictors, respectively. Further, NN models performed
better than regression models in predicting progression to AD, indicating that NN models
captured interactions among predictors that improved the models' performance. We
further identified strong interactions among brain and genetic features affecting AD risk
through analyzing the NN model. The NN showed a strong interaction between the right
parahippocampal gyrus and the right lateral occipital gyrus. A study by Sommer et al.
128
found correlated activity in the occipital and the parahippocampal cortex during
encoding and the resulting memory trace. We therefore expect AD patients to have
aberrant connectivity between these two regions, either structurally or functionally as
measured by diffusion weighted MRI or functional MRI. The NN also showed an
interaction between the SNP rs10838725 and the left lateral occipital gyrus. The
functional gene of this SNP with respect to AD is the SPI1 gene, which plays a role in
myeloid cell function
130
. Neuroinflammation in AD, which is mediated by myeloid cells,
is known to occur in the occipital cortex
131
.
Although we found previous studies that supported our findings, further experiments or
analyses need to be carried out to further understand the biology behind the
associations we observed.
7.2 The pros and cons of NNs for analyzing big data
We used both traditional regression models and NN models for brain age prediction and
for Alzheimer’s disease risk prediction. In both tasks, NN models showed better
accuracy than regression models. Nevertheless, there are some disadvantages of using
NN models, especially when the training sample size is small or when the access to GPU
is limited.
A NN model is likely to outperform a regression model when there are interactions
among input features, and the training set is large enough. However, when the sample
size of the training set is small, a NN may not always outperform a regression model.
This is because with the same number of inputs (or predictors), a NN model has a more
complicated structure and therefore more parameters to fit than a regression model. A
small sample size is more likely to cause over-fitting in NN models than in regression
models. Schulz et al. reported that in an imaging classification task, accuracy of both NN
models and regression models increased as sample size increased. Further, with more
observations available for model training, a NN model improved more in prediction
accuracy over a regression model
136
.
It is also worth noting that many hyper-parameters in the NN model can be explored for
the NN model to reach a better accuracy. Those include but are not limited to learning
rate, cost function, number of nodes in each layer, as well as the network structure.
However, due to limited access to GPU when conducting our research, we only tried a
few hyper-parameters. Although our CNN model achieved state-of-the-art accuracy in
predicting brain age, an even better CNN can probably be trained with more
computation power.
Another concern in using NN models is the computation time. In our task for training a
model to predict brain age with 14,000 training subjects, a NN model took about two
days to converge (with 2 GPUs). As a comparison, training a regression model with
FreeSurfer measurements as predictors only took two minutes. When there is limited
access to GPU, it may be more practical to try regression models first. If the regression
model indicates a potentially promising finding, we can switch to a NN model for further
investigation.
7.3 Identifying factors associated with brain aging and the next step
We found that RBA was significantly associated with smoking, alcohol consumption,
parity history and genetic factors. We note that many other factors may also affect brain
aging. For example, researchers have shown that exercise improves general health and
may also slow down brain aging and reduce AD risk
137,138
, while diseases such as
diabetes and schizophrenia are associated with accelerated brain aging
30,139
. As large-
scale data covering these factors become increasingly accessible, the process of
obtaining RBA as well as the NN models discussed in this thesis can be easily extended
to reveal the connections between these factors and brain aging.
The thesis has focused only on identifying lifestyle and genetic factors associated with
brain aging. However, a more interesting and practical question is whether we can make a
comprehensive plan to intervene in the brain aging process (Figure 7.1). This thesis has
set a few potential directions, such as refraining from excessive smoking and alcohol
consumption and developing therapies targeting the genes MAPT or NSF. In addition, RBA
provides the foundation for validating such a plan.
Figure 7.1 Paradigm of research on factors associated with brain aging.
7.4 Potential of dementia onset prediction based on imaging data
Currently, there is no cure for Alzheimer's disease, the most common form of
dementia
6
. If people at risk of developing AD or other forms of dementia were identified
at a much earlier stage, before any symptoms arise, they would have more time to identify
a preventative method most suitable for them.
Previous studies showed that AD patients have an accelerated brain age of about 10
years
29
compared to their chronological age. We hypothesize that the accelerated brain
age may have started many years before any AD symptom arises and can be captured by
RBA. Therefore, we suggest that a long-term follow up study be carried out to assess
how RBA can be used to accurately predict AD onset. Three sources of information need
to be collected. First, brain MRI data for cognitively healthy subjects at baseline, and at
each follow-up visit if possible. In this way, RBA can be monitored longitudinally.
Second, cognitive function score and dementia status of the subjects during each visit.
Third, data on other factors that have been associated with AD
140-142
, such as
genetic mutation, amyloid beta plaques identified through PET scan, serum or blood
biomarkers including amyloid beta and neurofilament, etc. This project has a potential
obstacle in applying the model for obtaining RBA trained with UK Biobank to another
dataset. The UK Biobank data were collected for subjects 45 years or older using the
same type of scanner across different data collection centers. Therefore the model's
accuracy may drop when applied to another data set that has an age range different
from that of UK Biobank, or was obtained from MRI machines different from those used
by UK Biobank. A potential solution is transfer learning
143
, where the model for
obtaining RBA is re-trained with part of the new data set.
7.5 Conclusion
The knowledge we gained about brain aging has two major implications for health care.
First, with a clear picture of the environmental and lifestyle factors associated with brain
aging, people can get educated to avoid the detrimental factors and adopt habits that
are beneficial to the brain in their daily life. Second, people at a high risk of accelerated
brain aging or dementia can get early interventions ranging from changing their
lifestyles to joining clinical trials before the disease symptoms arise. Both implications
are conducive to the mission of shifting our focus from disease treatment to prevention,
a mission that becomes increasingly more impactful as our population continues to age.
References
1. Lindenberger, U. Human cognitive aging: corriger la fortune? Science 346,
572-8 (2014).
2. Jack, C.R., Jr. et al. Age, Sex, and APOE epsilon4 Effects on Memory, Brain
Structure, and beta-Amyloid Across the Adult Life Span. JAMA Neurol 72,
511-9 (2015).
3. Andersen, K. et al. Gender differences in the incidence of AD and vascular
dementia: The EURODEM Studies. EURODEM Incidence Research Group.
Neurology 53, 1992-7 (1999).
4. Ortman, J., Velkoff, V. & Hogan, H. An Aging Nation: The Older Population in
the United States. (2014).
5. Cole, J.H., Marioni, R.E., Harris, S.E. & Deary, I.J. Brain age and other bodily
'ages': implications for neuropsychiatry. Mol Psychiatry 24, 266-281 (2019).
6. Alzheimer's Association. Alzheimer’s disease - facts and figures. Alzheimer's &
Dementia (2019).
7. Hayes, C.E., Edelstein, W.A., Schenck, J.F., Mueller, O.M. & Eash, M. An Efficient,
Highly Homogeneous Radiofrequency Coil for Whole-Body NMR Imaging at
1.5 T. Journal of Magnetic Resonance 63 (1985).
8. Kellner, T. Heady Times: This Scientist Took the First Brain Selfie and Helped
Revolutionize Medical Imaging. GE Reports (2015).
9. Calderon-Garciduenas, L. et al. Exposure to severe urban air pollution
influences cognitive outcomes, brain volume and systemic inflammation in
clinically healthy children. Brain Cogn 77, 345-55 (2011).
10. Gebarski, S.S. et al. The initial diagnosis of multiple sclerosis: clinical impact
of magnetic resonance imaging. Ann Neurol 17, 469-74 (1985).
11. Karayiannis, C. et al. Prevalence of Brain MRI Markers of Hemorrhagic Risk in
Patients with Stroke and Atrial Fibrillation. Front Neurol 7, 151 (2016).
12. Pereira, S., Pinto, A., Alves, V. & Silva, C.A. Brain Tumor Segmentation Using
Convolutional Neural Networks in MRI Images. IEEE Trans Med Imaging 35,
1240-1251 (2016).
13. Coffey, C.E. et al. Quantitative cerebral anatomy of the aging human brain: a
cross-sectional study using magnetic resonance imaging. Neurology 42, 527-
36 (1992).
14. Aguilar, C. et al. Different multivariate techniques for automated
classification of MRI data in Alzheimer's disease and mild cognitive
impairment. Psychiatry Res 212, 89-98 (2013).
15. Da, X. et al. Integration and relative value of biomarkers for prediction of MCI
to AD progression: spatial patterns of brain atrophy, cognitive scores, APOE
genotype and CSF biomarkers. Neuroimage Clin 4, 164-73 (2014).
16. Davatzikos, C., Bhatt, P., Shaw, L.M., Batmanghelich, K.N. & Trojanowski, J.Q.
Prediction of MCI to AD conversion, via MRI, CSF biomarkers, and pattern
classification. Neurobiol Aging 32, 2322 e19-27 (2011).
17. Davatzikos, C., Xu, F., An, Y., Fan, Y. & Resnick, S.M. Longitudinal progression
of Alzheimer's-like patterns of atrophy in normal older adults: the SPARE-AD
index. Brain 132, 2026-35 (2009).
18. Van Horn, J.D. & Toga, A.W. Human neuroimaging as a "Big Data" science.
Brain Imaging Behav 8, 323-31 (2014).
19. Allen, N.E., Sudlow, C., Peakman, T., Collins, R. & UK Biobank. UK biobank
data: come and get it. Sci Transl Med 6, 224ed4 (2014).
20. Petersen, R.C. et al. Alzheimer's Disease Neuroimaging Initiative (ADNI):
clinical characterization. Neurology 74, 201-9 (2010).
21. Weiner, M.W. et al. The Alzheimer's Disease Neuroimaging Initiative: a
review of papers published since its inception. Alzheimers Dement 9, e111-94
(2013).
22. Weiner, M.W. et al. 2014 Update of the Alzheimer's Disease Neuroimaging
Initiative: A review of papers published since its inception. Alzheimers
Dement 11, e1-120 (2015).
23. Eskildsen, S.F. et al. Structural imaging biomarkers of Alzheimer's disease:
predicting disease progression. Neurobiol Aging 36 Suppl 1, S23-31 (2015).
24. Orru, G., Pettersson-Yeo, W., Marquand, A.F., Sartori, G. & Mechelli, A. Using
Support Vector Machine to identify imaging biomarkers of neurological and
psychiatric disease: a critical review. Neurosci Biobehav Rev 36, 1140-52
(2012).
25. Wolz, R. et al. Multi-method analysis of MRI images in early diagnostics of
Alzheimer's disease. PLoS One 6, e25446 (2011).
26. Desikan, R.S. et al. Automated MRI measures identify individuals with mild
cognitive impairment and Alzheimer's disease. Brain 132, 2048-57 (2009).
27. Liu, X., Tosun, D., Weiner, M.W., Schuff, N. & Alzheimer's Disease
Neuroimaging, I. Locally linear embedding (LLE) for MRI based Alzheimer's
disease classification. Neuroimage 83, 148-57 (2013).
28. Young, J. et al. Accurate multimodal probabilistic prediction of conversion to
Alzheimer's disease in patients with mild cognitive impairment. Neuroimage
Clin 2, 735-45 (2013).
29. Franke, K., Ziegler, G., Kloppel, S., Gaser, C. & Alzheimer's Disease
Neuroimaging, I. Estimating the age of healthy subjects from T1-weighted
MRI scans using kernel methods: exploring the influence of various
parameters. Neuroimage 50, 883-92 (2010).
30. Franke, K., Gaser, C., Manor, B. & Novak, V. Advanced BrainAGE in older
adults with type 2 diabetes mellitus. Front Aging Neurosci 5, 90 (2013).
31. Nenadic, I., Dietzek, M., Langbein, K., Sauer, H. & Gaser, C. BrainAGE score
indicates accelerated brain aging in schizophrenia, but not bipolar disorder.
Psychiatry Res 266, 86-89 (2017).
32. Cole, J.H. & Franke, K. Predicting Age Using Neuroimaging: Innovative Brain
Ageing Biomarkers. Trends Neurosci 40, 681-690 (2017).
33. Liem, F. et al. Predicting brain-age from multimodal imaging data captures
cognitive impairment. Neuroimage 148, 179-188 (2017).
34. Cole, J.H. et al. Predicting brain age with deep learning from raw imaging data
results in a reliable and heritable biomarker. Neuroimage 163, 115-124
(2017).
35. Murphy, K.P. Machine Learning: A Probabilistic Perspective (2018).
36. Hibar, D.P., Kohannim, O., Stein, J.L., Chiang, M.C. & Thompson, P.M.
Multilocus genetic analysis of brain images. Front Genet 2, 73 (2011).
37. Potkin, S.G. et al. Hippocampal atrophy as a quantitative trait in a genome-
wide association study identifying novel susceptibility genes for Alzheimer's
disease. PLoS One 4, e6501 (2009).
38. Teipel, S.J., Kurth, J., Krause, B., Grothe, M.J. & Alzheimer's Disease
Neuroimaging, I. The relative importance of imaging markers for the
prediction of Alzheimer's disease dementia in mild cognitive impairment -
Beyond classical regression. Neuroimage Clin 8, 583-93 (2015).
39. Smith, S.M., Vidaurre, D., Alfaro-Almagro, F., Nichols, T.E. & Miller, K.L.
Estimation of brain age delta from brain imaging. Neuroimage 200, 528-539
(2019).
40. Gunther, F., Wawro, N. & Bammann, K. Neural networks for modeling gene-
gene interactions in association studies. BMC Genet 10, 87 (2009).
41. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436-44 (2015).
42. Hinton, G.E. & Salakhutdinov, R.R. Reducing the dimensionality of data with
neural networks. Science 313, 504-7 (2006).
43. Krizhevsky, A., Sutskever, I. & Hinton, G.E. Imagenet classification with deep
convolutional neural networks. (2012).
44. Silver, D. et al. Mastering the game of Go with deep neural networks and tree
search. Nature 529, 484-9 (2016).
45. Esteva, A. et al. Corrigendum: Dermatologist-level classification of skin
cancer with deep neural networks. Nature 546, 686 (2017).
46. Langner, T., Wikstrom, J., Bjerner, T., Ahlstrom, H. & Kullberg, J. Identifying
morphological indicators of aging with neural networks on large-scale
whole-body MRI. IEEE Trans Med Imaging (2019).
47. Bernal, J. et al. Deep convolutional neural networks for brain image analysis
on magnetic resonance imaging: a review. Artif Intell Med 95, 64-81 (2019).
48. Havaei, M. et al. Brain tumor segmentation with Deep Neural Networks. Med
Image Anal 35, 18-31 (2017).
49. Jonsson, B.A. et al. Brain age prediction using deep learning uncovers
associated sequence variants. Nat Commun 10, 5409 (2019).
50. Durazzo, T.C., Insel, P.S., Weiner, M.W. & Alzheimer Disease Neuroimaging, I.
Greater regional brain atrophy rate in healthy elderly subjects with a history
of cigarette smoking. Alzheimers Dement 8, 513-9 (2012).
51. Duriez, Q., Crivello, F. & Mazoyer, B. Sex-related and tissue-specific effects of
tobacco smoking on brain atrophy: assessment in a large longitudinal cohort
of healthy elderly. Front Aging Neurosci 6, 299 (2014).
52. Gallinat, J. et al. Smoking and structural brain deficits: a volumetric MR
investigation. Eur J Neurosci 24, 1744-50 (2006).
53. Pfefferbaum, A. et al. Brain gray and white matter volume loss accelerates
with aging in chronic alcoholics: a quantitative MRI study. Alcohol Clin Exp
Res 16, 1078-89 (1992).
54. Asensio, S. et al. Magnetic resonance imaging structural alterations in brain of
alcohol abusers and its association with impulsivity. Addict Biol 21, 962-71
(2016).
55. Hoekzema, E. et al. Pregnancy leads to long-lasting changes in human brain
structure. Nat Neurosci 20, 287-296 (2017).
56. Luders, E. et al. Potential Brain Age Reversal after Pregnancy: Younger Brains
at 4-6 Weeks Postpartum. Neuroscience 386, 309-314 (2018).
57. Oatridge, A. et al. Change in brain size during and after pregnancy: study in
healthy women and women with preeclampsia. AJNR Am J Neuroradiol 23,
19-26 (2002).
58. Apostolova, L.G. et al. Hippocampal atrophy and ventricular enlargement in
normal aging, mild cognitive impairment (MCI), and Alzheimer Disease.
Alzheimer Dis Assoc Disord 26, 17-27 (2012).
59. Lambert, J.C. et al. Meta-analysis of 74,046 individuals identifies 11 new
susceptibility loci for Alzheimer's disease. Nat Genet 45, 1452-8 (2013).
60. Cole, J.H. et al. Brain age predicts mortality. Mol Psychiatry 23, 1385-1392
(2018).
61. Lowe, L.C., Gaser, C., Franke, K. & Alzheimer's Disease Neuroimaging, I. The
Effect of the APOE Genotype on Individual BrainAGE in Normal Aging, Mild
Cognitive Impairment, and Alzheimer's Disease. PLoS One 11, e0157514
(2016).
62. Hutcheon, J.A., Chiolero, A. & Hanley, J.A. Random measurement error and
regression dilution bias. BMJ 340, c2289 (2010).
63. Miller, K.L. et al. Multimodal population brain imaging in the UK Biobank
prospective epidemiological study. Nat Neurosci 19, 1523-1536 (2016).
64. Smith, S., Alfaro-Almagro, F. & Miller, K. UK Biobank Brain Imaging
Documentation. (2017).
65. Fischl, B. FreeSurfer. Neuroimage 62, 774-81 (2012).
66. Friedman, J., Hastie, T. & Tibshirani, R. Regularization Paths for Generalized
Linear Models via Coordinate Descent. J Stat Softw 33, 1-22 (2010).
67. R Core Team. R: A language and environment for statistical computing. R
Foundation for Statistical Computing (2012).
68. Ettinger, U. et al. Effects of acute nicotine on brain function in healthy
smokers and non-smokers: estimation of inter-individual response
heterogeneity. Neuroimage 45, 549-61 (2009).
69. Gold, M., Newhouse, P.A., Howard, D. & Kryscio, R.J. Nicotine treatment of
mild cognitive impairment: a 6-month double-blind pilot clinical trial.
Neurology 78, 1895; author reply 1895 (2012).
70. Almeida, O.P. et al. Coronary heart disease is associated with regional grey
matter volume loss: implications for cognitive function and behaviour. Intern
Med J 38, 599-606 (2008).
71. Gianaros, P.J., Greer, P.J., Ryan, C.M. & Jennings, J.R. Higher blood pressure
predicts lower regional grey matter volume: Consequences on short-term
information processing. Neuroimage 31, 754-65 (2006).
72. Cox, S.R. et al. Ageing and brain white matter structure in 3,513 UK Biobank
participants. Nat Commun 7, 13629 (2016).
73. Piumatti, G., Moore, S., Berridge, D., Sarkar, C. & Gallacher, J. The relationship
between alcohol use and long-term cognitive decline in middle and late life: a
longitudinal analysis using UK Biobank. J Public Health (Oxf) (2018).
74. Shokri-Kojori, E., Tomasi, D., Wiers, C.E., Wang, G.J. & Volkow, N.D. Alcohol
affects brain functional connectivity and its coupling with behavior: greater
effects in male heavy drinkers. Mol Psychiatry 22, 1185-1195 (2017).
75. Corrao, G., Rubbiati, L., Bagnardi, V., Zambon, A. & Poikolainen, K. Alcohol and
coronary heart disease: a meta-analysis. Addiction 95, 1505-23 (2000).
76. Ronksley, P.E., Brien, S.E., Turner, B.J., Mukamal, K.J. & Ghali, W.A. Association
of alcohol consumption with selected cardiovascular disease outcomes: a
systematic review and meta-analysis. BMJ 342, d671 (2011).
77. Kappus, N. et al. Cardiovascular risk factors are associated with increased
lesion burden and brain atrophy in multiple sclerosis. J Neurol Neurosurg
Psychiatry 87, 181-7 (2016).
78. Gu, Y. et al. Alcohol intake and brain structure in a multiethnic elderly cohort.
Clin Nutr 33, 662-7 (2014).
79. Luders, E., Cherbuin, N. & Gaser, C. Estimating brain age using high-resolution
pattern recognition: Younger brains in long-term meditation practitioners.
Neuroimage 134, 508-513 (2016).
80. Steffener, J. et al. Differences between chronological and brain age are related
to education and self-reported physical activity. Neurobiol Aging 40, 138-44
(2016).
81. Neuner, B. et al. Modeling smoking history: a comparison of different
approaches in the MARS study on age-related maculopathy. Ann Epidemiol
17, 615-21 (2007).
82. Wood, M.A., Kaptoge, S. & Butterworth, S.A. Risk thresholds for alcohol
consumption: combined analysis of individual-participant data for 599912
current drinkers in 83 prospective studies. The Lancet 391 (2018).
83. Henderson, V.W., Guthrie, J.R., Dudley, E.C., Burger, H.G. & Dennerstein, L.
Estrogen exposures and memory at midlife: a population-based study of
women. Neurology 60, 1369-71 (2003).
84. Beeri, M.S. et al. Number of children is associated with neuropathology of
Alzheimer's disease in women. Neurobiol Aging 30, 1184-91 (2009).
85. Heys, M. et al. Life long endogenous estrogen exposure and later adulthood
cognitive function in a population of naturally postmenopausal women from
Southern China: the Guangzhou Biobank Cohort Study.
Psychoneuroendocrinology 36, 864-73 (2011).
86. Read, S.L. & Grundy, E.M.D. Fertility History and Cognition in Later Life. J
Gerontol B Psychol Sci Soc Sci 72, 1021-1031 (2017).
87. Kravdal, O. Is the relationship between childbearing and cancer incidence
due to biology or lifestyle? Examples of the importance of using data on men.
Int J Epidemiol 24, 477-84 (1995).
88. Furstenberg, F.F. Banking on families: how families generate and distribute
social capital. Journal of Marriage and Family 67 (2005).
89. Kramarow, E.A., Lentzner, H.R., Rooks, R.N., Weeks, J.D. & Saydah, S.H. Health,
United States. (1999).
90. Ross, C.E. & Mirowsky, J. Family relationships, social support and subjective
life expectancy. J Health Soc Behav 43, 469-89 (2002).
91. Modig, K., Talback, M., Torssander, J. & Ahlbom, A. Payback time? Influence of
having children on mortality in old age. J Epidemiol Community Health 71,
424-430 (2017).
92. Davies, S.J., Lum, J.A., Skouteris, H., Byrne, L.K. & Hayden, M.J. Cognitive
impairment during pregnancy: a meta-analysis. Med J Aust 208, 35-40
(2018).
93. Wang, H.X., Karp, A., Winblad, B. & Fratiglioni, L. Late-life engagement in
social and leisure activities is associated with a decreased risk of dementia: a
longitudinal study from the Kungsholmen project. Am J Epidemiol 155, 1081-
7 (2002).
94. Yen, Y.C., Yang, M.J., Shih, C.H. & Lung, F.W. Cognitive impairment and
associated risk factors among aged community members. Int J Geriatr
Psychiatry 19, 564-9 (2004).
95. Blanchflower, D.G. & Clark, A.E. Children, Unhappiness and Family Finances:
Evidence from One Million Europeans. National Bureau of Economic
Research Working Paper 25597 (2019).
96. Richter, D., Kramer, M.D., Tang, N.K.Y., Montgomery-Downs, H.E. & Lemola, S.
Long-term effects of pregnancy and childbirth on sleep satisfaction and
duration of first-time and experienced mothers and fathers. Sleep 42 (2019).
97. Magnus, M.C. et al. Number of Offspring and Cardiovascular Disease Risk in
Men and Women: The Role of Shared Lifestyle Characteristics. Epidemiology
28, 880-888 (2017).
98. Peters, S.A., Huxley, R.R. & Woodward, M. Women's reproductive health
factors and body adiposity: findings from the UK Biobank. Int J Obes (Lond)
40, 803-8 (2016).
99. Zhang, Z. & Hayward, M.D. Childlessness and the psychological well-being of
older persons. J Gerontol B Psychol Sci Soc Sci 56, S311-20 (2001).
100. Ning, K., Zhao, L., Matloff, W. & Toga, A.W. Association of relative brain age
with tobacco smoking, alcohol consumption, and genetic variants. Scientific
Reports (in press).
101. UKBiobank. Genotyping and quality control of UK Biobank, a large-scale,
extensively phenotyped prospective resource. (2015).
102. He, K., Zhang, X., Ren, S. & Sun, J. Deep Residual Learning for Image
Recognition. arXiv:1512.03385 (2015).
103. Purcell, S. et al. PLINK: a tool set for whole-genome association and
population-based linkage analyses. Am J Hum Genet 81, 559-75 (2007).
104. Pountney, D.L., Raftery, M.J., Chegini, F., Blumbergs, P.C. & Gai, W.P. NSF, Unc-
18-1, dynamin-1 and HSP90 are inclusion body components in neuronal
intranuclear inclusion disease identified by anti-SUMO-1-immunocapture.
Acta Neuropathol 116, 603-14 (2008).
105. Goedert, M. NEURODEGENERATION. Alzheimer's and Parkinson's diseases:
The prion concept in relation to assembled Abeta, tau, and alpha-synuclein.
Science 349, 1255555 (2015).
106. Ebbert, M.T. et al. Population-based analysis of Alzheimer's disease risk
alleles implicates genetic interactions. Biol Psychiatry 75, 732-7 (2014).
107. Escott-Price, V. et al. Common polygenic variation enhances risk prediction
for Alzheimer's disease. Brain 138, 3673-84 (2015).
108. Kong, D. et al. Predicting Alzheimer's Disease Using Combined Imaging-
Whole Genome SNP Data. J Alzheimers Dis 46, 695-702 (2015).
109. Zhang, Z., Huang, H., Shen, D. & Alzheimer's Disease Neuroimaging, I.
Integrative analysis of multi-dimensional imaging genomics data for
Alzheimer's disease prediction. Front Aging Neurosci 6, 260 (2014).
110. Montembeault, M., Rouleau, I., Provost, J.S., Brambati, S.M. & Alzheimer's
Disease Neuroimaging, I. Altered Gray Matter Structural Covariance
Networks in Early Stages of Alzheimer's Disease. Cereb Cortex 26, 2650-62
(2016).
111. Delbeuck, X., Van der Linden, M. & Collette, F. Alzheimer's disease as a
disconnection syndrome? Neuropsychol Rev 13, 79-92 (2003).
112. Sankari, Z. & Adeli, H. Probabilistic neural networks for diagnosis of
Alzheimer's disease using conventional and wavelet coherence. J Neurosci
Methods 197, 165-70 (2011).
113. Ribeiro, M.T., Singh, S. & Guestrin, C. Why should i trust you?: Explaining the
predictions of any classifier. Proceedings of the 22nd ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, 1135-
1144 (2016).
114. Zeiler, M.D. & Fergus, R. Visualizing and understanding convolutional
networks. European conference on computer vision, 818-833 (2014).
115. Sundararajan, M., Taly, A. & Yan, Q. Axiomatic attribution for deep networks.
arXiv:1703.01365 (2017).
116. Tsang, M., Cheng, D. & Liu, Y. Detecting Statistical Interactions from Neural
Network Weights. arXiv:1705.04977 (2017).
117. Saunders, A.M. et al. Association of apolipoprotein E allele epsilon 4 with late-
onset familial and sporadic Alzheimer's disease. Neurology 43, 1467-72
(1993).
118. Miriam Hartig, D.T.-S., Sky Raptentsetsang, Alix Simonson, Adam Mezher,
Norbert Schuff, Michael Weiner. UCSF FreeSurfer Methods. (2014).
119. Howie, B., Marchini, J. & Stephens, M. Genotype imputation with thousands of
genomes. G3 (Bethesda) 1, 457-70 (2011).
120. Howie, B.N., Donnelly, P. & Marchini, J. A flexible and accurate genotype
imputation method for the next generation of genome-wide association
studies. PLoS Genet 5, e1000529 (2009).
121. Ripley, B.D. Pattern recognition and neural networks. Cambridge university
press (1996).
122. Tibshirani, R. Regression shrinkage and selection via the lasso. Journal of the
Royal Statistical Society. Series B 58, 267-288 (1996).
123. Vedaldi, A. & Lenc, K. MatConvNet: Convolutional neural networks for
MATLAB. Proceeding of the ACM Int. Conf. on Multimedia (2015).
124. Gevrey, M., Dimopoulos, I. & Lek, S. Review and comparison of methods to
study the contribution of variables in artificial neural network models.
Ecological modelling 160, 249-264 (2003).
125. Jack, C.R., Jr. et al. Age-specific population frequencies of cerebral beta-
amyloidosis and neurodegeneration among people with normal cognitive
function aged 50-89 years: a cross-sectional study. Lancet Neurol 13, 997-
1005 (2014).
126. Scheltens, P. et al. Atrophy of medial temporal lobes on MRI in "probable"
Alzheimer's disease and normal ageing: diagnostic value and
neuropsychological correlates. J Neurol Neurosurg Psychiatry 55, 967-72
(1992).
127. Risacher, S.L. et al. Baseline MRI predictors of conversion from MCI to
probable AD in the ADNI cohort. Curr Alzheimer Res 6, 347-61 (2009).
128. Sommer, T., Rose, M., Weiller, C. & Buchel, C. Contributions of occipital,
parietal and parahippocampal cortex to encoding of object-location
associations. Neuropsychologia 43, 732-43 (2005).
129. Leech, R. & Sharp, D.J. The role of the posterior cingulate cortex in cognition
and disease. Brain 137, 12-32 (2014).
130. Huang, K.L. et al. A common haplotype lowers PU.1 expression in myeloid
cells and delays onset of Alzheimer's disease. Nat Neurosci 20, 1052-1061
(2017).
131. Kreisl, W.C. et al. In vivo radioligand binding to translocator protein
correlates with severity of Alzheimer's disease. Brain 136, 2228-38 (2013).
132. Geman, S., Bienenstock, E. & Doursat, R. Neural networks and the
bias/variance dilemma. Neural computation (1992).
133. Olden, J.D., Joy, M.K. & Death, R.G. An accurate comparison of methods for
quantifying variable importance in artificial neural networks using simulated
data. Ecological Modelling 178, 389-397 (2004).
134. Schroth, G., Naegele, T., Klose, U., Mann, K. & Petersen, D. Reversible brain
shrinkage in abstinent alcoholics, measured by MRI. Neuroradiology 30, 385-
9 (1988).
135. Wittbrodt, M.T., Sawka, M.N., Mizelle, J.C., Wheaton, L.A. & Millard-Stafford,
M.L. Exercise-heat stress with and without water replacement alters brain
structures and impairs visuomotor performance. Physiol Rep 6, e13805
(2018).
136. Schulz, M. et al. Deep learning for brains?: Different linear and nonlinear
scaling in UK Biobank brain images vs. machine-learning datasets. bioRxiv
(2019).
137. Larson, E.B. et al. Exercise is associated with reduced risk for incident
dementia among persons 65 years of age and older. Ann Intern Med 144, 73-
81 (2006).
138. Kramer, A.F., Erickson, K.I. & Colcombe, S.J. Exercise, cognition, and the aging
brain. J Appl Physiol (1985) 101, 1237-42 (2006).
139. Schnack, H.G. et al. Accelerated Brain Aging in Schizophrenia: A Longitudinal
Pattern Recognition Study. Am J Psychiatry 173, 607-16 (2016).
140. Preische, O. et al. Serum neurofilament dynamics predicts neurodegeneration
and clinical progression in presymptomatic Alzheimer's disease. Nat Med 25,
277-283 (2019).
141. Sabri, O. et al. Florbetaben PET imaging to detect amyloid beta plaques in
Alzheimer's disease: phase 3 study. Alzheimers Dement 11, 964-74 (2015).
142. Schindler, S.E. et al. High-precision plasma beta-amyloid 42/40 predicts
current and future brain amyloidosis. Neurology 93, e1647-e1659 (2019).
143. Shin, H.C. et al. Deep Convolutional Neural Networks for Computer-Aided
Detection: CNN Architectures, Dataset Characteristics and Transfer Learning.
IEEE Trans Med Imaging 35, 1285-98 (2016).
Asset Metadata
Creator
Ning, Kaida
(author)
Core Title
Characterizing brain aging with neuroimaging, health, and genetic data
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Computational Biology and Bioinformatics
Publication Date
05/04/2020
Defense Date
01/09/2020
Publisher
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
brain imaging,brain MRI,convolutional neural networks,deep learning,environment,genetics,human brain aging,lifestyle,OAI-PMH Harvest,survey data
Language
English
Contributor
Electronically uploaded by the author
(provenance)
Advisor
Toga, Arthur W. (committee chair), Kim, Hosung (committee member), Sun, Fengzhu (committee member)
Creator Email
kaidanin@usc.edu,ning.kaida@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c89-297236
Unique identifier
UC11663336
Identifier
etd-NingKaida-8423.pdf (filename),usctheses-c89-297236 (legacy record id)
Legacy Identifier
etd-NingKaida-8423.pdf
Dmrecord
297236
Document Type
Dissertation
Rights
Ning, Kaida
Type
texts
Source
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA