Characterizing Brain Aging with Neuroimaging,
Health, and Genetic Data

Kaida Ning

A Dissertation Presented to the  
FACULTY OF THE USC GRADUATE SCHOOL  
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Computational Biology and Bioinformatics)


May 2020

Copyright 2020 Kaida Ning

Abstract
As our population continues to age, the number of people who experience cognitive
decline and face an increased risk of neurodegenerative diseases also grows. To preserve
cognitive function and prevent aging-related diseases, it is imperative that we first
identify and understand the lifestyle, environmental, and genetic factors that are
associated with brain aging. In this thesis, we explore brain magnetic resonance imaging
data, clinical data, and genetic data of both cognitively healthy subjects and patients
with Alzheimer's disease, with the goal of using multimodal data to understand brain
aging. By studying cognitively healthy subjects, we quantified the association between
brain aging and multiple lifestyle and genetic factors. By studying subjects with
Alzheimer's disease, we trained a statistical model that captured important brain and
genetic features associated with the disease and that can accurately predict the disease
risk of subjects with mild cognitive impairment. Our results suggest several potential
directions for decelerating brain aging, such as providing guidelines for a
brain-friendly lifestyle and offering dementia prediction at an early stage.

Acknowledgements

First and foremost, I would like to thank my supervisor, Dr. Arthur Toga. Dr. Toga
encouraged me to explore the different research projects that I was interested in, and
he always gave insightful suggestions on my research work. Without his support, I would
not have had the chance to follow my enthusiasm for studying brain aging. I also want to
thank Professor Fengzhu Sun and Professor Hosung Kim for serving as my committee members
and for their great suggestions and support for my research projects.
I am grateful to have collaborated with many talented people through my research. Dr. Lu
Zhao curated the UK Biobank data locally, on which the majority of my thesis work
depended. His expertise in processing large amounts of brain imaging data greatly
accelerated my research. Will Matloff often discussed research problems with me. He is
knowledgeable about ongoing imaging genetics research and has given me enlightening
opinions. I also want to thank Dr. Ben Duffy for offering expert knowledge in training
neural network models for brain age prediction. Additionally, I would like to thank
Dr. Meredith Franklin for helping me with statistical problems. Further, many talented
and kind LONI people helped me work through technical problems, discussed research ideas
with me, and proofread my writing.
I also feel blessed to have my husband, Bo, who was always supportive of me. Most of my
days as a PhD student were cheerful. However, I also struggled with technical problems
and everyday life problems. In those times, Bo always held my hand and helped me out. I
am also thankful to our daughter, Anda, who brought us a lot of happiness. I learned to
cherish both the time spent playing with her and the time spent doing research. I am
also thankful to my parents and mother-in-law, who supported me during my PhD study.
My years pursuing the PhD degree were well spent with all these lovely people!

Abbreviations

AD  Alzheimer's disease
AUC  area under the receiver operating characteristic curve
CA  chronological age
CNN  convolutional neural network
MCI  mild cognitive impairment
MRI  magnetic resonance imaging
NN  neural network
PBA  predicted brain age
RBA  relative brain age
SNP  single nucleotide polymorphism
 

Table of Contents

Abstract
Acknowledgements
Abbreviations
Table of Contents
List of Figures
List of Tables
Chapter 1   Introduction and background
1.1 Aging and brain aging
1.2 Observing the brain with magnetic resonance imaging
1.3 Emerging big data for research
1.4 Imaging-based biomarkers
1.5 Statistical models for data analyses
1.6 Summary and overview of chapters
Chapter 2   Relative brain age: a biomarker derived from brain magnetic resonance imaging
2.1 Introduction
2.2 Results
2.2.1 Demographic information
2.2.2 Predicted brain age and relative brain age
2.3 Material and methods
2.3.1 Overview of UK Biobank project
2.3.2 Magnetic resonance imaging data
2.3.3 Obtaining predicted brain age and relative brain age based on structural MRI data
2.4 Discussion
Chapter 3   Association of relative brain age with lifestyle factors -- smoking and alcohol consumption
3.1 Introduction
3.2 Results
3.2.1 Previous tobacco smoking is significantly associated with relative brain age
3.2.2 Alcohol consumption is significantly associated with relative brain age
3.2.3 Additive effect of smoking and alcohol consumption in association with relative brain age
3.3 Materials and methods
3.3.1 Demographic information
3.3.2 Quantifying the association of RBA with previous tobacco smoking amount and alcohol intake amount
3.4 Discussion
Chapter 4   Association of relative brain age with lifestyle factors -- parity
4.1 Introduction
4.2 Results
4.2.1 Demographic information
4.2.2 Number of offspring and relative brain age
4.2.3 Number of offspring and cognitive function
4.3 Materials and methods
4.3.1 Demographic information
4.3.2 Study the association between number of offspring and relative brain age
4.3.3 Study the association between number of offspring and cognitive function
4.4 Discussion
Chapter 5   Improving relative brain age estimate with a convolutional neural network model and its implication on identifying genetic factors associated with brain aging
5.1 Introduction
5.2 Results
5.2.1 Predicted brain age accuracy from CNN model and regression model
5.2.2 Genetic factors associated with brain aging
5.3 Material and methods
5.3.1 Summary of samples used
5.3.2 Obtaining relative brain age based on a CNN model
5.3.3 Genetic association analyses
5.4 Discussion
Chapter 6   Predicting Alzheimer's disease risk using both neural network and regression models
6.1 Introduction
6.2 Results
6.2.1 Demographic information
6.2.2 Models’ performance in classifying AD and in predicting progression from MCI to AD
6.2.3 Important brain and SNP features used by NN model
6.2.4 Interactions among brain and genetic features captured by NN model
6.3 Materials and Methods
6.3.1 Description of ADNI subjects in the study
6.3.2 MRI brain imaging data and genotype data
6.3.3 Neural network and logistic regression models
6.3.4 Procedures for model training and testing
6.3.5 Identifying important brain and SNP features
6.3.6 Identifying interactions among features
6.4 Discussion
Chapter 7   Conclusions and future directions
7.1 Summary of findings
7.2 The pros-and-cons of NNs for analyzing big data
7.3 Identifying factors associated with brain aging and the next step
7.4 Potential of dementia onset prediction based on imaging data
7.5 Conclusion
References

List of Figures
Figure 1.1. A brain magnetic resonance imaging scan.
Figure 2.1. Procedure for training a model for calculating relative brain age and applying the model to evaluation set samples.
Figure 2.2. Relationship between chronological age, predicted brain age, and relative brain age.
Figure 3.1. Relationship between previous tobacco smoking frequency and relative brain age.
Figure 3.2. Relationship between alcohol intake frequency and relative brain age.
Figure 4.1. Procedure for studying the association between number of offspring and relative brain age using samplings.
Figure 4.2. Distribution of relative brain age predicted by model with multivariable adjustment over 500 samplings in female (left) and male (right) subjects.
Figure 4.3. Number of offspring versus response time predicted by model with multivariable adjustment in female (left) and male (right) subjects. The unit of response time is millisecond.
Figure 4.4. Number of offspring versus visual memory score in female (left) and male (right) subjects. The unit of visual memory is log(number of mistakes made in memorizing matching cards).
Figure 5.1. Association between chronological age and predicted brain age from a convolutional neural network model.
Figure 5.2. Manhattan plot for the association p-values between SNPs and relative brain age across the genome. The red line indicates the genome-wide significant threshold on p-value (i.e., 5E-8). The blue line indicates p-value of 0.05.
Figure 5.3. Regional visualization of a 2-Mb locus on Chromosome 17 where the SNPs showing genome-wide significant associations with relative brain age are located.
Figure 5.4. Regional visualization of a 1-Mb locus on Chromosome 4 where the SNPs showing genome-wide significant associations with relative brain age are located.
Figure 5.5. Comparison of genetic signals identified using RBA from a linear regression model and using RBA from a convolutional neural network model.
Figure 5.6. Procedure for obtaining RBA through a 5-fold cross-validation strategy and then carrying out RBA-SNP association.
Figure 5.7. Structure of the convolutional neural network model for predicting brain age based on MRI data. Curved arrows indicate skip connections between layers.
Figure 6.1. Structure of a neural network model with two hidden layers.
Figure 6.2. Accuracy (measured as AUC) of the best-performing neural network and logistic regression models in predicting progression from mild cognitive impairment to Alzheimer’s disease.
Figure 6.3. Strength of pairwise interactions among all the brain and genetic features used in the neural network model.
Figure 6.4. Training, validation, and testing data.
Figure 7.1. Paradigm of research on factors associated with brain aging.

List of Tables  
Table 2.1. Demographic information of subjects included in the analyses.
Table 4.1. Demographic information of subjects included in the analyses for the association between parity and relative brain age.
Table 4.2. Median of coefficient estimations for number of offspring in association with relative brain age in regression model with multivariable adjustment in 500 samplings.
Table 4.3. Coefficient estimations of number of offspring in association with response time in regression model with multivariable adjustment. The unit of response time is millisecond.
Table 4.4. Coefficient estimations of number of offspring in association with visual memory score. The unit of visual memory is log(number of mistakes made in memorizing matching cards).
Table 6.1. Demographic characteristics of Alzheimer’s disease, mild cognitive impairment, and healthy control subjects.
Table 6.2. Weight of important features for classifying AD patients and CN subjects in the neural network model.

Chapter 1    Introduction and background  

1.1 Aging and brain aging
In this thesis, we study aging of the brain. The brain's aging process, while different
between individuals, is associated with structural changes, declining cognitive function,
and increased risk of dementia [1-3].
The number of Americans aged 65 and over is projected to reach 80 million by 2050 [4].
As the general population gets older, the number of people who experience cognitive
decline and face increased risk of neurodegenerative diseases such as Alzheimer's
disease (AD) [5] also grows. The AD population is projected to increase from 5.8 million
today to 14 million in 2050 [6]. At the same time, brain aging may differ from
chronological aging. People with the same chronological age can have different brain
aging trajectories, different levels of brain atrophy, and different cognitive
capacities. Understanding the factors associated with brain aging can lead to methods to
intervene in the aging process and preserve cognitive function.

1.2 Observing the brain with magnetic resonance imaging
To study the brain, one first needs to observe it. Since the 1980s, researchers and
clinicians have used magnetic resonance imaging (MRI), a non-invasive imaging technology
that provides high-resolution anatomical images, to study the human brain [7,8]. Figure
1.1 shows an example of a brain MRI scan.
Brain MRI has been used to provide initial disease diagnoses based on lesions, bleeding,
tumors, inflammation, etc., observed in the brain [9-12]. It has also been used to study
how the volume of specific brain regions, such as the frontal lobes, temporal lobes, and
lateral ventricles, changes during normal aging [13]. Further, MRI has been used to
study accelerated aging caused by diseases such as AD, and to predict disease risk
[14-17].


Figure 1.1. A brain magnetic resonance imaging scan.

1.3 Emerging big data for research
Brain imaging data collected from both the general population and patients with brain
diseases are growing steadily, thanks to the collaborative efforts of researchers at
different sites [18]. These large amounts of data give researchers more statistical
power to obtain a clearer picture of brain structural changes associated with aging and
disease risk. For example, the UK Biobank recruited ~500,000 subjects in the United
Kingdom [19]. All participants have provided blood, urine, and saliva samples and have
been genotyped. About 20,000 participants had undergone MRI scanning as of August 2018.
The Alzheimer's Disease Neuroimaging Initiative (ADNI) database
(http://adni.loni.usc.edu) was established in 2004 to measure the progression of healthy
and cognitively impaired participants with brain scans, biological markers, and
neuropsychological assessments [20]. To date, about 3,000 subjects, including patients
with AD, subjects with mild cognitive impairment, and healthy controls, have been
enrolled. Many other databases for studying various diseases have also been curated,
such as the Parkinson's Progression Markers Initiative, the Autism Centers of
Excellence, and Neurodegeneration in Aging Down Syndrome (https://ida.loni.usc.edu/).
Further, researchers with different expertise have started working closely together to
analyze big data from these repositories. For example, the Enhancing Neuroimaging
Genetics through Meta-Analysis (ENIGMA; http://enigma.ini.usc.edu/) project brought
together researchers with backgrounds in computer science, statistics, and genetics to
understand brain structure, function, and disease based on brain imaging and genetic
data.
1.4 Imaging-based biomarkers
The increasing amount of imaging data has led to more discoveries regarding how the
brain's health status is linked to imaging-based features or biomarkers.
Conventional biomarkers are usually based on individual brain morphometric measurements
from MRI. For example, researchers found that certain brain regions, including the
hippocampus and temporal gyrus, shrink in AD patients and may be used for AD prediction
[14,21-23]. Statistical classification models, such as support vector machines
[14-17,24,25], linear discriminant analysis [23,25], and regression models [26-28], have
been successfully trained on these measurements to obtain 'integrative' biomarkers for
classifying and predicting AD.
Recently, researchers have successfully used machine-learning methods to derive a
biomarker that is commonly referred to as predicted brain age (PBA), or brain age, based
on imaging data. PBA reflects the degree of aging of the brain based on its anatomical
characteristics, as computed from brain morphology measurements across the entire brain.
Several studies used PBA and revealed that advanced brain age is associated with
Alzheimer's disease, objective cognitive impairment, and schizophrenia, among other
conditions [29-34]. In this thesis, we further derived relative brain age (RBA), a
biomarker that describes a subject's PBA relative to that of peers. We discuss RBA and
its applications in later chapters.

1.5 Statistical models for data analyses
Various statistical models have been trained to extract information from the data. For
example, traditional regression models and neural networks (NNs) are often used for
regression and classification tasks. Both fit the data by defining a cost function and
then optimizing the model parameters so that the cost function is minimized [35].
Traditional linear and logistic regression models assume that each predictor (or
feature) contributes additively to the response variable, or to the log odds of the
response variable when it is binary. A common application of regression models is to
study the association between a specific factor of interest and brain morphometric
measurements while adjusting for factors such as age and sex [36,37]. Regression models
are also used to predict certain features or disease risk based on MRI data [38,39].
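As a minimal sketch of the two ideas above (fitting a model by minimizing a cost function, and additive contributions with adjustment for age and sex), the following Python example fits a linear model by gradient descent on a mean-squared-error cost. The synthetic data, the coefficient values, the learning rate, and all function names are illustrative assumptions and are not taken from the analyses in this thesis.

```python
import random


def mse_cost(weights, rows, targets):
    """Mean squared error: the cost function minimized by linear regression."""
    total = 0.0
    for x, y in zip(rows, targets):
        pred = sum(w * xi for w, xi in zip(weights, x))
        total += (pred - y) ** 2
    return total / len(rows)


def fit_linear(rows, targets, lr=0.1, steps=1000):
    """Minimize the MSE cost by batch gradient descent (illustrative settings)."""
    n, p = len(rows), len(rows[0])
    weights = [0.0] * p
    for _ in range(steps):
        grad = [0.0] * p
        for x, y in zip(rows, targets):
            err = sum(w * xi for w, xi in zip(weights, x)) - y
            for j in range(p):
                grad[j] += 2.0 * err * x[j] / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


# Hypothetical synthetic data (assumption): each row holds
# [intercept, factor of interest, rescaled age, sex coded 0/1].
rng = random.Random(0)
rows, targets = [], []
for _ in range(100):
    factor = rng.uniform(-1.0, 1.0)
    age = rng.uniform(-1.0, 1.0)
    sex = float(rng.randint(0, 1))
    rows.append([1.0, factor, age, sex])
    # Additive model: every predictor contributes its own term to the response.
    targets.append(2.0 - 0.5 * factor - 1.5 * age + 0.8 * sex)

weights = fit_linear(rows, targets)
# weights[1] estimates the association with the factor of interest,
# adjusted for age and sex because those covariates are in the same model.
```

Because age and sex enter the same model as the factor of interest, the coefficient on that factor describes its association with the response after accounting for the two covariates. For a binary response, the same additive form would be applied to the log odds, with a logistic cost in place of the squared error.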
In comparison, NNs have a more complicated structure than regression models and are able
to extract complex interactions among features through transformation functions in the
layers of connected nodes within the NNs [40,41]. In recent years, NNs have led to
critical breakthroughs in modern artificial intelligence problems such as visual
recognition and speech recognition [41-44]. One class of NNs commonly used for analyzing
imaging data is the convolutional neural network (CNN). CNNs are able to learn features
in images with convolution operations without prior knowledge of what these features
are. With the increasing amount of labeled clinical imaging data, researchers are able
to train CNNs to accomplish clinical tasks. For example, Esteva et al. reported training
a CNN model with more than a hundred thousand skin images, where the model reached
dermatologist-level accuracy in classifying skin cancer images [45]. Langner et al.
reported training a CNN model for predicting 'body age' based on whole-body MRI scans,
where the body age characterized the aging level of a human body based on its structure.
The mean absolute error between predicted body age and chronological age was 2.49 years,
which was more accurate than the estimate given by experienced radiologists [46].
Researchers have also applied CNNs to processing brain MRI data. Applications include,
but are not limited to, preprocessing MRI data, segmenting different brain regions, and
detecting and segmenting tumors [47,48]. Recently, Jonsson et al. trained a CNN model
for predicting 'brain age' based on brain MRI scans, where the mean absolute error
between predicted brain age and chronological age was 3.6 years [49]. We further discuss
predicting brain age in Chapter 2 and Chapter 5.
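The accuracy figures quoted above are mean absolute errors. As a small illustration of how that metric is computed, the sketch below averages the absolute difference between predicted and chronological ages. The function name and the three example subjects are hypothetical and are not data from any cited study.

```python
def mean_absolute_error(predicted_ages, chronological_ages):
    """Average absolute difference, in years, between predicted and true ages."""
    if len(predicted_ages) != len(chronological_ages):
        raise ValueError("inputs must have the same length")
    errors = [abs(p - c) for p, c in zip(predicted_ages, chronological_ages)]
    return sum(errors) / len(errors)


# Hypothetical values for three subjects (assumption, for illustration only).
predicted = [62.0, 70.5, 55.0]
chronological = [60.0, 73.0, 55.0]
mae = mean_absolute_error(predicted, chronological)  # (2.0 + 2.5 + 0.0) / 3 = 1.5
```

A lower value means that predicted ages lie closer, on average, to chronological ages; overestimates and underestimates count equally.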
Since identifying good models for fitting the data is an important part of research on
brain aging, we used both traditional regression models and NNs for our research
problems and compared their performance.

1.6 Summary and overview of chapters
In this thesis, we further extended the applications of MRI to study brain aging,
including normal brain aging in cognitively healthy subjects and abnormal brain aging in
patients with Alzheimer's disease. We analyzed UK Biobank data for the former and ADNI
data for the latter. Both databases are multimodal, encompassing MRI, genetic,
lifestyle, and clinical data, among others.
1.6.1 Studying brain aging in cognitively normal subjects
In the cognitively normal population, the aging process does not affect everyone in the
same way. Individuals with the same chronological age can have different brain aging
trajectories [32]. Researchers have identified various factors associated with
accelerated brain structure atrophy by investigating MRI data along with the lifestyle
or disease features of subjects. For example, compared with non-smokers, smokers have
significantly smaller gray matter volume and lower gray matter density in the frontal
regions, the occipital lobe, and the temporal lobe. Smokers also have a significantly
greater rate of atrophy in regions that show morphological abnormalities in the early
stages of AD [50-52]. It has also been reported that patients with alcohol use disorder
show decreased regional gray and white matter volumes in the medial-prefrontal and
orbitofrontal cortices. The loss of brain gray and white matter volume accelerates with
aging in chronic alcoholics [53,54].
In our study of brain aging in cognitively healthy subjects, we took into account
multidimensional aging patterns across all regions of the brain. For this purpose, we
first obtained the PBA biomarker using a regression model, and then obtained relative
brain age (RBA) based on PBA and chronological age. RBA indicates how old a subject’s
PBA appears compared with that of peers and is independent of age, which allows direct
comparison of brain aging levels among subjects of different chronological ages
(Chapter 2).
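One way to obtain a score that compares PBA with that of peers and is uncorrelated with age is to take the residual of PBA after regressing it on chronological age. The sketch below illustrates that idea only; it is an assumption made for illustration, and the procedure actually used in this thesis is the one described in Chapter 2. The function names and the five example subjects are hypothetical.

```python
def fit_line(x, y):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def relative_brain_age(pba, ca):
    """ASSUMPTION: RBA = PBA minus the PBA expected at that chronological age."""
    intercept, slope = fit_line(ca, pba)
    return [p - (intercept + slope * c) for p, c in zip(pba, ca)]


# Hypothetical chronological ages (CA) and predicted brain ages (PBA), in years.
ca = [50.0, 55.0, 60.0, 65.0, 70.0]
pba = [54.0, 55.0, 61.0, 62.0, 70.0]
rba = relative_brain_age(pba, ca)
# Positive RBA: the brain appears older than that of same-age peers;
# negative RBA: the brain appears younger.
```

By construction, least-squares residuals sum to zero and are uncorrelated with the regressor, so a score defined this way can be compared across subjects of different chronological ages.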
Although heavy smoking and alcohol consumption are known to be associated with
accelerated brain aging in specific brain regions, the associations haven’t been well
quantified, especially when all brain regions are considered. Therefore, after obtaining
RBA, we quantified how smoking and alcohol consumption are associated with RBA
(Chapter 3).  
After  studying  the  association  of  RBA  with  smoking  and  alcohol  consumption,  we
investigated the association of RBA with parity. Previous research on the association
between brain structural change and parity reported inconclusive findings. Hoekzema et
al. reported that the volume of certain gray matter regions was reduced during
pregnancy and that the reductions did not recover for at least 2 years postpartum [55], while
others reported that the gray matter restoration process was evident within the first
few months postpartum [56,57]. Most studies on the association between brain structure
and parity had a relatively small sample size (n<100) and less than three years of
postpartum follow-up [55-57]. We hoped that by analyzing the large UK Biobank
dataset, we could draw a clearer picture of the association between parity and brain
structure (Chapter 4).
Besides lifestyle habits, genetic factors are also thought to be involved in brain aging. A
recent study analyzed brain imaging data and chronological age information from twins
and suggested that the brain aging process was heritable [34]. However, the extent to
which individual genetic variants are associated with brain aging needs to be further
investigated.  Further,  we  hypothesized  that  a  convolutional  neural  network  (CNN)
model may produce more accurate PBA and RBA metrics than the regression model
used in Chapter 2. Therefore, we trained a CNN model for obtaining RBA and studied
the association between genetic factors and RBA. We also compared the genetic factors
identified based on RBA derived from the CNN model and RBA from the regression
model (Chapter 5).
1.6.2 Studying brain aging in AD patients
Besides studying factors associated with normal brain aging, we also studied brain aging
in AD patients (Chapter 6). It is known that AD is associated with accelerated atrophy in
multiple brain regions, including the hippocampus, entorhinal cortex, and temporal gyrus [30,58].
It is also known that AD risk is affected by multiple genetic factors [59]. A long-standing
question is how best to use brain morphometric and genetic data to distinguish
AD patients from cognitively normal subjects and to predict those who will progress
from mild cognitive impairment (MCI) to AD. We trained a neural network (NN) model to classify AD with
both brain morphometric measurements from MRI data and genetic data. We then
assessed this model's performance in predicting progression from MCI to AD. We
further investigated this model to identify the important predictors and interactions
among the predictors.  
 
Chapter 2    Relative brain age: a biomarker derived from
brain magnetic resonance imaging

2.1 Introduction    
In this chapter, we describe how we derived the relative brain age (RBA) biomarker from brain
imaging data. RBA describes whether a person's brain has experienced accelerated or
decelerated aging compared to peers. We use RBA to study lifestyle and genetic
factors associated with brain aging in Chapters 3-5.
Recently,  researchers  have  successfully  used  machine-learning  methods  to  derive  a
biomarker that is commonly referred to as predicted brain age (PBA) or brain age based
on  brain  imaging  data.  PBA  reflects  the  degree  of  aging  of  the  brain  based  on  its
anatomical  characteristics,  as  computed  based  on  brain  morphology  measurements
across the entire brain. PBA has been derived and used in several studies, where the
mean absolute error between PBA and chronological age (CA) was less than 5 years in
adults [29,32,34]. Further, it has been shown that advanced brain age is associated with
Alzheimer's disease, objective cognitive impairment, and schizophrenia, among other conditions [29-33]. Before
our research, many papers used the difference between PBA and CA (i.e., PBA - CA) to
capture the deviation of a person's brain structural aging from the norm [60,61].
However, due to regression dilution, PBA - CA is correlated with CA and may not be
optimal [39,62]. Therefore, we developed the RBA metric, which is independent of CA
and indicates whether a subject's brain has experienced accelerated or decelerated aging compared
to peers. Independently of our research, Smith et al. reported a method for
improving brain age delta estimation [39]. They gave a statistical explanation for
the association between PBA - CA and CA in linear regression. They also suggested
removing the association through a stage 2 correction of the brain age delta, which is very
similar to our RBA metric. In this chapter, we describe a method for deriving RBA using
UK Biobank data.

2.2 Results
2.2.1 Demographic information  
We randomly split the data for 17,308 subjects with brain magnetic resonance imaging
into a training set (n = 5,193) and an evaluation set (n = 12,115). Table 2.1 shows the
demographic information for the subjects included in the training and evaluation sets.
There was no significant difference in age, gender, smoking, or alcohol consumption
between these two sets.  


Table 2.1. Demographic information of subjects included in the analyses.

Data set                                    Number of subjects  Male, n (%)   Female, n (%)  Age in years (mean [SD], min-max)
Training data (for model training)          5,193               2,466 (47%)   2,727 (53%)    63.3 [7.4], 46.2-80.7
Evaluation data (for association analyses)  12,115              5,753 (47%)   6,362 (53%)    63.3 [7.4], 45.2-80.3


2.2.2 Predicted brain age and relative brain age
We  trained  a  regression  model  that  produced  the  predicted  brain  age  (PBA)  using
training set subjects. We observed that the difference between PBA and CA (i.e., PBA -
CA) was negatively associated with CA. The older subjects tended to have negative PBA -
CA, while the younger subjects tended to have positive PBA - CA. Therefore, after
obtaining PBA for each subject we further trained a model to calculate relative brain age
(RBA) (see methods). We then applied the trained models to the evaluation set subjects,
and further obtained PBA and RBA for the evaluation set subjects (as illustrated in
Figure 2.1). The mean absolute error (MAE) between PBA and chronological age (CA) in
the evaluation set was 3.8 years. The relationship between CA, PBA, and RBA for the
evaluation set subjects is illustrated in Figure 2.2. In the evaluation set,
roughly half of the subjects had positive RBA and half had negative RBA
in each age range (Supplementary Figure 2.2), although PBA - CA was negatively
associated with CA (Supplementary Figure 2.1).


Figure 2.1. Procedure for training a model for calculating relative brain age and applying the model to evaluation
set samples.  









Figure 2.2 Relationship between chronological age, predicted brain age, and relative brain age.  

2.3 Material and methods
2.3.1 Overview of UK Biobank project
The UK Biobank recruited ~500,000 subjects in the United Kingdom [19]. The participants
have provided blood, urine and saliva samples. All participants have been genotyped.
20,000 participants who had been scanned as of August 2018 (imaging of the brain, heart,
abdomen, bones and carotid artery) were included in our study. All participants had provided
informed consent. The present analyses were conducted under data application number
25641.  
2.3.2 Magnetic resonance imaging data
Details of the structural brain magnetic resonance imaging (MRI) data, such as imaging
hardware and acquisition protocols, are described elsewhere [63,64]. In our analyses,
quality controlled structural MRI data was obtained for 21,345 subjects. We excluded
1,222 (5.7%) subjects with brain and nervous system related illness, including cognitive
impairment, neurological disorders or stroke, etc. We further excluded 2,815 (13.2%)
subjects  with  non-European  ancestry  (according  to  both  self-reported  ethnicity  and
principal  component  analyses  on  the  genetic  data).  Brain  imaging  data  of  17,308
subjects were used in our analyses. The age range of these participants is between 45.2
years and 80.7 years.
In total, 403 brain morphometrics, including volume of cortical, subcortical and white
matter regions, thickness and surface area of cortical regions, ventricle size, intracranial
volume, etc., were obtained with FreeSurfer 6.0 [65] based on the T1 MRI brain scans, with
the Desikan-Killiany atlas. FreeSurfer is documented and freely available for download
online (http://surfer.nmr.mgh.harvard.edu/).  
2.3.3 Obtaining predicted brain age and relative brain age based on structural MRI
data
Predicted brain age (PBA) is a metric describing how old a person's brain appears based
on a brain scan at a single time-point. Relative brain age (RBA) is a metric indicating if a
person’s brain has experienced accelerated or decelerated aging compared to peers. It
captures the deviation of a person’s brain structural aging from the population’s normal
pace.  
We trained a model for obtaining PBA and RBA based on MRI data using training set
subjects. To be specific, we randomly split the brain imaging data of 17,308 subjects into
training and evaluation sets. Our rationale for picking 30% (5,193) of the subjects as the
training set and the remaining 70% (12,115) as the evaluation set was to balance the
need for accurately training a model to predict brain age and the need for a large
number of subjects in the evaluation set for evaluating the association of RBA and the
factors of interest.  
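The random split described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name split_subjects, the fixed seed, and the integer subject IDs are hypothetical, and the thesis does not state how the split was implemented.

```python
import random

def split_subjects(subject_ids, n_train, seed=0):
    """Randomly assign n_train subjects to training and the rest to evaluation."""
    ids = list(subject_ids)
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is reproducible
    rng.shuffle(ids)
    return ids[:n_train], ids[n_train:]

# Hypothetical integer IDs stand in for the 17,308 subjects with imaging data
train_ids, eval_ids = split_subjects(range(17308), n_train=5193)
print(len(train_ids), len(eval_ids))  # 5193 12115
```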
The model for obtaining PBA and RBA was trained as follows. We first trained a model
for obtaining predicted brain age (PBA) from MRI data using the training set
subjects. Specifically, we built a linear regression model with Lasso regularization for
predicting brain age using the R package glmnet [66,67]. In the model, chronological age
was the response variable, and the 403 brain quantitative measures derived using
FreeSurfer were used as predictors. During model training, the Lasso parameter,
lambda, was selected by internal cross-validation in glmnet. We did not
pre-select predictors, since the training set sample size was sufficiently
large relative to the number of predictors in the model. The mean absolute error (MAE)
between PBA and chronological age in the training set was 3.5 years. We observed that,
due to regression dilution [62], the difference between PBA and CA (i.e., PBA - CA) was
negatively associated with CA. Older subjects tended to have negative PBA - CA,
while younger subjects tended to have positive PBA - CA (Figure 2.2 and
Supplementary Figure 2.1). Therefore, after obtaining PBA for each subject, we further
calculated RBA. RBA is defined as the difference between PBA and the expected PBA given a
subject's chronological age (i.e., RBA = PBA - Expected(PBA|CA)). Here,
Expected(PBA|CA), or EPBA, was obtained by fitting a regression model in which
CA was the predictor and PBA was the response variable. In that way, RBA is
independent of CA. In each age range, roughly half of the subjects had
positive RBA and half had negative RBA (Figure 2.2). A subject with
positive RBA has a brain that appears older than those of peers, while a subject with
negative RBA has a brain that appears younger. Since linear operations were used to
derive RBA from PBA and CA, the unit of RBA is years.
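The RBA definition above (RBA = PBA - Expected(PBA|CA)) can be illustrated with a minimal Python sketch. It assumes that the second-stage model is a simple linear regression of PBA on CA, and it uses synthetic data in which PBA is shrunk toward the mean age to mimic regression dilution. The function names, the seed, and the simulated numbers are hypothetical and are not taken from the UK Biobank analysis.

```python
import random
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def correlation(x, y):
    """Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5

def relative_brain_age(ca_train, pba_train, ca_new, pba_new):
    """RBA = PBA - Expected(PBA | CA); the expectation is fitted on training data."""
    intercept, slope = fit_line(ca_train, pba_train)
    return [p - (intercept + slope * c) for c, p in zip(ca_new, pba_new)]

# Synthetic data (not UK Biobank): PBA is shrunk toward the mean age of 63
rng = random.Random(1)
ca = [rng.uniform(45, 80) for _ in range(2000)]
pba = [63 + 0.7 * (c - 63) + rng.gauss(0, 4) for c in ca]

gap = [p - c for c, p in zip(ca, pba)]       # PBA - CA
rba = relative_brain_age(ca, pba, ca, pba)   # residual of PBA given CA

print(correlation(ca, gap) < 0)              # PBA - CA decreases with CA
print(abs(correlation(ca, rba)) < 1e-6)      # RBA is uncorrelated with CA
```

In this sketch the model is fitted and applied on the same synthetic subjects for brevity; in the analysis described above, the model is fitted on the training set and then applied to the evaluation set.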
After training the model for obtaining PBA and RBA using the training set data, we
applied it to the evaluation set and carried out association analyses.  

2.4 Discussion
Here we analyzed the brain imaging data collected for 17,308 UK Biobank subjects. We
derived the RBA metric using training set subjects, and further investigated the association
of RBA with smoking, alcohol intake, and genetic variants using evaluation set subjects.  
In our analyses, we first calculated PBA of a subject based on structural MRI data and
then derived RBA, a metric that describes a subject's PBA relative to peers. RBA was
calculated as the difference between PBA and EPBA (i.e., RBA = PBA - EPBA; see the
methods section for details) of a person. By comparison, in other studies where PBA
was derived from a regression model, the difference between PBA and CA (PBA - CA,
or BrainAGE) was used to indicate the brain aging status [29-31]. We observed that, due to
regression dilution, older subjects tended to have negative values of PBA - CA, while
younger subjects tended to have positive values of PBA - CA (Figure 2.2). In contrast,
RBA was independent of CA. At all age ranges, roughly half of the subjects had positive
RBA and half of the subjects had negative RBA.
Supplementary Figures

Supplementary Figure 2.1. Relationship between chronological age and the difference
between predicted brain age and chronological age in the evaluation set.

Supplementary Figure 2.2. Relationship between chronological age and relative brain
age in the evaluation set.

 
Chapter 3    Association of relative brain age with lifestyle
factors -- smoking and alcohol consumption

3.1 Introduction
In Chapter 2, we derived the relative brain age (RBA) metric. In this chapter, we study the
association of RBA with smoking and alcohol consumption.
Heavy smoking and heavy alcohol drinking are among the most studied lifestyle factors
in relation to brain aging. Compared with non-smokers, smokers have significantly smaller grey
matter volume and lower grey matter density in the frontal regions, the occipital lobe,
and the temporal lobe. Further, smokers have a significantly greater rate of atrophy in
regions that show morphological abnormalities in the early stages of Alzheimer's
disease [50-52]. It has also been reported that patients with alcohol use disorder show
decreased regional grey and white matter volumes in the medial-prefrontal and
orbitofrontal cortices. The loss of brain gray and white matter volume accelerates with
aging in chronic alcoholics [53,54]. On the other hand, studies have shown that nicotine, a
compound contained in tobacco, may improve attention and other cognitive functions
in human subjects [68,69]. It has also been reported that drinking wine may be beneficial to
the cardiovascular system, which is related to brain health [70,71]. To date, it is still unclear
how smoking and alcohol consumption are associated with brain structural aging,
especially when the morphology of all the brain regions is considered. Therefore, we
analyzed brain-imaging data collected for 17,308 UK Biobank subjects who were
cognitively normal and of European ancestry and studied the association of
relative brain age with smoking and alcohol consumption.

3.2 Results
3.2.1 Previous tobacco smoking is significantly associated with relative brain age
Information on previous tobacco smoking frequency was collected for 11,651 of the
evaluation set subjects during the visit for MRI scan. Regression analyses adjusting for
gender and education showed that previous tobacco smoking frequency was statistically
significantly  associated  with  RBA  (ANOVA  F-test  p-value  < 2E-16, see Figure 3.1).
Pairwise comparisons showed that the most significant difference was between those
who smoked on most or all days (with an average RBA of 0.6 years) and the rest of the
smoking frequency categories (i.e., those who abstained from smoking, just tried once
or twice, or occasionally), while there was no significant difference among the groups of
subjects who didn’t smoke on most or all days.


Figure 3.1. Relationship between previous tobacco smoking frequency and relative brain age.



Figure 3.2. Relationship between alcohol intake frequency and relative brain age.

3.2.2 Alcohol consumption is significantly associated with relative brain age
Information on current alcohol drinking frequency was collected for 11,600 of the
evaluation set subjects during the visit for MRI scan. Regression analyses adjusting for
gender  and  education  showed  that  alcohol  consumption  frequency  was  statistically
significantly associated with RBA (ANOVA F-test p-value = 9E-6, see Figure 3.2). Pairwise
comparisons among groups with different alcohol consumption frequencies showed
that the strongest difference was between the group who drank alcohol on most or all
days (with an RBA of 0.4 years) and the rest of the alcohol drinking frequency categories
(i.e., those who abstained from drinking, drank on special occasions only, 1~3 times a
month, 1~2 times a week, or 3~4 times a week), while the difference among groups who
didn’t drink on most or all days was insignificant.  
3.2.3 Additive effect of smoking and alcohol consumption in association with relative
brain age
Smoking  and  alcohol  consumption  amount  were  positively  correlated  and  had  an
additive effect on RBA. Among the 2,327 subjects who smoked on most or all days and
did not abstain from alcohol, the correlation between the two variables was 0.08 (p-
value = 9E-5). We used a regression model with RBA as the response variable and with
smoking  amount,  alcohol  consumption  amount,  sex,  and  education  as  predictors.
According to this model, each additional pack-year of smoking was associated with 0.03
years of increased RBA (p-value = 2E-8); each additional gram of alcohol consumption
per day was associated with 0.02 years of increased RBA (p-value = 6E-10). The R-
squared value of this model was 0.032. As a comparison, a model with only smoking
amount as predictor and adjusted for sex and education had an R-squared of 0.018. A
model with only alcohol consumption amount as predictor and adjusted for sex and
education  had  an  R-squared  of  0.015.  We  also  built  a  regression  model  with  an
interaction  term  between  alcohol  drinking  and  smoking.  The  interaction  term  was
insignificant, indicating that there was insufficient evidence to support the presence of
an interaction between alcohol drinking and smoking in affecting RBA.

3.3 Materials and methods
3.3.1 Demographic information
We used the information on educational qualifications collected during the visit for the MRI
scan. The qualification variable has multiple categories based on the British system. We
collapsed it into two categories indicating whether or not a subject held a college or
university degree, as in the paper by Cox et al. [72]. There was a significant association
between education and RBA (p-value = 0.009). Therefore, we also adjusted for
education when assessing the association of RBA with smoking, alcohol consumption,
and genetic variants.
We used the information on smoking history and alcohol intake status that was collected
during the visit for the MRI scan. The smoking and alcohol intake frequency categories used
in our analyses were as reported in the UK Biobank questionnaire. Smoking pack-years
were defined as the number of cigarettes smoked per day divided by 20, multiplied by the
number of years of smoking. The alcohol intake amount was calculated as described in
the paper by Piumatti et al. [73]. Alcohol consumption per day for a specific type of drink
was calculated as the number of drinks consumed per day multiplied by the number of
grams of alcohol contained in one drink. The total amount of alcohol consumption per
day was the sum of the alcohol amounts from all types of drinks. More details can
be found on the UK Biobank website (http://www.ukbiobank.ac.uk/).  
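The two exposure measures defined above can be written out as a short Python sketch. The function names and the grams-of-alcohol values per drink are hypothetical placeholders; the conversion factors actually used follow Piumatti et al. and are not reproduced here.

```python
def pack_years(cigarettes_per_day, years_smoked):
    """Pack-years = (cigarettes per day / 20) * years of smoking."""
    return cigarettes_per_day / 20 * years_smoked

def alcohol_grams_per_day(drinks_per_day, grams_per_drink):
    """Sum over drink types of (drinks per day) * (grams of alcohol per drink)."""
    return sum(n * grams_per_drink[kind] for kind, n in drinks_per_day.items())

# Hypothetical grams of alcohol per drink; not the values used in the thesis
GRAMS_PER_DRINK = {"beer": 16.0, "wine": 12.0}

print(pack_years(10, 30))  # 15.0
print(alcohol_grams_per_day({"beer": 1.0, "wine": 0.5}, GRAMS_PER_DRINK))  # 22.0
```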
3.3.2 Quantifying the association of RBA with previous tobacco smoking amount and
alcohol intake amount
We quantified the association between previous tobacco smoking amount, alcohol
intake amount, and RBA using a two-step regression procedure adjusting for gender and
education. We first built a linear regression model using data from the 2,327 evaluation set
subjects who previously smoked daily or almost daily and did not abstain from drinking
alcohol. We then identified subjects with a large Cook's distance as potentially influential
observations (i.e., subjects with a Cook's distance greater than 3 times the mean Cook's
distance of all the subjects). We excluded these influential observations, fitted a second
linear regression model, and reported results based on the second regression model. In
total, data from 2,174 non-influential observations were used in the second-step
regression.
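The two-step procedure above can be sketched in Python as follows. For brevity, the sketch assumes a regression with a single predictor, whereas the model described above included smoking amount, alcohol amount, sex, and education. The function names and the toy data are hypothetical.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope

def cooks_distances(x, y):
    """Cook's distance of each observation in a one-predictor linear regression."""
    n, p = len(x), 2                                   # p counts intercept and slope
    intercept, slope = fit_line(x, y)
    mx = mean(x)
    sxx = sum((xi - mx) ** 2 for xi in x)
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)           # residual variance
    lev = [1 / n + (xi - mx) ** 2 / sxx for xi in x]   # leverage h_i
    return [e * e / (p * s2) * h / (1 - h) ** 2 for e, h in zip(resid, lev)]

def two_step_fit(x, y, multiple=3.0):
    """Fit, drop points with Cook's distance > multiple * mean distance, refit."""
    d = cooks_distances(x, y)
    cutoff = multiple * mean(d)
    keep = [i for i, di in enumerate(d) if di <= cutoff]
    x2, y2 = [x[i] for i in keep], [y[i] for i in keep]
    return fit_line(x2, y2), keep

# Toy data: y rises by about 0.03 per unit of x, except for the outlying last point
x = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45]
y = [0.1, 0.1, 0.4, 0.4, 0.7, 0.7, 1.0, 1.0, 1.3, 6.0]
(intercept, slope), kept = two_step_fit(x, y)
print(len(kept), round(slope, 3))
```

In the toy data, only the outlying last observation exceeds the cutoff, so the second fit recovers the slope of the remaining points.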

3.4 Discussion
Our analyses of smoking and RBA indicated that subjects who had smoked on most or all
days had a significantly higher RBA compared to subjects who smoked less often. That
was consistent with previous studies, which showed a significantly greater rate of atrophy
in certain regions of the brains of smokers [50-52]. Our data also showed no significant
difference in RBA among the subjects who smoked occasionally, only tried once or
twice, or abstained from smoking. This suggests that the detrimental effect of smoking
on brain aging occurs mainly among those who smoked on most days.  
Our analyses of alcohol intake frequency and RBA indicated that subjects who drank
daily or almost daily had a significantly higher RBA compared to those who drank less
frequently. Our finding was consistent with previous studies, which showed that heavy
alcohol consumption was detrimental to the brain [53,54,74]. We did not find a significant RBA
difference among subjects who drank alcohol less frequently or those who abstained from
drinking. It has been reported that a small dose of alcohol is associated with a reduced
risk of cardiovascular disease, coronary heart disease and stroke [73,75,76]. Moreover,
cardiovascular health and brain health are related. Researchers have found that
cardiovascular risk factors like hypertension and heart disease are associated with
increased brain white matter abnormalities and brain atrophy [70,77]. Therefore, a small
amount of alcohol may have certain benefits for brain health by contributing to
cardiovascular health. Gu et al. have reported that light-to-moderate total alcohol
intake was associated with larger total brain volume in elderly subjects [78]. Nevertheless,
our results didn't show an RBA difference among subjects who drank alcohol less
frequently or those who abstained from drinking. We also acknowledge that our
observation would need to be further validated using an independent data set.
Our study has some limitations. First, we used a linear regression model with LASSO to
produce PBA based on structural MRI data. More sophisticated statistical approaches
such  as  using  principal  component  analyses  for  dimension  reduction  before  LASSO
regression, or using neural networks may help to improve the accuracy of PBA. Also, the
combination of structural MRI and other types of brain imaging data (e.g., functional
MRI,  diffusion-weighted  MRI)  may  help  to  improve  the  accuracy  of  PBA.  A  more
accurate  PBA  would  allow  better  estimation  of  RBA.  Second,  in  our  study,  we
investigated  the  association  of  brain  age  with  tobacco  smoking  and  alcohol
consumption. Besides smoking and alcohol consumption, various environmental factors
may be associated with brain age. For example, physical exercise and meditation have
been reported to be associated with lower brain aging level [79,80]. Further, genetics also
affects brain aging [34]. Therefore, the variation of RBA that can be explained by smoking
and alcohol drinking amount was small (as reflected by the small R-squared in the
regression model for quantifying the association of RBA with smoking and alcohol
drinking amount). More studies can be done to help fully understand the factors
associated with brain age. Third, we chose to use pack-years and grams of alcohol intake
per day for assessing the smoking and drinking amount. It is worth noting that the self-reported
smoking and drinking amount may not be accurate. Further, there are
alternative measurements for assessing smoking and drinking amount, which may yield
slightly different findings [81,82].
 
Chapter 4    Association of relative brain age with lifestyle
factors -- parity

4.1 Introduction
In the previous chapter, we discussed the association of RBA with smoking and alcohol
consumption, two factors adversely associated with brain aging. In this chapter, we present our
findings on the association of RBA with parity, whose association with brain aging was
unclear.  
Previous  research  on  the  association  between  brain  structural  change  and  parity
reported inconclusive findings. Hoekzema et al. reported that the volume of certain gray
matter regions was reduced during pregnancy and the reductions did not recover for at
least 2 years postpartum [55], while others reported that the gray matter restoration
process was evident within the first few months postpartum [56,57]. Most studies on the
association between brain structure and parity had a relatively small sample size
(n<100) and less than three years of postpartum follow-up [55-57]. To date, it is still unclear
if there are any long-term effects of parity on brain structure in the mid-to-old age
population. We hypothesized that there may be an observable association between
parity and RBA and sought to investigate this question using UK Biobank data.  
Researchers have also investigated the association between parity and cognitive
function in females, though with differing conclusions. Some studies found
that parity was associated with better episodic memory and had a protective effect
against Alzheimer's disease (AD) [59,83]. In contrast, parity has been associated with poorer
word recall scores, poorer Mini Mental State Exam scores, and AD neural pathology [84,85]. A recent
study of approximately 10,000 male and female subjects found an association between
the number of offspring and cognitive function in later life, including memory and
executive function, and suggested that socioeconomic status largely accounted for the
association [86].  
On the other hand, having offspring leads to significant life changes in both females and
males, all of which may impact the brain. For example, among low-parity men and
women, more frequent use of alcohol and tobacco was observed [87]. Children might serve
as a 'bridge' connecting parents to more social and community activities [88]. Adult
children can provide parents with emotional and social support, as well as instrumental
support such as shopping and housework [89,90]. Modig et al. reported that having
offspring was associated with lower mortality risk in both sexes. Interestingly, the
differences in death risks between subjects with and without offspring were slightly
larger for men than for women [91]. Therefore, we hypothesized that lifestyle and
environmental factors accompanying having offspring, other than pregnancy history,
might also play a role in the association between parity and wellbeing of the brain. In
this case, an association between parity and wellbeing of the brain would be observed
in both men and women. For the purposes of our study, we extended the definition of
parity to be the number of offspring for both men and women.

4.2 Results
4.2.1 Demographic information
Brain imaging data were obtained for 6,822 women and 6,762 men. Among female
subjects, 21% were childless, 14% had one child, 44% had two children, 16% three
children, and 5% four or more children. Among male subjects, 20% were childless, 13%
had one child, 45% had two children, 17% three children, and 5% four or more children.
Table 4.1 provides summary statistics for all covariates considered in the analyses, grouped
by sex. Descriptive results for subjects with cognitive function data are shown in
Supplementary Table 4.1.
Table 4.1. Demographic information of subjects included in the analyses for the
association between parity and relative brain age.

                                 Female (n=6822)   Male (n=6762)
Number of offspring, n (%)
  0                              1456 (21.3%)      1351 (20%)
  1                              920 (13.5%)       843 (12.5%)
  2                              3020 (44.3%)      3074 (45.5%)
  3                              1117 (16.4%)      1136 (16.8%)
  >=4                            309 (4.5%)        358 (5.3%)
Age, mean (SD)                   62.2 (7.3)        64 (7.5)
Education, n (%)
  College or university degree   3091 (45.3%)      3306 (48.9%)
  Other degree                   3731 (54.7%)      3456 (51.1%)
BMI, n (%)
  Normal                         3247 (47.6%)      2227 (32.9%)
  Obese                          1151 (16.9%)      1238 (18.3%)
  Overweight                     2354 (34.5%)      3284 (48.6%)
  Underweight                    70 (1%)           13 (0.2%)
Household income, n (%)
  Less than 18,000               986 (14.5%)       654 (9.7%)
  18,000 to 30,999               2050 (30%)        1818 (26.9%)
  31,000 to 51,999               2036 (29.8%)      2184 (32.3%)
  52,000 to 100,000              1379 (20.2%)      1644 (24.3%)
  Greater than 100,000           371 (5.4%)        462 (6.8%)
Past tobacco smoking, n (%)
  Abstained from smoking         3582 (52.5%)      3068 (45.4%)
  Just tried once or twice       1162 (17%)        1162 (17.2%)
  Occasionally                   879 (12.9%)       787 (11.6%)
  On most or all days            1199 (17.6%)      1745 (25.8%)
Alcohol intake, n (%)
  Abstained from drinking        413 (6.1%)        276 (4.1%)
  Special occasions only         849 (12.4%)       394 (5.8%)
  1~3 times a month              918 (13.5%)       640 (9.5%)
  1~2 times a week               1950 (28.6%)      1777 (26.3%)
  3~4 times a week               1794 (26.3%)      2291 (33.9%)
  Daily or almost daily          898 (13.2%)       1384 (20.5%)
Sleep duration, n (%)
  Normal                         5045 (74%)        5198 (76.9%)
  Short                          1692 (24.8%)      1493 (22.1%)
  Long                           85 (1.2%)         71 (1%)
Living with others, n (%)
  No                             1497 (21.9%)      1049 (15.5%)
  Yes                            5325 (78.1%)      5713 (84.5%)
Diabetes, n (%)
  No                             6633 (97.2%)      6405 (94.7%)
  Yes                            189 (2.8%)        357 (5.3%)
Hypertension, n (%)
  No                             5482 (80.4%)      4760 (70.4%)
  Yes                            1340 (19.6%)      2002 (29.6%)

4.2.2 Number of offspring and relative brain age  
The number of offspring was significantly associated with RBA in both sexes. In 500
random samplings, the median ANOVA p-value for the association between number of
offspring and RBA was <0.001 for both female and male subjects. Among females,
compared with those who were childless, subjects with two offspring were estimated to
have a brain age that was 0.5 years younger, and subjects with three offspring were
estimated to have a brain age that was 0.7 years younger. Among males, subjects with
two offspring were estimated to have a brain age that was 0.6 years younger, and
subjects with three offspring were estimated to have a brain age that was 0.7 years
younger. In female subjects, a significant linear trend (p<0.001) of the association was
observed, while in male subjects a quadratic trend (p<0.001) was observed (Figure 4.2).
Table 4.2 shows the median parameter estimates for the number of offspring across the
500 samplings. No significant interaction was observed between number of offspring
and sex on RBA.  


Figure 4.1. Procedure for studying the association between number of offspring and relative brain age using
samplings.  


 

Figure 4.2. Distribution of relative brain age  predicted by model with multivariable adjustment over 500 samplings
in female (left) and male (right) subjects.

Table 4.2. Median of coefficient estimations for number of offspring in association with
relative brain age in regression model with multivariable adjustment in 500 samplings.  

Number of offspring    Female: coefficient (95% CI)    Male: coefficient (95% CI)
Childless              baseline                        baseline
1 offspring            -0.21 (-0.66, 0.24)             -0.46 (-0.93, 0.01)
2 offspring            -0.52 (-0.87, -0.17)*           -0.62 (-0.99, -0.25)*
3 offspring            -0.72 (-1.15, -0.29)*           -0.68 (-1.13, -0.23)*
>=4 offspring          -0.69 (-1.36, -0.02)*           -0.41 (-1.06, 0.24)
*p-value<0.05
95% CI is inferred from the median standard error in 500 samplings.
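
As an illustration of the footnote above, an interval of this kind can be reconstructed from a median coefficient and a median standard error. The sketch below assumes a normal approximation with multiplier 1.96, which the text does not state explicitly; the function name and the input values are hypothetical and are not taken from the study data.

```python
from statistics import median


def median_wald_ci(coefs, std_errs, z=1.96):
    """Median coefficient with a CI built from the median standard error.

    Assumes a normal approximation (coef +/- z * SE); z = 1.96 is an assumption.
    """
    coef = median(coefs)
    se = median(std_errs)
    return coef, coef - z * se, coef + z * se


# Hypothetical estimates from three samplings (the study used 500).
coef, lower, upper = median_wald_ci([-0.50, -0.52, -0.60], [0.17, 0.18, 0.20])
print(f"{coef:.2f} ({lower:.2f}, {upper:.2f})")
```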
 

4.2.3 Number of offspring and cognitive function  
In female subjects, the number of offspring was statistically significantly associated with
both response time and visual memory according to regression models that adjusted for
covariates  as  described  in  the  methods  section  (ANOVA  F-test  p-values  < 0.001).
Compared with subjects who were childless, those with any number of offspring had
shorter response times and made fewer mistakes in the visual memory task. A non-linear
relationship was observed between number of offspring and response time and
between number of offspring and visual memory, confirmed with statistically significant
quadratic trend tests (p-value = 0.002 for response time, and p-value <0.001 for visual
memory). Figures 4.3 and 4.4 illustrate these trends, and parameter estimates for
number of offspring in the two models are listed in Tables 4.3 and 4.4, respectively.
Pairwise comparisons among parity groups showed that subjects with 2 or 3 children
and those who were childless had the largest differences in both response time and
visual memory, although for visual memory score among females the difference did not
reach statistical significance when adjusted for multiple testing.  
In male subjects, the number of offspring was also significantly associated with both
response time and visual memory (ANOVA F-test p-values < 0.001). Compared with
subjects who were childless, those with offspring had shorter response times and made
fewer mistakes in the visual memory task. Similar to females, a quadratic trend existed for
the associations with both outcomes (p-value<0.001), as shown in Figures 4.3 and 4.4;
parameter estimates for number of offspring are listed in Tables 4.3 and 4.4.  



Figure 4.3. Number of offspring versus response time predicted by model with multivariable adjustment in female
(left) and male (right) subjects. The unit of response time is millisecond.



Figure 4.4. Number of offspring versus visual memory score in female (left) and male (right) subjects. The unit of
visual memory is log(number of mistakes made in memorizing matching cards).

Table 4.3. Coefficient estimations of number of offspring in association with response
time in regression model with multivariable adjustment. The unit of response time is
millisecond.

Number of offspring    Female: coefficient (95% CI)    Male: coefficient (95% CI)
Childless              baseline                        baseline
1 offspring            -4.18 (-6.05, -2.31)**          -7.45 (-9.4, -5.50)**
2 offspring            -7.30 (-8.77, -5.83)**          -9.24 (-10.76, -7.71)**
3 offspring            -6.24 (-8.00, -4.47)**          -7.93 (-9.76, -6.09)**
>=4 offspring          -4.47 (-7.02, -1.93)**          -10.36 (-12.9, -7.81)**
**p-value<0.001
     



Table 4.4. Coefficient estimations of number of offspring in association with visual
memory score. The unit of visual memory is log(number of mistakes made in
memorizing matching cards).

Number of offspring    Female: coefficient (95% CI)    Male: coefficient (95% CI)
Childless              baseline                        baseline
1 offspring            -0.01 (-0.02, 0.00)             -0.04 (-0.05, -0.03)**
2 offspring            -0.02 (-0.03, -0.01)**          -0.06 (-0.07, -0.05)**
3 offspring            -0.03 (-0.04, -0.02)**          -0.06 (-0.07, -0.05)**
>=4 offspring          -0.02 (-0.03, 0.00)*            -0.06 (-0.07, -0.04)**
*p-value<0.05
**p-value<0.001
       
Regression models with integrated data from both sexes indicated significant interaction
between number of offspring and sex on response time and visual memory, where
protective effects of having offspring on cognitive function appeared to be larger in
male subjects than female subjects (p-value of interaction <0.001 for both cognitive
functions).  

4.3 Materials and methods
4.3.1 Demographic information  
Demographic information, including parity (i.e., number of live births for women, and
number of children fathered for men), age, education, body mass index (BMI), average
total household income, past tobacco smoking frequency, alcohol intake frequency,
sleep duration, living alone or with others, diabetes, and hypertension disease status,
was obtained for all subjects. Parity was further categorized into 5 groups: no offspring,
1 offspring, 2 offspring, 3 offspring, and >=4 offspring. We treated number of offspring
as a categorical variable rather than a continuous one for two reasons: first, the >=4 category
pooled subjects with 4, 5, or more offspring and therefore did not have a linear relationship with
the other categories; second, we hypothesized that the relationship of number of offspring
with cognitive function and relative brain age might not be linear.
4.3.2 Study the association between number of offspring and relative brain age
We investigated the association between the number of offspring and RBA by repeating
the following three-step procedure 500 times: First, we randomly split the samples into
sets A and B, each having equal size. Second, using set A, we trained a model to obtain
RBA based on MRI data and applied it to obtain RBA for set B. Third, using set B, we
examined the association between number of offspring and RBA adjusting for multiple
covariates. The procedure was repeated 500 times so that the distribution of the parameter
of interest across all rounds showed how sensitive the result was to the
random splits used. The analysis procedure is visualized in a flowchart in Figure 4.1. We
analyzed the association in females and males separately and then combined the data
of both sexes to look for an interaction between sex and number of offspring in association
with RBA. In the association analyses, linear regression with multivariable adjustment
was used. We adjusted for age, education, body mass index (BMI), average total
household income, past tobacco smoking frequency, alcohol intake frequency, sleep
duration, living alone or with others, diabetes, and hypertension. Pairwise
comparisons of number of offspring and RBA were conducted using the Scheffé test.
Statistical significance was set at α = 0.05 and all regression analyses were conducted
using the R language [67].
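
The three-step sampling procedure described in this section can be sketched as follows. This is a minimal illustration rather than the code used in the study: it is written in Python with NumPy (the analyses themselves were run in R), it uses ordinary least squares as a stand-in for the brain-age model, and it takes RBA to be predicted brain age minus the value expected at a subject's chronological age, which is our reading of the definition given in Chapter 2. All function and variable names are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients for y ~ intercept + X (X is 2D)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def predict_ols(beta, X):
    return np.column_stack([np.ones(X.shape[0]), X]) @ beta


def one_sampling(mri, age, offspring, covars, rng):
    """One round: random split, train on set A, derive RBA and test on set B."""
    n = len(age)
    perm = rng.permutation(n)
    a, b = perm[: n // 2], perm[n // 2:]
    # Step 2: fit the brain-age model and the age-adjustment model on set A.
    beta_age = fit_ols(mri[a], age[a])
    pba_a = predict_ols(beta_age, mri[a])
    beta_adj = fit_ols(age[a].reshape(-1, 1), pba_a)
    # Apply both to set B; RBA = predicted brain age minus its expectation given age.
    pba_b = predict_ols(beta_age, mri[b])
    rba_b = pba_b - predict_ols(beta_adj, age[b].reshape(-1, 1))
    # Step 3: regress RBA on offspring dummies (childless = baseline) plus covariates.
    # offspring is coded 0, 1, 2, 3, 4, where 4 stands for ">=4".
    dummies = (offspring[b][:, None] == np.arange(1, 5)).astype(float)
    beta = fit_ols(np.column_stack([dummies, covars[b]]), rba_b)
    return beta[1:5]  # coefficients for 1, 2, 3, and >=4 offspring


def repeated_samplings(mri, age, offspring, covars, n_rounds=500, seed=0):
    """Median offspring coefficients across repeated random splits."""
    rng = np.random.default_rng(seed)
    coefs = np.array([one_sampling(mri, age, offspring, covars, rng)
                      for _ in range(n_rounds)])
    return np.median(coefs, axis=0)
```

Fitting the age-adjustment model on set A only, and then applying it to set B, keeps the subjects used for the association test out of every training step.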
4.3.3 Study the association between number of offspring and cognitive function
We first assessed the association between cognitive function (response time and visual
memory) and number of offspring for males and females separately using regression
models. We then combined data of both sexes and used an interaction term between sex
and number of offspring to test whether the overall associations were significantly
different for males and females.  
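
One way to test such an interaction is to compare nested regression models with and without the sex-by-offspring terms. The sketch below computes the corresponding F statistic with NumPy; it omits the covariates for brevity, and the function names and the coding of the variables (offspring coded 0 to 4 with 4 meaning ">=4", male coded 0/1) are assumptions for illustration, not the study's R code.

```python
import numpy as np


def rss(design, y):
    """Residual sum of squares of the least-squares fit of y on design."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def interaction_f_stat(y, offspring, male):
    """F statistic for adding sex-by-offspring interaction terms.

    Reduced model: intercept + offspring dummies + sex.
    Full model: reduced model + (offspring dummies x sex).
    """
    n = len(y)
    dummies = (offspring[:, None] == np.arange(1, 5)).astype(float)
    sex = male.astype(float)[:, None]
    reduced = np.column_stack([np.ones(n), dummies, sex])
    full = np.column_stack([reduced, dummies * sex])
    df_num = full.shape[1] - reduced.shape[1]
    df_den = n - full.shape[1]
    rss_full = rss(full, y)
    f = ((rss(reduced, y) - rss_full) / df_num) / (rss_full / df_den)
    return f, df_num, df_den
```

The statistic would then be compared against an F distribution with `df_num` and `df_den` degrees of freedom to obtain the interaction p-value.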

4.4 Discussion
We studied the association of number of offspring with RBA, response time, and visual
memory in the UK Biobank cohort.  
Most previous studies on the long-term association between parity and wellbeing of the
brain only evaluated cognitive function [59,83,85,92]. Our study contributes new information
because we further looked into the association between parity and RBA, a biomarker of
structural aging of the brain, and observed findings that corroborated the association
between parity and cognitive function. In both sexes, subjects with any number of
offspring had younger-appearing brains than subjects with no offspring. In male subjects,
the association between parity and RBA followed a "U-shape" pattern, where subjects
with 2 or 3 offspring had younger-appearing brains compared to subjects with 0, 1, or
>=4 offspring. That was similar to the association observed between parity and cognitive
function. In comparison, a linear relationship was observed between number of
offspring and RBA in females. This linear association may be explained by the hormonal
fluctuation specifically linked to women's pregnancy history and remains to be further
investigated.  
We observed that in both males and females, having offspring is associated with better
visual memory and faster response time after adjusting for age, education, BMI, income,
smoking, and other factors. One possible explanation is that parenthood naturally leads
to interactions and activities with children. Moreover, children might become a 'bridge'
connecting to more social and community activities [88]. Previous studies have shown that
social interactions are protective of cognitive function [93,94]. On the other hand, child
rearing is also associated with increased financial and physical stress [95,96]. Previous
studies also showed an association between parity and increased cardiovascular disease
risk and increased BMI in both sexes [97,98]. This could possibly explain our observation of a
"U-shape" association; cognitive function did not monotonically improve with increasing
number of offspring.  
Further, the association between cognitive function and parity was significantly stronger
in male subjects than in female subjects. Since males do not experience the physical
process of pregnancy, our observation further suggests that lifestyle factors
accompanying having offspring may play an important role in the association between
parity and cognitive function. Our finding is corroborated by a study by Zhang et al. that
showed that single men who were childless had significantly higher rates of loneliness
and depression compared with women in comparable circumstances [99].  
Relative strengths of the study are its large sample size, inclusion of both male and
female subjects, and the observed association between RBA and parity, which further
supported the association between cognitive function and parity. Our study also has a
few limitations. First, while we adjusted for a number of socioeconomic, lifestyle, and
health covariates in our model, we were not able to account for the possibility that they
may be time varying. Details of these covariates in early life could be useful for
understanding other underlying issues related to cognitive function and structural aging
of the brain. Second, the study is an observational study, so it is impossible to conclude
that having offspring leads to improved brain health. It could also be possible that
those who have poor underlying health have fewer opportunities to have offspring.
Third, since only a small proportion of subjects had 5 or more children, we categorized
number of offspring into 0, 1, 2, 3, and >=4 as in previous studies using this variable [86,97],
and did not study the difference among those who have 4, 5, or more offspring. Fourth,
brain health is only a small part of the overall health condition of the body. Although we
found that having offspring is associated with better visual memory, faster response
time, and a younger-looking brain, we may not conclude that having offspring is
associated with improved wellness of the whole body.  

In  conclusion,  we  observed  robust  evidence  that  parity  is  associated  with  visual
memory, response time, as well as RBA in both sexes. Our observation suggests that
lifestyle factors associated with having offspring, likely shared by both sexes, contribute
to these associations. At the same time, we observed different detailed association
patterns  within  women  and  men,  which  suggest  the  importance  of  studying  the
association between parity and wellbeing of the brain in the context of sex.


 

Supplementary items
Supplementary Table 4.1. Demographic information of subjects included in the analyses
for the association between parity and cognitive function.

  Female (n=160,077)  Male (n=143,119)
Number of offspring, n (%)
 
0  29931 (18.7%)  28808 (20.1%)
1  20621 (12.9%)  18059 (12.6%)
2  72997 (45.6%)  63598 (44.4%)
3  27989 (17.5%)  24210 (16.9%)
>=4  8539 (5.3%)  8444 (5.9%)
Age, mean (SD)  56.7 (7.9)  57.5 (8.1)
Education, n (%)
   College or university degree  53736 (33.6%)  51636 (36.1%)
Other degree  106341 (66.4%)  91483 (63.9%)
BMI, n (%)
   Normal  64032 (40%)  35033 (24.5%)
Obese  36119 (22.6%)  35719 (25%)
Overweight  58910 (36.8%)  72143 (50.4%)
Underweight  1016 (0.6%)  224 (0.2%)
Household income, n (%)
   Less than 18,000  36161 (22.6%)  24930 (17.4%)
18,000 to 30,999  42706 (26.7%)  35129 (24.5%)
31,000 to 51,999  42046 (26.3%)  40240 (28.1%)
52,000 to 100,000  31273 (19.5%)  33775 (23.6%)
Greater than 100,000  7891 (4.9%)  9045 (6.3%)
Past tobacco smoking, n (%)
 
Abstained from smoking  75167 (47%)  54420 (38%)
Just tried once or twice  26613 (16.6%)  23376 (16.3%)
Occasionally  22917 (14.3%)  20014 (14%)
On most or all days  35380 (22.1%)  45309 (31.7%)
Alcohol intake, n (%)
   Abstained from drinking  11034 (6.9%)  6084 (4.3%)
Special occasions only  20537 (12.8%)  8224 (5.7%)
1~3 times a month  21052 (13.2%)  12147 (8.5%)
1~2 times a week  43265 (27%)  37618 (26.3%)
3~4 times a week  36805 (23%)  41055 (28.7%)
Daily or almost daily  27384 (17.1%)  37991 (26.5%)
Sleep duration, n (%)
   Normal  122086 (76.3%)  108010 (75.5%)
Short  35724 (22.3%)  33248 (23.2%)
Long  2267 (1.4%)  1861 (1.3%)
Living with others, n (%)
 
No  31468 (19.7%)  21886 (15.3%)
Yes  128609 (80.3%)  121233 (84.7%)
Diabetes, n (%)
 
No  154801 (96.7%)  133868 (93.5%)
Yes  5276 (3.3%)  9251 (6.5%)
Hypertension, n (%)
   No  123179 (76.9%)  99636 (69.6%)
Yes  36898 (23.1%)  43483 (30.4%)



 

Chapter 5    Improving relative brain age estimate with a
convolutional neural network model and its implication on
identifying genetic factors associated with brain aging

5.1 Introduction
Besides lifestyle habits, genetic factors are also involved in brain aging. In this chapter,
we discuss the association between RBA and genetic factors.
A previous study analyzed brain imaging data and chronological age information from
twins and suggested that the brain aging process was heritable [34]. Two recent studies by
Jonsson et al. and by Ning et al. investigated the association between single nucleotide
polymorphisms (SNPs) and brain age using UK Biobank data, and both highlighted the
association between the MAPT gene and brain age. Jonsson et al. used a convolutional
neural network (CNN) model for predicting brain age where the predictor was the whole
3D MRI scan [49], while Ning et al. used a linear regression model for predicting brain age,
where predictors were brain morphometric measurements from MRI [100]. These two
studies both reported a mean absolute error (MAE) of around 3.5 years between the
predicted brain age and the true chronological age.  
Recently, Langner et al. trained a CNN for predicting age of the body based on whole-body
MRI of about 20,000 subjects. The MAE between predicted body age and true
chronological age reached 2.5 years [46]. As a comparison, when training the CNN for
predicting brain age, Jonsson et al. used a training set of fewer than 2,000 subjects, and
reached an MAE of 3.6 years [49]. We hypothesized that we might improve the brain age
prediction accuracy by training a CNN model on imaging data with a
training set sample size close to 20,000. Furthermore, a more accurate predicted brain
age not only provides a more reliable estimation of brain aging status, but will also allow
identification of stronger associations between genetic factors and brain age.  
In this study, we trained a CNN for predicting brain age, then obtained RBA and studied
genetic  factors  associated  with  RBA.  To  fully  utilize  the  data,  we  crossed  over  the
samples to train models for obtaining RBA and then studied association between SNPs
and RBA. We did a side-by-side comparison of predicted brain age (PBA) accuracy between the CNN
model and a regression model. We also compared the genetic factors identified using
RBA derived from the CNN and using RBA derived from the regression model.  

5.2 Results
5.2.1 Predicted brain age accuracy from CNN model and regression model
The MAE of the CNN model was 2.8 years in the evaluation set. As a comparison, the MAE of
the linear regression model trained and evaluated on the same data set was 3.7 years. The CNN model
had a significantly smaller MAE than the regression model (paired t-test p-value < 0.05).
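
The comparison above can be sketched as computing the MAE of each model and a paired t statistic on the per-subject absolute errors. The snippet below is an illustration under that assumption; the text does not specify which paired quantities entered the t-test, and the function names are hypothetical.

```python
import numpy as np


def mae(predicted_age, chronological_age):
    """Mean absolute error between predicted and chronological age."""
    return float(np.mean(np.abs(predicted_age - chronological_age)))


def paired_t_stat(abs_err_a, abs_err_b):
    """Paired t statistic for the per-subject difference in absolute error."""
    d = abs_err_a - abs_err_b
    return float(d.mean() / (d.std(ddof=1) / np.sqrt(len(d))))
```

A negative statistic means that the first model has the smaller absolute errors on average; the p-value follows from a t distribution with n - 1 degrees of freedom.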

We observed that when using a CNN model for predicting brain age, the difference between
PBA and chronological age (CA), PBA-CA, was negatively associated with CA (i.e., older
subjects tended to have negative PBA-CA while younger subjects tended to have positive
PBA-CA; Figure 5.1). This was similar to the association
between PBA-CA and CA when PBA was derived from a linear regression model (as
shown in Chapter 2). Therefore, we derived RBA after obtaining PBA using the CNN model
(RBA is described in Chapter 2), and then assessed the association between RBA and
SNPs.


Figure 5.1. Association between chronological age and predicted brain age from a convolutional neural network
model.

5.2.2 Genetic factors associated with brain aging
In total, 234 SNPs showed significant association with RBA (association p-value < 5E-8).
Figure 5.2 is the Manhattan plot showing association p-values between SNPs and RBA
across the genome. SNP rs199533, which is located in the NSF gene on chromosome 17,
showed the most significant association with RBA (p-value = 3E-12). Multiple SNPs close
to rs199533 also showed significant association with RBA, some of which are located in
the MAPT gene (Figure 5.3). Further, two SNPs on chromosome 4 showed significant
association with RBA, although they are not in any gene region (Figure 5.4).  



Figure 5.2. Manhattan plot for the association p-values between SNPs and relative brain age across the genome.
The red line indicates the genome-wide significance threshold on the p-value (i.e., 5E-8). The blue line indicates a p-value
of 0.05.  







Figure 5.3. Regional visualization of a 2-Mb locus on Chromosome 17 where the SNPs showing genome-wide
significant associations with relative brain age are located.





Figure 5.4. Regional visualization of a 1-Mb locus on Chromosome 4 where the SNPs showing genome-wide
significant associations with relative brain age are located.

We also obtained RBA with a linear regression model using the same set of data and
assessed the association between RBA and SNPs. In total, 208 SNPs showed significant
association with RBA (association p-value < 5E-8). We observed that although SNPs
showing significant association with RBA from the CNN model also showed strong
association with RBA from the regression model, the associations were stronger (i.e., had
smaller p-values) when using RBA from the CNN model (Figure 5.5). Supplementary Figure 5.1 is a
Manhattan plot for the association between SNPs and RBA derived from the linear
regression model.


Figure 5.5. Comparison of genetic signals identified using RBA from a linear regression model and using RBA from a
convolutional neural network model.  


5.3 Material and methods
5.3.1 Summary of samples used
We used 16,998 subjects who had both quality-controlled brain imaging data and
genetic data available. The quality control on brain imaging data has been described in
Chapter 2. Details of the genotyping and genotype calling procedures are described
elsewhere [101]. Quality control was done at both the SNP level and the sample level. Our quality
control on SNPs ensured that all SNPs had a missing rate less than 0.02 and passed the
Hardy-Weinberg exact test (i.e., Hardy-Weinberg equilibrium p-value >= 1E-6). Quality control
on the samples ensured that all subjects had a genotyping rate greater than 0.98, had a
heterozygosity rate within ±3 standard deviations, had matched reported gender and
genetic gender, and were of European ancestry (according to both self-reported
ethnicity and genetic ethnicity based on principal component analyses). Related
individuals (i.e., kinship coefficient >0.1) were further removed. Genotype data of
675,827 autosomal SNPs for 16,998 subjects were used for further association analyses.
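
Two of the filters above (missing rate and heterozygosity) can be sketched on a genotype matrix as follows. This is a simplified NumPy illustration, not the pipeline used in the study: the Hardy-Weinberg, gender, ancestry, and kinship checks are omitted, the heterozygosity band is assumed to be centered on the sample mean, and the function name and matrix layout are hypothetical.

```python
import numpy as np


def qc_filters(geno, max_snp_missing=0.02, min_sample_call=0.98, het_sd=3.0):
    """Return boolean keep-masks for SNPs (columns) and samples (rows).

    geno: (samples, SNPs) float array of 0/1/2 allele counts, np.nan for
    missing calls.
    """
    missing = np.isnan(geno)
    # SNP-level filter: missing rate below the threshold.
    snp_keep = missing.mean(axis=0) < max_snp_missing
    # Sample-level filter 1: genotyping (call) rate above the threshold.
    call_rate = 1.0 - missing.mean(axis=1)
    # Sample-level filter 2: heterozygosity rate within +/- het_sd standard
    # deviations of the mean, computed over non-missing calls only.
    n_called = np.maximum((~missing).sum(axis=1), 1)
    het = (geno == 1).sum(axis=1) / n_called
    z = (het - het.mean()) / het.std()
    sample_keep = (call_rate > min_sample_call) & (np.abs(z) <= het_sd)
    return snp_keep, sample_keep
```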
5.3.2 Obtaining relative brain age based on a CNN model  
5.3.2.1 Training and evaluating CNN models
We used a five-fold cross-validation strategy for training a CNN model for predicting
brain age and for doing SNP-RBA association analyses. To be specific, we randomly split
the subjects into five sets, where each set had 20% of subjects, and had the same
distribution of age and gender. We used 80% of the data for training a CNN model for
predicting brain age, and then further trained a linear model to adjust PBA for the effect
of chronological age to obtain RBA (as described in Chapter 2). We then applied the
trained CNN model and linear regression model to the remaining 20% of samples and
obtained PBA and RBA of these samples. We crossed over the split samples and
obtained RBA for each set of the 20% of subjects. In the five-fold cross-validation, the
median mean absolute error (MAE) between PBA from the CNN and chronological age
was 2.7 years (range 2.5 to 3.1). We then combined RBA of all evaluation set subjects
for genetic association analyses. Figure 5.6 illustrates the procedure for obtaining RBA
through a 5-fold cross-validation strategy and then carrying out RBA-SNP association
analyses.
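
The cross-over scheme can be sketched as follows. This is an illustration only: a linear least-squares model stands in for the CNN, the way folds are balanced on age and gender is an assumption (the text states only that the distributions were the same), and all function names are hypothetical.

```python
import numpy as np


def stratified_folds(age, sex, k=5, seed=0):
    """Assign subjects to k folds with similar age and sex distributions."""
    rng = np.random.default_rng(seed)
    folds = np.empty(len(age), dtype=int)
    for s in np.unique(sex):
        idx = np.flatnonzero(sex == s)
        idx = idx[np.argsort(age[idx], kind="stable")]
        # Walk through one sex in age order and deal each block of k
        # neighbours to the k folds in random order.
        for start in range(0, len(idx), k):
            block = idx[start:start + k]
            folds[block] = rng.permutation(k)[: len(block)]
    return folds


def fit_linear(X, y):
    """Stand-in for CNN training: least squares with an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(design, y, rcond=None)[0]


def predict_linear(beta, X):
    return np.column_stack([np.ones(X.shape[0]), X]) @ beta


def out_of_fold_rba(features, age, folds, fit, predict):
    """RBA for every subject, always from models trained on the other folds."""
    rba = np.empty(len(age))
    for f in np.unique(folds):
        train, test = folds != f, folds == f
        model = fit(features[train], age[train])
        # Age-adjustment line (PBA as a function of CA) fitted on training data.
        slope, intercept = np.polyfit(age[train], predict(model, features[train]), 1)
        pba_test = predict(model, features[test])
        rba[test] = pba_test - (slope * age[test] + intercept)
    return rba
```

Because each subject's RBA comes from models that never saw that subject, the RBA values from all five evaluation sets can be pooled for the association analysis.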

Figure 5.6. Procedure for obtaining RBA through a 5-fold cross-validation strategy and then carrying out RBA-SNP
association.


5.3.2.2 Details of the CNN model
We used a CNN model with ResNet structure [102] implemented in NiftyNet for predicting
brain age based on 3D MRI data (https://niftynet.io). We chose the ResNet structure
because empirical evidence has shown that this structure is easy to optimize and
performs well in imaging data analyses [102]. We down-sampled the MRI data from
182*218*182 to 91*109*91 before training the CNN, due to GPU memory
limitations. The ResNet model provided by NiftyNet was composed of a convolution
layer, six bottleneck layers, a fully connected layer, and an output layer. Multiple shortcut
connections were added among bottleneck layers. NiftyNet allows the user to
specify the number of bottleneck layers and the number of filters in each layer. We
specified the structure as follows. The initial convolution layer had 64 filters. There were
six bottleneck layers: bottleneck layers 1 and 2 had 128 filters;
bottleneck layers 3 and 4 had 256 filters; bottleneck layers 5 and
6 had 512 filters. The model was trained on two GPUs with a learning
rate of 0.0001. The CNN structure is illustrated in Figure 5.7.
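
The halving of each image dimension can be illustrated with a simple block average. The text does not state which down-sampling method was used, so averaging non-overlapping 2*2*2 blocks is an assumption made here for illustration; the function name is hypothetical.

```python
import numpy as np


def downsample_by_two(volume):
    """Average non-overlapping 2x2x2 blocks of a 3D volume with even dimensions."""
    x, y, z = volume.shape
    blocks = volume.reshape(x // 2, 2, y // 2, 2, z // 2, 2)
    return blocks.mean(axis=(1, 3, 5))


# A placeholder volume with the original image dimensions.
volume = np.zeros((182, 218, 182), dtype=np.float32)
print(downsample_by_two(volume).shape)
```

Halving each dimension reduces the number of voxels per scan by a factor of eight, which is where the GPU memory saving comes from.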


Figure 5.7. Structure of the convolutional neural network model for predicting brain age based on MRI data. Curved
arrows indicate skip connections between layers.


5.3.3 Genetic association analyses
We used the linear regression model implemented in PLINK [103] for the genotypic test, adjusting
for gender and the first three genetic principal components of ancestry, to test the
association between SNPs and RBA.  

5.4 Discussion
In this study, we trained a CNN for obtaining PBA. The MAE between PBA and CA was
2.7 years. We observed that the difference PBA-CA was negatively associated with CA when using
the CNN model (Figure 5.1). This is similar to the pattern between PBA-CA and CA when
using a regression model for predicting brain age. That is probably due to the nature of
the CNN model: the last layer of the CNN model we used was essentially a linear
regression model. The statistical reason for the negative association between PBA-CA
and CA was discussed by Smith et al. [39] Therefore, we further derived RBA after
obtaining PBA and assessed the association between RBA and SNPs.  
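
The negative association between PBA-CA and CA can be reproduced in a small simulation. The sketch below is an illustration under simplifying assumptions: a single hypothetical imaging marker equal to age plus noise, a least-squares brain-age model evaluated on its own training data, and RBA taken as the residual of PBA regressed on CA. None of the numbers correspond to the study data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
ca = rng.uniform(45, 80, n)                    # chronological age
marker = ca + rng.normal(0, 8, n)              # hypothetical noisy imaging marker

# Least-squares brain-age model: predictions are shrunk toward the mean age.
slope, intercept = np.polyfit(marker, ca, 1)
pba = slope * marker + intercept

# The raw gap PBA-CA is therefore negatively correlated with CA.
gap = pba - ca
r_gap = np.corrcoef(gap, ca)[0, 1]

# Removing the linear dependence of PBA on CA gives a gap free of that trend.
adj_slope, adj_intercept = np.polyfit(ca, pba, 1)
rba = pba - (adj_slope * ca + adj_intercept)
r_rba = np.corrcoef(rba, ca)[0, 1]

print(f"corr(PBA-CA, CA) = {r_gap:.2f}, corr(RBA, CA) = {r_rba:.2f}")
```

For a least-squares fit evaluated in-sample, the covariance between PBA-CA and CA equals var(PBA) minus var(CA), which is negative whenever the prediction is imperfect; this is why an age-adjusted measure is used in the association analyses.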
The SNP most significantly associated with RBA was located in the NSF gene. NSF encodes
an ATPase that is involved in cellular membrane fusion events, and is associated with
neuronal intranuclear inclusion disease [104]. It is also worth noting that SNPs in the MAPT
gene showed significant association with RBA. Previous studies showed that
mutations in MAPT, which encodes tau protein, are associated with dementia and
Parkinson's disease [105]. Therefore, it is likely that both NSF and MAPT are functional
genes for brain aging. Further, functions of other SNPs showing significant association
with RBA remain to be investigated.
A CNN model was more accurate than a regression model in predicting brain age. The
genetic signals identified to be associated with RBA were very similar when using a CNN
model  and  using  a  regression  model  (Figure  5.2  and  Supplementary  Figure  5.1).
Nevertheless, the associations were stronger (i.e., with more significant p-value) when
using  RBA  derived  from  the  CNN  model  than  using  RBA  derived  from  the  linear
regression model (Figure 5.5).
We also note that although the CNN model gave more accurate predictions of brain age
than the regression model, and therefore allowed identifying more significant
associations between SNPs and RBA, the additional time needed to train CNN models was
not trivial. For our training process, a CNN model took about two days to converge (with
two GPUs). As a comparison, training a regression model with FreeSurfer measurements
as predictors only took two minutes on one CPU.
In  summary,  a  CNN  model  was  able  to  more  accurately  predict  brain  age  than  a
regression model. With a more accurate PBA, and therefore a more accurate RBA, we
identified stronger association between SNPs and RBA. A more accurate PBA is likely to
lead  to  better  understanding  of  the  association  between  brain  aging  and  other
environmental or lifestyle factors.


Supplementary Figure


Supplementary Figure 5.1. Manhattan plot for the association p-values between SNPs
and relative brain age across the genome, where RBA is derived from a linear regression
model. The red line indicates the genome-wide significance threshold on the p-value (i.e., 5E-8).
The blue line indicates a p-value of 0.05.  


 

Chapter 6    Predicting Alzheimer's disease risk using both
neural network and regression models

6.1 Introduction
In previous chapters, we focused on cognitively normal subjects and studied factors
associated with 'normal' brain aging. In this chapter, we describe our research on
Alzheimer's disease risk prediction. To be specific, we used a neural network (NN) model
to classify Alzheimer's disease (AD) patients and healthy controls, and further used the NN to
predict progression from mild cognitive impairment (MCI) to AD. We also tried to
understand the 'knowledge' that the NN model learned for classifying AD.
AD is characterized by specific brain structural changes and genetic risk factors [21,22,59].
Measurements of structural changes based on MRI scans have previously been used to
classify AD patients versus cognitively normal (CN) subjects and to predict the risk of
progression from mild cognitive impairment (MCI) to AD. Statistical classification
models, such as support vector machines [14-17,24,25], linear discriminant analysis [23,25], and
regression models [26-28], have been successfully trained for these tasks. On the other hand,
AD risk is also affected by genetic variants an individual carries, which can be measured
accurately from birth [59]. Previous studies have also used genome-wide genetic
information alone to predict AD occurrence with a logistic regression (LR) model [106,107].
With the growing availability of data that includes both brain imaging and genetic data
for AD and CN subjects, researchers have combined the structural imaging data and
genetic data for these AD classification and prediction tasks [15,108,109]. Existing studies in
AD classification and prediction have relied on statistical models that primarily include
additive effects of the included structural imaging and genetic features. However, the
estimation of AD risk may be more accurate if interactions among brain and genetic
features are also included in these models [106,110,111]. To the best of our knowledge, no
research study has systematically investigated these interactions while building
statistical models for classifying AD subjects.
To capture the joint effects of brain and genetic features in AD risks as well as the
interactions among them, we chose a neural network (NN) as our modeling tool (Hinton
and Salakhutdinov, 2006). Because NNs can model such interactions, they are well suited
for investigating diseases with multifactorial pathophysiology and etiology, like AD,
especially as datasets of neuroimaging and genetic data grow in volume.  
While NN models have been exceptionally successful at making predictions, they are
typically applied as "black-box" tools and not used to reveal the reasoning behind the
decisions. As a result, although NNs have been applied to predict AD risks [14,112], the
important brain and genetic features and their interactions captured by the models
remain elusive. Recent advances in methods for interpretation of NN models allow
researchers to identify these salient and interacting features [113-116]. We take advantage
of similar methods not only to train an NN model for AD classification, but also to
investigate this model to identify the important predictors and interactions in the
model.  

6.2 Results
6.2.1 Demographic information
In total, 138 AD patients, 225 CN subjects, and 358 MCI patients who had quality-
controlled quantitative brain structural data and genetic data were included in the
study. Among the 358 MCI patients, 166 progressed from MCI to AD during follow-up;
192 did not progress during at least 24 months of follow-up. Demographic information
of the subjects used in our study is summarized in Table 6.1.


Table 6.1. Demographic characteristics of Alzheimer’s disease, mild cognitive
impairment, and healthy control subjects.

Diagnosis                    Number    Female|Male    Age (median [min-max])
Cognitively Normal           225       113|112        74 [56-90]
Alzheimer's Disease          138       60|78          75 [56-91]
Mild Cognitive Impairment    358       148|210        73 [55-88]



6.2.2 Models’ performance in classifying AD and in predicting progression from MCI to
AD  
We  trained  NN  models  where  brain  morphometric  and  genetic  data  were  used  as
predictors for AD (Figure 6.1). NN models that included both brain morphometric and
genetic data performed significantly better than those that included either alone. Our
analyses showed that when both brain and SNP features were included as predictors,
the 100 sub-models of the best-performing NN model had a median AUC of 0.835 in
predicting MCI progression. When only SNP features or brain features were used as
predictors,  the  best-performing  NN  model  had  a  median  AUC  of  0.689  and  0.820,
respectively. The performance of the 100 NN sub-models with both brain and SNP features
as predictors was significantly higher than that of the 100 NN sub-models with only
SNP features or only brain features as predictors (t-test p-value <2E-16 for both
comparisons).  
Further, the best-performing NN model had a modestly but significantly higher AUC than
the best-performing LR model where both brain and SNP features were included as
predictors (the 100 sub-models of the best-performing LR model had a median AUC of
0.824 in predicting MCI progression; t-test p-value <2E-16), indicating that the NN captured
interactions among brain and SNP features which improved the model's performance.
Figure 6.2 shows the AUC of the 100 sub-models of the best-performing NN and LR
models where only SNP features or only brain features were used as predictors and
where both features were used as predictors. We also note that random sampling of the
training and validation data affected the models' performance: the 100 sub-models of
the best NN model had an AUC between 0.811 and 0.846 in predicting MCI progression.  
Both NN and LR models had high accuracy in classifying AD and CN subjects. According
to the internal validation data, the best-performing NN model with both brain and SNP
predictors had a median AUC of 0.948 while the best-performing LR model with both
brain and SNP predictors had a median AUC of 0.945.  



Figure 6.1. Structure of a neural network model with two hidden layers.  



Figure 6.2. Accuracy (measured as AUC) of the best-performing neural network and logistic regression models in
predicting progression from mild cognitive impairment to Alzheimer’s disease.  


6.2.3 Important brain and SNP features used by NN model
Examination of the NN model with the highest AUC in the testing data revealed brain
and SNP features that were important in the model. The volume of the left middle
temporal gyrus, the left hippocampus, the right entorhinal cortex, the left inferior lateral
ventricle and the right inferior parietal lobe were the five most important brain features
in the model (i.e., these features had the largest absolute weights). As for genetic
features, the APOE ε4 risk allele dosage, a major AD genetic risk factor
59,117
, had the
highest weight in the NN model. Other genetic features did not have weight as large as
the aforementioned features. Table 6.2 lists the weight of the 5 most important brain
and genetic features in the best-performing NN model.  

Table 6.2. Weight of important features for classifying AD patients and CN subjects in
the neural network model.  

Brain features                   Weight  Genetic features      Weight
Left Middle Temporal Gyrus       0.60    APOE ε4 dosage        0.86
Left Hippocampus                 0.56    rs10948363 (CD2AP)    0.29
Right Entorhinal Cortex          0.52    rs7274581 (CASS4)     0.29
Left Inferior Lateral Ventricle  0.48    rs17125944 (FERMT2)   0.24
Right Inferior Parietal Lobule   0.40    rs4147929 (ABCA7)     0.22


6.2.4 Interactions among brain and genetic features captured by NN model
Our analyses of interactions within the best-performing NN model revealed that both
brain  and  genetic  features  were  involved  in  strong  interactions.  For  example,  the
strongest interaction captured by the NN model was between the right parahippocampal
gyrus and the right lateral occipital gyrus. The second strongest interaction was between
the right banks of the superior temporal sulcus and the left posterior cingulate. The
interaction between the SNP rs10838725 and the left lateral occipital gyrus was the
third strongest interaction. Figure 6.3 shows the pairwise interaction among all the brain
and genetic features used in the NN model.  


Figure 6.3. Strength of pairwise interactions among all the brain and genetic features used in the neural network
model.  

6.3 Materials and Methods  
6.3.1 Description of ADNI subjects in the study
We used brain imaging and genetic data from the Alzheimer's Disease Neuroimaging
Initiative (ADNI) database (http://adni.loni.usc.edu), a large dataset established in 2004
to measure the progression of healthy and cognitively impaired participants with brain
scans, biological markers, and neuropsychological assessments
20
. A goal of ADNI has
been  to  test  whether  serial  magnetic  resonance  imaging  (MRI),  positron  emission
tomography  (PET),  other  biological  markers,  and  clinical  and  neuropsychological
assessment can be combined to measure the progression of mild cognitive impairment
(MCI) and early Alzheimer’s disease (AD).
In total, 138 AD patients, 225 CN subjects, and 358 MCI patients who had quality-
controlled  quantitative  brain  structural  data  and  genetic  data  were  included.  AD
subjects were included if they maintained AD diagnosis throughout their follow-ups.
Similarly, healthy control subjects were included if they maintained healthy control
diagnosis throughout their follow-ups. We did not require 24 months of follow-up for
the AD and healthy control subjects. Only MCI subjects who stayed as MCI or progressed
to AD were considered. MCI subjects who reverted to a healthy control diagnosis were
excluded. Among the 358 MCI patients, 166 progressed from MCI to AD during follow-
up; 192 did not progress during at least 24 months of follow-up.  
6.3.2 MRI brain imaging data and genotype data  
Baseline imaging data was obtained using 1.5T or 3T MRI. Cortical reconstruction and
volumetric segmentation were performed using FreeSurfer
65,118
and obtained from the
ADNI database. Subjects who had good overall segmentation and passed a visual quality
control process were used in our analyses. Based on prior knowledge of brain regions
affected by AD
21,22
, we included volume measurements for the following 16 regions as
potential predictors in our models: hippocampus, entorhinal cortex, parahippocampal
gyrus,  superior  temporal  gyrus,  middle  temporal  gyrus,  inferior  temporal  gyrus,
amygdala, precuneus, inferior lateral ventricle, fusiform, posterior cingulate, superior
parietal lobe, inferior parietal lobe, caudate, banks of superior temporal sulcus, and lateral
occipital gyrus.
ADNI subjects were genotyped on three different platforms (i.e., Illumina Human 610-
Quad,  Illumina  Human  Omni  Express  and  Illumina  Omni  2.5M).  We  merged  the
genotype data from the three platforms. Quality control ensured that 1) all subjects
were of European ancestry and had a genotyping rate greater than 0.95; and 2) all SNPs had
a missing rate less than 0.05 and passed the Hardy-Weinberg exact test (i.e., p-value >= 1E-6).
We further extracted the genotypes of the APOE ε4 risk allele and 19 SNPs reported to be
significantly associated with AD in a previous genome-wide association study
59
. Missing
genotypes of AD-associated SNPs were imputed with IMPUTE2 using the 1000 Genomes Project data as the
reference panel
119,120
.  
6.3.3 Neural network and logistic regression models
We trained NN and LR models to classify AD versus healthy control subjects given brain
and SNP features as predictors. After training the models, we applied them to MCI
subjects to predict each MCI subject's risk of progression to AD.  
A LR model assumes that each predictor contributes additively to the subject’s log odds
of AD (see Equation 1).

$$\log\frac{p}{1-p} = \beta_0 + \sum_{i=1}^{m} \beta_i x_i \qquad (1)$$
Here $p$ is the probability that a subject has disease; $\beta_0$ is a bias term; $\beta_i$ is the weight of
input feature $x_i$, reflecting the strength and directionality of $x_i$ in affecting $p$; and $m$ is
the number of input features.  
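As a minimal illustration of Equation 1, the Python sketch below computes the log odds and the corresponding probability for a single subject. The function names and the numeric weights are hypothetical and were chosen only for illustration; they are not estimates from our models.

```python
import math


def lr_log_odds(beta0, betas, x):
    """Equation 1: the bias beta_0 plus the weighted sum of the input features."""
    return beta0 + sum(b * xi for b, xi in zip(betas, x))


def log_odds_to_probability(z):
    """Invert log(p / (1 - p)) = z to recover the probability p."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical bias, weights, and standardized feature values.
example_log_odds = lr_log_odds(-0.5, [0.8, -0.3], [1.0, 2.0])
example_probability = log_odds_to_probability(example_log_odds)
print(example_log_odds, example_probability)
```

Because each feature enters only through its own term, changing one feature shifts the log odds by the same amount regardless of the values of the other features.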
A NN is a network whose nodes (or “artificial neurons”) encode information with their
activation level (a real-valued number). We will refrain from referring to these nodes as
artificial  neurons  to  avoid  confusion  with  biological  neurons.  In  a  NN,  nodes  are
organized in multiple layers: an input layer, one or more hidden layers and an output
layer. Nodes in the input layer represent the brain and SNP features as predictors, nodes
in the second and third layers allow interactions among the predictors in the first layer,
and the output layer contains a single node that represents the disease risk (Figure 6.1).  
More specifically, in a NN with L layers, let $h_j^l$ denote the activation level of node $j$ in
the $l$-th hidden layer, and $m_l$ denote the number of hidden units in this layer. We
overload this notation to use $h_i^0$ to represent the input features $x_i$. Each hidden unit
activation is computed as the weighted sum of the nodes’ activations from the layer
below followed by a non-linear transformation function $f(x)$ (see Equation 2).  
$$h_j^l = f\left(b_j^l + \sum_{i=1}^{m_{l-1}} h_i^{l-1} w_{ij}^l\right), \quad l = 1,\ldots,L-1 \qquad (2)$$
Here $w_{ij}^l$ is the weight of the connection from $h_i^{l-1}$, the $i$-th node in layer $l-1$, to $h_j^l$, the
$j$-th node in layer $l$. $b_j^l$ is a bias term that regulates the overall activation level for node
$j$. The non-linear function $f$ in our model is the rectified linear function (ReLU): $f(x) = \max(0, x)$.
In the output layer, the NN predicts the log odds of AD using a weighted sum of the
hidden layer features (see Equation 3):
$$\log\frac{p}{1-p} = b^L + \sum_{i=1}^{m_{L-1}} h_i^{L-1} w_i^L \qquad (3)$$
where $b^L$ and $w_i^L$ are the corresponding bias and weights for the output layer. The
weights and biases for all layers are learned from the training data
41
.  
Contrasting Equation 3 with Equation 1, we see that the last layer of a NN is identical to
a LR model except that the "features" are replaced with hidden activations at layer L-1,
which are highly nonlinear functions of the input.  
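The forward pass described by Equations 2 and 3 can be sketched in plain Python as follows. This is an illustration only, not our MatConvNet implementation: the function names, the nested-list weight layout (`weights[i][j]` connects node i in the layer below to node j), and the example numbers are assumptions made for this sketch.

```python
def relu(values):
    """Rectified linear function applied to every node: f(x) = max(0, x)."""
    return [max(0.0, v) for v in values]


def dense(h_prev, weights, biases):
    """Bias plus weighted sum of the activations from the layer below."""
    return [
        biases[j] + sum(h_prev[i] * weights[i][j] for i in range(len(h_prev)))
        for j in range(len(biases))
    ]


def nn_log_odds(x, hidden_layers, output_weights, output_bias):
    """Hidden layers follow Equation 2; the output node follows Equation 3."""
    h = list(x)  # layer 0 holds the input features
    for weights, biases in hidden_layers:
        h = relu(dense(h, weights, biases))
    return output_bias + sum(hi * wi for hi, wi in zip(h, output_weights))


# Hypothetical network with two inputs and two hidden layers of two nodes each.
example_layers = [
    ([[1.0, -1.0], [0.5, 1.0]], [0.0, 0.5]),
    ([[1.0, -1.0], [1.0, 1.0]], [-1.0, 0.0]),
]
print(nn_log_odds([1.0, 2.0], example_layers, [0.4, 0.7], 0.1))
```

With an empty list of hidden layers, `nn_log_odds` reduces to the LR model of Equation 1, which mirrors the comparison between Equation 3 and Equation 1 made above.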
6.3.3.1 Shortcut connections
We additionally employ shortcut connections that connect all nodes in the input layer
directly to the output layer
102,121
. This connectivity structure can be considered as a
hybrid between LR and NN, which allows our network to not waste resources modeling
additive effects in the input features and reserve the NN for complex interactions only.
To prevent over-fitting we used L1 regularization on the weights and early stopping for
both NN and LR models
122
. We trained both NN and LR models using the MatConvNet
package
123
with identical training protocols, which allowed fair comparison of the two
models.    
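One way to picture the shortcut connections is as an extra linear term added to the output node, so that the predicted log odds is the sum of a hidden-layer path and a direct LR-like path. The Python sketch below illustrates this reading under that assumption; the function names and the weight layout are hypothetical and do not reproduce our MatConvNet code.

```python
def hidden_forward(x, hidden_layers):
    """Apply each ReLU hidden layer in turn; layers are (weights, biases) pairs."""
    h = list(x)
    for weights, biases in hidden_layers:
        h = [
            max(0.0, biases[j] + sum(h[i] * weights[i][j] for i in range(len(h))))
            for j in range(len(biases))
        ]
    return h


def shortcut_log_odds(x, hidden_layers, output_weights, output_bias, shortcut_weights):
    """Log odds = output bias + hidden-path term + direct (shortcut) linear term."""
    h = hidden_forward(x, hidden_layers)
    hidden_term = sum(hi * wi for hi, wi in zip(h, output_weights))
    shortcut_term = sum(xi * si for xi, si in zip(x, shortcut_weights))
    return output_bias + hidden_term + shortcut_term
```

Setting the output weights of the hidden path to zero leaves a pure LR model, while setting the shortcut weights to zero leaves a standard NN, which is why the structure can be viewed as a hybrid of the two.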

6.3.4 Procedures for model training and testing
6.3.4.1 Training, validation, and testing data
Model training and validation were done using data collected from subjects with AD and
healthy controls. Model testing was carried out on subjects with MCI (Figure 6.4).

Figure 6.4. Training, validation, and testing data.


6.3.4.2 Predictors in the models
We included the brain and genetic features described in Section 6.3.2 as predictors in the
models. We adjusted for age, gender, education, and the first three principal components
derived from the genetic data by including them as predictors. All predictors were
normalized to have a mean of zero and a variance of 1 across subjects.  
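The normalization step can be sketched as a z-score transformation applied to each predictor across subjects. The sketch below is illustrative: the function name is hypothetical, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption, since the text does not specify which was used.

```python
import statistics


def standardize(values):
    """Rescale one predictor to mean 0 and variance 1 across subjects."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation (assumed)
    if sd == 0:
        raise ValueError("a constant predictor cannot be standardized")
    return [(v - mean) / sd for v in values]


# Hypothetical raw values of one predictor for eight subjects.
print(standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
```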
6.3.4.3 Random sampling and model training
We used a random selection of 80% of the AD and healthy control subjects for training
models, and the remaining 20% of the subjects for internal validation (i.e., for selecting the
number of training iterations of NN and LR models). This random selection of samples
was repeated 100 times for each model in order to take into account variations in the
data. After training the models, we applied them to the MCI subjects to test their ability
to predict progression to AD.
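The repeated random sampling can be sketched as follows. The function name, the fixed seed, and the unstratified shuffling are assumptions made for illustration; the text does not state whether the AD and healthy control groups were sampled separately.

```python
import random


def repeated_splits(subject_ids, n_repeats=100, train_fraction=0.8, seed=0):
    """Return n_repeats random (training, validation) partitions of the subjects."""
    rng = random.Random(seed)  # fixed seed so the illustration is reproducible
    n_train = round(train_fraction * len(subject_ids))
    splits = []
    for _ in range(n_repeats):
        shuffled = list(subject_ids)
        rng.shuffle(shuffled)
        splits.append((shuffled[:n_train], shuffled[n_train:]))
    return splits


# 138 AD patients plus 225 CN subjects give 363 subjects in total.
splits = repeated_splits(range(363))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Each of the 100 partitions yields one sub-model, and every sub-model is then applied to the same MCI subjects for testing.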
6.3.4.4 Hyper-parameters in the NN and LR models  
The hyper-parameters explored include learning rate (ranging from 1E-3 to 1E-1) and
weight decay (ranging from 1E-5 to 1). We also explored the number of hidden nodes in
the two hidden layers for NN model (ranging from 2, 4, 8, up to 64 nodes in each layer).
In total, we assessed 100 NN models with different hyper-parameter combinations,
where the hyper-parameters were randomly selected from the afore-mentioned ranges.
For  each  NN  model  with  a  specific  hyper-parameter  combination,  we  trained  a
corresponding  LR  model  with  the  same  learning  rate  and  weight  decay  parameter
values.  Therefore,  we  trained  100  NN  models,  along  with  their  corresponding  LR
models,  where  each  model  was  trained  and  validated  using  100  sets  of  randomly
selected AD and healthy control subjects.
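The random hyper-parameter search can be sketched as below. Two details are assumptions made only for this illustration: the learning rate and weight decay are drawn log-uniformly within the stated ranges, and the candidate layer sizes are taken to be the powers of two from 2 to 64. The function names are hypothetical.

```python
import random

HIDDEN_SIZES = [2, 4, 8, 16, 32, 64]  # assumed grid of layer sizes


def sample_nn_config(rng):
    """Draw one NN hyper-parameter combination from the explored ranges."""
    return {
        "learning_rate": 10 ** rng.uniform(-3, -1),  # 1E-3 to 1E-1
        "weight_decay": 10 ** rng.uniform(-5, 0),    # 1E-5 to 1
        "hidden_nodes": (rng.choice(HIDDEN_SIZES), rng.choice(HIDDEN_SIZES)),
    }


def paired_lr_config(nn_config):
    """The corresponding LR model reuses the learning rate and weight decay."""
    return {
        "learning_rate": nn_config["learning_rate"],
        "weight_decay": nn_config["weight_decay"],
    }


rng = random.Random(0)
nn_configs = [sample_nn_config(rng) for _ in range(100)]
lr_configs = [paired_lr_config(config) for config in nn_configs]
```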
6.3.4.5 Model evaluation  
Accuracy of models was evaluated using the receiver operating characteristic (ROC) curve.
Since a model with a specific hyper-parameter combination was trained and validated
using data of 100 subsets of random AD and healthy control subjects, 100 “sub-models”
(i.e., each sub-model has its own estimations of the weight parameters) were obtained
for that model. Thus, we used median area under the ROC curve (AUC) of these 100 sub-
models to represent a model's accuracy when applying it to the internal validation data
and the testing data.  
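The evaluation can be sketched as follows, using the fact that the AUC equals the probability that a randomly chosen positive subject receives a higher score than a randomly chosen negative subject. The function names and the toy scores are hypothetical; this pairwise computation is an illustration and not necessarily the routine used in our analyses.

```python
import statistics


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison; ties count as one half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required to compute an AUC")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def median_auc(per_submodel_scores, labels):
    """Summarize one model by the median AUC over its sub-models."""
    return statistics.median(auc(scores, labels) for scores in per_submodel_scores)
```

In this sketch, `per_submodel_scores` would hold one list of predicted risks per sub-model (100 lists in our setting), all scored against the same labels.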
6.3.5 Identifying important brain and SNP features  
After  training  the  NN  models  where  both  brain  and  SNP  features  were  used  as
predictors, we assessed the NN model with the highest accuracy in the testing data and
identified brain and SNP features that were important in the model. Within a trained NN
model, the importance of a feature is estimated with the partial derivatives method
124
: for
each predictor $x_i$, we took the derivative of the predicted log odds of a
subject $s$ having AD with respect to $x_i$ and then averaged the derivative over all
subjects: the importance score of $x_i$ is $E_s\left(\frac{\partial \log(p_s/(1-p_s))}{\partial x_i}\right)$, where $p_s$ is the predicted AD
risk of subject $s$. The importance score is computed over all 100 rounds of the best-
performing model, and the magnitude of the median score is used to represent the
importance of predictor $x_i$.  
The same definition of predictor importance can be applied to the LR models. For LR
models, whose log odds are given by Equation (1), the importance score of $x_i$
evaluates to $\beta_i$. In other words, the importance of a feature is given by the
corresponding regression coefficient. This is consistent with how LR has been used to
assess predictor importance. We also note that since the predictors were normalized to
have a mean of zero and a variance of 1, the importance scores we report
do not depend on scale and are comparable with each other.
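The importance score can be illustrated with a short numerical sketch. Here the derivative is approximated with central finite differences, which is a substitute chosen for simplicity and not necessarily the procedure used in our analyses; the function name, the step size, and the toy LR model are hypothetical.

```python
def importance_scores(log_odds_fn, subjects, eps=1e-4):
    """Average over subjects of d(log odds)/d(x_i), one score per feature."""
    n_features = len(subjects[0])
    scores = []
    for i in range(n_features):
        total = 0.0
        for x in subjects:
            up = list(x)
            up[i] += eps
            down = list(x)
            down[i] -= eps
            # Central finite difference approximates the partial derivative.
            total += (log_odds_fn(up) - log_odds_fn(down)) / (2 * eps)
        scores.append(total / len(subjects))
    return scores


def toy_lr(x):
    """Hypothetical LR model with weights 0.8 and -0.3."""
    return -0.5 + 0.8 * x[0] - 0.3 * x[1]


print(importance_scores(toy_lr, [[0.0, 1.0], [1.0, -1.0], [2.0, 0.5]]))
```

For the toy LR model the scores recover the regression coefficients up to rounding error, in line with the closed-form result stated above.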

6.3.6 Identifying interactions among features  
We identified pairwise interactions among brain and SNP features by investigating the
best-performing NN and LR models on the test set. Within a trained model, the pair-wise
interaction between features $x_i$ and $x_j$ is defined as $I_{ij} = E_s\left(\frac{\partial^2 \log(p_s/(1-p_s))}{\partial x_i \, \partial x_j}\right)$,
where $p_s$ is the predicted probability that subject $s$ has AD. Note that for LR, the
interaction above has a closed-form solution, which is $I_{ij} = 0$ for all $x_i$ and $x_j$. This
serves as a sanity check that LR does not model pair-wise interactions among its
predictors.  
For NNs, $I_{ij}$ does not admit a closed-form solution and must be computed numerically,
which  may  introduce  estimation  errors.  Leveraging  the  fact  that  the  theoretical
interaction scores for LR models are always 0, we apply the same numerical procedure
to  estimate  the  interaction  scores  for  both  NN  and  LR  models,  and  quantify  the
significance of an interaction in NN by testing how significantly its score differs from the
corresponding score in a LR model. In particular, we calculated pair-wise interactions in
the 100 NN sub-models of the best-performing NN model. We also calculated pair-wise
interactions  in  the  LR  model  that  had  the  same  hyper-parameters  as  the  best-
performing NN model. For a given pair of features, we compared their interaction in the
100 NN sub-models and in the 100 LR sub-models using Wilcoxon test, then reported
the interaction strength as -log(p-value).  
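A numerical estimate of the pair-wise interaction score can be sketched with a mixed second-order central difference, averaged over subjects. The estimator, the step size, and the function names below are assumptions made for illustration, since the exact numerical procedure is not specified above; the subsequent Wilcoxon comparison between NN and LR sub-models is not shown.

```python
def shifted(x, i, di, j, dj):
    """Copy of x with feature i moved by di and feature j moved by dj."""
    y = list(x)
    y[i] += di
    y[j] += dj
    return y


def interaction_score(log_odds_fn, subjects, i, j, eps=1e-2):
    """Average over subjects of the mixed derivative d2(log odds)/(dx_i dx_j)."""
    total = 0.0
    for x in subjects:
        total += (
            log_odds_fn(shifted(x, i, eps, j, eps))
            - log_odds_fn(shifted(x, i, eps, j, -eps))
            - log_odds_fn(shifted(x, i, -eps, j, eps))
            + log_odds_fn(shifted(x, i, -eps, j, -eps))
        ) / (4 * eps * eps)
    return total / len(subjects)


def additive_model(x):
    """Hypothetical LR-like model: no interaction between the two features."""
    return -0.5 + 0.8 * x[0] - 0.3 * x[1]


def interacting_model(x):
    """Hypothetical model with an explicit product term between the features."""
    return x[0] + 2.0 * x[0] * x[1]
```

For the additive model the estimate is zero up to rounding error, whereas the model with a product term yields a non-zero score, which is the contrast the comparison between NN and LR sub-models relies on.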

6.4 Discussion
In this study, we systematically employed NN models for classifying AD patients and CN
subjects and then investigated the ability of the trained NN models to identify important
predictors and interactions in the models.  
We found that including both brain and genetic features as model predictors increased
the models' performance compared with only including either brain or SNP features in
the models. Genetic features were important for predicting MCI progression: a random
prediction usually yields an AUC around 0.5, while using genetic features as predictors
increased the median AUC of the best-performing NN models to 0.689. In comparison,
the  best-performing  NN  models  with  brain  features  alone  as  predictors  had  better
performance in predicting MCI progression, with a median AUC of 0.820. Including both
brain  and  SNP  features  as  predictors  in  the  models  further  increased  the  models’
prediction accuracy by a moderate amount: median AUC of the best-performing NN
models reached 0.835. Although the combined brain and SNP features achieve a higher
AUC, it is worth noting that while the SNP features are available at birth, the brain
features may not reflect the neurodegenerative changes associated with AD risks for
subjects at a younger age
2,125
.  
Analyses of the trained NN models indicated that measurements of the middle temporal
gyrus,  the  hippocampus  and  the  entorhinal  cortex  were  the  most  important  brain
features for predicting AD risks. The hippocampus plays an important role in memory
formation and is well known to be affected by AD
21,22,126
. Further, a previous study
reported that these three structures were among the ones with the largest effect size
on MCI progression
127
. As for genetic features, APOE ε4 risk allele dosage had the
highest  weight  in  the  NN  model.  While  GWAS  had  identified  other  genetic  loci
significantly associated with AD risks
59
, the weights of those features were lower than
those of the aforementioned brain features in the model.  
While the NN model performed significantly better than the LR model (p-value <2e-16),
the performance increase is modest. The median AUC of the best-performing LR model
was 0.824, while median AUC of the best-performing NN model was 0.835. Overall, the
interaction effects were subtle compared to the additive effects. However, this modest
performance increase, which is likely a result of the NN modeling interactions among
predictors, provides evidence that there do exist interactions among the brain and
genetic features. Beyond building the NN models for classification and prediction, there
is important knowledge about disease pathophysiology to be gained in studying the
interactions within them. Existing NN models of AD risk have all been treated as a black-
box and not further investigated. For the first time in the field of imaging genetics, we
use very recently developed NN analysis techniques to investigate the NN models and
identify the important interactions among the brain and genetic features in affecting AD
risk.  
Our novel analysis of the trained NN model revealed three strong interactions of AD risk
factors, each of which is biologically plausible, provides insight into the pathophysiology
of AD, and warrants further study. First, the NN identified a strong interaction between
the right parahippocampal gyrus and the right lateral occipital gyrus. A relevant finding
was reported by Sommer et al.
128
, who observed correlated activity in the occipital and
the parahippocampal cortex during encoding and the resulting memory trace. We would
therefore expect AD patients to have aberrant connectivity between these two regions,
either structurally or functionally as measured by diffusion weighted MRI or functional
MRI. Second, the NN identified an interaction between the right banks of the superior
temporal sulcus and the left posterior cingulate. The posterior cingulate cortex is a
central part of the default mode network in the brain and is known to have prominent
projections to the superior temporal sulcus
129
. Third, the NN found an interaction
between  the  SNP  rs10838725  and  the  left  lateral  occipital  gyrus.  Recent  evidence
suggests that the functional gene of this SNP with respect to AD is the SPI1 gene, which
plays a role in myeloid cell function
130
. Neuroinflammation in AD, which is mediated by
myeloid  cells,  is  known  to  occur  in  the  occipital  cortex
131
. The mechanism of
interactions among such brain and genetic features requires further study.  
Our study has some limitations. First, there are different ways to define MCI progression
versus  non-progression.  Longer  follow-up  time  may  allow  us  to  observe  more  MCI
patients progressing to AD. The models’ accuracy in predicting MCI progression may
change when different definitions of progression are used and when observations from
longer follow-up are used. Second, as more GWAS are carried out, our understanding of
the SNPs and genes associated with AD will get updated. Therefore, the importance
score of the predictors and the predictor interactions should be interpreted with that in
mind and would need further validation. Third, since NN models have more parameters
than LR models, more data points are needed to train the NN models
132
. It is possible
that with a larger sample size, NN may have a more significant advantage compared to
LR  models.  Also,  our  findings  on  the  important  predictors  and  interactions  among
predictors may get refined and updated as the sample size increases. Fourth, NN
models with different structures may have different performance. In our analyses, we
used a structure with two hidden layers and a direct connection between the input layer
and the output layer. As a comparison, in the previous studies using NN models for
classifying AD, only two hidden layers were used, and no direct connection was built
between the input layer and the output layer
14,112
. Finally, interpreting NN models
remains an open research topic. Our definition of important predictors and interactions
was based on derivatives of the log odds of disease, while alternative
definitions
114,116,124,133
may reveal other insights into the model.  
To summarize, we trained NN models for classifying AD and CN subjects, yielding models
with good performance on the task of classifying and predicting AD. Our novel analyses
of the trained NN models led to findings of important brain and genetic features and
interactions among them that affect AD risk, which can guide future research on AD
etiology.  Our  approach  of  training  and  investigating  NN  models  can  be  particularly
valuable  for  understanding  etiology  of  diseases  that  have  multiple,  interacting  risk
factors.
 

Chapter 7:  Conclusions and future directions

7.1 Summary of findings
Large-scale multimodal data have great potential to help us understand and defy brain
aging. In our research, we analyzed these data to help realize this potential.
In Chapters 2-5, we studied factors associated with brain aging in healthy subjects.
Chapter 2 introduced a process to obtain RBA, an imaging-based biomarker indicating
how old a subject’s brain structure appears compared to peers, using a regression-based
method.  It  allowed  direct  comparisons  among  subjects  with  different  chronological
ages.  Chapter  3  focused  on  the  association  of  RBA  with  smoking  and  alcohol
consumption, two factors that were typically believed to be detrimental to the brain.
We found that subjects who smoked on most or all days had brains appearing 0.6 years
older than subjects who didn't smoke on most or all days, while there was no significant
difference among the groups of subjects who didn’t smoke on most or all days and
those who abstained from smoking. We also found that subjects who drank alcohol on
most or all days had brains appearing 0.4 years older than subjects who didn't drink on
most or all days, while there was no significant difference among the groups of subjects
who didn’t drink on most or all days and those who abstained from drinking. Chapter 4
focused on the association of RBA with number of offspring in both males and females.
Before our study, it was unclear if having offspring had a long-term effect on the brain.
We found that in both sexes, subjects with any number of offspring had younger-appearing
brains than subjects with no offspring. In male subjects, the association
between parity and RBA followed a "U-shape" pattern, where subjects with 2 or 3
offspring had younger-appearing brains compared to subjects with 0, 1, or >=4 offspring.
In comparison, a linear relationship was observed between number of offspring and RBA
in  females.  This  linear  association  may  be  explained  by  the  hormonal  fluctuation
specifically linked to women's pregnancy history and remains to be further investigated.
Our finding suggested that lifestyle factors accompanying having offspring, rather than
the physical process of pregnancy experienced only by females, contribute to these
associations.  After  studying  the  association  between  RBA  and  lifestyle  factors,  we
studied association between RBA and genetic factors in Chapter 5. We hypothesized
that a CNN model may produce a more accurate RBA metric than a regression model
and therefore make it easier to identify genetic factors that are significantly associated
with RBA. After obtaining RBA from a CNN model, we found that the most significant
RBA-associated SNPs were in a chromosome 17 locus, which highlighted involvement of
the NSF and MAPT genes in brain aging. We also trained a regression model for
obtaining RBA and then identified SNPs associated with RBA from the regression model.
Although SNPs showing significant association with RBA from the CNN model also
showed strong association with RBA from the regression model, the association p-values
were more significant when using RBA from the CNN model. We concluded that a CNN model
more accurately assessed the RBA than a regression model and allowed identification of
more significant SNP-RBA associations.  

Through  literature  review,  we  found  previous  studies  that  provide  insights  for  the
associations  of  RBA  with  lifestyle  and  genetic  factors.  For  example,  we  found  that
subjects who smoked or consumed alcohol on most or all days had a significantly higher
RBA  compared  to  subjects  who  smoked  or  consumed  alcohol  less  often.  That  was
consistent with previous studies, which showed significantly greater rate of atrophy in
certain  regions  of  the  brains  of  smokers  and  drinkers
50-54,74
.  Further,  alcohol
consumption causes dehydration, which may also increase apparent brain age
134,135
. On
the other hand, there was no significant RBA difference among subjects who smoked
occasionally and those who abstained from smoking, or among subjects who drank
alcohol less frequently and those who abstained from drinking. We hypothesize that the
benefits of nicotine and wine drinking previously reported might counteract the
detrimental effect of smoking and alcohol consumption
68,69,73,75,76
. We also found that in
both females and males, parity was associated with younger brain age and improved
cognitive function. That was corroborated by a previous study, which reported that
having offspring was associated with lower mortality risk in both sexes
91
.  Further,
children might serve as a 'bridge' connecting parents to more social and community
activities
88
. Adult children can provide parents with emotional and social support, as
well as instrumental support such as shopping and housework
89,90
. All of these could
contribute to the overall health status of parents. We further identified SNPs in NSF and
MAPT genes that were significantly associated with RBA. Previous literature suggested
that both these genes are likely to be functional genes for brain aging. NSF encodes
an ATPase that is involved in cellular membrane fusion events, and is associated with
neuronal  intranuclear  inclusion  disease,  a  progressive  neurodegenerative  disease
104
.
The MAPT gene encodes the tau protein, which accumulates in Alzheimer's disease brains. Also,
mutations in MAPT are associated with dementia and Parkinson's disease
105
.  
In Chapter 6, we studied brain aging for AD. To be specific, we trained NN models to
classify AD using brain morphometric measurements derived from MRI data and genetic
data. We then applied the model to MCI subjects to predict their progression to AD.
NN  models  with  both  brain  and  SNP  features  as  predictors  performed  significantly
better  than  NN  models  with  either  genetic  or  brain  features  alone  in  predicting
progression from MCI to AD. To be specific, median AUC was 0.835, 0.820, and 0.689 for
NN models with both features as predictors, with only brain features as predictors, and
with only genetic features as predictors, respectively. Further, NN models performed
better than regression models in predicting progression to AD, indicating that NN models
captured interactions among predictors that improved the models' performance. We
further identified strong interactions among brain and genetic features affecting AD risk
by analyzing the NN model. The NN showed a strong interaction between the right
parahippocampal gyrus and the right lateral occipital gyrus. A study by Sommer et al.
128

found  correlated  activity  in  the  occipital  and  the  parahippocampal  cortex  during
encoding and the resulting memory trace. We therefore expect AD patients to have
aberrant connectivity between these two regions, either structurally or functionally as
measured  by  diffusion  weighted  MRI  or  functional  MRI.  The  NN  also  showed  an
interaction  between  the  SNP  rs10838725  and  the  left  lateral  occipital  gyrus.  The
functional gene of this SNP with respect to AD is the SPI1 gene, which plays a role in
myeloid cell function
130
. Neuroinflammation in AD, which is mediated by myeloid cells,
is known to occur in the occipital cortex
131
.
Although we found previous studies that supported our findings, further experiments or
analyses  need  to  be  carried  out  to  further  understand  the  biology  behind  the
associations we observed.

7.2 The pros and cons of NNs for analyzing big data
We used both traditional regression models and NN models for brain age prediction and
for  Alzheimer’s  disease  risk  prediction.  In  both  tasks,  NN  models  showed  better
accuracy than regression models. Nevertheless, there are some disadvantages of using
NN models, especially when the training sample size is small or when access to GPUs
is limited.  
A NN model is likely to outperform a regression model when there are interactions
among input features, and the training set is large enough. However, when the sample
size of the training set is small, a NN may not always outperform a regression model.
This is because with the same number of inputs (or predictors), a NN model has a more
complicated structure and therefore more parameters to fit than a regression model. A
small sample size is more likely to cause over-fitting in NN models than in regression
models. Schulz et al. reported that in an imaging classification task, accuracy of both NN
models and regression models increased as sample size increased. Further, with more
observations available for model training, a NN model improved more in prediction
accuracy over a regression model
136
.
It is also worth noting that many hyper-parameters in the NN model can be explored for
the NN model to reach a better accuracy. Those include but are not limited to learning
rate, cost function, number of nodes in each layer, as well as the network structure.
However, due to limited access to GPUs when conducting our research, we only tried a
few hyper-parameter combinations. Although our CNN model achieved state-of-the-art accuracy in
predicting brain age, an even better CNN can probably be trained with more
computational power.
Another concern in using NN models is the computation time. In our task for training a
model to predict brain age with 14,000 training subjects, a NN model took about two
days to converge (with 2 GPUs). As a comparison, training a regression model with
FreeSurfer measurements as predictors only took two minutes. When there is limited
access to GPUs, it may be more practical to try regression models first. If the regression
model indicates a potentially promising finding, we can switch to a NN model for further
investigation.

7.3 Identifying factors associated with brain aging and the next step
We found that RBA was significantly associated with smoking, alcohol consumption,
parity history and genetic factors. We note that many other factors may also affect brain
aging. For example, researchers have shown that exercise improves general health and
may  also  slow  down  brain  aging  and  reduce  AD  risk
137,138
,  while  diseases  such  as
diabetes and schizophrenia are associated with accelerated brain aging
30,139
. As large-
scale  data  covering  these  factors  become  increasingly  accessible,  the  process  of
obtaining RBA as well as the NN models discussed in this thesis can be easily extended
to reveal the connections between these factors and brain aging.  
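
As an illustration only, the sketch below shows how such an extension might look for one
additional factor. It assumes that RBA is taken as the residual of predicted brain age
after regressing it on chronological age; that reading of the definition, the function
names, and all numbers are assumptions made for this example, not values or code from
the thesis.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def relative_brain_age(chron_age, pred_age):
    """Assumed RBA: predicted brain age minus the value expected at that chronological age."""
    a, b = fit_line(chron_age, pred_age)
    return [p - (a + b * c) for c, p in zip(chron_age, pred_age)]

def group_difference(rba, exposed):
    """Mean RBA of exposed subjects minus mean RBA of unexposed subjects."""
    yes = [r for r, e in zip(rba, exposed) if e]
    no = [r for r, e in zip(rba, exposed) if not e]
    return mean(yes) - mean(no)

# Made-up toy data: chronological age, model-predicted brain age, and a binary factor
chron_age = [50, 55, 60, 65, 70, 75]
pred_age = [53, 54, 62, 63, 71, 72]
exposed = [True, False, True, False, True, False]

rba = relative_brain_age(chron_age, pred_age)
diff = group_difference(rba, exposed)
print(f"mean RBA difference (exposed - unexposed): {diff:.2f} years")
```

A real analysis would adjust for covariates such as sex and would test the difference
statistically; the sketch only shows where a new factor enters the calculation.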
This thesis has focused only on identifying lifestyle and genetic factors associated with
brain aging. However, a more interesting and practical question is whether we could
devise a comprehensive plan to intervene in the brain aging process (Figure 7.1). This
thesis suggests a few potential directions, such as refraining from excessive smoking and
alcohol consumption and developing therapies targeting the genes MAPT or NSF. In
addition, RBA provides a foundation for validating such a plan.

Figure 7.1 Paradigm of research on factors associated with brain aging.  

7.4 Potential of dementia onset prediction based on imaging data
Currently, there is no cure for Alzheimer's disease, the most common form of
dementia [6]. If people at risk of developing AD or other forms of dementia were
identified at a much earlier stage, before any symptoms arise, they would have more
time to identify the preventative approach most suitable for them.
Previous studies showed that the brain age of AD patients is accelerated by about 10
years relative to their chronological age [29]. We hypothesize that this acceleration may
begin many years before any AD symptom arises and can be captured by RBA.
Therefore, we suggest that a long-term follow-up study be carried out to assess how
accurately RBA can predict AD onset. Three sources of information need to be collected.
First, brain MRI data for cognitively healthy subjects at baseline, and at each follow-up
visit if possible, so that RBA can be monitored longitudinally. Second, the cognitive
function score and dementia status of the subjects at each visit. Third, data on other
factors that have been associated with AD [140-142], such as genetic mutations, amyloid
beta plaques identified through PET scans, and serum or blood biomarkers including
amyloid beta and neurofilament. A potential obstacle for this project is applying the
RBA model trained on UK Biobank to another dataset. The UK Biobank data were
collected from subjects 45 years or older using the same type of scanner across data
collection centers. Therefore, the model's accuracy may drop when it is applied to a data
set whose age range differs from that of UK Biobank, or that was acquired on MRI
machines different from those used by UK Biobank. A potential solution is transfer
learning [143], where the model for obtaining RBA is re-trained with part of the new
data set.
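
The re-training idea can be illustrated with a deliberately small stand-in. The sketch
below is not the CNN used in this thesis: it assumes a hypothetical one-feature linear
"pretrained" model, made-up data from a new site whose scanner shifts predictions by a
constant, and a fine-tuning step that re-trains only the intercept while keeping the
slope frozen. Which parameters to re-train in practice is a design choice that the text
above does not specify.

```python
from statistics import mean

def predict(weight, bias, feature):
    """Linear stand-in for a brain-age model."""
    return weight * feature + bias

def mean_abs_error(weight, bias, features, ages):
    return mean(abs(predict(weight, bias, f) - a) for f, a in zip(features, ages))

def fine_tune_bias(weight, bias, features, ages, lr=0.5, steps=100):
    """Re-train only the intercept by gradient descent on squared error; slope is frozen."""
    for _ in range(steps):
        grad = mean(predict(weight, bias, f) - a for f, a in zip(features, ages))
        bias -= lr * grad
    return bias

# Hypothetical model learned on the source cohort: age = 200 - 2 * feature
w_src, b_src = -2.0, 200.0

# Made-up data from a new site where the relationship is shifted by 6 years
features = [55, 58, 61, 64, 67, 70, 73, 76]
noise = [1, -1, 1, -1, 1, -1, 1, -1]
ages = [206 - 2 * f + n for f, n in zip(features, noise)]

# Use part of the new data for re-training and hold out the rest for evaluation
tune_x, tune_y = features[:4], ages[:4]
test_x, test_y = features[4:], ages[4:]

mae_before = mean_abs_error(w_src, b_src, test_x, test_y)
b_new = fine_tune_bias(w_src, b_src, tune_x, tune_y)
mae_after = mean_abs_error(w_src, b_new, test_x, test_y)
print(f"held-out MAE before: {mae_before:.2f}, after: {mae_after:.2f}")
```

In this toy setting the held-out error drops because the site shift is a pure offset. A
shift in age range or image contrast may require re-training more of the model, and
correspondingly more data from the new site.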

7.5 Conclusion
The knowledge we gained about brain aging has two major implications for health care.
First, with a clear picture of the environmental and lifestyle factors associated with brain
aging, people can be educated to avoid detrimental factors and adopt habits that benefit
the brain in their daily life. Second, people at high risk of accelerated brain aging or
dementia can receive early interventions, ranging from lifestyle changes to participation
in clinical trials, before disease symptoms arise. Both implications support the mission
of shifting our focus from disease treatment to prevention, a mission that becomes
increasingly important as our population continues to age.

References

1.  Lindenberger, U. Human cognitive aging: corriger la fortune? Science 346,
572-8 (2014).
2.  Jack, C.R., Jr. et al. Age, Sex, and APOE epsilon4 Effects on Memory, Brain
Structure, and beta-Amyloid Across the Adult Life Span. JAMA Neurol 72,
511-9 (2015).
3.  Andersen, K. et al. Gender differences in the incidence of AD and vascular
dementia: The EURODEM Studies. EURODEM Incidence Research Group.
Neurology 53, 1992-7 (1999).
4.  Ortman, J., Velkoff, V. & Hogan, H. An Aging Nation: The Older Population in  
the United States. (2014).
5.  Cole, J.H., Marioni, R.E., Harris, S.E. & Deary, I.J. Brain age and other bodily
'ages': implications for neuropsychiatry. Mol Psychiatry 24, 266-281 (2019).
6.  Alzheimer's Association. Alzheimer’s disease - facts and figures. Alzheimer's &
Dementia (2019).
7.  Hayes, C.E., Edelstein, W.A., Schenck, J.F., Mueller, O.M. & Eash, M. An Efficient,
Highly Homogeneous Radiofrequency Coil for Whole-Body NMR Imaging at
1.5 T. Journal of Magnetic Resonance 63 (1985).
8.  Kellner, T. Heady Times: This Scientist Took the First Brain Selfie and Helped
Revolutionize Medical Imaging. GE Reports (2015).
9.  Calderon-Garciduenas, L. et al. Exposure to severe urban air pollution
influences cognitive outcomes, brain volume and systemic inflammation in
clinically healthy children. Brain Cogn 77, 345-55 (2011).
10.  Gebarski, S.S. et al. The initial diagnosis of multiple sclerosis: clinical impact
of magnetic resonance imaging. Ann Neurol 17, 469-74 (1985).
11.  Karayiannis, C. et al. Prevalence of Brain MRI Markers of Hemorrhagic Risk in
Patients with Stroke and Atrial Fibrillation. Front Neurol 7, 151 (2016).
12.  Pereira, S., Pinto, A., Alves, V. & Silva, C.A. Brain Tumor Segmentation Using
Convolutional Neural Networks in MRI Images. IEEE Trans Med Imaging 35,
1240-1251 (2016).
13.  Coffey, C.E. et al. Quantitative cerebral anatomy of the aging human brain: a
cross-sectional study using magnetic resonance imaging. Neurology 42, 527-
36 (1992).
14.  Aguilar, C. et al. Different multivariate techniques for automated
classification of MRI data in Alzheimer's disease and mild cognitive
impairment. Psychiatry Res 212, 89-98 (2013).
15.  Da, X. et al. Integration and relative value of biomarkers for prediction of MCI
to AD progression: spatial patterns of brain atrophy, cognitive scores, APOE
genotype and CSF biomarkers. Neuroimage Clin 4, 164-73 (2014).
16.  Davatzikos, C., Bhatt, P., Shaw, L.M., Batmanghelich, K.N. & Trojanowski, J.Q.
Prediction of MCI to AD conversion, via MRI, CSF biomarkers, and pattern
classification. Neurobiol Aging 32, 2322 e19-27 (2011).
17.  Davatzikos, C., Xu, F., An, Y., Fan, Y. & Resnick, S.M. Longitudinal progression
of Alzheimer's-like patterns of atrophy in normal older adults: the SPARE-AD
index. Brain 132, 2026-35 (2009).
18.  Van Horn, J.D. & Toga, A.W. Human neuroimaging as a "Big Data" science.
Brain Imaging Behav 8, 323-31 (2014).
19.  Allen, N.E., Sudlow, C., Peakman, T., Collins, R. & UK Biobank. UK biobank
data: come and get it. Sci Transl Med 6, 224ed4 (2014).
20.  Petersen, R.C. et al. Alzheimer's Disease Neuroimaging Initiative (ADNI):
clinical characterization. Neurology 74, 201-9 (2010).
21.  Weiner, M.W. et al. The Alzheimer's Disease Neuroimaging Initiative: a
review of papers published since its inception. Alzheimers Dement 9, e111-94
(2013).
22.  Weiner, M.W. et al. 2014 Update of the Alzheimer's Disease Neuroimaging
Initiative: A review of papers published since its inception. Alzheimers
Dement 11, e1-120 (2015).
23.  Eskildsen, S.F. et al. Structural imaging biomarkers of Alzheimer's disease:
predicting disease progression. Neurobiol Aging 36 Suppl 1, S23-31 (2015).
24.  Orru, G., Pettersson-Yeo, W., Marquand, A.F., Sartori, G. & Mechelli, A. Using
Support Vector Machine to identify imaging biomarkers of neurological and
psychiatric disease: a critical review. Neurosci Biobehav Rev 36, 1140-52
(2012).
25.  Wolz, R. et al. Multi-method analysis of MRI images in early diagnostics of
Alzheimer's disease. PLoS One 6, e25446 (2011).
26.  Desikan, R.S. et al. Automated MRI measures identify individuals with mild
cognitive impairment and Alzheimer's disease. Brain 132, 2048-57 (2009).
27.  Liu, X., Tosun, D., Weiner, M.W., Schuff, N. & Alzheimer's Disease
Neuroimaging, I. Locally linear embedding (LLE) for MRI based Alzheimer's
disease classification. Neuroimage 83, 148-57 (2013).
28.  Young, J. et al. Accurate multimodal probabilistic prediction of conversion to
Alzheimer's disease in patients with mild cognitive impairment. Neuroimage
Clin 2, 735-45 (2013).
29.  Franke, K., Ziegler, G., Kloppel, S., Gaser, C. & Alzheimer's Disease
Neuroimaging, I. Estimating the age of healthy subjects from T1-weighted
MRI scans using kernel methods: exploring the influence of various
parameters. Neuroimage 50, 883-92 (2010).
30.  Franke, K., Gaser, C., Manor, B. & Novak, V. Advanced BrainAGE in older
adults with type 2 diabetes mellitus. Front Aging Neurosci 5, 90 (2013).
31.  Nenadic, I., Dietzek, M., Langbein, K., Sauer, H. & Gaser, C. BrainAGE score
indicates accelerated brain aging in schizophrenia, but not bipolar disorder.
Psychiatry Res 266, 86-89 (2017).
32.  Cole, J.H. & Franke, K. Predicting Age Using Neuroimaging: Innovative Brain
Ageing Biomarkers. Trends Neurosci 40, 681-690 (2017).
33.  Liem, F. et al. Predicting brain-age from multimodal imaging data captures
cognitive impairment. Neuroimage 148, 179-188 (2017).
34.  Cole, J.H. et al. Predicting brain age with deep learning from raw imaging data
results in a reliable and heritable biomarker. Neuroimage 163, 115-124
(2017).
35.  Murphy, K.P. Machine Learning: A Probabilistic Perspective (2018).
36.  Hibar, D.P., Kohannim, O., Stein, J.L., Chiang, M.C. & Thompson, P.M.
Multilocus genetic analysis of brain images. Front Genet 2, 73 (2011).
37.  Potkin, S.G. et al. Hippocampal atrophy as a quantitative trait in a genome-
wide association study identifying novel susceptibility genes for Alzheimer's
disease. PLoS One 4, e6501 (2009).
38.  Teipel, S.J., Kurth, J., Krause, B., Grothe, M.J. & Alzheimer's Disease
Neuroimaging, I. The relative importance of imaging markers for the
prediction of Alzheimer's disease dementia in mild cognitive impairment -
Beyond classical regression. Neuroimage Clin 8, 583-93 (2015).
39.  Smith, S.M., Vidaurre, D., Alfaro-Almagro, F., Nichols, T.E. & Miller, K.L.
Estimation of brain age delta from brain imaging. Neuroimage 200, 528-539
(2019).
40.  Gunther, F., Wawro, N. & Bammann, K. Neural networks for modeling gene-
gene interactions in association studies. BMC Genet 10, 87 (2009).
41.  LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436-44 (2015).
42.  Hinton, G.E. & Salakhutdinov, R.R. Reducing the dimensionality of data with
neural networks. Science 313, 504-7 (2006).
43.  Krizhevsky, A., Sutskever, I. & Hinton, G.E. Imagenet classification with deep
convolutional neural networks. (2012).
44.  Silver, D. et al. Mastering the game of Go with deep neural networks and tree
search. Nature 529, 484-9 (2016).
45.  Esteva, A. et al. Corrigendum: Dermatologist-level classification of skin
cancer with deep neural networks. Nature 546, 686 (2017).
46.  Langner, T., Wikstrom, J., Bjerner, T., Ahlstrom, H. & Kullberg, J. Identifying
morphological indicators of aging with neural networks on large-scale
whole-body MRI. IEEE Trans Med Imaging (2019).
47.  Bernal, J. et al. Deep convolutional neural networks for brain image analysis
on magnetic resonance imaging: a review. Artif Intell Med 95, 64-81 (2019).
48.  Havaei, M. et al. Brain tumor segmentation with Deep Neural Networks. Med
Image Anal 35, 18-31 (2017).
49.  Jonsson, B.A. et al. Brain age prediction using deep learning uncovers
associated sequence variants. Nat Commun 10, 5409 (2019).
50.  Durazzo, T.C., Insel, P.S., Weiner, M.W. & Alzheimer Disease Neuroimaging, I.
Greater regional brain atrophy rate in healthy elderly subjects with a history
of cigarette smoking. Alzheimers Dement 8, 513-9 (2012).
51.  Duriez, Q., Crivello, F. & Mazoyer, B. Sex-related and tissue-specific effects of
tobacco smoking on brain atrophy: assessment in a large longitudinal cohort
of healthy elderly. Front Aging Neurosci 6, 299 (2014).
52.  Gallinat, J. et al. Smoking and structural brain deficits: a volumetric MR
investigation. Eur J Neurosci 24, 1744-50 (2006).
53.  Pfefferbaum, A. et al. Brain gray and white matter volume loss accelerates
with aging in chronic alcoholics: a quantitative MRI study. Alcohol Clin Exp
Res 16, 1078-89 (1992).
54.  Asensio, S. et al. Magnetic resonance imaging structural alterations in brain of
alcohol abusers and its association with impulsivity. Addict Biol 21, 962-71
(2016).
55.  Hoekzema, E. et al. Pregnancy leads to long-lasting changes in human brain
structure. Nat Neurosci 20, 287-296 (2017).
56.  Luders, E. et al. Potential Brain Age Reversal after Pregnancy: Younger Brains
at 4-6 Weeks Postpartum. Neuroscience 386, 309-314 (2018).
57.  Oatridge, A. et al. Change in brain size during and after pregnancy: study in
healthy women and women with preeclampsia. AJNR Am J Neuroradiol 23,
19-26 (2002).
58.  Apostolova, L.G. et al. Hippocampal atrophy and ventricular enlargement in
normal aging, mild cognitive impairment (MCI), and Alzheimer Disease.
Alzheimer Dis Assoc Disord 26, 17-27 (2012).
59.  Lambert, J.C. et al. Meta-analysis of 74,046 individuals identifies 11 new
susceptibility loci for Alzheimer's disease. Nat Genet 45, 1452-8 (2013).
60.  Cole, J.H. et al. Brain age predicts mortality. Mol Psychiatry 23, 1385-1392
(2018).
61.  Lowe, L.C., Gaser, C., Franke, K. & Alzheimer's Disease Neuroimaging, I. The
Effect of the APOE Genotype on Individual BrainAGE in Normal Aging, Mild
Cognitive Impairment, and Alzheimer's Disease. PLoS One 11, e0157514
(2016).
62.  Hutcheon, J.A., Chiolero, A. & Hanley, J.A. Random measurement error and
regression dilution bias. BMJ 340, c2289 (2010).
63.  Miller, K.L. et al. Multimodal population brain imaging in the UK Biobank
prospective epidemiological study. Nat Neurosci 19, 1523-1536 (2016).
64.  Smith, S., Alfaro-Almagro, F. & Miller, K. UK Biobank Brain Imaging
Documentation. (2017).
65.  Fischl, B. FreeSurfer. Neuroimage 62, 774-81 (2012).
66.  Friedman, J., Hastie, T. & Tibshirani, R. Regularization Paths for Generalized
Linear Models via Coordinate Descent. J Stat Softw 33, 1-22 (2010).
67.  R Core Team. R: A language and environment for statistical computing. R
Foundation for Statistical Computing (2012).
68.  Ettinger, U. et al. Effects of acute nicotine on brain function in healthy
smokers and non-smokers: estimation of inter-individual response
heterogeneity. Neuroimage 45, 549-61 (2009).
69.  Gold, M., Newhouse, P.A., Howard, D. & Kryscio, R.J. Nicotine treatment of
mild cognitive impairment: a 6-month double-blind pilot clinical trial.
Neurology 78, 1895; author reply 1895 (2012).
70.  Almeida, O.P. et al. Coronary heart disease is associated with regional grey
matter volume loss: implications for cognitive function and behaviour. Intern
Med J 38, 599-606 (2008).
71.  Gianaros, P.J., Greer, P.J., Ryan, C.M. & Jennings, J.R. Higher blood pressure
predicts lower regional grey matter volume: Consequences on short-term
information processing. Neuroimage 31, 754-65 (2006).
72.  Cox, S.R. et al. Ageing and brain white matter structure in 3,513 UK Biobank
participants. Nat Commun 7, 13629 (2016).
73.  Piumatti, G., Moore, S., Berridge, D., Sarkar, C. & Gallacher, J. The relationship
between alcohol use and long-term cognitive decline in middle and late life: a
longitudinal analysis using UK Biobank. J Public Health (Oxf) (2018).
74.  Shokri-Kojori, E., Tomasi, D., Wiers, C.E., Wang, G.J. & Volkow, N.D. Alcohol
affects brain functional connectivity and its coupling with behavior: greater
effects in male heavy drinkers. Mol Psychiatry 22, 1185-1195 (2017).
75.  Corrao, G., Rubbiati, L., Bagnardi, V., Zambon, A. & Poikolainen, K. Alcohol and
coronary heart disease: a meta-analysis. Addiction 95, 1505-23 (2000).
76.  Ronksley, P.E., Brien, S.E., Turner, B.J., Mukamal, K.J. & Ghali, W.A. Association
of alcohol consumption with selected cardiovascular disease outcomes: a
systematic review and meta-analysis. BMJ 342, d671 (2011).
77.  Kappus, N. et al. Cardiovascular risk factors are associated with increased
lesion burden and brain atrophy in multiple sclerosis. J Neurol Neurosurg
Psychiatry 87, 181-7 (2016).
78.  Gu, Y. et al. Alcohol intake and brain structure in a multiethnic elderly cohort.
Clin Nutr 33, 662-7 (2014).
79.  Luders, E., Cherbuin, N. & Gaser, C. Estimating brain age using high-resolution
pattern recognition: Younger brains in long-term meditation practitioners.
Neuroimage 134, 508-513 (2016).
80.  Steffener, J. et al. Differences between chronological and brain age are related
to education and self-reported physical activity. Neurobiol Aging 40, 138-44
(2016).
81.  Neuner, B. et al. Modeling smoking history: a comparison of different
approaches in the MARS study on age-related maculopathy. Ann Epidemiol
17, 615-21 (2007).
82.  Wood, M.A., Kaptoge, S. & Butterworth, S.A. Risk thresholds for alcohol
consumption: combined analysis of individual-participant data for 599912
current drinkers in 83 prospective studies. The Lancet 391 (2018).
83.  Henderson, V.W., Guthrie, J.R., Dudley, E.C., Burger, H.G. & Dennerstein, L.
Estrogen exposures and memory at midlife: a population-based study of
women. Neurology 60, 1369-71 (2003).
84.  Beeri, M.S. et al. Number of children is associated with neuropathology of
Alzheimer's disease in women. Neurobiol Aging 30, 1184-91 (2009).
85.  Heys, M. et al. Life long endogenous estrogen exposure and later adulthood
cognitive function in a population of naturally postmenopausal women from
Southern China: the Guangzhou Biobank Cohort Study.
Psychoneuroendocrinology 36, 864-73 (2011).
86.  Read, S.L. & Grundy, E.M.D. Fertility History and Cognition in Later Life. J
Gerontol B Psychol Sci Soc Sci 72, 1021-1031 (2017).
87.  Kravdal, O. Is the relationship between childbearing and cancer incidence
due to biology or lifestyle? Examples of the importance of using data on men.
Int J Epidemiol 24, 477-84 (1995).
88.  Furstenberg, F.F. Banking on families: how families generate and distribute
social capital. Journal of Marriage and Family 67 (2005).
89.  Kramarow, E.A., Lentzner, H.R., Rooks, R.N., Weeks, J.D. & Saydah, S.H. Health,
United States. (1999).
90.  Ross, C.E. & Mirowsky, J. Family relationships, social support and subjective
life expectancy. J Health Soc Behav 43, 469-89 (2002).
91.  Modig, K., Talback, M., Torssander, J. & Ahlbom, A. Payback time? Influence of
having children on mortality in old age. J Epidemiol Community Health 71,
424-430 (2017).
92.  Davies, S.J., Lum, J.A., Skouteris, H., Byrne, L.K. & Hayden, M.J. Cognitive
impairment during pregnancy: a meta-analysis. Med J Aust 208, 35-40
(2018).
93.  Wang, H.X., Karp, A., Winblad, B. & Fratiglioni, L. Late-life engagement in
social and leisure activities is associated with a decreased risk of dementia: a
longitudinal study from the Kungsholmen project. Am J Epidemiol 155, 1081-
7 (2002).
94.  Yen, Y.C., Yang, M.J., Shih, C.H. & Lung, F.W. Cognitive impairment and
associated risk factors among aged community members. Int J Geriatr
Psychiatry 19, 564-9 (2004).
95.  Blanchflower, D.G. & Clark, A.E. Children, Unhappiness and Family Finances:
Evidence from One Million Europeans. National Bureau of Economic
Research Working Paper 25597 (2019).
96.  Richter, D., Kramer, M.D., Tang, N.K.Y., Montgomery-Downs, H.E. & Lemola, S.
Long-term effects of pregnancy and childbirth on sleep satisfaction and
duration of first-time and experienced mothers and fathers. Sleep 42 (2019).
97.  Magnus, M.C. et al. Number of Offspring and Cardiovascular Disease Risk in
Men and Women: The Role of Shared Lifestyle Characteristics. Epidemiology
28, 880-888 (2017).
98.  Peters, S.A., Huxley, R.R. & Woodward, M. Women's reproductive health
factors and body adiposity: findings from the UK Biobank. Int J Obes (Lond)
40, 803-8 (2016).
99.  Zhang, Z. & Hayward, M.D. Childlessness and the psychological well-being of
older persons. J Gerontol B Psychol Sci Soc Sci 56, S311-20 (2001).
100.  Ning, K., Zhao, L., Matloff, W. & Toga, A.W. Association of relative brain age
with tobacco smoking, alcohol consumption, and genetic variants. Scientific
Reports (in press).
101.  UKBiobank. Genotyping and quality control of UK Biobank, a large-scale,
extensively phenotyped prospective resource. (2015).
102.  He, K., Zhang, X., Ren, S. & Sun, J. Deep Residual Learning for Image
Recognition. arXiv:1512.03385 (2015).
103.  Purcell, S. et al. PLINK: a tool set for whole-genome association and
population-based linkage analyses. Am J Hum Genet 81, 559-75 (2007).
104.  Pountney, D.L., Raftery, M.J., Chegini, F., Blumbergs, P.C. & Gai, W.P. NSF, Unc-
18-1, dynamin-1 and HSP90 are inclusion body components in neuronal
intranuclear inclusion disease identified by anti-SUMO-1-immunocapture.
Acta Neuropathol 116, 603-14 (2008).
105.  Goedert, M. NEURODEGENERATION. Alzheimer's and Parkinson's diseases:
The prion concept in relation to assembled Abeta, tau, and alpha-synuclein.
Science 349, 1255555 (2015).
106.  Ebbert, M.T. et al. Population-based analysis of Alzheimer's disease risk
alleles implicates genetic interactions. Biol Psychiatry 75, 732-7 (2014).
107.  Escott-Price, V. et al. Common polygenic variation enhances risk prediction
for Alzheimer's disease. Brain 138, 3673-84 (2015).
108.  Kong, D. et al. Predicting Alzheimer's Disease Using Combined Imaging-
Whole Genome SNP Data. J Alzheimers Dis 46, 695-702 (2015).
109.  Zhang, Z., Huang, H., Shen, D. & Alzheimer's Disease Neuroimaging, I.
Integrative analysis of multi-dimensional imaging genomics data for
Alzheimer's disease prediction. Front Aging Neurosci 6, 260 (2014).
110.  Montembeault, M., Rouleau, I., Provost, J.S., Brambati, S.M. & Alzheimer's
Disease Neuroimaging, I. Altered Gray Matter Structural Covariance
Networks in Early Stages of Alzheimer's Disease. Cereb Cortex 26, 2650-62
(2016).
111.  Delbeuck, X., Van der Linden, M. & Collette, F. Alzheimer's disease as a
disconnection syndrome? Neuropsychol Rev 13, 79-92 (2003).
112.  Sankari, Z. & Adeli, H. Probabilistic neural networks for diagnosis of
Alzheimer's disease using conventional and wavelet coherence. J Neurosci
Methods 197, 165-70 (2011).
113.  Ribeiro, M.T., Singh, S. & Guestrin, C. Why should i trust you?: Explaining the
predictions of any classifier. Proceedings of the 22nd ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, 1135-
1144 (2016).
114.  Zeiler, M.D. & Fergus, R. Visualizing and understanding convolutional
networks. European conference on computer vision, 818-833 (2014).
115.  Sundararajan, M., Taly, A. & Yan, Q. Axiomatic attribution for deep networks.
arXiv:1703.01365 (2017).
116.  Tsang, M., Cheng, D. & Liu, Y. Detecting Statistical Interactions from Neural
Network Weights. arXiv:1705.04977 (2017).
117.  Saunders, A.M. et al. Association of apolipoprotein E allele epsilon 4 with late-
onset familial and sporadic Alzheimer's disease. Neurology 43, 1467-72
(1993).
118.  Miriam Hartig, D.T.-S., Sky Raptentsetsang, Alix Simonson, Adam Mezher,
Norbert Schuff, Michael Weiner. UCSF FreeSurfer Methods. (2014).
119.  Howie, B., Marchini, J. & Stephens, M. Genotype imputation with thousands of
genomes. G3 (Bethesda) 1, 457-70 (2011).
120.  Howie, B.N., Donnelly, P. & Marchini, J. A flexible and accurate genotype
imputation method for the next generation of genome-wide association
studies. PLoS Genet 5, e1000529 (2009).
121.  Ripley, B.D. Pattern recognition and neural networks. Cambridge university
press (1996).
122.  Tibshirani, R. Regression shrinkage and selection via the lasso. Journal of the
Royal Statistical Society. Series B 58, 267-288 (1996).
123.  Vedaldi, A. & Lenc, K. MatConvNet: Convolutional neural networks for
MATLAB. Proceeding of the ACM Int. Conf. on Multimedia (2015).
124.  Gevrey, M., Dimopoulos, I. & Lek, S. Review and comparison of methods to
study the contribution of variables in artificial neural network models.
Ecological modelling 160, 249-264 (2003).
125.  Jack, C.R., Jr. et al. Age-specific population frequencies of cerebral beta-
amyloidosis and neurodegeneration among people with normal cognitive
function aged 50-89 years: a cross-sectional study. Lancet Neurol 13, 997-
1005 (2014).
126.  Scheltens, P. et al. Atrophy of medial temporal lobes on MRI in "probable"
Alzheimer's disease and normal ageing: diagnostic value and
neuropsychological correlates. J Neurol Neurosurg Psychiatry 55, 967-72
(1992).
127.  Risacher, S.L. et al. Baseline MRI predictors of conversion from MCI to
probable AD in the ADNI cohort. Curr Alzheimer Res 6, 347-61 (2009).
128.  Sommer, T., Rose, M., Weiller, C. & Buchel, C. Contributions of occipital,
parietal and parahippocampal cortex to encoding of object-location
associations. Neuropsychologia 43, 732-43 (2005).
129.  Leech, R. & Sharp, D.J. The role of the posterior cingulate cortex in cognition
and disease. Brain 137, 12-32 (2014).
130.  Huang, K.L. et al. A common haplotype lowers PU.1 expression in myeloid
cells and delays onset of Alzheimer's disease. Nat Neurosci 20, 1052-1061
(2017).
131.  Kreisl, W.C. et al. In vivo radioligand binding to translocator protein
correlates with severity of Alzheimer's disease. Brain 136, 2228-38 (2013).
132.  Geman, S., Bienenstock, E. & Doursat, R. Neural networks and the
bias/variance dilemma. Neural computation (1992).
133.  Olden, J.D., Joy, M.K. & Death, R.G. An accurate comparison of methods for
quantifying variable importance in artificial neural networks using simulated
data. Ecological Modelling 178, 389-397 (2004).
134.  Schroth, G., Naegele, T., Klose, U., Mann, K. & Petersen, D. Reversible brain
shrinkage in abstinent alcoholics, measured by MRI. Neuroradiology 30, 385-
9 (1988).
135.  Wittbrodt, M.T., Sawka, M.N., Mizelle, J.C., Wheaton, L.A. & Millard-Stafford,
M.L. Exercise-heat stress with and without water replacement alters brain
structures and impairs visuomotor performance. Physiol Rep 6, e13805
(2018).
136.  Schulz, M. et al. Deep learning for brains?: Different linear and nonlinear
scaling in UK Biobank brain images vs. machine-learning datasets. bioRxiv
(2019).
137.  Larson, E.B. et al. Exercise is associated with reduced risk for incident
dementia among persons 65 years of age and older. Ann Intern Med 144, 73-
81 (2006).
138.  Kramer, A.F., Erickson, K.I. & Colcombe, S.J. Exercise, cognition, and the aging
brain. J Appl Physiol (1985) 101, 1237-42 (2006).
139.  Schnack, H.G. et al. Accelerated Brain Aging in Schizophrenia: A Longitudinal
Pattern Recognition Study. Am J Psychiatry 173, 607-16 (2016).
140.  Preische, O. et al. Serum neurofilament dynamics predicts neurodegeneration
and clinical progression in presymptomatic Alzheimer's disease. Nat Med 25,
277-283 (2019).
141.  Sabri, O. et al. Florbetaben PET imaging to detect amyloid beta plaques in
Alzheimer's disease: phase 3 study. Alzheimers Dement 11, 964-74 (2015).
142.  Schindler, S.E. et al. High-precision plasma beta-amyloid 42/40 predicts
current and future brain amyloidosis. Neurology 93, e1647-e1659 (2019).
143.  Shin, H.C. et al. Deep Convolutional Neural Networks for Computer-Aided
Detection: CNN Architectures, Dataset Characteristics and Transfer Learning.
IEEE Trans Med Imaging 35, 1285-98 (2016). 
Abstract (if available)
Abstract As our population continues to age, the number of people who experience cognitive decline and face increased risk of neurodegenerative diseases also grows. To preserve cognitive function and prevent aging related diseases, it is imperative that we first identify and understand the lifestyle, environmental or genetic factors that are associated with brain aging. In this thesis, we explore brain magnetic resonance imaging data, clinical data, and genetic data of both cognitively healthy subjects and patients with Alzheimer's disease, with the goal of using multimodal data to understand brain aging. Through studying cognitively healthy subjects, we quantified the association between brain aging and multiple lifestyle factors and genetic factors. Through studying subjects with Alzheimer's disease, we trained a statistical model that captured important brain and genetic features associated with the disease, which can accurately predict the disease risk of mild cognitive impaired subjects. Our results help to set a few potential directions for decelerating brain aging, such as providing guidelines for a brain-friendly lifestyle and offering dementia prediction at an early stage. 
Linked assets
University of Southern California Dissertations and Theses
doctype icon
University of Southern California Dissertations and Theses 
Action button
Conceptually similar
Alzheimer’s disease: dysregulated genes, ethno-racial disparities, and environmental pollution
PDF
Alzheimer’s disease: dysregulated genes, ethno-racial disparities, and environmental pollution 
The association of cerebrovascular disease risk factors with brain structure and its modification by genetic variation
PDF
The association of cerebrovascular disease risk factors with brain structure and its modification by genetic variation 
Neuroimaging markers of risk & resilience to brain aging and dementia
PDF
Neuroimaging markers of risk & resilience to brain aging and dementia 
Using neuroinformatics to identify genomic and proteomic markers of suboptimal aging and Alzheimer's disease
PDF
Using neuroinformatics to identify genomic and proteomic markers of suboptimal aging and Alzheimer's disease 
Data modeling approaches for continuous neuroimaging genetics
PDF
Data modeling approaches for continuous neuroimaging genetics 
Neuroimaging in complex polygenic disorders
PDF
Neuroimaging in complex polygenic disorders 
Feature engineering and supervised learning on metagenomic sequence data
PDF
Feature engineering and supervised learning on metagenomic sequence data 
Investigating the evolution of gene networks through simulated populations
PDF
Investigating the evolution of gene networks through simulated populations 
Air pollution neurotoxicity throughout the lifespan: studies on the mechanism of toxicity and interactions with effects of sex and genetic background
PDF
Air pollution neurotoxicity throughout the lifespan: studies on the mechanism of toxicity and interactions with effects of sex and genetic background 
Exploring the genetic basis of complex traits
PDF
Exploring the genetic basis of complex traits 
Vascular contributions to brain aging along the Alzheimer's disease continuum
PDF
Vascular contributions to brain aging along the Alzheimer's disease continuum 
Learning to diagnose from electronic health records data
PDF
Learning to diagnose from electronic health records data 
Investigating brain aging and neurodegenerative diseases through omics data
PDF
Investigating brain aging and neurodegenerative diseases through omics data 
Neuroinflammation and ApoE4 genotype in at-risk female aging: implications for Alzheimer's disease
PDF
Neuroinflammation and ApoE4 genotype in at-risk female aging: implications for Alzheimer's disease 
Novel multi-site brain imaging approaches to map HIV-related neuropathology
PDF
Novel multi-site brain imaging approaches to map HIV-related neuropathology 
Using insoluble proteomics to study changes in proteostasis during aging and neurodegenerative disease
PDF
Using insoluble proteomics to study changes in proteostasis during aging and neurodegenerative disease 
Model selection methods for genome wide association studies and statistical analysis of RNA seq data
PDF
Model selection methods for genome wide association studies and statistical analysis of RNA seq data 
Computational algorithms for studying human genetic variations -- structural variations and variable number tandem repeats
PDF
Computational algorithms for studying human genetic variations -- structural variations and variable number tandem repeats 
Independent and interactive effects of depression genetic risk and household socioeconomic status on emotional behavior and brain development
PDF
Independent and interactive effects of depression genetic risk and household socioeconomic status on emotional behavior and brain development 
Characterization of lenticulostriate arteries using high-resolution black blood MRI as an early imaging biomarker for vascular cognitive impairment and dementia
Asset Metadata
Creator: Ning, Kaida (author)
Core Title: Characterizing brain aging with neuroimaging, health, and genetic data
School: College of Letters, Arts and Sciences
Degree: Doctor of Philosophy
Degree Program: Computational Biology and Bioinformatics
Publication Date: 05/04/2020
Defense Date: 01/09/2020
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: brain imaging, brain MRI, convolutional neural networks, deep learning, environment, genetics, human brain aging, lifestyle, OAI-PMH Harvest, survey data
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Toga, Arthur W. (committee chair), Kim, Hosung (committee member), Sun, Fengzhu (committee member)
Creator Email: kaidanin@usc.edu, ning.kaida@gmail.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c89-297236
Unique identifier: UC11663336
Identifier: etd-NingKaida-8423.pdf (filename), usctheses-c89-297236 (legacy record id)
Legacy Identifier: etd-NingKaida-8423.pdf
Dmrecord: 297236
Document Type: Dissertation
Rights: Ning, Kaida
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA