Biological Geometry-aware Deep Learning Frameworks for
Enhancing Medical Cyber-Physical Systems
by
Chenzhong Yin
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)
May 2025
Copyright 2025 Chenzhong Yin
To my parents and grandparents.
Acknowledgements
First and foremost, I would like to express my deepest gratitude to my Ph.D. advisor and mentor,
Professor Paul Bogdan. Since our very first meeting before I joined his research group in May
2018, he has been unwaveringly supportive. One of the qualities I most admire in Paul is his ability
to think outside the box and tackle problems using non-conventional approaches—a mindset he has
consistently encouraged me to adopt throughout my doctoral studies. Paul’s constant motivation
has propelled me to explore new and interdisciplinary problems, which significantly shaped the
nature of my thesis work. He has always pushed me to go a step further and has been the backbone
of my academic journey. His remarkable approachability eliminated any communication barriers;
I could always reach out to him, whether in person or via email, and he would often respond
and provide suggestions even in the middle of the night. His dedication is further exemplified by
the numerous novel ideas he shared with me, greatly enriching my research experience. I feel
incredibly fortunate to have been advised by such an enthusiastic and hard-working individual.
One invaluable lesson I’ve learned from him is to persist and keep my spirits high, especially in
the face of constant failures. During difficult times, Paul provided emotional support like a friend
and family member. For example, when I was struggling with experimental results, he took the
time to share his own experiences with setbacks, offering both practical solutions and heartfelt
encouragement that helped me regain my confidence.
I would like to express my heartfelt gratitude to Professor Andrei Irimia for his invaluable
mentorship during my Ph.D. journey. Prof. Irimia helped me develop and refine my research
ideas, offering valuable feedback and guidance on my research directions. With his keen sense
of identifying meaningful and important research topics, he skillfully organized my ideas and
experimental results into coherent projects. I will always remember the times when I proposed
unconventional ideas in his lab; he would thoughtfully assess them, grounding my thoughts into
specific research topics and providing wise direction on how to proceed. His support has been
instrumental in shaping my research and academic growth.
I would also like to express my sincere gratitude to my committee members, Professor Jyotirmoy Deshmukh and Professor Antonio Ortega, for their valuable feedback on my thesis work. I
am grateful to Professor Viktor Prasanna, a member of my Qualifying Examination committee,
for his insightful suggestions that led to significant improvements in my dissertation. Their encouragement and support have been a great source of motivation, reinforcing my determination to
complete my Ph.D. thesis.
I extend my heartfelt gratitude to Professor Shahin Nazarian for giving me the opportunity
to assist in teaching EE 453: Computing Platforms and Paradigms. Throughout this experience, I
was profoundly impressed by his exceptional programming skills and his unwavering dedication to
teaching. His commitment to student learning has been a great source of inspiration, significantly
influencing my own approach to education and mentorship.
I am deeply grateful to the research group at USC for providing me with the opportunity to
meet some of the most wonderful and hardworking people. I would like to thank Gaurav Gupta,
who mentored me extensively in both subject knowledge and paper organization skills during my
early years in the group. My thanks also go to Valeriu Balaban, the first mentor who patiently
taught me Python coding skills. I am thankful to Ruochen Yang for being my labmate and lunch
companion; most of the time, we were the only two present in the office. I would like to thank Yao
Xiao, with whom I submitted my first research paper. I also thank Mohamed Ridha Znaidi, Emily
A. Reed, Panagiotis Kyriakis, and Jayson Sia, with whom I had wonderful discussions during our
lengthy group meetings and beyond. I am grateful to Anzhe Cheng, Zhenkun Wang, Xinghe Chen,
and Heng Ping for collaborating with me and staying up late on research papers. I would especially
like to thank Xiongye Xiao. As we both joined the lab at the same time, he is not only a coworker
but also a very close friend; we have always supported each other both mentally and academically.
I am deeply grateful to my co-authors, whose collaboration and invaluable insights have significantly contributed to the success of my research and publications. I especially want to thank
Phoebe Imms, Nahian F. Chowdhury, Nikhil N. Chaudhari, Roy J. Massett, Haoqing Wang, Anar
Amgalan, Mikhail E. Kandel, Young Jae Lee, Defu Cao, Genshuo Liu, Zhihong Pan, Professor Radu Balan, Professor Paul M. Thompson, Professor Gabriel Popescu, Professor Alexander
Niculescu, Professor Mihai Udrescu, Professor Lucretia Udrescu, Professor Stefan Mihaicuta, and
Professor David M. Mannino for their dedication, guidance, and encouragement throughout this
journey. Your expertise and commitment have been instrumental in shaping the impact of our joint
efforts, and I feel privileged to have worked alongside such exceptional colleagues.
I would like to extend my heartfelt gratitude to Mingxi Cheng, who was the first to open the
door to the field of artificial intelligence for me. Her unwavering support and insightful guidance have been instrumental throughout my PhD journey. Mingxi patiently mentored me in both
coding and the foundational knowledge of AI, providing me with the skills and confidence necessary to navigate complex research challenges. Her expertise and dedication not only deepened
my understanding of AI-based research but also inspired me to pursue this as the main focus of
my dissertation. Beyond her technical mentorship, Mingxi fostered a stimulating and collaborative research environment, encouraging me to think critically and creatively. Her encouragement
and belief in my potential have been a source of motivation and perseverance, for which I am
profoundly grateful.
I would like to express my sincere gratitude to my friends Haofan Sun, Yuhao Wang, Luyang
Liu, Li Ding, and Li Wang. Their presence has added vibrancy and joy to my Ph.D. journey, and
although we are not always in the same location, our enduring friendships have provided me with
unwavering support and encouragement. I am deeply thankful to my friend Bingyi Zhang, who
has been an exceptional roommate and confidant. Bingyi has not only shared countless memorable
moments with me but has also assisted me academically by teaching me various course materials.
His companionship and intellectual support have been invaluable, and I could not have hoped for
a better friend and partner throughout this journey.
Finally, I would like to express my deepest gratitude to my parents, Xiuhua Xu and Huimin
Yin. Their unwavering support has been the bedrock of my five-year Ph.D. journey, allowing me to
dedicate myself fully to my studies without the burden of financial concerns. Beyond their material
assistance, they have been my steadfast anchors during times of adversity. Their
selfless love and unshakeable belief in my abilities have been invaluable, inspiring me to overcome
numerous challenges and strive for excellence. My parents have always celebrated my successes
and offered comfort during moments of disappointment, instilling in me a sense of stability and
peace that has been crucial to my well-being. I am profoundly grateful for their constant presence
and enduring faith in me. Their sacrifices and unwavering support have made this achievement
possible, and I dedicate this milestone to them with all my love and heartfelt appreciation.
Table of Contents
Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxiv
Chapter 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Challenges and Motivations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Thesis Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Chapter 2: Interpretable deep learning based brain age prediction architecture . . . . . . . . 7
2.1 Introduction to brain age prediction and deep learning . . . . . . . . . . . . . . . . 7
2.2 Neuroanatomic patterns of aging . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.3 Associations with neurocognitive endophenotypes . . . . . . . . . . . . . . . . . . 13
2.4 3D-CNN benchmarking & evaluation . . . . . . . . . . . . . . . . . . . . . . . . 14
2.5 Interpretable deep learning methods . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.5.1 Participants and neuroimaging . . . . . . . . . . . . . . . . . . . . . 16
2.5.2 Neurocognitive measures and MRI preprocessing . . . . . . . . . . . . . . 17
2.5.3 3D-CNN architecture and 3D-CNN training . . . . . . . . . . . . . . . . . 19
2.5.4 Saliency maps analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.5.5 Data and code availability statement . . . . . . . . . . . . . . . . . . . . . 22
2.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.6.1 Significance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.6.2 Sex differences in anatomic brain aging . . . . . . . . . . . . . . . . . . . 24
2.6.3 Anatomy changes according to neurocognitive status . . . . . . . . . . . . 24
2.6.4 Comparison to other methods . . . . . . . . . . . . . . . . . . . . . . . . 26
2.6.5 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Chapter 3: Raising the limit of image rescaling using auxiliary encoding . . . . . . . . . . 30
3.1 Introduction to super-resolution . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.2 Proposed methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2.1 IRN-alpha architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2.2 IRN-meta architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.3 Experimental results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.3.1 Ablation study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.3.2 Image rescaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Chapter 4: Fractional dynamics foster deep learning of COPD stage prediction . . . . . . . 39
4.1 Introduction to COPD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.2 Recorded physiological signals . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.3 Multifractal detrended fluctuation analysis . . . . . . . . . . . . . . . . . . . . . . 43
4.4 Fractal properties of physiological signals . . . . . . . . . . . . . . . . . . . . . . 44
4.5 Fractional dynamics modeling of COPD relevant
physiological signals subject to unknown perturbations . . . . . . . . . . . . . . . 49
4.6 Fractional Dynamics Deep Learning Prediction of COPD stages . . . . . . . . . . 52
4.6.1 K-fold cross-validation results . . . . . . . . . . . . . . . . . . . . . . . . 53
4.6.2 Hold-out validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.6.3 Transfer learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Chapter 5: Deciphering network science in brain-derived neuronal cultures . . . . . . . . . 66
5.1 Introduction to NCN evolution and properties . . . . . . . . . . . . . . . . . . . . 67
5.2 Neuronal interconnections exhibit assortative behavior . . . . . . . . . . . . . . . 68
5.3 Optimizing flow & robustness in neuronal networks . . . . . . . . . . . . . . . 73
5.4 Clustering analysis in NCN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.5 Multifractal network analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Chapter 6: Enhancing neural network performance with leader-follower architecture and
local error signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
6.1 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
6.1.1 Leader-Follower Neural Networks with BP (LFNNs) . . . . . . . . . . . . 89
6.1.2 BP-free Leader-Follower Neural Networks (LFNN-ℓs) . . . . . . . . . . . 93
6.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
6.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
Chapter 7: Conclusion and future directions . . . . . . . . . . . . . . . . . . . . . . . . . . 106
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
Appendices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
A Interpretable deep learning based brain age prediction
architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
A.1 Cognitive measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
A.1.1 CamCAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
A.1.2 ADNI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
A.2 Supplementary plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
A.3 Correlations with neurocognitive function . . . . . . . . . . . . . . . . . . 146
B Fractional dynamics foster deep learning of COPD stage
prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
B.1 Data collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
B.1.1 WestRo COPD dataset . . . . . . . . . . . . . . . . . . . . . . . 152
B.1.2 WestRo Porti COPD dataset . . . . . . . . . . . . . . . . . . . . 154
B.2 Multifractal detrended fluctuation analysis . . . . . . . . . . . . . . . . . . 156
B.3 Neural network architecture for the WestRo COPD dataset . . . . . . . . . 157
B.4 Challenges and limitations of spirometry in COPD . . . . . . . . . . . . . 160
B.5 Definition of COPD stages . . . . . . . . . . . . . . . . . . . . . . . . . . 162
B.6 Early COPD stages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
C Deciphering network science in brain-derived neuronal
cultures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
C.1 Sample preparation & Microscopy . . . . . . . . . . . . . . . . . . . . . . 164
C.2 Network centrality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
C.3 Clustering coefficient . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
C.4 Multifractal analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
List of Tables
2.1 Participant demographics. Sample size, descriptive statistics (minimum, maximum, mean µ, and standard deviation σ), the male-to-female (M:F) sex ratio, and
breakdown by FreeSurfer version used for preprocessing. Demographics are listed
for each repository and neurological/cognitive status. . . . . . . . . . . . . . . . . 17
3.1 Comparison of 4× upscaling results using different IRN-A hyperparameters and
settings. The best results are highlighted in red. . . . . . . . . . . . . . . . . . . . 35
3.2 Comparison of 4× upscaling results using different IRN-M hyperparameters and
settings. The best results are highlighted in red. . . . . . . . . . . . . . . . . . . . 35
3.3 Quantitative results of upscaled ×4 images of 5 datasets across different bidirectional rescaling approaches. The best two results are highlighted in red and blue, respectively. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.1 The COPD stage predicting results for test set with our Fractional Dynamics Deep
Learning Model (FDDLM). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.1 The assortativity coefficient for neuronal culture networks and neuronal culture
cluster networks in consecutive snapshots. . . . . . . . . . . . . . . . . . . . . . 73
5.2 The parameters of multifractal spectrum . . . . . . . . . . . . . . . . . . . . . . . 80
6.1 Comparison between the proposed model and a set of BP-enabled and BP-free
algorithms under MNIST, CIFAR-10. The best test errors (%) are highlighted in
bold. Leadership size is set to 70% for all the LFNNs and LFNN-ℓs. . . . . . . . . 93
6.2 Error rate (% ↓) results of LFNNs and LFNN-ℓs (with different leadership percentages) on Tiny ImageNet and ImageNet subset. We also trained CNN counterparts
(without LF hierarchy) with BP and global loss for reference. The test error rates
of BP-enabled CNNs under Tiny ImageNet and ImageNet subset are 35.76% and
51.62%, respectively. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
6.3 Error rates (% ↓) on CIFAR-10 (a) and Tiny-ImageNet (b) with all baseline models
with 4 blocks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
6.4 Error rates (% ↓), speedup ratios (↑), and the number of parameters (↓) compared
among different methods on ResNet families and ViT, each with 4 blocks, when
applied to ImageNet. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
6.5 Training time (↓) in hours, compared across different methods for ResNet families
and ViT, each with 4 blocks, when applied to ImageNet. . . . . . . . . . . . . . . 98
7.1 Correlations between neurocognitive measures and BA or CA for CN participants
from CamCAN. Spearman’s rank correlation coefficients rS, p-values for the null
hypotheses H0: rS = 0, and power values 1−β for each correlation are provided.
Degrees of freedom are N − 2 for all. Fisher’s z- and p-values are listed for the
comparison of correlations between cognitive/neural measures with CA and BA;
the null hypothesis was H0: rS(CA) = rS(BA). p-values in bold are significant after
FDR correction. Abbreviations: N = sample size; SE = standard error; CIL = lower
limit of confidence interval; CIU = upper limit of the confidence interval; RT =
response time; ToT = tip-of-the-tongue; VSTM = visual short term memory. . . . 147
7.2 Same as Table 7.1, for CN participants in ADNI. Abbreviations: CDRSB = clinical
dementia rating sum of boxes; ADAS = Alzheimer’s disease assessment scale;
MMSE = mini-mental state exam; RAVLT = Rey auditory verbal learning test;
RAVLT P = RAVLT percent forgetting; RAVLT IR = RAVLT immediate recall;
FAQ = functional activities questionnaire. . . . . . . . . . . . . . . . . . . . . . . 148
7.3 Same as Table 7.2, for patients with MCI. . . . . . . . . . . . . . . . . . . . . . . 149
7.4 Same as Table 7.2, for patients with AD. . . . . . . . . . . . . . . . . . . . . . . 150
7.5 Same as Table 7.2, for patients with CI (MCI and AD combined). . . . . . . . . . 151
7.6 Essential information for all COPD patients in our dataset, including medical center; COPD stage; COPD onset; age; gender; smoking status; body mass
index (BMI); standard questionnaires (CAT—COPD assessment test, mMRC—
modified Medical Research Council dyspnea scale), exacerbation history, and comorbidities (cardiometabolic (CC), cancer (CA), metabolic (MC), psychiatric (PC),
and renal (RD)) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
7.7 Complexity and prediction performance for the WestRo COPD dataset across different deep learning models under k-fold validation (k = 5): fractional dynamics
deep learning model (FDDLM), Vanilla deep neural network (DNN), long short-term memory (LSTM), and convolutional neural network (CNN). ↑ / ↓ indicates
higher/lower values are better. All results are evaluated on the same machine for
fair comparison. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
List of Figures
2.1 Overview of BA estimation by an interpretable 3D-CNN. (A) Proportions of participants in the aggregate dataset (ADNI, UKBB, CamCAN, and HCP), where each
human symbol represents ∼300 participants. (B) T1-weighted MRIs were skull-stripped and 3D saliency probability maps were generated from 3D-CNN output
for each subject. (C) Prior to BA estimation using the 3D-CNN, participants were
split by sex and assigned randomly into training and test sets. MAE was used to
evaluate 3D-CNN performance from BA estimation results for test sets. The test
set’s CA histogram is displayed in an inset. (D) The 3D-CNN’s input consists of
T1-weighted MRIs, and its output are BA estimates. Saliency maps are extracted
from 3D-CNN output after training. A dropout rate of 0.3 is used in all dropout
layers, and a ReLU activation function is used in all convolutional and dense layers.
x_i is the feature map for input i and w_i is its weight. (E) Sample sizes for
participants with neurocognitive measures. . . . . . . . . . . . . . . . . . . . . . 9
2.2 Comparison of brain saliency maps across sexes and diagnoses. (A) Sex-specific
mean saliency maps (PM, PF) and the sex dimorphism map ∆P = (PM −PF)/[(PF +
PM)/2] of CN participants. In all cases, canonical cortical views (sagittal, axial,
and coronal) are displayed in radiological convention. Higher saliencies (brighter
regions) indicate neuroanatomic locations whose voxels contribute more to BA
estimation. Regions drawn in red have higher saliencies in males (PM > PF);
the reverse (PF > PM) is true for regions drawn in blue. (B) Canonical views of
the sex dimorphism map ∆P for CN participants. Sex-specific deviations of ∆P
from its mean across sexes are expressed as percentages of the mean. Red indicates that ∆PM > ∆PF, i.e., males have higher saliency; blue indicates the reverse
(∆PF > ∆PM), i.e., females have higher saliency. (C) Like (A), for the comparison
between CN participants and participants with CI, where ∆P = (PCI −PCN)/PCN;
red indicates PCI > PCN, blue indicates PCN > PCI. (D) Like (B), for the saliency
difference ∆P between CN and CI participants. Images are displayed in radiological orientation convention (the right hand side of the reader is the left hand side of
the participant, and vice versa). . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3 Correlations between neurocognitive measures and both estimated BA and CA.
Results are depicted for two independent test sets: CamCAN and ADNI. (A) displays CN participants from CamCAN, (B) displays CN participants from ADNI,
(C) displays results only for participants with MCI, and (D) displays results for
participants with either MCI or AD. For each independent test set, the sample size
for each neurocognitive measure is listed below the measure name. Bar charts depict
Spearman’s correlations rS (along x) between BA (green) or CA (red) and each
neurocognitive measure (along y). Bars are contoured in black if rS is significant.
Error bar widths equate to one standard error of the mean. For each neurocognitive
measure, the corresponding bar pair is annotated with Fisher’s z statistic. Asterisks
indicate neurocognitive measures for which the difference in Spearman’s correlations rS(BA)−rS(CA) is significant. . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.4 Radar plots of sex-specific MAEs and performance parameters. Radar plots of
MAE, R², and performance parameters (average ET and the number of trainable
parameters) according to sex and diagnostic status (CN: UKBB, CamCAN; MCI
or AD: ADNI). The SFCN of Gong et al. [36, 37] (purple) is compared to our 3D-CNN (blue). To facilitate simultaneous comparison, all values are normalized to
range from 0 to 1, where the maximum value in each measurement was rescaled
as 1 and 0 remained as 0. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.1 Illustration of invertible image rescaling network architecture: (a) RGBA approach
and (b) metadata approach. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2 Visual examples from the Urban100 test set (best viewed in the online version with zoom-in). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.1 Overview of the proposed method for COPD stage prediction: (a) based on the
medical observations from the latest research in the field, we identify the physiological signals with relevance in COPD and measure them, (b) we record such
physiological signals with a medical sensor network—NOX T3™ portable sleep
monitor [127]. An example of a physiological signal (Abdomen) recorded from a
stage 4 COPD patient is shown in panel (c). Panels (d) and (e) summarize
the multifractal analysis in terms of the scaling function and the generalized Hurst exponent; panels (c-e) all refer to this representative stage 4 COPD case. (f) We employ the analysis of the
fractional-order dynamics to extract the signatures of the signals as coupling matrices and fractional-order exponents and use these signatures (along with expert
diagnosis) to train a deep neural network that can identify COPD stages. . . . . . . 42
4.2 The geometry of fluctuation profiles for the COPD-relevant physiological signals
recorded from a normal abdomen and a stage 4 COPD abdomen. We calculate the
scaling functions from 6 raw physiological signals: Abdomen, Thorax, Oxygen
Saturation, Plethysmograph, Nasal Pressure, and Pulse, where the exponents are
q ∈ [-5, 5]. Panels (a-c) and (g-i) show signals recorded from a healthy person; panels (d-f) and (j-l) show signals recorded from a representative stage 4 COPD patient. The
resulting points (pink nodes) of the multifractal scaling function with power-law
scaling will converge to a focus point (dark purple nodes) at the largest scale (L) if
the tested signal has multifractal features. . . . . . . . . . . . . . . . . . . . . . . 48
4.3 Multifractal analysis of 6 physiological signals from healthy people (stage 0)
and stage 4 COPD patients with 95% confidence interval: Generalized Hurst
exponent H(q) as a function of q-th order moments (where q values are discretely
extracted from −5 to 5) for physiological signals (Abdomen (a), Thorax (b), Oxygen Saturation (c), Plethysmograph (d), Nasal Pressure (e), and Pulse (f)) recorded from healthy people (stage 0) and severe COPD patients (stage 4). . . . . . 49
4.4 Comparison of Wasserstein distance between the distributions of mean H(q) curves
across different COPD stages (i.e., the H(q) curves in Figure 4.3, Figure S2, and Figure S3), shown for the Abdomen (a), Thorax (b), Pulse (c), Nasal Pressure (d), Oxygen
Saturation (e), and Plethysmograph (f) signals recorded from patients across all the
COPD stages. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.5 Network formulation for FDDLM: (a) Basic structure of an artificial neuron model;
(b) Overview of the neural network model we trained in FDDLM to identify COPD
stages. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.6 Training and testing result comparisons (accuracy (a,d,g), AUROC (b,e,h), and
loss (c,f,i)) of different deep learning models for the k-fold cross-validation. Training/testing accuracy (a), AUROC (b), and loss (c) for our FDDLM, where the training processes use signal signatures extracted with the fractional dynamic mathematical model. Training/testing accuracy, AUROC, and loss for the Vanilla DNN
model (d-f) and the LSTM model (g-i), where the training processes use the physiological signals recorded with portable sleep monitors (raw data). Both Vanilla
DNN and LSTM models share similar network structures and computation parameters with our FDDLM (Vanilla DNN has the same network structure as our model,
except the input size). We obtain these results with the k-fold cross-validation
(k = 5). We also show the confusion matrices for the test set across different models:
FDDLM in panel (j), Vanilla DNN in (k), and LSTM in (l). . . . . . . . . . . . . . 55
4.7 Training and testing result comparisons of different deep learning models for the
hold-out validation. The training/testing accuracy (a), AUROC (b), and loss (c)
for our FDDLM, where the training processes use signal signatures extracted with
the fractional dynamic mathematical model. The training/test accuracy (d) and (g),
AUROC (e) and (h), and loss (f) and (i), for Vanilla DNN model and LSTM model,
where the training processes use the physiological signals recorded with portable
sleep monitors. Both Vanilla DNN and LSTM models share similar network structures with FDDLM (i.e., same neural network structure but different the input size).
We obtain these results by holding out data from every single institution as the test
set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.8 The comparison of confusion matrices resulting from different deep learning models: fractional dynamics (a-d), Vanilla DNN (e-h), and LSTM (i-l). We built the test
sets by holding out data gathered from one institution (i.e., VB, MD1, MD2, and
CP) at a time. The matrix representations clearly show that our model outperforms
both Vanilla DNN and LSTM—in all experiments and for all labels representing
COPD stages—in terms of prediction errors. . . . . . . . . . . . . . . . . . . . . . 59
5.1 Layouts for neuronal culture networks at three representative time points. Neurons
at the start of the experiment at time t = 0 hours (a), t = 7 hours (b) and the end of
the experiment t = 14 hours (c). Magnified views of the neurons at t = 0
hours (d), t = 7 hours (e) and t = 14 hours (f). Panels (g), (h) and (i) show the
identified neurons and their connections obtained with our algorithm (see Methods section on "Cell segmentation and neural tracing") for the three corresponding
time points (each neuron and neurite is identified by a unique color). After constructing the adjacency matrices from the tracing and segmentation algorithm, the
visualization of the network layouts at t = 0 hours (j), 7 hours (k) and 14 hours (l)
are presented. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.2 Degree distribution for neurons and neural clusters. An artificial network example
(a), where the yellow node has the highest degree centrality, the red node has the
highest closeness centrality and the green node has the highest betweenness centrality (with solid lines and dotted lines). The node-to-node degree distribution for
the network example (a) without (b) and with (c) additional dotted lines in order to
mimic the connectivity phenomena observed in neural cluster networks, where the
color bars present the occurrence frequency of the node-to-node matrices and the
red circles with connection pairs mark the coordinates of the peak values in the
matrix. The node-to-node degree distribution for the neuronal culture networks at
the start of the experiment t = 0 hours (d), after 7 hours (e), and the end of the experiment after 14 hours (f). The node-to-node degree distribution for the neuronal
culture cluster networks at the start of the experiment t = 0 hours (g), after 7 hours
(h), and at the end of the experiment after 14 hours (i). . . . . . . . . . . . . . . . 71
5.3 Changes in degree, closeness, and betweenness centrality in consecutive neuronal culture networks and neuronal culture cluster networks. The
CDF curves of the degree centrality (a), closeness centrality (b) and betweenness
centrality (c) for neuronal culture networks for three times t = 0, 7, and 14 hours.
The CDF curves of the degree centrality (d), closeness centrality (e) and betweenness centrality (f) for neuronal culture cluster networks for three times t = 0, 7, and
14 hours. The average degree centrality (g), average closeness centrality (h) and
average betweenness centrality (i) for neuronal culture networks for 15 time points
within the 14 hours experiment. . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
5.4 Comparison of clustering indices between the neuronal culture networks and model-based randomly constructed networks of the same size. (a) The comparison (errors) in terms of the transitivity between the neuronal culture networks and the RR,
ER, WS, BA, SSF, and WMG based generated networks (for each model we generated 1000 network realizations) for the 14 hours experiment. (b) The comparison
(errors) in terms of average clustering coefficient between the neuronal culture networks and the RR, ER, WS, BA, SSF, and WMG based constructed networks (for
each model we generated 1000 networks) within 14 hours. (c) The comparison (errors) in terms of average square clustering coefficient between the neuronal culture
networks and the RR, ER, WS, BA, SSF, and WMG based networks during the 14
hours experiment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
5.5 Variations of interconnections between two neighboring neurons. The exceedance
probability for the length of connections between two neurons (i.e., the probability
of observing the length of a connection between two neurons exceeding a certain
threshold) at the start of the experiment t = 0 hours (a), after 7 hours (b), at the end
of the experiments after 14 hours (c), and the comparison of (a-c) is shown in (d). . 77
5.6 Multifractal analysis of neuronal culture networks and neuronal culture cluster networks. (a) Multifractal spectrum f(α) as a function of Lipschitz-Hölder exponent
α for neuronal culture networks in t = 0, 7, and 14 hours. (b) Generalized fractal
dimension D(q) as a function of q-th order moment for neuronal culture networks
in t = 0, 7, and 14 hours. (c) Multifractal spectrum f(α) as a function of Lipschitz-Hölder exponent α for neuronal culture cluster networks in t = 0, 7, and 14 hours.
(d) Generalized fractal dimension D(q) as a function of q-th order moment for
neuronal culture cluster networks in t = 0, 7, and 14 hours. . . . . . . . . . . . . . 78
6.1 From bird flock to LFNN. a-b. A flock of birds where leaders are informed and
lead the flock. c. An abstracted network from the flock. d. A LFNN architecture.
Weight updates of LFNN. e. BP in classic deep neural network (DNN) training.
Global prediction loss is back-propagated through layers. f. An LF hierarchy in
a DNN. Within a layer, neurons are grouped as (leader and follower) workers. g.
Weight update of follower workers. h. Weight update of leader workers with BP.
i. BP-free weight update of leader workers. Training visualization. j. Worker
activity visualization in an LFNN. At each time step, the followers (black lines)
align themselves with leaders (red lines). k. Patterned collective motion produced
by the classic Vicsek model [213]. . . . . . . . . . . . . . . . . . . . . . . . . . . 87
6.2 a. Network performance results when varying leadership size from 10% to 100%.
b. Ablation study results from four different loss functions. Loss variation demonstration and leadership during training. c. Global prediction loss and both local
losses. d. Without local follower loss. e. Without local leader loss. f. Global
prediction loss alone. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
6.3 Leadership in workers during training. The color and size of dots represent the
times a worker is selected as leader. A worker can be selected as leader up to 300
times in each epoch. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
6.4 Classification error of a. ResNet-101 and b. ResNet-152 on CIFAR-10 and Tiny-ImageNet. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
6.5 BA estimation errors for test set participants. Scatter plots show participants’ BAs
vs. CAs for males (a) and females (b) using the BP-based 3D-CNN model. Plots
(c) and (d) display the corresponding results for LFNN-ℓ. The solid line in each panel
represents zero error. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
7.1 Sagittal planes of the average salience maps for CN males. . . . . . . . . . . . . . 134
7.2 Same as Fig. 7.1, but for axial planes. . . . . . . . . . . . . . . . . . . . . . . . . 135
7.3 Same as Fig. 7.1, but for coronal planes. . . . . . . . . . . . . . . . . . . . . . . . 136
7.4 Sagittal planes of the average salience maps for CN females. . . . . . . . . . . . . 137
7.5 Same as Fig. 7.4, but for axial planes in females. . . . . . . . . . . . . . . . . . . 138
7.6 Same as Fig. 7.4, but for coronal planes in females. . . . . . . . . . . . . . . . . . 139
7.7 Sagittal planes of the average salience maps for CN participants. . . . . . . . . . . 140
7.8 Same as Fig. 7.7, but for axial planes. . . . . . . . . . . . . . . . . . . . . . . . . 141
7.9 Same as Fig. 7.7, but for coronal planes. . . . . . . . . . . . . . . . . . . . . . . . 142
7.10 Sagittal planes of the average salience maps for CI participants. . . . . . . . . . . 143
7.11 Same as Fig. 7.10, but for axial planes. . . . . . . . . . . . . . . . . . . . . . . . . 144
7.12 Same as Fig. 7.10, but for coronal planes. . . . . . . . . . . . . . . . . . . . . . . 145
7.13 Age estimation errors for test set participants. Scatter plots depict participants’
BAs as a function of their CAs for males (A, blue) and females (B, red). The
marginal distributions of CA and BA are depicted as histograms on the top and
right sides of each panel in (A) and (B), respectively. . . . . . . . . . . . . . . . . 152
7.14 Performance metrics according to sex, cohort, and neurocognitive status. (A) - (H)
Scatter plots of BA vs. CA for each sex in each test set cohort: (A, E) UKBB CN;
(B, F) CamCAN CN; (C, G) ADNI MCI, (D, H) ADNI AD. For CN participants
(UKBB, CamCAN), depicted data reflect benchmarking results. In participants
with MCI or AD (ADNI), they are provided for illustration and reference only,
since benchmarking did not involve participants with CI. (I)-(P) Scatter plots of
AG vs. CA for each sex across test cohorts: (I, M) UKBB CN; (J, N) CamCAN
CN; (K, O) ADNI MCI; (L, P) ADNI AD. (Q)-(R) AG distributions for females
and males, respectively. In (Q), for CamCAN, the SFCN’s violin plot range is (-76,
34) yr. Gray and black asterisks indicate significant differences in AG means and
variances, respectively, between the 3D-CNN and SFCN. . . . . . . . . . . . . . . 153
7.15 The radar plot for measuring the complexity and prediction performance for the
WestRo COPD dataset across different deep learning models under k-fold validation (k = 5): fractional-dynamics deep learning model (FDDLM), Vanilla deep
neural network (DNN), long short-term memory (LSTM), and convolutional neural network (CNN). We normalized all the values represented in this plot. . . . . . 160
Abstract
In the realm of Medical Cyber-Physical Systems, where devices and information systems interact,
medical cyber–physical data is generated digitally, stored electronically, and accessed remotely by
medical professionals and patients. With the rise of medical big data, the collection and sharing of
medical cyber–physical data offer significant value for diagnoses, pathological analyses, epidemic
tracking, pharmaceuticals, insurance, and more. Prior research has focused on the interaction between cyber and physical spaces, constructing CPS-based architectures, and utilizing big data,
deep learning, and cloud computing to develop medical diagnosis systems. However, challenges
persist in these medical cyber-physical diagnosis systems, including (1) the lack of interpretability
in deep learning models; (2) fluctuations in data quality, notably impacting the performance of
medical data analysis; (3) inadequate model designs that misalign datasets' geometric properties with model architectures; and (4) classical backpropagation, which limits compatibility with parallel
computing during training of deep learning architectures.
To address these limitations, this thesis proposes novel mathematical-based deep learning and
super-resolution models to accurately analyze medical datasets and improve image data quality.
The overarching goal is to leverage Cyber-Physical Systems approaches for modeling, analysis,
sensing, and optimization in healthcare and medical diagnosis, while overcoming the shortcomings
of previous work.
To improve model performance and enhance architectural interpretability, we introduce
an interpretable 3D-CNN model for predicting participants’ brain age (BA) using saliency maps.
These maps shed light on hidden feature attribution and regional brain characteristics that reflect
BA, offering insights into cognitive aging patterns. The model’s generalizability to new cohorts is
also demonstrated. This study’s translational potential is highlighted by the connections between
estimated BAs and neurocognitive measures of cognitive impairment (CI).
In the pursuit of better medical image quality, we present two novel invertible neural networks
(INNs), IRN-A and IRN-M, which address image super-resolution (SR) by enhancing image quality. Normalizing flow models built on INNs have been
widely investigated for generative image SR; they learn the transformation between the normal distribution of a latent variable z and the conditional distribution of
high-resolution (HR) images given a low-resolution (LR) input. While the random sampling of
latent variable z is useful in generating diverse photo-realistic images, it is not desirable for image
rescaling, where accurate restoration of the HR image is more important. Hence, in place of random
sampling of z, we propose auxiliary encoding modules to further push the limit of image rescaling
performance. We propose two options for storing the encoded latent variables in the downscaled LR images, both
readily supported by existing image file formats: one saves them as the alpha channel,
the other as metadata in the image header; the corresponding modules are denoted by the
suffixes -A and -M, respectively. These models outperform existing methods in terms of PSNR and
SSIM, pushing the boundaries of image rescaling performance.
In addition to model interpretation and image enhancement, leveraging the geometric characteristics of complex data, such as lengthy time series, can significantly enhance model performance
and reduce computational complexity. Our approach involves fractional-order dynamical modeling, which extracts distinctive signatures (coupling matrices) from physiological signals across
patients with chronic obstructive pulmonary disease (COPD). These fractional signatures are then
utilized to construct and train a deep neural network capable of predicting COPD stages for suspected patients. Input features, including thorax breathing effort, respiratory rate, and oxygen
saturation levels, inform the predictions. By employing this methodology, we ensure equitable access to reliable medical consultation for patients, particularly in regions with lower socioeconomic
conditions.
Finally, to accelerate the training process of classic neural network (NN) architectures, we
introduce a worker concept by incorporating local loss functions into the NN design. This NN
structure contains workers that encompass one or more information processing units (e.g., neurons, filters, layers, or blocks of layers). Workers are either leaders or followers, and we train
a leader-follower neural network (LFNN) by leveraging local error signals. The LFNN does not
require backpropagation (BP) or a global loss function to achieve optimal performance (we denote
LFNN trained without BP as LFNN-ℓ). By investigating worker behavior and evaluating the LFNN
and LFNN-ℓ architectures on a variety of image classification tasks (e.g., MNIST, CIFAR-10, ImageNet), we demonstrate that LFNN-ℓ trained with local error signals achieves lower error rates and
superior scalability than state-of-the-art machine learning approaches. Furthermore, LFNN-ℓ can
be conveniently embedded in classic convolutional NN architectures (e.g., VGG, ResNet, and Vision Transformer (ViT)), achieving a 2x speedup compared to BP-based methods and significantly
outperforming models trained with end-to-end BP and other state-of-the-art BP-free methods in
terms of accuracy on CIFAR-10, Tiny-ImageNet, and ImageNet.
Chapter 1
Introduction
1.1 Challenges and Motivations
Cyber-Physical Systems (CPSs) seamlessly integrate computation, networking, and physical processes across various domains, from automotive to aerospace, industries, and healthcare. In this
era of technological advancement, CPS finds applications in diverse fields, including the medical domain, where it bridges the gap between the internet and users through multidisciplinary
approaches.
Medical Cyber-Physical Systems (MCPSs), a subset of CPS, play a pivotal role in modern
medical care and diagnosis. Traditionally, MCPS architectures prioritize
patient monitoring and data feedback, employing big data and cloud computing to establish medical diagnosis frameworks. In the following discourse, we address the significant challenges associated with deriving intelligence from medical data and enhancing diagnosis performance within
the context of CPS.
Challenge 1: Lack of interpretability in deep learning-based cloud computing models.
Cloud computing models are responsible for analyzing the data (e.g., images or signals) collected
by medical sensors. The most widely used cloud computing models are based on deep learning—a
learning approach that efficiently combines feature extraction and classification. This approach
proves to be a valuable tool for medical diagnosis, as it can logically explain a patient’s symptoms
[1]. The primary focus of deep learning is on representing input data and generalizing learned
patterns for use on unseen data [2]. When applying deep learning in "high-level" domains such as
healthcare, the criminal justice system, and medical decision-making, it becomes crucial for users
to appropriately weigh the assistance provided by the system and understand the reasoning behind
the models’ output [3]. However, the interpretability of deep learning models has consistently
posed a challenge in scenarios that require explanations of the features involved in modeling [4].
For instance, multi-layer neural networks, which have achieved impressive accuracy comparable
to human performance in diverse prediction and classification tasks [5], operate as opaque entities,
offering limited insight into the rationale behind the selection of specific features over others during
training. Additionally, they do not provide clarity on how correlations in the training data influence
feature choices, or why certain network architectures are preferred over alternative options.
Challenge 2: Low-quality medical image data can heavily influence the models' analysis
and prediction performance. In the domain of medical image processing, the spatial resolution of
images can be influenced by a range of factors, particularly in relation to the specific modality. For
example, in MRI images, the spatial resolution might experience degradation due to factors like
image scan time, patient movements, considerations of patient comfort, and hardware configurations [6]. Among deep learning based cloud computation, the goodness of the data representation
has a large impact on the performance of machine learners on the data: a poor data representation
is likely to reduce the performance of even an advanced, complex machine learner, while a good
data representation can lead to high performance for a relatively simpler machine learner [7].
Challenge 3: Inadequately designed deep learning models that neglect the compatibility between datasets’ geometric properties and model architecture. In the domain of deep learning,
the quality of data significantly influences model performance. As a result, feature engineering,
which involves crafting features and data representations from raw data, assumes a pivotal role in
deep learning tasks. However, when dealing with the analysis of extremely long time series data,
the application of feature selection-based deep learning methods remains limited [5]. Typically,
the long short-term memory (LSTM) architecture stands as one of the most widely employed deep
learning approaches for analyzing biological signals and conducting predictions or classifications.
However, LSTM falls short in fully representing the long memory effect in the input, and it cannot generate extended memory sequences from unknown noise inputs [8, 9]. This limitation arises
from the fact that, to prevent the issue of gradient vanishing, the LSTM model incorporates a forget
gate to discard data before a specific timepoint t, thereby maintaining an appropriate gradient function length. Consequently, when addressing very long time series with extensive memory ranges,
LSTM struggles to predict or classify them with high accuracy. The transformer represents a state-of-the-art model capable of extracting long-term correlations within data, and it has emerged as a
promising substitute for LSTM. However, based on previous research [10], the transformer only
surpasses classic deep learning models when the number of training samples exceeds 30 million.
These findings emphasize the necessity to develop novel deep learning architectures that effectively
handle data geometry and long-term memory extraction from extensive time-series data.
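To make the forget-gate argument concrete, recall the standard LSTM cell-state update (textbook notation, not specific to any model in this thesis). Tracking only the linear cell-state path, the sensitivity of the state to a state k steps earlier is a product of forget gates:

\[
f_t = \sigma\!\left(W_f [h_{t-1}, x_t] + b_f\right), \qquad
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \qquad
\frac{\partial c_t}{\partial c_{t-k}} \approx \prod_{j=t-k+1}^{t} \operatorname{diag}(f_j).
\]

Since every entry of f_j lies in (0, 1), this product decays roughly geometrically in k: the very gate that prevents gradient vanishing also erases dependencies older than its effective window, which is exactly the power-law (long-range) memory that fractional-order models are designed to retain.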
Challenge 4: Classical backpropagation limits compatibility with parallel computing during
training of the deep learning architecture. Most deep neural networks today are trained using
backpropagation (BP) [11, 12, 13]. However, BP is considered "biologically implausible" because
the brain does not form symmetric backward connections. Additionally, BP is incompatible with
high levels of model parallelism and restricts potential hardware designs. These limitations highlight the need for a fundamentally different learning algorithm for deep networks (for instance,
pretraining ViT-L/16 on JFT-300M requires 0.23k TPUv3-core-days, ViT-H/14 requires 2.5k days,
and BiT (ResNet152x4) demands 9.9k days [10]).
1.2 Thesis Contributions
To address the challenges mentioned earlier and bridge existing research gaps, our focus is on
developing novel deep learning architectures that can enhance hidden interpretability, improve data
quality, and analyze data geometry to enhance model performance within the context of MCPS.
Our primary contributions are outlined as follows.
To introduce hidden interpretability into deep learning architectures and elucidate the brain regions influencing aging, we propose a novel interpretable deep learning model for predicting brain
age (BA) in Chapter 2. Deep learning techniques can estimate BA by learning to predict chronological ages (CAs) of healthy subjects from brain MRIs, while minimizing the mean absolute error
(MAE) between BA and CA [14]. Despite the generally superior BA estimation performance of
deep learning compared to other methods [15], the inherent black-box nature of deep learning
hinders the interpretability of feature attribution [16]. Understanding how regional brain features
contribute to BA estimation using deep learning methods remains a challenge. Additionally, many
DL BA estimators lack accuracy and fail to generalize to previously unseen cohorts. To address
these limitations, we introduce a novel and interpretable three-dimensional (3D) convolutional
neural network (CNN) designed for BA estimation using T1-weighted brain MRIs. MRI feature
attribution is achieved through saliency maps, enabling the identification of structural brain patterns associated with cognitively normal (CN) aging. These patterns reflect regional and sex-specific
variations in neuroanatomic features linked to BA. We also demonstrate the generalizability of the
3D-CNN model to new cohorts. The potential for real-world application is evident in the connections established between estimated BAs and neurocognitive measures of cognitive impairment
(CI).
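As a rough illustration of how such saliency maps are obtained, the sketch below computes a plain gradient-based saliency volume for a scalar-output age regressor in PyTorch. The function name, input shape, and normalization are illustrative assumptions, not the exact 3D-CNN or saliency pipeline of Chapter 2.

import torch

def saliency_map(model, mri, device="cpu"):
    # Gradient-based saliency for a scalar-output (age) regressor.
    # mri: tensor of shape (1, 1, D, H, W), one skull-stripped T1 volume.
    # Returns |d(predicted age)/d(voxel)|, a per-voxel proxy for how much
    # each location contributes to the BA estimate.
    model.eval()
    x = mri.to(device).requires_grad_(True)
    predicted_age = model(x).sum()   # scalar BA estimate for one volume
    predicted_age.backward()         # populates x.grad
    sal = x.grad.abs().squeeze()     # (D, H, W) saliency volume
    return sal / (sal.max() + 1e-8)  # normalize to [0, 1] for display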
In Chapter 3, our focus is on enhancing the quality of image data. We introduce two novel
super-resolution approaches based on Invertible Neural Networks (INN) to upscale low-resolution
inputs to high-resolution images. For optimizing the rescaling model, the leading approach is the
Invertible Rescaling Net (IRN), as proposed by Xiao et al. [17], which has demonstrated significantly improved performance. In the case of downscaling, IRN is trained to convert high-resolution
(HR) inputs to visually pleasing low-resolution (LR) outputs using a latent variable z. As z follows
an input-agnostic Gaussian distribution during training, it enables accurate reconstruction of HR
images during the inverse up-scaling process, even when z is randomly sampled from a normal
distribution. However, the model’s performance can be further enhanced by efficiently preserving
the high-frequency information contained in z. In order to fully leverage the potential of IRN and
maximize model performance, we propose two approaches: IRN-Meta (IRN-M) and IRN-Alpha
(IRN-A). These methods efficiently compress the high-frequency information stored in z, which
can then be used to recover z and subsequently restore the HR image during the inverse up-scaling
process. These two architectures have demonstrated superior performance compared to the latest
INN-based SR models available.
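To illustrate the exact invertibility that IRN-A and IRN-M build on, the following is a minimal affine coupling layer in PyTorch, the basic building block of INN-based flows. This is a simplified sketch, not the IRN architecture itself; the channel split, hidden width, and scale parameterization are illustrative choices.

import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    # One invertible affine coupling layer: half of the channels
    # parameterize an affine transform of the other half, so the
    # inverse is available in closed form (no information is lost).
    def __init__(self, channels, hidden=64):
        super().__init__()
        half = channels // 2  # assumes an even channel count
        self.net = nn.Sequential(
            nn.Conv2d(half, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, 2 * half, 3, padding=1),
        )

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=1)
        log_s, t = self.net(x1).chunk(2, dim=1)
        s = torch.sigmoid(log_s + 2.0)  # keeps scales in (0, 1), stable
        return torch.cat([x1, x2 * s + t], dim=1)

    def inverse(self, y):
        y1, y2 = y.chunk(2, dim=1)
        log_s, t = self.net(y1).chunk(2, dim=1)
        s = torch.sigmoid(log_s + 2.0)
        return torch.cat([y1, (y2 - t) / s], dim=1)

Because each layer inverts exactly, a stack of them can map an HR image to an (LR image, latent z) pair; storing z alongside the LR image, as IRN-A does in the alpha channel and IRN-M in metadata, then allows near-lossless HR reconstruction instead of sampling z from a Gaussian.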
Addressing the poor performance of LSTM and transformers in analyzing extensive time series
data, Chapter 4 introduces a novel fractional dynamics-based model. This model is designed to effectively analyze data geometry and capture long-term memory from COPD physiological signal
datasets. By extracting fractional features, represented by a coupling matrix A, from these extensive time-series data, we obtain information that deep learning models can readily classify. Even
a linear classifier achieves high accuracy using these fractional features. This research offers an
alternative diagnostic approach that overcomes the limitations of conventional spirometry-based
methods, which often rely on a medical doctor’s past experience. Furthermore, this approach minimizes human intervention, enabling nurses and doctors to place sensors on the patient’s body and
record physiological signals using the NOX device, which stores the data locally.
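For intuition about the long-memory signatures involved, the snippet below estimates a monofractal detrended fluctuation analysis (DFA) exponent, a Hurst-like measure, with NumPy. It is a bare-bones sketch; it does not reproduce the multifractal DFA or the coupled fractional-order model (with coupling matrix A) actually used in Chapter 4.

import numpy as np

def dfa_exponent(signal, scales=(16, 32, 64, 128, 256)):
    # Detrended fluctuation analysis at q = 2 (monofractal case).
    # The slope of log F(n) vs. log n is the scaling exponent:
    # values above 0.5 indicate persistent long-range memory.
    y = np.cumsum(signal - np.mean(signal))  # integrated profile
    flucts = []
    for n in scales:
        f2 = []
        for i in range(len(y) // n):
            seg = y[i * n:(i + 1) * n]
            t = np.arange(n)
            trend = np.polyval(np.polyfit(t, seg, 1), t)  # local detrend
            f2.append(np.mean((seg - trend) ** 2))
        flucts.append(np.sqrt(np.mean(f2)))
    slope, _ = np.polyfit(np.log(scales), np.log(flucts), 1)
    return slope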
Chapter 5 introduces a computer vision model dedicated to segmenting neurons and neurites
from quantitative phase imaging data. This model supports neuroscientists in constructing neuronal
culture networks (NCNs) for the analysis of neural communication and the emergence of learning,
cognition, and creative behavior. Through complex network characterization, we identify a self-optimization phenomenon within brain-derived neuronal culture networks and neuronal culture
cluster networks of rats and mice. This phenomenon involves enhanced information transmission, reduced latency, and maximized robustness over time, achieved through connection growth
or neuronal cluster merging. These findings complement prior research that has highlighted self-organized criticality, small-world states, and the connection between higher clustering and spontaneous bursting in specific parts of neuronal culture networks [18, 19].
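The network measures referenced here (assortativity, transitivity, local and square clustering) are standard and can be computed with off-the-shelf tools. The toy example below uses NetworkX on a random graph as a stand-in for a single traced NCN snapshot; it is purely illustrative and unrelated to the actual segmentation pipeline.

import networkx as nx

# Toy stand-in for one traced neuronal culture network: nodes are
# neurons (or clusters), edges are traced neurite connections.
G = nx.erdos_renyi_graph(n=200, p=0.03, seed=1)

assortativity = nx.degree_assortativity_coefficient(G)  # hubs linking to hubs?
transitivity = nx.transitivity(G)                       # global triangle density
avg_clustering = nx.average_clustering(G)               # mean local clustering
avg_square = sum(nx.square_clustering(G).values()) / G.number_of_nodes()

print(assortativity, transitivity, avg_clustering, avg_square)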
Chapter 6 proposes a leader-follower neural network (LFNN) architecture that mirrors the
complexity observed in biological systems to speed up the training of traditional deep learning architectures. Our LFNN divides the neural network into layers of elementary leader
and follower units, leveraging characteristics of collective motion to designate leadership. Leaders
in LFNNs are informed by and guide the entire system’s evolution. As a biologically plausible
alternative to backpropagation (BP), our approach utilizes distinct error signals for leaders and followers, enabling training through local error signals. We evaluate the LFNN architecture and its
BP-free version trained with local loss (LFNN-ℓ) on image datasets including MNIST, CIFAR-10,
and ImageNet. Our LFNN and LFNN-ℓ outperform other BP-free algorithms and achieve results
comparable to BP-enabled baselines. Notably, our algorithm demonstrates superior performance
on ImageNet compared to all other BP-free baselines. Moreover, LFNN-ℓ can be conveniently
incorporated into VGG, ResNet, and ViT architectures to accelerate training. It significantly outperforms state-of-the-art block-wise learning BP-free methods on CIFAR-10, Tiny-ImageNet, and
ImageNet. We also applied LFNN-ℓ to a 3D-CNN model for analyzing 3D brain MRIs to predict
brain age. Our LFNN-ℓ outperformed the latest 3D-CNN in terms of mean absolute error (MAE)
and achieved a 2× speedup. This study introduces complex collectives to deep learning, providing
new insights into biologically plausible neural network research and opening avenues for future
work.
Chapter 2
Interpretable deep learning based brain age prediction
architecture
2.1 Introduction to brain age prediction and deep learning
Although chronological age (CA) reflects disease risk, the rate of aging varies across individuals,
organs, tissues and clinical conditions [20]. Because CA does not capture this variation well, there
is interest in estimating biological age to predict morbidity [21, 22]. Among typically aging adults,
in the absence of any clinical indications, biological age is expected to equal CA, on average [23].
Neuroanatomic biological age inferred from magnetic resonance imaging (MRI), henceforth referred to as brain age (BA), can quantify disease-related changes in aging and associated increases
in mortality risk [24, 25]. Thus, reliable BA estimators can help to stratify individuals according
to disease risk [26, 27]. The difference between BA and CA, known as age gap (AG), conveys
whether aging is faster or slower than expected [28, 29]. In clinical cohorts, improving BA estimates can translate into better estimates of participants’ deviations from typical aging [30, 31]. For
example, BA has the potential to become an affordable and noninvasive pre-clinical indicator of
mild cognitive impairment (MCI) and Alzheimer’s disease (AD) [32] due to the strong association
between BA and dementia risk [33, 34].
Deep learning (DL) methods can estimate BA by learning to estimate cognitively normal (CN)
subjects’ CAs from MRIs of their brain, while minimizing the mean absolute error (MAE) between BA and CA [14]. Compared to other approaches, DL typically yields better BA estimates [15]. However, its inherent black-box nature hinders the interpretability of its feature attribution [16], since the relative utility of regional brain features for BA estimation by DL methods
is unknown. Furthermore, many DL estimators of BA are inaccurate and lack generalizability
to cohorts not encountered during DL training. To address these shortcomings, we introduce a
novel and interpretable three-dimensional (3D) convolutional neural network (CNN) to estimate
BA from T1-weighted brain MRIs. To provide neuroanatomic interpretability, MRI feature attribution is achieved through saliency maps. These allow one to identify structural brain patterns of CN aging that reflect regional and sex-specific variations in neuroanatomic features linked to BA.
3D-CNN generalizability to new cohorts is also illustrated. The translational potential of this study
is reflected in the associations between estimated BAs and neurocognitive measures of CI.
2.2 Neuroanatomic patterns of aging
We use a novel, interpretable 3D-CNN framework to estimate the BAs of 650 CN adults (age
range: 18-88 yr; 325 males) from the Cambridge Centre for Aging and Neuroscience (CamCAN,
Fig. 2.1 A and C). BAs were also estimated in 359 participants with AD dementia (age range: 55-
92 yr; 198 males) and in 351 participants with MCI due to AD (age range 55-89; 230 males) from
the Alzheimer’s Disease Neuroimaging Initiative (ADNI, Fig. 2.1 A). Among participants with
MCI, 54% were diagnosed with dementia within 11 years from the acquisition of MRIs analyzed
in this study. We generated 3D-CNN saliency maps of each participant’s brain to determine how
the 3D-CNN weighs each MRI voxel (Fig. 2.1 B and D). Saliency maps can help to identify brain
locations whose MRI features are weighted more heavily during age estimation (Fig. 2.2, Figs.
S1 to S12). Using this strategy, we mapped CI-related aging patterns and studied their variation
Figure 2.1: Overview of BA estimation by an interpretable 3D-CNN. (A) Proportions of participants in
the aggregate dataset (ADNI, UKBB, CamCAN, and HCP), where each human symbol represents ∼300
participants. (B) T1-weighted MRIs were skull-stripped and 3D saliency probability maps were generated
from 3D-CNN output for each subject. (C) Prior to BA estimation using the 3D-CNN, participants were split
by sex and assigned randomly into training and test sets. MAE was used to evaluate 3D-CNN performance
from BA estimation results for test sets. The test set's CA histogram is displayed in an inset. (D) The 3D-CNN's input consists of T1-weighted MRIs, and its outputs are BA estimates. Saliency maps are extracted from 3D-CNN output after training. A dropout rate of 0.3 is used in all dropout layers, and a ReLU activation function is used in all convolutional and dense layers. x_i is the feature map for input i and w_i is its weight. (E) Sample sizes for participants with neurocognitive measures.
across sexes, brain regions, and subjects, as well as their association with neurocognitive outcome
(Fig. 2.1 E).
Our results in CN participants (Fig. 2.2 A and B) reveal typical neuroanatomic patterns of
aging, including ventricular enlargement, atrophy of frontal, temporal, and hippocampal cortices,
and cortical thinning. Cortical features are weighted differently across sexes (Fig. 2.2 A and B),
which suggests that males’ BA estimation is particularly reliant upon Sylvian fissure widening,
ventricular enlargement, and cingulate cortex atrophy. Males’ BA estimation is also weighted more
heavily by features of the lateral temporal lobe and dorsolateral frontal lobe in the right hemisphere,
a notable lateralization effect. By contrast, females’ saliencies are higher in posterior and medial
occipital regions (except the left calcarine sulcus), in the inferior and medial aspects of the parietal
lobes, in the supramarginal gyrus and adjacent parietal structures, in the right supramarginal gyrus
and callosal sulcus, in the pars triangularis of the right inferior frontal gyrus, and in posterior
insular regions (Fig. 2.2 A and B). In females, on average, white matter is weighted more heavily
than gray matter when estimating BA.
Fig. 2.2 C and D compares subject-wise average saliency maps according to cognitive status
(CN versus CI). This comparison reveals brain features upon which the 3D-CNN relies more when
estimating age according to cognitive status. For this reason, such features may reflect how CI
modifies regional brain aging. Many structures salient in CN aging are in the cortical gray matter
and include the dorsolateral aspect of the right frontal lobe, the lateral aspect of the right temporal
lobe, the posterolateral aspect of the right occipital lobe, as well as pericallosal regions in both
hemispheres (Fig. 2.2 C and D). Cerebral white matter is more salient in aging with CI than in CN
aging (Fig. 2.2 C), as is the brainstem, medial aspects of the temporal lobes (including parahippocampal and fusiform gyri), and the caudal portions of the anterior cingulate gyri (Fig. 2.2 D).
Appreciable lateralization of saliencies is noted when comparing CN participants to participants
with CI, and the lateralization pattern is similar to that revealed by the sex comparison (Fig. 2.2 B).
Involved are lateral temporal areas, the angular and supramarginal gyri, middle cingulate cortex,
parahippocampal areas, as well as both medial and dorsolateral prefrontal cortices.
Figure 2.2: Comparison of brain saliency maps across sexes and diagnoses. (A) Sex-specific mean saliency
maps (PM, PF) and the sex dimorphism map ∆P = (PM −PF)/[(PF +PM)/2] of CN participants. In all cases,
canonical cortical views (sagittal, axial, and coronal) are displayed in radiological convention. Higher saliencies (brighter regions) indicate neuroanatomic locations whose voxels contribute more to BA estimation.
Regions drawn in red have higher saliencies in males (PM > PF); the reverse (PF > PM) is true for regions
drawn in blue. (B) Canonical views of the sex dimorphism map ∆P for CN participants. Sex-specific deviations of ∆P from its mean across sexes are expressed as percentages of the mean. Red indicates that
∆PM > ∆PF, i.e., males have higher saliency; blue indicates the reverse (∆PF > ∆PM), i.e., females have
higher saliency. (C) Like (A), for the comparison between CN participants and participants with CI, where
∆P = (PCI − PCN)/PCN; red indicates PCI > PCN, blue indicates PCN > PCI. (D) Like (B), for the saliency
difference ∆P between CN and CI participants. Images are displayed in radiological orientation convention
(the right-hand side of the reader is the left-hand side of the participant, and vice versa).
[Figure 2.3 bar charts: (A) CamCAN CN (N = 650), (B) ADNI CN (N = 510), (C) MCI (N = 351), (D) MCI or AD (N = 710); each panel plots Spearman's correlation with age (BA versus CA) for the neurocognitive measures described in the caption below.]
Figure 2.3: Correlations between neurocognitive measures and both estimated BA and CA. Results are depicted for two independent test sets: CamCAN and ADNI. (A) displays CN participants from CamCAN, (B) displays CN participants from ADNI, (C) displays results only for participants with MCI, and (D) displays results for participants with either MCI or AD. For each independent test set, the sample size for each neurocognitive measure is listed below the measure name. Bar charts depict Spearman's correlations rS (along x) between BA (green) or CA (red) and each neurocognitive measure (along y). Bars are contoured in black if rS is significant. Error bar widths equate to one standard error of the mean. For each neurocognitive measure, the corresponding bar pair is annotated with Fisher's z statistic. Asterisks indicate neurocognitive measures for which the difference in Spearman's correlations rS(BA) − rS(CA) is significant.
2.3 Associations with neurocognitive endophenotypes
The ability of estimated BA to capture neurocognitive endophenotypes was contrasted to that of
CA. This was achieved by comparing Spearman’s correlations rS between each age (BA, CA) and
every neurocognitive measure of CN aging (Fig. 2.3 and Tables S1 through S5). For all neurocognitive measures, significant rS values reflect typical aging effects on neurocognitive function
(worse performance is correlated with older age). As expected, among CN participants, BA and
CA reflect cognition to similar extents. For example, among CamCAN CN participants (Fig. 2.3 A
and Table S1), older BA and CA are correlated with worse performance on word finding (picture
priming), motor learning (force matching), motor response time (choice and simple response time
(RT) tasks), face recognition (Benton’s unfamiliar face recognition, famous faces test), Cattell’s
fluid intelligence, emotional memory, and visual short term memory (VSTM) measures. In the
ADNI CN cohort (Fig. 2.3 B and Table S2), no neurocognitive measure examined is significantly
more correlated with BA than with CA.
Across participants with CI, BA is significantly more correlated than CA with neurocognitive
measures. In participants with MCI (Fig. 2.3 C and Table S3), older BA (but not CA) is significantly correlated with worse scores on all measures of neurocognitive function examined, except
1) delayed verbal recall and learning on the Rey auditory verbal learning test (RAVLT), 2) delayed
word recall measured by the AD assessment scale question 4 (ADAS Q4), and 3) logical memory.
For the clinical dementia rating sum of boxes (CDR-SB) and the functional abilities questionnaire
(FAQ), the difference in correlations between BA and CA is significant and BA outperforms CA
in its ability to reflect neurocognitive function. In participants with AD, no significant correlations
exist between BA and any neurocognitive measure apart from FAQ scores (Table S4). Nevertheless, older CA is correlated with poorer delayed verbal memory (RAVLT forgetting). By contrast,
among all participants with any type of CI (whether MCI or AD), BA (but not CA) is significantly
correlated with all measures except delayed verbal recall (RAVLT, ADAS Q4) and logical memory
(Fig. 2.3 D and Table S5). The difference in correlations between BA and CA is significant for the
CDR-SB, mini-mental state exam (MMSE), RAVLT immediate recall (IR), and FAQ. When separating participants with CI by apolipoprotein E4 (APOE4) status, BA is not more correlated with
any neurocognitive measure in carriers compared to non-carriers. The omnibus effect of a logistic
regression accounting for all interactions between AG, CA, and sex is significant (χ²(343) = 29.500, p < 0.001). AGs are significantly and positively associated with MCI participants' probability of conversion to AD (β = 1.417, t(343) = 2.240, p = 0.025). The only significant interaction is between AG and sex (β = −1.121, t(343) = −2.129, p = 0.033), i.e., MCI females with more negative AGs and MCI males with more positive AGs are significantly more likely to convert to AD. When including all interactions, the omnibus effect of the regression that predicts time to conversion is significant if AG is included as a predictor (R² = 0.065, F(8,181) = 2.880, p = 0.007), but not if AG is excluded (R² = 0.012, F(4,185) = 1.790, p = 0.151).
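For illustration, a minimal sketch of this type of conversion analysis using statsmodels is given below; the data frame, file name, and column names (converted, AG, CA, sex) are hypothetical stand-ins for the ADNI variables, not the exact analysis code.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical input: one row per MCI participant, with conversion status (0/1),
# age gap (AG), chronological age (CA), and sex.
df = pd.read_csv("mci_participants.csv")

# 'AG * CA * sex' expands to all main effects and all interactions among the
# three predictors, mirroring the omnibus model described above.
model = smf.logit("converted ~ AG * CA * sex", data=df).fit()
print(model.summary())  # per-term coefficients (beta), test statistics, p-values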
2.4 3D-CNN benchmarking & evaluation
We compare our 3D-CNN to an award-winning [35] state-of-the-art model, the simple fully convolutional network (SFCN) of Gong et al. [36, 37], by replicating its training, validation, and benchmarking. The SFCN was pre-trained on 5,698 UK Biobank (UKBB) subjects, whereas our 3D-CNN was trained on 4,681 participants (2,513 females; age range: 22-95 yr) aggregated across the UKBB, Human Connectome Project-Aging (HCP-A), Human Connectome Project-Young Adult (HCP-YA), and ADNI. In the testing set, our model's MAE between BA and CA is 2.41 years (yr) for males and 2.23 yr for females (Fig. S13 A and B). The coefficient of determination R² is 0.96; the correlation coefficient r is 0.98. Across all external testing sets (UKBB, CamCAN, AD, and MCI), our model has a higher R² than the SFCN (Table S6).
On identical UKBB data (N = 518), the 3D-CNN achieves MAEs of 2.27 yr (males) and 2.31
yr (females) (Fig. S14 A and E), while the SFCN achieves an MAE of 2.14 yr across both sexes
(Fig. S14 Q and R). In the independent CamCAN CN cohort, the SFCN’s MAEs are 9.90 yr
(males) and 9.17 yr (females) (Fig. S14 Q and R). By contrast, our 3D-CNN achieves MAEs of 4.71 yr (males) and 3.01 yr (females) (Fig. S14 B and F). During pre-training, the SFCN yields
an MAE within 2% of the 3D-CNN’s. However, in the independent test cohort of CN participants,
our MAEs are 42% lower than the SFCN’s in the same cohort. The SFCN yields MAEs of 7.72 yr
(males) and 7.50 yr (females) for participants with MCI, and 8.24 yr (males) and 8.65 yr (females)
for participants with AD (Fig. S14 Q and R). By contrast, our 3D-CNN model achieves an MAE
of 5.26 yr (males) and 4.33 yr (females) for participants with MCI (Fig. S14 C and G), and 6.48
yr (males) and 5.98 yr (females) for participants with AD (Fig. S14 D and H). Compared to
the SFCN (Fig. S14 Q and R), the 3D-CNN yields significantly larger mean AGs for (A) females with MCI (t(144) = 6.595, p < 0.001), (B) males with AD (t(195) = 4.710, p < 0.001) and (C) females with AD (t(162) = 6.200, p < 0.001). The 3D-CNN also yields significantly larger AG variances in participants with AD (males: F(197,197) = 1.857, Pitman's t(196) = 4.440, p < 0.001; females: F(162,162) = 2.493, Pitman's t(161) = 6.006, p < 0.001). Compared to the 3D-CNN, the SFCN yields significantly larger AG variances for the following CN groups: (A) UKBB males (F(796,796) = 1.137, Pitman's t(795) = 12.967, p < 0.001); (B) UKBB females (F(796,796) = 1.097, Pitman's t(795) = 9.034, p < 0.001); (C) CamCAN females (F(309,309) = 7.576, Pitman's t(308) = 21.082, p < 0.001). As expected, the
3D-CNN’s mean AG is ∼75% larger in participants with CI than in CN participants (Fig. S14
I to P), possibly reflecting faster brain aging in the former. The BA estimation parameters of
the 3D-CNN and SFCN, evaluated without fine-tuning, are compared in Fig. 7.15 and Table S6.
The 3D-CNN has shorter execution times (ETs) and fewer trainable parameters, reflecting lower
complexity (Table S6 and Fig. 7.15). As Table S6 and Fig. S14 suggest for participants with CI,
our CNN yields higher R² and lower MAEs than the SFCN.
Figure 2.4: Radar plots of sex-specific MAEs and performance parameters. Radar plots of MAE, R², and performance parameters (average ET and the number of trainable parameters) according to sex (males, M; females, F) and diagnostic status (CN: UKBB, CamCAN; MCI or AD: ADNI). The SFCN of Gong et al. [36, 37] (purple) is compared to our 3D-CNN (blue). To facilitate simultaneous comparison, all values are normalized to range from 0 to 1, where the maximum value in each measurement was rescaled to 1 and 0 remained 0.
2.5 Interpretable deep learning methods
2.5.1 Participants and neuroimaging
This study was undertaken in adherence with the US Code of Federal Regulations (45 C.F.R. 46)
and the Declaration of Helsinki. MRIs analyzed in this study were acquired as part of other studies, with approval from the institutional review boards or similar ethical monitoring bodies at the
respective institutions where data had been acquired for ADNI [38] and HCP [39]. UKBB efforts were undertaken with ethical approval from the North West Multi-Centre Research Ethics
Committee of the United Kingdom. Ethical approval for CamCAN was obtained by the Cambridgeshire 2 (now East of England—Cambridge Central) Research Ethics Committee. Informed
written consent was obtained from all participants.
The aggregate dataset consists of 5,851 CN individuals (3,142 females) aged 22-95 yr sampled from ADNI (N = 510), HCP-A (N = 508), HCP-YA (N = 1,112), and UKBB (N = 3,721;
Table 2.1). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial
                           CA (yr)                     M:F      FreeSurfer version
repository   status     N     min  max    µ      σ    ratio    4.3.0  5.3.0  6.0.0  7.1.1
ADNI         CN       510     56   95    75.1   7.2   1:1.17       0      0    260    250
HCP-A        CN       508     36   80    55.8  12.0   1:1.38       0      0    309    199
HCP-YA       CN      1112     22   37    28.8   3.7   1:1.17       0   1112      0      0
UKBB         CN      3721     45   83    62.7  10.1   1:1.13       0      0   3721      0
CamCAN       CN       650     23   88    54.2  18.6   1:1.00       0      0      0    650
ADNI         AD       359     56   95    75.9   8.0   1:0.83     359      0      0      0
ADNI         MCI      351     55   89    75.2   7.3   1:0.53     351      0      0      0
All          all     7211     23   95    58.4  10.3   1:1.09     710   1112   4290   1099

Table 2.1: Participant demographics. Sample size, descriptive statistics of CA in years (minimum, maximum, mean µ, and standard deviation σ), the male-to-female (M:F) sex ratio, and breakdown by FreeSurfer version used for preprocessing. Demographics are listed for each repository and neurological/cognitive status.
MRI, positron emission tomography, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of MCI and early AD. MRI acquisition protocols for HCP-A and HCP-YA are described elsewhere [40]. For UKBB data, we used
preprocessed images generated by a UKBB pipeline whose output included FreeSurfer reconstructions [41].
An independent test set of 650 CN participants aged 18−88 yr was obtained from CamCAN.
Additionally, 408 participants with MCI and 359 participants with AD were obtained from ADNI
(Table 2.1). CamCAN inclusion/exclusion criteria [42] and ADNI eligibility/diagnosis criteria are
described elsewhere [38]. N = 75 participants with MCI were excluded due to MCI diagnosis
being unrelated to AD, leaving 351 participants with MCI (190 converted to AD, 161 did not). Of
524 CI (MCI or AD) participants whose correlations between BA and neurocognitive scores were
computed, 307 participants were APOE4 carriers.
2.5.2 Neurocognitive measures and MRI preprocessing
We used neurocognitive measures available in CamCAN and ADNI to evaluate the utility of our
estimated BAs to capture neurocognitive phenotypes (see SI Methods for detailed task descriptions). Thirteen cognitive measures that assess emotional processing, executive function, memory,
and motor function were obtained from the CamCAN repository [42]. Emotional processing was
measured via 1) Ekman’s emotion expression recognition test, 2) the emotional memory test, and
3) the emotional regulation test. Executive function was measured using 1) Cattell’s fluid intelligence test, 2) the hotel test, and 3) a proverb comprehension task. Memory was measured using 1)
Benton’s face recognition test, 2) the famous faces test, 3) a picture priming task, 4) the tip of the
tongue (ToT) test and 5) a VSTM task. Motor function was assessed via 1) a force matching task,
2) a motor learning task, 3) a reaction time (RT) ‘choice’ task, and 4) a RT ‘simple’ task.
Nine cognitive measures that assess neural function, cognitive performance, and functional
impairment were obtained from the ADNI repository [43]. To eliminate systematic variability in
FreeSurfer software versions, we limited correlation analysis for the CI cohort to subjects from
ADNI1 only (FreeSurfer v4.3). For neural function, four established dementia rating scales were
obtained, including 1) the clinical dementia rating scale – sum of boxes (CDR-SB), 2) the diagnostic ADAS versions 11 and 13, and 3) the MMSE. Cognitive performance was measured via four
neuropsychological measures: 1) the RAVLT, 2) delayed recall on the logical memory test, 3) the
digit symbol substitution test, and 4) the trail-making test. Functional impairment was measured
by the FAQ.
FreeSurfer's recon-all function was used to reconstruct and segment T1-weighted MRIs.
This process includes skull-stripping, motion correction, normalization of non-uniform signal intensities, Talairach space transformation, removal of non-brain tissues, and registration of all subjects’ brains into a common coordinate space [44]. FreeSurfer (FS) was used for three reasons: 1)
UKBB makes FS reconstructions available; 2) the FS workflow is fully automated and thus convenient; 3) our study involves surface analyses and registrations across native and atlas spaces, which
FS facilitates. During FS preprocessing using recon-all, all MRIs were affinely registered to
the MNI305 atlas. Due to sourcing from several MRI repositories, enhancement of segmentation
accuracy differed slightly between cohorts (Table 2.1). UKBB and HCP-YA reconstructions were
enhanced using T2-weighted MRIs, while ADNI, HCP-A, and CamCAN were enhanced using
fluid-attenuated inversion recovery MRIs.
2.5.3 3D-CNN architecture and 3D-CNN training
We constructed a novel DL regression model using a 3D-CNN whose inputs are FS brain.mgz
output files and whose outputs are estimated BAs. The DL architecture was implemented in Python
3.6 using TensorFlow 2.7.0 and executed on a computer with an Intel Core i7 processor (2.2 GHz
clock speed) with 16 GB of random access memory, and a 12 GB NVIDIA Tesla K80 graphical
processing unit. The 3D-CNN consists of three convolutional blocks followed by two dense layers.
The input matrix size is 82 × 86 × 100. Each convolutional (conv) block has a 3D conv layer, a
batch normalization layer, a max-pooling layer, and an optional dropout layer. The filter sizes of
the first three (conv) blocks are 64, 128, and 128, respectively. Conv block filter size determines the
dimensionality of the output space. The rectified linear unit (ReLU) activation function is applied
to all conv and dense layers. The ReLU activation function is defined as g(x) = max(0, x) for input
x. g(x) can efficiently reduce the likelihood of a vanishing gradient and makes the output more
sparse. After the conv blocks, the fourth block consists of one global average pooling layer (used
for global average pooling of 3D data), one dense layer, and one dropout layer (dropout rate = 0.3).
The resulting feature map, of size 18×18×18×128, is pooled to 128×1 and then projected onto
the output dense layer, which has one output neuron to estimate BA using regression.
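A minimal TensorFlow/Keras sketch of this architecture is shown below. The filter counts (64, 128, 128), dropout rate (0.3), ReLU activations, and the 82 × 86 × 100 input follow the description above, while the kernel and pooling sizes are illustrative assumptions rather than the exact configuration.

import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters, dropout=False):
    # 3D conv layer + batch normalization + max pooling (+ optional dropout)
    x = layers.Conv3D(filters, kernel_size=3, padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPool3D(pool_size=2)(x)
    if dropout:
        x = layers.Dropout(0.3)(x)
    return x

inputs = tf.keras.Input(shape=(82, 86, 100, 1))  # skull-stripped T1-weighted MRI
x = conv_block(inputs, 64)
x = conv_block(x, 128, dropout=True)
x = conv_block(x, 128)
x = layers.GlobalAveragePooling3D()(x)           # pools the final feature map to a 128-vector
x = layers.Dense(128, activation="relu")(x)
x = layers.Dropout(0.3)(x)
ba = layers.Dense(1)(x)                          # one output neuron: estimated BA

model = tf.keras.Model(inputs, ba)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss="mse")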
We choose MSE as the loss function and use an Adam optimizer (learning rate = 0.001) [45]. The advantage of outputting BAs as real numbers rather than assigning them to discrete age bins [36, 37] is that, in the former case, BA outputs are assigned within a continuous domain and range. Due to regression to the mean [46], estimated BAs exhibit a previously documented CA-dependent bias. To alleviate this effect, we use the zero correlation constraint method of Treder et al. [47] to regress out the bias from the BAs of testing set participants. This is done separately for each cohort. Bias-corrected BAs are used for all analyses. CN participants are aggregated from UKBB, HCP-A, HCP-YA, and ADNI. Participants were randomly assigned to training and test sets, with the test set comprising 20% of the total sample size (N = 5,851).
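As a sketch of this bias-correction step, the linear detrending below regresses the age gap on CA in the test set and removes the fitted trend; this approximates the zero correlation constraint of Treder et al. [47] rather than reproducing their exact implementation.

import numpy as np

def debias(ba, ca):
    """Remove the CA-dependent bias so that the corrected age gap is
    uncorrelated with CA (applied separately to each cohort's test set)."""
    gap = ba - ca
    slope, intercept = np.polyfit(ca, gap, deg=1)  # fit: gap ~ slope*CA + intercept
    return ba - (slope * ca + intercept)           # bias-corrected BA estimates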
We optimized CNN architecture and fine-tuned hyperparameters. 2D-CNNs use 2D kernels to
estimate sliding windows across single slices, such that leveraging information from adjacent slices
is challenging [48]. We therefore chose a 3D-CNN that overcomes this deficit by using 3D kernels
to estimate sliding windows for volumetric patches. The latter capture inter-slice image context
and improve model performance [48]. We also included dropout and batch normalization layers
because these help to alleviate overfitting [49]. Grid and random searches determined suitable
hyperparameter values (e.g., batch size, kernel size, weight decay). An n-dimensional grid was
defined to map the n hyperparameters and to identify their ranges. We examined all possible 3D-CNN configurations to identify optimal values for each hyperparameter. Since we used MSE as a
loss function, we selected a configuration with the lowest loss value (error).
We tested the 3D-CNN on independent cohorts to refine 3D-CNN architecture, illustrate model
generalizability, alleviate data overfitting, and to compare the 3D-CNN to other approaches. The
testing set was designed to include a random selection of participants from the same cohorts as the
training set. To avoid overfitting the 3D-CNN to the training set, we monitored its performance on
the testing set. To avoid overfitting on both training and testing sets, we tested our model on two
independent cohorts (CamCAN and ADNI) that had not been used for 3D-CNN design. The latter
of these cohorts includes participants with a range of cognitive statuses (CN, MCI, or AD).
After computing AGs for identical samples using both our 3D-CNN and the SFCN, we performed Welch’s t-tests for paired samples with unequal variances to compare the mean AGs obtained using the two methods. AG variances were compared using Pitman’s variance ratio test for
correlated samples, whereby F = σ1²/σ2², Pitman's t(N−2) = [(F − 1)√(N − 2)] / [2√(F(1 − r²))], and r is the correlation of AGSFCN with AGCNN. The AG variances are σ1² and σ2², whose subscripts {1, 2} denote the SFCN or CNN, as needed, to satisfy the inequality σ1 > σ2.
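A minimal sketch transcribing this test is given below; the two input arrays hold the AGs computed for the same participants by each model.

import numpy as np
from scipy import stats

def pitman_test(ag_a, ag_b):
    """Pitman's variance ratio test for correlated samples."""
    n = len(ag_a)
    v1, v2 = np.var(ag_a, ddof=1), np.var(ag_b, ddof=1)
    if v1 < v2:
        v1, v2 = v2, v1                      # label so that sigma_1 > sigma_2
    f = v1 / v2                              # F = sigma_1^2 / sigma_2^2
    r = np.corrcoef(ag_a, ag_b)[0, 1]        # correlation of the paired AGs
    t = (f - 1) * np.sqrt(n - 2) / (2 * np.sqrt(f * (1 - r**2)))
    p = 2 * stats.t.sf(abs(t), df=n - 2)     # two-sided p-value
    return f, t, p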
2.5.4 Saliency map analysis
A saliency map is a topographically organized depiction of the visual saliency in an MRI volume
V0. Here we extend a saliency approach for 2D-CNNs [50] to the 3D case. For an MRI brain
volume V0 and a 3D-CNN model with score function S(V), we rank voxels in V0 based on their
importance to S(V). We consider the linear score model S(V) = wᵀV + b, where the volume V, weight w, and bias b are in one-dimensional (vectorized) forms. Since the 3D-CNN and score function are highly nonlinear functions of V, the linear score model cannot be applied directly. We approximate S(V) in the neighborhood of V0 using the first-order Taylor expansion S(V) ≃ w0ᵀV + b0, where w0 = ∂S/∂V |V0 is the partial derivative of S(V) at V0 and b0 = b |V0 is the bias b at V0. The spatial and temporal distributions of saliencies contain unique patterns conveying information about BA.
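In practice, w0 is obtained by backpropagating the score to the input volume. A minimal TensorFlow sketch, assuming the trained regression model from Section 2.5.3, is:

import tensorflow as tf

def saliency_map(model, volume):
    """volume: tensor of shape (1, 82, 86, 100, 1), one skull-stripped MRI."""
    v0 = tf.convert_to_tensor(volume, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(v0)
        score = model(v0)             # S(V): the estimated brain age
    w0 = tape.gradient(score, v0)     # first-order Taylor weights dS/dV at V0
    return tf.abs(w0)[0, ..., 0]      # voxelwise saliency magnitudes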
Two distinct workflows were used for volume- and surface-level transforms, respectively, to
remove the confounding effects of subject differences in brain shape and size. For volume-level
analysis, each saliency map was nonlinearly registered to the FS fsaverage atlas. To this end,
T1-weighted brain volumes were first registered to the atlas in MATLAB using the imregister
function, which applied the transformation from native space to the atlas, as provided by FS.
MATLAB’s imregdemons function was used to deform nonlinearly and to map T1-weighted scans
onto the atlas. The transformations above were applied to each subject’s saliency map, resulting
in its registration to the atlas. For surface-level analysis, saliencies were projected to the native
cortical surface. To achieve this, each subject’s saliency was projected onto the cortical mantle as
a cortical overlay using a customized algorithm for volume-to-surface mapping [51, 52]. Briefly,
voxels assigned to the gray matter ribbon by FS were considered. At each vertex of the native mesh
for the mid-thickness surface, ribbon voxels were selected within a cylinder that lay orthogonally
with respect to the local surface. The cylinder was centered on the vertex; its height and radius
were equal to the local cortical thickness. The saliency of ribbon voxels within the cylinder was
averaged according to a Gaussian weighted function (full width at half maximum = ∼4 mm, σ
= 5/3 mm) to compute a mean saliency value at the surface vertex in question. After cortical
surface projection, each subject’s saliency overlay was registered from native space onto the atlas.
Subjects’ saliency probability overlays were averaged into a cortical map of mean saliency.
For both volume- and surface-level analyses, each saliency map M was operationalized into
a saliency probability map P by dividing saliency at each brain location by the sum of all brain
saliencies. An average saliency probability map was computed for each sex and cognitive status,
yielding PM for males, PF for females, PCN for CN adults, and PCI for participants with any form of CI. Both PCN and PCI were computed after averaging across sex effects. Relative sex differences
in P were computed as (PM −PF)/[(PF +PM)/2], i.e., as sex-specific deviations from the average
across sexes. The relative deviation of participants with CI from CN participants was computed as
(PCI −PCN)/PCN. Relative saliency differences between sexes or diagnostic statuses were mapped
after thresholding to include only statistically significant values. For each saliency value considered, significance was evaluated using a paired-sample t-test (α = 0.05). Results were corrected
for multiple comparisons using the Benjamini-Hochberg procedure (false discovery rate = 0.05).
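A minimal sketch of this operationalization, assuming saliency maps already registered to a common atlas space, is:

import numpy as np

def to_probability(saliency, brain_mask):
    """Divide saliency at each brain location by the sum over all brain voxels."""
    p = np.zeros_like(saliency)
    p[brain_mask] = saliency[brain_mask] / saliency[brain_mask].sum()
    return p

def relative_sex_difference(p_m, p_f):
    """Delta P = (PM - PF) / [(PF + PM) / 2], computed voxelwise."""
    mean = (p_m + p_f) / 2.0
    return np.divide(p_m - p_f, mean, out=np.zeros_like(mean), where=mean > 0)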
For volume-level visualization, CN participants’ mean saliency maps were plotted for each sex
along the coronal (x), sagittal (y), and axial (z) planes. For each coordinate, maps were generated
along planes whose equations were specified by coordinate values of −28 mm, 0 mm, and 28 mm,
respectively. In CN participants and participants with CI, the procedure was repeated after averaging across sexes. For surface-level visualization, gray matter saliencies were mapped onto the
cortex to compare different cortical locations’ relative importance to the 3D-CNN when estimating
BA.
2.5.5 Data and code availability statement
3D-CNN software is available from https://github.com/irimia-laboratory/USC_BA_estimator. MRI data are publicly available from ADNI (https://adni.loni.usc.edu/), UKBB (https://www.ukbiobank.ac.uk/), CamCAN (https://www.cam-can.org/) and HCP (https://www.humanconnectome.org/).
There are no relevant accession codes required to access these data and the authors had no special
access privileges that others would not have to the data obtained from any of these databases.
2.6 Discussion
2.6.1 Significance
Whereas biological age can be computed for many phenotypic traits, BA summarizes MRI-derived
neuroanatomic profiles using one number. This highlights both the appeal and caveats of this measure. Although straightforward to grasp, BA (as defined here) does not capture the nuances and
complexity of brain aging. Nevertheless, with cautious interpretation, BA could assist diagnosis
and prognosis despite its limitations. Early screening for CI can help to monitor and improve the
welfare of aging adults [53]. Although positron emission tomography (PET) can aid diagnosis of
AD at the pre-clinical and prodromal stages [54], this technique is expensive, involves specialized
tracers, and exposes participants to radiation [55]. By contrast, MRI is noninvasive, more affordable, and safer. Thus, MRI-derived BAs that capture neurocognitive decline [56] could become
affordable and noninvasive pre-clinical measures of CI risk [57].
The correlations of neurocognitive measures with our estimated BAs are, in many cases, significantly stronger than with CA, suggesting that our BAs better reflect neurocognitive functioning.
These correlations are critical because one potential utility of BA estimation is to facilitate the
early identification of persons at high risk of MCI and AD. AGs are predictive of AD conversion
risk, as others reported [58, 59, 60]. BA is not correlated with neurocognitive function in participants with AD, with the exception of informant-rated functional ability. One possible reason is that
the 3D-CNN was trained on CN adults rather than participants with CI. Another reason could be
that correlations are more difficult to detect due to lower statistical power (smaller sample size) in
participants with AD compared to CI participants. Across persons with CI of any severity, BA (but
not CA) is significantly correlated with measures used routinely [38] to screen for (or to diagnose)
CI, including MCI (Fig. 2.3 and Tables S4 and S5). Thus, our contributions can help to understand
how CI-related neurocognitive changes within specific functional domains reflect neuroanatomic
features that modify regional BAs.
2.6.2 Sex differences in anatomic brain aging
Of note for patient-tailored profiling, our approach can generate subject-specific brain saliency
maps reflecting individual neuroanatomic patterns of brain aging. Anatomic interpretability of BA
is important because 1) brain regions age differently, 2) neuroanatomic alterations with age may
reflect distinct disease processes paralleled by BA [61], and 3) individual neuroanatomic deviations may parallel neurocognitive endophenotypes. Sex differences in saliency confirm findings
on the contributions of age to sex dimorphism in the pre- and postcentral gyri [62, 63] and the
pars triangularis of the left inferior frontal gyrus [64]. Males, who are at higher risk of motor
impairment due to Parkinson’s disease [65], exhibit greater saliency in the primary motor cortex.
Whereas males’ BA estimation relies more on the crowns of gyri on the lateral aspects of the frontal
lobes, females’ relies more on the troughs of sulci. These findings confirm prior reports on sex differences in older adults’ cortical gyrification [66]. Males’ saliencies are higher along ventricular
boundaries, indicating that BAs are disproportionately predicated upon ventricular enlargement in
men, as reported elsewhere [67]. The right hemisphere’s higher saliency in males is consistent with
their lateralization of language function [68] and with lateralization trends in old age [69]. Thus,
in females, typical cortical aging may be relatively slower in the right hemisphere. By contrast,
on average, most occipital and medial parietal areas exhibit age-related neuroanatomic patterns
that are more salient in males. Males also have higher saliency in superior parietal and frontal
regions, reflecting smaller gray matter volumes [70]. By contrast, females have higher saliency
at the occipital poles and in occipitoparietal regions, reflecting smaller gray matter volumes in
these regions [70]. Females’ saliencies are higher across inferior parietal regions, where cortex is
thicker than in males [71]. Thus, our approach to neuroanatomic saliency mapping can identify
sex differences in the neuroanatomy of cortical aging.
2.6.3 Anatomy changes according to neurocognitive status
Our interpretable 3D-CNN framework captures neuroanatomy changes related to both CN aging
and aging with CI. In the case of CN aging, the estimated BAs of CN participants in our two
independent samples (CamCAN and ADNI) are correlated with neurocognitive measures reflecting
typical aging (e.g., motor learning, multitasking, and word finding). In ADNI CN participants,
no significant associations were found between neurocognitive measures and either CA or BA.
This was expected, as ADNI cognitive measures are sensitive to CI rather than to CN aging [72].
In the case of CI, Fig. S14 I to P confirms that participants with either MCI or AD have AGs
considerably larger than those of sex- and age-matched CN adults [73], mostly due to older-than-expected brains (BA > CA). Atrophy of the parahippocampal gyrus is a strong structural correlate of MCI and AD [74]; our 3D-CNN's greater reliance on this structure during BA estimation reflects this (Fig. 2.2 C). Similarly, saliency differences between CN and CI participants are greater in parietal, occipital, and temporal cortices (Fig. 2.2 D), whose atrophy is greater in participants with
CI [75] and whose burdens of amyloid β plaques and τ neurofibrillary tangles are typically higher
in AD [76]. The brain stem, which is affected by amyloid deposition early during AD, is more
salient in participants with CI than in CN adults [77]. Comparison of the cortical patterns in
Fig. 2.2 B and D indicates that saliency differences between sexes are largely paralleled by saliency
differences across cognitive statuses (CN versus CI). This may reflect females’ higher risk for AD,
and supports the hypothesis according to which their higher risk is paralleled by faster cortical
aging. Comparison of CN and CI cohorts suggests that the SFCN underestimates mean AG in
the latter group, and that the expected accuracy of BA estimation is lower for participants with
CI. These findings highlight the importance of an accurate BA estimator when studying diseased
populations. Some cortical structures that atrophy far more in CI than in CN aging are more salient
in the latter (blue regions in Fig. 2.2 D). This may reflect the fact that the 3D-CNN was trained
on a CN adult cohort. During training on this cohort, our 3D-CNN likely relies on features whose
variance is moderate in CN aging. When estimating the BAs of participants with CI, however,
these features exhibit far greater variability. This may cause their relative saliency to decrease,
such that the saliency difference ∆P between CN and CI aging is negative in such regions. Thus,
although features with negative ∆P values can be useful for understanding how BA estimation
relies on CI-related neuroanatomy features, the negative sign of ∆P must be interpreted cautiously.
2.6.4 Comparison to other methods
Our 3D-CNN alleviates major limitations of other approaches. The quantitative comparison below
focuses on the SFCN because this open source approach performed best in a competition [35] for
which both training and testing data are available.
Accuracy. Our 3D-CNN estimates BA more accurately than the state of the art regardless of whether accuracy is quantified using MAE or R². In the test set, our model yields an MAE of ∼2.3 yr; this is ∼1 yr less than the SFCN, which is second best. Other (published) BA estimators have MAEs that are even higher than that of the SFCN on their testing data. Presumably, since our MAE is ∼2.3 yr, these estimators also perform more poorly than ours. However, we could not ascertain this because we did not have access to the testing sets on which other estimators were benchmarked. These estimators include a best linear unbiased predictor (MAE ≃ 3.3 yr) [78], a 3D residual neural network (3D-RNN, MAE ≃ 3.3 yr) [79], a graph CNN (MAE ≃ 4.6 yr) [80], Gaussian process regression (MAE ≃ 4.1 yr) [14], support vector regression synergized with a random forest classifier (MAE ≃ 3.5 yr) [81] and a 3D-DenseNet (MAE ≃ 3.3 yr) [82].
On the testing set, our model yields R² ≃ 0.96 and r ≃ 0.98. By contrast, the SFCN model yields R² ≃ 0.92 and r ≃ 0.96 [37, 36, 14]. During testing, other (published) BA estimators achieve even lower R² than the SFCN. These include Gaussian process regression (R² ≃ 0.91) [14], a 3D-RNN (R² ≃ 0.90) [79], a graph CNN (R² ≃ 0.87) [80] and a 3D-DenseNet (R² ≃ 0.85) [82]. Our R² is also higher than that of a BA estimator that used an optimized SFCN [83] with R² = 0.94. These comparisons suggest that even a high R² can involve undesirably large MAE, such that it can be useful to consider both measures when evaluating accuracy.
In all females with CI and in AD males, our AGs are significantly larger than those estimated
by the SFCN. As expected, the 3D-CNN’s estimates of these subjects’ CAs are consistently larger
than their true CAs. Because CI involves more brain aging, this suggests that the 3D-CNN captures
CI better than the SFCN. Females are at higher risk for AD and exhibit faster decline than males
[84]. Females already have a larger mean AG in the MCI stage, whereas this is not the case for
males until the AD stage. Thus, our model captures known sex differences in AD risk.
Variances in AG between the 3D-CNN and SFCN are significantly different for the UKBB cohort even though F = σ²SFCN/σ²CNN ≃ 1, which usually implies a lack of significant differences in variance. This finding can be explained by our use of Pitman's variance ratio test, which is justified here because the variances being compared pertain to correlated samples (CNN- and SFCN-computed AGs measured for the same cohort). Because the 3D-CNN and SFCN were both trained on UKBB CN participants, the abilities of these methods to estimate BA for new UKBB participants are likely better (and therefore more similar) than their ability to estimate BA for participants from altogether new cohorts. This similarity may explain the strong correlation r of UKBB BAs across the two methods (females: r = 0.989; males: r = 0.990). The dependence of Pitman's t on r (see Methods) satisfies t ∼ (1 − r²)^(−1/2). A Maclaurin series expansion indicates that t → ∞ as r → 1. Thus, Pitman's t is large when r ≃ 1 even when σ²SFCN/σ²CNN ≃ 1. This explains our power to detect even moderate differences between σ²SFCN and σ²CNN in UKBB CN participants.
Complexity. Model complexity was quantified using mean ET and the number of trainable parameters in the model. By both measures, our 3D-CNN’s execution complexity is lower than that
of previous approaches. For example, the 3D-CNN features a ∼10 times shorter ET compared to
the SFCN [36, 37], and ∼4 times fewer trainable parameters. The model of Leonardsen et al. [83],
which is based on the SFCN [36, 37], has more trainable parameters and is more challenging to
fine-tune. The 3D-DenseNet [83] has ∼7 million trainable parameters (compared to our 682,881)
and requires extensive fine-tuning on new validation datasets via grid searches for optimal hyperparameters.
Interpretability. Lee et al. [82] use a 3D-DenseNet to compute saliency by covering the brain with occlusion masks of size 11³ mm³ = 1,331 mm³. According to these authors, their saliencies correlate with PET-mapped amyloid β and τ burdens. However, for participants with CI, the anatomic patterns of brain aging mapped by Lee et al. are broadly similar to ours (Fig. 2.2 C and D) across similar age ranges. This suggests the hypothesis that BA saliencies like ours can reflect AD-related clinical PET findings. He et al. [85] used two-dimensional (2D) occlusions (box size: 32² mm² = 1,024 mm²) to map saliency, whereas Wood et al. [86] quantified performance and saliency by occluding 3D masks (size: 5³ mm³ = 125 mm³). Our study advances the state of the art by 1) providing voxelwise saliency maps to reveal detailed spatial variability at native MRI resolution (1 mm³), 2) reporting comparisons by sex and cognitive status, and 3) conveying how cognitive status relates to neurocognitive function.
Generalizability. Whereas most BA estimators are not typically tested across domain-specific
neurocognitive measures, our 3D-CNN features unique generalizability to independent cohorts in
its ability to capture neurocognitive endophenotypes. Since the R² values achieved on independent and test data are similar, we surmise that overfitting was largely avoided. Compared to CA, the BA
of participants with any type of CI is significantly more correlated with measures of neurocognitive
function routinely used as clinical indicators of CI. Other published approaches have rarely been
evaluated according to this critical performance benchmark. Because the 3D-CNN was trained on
subjects aged 22-95 yr, its utility extends across the age range of adulthood.
2.6.5 Limitations
Although we validated the 3D-CNN in cohorts independent from those used for its training, differences in acquisition sequences and scanners across MRIs can affect results [87]. Like other
dementia diagnosis criteria, ADNI’s have limitations (e.g., a risk of false positive diagnoses [72])
that may affect the findings of studies like ours. Additionally, floor effects may affect cognitive
measures in participants with AD by attenuating their correlation. Conceivably, our failure to find
significant correlations between BA and neurocognitive measures in participants with AD could
be due to our lower power to detect small effects in the AD sample, which is smaller (N ≤ 172)
compared to the MCI (N ≤ 347) and combined CI (i.e., MCI or AD, N ≤ 519) samples. These
non-significant correlations, however, are not typically relevant for early CI screening because
most participants with severe CI have been already diagnosed by the time brain MRIs are typically
acquired. The nonuniform distribution of CAs in our aggregate sample translates into potential
training data imbalance and inaccuracy in BA estimates. Nevertheless, our approach is more accurate than others currently available, as reflected by our test set's MAE and R², which are the best reported to date. Due to the lack of ground truth, there is no consensus on how the interpretability
of approaches like ours ought to be evaluated [82, 85, 86, 88].
Chapter 3
Raising the limit of image rescaling using auxiliary encoding
3.1 Introduction to super-resolution
Currently, ultra-high-resolution (HR) images often need to be reduced from their original resolutions to lower ones due to limitations such as display or transmission constraints. Once resized, they may subsequently need to be scaled back up, so it is useful to restore as much high-frequency detail as possible [89]. While deep learning super-resolution (SR) models [90, 91, 92] are powerful tools for reconstructing HR images from low-resolution (LR) inputs, they are often limited to pre-defined image downscaling methods. Additionally, due to memory and speed constraints, HR images and videos are also commonly resized to lower resolutions for downstream computer vision tasks like image classification and video understanding. These tasks likewise rely on conventional resizing methods, which are subject to information loss and have a negative impact on downstream performance [93]. Hence, learned image downscaling techniques with minimal loss of high-frequency information are indispensable in both scenarios. Lastly, SR models optimized for upscaling only are known to suffer from stability issues when multiple downscaling-to-upscaling cycles are applied [94], which further validates the necessity of learning downscaling and upscaling jointly.
To overcome these challenges and exploit the relationship between the upscaling and downscaling steps, recent works have designed encoder-decoder frameworks that unite these two independent tasks. Kim et al. [95] utilized an autoencoder (AE) architecture, where the encoder is the
downscaling network and the decoder is the upscaling network, to find the optimal LR result that
maximizes the restoration performance of the HR image. Sun et al. [96] designed a learned content
adaptive image downscaling model in which an SR model is trained simultaneously to best recover
the HR images. Later on, Li et al. [97] proposed a learning approach for image compact-resolution
using a convolutional neural network (CNN-CR) where the image SR problem is formulated to
jointly minimize the reconstruction loss and the regularization loss. Although the above models
can efficiently improve the quality of HR images recovered from corresponding LR images, these works optimize downscaling and SR separately, ignoring the potential mutual interaction between downscaling and inverse upscaling.
More recently, a jointly optimized rescaling model was proposed by Xiao et al. [17] to achieve
significantly improved performance. An Invertible Rescaling Net (IRN) was designed to model
the reciprocal nature of the downscaling and upscaling processes. For downscaling, IRN was
trained to convert the HR input to a visually pleasing LR output and a latent variable z. As z is trained
to follow an input-agnostic Gaussian distribution, the HR image can be accurately reconstructed
during the inverse up-scaling procedure although z is randomly sampled from a normal distribution.
Nevertheless, the model’s performance can be further improved if the high-frequency information
remaining in z is efficiently stored.
To resolve these difficulties and realize the full potential of the IRN, we propose two approaches, IRN-meta (IRN-M) and IRN-alpha (IRN-A), which efficiently compress the high-frequency information stored in z; the compressed representation can later be used to recover z and thereby help restore the HR image during inverse upscaling. For IRN-A, we train the model to extract a fourth LR channel in addition to the LR RGB channels. This channel represents essential high-frequency information that is lost in the IRN baseline due to random sampling of z, and it is saved as the alpha-channel of the LR output. For the IRN-M approach, an AE module is trained to compress z into a compact latent variable, which can be saved as metadata of the LR output. In the inverse upscaling process, z is restored from the latent space by the well-trained decoder. Both modules are also successfully applied to the state-of-the-art (SOTA) rescaling model DLV-IRN [98]. In summary, the main contribution of this chapter is that we are the first to compress the high-frequency information in z, which is not fully utilized in current invertible image rescaling models, to improve the restored HR image quality during upscaling.
Figure 3.1: Illustration of invertible image rescaling network architecture: (a) RGBA approach and (b)
metadata approach.
3.2 Proposed methods
3.2.1 IRN-alpha architecture
Fig. 3.1 (a) shows the IRN-A network architecture, where the invertible neural network blocks (InvBlocks) are adopted from the prior IRN work [17]. In the new model, the input HR image is resized via a Haar transformation before being split into a lower branch xl and a higher branch xh. More specifically, the Haar transformation converts the input HR image of shape (C, H, W) into a matrix of shape (4C, H/2, W/2), where C, H, and W denote the image color channels, height, and width, respectively. The first C channels represent low-frequency components of the input image in general, and the remaining 3C channels represent the high-frequency information in the vertical, horizontal, and diagonal directions, respectively. Different from the IRN baseline, which uses only the C low-frequency channels in the lower branch, we add one additional channel to the lower branch xl to store the compressed high-frequency information; we denote it the alpha-channel for convenience, as it is stored as the alpha-channel of the RGBA format. After the first Haar transformation, the alpha-channel is initialized with the average value across all 3C high-frequency channels, and only 3C − 1 channels are included in xh, as the first channel is removed to keep the total number of channels constant.
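A minimal sketch of this orthonormal Haar downscaling step is given below (framework-independent NumPy; the naming of the three high-frequency directions follows the text and is otherwise an assumption):

import numpy as np

def haar_downscale(x):
    """x: array of shape (C, H, W) with even H and W -> (4C, H/2, W/2)."""
    a = x[:, 0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[:, 0::2, 1::2]  # top-right
    c = x[:, 1::2, 0::2]  # bottom-left
    d = x[:, 1::2, 1::2]  # bottom-right
    low        = (a + b + c + d) / 2.0  # C low-frequency channels
    vertical   = (a + b - c - d) / 2.0  # 3C high-frequency channels:
    horizontal = (a - b + c - d) / 2.0  # vertical, horizontal,
    diagonal   = (a - b - c + d) / 2.0  # and diagonal directions
    return np.concatenate([low, vertical, horizontal, diagonal], axis=0)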
After channel splitting, xl and xh are fed into cascaded InvBlocks and transformed into an LR RGBA image y and an auxiliary latent variable z. The first three channels of y are the visual RGB channels, and the fourth channel contains the compressed high-frequency components transformed along the InvBlocks. The alpha-channel is normalized via a sigmoid function, S(α) = 1/(1 + e^(−α)), to aid quantization of the alpha-channel and maintain training stability.
For the inverse upscaling process, the model needs to recover z (denoted \hat{z}), as it is not stored. In previous work, \hat{z} was randomly drawn from a standard Gaussian distribution. While this helps create diverse samples in generative models, it is not optimal for tasks like image rescaling, which aim to restore one HR image rather than diverse variations. Therefore, we set \hat{z} to 0, the mean value of the normal distribution, for the inverse up-scaling process. This technique was also validated in previous works like FGRN [99] and DLV-IRN [98]. Of note, at the end of the inverse process, the deleted high-frequency channel needs to be recovered as

x_m = 3C \cdot x_\alpha - \sum_{i=1}^{3C-1} x_h^i,    (3.1)

where x_m represents the channel removed from x_h and x_\alpha represents the alpha-channel in x_l.
3.2.2 IRN-meta architecture
Besides storing the compressed high-frequency information in a separate alpha-channel, we also
propose an alternative space-saving approach to store the extracted information as metadata of the
image file. Image metadata is text information pertaining to an image file that is embedded into
the image file or contained in a separate file in a digital asset management system. Metadata is readily supported by existing image formats, so this proposed method can be easily integrated with current solutions.
The network architecture of our metadata approach is shown in Fig. 3.1 (b). Here x_l and x_h, as in the IRN baseline, are split from the Haar-transformed 4C channels into C and 3C channels, respectively. Unlike the RGBA approach, the metadata method uses an encoder at the end to compress z and saves the latent vector S as metadata, rather than saving it as the alpha-channel of the output. S is decompressed by the decoder for the inverse upscaling step. In our AE architecture, the encoder compacts the number of z channels from 3C \times n^2 - C to 4 via 2D convolution layers and compresses z's height and width from (H/2^n, W/2^n) to (H/2^{n+2}, W/2^{n+2}) using max-pooling layers. Here, n is 1 or 2 for a scale factor of 2× or 4×, respectively. Of note, the AE was pre-trained with an MSE loss before being embedded into the model structure.
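A minimal Keras sketch of such an autoencoder is given below; the intermediate channel width (16) and kernel sizes are hypothetical, since the text fixes only the compression to 4 channels and the two max-pooling stages:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_autoencoder(z_channels):
    # Encoder: reduce z to 4 channels and shrink H and W by a factor of 4.
    encoder = keras.Sequential([
        layers.Conv2D(16, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(2),               # (H, W) -> (H/2, W/2)
        layers.Conv2D(4, 3, padding="same"),  # z_channels -> 4
        layers.MaxPooling2D(2),               # -> (H/4, W/4)
    ])
    # Decoder: mirror the encoder to restore z for the inverse upscaling.
    decoder = keras.Sequential([
        layers.UpSampling2D(2),
        layers.Conv2D(16, 3, padding="same", activation="relu"),
        layers.UpSampling2D(2),
        layers.Conv2D(z_channels, 3, padding="same"),
    ])
    return encoder, decoder
```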
After placing the well-trained AE in the IRN architecture, the entire structure was trained to minimize the following mixture loss function:

L = \lambda_1 L_r + \lambda_2 L_g + \lambda_3 L_d + \lambda_4 L_{mse},    (3.2)

where L_r is the L1 loss for reconstructing the HR image, L_g is the L2 loss for the generated LR image, L_d is the distribution matching loss, and L_{mse} is the MSE loss between the input of the encoder and the output of the decoder.
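For concreteness, the mixture loss of Eq. (3.2) can be sketched as follows; the lambda weights are placeholders, and the distribution matching loss L_d is assumed to be computed elsewhere:

```python
import tensorflow as tf

def mixture_loss(hr, hr_rec, lr_guide, lr_gen, z, z_rec, L_d,
                 lams=(1.0, 1.0, 1.0, 1.0)):
    """Sketch of Eq. (3.2). lams holds hypothetical lambda weights."""
    L_r = tf.reduce_mean(tf.abs(hr - hr_rec))           # L1 HR reconstruction
    L_g = tf.reduce_mean(tf.square(lr_guide - lr_gen))  # L2 on generated LR
    L_mse = tf.reduce_mean(tf.square(z - z_rec))        # AE input/output MSE
    return lams[0] * L_r + lams[1] * L_g + lams[2] * L_d + lams[3] * L_mse
```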
IRN-A        αavg   BSD100           Urban100         DIV2K
                    PSNR/SSIM↑       PSNR/SSIM↑       PSNR/SSIM↑
Post-split    ✗     32.66 / 0.9083   32.50 / 0.9328   36.19 / 0.9464
Pre-split     ✗     33.02 / 0.9132   32.17 / 0.9186   36.60 / 0.9495
Pre-split     ✓     33.12 / 0.9150   33.10 / 0.9384   36.67 / 0.9504

Table 3.1: Comparison of 4× upscaling results using different IRN-A hyperparameters and settings. The best results are highlighted in red.
IRN-M      AEp   AEf   BSD100           Urban100         DIV2K
                       PSNR/SSIM↑       PSNR/SSIM↑       PSNR/SSIM↑
2 layers    ✗     ✗    31.41 / 0.8771   30.79 / 0.9074   34.79 / 0.9283
2 layers    ✓     ✓    31.58 / 0.8793   31.30 / 0.9123   35.06 / 0.9306
2 layers    ✓     ✗    31.65 / 0.8804   31.34 / 0.9154   35.09 / 0.9306
4 layers    ✗     ✗    28.15 / 0.7765   25.82 / 0.7989   30.72 / 0.8591
4 layers    ✓     ✗    31.69 / 0.8812   31.44 / 0.9143   35.15 / 0.9314

Table 3.2: Comparison of 4× upscaling results using different IRN-M hyperparameters and settings. The best results are highlighted in red.
3.3 Experimental results
Following the same training strategy and hyperparameters in IRN baseline, our models were
trained on the DIV2K [100] dataset, which includes 800 HR training images. IRN-M and IRN-A
were trained with 500,000 and 250,000 iterations respectively. Both models were evaluated across
five benchmark datasets: Set5 [101], Set14 [102], BSD100 [103], Urban100 [104] and the validation set of DIV2K. The upscaled images quality across different models were assessed via the
peak noise-signal ratio (PSNR) and SSIM on the Y channel of the YCbCr color space. Following
previous works [99, 98], as it is not beneficial to add randomness in restoring HR images, we set ˆz
as 0 during the inverse up-scaling process for both training and validation steps in all experiments.
Figure 3.2: Visual examples from the Urban100 test set, comparing the HR image against Bicubic, CAR [96], IRN [17], DLV-IRN [98], DLV-IRN-M, DLV-IRN-A, and the ground truth (best viewed in the online version with zoom-in).
Method          Scale  Set5 [101]       Set14 [102]      BSD100 [103]     Urban100 [104]   DIV2K [105]
                       PSNR/SSIM↑       PSNR/SSIM↑       PSNR/SSIM↑       PSNR/SSIM↑       PSNR/SSIM↑
CAR [96]        2      38.94 / 0.9658   35.61 / 0.9404   33.83 / 0.9262   35.24 / 0.9572   38.26 / 0.9599
IRN [17]        2      43.99 / 0.9871   40.79 / 0.9778   41.32 / 0.9876   39.92 / 0.9865   44.32 / 0.9908
FGRN [99]       2      44.15 / 0.9902   42.28 / 0.9840   41.87 / 0.9887   41.71 / 0.9904   45.08 / 0.9917
DLV-IRN [98]    2      45.42 / 0.9910   42.16 / 0.9839   42.91 / 0.9916   41.29 / 0.9904   45.58 / 0.9934
DLV-IRN-M       2      45.83 / 0.9916   42.47 / 0.9850   43.38 / 0.9925   41.77 / 0.9911   45.91 / 0.9939
DLV-IRN-A       2      47.81 / 0.9937   44.96 / 0.9884   47.15 / 0.9967   45.07 / 0.9953   48.94 / 0.9968
CAR [96]        4      33.88 / 0.9174   30.31 / 0.8382   29.15 / 0.8001   29.28 / 0.8711   32.82 / 0.8837
IRN [17]        4      36.19 / 0.9451   32.67 / 0.9015   31.64 / 0.8826   31.41 / 0.9157   35.07 / 0.9318
HCFlow [106]    4      36.29 / 0.9468   33.02 / 0.9065   31.74 / 0.8864   31.62 / 0.9206   35.23 / 0.9346
FGRN [99]       4      36.97 / 0.9505   33.77 / 0.9168   31.83 / 0.8907   31.91 / 0.9253   35.15 / 0.9322
DLV-IRN [98]    4      36.62 / 0.9484   33.26 / 0.9093   32.05 / 0.8893   32.26 / 0.9253   35.55 / 0.9363
DLV-IRN-M       4      36.67 / 0.9490   33.33 / 0.9105   32.12 / 0.8909   32.33 / 0.9264   35.63 / 0.9373
DLV-IRN-A       4      37.56 / 0.9566   34.12 / 0.9246   33.12 / 0.9150   33.10 / 0.9384   36.67 / 0.9504

Table 3.3: Quantitative results of upscaled ×2 and ×4 images on 5 datasets across different bidirectional rescaling approaches. The best two results are highlighted in red and blue, respectively.
3.3.1 Ablation study
As the transformed alpha-channel is the key innovation for improved performance for IRN-A, the
pre-splitting and initial settings of the alpha-channel before the forward transformation process are
very important. For better analysis of their effects, Table 5.2 shows an ablation study that compares
the results for different settings of the alpha-channel, where “post-split" and “pre-split" refer to
splitting the alpha-channel after the downscaling module or before the InvBlock respectively, and
αavg represents presetting the average value of high-frequency information in the pre-split alphachannel. From Table 5.2, we notice that using the αavg with pre-split architecture performs best
across all options.
The IRN-M model reconstructs the HR image by decoding the latent vector S saved in the metadata file. Table 3.2 shows another ablation study for determining the optimal AE structure, where AEp indicates that the AE is pre-trained using an MSE loss with standalone random z before being trained as part of IRN-M; AEf indicates fixing the AE while training IRN-M; and "2 layers" and "4 layers" indicate two and four convolutional layers used in the AE, respectively. As shown in Table 3.2, IRN-M with a pre-trained 4-layer AE that is not fixed during training performs best. Of the three settings, pre-training the AE is the most critical factor in maximizing performance.
3.3.2 Image rescaling
The quantitative comparison results for HR image reconstruction are shown in Table 6.1. Rather
than choosing SR models which only optimize upscaling steps, we consider SOTA bidirectional
(jointly optimizing downscaling and upscaling steps) models for fair comparison [96, 17, 98, 99,
106]. As shown in Table 6.1, DLV-IRN-A is efficient at storing high-frequency information in
the alpha-channel and consequently outperforms its baseline DLV-IRN, as well as other models,
including HCFlow and IRN models, which randomly samples ˆz for the upscaling step. For DLVIRN-M, while not as good as the -A variant, it still performs better than all other models, only
trailing behind FGRN for two small test sets at 4×. Hence we conclude that both -M and -A modules can improve the modeling of the high-frequency information and help restore the HR image
consequently. Visual examples of the 4× test in Fig 3.2 also validate the improved performance
from our models.
3.4 Discussion
To fully mine the potential of image rescaling models based on INN, two novel modules are proposed to store otherwise lost high-frequency information z. The IRN-M model utilizes an autoencoder to compress z and save as metadata in native image format so it can be decoded to an
37
approximate of z, while IRN-A adds an additional channel to store crucial high-frequency information, which can be quantized and stored as the alpha-channel, in addition to the RGB channels,
in existing RGBA format. With carefully designed autoencoder and alpha-channel pre-split, it is
shown that both modules can improve the upscaling performance significantly comparing to the
IRN baseline. The proposed modules are also applicable to newer baseline models like DLV-IRN
and DLV-IRN-A is by far the best, which further pushes the limit of image rescaling performance
with a significant margin.
Chapter 4
Fractional dynamics foster deep learning of COPD stage
prediction
4.1 Introduction to COPD
Chronic Obstructive Pulmonary Disease (COPD) is an increasingly prevalent respiratory disorder that represents a severe impediment to quality of life [107, 108]; it is the third or fourth major cause of death worldwide [109]. Medical practice describes COPD as an inflammatory lung condition consisting of a slow, progressive obstruction of the airways that reduces pulmonary capacity [110]. Medical science has not entirely clarified what triggers COPD; nonetheless, scientists point to the complex interactions between environmental factors—such as pollution exposure or smoking—and genetics [111] as likely causes. COPD is not reversible, but early diagnosis enables a better disease evolution and an improved patient condition through personalized treatments [112].
The Global Initiative for Obstructive Pulmonary Disease (GOLD) defines COPD—based on pulmonary function testing or spirometry—as a ratio between the forced expiratory volume in one second and the forced vital capacity (FEV1/FVC) of < 0.7 in a patient with symptoms of dyspnea, chronic cough, and sputum production, and with a history of exposure to cigarette smoke, biofuels, or occupational particulate matter. The spirometer is a device that measures lung volumes and airflow rates, rendered as forced expiratory volume in one second (FEV1), forced vital
capacity (FVC), and the ratio between FEV1 and FVC; physicians use these parameters to classify patients into one of the following COPD stages: 1–Mild, 2–Moderate, 3–Severe, and 4–Very
Severe. The almost unanimously accepted classification methodology is the COPD Gold Standard
[113, 114], although there are some differences in applying it [115]. Unfortunately, early COPD
detection and diagnosis are challenging at the population level because relevant clinical signs are
hard to detect in the early phases. When suspected, patients are ordinarily subjected to pulmonary
function tests (i.e., spirometry) and mostly diagnosed when they are already in stages 2–4; thus, designing therapies to improve the disease trajectory becomes difficult [116]. Another problem with
spirometry is that it does not always render reliable results, particularly when not performed in a specialized pulmonary center [117]. Nonetheless, the fact that COPD has become a global threat [107, 108] further emphasizes the importance of decentralizing diagnosis, meaning that finding innovative methods to diagnose COPD outside respiratory medicine centers becomes paramount. Recent medical research suggests that personalized medicine could improve COPD diagnosis [112]. One approach to personalized COPD care is identifying patient phenotypes based on comorbidities and simple clinical and anthropometric data (e.g., age, body-mass index, smoker status). To this end, medical practice uses two questionnaires to evaluate symptoms and assess the severity of the disease, namely the COPD Assessment Test (CAT) and the Medical Research Council Breathlessness Scale (MRC) [118, 119]. There are also algorithmic methods for clustering COPD patients based on big data, complex network analysis, and deep learning [120, 121, 122, 123, 124]. However, these techniques have not resulted in high prediction accuracy; the reason is that they focus only on investigating novel machine learning models rather than analyzing the geometric characteristics of the data. Furthermore, big data and Internet-of-Things (IoT) solutions have proven effective in COPD management, but such existing engineering systems merely monitor physiological signals to provide therapeutic feedback to physicians [125, 126]. Instead, in this paper, we analyze the differences in the distribution of moment-wise estimates of the Hurst exponents between the healthy and COPD groups and offer a rigorous alternative to the conventional (spirometry-based) methodology for COPD diagnostics. Two hypotheses underpin the solution we introduce:
1. The physiological signals relevant to COPD (e.g., respiratory rate, oxygen saturation, abdomen breathing effort, etc.) have a multi-fractal nature, and their fractional-order dynamics
specifically characterize the COPD pathogenic mechanisms.
2. We can capture the fingerprints of the COPD-related physiological processes with the coupling matrix in our mathematical modeling of the physiological dynamics. (In other words,
the coupling matrix A deciphers the interdependencies and correlations between the recorded
signals.)
In this work, we generate two novel COPD physiological signal datasets (the WestRo COPD dataset and the WestRo Porti COPD dataset) and implement our method by analyzing the relevant physiological signals recorded with an IoMT (Internet of Medical Things) infrastructure. We extract the
fractional dynamics signatures specific to the COPD medical records and train a deep neural network to diagnose COPD stages using both fractal dynamic network signatures and expert analysis
(see Figure 4.1).
4.2 Recorded physiological signals
WestRo COPD dataset. In this dataset, each medical case consists of 12 signal records. First, we recorded seven physiological signals from our patients with Respiratory Inductance Plethysmography – RIP (the Thorax Breathing Effort and Abdomen Breathing Effort signals), the wireless pulse-oximeter (the Oxygen Saturation Levels, SpO2 beat-to-beat mode, Pulse, and Plethysmograph signals), and the nasal cannula (the Nasal Pressure signal). The NOX T3™ portable sleep monitor integrates and synchronizes the RIP, the wireless pulse-oximeter, and the nasal cannula. (Section Experimental, subsection Data collection provides detailed information.) Moreover, the Noxturnal™ software application, which accompanies the NOX T3™, derives five additional signals: RIP Sum (the sum of the abdomen and thorax breathing effort signals), Activity (derived from the X, Y, and Z gravity axes), Position (in degrees, derived from the X, Y, and Z gravity axes, where the supine position is 0 degrees), Flow (derived from the nasal pressure signal), and Resp Rate (respirations per minute, derived from the RIP Sum signal).
Figure 4.1: Overview of the proposed method for COPD stage prediction: (a) based on the medical observations from the latest research in the field, we identify the physiological signals with relevance in COPD and measure them; (b) we record these physiological signals with a medical sensor network—the NOX T3™ portable sleep monitor [127]. An example of a physiological signal (Abdomen) recorded from a stage 4 COPD patient is shown in panel (c). Panels (d) and (e) summarize the multifractal analysis in terms of the fluctuation function and the generalized Hurst exponent. (f) We employ the analysis of the fractional-order dynamics to extract the signatures of the signals as coupling matrices and fractional-order exponents and use these signatures (along with expert diagnoses) to train a deep neural network that can identify COPD stages.
All the medical records in this dataset were gathered from four Pulmonology Clinics in Western Romania (Victor Babeş Hospital – VB, Medicover 1 – MD1, Medicover 2 – MD2, and Cardio Prevent – CP clinics).
WestRo Porti COPD dataset. This dataset consists of 6 physiological signals recorded in 13,824 medical cases from 534 individuals during 2013–2020. The patients in the WestRo Porti dataset were screened with the Porti SleepDoc 7 portable PSG device by recording 6 physiological signals (Flow, SpO2, Pulse, Pulsewave, Thorax, Abdomen) overnight. The 6 Porti SleepDoc 7 signals correspond, respectively, to the following NOX T3 signals: Flow, Oxygen Saturation Levels, Pulse, Plethysmograph, Thorax Breathing Effort, and Abdomen Breathing Effort. The reasons for including this dataset in this study are: (1) to use it as an external dataset to validate our model; and (2) to test the robustness of our prediction and diagnosis approach when the medical signal records are interfered with by another disease (sleep apnea).
Multifractal detrended fluctuation analysis (MF-DFA) is an effective approach to estimate the multifractal properties of biomedical signals [128]. The first step of MF-DFA is to calculate the cumulative profile Y(t),

Y(t) = \sum_{i=1}^{t} ( X(i) - \langle X \rangle ),    (4.1)
where X is a bounded time series and \langle X \rangle denotes its mean. Then, the cumulated signal is divided equally into N_s non-overlapping time windows of length s, and the local linear trend y_v (a local least-squares straight-line fit) is removed from each time window. The quantity F(v,s) then characterizes the root-mean-square deviation from the trend (i.e., the fluctuation),

F(v,s) = \sqrt{ \frac{1}{s} \sum_{i=1}^{s} \{ Y_{(v-1)s+i} - y_v(i) \}^2 }.    (4.2)
In [128], the authors defined the scaling function as

S(q,s) = \left( \frac{1}{N_s} \sum_{v=1}^{N_s} \mu(v,s)^q \right)^{1/q},    (4.3)

where \mu is an appropriate measure that depends on the scale of observation s. Hence, the scaling function is obtained by substituting equation (4.2) into equation (4.3),
S_F(q,s) = \left( \frac{1}{N_s} \sum_{v=1}^{N_s} \left[ \frac{1}{s} \sum_{i=1}^{s} \{ Y_{(v-1)s+i} - y_v(i) \}^2 \right]^{q/2} \right)^{1/q}.    (4.4)
The moment-wise scaling functions of a multifractal signal exhibit a convergent structure that yields a focus point for all q-values. The convergent structure was first introduced in [128], and such focus points, as described in [128], can be deduced from equation (4.3) by considering the signal length L as the scale parameter,

S(q,L) = \left( \frac{1}{N_L} \sum_{v=1}^{N_L} \mu(v,L)^q \right)^{1/q} = \{ \mu(v,L)^q \}^{1/q} = \mu(v,L),    (4.5)

where the value of \mu represents the entire signal, namely N_L = 1 (i.e., only one time window is taken into consideration). According to equation (4.5), the scaling function S(q,L) becomes independent of the exponent q, and the moment-wise scaling functions converge to \mu(v,L), which is the mathematical definition of the focus point.
4.4 Fractal properties of physiological signals
To verify our first hypothesis, in this section, we show the fractal features of raw signals (Thorax, Oxygen Saturation, Pulse, Plethysmograph, Nasal Pressure, and Abdomen) in healthy persons
(stage 0) and critical COPD patients (stage 4) in our WestRo COPD dataset. As shown in [129,
130, 131, 132], Detrended Fluctuation Analysis (DFA) is an effective method to investigate the
statistical scaling and monofractal properties of non-stationary time series. For instance, the dichotomous models of fractional Gaussian noise (fGn) and non-stationary fractional Brownian motion (fBm)—initially described by Mandelbrot and van Ness [129]—have been shown as a proper
mono-fractal modeling framework for physiological signals [130]. In addition, DFA is also widely
used to investigate the time-series data in human respiration and heart rate. For instance, Peng et al.
applied the DFA technique to quantify the scaling behavior of nonstationary respiratory time series
and analyze the presence of long-range correlations of breathing dynamics in healthy adults [133].
Furthermore, Schumann et al. used DFA to measure the autocorrelations in heartbeat intervals and
respiration on longer time scales [134]. To overcome the challenges of the 'inversed' singularity
spectrum in standard multifractal analysis, Mukli, Nagy, and Eke [128] proposed the focus-based
multifractal formulas, which compute a moment-wise global-error parameter capturing the finite
size effects and the signals’ degree of multifractality. In order to mine the physiological complexity
and account for its nonstationarity, we perform a comprehensive multifractal detrended fluctuation
analysis (MF-DFA) of the collected data.
To analyze the fractional dynamic characteristics of the COPD physiological processes, we
calculate the scaling (fluctuation) functions of the raw signals (Thorax, Oxygen Saturation, Pulse,
Plethysmograph, Nasal Pressure, and Abdomen) in healthy people (stage 0) and very severe COPD
patients (stage 4) (for detailed information about MF-DFA and the scaling function, see section Experimental, subsection Multifractal detrended fluctuation analysis). In Figure 4.2, panels (a-c) and
(g-i), respectively, show the scaling functions calculated from Abdomen, Pulse, Plethysmograph,
Thorax, Nasal pressure, and Oxygen saturation signals generated for a healthy person and panels
(d-f) and (j-l) illustrate the scaling functions for the same signals from a very severe COPD patient (stage 4). We set the q values as q ∈ {−5,−3,−1,1,3,5}. In Figures 4.2 (a-c) and (g-i),
we find that the scaling functions under different q values will converge to a focus point (the dark
purple nodes in each panel), except for the Pulse signal. The focus point S(q,L) can be measured as
the scaling function’s fundamental property (for details on focus point, see section Experimental,
subsection Multifractal detrended fluctuation analysis). Accordingly, if a signal’s scaling function
45
has a focus point, it has multifractal features [128, 135]. The pink lines in Figure 4.2 are the lines
best-fitted to the scaling function’s observed data and the pink dots represent different segmented
sample sizes. To be more specific, we show the scaling functions of physiological signals extracted from all stage 4 patients and the healthy people (stage 0) in our dataset (where q ∈ [−5,5])
in Supplementary material (for detailed information, see Supplementary material’s section Hurst
exponents of physiological signals). Hence, from Figures 4.2 (a-c, g-i), the raw signals—except
the Pulse signal—in healthy persons have multifractal features. Conversely, in Figure 4.2 (d-f,
j-l), these scaling functions do not have focus points within the scale (except the Nasal pressure
signal), which may suggest that such signals—recorded from severe COPD patients—do not have
multifractal features.
Figure 4.3 presents the H(q) comparisons between physiological signals (Abdomen (a), Thorax (b), Oxygen Saturation (c), Plethysmograph (d), Nasal Pressure (e), and Pulse (f)) extracted from healthy people and patients, with 95% confidence intervals, where H(q) is the generalized Hurst exponent and represents the set of associated slopes of the pink lines in Figure 4.2. As discussed
in Bashan et al., the Hurst exponent measures the auto-correlations of the time-series data, where
the autocorrelations are linked with both short-term and long-term memory [136], and larger Hurst
exponent values (i.e., H > 0.5) represent persistent behavior (i.e., a more correlated structure) in
monofractal fractional Gaussian noise type signals [137, 130]. In contrast, the generalized Hurst exponent H(q) can capture the scaling properties of the time-series data and reflects the heterogeneous scaling of the qth-order moments of the increments of the temporal process [128]. H(q)
confidence interval curves among healthy people (stage 0) and COPD patients (stage 4) in Figure
4.3 are not fully overlapped, which shows that the physiological signals extracted from healthy
people and severe COPD patients possess different fractional properties (the non-linearly decreasing shape of the H(q) function illustrates that the fitted lines of the scaling functions under different
q values will converge to a focus point). In Figure 4.3, all signals recorded from stage 4 COPD patients have different H(q) confidence intervals than the signals recorded from stage 0 participants,
showing that signals collected from patients with different COPD stages have different fractional
features. Consequently, the overarching conclusion of Figures 4.2 and 4.3 is that the physiological signals with relevance for COPD have different fractional dynamics characteristics in healthy people, on the one hand, and severe COPD patients, on the other.
To analyze the H(q) functions across different COPD stages, we calculate the Wasserstein distance between each H(q) mean value curve in every physiological signal extracted from patients
with different stages (we calculate the Wasserstein distances with the wasserstein_distance function in Python’s scipy package [138]). For detailed results about H(q) curves for all COPD stages,
see Supplementary material’s section Hurst exponents of physiological signals. The Wasserstein
distance is a metric that measures differences between distributions [139]. In Figure 4.4, we show
that, for each physiological signal, the signals’ H(q) function extracted from stage 0 patients have
the largest Wasserstein distance from stage 4 patients (except the Plethysmograph signal). In contrast, the H(q) functions of signals extracted from patients in moderate to severe COPD stages
(i.e., 1, 2, and 3) have much smaller Wasserstein distances between each other (except the Abdomen signal). The results presented in Figure 4.4 represent evidence that reinforces the medical
observation stating that it is hard to distinguish early COPD stages. As such, it makes sense to
analyze the spatial coupling between these physiological processes (signals) across time. These
spatial coupling matrices A contain the different fractional features across different signal samples
and can help us classify the signals recorded from suspected patients into different COPD stages.
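As a usage illustration, the distance computation amounts to a single scipy call; the H(q) arrays below are synthetic placeholders rather than measured values:

```python
import numpy as np
from scipy.stats import wasserstein_distance

qs = np.linspace(-5, 5, 11)
h_q_stage0 = 0.9 - 0.03 * qs   # hypothetical mean H(q) curve, stage 0
h_q_stage4 = 0.6 - 0.01 * qs   # hypothetical mean H(q) curve, stage 4
print(wasserstein_distance(h_q_stage0, h_q_stage4))
```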
The general conclusion from Figures 4.2, 4.3, and 4.4 is that the physiological signals recorded from healthy individuals and severe COPD patients have distinct fractal properties. (This
dichotomy is less evident for the Pulse signal, even if the tendency towards multifractality in
healthy individuals is still present.) The notable exception is the Nasal Pressure signal, which
has multifractality in both healthy and COPD individuals. This observation suggests that COPD
mostly affects the physiology of muscles involved in or supporting breathing (Thorax Breathing
Effort and Abdomen Breathing Effort) [140, 141] and the circulatory system’s physiology (reflected in Oxygen Saturation Levels and Plethysmograph)[142]. Conversely, the upper respiratory
tract’s physiological dynamics do not appear to be affected even by severe and very severe COPD
stages.
Figure 4.2: The geometry of the fluctuation profiles for the COPD-relevant physiological signals recorded from a healthy individual and a stage 4 COPD patient. We calculate the scaling functions from 6 raw physiological signals: Abdomen, Thorax, Oxygen Saturation, Plethysmograph, Nasal Pressure, and Pulse, where the exponents are q ∈ [−5, 5]. Panels (a–c) and (g–i) show signals recorded from a healthy person; panels (d–f) and (j–l) show signals recorded from a representative stage 4 COPD patient. The resulting points (pink nodes) of the multifractal scaling function with power-law scaling converge to a focus point (dark purple nodes) at the largest scale (L) if the tested signal has multifractal features.
Figure 4.3: Multifractal analysis of 6 physiological signals from healthy people (stage 0) and stage 4 COPD patients, with 95% confidence intervals: generalized Hurst exponent H(q) as a function of the q-th order moments (where q values are discretely sampled from −5 to 5) for the physiological signals (Abdomen (a), Thorax (b), Oxygen Saturation (c), Plethysmograph (d), Nasal Pressure (e), and Pulse (f)) extracted from healthy people (stage 0) and severe COPD patients (stage 4).
4.5 Fractional dynamics modeling of COPD relevant
physiological signals subject to unknown perturbations
The dynamics of complex biological systems possess long-range memory (LRM) and fractal characteristics. For instance, several recent studies have demonstrated that stem cell division times [143], blood glucose dynamics [144], heart rate variability [Ivanov et al. 1996, 1999], and brain-muscle interdependence activity [145] are fitted by power-law distributions [146, 147, 148, 149]. The long short-term memory (LSTM) architecture is one of the most widely used deep learning approaches for analyzing biological signals and performing prediction or classification. However, LSTM cannot fully represent the long-memory effect in the input, nor can it generate long-memory sequences from unknown noise inputs [8, 9]. Thus, when considering very long time series with long-range memory, LSTM cannot predict or classify them with high accuracy. Indeed, in our study, each physiological signal contains more than 72,000 data points.
Figure 4.4: Comparison of the Wasserstein distances between the distributions of the mean H(q) curves across different COPD stages (i.e., the H(q) curves in Figure 4.3, Figure S2, and Figure S3), for the Abdomen (a), Thorax (b), Pulse (c), Nasal Pressure (d), Oxygen Saturation (e), and Plethysmograph (f) signals recorded from patients across all the COPD stages.
We aim to capture both short-range and long-range memory characteristics of various physiological processes and—at the same time—investigate the very long COPD signals with high accuracy; therefore, we adopt the generalized mathematical modeling of the physiological dynamics,

\Delta^{\alpha} x[k+1] = A x[k] + B u[k],
y[k] = C x[k],    (4.6)

where x \in \mathbb{R}^n is the state of the biological system, u \in \mathbb{R}^p is the unknown input, and y \in \mathbb{R}^n is the output vector [144]. The main benefits of this generalized mathematical representation are threefold:
1. The model allows for capturing the intrinsic short-range memory and long-range memory of
each physiological signal through either an integer or fractional order derivative. To connect
the mathematical description with the discrete nature of the measurements, the differential operator \Delta is used as the discrete version of the derivative; for example, \Delta^1 x[k] = x[k] - x[k-1]. A differential order of 1 has only one-step memory, and hence the classic linear time-invariant models are retrieved as particular cases of the adopted mathematical model. However, when the differential order is 1, the model cannot capture the long-range memory property of several physiological signals. Furthermore, we write the expansion of the fractional derivative and its discretization [150] for any i-th state (1 ≤ i ≤ n) as

\Delta^{\alpha_i} x_i[k] = \sum_{j=0}^{k} \psi(\alpha_i, j) \, x_i[k-j],    (4.7)

where \alpha_i is the fractional order corresponding to the i-th state and \psi(\alpha_i, j) = \frac{\Gamma(j - \alpha_i)}{\Gamma(-\alpha_i)\,\Gamma(j+1)}, with \Gamma(\cdot) denoting the gamma function. Equation (4.7) shows that the fractional-order derivative framework provides a mathematical approach to capture the long-range memory by including all x_i[k-j] terms. (A minimal numerical sketch of these expansion weights is given after this list.)
2. Our modeling approach describes the system dynamics through a matrix tuple (α,A,B,C)
of appropriate dimensions. The coupling matrix A represents the spatial coupling between
the physiological processes across time, while the input coupling matrix B determines how
the inputs affect these processes. We assume that the input size is always strictly smaller
than the state vector’s size, i.e., p < n. The coupling matrix A plays an essential role in
deciphering the correlations between the recorded physiological signals. These correlations
(entries of A) can indicate different physical conditions. For instance, when probing the
brain electrical activity (through electroencephalogram (EEG) signals), the correlations can help differentiate among various imagined motor tasks [151]. Moreover, as described
in this work, we can exploit these correlations to differentiate among pathophysiological
states—such as degrees of disease progression—using physiological signal analysis. A key challenge is the estimation accuracy of these correlations (the A matrix), notably for partially
observed data. We have taken care of such limitations by using the concept of unknown
unknowns introduced in reference [144].
3. Since we may have only partial observability of the complex biological systems, we take
care of the unknown stimuli (excitations that may occur from other unobserved processes
but cannot be probed); as such, we include in the model the vector variable u and study
its impact on the recorded dynamics. In essence, we refer to this mathematical model as
a multi-dimensional fractional-order linear dynamical model with unknown stimuli. The
model parameters are estimated using an Expectation-Maximization (EM) based algorithm
described in reference [144], to overcome the lack of perfect observability and deal with
possibly small and corrupted measurements. Reference [144] proves that the algorithm is
convergent and shows that it reduces modeling errors.
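As referenced in item 1, below is a minimal numerical sketch of the expansion weights \psi(\alpha_i, j) from Eq. (4.7) and the resulting fractional difference; the recursion is mathematically equivalent to the gamma-function form and avoids overflow for large j:

```python
import numpy as np

def psi_weights(alpha, k):
    """Weights psi(alpha, j), j = 0..k, of Eq. (4.7). Uses the stable
    recursion psi(alpha, 0) = 1, psi(alpha, j) = psi(alpha, j-1) *
    (j - 1 - alpha) / j, equivalent to
    Gamma(j - alpha) / (Gamma(-alpha) * Gamma(j + 1))."""
    w = np.empty(k + 1)
    w[0] = 1.0
    for j in range(1, k + 1):
        w[j] = w[j - 1] * (j - 1 - alpha) / j
    return w

def fractional_difference(x, alpha):
    """Delta^alpha x[k] = sum_{j=0}^{k} psi(alpha, j) * x[k - j]."""
    w = psi_weights(alpha, len(x) - 1)
    return np.array([np.dot(w[:k + 1], x[k::-1]) for k in range(len(x))])

# Example: alpha = 1 recovers the one-step difference x[k] - x[k-1].
x = np.array([1.0, 3.0, 6.0, 10.0])
print(fractional_difference(x, 1.0))   # -> [1., 2., 3., 4.]
print(fractional_difference(x, 0.6))   # long-memory fractional difference
```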
4.6 Fractional Dynamics Deep Learning Prediction of COPD
stages
After extracting the signals’ features (short-range and long-range memory) with the fractional dynamic mathematical model, we utilize these features (i.e., coupling matrices A) to train a deep
neural network to predict patients' COPD stages. Deep learning is a machine learning approach that efficiently combines feature extraction and classification, and it is a valuable tool for medical diagnosis (e.g., it can logically explain a patient's symptoms [1]). We develop the fractional dynamics deep learning model (FDDLM) presented in this section to predict the COPD stages for
our WestRo COPD dataset consisting of 4432 medical cases from patients in Pulmonology Clinics
from Western Romania. We evaluate FDDLM on these cases via k-fold cross-validation and hold-out validation (a minimal code sketch of this protocol follows the list below). K-fold cross-validation is a resampling procedure used to estimate how accurately a machine learning model will perform in practice. In k-fold cross-validation (k = 5), we randomly shuffle the input dataset and split it into 5 disjoint subsets. We select each subset in turn as the test set (20%) and combine the remaining subsets as the training set (80%). In hold-out validation, we hold one institution out at a time; that is, we hold out the data from one institution as the test set, and the remaining data from the other three institutions are used to train the models. The main steps of our
approach are:
1. We construct a COPD stage-predicting FDDLM, and calculate coupling matrix signatures
(A) of relevant physiological signals (such as Thorax Breathing Effort or Abdomen Breathing
Effort, etc.) to be used as the training data.
2. We train FDDLM with our training set to recognize the COPD level based on signal signatures.
3. We test FDDLM with our test set and predict patients’ COPD stage.
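As mentioned above, here is a minimal scikit-learn sketch of the 5-fold protocol; X and y are placeholders standing in for the coupling-matrix signatures and the expert COPD stage labels:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.random.rand(4432, 144)             # placeholder signatures (A entries)
y = np.random.randint(0, 5, size=4432)    # placeholder stage labels (0-4)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(X):
    X_train, X_test = X[train_idx], X[test_idx]   # 80% / 20% split
    y_train, y_test = y[train_idx], y[test_idx]
    # ... train FDDLM on (X_train, y_train), evaluate on (X_test, y_test)
```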
Our FDDLM uses a feedforward deep neural network architecture [152] with four layers: one input layer, two hidden layers, and one output layer. We present our model's structure in Figure 4.5. The input layer takes the input (i.e., the signal signatures in the coupling matrix A) and passes it to the hidden layers. From layer to layer, neurons compute the sum of the weighted inputs from the previous layer and pass the result through a nonlinear function (the activation function) (see Figure 4.5 (a)). In our FDDLM, the hidden layers' activation function is the rectified linear unit (ReLU) [153]. The ReLU activation function returns the input value when the input is greater than 0 (and otherwise returns 0), i.e., g(z) = max(0, z) for an input value z. The output layer's activation function is softmax (S), which normalizes the input values into a probability distribution. We utilized the rmsprop optimizer (with the default learning rate of 0.001) and the categorical cross-entropy loss function. To avoid potential overfitting, we insert a dropout function (with a 0.8 keep rate) after each hidden layer to randomly select and ignore 20% of the neurons.
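For concreteness, a minimal Keras sketch of this architecture is shown below; the hidden layer sizes follow Figure 4.5 (300 and 100 neurons), and the 144-dimensional input corresponds to the flattened 12 × 12 coupling matrix A. Training details not stated in the text (e.g., batch size, number of epochs) are omitted:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(300, activation="relu", input_shape=(144,)),  # hidden layer 1
    layers.Dropout(0.2),                       # keep rate 0.8
    layers.Dense(100, activation="relu"),      # hidden layer 2
    layers.Dropout(0.2),
    layers.Dense(5, activation="softmax"),     # COPD stages 0-4
])
model.compile(
    optimizer=keras.optimizers.RMSprop(learning_rate=0.001),
    loss="categorical_crossentropy",
    metrics=["accuracy", keras.metrics.AUC(name="auroc")],
)
```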
4.6.1 K-fold cross-validation results
We process all physiological signals with the fractional dynamical model [144]; then, we feed
the signal signatures from the coupling matrix A to FDDLM.
Figure 4.5: Network formulation for FDDLM: (a) basic structure of an artificial neuron model (inputs, weights, bias b, and ReLU activation function); (b) overview of the neural network model we trained in FDDLM to identify COPD stages, with two hidden layers of 300 and 100 neurons.
COPD         Stage 0   Stage 1   Stage 2   Stage 3   Stage 4
Sensitivity  97.50%    99.19%    98.66%    99.50%    96.92%
Specificity  99.87%    99.61%    99.41%    99.31%    99.88%
Precision    99.15%    97.60%    99.20%    98.06%    98.44%

Table 4.1: The COPD stage prediction results for the test set with our Fractional Dynamics Deep Learning Model (FDDLM).
Figure 4.6: Training and testing result comparisons (accuracy (a,d,g), AUROC (b,e,h), and loss (c,f,i)) of different deep learning models for the k-fold cross-validation. Training/testing accuracy (a), AUROC (b), and loss (c) for our FDDLM, where the training uses the signal signatures extracted with the fractional dynamics mathematical model. Training/testing accuracy, AUROC, and loss for the Vanilla DNN model (d-f) and the LSTM model (g-i), where the training uses the physiological signals recorded with portable sleep monitors (raw data). Both the Vanilla DNN and LSTM models share similar network structures and computation parameters with our FDDLM (the Vanilla DNN has the same network structure as our model, except for the input size). We obtain these results with k-fold cross-validation (k = 5). We also show the confusion matrices for the test set across the different models: FDDLM in panel (j), Vanilla DNN in (k), and LSTM in (l).
We implemented the neural network model in Python with the Keras package and executed it on a computer with an Intel Core i7 2.2 GHz processor and 16 GB of RAM.
We evaluate our results based on accuracy, sensitivity, loss, precision, specificity, and area under the receiver operating characteristic curve (AUROC). Our estimation of all results—generated
from different models—uses the k-fold cross-validation method (with k = 5). Figure 4.6 presents
our model’s accuracy, AUROC, and loss curves on training and test sets. We also evaluate our
model by comparing it with the Vanilla DNN and LSTM models trained on physiological signals
(i.e., raw data) extracted from sleep monitors for reference. The Vanilla DNN and LSTM models have the same hyper-parameter settings as ours (including the optimizer, dropout configuration, loss function, and activation functions), except for the input layer size. We choose classic deep learning models with similar structures as baselines to investigate whether the fractional features
(coupling matrix A) are easier for models to classify than the raw data. Figures 4.6 (a-c) show the
training and test results observed from FDDLM. We observe that training and test accuracies and
AUROC increase, while loss decreases during training.
Figures 4.6 (d-f) and (g-i) present the training and test accuracy, AUROC, and loss results of the Vanilla DNN and LSTM models trained with the physiological signals (raw data). The training and test results obtained from both the Vanilla DNN and LSTM models display overfitting, as the test accuracies decrease (test loss increases) while the training accuracies increase (training loss decreases). Thus, to deal with overfitting, we apply the early-stopping technique [154] when choosing the best-performing Vanilla DNN and LSTM. Compared to our FDDLM, the best-performing Vanilla DNN and LSTM result in much lower accuracies: 77.72% ± 0.688% and 78.54% ± 1.200%, respectively, while our FDDLM achieves 98.66% ± 0.447%.
Figures 4.6 (j-l) show the confusion matrix examples of FDDLM, Vanilla DNN, and LSTM,
respectively. The confusion matrices present the prediction results of the test set. (We construct
the test set using the last 20% of data in the WestRo COPD dataset and train models with the first
80% of data.)
The results show that the FDDLM misclassified only 1.35% of the test set in terms of individual COPD stages. In contrast, the Vanilla DNN and LSTM models misclassified 24.21% and 21.41% of the test set, respectively. (We also investigated the possibility of using the convolutional neural
network (CNN) model to characterize the physiological signal dynamics obtained from sleep monitors (raw data) and compare it with our FDDLM. The CNN model misclassified 63.87% of the
test sets with k-fold cross-validation. For detailed information about the CNN model and its results, see section Experimental, subsection Neural network architecture for the WestRo COPD dataset, and Supplementary material's subsection Training and testing results for CNN.) Table 4.1 presents the precision, sensitivity, and specificity of our model's prediction results; we find that all these values are high, the lowest being the sensitivity for stage 4, at 96.92%. In conclusion,
our FDDLM predicts patients’ COPD stages with a much higher accuracy than Vanilla DNN and
LSTM models trained with physiological signals (raw data)—without overfitting—and represents
an effective alternative to the spirometry-based diagnostic. (We performed the K-fold analysis of
our model’s accuracy both on a per-recording and a per-patient basis and obtained very similar
results; see the Supplementary Information, section Per-patient based K-fold analysis.)
4.6.2 Hold-out validation
The COPD dataset consists of physiological signals recorded from consecutive patients from four
Pulmonology Clinics in Western Romania (Victor Babe¸s Hospital – VB, Medicover 1 – MD1,
Medicover 2 – MD2, and Cardio Prevent – CP clinics). To validate our FDDLM, we hold out all
data extracted from a single institution as test set and train models on data recorded from the other
three institutions. Following experimental setup from the previous section, we use Vanilla DNN
and LSTM models as baselines with hyper-parameters similar to our FDDLM. Figure 4.7 shows
the results of FDDLM, Vanilla DNN, and LSTM, in terms of accuracy, AUROC, and loss curves
of training and test sets. We train Vanilla DNN and LSTM models on physiological signals (i.e.,
raw data). Conversely, we train our FDDLM on the fractional signatures extracted from the raw
data.
Figure 4.7: Training and testing result comparisons of different deep learning models for the hold-out validation. The training/testing accuracy (a), AUROC (b), and loss (c) for our FDDLM, where the training uses the signal signatures extracted with the fractional dynamics mathematical model. The training/test accuracy (d) and (g), AUROC (e) and (h), and loss (f) and (i) for the Vanilla DNN and LSTM models, where the training uses the physiological signals recorded with portable sleep monitors. Both the Vanilla DNN and LSTM models share similar network structures with FDDLM (i.e., the same neural network structure but a different input size). We obtain these results by holding out the data from every single institution as the test set.
Figure 4.7 presents the training and test results (accuracy, AUROC, and loss) generated by the FDDLM (a-c), the Vanilla DNN model (d-f), and the LSTM model (g-i), respectively. Figure 4.7 (a) shows that the accuracy of FDDLM increases during training without overfitting. Conversely, the training and testing of the Vanilla DNN and LSTM models clearly indicate lower accuracies (80.73% ± 3.46% and 80.83% ± 3.67%, respectively) in Figure 4.7 (d) and (g), while FDDLM achieves 95.88% ± 1.76%.
Figure 4.8: The comparison of confusion matrices resulting from the different deep learning models: fractional dynamics (a-d), Vanilla DNN (e-h), and LSTM (i-l). We built the test sets by holding out the data gathered from one institution (i.e., VB, MD1, MD2, and CP) at a time. The matrix representations clearly show that our model outperforms both Vanilla DNN and LSTM—in all experiments and for all labels representing COPD stages—in terms of prediction errors.
Of note, we observe that the test accuracy under hold-out validation (95.88%) is lower than the accuracy obtained under k-fold cross-validation (for a more detailed error analysis, we present the visualization of the extracted features (embeddings) in the last hidden layer of FDDLM across k-fold and hold-out validation in the Supplementary material). The reason for the performance degradation under hold-out validation is that the data recorded from the medical institutions are imbalanced. Victor Babeş (VB) and Cardio Prevent (CP) are two large clinics, and COPD patients, especially severe and very severe ones, are more willing to seek diagnosis or medical treatment in large units or hospitals rather than in small clinics. Thus, the signals gathered from VB and CP are more comprehensive than those from Medicover 1 (MD1) and Medicover 2 (MD2). In hold-out validation, although we balance the data across institutions using over-sampling and under-sampling approaches, the remaining imbalance in data collection is still the leading cause of the prediction accuracy drop.
Figure 4.8 (a-d), (e-h), and (i-l) show the confusion matrices (we present only the test-set prediction results) for the FDDLM, Vanilla DNN, and LSTM models, respectively, obtained by holding out each institution as the test set. The results shown in Figure 4.8 prove that our model outperforms the baselines in prediction accuracy across the entire hold-out validation process. Especially for early-stage detection (i.e., stages 1 and 2), our model achieves a higher accuracy than the baselines (for the sensitivity, specificity, and precision of these confusion matrices, see Supplementary materials Table S1, Table S2, and Table S3). The reason is that, as opposed to the Vanilla DNN and LSTM models, FDDLM can extract the signal signatures that contain the long-term memory of the time series. We also use the convolutional neural network (CNN) model to characterize the dynamics of the physiological signals recorded with sleep monitors and compare it with our FDDLM; the CNN model misclassified 64.49% of the test sets under hold-out validation. (For detailed information about the CNN model and testing results, see section Experimental, subsection Neural network architecture for the WestRo COPD dataset, and Supplementary material section Training and testing results for CNN.)
In summary, our model outperforms all baselines in terms of prediction accuracy under both
hold-out and k-fold cross-validation. The main conclusion is that FDDLM predicts patients’ COPD
stages with high accuracy and represents an efficient way to detect early COPD stages in suspected
individuals. Indeed, such a low-invasive and convenient tool can help physicians make precise
diagnoses and provide appropriate treatment plans for suspected patients.
4.6.3 Transfer learning
To evaluate our models’ performance, we utilize the transfer learning mechanism to investigate the
generalizability of our FDDLM. As such, we introduce the WestRo Porti COPD dataset. Transfer
learning is a machine learning method that reuses a model designed for analyzing a dataset on
another dataset, thus improving the learner from one domain by transferring information from
another related domain [155]. The medical subjects in the WestRo Porti are consecutive individuals
in the Victor Babeş Hospital records, screened for sleep apnea with the Porti SleepDoc 7 portable
PSG device by recording 6 physiological signals; some individuals are also in various COPD
stages. (For detailed information, see Experimental subsection Data collection). The reasons for
applying our COPD FDDLM are: (1) we want to verify that our model is valid on an external
dataset; (2) we want to test our model’s prediction performance when the medical signal records
are interfered with by another disease (i.e., sleep apnea).
We test FDDLM with the WestRo Porti COPD dataset to check the prediction performance.
Since the WestRo Porti COPD dataset only has 6 signals (whereas our model uses 12 signals), we reduced the input size of the models from 144×1 to 36×1 (i.e., the flattened 6×6 instead of 12×12 coupling matrix), retrained a new FDDLM on the WestRo COPD dataset, and tested it on the WestRo Porti COPD dataset to check the performance. (Note that the WestRo Porti COPD dataset patients are not included in the WestRo COPD dataset.) The prediction accuracy of FDDLM is 90.13% ± 0.89% with fine-tuning. The explanation for the accuracy drop is that: (i) the models were originally designed for analyzing medical records with 12 signals, not 6; (ii) the two datasets were recorded with two different portable devices having different sampling frequencies, which influences the convergence of the coupling matrices; and (iii) the coexisting sleep apnea in the medical records gathered from the WestRo Porti COPD dataset also influences the prediction performance.
4.7 Discussion
COPD is often a silent and late-diagnosed disease, affecting over 300 million people worldwide; hence, its early discovery and treatment are crucial for the patient's quality of life and—ultimately—survival.
The inception of a disease entails a preclinical period where it is asymptomatic and—perhaps—
reversible; ideally, this period includes very early events that can occur even before birth [156].
Early COPD stages do not exhibit evident clinical signs; therefore, conventional spirometry-based
diagnosis becomes improbable. However, the development of biomarkers, including the detection of genetic variants conferring susceptibility to COPD, is a priority. The COPD onset is a phase of early COPD where the disease may express itself with some symptoms, including a minimal
airflow limitation [157]. In this phase, spirometry is insufficient to attain a reliable diagnosis,
which calls for new COPD detection tools.
The current strategy of waiting for surfacing symptoms to signal the disease presence is not
efficient if we want to impact COPD’s natural course. Targeting early COPD stages in younger
individuals could identify those susceptible to rapid disease progression, leading to novel therapies to alter that progression. New validated biomarkers (other than spirometry) of different lung
function trajectories will be essential for the design of future COPD prevention and treatment trials
[158]. Indeed, spirometry with FEV1 may not be the most sensitive test and may have particular
limitations in identifying the early COPD stages. Moreover, impulse oscillometry and specific
airway conductance were able to identify more subtle changes in lung function than traditional
spirometry [159]. Impulse oscillometry can identify abnormalities in patients who report COPD
symptoms but do not have abnormal spirometry [160]. Such complementary diagnostic modalities
could potentially aid in the early recognition of COPD, especially in patients whose symptoms do not match their spirometry results.
The adjustment of current diagnostic approaches and the adoption of alternative modalities
may allow for earlier identification of COPD patients. The period of the most rapid decline in lung
function occurs early, and during this period, different testing strategies, smoking cessation efforts,
and the initiation of treatment may be most beneficial.
This work proposes an alternative, precise diagnostic approach to overcome the conventional
(spirometry-based) method's limitations by using a fractional dynamics deep learning methodology. We used the fractional-order dynamics model to extract the signatures from the physiological signals (recorded by a medical sensor network from suspected patients) and trained a neural network model with these signal signatures to predict the COPD stage. From
a clinical standpoint, our fluctuation profile analysis for physiological signals with relevance in
COPD (see Figure 4.2) shows that multifractality is the fingerprint of healthiness—in healthy people, physiological signals present both short and long-range memory. Conversely, monofractals
or narrow-spectrum multifractals indicate a medical condition. We noticed two exceptions to this
observation; first, the Pulse signal is narrow-spectrum multifractal even in non-COPD (i.e., stage
0) individuals; second, the Nasal Pressure signal is multifractal in both COPD and non-COPD
subjects. The possible explanation for the fact that the Pulse signal is not entirely multifractal in
non-COPD individuals is that—most probably—such subjects may have other medical conditions,
such as sleep apnea or cardiovascular problems, which (along with the associated medication)
altered the short and long-range properties of the signal. (Indeed, all patients were referred for
polysomnography because of suspicion of respiratory disorders; some turned out to be COPD-free,
yet most of them have other respiratory disorders, such as sleep apnea, as well as cardiovascular
conditions.) The Nasal Pressure signal is multifractal in both COPD and non-COPD subjects because it manifests upper airway dynamics, which may be less affected as COPD is an inflammation
(resulting in the narrowing) of the lung airways.
We confirm the results with k-fold cross-validation and hold-out validation and show that our
approach can predict the patients’ COPD stages with high accuracy (98.66% ± 0.45%). The accuracy is particularly high for COPD stages 1–3, suggesting that our method is distinctly efficient
for detecting early-stage COPD. Furthermore, based on the transfer learning validation, we prove
that our model can also achieve high prediction accuracy when the medical signal records are interfered with by another disease (i.e., sleep apnea). Our work makes two main contributions in
medical diagnosis and machine learning fields. First, our fractional dynamics deep learning model
makes a precise and robust COPD stage prediction that can work before the disease onset, making
it especially relevant for primary care in remote areas or geographical regions missing medical
experts, where the vast majority of patients with early and mild COPD are diagnosed and treated
(Although the fractional-order dynamic model performs well in diagnosing COPD, it may not be
generalized in investigating other physiological signals. Indeed, not all physiological signals have
multifractal features (e.g., Ivanov et al. showed that the human gait interstride interval time series
among healthy people do not show multifractality [161]).) Second, we developed a valid fractional deep learning approach that outperforms traditional deep learning models (e.g., DNN, LSTM, CNN) at classifying and analyzing very long time-series raw data. (We provide detailed information explaining why our model can efficiently reduce the learning complexity and achieve a high prediction accuracy in section Experimental, subsection Mutual information analysis.)
Nowadays, conventional spirometry-based diagnosis is the dominant approach to diagnosing COPD. The problem is that it entails many error-prone steps/stages involving human intervention, such that general practitioners or well-trained nurses may misdiagnose suspected
patients (of the 4610 subjects, 96.5% had a valid screening spirometry test [162]). Such a result
emphasizes that training and technique reinforcement are paramount, yet many primary care units
do not have the resources to perform them. In this paper, our fractal dynamics deep learning method
eliminates human intervention (and error) as much as possible; any nurse or MD can place the sensors on the patient's body and turn on the NOX device to record the physiological signals in its local memory. Afterward, we are dealing with a completely automated, computer-based process. The
signal length sufficient for a correct diagnosis is 10 minutes (for detailed information, see Supplementary material, section Convergence of coupling matrix). Therefore, our method is simple, robust, requires little human intervention, and needs only relatively short physiological signal recordings; this also makes it suitable for addressing critical social aspects of healthcare. First,
there is equal opportunity in accessing reliable medical consultation for COPD, especially in areas
with a lower socioeconomic status where people do not have the means to travel to a specialized
state-of-the-art respiratory clinic [163]. With our method, any medical mission in such an area can
efficiently record data from many individuals in need and then process it automatically. Second,
our method abides by the commandments of universal health care amid the COVID-19 pandemic,
as it filters most of the physical interaction entailed by regular spirometry [122]. Although the
MF-DFA methods we use are widely used, they cannot exclude that the focus points are due to bimodality, multifractal noise, or mere monofractality. Hence, in future work, we plan to employ the
robust multifractal analysis developed by Mukli et al. [128] to analyze the physiological signals
and develop a new machine-learning framework to improve the robustness of predicting COPD
stages.
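To make the multifractality caveat above concrete, the following is a minimal MF-DFA sketch in Python, not the exact pipeline used in this chapter: it estimates the generalized Hurst exponents h(q), whose strong dependence on q signals multifractality, while an approximately constant h(q) suggests monofractality. The scale range, the linear detrending, and the white-noise test signal are illustrative assumptions.

import numpy as np

def mfdfa(signal, scales, qs):
    # MF-DFA: integrate the series, detrend it in windows of size s,
    # and fit the scaling of the q-th order fluctuation function F_q(s).
    profile = np.cumsum(signal - np.mean(signal))
    hq = []
    for q in qs:
        F = []
        for s in scales:
            n_seg = len(profile) // s
            rms = []
            for v in range(n_seg):
                seg = profile[v * s:(v + 1) * s]
                t = np.arange(s)
                trend = np.polyval(np.polyfit(t, seg, 1), t)  # linear detrending
                rms.append(np.mean((seg - trend) ** 2))       # squared fluctuation
            rms = np.asarray(rms)
            if q == 0:  # q = 0 requires a logarithmic average
                F.append(np.exp(0.5 * np.mean(np.log(rms))))
            else:
                F.append(np.mean(rms ** (q / 2.0)) ** (1.0 / q))
        # h(q) is the slope of log F_q(s) versus log s
        hq.append(np.polyfit(np.log(scales), np.log(F), 1)[0])
    return np.asarray(hq)

scales = np.unique(np.logspace(4, 9, 12, base=2).astype(int))
qs = np.linspace(-5, 5, 11)
x = np.random.randn(2 ** 12)  # white noise: h(q) should stay near 0.5 for all q
print(mfdfa(x, scales, qs).round(2))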
Chapter 5
Deciphering network science in brain-derived neuronal cultures
Understanding the mechanisms by which neurons create or suppress connections to enable communication in brain-derived neuronal cultures can inform how learning, cognition, and creative behavior emerge. While prior studies have shown that neuronal cultures possess self-organized criticality properties, we further demonstrate that in vitro brain-derived neuronal cultures exhibit a self-optimization phenomenon. More precisely, we analyze the multiscale neural growth data obtained from label-free quantitative microscopic imaging experiments and reconstruct the in vitro neuronal culture networks (microscale) and neuronal culture cluster networks (mesoscale). We investigate the structure and evolution of neuronal culture networks (NCN) and neuronal culture cluster networks (NCCN) by estimating the importance of each network node and their information flow. By analyzing the degree, closeness, and betweenness centrality, the node-to-node degree distribution (informing on neuronal interconnection phenomena), the clustering coefficient/transitivity (assessing the "small-world" properties), and the multifractal spectrum, we demonstrate that murine neurons exhibit self-optimizing behavior over time with topological characteristics distinct from existing complex network models. The time-evolving interconnection among murine neurons optimizes the network information flow, network robustness, and degree of self-organization. These findings have important implications for modeling neuronal cultures and, potentially, for the design of biologically inspired artificial intelligence.
5.1 Introduction to NCN evolution and properties
Current research in neuroscience models the brain as a dynamic complex network whose connections change continuously as we advance through life[164, 165]. Consequently, there is significant
motivation for understanding the mechanisms by which neurons create or suppress connections to
enable hierarchical parallel processing in the brain and explaining how learning, cognition and creative behavior emerge [166, 167, 168, 169, 170, 171]. Moreover, the brain connections are thought
to obey a constrained optimization, such as maximization of information processing capacity (efficiency) while minimizing the energy expenditure [172].
Motivated by these challenges, there is a growing effort to analyze the evolution and emergence of connectivity and its implications for information processing in both in vitro neural cultures and live brain sensing. For instance, prior efforts investigating the morphological evolution of
assemblies of living neurons showed that cultured neurons self-organize and form complex neural
networks that exhibit a small-world structure (a network with many highly interconnected clusters
with few long-range connections among clusters) [19]. Moreover, Okujeni et al. [18] investigated
the impact of neuron clustering (by modulating the protein kinase C) on the spontaneous activity
in neuronal culture networks and showed that higher clustering contributed to synchronous bursting in some parts of the neuronal culture networks. Besides analyzing neuronal cultures at the
macroscale and mesoscale, pioneering efforts that combined functional magnetic resonance imaging (fMRI) based on blood oxygen level-dependent (BOLD) contrast with bulk calcium indicator signal measurement made it possible to investigate in vivo the neuronal and glial activity coupling in the rat somatosensory cortex [173].
In this study, relying on label-free quantitative microscopic imaging of rat and mouse neurons (with a resolution 10,000 times higher than MRI technology), we reconstruct the neuronal culture networks (constructed from rat neurons) and neuronal culture cluster networks (constructed from mouse neurons) and analyze their topological properties in order to elucidate how neurons generate new connections and connect with each other over time. In the brain-derived neuronal culture
network, the somas and the neurites represent the nodes and the edges, respectively. In the brain-derived neuronal culture cluster network, the neuronal clusters and the cluster neurites (between
two different clusters) represent the vertices and their corresponding connections. By analyzing
the structure and evolution of neuronal culture network and neuronal culture cluster networks, we
demonstrate that these neuronal culture networks exhibit a unique self-optimization and assortative connectivity behavior, as well as a peculiar multifractal structure that cannot be captured by
existing complex network models [174, 175, 176, 177]. These findings suggest that a new class of mathematical models and algorithmic tools needs to be developed for describing the interwoven, time-varying nature of the neuronal culture's information processing, for understanding how these dynamic networks are controlled, and for explaining the mechanisms of spontaneous activity in the evolution of neuronal culture networks.
5.2 Neuronal interconnections exhibit assortative behavior
In this section, we investigate neuronal culture networks (constructed from rat neurons) and neuronal culture cluster networks (constructed from mouse neurons). We first present the generated neuronal culture networks' layouts in Figure 5.1 and Supplementary Figure S1. Figs. 5.1(a-c) present the neuron images from the SLIM imaging experiments (for details on the SLIM imaging experiments, see Methods sections "Sample preparation" and "Microscopy"), Figs. 5.1(d-f) show a zoomed portion of the middle region of the neuron images, and Figs. 5.1(g-i) present the neuronal culture networks' layouts after executing our tracing algorithm (the Methods section "Cell segmentation and neural tracing" provides detailed information). Different colors represent the different identifications of each neuron and neurite. Figs. 5.1(j-l) show a network representation of the neurons with their spatial positions altered. In addition, we also analyze neuronal clusters obtained through a similar procedure as in Teller [178], while using the higher resolution provided by SLIM imaging, which decouples amplitude artifacts from highly detailed cellular information.
Figure 5.1: Layouts for neuronal culture networks at three representative time points. Neurons at the start
of the experiment at time t = 0 hours (a), t = 7 hours (b) and the end of the experiment t = 14 hours (c).
Magnified views of the neurons at t = 0 hours (d), t = 7 hours (e) and t = 14 hours (f). Figures 5.1(g), 5.1(h) and 5.1(i) show the identified neurons and their connections obtained with our algorithm (see Methods
section on "Cell segmentation and neural tracing") for the three corresponding time points (each neuron
and neurite is identified by a unique color). After constructing the adjacency matrices from the tracing and
segmentation algorithm, the visualization of the network layouts at t = 0 hours (j), 7 hours (k) and 14 hours
(l) are presented.
The neural computation emerges from the complex dynamic interconnection patterns and signaling among neurons. Consequently, to decode the complexity of the dynamic neuronal interconnections, we first investigate the node-to-node degree distribution. While the degree distribution [179]
captures the first-order statistic of a complex network, the node-to-node degree distribution offers
second-order statistical information and explains how a node of a specific degree connects to lower
or higher degree nodes. To study the second-order statistics of the networks of neurons and neuronal clusters, we consider three consecutive snapshots (i.e., after 0, 7, and 14 hours) and estimate
for each target node the degree distribution of its neighbors. For example, if a neuron with degree 5
connects with another one with degree 4, we add 1 on the coordinate (5,4) in the 2D node-to-node
degree distribution plot. Fig.5.2(a) illustrates an artificially generated network example. The dotted lines represent new connections to this artificial network after a period of time. Figures 5.2(b)
and (c) illustrate the node-to-node degree distribution for the artificial network example without
and with the dotted links, respectively. In Figure 5.2(d-i), the x-axis represents the neuron degree
and the y-axis represents the degree of its neighbor. The rationale for constructing the node-to-node degree distribution plot is twofold: (i) in each separate graph, we can find the tendency of a neuron with a certain degree to connect to neurons of lower, the same, or higher degree; (ii) in the
neuronal culture networks, the degree varies due to informational exchanges over the new neural
connections. The length of the neurites grows over time. However, a neurite cannot be recorded
as an edge in our network before its axon terminal connects with another neuron. Investigating
the degree distribution at different time points can help us learn how the neurites grow and how
neuronal culture networks construct new connections.
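As a concrete rendering of this construction, a minimal sketch follows; the networkx graph and the synthetic small-world example are stand-ins for the reconstructed neuronal culture networks, and the function name is ours.

import numpy as np
import networkx as nx

def node_to_node_degree_matrix(G):
    # Entry (i, j) counts edges whose endpoints have degrees i and j; an edge
    # between a degree-5 and a degree-4 node increments (5, 4) and (4, 5).
    deg = dict(G.degree())
    kmax = max(deg.values())
    M = np.zeros((kmax + 1, kmax + 1))
    for u, v in G.edges():
        M[deg[u], deg[v]] += 1
        M[deg[v], deg[u]] += 1  # symmetric for an undirected network
    return M

G = nx.watts_strogatz_graph(200, 8, 0.1, seed=0)
M = node_to_node_degree_matrix(G)
print("most frequent degree pairing:", np.unravel_index(np.argmax(M), M.shape))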
By analyzing the node-to-node degree distribution for the network of neurons (i.e., Figs. 5.2(d-f)) and the network of neuronal clusters (i.e., Figs. 5.2(g-i)) across time, we observe: (i) At the beginning (t = 0 hours), based on Figs. 5.2(d) and 5.2(g), the network of neurons and the network of neuronal clusters display a preferential attachment (PA) [180] phenomenon in the sense
that neurons or clusters tend to connect to nodes of the same degree.

Figure 5.2: Degree distribution for neurons and neural clusters. An artificial network example (a), where the yellow node has the highest degree centrality, the red node has the highest closeness centrality, and the green node has the highest betweenness centrality (with solid lines and dotted lines). The node-to-node degree distribution for the network example (a) without (b) and with (c) the additional dotted lines, in order to mimic the connectivity phenomena observed in neural cluster networks, where the color bars present the occurrence frequency in the node-to-node matrices and the red circles with connection pairs mark the coordinates of the peak values in the matrix. The node-to-node degree distribution for the neuronal culture networks at the start of the experiment t = 0 hours (d), after 7 hours (e), and at the end of the experiment after 14 hours (f). The node-to-node degree distribution for the neuronal culture cluster networks at the start of the experiment t = 0 hours (g), after 7 hours (h), and at the end of the experiment after 14 hours (i).

In the network of neurons, the most frequent connection pattern corresponds to neurons of degree 11 that also connect to neurons with 11 connections. In the network of neuronal clusters, the most frequent connection
pattern corresponds to nodes with 4 links that also connect to nodes of the same degree. (ii) After 7 hours, from Figs. 5.2(e) and 5.2(h), we observe a discrepancy between the neuronal culture
network and the neuronal culture cluster network. The network of neurons displays three peaks
corresponding to the following cases: neurons of degree 16 tend to connect to other neurons of
degree 16, neurons of degree 10 connect to other neurons with degree 11, and neurons of degree
11 connect to other neurons of degree 10. Also, the network is evolving and displays an increasing connectivity. In contrast, the neuronal culture cluster network exhibits only two peaks: the
neuronal clusters (nodes) of degree 5 connect to other nodes of degree 5 and nodes of degree 8
connect to other clusters of degree 8. (iii) After 14 hours, from Figs. 5.2(f) and 5.2(i), the network
of neurons displays a two peak pattern (i.e., nodes of degree 14 are more likely to connect to nodes
with degree 14 and nodes of degree 19 also prefer to connect to other nodes with degree 19) and
the network of neuronal clusters shows a single peak (i.e., the nodes of degree 8 are more likely to
connect with other nodes with degree 8). We also observe from all these plots that the distribution of neurons and their neighbors' degrees has a higher density along the diagonal. One can explain
this phenomenon by the existence of multiple communities of neurons within which each neuron
is likely to be fully connected with its community members (Figs. 5.1(j-l)); consequently, most of
the neurons within the same community can have a similar degree. Furthermore, we can observe
from all plots in Figs. 5.2 that the regions with a higher occurrence frequency of node-to-node
distribution move and spread along the diagonal as time goes on. This implies that over time more
edges are generated and nodes gain in degree. We conclude that a node has a high probability
of connecting to other neurons that have the same degree. The assortativity coefficient represents
the tendency for nodes to connect to other nodes with similar properties within a network [178].
Thus, we calculate the assortativity coefficient of a single neuronal culture network and a single
neuronal culture cluster network for three consecutive snapshots in Table 5.1. Based on Table 5.1,
all the assortativity coefficients are positive, and both the neuronal culture and neuronal culture cluster networks show a decreasing trend in assortativity.
Model                            | Start (t = 0 hours) | Middle (t = 7 hours) | End (t = 14 hours)
neuronal culture network         | 0.5118              | 0.4400               | 0.4258
neuronal culture cluster network | 0.3153              | 0.2663               | 0.2109

Table 5.1: The assortativity coefficient for neuronal culture networks and neuronal culture cluster networks in consecutive snapshots.
Based on the Fig. 5.2 and Table 5.1 results, we conclude that (i) a neuron has a high probability of connecting to other neurons that have the same degree, and (ii) over time, neurons set up new connections with neighboring neurons of similar degree.
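For reference, the coefficients in Table 5.1 correspond to the standard degree assortativity (the Pearson correlation of the degrees at the two ends of each edge) and can be computed directly with networkx; in this minimal sketch, the synthetic graphs are stand-ins for the three reconstructed snapshots.

import networkx as nx

# Stand-ins for the reconstructed snapshots at t = 0, 7, and 14 hours.
snapshots = {t: nx.watts_strogatz_graph(300, 10, 0.05, seed=t) for t in (0, 7, 14)}

for t, G in snapshots.items():
    # Positive values indicate assortative mixing by degree.
    r = nx.degree_assortativity_coefficient(G)
    print(f"t = {t:2d} h: assortativity r = {r:+.4f}")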
5.3 Optimizing Flow & Robustness in Neuronal Networks
Neural computations governing sensory processing, perception, and decision-making emerge from the information transfer across interwoven, time-varying complex neuronal culture networks [181]. To investigate the performance of information transfer from biological data consisting of only snapshots of microscale neuronal culture networks and mesoscale neuronal culture cluster networks, we quantify their degree centrality, closeness centrality, and betweenness centrality [182]. Generally speaking, centrality measures the importance of a node across a heterogeneous complex
network. For instance, in a social network, the influencer nodes (e.g., politicians, TV stars) have a
large number of followers and hence are capable of propagating specific messages faster than other
network nodes. The degree centrality measures the number of links incident upon a node and can
be related to the localized network transport or throughput capacity. The closeness centrality of a
node quantifies the average length of the shortest path between the target node and all other nodes
in the graph and encodes information about the information transmission latency across a specific
network topology. The betweenness centrality measures the number of times a node appears along the shortest paths between all pairs of nodes. The higher the betweenness centrality of a node,
the more information paths pass through it and the less robust the network is to targeted attacks on
this node (for details on the degree-, closeness-, and betweenness-centrality, see Methods section
"Networks Centrality"). Fig.5.2 (a) shows an artificial network example (with additional connections) where the red node has the highest closeness centrality, the yellow node has the highest
degree centrality, and the green node has the highest betweenness centrality (Supplementary Table
S1 and S2 exhibits the degree-, closeness-, and betweenness centrality for each nodes in Fig. 5.2
(a)).
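For concreteness, the three centrality measures can be computed with standard graph tools; a minimal sketch on a synthetic stand-in graph (not the reconstructed cultures) follows.

import numpy as np
import networkx as nx

G = nx.barabasi_albert_graph(200, 3, seed=1)  # stand-in for one network snapshot

deg = nx.degree_centrality(G)       # localized transport / throughput capacity
clo = nx.closeness_centrality(G)    # inverse average shortest-path distance
bet = nx.betweenness_centrality(G)  # fraction of shortest paths through a node

for name, c in [("degree", deg), ("closeness", clo), ("betweenness", bet)]:
    vals = np.array(list(c.values()))
    print(f"{name:>11}: mean = {vals.mean():.4f}, top node = {max(c, key=c.get)}")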
Fig. 5.3 illustrates the cumulative distribution function (CDF) curves of the degree centrality, closeness centrality and betweenness centrality estimated for three consecutive snapshots of a
single brain-derived neuronal culture network and a single brain-derived neuronal culture cluster
network (Supplementary Figures S2 and S3 present the histograms and smoothed curves of the degree, closeness, and betweenness centrality for the neuronal culture network and the neuronal culture cluster network). For instance, over the course of the experiment, the CDF curves of the degree
centrality in Fig. 5.3(a) of the microscopic neuronal culture networks exhibit a shift towards higher degree centrality values. This is best reflected in Fig. 5.3(g), where the average degree centrality shows an increasing trend. At the higher scale of the network of neuronal clusters and the same three time points, the CDF curves of degree centrality exhibit a more pronounced shift to higher values (see Fig. 5.3(d)). The higher a node's degree centrality, the higher its chance of receiving the information passed over the network. These results demonstrate that the networks of
neurons and neuronal clusters tend to optimize the degree centrality and support higher information transmission across the network over time. Neuronal culture networks achieve this increase
in degree centrality by growing connections, while the network of neuronal clusters increases its degree centrality through the merging of clusters and connection inheritance.
Along the same lines, Fig. 5.3(b) for the network of neurons and Fig. 5.3(e) for the network of
neuronal clusters show that the CDF curves of closeness centrality shift to the right (higher values). This trend can also be observed in Figure 5.3(h), where the average closeness centrality has an upward tendency.
Figure 5.3: Changes of degree, closeness, and betweenness centrality in consecutive neuronal culture networks and neuronal culture cluster networks. The CDF curves of the degree centrality (a),
closeness centrality (b) and betweenness centrality (c) for neuronal culture networks for three times t = 0,
7, and 14 hours. The CDF curves of the degree centrality (d), closeness centrality (e) and betweenness
centrality (f) for neuronal culture cluster networks for three times t = 0, 7, and 14 hours. The average degree
centrality (g), average closeness centrality (h) and average betweenness centrality (i) for neuronal culture
networks for 15 time points within the 14 hours experiment.
Figure 5.4: Comparison of clustering indices between the neuronal culture networks and model-based
randomly constructed networks of the same size. (a) The comparison (errors) in terms of the transitivity
between the neuronal culture networks and the RR, ER, WS, BA, SSF, and WMG based generated networks
(for each model we generated 1000 network realizations) for the 14 hours experiment. (b) The comparison
(errors) in terms of average clustering coefficient between the neuronal culture networks and the RR, ER,
WS, BA, SSF, and WMG based constructed networks (for each model we generated 1000 networks) within
14 hours. (c) The comparison (errors) in terms of average square clustering coefficient between the neuronal
culture networks and the RR, ER, WS, BA, SSF, and WMG based networks during the 14 hours experiment.
The higher the closeness centrality of a node, the less time it takes for this node to reach all other nodes. Consequently, these results show that the network of neurons
and the network of neuronal clusters tend to optimize the closeness centrality and minimize the
information transmission latency. By comparing the dynamics of the network of neurons with that
of the network of neuronal clusters, we observe that the location of the peak in the closeness centrality CDF curves roughly doubles in magnitude.
The analysis of the betweenness centrality CDF curves and the average betweenness centrality
shows a decreasing tendency for both the networks of neurons (Fig. 5.3(c)) and the network of
neuronal clusters (Fig. 5.3(f)). A lower node betweenness centrality means that the node appears
fewer times along the shortest paths among all network nodes. Of note, during the course of the
experiment, we observe that some neurons die and are deleted from the network. If a neuron with
a high betweenness centrality is removed from the network, then the network has a higher chance
of becoming disconnected. However, since the average betweenness centrality is decreasing over
time, the dying neurons have a lower probability of causing network disconnection. Thus, we
conclude that the networks of neurons and neuronal clusters tend to minimize the betweenness centrality, which can increase the robustness of the network against cascading failures. In summary, the analysis of degree centrality, closeness centrality, and betweenness centrality shows that the networks of neurons and neuronal clusters tend to optimize the network information transfer.

Figure 5.5: Variations of interconnections between two neighboring neurons. The exceedance probability for the length of connections between two neurons (i.e., the probability of observing the length of a connection between two neurons exceeding a certain threshold) at the start of the experiment t = 0 hours (a), after 7 hours (b), at the end of the experiment after 14 hours (c), and the comparison of (a-c) shown in (d).
5.4 Clustering analysis in NCN
In this section, we investigate whether the network of neurons can be well described by existing complex network models (i.e., Random Regular (RR) [174], Erdos-Renyi (ER) [175], Watts-Strogatz (WS) [176], Barabasi-Albert (BA) [177], Spatial Scale-Free (SSF) [183, 184], and Weighted Multifractal Graph (WMG) [185]) by generating artificial networks of the same size (in terms of number of nodes and edges) according to these models for all the considered time points (within 14 hours) and computing their transitivity, clustering coefficient, and square clustering coefficient metrics (see Figure 5.4). The RR network is defined as a random d-regular network on n nodes; the ER network is the G(n, p) model, where n is the number of nodes and p is the linking probability; the WS model is a random network model with small-world properties; the BA model generates scale-free networks characterized by a power-law degree distribution; and the SSF model produces spatial scale-free networks in which the probability of an incoming node $i$ setting up a connection with an existing node $j$ is $p_{i \to j} \propto k_j \exp(-d_{ij}/r_c)$, where $d_{ij}$ is the distance between $i$ and $j$, $k_j$ is the degree of node $j$, and $r_c$ is a control parameter.
Figure 5.6: Multifractal analysis of neuronal culture networks and neuronal culture cluster networks. (a)
Multifractal spectrum f(α) as a function of Lipschitz-Holder exponent α for neuronal culture networks in t
= 0, 7, and 14 hours. (b) Generalized fractal dimension D(q) as a function of q-th order moment for neuronal
culture networks in t = 0, 7, and 14 hours. (c) Multifractal spectrum f(α) as a function of Lipschitz-Holder
exponent α for neuronal culture cluster networks in t = 0, 7, and 14 hours. (d) Generalized fractal dimension
D(q) as a function of q-th order moment for neuronal culture cluster networks in t = 0, 7, and 14 hours.
Besides these well-known complex network models, since the neuronal culture network and neuronal culture cluster network possess multifractal
characteristics (for details on this conclusion, see Results section “neuronal culture networks and
neuronal culture cluster networks possess multifractal characteristics” and methods section “Multifractal analysis”), we also investigated whether the WMG model can provide a better fit for the
considered neuronal culture networks. The WMG model captures and generates weighted multifractal networks by mapping from recursively constructed measures of linking probability. The
transitivity T of a graph is based on the relative number of triangles in the graph, compared to the
total number of connected triples of nodes (also known as the global clustering coefficient). The
clustering coefficient measures the degree to which the network nodes connect to each other. The
square of the clustering coefficient quantifies the cliquishness in bipartite networks where triangles
are absent (for details on the transitivity, clustering coefficient, and square of the clustering coefficient, see Methods section "Clustering Coefficient"). We investigate the clustering coefficient because we observe that the neurons tend to organize into various communities over time (see Figures 5.1(j-l)); thus, it is likely that the neuronal culture networks have "small-world" [176] properties. As one can observe from Figure 5.4, the values of the transitivity and clustering coefficient metrics for the network of neurons are significantly higher than those of the artificially generated networks corresponding to the first five above-mentioned models (to compare the biological networks against existing network models, for each time point we compute the average and the 95% confidence intervals of the transitivity, clustering coefficient, and averaged square clustering coefficient from 1000 artificially generated network realizations). Alternatively stated, the neurons tend to form communities with very different topological structures than the well-known RR, ER, WS, BA, or SSF models. In contrast, the fitting of the WMG model shows smaller errors in terms of the transitivity, clustering coefficient, and averaged square clustering coefficient metrics.
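A minimal sketch of this model-comparison protocol follows; the observed graph is a synthetic stand-in, the realization count is reduced from 1000 for brevity, and the SSF and WMG models are omitted because they are not packaged in networkx.

import numpy as np
import networkx as nx

def clustering_metrics(G):
    return (nx.transitivity(G),
            nx.average_clustering(G),
            np.mean(list(nx.square_clustering(G).values())))

G_obs = nx.watts_strogatz_graph(150, 8, 0.1, seed=0)  # stand-in for one snapshot
n, m = G_obs.number_of_nodes(), G_obs.number_of_edges()

models = {  # size-matched (approximately, for BA/WS) null models
    "ER": lambda: nx.gnm_random_graph(n, m),
    "BA": lambda: nx.barabasi_albert_graph(n, max(1, m // n)),
    "WS": lambda: nx.watts_strogatz_graph(n, max(2, 2 * m // n), 0.1),
}

obs = np.array(clustering_metrics(G_obs))
for name, gen in models.items():
    samples = np.array([clustering_metrics(gen()) for _ in range(50)])
    err = np.abs(samples.mean(axis=0) - obs)  # |model mean - observed|
    print(name, "errors (T, C, C_square):", err.round(3))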
After investigating the structural characteristics of the neuronal culture network, we analyze the spatial organization, or metric correlations, of neuronal culture networks based on their topology. We measure the functional relationship between the Euclidean distance of neighboring neurons and the number of interconnections among them. In Fig. 5.5, we calculate the probability that the length of a connection between two neurons exceeds a certain threshold (the exceedance probability) for different timestamps (t = 0, 7, and 14 hours). We find that larger Euclidean distances (thresholds) have lower probabilities. These results indicate that physically close neurons have more connections than physically distant ones. Furthermore, we observe that the distance between any two neurons in the same community decreases, while
the number of edges in each community increases. These observations corroborate the conclusions drawn from analyzing the degree centrality and closeness centrality. In summary, we
conclude that the network of neurons (1) possesses a network generator that is different from the RR, ER, WS, BA, and SSF models, (2) has more interconnections between physically closer neurons, (3) has the tendency to self-optimize in order to enable and support higher, faster, and more robust information transmission, and (4) exhibits multifractal topological characteristics.

Parameter           | neuronal culture networks    | neuronal culture cluster networks
                    | Start    Middle    End       | Start    Middle    End
f(α)max             | 1.6220   1.5892    1.5690    | 1.4472   1.3213    1.1821
αmax − αmin         | 1.0661   1.1974    1.0561    | 0.9849   0.9708    0.8548
f(α)max − f(α)min   | 0.8840   0.8057    0.8211    | 1.0071   0.9581    0.7426

Table 5.2: The parameters of the multifractal spectrum for neuronal culture networks and neuronal culture cluster networks.
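Returning to Fig. 5.5, the exceedance probability reduces to one empirical estimate per threshold; in the minimal sketch below, the connection lengths are synthetic placeholders for the traced neurite lengths.

import numpy as np

def exceedance_probability(lengths, thresholds):
    # P(L > x): fraction of connection lengths exceeding each threshold x.
    lengths = np.asarray(lengths)
    return np.array([(lengths > x).mean() for x in thresholds])

rng = np.random.default_rng(0)
lengths = rng.exponential(scale=50.0, size=1000)  # placeholder lengths
thresholds = np.linspace(0, 200, 5)
for x, p in zip(thresholds, exceedance_probability(lengths, thresholds)):
    print(f"P(L > {x:5.1f}) = {p:.3f}")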
5.5 Multifractal network analysis
Previous works [186, 187, 188] have argued that brain intelligence is correlated with regional gray matter volume and the tissue microstructure of white matter. Here, we adopt an alternative topological perspective on the correspondence between neuronal connectivity complexity
and intelligence and analyze the multifractal characteristics of neuronal culture networks. To comprehensively observe the structural complexity and heterogeneity of neuronal culture networks and
neuronal culture cluster networks, we use the finite box-covering algorithm[189] and estimate their
multifractal spectrum and generalized fractal dimension (for details on the multifractal analysis and
box-covering algorithm, see Method section "Multifractal analysis"). Multifractal analysis (MFA)
applies a distorting exponent q to the probability measure at different observation scales and can
quantify the structural characteristics of networks by comparing how the network behaves at each
distortion. MFA provides information about the heterogeneous self-similarity of our networks and
can help us identify changes in their topological heterogeneity over time. By observing the multifractal spectrum f(α) as a function of the Lipschitz-Holder exponent α, we can capture the variation in scaling behaviors of different subcomponents of the network. Equivalently, this variation could
be observed through the generalized fractal dimension D(q) as a function of the order q. In the multifractal spectrum, the larger the α, the higher the density of the self-similar structures in the network; the larger the f(α), the greater the number of self-similar structures in the network; and the larger the width (αmax − αmin), the more diverse the fractal structures in the network.
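To make the q-distortion explicit, the sketch below computes the mass exponents τ(q), the exponents α(q), the spectrum f(α), and D(q) from box masses via log-log regression and a Legendre transform. The box_masses input is assumed to come from a box-covering routine (not shown here), and the binomial cascade is a synthetic test case with a known multifractal spectrum.

import numpy as np

def multifractal_spectrum(box_masses, qs):
    # box_masses: {scale s: array of box probabilities p_i(s) summing to 1}
    scales = np.array(sorted(box_masses))
    logs = np.log(scales)
    tau, alpha = [], []
    for q in qs:
        logZ, num = [], []
        for s in scales:
            p = box_masses[s]
            Z = np.sum(p ** q)                    # partition function Z(q, s)
            mu = p ** q / Z                       # normalized q-measure
            logZ.append(np.log(Z))
            num.append(np.sum(mu * np.log(p)))
        tau.append(np.polyfit(logs, logZ, 1)[0])  # mass exponent tau(q)
        alpha.append(np.polyfit(logs, num, 1)[0]) # Lipschitz-Holder alpha(q)
    tau, alpha = np.array(tau), np.array(alpha)
    f = qs * alpha - tau                          # Legendre transform f(alpha)
    with np.errstate(divide="ignore", invalid="ignore"):
        D = tau / (qs - 1)                        # generalized dimension (q != 1)
    return alpha, f, D

# Synthetic test: binomial cascade with weights 0.7/0.3 (known multifractal).
masses, box_masses = np.array([1.0]), {}
for level in range(1, 9):
    masses = np.concatenate([0.7 * masses, 0.3 * masses])
    box_masses[2.0 ** -level] = masses
qs = np.linspace(-4, 4, 17)
alpha, f, D = multifractal_spectrum(box_masses, qs)
print("spectrum width (alpha_max - alpha_min):", alpha.max() - alpha.min())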
Applying the MFA for the neuronal culture network (i.e., Figures 5.6(a), 5.6(b) and Table 5.2)
and the neuronal culture cluster network (i.e., Figs. 5.6(c), 5.6(d) and Table 5.2) across time, we
observe: (i) Fig. 5.6 shows that the neuronal culture networks and the neuronal culture cluster
networks possess multifractal properties. By comparing their multifractal spectrum parameters
summarized in Table 5.2, we conclude that the f(α)max and the width of the spectrum of the neuronal culture network are larger than those of the neuronal culture cluster network. Consequently,
the neuronal culture networks have stronger multifractality, which means stronger heterogeneity
and higher complexity. (ii) From Figs. 5.6(a) and 5.6(b), we can see that although the number of
edges of the neuronal culture network increases across time, its multifractal spectrum and generalized fractal dimension show only small changes, without monotonic variation over time. The
spectrum has no tendency to move over time, which shows the common self-similar structures of
the neuronal culture network do not change. Therefore we can conclude that our neuronal culture
network has a relatively stable multifractal structure, which means even if neurons generate new
connections over time, the self-similar structures of the neuronal culture network do not change
much. (iii) From Figs. 5.6(c) and 5.6(d), we can see different trends from the neuronal culture
network. The multifractal spectrum and the generalized fractal dimension of the neuronal culture cluster network exhibit a monotonic pattern over time. The result shows that the multifractal
spectrum moves down to the left, which means the common self-similar structures of the neuronal
culture cluster network become less dense. The width of the spectrum and the generalized fractal dimension decrease across time, which means the self-similar structures become more concentrated, so the diversity of the network structure decreases with time. This is because, in the neuronal culture cluster network, the clusters move and sometimes join to form larger clusters. This continuous merging behavior brings structural changes that reduce the heterogeneity of the neuronal
culture cluster network as our results show.
5.6 Discussion
By adopting a complex network characterization, we find that brain-derived neuronal culture networks and neuronal culture cluster networks of rats and mice exhibit a network flow self-optimization phenomenon (i.e., higher information transmission, latency reduction, and robustness maximization over time), achieved either by growing connections or via the merging of neuronal clusters.
This analysis complements and contributes to earlier studies that showed the existence of self-organized criticality and of a small-world state, and that higher clustering leads to spontaneous bursting in parts of the neuronal culture networks [18, 19]. Future work should investigate whether
the self-organized criticality is goal-driven and contributes to the observed self-optimization phenomenon. Furthermore, we concluded that neuronal interconnection architecture displays assortative behavior. To elucidate the mechanisms by which neurons create or suppress connections to
enable communication in brain networks and understand their role in learning, cognition, and creative behavior, future studies should combine the complex sensing approach [173] of probing the
neuron and glial cell activity coupling with network science concepts and tools presented in this
study. In addition, our clustering analysis demonstrates that the network model characterizing the
brain-derived neuronal culture networks does not fit the Random Regular (RR)[174], Erdos-Renyi
(ER)[175], Watts-Strogatz (WS)[176], Barabasi-Albert (BA)[177], and Spatial Scale-free (SSF)
network models[183, 184]. In contrast, the weighted multifractal graph model[185] provides the
best fit (smallest error) in terms of matching the clustering, transitivity, and square clustering coefficients. Finally, by analyzing the spatial properties associated with the topology of the monitored
neuronal culture networks we observe that closer neurons have more interconnections among them
than the distant ones.
Current neuroscience studies [190, 191] discuss the importance of investigating in vitro neuronal cultures as an efficient system to model neural activity, as well as the role of understanding the effects of spatial embedding and metric correlations on connectivity and activity in neuronal culture networks [192]. Along these lines, our proposed combined network science framework and image processing tool can be further employed for analyzing the interactions and metabolic coupling
between neurons and glial cells (e.g., astrocytes), either via fMRI sensing [173] or the enhanced quantitative phase imaging approach used in this work for live monitoring of neurons and glia.
With the goal of investigating the pulsation of in vitro neuronal cultures, Orlandi et al. [190] showed
that neuronal spiking behavior can originate from a random set of spatial locations specific to each
culture and is modulated by a nontrivial interdependence between topology and neural dynamics.
To study the spatial arrangement of neurons in neuronal cultures, a random-field Ising-inspired model [193] showed that metric correlations dominate the neuronal topological properties. Tibau
et al.[194] extracted the effective connectivity of neuronal cultures from the spontaneous activity
of calcium fluorescence imaging recordings and observed an increase in average connectivity over
time and various degrees of assortativity. This body of work suggests that the spontaneous activity
in the mammalian brain plays a fundamental role in brain development, information transmission,
and communication of different brain regions and provides a new research direction to investigate
the functional relationship between the evolution of the neuronal culture networks (with multifractal characteristics) and neuronal spiking activities.
In this work, we investigated the mathematical properties of brain-derived neuronal culture
networks and brain-derived neuronal culture cluster networks (by precisely locating and detecting each axon and dendrite with 0.03 nm optical path-length accuracy [195]), which provides a way to analyze the spontaneous evolution of neuronal cultures in their early stages (i.e., 14 hours).
Furthermore, future studies should characterize and distinguish between healthy and unhealthy behavior of neurons (e.g., glioblastoma/brain tumor), as well as identify the degree of toxicity of cultures. Moreover, future mathematical analysis of neuronal culture networks can also help
us understand how neurons connect to guide the information flow as we recall the past, envision
the future, or make social inferences, and to model perception, inference, generalization, and decision making. Lastly, by explaining the mechanisms of cognitive control emerging from multiscale neuronal culture networks, we can identify new biologically inspired strategies for designing deep
learning architectures.
Chapter 6
Enhancing neural network performance with leader-follower
architecture and local error signals
Artificial neural networks (ANNs) typically employ global error signals for learning [196]. While
ANNs draw inspiration from biological neural networks (BNNs), they are not exact replicas of
their biological counterparts. ANNs consist of artificial neurons organized in a structured, layered architecture [197]. Learning in such architectures commonly involves gradient descent algorithms [198] combined with backpropagation (BP) [199]. Conversely, BNNs exhibit more intricate
self-organizing connections, relying on specific local connectivity [200] to enable emergent learning and generalization capabilities, even with limited and noisy input data. We can conceptualize
a group (collective) of neurons as a collection of workers, wherein each worker receives partial information, generates an output, and transmits it to others to achieve a specific collective objective.
This behavior can be observed in various biological systems, such as decision-making among a
group of individuals [201], flocking behavior in birds to avoid predators and maintain flock health
[202], or collective behavior in cells fighting infections or sustaining biological functions [203].
The study of collective behavior in networks of heterogeneous agents, ranging from neurons
and cells to animals, has been a subject of research for several decades. In physical systems, interactions among numerous particles give rise to emergent and collective phenomena, such as stable
magnetic orientations [204]. A system of highly interconnected McCulloch-Pitts neurons [205]
has collective computational properties [204]. Networks of neurons with graded response (or sigmoid input-output relation) exhibit collective computational properties similar to those of networks
with two-state neurons [206]. Recent studies focus on exploring collective behaviors in biological
networks. This includes the examination of large sensory neuronal networks [207], the analysis of
large-scale small-world neuronal networks [208], the investigation of heterogeneous NNs [209],
and the study of hippocampal networks [210]. These studies aim to uncover the collective dynamics and computational abilities exhibited by such biological networks. In biological networks such
as the human brain, synaptic weight updates can occur through local learning, independent of the
activities of neurons in other brain regions [211, 92]. Partly for this reason, local learning has been identified as an effective means to reduce memory usage during training and to facilitate parallelism
in deep learning architectures, thereby enabling faster training [212].
Drawing inspiration from collective behavior and local learning in biological networks, we
propose a leader-follower neural network (LFNN) architecture mirroring the complexity observed
in biological systems. Our LFNN divides the NN into layers of elementary leader and follower
workers, leveraging characteristics of collective motion to designate leadership. Similar to a flock
of birds (Figure 6.1a-b), leaders in LFNNs are informed by and guide the entire system’s evolution
(Figure 6.1c-d). This biologically-plausible alternative to backpropagation (BP) utilizes distinct
error signals for leaders and followers, enabling training through local error signals.
We evaluate the LFNN architecture and its BP-free version trained with local loss (LFNN-ℓ)
on image processing datasets (i.e., MNIST, CIFAR-10, and ImageNet). Our LFNN and LFNN-ℓ outperform other BP-free algorithms and achieve comparable results to BP-enabled baselines.
Notably, our algorithm demonstrates superior performance on ImageNet compared to all other
BP-free baselines. Moreover, LFNN-ℓ can be conveniently incorporated into VGG, ResNet and
ViT architectures to accelerate the training process, and it significantly outperforms state-of-the-art
block-wise learning BP-free methods on CIFAR-10, Tiny-ImageNet, and ImageNet. In addition
to validating our models on 2D image datasets, we applied our LFNN-ℓ on a 3D-CNN model to
analyze 3D brain MRIs and predict brain age. The results show that our LFNN-ℓ outperformed the latest 3D-CNN in mean absolute error (MAE) and achieved a 2x speedup. This study introduces complex collectives to deep learning, offering new and valuable insights into biologically plausible neural network research and opening avenues for future work.

Figure 6.1: From bird flock to LFNN. a-b. A flock of birds where leaders are informed and lead the flock. c. An abstracted network from the flock. d. An LFNN architecture. Weight updates of LFNN: e. BP in classic deep neural network (DNN) training, where the global prediction loss is back-propagated through layers. f. An LF hierarchy in a DNN, where, within a layer, neurons are grouped as (leader and follower) workers. g. Weight update of follower workers. h. Weight update of leader workers with BP. i. BP-free weight update of leader workers. Training visualization: j. Worker activity visualization in an LFNN, where, at each time step, the followers (black lines) align themselves with leaders (red lines). k. Patterned collective motion produced by the classic Vicsek model [213].
6.1 Results
Collective motion refers to ordered movement in systems consisting of self-propelled particles,
such as flocking or swarming behavior [214]. The main feature of such behavior is that an individual particle is dominated by the influence of others and thus behaves entirely differently from
how it might behave on its own [215]. A classic collective motion model, the Vicsek model [213],
describes the trajectory of an individual using its velocity and location, and uses stochastic differential/difference equations to update this agent's location and velocity as a function of its interaction strength with its neighbors. Inspired by collective motion, we explore whether these minimal
mathematical relations can be exploited in deep learning and develop the LFNN approach shown
in Figure 6.1e to i. Unlike classic deep learning architectures (Figure 6.1e), we define workers as
structures containing one or more grouped neurons that serve as basic units. In this architecture,
some workers act as leaders, responsible for guiding the optimization direction, while the remaining workers, called followers, minimize their distance from the leaders (Figure 6.1f). LFNN-ℓ is
a BP-free version trained with local loss, allowing parallel optimization within each block (e.g.,
CNN blocks in ResNet). Figures 6.1g to 6.1i illustrate how followers update and how leaders
update under BP and BP-free conditions (see Methods section for detailed information about the
inner working rules of the LFNN architecture). Furthermore, Figures 6.1j and k visualize worker activities using the neuron output $\vec{y}$ before and after weight updates. The difference indicates activity where followers (black lines) move with leaders (red lines). During initial training (steps 0-1000) on MNIST with one hidden FC layer (we select 30% of the 32 workers as leaders in each training step and update their weight dynamics based on the global and local prediction losses), both show significant movement and rapid learning with larger steps. As training stabilizes, movement decreases as weight changes lessen. This patterned movement in LFNNs resembles the Vicsek
model [213].
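For reference, a minimal 2D Vicsek-model sketch follows (the standard alignment-plus-noise update on a periodic box; all parameter values are illustrative).

import numpy as np

rng = np.random.default_rng(0)
N, L, R, v0, eta, steps = 100, 10.0, 1.0, 0.3, 0.2, 200

pos = rng.uniform(0, L, size=(N, 2))        # positions in a periodic box
theta = rng.uniform(-np.pi, np.pi, size=N)  # headings

for _ in range(steps):
    # Each agent aligns with the mean heading of neighbors within radius R,
    # perturbed by uniform angular noise of amplitude eta.
    dx = pos[:, None, :] - pos[None, :, :]
    dx -= L * np.round(dx / L)               # minimum-image convention
    neigh = (dx ** 2).sum(-1) < R ** 2
    mean_sin = (neigh * np.sin(theta)[None, :]).sum(1)
    mean_cos = (neigh * np.cos(theta)[None, :]).sum(1)
    theta = np.arctan2(mean_sin, mean_cos) + eta * rng.uniform(-np.pi, np.pi, N)
    pos = (pos + v0 * np.column_stack([np.cos(theta), np.sin(theta)])) % L

# Polar order parameter: ~1 for coherent flocking, ~0 for disordered motion.
print(abs(np.exp(1j * theta).mean()))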
Figure 6.2: a. Network performance results when varying leadership size from 10% to 100%. b. Ablation study results from four different loss functions. Loss variation demonstration and leadership during training: c. global prediction loss and both local losses; d. without local follower loss; e. without local leader loss; f. global prediction loss alone.

In what follows, we investigate the effect of the leadership size on LFNN's performance, conduct an ablation study of loss terms in LFNN, and analyze the workers' activity. To facilitate demonstration and visualization, we utilize DNNs in this subsection. Next, we evaluate the CNN-based LFNN and LFNN-ℓ architectures on three image processing datasets (i.e., MNIST, CIFAR-10, and ImageNet) and compare them with a set of baseline algorithms. Furthermore, we embed
LFNN-ℓs within VGG-19 and ResNet-50/101/152 for validation on CIFAR-10, Tiny ImageNet,
and ImageNet. This approach aligns with prior experiments conducted in the literature on BP-free
algorithms and demonstrates the scalability of our approach [216, 217].
6.1.1 Leader-Follower Neural Networks with BP (LFNNs)
To assess the performance of LFNN for online classification, we conduct experiments on the pixel-permuted MNIST dataset [218]. Following the approach in [219], we construct a one-vs-all classifier using a simple NN architecture consisting of one hidden FC layer. In our experiments, we
vary the network architecture to examine the relationship between the network performance and
the leadership size. We consider network configurations with 32, 64, 128, 256, and 512 workers,
where each worker corresponds to a single neuron. We systematically vary the percentage of workers assigned as leaders from 10% to 100%. For each network configuration, we utilize the sigmoid
activation function for each worker and train the model using the Adam optimizer with a learning
rate of 5e-3. The objective is to investigate how different leadership sizes impact the classification
performance in the online settings. In our experiments, we employ the binary cross-entropy loss for both the global prediction loss ($\mathcal{L}_g$) and the local prediction loss for leaders ($\mathcal{L}_l^{\delta}$). For the local error signal of followers ($\mathcal{L}_l^{\bar{\delta}}$), we use the mean squared error loss. Hence, the loss function of LFNN is defined as follows:

$\mathcal{L} = \mathcal{L}_g + \lambda_1 \mathcal{L}_l^{\delta} + \lambda_2 \mathcal{L}_l^{\bar{\delta}}$   (6.1)
where the first term of the loss function applies to the output neurons and leader workers, and the second and third terms apply to the leader and follower workers, as illustrated in Figures 6.1c and d (see "Error signals in LFNN." in the Methods section for more detailed information). The hyperparameters $\lambda_1$ and $\lambda_2$ balance the contributions of the global and local loss components. It is important to note that the local losses $\mathcal{L}_l^{\delta}$ and $\mathcal{L}_l^{\bar{\delta}}$ are specific to each layer, filter, or block and do not propagate gradients through all hidden layers. Here, $\lambda_1$ and $\lambda_2$ are both set to 1 to balance the global and local loss terms. In the ablation study of loss terms and the worker activity study, we focus on a 32-worker LFNN with 30% leadership.
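A minimal PyTorch-style sketch of assembling Eq. 6.1 is given below. The tensor shapes, the separate leader/follower outputs, and detaching the leader activations in the follower term are simplifying assumptions for illustration, not the exact implementation (which is described in the Methods section).

import torch
import torch.nn.functional as F

def lfnn_loss(global_logits, leader_logits, follower_out, leader_out,
              targets, lam1=1.0, lam2=1.0):
    # Eq. 6.1 sketch: global BCE + local leader BCE + local follower MSE.
    L_g = F.binary_cross_entropy_with_logits(global_logits, targets)
    L_leader = F.binary_cross_entropy_with_logits(leader_logits, targets)
    # Followers minimize their distance to the (detached) leader activations,
    # so this term does not push gradients into the leaders.
    L_follower = F.mse_loss(follower_out, leader_out.detach())
    return L_g + lam1 * L_leader + lam2 * L_follower

B, D = 32, 16  # batch size and worker-output width (illustrative)
targets = torch.randint(0, 2, (B, 1)).float()
loss = lfnn_loss(torch.randn(B, 1, requires_grad=True),
                 torch.randn(B, 1, requires_grad=True),
                 torch.randn(B, D, requires_grad=True),
                 torch.randn(B, D), targets)
loss.backward()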
Leadership size and performance. In a study on the collective motion of inanimate objects,
such as radio-controlled boats, it was observed that to effectively control the direction of the entire
group, only a small percentage (5%-10%) of the boats needed to act as leaders [220]. This finding
aligns with similar studies conducted on living collectives, such as fish schools and bird flocks,
where a small subset of leaders was found to have a substantial impact on the behavior of the larger
group. In our experiment, we investigate the relationship between network performance and the
size of the leadership group. The results shown in Figure 6.2a indicate that our LFNN achieves high
performance on the permuted MNIST classification task after just one pass of training data. When
using a higher percentage of leadership, such as 90% or 100%, the LFNN achieves comparable
performance to a DNN trained with BP. Even with a lower percentage of leadership, ranging from 10% to 30%, the LFNN still achieves decent performance on this task. It is worth noting that for more challenging datasets like ImageNet, higher percentages of leadership are needed. These findings highlight both the similarities and differences between natural collectives and LFNNs in the deep learning field.

Figure 6.3: Leadership in workers during training (rows: layers 1 and 2; columns: epochs 0 through 4). The color and size of the dots represent the number of times a worker is selected as leader. A worker can be selected as leader up to 300 times in each epoch.
Ablation study of loss terms. In our investigation of LFNN training using Eq. 6.1, we evaluate
the effectiveness of the local loss terms and examine the following aspects: (a) whether global loss
alone with BP is adequate for training LFNNs, and (b) how the inclusion of local losses contributes
to training and network performance in terms of accuracy. To address these questions, we consider
four variations of the loss function, as depicted in Figures 6.2c-f: (c) $\mathcal{L}_1 = \mathcal{L}_g + \mathcal{L}_l^{\delta} + \mathcal{L}_l^{\bar{\delta}}$: this variant includes the global loss as well as all local losses; (d) $\mathcal{L}_2 = \mathcal{L}_g + \mathcal{L}_l^{\delta}$: the global loss is combined with the local leader loss; (e) $\mathcal{L}_3 = \mathcal{L}_g + \mathcal{L}_l^{\bar{\delta}}$: the global loss along with the local follower loss; (f) $\mathcal{L}_4 = \mathcal{L}_g$: only the global loss is employed.
After training LFNNs with these four different loss functions, we observe the one-pass results
in Figure 6.2b. It is evident that using only the global prediction loss ($\mathcal{L}_4$) with backpropagation leads to the worst performance. The network's accuracy does not improve significantly when adding the local follower loss ($\mathcal{L}_3$) because the leader workers, which the followers rely on for weight updates, do not perform well. As a result, the overall network accuracy remains low. However, when we incorporate the local leader loss ($\mathcal{L}_2$), we notice a significant improvement
in the network’s performance after 100 training steps. The local leader loss plays a crucial role
in this improvement. Despite updating only 30% of the workers at each step, it is sufficient to
guide the entire network towards effective learning. Moreover, when we further include the local
follower loss ($\mathcal{L}_1$) to update the weights of followers based on strong leaders, the overall network
performance improves even further. As a result, the network achieves high accuracy with just one
pass of training data. These results highlight the importance of incorporating both local leader and
local follower losses in LFNN training. The presence of strong leaders positively influences the
performance of followers, leading to improved network accuracy.
Leadership development. To investigate how leadership is developed during training, we
conduct a study using batch training, where leaders are re-selected in each batch. To provide
a clearer demonstration, we focus solely on local losses in this study, thereby eliminating the
effect of the global error signal and BP. We utilize an LFNN-ℓ with two hidden FC layers, each
containing 32 workers. The leadership rate is fixed at 20%, resulting in approximately 6 leaders
being selected in each layer at every training step. The NN is trained for 300 steps in each epoch,
and the leadership dynamics during the first 5 epochs are visualized in Figure 6.3. Each dot's color and size indicate the number of times a worker is selected as a leader. In the initial epoch
(Epoch 0), we observe that several workers in each layer have already emerged as leaders, being
selected most of the time. As training progresses, exactly six workers in each layer are consistently
developed as leaders, while the remaining workers are no longer selected. By the fifth epoch, the
leadership structure becomes nearly fixed, remaining relatively unchanged throughout the training
process.
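A minimal sketch of such per-batch leader re-selection and the selection counting visualized in Figure 6.3 follows; the selection criterion used here (lowest local loss) and the latent per-worker quality are illustrative assumptions, as the actual rule is specified in the Methods section.

import numpy as np

def select_leaders(local_losses, rate=0.2):
    # Illustrative rule: the workers with the lowest local loss become leaders.
    k = max(1, int(rate * len(local_losses)))
    return np.argsort(local_losses)[:k]

rng = np.random.default_rng(0)
counts = np.zeros(32, dtype=int)        # selection counts for 32 workers
skill = rng.uniform(0.5, 1.5, size=32)  # hypothetical per-worker quality
for _ in range(300):                    # 300 training steps per epoch
    losses = skill * rng.uniform(0.8, 1.2, size=32)
    counts[select_leaders(losses, rate=0.2)] += 1
print("selection counts per worker:", counts)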
From the results obtained, leadership in LFNN-ℓ is developed in the early stages of training and
becomes fixed thereafter. The performance of the entire network relies on these leaders. Although
this aspect is not the primary focus of the current work, one promising future direction involves the
development of an intelligent dynamic leader selection algorithm. Additionally, we investigated the performance of the best-performing leaders in each layer and compared the performance between leaders and followers in the supplementary materials.
Dataset              | MNIST                 | MNIST                 | CIFAR-10
Metric               | Test / Train Err. (↓) | Test / Train Err. (↓) | Test / Train Err. (↓)
BP-enabled
BP                   | 2.67 / 0.00           | 2.41 / 0.00           | 33.62 / 0.00
LG-BP [221]          | 2.43 / 0.00           | 2.81 / 0.00           | 33.84 / 0.05
LFNN                 | 1.18 / 1.15           | 2.14 / 1.49           | 19.21 / 3.57
BP-free
FA [222]             | 2.82 / 0.00           | 2.90 / 0.00           | 39.94 / 28.44
FG-W [223]           | 9.25 / 8.93           | 8.56 / 8.64           | 55.95 / 54.28
FG-A [224]           | 3.24 / 1.53           | 3.76 / 1.75           | 59.72 / 41.27
LG-FG-W [224]        | 9.25 / 8.93           | 5.66 / 4.59           | 52.70 / 51.71
LG-FG-A [224]        | 3.24 / 1.53           | 2.55 / 0.00           | 30.68 / 19.39
LFNN-ℓ               | 1.49 / 0.04           | 1.20 / 1.15           | 20.95 / 4.69
Number of Parameters | 272K∼275K             | 429K∼438K             | 876K∼919K

Table 6.1: Comparison between the proposed model and a set of BP-enabled and BP-free algorithms on MNIST and CIFAR-10 (the two MNIST columns correspond to the two network sizes in the last row). The best test errors (%) are highlighted in bold. Leadership size is set to 70% for all LFNNs and LFNN-ℓs.
6.1.2 BP-free Leader-Follower Neural Networks (LFNN-ℓs)
We propose a BP-free version of LFNN, named LFNN-ℓ, to address backward locking in BP. Our
approach removes the global loss from the LFNN loss function (Eq. 6.1) to avoid BP. Instead,
we use the local losses $\mathcal{L}_l^{\delta}$ and $\mathcal{L}_l^{\bar{\delta}}$ within each layer. We split the network into separate blocks (e.g., ResNet or ViT encoder blocks) and embed an output layer in each block. Each output layer computes a global prediction loss (cross-entropy loss), denoted as $\mathcal{L}_g^{o}$ (where $o$ stands for output). $\mathcal{L}_g^{o}$ updates weights within its block only, without transferring gradients between blocks. This allows all $\mathcal{L}_g^{o}$ losses to be computed in parallel.
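A minimal sketch of this block-local scheme follows; the fully connected blocks, heads, and sizes are illustrative assumptions, and the key mechanism is the detach between blocks, which keeps each per-block loss local.

import torch
import torch.nn as nn

class BlockLocalNet(nn.Module):
    # Each block has its own output head; gradients never cross block boundaries.
    def __init__(self, dims=(784, 256, 128), n_classes=10):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(a, b), nn.ReLU())
            for a, b in zip(dims[:-1], dims[1:]))
        self.heads = nn.ModuleList(nn.Linear(b, n_classes) for b in dims[1:])

    def forward(self, x, y, criterion=nn.CrossEntropyLoss()):
        losses = []
        for block, head in zip(self.blocks, self.heads):
            x = block(x)
            losses.append(criterion(head(x), y))  # local prediction loss per block
            x = x.detach()  # stop gradients from flowing into earlier blocks
        return losses

net = BlockLocalNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
x, y = torch.randn(8, 784), torch.randint(0, 10, (8,))
opt.zero_grad()
sum(net(x, y)).backward()  # each local loss only updates its own block + head
opt.step()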
In this section, we conduct a comparative analysis between LFNN-ℓs and several alternative
approaches, with the option of engaging BP. We evaluate their performance on the MNIST, CIFAR-10, and ImageNet datasets to showcase the capabilities of LFNN-ℓs and further study the impact
of leadership size. All LFNN-ℓs and LFNNs in this section consist of FC and convolutional layers.
LFNNs are trained using a combination of BP, global loss, and local losses, while BP-free LFNN-ℓs are trained solely with local losses.
Dataset         | Model  | Metric | 10%   | 20%   | 30%   | 40%   | 50%   | 60%   | 70%   | 80%   | 90%   | 100%
Tiny ImageNet   | LFNN-ℓ | Test   | 73.98 | 63.09 | 54.24 | 49.63 | 44.87 | 40.96 | 37.17 | 38.05 | 36.06 | 39.56
                |        | Train  | 71.47 | 57.29 | 43.69 | 38.57 | 30.53 | 22.04 | 19.50 | 19.38 | 16.00 | 32.33
                | LFNN   | Test   | 39.85 | 40.12 | 39.34 | 39.18 | 39.33 | 39.41 | 39.42 | 38.63 | 35.21 | 39.56
                |        | Train  | 36.50 | 35.76 | 32.71 | 32.16 | 32.02 | 32.36 | 32.70 | 31.91 | 32.59 | 32.33
ImageNet Subset | LFNN-ℓ | Test   | 90.57 | 84.83 | 78.75 | 73.65 | 68.61 | 64.25 | 59.53 | 56.54 | 53.82 | 54.44
                |        | Train  | 68.96 | 51.89 | 39.49 | 27.78 | 22.68 | 13.37 |  9.23 |  5.41 |  5.58 |  6.40
                | LFNN   | Test   | 79.37 | 78.83 | 69.87 | 61.80 | 60.05 | 59.10 | 57.46 | 58.01 | 57.37 | 57.75
                |        | Train  | 53.13 | 52.18 | 38.38 | 26.26 | 25.21 | 20.35 | 18.42 | 18.40 | 16.70 | 17.94

Table 6.2: Error rate (% ↓) results of LFNNs and LFNN-ℓs (with different leadership percentages) on Tiny ImageNet and the ImageNet subset. We also trained CNN counterparts (without the LF hierarchy) with BP and global loss for reference. The test error rates of BP-enabled CNNs on Tiny ImageNet and the ImageNet subset are 35.76% and 51.62%, respectively.

We compare our proposed LFNNs and LFNN-ℓs with BP, local greedy backpropagation (LG-BP) [221], Feedback Alignment (FA) [222], weight-perturbed forward gradient (FG-W) [223], activity-perturbed forward gradient (FG-A) [224], and local greedy forward gradient weight/activity-perturbed (LG-FG-W and LG-FG-A) [224] on the MNIST, CIFAR-10, and ImageNet datasets. To ensure a
fair comparison, we make slight modifications to our model architectures to match the number of
parameters of the models presented in [224].
Table 6.1 presents the image classification results for the MNIST and CIFAR-10 datasets using various BP and BP-free algorithms. The table displays the test and train errors as percentages for each dataset and network size. Compared to BP-enabled algorithms, LFNN shows performance similar to the standard BP algorithm and outperforms the LG-BP algorithm on both the MNIST and CIFAR-10 datasets. Among the BP-free algorithms, LFNN-ℓ achieves the lowest test errors on both MNIST and CIFAR-10. Specifically, on MNIST, our LFNN-ℓ achieves test error rates of 1.49% and 1.20%, whereas the best-performing baseline models achieve 2.82% and 2.55%, respectively. For the CIFAR-10 dataset, LFNN-ℓ outperforms all other BP-free algorithms with a test error rate of 20.95%, a significant improvement over the best-performing LG-FG-A algorithm, which achieves a test error rate of 30.68% (more experimental results for MNIST and CIFAR-10 under different percentages of leadership can be found in the supplementary materials).

Traditional BP-free algorithms have shown limited scalability when applied to larger datasets such as ImageNet [225]. To assess the scalability of LFNN and LFNN-ℓ, we conduct experiments on the ImageNet subset and Tiny ImageNet (full ImageNet results can be found in the supplementary materials). We increase the models' number of parameters to approximately 18M and explore the impact of leadership size on model performance. The error rates for Tiny ImageNet and the ImageNet subset with varying leadership percentages are presented in Table 6.2. For Tiny ImageNet, we observe that a leadership percentage of 90% yields the lowest test error rates, with LFNN achieving 35.21% and LFNN-ℓ achieving 36.06%. These results are surprisingly comparable to other BP-enabled deep learning models tested on Tiny ImageNet, such as UPANets (test error rate = 32.33%) [226], PreActRest (test error rate = 36.52%) [227], DLME (test error rate = 55.10%) [228], and MMA (test error rate = 35.59%) [229].
In the ImageNet subset experiments, we follow the methodology of [230] and leverage the ResNet-50 architecture as the base encoder, combining it with LFNN and LFNN-ℓ. LFNN and LFNN-ℓ with 90% leadership achieve the lowest test error rates of 57.37% and 53.82%, respectively. These results surpass all baseline models in Table 6.1 and are even comparable to the test error rate of the BP-enabled algorithm reported in [230], which is 50.60%. This observation further demonstrates the effectiveness of our proposed algorithm in transfer learning scenarios.
Furthermore, we observed even better results than those in Table 6.2 when further increasing the number of parameters. From Figure 6.2a and Table S2, we recall that for simple tasks like MNIST or CIFAR-10 classification, small leadership sizes can achieve decent results. In Table 6.2, we observe a clearer trend: for difficult datasets like the ImageNet subset, a higher leadership percentage is required to achieve better results. This presents an interesting avenue for future exploration, particularly in understanding the relationship between network/leadership size and dataset complexity.
Embedding LFNN-ℓ in VGG, ResNet, and ViT, scaling up to ImageNet. To evaluate the scalability of LFNN-ℓ within the context of classic CNN architectures, we integrated LFNN-ℓ into VGG-19, ResNet-50, ResNet-101, ResNet-152, and ViT-B-16 (for ViT, LFNN-ℓ was embedded in the MLP layers). We assess its impact on both accuracy (measured by error rates) and speedup across various datasets, including CIFAR-10, Tiny-ImageNet, and ImageNet. For the CNN families (VGG and ResNet), we compare our method with two state-of-the-art BP-free block-wise learning algorithms: DGL [216] and SEDONA [217]. For ViT, we compare our algorithm with the BP-based method (using the "ViT_B_16_Weights.IMAGENET1K_SWAG_E2E_V1" weights as the baseline). To ensure a fair comparison, LFNN-ℓ employed a leadership size of 100%, guaranteeing that inner communication among leaders and followers does not affect the speedup (the entire architecture is divided into K blocks of workers). Additionally, LFNN-ℓ was configured with the same model implementations and hyperparameter settings (a batch size of 256 using SGD) as DGL and SEDONA (DGL: https://github.com/eugenium/DGL; SEDONA: https://github.com/mjpyeon/sedona).

Figure 6.4: Classification error of (a) ResNet-101 and (b) ResNet-152 on CIFAR-10 and Tiny-ImageNet.

Tables 6.3 (a) and (b) show the prediction error rates of all block-wise BP-free learning models with 4 blocks on CIFAR-10 and Tiny-ImageNet for the CNN family, respectively. In almost all cases, DGL performs worse than BP, and SEDONA only marginally outperforms BP. In contrast, our LFNN-ℓ not only significantly outperforms BP but also outperforms SEDONA and DGL in all cases. Figure 6.4 illustrates the error rates of various models with different numbers of blocks (K). DGL consistently performs worse than BP across all values of K. SEDONA's results are similar to BP's on Tiny-ImageNet (ResNet-101) and CIFAR-10 (ResNet-152) when K ≥ 8. In contrast, LFNN-ℓ significantly outperforms BP, SEDONA, and DGL in all cases, suggesting that LFNN-ℓ exhibits superior scalability compared to SEDONA and DGL.

Model        BP     DGL    SEDONA  LFNN-ℓ
VGG-19       12.31  12.19  11.58   6.52
ResNet-50    7.99   8.27   7.53    5.05
ResNet-101   7.14   8.30   6.59    4.92
ResNet-152   6.35   6.39   6.13    4.62
(a)

Model        BP     DGL    SEDONA  LFNN-ℓ
VGG-19       47.11  48.70  43.44   40.09
ResNet-50    46.54  46.04  45.60   36.83
ResNet-101   44.50  46.20  40.88   35.28
ResNet-152   39.18  42.36  35.90   35.01
(b)

Table 6.3: Error rates (% ↓) on CIFAR-10 (a) and Tiny-ImageNet (b) for all baseline models with 4 blocks.

Model        Method    Top-1 Err.  Top-5 Err.  Speedup Ratio  Params. (M)
ResNet-101   BP        21.62       5.94        1              44.55
             DGL       22.35       6.44        1.92           47.09
             *SEDONA   21.00       5.52        2.01           70.36
             LFNN-ℓ    20.92       5.44        2.07           46.34
ResNet-152   BP        21.40       5.69        1              60.19
             DGL       22.20       6.39        2.23           62.73
             *SEDONA   20.20       5.13        2.02           86.00
             LFNN-ℓ    20.08       5.01        2.25           61.98
ViT-B-16     BP        14.70       2.35        1              86.86
             LFNN-ℓ    11.16       1.18        2.14           88.57
*Results are from SEDONA.

Table 6.4: Error rates (% ↓), speedup ratios (↑), and number of parameters (↓) compared among different methods on ResNet families and ViT, each with 4 blocks, when applied to ImageNet.

Table 6.4 reveals the classification errors on the ImageNet validation set and the training speedup for ResNet-101, ResNet-152, and ViT-B-16. It is evident that LFNN-ℓ outperforms BP, DGL, and
SEDONA in terms of both top-1 and top-5 validation errors. In the CNN family, LFNN-ℓ achieves the highest speedup on ResNet-101 and ResNet-152. Of note, DGL attains a speedup similar to LFNN-ℓ's on the ResNet-152 architecture. This can be attributed to DGL's uniform network splitting, which significantly enhances parallelization in larger architectures. Speedup is calculated as the ratio of BP's wall-clock training time to that of the BP-free benchmarks. More specifically, Table 6.5 presents the training time comparison. To ensure fairness, we calculate the speedup of all models by distributing blocks across K A100 GPUs, using 90 epochs and a batch size of 256. Furthermore, LFNN-ℓ requires the fewest additional parameters (Params.) for integration into ResNet-101/152 when compared to the other BP-free approaches. The reason is that LFNN-ℓ does not rely on large auxiliary networks after each block, as SEDONA and DGL do. Of note, unlike the results shown in Table 6.1, the LFNN-ℓ embedded in ResNet exhibits superior performance compared to BP on ImageNet, indicating that network size is a key factor in LFNN-ℓ's performance.
Method    ResNet-101  ResNet-152  ViT-B-16
BP        27.60       40.72       55.56
DGL       14.38       18.26       -
*SEDONA   13.73       20.16       -
LFNN-ℓ    13.33       18.10       25.96
*Results are from SEDONA.

Table 6.5: Training time (↓) in hours, compared across different methods for ResNet families and ViT, each with 4 blocks, when applied to ImageNet.
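As a consistency check on this definition, dividing BP's ResNet-101 wall-clock time in Table 6.5 by LFNN-ℓ's reproduces the speedup ratio reported in Table 6.4:

```latex
\mathrm{Speedup}_{\text{ResNet-101}}
  = \frac{T_{\mathrm{BP}}}{T_{\mathrm{LFNN}\text{-}\ell}}
  = \frac{27.60\,\mathrm{h}}{13.33\,\mathrm{h}}
  \approx 2.07
```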
Figure 6.5: BA estimation errors for test set participants. Scatter plots show participants' BAs vs. CAs for males (a, MAE = 2.41 yr) and females (b, MAE = 2.23 yr) using the BP-based 3D-CNN model. Plots (c, MAE = 2.30 yr) and (d, MAE = 2.12 yr) display the corresponding results for LFNN-ℓ. The solid line in each panel represents zero error.
For more complex datasets, LFNN-ℓ relies on additional leader-provided information, whereas in small CNN architectures, leaders are unable to gather sufficient information to effectively guide the training process. Furthermore, when we scale up LFNN-ℓ in a larger ViT-based model, we observe significant improvements compared to ViT trained with BP. Specifically, LFNN-ℓ achieves a 24.08% lower top-1 error rate and a 49.79% lower top-5 error rate, and trains 2.14 times faster than BP. We attribute LFNN-ℓ's ability to outperform classic BP on VGG, ResNet, and ViT to the fact that each of our local losses is responsible for updating only one block, which avoids potential conflicts among different loss functions. For more details, see Figure S4 and the "Loss Presentations" section in the appendices.
Embedding LFNN-ℓ in 3D-CNN for analyzing brain MRIs. To validate the scalability of
LFNN-ℓ in analyzing 3D biomedical images, we integrated LFNN-ℓ into the latest 3D-CNN [92]
with 4 blocks for predicting brain age in cognitively normal patients based on brain magnetic resonance images (MRIs). For a fair comparison, both models were trained on 4,681 participants and
tested on 1,170 participants from the UK Biobank (UKBB), Human Connectome Project-Aging
(HCP-A), Human Connectome Project-Young Adult (HCP-YA), and Alzheimer’s Disease Neuroimaging Initiative (ADNI). We assessed model performance using mean absolute error (MAE)
among male and female individuals. In the test set, the MAE of LFNN-ℓ between predicted brain
age (BA) and chronological age (CA) was 2.30 years (yr) for males and 2.12 yr for females, while
the 3D-CNN's MAE was 2.41 yr for males and 2.23 yr for females. These results show that LFNN-ℓ outperforms the BP-based model in analyzing 3D brain MRIs and predicting BA, providing new ideas for developing better distributed deep learning architectures for biomedical applications.
6.2 Methods
Datasets. Both MNIST and CIFAR-10 are obtained from the TensorFlow datasets [231]. MNIST [218] contains 70,000 images, each of size 28×28. CIFAR-10 [232] consists of 60,000 images, each of size 32×32. ImageNet [deng2009ImageNet] contains 1.3 million images of 1,000 classes, which we resize to 224×224. Tiny ImageNet [233] consists of 100,000 images distributed across 200 classes, with 500 images per class for training, and an additional set of 10,000 images for testing. All images in this dataset are resized to 64×64 pixels. The ImageNet subset (1pct) [230] is a subset of ImageNet [deng2009ImageNet]; it shares the same validation set as ImageNet and includes a total of 12,811 images sampled from ImageNet. These images are resized to 224×224 pixels for training.
LF hierarchy in fully connected layers. In a fully-connected (FC) layer containing multiple
neurons, we define workers as structures containing one or more neurons grouped together. Unlike
classic NNs where neurons are the basic units, LFNN workers serve as basic units. By adapting
the Vicsek model terms to deep learning, a worker’s behavior is dominated by that of neighbors
in the same layer. In addition, we consider leadership relations inside the group. According to
collective motion, “leadership” involves “the initiation of new directions of locomotion by one or
more individuals, which are then readily followed by other group members” [234]. Thus, in FC
99
layers, one or more workers are selected as leaders, and the rest are “followers” as shown in Figure
6.1b.
LF hierarchy extended to convolutional layers. Given a convolutional layer with multiple filters (or kernels), workers can be defined as one or more filters grouped together to form filter-wise workers. For a more coarse-grained formulation, given a NN with multiple convolutional layers, a set of convolutional layers can be grouped naturally into a block (as in the VGG [235], ResNet [236], and Inception [237] architectures). Our definition of the worker can easily be adapted to encompass block-wise workers, where a block of convolutional layers works together as a single, block-wise worker. Similarly, if a block contains one layer, it becomes a layer-wise worker.
More formally, we consider a NN with M hidden layers, where a hidden layer contains N workers. A worker can contain one or more individual working components, which can be neurons, filters in convolutional layers, or blocks of NN layers, and each individual working component is parametrized by a set of trainable parameters W. During training, at each time step t, leader workers $N_{\delta}$ are dynamically selected, and the remaining workers are labeled as followers (denoted as $N_{\bar{\delta}}$). Following the same notation, leader and follower workers are parameterized by the matrices $\vec{W}_{\delta}$ and $\vec{W}_{\bar{\delta}}$, respectively. The output of the leader and follower workers in a hidden layer reads $f(\vec{x}, [\vec{W}_{\delta}, \vec{W}_{\bar{\delta}}])$, where $\vec{x}$ is the input to the current hidden layer and $f(\cdot)$ is a mapping function.
Error signals in LFNN. In human groups, one key difference between leaders and followers is that leaders are informed individuals who can guide the whole group, while followers are uninformed and their instructions differ from treatment to treatment [238]. Adapting this concept to deep learning, LFNN leaders are informed in the sense that they receive error signals generated from the global or local prediction loss functions, whereas followers do not have this information. Specifically, assume that we train an LFNN with BP and a global prediction loss function $L_g$. Only leaders $N_{\delta}$ and output neurons receive gradient information as error signals to update weights. This is similar to classic NN training, so we refer to these pieces of information as global error signals. In addition, a local prediction error $L_l^{\delta}$ is optionally provided to leaders to encourage them to make meaningful predictions independently.
By contrast to leaders, followers $N_{\bar{\delta}}$ do not receive error signals generated in BP. Instead, they align with their neighboring leaders. Inspired by collective biological systems, we propose an "alignment" algorithm for followers and demonstrate its application in an FC layer as follows. Consider an FC layer where the input to a worker is represented by $\vec{x}$, and the worker is parameterized by $\vec{W}$ (i.e., the parameters of all neurons in this worker). The output of a worker is given by $\vec{y} = f(\vec{W} \cdot \vec{x})$. In this context, we denote the outputs of a leader and a follower as $\vec{y}_{\delta}$ and $\vec{y}_{\bar{\delta}}$, respectively. To bring the followers closer to the leaders, a local error signal is applied between leader workers and follower workers, denoted as $L_l^{\bar{\delta}} = D(\vec{y}_{\delta}, \vec{y}_{\bar{\delta}})$, where $D(a, b)$ measures the distance between $a$ and $b$ (in our experimentation, we utilize the mean squared error loss). In summary, the loss function of our LFNN is defined as follows:

$L = L_g + \lambda_1 L_l^{\delta} + \lambda_2 L_l^{\bar{\delta}}$  (6.1)
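Under the assumptions above (cross-entropy for the prediction losses, MSE for $D$, and the best-performing leader as the followers' reference, as detailed in the implementation notes later in this section), Eq. 6.1 can be sketched as follows; the weights lam1 and lam2 are illustrative hyperparameters:

```python
import torch.nn.functional as F

def lfnn_loss(global_logits, leader_logits, follower_logits,
              best_leader_out, y, lam1=1.0, lam2=1.0):
    """L = L_g + lam1 * L_l^delta + lam2 * L_l^bar-delta (Eq. 6.1)."""
    L_g = F.cross_entropy(global_logits, y)            # global prediction loss
    L_lead = sum(F.cross_entropy(l, y) for l in leader_logits)   # L_l^delta
    L_follow = sum(F.mse_loss(f, best_leader_out.detach())       # D = MSE
                   for f in follower_logits)                     # L_l^bar-delta
    return L_g + lam1 * L_lead + lam2 * L_follow
```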
BP-free version (LFNN-ℓ). To address the limitations of BP, such as backward locking, we propose a BP-free version of LFNN. The approach is as follows: in Eq. 6.1, it can be observed that the weight updates for followers are already local and do not propagate through layers. Based on this observation, we modify LFNN to train in a BP-free manner by removing the backpropagation of the global prediction loss. Instead, we calculate a leader-specific local prediction loss ($L_l^{\delta}$) for all leaders. With this modification, the global prediction loss calculated at the output layer, denoted as $L_g^{o}$ (where $o$ stands for output), is only used to update the weights of the output layer. In other words, this prediction loss serves as a local loss for the weight update of the output layer only. The total loss function of the BP-free LFNN-ℓ is given as follows:

$L = L_g^{o} + L_l^{\delta} + \lambda L_l^{\bar{\delta}}$  (6.2)

By eliminating the backpropagation of the global prediction loss to the hidden layers, the weight update of leader workers in LFNN-ℓ is driven solely by the local prediction loss, as depicted in Figure 6.1e. It is important to note that the weight update of follower workers remains unchanged, regardless of whether backpropagation is employed, as shown in Figure 6.1c.
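The sketch below illustrates one BP-free training step consistent with Eq. 6.2, reusing the illustrative WorkerLayer above; the attribute layer.leader_mask and the per-layer optimizers are assumptions for exposition. Because every loss is computed on a detached input, each optimizer step touches only its own layer's parameters, so the per-layer updates could in principle run in parallel:

```python
import torch.nn.functional as F

def lfnn_ell_step(layers, output_layer, layer_opts, out_opt, x, y, lam=1.0):
    # Hidden layers: leaders use local prediction losses, followers align.
    for layer, opt in zip(layers, layer_opts):
        x_in = x.detach()                       # no gradient enters from below
        out, y_leaders, y_followers = layer(x_in, layer.leader_mask)
        best = min(y_leaders, key=lambda l: F.cross_entropy(l, y).item())
        loss = sum(F.cross_entropy(l, y) for l in y_leaders)      # L_l^delta
        loss = loss + lam * sum(F.mse_loss(f, best.detach())
                                for f in y_followers)             # L_l^bar-delta
        opt.zero_grad(); loss.backward(); opt.step()
        x = out
    # Output layer: L_g^o updates the output layer only (Eq. 6.2).
    logits = output_layer(x.detach())
    out_loss = F.cross_entropy(logits, y)
    out_opt.zero_grad(); out_loss.backward(); out_opt.step()
    return logits
```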
Dynamic leadership selection. In our LF hierarchy, the selection of leadership is dynamic and occurs in each training epoch based on the local prediction loss. In a layer with N workers, each worker can contain one or more neurons, enabling it to handle binary or multi-class classification or regression problems on a case-by-case basis. This unique characteristic allows a worker, even one located in a hidden layer, to make predictions $\vec{y}$. This represents a significant design distinction between our LFNN and a traditional NN. Consequently, all workers in a hidden layer receive their respective prediction error signals, denoted as $L_l^{\delta}(\vec{y}, \hat{y})$. Here, $L_l(\cdot, \cdot)$ represents the prediction error function, the superscript $\delta$ indicates that it is calculated over the leaders, $\hat{y}$ denotes the true label, and the top $\delta$ ($0 \le \delta \le 100\%$) workers with the lowest prediction error are selected as leaders.

Definition 6.2.1 (Leadership). Within a set of N workers, each worker generates a prediction error denoted as $L_l(\vec{y}, \hat{y})$. From this set, we select $\delta$ leaders based on their lowest prediction errors. The prediction loss for these leaders is represented as $L_l^{\delta}(\vec{y}, \hat{y})$. The remaining workers are referred to as followers, and their prediction loss is denoted as $L_l^{\bar{\delta}}(\vec{y}, \hat{y})$.
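A sketch of this selection rule (function and variable names are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def select_leaders(worker_outputs, y, delta=0.7):
    """Pick the delta fraction of workers with the lowest local prediction
    loss as this epoch's leaders; the remaining workers become followers."""
    losses = torch.stack([F.cross_entropy(out, y) for out in worker_outputs])
    n_leaders = max(1, int(delta * len(worker_outputs)))
    leader_idx = torch.argsort(losses)[:n_leaders]   # lowest error leads
    mask = torch.zeros(len(worker_outputs), dtype=torch.bool)
    mask[leader_idx] = True
    return mask
```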
Implementation details. To enable workers in hidden layers to generate valid predictions, we apply the same activation function used in the output layer to each worker. For instance, in the case of a NN designed for N-class classification, we typically include N output neurons in the output layer and apply the softmax function. In our LFNN, each worker is likewise composed of N neurons, and the softmax function is applied accordingly. The leader loss ($L_l^{\delta}$) for each output layer is the cross-entropy loss between the outputs and the true labels. To align the followers with the leaders, we adopt a simplified approach by selecting the best-performing leader as the reference for computing $L_l^{\bar{\delta}}$. Other strategies, such as random selection from the $\delta$ leaders, were also tested but did not yield satisfactory performance. Therefore, for the sake of simplicity and better performance, we choose the best-performing leader as the reference for the followers' loss computation.
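A sketch of the follower loss with the best-performing leader as the shared reference (all names are illustrative assumptions):

```python
import torch.nn.functional as F

def follower_alignment_loss(worker_outputs, worker_losses, leader_mask):
    """L_l^bar-delta: pull every follower toward the best-performing leader."""
    leader_ids = [i for i, m in enumerate(leader_mask) if m]
    best = min(leader_ids, key=lambda i: worker_losses[i])  # best leader
    ref = worker_outputs[best].detach()      # reference is not updated here
    return sum(F.mse_loss(worker_outputs[i], ref)
               for i, m in enumerate(leader_mask) if not m)
```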
Practical benefits and overheads. In contrast to conventional neural networks trained with BP and a global loss, our LFNN-ℓ computes worker-wise losses and gradients locally. This approach effectively eliminates backward locking, albeit with a slight overhead for the local loss calculations. One significant advantage of the BP-free version is that local error signals can be computed in parallel, enabling a potential speed-up of the weight update process through parallel implementation.
Participants and Neuroimaging. This research was conducted in compliance with the US Code
of Federal Regulations (45 C.F.R. 46) and the Declaration of Helsinki. The MRIs utilized in
this study were sourced from other research projects, each having received approval from the
respective institutional review boards or ethical oversight committees, as applicable for ADNI [38]
and HCP [39]. The UKBB data collection was carried out with ethical approval from the North
West Multi-Centre Research Ethics Committee of the United Kingdom, while the CamCAN data
received ethical approval from the Cambridgeshire 2 (now East of England—Cambridge Central)
Research Ethics Committee. Informed written consent was obtained from all participants.
The combined dataset includes 5,851 cognitively normal individuals (3,142 females) aged 22-
95 years, sourced from ADNI (N = 510), HCP-A (N = 508), HCP-YA (N = 1,112), and UKBB
(N = 3,721). ADNI, initiated in 2003 as a public-private partnership led by Principal Investigator Michael W. Weiner, MD, aims to determine whether serial MRI, positron emission tomography, other biological markers, and clinical and neuropsychological assessments can be integrated to monitor the progression of MCI and early AD. Detailed MRI acquisition protocols for HCP-A and HCP-YA are available in the literature [40]. For the UKBB data, we used preprocessed images generated by the UKBB pipeline, which includes FreeSurfer reconstructions [41].
6.3 Discussion
Most deep neural networks today are trained using backpropagation (BP) [11, 12, 13]. However,
BP is considered "biologically implausible" because the brain does not form symmetric backward
connections. Additionally, BP is incompatible with high levels of model parallelism and restricts
potential hardware designs. These limitations highlight the need for a fundamentally different
learning algorithm for deep networks. To address these issues and incorporate "biological plausibility" into deep learning architectures, we present the LFNN approach. Inspired by natural collective behavior, LFNN introduces a leader-follower hierarchy within neural networks. Our study
demonstrates its effectiveness across various architectures, aligning with both biological observations and deep learning theories. We also introduce LFNN-ℓ, a BP-free variant that uses local error
signals instead of traditional BP to optimize each block in parallel, thus accelerating the training
process. We have shown that LFNN-ℓ, trained without a global loss, achieves superior performance
compared to other BP-free algorithms. Through extensive experiments on the MNIST, CIFAR-10,
and ImageNet datasets, we have validated the efficacy of LFNN with and without BP. LFNN-ℓ
not only outperforms other state-of-the-art BP-free algorithms across all tested datasets but also
achieves competitive results when compared to BP-enabled baselines in a significant number of
cases.
Our work is the first to introduce collective motion-inspired models for deep learning architectures, opening new avenues for the development of local error signals and alternatives to BP.
Previous works [239] have shown that training with local error signals involves solving multiple
small optimization problems. Each intermediate layer’s (block) weights are adjusted to reduce
the error of its own local classifier. These local errors help the intermediate blocks learn features
that are easier to separate into different categories. While intermediate blocks might not perfectly
separate the input into categories (indicated by some remaining error), the local errors encourage
them to make the features as distinct as possible. Subsequent blocks then build on these partially
separated features, refining them further. Ultimately, a higher block in the network learns a representation that is clear enough for a simple classifier to perform well.
Moreover, to the best of our knowledge, we are the first to embed a BP-free approach into
ViT architectures. Our LFNN-ℓ can be seamlessly integrated into convolutional neural networks
(e.g., VGG and ResNet) and ViT architectures with minimal additional parameters, significantly
accelerating the training process. Training deep learning models on large datasets is both costly
and time-consuming. For instance, pretraining ViT-L/16 on JFT-300M requires 0.23k TPUv3-core-days, ViT-H/14 requires 2.5k days, and BiT (ResNet152x4) demands 9.9k days [10]. In the field
of 3D biomedical image analysis, training times are even longer due to the larger data size of 3D
images compared to 2D images. Consequently, the acceleration provided by our LFNN model can
significantly reduce the model pre-training time in these domains. We believe this early study offers
novel tools and valuable insights for addressing fundamental challenges in deep learning, including
neural network architecture design and the development of biologically plausible decentralized
learning algorithms.
Chapter 7
Conclusion and future directions
MCPS represent a critical integration of medical devices, leveraging big data and cloud computing platforms to enhance healthcare services and diagnostics. These systems find increasing
applications in healthcare settings, aiming to deliver high-quality, efficient healthcare solutions in
hospitals and e-healthcare applications. In this thesis, we address challenges posed by MCPS and
propose methods to handle various scenarios where conventional MCPS might fall short. Our key
contributions can be summarized as follows:
• In Chapter 2, we present an interpretable 3D-CNN model for estimating BA. Trained on MRI data from 4,681 CN participants and tested on 1,170 CN participants from an independent sample, our model achieves notably lower BA estimation errors than previous studies. The CNN offers detailed anatomical maps of brain aging patterns at both the individual and cohort levels, revealing sex dimorphisms and neurocognitive trajectories in adults with MCI and AD. Our study advances existing interpretable deep learning methods by providing voxel-wise saliency maps at native MRI resolution, offering sex and cognitive status comparisons, and establishing connections between cognitive status and neurocognitive function.
• To fully mine the potential of image rescaling models based on INNs and to develop more efficient super-resolution models, two novel modules are proposed in Chapter 3 to store the otherwise lost high-frequency information z. The IRN-M model utilizes an autoencoder to compress z and save it as metadata in a native image format so that it can be decoded to an approximation of z, while IRN-A adds an additional channel to store crucial high-frequency information, which can be quantized and stored as the alpha channel, in addition to the RGB channels, in the existing RGBA format. With a carefully designed autoencoder and alpha-channel pre-split, both modules are shown to improve the upscaling performance significantly compared to the IRN baseline. The proposed modules are also applicable to newer baseline models like DLV-IRN, and the resulting DLV-IRN-A is by far the best, further pushing the limit of image rescaling performance by a significant margin.
• Chapter 4 introduces a fractional dynamics-based model designed to accurately analyze
long-term memory in COPD physiological signal datasets. This is achieved by extracting
fractional features (in the form of a coupling matrix A) from extensive time-series data, leading to improved interpretability and classification accuracy for deep learning models. Surprisingly, even linear classifiers exhibit commendable accuracy, underscoring the strength of
our model’s generalizability across diverse COPD records and datasets. As evidenced by the
results from k-fold cross-validation, hold-out validation, and external validations, we assert
that our FDDLM is highly versatile, capable of application to various COPD records containing physiological signals. Additionally, through transfer learning results, we posit that
our FDDLM is robust enough to accurately predict COPD stages across distinct datasets.
• Chapter 5 demonstrates a computer vision-based segmentation model designed to extract neuronal culture networks from quantitative phase imaging data. Our investigation delves into the mathematical attributes of brain-derived neuronal culture networks and brain-derived neuronal culture cluster networks. This examination involves the precise localization and detection of individual axons and dendrites, carried out with an impressive accuracy of 0.03 nm in optical path length [195]. This meticulous approach offers a means to analyze the spontaneous evolution of neuronal cultures during their early stages.
• Chapter 6 presents the LFNN approach. Inspired by natural collective behaviors, LFNN
establishes a leader-follower hierarchy within neural networks. Our study demonstrates its
effectiveness across various architectures, aligning with both biological observations and
deep learning theories. Furthermore, we introduce LFNN-ℓ, a BP-free variant that employs
local error signals instead of traditional backpropagation to optimize each block in parallel,
thereby accelerating the training process. We have shown that LFNN-ℓ, trained without a
global loss, achieves superior performance compared to other BP-free algorithms. Through
extensive experiments on the MNIST, CIFAR-10, and ImageNet datasets, we validate the
efficacy of LFNN both with and without backpropagation. LFNN-ℓ not only outperforms
other state-of-the-art BP-free algorithms across all tested datasets but also achieves competitive results when compared to BP-enabled baselines in numerous cases.
This thesis addresses the challenges and limitations inherent in MCPS and presents innovative
solutions to enhance their performance in healthcare applications. These contributions collectively
address crucial challenges in the field, improving the interpretability of models, enhancing data
quality, and analyzing data geometry to boost model performance.
Looking ahead, there are exciting opportunities for further research and advancements in this
domain. Future work could focus on refining and expanding the proposed models to address a
wider range of medical challenges, enhancing their applicability and impact in healthcare settings.
In the context of the interpretable 3D-CNN brain-age prediction model, we identified a limitation related to its performance on longitudinal brain MRIs from the same individual. This limitation stems from the fact that the 3D-CNN was not originally trained on this specific type of data,
which prevents it from effectively understanding how the brain ages within the same individuals
over time. To address this limitation, our future work aims to train the interpretable 3D-CNN architecture using longitudinal brain MRI datasets. This training approach is anticipated to enhance
the model’s capability to capture aging patterns within the same individuals over time.
Moving to our SR model, while the IRN-M model demonstrated superior performance compared to most baseline models, it only surpassed the latest DLV-IRN model by a marginal 2%. The discrepancy between the high-frequency information recovered by the autoencoder and the original information was minimal, with an MSE between the two of less than $10^{-6}$. However, this small MSE difference did not sufficiently translate into a substantial improvement across
the entire IRN-M model. To address this, our future work intends to incorporate a KL divergence
term in addition to the MSE loss during the autoencoder training. This dual loss approach aims to
enhance the training process and assess whether the new loss term can more effectively influence
the performance of the IRN-M architecture.
Furthermore, regarding data geometry, our fractional dynamics deep learning model has demonstrated promising performance on COPD datasets collected using the NOX device at the Victor Babes institution. This hospital also uses the NOX T3™ to collect physiological signal data from patients with sleep apnea. Given that both sleep apnea and COPD are respiratory system-related disorders, our future work will explore the applicability of the FDDLM to sleep apnea patients' data. This investigation aims to determine whether the model's success on COPD data can be extended to analyze and predict outcomes for patients with sleep apnea as well.
Bibliography
[1] Gregor Gunčar et al. “An application of machine learning to haematological diagnosis”. In: Scientific reports 8.1 (2018), pp. 1–12.
[2] Liangpei Zhang, Lefei Zhang, and Bo Du. “Deep learning for remote sensing data: A
technical tutorial on the state of the art”. In: IEEE Geoscience and remote sensing magazine
4.2 (2016), pp. 22–40.
[3] Supriyo Chakraborty et al. “Interpretability of deep learning models: A survey of results”.
In: 2017 IEEE smartworld, ubiquitous intelligence & computing, advanced & trusted computed, scalable computing & communications, cloud & big data computing, Internet of
people and smart city innovation (smartworld/SCALCOM/UIC/ATC/CBDcom/IOP/SCI).
IEEE. 2017, pp. 1–6.
[4] Zachary C Lipton. “The mythos of model interpretability: In machine learning, the concept
of interpretability is both important and slippery.” In: Queue 16.3 (2018), pp. 31–57.
[5] Chenzhong Yin et al. “Fractional dynamics foster deep learning of COPD stage prediction”. In: Advanced Science 10.12 (2023), p. 2203485.
[6] Y Li, Bruno Sixou, and F Peyrin. “A review of the deep learning methods for medical
images super resolution problems”. In: Irbm 42.2 (2021), pp. 120–133.
[7] Abhinav Jain et al. “Overview and importance of data quality for machine learning tasks”.
In: Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining. 2020, pp. 3561–3562.
[8] Alexander Greaves-Tunnell and Zaid Harchaoui. “A statistical investigation of long memory in language and music”. In: International Conference on Machine Learning. PMLR.
2019, pp. 2394–2403.
[9] Jianpeng Cheng, Li Dong, and Mirella Lapata. “Long short-term memory-networks for
machine reading”. In: arXiv preprint arXiv:1601.06733 (2016).
[10] Alexey Dosovitskiy et al. “An image is worth 16x16 words: Transformers for image recognition at scale”. In: arXiv preprint arXiv:2010.11929 (2020).
[11] Paul Werbos. “Beyond regression: New tools for prediction and analysis in the behavioral
sciences”. In: PhD thesis, Committee on Applied Mathematics, Harvard University, Cambridge, MA (1974).
[12] Yann LeCun. “A learning scheme for asymmetric threshold networks”. In: Proceedings of
COGNITIVA 85.537 (1985), pp. 599–604.
[13] David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. “Learning Internal Representations by Error Propagation, Parallel Distributed Processing, Explorations in the
Microstructure of Cognition, ed. DE Rumelhart and J. McClelland. Vol. 1. 1986”. In:
Biometrika 71 (1986), pp. 599–607.
[14] James H Cole et al. “Predicting brain age with deep learning from raw imaging data results
in a reliable and heritable biomarker”. In: NeuroImage 163 (2017), pp. 115–124.
[15] Huiting Jiang et al. “Predicting brain age of healthy adults based on structural MRI parcellation using convolutional neural networks”. In: Frontiers in Neurology 10 (2020), p. 1346.
[16] Cynthia Rudin. “Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead”. In: Nature Machine Intelligence 1.5 (2019),
pp. 206–215.
[17] Mingqing Xiao et al. “Invertible image rescaling”. In: European Conference on Computer
Vision. Springer. 2020, pp. 126–144.
[18] Samora Okujeni, Steffen Kandler, and Ulrich Egert. “Mesoscale architecture shapes initiation and richness of spontaneous network activity”. In: Journal of Neuroscience 37.14
(2017), pp. 3972–3987.
[19] Daniel de Santos-Sierra et al. “Emergence of small-world anatomical networks in selforganizing clustered neuronal cultures”. In: PloS one 9.1 (2014), e85828.
[20] Anar Amgalan et al. “Brain age estimation reveals older adults’ accelerated senescence
after traumatic brain injury”. In: GeroScience (2022), pp. 1–17.
[21] Carlos Lopez-Otin et al. “The hallmarks of aging”. In: Cell 153.6 (2013), pp. 1194–1217.
[22] Andrei Irimia et al. “Statistical estimation of physiological brain age as a descriptor of
senescence rate during adulthood”. In: Brain Imaging and Behavior 9.4 (2015), pp. 678–
689.
[23] Iman Beheshti et al. “Bias-adjustment in neuroimaging-based brain age frameworks: A
robust scheme”. In: NeuroImage: Clinical 24 (2019), p. 102063.
[24] James H Cole et al. “Brain age predicts mortality”. In: Molecular Psychiatry 23.5 (2018),
pp. 1385–1392.
[25] Roy J Massett et al. “Regional neuroanatomic effects on brain age inferred using magnetic resonance imaging and ridge regression”. In: The Journals of Gerontology: Series A
(2022).
[26] Kaida Ning et al. “Improving brain age estimates with deep learning leads to identification of novel genetic factors associated with brain aging”. In: Neurobiology of Aging 105
(2021), pp. 199–204.
[27] Nils Opel et al. “Brain structural abnormalities in obesity: Relation to age, genetic risk, and
common psychiatric disorders”. In: Molecular Psychiatry 26.9 (2021), pp. 4839–4852.
[28] James H Cole et al. “Increased brain-predicted aging in treated HIV disease”. In: Neurology
88.14 (2017), pp. 1349–1357.
[29] James H Cole et al. “No evidence for accelerated aging-related brain pathology in treated
human immunodeficiency virus: Longitudinal neuroimaging results from the comorbidity in relation to AIDS (COBRA) project”. In: Clinical Infectious Diseases 66.12 (2018),
pp. 1899–1909.
[30] Andrei Irimia. “Cross-sectional volumes and trajectories of the human brain, gray matter,
white matter and cerebrospinal fluid in 9473 typically aging adults”. In: Neuroinformatics
19.2 (2021), pp. 347–366.
[31] Andrei Irimia et al. “Acute cognitive deficits after traumatic brain injury predict Alzheimer’s
disease-like degradation of the human default mode network”. In: Geroscience 42.5 (2020),
pp. 1411–1429.
[32] Ahmed Salih et al. “Brain age estimation at tract group level and its association with daily
life measures, cardiac risk factors and genetic variants”. In: Scientific Reports 11.1 (2021),
pp. 1–14.
[33] Wen Yih Isaac Tseng, Yung Chin Hsu, and Te Wei Kao. “Brain Age Difference at Baseline
Predicts Clinical Dementia Rating Change in Approximately Two Years”. In: Journal of
Alzheimer’s Disease Preprint (2022), pp. 1–15.
[34] Phoebe Imms, Helena C Chui, and Andrei Irimia. “Alzheimer’s disease after mild traumatic brain injury”. In: Aging 14.13 (2022), p. 5292.
[35] Lukas Fisch et al. “Predicting chronological age from structural neuroimaging: The predictive analytics competition 2019”. In: Frontiers in Psychiatry 12 (2021).
[36] Han Peng et al. “Accurate brain age prediction with lightweight deep neural networks”. In:
Medical Image Analysis 68 (2021), p. 101871.
[37] Weikang Gong et al. “Optimising a simple fully convolutional network for accurate brain
age prediction in the PAC 2019 challenge”. In: Frontiers in Psychiatry 12 (2021).
[38] Ronald Carl Petersen et al. “Alzheimer’s disease neuroimaging initiative (ADNI): Clinical
characterization”. In: Neurology 74.3 (2010), pp. 201–209.
[39] Jennifer Stine Elam et al. “The Human Connectome Project: A retrospective”. In: NeuroImage 244 (2021), p. 118543.
[40] David C Van Essen et al. “The Human Connectome Project: A data acquisition perspective”. In: NeuroImage 62.4 (2012), pp. 2222–2231.
[41] Fidel Alfaro-Almagro et al. “Image processing and quality control for the first 10,000 brain
imaging datasets from UK Biobank”. In: NeuroImage 166 (2018), pp. 400–424.
[42] Jason R Taylor et al. “The Cambridge Centre for Ageing and Neuroscience (Cam-CAN)
data repository: Structural and functional MRI, MEG, and cognitive data from a crosssectional adult lifespan sample”. In: NeuroImage 144 (2017), pp. 262–269.
[43] Lovingly Quitania Park et al. “Confirmatory factor analysis of the ADNI neuropsychological battery”. In: Brain Imaging and Behavior 6.4 (2012), pp. 528–539.
[44] Bruce Fischl. “FreeSurfer”. In: NeuroImage 62.2 (2012), pp. 774–781.
[45] Zijun Zhang. “Improved Adam optimizer for deep neural networks”. In: 2018 IEEE/ACM
26th International Symposium on Quality of Service (IWQoS). IEEE. 2018, pp. 1–2.
[46] Trang T Le et al. “A nonlinear simulation framework supports adjusting for age when
analyzing BrainAGE”. In: Frontiers in Aging Neuroscience (2018), p. 317.
[47] Matthias S Treder et al. “Correlation constraints for regression models: Controlling bias in
brain age prediction”. In: Frontiers in Psychiatry (2021), p. 25.
[48] Konstantinos Kamnitsas et al. “Efficient multi-scale 3D CNN with fully connected CRF
for accurate brain lesion segmentation”. In: Medical image analysis 36 (2017), pp. 61–78.
[49] Pierre Baldi and Peter J Sadowski. “Understanding dropout”. In: Advances in neural information processing systems 26 (2013).
[50] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. “Deep inside convolutional
networks: Visualising image classification models and saliency maps”. In: arXiv preprint
arXiv:1312.6034 (2013).
[51] Matthew F Glasser and David C Van Essen. “Mapping human cortical areas in vivo based
on myelin content as revealed by T1- and T2-weighted MRI”. In: The Journal of Neuroscience (), pp. 11597–11616.
[52] Sean O Mahoney et al. “Mild traumatic brain injury results in significant and lasting cortical demyelination”. In: Frontiers in Neurology 13 (2022), p. 854396.
[53] Soo Borson et al. “Improving dementia care: The role of screening and detection of cognitive impairment”. In: Alzheimer’s & Dementia 9.2 (2013), pp. 151–159.
[54] Charles Marcus, Esther Mena, and Rathan M Subramaniam. “Brain PET in the diagnosis
of Alzheimer’s disease”. In: Clinical Nuclear Medicine 39.10 (2014), e413.
[55] Juan José Vaquero and Paul Kinahan. “Positron emission tomography: Current challenges
and opportunities for technological advances in clinical and preclinical imaging systems”.
In: Annual Review of Biomedical Engineering 17 (2015), pp. 385–414.
[56] Jo Wrigglesworth et al. “Brain-predicted age difference is associated with cognitive processing in later-life”. In: Neurobiology of Aging 109 (2022), pp. 195–203.
[57] Laura E Korthauer et al. “Brain-behavior investigation of potential cognitive markers of
Alzheimer’s disease in middle age: A multi-modal imaging study”. In: Brain Imaging and
Behavior (2021), pp. 1–8.
[58] Johnny Wang et al. “Gray matter age prediction as a biomarker for risk of dementia”. In:
Proceedings of the National Academy of Sciences 116.42 (2019), pp. 21213–21218.
[59] Francesca Biondo et al. “Brain-age is associated with progression to dementia in memory
clinic patients”. In: NeuroImage: Clinical 36 (2022), p. 103175.
[60] Christian Gaser et al. “BrainAGE in mild cognitive impaired patients: predicting the conversion to Alzheimer’s disease”. In: PloS one 8.6 (2013), e67346.
[61] James H Cole et al. “Brain age and other bodily ‘ages’: Implications for neuropsychiatry”.
In: Molecular psychiatry 24.2 (2019), pp. 266–281.
[62] Daniel Brennan, Tingting Wu, and Jin Fan. “Morphometrical brain markers of sex difference”. In: Cerebral Cortex 31.8 (2021), pp. 3641–3649.
[63] Lydia Kogler et al. “Sex differences in the functional connectivity of the amygdalae in
association with cortisol”. In: NeuroImage 134 (2016), pp. 410–423.
[64] A Veronica Witte et al. “Regional sex differences in grey matter volume are associated with
sex hormones in the young adult human brain”. In: NeuroImage 49.2 (2010), pp. 1205–
1212.
[65] GF Wooten et al. “Are men at greater risk for Parkinson’s disease than women?” In: Journal of Neurology, Neurosurgery & Psychiatry 75.4 (2004), pp. 637–639.
[66] Tao Liu et al. “The effects of age and sex on cortical sulci in the elderly”. In: NeuroImage
51.1 (2010), pp. 19–27.
[67] Jeffrey A Kaye et al. “The significance of age-related enlargement of the cerebral ventricles
in healthy men and women measured by quantitative computed X-ray tomography”. In:
Journal of the American Geriatrics Society 40.3 (1992), pp. 225–231.
[68] Marco Hirnstein, Kenneth Hugdahl, and Markus Hausmann. “Cognitive sex differences
and hemispheric asymmetry: A critical review of 40 years of research”. In: Laterality:
Asymmetries of Body, Brain and Cognition 24.2 (2019), pp. 204–252.
[69] Roberto Cabeza. “Hemispheric asymmetry reduction in older adults: The HAROLD model.”
In: Psychology and Aging 17.1 (2002), p. 85.
[70] Siyuan Liu et al. “Integrative structural, functional, and transcriptomic analyses of sex-biased brain organization in humans”. In: Proceedings of the National Academy of Sciences 117.31 (2020), pp. 18788–18798.
[71] Elizabeth R Sowell et al. “Sex differences in cortical thickness mapped in 176 healthy
individuals between 7 and 87 years of age”. In: Cerebral Cortex 17.7 (2007), pp. 1550–
1560.
[72] Mark W Bondi et al. “Neuropsychological criteria for mild cognitive impairment improves diagnostic precision, biomarker associations, and progression rates”. In: Journal
of Alzheimer’s Disease 42.1 (2014), pp. 275–289.
[73] Luise Christine Löwe et al. “The effect of the ApoE genotype on individual BrainAGE
in normal aging, mild cognitive impairment, and Alzheimer’s disease”. In: PloS One 11.7
(2016), e0157514.
[74] Heidi IL Jacobs et al. “Parietal cortex matters in Alzheimer’s disease: An overview of
structural, functional and metabolic findings”. In: Neuroscience & Biobehavioral Reviews
36.1 (2012), pp. 297–309.
[75] Anders M Fjell et al. “What is normal in normal aging? Effects of aging, amyloid and
Alzheimer’s disease on the cerebral cortex and the hippocampus”. In: Progress in Neurobiology 117 (2014), pp. 20–40.
[76] Gregory S Day et al. “Tau PET binding distinguishes patients with early-stage posterior
cortical atrophy from amnestic Alzheimer’s disease dementia”. In: Alzheimer Disease and
Associated Disorders 31.2 (2017), p. 87.
[77] Heiko Braak and EVA Braak. “Staging of Alzheimer’s disease-related neurofibrillary changes”.
In: Neurobiology of Aging 16.3 (1995), pp. 271–278.
[78] Baptiste Couvy-Duchesne et al. “Ensemble learning of convolutional neural network, support vector machine, and best linear unbiased predictor for brain age prediction: ARAMIS
contribution to the predictive analytics competition 2019 challenge”. In: Frontiers in psychiatry (2020), p. 1451.
[79] Chen-Yuan Kuo et al. “Improving individual brain age prediction using an ensemble deep
learning framework”. In: Frontiers in psychiatry 12 (2021).
[80] Pierre Besson et al. “Geometric deep learning on brain shape predicts sex and age”. In:
Computerized Medical Imaging and Graphics 91 (2021), p. 101939.
[81] Franziskus Liem et al. “Predicting brain-age from multimodal imaging data captures cognitive impairment”. In: Neuroimage 148 (2017), pp. 179–188.
[82] Jeyeon Lee et al. “Deep learning-based brain age prediction in normal aging and dementia”.
In: Nature Aging (2022). URL: https://doi.org/10.1038/s43587-022-00219-7.
[83] Esten H Leonardsen et al. “Deep neural networks learn general and clinically relevant
representations of the ageing brain”. In: NeuroImage 256 (2022), p. 119210.
[84] Rena Li and Meharvan Singh. “Sex differences in cognitive impairment and Alzheimer’s
disease”. In: Frontiers in Neuroendocrinology 35.3 (2014), pp. 385–403.
[85] Sheng He, P Ellen Grant, and Yangming Ou. “Global-Local transformer for brain age
estimation”. In: IEEE Transactions on Medical Imaging 41.1 (2021), pp. 213–224.
[86] David A Wood et al. “Accurate brain-age models for routine clinical MRI examinations”.
In: NeuroImage 249 (2022), p. 118871.
[87] Hyunwoo Lee et al. “Estimating and accounting for the effect of MRI scanner changes
on longitudinal whole-brain volume change measurements”. In: NeuroImage 184 (2019),
pp. 555–565.
[88] Wen Zhang et al. “Deep representation learning for multimodal brain networks”. In: International Conference on Medical Image Computing and Computer-Assisted Intervention.
Springer. 2020, pp. 613–624.
[89] Jinbo Xing, Wenbo Hu, and Tien-Tsin Wong. “Scale-arbitrary Invertible Image Downscaling”. In: arXiv preprint arXiv:2201.12576 (2022).
[90] Chao Dong et al. “Learning a deep convolutional network for image super-resolution”. In:
European conference on computer vision. Springer. 2014, pp. 184–199.
[91] Yulun Zhang et al. “Image super-resolution using very deep residual channel attention networks”. In: Proceedings of the European Conference on Computer Vision (ECCV). 2018,
pp. 286–301.
[92] Chenzhong Yin et al. “Anatomically interpretable deep learning of brain age captures
domain-specific cognitive impairment”. In: Proceedings of the National Academy of Sciences 120.2 (2023), e2214634120.
[93] Yuan Tian et al. “Self-conditioned probabilistic learning of video rescaling”. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021, pp. 4490–4499.
[94] Zhihong Pan et al. “Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization
and Cycle Idempotence”. In: (2022), pp. 17389–17398.
[95] Heewon Kim et al. “Task-aware image downscaling”. In: Proceedings of the European
Conference on Computer Vision (ECCV). 2018, pp. 399–414.
[96] Wanjie Sun and Zhenzhong Chen. “Learned image downscaling for upscaling using content adaptive resampler”. In: IEEE Transactions on Image Processing 29 (2020), pp. 4027–
4040.
[97] Yue Li et al. “Learning a convolutional neural network for image compact-resolution”. In:
IEEE Transactions on Image Processing 28.3 (2018), pp. 1092–1107.
[98] Min Zhang et al. “Enhancing Image Rescaling using Dual Latent Variables in Invertible
Neural Network”. In: arXiv preprint arXiv:2207.11844 (2022).
[99] Shang Li et al. “Approaching the limit of image rescaling via flow guidance”. In: arXiv
preprint arXiv:2111.05133 (2021).
[100] Sefi Bell-Kligler, Assaf Shocher, and Michal Irani. “Blind super-resolution kernel estimation using an internal-GAN”. In: Advances in Neural Information Processing Systems 32
(2019).
[101] Marco Bevilacqua et al. “Low-Complexity Single-Image Super-Resolution based on Nonnegative Neighbor Embedding”. In: Proceedings of the British Machine Vision Conference.
BMVA Press, 2012, pp. 135.1–135.10.
[102] Roman Zeyde, Michael Elad, and Matan Protter. “On single image scale-up using sparse-representations”. In: International conference on curves and surfaces. Springer. 2010,
pp. 711–730.
[103] David Martin et al. “A database of human segmented natural images and its application to
evaluating segmentation algorithms and measuring ecological statistics”. In: Proceedings
Eighth IEEE International Conference on Computer Vision. ICCV 2001. Vol. 2. IEEE.
2001, pp. 416–423.
[104] Jia-Bin Huang, Abhishek Singh, and Narendra Ahuja. “Single image super-resolution from
transformed self-exemplars”. In: Proceedings of the IEEE conference on computer vision
and pattern recognition. 2015, pp. 5197–5206.
[105] Eirikur Agustsson and Radu Timofte. “NTIRE 2017 challenge on single image super-resolution: Dataset and study”. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2017, pp. 126–135.
[106] Jingyun Liang et al. “Hierarchical conditional flow: A unified framework for image super-resolution and image rescaling”. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021, pp. 4076–4085.
[107] Laura Marczak, Kevin O’Rourke, and Dawn Shepard. “When and why people die in the
United States, 1990-2013”. In: Jama 315.3 (2016), pp. 241–241.
[108] Ho Il Yoon and Don D Sin. “Confronting the colossal crisis of COPD in China”. In: Chest
139.4 (2011), pp. 735–736.
[109] Rafael Lozano et al. “Global and regional mortality from 235 causes of death for 20 age groups in 1990 and 2010: a systematic analysis for the Global Burden of Disease Study 2010”. In: Lancet 380.9859 (2012), pp. 2095–2128.
[110] Jørgen Vestbo et al. “Global strategy for the diagnosis, management, and prevention of
chronic obstructive pulmonary disease: GOLD executive summary”. In: American journal
of respiratory and critical care medicine 187.4 (2013), pp. 347–365.
[111] Alvar Agustí and Rosa Faner. “COPD beyond smoking: new paradigm, novel opportunities”. In: The Lancet Respiratory Medicine (2018).
[112] Alvar Agustí et al. “Treatable traits: toward precision medicine of chronic airway diseases”.
In: European Respiratory Journal 47.2 (2016), pp. 410–419.
[113] Bartolome R Celli et al. “Standards for the diagnosis and treatment of patients with COPD:
a summary of the ATS/ERS position paper”. In: European Respiratory Journal 23.6 (2004),
pp. 932–946.
[114] Huib AM Kerstjens. “The GOLD classification has not advanced understanding of COPD”.
In: American journal of respiratory and critical care medicine 170.3 (2004), pp. 212–213.
[115] Marc Miravitlles et al. “A review of national guidelines for management of COPD in Europe”. In: European Respiratory Journal 47.2 (2016), pp. 625–637.
[116] Lucas MA Goossens et al. “Does the 2013 GOLD classification improve the ability to
predict lung function decline, exacerbations and mortality: a post-hoc analysis of the 4-
year UPLIFT trial”. In: BMC pulmonary medicine 14.1 (2014), p. 163.
[117] Filip Velickovski et al. “Automated spirometry quality assurance: supervised learning from
multiple experts”. In: IEEE journal of biomedical and health informatics 22.1 (2018),
pp. 276–284.
[118] James W Dodd et al. “The COPD assessment test (CAT): response to pulmonary rehabilitation. A multicentre, prospective study”. In: Thorax 66.5 (2011), pp. 425–429.
[119] PW Jones et al. “Development and first validation of the COPD Assessment Test”. In:
European Respiratory Journal 34.3 (2009), pp. 648–654.
[120] Pierre-Régis Burgel et al. “A simple algorithm for the identification of clinical COPD
phenotypes”. In: European Respiratory Journal 50.5 (2017), p. 1701034.
[121] Miguel J Divo et al. “Chronic obstructive pulmonary disease comorbidities network”. In:
European Respiratory Journal (2015), ERJ–01716.
[122] Yale Chang et al. “COPD subtypes identified by network-based clustering of blood gene
expression”. In: Genomics 107.2 (2016), pp. 51–58.
[123] Xu Min, Bin Yu, and Fei Wang. “Predictive modeling of the hospital readmission risk from
patients’ claims data using machine learning: a case study on COPD”. In: Scientific reports
9.1 (2019), pp. 1–10.
[124] Stefan Mihaicuta et al. “Network science meets respiratory medicine for OSAS phenotyping and severity prediction”. In: PeerJ 5 (2017), e3289.
[125] Hang Ding et al. “A mobile-health system to manage chronic obstructive pulmonary disease patients at home”. In: Engineering in Medicine and Biology Society (EMBC), 2012
Annual International Conference of the IEEE. IEEE. 2012, pp. 2178–2181.
[126] Maxine Hardinge et al. “Using a mobile health application to support self-management
in chronic obstructive pulmonary disease: a six-month cohort study”. In: BMC medical
informatics and decision making 15.1 (2015), p. 46.
[127] Alyssa Cairns et al. “A pilot validation study for the NOX T3 TM portable monitor for the
detection of OSA”. In: Sleep and Breathing 18.3 (2014), pp. 609–614.
[128] Peter Mukli, Zoltan Nagy, and Andras Eke. “Multifractal formalism by enforcing the universal behavior of scaling functions”. In: Physica A: Statistical Mechanics and its Applications 417 (2015), pp. 150–167.
[129] Benoit B Mandelbrot and John W Van Ness. “Fractional Brownian motions, fractional
noises and applications”. In: SIAM review 10.4 (1968), pp. 422–437.
[130] Andras Eke et al. “Physiological time series: distinguishing fractal noises from motions”.
In: Pflügers Archiv 439.4 (2000), pp. 403–415.
[131] Jan W Kantelhardt et al. “Multifractal detrended fluctuation analysis of nonstationary time
series”. In: Physica A: Statistical Mechanics and its Applications 316.1-4 (2002), pp. 87–
114.
[132] C-K Peng et al. “Mosaic organization of DNA nucleotides”. In: Physical review e 49.2
(1994), p. 1685.
[133] C-K Peng et al. “Quantifying fractal dynamics of human respiration: age and gender effects”. In: Annals of biomedical engineering 30.5 (2002), pp. 683–692.
[134] Aicko Y Schumann et al. “Aging effects on cardiac and respiratory dynamics in healthy
subjects across sleep stages”. In: Sleep 33.7 (2010), pp. 943–955.
[135] Frigyes Samuel Racz et al. “Multifractal dynamic functional connectivity in the resting-state brain”. In: Frontiers in Physiology 9 (2018), p. 1704.
[136] Amir Bashan et al. “Comparison of detrending methods for fluctuation analysis”. In: Physica A: Statistical Mechanics and its Applications 387.21 (2008), pp. 5080–5090.
[137] Espen Alexander Fürst Ihlen. “Introduction to multifractal detrended fluctuation analysis in Matlab”. In: Frontiers in physiology 3 (2012), p. 141.
[138] Pauli Virtanen et al. “SciPy 1.0: Fundamental Algorithms for Scientific Computing in
Python”. In: Nature Methods 17 (2020), pp. 261–272. DOI: 10.1038/s41592-019-0686-2.
[139] Ingram Olkin and Friedrich Pukelsheim. “The distance between two random vectors with
given dispersion matrices”. In: Linear Algebra and its Applications 48 (1982), pp. 257–
263.
[140] Ariel Jaitovich and Esther Barreiro. “Skeletal muscle dysfunction in chronic obstructive
pulmonary disease. What we know and can do for our patients”. In: American journal of
respiratory and critical care medicine 198.2 (2018), pp. 175–186.
[141] Sunita Mathur, Dina Brooks, and Celso RF Carvalho. “Structural alterations of skeletal
muscle in COPD”. In: Frontiers in physiology 5 (2014), p. 104.
[142] Amany F Elbehairy et al. “Pulmonary gas exchange abnormalities in mild chronic obstructive pulmonary disease. Implications for dyspnea and exercise intolerance”. In: American
journal of respiratory and critical care medicine 191.12 (2015), pp. 1384–1394.
[143] Paul Bogdan et al. “Heterogeneous structure of stem cells dynamics: statistical models and
quantitative predictions”. In: Scientific reports 4 (2014).
[144] Gaurav Gupta, Sergio Pequito, and Paul Bogdan. “Dealing with Unknown Unknowns:
Identification and Selection of Minimal Sensing for Fractional Dynamics with Unknown
Inputs”. In: arXiv preprint arXiv:1803.04866 (2018).
[145] Yuankun Xue, Saul Rodriguez, and Paul Bogdan. “A spatio-temporal fractal model for a
CPS approach to brain-machine-body interfaces”. In: Design, Automation & Test in Europe
Conference & Exhibition (DATE), 2016. IEEE. 2016, pp. 642–647.
[146] Mahboobeh Ghorbani and Paul Bogdan. “A cyber-physical system approach to artificial
pancreas design”. In: Proceedings of the ninth IEEE/ACM/IFIP international conference
on hardware/software codesign and system synthesis. IEEE Press. 2013, p. 17.
[147] Yuankun Xue et al. “Minimum number of sensors to ensure observability of physiological
systems: A case study”. In: Communication, Control, and Computing (Allerton), 2016 54th
Annual Allerton Conference on. IEEE. 2016, pp. 1181–1188.
[148] Yuankun Xue and Paul Bogdan. “Constructing compact causal mathematical models for
complex dynamics”. In: Proceedings of the 8th International Conference on Cyber-Physical
Systems. ACM. 2017, pp. 97–107.
[149] Chenzhong Yin, Gaurav Gupta, and Paul Bogdan. “Discovering Laws from Observations:
A Data-Driven Approach”. In: International Conference on Dynamic Data Driven Application Systems. Springer. 2020, pp. 302–310.
[150] Andrzej Dzielinski and Dominik Sierociuk. “Adaptive Feedback Control of Fractional Order Discrete State-Space Systems”. In: CIMCA-IAWTIC. 2005.
[151] Gaurav Gupta, Sérgio Pequito, and Paul Bogdan. “Re-thinking EEG-based Non-invasive
Brain Interfaces: Modeling and Analysis”. In: Proceedings of the 9th ACM/IEEE International Conference on Cyber-Physical Systems. ICCPS ’18. Porto, Portugal: IEEE Press,
2018, pp. 275–286.
[152] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. “Deep learning”. In: nature 521.7553
(2015), pp. 436–444.
[153] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning (Adaptive Computation and Machine Learning). 2016.
[154] Garvesh Raskutti, Martin J Wainwright, and Bin Yu. “Early stopping and non-parametric
regression: an optimal data-dependent stopping rule”. In: The Journal of Machine Learning
Research 15.1 (2014), pp. 335–366.
[155] Chuanqi Tan et al. “A survey on deep transfer learning”. In: International conference on
artificial neural networks. Springer. 2018, pp. 270–279.
[156] Charlotte E Bolton et al. “Lung consequences in adults born prematurely”. In: Thorax 70.6
(2015), pp. 574–580.
[157] Stephen I Rennard and M Bradley Drummond. “Early chronic obstructive pulmonary disease: definition, assessment, and prevention”. In: The Lancet 385.9979 (2015), pp. 1778–
1788.
[158] SI Rennard, A Agustí, and J Vestbo. “The natural history of COPD: beyond Fletcher and
Peto”. In: BRN Rev 1.2 (2015), pp. 116–30.
[159] Zoë L Borrill et al. “The use of plethysmography and oscillometry to compare long-acting
bronchodilators in patients with COPD”. In: British journal of clinical pharmacology 65.2
(2008), pp. 244–252.
[160] Sophia Frantz et al. “Impulse oscillometry may be of value in detecting early manifestations of COPD”. In: Respiratory medicine 106.8 (2012), pp. 1116–1123.
[161] Plamen Ch Ivanov et al. “Levels of complexity in scale-invariant neural signals”. In: Physical Review E 79.4 (2009), p. 041920.
[162] Thibaud Soumagne et al. “Quantitative and qualitative evaluation of spirometry for COPD
screening in general practice”. In: Respiratory medicine and research 77 (2020), pp. 31–
36.
[163] DJ Weiss et al. “Global maps of travel time to healthcare facilities”. In: Nature Medicine
26.12 (2020), pp. 1835–1838.
[164] Mikail Rubinov and Olaf Sporns. “Complex network measures of brain connectivity: uses
and interpretations”. In: Neuroimage 52.3 (2010), pp. 1059–1069.
[165] Michael Breakspear. “Dynamic models of large-scale brain activity”. In: Nature neuroscience 20.3 (2017), p. 340.
[166] Ed Bullmore and Olaf Sporns. “The economy of brain network organization”. In: Nature
Reviews Neuroscience 13.5 (2012), p. 336.
[167] Caio Seguin, Martijn P Van Den Heuvel, and Andrew Zalesky. “Navigation of brain networks”. In: Proceedings of the National Academy of Sciences 115.24 (2018), pp. 6297–
6302.
[168] Danielle S Bassett et al. “Dynamic reconfiguration of human brain networks during learning”. In: Proceedings of the National Academy of Sciences 108.18 (2011), pp. 7641–7646.
[169] Martijn P Van den Heuvel, Edward T Bullmore, and Olaf Sporns. “Comparative connectomics”. In: Trends in cognitive sciences 20.5 (2016), pp. 345–361.
[170] Danielle S Bassett and Edward T Bullmore. “Small-world brain networks revisited”. In:
The Neuroscientist 23.5 (2017), pp. 499–516.
[171] Andrea Avena-Koenigsberger, Bratislav Misic, and Olaf Sporns. “Communication dynamics in complex brain networks”. In: Nature Reviews Neuroscience 19.1 (2018), p. 17.
[172] Lianchun Yu et al. “Efficient coding and energy efficiency are promoted by balanced excitatory and inhibitory synaptic currents in neuronal network”. In: Frontiers in Cellular
Neuroscience 12 (2018), p. 123.
[173] Kristina Schulz et al. “Simultaneous BOLD fMRI and fiber-optic calcium recording in rat
neocortex”. In: Nature methods 9.6 (2012), pp. 597–602.
[174] Angelika Steger and Nicholas C Wormald. “Generating random regular graphs quickly”.
In: Combinatorics, Probability and Computing 8.4 (1999), pp. 377–396.
[175] Edgar N Gilbert. “Random graphs”. In: The Annals of Mathematical Statistics 30.4 (1959),
pp. 1141–1144.
[176] Duncan J Watts and Steven H Strogatz. “Collective dynamics of ‘small-world’ networks”.
In: nature 393.6684 (1998), p. 440.
[177] Albert-László Barabási and Réka Albert. “Emergence of scaling in random networks”. In:
science 286.5439 (1999), pp. 509–512.
[178] Sara Teller et al. “Emergence of assortative mixing between clusters of cultured neurons”.
In: PLoS Comput Biol 10.9 (2014), e1003796.
[179] Juan G Restrepo, Edward Ott, and Brian R Hunt. “Approximating the largest eigenvalue
of network adjacency matrices”. In: Physical Review E 76.5 (2007), p. 056119.
[180] Mark EJ Newman. “Clustering and preferential attachment in growing networks”. In: Physical review E 64.2 (2001), p. 025102.
[181] Boris Gourévitch and Jos J Eggermont. “Evaluating information transfer between auditory
cortical neurons”. In: Journal of neurophysiology 97.3 (2007), pp. 2533–2543.
[182] Tore Opsahl, Filip Agneessens, and John Skvoretz. “Node centrality in weighted networks:
Generalizing degree and shortest paths”. In: Social networks 32.3 (2010), pp. 245–251.
[183] Jules Lallouette et al. “Sparse short-distance connections enhance calcium wave propagation in a 3D model of astrocyte networks”. In: Frontiers in computational neuroscience 8
(2014), p. 45.
[184] Marc Barthélemy. “Spatial networks”. In: Physics Reports 499.1-3 (2011), pp. 1–101.
[185] Ruochen Yang and Paul Bogdan. “Controlling the multifractal generating measures of
complex networks”. In: Scientific Reports 10.1 (2020), pp. 1–13.
[186] Norbert Jaušovec. “The neural code of intelligence: From correlation to causation”. In:
Physics of life reviews (2019).
[187] Roberto Colom, Rex E Jung, and Richard J Haier. “Distributed brain sites for the g-factor
of intelligence”. In: Neuroimage 31.3 (2006), pp. 1359–1365.
[188] John Duncan et al. “A neural basis for general intelligence”. In: Science 289.5478 (2000),
pp. 457–460.
[189] Yuankun Xue and Paul Bogdan. “Reliable multi-fractal characterization of weighted complex networks: algorithms and implications”. In: Scientific reports 7.1 (2017), p. 7487.
[190] Javier G Orlandi et al. “Noise focusing and the emergence of coherent activity in neuronal
cultures”. In: Nature Physics 9.9 (2013), pp. 582–590.
[191] Anna Levina, J Michael Herrmann, and Theo Geisel. “Dynamical synapses causing self-organized criticality in neural networks”. In: Nature physics 3.12 (2007), pp. 857–860.
[192] Christopher J Honey, Jean-Philippe Thivierge, and Olaf Sporns. “Can structure predict
function in the human brain?” In: Neuroimage 52.3 (2010), pp. 766–776.
[193] Lluís Hernández-Navarro et al. “Dominance of metric correlations in two-dimensional neuronal cultures described through a random field Ising model”. In: Physical review letters
118.20 (2017), p. 208101.
[194] Elisenda Tibau Martorell et al. “Neuronal spatial arrangement shapes effective connectivity traits of in vitro cortical networks”. In: IEEE Transactions on Network Science and
Engineering (2018).
[195] Zhuo Wang et al. “Spatial light interference microscopy (SLIM)”. In: Optics express 19.2
(2011), pp. 1016–1026.
[196] David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning internal representations by error propagation. Tech. rep. California Univ San Diego La Jolla Inst for
Cognitive Science, 1985.
[197] Michael SC Thomas and James L McClelland. “Connectionist models of cognition.” In:
(2008).
[198] Léon Bottou et al. “Stochastic gradient learning in neural networks”. In: Proceedings of
Neuro-Nîmes 91.8 (1991), p. 12.
[199] Raul Rojas. “The backpropagation algorithm”. In: Neural networks. Springer, 1996, pp. 149–
182.
[200] Henry Markram, Wulfram Gerstner, and Per Jesper Sjöström. “A history of spike-timing-dependent plasticity”. In: Frontiers in synaptic neuroscience 3 (2011), p. 4.
[201] Serge Moscovici and Marisa Zavalloni. “The group as a polarizer of attitudes.” In: Journal
of personality and social psychology 12.2 (1969), p. 125.
[202] OJ O’Loan and MR Evans. “Alternating steady state in one-dimensional flocking”. In:
Journal of Physics A: Mathematical and General 32.8 (1999), p. L99.
[203] Peter Friedl, Yael Hegerfeldt, and Miriam Tusch. “Collective cell migration in morphogenesis and cancer”. In: International Journal of Developmental Biology 48.5-6 (2004),
pp. 441–449.
[204] John J Hopfield. “Neural networks and physical systems with emergent collective computational abilities”. In: Proceedings of the national academy of sciences 79.8 (1982),
pp. 2554–2558.
[205] Warren S McCulloch and Walter Pitts. “A logical calculus of the ideas immanent in nervous
activity”. In: The bulletin of mathematical biophysics 5.4 (1943), pp. 115–133.
[206] John J Hopfield. “Neurons with graded response have collective computational properties
like those of two-state neurons”. In: Proceedings of the national academy of sciences 81.10
(1984), pp. 3088–3092.
[207] Gašper Tkačik et al. “Searching for collective behavior in a large network of sensory neurons”. In: PLoS computational biology 10.1 (2014), e1003408.
[208] Jingyi Qu and Rubin Wang. “Collective behavior of large-scale neural networks with GPU
acceleration”. In: Cognitive neurodynamics 11.6 (2017), pp. 553–563.
[209] Stefano Luccioli and Antonio Politi. “Irregular collective behavior of heterogeneous neural
networks”. In: Physical review letters 105.15 (2010), p. 158104.
[210] Leenoy Meshulam et al. “Collective behavior of place and non-place neurons in the hippocampal network”. In: Neuron 96.5 (2017), pp. 1178–1191.
[211] Natalia Caporale and Yang Dan. “Spike timing–dependent plasticity: a Hebbian learning
rule”. In: Annu. Rev. Neurosci. 31 (2008), pp. 25–46.
[212] Yuwen Xiong, Mengye Ren, and Raquel Urtasun. “Loco: Local contrastive representation
learning”. In: Advances in neural information processing systems 33 (2020), pp. 11142–
11153.
[213] Tamás Vicsek et al. “Novel type of phase transition in a system of self-driven particles”.
In: Physical review letters 75.6 (1995), p. 1226.
[214] Roland Bouffanais. Design and control of swarm dynamics. Vol. 1. Springer, 2016.
[215] Tamás Vicsek and Anna Zafeiris. “Collective motion”. In: Physics reports 517.3-4 (2012),
pp. 71–140.
[216] Eugene Belilovsky, Michael Eickenberg, and Edouard Oyallon. “Decoupled greedy learning of CNNs”. In: International Conference on Machine Learning. PMLR. 2020, pp. 736–
745.
[217] Myeongjang Pyeon et al. “Sedona: Search for decoupled neural networks toward greedy
block-wise learning”. In: International Conference on Learning Representations. 2020.
[218] Yann LeCun. “The MNIST database of handwritten digits”. In: http://yann.lecun.com/exdb/mnist/
(1998).
[219] Joel Veness et al. “Gated linear networks”. In: arXiv preprint arXiv:1910.01526 (2019).
[220] Norbert Tarcai et al. “Patterns, transitions and the role of leaders in the collective dynamics
of a simple robotic flock”. In: Journal of Statistical Mechanics: Theory and Experiment
2011.04 (2011), P04010.
[221] Eugene Belilovsky, Michael Eickenberg, and Edouard Oyallon. “Greedy layerwise learning can scale to ImageNet”. In: International conference on machine learning. PMLR.
2019, pp. 583–593.
[222] Timothy P Lillicrap et al. “Random synaptic feedback weights support error backpropagation for deep learning”. In: Nature communications 7.1 (2016), pp. 1–10.
[223] Atılım Güneş Baydin et al. “Gradients without backpropagation”. In: arXiv preprint arXiv:2202.08587
(2022).
[224] Mengye Ren et al. “Scaling forward gradient with local losses”. In: arXiv preprint arXiv:2210.03310
(2022).
[225] Sergey Bartunov et al. “Assessing the scalability of biologically-motivated deep learning
algorithms and architectures”. In: Advances in neural information processing systems 31
(2018).
[226] Ching-Hsun Tseng et al. “UPANets: Learning from the Universal Pixel Attention Networks”.
In: Entropy 24.9 (2022), p. 1243.
[227] Jang-Hyun Kim, Wonho Choo, and Hyun Oh Song. “Puzzle mix: Exploiting saliency and
local statistics for optimal mixup”. In: International Conference on Machine Learning.
PMLR. 2020, pp. 5275–5285.
[228] Zelin Zang et al. “Dlme: Deep local-flatness manifold embedding”. In: Computer Vision–
ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXI. Springer. 2022, pp. 576–592.
[229] Dimitrios Konstantinidis et al. “Multi-manifold Attention for Vision Transformers”. In:
arXiv preprint arXiv:2207.08569 (2022).
[230] Ting Chen et al. “A Simple Framework for Contrastive Learning of Visual Representations”. In: arXiv preprint arXiv:2002.05709 (2020).
[231] Martín Abadi et al. “Tensorflow: A system for large-scale machine learning”. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16).
2016, pp. 265–283.
[232] Alex Krizhevsky, Geoffrey Hinton, et al. “Learning multiple layers of features from tiny
images”. In: (2009).
[233] Ya Le and Xuan Yang. “Tiny imagenet visual recognition challenge”. In: CS 231N 7.7
(2015), p. 3.
[234] J Krause et al. “Leadership in fish shoals”. In: Fish and Fisheries 1.1 (2000), pp. 82–89.
[235] Karen Simonyan and Andrew Zisserman. “Very deep convolutional networks for large-scale image recognition”. In: arXiv preprint arXiv:1409.1556 (2014).
[236] Kaiming He et al. “Deep residual learning for image recognition”. In: Proceedings of the
IEEE conference on computer vision and pattern recognition. 2016, pp. 770–778.
[237] Christian Szegedy et al. “Going deeper with convolutions”. In: Proceedings of the IEEE
conference on computer vision and pattern recognition. 2015, pp. 1–9.
[238] Jolyon J Faria et al. “Leadership and social information use in human crowds”. In: Animal
Behaviour 79.4 (2010), pp. 895–901.
[239] Hesham Mostafa, Vishwajith Ramesh, and Gert Cauwenberghs. “Deep supervised learning
using local errors”. In: Frontiers in neuroscience (2018), p. 608.
[240] Meredith A Shafto et al. “The Cambridge Centre for Ageing and Neuroscience (CamCAN) study protocol: A cross-sectional, lifespan, multidisciplinary examination of healthy
cognitive ageing”. In: BMC Neurology 14.1 (2014), pp. 1–25.
[241] Franz Faul et al. “Statistical power analyses using G*Power 3.1: Tests for correlation and
regression analyses”. In: Behavior Research Methods 41.4 (2009), pp. 1149–1160.
[242] MR Miller. “ATS/ERS task force: standardisation of spirometry”. In: Eur Respir J 26
(2005), pp. 319–338.
[243] Brian L Graham et al. “Standardization of spirometry 2019 update. An official American Thoracic Society and European Respiratory Society technical statement”. In: American
journal of respiratory and critical care medicine 200.8 (2019), e70–e88.
[244] Romain A Pauwels et al. “Global strategy for the diagnosis, management, and prevention
of chronic obstructive pulmonary disease: NHLBI/WHO Global Initiative for Chronic Obstructive Lung Disease (GOLD) Workshop summary”. In: American journal of respiratory
and critical care medicine 163.5 (2001), pp. 1256–1276.
[245] MR Miller et al. “Standardization of spirometry Series “ATS/ERS Task Force”: Standardization of Lung Function Testing”. In: European Respiratory Journal 26.2 (2005), p. 321.
[246] Richard Glover, Brendan Cooper, and Julie Lloyd. “Forced expiratory time (FET) as an
indicator for airways obstruction”. In: European Respiratory Journal 44.Suppl 58 (2014).
[247] Paul L Enright, Kenneth C Beck, and Duane L Sherrill. “Repeatability of spirometry in
18,000 adult patients”. In: American journal of respiratory and critical care medicine 169.2
(2004), pp. 235–238.
[248] John L Hankinson et al. “Use of forced vital capacity and forced expiratory volume in 1
second quality criteria for determining a valid test”. In: European Respiratory Journal 45.5
(2015), pp. 1283–1292.
[249] Igor Barjaktarevic et al. “Bronchodilator responsiveness or reversibility in asthma and
COPD–a need for clarity”. In: International journal of chronic obstructive pulmonary disease 13 (2018), p. 3511.
[250] Matthew J Hegewald, Heather M Gallo, and Emily L Wilson. “Accuracy and quality of
spirometry in primary care offices”. In: Annals of the American Thoracic Society 13.12
(2016), pp. 2119–2124.
[251] Matthew Hegewald et al. Accuracy of spirometers used in primary care. 2015.
[252] Klaus F Rabe et al. “Global strategy for the diagnosis, management, and prevention of
chronic obstructive pulmonary disease: GOLD executive summary”. In: American journal
of respiratory and critical care medicine 176.6 (2007), pp. 532–555.
[253] Bartolome Celli et al. “Perception of symptoms and quality of life–comparison of patients’
and physicians’ views in the COPD MIRROR study”. In: International journal of chronic
obstructive pulmonary disease 12 (2017), p. 2189.
[254] Paul W Jones et al. “COPD: the patient perspective”. In: International journal of chronic
obstructive pulmonary disease 11.Spec Iss (2016), p. 13.
[255] Evdoxia Gogou et al. “Underestimation of respiratory symptoms by smokers: a thorn
in chronic obstructive pulmonary disease diagnosis”. In: NPJ primary care respiratory
medicine 31.1 (2021), pp. 1–8.
[256] Raúl H Sansores et al. “Prevalence of chronic obstructive pulmonary disease in asymptomatic smokers”. In: International journal of chronic obstructive pulmonary disease 10
(2015), p. 2357.
[257] Kirsten Bibbins-Domingo et al. “Statin use for the primary prevention of cardiovascular
disease in adults: US Preventive Services Task Force recommendation statement”. In: Jama
316.19 (2016), pp. 1997–2007.
[258] Joan B Soriano, Jan Zielinski, and David Price. “Screening for and early detection of
chronic obstructive pulmonary disease”. In: The Lancet 374.9691 (2009), pp. 721–732.
[259] Pablo Sanchez-Salcedo et al. “Disease progression in young patients with COPD: rethinking the Fletcher and Peto model”. In: European Respiratory Journal 44.2 (2014), pp. 324–
331.
[260] Yunus Çolak et al. “Prognosis of asymptomatic and symptomatic, undiagnosed COPD in
the general population in Denmark: a prospective cohort study”. In: The Lancet Respiratory
Medicine 5.5 (2017), pp. 426–434.
[261] Mikhail E Kandel et al. “Real-time halo correction in phase contrast imaging”. In: Biomedical optics express 9.2 (2018), pp. 623–635.
[262] Mikhail E Kandel et al. “Epi-illumination gradient light interference microscopy for imaging opaque structures”. In: Nature communications 10.1 (2019), pp. 1–9.
[263] Stephen P Borgatti. “Centrality and network flow”. In: Social networks 27.1 (2005), pp. 55–
71.
[264] Mark EJ Newman, Albert-László Barabási, and Duncan J Watts, eds. The structure and
dynamics of networks. Princeton university press, 2006.
[265] Mark EJ Newman. “The structure and function of complex networks”. In: SIAM review
45.2 (2003), pp. 167–256.
[266] Pedro G Lind, Marta C Gonzalez, and Hans J Herrmann. “Cycles and clustering in bipartite
networks”. In: Physical review E 72.5 (2005), p. 056127.
Appendices
A Interpretable deep learning based brain age prediction
architecture
A.1 Cognitive measures
A.1.1 CamCAN
Thirteen cognitive measures that assess emotional processing, executive function, memory, and
motor function were obtained from the CamCAN repository [42, 240]. Emotional processing
was measured via i) Ekman’s emotion expression recognition test, ii) the emotional memory test,
and iii) the emotional regulation test. Higher scores on Ekman’s emotion expression test indicate
greater recognition of facial expressions and emotions. Higher d′ (correct rejections vs. correct
recognitions) values for negative stimuli on the emotional memory test indicate better explicit
memory for emotionally laden stimuli. Higher ratings on the emotional regulation test indicate
greater ability to regulate emotional responses. Executive function was measured using i) Cattell’s
fluid intelligence test, ii) the hotel test, and iii) a proverb comprehension task. Higher total scores
on Cattell’s fluid intelligence test across its four sub-tests indicate greater intelligence and mental
control. Shorter times on each of the five trials of the hotel test indicate greater complex planning
and multitasking ability. More proverbs correctly interpreted during the proverb comprehension
test indicate greater executive function and abstraction ability. Memory was measured using i)
Benton’s face recognition test, ii) the famous faces test, iii) a picture priming task, iv) the ToT
test and v) a VSTM task. A greater number of correct responses on Benton’s face recognition test
indicates better recognition of newly seen faces, while a greater number of correct responses on
the famous faces test indicates better recognition of well-known faces. Worse semantic memory
is indexed by slower word finding time on the picture priming task. A greater proportion of ToT
responses on the eponymous test indicates worse name recall and lexical production. Finally, the
VSTM task measures working memory as the capacity for reproducing colors in sequence, with
better memory indexed by higher scores. Motor function was assessed via i) a force matching
task, ii) a motor learning task, iii) an RT ‘choice’ task, and iv) an RT ‘simple’ task. Greater mean
over-compensation on the force matching task indicates worse motor control and sensorimotor
integration ability. Slower mean reaction times on the motor learning task indicates worse motor
adaptation. Slower mean reaction times on both the ‘choice’ and ‘simple’ tasks indicate worse
response speed for actions requiring decision-making and automatic processing, respectively.
A.1.2 ADNI
Four established dementia rating scales measuring neural function were obtained from the ADNI
repository [43], including i) the clinical dementia rating scale – sum of boxes (CDRSB), ii) the diagnostic Alzheimer’s disease assessment scale (ADAS) versions 11 and 13, and iii) the mini-mental
state exam (MMSE). CDRSB indexes the degree of CI across six categories (memory, orientation,
judgement/problem solving, community affairs, home/hobbies, and personal care), with higher
scores indicating greater CI. Higher total scores on the ADAS also suggest greater CI. Question 4
of the ADAS (delayed word recall) was extracted from the total score for further analysis (ADAS
Q4). The MMSE is frequently used to screen for CI; scores below 26 out of 30 suggest CI. Cognitive
performance was measured via four neuropsychological measures: i) the RAVLT, ii) delayed recall
on the logical memory test, iii) the digit symbol substitution test, and iv) the trail-making test. The
RAVLT is a verbal memory test involving recall of 15 words across 7 trials over a period of 30
minutes; higher scores index better performance. Measures from the RAVLT include the number
of words learned over trials 1-5 (learning), the number of words recalled after a short delay/trial
6 (immediate recall (IR)), the number of words forgotten after a long delay/trial 7 (forgetting),
and the percentage of words forgotten (P). On the logical memory test, a higher number of story
details remembered after a 30-minute delay indicates better verbal memory performance. The
digit symbol substitution test is a measure of psychomotor processing speed and attention; better
performance is indicated by a greater number of symbols substituted in the 90-second time limit.
The trail-making test quantifies visuomotor function, perceptual-scanning, and cognitive flexibility, with longer times for worse performance. Functional impairment was measured via the FAQ,
an informant-rated opinion on the subject’s ability to carry out ten complex tasks, such as managing
finances or remembering appointments; higher scores represent greater disability.
A.2 Supplementary plots
Sagittal, axial, and coronal planes of the average saliency maps for males are provided in Fig. 7.1,
Fig. 7.2, and Fig. 7.3, and for females in Fig. 7.4, Fig. 7.5, and Fig. 7.6. Sagittal, axial, and coronal planes of the average saliency maps for CN participants are provided in Fig. 7.7, Fig. 7.8, and
Fig. 7.9, and for patients with CI in Fig. 7.10, Fig. 7.11, and Fig. 7.12.
Figure 7.1: Sagittal planes of the average salience maps for CN males.
Figure 7.2: Same as Fig. 7.1, but for axial planes.
Figure 7.3: Same as Fig. 7.1, but for coronal planes.
Figure 7.4: Sagittal planes of the average salience maps for CN females.
Figure 7.5: Same as Fig. 7.4, but for axial planes in females.
Figure 7.6: Same as Fig. 7.4, but for coronal planes in females.
Figure 7.7: Sagittal planes of the average salience maps for CN participants.
Figure 7.8: Same as Fig. 7.7, but for axial planes.
Figure 7.9: Same as Fig. 7.7, but for coronal planes.
Figure 7.10: Sagittal planes of the average salience maps for CI participants.
Figure 7.11: Same as Fig. 7.10, but for axial planes.
Figure 7.12: Same as Fig. 7.10, but for coronal planes.
A.3 Correlations with neurocognitive function
Correlations of neurocognitive measures with CA and BA are reported for CN participants from CamCAN (Table 7.1), CN participants from ADNI (Table 7.2), and patients with CI (MCI in Table 7.3, AD in Table 7.4, and MCI and AD combined in Table 7.5). Fisher’s z is used to compare the
correlations of neurocognitive measures with CA and BA. Statistical power (1−β) was calculated
for each z-test using G*Power 3.1 [241].
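To make this comparison concrete, the z-test can be sketched in a few lines of Python (a minimal illustration, not the analysis code used here; the function name is ours, and it treats the two correlations as independent, which neglects the dependence induced by the shared sample):

import numpy as np
from scipy.stats import norm

def fisher_z_compare(r1, n1, r2, n2):
    # Fisher r-to-z transform of each correlation
    z1, z2 = np.arctanh(r1), np.arctanh(r2)
    # Standard error of z1 - z2, assuming independent samples
    se = np.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    return z, 2.0 * norm.sf(abs(z))  # two-sided p-value for H0: r1 = r2

For example, the Benton faces row of Table 7.1 (rS(CA) = -0.469, rS(BA) = -0.448, N = 622) gives |z| ≈ 0.47 and p ≈ 0.64, approximately reproducing the tabulated values up to sign convention and rounding of the inputs.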
neurocognitive measure N age measure rS p SE CIL CIU z p 1−β
Benton faces 622 CA -0.469 2.490E-35 0.001 -0.528 -0.402 0.476 0.634 0.076 BA -0.448 5.675E-32 0.001 -0.509 -0.382
fluid intelligence 621 CA -0.672 1.041E-82 0.001 -0.712 -0.625 0.778 0.436 0.122 BA -0.647 7.632E-75 0.001 -0.690 -0.597
emotion recognition 42 CA -0.231 0.141 0.024 -0.518 0.102 -0.190 0.850 0.054 BA -0.271 0.082 0.023 -0.544 0.050
emotional memory 303 CA -0.530 2.621E-23 0.002 -0.608 -0.444 -0.036 0.971 0.050 BA -0.532 1.633E-23 0.002 -0.606 -0.448
emotional regulation 291 CA -0.114 0.052 0.003 -0.228 0.001 -0.030 0.976 0.050 BA -0.116 0.047 0.003 -0.235 0.000
famous faces 620 CA -0.468 4.922E-35 0.001 -0.531 -0.396 0.741 0.459 0.115 BA -0.434 6.778E-30 0.001 -0.503 -0.362
force matching 300 CA 0.248 1.336E-5 0.003 0.130 0.361 -0.196 0.845 0.054 BA 0.233 4.475E-5 0.003 0.114 0.346
hotel task 618 CA 0.283 7.398E-13 0.001 0.210 0.354 -0.024 0.981 0.050 BA 0.282 9.387E-13 0.001 0.206 0.353
motor learning 296 CA 0.549 1.091E-24 0.002 0.464 0.625 -0.654 0.513 0.100 BA 0.510 5.256E-21 0.003 0.414 0.595
picture priming 543 CA 0.319 2.866E-14 0.002 0.238 0.398 -0.328 0.743 0.062 BA 0.300 8.567E-13 0.002 0.219 0.378
proverb comprehension 611 CA 0.117 0.004 0.002 0.037 0.196 -0.080 0.936 0.051 BA 0.112 0.006 0.002 0.031 0.191
RT (choice) 573 CA 0.680 6.010E-79 0.001 0.628 0.722 -0.674 0.501 0.103 BA 0.658 2.580E-72 0.001 0.608 0.703
RT (simple) 577 CA 0.381 2.467E-21 0.001 0.310 0.446 -0.520 0.603 0.082 BA 0.354 1.741E-18 0.001 0.280 0.420
ToT 604 CA 0.311 5.582E-15 0.002 0.233 0.385 -0.406 0.685 0.069 BA 0.289 4.121E-13 0.002 0.213 0.367
VSTM 609 CA -0.506 7.321E-41 0.001 -0.565 -0.441 0.319 0.749 0.062 BA -0.492 1.947E-38 0.001 -0.552 -0.425
Table 7.1: Correlations between neurocognitive measures and BA or CA for CN participants from CamCAN. Spearman’s rank correlation coefficients rS, p-values for the null hypotheses H0: rS = 0, and power
values 1−β for each correlation are provided. Degrees of freedom are N −2 for all. Fisher’s z- and p-values
are listed for the comparison of correlations between cognitive/neural measures with CA and BA; the null
hypothesis was H0: rS(CA) = rS(BA). p-values in bold are significant after FDR correction. Abbreviations:
N = sample size; SE = standard error; CIL = lower limit of confidence interval; CIU = upper limit of the
confidence interval; RT = response time; ToT = tip-of-the-tongue; VSTM = visual short term memory.
neurocognitive measure N age measure rS p SE CIL CIU z p 1−β
CDRSB 76 CA -0.007 0.951 0.012 -0.227 0.191 -0.225 0.822 0.056 BA -0.044 0.704 0.013 -0.285 0.185
ADAS11 76 CA -0.090 0.437 0.013 -0.318 0.143 0.737 0.461 0.114 BA 0.031 0.788 0.013 -0.202 0.265
ADAS13 76 CA -0.055 0.636 0.013 -0.266 0.174 0.764 0.445 0.119 BA 0.071 0.542 0.013 -0.148 0.294
ADASQ4 76 CA -0.029 0.806 0.013 -0.251 0.205 0.813 0.416 0.128 BA 0.106 0.364 0.013 -0.117 0.324
MMSE 76 CA 0.164 0.158 0.014 -0.093 0.394 0.111 0.911 0.051 BA 0.182 0.116 0.014 -0.070 0.419
RAVLT IR 76 CA -0.104 0.369 0.013 -0.329 0.126 -0.580 0.562 0.089 BA -0.198 0.086 0.014 -0.429 0.057
RAVLT learning 76 CA 0.070 0.546 0.013 -0.164 0.292 -1.187 0.235 0.221 BA -0.125 0.281 0.013 -0.344 0.111
RAVLT forgetting 76 CA 0.150 0.197 0.012 -0.061 0.349 -0.017 0.987 0.050 BA 0.147 0.205 0.012 -0.072 0.348
RAVLT P 76 CA 0.174 0.132 0.012 -0.036 0.383 0.277 0.782 0.059 BA 0.218 0.058 0.012 0.009 0.429
logical memory 76 CA 0.111 0.340 0.013 -0.122 0.332 0.095 0.925 0.051 BA 0.126 0.277 0.013 -0.109 0.346
digit symbol 76 CA -0.192 0.097 0.013 -0.408 0.034 0.995 0.320 0.169 BA -0.029 0.801 0.013 -0.265 0.201
trail-making 76 CA 0.114 0.326 0.014 -0.124 0.348 -0.189 0.850 0.054 BA 0.083 0.475 0.014 -0.162 0.327
FAQ 76 CA 0.181 0.117 0.011 -0.020 0.379 -0.478 0.632 0.077 BA 0.104 0.372 0.013 -0.126 0.333
Table 7.2: Same as Table 7.1, for CN participants in ADNI. Abbreviations: CDRSB = clinical dementia
rating sum of boxes; ADAS = Alzheimer’s disease assessment scale; MMSE = mini-mental state exam;
RAVLT = Rey auditory verbal learning test; RAVLT P = RAVLT percent forgetting; RAVLT IR = RAVLT
immediate recall; FAQ = functional activities questionnaire.
neurocognitive measure N age measure rS p SE CIL CIU z p 1−β
CDRSB 347 CA -0.064 0.237 0.003 -0.165 0.036 2.967 0.003 0.843 BA 0.161 0.003 0.003 0.061 0.256
ADAS11 346 CA 0.020 0.717 0.003 -0.087 0.121 2.703 0.007 0.771 BA 0.222 3.036E-5 0.003 0.121 0.323
ADAS13 344 CA 0.001 0.981 0.003 -0.099 0.106 2.694 0.007 0.768 BA 0.205 1.319E-4 0.003 0.098 0.304
ADASQ4 345 CA -0.063 0.247 0.003 -0.164 0.041 2.119 0.034 0.563 BA 0.099 0.066 0.003 -0.009 0.200
MMSE 347 CA 0.043 0.419 0.003 -0.062 0.147 -2.292 0.022 0.630 BA -0.131 0.015 0.003 -0.233 -0.030
RAVLT IR 346 CA -0.078 0.148 0.003 -0.181 0.029 -2.236 0.025 0.609 BA -0.244 4.467E-6 0.003 -0.344 -0.132
RAVLT learning 346 CA 0.080 0.137 0.003 -0.022 0.184 -1.947 0.052 0.495 BA -0.068 0.206 0.003 -0.172 0.033
RAVLT forgetting 345 CA -0.068 0.206 0.003 -0.173 0.041 -0.400 0.689 0.069 BA -0.099 0.067 0.003 -0.205 0.009
RAVLT P 343 CA -0.043 0.430 0.003 -0.153 0.062 1.746 0.081 0.415 BA 0.091 0.093 0.003 -0.016 0.194
logical memory 224 CA -0.001 0.985 0.004 -0.135 0.126 -1.177 0.239 0.218 BA -0.113 0.092 0.004 -0.239 0.019
digit symbol 347 CA -0.053 0.321 0.003 -0.157 0.051 -2.231 0.026 0.607 BA -0.220 3.574E-5 0.003 -0.318 -0.116
trail-making 341 CA 0.065 0.231 0.003 -0.045 0.172 2.008 0.045 0.519 BA 0.216 5.737E-5 0.003 0.114 0.315
FAQ 346 CA -0.029 0.585 0.003 -0.131 0.073 2.949 0.003 0.839 BA 0.193 2.991E-4 0.003 0.088 0.292
Table 7.3: Same as Table 7.2, for patients with MCI.
neurocognitive measure N age measure rS p SE CIL CIU z p 1−β
CDRSB 172 CA 0.060 0.432 0.006 -0.083 0.207 0.902 0.367 0.147 BA 0.157 0.039 0.006 0.010 0.302
ADAS11 169 CA -0.020 0.797 0.006 -0.173 0.135 0.744 0.457 0.115 BA 0.062 0.426 0.006 -0.087 0.207
ADAS13 160 CA -0.076 0.339 0.006 -0.226 0.076 1.022 0.307 0.176 BA 0.039 0.624 0.006 -0.113 0.190
ADASQ4 169 CA -0.125 0.104 0.006 -0.267 0.030 0.323 0.746 0.062 BA -0.090 0.243 0.006 -0.239 0.063
MMSE 172 CA 0.138 0.071 0.006 -0.014 0.278 -1.138 0.255 0.207 BA 0.015 0.846 0.005 -0.125 0.158
RAVLT IR 168 CA 0.155 0.045 0.006 0.000 0.300 -1.482 0.138 0.317 BA -0.007 0.927 0.006 -0.156 0.142
RAVLT learning 168 CA 0.141 0.068 0.006 -0.018 0.285 -0.699 0.485 0.108 BA 0.065 0.403 0.006 -0.089 0.215
RAVLT forgetting 166 CA 0.225 0.004 0.006 0.072 0.364 -1.320 0.187 0.262 BA 0.083 0.290 0.006 -0.072 0.227
RAVLT P 158 CA 0.025 0.757 0.007 -0.151 0.205 0.646 0.518 0.099 BA 0.098 0.221 0.006 -0.065 0.257
logical memory 126 CA -0.021 0.813 0.008 -0.203 0.165 0.697 0.486 0.107 BA 0.067 0.453 0.008 -0.117 0.249
digit symbol 165 CA 0.189 0.015 0.006 0.032 0.341 -1.130 0.259 0.204 BA 0.065 0.404 0.006 -0.094 0.218
trail-making 151 CA -0.095 0.246 0.007 -0.260 0.075 1.295 0.195 0.254 BA 0.055 0.500 0.006 -0.108 0.208
FAQ 172 CA 0.056 0.468 0.005 -0.083 0.195 1.111 0.267 0.199 BA 0.175 0.022 0.005 0.028 0.310
Table 7.4: Same as Table 7.2, for patients with AD.
neurocognitive measure N age measure rS p SE CIL CIU z p 1−β
CDRSB 519 CA 0.009 0.840 0.002 -0.081 0.093 3.629 2.849E-4 0.952 BA 0.231 1.081E-7 0.002 0.147 0.310
ADAS11 515 CA 0.045 0.306 0.002 -0.041 0.133 1.850 0.064 0.456 BA 0.160 2.785E-4 0.002 0.076 0.243
ADAS13 504 CA 0.021 0.634 0.002 -0.066 0.107 1.979 0.048 0.508 BA 0.145 0.001 0.002 0.059 0.229
ADASQ4 514 CA -0.044 0.316 0.002 -0.128 0.043 2.168 0.030 0.583 BA 0.091 0.039 0.002 0.007 0.177
MMSE 519 CA 0.025 0.567 0.002 -0.058 0.113 -2.934 0.003 0.835 BA -0.156 3.551E-4 0.002 -0.237 -0.068
RAVLT IR 514 CA -0.036 0.414 0.002 -0.123 0.053 -2.886 0.004 0.823 BA -0.213 1.055E-6 0.002 -0.295 -0.128
RAVLT learning 514 CA 0.082 0.063 0.002 -0.001 0.167 -3.028 0.002 0.857 BA -0.107 0.015 0.002 -0.190 -0.017
RAVLT forgetting 511 CA 0.025 0.567 0.002 -0.065 0.111 -0.974 0.330 0.164 BA -0.036 0.421 0.002 -0.126 0.049
RAVLT P 501 CA 0.010 0.827 0.002 -0.078 0.100 1.431 0.152 0.299 BA 0.100 0.025 0.002 0.008 0.186
logical memory 350 CA -0.026 0.623 0.003 -0.135 0.079 0.216 0.829 0.055 BA -0.010 0.853 0.003 -0.115 0.092
digit symbol 512 CA -0.011 0.809 0.002 -0.097 0.079 -2.666 0.008 0.760 BA -0.176 6.225E-5 0.002 -0.256 -0.088
trail-making 492 CA 0.052 0.246 0.002 -0.038 0.140 2.488 0.013 0.701 BA 0.208 3.100E-6 0.002 0.124 0.290
FAQ 518 CA 0.021 0.641 0.002 -0.065 0.107 4.095 4.226E-5 0.984 BA 0.269 4.954E-10 0.002 0.188 0.350
Table 7.5: Same as Table 7.2, for patients with CI (MCI and AD combined).
Figure 7.13: Age estimation errors for test set participants. Scatter plots depict participants’ BAs as a
function of their CAs for males (A, blue) and females (B, red). The marginal distributions of CA and BA are
depicted as histograms on the top and right sides of each panel in (A) and (B), respectively.
B Fractional dynamics foster deep learning of COPD stage
prediction
B.1 Data collection
B.1.1 WestRo COPD dataset
The study cohort represents consecutive patients from 4 Pulmonology Clinics in Western Romania
(i.e., the WestRo cohort, comprising patients from Victor Babeș – VB, Medicover 1 – MD1, Medicover 2 – MD2, and Cardio Prevent – CP clinics). Data consist of physiological signals recorded
over long periods (i.e., 6-24 hours), using a protocol that ensures complete patient privacy. To
obtain a reliable medical diagnostic for each patient, we also collected the following data records:
age, sex, body mass index (BMI, as a ratio between mass in kilograms and the squared value of
Figure 7.14: Performance metrics according to sex, cohort, and neurocognitive status. (A) - (H) Scatter
plots of BA vs. CA for each sex in each test set cohort: (A, E) UKBB CN; (B, F) CamCAN CN; (C, G)
ADNI MCI, (D, H) ADNI AD. For CN participants (UKBB, CamCAN), depicted data reflect benchmarking
results. In participants with MCI or AD (ADNI), they are provided for illustration and reference only, since
benchmarking did not involve participants with CI. (I)-(P) Scatter plots of AG vs. CA for each sex across
test cohorts: (I, M) UKBB CN; (J, N) CamCAN CN; (K, O) ADNI MCI; (L, P) ADNI AD. (Q)-(R) AG
distributions for females and males, respectively. In (Q), for CamCAN, the SFCN’s violin plot range is (-76,
34) yr. Gray and black asterisks indicate significant differences in AG means and variances, respectively,
between the 3D-CNN and SFCN.
height in meters), smoking history (in years since quitting smoking, with value 0 representing current smokers), FVC and FEV1 in liters and percentage (used to render the COPD stage diagnosis
according to the ERS/ATS recommendation [242], with stage 0 representing no COPD), COPD
assessment test (CAT) and dyspnea severity with the modified Medical Research Council (mMRC) scale questionnaires, exacerbations (number of moderate-to-severe exacerbations in the last year), and COPD onset (number of years since the onset). For detailed information about CAT and mMRC, see Supplementary material, section Standard questionnaires, exacerbation history, and comorbidities of COPD
patients.
We also provide all data about body mass index (BMI), COPD onset, standard questionnaires (CAT
– COPD assessment test, mMRC – modified Medical Research Council dyspnea scale), exacerbation history, and comorbidities (cardiometabolic, cancer, metabolic, psychiatric, renal) for all the
patients in our dataset in Table 7.6.
B.1.2 WestRo Porti COPD dataset
The WestRo Porti cohort consists of polysomnography (PSG) physiological signals recorded in
13,824 medical cases from 534 individuals during 2013–2020. The subjects in the WestRo Porti
are consecutive individuals in the Victor Babes hospital records, screened for sleep apnea with the
Porti SleepDoc 7 portable PSG device by recording 6 physiological signals (Flow, SpO2, Pulse,
Pulsewave, Thorax, Abdomen) overnight, during sleep. The 6 Porti SleepDoc 7 signals correspond, respectively, to the following NOX T3 signals: Flow, Oxygen Saturation Levels, Pulse,
Plethysmograph, Thorax Breathing Effort, and Abdomen Breathing Effort.
In this work, the same medical doctor gave all diagnoses that led to determining the COPD
labels across all institutions. Moreover, the medical doctor used the same devices and diagnosis
method (sensors to collect physiological signals from patients and spirometers). In addition, the
same medical doctor collected the data in all clinics; spirometry was conducted with the help of
trained, experienced technicians, certified in pulmonary function testing, following the ATS/ERS
Patient ID Center COPD stage COPD onset Age Gender Smoking status BMI CAT MRC Exacerbation CC CA MC PC RD
P1 CP 2 0 66 M Ex 7 43.27 26 3 1 1 0 1 0 1
P2 CP 2 1 63 F Smoker 48.49 33 3 1 1 0 1 0 0
P3 CP 2 0 43 M Smoker 33.58 24 3 0 1 0 0 0 0
P4 CP 2 7 71 M Ex 11 27.21 22 2 0 1 0 1 0 0
P5 CP 3 0 63 M Ex 15 47.12 22 2 1 1 0 1 0 0
P6 VB 3 7 70 M Ex 12 27.31 22 2 1 1 0 0 0 0
P7 MD1 2 4 72 M Ex 23 68.17 24 3 1 1 0 1 0 0
P8 VB 2 2 88 M Ex 32 42.97 32 3 1 1 0 1 0 1
P9 CP 2 12 54 M Smoker 21.91 23 2 2 1 0 0 0 0
P10 MD2 2 0 67 M Ex 10 31.14 21 3 0 1 0 1 0 0
P11 CP 2 0 66 M Ex 12 48.45 32 3 1 1 0 1 0 1
P12 VB 3 5 76 M Ex 8 28.91 34 4 1 1 0 1 0 1
P13 CP 3 7 70 M Ex 25 36.57 26 3 0 1 0 0 0 0
P14 MD1 3 1 64 M ex 24 26.5 15 2 0 1 0 0 0 0
P15 VB 2 5 59 M Smoker 38.1 31 3 1 1 0 1 0 0
P16 VB 1 1 48 F ex 15 28.4 13 2 1 0 0 1 0 0
P17 CP 1 2 54 M ex 10 26.3 11 2 0 1 1 0 0 0
P18 VB 1 1 61 M ex 21 31.5 13 2 1 1 0 1 0 0
P19 CP 2 0 62 M Smoker 23.59 16 2 0 1 0 0 0 0
P20 VB 2 0 63 M Smoker 32.81 14 2 1 1 0 0 0 0
P21 VB 4 5 64 M Smoker 32.87 28 3 2 1 1 0 0 0
P22 CP 2 0 76 M Smoker 35.43 36 4 1 1 0 0 0 0
P23 VB 1 2 69 M Ex 22 37.35 28 3 0 1 0 1 0 0
P24 CP 2 1 70 F no 44.1 25 2 1 1 0 1 0 0
P25 VB 3 0 72 M Ex 10 39.86 36 4 1 1 0 1 0 1
P26 MD2 3 1 65 M Ex 1 28.4 24 2 0 1 0 1 0 1
P27 CP 1 3 56 M ex 5 34.1 19 2 0 0 0 1 0 0
P28 CP 3 0 75 M Smoker 23.67 32 3 1 1 0 0 0 0
P29 VB 3 3 60 M Smoker 22.15 21 3 0 0 0 0 1 0
P30 CP 3 2 55 F Smoker 39.1 27 2 2 1 0 0 0 0
P31 MD2 4 6 67 M Ex 2 37.18 34 4 1 1 1 1 1 0
P32 CP 4 7 64 M Ex 4 33.26 28 3 0 1 0 1 0 0
P33 VB 2 3 58 M Smoker 27.08 26 3 1 1 0 0 1 0
P34 CP 2 3 53 M Ex 38 29.67 16 2 1 1 0 0 0 0
P35 VB 3 1 76 M Ex 6 36.39 32 3 1 1 0 0 0 0
P36 CP 4 8 40 M no 41.5 35 3 2 1 0 0 1 0
P37 CP 2 0 77 M Ex 23 38.06 26 3 0 1 0 1 0 0
P38 MD1 2 1 49 F no 40.2 16 2 0 0 0 0 0 0
P39 CP 2 0 67 M Ex 7 22.89 28 3 1 1 0 1 0 0
P40 CP 3 8 68 F Ex 7 22.86 28 3 3 1 0 0 0 0
P41 VB 2 3 72 M Ex 20 31.05 19 2 1 1 0 1 1 0
P42 CP 3 12 67 M Ex 9 30.42 29 2 1 1 0 0 0 0
P43 CP 3 0 68 M Smoker 33.95 34 3 1 1 1 1 0 0
P44 VB 4 1 65 F Smoker 32.46 18 2 0 1 0 0 0 0
P45 VB 2 3 56 M Smoker 18.62 12 2 2 0 0 0 0 0
P46 VB 2 2 43 M no 31 17 2 1 0 0 1 0 0
P47 CP 2 1 55 F ex 10 28.5 20 2 0 1 0 0 0 0
P48 CP 0 0 65 M no 33 0 0 0 0 0 0 0 0
P49 MD1 0 0 43 M no 35.9 0 0 0 0 0 0 0 0
P50 MD2 0 0 68 M ex 24 31.4 0 0 0 0 0 0 0 0
P51 MD1 0 0 46 M no 31.3 0 0 0 0 0 0 0 0
P52 MD1 0 0 51 M no 25.4 0 0 0 0 0 0 0 0
P53 VB 0 0 72 F Smoker 27.4 0 0 0 0 0 0 0 0
P54 MD2 0 0 57 M ex 15 29.6 0 0 0 0 0 0 0 0
Table 7.6: Essential information for all the COPD patients in our dataset, including medical center; COPD stage; COPD onset; age; gender; smoking status; body mass index (BMI); standard questionnaires (CAT – COPD assessment test, mMRC – modified Medical Research Council dyspnea scale); exacerbation history; and comorbidities (cardiometabolic (CC), cancer (CA), metabolic (MC), psychiatric (PC), and renal (RD)).
protocol (American Thoracic Society/European Respiratory Society). In all clinics included in our
study, there is a quality control program for all procedures.
Spirometry quality assurance includes examining test values and evaluating both the volume-time and flow-volume curves for evidence of technical errors. During testing, technicians record
a valid test composed of at least 3 acceptable maneuvers with consistent (i.e., repeatable) results
for FVC and FEV1. Achieving repeatability during testing means that the differences between the
largest and second-largest values for both FVC and FEV1 are within 150 ml. Additional maneuvers
can be attempted—up to a maximum of 8—to meet these criteria for a valid test. The observer bias
is reduced by ensuring that observers are well trained (specialized clinics do that regularly with
certification diplomas), having clear rules and procedures in place for the experiment (i.e., the
ERS/ATS protocol), and ensuring that behaviors are clearly defined. Therefore, since the same
medical doctor performed all evaluations with the same equipment and diagnosis approach, we are
confident that we substantially mitigated the intra- and inter-observer variability.
B.2 Multifractal detrended fluctuation analysis
Multifractal detrended fluctuation analysis (MF-DFA) is an effective approach to estimate the multifractal properties of biomedical signals [128]. The first step of MF-DFA is to calculate the cumulative profile Y(t),

Y(t) = \sum_{i=1}^{t} \left( X(i) - \langle X \rangle \right), \qquad (B.1)
where X is a bounded time series and ⟨X⟩ denotes its mean. Then, divide the cumulative profile equally into N_s non-overlapping time windows of length s, and remove the local linear trend y_v (a local least-squares straight-line fit) from each time window. The quantity F(v,s) then characterizes the root-mean-square deviation from the trend (i.e., the fluctuation),
F(v,s) = \sqrt{ \frac{1}{s} \sum_{i=1}^{s} \left\{ Y_{(v-1)s+i} - y_v(i) \right\}^2 }. \qquad (B.2)
In [128], the authors defined the scaling function as
S(q,s) = \left[ \frac{1}{N_s} \sum_{v=1}^{N_s} \mu(v,s)^q \right]^{1/q}, \qquad (B.3)
where µ is an appropriate measure which depends on the scale of the observation (s). Hence, the
scaling function is defined by substituting equation B.2 into equation B.3,
S_F(q,s) = \left[ \frac{1}{N_s} \sum_{v=1}^{N_s} \left\{ \frac{1}{s} \sum_{i=1}^{s} \left( Y_{(v-1)s+i} - y_v(i) \right)^2 \right\}^{q/2} \right]^{1/q}. \qquad (B.4)
The moment-wise scaling functions of a multifractal signal exhibit a convergent structure that yields a focus point for all q-values. This convergent structure was first introduced in [128], and such focus points, as described in [128], can be deduced from equation B.3 by considering the signal length L as the scale parameter,
S(q,L) = \left[ \frac{1}{N_L} \sum_{v=1}^{N_L} \mu(v,L)^q \right]^{1/q} = \left\{ \mu(v,L)^q \right\}^{1/q} = \mu(v,L), \qquad (B.5)
where µ(v,L) represents the entire signal, since N_L = 1 (i.e., only one time window is taken into consideration). According to equation B.5, the scaling function S(q,L) becomes independent of the exponent q, and the moment-wise scaling functions converge to µ(v,L), which is the mathematical definition of the focus point.
B.3 Neural network architecture for the WestRo COPD dataset
Fractional dynamics deep learning model (FDDLM). In our work, FDDLM consists of two
parts: (1) fractional signature extraction (for more details, please see section Experimental, subsection Multifractal detrended fluctuation analysis) and (2) a deep learning model. Keeping in
mind the input size of our training data (i.e., the coupling matrix A) and available GPU computational power, we constructed a deep neural network (DNN) architecture to handle the training and
prediction process. We built the network with the TensorFlow Python framework [231]. Our deep
neural network consists of 6 layers: 1 input layer, 2 hidden layers, 2 dropout layers, and 1 output
layer. Also, we resampled the input data (matrix A) to 144×1 voxels and normalized each value
within the range [0, 1] (normalization is a technique for training deep neural networks that standardizes the inputs to a layer). We placed the dropout layers after each hidden layer with a 20%
drop rate (the first hidden layer has 300 neurons and the second hidden layer has 100 neurons);
each fully connected hidden layer utilizes the ReLU activation function. The softmax function is utilized as the activation function in the output layer. The DNN is optimized with the rmsprop optimizer with a learning rate of 0.0001 and trained with the cross-entropy loss function. FDDLM is trained
over 500 epochs with a batch size of 64 samples. Overall, the number of trainable parameters of
the deep learning model is 74,105.
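For illustration, the layer stack just described can be assembled as follows in Keras (a minimal sketch under the stated hyperparameters, not our exact training script; it assumes one-hot encoded stage labels and the 5 output classes corresponding to COPD stages 0–4):

import tensorflow as tf

# Sketch of the FDDLM classifier described above.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(144,)),                  # resampled fractional signatures (matrix A)
    tf.keras.layers.Dense(300, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(5, activation="softmax"),
])
model.compile(
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-4),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
# model.fit(A_train, y_train, epochs=500, batch_size=64)

With the 144-dimensional input, this stack reproduces the 74,105 trainable parameters reported above (144×300 + 300 + 300×100 + 100 + 100×5 + 5); the Vanilla DNN described next differs only in its input size.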
Vanilla deep neural network (DNN) model. The Vanilla DNN model shares the same network
structure with the deep learning model in our FDDLM, except the input layer. The Vanilla DNN
contains 6 layers: 1 input layer (the input data, namely, the physiological signals are reshaped to
72000×1 voxels, and each value is normalized within the range [0, 1]), 2 hidden layers (the first
hidden layer has 300 neurons and the second hidden layer has 100 neurons), 2 dropout layers, and
1 output layer. The activation function for each fully connected hidden layer is ReLU, and the
activation function for the output layer is softmax. The Vanilla DNN model is optimized with the rmsprop optimizer, with a default learning rate of 0.0001, and trained with the cross-entropy loss
function. The model is trained over 500 epochs with a batch size of 64 samples. The total number
of trainable parameters of the Vanilla DNN model is 21,630,905.
Long short-term memory (LSTM) model. The LSTM model in this work has the following
layers: an input layer (the input physiological signals are reshaped to 6000×12 voxels, and each
value is normalized within the interval [0, 1]), an LSTM layer (with 300 neurons), a dropout layer
(with a 0.2 dropout rate), a dense layer (with 100 neurons), a dropout layer (with a 0.2 dropout
rate), and an output layer. ReLU is the activation function for the LSTM and dense layers. The
model is optimized with rmsprop, with a default learning rate of 0.0001, and trained with the cross-entropy loss function. The LSTM model is trained over 500 epochs with a batch size of 64
samples. The total number of trainable parameters of the LSTM model is 535,805.
Convolutional neural network (CNN) model. The CNN model in this paper has the following
layers: an input layer (the input physiological signals are reshaped to 72000×1 voxels, each value
normalized within the range [0, 1]), a convolutional layer (64 neurons), a flatten layer, a dropout
layer (with a 0.2 dropout rate), a dense layer (with 32 neurons), a dropout layer (with a 0.2 dropout
rate), and an output layer (with 5 neurons). ReLU is the activation function for the convolutional
and dense layers, while softmax is the activation function for the output layer. The CNN model is optimized with rmsprop, with a default learning rate of 0.0001, and trained with the cross-entropy loss function. The CNN model is trained over 500 epochs with a batch size of 64 samples; the total
number of trainable parameters is 147,456,453.
We further compare resource usage and performance across different models under k-fold
cross-validation—namely, FDDLM, Vanilla DNN, LSTM, and CNN—by measuring the following
metrics: execution time, trainable parameters, RAM usage (in GB) and accuracy. The evaluation
results are shown in Figure 7.15 and Table 7.7 (of note, the values in Table 7.7 are the mean results under k-fold validation (k = 5), and the values in Figure 7.15 are the normalized results from Table 7.7). From Figure 7.15 and Table 7.7, we observe that our FDDLM has the highest accuracy
(i.e., 96.18%) with the lowest complexity (i.e., memory usage and execution time). These findings
indicate that our model’s predictions are more accurate while requiring a lower complexity than
traditional machine learning models (i.e., Vanilla DNN, CNN, and LSTM).
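The k-fold protocol behind these comparisons can be sketched as follows (a minimal illustration only; build_model is a hypothetical factory returning a freshly initialized Keras classifier, e.g., the FDDLM stack above):

import numpy as np
from sklearn.model_selection import KFold

def kfold_accuracy(build_model, X, y, k=5):
    # Mean test accuracy over k folds, as reported in Table 7.7.
    accs = []
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True).split(X):
        model = build_model()          # fresh weights for every fold
        model.fit(X[train_idx], y[train_idx], epochs=500, batch_size=64, verbose=0)
        _, acc = model.evaluate(X[test_idx], y[test_idx], verbose=0)
        accs.append(acc)
    return float(np.mean(accs))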
Figure 7.15: Radar plot comparing the complexity and prediction performance for the WestRo
COPD dataset across different deep learning models
under k-fold validation (k = 5): fractional-dynamics
deep learning model (FDDLM), Vanilla deep neural
network (DNN), long short-term memory (LSTM),
and convolutional neural network (CNN). We normalized all the values represented in this plot.
FDDLM DNN LSTM CNN
RAM usage (GB) ↓ 0.98 8.17 7.68 9.35
Execution time (sec) ↓ 1076 47,590 205,135 345,085
Trainable parameters ↓ 74,105 21,630,905 535,805 147,456,453
Test accuracy (%) ↑ 98.66 77.72 78.54 36.12
Loss ↓ 0.2170 0.9601 0.5728 0.6245
Table 7.7: Complexity and prediction performance for the
WestRo COPD dataset across different deep learning models under k-fold validation (k = 5): fractional dynamics
deep learning model (FDDLM), Vanilla deep neural network (DNN), long short-term memory (LSTM), and convolutional neural network (CNN). ↑ / ↓ indicates higher/lower
values are better. All results are evaluated on the same machine for fair comparison.
B.4 Challenges and limitations of spirometry in COPD
Spirometry is a physiological test that measures the maximal air volume that an individual can
inspire and expire with maximal effort, thus assessing the effect of a disease on lung function. Together with the medical history, symptoms, and other physical findings, it is an essential tool that
provides key information to clinicians in reaching a proper diagnosis [243]. Indeed, standard spirometry is a laborious procedure: it needs preparation, a bronchodilation test, performance assurance, and evaluation [244].
Preparation: (1) The ambient temperature, barometric pressure, and time of day must be recorded.
(2) Spirometers are required to meet International Organization for Standardization (ISO) 26782
standards, with a maximum acceptable accuracy error of ±2.5%. (3) Spirometers need calibration
daily, with calibration verification at low, medium, and high flow. (4) The technicians have to make
sure that the device produces a hard copy of the expiratory curve plot to detect common technical
errors. (5) The pulmonary function technician needs training in the optimal technique, quality performance, and maintenance. (6) There are activities that patients should avoid before testing, such
as smoking or physical exercise. (7) Patients should be adequately instructed and then supported
to provide a maximal effort in performing the test to avoid underestimating values and ultimately
diagnosis errors.
Bronchodilation: (1) The forced expiratory volume in one second (FEV1) should be measured
10-15 minutes after the inhalation of 400 mcg short-acting beta2 agonist, or 30-45 minutes after
160 mcg short-acting anticholinergic, or the two combined [245]. (2) Physicians also developed
new withholding times for bronchodilators before bronchodilator responsiveness testing [243].
Performance assurance: (1) Spirometry should be performed using standard techniques. (2) The
expiratory volume/time traces should be smooth and without irregularities, with a less than 1 second pause between inspiration and expiration. (3) The recording should be long enough to reach
a volume plateau; it may take more than 15 seconds in severe cases [246]. (4) Both forced vital
capacity (FVC) and FEV1 should represent the biggest value obtained from any of three out of
a maximum of eight technically good curves, and the values should vary by no more than 5% or
150 ml—whichever is bigger [247]. (5) The FEV1/FVC ratio should be taken from the technically acceptable curve with the largest sum of FVC and FEV1 [248].
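Criterion (4) is mechanical enough to express directly in code; the following is a minimal sketch (the helper name is ours, and volumes are assumed to be in litres):

def repeatable(fvc_litres, fev1_litres):
    # Repeatability: the two largest FVC values and the two largest FEV1
    # values must each agree within 5% or 150 ml, whichever is bigger.
    def ok(values):
        best, second = sorted(values, reverse=True)[:2]
        tol_ml = max(150.0, 0.05 * best * 1000.0)
        return (best - second) * 1000.0 <= tol_ml
    return ok(fvc_litres) and ok(fev1_litres)

# e.g., repeatable([4.10, 4.02, 3.98], [3.21, 3.15, 3.10]) -> True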
Evaluation: (1) The measurements evaluation compares the results with appropriate reference
values—specific to each age, height, sex, and race group. (2) The presence of a post-bronchodilator
FEV1/FVC < 0.70 confirms the presence of airflow limitation [249].
It is clear that the diagnosis process—primarily relying on spirometry—is complex and,
thus, prone to errors because of human intervention. The large university clinics, such as our Victor Babes clinic in Timisoara (and the other institutions included in our paper’s recordings), avoid
errors by carefully training their personnel and enforcing strict procedures. Additionally, experienced and well-trained physicians corroborate the spirometry results with other clinical data, such
that diagnostic mistakes are highly improbable. However, spirometry is a significant problem in primary care offices, which do not have all the resources to consistently abide by the quality assurance steps (preparation, bronchodilation test, performance assurance, and evaluation). Hegewald et al. showed that most spirometers tested in primary care offices were not accurate, and
the magnitude of the errors resulted in significant changes in the categorization of patients with
COPD. Indeed, they obtained acceptable quality tests for only 60% of patients [250]. In a similar
study, the authors reported a spirometry accuracy varying from 69.1% to 81.4% in the primary
care offices [251]. These prior experimental studies and findings are significant for the medical
community and constitute the motivation for our paper since primary care offices have an essential
role in the early detection of COPD cases.
B.5 Definition of COPD stages
The diagnosis of COPD is based on persistent respiratory symptoms such as cough, sputum production, and dyspnea, together with airflow limitation (caused by significant exposure to smoking, noxious particles, or gases) evaluated with spirometry. The labels, or disease stages, are defined in the standard guideline of the worldwide medical community [252]. Based on the FEV1 (forced expiratory volume in one second) value measured by spirometry, expressed as a percentage of the predicted value, the Global Initiative for Chronic Obstructive Lung Disease (GOLD) guideline system categorizes airflow limitation into stages. In patients with FEV1/FVC (forced vital capacity) < 0.70, the standard labels are: (1) STAGE 1 – mild: FEV1 ≥ 80%; (2) STAGE 2 – moderate: 50% ≤ FEV1 < 80%; (3) STAGE 3 – severe: 30% ≤ FEV1 < 50%; (4) STAGE 4 – very severe: FEV1 < 30%. Additionally, in this paper, we assign the STAGE 0 label to patients without COPD (i.e., FEV1/FVC ≥ 0.70). A minimal coding sketch of this staging rule is given below.
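For concreteness, the GOLD staging rule above can be written as a short function. The following is a minimal Python sketch; the function name and signature are our own, and the inputs are assumed to be post-bronchodilator values, with FEV1 given as a percentage of the predicted value.

# Minimal sketch of the staging rule above; names are illustrative.
def gold_stage(fev1_fvc, fev1_pct):
    """Return the stage label (0-4) used in this work."""
    if fev1_fvc >= 0.70:
        return 0  # STAGE 0: no airflow limitation (no COPD)
    if fev1_pct >= 80:
        return 1  # STAGE 1: mild
    if fev1_pct >= 50:
        return 2  # STAGE 2: moderate
    if fev1_pct >= 30:
        return 3  # STAGE 3: severe
    return 4      # STAGE 4: very severe

# Example: FEV1/FVC = 0.62 with FEV1 at 55% of predicted is STAGE 2
assert gold_stage(0.62, 55) == 2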
B.6 Early COPD stages
There have been many debates in the literature regarding the early stage of COPD, the so-called asymptomatic COPD. Patients with COPD often underestimate the severity of the disease, primarily its early morning and nighttime symptoms. The reasons may be the slow onset of their symptoms, cough attributed to a long cigarette smoking history, and dyspnea attributed to getting older. The majority of patients from a European cohort stated that they were not wholly frank with their doctors during visits when reporting their symptoms and quality of life [253].
Around 36% of patients who describe their symptoms as mild-to-moderate also admit to being too breathless to leave the house. For these reasons, two validated questionnaires, the COPD Assessment Test (CAT) and the Modified Medical Research Council (mMRC) Dyspnea Scale, allow clinicians to assess COPD symptoms accurately and objectively. CAT is a globally used, eight-question, patient-completed questionnaire that evaluates the impact of COPD (cough, sputum, dyspnea, chest tightness) on health status. CAT scores range from 0 to 40, and higher scores denote a more severe impact of COPD on a patient's life [254]. The mMRC Dyspnea Scale stratifies dyspnea severity in respiratory diseases, particularly COPD; it provides a baseline assessment of functional impairment attributable to dyspnea in respiratory diseases. Moreover, despite being highly symptomatic (mMRC ≥ 2 and CAT ≥ 10) and having at least one exacerbation, many COPD patients did not seek medical help, as they perceived COPD symptoms as part of their daily smoking routine or as a consequence of aging. COPD awareness is poor among smokers; the smoking population underestimates its respiratory symptoms even when exercise activity is substantially reduced. Not surprisingly, 14.5% of the newly diagnosed COPD population was reported as asymptomatic in primary care clinics [255]. Also, there is a high prevalence of COPD among smokers with no symptoms [256]. We did not consider subjectively reported or observed clinical symptoms; instead, our analysis is based only on objectively measured parameters (i.e., physiological signals). For exposition, a sketch of the symptom-based thresholds mentioned above is given below.
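The following minimal Python sketch encodes the symptom thresholds cited above (CAT ≥ 10 and mMRC ≥ 2). It is for illustration only; the names are our own, and, as noted, our analysis itself does not use these subjective scores.

# Illustrative sketch of the symptom thresholds above; names are hypothetical.
def cat_total(item_scores):
    """Sum the eight CAT items (each scored 0-5), giving a total in 0-40."""
    assert len(item_scores) == 8 and all(0 <= s <= 5 for s in item_scores)
    return sum(item_scores)

def is_highly_symptomatic(cat_score, mmrc_grade):
    """Apply the 'highly symptomatic' thresholds cited in the text."""
    return cat_score >= 10 and mmrc_grade >= 2

print(is_highly_symptomatic(cat_total([2, 1, 3, 2, 1, 0, 2, 1]), 2))  # True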
Spirometry as a screening tool for the early stage of the disease is not entirely robust [257]. Indeed, spirometry can diagnose asymptomatic COPD, but its use is only recommended in smokers or individuals with a history of exposure to other noxious stimuli [258]. Despite having apparently normal lung function, smokers with normal spirometry but a low diffusing capacity of the lung for carbon monoxide (DLCO) are at significant risk of developing COPD with airflow obstruction [259], a category that may also include asymptomatic COPD. Moreover, no other disease markers are known to date that predict which patients with recent-onset COPD will progress to more significant disease severity.
Nonetheless, undiagnosed asymptomatic COPD carries an increased risk of exacerbations and pneumonia. For these reasons, we need better initiatives for the early diagnosis and treatment of COPD [260]. Our method also aims at addressing the problem of early detection because it has excellent accuracy at detecting the early stages 1 and 2, which can also be detected with spirometry. However, whether our method can identify asymptomatic COPD that spirometry-based methods cannot see remains an open question; to answer it, we need a longitudinal study starting with a significant cohort, tracking the evolution of individuals over time to see whether those predicted as asymptomatic COPD indeed develop the symptomatic form of the disease after several years.
C Deciphering Network Science in Brain-Derived Neuronal
Cultures
C.1 Sample preparation & Microscopy
Neural clusters were prepared from mouse neurons and neural networks were prepared from rat neurons. Neurons harvested from B6/J mice were thawed and plated on poly-D-lysine-coated glass-bottom petri dishes. Low-density cultures (65 cells per mm²) were grown at 37 °C, in the presence of 5% CO2, in Neurobasal growth medium supplemented with B-27, 1% 200 mM glutamine, and 1% penicillin/streptomycin. All reagents were sourced from Invitrogen (Thermo Fisher Scientific). Half of the media was aspirated twice a week and replaced with fresh maintenance media warmed to 37 °C. Live-cell imaging took place at three days in vitro [261].
Dissected cortical tissue from AGE rats was dissociated in 3 mg/mL protease 23 (Sigma P4032) in 1x slice dissection solution (pH 7.4). Primary neurons were grown on poly-D-lysine (Advanced BioMatrix 5049-50) treated glass-bottom dishes (Cellvis, P06-14-0-N). Cells were grown in maintenance media for 10 days before time-lapse microscopy. We observed a density of approximately 800 cells per mm².
All animal procedures were carried out under approved protocols from the Institutional Animal Care and Use Committees (IACUC) at the University of Nebraska Medical Center and the University of Illinois Urbana-Champaign, and in accordance with the recommendations in the Guide for the Care and Use of Laboratory Animals of the National Institutes of Health (Animal Assurance PHS: #A3294-01, Protocol Number: 10-033-08-EP).
Spatial light interference microscopy (SLIM) is an optical microscopy technique that can capture quantitative phase images of living neurons [195]. Neurons are particularly challenging to image, as complex phenotypes such as arborization are adversely modulated by phototoxicity. A higher-resolution SLIM imaging method can decouple amplitude artifacts from highly detailed cellular information. When imaging neural networks, we attempt to ameliorate phototoxicity concerns by reducing the illumination intensity (Thorlabs MCWHL5, 30 milliamps, 3% of total power) and averaging over several images following the hybrid denoising scheme in [262]. To boost the sensitivity of our measurements, we chose Spatial Light Interference Microscopy (SLIM Pro, Phi Optics), which is particularly well suited to imaging the fine details found in neuronal arbors [261].
C.2 Network centrality
In this thesis, network centralities are computed with the NetworkX package in Python. A network centrality measures the importance of a node across a heterogeneous complex network. We briefly introduce the degree, closeness, and betweenness centralities as follows:
Degree centrality [263] of node v is defined as:

Degree(v) = deg(v)    (C.1)

where deg(v) is the number of links incident upon node v. The node-to-node degree distribution is the quotient between the degree distributions of u and v (assuming an edge e(u, v), where u is the source node and v is the destination node).
Closeness centrality [264] of a node is the reciprocal of the sum of the lengths of the shortest paths between the target node and all other nodes in the graph; it encodes information about the information transmission latency across a specific network topology. It is defined as:

Closeness(v) = \frac{1}{\sum_{u} d(u, v)}    (C.2)

where d(u, v) is the distance between node u and node v.
Betweenness centrality [265] measures the number of times a node appears along the shortest paths between two other randomly picked nodes. It is defined as:

Betweenness(v) = \sum_{s \neq v \neq t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}}    (C.3)

where \sigma_{st} is the number of shortest paths between nodes s and t, and \sigma_{st}(v) is the number of these shortest paths that pass through node v.
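As a minimal sketch, the three centralities above can be computed with NetworkX as follows. The graph here is a placeholder example, and note that NetworkX returns normalized variants of Eqs. C.1-C.3.

# Minimal sketch: computing the centralities above with NetworkX.
import networkx as nx

G = nx.erdos_renyi_graph(n=50, p=0.1, seed=0)  # placeholder network

degree = nx.degree_centrality(G)            # deg(v), normalized by n - 1
closeness = nx.closeness_centrality(G)      # Eq. (C.2), normalized
betweenness = nx.betweenness_centrality(G)  # Eq. (C.3), normalized

# The most central node under betweenness centrality
print(max(betweenness, key=betweenness.get))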
C.3 Clustering coefficient
In this thesis, the network clustering coefficients are computed with the NetworkX package in Python. The clustering coefficient measures the degree to which nodes in a complex network tend to cluster together.
Transitivity, also called the global clustering coefficient, is based on triplets of nodes. A triplet is a group of three nodes connected by two or three edges. It is defined as:

C = \frac{3 \times \text{number of triangles}}{\text{number of all triplets}}    (C.4)
The clustering coefficient (or local clustering coefficient) measures how closely connected a node is to its neighbors, i.e., how close they are to forming a clique (a complete graph). The clustering coefficient is defined as follows:

C(G) = \frac{1}{|V'|} \sum_{v \in V'} c(v)    (C.5)

where V' is the set of nodes v whose degree is larger than or equal to 2, and c(v) = \delta(v)/\tau(v), where \delta(v) is the number of triangles containing node v and \tau(v) is the number of all triplets centered at node v.
The squared clustering coefficient quantifies the cliquishness in bipartite networks (e.g., social networks) where triangles are absent (there, the standard clustering coefficient is always zero). Analogous to the triangle-based definition, the squared clustering coefficient is the ratio between the number of squares and the total number of possible squares [266]. It is defined as:

C_{4,mn}(i) = \frac{q_{imn}}{(k_m - \alpha_{imn})(k_n - \alpha_{imn}) + q_{imn}}    (C.6)

where q_{imn} is the number of common neighbors of m and n (not considering node i); \alpha_{imn} = 1 + q_{imn} + \theta_{mn}, where \theta_{mn} is 1 if m and n are connected and 0 otherwise; and k_i is the number of neighbors of node i.
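A minimal sketch of these clustering measures in NetworkX follows; the graph is again a placeholder. Note that nx.average_clustering averages c(v) over all nodes, counting nodes of degree below 2 as zero, which differs slightly from the restriction to V' in Eq. C.5.

# Minimal sketch: the clustering measures above with NetworkX.
import networkx as nx

G = nx.karate_club_graph()  # placeholder network

transitivity = nx.transitivity(G)          # Eq. (C.4), global coefficient
avg_clustering = nx.average_clustering(G)  # cf. Eq. (C.5); zeros included
square = nx.square_clustering(G)           # Eq. (C.6), one value per node

print(transitivity, avg_clustering, square[0])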
C.4 Multifractal analysis
Using the box-covering algorithm on a mono-fractal network, we can capture the relationship between r (the size of the box) and M(r) (the number of nodes in the box) as a power law of the form M(r) \sim r^{D}, where D is the fractal dimension (a real-valued number representing the mono-fractal feature of the network). Multifractals can be considered as the superposition of multiple mono-fractals, and we use the finite box-covering algorithm [189] to study the localized and heterogeneous self-similarity of networks. To capture the multifractal features of networks, the distortion factor q is introduced to distinguish the details of different fractal structures. We can then capture the multifractality of the network by learning a generalized fractal dimension D(q) under different distortion factors q. In this way, the number of nodes in the ith box scales as M_i(r) \sim r^{\alpha_i}, and the number of boxes with the same \alpha scales as N(\alpha) \sim r^{-f(\alpha)}, where \alpha is the Hölder exponent. The relationship between the pairs (D(q), q) and (f(\alpha), \alpha) is determined by the Legendre transformation:

\alpha = \frac{d}{dq}\left[(q - 1)D(q)\right]    (C.7)

f(\alpha) = q\alpha - (q - 1)D(q)    (C.8)

Thus, we can use Eqs. C.7 and C.8 to calculate the multifractal spectrum f(\alpha) and analyze the multifractality of the network by inspecting the spectrum.
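Numerically, Eqs. C.7 and C.8 can be evaluated by finite differences once D(q) has been estimated on a grid of q values. The following minimal sketch uses a synthetic D(q) purely for illustration; in practice, D(q) comes from the finite box-covering analysis.

# Minimal numerical sketch of Eqs. (C.7)-(C.8). The D(q) below is synthetic;
# a real D(q) is estimated from the box-covering analysis.
import numpy as np

q = np.linspace(-5, 5, 201)      # grid of distortion factors
D = 2.0 - 0.1 * q                # hypothetical generalized dimension D(q)

tau = (q - 1.0) * D              # mass exponent tau(q) = (q - 1) D(q)
alpha = np.gradient(tau, q)      # Eq. (C.7): alpha = d tau / dq
f_alpha = q * alpha - tau        # Eq. (C.8): f(alpha) = q alpha - tau(q)

# The spectrum width is a common indicator of multifractality strength
print(alpha.max() - alpha.min())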
Abstract
In the realm of Medical Cyber-Physical Systems, where devices and information systems interact, medical cyber-physical data is generated digitally, stored electronically, and accessed remotely by medical professionals and patients. With the rise of medical big data, the collection and sharing of medical cyber-physical data offer significant value for diagnoses, pathological analyses, epidemic tracking, pharmaceuticals, insurance, and more. Prior research has focused on the interaction between cyber and physical spaces, constructing CPS-based architectures, and utilizing big data, deep learning, and cloud computing to develop medical diagnosis systems. However, challenges persist in these medical cyber-physical diagnosis systems, including (1) the lack of interpretability in deep learning models; (2) fluctuations in data quality, notably impacting the performance of medical data analysis; (3) inadequate model designs that misalign datasets' geometric properties with model architectures; and (4) classical backpropagation, which limits compatibility with parallel computing during the training of deep learning architectures. To address these limitations, this thesis proposes novel mathematically grounded deep learning and super-resolution models to accurately analyze medical datasets and improve image data quality. The overarching goal is to leverage Cyber-Physical Systems approaches for modeling, analysis, sensing, and optimization in healthcare and medical diagnosis, while overcoming the shortcomings of previous work.
To improve model performance and enhance the interpretability of the architecture, we introduce an interpretable 3D-CNN model for predicting participants' brain age (BA) using saliency maps. These maps shed light on hidden feature attribution and regional brain characteristics that reflect BA, offering insights into cognitive aging patterns. The model's generalizability to new cohorts is also demonstrated. This study's translational potential is highlighted by the connections between estimated BAs and neurocognitive measures of cognitive impairment (CI). In the pursuit of better medical image quality, we present two novel invertible neural networks (INNs): IRN-A and IRN-M. These networks address image super-resolution (SR) by enhancing image quality. Normalizing flow models built on INNs have been widely investigated for generative image SR, learning the transformation between the normal distribution of a latent variable z and the conditional distribution of high-resolution (HR) images given a low-resolution (LR) input. While random sampling of the latent variable z is useful for generating diverse photo-realistic images, it is not desirable for image rescaling, where accurate restoration of the HR image is more important. Hence, in place of random sampling of z, we propose auxiliary encoding modules to further push the limit of image rescaling performance. We propose two options for storing the encoded latent variables in downscaled LR images, both readily supported by existing image file formats: one saves them as the alpha channel, and the other saves them as metadata in the image header; the corresponding modules are denoted by the suffixes -A and -M, respectively. These models outperform existing methods in terms of PSNR and SSIM, pushing the boundaries of image rescaling performance.
In addition to model interpretation and image enhancement, leveraging the geometric characteristics of complex data, such as lengthy time series, can significantly enhance model performance and reduce computational complexity. Our approach involves fractional-order dynamical modeling, which extracts distinctive signatures (coupling matrices) from physiological signals across patients with chronic obstructive pulmonary disease (COPD). These fractional signatures are then utilized to construct and train a deep neural network capable of predicting COPD stages for suspected patients. Input features, including thorax breathing effort, respiratory rate, and oxygen saturation levels, inform the predictions. By employing this methodology, we ensure equitable access to reliable medical consultation for patients, particularly in regions with lower socioeconomic conditions.
Finally, to accelerate the training process of classic neural network (NN) architectures, we introduce a worker concept by incorporating local loss functions into the NN design. This NN structure contains workers that encompass one or more information processing units (e.g., neurons, filters, layers, or blocks of layers). Workers are either leaders or followers, and we train a leader-follower neural network (LFNN) by leveraging local error signals. The LFNN does not require backpropagation (BP) or a global loss function to achieve optimal performance (we denote the LFNN trained without BP as LFNN-ℓ). By investigating worker behavior and evaluating the LFNN and LFNN-ℓ architectures on a variety of image classification tasks (e.g., MNIST, CIFAR-10, ImageNet), we demonstrate that LFNN-ℓ trained with local error signals achieves lower error rates and superior scalability than state-of-the-art machine learning approaches. Furthermore, LFNN-ℓ can be conveniently embedded in classic convolutional NN architectures (e.g., VGG, ResNet, and Vision Transformer (ViT)), achieving a 2x speedup compared to BP-based methods and significantly outperforming models trained with end-to-end BP and other state-of-the-art BP-free methods in terms of accuracy on CIFAR-10, Tiny-ImageNet, and ImageNet.