SENSE AND SENSIBILITY: STATISTICAL TECHNIQUES FOR
HUMAN ENERGY EXPENDITURE ESTIMATION USING
KINEMATIC SENSORS
by
Harshvardhan Vathsangam
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)
August 2013
Copyright 2013 Harshvardhan Vathsangam
Acknowledgements
John Donne said it best when he said “No man is an island, Entire of itself. Each is a
piece of the continent, A part of the main.” The fruits of this thesis would not be possible
without the innumerable yet immeasurable instances of help and support of a number of
people. I have tried my best to thank all those I could think of. I trust that people who
haven’t been mentioned will understand that it is because I have been forgetful but not
ungrateful for their help and support.
First, I wish to thank my thesis committee for their guidance and help at various
stages of my nascent career. A special thank you to Fei Sha for teaching me my bread
and butter - machine learning. The time I spent in all those classes pays off every day.
Thank you to Jill McNitt-Gray for patiently listening to my sermons about regression
and accelerometers. I am also grateful to the USC Annenberg Fellowship for their sup-
port and sincerely hope that I have justified their faith in me.
The first half of my life at USC was spent in the hallways of the Keck School of
Medicine learning how to do transdisciplinary research. For this, I am grateful to Donna
Spruijt-Metz for providing me the KNOWME project to cut my teeth. I would especially
like to thank Adar and Jeremy Emken for being more than mentors at times, offering
guidance and boosting the confidence of a noob grad student. I wish to thank E. Todd
Schroeder, one of the best collaborators one could ask for. More than half of this thesis
would not be possible without his support. Thank you David Erceg for patiently sitting
with me and explaining how to work (and consequently not break) expensive clinical
equipment.
I am extremely grateful to the Robotic Embedded Systems Lab folks for the im-
mense role they have played in my personal and professional life. Thank you to Karthik
Dantu, Sameera Poduri, Megha Gupta, Hordur Heidarsson, Arvind Pereira, the two Jons
and Jnaneshwar Das for all the comments, discussions, food, tea and alcohol you have
provided at various points in time. A special shout-out to the bros Maheswaran Sathi-
amoorthy, Pramod Sharma and Prithviraj Banerjee.
I have been very lucky in getting the chance to work with some of the best Master’s
and undergraduate students on campus. Thanks to Ankit Sharma for working on the
Movement Trackr app and to Anupam Tulsyan, who rolled measurement wheels with
me for indoor localization. Thanks to James Reinebold for telling me when I am at rest.
Thank you Mihir Daptardar for setting up a great infrastructure to launch our future
projects on. To Alec Tarashansky, a future star that I am lucky to mentor.
For a long time, I have had the good fortune of sharing space with some of the
smartest postdoctoral scholars I know. Thank you to Ryan Smith and Geoff Hollinger for
showing me what the highest standards of research should be. Talking to you, listening
to you and seeking your advice has been an education in itself and made me grow faster
as a professional.
One of the best things I could have done in my life was to take up two internships
with the folks in the Computational User Experiences Group at MSR, Redmond. Thank
you especially to T. Scott Saponas for providing me the opportunity to expand my hori-
zons and tackle really challenging problems in health sensing. You have pushed me to
go beyond the limits of what I thought I could do and I am better for it. An equally grateful
thanks to Dan Morris; if I am even 10% of the researcher that you are, I would consider
that a job well done. Thank you to Desney Tan for providing me the mentorship and
conversations I sorely needed and teaching me to “think big” on my career goals.
Which brings me to the colossus of my professional life. I will be forever indebted
to my advisor and mentor Gaurav S. Sukhatme for the uncountable hours of support
he has provided me. It is impossible to find an advisor who understood my madness
so well and then nurtured, encouraged and supported what I wanted to do at any given
time. Gaurav gave me the personal freedom to pursue the topics that interested me and
backed me no end. I hope that your research DNA has rubbed off on me and I promise
to take it forward and solve the toughest problems mankind faces.
To say that I have saved the best for last would be an understatement. I would
be nowhere without the sacrifices, understanding and support of the closest people in
the world to me: my family. To my parents Srimathi and Rangachari Vathsangam, my
role models for life. You two are my guiding lights and pillars of strength. Every step I
take will be because you taught me to walk. To my sister and “perennial rival” Nivedita
Vathsangam, one of the best people I know. I hope you achieve everything that you set
your mind to and honestly don’t see why you won’t. To my other siblings, Prince and
Tia, my life would not be complete without your fur in my clothes, toys on my bed and
drool on my plates. And to my wife Shruti. We have spent thousands of (kooky) hours
together talking about everything under the sun and being there for each other. I’m very
lucky to have you and can’t wait to grow together and share many more adventures in
the times to come.
Contents
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Physical Activity and Energy Expenditure
  1.1 Physical Activity and Health
  1.2 Approach and Scope
  1.3 Contributions of Dissertation
    1.3.1 Mathematical Formulation of Energy Expenditure Prediction
    1.3.2 Robust Characterization of Movement in the Frequency Domain
    1.3.3 Development of Personalized Energy Expenditure Maps
    1.3.4 Development of Energy Maps with Minimal Data
  1.4 Organization
Chapter 2: Estimating Energy Expenditure
  2.1 Techniques to Measure Energy Expenditure
    2.1.1 Direct Calorimetry
    2.1.2 Indirect Calorimetry
      2.1.2.1 Doubly Labeled Water
      2.1.2.2 Metabolic Units
      2.1.2.3 Questionnaires
      2.1.2.4 Pedometers
      2.1.2.5 Heart rate monitors
  2.2 Kinematic Sensors and Energy Expenditure
    2.2.1 Kinematic Sensors
    2.2.2 Estimation Methodologies
    2.2.3 Commercial Activity Monitors
      2.2.3.1 Count-based monitors
      2.2.3.2 Pattern Recognition-based monitors
    2.2.4 Energy Expenditure as a Regression Problem
  2.3 Mathematical Formulation
  2.4 Energy Expenditure for Walking
  2.5 Summary
Chapter 3: Robust Movement Capture using Frequency Signatures
  3.1 Related Work
    3.1.1 Periodicity of Center of Mass Movements
    3.1.2 Time-Domain Techniques for Measuring Periodicity
    3.1.3 Spectral Techniques for Activity Monitoring
  3.2 Robustness of CoM tracking in the Frequency Domain
  3.3 Efficient Computation of Frequency Content
    3.3.1 The Momentary Fourier Transform
      3.3.1.1 Algorithm Description
      3.3.1.2 Performance Analysis
  3.4 Summary
Chapter 4: Mapping Movement to Energy Expenditure
  4.1 Related Work
  4.2 Qualitative Illustration of Energy Expenditure Maps
  4.3 Algorithms
    4.3.1 Least-Squared Regression (LSR)
      4.3.1.1 Model definition
      4.3.1.2 Inference
      4.3.1.3 Prediction
    4.3.2 Bayesian Linear Regression (BLR)
      4.3.2.1 Model Definition
      4.3.2.2 Inference
      4.3.2.3 Prediction
    4.3.3 Gaussian Process Regression (GPR)
      4.3.3.1 Model Definition
      4.3.3.2 Inference
      4.3.3.3 Prediction
  4.4 Results
    4.4.1 Participant Statistics
    4.4.2 Data Collection and Pre-processing
    4.4.3 Training and Testing Procedure
    4.4.4 Comparison between sensors
      4.4.4.1 Single sensor feature comparison
      4.4.4.2 Comparison between accelerometer and gyroscopic data
    4.4.5 Comparison across algorithms
    4.4.6 Run time versus accuracy
  4.5 Summary
Chapter 5: Hierarchical Approaches to Creating Energy Expenditure Maps
  5.1 Related Work
  5.2 Qualitative Illustration of Hierarchical Maps
  5.3 Algorithms
    5.3.1 Personal Models
      5.3.1.1 Model Definition
      5.3.1.2 Inference
      5.3.1.3 Prediction
    5.3.2 Weight-Scaled Models
    5.3.3 Nearest-Neighbor Models
    5.3.4 Hierarchical Linear Models
      5.3.4.1 Model Definition
      5.3.4.2 Inference
      5.3.4.3 Prediction
      5.3.4.4 A note on initialization
    5.3.5 ACSM Speed-based Models
  5.4 Results
    5.4.1 Participant Statistics
    5.4.2 Data Collection and Pre-processing
    5.4.3 Evaluation Methodology
    5.4.4 Comparison of Algorithms
    5.4.5 Best Individual descriptor
    5.4.6 Predictions with Reduced Training Data
  5.5 Summary
Chapter 6: Conclusion
  6.1 Contributions
  6.2 Future Work
  6.3 Final Remarks
Bibliography
Appendix A: Data Collection
  A.1 Data Collection 1: Personal Energy Expenditure
    A.1.1 Participant Statistics
    A.1.2 Hardware Description
    A.1.3 Protocol
  A.2 Data Collection 2: Energy Expenditure across a large population
    A.2.1 Participant Statistics
    A.2.2 Hardware Description
    A.2.3 Protocol
    A.2.4 Illustrations of energy expenditure across participants
  A.3 Movement Trackr Application
Appendix B: Derivations of Learning Algorithms
  B.1 Least-Squared Regression
  B.2 Bayesian Linear Regression
  B.3 Hierarchical Linear Modeling
Appendix C: Inactivity Recognition
  C.1 Design
    C.1.1 Hardware Sensors
    C.1.2 Feature Extraction
    C.1.3 Sensor Data Coordinate Transformation
  C.2 Experiment Setup
    C.2.1 Trial 1: Constant movement
    C.2.2 Trial 2: Constant stationary behavior
    C.2.3 Trial 3: Mixture of behaviors
  C.3 Results
    C.3.1 Classifiers Trained on Constant Behaviors
      C.3.1.1 Classification Accuracies
      C.3.1.2 Effect of Different Feature Sets
      C.3.1.3 Effect of Coordinate Transformation to Global Coordinate Frame
    C.3.2 Classifiers Trained on Mixed Behaviors
      C.3.2.1 Classification Accuracies
      C.3.2.2 Effect of Different Feature Sets
      C.3.2.3 Effect of Coordinate Transformation to Global Coordinate Frame
    C.3.3 Conclusion
Appendix D: Energy Expenditure At Rest
Appendix E: Model Terminology
List of Tables
2.1 Comparison of common activity monitors. For a more complete list,
refer to Welk et al. [1].
C.1 Confusion Matrix for Constant Behaviors (shown for kNN). Each value
in the confusion matrix represents one window of features that was assigned
as either stationary or moving by the algorithm. All points from
Trial 1 were assigned a ground truth of moving and all points from Trial
2 were assigned a ground truth of stationary. Data from both trials were
used for this confusion matrix.
C.2 Confusion Matrix for Mixed Behaviors (shown for kNN). Each value
of the confusion matrix represents one window of features that was assigned
as either stationary or moving by the algorithm. Ground truth
was obtained from annotating video recordings of the subjects as they
completed the tasks.
D.1 Common predictive equations to predict resting energy expenditure. All
results are in kcal/day.
E.2 Terms used in this dissertation
List of Figures
1.1 Our vision for robust physical activity monitoring: Accurate monitoring
relies on being able to sense movement, characterize the movement data
and model an appropriate physiological response from the sensor data.
The results of these sensing techniques can be fed back to the user in
an appropriate format. This interaction will allow the derivation of appropriate
intervention techniques to modify user behavior which can be
sensed again.
2.1 An example of one of the first respiration-based ‘open-circuit’ indirect
calorimeters designed by Zuntz et al. (1906).
2.2 Illustration of various techniques to estimate energy expenditure.
3.1 Illustration of the biomechanical modeling of walking and running with
center of mass shown. Most common activities in our day are cyclical
in nature. Tracking the periodic movement of this center of mass would
provide a good descriptor of movement for these activities.
3.2 Illustration of periodic signal capture of walking using a phone-based
triaxial accelerometer and gyroscope mounted on the iliac crest.
3.3 An example of the utility of axis-wise Fourier Transforms of accelerometer
and gyroscopic data in extracting the fundamental frequency of gait.
The phone was worn in the left pocket. The participant walked in tune
with metronome frequencies and the FFTs were calculated in real time.
3.4 Examples of axis-wise Fourier Transforms of accelerometer data for
three different locations on a user in the top panel. Locations are shown
in the lower panel. Here the orientation of the phone was kept constant.
A redder color indicates a stronger presence of that frequency component.
The most dominant presence of periodicities corresponded to
acceleration in the vertical direction (Y axis, up-down) and in the forward
direction (Z axis, leg-swing). This was prominent regardless of the
position of the phone.
3.5 Examples of axis-wise Fourier Transforms of accelerometer data when
the phone was worn in the right pocket in three different orientations.
Here the position is held constant but the orientation is changed. When
rotating the phone by 180 degrees, the resultant frequency components
are unchanged. When rotating the phone by 90 degrees, the vertical
component shifts from the Y axis to the X axis since the vertical axis has
changed. This indicates that tracking the periodicity of one’s bounce is
robust if one can keep track of the current vertical direction.
3.6 Examples of axis-wise Fourier Transforms of accelerometer data when
the phone was held in a backpack, in one’s hand and when on a phone
call. These represent cases where limb movement cannot be observed.
Locations are shown in the lower panel. A redder color indicates
a stronger presence of that frequency component. The most dominant
component was the acceleration in the up-down direction (Y axis for
the backpack case, Z axis for the in-hand case, combination of X and Y
axes for on-phone). This indicates that tracking the periodicity of one’s
up-down movement is a robust technique to estimate periodicity across
a variety of locations.
3.7 Performance comparison of the Momentary Fourier Transform versus
the traditional FFT as a function of shift in the window. Momentary
Fourier Transforms have a reduced computational load when the
frequency information for only a narrow part of the spectrum is desired
and when it is required to calculate the Fourier Transform incrementally.
4.1 Illustration of energy expenditure versus frequency for a single participant.
Figure 4.1a illustrates the simultaneous capture of frequency-based
features and energy expenditure. Here a participant was initially
at rest followed by walking at three different speeds on a treadmill before
slowing down. There is a clear visual relationship between the most
dominant frequency and the energy expenditure. Figure 4.1b illustrates
the relationship between the most dominant frequency and energy expenditure
for the walking section of the experiment along with linear or
nonlinear fits.
4.2 Graphical Representation of Least-Squares Regression. Dots represent
numbers or parameters, filled circles are observed random variables.
4.3 Graphical Representation of Bayesian Linear Regression. Dots represent
numbers or parameters, filled circles are observed random variables.
4.4 Graphical Representation of Gaussian Process Regression. Dots represent
numbers or parameters, filled circles are observed random variables
and unfilled circles are hidden random variables. The solid line indicates
that all functions are connected.
4.5 Illustration of data recording procedure. The sensor was worn on the
right iliac crest.
4.6 Illustration of variation of prediction accuracy (measured by average
RMS prediction error across all participants) with various movement
descriptors and algorithms. Results are grouped row-wise by algorithm
and column-wise by sensor stream. LSR accuracy depended on the amount
of training data. BLR and GPR showed consistently reduced errors with
increase in training data size. With BLR and GPR, use of all 3 axes
as features improved prediction accuracy as opposed to using just one
sensor axis. The best individual axis corresponded to movement in
the up-down direction.
4.7 Illustrating the effect of combining triaxial accelerometer and gyroscopic
information (measured by average RMS prediction error across all
participants) in the case of BLR and GPR. Accelerometer and gyroscope
provide similar results when used separately.
4.8 Illustration of relative algorithmic performance when triaxial information
from all sensors is used (measured by average RMS prediction error
across all participants). With increasing number of data points GPR begins
to perform comparably with BLR.
4.9 An illustration of the relationship between accuracy and run time for
LSR, BLR and GPR for a single participant. Scatter points of each class
represent different training percentages of the same class and feature
space. The best algorithm has to be as close to the origin as possible
(lowest error and lowest run time).
5.1 Illustration of energy versus frequency for multiple participants along
with line of best fit. Analysis of each individual line of best fit shows a
poor fit across participants. Participants who share similar morphology
show similar lines of best fit. This points to a more general approach that
uses morphological similarities across people to generate personalized
maps.
5.2 A graphical model showing the relationship between variables in a
hierarchical linear model. A two-level dependence is assumed. At the
lower, intra-person level, a linear relationship between a person’s movement
and energy consumption is formulated. At the higher, inter-person
level, the model parameters themselves are linearly dependent on the
morphological descriptors and population parameter k.
5.3 Characteristics of study population plotted as height versus weight with
normal and overweight regions shown. Men are shown as triangles and
women are shown as circles. Average height was 1.73 ± 0.07 m and
average weight was 69.7 ± 7.5 kg.
5.4 Illustration of recording procedure. Kinematic data were collected with
a sensor mounted on the right iliac crest.
5.5 Relative performance of algorithms as predicted by Average Root Mean-Squared
Error (ARMSE). HLM performance is shown for all possible combinations
of descriptors, resulting in 63 different hierarchical models.
5.6 Average score across participants for each individual descriptor.
Descriptor combinations were ranked according to the ARMS errors that
they produced and the ranking per descriptor was extracted and averaged
across participants. Lower is better. Weight and height showed
the lowest ranking while sex, REE and RHR showed the highest rankings.
The effect of sex was absorbed by the weight and height since the
population on average weighed less and were shorter.
5.7 Illustration of the predictive capability of each algorithm when limited
training data were available. Lower is better. Hierarchical models performed
as well as, and in some cases better than, personal models.
However, they were able to achieve this with no prior information about
the participant other than their morphological descriptors.
A.1 Illustration of hardware used to capture treadmill walking information.
Acceleration information was collected with a Freescale MMA7260Q
triple-axis accelerometer. Rotational rates were collected with two
Invensense IDG500 500°/s gyroscopes mounted perpendicular to each
other. The sensor hardware was modified to be worn with a custom-designed
harness on the right iliac crest. The yellow box indicates sensor
mounting. The red box indicates VO2 recording via the mask leading to the
metabolic unit. Original image source for (a) and (b): www.sparkfun.com
A.2 Illustration of hardware, ground truth collection and population statistics.
Triaxial accelerations and rotational rates were recorded with a
phone on the right iliac crest. Energy expenditure was measured with
the Oxycon™ portable metabolic unit. Heart rate was measured with
a Polar Heart Rate monitor that was time-synced to the metabolic cart.
GPS measurements were taken using a mobile phone.
A.3 Split of population showing the variation of energy expenditure with
different activities.
A.4 Screenshots of Movement Trackr Android Application
C.1 The relative accuracy ratings of using only power as a feature compared
to a partial feature set of power and covariance between accelerometer
and angular rotation speed and using all sixteen features noted in Section
2.2. Using additional features helped the algorithms separate between
stationary and moving behaviors.
C.2 Classification accuracies for the machine learning algorithms when trained
on features framed locally versus those framed globally. Training on
globally framed features resulted in more accurate classification.
C.3 The relative accuracy ratings of using only power as a feature compared
to a partial feature set of power and covariance between accelerometer
and rotation speed and using all sixteen features. The additional features
helped the machine learning algorithms overcome the noisy data
of mixed behaviors.
C.4 Classification accuracies for the machine learning algorithms when trained
on locally versus globally framed features when the data included transitions.
Once again using globally framed coordinates aided classification.
Abstract
Healthcare is undergoing a paradigm shift from the episodic, expert-driven, curative
approaches of the past towards a self-empowered, preventative model for the future.
Central to this is the treatment of chronic illnesses. This treatment will require the adoption
of behavioral changes in one’s lifestyle. One such class of illness is that brought about
by the negative effects of physical inactivity. Regular physical activity is associated with
decreased mortality and a lower risk of cardiovascular disease, diabetes mellitus, and colon and
breast cancer. Despite this knowledge, physical activity levels remain inadequate.
Promoting physical activity will rely on designing appropriate intervention measures
that bring about behavioral change. Central to this is the need to accurately measure and
characterize physical activity in a cost-effective yet ubiquitous manner. One charac-
terization of physical activity is the energy expended as a result of movement. In this
dissertation, we aim to demonstrate how kinematic sensors in combination with statisti-
cal techniques can accurately predict energy expenditure due to physical activity.
We cast the problem of determining energy expenditure in a mathematical frame-
work and discuss various functional maps. We rely primarily on data-driven regression
techniques to derive functional maps from movement to energy expenditure given a per-
son’s morphology. We focus on common movements such as walking or running.
To estimate energy expenditure accurately, we derive a set of
frequency-based features that are robust to location on the human body and orientation.
We show how one can use the up-down movement of the center of mass of the human
body to robustly characterize cyclic movement. Further, we demonstrate the utility of
Fourier transform-based techniques in detecting the periodicity of this movement. We
also show a modification of the Discrete Fourier Transform to work in a recursive setting
to efficiently calculate frequency components.
We proceed to determine the most accurate technique to map movement to energy
expenditure given sufficient data for a person. We present three algorithms - Least-
Squares Regression (LSR), Bayesian Linear Regression (BLR) and Gaussian Process
Regression (GPR). We compare prediction accuracies using different sensor streams and
algorithms. A comparative study of accuracy versus inference time is also performed.
This work presents contributions in the comparison of non-probabilistic, probabilistic,
linear and nonlinear regression algorithms for wellness parameters.
We extend this work to generate maps given a minimal set of morphological
descriptors such as height, weight and age. We present and compare a set of
models including nearest neighbor models, weight-scaled models, a set of hierarchi-
cal linear models and speed-based approaches. We show how these approaches can be
used to evaluate the best subset of morphological descriptors and the best individual
descriptor to generate personalized maps across people. This work tackles the problem
of predicting morphological outcomes when minimal data are available by transferring
information from people for whom data are available.
These contributions are a step towards designing cost-effective, accurate and ubiqui-
tous solutions to estimate physical activity levels and designing interventions based on
accurately measured data.
Chapter 1
Physical Activity and Energy Expenditure
A bear, however hard he tries,
Grows tubby without exercise.
—A.A. Milne
Modern healthcare is undergoing a paradigm shift from the episodic, expert-
driven, curative approaches of the past towards a self-empowered, preventative
model for the future. One of the main reasons for this is the change in nature of illnesses
afflicting society today. The world today is faced with the rising burden of chronic
diseases. Chronic diseases accounted for 60% of the 59 million reported deaths in the
world in 2004 [2]. Of these, the top five risk factors (high blood pressure, tobacco
use, high blood glucose, physical inactivity and obesity) accounted for 22 million deaths.
This number is projected to increase because of an increasingly aging population, the
spread of these diseases in the developing world and the rise in children with unhealthy
lifestyles. This projection along with already existing high prevalence rates of seden-
tary time, low physical activity, poor diet and stressful living suggests a serious health
concern.
Tackling chronic illnesses hinges on the fact that their treatment requires the patient
to adopt behavioral and lifestyle changes to promote positive well-being. This requires
a multipronged approach. Patients with chronic illnesses will need accurate tools to
track and monitor their daily well-being. Having such measures will improve awareness
about their own personal health. This suggests the development of cost-effective sensing
techniques that track indicators of health. Such sensors would need to be smart enough
to develop models about the patient they are monitoring and provide alerts if they detect
anomalies. The ability of these sensors would be further enhanced if they could be so
unobtrusive as to be forgotten. In addition, the vast data that sensors can provide will
allow care givers to assess their patients’ progress in a larger, continuous context rather
than the periodic checkpoints of the past.
In addition to accurate sensing, effective treatment of chronic illnesses will require
the information obtained from these sensors to be presented in a meaningful and engag-
ing manner. Effective presentation allows users to develop insights and connect with
their body’s health. This in turn has the potential to change their perceptions about
themselves and encourage new behaviors. Thus the information must be presented so
as to cause people to reflect on their behaviors and bring about behavioral changes in
their lifestyle. It is also important to engage patients with their family, peers or care
providers during the treatment process. Peer effects are a key component in bringing
about permanent, positive behavioral habits. It is again important to achieve this in a
low-cost setting. Here, technology-based solutions have the potential to enhance care
by providing cost-effective communication tools between the main parties regardless of
location or time of day.
1.1 Physical Activity and Health
Physical activity is defined as any kind of bodily movement that is produced by the con-
traction of muscle and raises energy expenditure above resting levels. There is evidence
to show that human beings were designed to be physically active [3]. Humans are capa-
ble of up to a 10-fold increase in metabolic rates for short periods of time. Conversely,
being sedentary for long periods poses excessive risks to one’s health. From an evolu-
tionary perspective, being physically fit would have provided early humans an advantage
in fleeing from predators or capturing prey. Human beings are anatomically favored to
be excellent long distance runners [4]. Fossil evidence indicates that endurance running
is a derived capability of the genus Homo, originating 2 million years ago and is thus an
instrumental component in our lives.
Research in physical activity epidemiology requires effort on multiple fronts. There
is a need to establish a dose-response relationship between physical activity and risk of
disease. This is important to understand the full implications of our current lifestyles.
In addition, there is a need to develop methods to accurately assess physical activity be-
havior. Accurate assessment of behaviors will provide an objective baseline on which to
base future behavioral interventions. It is also necessary to identify factors that influence
behavior. These will provide the tools to learn or modify existing behavioral patterns.
On a related note, there is a need to evaluate and compare multiple interventions to
change behavior. Finally, it is important to translate research findings into practice to
reach the general public.
Regular physical activity has many health benefits [5]. Macera et al. [6, 7] showed
that regular physical activity is associated with decreased mortality. Maintaining an active
lifestyle is also associated with lower risk of cardiovascular disease [8, 9, 10], diabetes
mellitus [11, 12], colon and breast cancer [13, 14, 15]. Regular physical activity along
with dietary control is an important mediator in treating obesity [16]. Increased physical
activity is also associated with mental health and well-being [17]. Furthermore, all these
studies have shown evidence of a dose-response relationship: higher levels of physical activity
are usually associated, directly or indirectly, with greater health benefits.
Despite this knowledge, physical activity levels are not adequate [18]. Matthews et
al. [19] showed that most people in the United States spend up to 8 hours a day just
sitting. A similar study among Australian adults by Healy et al. [20] has identified
that the majority of adults’ non-sleeping time (60%) is spent being sedentary, with
the remainder being incidental movement (35%) and a small fraction representing
moderate to vigorous physical activities (5%). The effects are prominent: lack of
physical activity kills more people than smoking [21]. In 2004, physical inactivity was
the fourth leading cause of preventable death worldwide [22].
In early human history, physical activity played a major role in all walks of life
including transportation, manual labor and leisure time entertainment. However, the
advent of industrialization and automation has brought with it a number of changes to
our lifestyles. These include the widespread availability of motorized transport, pow-
ered tools, the propagation of television and computer-based work stations, networked
communication platforms that eliminate the need to physically travel and the design of
urban spaces that make locomotion easy or favor motor vehicles. These conditions have
negated the need for people to engage in physical activity other than for leisure.
However, even leisure time physical activity has declined [23].
It is not possible to undo advances in our lives because of the other conveniences that
they provide. Thus the task of promoting physical activity has to focus on redesigning
our lifestyles, behaviors and environments to increase activity levels throughout our day.
Central to redesigning our lifestyles is the need to accurately measure the quantity and
quality of physical activity. Accurate and objective measurement provides a foundation
to assess an individual’s current physical activity levels and behaviors. Using these measurements,
one can then devise intervention measures to modify or develop new kinds
of behaviors. Physical activity is usually captured in one of five domains of measurement [24]:
(1) intensity, or the ratio of working metabolic rate to resting metabolic rate
(measured in Metabolic Equivalents or METs); (2) energy expenditure due to activities;
(3) time spent in activities; (4) type or label of activity; and (5) the frequency with which a
particular activity is undertaken.
Figure 1.1: Our vision for robust physical activity monitoring: Accurate monitoring
relies on being able to sense movement, characterize the movement data and model an
appropriate physiological response from the sensor data. The results of these sensing
techniques can be fed back to the user in an appropriate format. This interaction will allow
the derivation of appropriate intervention techniques to modify user behavior, which
can be sensed again.
1.2 Approach and Scope
In this dissertation, we focus on the problem of predicting energy expenditure due to
physical activity. We capture movement using off-the-shelf kinematic sensors such as
accelerometers and gyroscopes. Due to their small size, low cost, increasingly high
precision, low power consumption and portability, kinematic sensors are an attractive
option for deriving relevant physiological quantities from human movement.
Our main premise is that one can use movement captured using kinematic sensors
to determine energy expended. By definition, physical activity refers to movement that
raises energy expenditure above resting energy levels. Thus by accurately characterizing
one’s movement, one can determine energy expenditure due to that movement.
We take advantage of the natural properties of common activities to describe them.
For example, in the case of activities like walking or running, we use frequency-derived
features to detect and estimate the periodicity of the movement of the center of mass of the human body.
Our approach is data-driven. We collect examples of movement and energy expendi-
ture and fit statistical models that are consistent with the data. Our approach deals with
estimating the distribution of energy expenditure given a fixed movement.
We focus on utilizing phone-based kinematic sensors. Over 300 million smart phones
are sold per year and all of them carry kinematic sensors. Using a phone-based
platform is advantageous for a number of reasons. The use of phones allows high
compliance since people carry their phones with them all the time. Phones have high
computational capability and are networked, thus permitting just-in-time intervention
techniques.
We mainly focus on activities that are periodic in nature. In particular, we focus on
walking as it represents an important, common and easy to generate movement across a
wide variety of people to boost daily physical activity. However the techniques we pro-
pose can equally apply to other periodic activities such as running, cycling or rowing.
Our techniques assume that sensors are worn on the user for the duration in which activ-
ity characterization is required. This provides an advantage in being able to accurately
capture movement for common activities.
We do not focus on activity recognition. Instead, we assume that we know the activity
beforehand and derive meaningful features that describe the activity and map them to calories
expended. This allows us to examine in detail the problem of generating functional
maps for a given activity. In the future, we plan on combining our techniques with
activity recognition algorithms to further generalize the capability of our approach.
Finally, our work does not address the problem of visualization of activity informa-
tion and intervention design. Appropriate communication of sensed information is im-
portant to be able to bring about the behavioral changes required for an active lifestyle.
We leave this for future work.
1.3 Contributions of Dissertation
Based on the approach and scope of our work, we present the following thesis:
Data-driven statistical techniques when applied to commodity on-body kinematic
sensors can accurately estimate energy expenditure of the human body.
The challenge in this domain is to be aware of the context in which activity was
generated, the morphology of the person performing the measurement and the sensing
capability of the device making the measurement. Our work fixes the context by consid-
ering a smaller set of activities and analyzes the sensing and morphological adjustments
required. In essence, we aim to provide a “sensibility” to the “sensing capability” of
kinematic sensors. Accordingly, this dissertation makes the following major contribu-
tions to the state-of-the-art in energy expenditure estimation using kinematic sensors:
1.3.1 Mathematical Formulation of Energy Expenditure Prediction
We cast the problem of obtaining regression maps for a certain activity in a mathe-
matical framework and then describe various regression models using this framework.
Using this model, we show how the problem of energy expenditure prediction can be
generalized across previous work.
1.3.2 Robust Characterization of Movement in the Frequency Domain
We present a frequency-based representation of movement using kinematic sensor data.
The method relies on the periodicity of many common activities in our day. In particular,
activities such as walking and running involve a periodic displacement of the center of
mass of the human body that can be tracked in multiple locations. We show that these
features are robust to placement and orientation and can be calculated efficiently in real-
time.
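As an illustration of how frequency components can be updated recursively in real time, the following is a minimal Python sketch of the standard sliding DFT recurrence. It is an assumption that this resembles the recursive approach developed in this work; the function name `sliding_dft` and its parameters are hypothetical and chosen only for this example.

```python
import numpy as np

def sliding_dft(signal, window, bins):
    """Track selected DFT bins over a sliding window, one sample at a time.

    Uses the standard sliding-DFT recurrence (an assumption; the recursive
    variant developed in the dissertation may differ):
        X_k[n] = (X_k[n-1] - x[n-window] + x[n]) * exp(2j*pi*k/window)
    """
    signal = np.asarray(signal, dtype=float)
    bins = np.asarray(bins)
    twiddle = np.exp(2j * np.pi * bins / window)
    spectrum = np.zeros(len(bins), dtype=complex)
    history = np.zeros(window)            # circular buffer of the last `window` samples
    out = np.empty((len(signal), len(bins)), dtype=complex)
    for n, sample in enumerate(signal):
        oldest = history[n % window]      # sample that leaves the window
        history[n % window] = sample
        spectrum = (spectrum - oldest + sample) * twiddle
        out[n] = spectrum
    return out
```

Each new sample costs one complex multiply-add per tracked bin, instead of recomputing a full transform for every window. Once at least `window` samples have arrived, row `n` of the output matches the DFT of the most recent `window` samples at the requested bins.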
1.3.3 Development of Personalized Energy Expenditure Maps
We develop a set of techniques that define a functional map from movement measured
using kinematic sensors to energy expended. The method treats this map as a regression
problem. Through a comparison of different techniques, we show that one can be as
accurate as state-of-the-art medical equipment.
1.3.4 Development of Energy Maps with Minimal Data
We describe a set of techniques that allow one to generate functional maps from movement
to energy expenditure using minimal information about a person, viz. their morphological
descriptors. We perform a comparative analysis of the generalization capability
of these algorithms and show how a hierarchical approach that transfers information
across people can be used to generate maps from movement to energy expenditure.
The work described above is verified through extensive data collection studies and
laboratory experiments. Our results demonstrate the accuracy of a family of regression
algorithms and their potential for use in everyday applications. Together, these contributions
represent a step towards the development of an accurate, cost-effective sensor
for pervasive activity monitoring and energy expenditure estimation.
1.4 Organization
The dissertation is organized as follows. Chapter 2 reviews the problem domain and re-
cent literature. It then formulates a language that describes the mapping from movement
to energy expenditure. Chapter 3 describes and compares a set of regression techniques
to estimate a personal map from movement to energy expenditure. In Chapter 4, we
extend these techniques to describe how to learn maps with minimal data about a user.
Experimental results are also presented at the ends of Chapters 4 and 5. The disserta-
tion concludes in Chapter 6 with a summary of contributions and suggestions for future
work. Supporting material is provided in several appendices.
Chapter 2
Estimating Energy Expenditure
Lest men suspect your tale untrue,
Keep probability in view.
—John Gay.
The human life processes of growth, work and self-maintenance rely on complex
interactions of food metabolism and conversions between various forms of energy.
Much of the energy consumed leaves the body in the form of heat. Hence the study of
this energy balance is termed human calorimetry. During the course of one’s day, energy
is expended in primarily three ways. Basal Metabolic Rate (BMR) is the minimum
amount of energy that a body requires when lying in physical and mental rest.
A closely related term is Resting Metabolic Rate (RMR). Resting metabolism accounts for
60-75% of total energy expenditure and is largely a function of a person’s size. Roughly 10% of
total energy expenditure is spent on the digestion, absorption and transport of nutrients
after eating. The most variable component of energy expenditure is
that due to muscular activity. This activity includes minor physical movement, gross
muscular work and physical exercise. Ordinarily, this accounts for roughly 15-30% of total
energy expenditure but can be as high as 40% for active individuals. Thus, being able
to accurately measure energy expenditure is important because it provides insight into
body functions, levels of physical activity, muscle contraction due to active movement
and the natural processes of growth and repair.
Figure 2.1: An example of one of the first respiration-based ’open-circuit’ indirect
calorimeters designed by Zuntz et al. (1906).
2.1 Techniques to Measure Energy Expenditure
Historically, the first evidence of energy expenditure measurements dates back to 1660,
when Robert Boyle observed that mice sealed in bell jars expired at the same time that
a burning flame was extinguished. The first breakthrough was made by Lavoisier in
1783. He laid the foundation of calorimetry when he concluded that: ’La respiration
est donc une combustion’ or ’Respiration is therefore a combustion.’ Most importantly
Lavoisier and Seguin established a methodology through their works that has remained
the benchmark of human energy expenditure estimation techniques to this day [25].
2.1.1 Direct Calorimetry
Calorimetry comes in two flavors. Direct calorimetry measures total heat loss from a
body using a thermally insulated chamber. The heat dissipated from the body (through
various processes such as evaporation, radiation, conduction and convection) is mea-
sured precisely and converted to calories burned.
2.1.2 Indirect Calorimetry
A more commonly used form of measurement is indirect calorimetry. Indirect calorimetry
measures the rate of oxygen burned ($\dot{V}\mathrm{O}_2$) and rate of carbon dioxide ($\dot{V}\mathrm{CO}_2$)
produced. In the presence of $\mathrm{O}_2$, the body’s primary fuel sources (carbohydrates, fats or
protein) are broken down into $\mathrm{CO}_2$ and water, liberating energy in the form of adenosine
triphosphate (ATP). Under the assumption that all oxygen is used to oxidize fuels
and all $\mathrm{CO}_2$ is recovered, one can then calculate energy expended. Thus, by measuring
the rate of exchange of $\mathrm{O}_2$ and $\mathrm{CO}_2$ when performing a particular task (e.g., standing,
walking, running on a treadmill), the energy cost of that task can be determined.
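The text above does not state which formula converts gas exchange into energy. One widely used choice is the abbreviated Weir equation, sketched below in Python. It is offered as an assumption for illustration only, not as the conversion used in this dissertation, and the function name is hypothetical.

```python
def weir_energy_kcal_per_min(vo2_l_per_min, vco2_l_per_min):
    """Abbreviated Weir equation (urinary nitrogen term omitted).

    Inputs are oxygen consumption and carbon dioxide production in litres
    per minute; the result is energy expenditure in kcal per minute.
    Assumption: this is a common textbook conversion, not necessarily the
    one applied by the metabolic units described in this chapter.
    """
    return 3.941 * vo2_l_per_min + 1.106 * vco2_l_per_min


# Hypothetical example: brisk walking with VO2 = 1.0 L/min, VCO2 = 0.8 L/min
energy = weir_energy_kcal_per_min(1.0, 0.8)
```

With these example values the estimate is about 4.8 kcal per minute, which shows how a pair of measured exchange rates becomes a single energy cost for a task.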
2.1.2.1 Doubly Labeled Water
One form of indirect calorimetry is the use of Doubly Labeled Water (DLW). This was
first reported by Schoeller and van Santen [26]. In this method, an individual takes a
prescribed oral dose of water containing a known amount of non-radioactive isotopes
of both hydrogen ($^{2}\mathrm{H}$) and oxygen ($^{18}\mathrm{O}$). These mix with the normal hydrogen and
oxygen in the human body. The labeled oxygen is lost both as $\mathrm{CO}_2$ and as water, whereas
the labeled hydrogen is lost only as water, through sweating, respiration, urine and
evaporation. Oxygen is therefore lost faster than hydrogen, and the difference between
the two elimination rates reflects the rate of $\mathrm{CO}_2$ production. The amount of $\mathrm{O}_2$
consumed is then estimated by assuming a standard western
diet or from self-reported diets. This can then be used in combination with energy balance
equations to determine energy expenditure. The DLW method is useful in cases where
other forms of indirect calorimetry are not possible, particularly in infants, severely
injured people or pregnant/lactating women.
2.1.2.2 Metabolic Units
Another form of indirect calorimetry involves measurement of $\dot{V}\mathrm{O}_2$ and therefore energy
expenditure. Devices that implement this technique are called metabolic units. One
family of techniques employs the so-called ’closed-circuit’ method. Here the participant
is isolated from outside air. He/she then breathes normally into a respirometer that originally
contains pure oxygen. Through the course of the study, the relative volumes of
oxygen and carbon dioxide in the respirometer change, and the measured rate of change
is converted into energy expenditure. This technique breaks down during periods of
prolonged, strenuous exercise.
The second and more commonly used technique is the ’open-circuit’ method which
involves letting outside air pass through a hood worn by a participant. The air flow and
the relative volume of oxygen and carbon dioxide are measured from a sampling of the
air and converted into equivalent energy expenditure [27]. In a laboratory setting, this
can be achieved in real-time using airflow with electronic samplers. However, because
of their size and complexity, these cannot be used to calculate energy expenditure from
free living activities. In scenarios requiring calorimetry in everyday conditions, the
most popular method is the Douglas bag technique where air sampling is carried out
in a Douglas bag or Tissot tank worn on the back. The first such apparatus was
designed by Zuntz et al. [28] and the successive 100 years have allowed the development
of a range of ever more sophisticated devices. A more comprehensive review of such
devices can be found in Macfarlane [29] and Meyer et al. [30].
2.1.2.3 Questionnaires
Yet another form of indirect calorimetry involves extensive usage of questionnaires and
activity recall [31, 32, 33, 34]. Participants are required to keep a constant log of their
activities and these are further validated against more objective techniques such as dou-
bly labeled water. Such techniques are of great advantage when assessing energy ex-
penditure in large population or cohort studies. However, they suffer from recall and
selective reporting biases. Users may not accurately remember which activities
they performed, or may simply not report certain activities. Recently, these diaries
have been replaced by digital techniques [35, 36, 37]. These can be combined with the
Compendium of Physical Activities [38] to obtain approximate estimates of energy ex-
penditure. This makes them less susceptible to recall errors. However, they still suffer
from reporting bias and poor compliance.
2.1.2.4 Pedometers
Pedometers are devices that estimate the number of steps taken by their wearer [39].
Pedometers can be mechanical or electronic. The number of steps can be combined
with morphological descriptors such as height, weight or age to produce an estimate of
energy expenditure. Pedometers are useful to estimate ambulatory activity but do not
generalize well for other kinds of activities involving upper trunk movement or arm
movement. However, they still represent a cost-effective technique for a rough estimate
of energy expenditure since ambulatory activities account for a large proportion of daily
energy expenditure due to physical activity.
2.1.2.5 Heart rate monitors
Another technique for estimating energy expenditure utilizes a person’s heart rate infor-
mation (number of beats per minute) by developing functional maps from heart rate to
energy expended [40]. Heart rate-based techniques are limited by inter-person and
intra-person variability in heart rate. For the same person, due to
changes in stress levels, fitness, age or altitude, the heart rate might change for the same
activity. Another issue with using heart rate is that it only provides an idea of the gross
stress on the body and not the context in which the body expends energy. It is possible
that a person may have a similar heart rate while being in different activities (e.g., yoga
versus brisk walking).
2.2 Kinematic Sensors and Energy Expenditure
Figure 2.2 summarizes the various technologies described in section 2.1 in an accuracy-
versus-cost domain. The size of the bubble represents the ease of deployment of each
technology. Whole-room calorimetry techniques, while very accurate, cannot be
widely deployed because of the impracticality of building these chambers on a scale that
sufficiently encompasses the geographical space that a user lives in. $\dot{V}\mathrm{O}_2$-based indirect
calorimetry techniques are easier to use. However, they are still expensive and cannot
be worn in everyday settings. The doubly labeled water technique suffers from
disadvantages, including the high cost of labeled water and lack of temporal resolution
when making measurements. At the other end of the spectrum, survey-based techniques
are cost-effective and easy to deploy. However, they are prone to bias and missing
information. What is needed is a technology that allows accurate estimation of energy
expenditure while being cost-effective and easy to deploy.
Figure 2.2: Illustration of various techniques to estimate energy expenditure.
(a) Comparison of accuracy versus cost of current techniques to estimate energy expenditure. The size of the bubble indicates ease of deployment.
(b) Metabolic units are shown on the left. Whole-room calorimetry is shown on the right.
2.2.1 Kinematic Sensors
This thesis focuses on indirect calorimetry using kinematic sensors. A kinematic
sensor is a device that measures the movement of the body to which it is attached.
The most common kinematic sensors are accelerometers (which measure accelerations)
and gyroscopes (which measure rotational rates). Kinematic sensors, by definition,
capture movement. Hence, it is a natural extension of thought that kinematic sensors on the
human body will be able to capture and quantify human movement.
Miniaturization, low power consumption, plummeting costs and ever-increasing ac-
curacy have led to the ubiquitous deployment of kinematic sensors in any everyday
situation that requires tracking of movement. Approximately 300 million smart phones
will be sold in 2013 and all of them ship with kinematic sensors or the ability to in-
terface with external kinematic sensors through wireless interfaces. Such a pervasive
presence opens up a tremendous opportunity in using commodity kinematic sensors to
bring calorimetry into the hands of the everyday user. Kinematic sensors thus offer
the potential to deploy cost-effective energy expenditure
monitoring without sacrificing the integrity of objective, accurate measurement [41].
2.2.2 Estimation Methodologies
There are two kinds of methodologies involved in converting kinematic sensor data to
energy expended. At the highest level of granularity is modeling of the intricate interac-
tions between different limbs and converting these to an equivalent energy expenditure
model using biomechanics [42, 43, 44, 45]. Such models typically first estimate veloc-
ity by integrating accelerometer data. These can then be combined with the segmental
mass of each body segment to determine energy spent. This poses a problem because
of errors accumulated when integrating accelerometer signals. Also, these approaches
ignore the effect of internal interaction of muscles.
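To make the accumulation of integration error concrete, the following Python sketch integrates an idealized periodic acceleration with and without a small constant sensor bias. The sampling rate, signal shape and bias value are hypothetical and chosen only for illustration; they are not taken from the cited biomechanical models.

```python
import numpy as np

def integrate_velocity(accel, dt):
    """Cumulative rectangular integration of acceleration (m/s^2) to velocity (m/s)."""
    return np.cumsum(accel) * dt

fs = 50.0                                        # hypothetical sampling rate in Hz
t = np.arange(3000) / fs                         # one minute of samples
true_accel = 2.0 * np.sin(2 * np.pi * 2.0 * t)   # idealized 2 Hz periodic movement
bias = 0.05                                      # hypothetical constant sensor bias, m/s^2

v_true = integrate_velocity(true_accel, 1 / fs)
v_biased = integrate_velocity(true_accel + bias, 1 / fs)

# The velocity error grows linearly with time: bias * elapsed time.
drift = v_biased - v_true
print(f"velocity error after 60 s: {drift[-1]:.2f} m/s")
```

Even a bias of 0.05 m/s^2 yields a velocity error of roughly 3 m/s after one minute, which is larger than typical walking speed. This is one reason a purely data-driven mapping from features to energy can be preferable to explicit integration.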
A second approach adopts a purely experimental perspective. Here, one or more
kinematic sensors are worn by a user in various locations on the human body and ob-
served streaming kinematic data are collected. These data are then split into window seg-
ments or epochs. From each of these epochs, relevant properties of movement (known
as features) are extracted. Simultaneously, ground truth readings are recorded from a
more precise form of calorimetry such as a metabolic cart or an isolated heat chamber. The
problem is then transformed into one of mapping epoch-based features to ground truth
energy expended. These maps are then used in the general population. At this stage,
kinematic data are said to be validated. Indirect calorimetry using kinematic sensors
involves finding the most accurate such map.
Device Name | Sensors | Output | Interface
Count-Based
Actigraph GT3X (www.theactigraph.com) | Triaxial accelerometer | Counts, Steps | PC software
Actical (actical.respironics.com) | Triaxial accelerometer | Counts | PC software
RT3 (www.stayhealthy.com) | Triaxial accelerometer | Counts | PC software
Pattern Recognition-Based
SenseWear Armband (www.bodymedia.com) | Triaxial accelerometer, GSR, Skin temperature | Raw acceleration, temperature, GSR, steps | PC software
IDEEA (www.minisun.com) | Triaxial accelerometer | Raw acceleration, posture, gait | PC software
DirectLife (www.directlife.philips.com) | Triaxial accelerometer | Energy expenditure, time spent in walking and running | Web-based interface
Fitbit (www.fitbit.com) | Triaxial accelerometer | Activity, energy expenditure, steps | Web-based interface, mobile app
Nike Fuelband (http://www.nike.com) | Triaxial accelerometer | Fuel points | Web-based interface, mobile app
Table 2.1: Comparison of common activity monitors. For a more complete list refer to
Welk et al. [1].
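The experimental approach of Section 2.2.2 (split the stream into epochs, extract features, then map features to ground-truth energy) can be sketched in Python as follows. The two features used here (dominant frequency and RMS amplitude) and all function names are assumptions chosen for illustration; they are not necessarily the features or models used later in this dissertation.

```python
import numpy as np

def epoch_features(signal, fs, epoch_seconds):
    """Split a 1-D signal into non-overlapping epochs.

    Returns one row per epoch: (dominant frequency in Hz, RMS amplitude).
    """
    n = int(fs * epoch_seconds)
    n_epochs = len(signal) // n
    epochs = np.asarray(signal[: n_epochs * n], dtype=float).reshape(n_epochs, n)
    epochs = epochs - epochs.mean(axis=1, keepdims=True)   # remove constant offset
    power = np.abs(np.fft.rfft(epochs, axis=1)) ** 2
    freqs = np.fft.rfftfreq(n, d=1 / fs)
    dominant = freqs[np.argmax(power, axis=1)]
    rms = np.sqrt((epochs ** 2).mean(axis=1))
    return np.column_stack([dominant, rms])

def fit_least_squares(features, energy):
    """Fit energy ~ w0 + w . features by ordinary least squares."""
    design = np.column_stack([np.ones(len(features)), features])
    weights, *_ = np.linalg.lstsq(design, energy, rcond=None)
    return weights

def predict(weights, features):
    """Apply a fitted linear map to new epoch features."""
    design = np.column_stack([np.ones(len(features)), features])
    return design @ weights
```

In use, `epoch_features` would be applied to the sensor stream, `fit_least_squares` to the epochs that have matching ground-truth readings, and `predict` to epochs recorded without the reference calorimeter.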
2.2.3 Commercial Activity Monitors
Much of the research involving using kinematic sensors to calculate energy expenditure
has focused on the utility of accelerometers alone [46, 47]. These devices can be divided
into two generations. Table 2.1 outlines the characteristics of common sensors in practice.
2.2.3.1 Count-based monitors
Until 2009, early sensors used proprietary methods to convert linear accelerations into
proprietary descriptors of movement such as counts. Chen et al. [48] used counts generated
by a triaxial accelerometer to predict energy expenditure. A similar approach was
followed by Freedson et al. [49] in the validation of a triaxial accelerometer. These were
extended to field studies by Hendelman et al. [50] and Nichols et al. [51]. More recently,
Swartz et al. [52] and Klippel et al. [53] evaluated the validity of newer accelerometers
in predicting energy expenditure. Reilly et al. [54] evaluated their validity in children.
Plasqui et al. [55] compared the predictive capability of multiple accelerometers against
doubly labeled water techniques. Heil et al. [56] tested the utility of counts for modeling
general physical activity outcomes.
A limitation with count-based techniques is that since the methods used to generate
counts are proprietary, one cannot determine if these units are meaningful or physically
interpretable [57]. Researchers have attempted to work around this by validating these
counts against calories burned using simple linear regression techniques [58, 59] or
by using thresholds on counts to classify activity as low, moderate or vigorous [60].
Given a threshold, separate regression equations can be applied for the counts within
each division. These methods have had limited success for a number of reasons. First,
equations learned for one population for a particular activity are not guaranteed to work
for another population or different activity. Second, it is not clear how one should choose
thresholds for a population. A low intensity activity for one population (e.g. running
for children aged 8-10) might constitute vigorous activity for another (e.g. running for
adults aged 60-65). Third, using thresholds encodes energy expenditure in very low
resolution.
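The threshold-based scheme described above, with a separate regression equation per intensity band, can be sketched in Python as follows. The function names, the threshold values a caller would pass, and the use of a straight line per band are all assumptions for illustration; they do not reproduce any specific published cut-points.

```python
import numpy as np

def fit_piecewise(counts, energy, thresholds):
    """Fit one straight line per intensity band defined by count thresholds."""
    counts = np.asarray(counts, dtype=float)
    energy = np.asarray(energy, dtype=float)
    bands = np.digitize(counts, thresholds)    # 0 = low, 1 = moderate, 2 = vigorous, ...
    models = {}
    for band in np.unique(bands):
        mask = bands == band
        # Each band needs at least two samples to determine a line.
        slope, intercept = np.polyfit(counts[mask], energy[mask], deg=1)
        models[int(band)] = (slope, intercept)
    return models

def predict_piecewise(models, counts, thresholds):
    """Pick the band for each count value and apply that band's line."""
    counts = np.asarray(counts, dtype=float)
    bands = np.digitize(counts, thresholds)
    return np.array([models[int(b)][0] * c + models[int(b)][1]
                     for b, c in zip(bands, counts)])
```

The sketch makes the limitations above visible: the thresholds must be supplied by hand, a band never seen during fitting has no model (prediction raises a `KeyError`), and the equations carry no information about the population or activity they were fitted on.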
2.2.3.2 Pattern Recognition-based monitors
When looking at kinematic measurements, there is considerable advantage in looking
at raw kinematic data. Access to raw data allows the researcher to explore the physical
intuition behind movement and use features that explicitly mirror the quantity in ques-
tion [61]. These data need to be viewed in the context of the activity being performed,
the person performing the activity and their temporal and spatial setting among others.
Contextual information is important to improve energy expenditure information and to
better understand the relationships between physical activity and behavior [62]. Such
techniques have been successfully applied to infer activity type [63, 64]. Tapia et
al. [65, 62] showed that activity specific equations have the capability to improve recog-
nition rates over traditional count-based techniques. Choi et al. [66] describe a two
level regression framework to map integrated accelerometer signals to energy expendi-
ture for treadmill walking and running. Rothney et al. [67] described the successful
use of an artificial neural network to estimate energy expenditure. This represents an
example of developing nonlinear models from movement to energy expenditure. Albi-
nali et al. [68] showed that using separate regression equations by keeping in context
the activity performed improves the accuracy of energy predictions. Altini et al. [69]
used clustering techniques to provide accurate estimates of energy expenditure. This
underscores the need for the development of more sophisticated user models to predict
energy expenditure for unseen data.
This line of thinking has influenced the design and development of a new genera-
tion of “smart” sensors targeted at both consumer and clinical grade activity monitoring.
These sensors are context sensitive and fuse multimodal information to obtain more ac-
curate estimates of energy expenditure. In addition, they take advantage of the offsite
databases and internet connectivity to store and process information. While the algorithms
on the sensors are certainly more advanced than was previously available, there
still remains a lot to be explored in how one can accurately process the data that they
generate.
2.2.4 Energy Expenditure as a Regression Problem
Pattern recognition-based techniques point to a more general idea of learning the statis-
tics of how someone moves and how that movement causes them to expend energy.
Fitting functional approximations from movement descriptors to calories burned is a
special case of the machine learning problem of regression. In regression, a functional
approximation is learned from one multidimensional continuous space to another given
some example data for training. This functional approximation can then be used on
unseen data points. A question to ask is whether one can significantly improve energy
prediction with more sophisticated methods of regression. An ideal regression algorithm
would:
• be aware of the context in which the activity is performed
• be able to extract intelligent descriptors of movement depending on the activity
• be able to assess the quality of the data provided to it
• generate models that are accurate for a wide variety of people
• be able to learn a model with minimal data from a person.
In this dissertation, we extend previous work with a family of regression techniques
to generate personalized predictions of energy expenditure given kinematic data.
2.3 Mathematical Formulation
Our goal is to identify an accurate map from movement captured with phone-based
kinematic sensors to the energy expended as a result of that movement. Typically, a
person has a certain morphology that also determines this map. Thus, our goal is to
determine a functional map $(\text{Movement}, \text{Physiology}) \xrightarrow{f} \text{EnergyExpenditure}$ for a
person. This dissertation details different techniques to find this map. We adopt
a data-driven approach, viz. we collect examples of a number of people performing
different kinds of movement and measure their energy expenditure. We then use these
samples to build a statistical model that maps movement to energy expenditure. One
of the main contributions of this dissertation is to identify a set of models and evaluate
them for accuracy in predicting energy expenditure.
To be able to describe the algorithms in subsequent sections, we first provide a
description of the language used in this dissertation. Consider a test population consisting
of $P$ participants. For each participant $p$, we collect training data points in
the form of input-output pairs $(\mathbf{x}_{np}, y_{np})$, where $n \in \{1, 2, \ldots, N_p\}$ refers to the index
of the data point for each person $p$, $\mathbf{x}_{np} = [1 \;\; x_{np,1} \;\; \ldots \;\; x_{np,D}]^T \in \mathbb{R}^{(D+1) \times 1}$
is the $D$-dimensional descriptor of their movement and $y_{np} \in \mathbb{R}$ is the energy expended
by person $p$ for that movement. Let there be $N_p$ such data points collected
for each person $p$. Thus for each participant $p$, we have a dataset consisting of the
energy matrix $\mathbf{Y}_p = [y_{1p} \;\; y_{2p} \;\; \ldots \;\; y_{N_p p}]^T \in \mathbb{R}^{N_p \times 1}$ and movement matrix
$\mathbf{X}_p = [\mathbf{x}_{1p}^T \;\; \mathbf{x}_{2p}^T \;\; \ldots \;\; \mathbf{x}_{N_p p}^T]^T \in \mathbb{R}^{N_p \times (D+1)}$. We also record their corresponding morphological
descriptor $\mathbf{Phys}_p$. A morphological descriptor can be measurements like a person’s
height, weight, BMI, gender, resting heart rate or any nonlinear combination of elementary
measurements. We denote $\mathbf{Y} = \{\mathbf{Y}_1, \mathbf{Y}_2, \ldots, \mathbf{Y}_P\}$, $\mathbf{X} = \{\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_P\}$,
$\mathbf{PHYS} = [\mathbf{Phys}_1^T \;\; \mathbf{Phys}_2^T \;\; \ldots \;\; \mathbf{Phys}_P^T]^T$ to be the complete training data for
all participants in the population. Thus our aim is to determine a functional map
$(\mathbf{x}_{np}, \mathbf{Phys}_p) \xrightarrow{f} y_{np}$ for a person $p$ and use this map to determine energy expenditure,
given an unseen example of their movement descriptor $\mathbf{x}_{np}$ and personal morphological
descriptor, $\mathbf{Phys}_p$. Given examples, we build a statistical model that indicates what
the statistical distribution of energy expenditure, i.e., $p(y_{np} \mid \mathbf{x}_{np}, \mathbf{Phys}_p)$, is given their
movement and morphology.
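To make the notation concrete, the following Python sketch builds the movement matrix and energy matrix for one hypothetical participant and fits a plain least-squares map between them. Everything in it is an illustrative assumption: the numbers are synthetic, the helper names (`add_bias`, `fit_least_squares`, `predict`) are invented here, ordinary least squares is only a simple stand-in baseline rather than one of the models developed in this dissertation, and the morphological descriptor is ignored for brevity.

```python
def add_bias(descriptors):
    """Prepend the constant 1 so that each x_np has D + 1 entries."""
    return [[1.0] + list(row) for row in descriptors]


def fit_least_squares(X, Y):
    """Solve the normal equations (X^T X) w = X^T Y by Gauss-Jordan elimination."""
    d = len(X[0])
    # Augmented matrix [X^T X | X^T Y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(d)]
         + [sum(r[i] * y for r, y in zip(X, Y))]
         for i in range(d)]
    for col in range(d):
        # Partial pivoting for numerical stability.
        pivot_row = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[pivot_row] = A[pivot_row], A[col]
        pivot = A[col][col]
        A[col] = [v / pivot for v in A[col]]
        for r in range(d):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][d] for i in range(d)]


def predict(w, x):
    """Evaluate the learned linear map on one (D + 1)-dimensional descriptor."""
    return sum(wi * xi for wi, xi in zip(w, x))


# Hypothetical participant p with N_p = 5 data points and D = 2 descriptors.
raw_descriptors = [[1.5, 0.2], [1.7, 0.3], [1.9, 0.5], [2.1, 0.4], [1.6, 0.6]]
X_p = add_bias(raw_descriptors)                               # N_p x (D + 1)
Y_p = [2.0 + 3.0 * a - 1.0 * b for a, b in raw_descriptors]   # synthetic y_np
w = fit_least_squares(X_p, Y_p)
y_new = predict(w, [1.0, 1.8, 0.35])                          # unseen descriptor
```

Because the synthetic energies are generated from an exactly linear rule, the fit recovers that rule; real data would of course leave a residual, which is what the probabilistic view $p(y_{np} \mid \mathbf{x}_{np}, \mathbf{Phys}_p)$ is meant to capture.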
2.4 Energy Expenditure for Walking
Throughout this dissertation, special emphasis is placed on predicting energy expendi-
ture from walking. Walking represents one of the most common activities in our day
[70]. Walking is an easily accessible form of moderate to vigorous exercise that can
be easily performed regardless of sex, ethnic group, age, education, or income level
[71]. Its most important advantage stems from the fact that it can be performed indoors
or outdoors and does not require special equipment or skill. In this regard, walking is
particularly important for its potential to reduce disparities in health related to lack of
physical activity. Time spent in walking has an important role in controlling blood glu-
cose. Knowler et al. [72] showed that a lifestyle intervention that included 150 min/wk
of brisk walking reduced the risk of advancing from glucose intolerance to diabetes by
over 50%. Regular walking is associated with the reduction of cognitive decline [73] and
risk of fall with age [74]. In particular, walking is an easily available intervention tool
to reduce sedentary lifestyles and therefore risk of cardiovascular disease and obesity
[75, 76]. Walking also reduces stress [77] and risk of depression [78].
From a measurement perspective, walking is easily re-creatable as an exercise in
laboratory settings across a wide variety of people. Another important point is that it is
inherently cyclical in nature. Chang et al. [79] and Albu et al. [80] used this property
in segmenting movement. The periodic nature of walking lends itself well to tracking in
the frequency domain. In typical walking, because of the need to maintain balance, the
human body’s limb movements are usually correlated with each other. This implies
that periodicities can be extracted from a number of locations. This suggests that it is
possible to extract a robust descriptor of walking using kinematic sensors regardless of
location of the sensor on the body. A number of examples shown in this dissertation
refer to steady-state treadmill walking. This refers to walking (usually at a constant
speed) where the calories consumed are at a constant level.
2.5 Summary
Human calorimetry is the study of energy balance brought about by the natural pro-
cesses of growth, work and self-maintenance. Energy is expended in primarily three
ways: energy when at rest, energy due to digestion and energy due to muscular move-
ment. Being able to accurately measure energy expenditure is important because it
provides insight into body functions, levels of physical activity, muscle contraction due
to active movement and the natural processes of growth and repair. Energy expenditure
can be measured directly using whole room calorimetry or indirectly through
$\dot{V}O_2$ estimation or doubly labeled water techniques. While these techniques are accurate,
they are expensive and cannot be deployed across large populations. At the other end
of the spectrum, questionnaires and surveys are prone to bias and missing data. What
is needed is a technology that allows accurate estimation of energy expenditure while
being cost-effective and easy to deploy.
The last decade has seen the emergence of kinematic sensors such as accelerometers
and gyroscopes as viable technologies to estimate energy expenditure. Since kinematic
sensors capture movement, they are a prime candidate for objective measurement of en-
ergy expended due to movement. A common technique to use kinematic sensor data
is to collect simultaneous examples of movement and energy expenditure and learn
a functional approximation that is consistent with the data. An ideal regression algo-
rithm would be aware of activity context, use intelligent descriptors of movement and
learn personalized models that are applicable for a large section of the population with
minimal data.
We lay special emphasis on walking. Regular walking can play a major role in
insulin regulation and prevention of falls and cognitive decline with age. The periodic
nature of walking lends itself well to tracking in the frequency domain. Walking is also
an easily measurable activity for validation of energy expenditure algorithms.
Chapter 3
Robust Movement Capture using Frequency
Signatures
If you want to find the secrets of the universe,
think in terms of energy, frequency and vibration.
—Nikola Tesla.
An important piece in the problem of estimating energy expenditure from movement
is the issue of accurate and robust representation of movement. With respect
to our problem definition $(\mathbf{x}_{np}, \mathbf{Phys}_p) \xrightarrow{f} y_{np}$, we are interested in finding a robust
representation of $\mathbf{x}_{np}$ for different activities that is valid across multiple individuals.
Taking advantage of natural structure in human movement: It is important to
note that many gross motor movements of the human body have an inherent structure
to them. For example, walking and running are essentially periodic activities involv-
ing highly correlated movement of limb segments. The question then is can we take
advantage of this inherent structure to characterize these activities?
Efficient calculation: The widespread application of our techniques hinges on their
ability to be applied in portable platforms, particularly cellphones. Portable platforms
have to be efficient in their power consumption since they have limited battery life. Thus
techniques have to be efficient in their use of computational cycles. Given this require-
ment, we need to examine whether there exist efficient methods to calculate descriptors
of movement for portable form factors.
Robustness to location and orientation of the device: The descriptors that char-
acterize movement must be robust to the location and orientation of the device which
measures them. For this, the issue to be considered is how does one adjust for the loca-
tion and orientation of the sensor?
The problem domain: In this section, we address the problem of creating robust
descriptors of movement for calculating energy expenditure. We primarily focus on pe-
riodic activities such as walking or running because of their prevalence in our lives. We
use kinematic data from a phone-based triaxial accelerometer and triaxial gyroscope at
various locations. Through a data collection study, we show how frequency-based fea-
tures are robust to location and describe a preliminary technique to adjust for orientation
of devices. We also show an efficient way to calculate frequency components of a phone
signal using the Momentary Fourier Transform.
3.1 Related Work
3.1.1 Periodicity of Center of Mass Movements
The four most common gross motor activities in a day are sitting, standing, walking and
running. An important point is that except for sitting and standing, all the activities
are cyclical in nature [81, 82]. These activities involve repetitive movement of limb
segments to transport the center of mass of the body.
Figure 3.1: Illustration of the biomechanical modeling of walking and running with center
of mass shown: (a) inverted pendulum model of walking; (b) spring mass model of running.
Most common activities in our day are cyclical in nature. Tracking the
periodic movement of this center of mass would provide a good descriptor of movement
for these activities.
Figures 3.1a and 3.1b represent two common models of the most common gross
motor activities: walking and running. Walking is often modeled as an inverted simple
pendulum [83, 42, 43]. In this model, the person is modeled as a point mass connected
to a rigid beam with the foot as a pivot. During each leg’s stance phase, the point
mass vaults over the pivot point while the other leg swings when not in contact with the
ground. This is repeated alternately between legs. The center of mass is at its highest
point in the middle of the stance phase.
Running is often modeled as a simple spring-mass system [84, 85, 86, 83, 87]. In
this model, the person is modeled as a point mass that is connected to a beam by means
of a spring. On heel strike, the leg compresses (modeled by the compression of the
spring) until reaching maximum compression at the middle of the stance phase. During
this time, the other leg is in swing phase. The energy stored in muscles is then released
(modeled by the release of the spring) leading to a brief period when the system is in
air. This is repeated alternately between legs. In contrast to walking, the center of mass
is at its lowest point in the middle of the stance phase.
In both cases, the center of mass of the human body has a periodic motion. Another
important point is that since the human body can be modeled as a series of rigid bodies,
the center of mass motion of the human body will have a consistent signature regardless
of where on the body it is measured. In particular, mobile phones are usually placed in static positions with
respect to the human body. Thus, tracking this periodic motion would be a promising
approach to characterize movements such as walking or running. Positions resulting
from periodic movement can be represented as a combination of sinusoidal waves. An
important observation is that since acceleration and rotational rates are proportional to
the derivatives of the positions, they will also be periodic.
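As a worked illustration of this observation, assume (for illustration only) that the vertical position of the center of mass is a single sinusoid with amplitude $A$, step frequency $f_0$ and phase $\phi$. These symbols are introduced here and are not part of the notation used elsewhere in this dissertation.

$$z(t) = A \sin(2\pi f_0 t + \phi), \qquad \dot{z}(t) = 2\pi f_0 A \cos(2\pi f_0 t + \phi), \qquad \ddot{z}(t) = -(2\pi f_0)^2 A \sin(2\pi f_0 t + \phi)$$

The acceleration oscillates at the same frequency $f_0$ as the position, with its amplitude scaled by $(2\pi f_0)^2$. By linearity, the same holds for every term when the position is a sum of sinusoids, so the periodicities of the position carry over to the accelerometer signal.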
3.1.2 Time-Domain Techniques for Measuring Periodicity
Our goal is to be able to track and quantify the periodic movement of the center of
mass of the human body. This involves tracking the time period, amplitude and possible
harmonics in the periodic signal. Algorithms that detect such periodicity fall into two
categories. Time domain methods work with raw or derived time series data extracted
from the signal. The most common technique is peak counting, which as its name
suggests, involves detecting peaks in a time series [88], typically the sum of squares
magnitude of signals. The assumption made here is that the signal has peaks occurring
at regular intervals due to some physical phenomenon such as heel strike or the sensor
striking against the body.
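A minimal sketch of peak counting on the sum-of-squares magnitude is given below. The threshold, the minimum gap between peaks, the sampling rate and the synthetic 1.8 Hz bounce are all assumptions chosen for illustration, and they are exactly the kind of heuristics that make this family of methods fragile.

```python
import math


def magnitude(ax, ay, az):
    """Sum-of-squares magnitude of a triaxial signal, sample by sample."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in zip(ax, ay, az)]


def find_peaks(signal, threshold, min_gap):
    """Indices of local maxima above `threshold`, at least `min_gap` samples apart."""
    peaks = []
    for i in range(1, len(signal) - 1):
        is_local_max = signal[i] >= signal[i - 1] and signal[i] > signal[i + 1]
        if is_local_max and signal[i] > threshold:
            if not peaks or i - peaks[-1] >= min_gap:
                peaks.append(i)
    return peaks


# Synthetic data (assumed): 10 s at 50 Hz, gravity plus a 1.8 Hz vertical bounce.
fs, duration, bounce_hz = 50, 10, 1.8
n_samples = fs * duration
ay = [1.0 + 0.3 * math.sin(2 * math.pi * bounce_hz * n / fs) for n in range(n_samples)]
ax = [0.0] * n_samples
az = [0.0] * n_samples

peaks = find_peaks(magnitude(ax, ay, az), threshold=1.15, min_gap=10)
estimated_hz = len(peaks) / duration
```

On this clean signal one peak is found per cycle; with a sensor that strikes the body or a different placement, the same threshold and gap would need retuning.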
Peak counting techniques have many limitations such as identifying heuristics to
determine a peak and sensitivity to body location. To offset these limitations, another
set of techniques involve identifying self-similarity in the signal. Here, a sub-sequence
in the signal is compared with a previous sub-sequence. If a certain distance threshold
is met, then the two signals are said to be matched and the process is repeated for future
sub-sequences. Typical distance measures used are Euclidean distance, auto-correlation
and dynamic time warping algorithms.
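The self-similarity idea can be sketched with auto-correlation, one of the measures listed above: the lag at which the signal best matches a shifted copy of itself is taken as the period. The search range, sampling rate and synthetic 2 Hz signal are assumptions for illustration, and the function names are invented here.

```python
import math


def autocorrelation(signal, lag):
    """Normalized auto-correlation of a mean-removed signal at a given lag."""
    mean = sum(signal) / len(signal)
    centered = [s - mean for s in signal]
    numerator = sum(centered[i] * centered[i + lag]
                    for i in range(len(centered) - lag))
    denominator = sum(c * c for c in centered)
    return numerator / denominator


def estimate_period(signal, min_lag, max_lag):
    """Lag (in samples) with the highest auto-correlation inside the search range."""
    return max(range(min_lag, max_lag + 1),
               key=lambda lag: autocorrelation(signal, lag))


# Synthetic data (assumed): 10 s at 50 Hz containing a 2 Hz oscillation.
fs = 50
signal = [math.sin(2 * math.pi * 2.0 * n / fs) for n in range(10 * fs)]

period_samples = estimate_period(signal, min_lag=10, max_lag=40)
estimated_hz = fs / period_samples
```

The choice of `min_lag` and `max_lag` plays the role of the distance threshold discussed above: too wide a range admits multiples of the true period, too narrow a range misses it.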
While such measures are more robust than peak counting techniques, they are still
bound by heuristics and sensitivity to choosing the right distance measures. It is pos-
sible that two sub-sequences are not identical but share the same distance measure as
identical sub-sequences. This can be prevented by incorporating a sense of temporal
dependence. This can be achieved using Hidden Markov Models [89]. Hidden Markov
Models (HMMs) model the time series as being due to a set of hidden states. For cycli-
cal time series, the states are modeled in a left-right fashion with reset. The resultant
time series is modeled as a set of observations that emanate from these states. Learning
the parameters of this model amounts to learning the probability of transition from one
state to another and the probability of observing a signal value given a state.
HMMs are the most robust among time-domain techniques to model periodicity.
However, an HMM trained with the sensor worn on one location of the human body
might not perform adequately in another body location. This is because the observation
probabilities of accelerations and rotational rates at that location would change. For this,
incorporation of a location state variable is required. This involves modeling the system
as a Dynamic Bayesian Network [90] which needs approximation techniques.
3.1.3 Spectral Techniques for Activity Monitoring
A second category of techniques relies on analyzing sequences in the complementary
frequency domain. The assumption behind such techniques is that periodic time series
can be approximated by a set of sine waves. Thus their representations in the frequency
domain would be sparse signals with peaks at the dominant frequencies of those sine
waves. Typically, such techniques rely on calculating the Fourier Transform of the time
series data and identifying periodicities in that space. In the particular case of walking
and running, Fourier transforms are particularly attractive because of the pervasive pres-
ence of the center of mass oscillations across the human body. Thus they are relatively
robust to position and orientation on the human body.
Much of the work in tracking activities through frequency-domain signal analysis
focuses on activity recognition. The intuition behind these approaches is that they cap-
ture the harmonic content of the signal and thus provide potentially useful information
for activity characterization. The most common technique is to obtain the Fast Fourier
transform of a signal within a given time period and then extract secondary features
of movement from the transform. Foerster and Fahrenberg [91] used accelerometers
worn on multiple sites on the human body and calculated walk frequency from Fourier
analysis of vertical accelerations of the sternum sensor. Bao and Intille [92] used frequency
domain features (such as frequency-domain entropy) in combination with time
domain features to predict activities. This was extended by Bonomi et al. [93] by using
frequency domain features such as spectral peak amplitude and most dominant peak in
combination with decision trees to classify common household activities. Mantyjarvi et
al. [94] used frequency domain features to identify users based off gait patterns. Karan-
tonis et al. [95] used the Fast Fourier Transformation of acceleration in the vertical
direction to determine cyclical movements. Chung et al. [96] used frequency domain
features in distinguishing between running and walking. Ermes et al. [97] used frequency
domain techniques in classifying sports activities from accelerometers. A similar
approach was used by Cho et al. [98] in a buckle form factor. Preece et al. [99]
performed a detailed analysis of frequency domain techniques versus other kinds of features
and showed their utility in accurate classification for waist and ankle movements.
Lester et al. [63] extensively used frequency derived features to characterize human activities.
This particular result is important because the phone is most likely to be carried
in places close to the waist. He and Jin [100] showed how the discrete cosine transform is
a good indicator for activity classification. Oshima et al. [101] used FFT-based features
for classifying household activities.
Figure 3.2: Illustration of periodic signal capture of walking using a phone-based triaxial
accelerometer and gyroscope mounted on the iliac crest. (a) Raw accelerations (g) and
rotational rates (rad/s) versus time (s). (b) Fourier Transform representation (periodogram
magnitude versus frequency in Hz) for walking at two different speeds, 2.5 mph and 3.5 mph.
3.2 Robustness of CoM tracking in the Frequency Domain
Our goal is to be able to track center of mass movement using phone-based sensors. The
first step is to determine the feasibility of detecting frequencies. For this, we captured
triaxial acceleration and rotation rates, using a phone-based inertial sensor mounted on
the right iliac crest. Figure 3.2a shows sample inertial data from treadmill walking
collected over 10 seconds when a participant is walking at a speed of 2.5 mph. Regular
periodic patterns were observed in steady-state. Similar patterns were observed for other
speeds. The periodicity of walking signals was examined by computing their Fourier
transforms. Figure 3.2b illustrates Fourier transforms of two 10 second steady state
walking samples at 2.5 mph and 3.5 mph for the up-down acceleration streams. The
Fourier transform showed clear peaks indicating distinct periodic components for the
original signals. The peaks occurred at the same frequencies for all other sensor streams.
The location of these peaks was a function of walking speed. The dominant peak for
walking at 3.5 mph occurs at a higher frequency than the corresponding peak for walking
at 2.5 mph. This indicated the feasibility of periodic movement capture. It should also
be noted that the most dominant frequencies for walking occur in a narrow band of 1 -
2 Hz. A similar band of 2 - 3 Hz exists for running. This indicates that a bandpass filter
could be used to focus on activity specific frequencies.
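The band-limited peak search described above can be sketched as follows. The sketch evaluates the DFT directly from its definition, only for bins inside the 1 - 2 Hz walking band, and reports the strongest one. The two synthetic signals (1.5 Hz and 1.9 Hz fundamentals with a weaker second harmonic) are assumed stand-ins for the slower and faster walking samples and are not measured data.

```python
import cmath
import math


def dft_bin(window, k):
    """Bin k of the DFT of `window`, computed directly from the definition."""
    N = len(window)
    return sum(x * cmath.exp(-2j * math.pi * k * n / N)
               for n, x in enumerate(window))


def dominant_frequency(window, fs, f_lo, f_hi):
    """Frequency (Hz) of the largest DFT magnitude between f_lo and f_hi."""
    N = len(window)
    mean = sum(window) / N
    centered = [x - mean for x in window]      # remove the gravity/DC offset
    in_band = [k for k in range(1, N // 2) if f_lo <= k * fs / N <= f_hi]
    best = max(in_band, key=lambda k: abs(dft_bin(centered, k)))
    return best * fs / N


def synthetic_walk(step_hz, fs, seconds):
    """Assumed signal: gravity, a fundamental and a weaker second harmonic."""
    return [1.0 + math.sin(2 * math.pi * step_hz * n / fs)
            + 0.3 * math.sin(2 * math.pi * 2 * step_hz * n / fs)
            for n in range(fs * seconds)]


fs = 50
slow_hz = dominant_frequency(synthetic_walk(1.5, fs, 10), fs, 1.0, 2.0)
fast_hz = dominant_frequency(synthetic_walk(1.9, fs, 10), fs, 1.0, 2.0)
```

Restricting the search to the band has the same effect as the bandpass filter suggested above: the second harmonic lies outside 1 - 2 Hz and is never considered.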
We performed a data collection study to examine whether frequency information
could be captured at all possible center of mass frequencies. A single participant walked
to the beat of a metronome with the phone placed vertically in the left pocket. The pur-
pose of the metronome was to recreate a reliable set of frequencies as ground truth. The
participant walked for a minute at each metronome frequency. The metronome frequen-
cies were 85, 90, 95, 100, 105 and 110 beats per minute. A Galaxy Nexus S phone
logged accelerometer and gyroscope data at 50 Hz. Data from all 6 sensor streams
were bandpass filtered with 3 dB cut-off frequencies of 0.9 Hz and 2.1 Hz. The
filtered six sensor streams were divided into 4 second epochs with a 5% overlap between
epochs. The Fourier Transform was then calculated for each epoch. Figure 3.3a
illustrates the resultant Fourier transforms. A vertical band indicates the presence of
periodicities in the signal. It can be seen that the Fourier transforms of the accelerometer
in the Y (up-down) and Z (leg-swing) directions are dominant. The effect is less
prominent in case of the gyroscopes. A gradual shift in the frequency components of the
accelerometer is also seen with increase in metronome frequency. Figure 3.3b illustrates
a plot of the most dominant frequency extracted from the accelerometer Fourier transform
versus the ground truth from the metronome. The dominant frequency showed
an almost perfect positive correlation with the ground truth frequency indicating the
feasibility of Fourier transform based periodicity extraction.
Figure 3.3: An example of the utility of axis-wise Fourier Transforms of accelerometer
and gyroscopic data in extracting the fundamental frequency of gait. The phone was
worn in the left pocket. The participant walked in tune with metronome frequencies
and the FFTs were calculated in real time. (a) Visually, it can be seen that the FFTs of the
accelerometer in the Y (up-down) and Z (leg-swing) directions are dominant. The frequencies
shift to the right with increasing ground truth metronome frequency as can be seen when
using the white line as a reference. The effect is less prominent in case of the gyroscopes.
A gradual shift in the frequency components of the accelerometer is seen with increase in
metronome frequency. (b) Plot of the most dominant detected frequency (Hz) in the up-down
direction versus the ground truth frequency (Hz) measured by a metronome, with linear fit
y = 1*x − 0.024 and correlation = 0.98296. The dominant frequency showed an almost perfect
positive correlation with the ground truth frequency indicating the feasibility of our approach.
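The arithmetic behind this protocol can be sketched in a few lines of Python. The sketch converts the metronome settings to Hz and splits one minute of 50 Hz data into 4 second epochs. Interpreting "5% overlap" as 5% of the epoch length (10 samples) is an assumption, as are the function names.

```python
def bpm_to_hz(bpm):
    """Convert a metronome setting in beats per minute to Hz."""
    return bpm / 60.0


def split_epochs(samples, fs, epoch_seconds, overlap_fraction):
    """Split a stream into fixed-length epochs that overlap by a given fraction."""
    size = int(round(epoch_seconds * fs))
    hop = size - int(round(overlap_fraction * size))
    return [samples[start:start + size]
            for start in range(0, len(samples) - size + 1, hop)]


metronome_bpm = [85, 90, 95, 100, 105, 110]
ground_truth_hz = [bpm_to_hz(b) for b in metronome_bpm]

fs = 50
one_minute = list(range(60 * fs))            # placeholder sample indices
epochs = split_epochs(one_minute, fs, epoch_seconds=4, overlap_fraction=0.05)
```

Under these assumptions the ground truth frequencies run from about 1.42 Hz to about 1.83 Hz, all inside the 0.9 Hz to 2.1 Hz passband, and each minute of data yields 15 epochs of 200 samples.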
We expanded this study to address whether these center of mass movements could
be tracked in different positions and orientations of the phone on the human body. The
same data collection process as described above was repeated for various locations and
orientations of the phone on the human body. We had two classes of locations: locations
where phone movement is tightly coupled with body movement such as pockets and
locations where phone movement was loosely coupled with body movement such as
in one’s hands. We chose our locations based on previous studies detailing common
locations of wearing a phone [102]. A similar study was carried out by Sun et al. [103]
in the context of activity recognition. We focus more on the problem of ensuring that
features are robust so that any future classification or regression steps may be accurate.
Figure 3.4: Examples of axis-wise Fourier Transforms of accelerometer data for three
different locations on a user in the top panel: (a) Left Pocket, (b) Right Pocket, (c) Back
Pocket. Locations are shown in the lower panel: (d) Left Pocket, (e) Right Pocket, (f) Back Pocket.
Here the orientation of the phone was kept constant. A more red color indicates the
stronger presence of that frequency component. The most dominant presence of periodicities
corresponded to acceleration in the vertical direction (Y axis, up-down) and in
the forward direction (Z axis, leg-swing). This was prominent regardless of the position
of the phone.
Figure 3.4 illustrates the variation of accelerometer-based Fourier Transforms with
the phone worn in three different locations - left pocket, right pocket and back pocket.
All three axes are shown in each panel. The features with the highest strength were seen
in the accelerometer signals and hence only those are shown for brevity. In all cases, the
phone was upright and securely held. Strong periodicities were detected in the Y axis
(corresponding to up-down movement of center of mass) and Z axis (corresponding
to leg swing). These were prominent regardless of the position of the phone. This
indicates that tracking up-down movement and leg swing is a good technique to estimate
periodicities when in the pocket.
Figure 3.5: Examples of axis-wise Fourier Transforms of accelerometer data when the
phone was worn in the right pocket in three different orientations. Panels: (a) Right Pocket,
(b) Right Pocket Upside Down, (c) Right Pocket Sideways, (d) Right Pocket, (e) Right Pocket
Upside Down, (f) Right Pocket Sideways. Here the position is
held constant but the orientation is changed. When rotating the phone by 180 degrees,
the resultant frequency components are unchanged. When rotating the phone by 90
degrees, the vertical component shifts from the Y axis to the X axis since the vertical
axis has changed. This indicates that tracking the periodicity of one’s bounce is robust
if one can keep track of the current vertical direction.
We repeated the experiment keeping the location constant but varying the orientation
of the phone. Figure 3.5 illustrates the variation of Fourier Transform features with
the phone worn on the same location (right pocket) in three different orientations - up,
upside-down and sideways. It was observed that the features remain unchanged when
the phone is rotated by 180 degrees (up versus upside-down). This can be understood
by the fact that the frequencies of vertical up-down and leg swing movements remain
unaffected by a flip of the phone. However, when the phone is placed sideways (a
rotation of 90 degrees), the vertical component shifts from the Y axis to the X axis.
Since the vertical axis has changed, the periodicities shift to that axis. This further
reinforces the need to track the vertical orientation of the phone.
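The two orientation effects can be reproduced on synthetic data. The sketch below assumes, for illustration only, that the upside-down and sideways placements correspond to rotations of 180 and 90 degrees about the phone's Z axis, and that the only motion is a 2 Hz vertical bounce on the Y axis. A 180 degree rotation negates the Y signal, which leaves the magnitude of its Fourier transform unchanged, while a 90 degree rotation moves the bounce onto the X axis.

```python
import cmath
import math


def magnitude_spectrum(signal):
    """Magnitudes of DFT bins 0 .. N/2, computed directly from the definition."""
    N = len(signal)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * n / N)
                    for n, x in enumerate(signal)))
            for k in range(N // 2 + 1)]


def rotate_about_z(sample, degrees):
    """Coordinates of one (x, y, z) sample in a phone frame rotated about Z."""
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    x, y, z = sample
    return (c * x + s * y, -s * x + c * y, z)


def axis_spectra(samples):
    """Magnitude spectrum of each of the three axes."""
    return [magnitude_spectrum([sample[a] for sample in samples]) for a in range(3)]


# Synthetic data (assumed): 2 s at 50 Hz with a 2 Hz bounce on the Y axis only.
fs, N, bounce_hz = 50, 100, 2.0
upright = [(0.0, math.sin(2 * math.pi * bounce_hz * n / fs), 0.0) for n in range(N)]

spec_up = axis_spectra(upright)
spec_flipped = axis_spectra([rotate_about_z(s, 180) for s in upright])
spec_sideways = axis_spectra([rotate_about_z(s, 90) for s in upright])

X_AXIS, Y_AXIS = 0, 1
peak_bin = max(range(len(spec_up[Y_AXIS])), key=lambda k: spec_up[Y_AXIS][k])
```

Comparing `spec_up[Y_AXIS]` with `spec_flipped[Y_AXIS]` and with `spec_sideways[X_AXIS]` at `peak_bin` shows the same peak in all three, which is why knowing the current vertical axis is sufficient to recover the bounce frequency.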
Figure 3.6: Examples of axis-wise Fourier Transforms of accelerometer data when the
phone was held in a backpack, in one’s hand and when on a phone call. Panels: (a) Back Pack,
(b) In Hand, (c) On Phone Call, (d) Back Pack, (e) In Hand, (f) On Phone. These represent
cases where limb movement cannot be observed. Locations are shown in the lower
panel. A more red color indicates the stronger presence of that frequency component.
The most dominant component was the acceleration in the up-down direction (Y axis
for the backpack case, Z axis for the in-hand case, combination of X and Y axis for
on-phone). This indicates that tracking the periodicity of one’s up-down movement is a
robust technique to estimate periodicity across a variety of locations.
We also examined situations where the phone is not directly on a person. Typically
the user is carrying the phone in their backpack, purse or is holding the phone in their
hand while looking at maps. Figure 3.6 illustrates the variation of Fourier Transform
features with the phone worn in three such locations - in the backpack (set vertically
and securely held), in one’s hand and diagonal when on a phone call, in the form of
a pseudo-color plot. All three axes are shown in each panel. A single band indicates
the presence of periodicities in the signal. In contrast to the pocket case, the effect of
leg-swing is not seen. However, in both cases, the strongest periodicities were detected
in the up-down direction viz. Y axis when in the back-pack and Z axis when in the hand.
An interesting observation was that there is a small periodicity observed even in the Z
direction corresponding to forward movement. This was more pronounced when the
user is on the phone and walking since the phone is diagonal. This can be understood
by the fact that the phone was not perfectly vertical. However, it was still securely held
so that it did not oscillate. Thus components of the vertical movement would leak into
other dimensions.
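The leakage can be quantified with a small worked example. Assume, purely for illustration, the backpack case in which the Y axis is nominally vertical, that the phone is tilted by an angle $\theta$ away from the true vertical within the Y-Z plane, and that the only acceleration is a vertical one, $a_v(t)$. The symbols $\theta$, $a_v$, $a_Y$ and $a_Z$ are introduced here and do not appear elsewhere in this dissertation.

$$a_Y(t) = a_v(t) \cos\theta, \qquad a_Z(t) = a_v(t) \sin\theta$$

Both axes therefore carry the same periodicity, with amplitudes in the ratio $\tan\theta$. For $\theta = 10^\circ$, $\cos\theta \approx 0.98$ and $\sin\theta \approx 0.17$, so roughly 17% of the vertical amplitude appears on the Z axis at the same frequency.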
3.3 Efficient Computation of Frequency Content
Typically, the frequency content of a discrete time series is captured using the Fast
Fourier Transform [104]. The Fast Fourier Transform (FFT) is an efficient implementation
of the Discrete Fourier Transform (DFT). Given a signal $x[n],\ n \in [1, 2, \ldots]$, we
consider a set of samples in a particular time window of size $N$, viz. $n \in [1, 2, \ldots, N]$.
For these $N$ samples, the $N$-point DFT is:
$$Y_k = \sum_{n=0}^{N-1} x_n e^{-i 2\pi k n / N}, \qquad k \in [1, 2, \ldots, N]$$
Typically, the FFT is applied on a sliding window of size N. The traditional FFT-based
calculation of the Fourier Transform is inefficient for a number of reasons. First, it is
limited by the need for N to be a power of 2. This means that in order to calculate
the FFT, there have to be at least N samples at any given time. For large N, this can
delay the response of the FFT to new data. This is disadvantageous when dynamically
estimating frequency information. However, this must be balanced by choosing a large
enough N to obtain the right frequency resolution. Second, in our case, we are interested
in only those frequencies that correspond to human movement which is a subset of the
total frequencies for which the FFT is calculated. Calculating the FFT for the set of
frequencies which are not relevant is a waste of resources. Thus there is a need to
optimize the FFT for these parameters.
3.3.1 The Momentary Fourier Transform
3.3.1.1 Algorithm Description
The Momentary Fourier Transform [105, 106] is an adaptation of the DFT to address
these limitations. The Momentary Fourier Transform casts the DFT algorithm in an
incremental form, thus allowing a smaller number of calculations. We briefly describe
the matrix form of this transform described by Albrecht et al [107]. We arrange the
windowed samples of the signals[n]; n2 [1; 2;::N] in the form of a column vector:
s
i1
=
2
6
6
6
6
4
x
iN
:
:
x
i2
x
i1
3
7
7
7
7
5
;
s
i
=
2
6
6
6
6
4
x
i(N1)
:
:
x
i1
x
i
3
7
7
7
7
5
:
We consider theNN elementary cyclic permutation matrix:is:
42
P =
2
6
6
6
6
4
0 1 0 : 0
: 0 1 0 :
: : 0 1 0
0 : : 0 1
1 0 : : 0
3
7
7
7
7
5
:
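We can see that the two column vectors $\mathbf{s}_i$ and $\mathbf{s}_{i-1}$ are related to each other by:
$$\mathbf{s}_i = \begin{bmatrix} s_{i-(N-1)} \\ \vdots \\ s_{i-1} \\ s_{i-N} \end{bmatrix} + \begin{bmatrix} 0 \\ \vdots \\ 0 \\ s_i - s_{i-N} \end{bmatrix} = P\mathbf{s}_{i-1} + \boldsymbol{\Delta}_i$$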
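The Fourier transform can be considered a linear transformation mapping the vector $\mathbf{s}_i$, $\forall i$,
to a new space. If one considers the matrix:
$$T = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & \omega^{-1} & \omega^{-2} & \cdots & \omega^{-(N-1)} \\ 1 & \omega^{-2} & \omega^{-4} & \cdots & \omega^{-2(N-1)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \omega^{-(N-1)} & \omega^{-2(N-1)} & \cdots & \omega^{-(N-1)(N-1)} \end{bmatrix}$$
where $\omega^{k} = e^{j2\pi k/N}$,
$$\implies T^{-1} = \frac{1}{N}\begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & \omega^{1} & \omega^{2} & \cdots & \omega^{(N-1)} \\ 1 & \omega^{2} & \omega^{4} & \cdots & \omega^{2(N-1)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \omega^{(N-1)} & \omega^{2(N-1)} & \cdots & \omega^{(N-1)(N-1)} \end{bmatrix}$$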
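We have the N-point DFT vector $\mathbf{S}_i$ for time instant $i$:
$$\mathbf{S}_i = T\mathbf{s}_i = T(P\mathbf{s}_{i-1} + \boldsymbol{\Delta}_i) = TPT^{-1}\mathbf{S}_{i-1} + T\boldsymbol{\Delta}_i$$
Using the definitions of $T$, $P$ and $T^{-1}$ and the property $\omega^{kN} = 1$, we have:
$$TPT^{-1} = \begin{bmatrix} 1 & 0 & \cdots & \cdots & 0 \\ 0 & \omega^{1} & 0 & \cdots & \vdots \\ \vdots & 0 & \omega^{2} & 0 & \vdots \\ \vdots & \vdots & 0 & \ddots & 0 \\ 0 & \cdots & \cdots & 0 & \omega^{(N-1)} \end{bmatrix}.$$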
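Thus the momentary Fourier transform has the recursive form:
$$\mathbf{Y}_i = \begin{bmatrix} 1 & 0 & \cdots & \cdots & 0 \\ 0 & \omega^{1} & 0 & \cdots & \vdots \\ \vdots & 0 & \omega^{2} & 0 & \vdots \\ \vdots & \vdots & 0 & \ddots & 0 \\ 0 & \cdots & \cdots & 0 & \omega^{(N-1)} \end{bmatrix}\mathbf{Y}_{i-1} + \begin{bmatrix} 1 \\ \omega^{1} \\ \omega^{2} \\ \vdots \\ \omega^{(N-1)} \end{bmatrix}(s_i - s_{i-N})$$
If we were to consider each spectral component $S_{i,k}$, $k \in (1, 2, \ldots, N)$, we get the recursive form:
$$S_{i,k} = \omega^{k}\,(S_{i-1,k} + s_i - s_{i-N}).$$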
This represents a technique to calculate the DFT of a signal in an incremental fashion.
From the recursive form, we can see that to calculate a spectral component corresponding
to a certain frequency, we only need to know the value of the same component at the
previous instant in time (independent of the other components) and historical values of the
raw signal. Thus this technique is computationally efficient when the DFT is required
for only a small part of the spectrum. Another advantage of this technique
is that, unlike the radix-2 FFT, we are not restricted to choosing N to be a power of 2.
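The recursion for a single spectral component can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in this work: the function names `mft_update` and `sliding_spectrum` are hypothetical, and the first window is seeded with a direct DFT (via `numpy.fft.fft`) before the incremental updates begin.

```python
import numpy as np

def mft_update(S_prev, k, new_sample, old_sample, N):
    """One recursive step: S_{i,k} = w^k * (S_{i-1,k} + s_i - s_{i-N})."""
    omega_k = np.exp(2j * np.pi * np.asarray(k) / N)
    return omega_k * (S_prev + new_sample - old_sample)

def sliding_spectrum(signal, N, bins):
    """Track only the requested DFT bins over a window sliding one sample at a time.

    Row j of the result holds the selected bins of the DFT of signal[j:j+N].
    """
    signal = np.asarray(signal, dtype=float)
    bins = np.asarray(bins)
    # Assumption: seed the recursion with a direct DFT of the first window.
    S = np.fft.fft(signal[:N])[bins]
    out = [S]
    for i in range(N, len(signal)):
        # signal[i] enters the window, signal[i - N] leaves it.
        S = mft_update(S, bins, signal[i], signal[i - N], N)
        out.append(S)
    return np.array(out)
```

Because each bin is updated independently, the cost per new sample grows with the number of tracked bins rather than with the window size, and N need not be a power of 2.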
[Figure 3.7, panel (a): Comparison of algorithms in log-space. Panel (b): Comparison of algorithms in linear space. Axes: shift q between DFTs versus number of operations required. Curves: FFT; Full Momentary Fourier Transform; Reduced Momentary Fourier Transform, 100 components (10% of spectrum); Reduced Momentary Fourier Transform, 25 components (2.5% of spectrum).]
Figure 3.7: Performance comparison of the Momentary Fourier Transform versus the
traditional FFT as a function of shift in the window. Momentary Fourier Transforms
have a reduced computational load when the frequency information for only a
narrow part of the spectrum is desired and when it is required to calculate the Fourier
Transform incrementally.
3.3.1.2 Performance Analysis
Consider a time series of length $N_t$. We consider an N-point DFT, where the data is
shifted by $q$ samples ($1 \le q \le N$) between each DFT computation. The number of
DFTs required is $N_f = (N_t - N)/q + 1$.
When using radix-2 FFTs, the number of real operations required is:
$$N_{ops1} = N_f \cdot 5N\log_2 N.$$
In the case of the MFT, the number of real operations required is:
$$N_{ops2} = N_f \left(6(q+1)N_1 + 2q(N_1 + 1)\right),$$
where $N_1$ is the number of DFT coefficients that are calculated.
We performed a comparative analysis of the number of operations required for each
algorithm as a function of shift in window. Figure 3.7 compares each algorithm for
number of operations as a function of window shift in a typical scenario for capturing
human walk. We considered a one minute sample of information sampled at 50 Hz.
We desired a frequency resolution of 0.05 Hz which translated to a 1024 point Fourier
Transform.
Figure 3.7a shows the results in log space. Four curves are shown. The black curve
corresponds to the number of operations required by the FFT as a function of window
shift. It can be seen that the MFT requires fewer operations at smaller
window shifts. However, with a larger window shift the number of operations increases
drastically and the FFT is more efficient. This can be mitigated by calculating the DFT
for only a smaller section of the frequency spectrum. For example, the black line
corresponding to an MFT calculating just 25 frequency components (2.5 percent of the spectrum)
has a lower number of operations even with a large window shift. From this, it can be
seen that Momentary Fourier Transforms have a reduced computational load when
the frequency information for only a narrow part of the spectrum is desired and when it
is required to calculate the Fourier Transform over small increments of time.
Using radix-2 FFTs, we would need $N_{ops1} = 51200N_f$ real operations. An MFT with
a shift of 1 and all frequency components calculated would require $N_{ops2} = 24588N_f$
real operations. However, we are concerned with only the small part of the spectrum pertaining to
the movement in question. For example, in walking we are interested in movements
in the frequency range 0.75-2 Hz. This corresponds to 25 components. Thus, using
a reduced frequency calculation, the typical number of real operations required is
$N_{ops2,reduced} = 352N_f$. This results in a 145-fold decrease in the number of
operations.
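The per-DFT operation counts for the FFT and the reduced MFT can be reproduced directly from the two cost formulas. The short sketch below is only a worked check of that arithmetic; the function names are illustrative and the 3000-sample series length assumes one minute of data at 50 Hz, as in the scenario described above.

```python
import math

def num_dfts(n_total, n_window, q):
    """N_f = (N_t - N) / q + 1, the number of DFTs over the series."""
    return (n_total - n_window) // q + 1

def fft_ops_per_dft(n_window):
    """Real operations for one radix-2 FFT: 5 N log2 N."""
    return 5 * n_window * math.log2(n_window)

def mft_ops_per_dft(q, n_coeffs):
    """Real operations for one MFT output: 6 (q + 1) N_1 + 2 q (N_1 + 1)."""
    return 6 * (q + 1) * n_coeffs + 2 * q * (n_coeffs + 1)

N = 1024                                        # window size
n_f = num_dfts(3000, N, 1)                      # one minute at 50 Hz, shift of 1
fft_cost = fft_ops_per_dft(N)                   # 51200 operations per DFT
reduced_cost = mft_ops_per_dft(q=1, n_coeffs=25)  # 352 operations per DFT
ratio = fft_cost / reduced_cost                 # about 145
```

Since both totals scale with the same $N_f$, the ratio of the per-DFT costs is also the ratio of the total costs.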
3.4 Summary
In this section, we described a robust technique to describe physical activities in our
daily lives. We take advantage of the fact that many common gross motor movements
of the human body have an inherent structure to them. Movements like walking and
running are essentially cyclical activities involving highly correlated movement of limb
segments. Walking and running can be approximated by an inverted pendulum and
spring mass system respectively. In both cases, the center of mass has a periodic up-
down movement. This up-down movement can be robustly detected in a number of
locations of the human body.
Robust capture of movement: We showed the utility of Fourier transform-based
techniques in detecting this periodicity. Periodic signatures of movement can be
accurately tracked at all natural frequencies of walk. We also showed an initial study
examining the robustness of frequency-based features to changes in location and orientation.
We tested common locations where the phone is expected to be. These included the front
pockets, back pocket, in hand, in a backpack and when on a phone call. As expected,
the up-down movement was the most constant signature across multiple locations of the
human body. When in the pockets, additional signatures were seen corresponding to
the leg swing. We also tested the robustness of frequency-based features to changes in
orientation. Tracking the vertical orientation is important in order to be able to estimate
vertical center of mass movement.
Efficient calculation of movement-based features: We also showed a modification
of the Fast Fourier Transform in a recursive setting to efficiently calculate frequency
components. The resultant Momentary Fourier Transform required 145-fold fewer operations than
an FFT. Momentary Fourier Transforms showed reduced computational load when the
frequency information for only a narrow part of the spectrum was desired and when it
was required to calculate the Fourier Transform in small time increments.
Chapter 4
Mapping Movement to Energy Expenditure
Those are my principles, and if you don’t like them... well, I have others.
—Groucho Marx.
Given a robust representation of movement, the next question to ask is whether
this representation of movement can be exploited to predict energy expenditure.
In particular, we are concerned with the problem of developing an optimal functional
approximation (measured by lowest prediction error) given large amounts of data per
person. Here, the problem $\{x_{np}, Phys_p\} \xrightarrow{f} y_{np}$ is reduced to
$x_{np} \xrightarrow{f_p} y_{np}$ for each person $p$. Separate functions are found for each person.
Learning functional maps: As described in section 2.3, this involves learning a
regression map from movement to energy expenditure. In particular, given a dataset of
kinematic measurements and energy expenditure values, we seek to develop personal-
ized estimates of energy expenditure due to physical activity.
Variation with functional maps: Various kinds of functional maps can be learned.
Since our focus is on statistical techniques, we examine whether predictions of energy
expenditure can be improved with statistical techniques. If so, what kinds of statistical
techniques are superior and what is the highest accuracy that can be achieved?
Optimal combinations of sensors: Movement can occur in all three dimensions.
However, because of the nature of the activity, only a particular subset of movements
might be relevant. Hence, one issue to be considered is which kinds and combinations
of sensors are better at predicting energy expenditure.
The problem domain: In this section, we address the problem of estimating en-
ergy expenditure using kinematic sensors for a particular activity: treadmill walking.
Treadmill walking was chosen because it allows the capture of a regular, well-defined
and easily quantifiable movement in a laboratory setting. We use kinematic data from
a triaxial accelerometer and triaxial gyroscope as inputs. We treat the functional map-
ping of these inputs to energy expenditure as a regression problem. Our approach to
estimating energy expenditure from walking involves developing a probabilistic map
from movement features to calories burned. This section expands on work presented by
Vathsangam et al. [108, 109].
4.1 Related Work
Much of the research involving using kinematic sensors to calculate energy expenditure
per unit time for daily activities has focused on the utility of accelerometers [46, 47, 60].
These approaches use outputs produced by commercial accelerometers called counts as
determinants of physical activity and validate them against energy expenditure per unit
time [49, 53, 54, 55]. The most common technique is to fit regression equations that map
counts to energy expenditure per unit time [58, 110]. The usage of count-based tech-
niques is also imprecise because it is unclear whether the counts produced have any clear
physical interpretation [57]. An alternative approach to characterizing human motion in-
volves pattern recognition techniques that extract meaningful properties or features from
raw movement data and map these properties to calories expended [65]. These include
neural networks [67], probabilistic linear regression [108], piecewise regression [68]
and activity clustering [69]. Using such techniques, it is possible to learn a personalized
model for each user from data collected. Access to raw data allows the researcher to
explore the physical intuition behind movement and use features that explicitly mirror
the quantity in question.
The assumption behind using accelerometry for physical activity monitoring is that
data from an accelerometer represents body movement [111]. However, rigid body
movement consists of both accelerations and rotations [112]. Zappa et al. [113] showed
that rotational data cannot be completely separated from translational data using a sin-
gle triaxial accelerometer. Combining accelerometry and rotational rate measurements
through gyroscopes improves energy expenditure per unit time prediction by providing
a more complete picture of human movement. Gyroscopes are not influenced by grav-
itational acceleration and are more displacement tolerant than accelerometers. This is
because for a given body segment movement, a gyroscope provides the same readings
irrespective of position as long as the axis of placement is parallel to the measured axis
[114]. The introduction of low-cost, single-chip triaxial gyroscopic sensors has made
possible the utility of gyroscopes as alternatives to or in combination with accelerome-
ters for activity characterization.
Energy expenditure time per unit distance typically shows a quadratic dependence
on walking speed [115, 116]. Grieve et al. [117] reported that the relationship between
step frequency ($f$) and free walking speed ($v$) follows either a linear relationship or the
relation $f = cv^{\alpha}$, $\alpha \approx 0.5$.
[Figure 4.1, panel (a): Plot of frequency features calculated and corresponding energy expenditure for the same time period. Axes: frequency (Hz) versus experiment time (min), alongside energy expended (kcal/min); segments: rest, 2.5 mph, 3.0 mph, 3.5 mph. Panel (b): Energy versus frequency along with line of best fit. Axes: frequency values (Hz) versus energy expenditure (kcal/min); series: data, linear fit, nonlinear fit.]
Figure 4.1: Illustration of energy expenditure versus frequency for a single participant.
Figure 4.1a illustrates the simultaneous capture of frequency-based features and energy
expenditure. Here a participant was initially at rest followed by walking at three dif-
ferent speeds on a treadmill before slowing down. There is a clear visual relationship
between the most dominant frequency and the energy expenditure. Figure 4.1b illus-
trates the relationship between the most dominant frequency and energy expenditure for
the walking section of the experiment along with linear or nonlinear fits.
4.2 Qualitative Illustration of Energy Expenditure Maps
We demonstrate a qualitative relationship between movement and energy expenditure
per unit time with particular reference to walking. We performed an experimental study
where a participant was initially at rest followed by walking at three different speeds on
a treadmill before slowing down. We tracked the periodicity of the center of mass of the
human body using a kinematic sensor and simultaneously measured energy expended
using a metabolic unit. Figure 4.1a illustrates the simultaneous capture of frequency-
based features and energy expenditure per unit time for this experiment. A pseudo color
plot of frequency-based features is shown in the left panel.
For the walking sections of the experiment a clear periodicity was seen as indicated
by the red band. Further, a clear visual relationship was seen between the most dominant
frequency and energy expenditure per unit time at each speed. Figure 4.1b illustrates
Figure 4.2: Graphical Representation of Least-Squares Regression. Dots represent num-
bers or parameters, filled circles are observed random variables.
the relationship between the most dominant frequency extracted from the Fourier-based
features and energy expenditure per unit time for the walking section of the experiment.
4.3 Algorithms
4.3.1 Least-Squared Regression (LSR)
4.3.1.1 Model definition
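We consider the linear model mapping a movement descriptor to energy expenditure:
$$Y_p = X_p w_p + \epsilon I, \quad \epsilon \sim \mathcal{N}\left(0, \beta_p^{-1}\right) \qquad (4.1)$$
$$y_{np} = w_p^T x_{np} + \epsilon, \quad \epsilon \sim \mathcal{N}\left(0, \beta_p^{-1}\right) \qquad (4.2)$$
where $\epsilon$ is a noise parameter, $x_{np} = (1, x_{np,1}, \ldots, x_{np,D})^T$ is the derived function space
consisting of fixed nonlinear functions of the input variables of dimension $D$ and
$w_p = (w_{p,0}, \ldots, w_{p,D})^T$ are the weights. Given that $\epsilon$ is Gaussian, allowing a probabilistic
interpretation, we have:
$$p\left(y_{np} \mid x_{np}, w_p, \beta_p\right) = \mathcal{N}\left(y_{np};\, w_p^T x_{np},\, \beta_p^{-1}\right) \qquad (4.3)$$
$$p\left(Y_p \mid w_p, \beta_p\right) = \mathcal{N}\left(Y_p;\, X_p w_p,\, \beta_p^{-1} I\right) \qquad (4.4)$$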
4.3.1.2 Inference
The log-likelihood for this model is:
$$L = \frac{N_p}{2}\log\beta_p - \frac{\beta_p}{2}\left\|Y_p - X_p w_p\right\|^2 + \text{const}$$
We define an optimal fitting function as one that maximizes this log-likelihood. This
is equivalent to finding the optimal $w_p$ that would minimize the expected square-loss
$E_D\left\{\left(y_{np} - f(x_{np}, w_p)\right)^2\right\}$. The optimal parameters are given by:
$$w_p^* = \left(X_p^T X_p\right)^{-1} X_p^T Y_p, \qquad \frac{1}{\beta_p^*} = \frac{1}{N_p}\sum_{n_p=1}^{N_p}\left(y_{np} - w_p^{*T} x_{np}\right)^2 \qquad (4.5)$$
4.3.1.3 Prediction
The optimal prediction for a new data point $x_p^*$ is given by:
$$y_p^* = w_p^{*T} x_p^* \qquad (4.6)$$
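The fit and prediction steps of LSR can be sketched as follows. This is a minimal illustration under assumptions: the function names `fit_lsr` and `predict_lsr` are hypothetical, and `numpy.linalg.lstsq` is used in place of the explicit matrix inverse (it yields the same weights when the normal equations have a unique solution and is better conditioned numerically).

```python
import numpy as np

def fit_lsr(X, y):
    """Least-squares weights and residual noise variance (1 / beta).

    X: (N, D) array of movement descriptors, y: (N,) energy expenditure values.
    """
    Phi = np.column_stack([np.ones(len(X)), X])   # prepend the constant feature
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    resid = y - Phi @ w
    noise_var = np.mean(resid ** 2)               # estimate of 1 / beta
    return w, noise_var

def predict_lsr(w, X_new):
    """Point prediction w^T x for each row of X_new."""
    Phi = np.column_stack([np.ones(len(X_new)), X_new])
    return Phi @ w
```

Note that the prediction is a single value per input: unlike the probabilistic models that follow, it carries no estimate of its own uncertainty.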
LSR is also sensitive to outliers because it does not take into account
the consistency of points in a dataset. Another drawback of LSR is its tendency to
over-fit to a given dataset, due to which it often performs poorly on unseen data points.
One solution is to include a regularization term that controls the relative importance
of data-dependent noise. However, finding the optimal regularization parameter involves
techniques such as K-fold cross-validation and the need to maintain a separate validation dataset.
Figure 4.3: Graphical Representation of Bayesian Linear Regression. Dots represent
numbers or parameters, filled circles are observed random variables.
4.3.2 Bayesian Linear Regression (BLR)
Bayesian Linear Regression [118] adopts a Bayesian approach to the linear regression
problem by introducing a prior probability distribution over $w_p$.
4.3.2.1 Model Definition
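Once again, we consider the linear model mapping a movement descriptor to energy
expenditure:
$$Y_p = X_p w_p + \epsilon I, \quad \epsilon \sim \mathcal{N}\left(0, \beta_p^{-1}\right) \qquad (4.7)$$
$$y_{np} = w_p^T x_{np} + \epsilon, \quad \epsilon \sim \mathcal{N}\left(0, \beta_p^{-1}\right) \qquad (4.8)$$
$$p\left(y_{np} \mid x_{np}, w_p, \beta_p\right) = \mathcal{N}\left(y_{np};\, w_p^T x_{np},\, \beta_p^{-1}\right) \qquad (4.9)$$
where $\epsilon$ is a noise parameter, $x_{np} = (1, x_{np,1}, \ldots, x_{np,D})^T$ is the derived function space
consisting of fixed nonlinear functions of the input variables of dimension $D$ and
$w_p = (w_{p,0}, \ldots, w_{p,D})^T$ are the weights. We choose a Gaussian prior,
$p(w_p) = \mathcal{N}\left(w_p;\, 0,\, \alpha_p^{-1} I\right)$, over the model parameters $w_p$ in Equation 4.1, where
$\alpha_p$ is a hyperparameter.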
4.3.2.2 Inference
The complete log-likelihood for this model is:
$$L = \log p\left(Y_p, w_p \mid \alpha_p, \beta_p\right) = \frac{N_p}{2}\log\beta_p + \frac{M}{2}\log\alpha_p - \frac{\beta_p}{2}\left\|Y_p - X_p w_p\right\|^2 - \frac{\alpha_p}{2}\left\|w_p\right\|^2 + \text{const}$$
Learning the model amounts to learning $\{m_{N_p}, S_{N_p}, \alpha_p, \beta_p\}$ for each person $p$.
However, simultaneously learning all these parameters is not possible because of
cyclical dependence. We instead adopt an iterative approach using Algorithm 1.
Intuitively, it can be seen that in order to maximize this likelihood, one must minimize the
sum of the least squares error $\|Y_p - X_p w_p\|^2$ and the magnitude term $\|w_p\|^2$. The
magnitude term is called a shrinkage term and places a constraint on the optimization by
forcing the magnitude of the weight vector to be as low as possible. Thus it penalizes
complicated models over simple ones. The relative values of $\alpha_p$ and $\beta_p$ control how
much importance is given to penalizing models versus minimizing the least squares fit.
These are determined from the data itself. This provides Bayesian Linear Regression an
advantage over traditional cross-fold validation techniques in that the regularization of
the model is determined automatically from the data.
4.3.2.3 Prediction
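The optimal prediction for a new data point is given by the predictive distribution:
$$p\left(y_p^* \mid x_p^*, Y_p, \alpha_p, \beta_p\right) = \mathcal{N}\left(m_p^T x_p^*,\, \sigma_p^2(x_p^*)\right) \qquad (4.10)$$
$$\sigma_p^2(x_p^*) = \frac{1}{\beta_p} + x_p^{*T} S_p x_p^* \qquad (4.11)$$
$$S_p^{-1} = \alpha_p I + \beta_p X_p^T X_p \qquad (4.12)$$
$$m_p = \beta_p S_p X_p^T Y_p \qquad (4.13)$$
Algorithm 1 EM algorithm for Bayesian Linear Regression
Inputs:
Movement descriptors in the data matrix form $X_p$ and corresponding energy prediction
values $Y_p$.
Initialization:
Initialize $w_p = \left(X_p^T X_p\right)^{-1} X_p^T Y_p$; $\alpha_p$ and $\beta_p$ as random values.
Define expectation of log-likelihood for each person $p$ as:
$$L_p = \log\left(p\left(Y_p \mid \alpha_p, \beta_p\right)\right) = \frac{D}{2}\ln\alpha_p + \frac{N}{2}\ln\beta_p - \frac{\beta_p}{2}\left\|Y_p - X m_p\right\|^2 - \frac{\alpha_p}{2}\left\|m_p\right\|^2$$
Repeat until log-likelihood converges:
M-step:
$$\alpha_p = \frac{D}{\left\|m_p\right\|^2 + \mathrm{trace}\left(S_p\right)}, \qquad \beta_p = \frac{N}{\left\|Y_p - X_p m_p\right\|^2 + \mathrm{trace}\left(X_p S_p X_p^T\right)}$$
E-step:
$$S_p = \left(\alpha_p I + \beta_p X_p^T X_p\right)^{-1}, \qquad m_p = \beta_p S_p X_p^T Y_p$$
Recalculate likelihood.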
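The EM updates and the predictive distribution can be sketched in NumPy as below. This is a hedged illustration rather than the code used in this work: the names `blr_em` and `blr_predict` are hypothetical, the hyperparameters start at 1.0 instead of random values, the E-step is run first so that $m_p$ and $S_p$ exist before the M-step, and convergence is tested on the change in $\alpha_p$ and $\beta_p$ rather than on the likelihood.

```python
import numpy as np

def blr_em(X, y, alpha=1.0, beta=1.0, max_iter=500, tol=1e-8):
    """Estimate posterior mean m, covariance S and hyperparameters alpha, beta."""
    N, D = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    for _ in range(max_iter):
        # E-step: posterior over the weights for the current alpha, beta.
        S = np.linalg.inv(alpha * np.eye(D) + beta * XtX)
        m = beta * S @ Xty
        # M-step: re-estimate the prior and noise precisions.
        resid = y - X @ m
        new_alpha = D / (m @ m + np.trace(S))
        new_beta = N / (resid @ resid + np.trace(XtX @ S))  # trace(X S X^T)
        done = (abs(new_alpha - alpha) <= tol * alpha
                and abs(new_beta - beta) <= tol * beta)
        alpha, beta = new_alpha, new_beta
        if done:
            break
    # Final posterior consistent with the returned hyperparameters.
    S = np.linalg.inv(alpha * np.eye(D) + beta * XtX)
    m = beta * S @ Xty
    return m, S, alpha, beta

def blr_predict(x_new, m, S, beta):
    """Predictive mean m^T x and variance 1/beta + x^T S x for one input vector."""
    return m @ x_new, 1.0 / beta + x_new @ S @ x_new
```

The second return value of `blr_predict` is what distinguishes this model from LSR: it is never smaller than the estimated noise variance $1/\beta_p$ and grows for inputs far from the training data.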
The output prediction of BLR (Equations 4.10 and 4.11) involves computing a mean
$m_N$ and a variance $\sigma^2_{N_p}(x)$. The importance of a variance estimate is that it allows
the user to evaluate how “confident” the algorithm is of its prediction and provides the
necessary tool to evaluate the goodness of prediction of an unseen data point. Also,
it can be seen from Equation 4.11 that if an additional point $x_{N_p+1}$ were added, the
resultant variance $\sigma^2_{N_p+1}(x_p^*) < \sigma^2_{N_p}(x_p^*)$. This tends to the limit
$\lim_{N_p \to \infty} \sigma^2_{N_p}(x_p^*) = 1/\beta_p$, or the
intrinsic noise in the process. Thus BLR reflects the higher precision in prediction with
the availability of larger quantities of data through a smaller variance. The use of a prior
helps guard against over-fitting. $\alpha_p$ and $\beta_p$ are derived purely from the dataset without
needing a separate validation dataset.
Figure 4.4: Graphical Representation of Gaussian Process Regression. Dots represent
numbers or parameters, filled circles are observed random variables and unfilled circles
are hidden random variables. The solid line indicates that all functions are connected.
4.3.3 Gaussian Process Regression (GPR)
4.3.3.1 Model Definition
We provide a nonlinear, nonparametric regression model for comparison. Given a set of
training points $\{(x_{np}, y_{np})\}_{n_p=1}^{N_p}$ such that:
$$y_{np} = f(x_{np}) + \epsilon, \quad \epsilon \sim \mathcal{N}\left(0, \beta_p^{-1} I\right) \qquad (4.14)$$
a Gaussian Process Regression model [119] estimates a posterior probability distribution
over functions $\{f(x_{np})\}_{n_p=1}^{N_p}$ evaluated at points $\{x_{np}\}_{n_p=1}^{N_p}$ such that any finite subset
of the functions is a joint multivariate Gaussian distribution. Consequently, for a given
set of points $X_p = \{x_{np}\}_{n_p=1}^{N_p}$, we have a corresponding vector $F_{X_p} = \{f(x_{np})\}_{n_p=1}^{N_p}$ that
belongs to a multivariate Gaussian distribution:
$$F_{X_p} \sim \mathcal{N}\left\{\mu_{N_p}(X_p),\, K_{N_p}(X_p, X_p')\right\} \qquad (4.15)$$
where $\mu_{N_p}(X_p) = \left[\mu(x_{1p})^T\ \mu(x_{2p})^T\ \ldots\ \mu(x_{N_p})^T\right]^T$ and $K_{N_p}(X_p, X_p')$ is the
$N_p \times N_p$ kernel matrix, each element of which is the covariance value $k(x_{ip}, x_{jp})$. The
key idea in GPR is that the similarity between two function outputs, $f(x_{np})$ and $f(x_{n'p})$,
depends on the input values, $x_{np}$ and $x_{n'p}$, and is captured by the kernel $k(x_{np}, x_{n'p})$. To
completely specify a GP, it is enough to specify $\mu_{N_p}(X_p)$ and $K_{N_p}(X_p, X_p')$. By
definition, each $f(x_{np})$ is marginally Gaussian, with mean $\mu(x_{np})$ and variance $k(x_{np}, x_{np})$.
Typically, for ease of implementation, the mean of the dataset is subtracted from
each data point so that the mean function is 0. We wanted to capture the fact that two
similar movement descriptors will likely result in the same energy expenditure output.
When the movement descriptors are dissimilar, the energy expenditure will be different.
However, we wanted to capture the fact that this difference will taper off with increasing
dissimilarity. We thus chose the radial basis function kernel. Further, to capture the fact
that we only have access to noisy observations of the function values, it is necessary to
add the corresponding covariance function for noisy observations. The complete kernel
function can be expressed in element by element fashion as:
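$$k(x_{np}, x_{n'p}) = \sigma_f^2 \cdot \exp\left(-\frac{1}{2l^2}\left\|x_{np} - x_{n'p}\right\|^2\right) + \sigma_n^2\,\delta_{nn'} \qquad (4.16)$$
where $\sigma_f^2$ is the signal variance, $l$ is a length scale that determines strength of correlation
between points, and $\sigma_n^2$ is the noise variance.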
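As a small illustration of Equation 4.16, the kernel matrix for a set of training inputs can be assembled as below. The function names are hypothetical; the noise term is added only on the diagonal of the training matrix, which is how the $\delta$ term is interpreted in this sketch.

```python
import numpy as np

def rbf_kernel(A, B, sigma_f, length):
    """Radial basis function covariance between the rows of A and the rows of B."""
    sq_dist = (np.sum(A ** 2, axis=1)[:, None]
               + np.sum(B ** 2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off
    return sigma_f ** 2 * np.exp(-0.5 * sq_dist / length ** 2)

def training_kernel(X, sigma_f, length, sigma_n):
    """Kernel matrix for noisy observations: RBF term plus sigma_n^2 on the diagonal."""
    return rbf_kernel(X, X, sigma_f, length) + sigma_n ** 2 * np.eye(len(X))
```

With this form, covariance between two descriptors decays smoothly toward zero as they become more dissimilar, which matches the tapering behaviour motivated above.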
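4.3.3.2 Inference
The log-likelihood for this model is:
$$L = -\frac{1}{2}\log\left|K_{N_p}(X_p, X_p')\right| - \frac{1}{2} Y_p^T K_{N_p}^{-1} Y_p - \frac{N_p}{2}\log 2\pi$$
Typically, we don't have an estimate for the hyperparameters and hence have to estimate
them from the data itself. Learning the model amounts to learning the hyper-parameters
$\sigma_f$ and $\sigma_n$. This is done by maximizing the likelihood with respect to each of the
hyper-parameters $\theta_j$:
$$\frac{\partial L}{\partial \theta_j} = \frac{1}{2} Y_p^T K_{N_p}^{-1} \frac{\partial K_{N_p}}{\partial \theta_j} K_{N_p}^{-1} Y_p - \frac{1}{2}\mathrm{tr}\left(K_{N_p}^{-1}\frac{\partial K_{N_p}}{\partial \theta_j}\right) = \frac{1}{2}\mathrm{tr}\left(\left(\alpha\alpha^T - K_{N_p}^{-1}\right)\frac{\partial K_{N_p}}{\partial \theta_j}\right), \quad \text{where } \alpha = K_{N_p}^{-1} Y_p$$
This is computed in $O(N_p^3)$ time primarily due to the need to invert the kernel matrix
$K_{N_p}$. This is an expensive operation and thus Gaussian Process Regression requires a
large amount of time for training.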
4.3.3.3 Prediction
For a new point $x_p^*$ there exists a corresponding target quantity $f(x_p^*)$. Since $f(x_p^*)$
also belongs to the same GP, it can be appended to the original training set to obtain a
larger set:
$$F_{X_p \cup x_p^*} \sim \mathcal{N}\left\{\mu_{N_p+1}(X_p \cup x_p^*),\, K_{N_p+1}(X_p \cup x_p^*, X_p \cup x_p^{*\prime})\right\} \qquad (4.17)$$
$$K_{N_p+1} = \begin{bmatrix} K_{N_p} & \mathbf{k}_{*p} \\ \mathbf{k}_{*p}^T & k_{**} \end{bmatrix} \qquad (4.18)$$
where $\mathbf{k}_{*p}$ has elements $k(x_{np}, x_p^*)$ for $n_p = 1, \ldots, N_p$ and $k(x_{ip}, x_{jp})$ is defined in
Equation 4.16. Using properties of Gaussians and the definition of GPs, it follows that
for a new test point $p(y_p^* \mid x_p^*, X_p, Y_p) = \mathcal{N}\left(f(x_p^*);\, m(x_p^*),\, \sigma^2_{x_p^*}\right)$. Because this joint
distribution is Gaussian by definition, we can marginalize out the remaining variables
using the properties of Gaussians to obtain:
$$m(x_p^*) = \mathbf{k}_{*p}^T K_{N_p}^{-1} Y_p \qquad (4.19)$$
$$\sigma^2(x_p^*) = k_{**} - \mathbf{k}_{*p}^T K_{N_p}^{-1} \mathbf{k}_{*p} \qquad (4.20)$$
Equations 4.19 and 4.20 summarize the key advantages of GPR. Again, the use
of a probabilistic model to obtain a mean and variance for each prediction allows the
user to assess the confidence of each prediction. In contrast to BLR however, GPR is
non-parametric: its model complexity increases with larger quantities of data as evident
from the increasing size of the kernel matrix. GPR avoids the process of explicitly
constructing a suitable feature function space by dealing instead with kernel functions.
As the kernel implicitly contains a non-linear transformation, no assumptions about the
functional form of the feature space are necessary. This allows us to deal with non-linear
maps without having to construct non-linear function spaces. The motivation behind
considering this algorithm was to determine whether using a nonlinear probabilistic
map (GPR) offers benefits over a linear probabilistic map (BLR) in terms of increased
prediction accuracy.
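The predictive mean and variance of Equations 4.19 and 4.20 can be sketched as follows. This is an illustrative outline under assumptions, not the implementation evaluated here: the hyperparameters are passed in as fixed values rather than learned, the function names are hypothetical, a Cholesky factorization stands in for the explicit inverse of the kernel matrix, and the training mean is subtracted and added back so that the mean function is zero.

```python
import numpy as np

def rbf(A, B, sigma_f, length):
    """RBF covariance between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return sigma_f ** 2 * np.exp(-0.5 * d2 / length ** 2)

def gpr_predict(X, y, X_new, sigma_f=1.0, length=1.0, sigma_n=0.1):
    """Predictive mean and variance at each row of X_new."""
    y_mean = y.mean()                                  # zero-mean GP on centred targets
    K = rbf(X, X, sigma_f, length) + sigma_n ** 2 * np.eye(len(X))
    k_star = rbf(X, X_new, sigma_f, length)            # shape (N, M)
    k_ss = sigma_f ** 2 + sigma_n ** 2                 # k(x*, x*) including noise
    L = np.linalg.cholesky(K)                          # K = L L^T
    a = np.linalg.solve(L.T, np.linalg.solve(L, y - y_mean))   # K^{-1} y
    v = np.linalg.solve(L, k_star)                     # L^{-1} k*
    mean = k_star.T @ a + y_mean                       # Equation 4.19
    var = k_ss - np.sum(v ** 2, axis=0)                # Equation 4.20
    return mean, var
```

The factorization of the $N_p \times N_p$ matrix is the cubic-cost step, so prediction cost grows with the amount of training data, in contrast to BLR where it depends only on the feature dimension.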
4.4 Results
This section provides a comparative analysis of the prediction accuracy of different
models. We varied the models along three dimensions. First, we considered the ef-
fect of different sensor streams. Our study used two kinds of kinematic sensors: tri-
axial accelerometers and triaxial gyroscopes. Within data from each kinematic sensor,
we compared the effect of using triaxial information versus uniaxial information. Us-
ing the best feature space from each of these comparisons, we compared the utility of
accelerometers, gyroscopes and a combined solution using both sensors in terms of pre-
diction accuracy. Second, using the best feature space from the first study, we compared
the relative performance of algorithms measured by prediction accuracy. Finally, we
performed an empirical comparison of algorithm run time to provide further insight into
algorithm choice based on the trade-off between prediction accuracy and computational
capability. The motivation behind comparing these models was to understand what is the
best possible algorithm and kinematic information required to accurately predict energy
expenditure. Unless otherwise stated, results were significant (p < 0.05 on a per-subject
basis).
4.4.1 Participant Statistics
This evaluation used data from the first data collection study outlined in section A.1. Nine
healthy adults (five male, four female) participated in the study. Height and weight of
each participant were recorded using a Healthometer balance beam scale. The participants
had average age = 29 ± 6 years, average height = 1.72 ± 0.13 m, average weight =
75 ± 20 kg and average BMI = 25 ± 7. Participants walked at 11 predetermined speeds
between 2.5 mph and 3.5 mph in intervals of 0.1 mph. Speeds were chosen based on the
Compendium of Physical Activities [38]. Rate of oxygen consumption ($\dot{V}O_2$, ml/min)
was measured and multiplied by 5 to obtain energy expenditure. Human movement was
captured with a modified version of the Sparkfun 6DoF kinematic Measurement Unit
(IMU) v4 [120].
[Figure 4.5, panel (a): Data collection procedure. Panel (b): Kinematic sensor location on the body.]
Figure 4.5: Illustration of data recording procedure. The sensor was worn on the right
iliac crest.
4.4.2 Data Collection and Pre-processing
Each sensor stream from the IMU was passed through a lowpass filter with 3dB cutoff
at 20 Hz. This frequency was chosen keeping in mind that everyday activities fall in the
frequency range of 0.1-10 Hz [92]. For each participant p, data streams were divided
into 10 second epochs. Within each epoch, the 1024 point normalized FFT of each
stream was extracted to obtain frequency information. FFT values corresponding to
frequencies greater than 10 Hz were discarded. Thus for each epoch, six FFTs, one
corresponding to each axis, were concatenated to obtain the feature vector $x_{np}$. The
energy expenditure values from the MedGraphics metabolic system that fell within each
epoch were averaged and matched appropriately. Each 10 second {FFT, Energy} pair
represents the training data $\{x_{np}, y_{np}\}$ for person $p$. Each participant's dataset consisted
of approximately 77 minutes of data (or $N_p \approx 460$ data points).
4.4.3 Training and Testing Procedure
For each participant's data, we assume that each {FFT, Energy} pair is independent
and identically distributed (i.i.d.). Thus one can treat each point as independent from any
other in the dataset given the model. This need not necessarily hold for general walking
but follows from our steady-state assumption in treadmill walking. A fraction of the
data were uniformly sampled and partitioned into training data, the remaining fraction
constituting test data. Different models were trained with the same training data but
with different feature vectors and candidate algorithms. RMS error was calculated as a
measure of accuracy. This was repeated over 10 trials for different randomly sampled
data and results averaged. This was repeated for training data percentages from 10%
to 90% and constituted a per-subject measure of performance. The results were then
averaged over all subjects. This represented the Average Root Mean Squared Error
(ARMS error) for that algorithm.
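The per-subject part of this procedure can be sketched as a small helper. It is a minimal outline under assumptions: `fit` and `predict` are placeholder callables standing in for any of the candidate algorithms, and the name `average_rmse` and the fixed random seed are illustrative choices.

```python
import numpy as np

def average_rmse(X, y, fit, predict, train_fraction, n_trials=10, seed=0):
    """Mean test RMSE over repeated random train/test splits for one participant."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = max(1, int(round(train_fraction * n)))
    errors = []
    for _ in range(n_trials):
        perm = rng.permutation(n)              # uniform sampling without replacement
        train, test = perm[:n_train], perm[n_train:]
        model = fit(X[train], y[train])
        resid = predict(model, X[test]) - y[test]
        errors.append(np.sqrt(np.mean(resid ** 2)))
    return float(np.mean(errors))
```

Calling this for training fractions from 0.1 to 0.9 and averaging the results over participants would give one error curve per algorithm and feature set.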
4.4.4 Comparison between sensors
4.4.4.1 Single sensor feature comparison
Fig. 4.6 groups results accordingly. Each panel consists of testing errors when single
axes features are used with a fourth series consisting of triaxial features. Results are
grouped column-wise by sensor type (accelerometer or gyroscope) and row-wise by
algorithm type (LSR, BLR and GPR).
LSR was sensitive to the quantity of training data available regardless of the sensor.
Error using single axis streams peaked when 30% of the training data (≈150 data
points) were used. This was not true when triaxial features were used. This is because
at that percentage, LSR was an under-constrained system (150 data points for 156
variables). This resulted in the algorithm over-fitting to a dataset. BLR and GPR are less
prone to over-fitting at all percentages. With BLR and GPR, increasing the percentage
of training data reduced prediction errors for that space. For these reasons, in the
remainder of this paper, we focus on results obtained from BLR and GPR.
[Figure 4.6, panels: (a) LSR with accelerometer features only; (b) LSR with gyroscope features only; (c) BLR with accelerometer features only; (d) BLR with gyroscope features only; (e) GPR with accelerometer features only; (f) GPR with gyroscope features only. Axes: percentage of training data versus ARMSE (kcal/min). Series in accelerometer panels: Up-Down, Forward-Backward, Left-Right, All Axes. Series in gyroscope panels: Twist, Left-Right, Forward-Backward, All Axes.]
Figure 4.6: Illustration of variation of prediction accuracy (measured by Average RMS
prediction error across all participants) with various movement descriptors and
algorithms. Results are grouped row-wise by algorithm and column-wise by sensor stream.
LSR accuracy depended on amount of training data. BLR and GPR showed consistently
reduced errors with increase in training data size. With BLR and GPR, use of all 3 axes
as features improved prediction accuracy as opposed to using just one sensor axis. The
best individual axis corresponded to movement in the up-down direction.
With accelerometer information alone, using BLR, the errors in increasing order
were: triaxial accelerations, up-down accelerations, forward accelerations and sideways
accelerations. Using all three axes had the effect of introducing redundancy, resulting
in better prediction accuracies. The lowest error from a single axis was in the up-down
direction. This can be understood by the fact that we are tracking the acceleration of
the center of mass of the human body. An interesting observation when using a non-
linear approach like GPR is that all 3 axes yielded comparable errors. This is because
the nonlinear model operates in the similarity space as opposed to explicitly modeling
dependence on input features. This makes it robust to designing the right features. With
gyroscopic information alone, rotation in the left-right direction showed the lowest er-
ror. Triaxial information yielded higher accuracy. Gyroscopes track rotational rates of
the human body rather than accelerations.
4.4.4.2 Comparison between accelerometer and gyroscopic data
Fig. 4.7 outlines the results when BLR and GPR-based models are used. In the case
of BLR, using only gyroscope data resulted in higher average RMS errors than when
using only accelerometer data. Additionally, combining accelerometer and gyroscope
information reduced prediction errors. When using GPR, the accelerometer and
gyroscope showed similar error levels. Additional information by way of gyroscopes
provides further evidence that a certain data point belongs to a particular class.
[Figure 4.7 plots: ARMSE (kcal/min) versus Percentage of Training Data for Acc, Gyr and Both.
(a) In the case of BLR, accelerometer and gyroscopic information show similar prediction
accuracies. Combining accelerometer and gyroscopic information shows lower prediction errors.
(b) In the case of GPR, combining accelerometer and gyroscopic information shows lower
prediction errors.]
Figure 4.7: Illustrating the effect of combining triaxial accelerometer and gyroscopic
information (measured by average RMS prediction error across all participants) in the
case of BLR and GPR. Accelerometer and gyroscope provide similar results when used
separately.
4.4.5 Comparison across algorithms
Fig. 4.8 illustrates the results obtained from comparing a nonlinear approach (GPR) with
a linear approach (BLR). Both GPR and BLR performed better when more training data
are used. GPR performed better than BLR with any amount of training data. The use
of a similarity space allows GPR to be agnostic to the input features. Since walking is
inherently periodic, the input Fourier transform features all have a similar structure to
them. This ensures that regardless of the strength of the features, the strength of their
similarities can be used to train better models.
4.4.6 Run time versus accuracy
Parametric approaches like linear regression depend on the dimension of the input data
space, $d$, and learning is of order $O(d^3)$. Nonparametric approaches depend on the
number of data points. In particular, for $N$ data points, GPR requires the inversion of an
[Figure 4.8 plot: ARMSE (kcal/min) versus Percentage of Training Data for BLR and GPR.]
Figure 4.8: Illustration of relative algorithmic performance when triaxial information
from all sensors is used (measured by average RMS prediction error across all partici-
pants). With increasing number of data points GPR begins to perform comparably with
BLR.
$N \times N$ matrix, which is an $O(N_p^3)$ operation. Knowing the run time for training is
important to understand the tradeoffs between prediction accuracy and time of training.
This is particularly important if these algorithms are to be implemented in resource-
constrained platforms such as mobile phones or portable PCs. In our study, there were
three classes of data types: Single sensor (either accelerometer or gyroscope) with only
one axis in use, single sensor with all three axes in use and both sensors all three axes
in use. Each of these cases multiplies the feature space used by 3. In addition, three
algorithms: LSR, BLR and GPR were used. Fig. 4.9 illustrates our results for one
participant. Similar trends exist for all participants.
For this study, the time taken to train a dataset with different percentages of training
data for one participant was recorded in the case of one feature space and one algorithm.
Prediction accuracies were also measured. A scatter plot was created with prediction
accuracies on the X-axis and algorithm run time on the Y-axis (Log-scale, base 10)
with all training percentages represented as one class. This was repeated for different
combinations of feature vectors and algorithms. In all plots, the algorithms are coded by
[Figure 4.9 plots: Inference time (log-seconds) versus ARMSE (kcal/min), with one series per
algorithm (LSR, BLR, GPR) and feature space (One Axis, Single Sensor, All Sensors).
(a) Scatter plot comparing the relationship between run time and prediction accuracy for BLR
(filled circles) and LSR (squares) when different features are used. Run time is shown in a
logarithmic scale. BLR shows lower errors but has a higher run time. In the case of LSR,
addition of extra features shows no apparent benefits in terms of accuracy but increases run
time. Addition of features improves BLR prediction accuracy measured by consistency of
prediction and error rate at the expense of higher run time. However absolute run time is
still on the order of a few seconds which justifies the selection of BLR over LSR.
(b) Scatter plot comparing the relationship between run time and prediction accuracy for BLR
(filled circles) and GPR (stars) when different features are used. Run time is shown in a
logarithmic scale. Nonlinear modeling with GPR shows lower errors than BLR, particularly when
more training data are used. However, the run time is at least two or three orders of
magnitude higher. This shows that for the same dataset, increasingly higher accuracy requires
much more computing power. This represents an important trade-off between the level of
accuracy desired and the algorithm to choose.]
Figure 4.9: An illustration of the relationship between accuracy and run time for LSR,
BLR and GPR for a single participant. Scatter points of each class represent different
training percentages of the same class and feature space. The best algorithm has to be
as close to the origin as possible (lowest error and lowest run time)
color (Cardinal, LSR; Gold, BLR; Black, GPR) and feature spaces are coded by symbol
(LSR: empty square; BLR: filled circle; GPR: empty star). For clarity, plots are shown
in two views.
Fig. 4.9a shows a comparative analysis of run time versus accuracy for all three
algorithms. In the case of LSR, addition of extra features showed no benefits in terms of
accuracy but increased run time. Addition of features improved BLR prediction accuracy
at the expense of higher run time. However in our study, the absolute run time for
training was still on the order of a few seconds and less than 30 seconds in all cases for
all participants. The consistency of results and lower error rates along with reasonable
training run times justify the selection of BLR over LSR with any combination of feature
vectors. Using both sensors offers limited advantage in terms of prediction accuracy
but requires larger run times. Therefore, it would be advisable in resource-constrained
systems to choose a model that only uses one of either sensor for training if accuracy is
not an issue.
Another important observation is that with more training data, both the prediction error
and run time of BLR decrease. This is because with more data available, the EM algorithm
converges faster. The estimation of model parameters becomes easier with
increasing evidence.
Fig. 4.9b shows a comparative analysis of run time versus accuracy for BLR and
GPR. Both BLR and GPR are probabilistic approaches and hence show consistently
better results with increasing dataset size. Nonlinear modeling with GPR showed errors lower
than or comparable to those of BLR, particularly when more training data were used. However,
the run time was at least two orders of magnitude higher. Given a dataset, to be able to
obtain a higher accuracy requires increasingly larger computing power to accommodate
more sophisticated models. In resource-constrained systems, this incremental increase
in accuracy might not be justified. Therefore if computing power or battery consumption
is an issue, it would be advisable to use linear models over nonlinear models.
4.5 Summary
In this section, we described an experimental study to estimate energy expenditure dur-
ing treadmill walking using a single hip-mounted kinematic sensor comprised of a tri-
axial accelerometer and a triaxial gyroscope. Our approach involved representing the
cyclic nature of walking using Fourier transforms of triaxial accelerometer and gyro-
scopic sensor streams and establishing a relationship between Fourier domain features
and energy expended. We described three regression techniques: Least Squares Re-
gression (LSR), Bayesian Linear Regression (BLR) and Gaussian Process Regression
(GPR) and showed their applicability to this problem.
Comparison of sensing techniques: With BLR, using accelerometer and gyroscope data
together improved prediction accuracy. Among accelerometer
features, Up-Down acceleration with GPR showed the lowest prediction error. Gyro-
scopic axes in the sway direction showed comparable errors. Gyroscopes were capable
of providing comparable results for energy prediction from treadmill walking. Addi-
tionally, combining accelerometer and gyroscope information reduced prediction errors.
Between accelerometers and gyroscopes, accelerometers are a better sensing option.
Comparison of algorithmic techniques: We reported and compared prediction ac-
curacies using different sensor streams and algorithms. LSR results depended heavily
on the number of points used for training. This was because LSR is prone to over-fitting.
BLR and GPR showed reduced errors with increasing training data size. GPR showed
higher accuracies than BLR. By working in the similarity space, GPR can counteract
the effects of feature selection and provide higher prediction accuracy.
Accuracy versus inference time tradeoff: GPR accuracy was higher than BLR.
However, GPR inference time was at least two orders of magnitude higher. Therefore if
computing power is an issue, it would be advisable to use linear models like BLR over
nonlinear models and trade off accuracy.
Chapter 5
Hierarchical Approaches to Creating Energy
Expenditure Maps
All generalizations are false, including this one.
—Mark Twain.
Given an individual for whom no previous information is available, acquiring
training data in free living conditions is not feasible. This leads to the question,
how does one learn a regression function for a person in the absence of training data?
Hierarchical energy expenditure prediction: It is important to note that the prob-
lem of predicting energy expenditure from movement is inherently hierarchical in na-
ture. For the same movement, two individuals may expend different amounts of energy
due to interpersonal differences in morphological characteristics such as height, weight,
sex, and fitness [121, 122]. Individuals who are similar to each other might expend sim-
ilar amounts of energy for the same movement. One can capture such common traits
across individuals by treating them as part of a population where each member has
different morphological characteristics. Each member’s regression map would be a
specific case of a common population model. A person’s individual morphological
characteristics would then be used to obtain a personalized version of the population
map. The question then is how does one fuse individual and population level informa-
tion into a single framework?
Determining the right morphological parameters: Ideally, such an approach must
also allow one to incorporate as many morphological descriptors as necessary. Thus, a
related question is which descriptors are the most important in describing a population
and can these be determined from population data itself? These models could be used to
obtain a model that generates personalized maps depending on a person’s morphological
information.
The problem domain: In this section, we address the problem of creating person-
alized predictions of energy expenditure for a person using phone-based accelerometer
data with little or no data from that person. We use a data-driven approach where we con-
solidate information from a representative population to which that individual belongs
and then use that information to generate a personalized model based on the person’s
individual morphological descriptor. We use triaxial accelerometer data captured from
a mobile phone. We compare different techniques to generate personalized maps with
an experimental study focused on steady-state treadmill walking. The primary contri-
butions of this section are a detailed description of the problem of hierarchical energy
expenditure prediction and an evaluation of candidate techniques on a test population of
35 individuals. This section expands on work presented by Vathsangam et al. [123, 124].
5.1 Related Work
Current kinematic sensor-based models use accelerometers and account for inter-personal
differences by treating all participants as one after normalization for size. This means
that the energy that is consumed by a person is scaled by an exponent of that person’s
weight or height [125]. By normalizing for size, all participants are then replaced by
a single pseudo-participant with scaled energy values and a regression value is learned.
Common scaling coefficients range from 0.6 to 1.0 [126], the most common
being 0.67 [127] and 0.75 [128]. Different populations require different scaling
coefficients. Rogers et al. [128] and Pearce et al. [129] showed that scaling coefficients
vary across age groups and stages of development in individuals. With respect to these
approaches, it is important to note that weight or height may not represent a complete
description of a person for developing a personalized energy expenditure model. Waters
et al. [130] showed that in addition to weight, the effect of other morphological de-
scriptors such as sex, stride length, gait style and heart rate also have to be incorporated.
Another approach is to explicitly model the energy expended as a polynomial function
of movement and weight. For example, Wyndham et al. [131] showed a linear depen-
dence on weight and a squared dependence on velocity per unit time. These approaches
point to a more general framework that predicts the energy expended for a person given
a certain movement and their morphological characteristics.
5.2 Qualitative Illustration of Hierarchical Maps
Figure 5.1a illustrates a plot of energy versus frequency for 35 participants. The
line of best fit across all participants is also shown in red. Figure 5.1b shows the same
plot with each participant’s individual line fit along with a general line fit. It can be seen
that the line of best fit across all participants is a poor fit for each individual participant.
For participants whose lines are in the top left part of the figure, the overall line of
best fit would underestimate the number of calories burned. Similarly, for participants
[Figure 5.1 panels, each plotted against Gait Frequency (Hz):
(a) Energy (kcal/min) versus frequency along with overall line of best fit
(b) Individual lines of best fit for each person versus overall line of best fit
(legend: Under-estimate, Over-estimate)
(c) Energy versus frequency color-coded by gender
(d) Energy versus frequency color-coded by weight
(e) Effective space in weight-scaled approach
(f) Weight-scaled approach color-coded by weights; vertical axis is Scaled Energy
(kcal/min/kg^1.4)]
Figure 5.1: Illustration of energy versus frequency for multiple participants along with
line of best fit. Analysis of each individual line of best fit shows a poor fit across par-
ticipants. Participants who share similar morphology show similar lines of best fit. This
points to a more general approach that uses morphological similarities across people to
generate personalized maps.
whose lines are in the bottom left part of the figure, the overall line of best fit would
overestimate the number of calories burned.
Figures 5.1c and 5.1d show the same space covered by figure 5.1a but color-coded
by gender and weight respectively. It can be seen that males and females cover different
parts of the same plot, suggesting separate equations for them. Further, people of similar
weights tend to have similar individual line fits.
Figures 5.1e and 5.1f show the same space shown in figure 5.1a in a weight-normalized
sense. Here, each participant’s energy expenditure per minute is scaled by an exponent
of that participant’s weight. In weight-scaled approaches, these energy-scaled values are
used to train a single regression model. It can be seen that when the energy expenditure
is scaled by an exponent of the weight, the original space is transformed into a new space
where there is higher correlation between energy expenditure and walking frequency.
The weight-scaled approaches suggest a technique that utilizes morphological descrip-
tors as a means to consolidate information across people and then use that consolidated
information to obtain personalized plots.
5.3 Algorithms
5.3.1 Personal Models
It is possible to develop a procedure where we simply train a separate model
$x_{n_p} \xrightarrow{f} y_{n_p}\ \forall n_p \in [1, 2, \ldots, N_p]$ for each person $p$.
An obvious candidate is the Bayesian Linear Regression model proposed in section 4.3.2.
5.3.1.1 Model Definition
Once again, we consider the linear model mapping a movement descriptor to energy
expenditure:
$$Y_p = w_p^T X_p + \epsilon I, \quad \epsilon \sim \mathcal{N}\left(0, \beta_p^{-1}\right) \tag{5.1}$$
$$y_{n_p} = w_p^T x_{n_p} + \epsilon, \quad \epsilon \sim \mathcal{N}\left(0, \beta_p^{-1}\right) \tag{5.2}$$
$$p(y_{n_p} \mid x_{n_p}, w_p, \beta_p) = \mathcal{N}\left(y_{n_p}; w_p^T x_{n_p}, \beta_p^{-1}\right) \tag{5.3}$$
where $\beta_p$ is a noise parameter, $x_{n_p} = (1, x_{n_p,1}, \ldots, x_{n_p,D})^T$ is the
derived function space consisting of fixed nonlinear functions of the input variables of
dimension $D + 1$ and $w_p = (w_{p,0}, \ldots, w_{p,D})^T$ are the weights. We choose a
Gaussian prior, $p(w_p) = \mathcal{N}\left(w_p; 0, \alpha_p^{-1} I\right)$, over the model
parameters $w_p$ in Equation 4.1, where $\alpha_p$ is a hyperparameter.
5.3.1.2 Inference
The complete log-likelihood for this model is:
$$L = \log p(Y_p, w_p \mid \alpha_p, \beta_p) = \frac{N_p}{2}\log\beta_p + \frac{M}{2}\log\alpha_p - \frac{\beta_p}{2}\|Y_p - X_p w_p\|^2 - \frac{\alpha_p}{2}\|w_p\|^2 + \text{const}$$
Learning the model amounts to learning $\{m_{N_p}, S_{N_p}, \alpha_p, \beta_p\}$ for each
person $p$. However, simultaneous learning of all these parameters is not possible because
of cyclical dependence. We instead adopt an iterative approach using Algorithm 2.
5.3.1.3 Prediction
The optimal prediction for a new data point is given by the predictive distribution:
Algorithm 2 EM algorithm for Bayesian Linear Regression
Inputs:
Movement descriptors in the data matrix form $X_p$ and corresponding energy prediction
values $Y_p$.
Initialization:
Initialize $w_p = \left(X_p^T X_p\right)^{-1} X_p^T Y_p$; $\alpha_p$ and $\beta_p$ as random values.
Define expected log-likelihood for each person $p$ as:
$$L_p = \log p(Y_p \mid \alpha_p, \beta_p) = \frac{D}{2}\ln\alpha_p + \frac{N}{2}\ln\beta_p - \frac{\beta_p}{2}\|Y_p - X_p m_p\|^2 - \frac{\alpha_p}{2}\|m_p\|^2$$
Repeat until log-likelihood converges:
M-step:
$$\alpha_p = \frac{D}{\|m_p\|^2 + \mathrm{trace}(S_p)}$$
$$\beta_p = \frac{N}{\|Y_p - X_p m_p\|^2 + \mathrm{trace}\left(X_p^T S_p X_p\right)}$$
E-step:
$$S_p = \left(\alpha_p I + \beta_p X_p^T X_p\right)^{-1}$$
$$m_p = \beta_p S_p X_p^T Y_p$$
Recalculate likelihood.
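$$p(y_p \mid x_p, Y_p, \alpha_p, \beta_p) = \mathcal{N}\left(m_p^T x_p, \sigma_p^2(x_p)\right), \tag{5.4}$$
$$\sigma_p^2(x_p) = \frac{1}{\beta_p} + x_p^T S_p x_p, \tag{5.5}$$
$$S_p^{-1} = \alpha_p I + \beta_p X_p^T X_p, \tag{5.6}$$
$$m_p = \beta_p S_p X_p^T Y_p. \tag{5.7}$$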
Personal models perform well when used on data from the same person. However,
they have poor interpolation capability across people because of the differences in mor-
phological characteristics between individuals. Hence, we present a set of modifications
to the personal model to take morphological differences into account. In this work, the
personal model was used as the ground truth for the best possible predictive capability
that could be achieved for a person given that sufficient data are available for that person.
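The EM updates of Algorithm 2 and the predictive distribution of Equations 5.4 and 5.5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names `fit_blr_em` and `predict_blr` are hypothetical, the data matrix is assumed to hold one epoch per row (so the trace term is written as trace(X S X^T)), the precisions start at 1.0 rather than at random values, and convergence is checked on the change in the precisions rather than on the log-likelihood.

```python
import numpy as np

def fit_blr_em(X, y, n_iter=200, tol=1e-8):
    """EM for Bayesian linear regression on one person's data.

    X: (N, D) matrix with one movement descriptor per row (assumed layout).
    y: (N,) energy expenditure values.
    Returns posterior mean m, posterior covariance S, and precisions alpha, beta.
    """
    N, D = X.shape
    alpha, beta = 1.0, 1.0  # assumed starting values; the text uses random ones
    for _ in range(n_iter):
        # E-step: posterior covariance and mean of the weights
        S = np.linalg.inv(alpha * np.eye(D) + beta * X.T @ X)
        m = beta * S @ X.T @ y
        # M-step: re-estimate prior precision alpha and noise precision beta
        alpha_new = D / (m @ m + np.trace(S))
        resid = y - X @ m
        beta_new = N / (resid @ resid + np.trace(X @ S @ X.T))
        # Assumed stopping rule: relative change in both precisions is small
        done = (abs(alpha_new - alpha) <= tol * alpha
                and abs(beta_new - beta) <= tol * beta)
        alpha, beta = alpha_new, beta_new
        if done:
            break
    # Recompute the posterior with the final precisions
    S = np.linalg.inv(alpha * np.eye(D) + beta * X.T @ X)
    m = beta * S @ X.T @ y
    return m, S, alpha, beta

def predict_blr(x, m, S, beta):
    """Predictive mean and variance for one descriptor x."""
    mean = m @ x
    var = 1.0 / beta + x @ S @ x
    return mean, var
```

Calling `fit_blr_em` once per participant yields the personal model for that participant; `predict_blr` then returns both a point estimate and its variance for a new epoch.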
5.3.2 Weight-Scaled Models
A commonly used technique to adjust for inter-person differences is to scale the energy
expended by the individual by an exponent of their weight, $\mathrm{Weight}^s$. Each
participant’s energy expenditure per unit time, $y_{n_p}$, is replaced by a scaled value
$y_{n_p,\mathrm{scaled}} = y_{n_p} / \mathrm{Weight}^s$, $s \in [0.1, 1.5]$. The set of all
energy values $Y_p$ is thus
$Y'_p = \left[y_{1_p,\mathrm{scaled}}\ y_{2_p,\mathrm{scaled}}\ \ldots\ y_{N_p,\mathrm{scaled}}\right]^T \in \mathbb{R}^{N_p \times 1}$
and movement descriptors $X_p$ remain the same. The energy expenditure values
from all participants are then treated as though belonging to a single pseudo-participant.
The original problem $\{x_{n_p}, \mathrm{Phys}_p\} \xrightarrow{f} y_{n_p}\ \forall n \in [1, 2, \ldots, N]$
is then recast as a personal regression model problem
$x_{n_p} \xrightarrow{f'} y_{n_p,\mathrm{scaled}}$. A potential limitation with this approach is
that it does not allow the incorporation of other morphological parameters or nonlinear
combinations of morphological parameters.
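The pooling step described above can be sketched as below. This is only an illustration: the function names and the dictionary layout (`"X"`, `"y"`, `"weight_kg"`) are hypothetical, and the exponent `s` is left as an argument because the text only gives the range that was searched.

```python
import numpy as np

def scale_energy(y, weight_kg, s):
    """Return y / weight**s, the weight-scaled energy values for one participant."""
    return np.asarray(y, dtype=float) / weight_kg ** s

def pool_scaled(participants, s):
    """Stack every participant's descriptors and scaled energies into one
    pseudo-participant. Each participant is a dict with keys "X" (N_p x D array),
    "y" (N_p values) and "weight_kg" (hypothetical layout)."""
    X_all = np.vstack([p["X"] for p in participants])
    y_all = np.concatenate(
        [scale_energy(p["y"], p["weight_kg"], s) for p in participants]
    )
    return X_all, y_all
```

A single regression model trained on the pooled pair then predicts scaled energy; multiplying a prediction by a new person's weight raised to the same exponent recovers energy in the original units.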
5.3.3 Nearest-Neighbor Models
One approach to incorporate an arbitrary combination of morphological parameters is
to extend the personal model approach with nearest neighbor-based interpolation. Here,
to predict energy expenditure for a person $p$, a personal model such as in section 4.3.2 is
learned from a person $p'$ who is “closest” in morphological similarity to person $p$. Given
the morphological matrix,
$\mathrm{PHYS} = \left[\mathrm{Phys}_1^T\ \mathrm{Phys}_2^T\ \ldots\ \mathrm{Phys}_P^T\right]^T$,
we calculate a space of reduced dimensionality using principal component analysis [132]. We
first normalize the morphological matrix by ensuring that each column (corresponding to a
morphological variable) has zero mean and unit variance. Normalizing to unit variance
is required because of the differing scales of each morphological descriptor. We then
apply the PCA transform and preserve the first three dimensions (82% of variance
preserved). Each point in this space corresponds to a person $p$. In this reduced space, for
a given person $p$, the data from the closest person $p'$ as measured by Euclidean distance
were used to train a model. This model was then used to predict energy expenditure
using the input data for person $p$.
An issue with nearest neighbor-based approaches is that they are sensitive to the met-
ric space under consideration. Also, heuristics are required to determine the right set of
nearest neighbors. They also do not take into account the quality of data available from
the nearest neighbor. What is needed is an approach that consolidates information by
taking advantage of data from all the people in a population simultaneously.
A note on the nearest neighbor metric
After transforming the data into lower dimensional space, two clusters corresponding
to men and women were seen. The third dimension separated these clusters into two
planes. Therefore, rather than applying Euclidean distance to the three dimensional
space, a gender-specific two-dimensional Euclidean distance metric was used. For ex-
ample, if person p was a woman, only other women were considered for determining
the closest person.
This model can arbitrarily be extended to K-nearest neighbors. We show the results
of just a single nearest neighbor as an illustration of the technique.
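The neighbor selection described in this section can be sketched as follows. This is a minimal illustration under assumptions: the function names are hypothetical, PCA is computed with a plain SVD of the standardized matrix, and the gender-specific metric is implemented by restricting candidates to people of the same sex and measuring Euclidean distance on whichever component scores the caller passes in (for the two-dimensional metric in the text, that would be the two components that do not separate the sexes).

```python
import numpy as np

def pca_project(phys, n_components=3):
    """Standardize each column of the (P x M) morphological matrix to zero mean
    and unit variance, then project onto the leading principal components."""
    z = (phys - phys.mean(axis=0)) / phys.std(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt[:n_components].T

def nearest_same_sex(scores, is_female, p):
    """Index of the person closest to p (Euclidean distance on `scores`),
    considering only other people of the same sex."""
    is_female = np.asarray(is_female, dtype=bool)
    dist = np.linalg.norm(scores - scores[p], axis=1)
    dist[is_female != is_female[p]] = np.inf  # exclude the other sex
    dist[p] = np.inf                          # exclude the person themselves
    return int(np.argmin(dist))
```

The returned index identifies the participant whose data would be used to train the personal model applied to person `p`.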
5.3.4 Hierarchical Linear Models
5.3.4.1 Model Definition
To consolidate information across people, we adopt a two-level approach with hierarchi-
cal linear models (HLMs) [133]. Hierarchical Linear Modeling has been successfully
used in various biological systems for joint modeling across a population [134]. Figure
5.2 illustrates our approach. As in Section 4.3.2, we assume that each output energy
value, $y_{n_p}$, is linearly dependent on input $x_{n_p}$. This can be expressed as:
Figure 5.2: A graphical model showing the relationship between variables in a hier-
archical linear model. A two-level dependence is assumed. At the lower, intra-person
level, a linear relationship between a person’s movement and energy consumption is for-
mulated. At the higher, inter-person level, the model parameters themselves are linearly
dependent on the morphological descriptors and population parameter k.
$$y_{n_p} \sim \mathcal{N}\left(y_{n_p}; w_p^T x_{n_p}, \beta_p^{-1}\right), \quad \forall n_p \in (1, 2, \ldots, N_p).$$
For each participant, we are also given morphological descriptors determined by
$\mathrm{Phys}_p$ and the complete set for all $P$ people,
$\mathrm{PHYS} = \{\mathrm{Phys}_p\}_{p=1}^{P}$. We model top-down
dependence of each person’s model parameters, $w_p$, on their morphological descriptors
$\mathrm{Phys}_p$ and a “population” parameter $k$. Each component $w_{p,l}$
($l = 0, 1, 2, \ldots, D$) of $w_p$ follows the relation:
$$w_{p,l} \sim \mathcal{N}\left(w_{p,l}; k_l^T \mathrm{Phys}_p, \alpha_p^{-1} I\right), \quad l \in (0, 1, \ldots, D).$$
p
is a noise term that incorporates noise in the mapping from Phys
p
tow
p;m
. Each w
p
in turn influences energy predictionsy
np
for an inputx
np
as before. Figure 5.2 illustrates
the graphical representation of the hierarchical linear regression model. If there areM
morphological parameters, then the dimension of each k
m
is alsoM. Thus the overall
parameter k =
k
0
k
1
::: k
D
T
is a (D + 1)M matrix.
The HLM combines $P$ personal regression models in two ways. First, the local regression
coefficients $w_p$ determine energy values for each person. Second, the different
p
determine energy values for each person. Second, the different
coefficients are connected through the population-level parameter k. The population
parameter connects data from multiple participants and consolidates that information.
Intuitively, the HLM captures the inherent similarity in movement across different peo-
ple while accounting for variations in the mapping between individual walking styles
and energy consumption.
5.3.4.2 Inference
Training the hierarchical linear regression model is equivalent to learning individual
$w_p$’s, the overall parameter $k$ as well as the noise parameters
$\{\alpha_p\}_{p=1}^{P}$, $\{\beta_p\}_{p=1}^{P}$. The complete log-likelihood function is:
$$L = \log\prod_{p=1}^{P} p(Y_p, w_p \mid k, \alpha_p, \beta_p) = \sum_{p=1}^{P}\left(\frac{N_p}{2}\log\beta_p + \frac{M}{2}\log\alpha_p - \frac{\beta_p}{2}\|Y_p - X_p w_p\|^2 - \frac{\alpha_p}{2}\sum_{m=1}^{M}\left\|w_{p,m} - \mathrm{Phys}_p^T k_m\right\|^2 + \text{const}\right)$$
To maximize this likelihood each $\alpha_p$ and $\beta_p$ must achieve the right balance by
being as large as possible while minimizing the relative sum of least squares error terms
$\|Y_p - X_p \mu_p\|^2$ and magnitude terms $\left\|w_{p,m} - \mathrm{Phys}_p^T k_m\right\|^2$.
Intuitively, the algorithm has to balance the intra-person fit given by the least-squares
error term and the inter-person fit given by the magnitude term. Each $k_m$ forces the
$w_{p,m}$’s to maintain consistency by sharing information across people. The confidence of
each $w_p$ (measured by $\Sigma_p$) is proportional to how closely the mean $\mu_p$
approximates what the predicted mean is from other people and what the inherent noise is in
the person’s model. Likewise the intra-person prediction is a weighted combination of what
the higher level prediction is from other people (given by $\alpha_p m_p$) and what the
individual experimental noise levels are (given by $\beta_p X_p^T Y_p$). After convergence,
these will represent a balanced prediction that consolidates model parameters across people.
Again, the appearance of cross-terms in the differential of this log-likelihood does
not allow direct estimates of the parameters and variables. So, we resort to an approximate
method using the EM algorithm. Algorithm 3 describes the procedure we use to learn
this model.
5.3.4.3 Prediction
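Given the model, we predict energy values for a new person $P + 1$ with morphological
parameters given by $\mathrm{Phys}_{P+1}$, using the equations:
$$w_{P+1,l} \sim \mathcal{N}\left(w_{P+1,l}; \mathrm{Phys}_{P+1}^T k_l, \alpha_{P+1}^{-1}\right), \quad \forall l \in \{1, 2, \ldots, D\}.$$
$$y_{n_{P+1}} \sim \mathcal{N}\left(y_{n_{P+1}}; w_{P+1}^T x_{n_{P+1}}, \beta_{P+1}^{-1}\right), \quad \forall n_{P+1} \in \{1, 2, \ldots, N_{P+1}\}.$$
We set $\alpha_{P+1}$ and $\beta_{P+1}$ to be the average of $\alpha_p$’s and $\beta_p$’s
over all people.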
In this way, given a person’s morphological parameters, the HLM generates a personalized
set of parameters $w_{P+1}$ similar to what a personal model would produce.
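Algorithm 3 EM Algorithm for hierarchical linear model
Inputs:
morphological descriptors in the data matrix form PHYS and corresponding rest energy
values Y.
Initialization:
Initialize $w_p = \left(X_p^T X_p\right)^{-1} X_p^T Y_p$; $\alpha_p$ and $\beta_p$ as random values.
Define expectation of log-likelihood as:
$$L = \sum_{p=1}^{P}\left[\frac{N_p}{2}\log\beta_p + \frac{M}{2}\log\alpha_p - \frac{\beta_p}{2}\|Y_p - X_p \mu_p\|^2 - \frac{\alpha_p}{2}\|\mu_p\|^2 - \frac{1}{2}\ln|S_p|\right]$$
Repeat until log-likelihood converges:
M-step:
$$k_l = \left(\mathrm{Phys}_p \mathrm{Phys}_p^T\right)^{-1} \mathrm{Phys}_p^T \mu_l$$
$$m_{p,l} = \mathrm{Phys}_p^T k_l$$
$$\alpha_p = \frac{D}{\|\mu_p - m_p\|^2 + \mathrm{trace}(S_p)}$$
$$\beta_p = \frac{N_p}{\sum_{n_p=1}^{N_p}\|Y_p - X_p \mu_p\|^2 + \mathrm{trace}\left(X_p^T S_p X_p\right)}$$
E-step:
$$m_{p,l} = \mathrm{Phys}_p^T k_l$$
$$S_p = \left(\alpha_p I + \beta_p X_p^T X_p\right)^{-1}$$
$$\mu_p = \left(\alpha_p I + \beta_p X_p^T X_p\right)^{-1}\left(\beta_p X_p^T Y_p + \alpha_p m_p\right)$$
Recalculate likelihood.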
These can then be used to predict energy expenditure for person $P + 1$ given their
movement.
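The prediction step for a new person can be sketched as below, assuming the population matrix $k$ and the per-person precisions have already been learned. The function name and argument layout are hypothetical. The predictive variance is not given in the text; the expression used here follows from combining the two Gaussian relations above (weight uncertainty $\alpha_{P+1}^{-1} I$ plus observation noise $\beta_{P+1}^{-1}$) and should be read as an assumption of this sketch.

```python
import numpy as np

def predict_new_person(k, phys_new, X_new, alphas, betas):
    """Predict energy expenditure for an unseen person.

    k:        (D+1, M) population parameter matrix, one row k_l per weight.
    phys_new: (M,) morphological descriptor of the new person.
    X_new:    (N, D+1) movement descriptors, one epoch per row.
    alphas, betas: precisions learned for the training population.
    """
    w_mean = k @ phys_new        # expected weights: w_l = Phys^T k_l
    alpha_new = np.mean(alphas)  # averages over all people, as in the text
    beta_new = np.mean(betas)
    y_mean = X_new @ w_mean
    # Assumed marginal variance: observation noise plus isotropic weight noise
    y_var = 1.0 / beta_new + np.sum(X_new ** 2, axis=1) / alpha_new
    return y_mean, y_var
```

No movement or energy data from the new person are needed to form `w_mean`; only their morphological descriptor enters the personalized weights.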
5.3.4.4 A note on initialization:
Since there are a large number of model parameters and the EM algorithm is guaranteed
only to converge to a local optimum, proper initialization is key. Fortunately, we have
an intuitively good initialization set for each of these parameters as the estimates given
by the individualized model algorithm. Using Bayesian Linear Regression, we train
$P$ individual models and obtain estimates for
$\{w_p \sim \mathcal{N}(\mu_p, \Sigma_p), \alpha_p, \beta_p\}_{p=1}^{P}$. We then
[Figure 5.3 plot: Height (meters) versus Weight (kgs), with Normal, Overweight and Obese
regions and separate markers for Males and Females.]
Figure 5.3: Characteristics of study population plotted as height versus weight with
normal and overweight regions shown. Men are shown as triangles and women are
shown as circles. Average height was 1.73 ± 0.07 m and average weight was 69.7 ± 7.5 kg.
use these models to train a higher level Bayesian linear regression model that maps
individual $w_{p,l}$’s to $k_l$’s across people. We feed these initial estimates into our model.
5.3.5 ACSM Speed-based Models
In order to compare our approach with current state-of-the-art techniques, speed-based
calorie predictions obtained from the ACSM Exercise Guidelines [135] were calculated
on the corresponding recorded speeds. The ACSM Exercise Guidelines provide a means
to estimate calories expended from the speed of walk as:
$$\text{Energy (kcal/min)} = \left(\left(13.4 \times \text{Speed (mph)}\right) + 17.5\right) \times \text{Weight (kg)}.$$
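A worked example helps fix the scale of this baseline. Taken literally, the coefficients above give values in the thousands for ordinary walking (about 4039 for a 70 kg person at 3.0 mph), whereas the measured energies in this chapter lie between roughly 1 and 10 kcal/min. The sketch below therefore assumes that the coefficients yield calories per minute and divides by 1000 to report kcal/min. That rescaling is an assumption of this sketch, not something stated in the text, and the function name is hypothetical.

```python
def acsm_walking_kcal_per_min(speed_mph, weight_kg):
    """Speed-based energy estimate for level walking.

    Coefficients are those quoted in the text; the division by 1000 is an
    assumption made here so that the result is in kcal/min.
    """
    cal_per_min = (13.4 * speed_mph + 17.5) * weight_kg
    return cal_per_min / 1000.0
```

For a 70 kg participant at 3.0 mph this evaluates to (40.2 + 17.5) × 70 / 1000, or about 4.04 kcal/min.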
Figure 5.4: Illustration of recording procedure. Kinematic data were collected with a
sensor mounted on the right iliac crest.
5.4 Results
5.4.1 Participant Statistics
This evaluation used data from the first data collection study outlined in section A.2.
Data were collected from a total of 35 participants (25 male, 10 female). Figure 5.3
describes participant statistics. Each participant’s height was measured with a wall-based
height chart and weight was measured with the EatSmart Precision Digital weighing
scale. Body mass index was computed from these measures. The participant was asked to
sit still and meditate for five minutes while energy expenditure and heart rate data were
collected at the frequency of every breath. These were averaged to obtain resting
energy expenditure and resting heart rate respectively. The participant then walked on
a treadmill at three speeds - 2.5, 3.0 and 3.5 mph for six minutes per speed with two
minutes of settling time between speeds to reach steady state.
5.4.2 Data Collection and Pre-processing
Each participant $p$ wore a Galaxy Nexus S phone running Android 2.3.3 on the right
iliac crest with a belt holder to record movement. Accelerometer data were captured
with a custom-built smart phone app - Movement Trackr [136]. The app records triaxial
accelerometer data at a set sampling rate. For the purposes of this study, the accelerome-
ter settings were set at “Fastest” (50 Hz). A Butterworth bandpass filter with 3dB cutoff
between 0.75 Hz and 2.3 Hz was applied to the raw accelerometer data. Energy expenditure
was measured using the Oxycon™ Mobile Metabolic unit from Carefusion.
The unit was worn as a backpack fitted to the comfort of the participant. The metabolic
unit reports participant $\dot{V}O_2$ and $\dot{V}CO_2$ and derived calorie data at the
frequency of every breath. Phone data were synchronized with metabolic unit data in post-processing.
Data streams consisting of triaxial accelerometer and energy expenditure data that cor-
responded to walking were segmented out. These were further segmented into separate
steady-state walking sub-sections corresponding to each speed.
For each walking sub-section, data were divided into 10 second intervals or epochs
$n_p$, $n_p \in [1, N_p]$. For each epoch, a 1024 point periodogram for the Y-axis was
calculated from the accelerometer data. This corresponded to the up-down movement of
the participant. The periodogram coefficients corresponding to frequencies greater than
2.5 Hz were discarded. The average step frequency for epoch $n_p$ was calculated by
extracting the frequency corresponding to the highest magnitude in this domain. The
extracted step frequency, step-frequency squared and a constant term were used as the
descriptors of movement $x_{n_p}$ corresponding to epoch $n_p$ for that person $p$. The
energy expenditure for that epoch, $y_{n_p}$, was obtained by averaging metabolic unit
values for $n_p$. This was repeated for each epoch and the complete input and output
data $\{X_p, Y_p\}$ were obtained for each participant. The complete set of morphological
descriptors for this participant, $\mathrm{Phys}_p$, consisted of height, weight, BMI,
resting heart rate (RHR) and resting energy expenditure (REE).
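The feature extraction described above can be sketched in plain Python. This is a minimal illustration, not the code used in the study: the function names are hypothetical, the periodogram is evaluated as a direct zero-padded DFT, and removing the mean and skipping the DC bin are assumptions made so that a constant offset (such as gravity) does not dominate the peak.

```python
import cmath
import math


def step_frequency(samples, fs_hz=50.0, n_fft=1024, f_max_hz=2.5):
    """Return the dominant frequency (Hz) at or below f_max_hz.

    samples: vertical-axis acceleration for one epoch (e.g. 10 s at 50 Hz).
    The signal is mean-removed and implicitly zero-padded to n_fft points.
    """
    mean = sum(samples) / len(samples)
    centered = [s - mean for s in samples]
    k_max = int(f_max_hz * n_fft / fs_hz)  # last bin kept (<= f_max_hz)
    best_k, best_power = 0, -1.0
    for k in range(1, k_max + 1):  # ASSUMPTION: the DC bin is skipped
        coeff = sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
                    for n, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return best_k * fs_hz / n_fft


def movement_descriptor(samples, fs_hz=50.0):
    """Feature vector [f, f^2, 1] for one epoch, as described in the text."""
    f = step_frequency(samples, fs_hz)
    return [f, f * f, 1.0]


# Synthetic epoch: 10 s of a 1.8 Hz oscillation sampled at 50 Hz.
epoch = [math.sin(2 * math.pi * 1.8 * n / 50.0) for n in range(500)]
print(movement_descriptor(epoch))
```

With a 1024 point transform at 50 Hz the frequency resolution is about 0.05 Hz, so the recovered step frequency lands on the bin nearest the true value.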
5.4.3 Evaluation Methodology
We used a 1-of-K methodology to rank the predictive capability of an algorithm. Given
a population of P participants, P − 1 were chosen to train either a nearest-neighbor,
weight-scaled, speed-based model or an HLM with a specific combination of morphological
descriptors. Depending on the combination of morphological descriptors used,
a different hierarchical model could be obtained. To train the personal model, a uniformly
sampled set consisting of 60% of the data for the $p$th participant was selected
as training data, with the remainder constituting test data. The personal model described in
section 4.3.2 was learned from this training data and used as a reference. All models were
used to predict energy expenditure on the test data for participant p and the root-mean-squared
(RMS) error for each model was calculated. This was repeated with different
randomly sampled data over twenty iterations and the error calculated each time. The average
root-mean-squared (ARMS) error was calculated by averaging RMS errors across all iterations.
These errors represented the respective performance of each algorithm for participant p.
This was repeated for each participant in the population and the errors were averaged
across all participants. ARMS error was used to compare algorithm performance.
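The per-participant part of this protocol can be sketched as follows. The helper names and the `fit` callable are hypothetical; `fit` stands in for any of the compared models. A generalization model trained on the other P − 1 participants would simply ignore the personal training data it is handed.

```python
import math
import random


def rms_error(predicted, actual):
    """Root-mean-squared error between two equal-length sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual))
                     / len(actual))


def split_epochs(n_epochs, train_fraction=0.6, rng=random):
    """Uniformly sample train indices; the rest become test indices."""
    indices = list(range(n_epochs))
    rng.shuffle(indices)
    n_train = int(round(train_fraction * n_epochs))
    return sorted(indices[:n_train]), sorted(indices[n_train:])


def arms_error(x, y, fit, n_iterations=20, seed=0):
    """Average RMS error of one model on one participant's epochs.

    fit(train_x, train_y) is a hypothetical callable returning a
    predict(x_row) function.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(n_iterations):
        train_idx, test_idx = split_epochs(len(x), 0.6, rng)
        predict = fit([x[i] for i in train_idx], [y[i] for i in train_idx])
        errors.append(rms_error([predict(x[i]) for i in test_idx],
                                [y[i] for i in test_idx]))
    return sum(errors) / len(errors)
```

Averaging `arms_error` over all held-out participants gives the population-level figure used to compare algorithms.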
5.4.4 Comparison of Algorithms
Given six morphological descriptors, we evaluated HLM performance with all possible
combinations of descriptors resulting in 63 different hierarchical models. When using
morphological descriptors we also considered squared and cross terms up to quadratic.
[Figure 5.5a plot: ARMSE (kcal/min) against parameter type (descriptor combination
encoding) for ACSM, nearest neighbor, HLM and personal models.]
(a) Comparison of generalization models. HLM variance across people is shown in gray.
Nearest neighbor models showed the highest error followed by ACSM based models. Lower
is better. HLM descriptors corresponding to REE, RHR and sex were poor morphological
descriptors to derive a personal model from. Size-based descriptors such as weight,
height and BMI showed higher accuracy.
[Figure 5.5b plot: ARMSE (kcal/min) against exponent value for weight-scaled, best HLM
(HLM-Min) and personal models; minimum at exponent = 1.4.]
(b) Illustration of the performance of the weight-scaled model compared with HLMs and
personal models. The best HLM showed the same level of performance as weight-scaled
models with an exponent of 1.4. When evaluated on a per-person basis, the exponent showed
a high variation (1.2 ± 0.6), indicating that a single weight scale is not appropriate
across all users.
Figure 5.5: Relative performance of algorithms as measured by Average Root Mean-Squared
Error (ARMSE). HLM performance is shown for all possible combinations of descriptors,
resulting in 63 different hierarchical models.
This was based on the intuition that a nonlinear functional dependence on morphological
descriptors could be approximated by a Taylor series. We used personal regression and
speed-based models for reference. The order of introduction of descriptors was REE,
RHR, BMI, Weight and Height. Each descriptor combination was assigned a number
corresponding to a binary encoding of that combination. For example, a number of 21
corresponded to a binary encoding of 10101 or resting energy expenditure, BMI and
height.
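The encoding can be illustrated with a short sketch. The descriptor order is the one listed in the text; treating the first descriptor as the most significant bit is an assumption of this example, as are the function names.

```python
from itertools import combinations

# Order of introduction as listed in the text.
# ASSUMPTION: the first descriptor maps to the most significant bit.
ORDER = ["REE", "RHR", "BMI", "Weight", "Height"]


def decode(number, order=ORDER):
    """Return the descriptors whose bit is set in `number`."""
    bits = format(number, "0{}b".format(len(order)))
    return [name for name, bit in zip(order, bits) if bit == "1"]


def encode(descriptors, order=ORDER):
    """Inverse of decode: map a descriptor subset to its integer code."""
    return sum(1 << (len(order) - 1 - order.index(d)) for d in descriptors)


def count_nonempty_subsets(n):
    """Number of non-empty descriptor combinations for n descriptors."""
    return sum(1 for r in range(1, n + 1) for _ in combinations(range(n), r))


print(decode(0b10101))            # bit pattern 10101 from the example
print(count_nonempty_subsets(6))  # 2**6 - 1
```

The bit pattern 10101 decodes to REE, BMI and Height. The number of non-empty combinations is 2^n − 1, which is 31 for five descriptors and 63 for six.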
Figure 5.5a illustrates the variation of errors with different descriptor combinations
as identified by the encoding along with nearest neighbor, speed-based and personal
models for comparison. Nearest neighbor models showed the highest errors. This could be
due to the lack of neighbors sufficiently close to a particular participant in a small
population. With respect to the HLM, the earlier descriptor combinations correspond to
using REE, RHR and sex alone. These descriptor combinations resulted in errors that
were higher than those of speed-based models and nearest neighbor models. Once size-based
descriptors such as BMI, weight and height were introduced, the errors reduced and
were more consistent. The lowest errors were obtained when all the descriptors were
used. For the remainder of this chapter, we neglect nearest neighbor models for the sake
of brevity.
Weight-scaled models were also compared. Figure 5.5b illustrates the variation of the
mean error with the weight exponent (standard deviation shown in gray). The best
HLM showed the same level of performance as weight-scaled models with an exponent
of 1.4. However, when evaluated on a per-person basis, the exponent showed a high
variation (1.2 ± 0.6), indicating that a single weight scale does not work equally well
across all users. This needs further investigation.
5.4.5 Best Individual Descriptor
We used the HLM to evaluate the best individual morphological descriptor to gener-
ate an accurate, personalized energy model from movement to energy expenditure. For
each participant, the ARMS errors of the HLM corresponding to each descriptor com-
bination, were sorted in ascending order and assigned a rank. For each error in this
order, the descriptors corresponding to that error were awarded a score equal to the error
rank. For example, if weight and height appeared in the third lowest error, they both
received a score of three. This was calculated for each error in the ranking. The aver-
age of all scores awarded to each descriptor was calculated and represented the relative
performance of that individual descriptor. This was repeated for each participant. The
intuition behind this scheme is that if the appearance of a particular descriptor results in
lower errors, it will appear at the beginning of the sorted list more often. Hence a
lower score for a morphological descriptor implies greater importance in personalizing
a person’s energy expenditure model.
[Figure 5.6 plot: average score (mean and median) for each physiological feature:
height, weight, BMI, sex, RHR and REE.]
Figure 5.6: Average score across participants for each individual descriptor. Descriptor
combinations were ranked according to the ARMS errors that they produced and the
ranking per descriptor was extracted and averaged across participants. Lower is better.
Weight and height showed the lowest ranking while sex, REE and RHR showed the
highest rankings. The effect of sex was absorbed by weight and height since women in
the population on average weighed less and were shorter.
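The scoring scheme can be sketched as follows. The function names are hypothetical and the error values in the usage example are invented purely to show the mechanics; ties between equal errors are not treated specially here.

```python
def descriptor_scores(errors_by_combo):
    """Average rank score per descriptor for one participant.

    errors_by_combo maps a tuple of descriptor names to the ARMS error of
    the model trained with that combination.
    """
    ranked = sorted(errors_by_combo.items(), key=lambda item: item[1])
    totals, counts = {}, {}
    for rank, (combo, _error) in enumerate(ranked, start=1):
        for name in combo:  # every descriptor in the combination gets the rank
            totals[name] = totals.get(name, 0) + rank
            counts[name] = counts.get(name, 0) + 1
    return {name: totals[name] / counts[name] for name in totals}


def average_across_participants(score_dicts):
    """Mean score per descriptor over a list of per-participant score dicts."""
    names = score_dicts[0].keys()
    return {n: sum(d[n] for d in score_dicts) / len(score_dicts) for n in names}


# Invented errors for a single participant (illustration only).
toy = {("Weight", "Height"): 0.4, ("Weight",): 0.5,
       ("Height",): 0.6, ("REE",): 1.2}
print(descriptor_scores(toy))
```

In this toy case weight appears at ranks 1 and 2 and so receives an average score of 1.5, height appears at ranks 1 and 3 and receives 2.0, and REE appears only at rank 4.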
Figure 5.6 shows the comparative mean ranking (filled square) with standard devia-
tion for each individual descriptor across all users. Median is also shown for reference
(cross). Weight and height were the best individual descriptors with the lowest score.
This indicates that size-based descriptors are the best descriptors to generate a model.
Sex had the highest score. This could be because women in our study were on average
shorter and weighed less than men and this difference was absorbed in the size based
descriptors. Even though BMI is derived from height and weight, it did not result in a
lower score. This could be because the relative contributions of weight and height were
mitigated by the mathematical transformation of BMI. In our population, a participant
with a large weight was also tall. BMI is defined as Weight/Height$^2$. Thus, a higher or
lower BMI would be mitigated by the corresponding higher or lower height. One way to
test this further would be to perform a similar study including obese people (for whom
BMI is typically larger). We aim to expand on this in future work. Resting energy
expenditure and resting heart rate showed high scores, indicating that rest-based descriptors
are not good descriptors for generating a model.
[Figure 5.7a plot: RMS error (kcal/min) against percentage of training data for the
personal model, HLM mean and HLM median.]
(a) The hierarchical model performed better than the personal model when no or limited
training data were available. With more data, the personal model performed better.
[Figure 5.7b plot: RMS error (kcal/min) at 2.5, 3.0 and 3.5 mph for individual,
hierarchical and ACSM models, with medians marked.]
(b) The personal model showed the lowest error when predicting the intermediate speed of
3.0 mph. This shows that personal models have the best predictive capability when data
from the extreme values are available. ACSM speed-based models showed the highest RMS
errors at higher speeds. Hierarchical models showed uniform predictive capability across
all speeds.
Figure 5.7: Illustration of the predictive capability of each algorithm when limited
training data were available. Lower is better. Hierarchical models performed as well as,
and in some cases better than, personal models. However, they were able to achieve this
with no prior information about the participant other than their morphological descriptors.
5.4.6 Predictions with Reduced Training Data
We extended our study to examine the variation of predictive capacity of each algorithm
with reduced data. Figure 5.7a describes the performance of the hierarchical model
versus the personal model (p<0.05) with increasing training data. At low percentages
of data, the hierarchical model performed better than the personal model. With more
training data, the personal model out-performed the hierarchical model.
A second experiment tested the ability of each algorithm in the absence of data corresponding
to a particular speed. Instead of randomly sampling the data as described
in section 5.4.3, we used training data corresponding to two out of three speeds and
trained a personal model with that data. All the general models were trained exactly as
described before. Testing was done on the third speed. This was repeated for all possible
combinations of speeds and the corresponding RMS error was calculated. The results
are shown speed-wise in figure 5.7b.
In the figure, each group represents the RMS error when that particular speed was excluded
from the training data. The interpolation capability of the individualized model
was best when predicting the 3.0 mph speed and poorest at 3.5 mph. Intuitively, this can
be understood as follows: the best interpolation capability is obtained when using training
data from the extrema of the available speeds. The second best error for personal models
was obtained when the higher speeds were used for training and the lowest speed for
testing. Hierarchical models showed less variance in predictive capability across speeds. This indicates that
when training data from either extremum is not available, it would be preferable to use
HLMs. ACSM speed-based models showed increasingly higher errors with increasing
speed indicating their unsuitability for predicting energy expenditure at higher speeds.
It is important to note that the hierarchical model does not have access to any person-specific
training data from the participant. Despite this, the model performs comparably to,
and in some cases out-performs, the personal model that has access to training data. This
is because the hierarchical model utilizes information that is consolidated and transferred
from the remaining participants. This yields a more informed prior, which
results in more accurate models when less data are available.
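One way to see why an informed prior helps when data are scarce is the standard posterior for Bayesian linear regression with a Gaussian prior. The notation below (prior mean $m_0$, prior precision $\alpha$, noise precision $\beta$) is assumed for illustration and may differ from the formulation used earlier in the chapter; $X_p$ and $Y_p$ are the participant's inputs and outputs.

```latex
\begin{align}
p(w) &= \mathcal{N}\left(w \mid m_0, \alpha^{-1} I\right), \\
S_N^{-1} &= \alpha I + \beta X_p^\top X_p, \\
m_N &= S_N \left( \alpha m_0 + \beta X_p^\top Y_p \right).
\end{align}
```

With no personal data the posterior mean $m_N$ equals the prior mean $m_0$, so the prediction is only as good as the prior. As more epochs are added, the data term $\beta X_p^\top Y_p$ dominates and $m_N$ approaches the least-squares solution, which is consistent with the personal model overtaking the hierarchical model once enough training data are available.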
5.5 Summary
An issue with current regression techniques is how one can obtain accurate regression
maps with as little data as possible. One way to achieve this is to consolidate data from
previous recordings on other people into a generalization model based on morphological
descriptors such as height, weight, age etc. Current approaches use a limited number of
descriptors such as weight and height. We extended this work with a family of regression
techniques that incorporated an arbitrary number of morphological descriptors. Our
chief contributions are summarized as:
Mathematical formulation of the problem of generalization: We cast the prob-
lem of normalization of regression maps in a mathematical framework and then de-
scribed various regression models using this framework. These included nearest neigh-
bor models, weight-scaled models, a set of hierarchical linear models and speed-based
approaches. The relative merits and demerits of these approaches were also described.
Model comparison of generalization algorithms: We performed a comparative
analysis of the generalization capability of these algorithms taking the example of tread-
mill walking on a population of 35 participants. Given the population, nearest neighbor
models showed highest errors. Descriptor combinations corresponding to REE, RHR
and sex alone resulted in errors that were higher than speed-based models and near-
est neighbor models. Size-based descriptors such as BMI, weight and height produced
lower errors. This indicates that among our chosen descriptor set, size-based descriptors
such as weight and height were the best available to generate personalized models. The
best HLM showed the same level of performance as weight-scaled models with an exponent
of 1.4. However, when evaluated on a per-person basis, the same weight exponent
could not be used.
Evaluation of the best descriptor combination: We used the hierarchical linear
model to determine the best individual descriptor to describe a person. Weight was the
best individual descriptor followed by height. Sex, REE and RHR showed the highest
rankings. The effect of sex was absorbed by the weight and height since women in the
population were on average shorter and lighter than men.
Chapter 6
Conclusion
The woods are lovely, dark and deep.
But I have promises to keep,
And miles to go before I sleep,
And miles to go before I sleep.
—Robert Frost.
The dissertation closes with a summary of contributions and several suggestions for
future research. The discussion of follow-on work details open problems related
to energy expenditure estimation and the designing of interventions for raising physical
activity in general.
6.1 Contributions
In this section, we review the contributions of the dissertation. Our efforts focused on
developing accurate and precise techniques to estimate a person’s energy expenditure
given their movement and morphology. We relied primarily on data-driven regression
techniques to derive functional maps from movement to energy expenditure.
In Chapter 3, we showed how one can use the up-down movement of the center of
mass of the human body to robustly characterize cyclic movement. We showed the util-
ity of Fourier transform-based techniques in detecting the periodicity of this movement.
We also showed an initial study examining the robustness of frequency-based features
to changes in location and orientation. The up-down movement was the most consistent
signature across multiple locations of the human body. Tracking the vertical orientation
is important in order to be able to estimate vertical center of mass movement. We also
showed a modification of the Fast Fourier Transform in a recursive setting to efficiently
calculate frequency components. The resultant Momentary Fourier Transform requires
145 times fewer operations than an FFT.
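The idea of updating frequency components recursively can be illustrated with the standard sliding DFT recursion, in which each new sample updates a frequency bin in constant time instead of recomputing the whole transform. This is a generic sketch under that assumption; it is not necessarily the exact Momentary Fourier Transform formulation of Chapter 3, and it makes no attempt to reproduce the operation count quoted above.

```python
import cmath
import math


def direct_dft_bin(window, k):
    """Direct evaluation of DFT bin k over a window of length N."""
    n_points = len(window)
    return sum(x * cmath.exp(-2j * math.pi * k * n / n_points)
               for n, x in enumerate(window))


def sliding_dft_bin(signal, window_len, k):
    """Yield DFT bin k for every length-window_len window of signal.

    After the first window, each update costs O(1): remove the oldest
    sample, add the newest, and rotate by exp(+2j*pi*k/N).
    """
    twiddle = cmath.exp(2j * math.pi * k / window_len)
    value = direct_dft_bin(signal[:window_len], k)
    yield value
    for start in range(1, len(signal) - window_len + 1):
        oldest = signal[start - 1]
        newest = signal[start + window_len - 1]
        value = (value - oldest + newest) * twiddle
        yield value
```

Each yielded value matches a direct DFT of the corresponding window up to floating-point rounding, which can be checked by comparing against `direct_dft_bin` on a short test signal.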
In Chapter 4, we described a set of techniques to develop personal maps from move-
ment to energy expenditure. We presented three algorithms - Least-Squares Regression
(LSR), Bayesian Linear Regression (BLR) and Gaussian Process Regression (GPR).
We reported and compared prediction accuracies using different sensor streams and al-
gorithms. LSR results depended heavily on the number of points used for training. This
was because LSR is prone to over-fitting. BLR and GPR showed reduced errors with
increasing training data size. GPR showed reduced errors compared to BLR due to
operating in the similarity space between features rather than relying on explicit depen-
dence on input features. Among accelerometer features, Up-Down acceleration showed
the lowest prediction error. Gyroscopic axes showed comparable errors. Combining
accelerometer and gyroscope information reduced prediction errors. Choice of axis is
not an issue when using GPR. However, GPR training time was at least two orders of
magnitude higher. Therefore, if computing power is an issue, it would be advisable to
use linear models like BLR over nonlinear models, trading off some accuracy.
In Chapter 5, we showed how one can generate personalized maps from movement
to energy expenditure using a minimal set of morphological descriptors such as height,
weight, age etc. We cast the problem of normalization of regression maps in a mathe-
matical framework and then described various regression models using this framework.
These included nearest neighbor models, weight-scaled models, a set of hierarchical lin-
ear models and speed-based approaches. We performed a comparative analysis of the
generalization capability of these algorithms. Given the population, nearest neighbor
models showed highest errors. Descriptor combinations corresponding to REE, RHR
and sex alone resulted in errors that were higher than speed-based models and nearest
neighbor models. We used the hierarchical linear model to determine the best individual
descriptor to describe a person. Weight was the best individual descriptor followed by
height.
These contributions are a step towards designing cost-effective, accurate and ubiqui-
tous solutions to estimate physical activity levels and designing interventions based on
accurately measured data.
6.2 Future Work
There are a number of possible directions for future research, in areas related to both
physical activity monitoring and energy expenditure estimation. We highlight a few
research directions on our horizon.
Frequency-based sensing: In chapter 3, we described a frequency-based technique
for estimating periodicities. We plan on expanding the scope of this work to include
extensive validation across a large population. As part of this validation, we plan on
testing the validity of frequency-based features across common locations of the phone
on the body and across a larger number of people. We also plan on exploring the usage of
frequency-based features in activity classification.
Modeling the input space: One limitation in our work is that we do not explicitly
model the input space of center of mass frequencies, instead relying on measuring it
directly. Our next goal is to extend this work and examine how a person’s morphology
can affect the range of frequencies that they can generate. These are likely to depend
on height, weight, leg length, gender and other factors. Learning a joint distribution
of frequency and morphological features will allow us to characterize people based on
measurements of their activity frequency.
Scope of activities: We plan on expanding the scope of our work to a larger range
of activities. This includes learning energy expenditure maps for other forms of walking
such as walking uphill (or downhill), up or down stairs, disabled gait etc. In this dissertation,
we focused on activities that are in steady state. We will expand our work to
include transitional energy expenditure values when a person moves from one activity
to another. We also plan to explore the utility of frequency-based features for other
activities such as running, cycling or rowing.
Characterization of modeling capability: In chapters 4 and 5, we described mod-
els from movement to energy expenditure. It is important for us to be able to understand
the theoretical bounds of how good these models can be with reference to the ground
truth. For this we plan on performing a theoretical analysis and an experimental study
to verify the best possible performance of these models. We also plan on modifying our
models to further incorporate morphological similarity-based model generation. Here a
non-parametric model in the upper layer would utilize the similarities between people
(represented by a kernel function) to generate the personalized model parameters. We
also plan on developing similarity-based nonparametric models to predict resting energy
expenditure given morphological descriptors.
Smart data acquisition: In order for kinematic sensor-based calorimetry to be ubiq-
uitous, the algorithms must be accurate across a large population of people with minimal
data available. Our current methods rely on being able to collect large amounts of data
per person or across people to train our algorithms. This solution will not be feasible
in free living settings. In chapter 5, we presented a set of generalized models to predict
energy expenditure in the absence of training data. Ideally, such techniques should be
as good as developing a personalized model for each person at all data ranges. However
with more data available, personalized models outperform the generalized techniques.
To tackle this, we plan on exploring hybrid approaches that use both historical informa-
tion and unsupervised, active data selection to improve the accuracy of our models. For
example, time can be saved if one could use the generalized model as an informative
prior and then intelligently pick a few data points to make the energy expenditure map
as accurate as possible. This implies that the algorithm needs to be actively aware
of what data to choose to improve its performance.
Incorporating other kinds of sensors: In this dissertation, we focused mainly on
the utility of kinematic sensors in predicting energy expenditure. Other kinds of sen-
sors could help in making these estimates more accurate. For example, being able to
measure skin or basal body temperature would help in assessing metabolic rate due to
rest. Combining heart rate information can help in predicting energy expenditure for
dynamic activities. To explore these possibilities, we plan on performing a comparative
study for treadmill and free walking.
Behavioral characterization: We plan on incorporating algorithms in a cellphone
platform to enable long-term activity monitoring. This would allow us to learn daily
behavioral profiles of cell phone users. This would represent a key step in designing intervention
techniques tailored towards positive behavioral change. Learning behavioral
profiles would also allow us to identify similarities or differences in lifestyles between
people in various geographical locations. This can potentially inform policy planning
and health initiatives so that they may be targeted at specific geographical locations.
This also allows us to characterize long-term changes in behaviors to detect possible
signs of chronic disease.
Real-time monitoring and intervention: The possibility of incorporating our tech-
niques in a portable, connected platform like a cellphone also opens up avenues in de-
signing data-driven, just-in-time interventions to change human behavior. Of particular
importance is intervention to promote non-exercise activities such as walking or fidgeting
at regular intervals throughout the day. These offer the potential to improve daily
activity profiles. Possible intervention techniques include display of physical activity
information in the form of graphs or visual metaphors, use of social support or com-
petition to influence behaviors, gamification or providing haptic feedback based on the
user’s current state. The goal of our future research will be to investigate the relative
utility of each of these approaches and use them for positive behavioral change.
6.3 Final Remarks
Healthcare stands to benefit from data-driven computing to inform treatments and track
personal wellness. Our work focused on being able to track one aspect of this - physical
activity monitoring. Through the work presented in this dissertation, we showed how
data-driven approaches could be applied to commodity sensing techniques. We showed
how they could be as accurate as state of the art clinical equipment. We also showed
how one can utilize data from other people to obtain accurate models with almost zero
information about a person. With the progress of research in this area, we believe that
kinematic sensors coupled with other sensing techniques have the potential to greatly assist
in understanding physical activity profiles of millions of people.
Our hope is that these contributions will pave the way for a new set of treatments
and techniques that are personalized, pervasive and persuasive. Our vision of the future
is that every person will have a protective shield of sensors around them monitoring
their health. These sensors will know them and understand them through the knowledge
they accumulate both within and across people over multiple generations. Our goal is
to build a system that will make the knowledge and tools to manage health
available 24/7 to every individual on this planet.
We end with a slightly controversial quote by Vinod Khosla:
“Doctors can be replaced by software – 80% of them can. I’d much rather have a good
machine learning system diagnose my disease than the median or average doctor.”
Maybe some day, his wish will come true.
Bibliography
[1] G. J. Welk, J. McClain, and B. E. Ainsworth, “Protocols for evaluating equiva-
lency of accelerometry-based activity monitors,” Medicine and science in sports
and exercise, vol. 44, no. 1, 2012.
[2] World Health Organization, Global health risks: mortality and burden of disease
attributable to selected major risks. World Health Organization, 2009.
[3] C. Bouchard, S. N. Blair, and W. L. Haskell, Physical activity and health. Human
Kinetics Publishers, 2012.
[4] D. M. Bramble and D. E. Lieberman, “Endurance running and the evolution of
homo,” Nature, vol. 432, no. 7015, pp. 345–352, 2004.
[5] D. Warburton, C. Nicol, and S. Bredin, “Health benefits of physical activity: the
evidence,” CMAJ, vol. 174(6), pp. 801–809, 2006.
[6] C. A. Macera, K. E. Powell et al., “Population attributable risk: implications of
physical activity dose,” Medicine and science in sports and exercise, vol. 33, no.
6; SUPP, pp. 635–639, 2001.
[7] C. A. Macera, J. M. Hootman, and J. E. Sniezek, “Major public health benefits of
physical activity,” Arthritis Care & Research, vol. 49, no. 1, pp. 122–128, 2003.
[8] R. S. Paffenbarger and W. E. Hale, “Work activity and coronary heart mortality,”
The New England Journal of Medicine,
1975.
[9] R. S. Paffenbarger, R. J. Brand, R. I. Sholtz, and D. L. Jung, “Energy
expenditure, cigarette smoking, and blood pressure level as related to death from
specific diseases,” American Journal of Epidemiology, vol. 108, no. 1, pp. 12–18,
1978.
[10] A. Dunn, B. Marcus, J. Kampert, M. Garcia, H. K. III, and S. Blair, “Comparison
of Lifestyle and Structured Interventions to Increase Physical Activity and Car-
diorespiratory Fitness: A Randomized Trial,” JAMA, vol. 281, pp. 327–334, 1999.
[Online]. Available: http://jama.ama-assn.org/cgi/content/abstract/281/4/327
[11] S. P. Helmrich, D. R. Ragland, R. W. Leung, and R. S. Paffenbarger Jr, “Physical
activity and reduced occurrence of non-insulin-dependent diabetes mellitus,” New
England journal of medicine, vol. 325, no. 3, pp. 147–152, 1991.
[12] S. P. Helmrich, D. R. Ragland, R. S. Paffenbarger Jr et al., “Prevention of non-
insulin-dependent diabetes mellitus with physical activity.” Medicine and science
in sports and exercise, vol. 26, no. 7, p. 824, 1994.
[13] J. B. Kampert, S. N. Blair, C. E. Barlow, and H. W. Kohl, “Physical activity,
physical fitness, and all-cause and cancer mortality: a prospective study of men
and women,” Annals of epidemiology, vol. 6, no. 5, pp. 452–457, 1996.
[14] H. D. Sesso, I.-M. Lee, and R. S. Paffenbarger, “Physical activity and breast
cancer risk in the college alumni health study (united states),” Cancer Causes
and Control, vol. 9, no. 4, pp. 433–439, 1998.
[15] I. Thune and A.-S. Furberg, “Physical activity and cancer risk: dose-response and
cancer, all sites and site-specific.” Medicine and Science in Sports and Exercise,
vol. 33, no. 6 Suppl, p. S530, 2001.
[16] R. Ross, D. Dagnone, P. J. Jones, H. Smith, A. Paddags, R. Hudson, and
I. Janssen, “Reduction in obesity and related comorbid conditions after diet-
induced weight loss or exercise-induced weight loss in men,” Ann Intern Med,
vol. 133, no. 2, pp. 92–103, 2000.
[17] C. W. Suitor and V. I. Kraak, Adequacy of evidence for physical activity guidelines
development: workshop summary. National Academy Press, 2007.
[18] D. W. Dunstan, G. N. Healy, T. Sugiyama, and N. Owen, “Too much sitting
and metabolic risk–has modern technology caught up with us?” European En-
docrinology, vol. 6, no. 1, pp. 19–23, 2010.
[19] C. E. Matthews, K. Y. Chen, P. S. Freedson, M. S. Buchowski, B. M. Beech,
R. R. Pate, and R. P. Troiano, “Amount of time spent in sedentary behaviors in
the united states, 2003–2004,” American journal of epidemiology, vol. 167, no. 7,
pp. 875–881, 2008.
[20] G. N. Healy, K. Wijndaele, D. W. Dunstan, J. E. Shaw, J. Salmon, P. Z. Zim-
met, and N. Owen, “Objectively measured sedentary time, physical activity, and
metabolic risk the australian diabetes, obesity and lifestyle study (ausdiab),” Di-
abetes care, vol. 31, no. 2, pp. 369–371, 2008.
[21] C. P. Wen and X. Wu, “Stressing harms of physical inactivity to promote exer-
cise,” The Lancet, vol. 380, no. 9838, pp. 192–193, 2012.
[22]
[23] C. Macera, D. Jones, M. Yore, S. Ham, H. Kohl, C. Kimsey Jr, D. Buchner et al.,
“Prevalence of physical activity, including lifestyle activities among adults-United
States, 2000-2001.” Morbidity and Mortality Weekly Report, vol. 52, no. 32, pp.
764–766, 2003.
[24] D. Spruijt-Metz, D. Berrigan, L. Kelly, R. McConnell, D. Dueker, G. Lindsey,
A. Atienza, S. Nguyen-Rodriguez, M. Irwin, J. Wolch, M. Jerrett, Z. Tatalovich,
and S. Redline, Handbook of Assessment Methods for Eating Behaviors and
Weight-Related Problems: Measures, Theory, and Research, 2nd ed. Sage,
2009, ch. 6, pp. 187–254.
[25] J. Speakman, “The history and theory of the doubly labeled water technique,”
American Journal of Clinical Nutrition, vol. 68, pp. 932S–938S, 1998.
[26] D. Schoeller and J. Hnilicka, “Reliability of the doubly labeled water method
for the measurement of total daily energy expenditure in free-living subjects.”
Journal of Nutrition, vol. 126(1), pp. 348S–354S, 1996.
[27] M. Elia and G. Livesey, “Energy expenditure and fuel selection in biological sys-
tems: the theory and practice of calculations based on indirect calorimetry and
tracer methods.” World Rev Nutr Diet, vol. 70, pp. 68–131, 1992.
[28] N. Zuntz, F. Müller, and A. Loewy, Höhenklima und Bergwanderungen in ihrer
Wirkung auf den Menschen. D. Verl. Bong & CO, 1906.
[29] D. Macfarlane, “Automated Metabolic Gas Analysis Systems: A Review,” Sports
Medicine, vol. 31, pp. 841–861, December 2001.
[30] T. Meyer, R. Davison, W. Kindermann et al., “Ambulatory gas exchange
measurements-current status and future options,” International Journal of Sports
Medicine, vol. 26, no. 1, p. 19, 2005.
[31] S. N. Blair, W. L. Haskell, P. Ho, R. S. Paffenbarger, K. M. Vranizan,
J. W. Farquhar, and P. D. Wood, “Assessment of habitual physical activity by
a seven-day recall in a community survey and controlled experiments,”
American Journal of Epidemiology, vol. 122, no. 5, pp. 794–804, 1985.
[32] A. S. Leon, J. Connett, D. R. Jacobs Jr, and R. Rauramaa, “Leisure-time physical
activity levels and risk of coronary heart disease and death,” JAMA: the journal
of the American Medical Association, vol. 258, no. 17, pp. 2388–2395, 1987.
[33] A. L. Stewart, C. J. Verboncoeur, B. Y. McLellan, D. E. Gillis, S. Rush, K. M.
Mills, A. C. King, P. Ritter, B. W. Brown, and W. M. Bortz, “Physical activity
outcomes of CHAMPS II: a physical activity promotion program for older adults,”
The Journals of Gerontology Series A: Biological Sciences and Medical Sciences,
vol. 56, no. 8, pp. M465–M470, 2001.
[34] R. E. Taylor-Piliae, L. C. Norton, W. L. Haskell, M. H. Mahbouda, J. M. Fair,
C. Iribarren, M. A. Hlatky, A. S. Go, and S. P. Fortmann, “Validation of a new
brief physical activity survey among men and women aged 60–69 years,” Ameri-
can journal of epidemiology, vol. 164, no. 6, pp. 598–606, 2006.
[35] J. Zuzanek, “Experience sampling method: Current and potential research appli-
cations,” in Workshop on Time-use Measurement and Research, Washington, DC,
1999.
[36] N. Bolger, A. Davis, and E. Rafaeli, “Diary methods: Capturing life as it is lived,”
Annual review of psychology, vol. 54, no. 1, pp. 579–616, 2003.
[37] S. S. Intille, “Technological innovations enabling automatic, context-sensitive
ecological momentary assessment,” The science of real-time data capture: Self-
reports in health research, pp. 308–337, 2007.
[38] B. E. Ainsworth, W. L. Haskell, M. C. Whitt, M. L. Irwin, A. M. Swartz, S. J.
Strath, W. L. O’Brien, D. R. Bassett, K. H. Schmitz, P. O. Emplaincourt et al.,
“Compendium of physical activities: an update of activity codes and MET
intensities,” Medicine and science in sports and exercise, vol. 32, no. 9; SUPP/1,
pp. 498–504, 2000.
[39] P. L. Schneider, S. E. Crouter, O. Lukajic, and D. R. Bassett, “Accuracy and
reliability of 10 pedometers for measuring steps over a 400-m walk,” Medicine
and Science in Sports and Exercise, vol. 35, no. 10, pp. 1779–1784, 2003.
[40] H. Kashiwazaki et al., “Heart rate monitoring as a field method for estimating
energy expenditure as evaluated by the doubly labeled water method.” Journal of
nutritional science and vitaminology, vol. 45, no. 1, p. 79, 1999.
[41] S. S. Intille, J. Lester, J. F. Sallis, and G. Duncan, “New horizons in sensor devel-
opment,” Medicine and science in sports and exercise, vol. 44, no. 1, 2012.
[42] G. A. Cavagna, H. Thys, and A. Zamboni, “The sources of external work in level
walking and running.” The Journal of physiology, vol. 262, no. 3, pp. 639–657,
1976.
[43] G. A. Cavagna, N. C. Heglund, and C. R. Taylor, “Mechanical work in terrestrial
locomotion: two basic mechanisms for minimizing energy expenditure,” Ameri-
can Journal of Physiology-Regulatory, Integrative and Comparative Physiology,
vol. 233, no. 5, pp. R243–R261, 1977.
[44] M. Garcia, A. Chatterjee, A. Ruina, M. Coleman et al., “The simplest walking
model: Stability, complexity, and scaling,” J BIOMECH ENG TRANS ASME,
vol. 120, no. 2, pp. 281–288, 1998.
[45] Y. Jang, M. Jung, J. Kang, and H. C. Kim, “An wearable energy expenditure
analysis system based on the 15-channel whole-body segment acceleration
measurement,” in Engineering in Medicine and Biology Society, 2005. IEEE-EMBS
2005. 27th Annual International Conference of the. IEEE, 2005, pp. 3834–3836.
[46] C. V. Bouten, K. R. Westerterp, M. Verduin, and J. D. Janssen, “Assessment of
energy expenditure for physical activity using a triaxial accelerometer.” Med Sci
Sports Exerc, vol. 26, no. 12, pp. 1516–1523, Dec 1994.
[47] M. J. Mathie, A. C. Coster, N. H. Lovell, and B. G. Celler, “Accelerometry:
providing an integrated, practical method for long-term, ambulatory monitoring
of human movement.” Physiological measurement, vol. 25, no. 2, April 2004.
[Online]. Available: http://view.ncbi.nlm.nih.gov/pubmed/15132305
[48] K. Y. Chen and M. Sun, “Improving energy expenditure estimation by using a
triaxial accelerometer.” J Appl Physiol, vol. 83, no. 6, pp. 2112–2122, 1997.
[49] P. S. Freedson, E. Melanson, and J. Sirard, “Calibration of the Computer
Science and Applications, Inc. accelerometer.” Medicine and science in sports
and exercise, vol. 30, no. 5, pp. 777–781, May 1998. [Online]. Available:
http://view.ncbi.nlm.nih.gov/pubmed/9588623
[50] D. Hendelman, K. Miller, C. Baggett, E. Debold, P. Freedson et al., “Validity of
accelerometry for the assessment of moderate intensity physical activity in the
field.” Medicine and science in sports and exercise, vol. 32, no. 9 Suppl, p. S442,
2000.
[51] J. Nichols, C. Morgan, L. Chabot, J. Sallis, K. Calfas et al., “Assessment of
physical activity with the Computer Science and Applications, Inc., accelerometer:
laboratory versus field validation.” Research quarterly for exercise and sport,
vol. 71, no. 1, p. 36, 2000.
[52] A. M. Swartz, S. J. Strath, D. R. Bassett Jr, W. L. O’Brien, G. A. King, B. E.
Ainsworth et al., “Estimation of energy expenditure using CSA accelerometers at
hip and wrist sites.” Medicine and science in sports and exercise, vol. 32, no. 9
Suppl, p. S450, 2000.
[53] N. Klippel and D. Heil, “Validation of energy expenditure prediction algorithms
in adults using the Actical electronic activity monitor,” Med Sci Sports Exerc,
vol. 35, 2003.
[54] J. J. Reilly, L. A. Kelly, C. Montgomery, D. M. Jackson, C. Slater, S. Grant,
and J. Y. Paton, “Validation of Actigraph accelerometer estimates of total energy
expenditure in young children.” Int J Pediatr Obes, vol. 1, no. 3, pp. 161–167,
2006.
[55] G. Plasqui and K. R. Westerterp, “Physical activity assessment with accelerome-
ters: An evaluation against doubly labeled water,” Obesity, vol. 15, no. 10, pp.
2371–2379, 2007. [Online]. Available: http://dx.doi.org/10.1038/oby.2007.281
[56] D. P. Heil, S. Brage, and M. P. Rothney, “Modeling physical activity outcomes
from wearable monitors,” Medicine and science in sports and exercise, vol. 44,
no. 1, 2012.
[57] R. P. Troiano, “Translating accelerometer counts into energy expenditure:
advancing the quest.” J Appl Physiol, vol. 100, no. 4, pp. 1107–1108, Apr 2006.
[Online]. Available: http://dx.doi.org/10.1152/japplphysiol.01577.2005
[58] D. P. Heil, “Predicting Activity Energy Expenditure Using the Actical Activity
Monitor,” Research Quarterly for Exercise and Sport, vol. 77, no. 1, p. 64, March
2006.
[59] A. Godfrey, R. Conway, D. Meagher, and G. ÓLaighin, “Direct measurement of
human movement by accelerometry,” Medical Engineering & Physics, vol. 30,
no. 10, pp. 1364–1386, 2008.
[60] S. E. Crouter, K. G. Clowers, and D. R. Bassett, “A novel method for using ac-
celerometer data to predict energy expenditure.” J Appl Physiol, vol. 100, no. 4,
pp. 1324–1331, Apr 2006.
[61] M. P. Rothney, “Advancing accelerometry-based physical activity monitors:
quantifying measurement error and improving energy expenditure prediction,”
Ph.D. dissertation, Vanderbilt University, 2007.
[62] D. R. Bassett Jr, A. Rowlands, and S. G. Trost, “Calibration and validation of
wearable monitors,” Med Sci Sports Exerc, vol. 44, no. 1 suppl, pp. S32–8, 2012.
[63] J. Lester, T. Choudhury, and G. Borriello, “A practical approach to recognizing
physical activities,” Pervasive Computing, pp. 1–16, 2006.
[64] K. Zhang, P. Werner, M. Sun, F. X. Pi-Sunyer, and C. N. Boozer, “Measurement
of human daily physical activity,” Obesity Research, vol. 11, no. 1, pp. 33–40,
2012.
[65] M. Tapia, “Using machine learning for real-time activity recognition and estima-
tion of energy expenditure,” Ph.D. dissertation, Massachusetts Institute of Tech-
nology. Dept. of Architecture. Program in Media Arts and Sciences., 2008.
[66] J. H. Choi, J. Lee, H. T. Hwang, J. P. Kim, J. C. Park, and K. Shin,
“Estimation of activity energy expenditure: accelerometer approach.” Conf Proc
IEEE Eng Med Biol Soc, vol. 4, pp. 3830–3833, 2005. [Online]. Available:
http://dx.doi.org/10.1109/IEMBS.2005.1615295
[67] M. Rothney, M. Neumann, A. Beziat, and K. Chen, “An artificial neural network
model of energy expenditure using nonintegrated acceleration signals,” J Appl
Physiol, vol. 103, no. 4, pp. 1419–27, October 2007.
[68] F. Albinali, S. Intille, W. Haskell, and M. Rosenberger, “Using wearable activity
type detection to improve physical activity energy expenditure estimation,” in
Proceedings of the 12th ACM international conference on Ubiquitous computing,
ser. Ubicomp ’10. New York, NY, USA: ACM, 2010, pp. 311–320. [Online].
Available: http://doi.acm.org/10.1145/1864349.1864396
[69] M. Altini, J. Penders, and O. Amft, “Energy expenditure estimation using wear-
able sensors: A new methodology for activity-specific models,” 2012.
[70] D. R. Bassett Jr, A. L. Cureton, B. E. Ainsworth et al., “Measurement of
daily walking distance-questionnaire versus pedometer.” Medicine and Science
in Sports and Exercise, vol. 32, no. 5, p. 1018, 2000.
[71] I.-M. Lee, D. M. Buchner et al., “The importance of walking to public health.”
Medicine and science in sports and exercise, vol. 40, no. 7 Suppl, p. S512, 2008.
[72] W. Knowler, E. Barrett-Connor, and S. Fowler, “Reduction in the incidence of
type 2 diabetes with lifestyle intervention or metformin,” New England Journal
of Medicine, vol. 346, pp. 393–403, 2002.
[73] E. McAuley, A. F. Kramer, and S. J. Colcombe, “Cardiovascular fitness and neu-
rocognitive function in older adults: a brief review,” Brain, behavior, and immu-
nity, vol. 18, no. 3, pp. 214–220, 2004.
[74] M. C. Robertson, A. J. Campbell, M. M. Gardner, and N. Devlin, “Preventing
injuries in older people by preventing falls: A meta-analysis of individual-level
data,” Journal of the American Geriatrics Society, vol. 50, no. 5, pp. 905–911,
2002.
[75] J. Levine, “Nonexercise activity thermogenesis – liberating the life-force,” Jour-
nal of Internal Medicine, vol. 262, pp. 273–287, 2007.
[76] J. A. Levine, S. K. McCrady, L. M. Lanningham-Foster, P. H. Kane, R. C. Foster,
and C. U. Manohar, “The role of free-living daily walking in human weight gain
and obesity,” Diabetes, vol. 57, no. 3, pp. 548–554, 2008.
[77] M. G. Berman, J. Jonides, and S. Kaplan, “The cognitive benefits of interacting
with nature,” Psychological Science, vol. 19, no. 12, pp. 1207–1212, 2008.
[78] F. Dimeo, M. Bauer, I. Varahram, G. Proest, and U. Halter, “Benefits from aerobic
exercise in patients with major depression: a pilot study,” British journal of sports
medicine, vol. 35, no. 2, pp. 114–117, 2001.
[79] C. Chang, R. Ansari, and A. Khokhar, “Efficient tracking of cyclic human motion
by component motion,” Signal Processing Letters, IEEE, vol. 11, no. 12, pp. 941–
944, 2004.
[80] A. B. Albu, R. Bergevin, and S. Quirion, “Generic temporal segmentation of
cyclic human motion,” Pattern Recognition, vol. 41, no. 1, pp. 6–21, 2008.
[81] C. R. Lee and C. T. Farley, “Determinants of the center of mass trajectory in
human walking and running.” Journal of Experimental Biology, vol. 201, no. 21,
pp. 2935–2944, 1998.
[82] D. A. Winter, Biomechanics and motor control of human movement. Wiley,
2009.
[83] R. Alexander, “Simple models of human movement,” Applied Mechanics Re-
views, vol. 48, p. 461, 1995.
[84] R. Blickhan, “The spring-mass model for running and hopping,” Journal of
biomechanics, vol. 22, no. 11, pp. 1217–1227, 1989.
[85] T. A. McMahon, “Spring-like properties of muscles and reflexes in running,” in
Multiple Muscle Systems. Springer, 1990, pp. 578–590.
[86] R. Blickhan and R. Full, “Similarity in multilegged locomotion: bouncing like
a monopode,” Journal of Comparative Physiology A: Neuroethology, Sensory,
Neural, and Behavioral Physiology, vol. 173, no. 5, pp. 509–517, 1993.
[87] C. T. Farley and O. Gonzalez, “Leg stiffness and stride frequency in human run-
ning,” Journal of biomechanics, vol. 29, no. 2, pp. 181–186, 1996.
[88] R. W. Levi and T. Judd, “Dead reckoning navigational system using accelerometer
to measure foot impacts,” Dec. 10 1996, US Patent 5,583,776.
[89] L. Rabiner and B. Juang, “An introduction to hidden markov models,” ASSP Mag-
azine, IEEE, vol. 3, no. 1, pp. 4–16, 1986.
[90] K. P. Murphy, “Dynamic bayesian networks,” Probabilistic Graphical Models,
M. Jordan, 2002.
[91] F. Foerster and J. Fahrenberg, “Motion pattern and posture: correctly assessed
by calibrated accelerometers,” Behavior Research Methods, vol. 32, no. 3, pp.
450–457, 2000.
[92] L. Bao and S. S. Intille, “Activity recognition from user-annotated acceleration
data,” Massachusetts Institute of Technology, pp. 1–17, April 2004. [Online].
Available: http://dx.doi.org/10.1007/b96922
[93] A. G. Bonomi, A. Goris, B. Yin, K. R. Westerterp et al., “Detection of type, du-
ration, and intensity of physical activity using an accelerometer,” Med Sci Sports
Exerc, vol. 41, no. 9, pp. 1770–1777, 2009.
[94] J. Mantyjarvi, M. Lindholm, E. Vildjiounaite, S.-M. Makela, and H. Ailisto,
“Identifying users of portable devices from gait pattern with accelerometers,”
in Acoustics, Speech, and Signal Processing, 2005. Proceedings.(ICASSP’05).
IEEE International Conference on, vol. 2. IEEE, 2005, pp. ii–973.
[95] D. M. Karantonis, M. R. Narayanan, M. Mathie, N. H. Lovell, and B. G. Celler,
“Implementation of a real-time human movement classifier using a triaxial ac-
celerometer for ambulatory monitoring,” Information Technology in Biomedicine,
IEEE Transactions on, vol. 10, no. 1, pp. 156–167, 2006.
[96] W.-Y. Chung, A. Purwar, and A. Sharma, “Frequency domain approach for activity
classification using accelerometer,” in Engineering in Medicine and Biology
Society, 2008. EMBS 2008. 30th Annual International Conference of the IEEE.
IEEE, 2008, pp. 1120–1123.
[97] M. Ermes, J. Parkka, J. Mantyjarvi, and I. Korhonen, “Detection of daily activi-
ties and sports with wearable sensors in controlled and uncontrolled conditions,”
Information Technology in Biomedicine, IEEE Transactions on, vol. 12, no. 1, pp.
20–26, 2008.
[98] Y. Cho, Y. Nam, Y.-J. Choi, and W.-D. Cho, “Smartbuckle: human activity
recognition using a 3-axis accelerometer and a wearable camera,” in Proceedings of the
2nd International Workshop on Systems and Networking Support for Health Care
and Assisted Living Environments. ACM, 2008, p. 7.
[99] S. J. Preece, J. Y. Goulermas, L. P. Kenney, and D. Howard, “A comparison of
feature extraction methods for the classification of dynamic activities from
accelerometer data,” Biomedical Engineering, IEEE Transactions on, vol. 56, no. 3,
pp. 871–879, 2009.
[100] Z. He and L. Jin, “Activity recognition from acceleration data based on discrete
cosine transform and SVM,” in Systems, Man and Cybernetics, 2009. SMC 2009.
IEEE International Conference on. IEEE, 2009, pp. 5041–5044.
[101] Y. Oshima, K. Kawaguchi, S. Tanaka, K. Ohkawara, Y. Hikihara, K. Ishikawa-Takata,
and I. Tabata, “Classifying household and locomotive activities using a
triaxial accelerometer,” Gait & posture, vol. 31, no. 3, pp. 370–374, 2010.
[102] F. Ichikawa, J. Chipchase, and R. Grignani, “Where’s the phone? a study of
mobile phone location in public spaces,” in Mobile Technology, Applications and
Systems, 2005 2nd International Conference on. IEEE, 2005, pp. 1–8.
[103] L. Sun, D. Zhang, B. Li, B. Guo, and S. Li, “Activity recognition on an accelerom-
eter embedded mobile phone with varying positions and orientations,” Ubiquitous
Intelligence and Computing, pp. 548–562, 2010.
[104] P. Welch, “The use of fast fourier transform for the estimation of power spectra: a
method based on time averaging over short, modified periodograms,” Audio and
Electroacoustics, IEEE Transactions on, vol. 15, no. 2, pp. 70–73, 1967.
[105] A. Papoulis, Signal analysis. McGraw-Hill, 1977.
[106] R. Bitmead and B. Anderson, “Adaptive frequency sampling filters,” Circuits and
Systems, IEEE Transactions on, vol. 28, no. 6, pp. 524–534, 1981.
[107] S. Albrecht, I. Cumming, and J. Dudas, “The momentary fourier transformation
derived from recursive matrix transformations,” in Digital Signal Processing Pro-
ceedings, 1997. DSP 97., 1997 13th International Conference on, vol. 1. IEEE,
1997, pp. 337–340.
[108] H. Vathsangam, B. A. Emken, E. T. Schroeder, D. Spruijt-Metz, and G. S.
Sukhatme, “Energy estimation of treadmill walking using on-body accelerom-
eters and gyroscopes,” in Engineering in Medicine and Biology Society (EMBC),
2010 Annual International Conference of the IEEE. IEEE, 2010, pp. 6497–6501.
[109] H. Vathsangam, A. Emken, E. T. Schroeder, D. Spruijt-Metz, and G. S. Sukhatme,
“Determining energy expenditure from treadmill walking using hip-worn inertial
sensors: An experimental study,” Biomedical Engineering, IEEE Transactions
on, vol. 58, no. 10, pp. 2804–2815, 2011.
[110] A. Godfrey, R. Conway, D. Meagher, and G. ÓLaighin, “Direct measurement of
human movement by accelerometry,” Medical engineering & physics, vol. 30,
no. 10, pp. 1364–1386, 2008.
[111] A. V. Rowlands, “Accelerometer assessment of physical activity in children: an
update.” Pediatr Exerc Sci, vol. 19, no. 3, pp. 252–266, Aug 2007.
[112] J.-C. Cheng and J. M. F. Moura, “Tracking human walking in dynamic scenes,” in
ICIP ’97: Proceedings of the 1997 International Conference on Image Process-
ing (ICIP ’97) 3-Volume Set-Volume 1. Washington, DC, USA: IEEE Computer
Society, 1997, p. 137.
[113] B. Zappa, G. Legnani, A. J. van den Bogert, and R. Adamini,
“On the number and placement of accelerometers for angular velocity
and acceleration determination,” pp. 552–554, 2001. [Online]. Available:
http://link.aip.org/link/?JDS/123/552/1
[114] K. Aminian and B. Najafi, “Capturing human motion using body-fixed
sensors: outdoor measurement and clinical applications,” Computer Animation
and Virtual Worlds, vol. 15, no. 2, pp. 79–94, 2004. [Online]. Available:
http://dx.doi.org/10.1002/cav.2
[115] H. J. Ralston, “Energy-speed relation and optimal speed during level walking,”
European Journal of Applied Physiology and Occupational Physiology, vol. 17,
no. 4, pp. 277–283, 1958.
[116] M. Zarrugh, F. Todd, and H. Ralston, “Optimization of energy expenditure dur-
ing level walking,” European Journal of Applied Physiology and Occupational
Physiology, vol. 33, no. 4, pp. 293–306, 1974.
[117] D. Grieve and J. Ruth, “The relationships between length of stride, step frequency,
time of swing and speed of walking for children and adults,” Ergonomics, vol. 9,
no. 5, pp. 379–399, 1966.
[118] G. E. Box and G. C. Tiao, “Bayesian inference in statistical analysis,” DTIC
Document, Tech. Rep., 1973.
[119] C. E. Rasmussen and C. K. Williams, Gaussian processes for machine learning.
MIT press Cambridge, MA, 2006, vol. 1.
[120] Sparkfun, “6 DoF v4 Datasheet,” 2009. [Online]. Available:
http://www.sparkfun.com/datasheets/Sensors/IMU/6DOF-v4-Rev1.pdf
[121] N. Butte, U. Ekelund, and K. Westerterp, “Assessing physical activity using wear-
able monitors: Measures of physical activity,” Medicine and Science in Sports
and Exercise, vol. 44(1 Suppl 1), pp. S5–12, 2012.
[122] C. Delbue, R. Passmore, J. Thomson, and J. A. Watt, “Variations in Energy Ex-
penditure during Walking,” Brit. J. soc. Med., vol. 3, pp. 139–142, 1949.
[123] H. Vathsangam, A. Emken, E. T. Schroeder, D. Spruijt-Metz, and G. S. Sukhatme,
“Towards a generalized regression model for on-body energy prediction from
treadmill walking,” in Pervasive Computing Technologies for Healthcare (Perva-
siveHealth), 2011 5th International Conference on. IEEE, 2011, pp. 168–175.
[124] H. Vathsangam, E. T. Schroeder, and G. S. Sukhatme, “On determining the
best physiological predictors of activity intensity using phone-based sensors,” in
Point-of-Care Healthcare Technologies (PHT), 2013 IEEE. IEEE, 2013, pp.
140–143.
[125] P. Weyand, B. Smith, M. Puyau, and N. Butte, “The mass-specific energy cost of
human walking is set by stature.” J Exp Biol, vol. 213 (Pt 23), pp. 3972–9, 2010.
[126] I. Zakeri, M. Puyau, A. Adolph, F. Vohra, and N. Butte, “Normalization of Energy
Expenditure Data for Differences in Body Mass or Composition in Children and
Adolescents,” J of Nutrition, vol. 136, pp. 1371–1376, 2006.
[127] A. Neville, R. Ramsbottom, and C. Williams, “Scaling physiological measure-
ments for individuals of different body size,” Eur J Appl Physiol Occup Physiol.,
vol. 65(2), pp. 110–7, 1992.
[128] D. Rogers, B. Olson, and J. Wilmore, “Scaling for the $\dot{V}O_2$-to-body size
relationship among children and adults,” J Appl. Physiol., vol. 79(3), pp. 958–967,
1995.
[129] M. Pearce, D. Cunningham, A. Donner, P. Rechnitzer, G. Fullerton, and
J. Howard, “Energy cost of treadmill and floor walking at self-selected paces,”
European Journal of Applied Physiology and Occupational Physiology, vol. 52,
pp. 115–119, 1983.
[130] R. L. Waters and S. Mulroy, “The energy expenditure of normal and pathologic
gait,” Gait & Posture, vol. 9, no. 3, pp. 207–231, 1999.
[131] C. Wyndham, W. van der Walt, A. van Rensburg, G. Rogers, and N. Strydom,
“The influence of body weight on energy expenditure during walking on a road
and on a treadmill,” European Journal of Applied Physiology and Occupational
Physiology, vol. 29, pp. 285–292, 1971.
[132] J. Shlens, “A Tutorial on Principal Component Analysis,” Systems Neurobiology
Laboratory, Salk Institute for Biological Studies, Tech. Rep., 2005. [Online].
Available: http://www.snl.salk.edu/~shlens/pub/notes/pca.pdf
[133] A. Gelman and J. Hill, Data Analysis Using Regression and Multi-
level/Hierarchical Models. Cambridge University Press, 2007.
[134] A. E. Gelfand, S. E. Hills, A. Racine-Poon, and A. F. M. Smith, “Illustration of
Bayesian inference in normal data models using Gibbs sampling.” Journal of the
American Statistical Association, vol. 85, pp. 972–985, 1990.
[135] ACSM, ACSM’s Guidelines for Exercise Testing and Prescription. American
College of Sports Medicine, 2010.
[136] RESL, “Movement Trackr Android Application.” [Online]. Available:
https://github.com/mobilesensing-usc/MovementTrackr
[137] B. E. Ainsworth, W. L. Haskell, S. D. Herrmann, N. Meckes, D. R. Bassett,
C. Tudor-Locke, J. L. Greer, J. Vezina, M. C. Whitt-Glover, and A. S. Leon,
“2011 compendium of physical activities: a second update of codes and MET
values,” Medicine and science in sports and exercise, vol. 43, no. 8, pp. 1575–1581,
2011.
[138] J. Weir, “New methods for calculating metabolic rate with special reference to
protein metabolism.” Journal of Physiology, vol. 109, pp. 1–9, 1949.
[139] [Online]. Available: https://code.google.com/p/achartengine/
[140] E. M. Berke, T. Choudhury, S. Ali, and M. Rabbi, “Objective measurement of
sociability and activity: mobile sensing in the community,” The Annals of Family
Medicine, vol. 9, no. 4, pp. 344–350, 2011.
[141] A. Thiagarajan, L. Ravindranath, H. Balakrishnan, S. Madden, L. Girod et al.,
“Accurate, low-energy trajectory mapping for mobile devices,” in Proceedings of
the 8th USENIX conference on Networked systems design and implementation.
USENIX Association, 2011, pp. 20–20.
[142] N. Ravi, N. Dandekar, P. Mysore, and M. L. Littman, “Activity recognition from
accelerometer data,” in Proceedings of the national conference on artificial intel-
ligence, vol. 20, no. 3. Menlo Park, CA; Cambridge, MA; London; AAAI Press;
MIT Press; 1999, 2005, p. 1541.
[143] E. Miluzzo, N. D. Lane, K. Fodor, R. Peterson, H. Lu, M. Musolesi, S. B. Eisenman,
X. Zheng, and A. T. Campbell, “Sensing meets mobile social networks: the
design, implementation and evaluation of the CenceMe application,” in Proceedings
of the 6th ACM conference on Embedded network sensor systems. ACM,
2008, pp. 337–350.
[144] I. Constandache, R. R. Choudhury, and I. Rhee, “Towards mobile phone localiza-
tion without war-driving,” in INFOCOM, 2010 Proceedings IEEE. IEEE, 2010,
pp. 1–9.
[145] H. Vathsangam, A. Tulsyan, and G. S. Sukhatme, “A data-driven movement
model for single cellphone-based indoor positioning,” in Body Sensor Networks
(BSN), 2011 International Conference on. IEEE, 2011, pp. 174–179.
[146] Y. Cui and S. S. Ge, “Autonomous vehicle positioning with GPS in urban canyon
environments,” Robotics and Automation, IEEE Transactions on, vol. 19, no. 1,
pp. 15–25, 2003.
[147] F. Ben Abdesslem, A. Phillips, and T. Henderson, “Less is more: energy-efficient
mobile sensing with SenseLess,” in Proceedings of the 1st ACM workshop on
Networking, systems, and applications for mobile handhelds. ACM, 2009, pp.
61–62.
[148] O. J. Woodman, “An introduction to inertial navigation,” University of Cambridge,
Computer Laboratory, Tech. Rep. UCAM-CL-TR-696, 2007.
[149] Y. Wang, J. Lin, M. Annavaram, Q. A. Jacobson, J. Hong, B. Krishnamachari,
and N. Sadeh, “A framework of energy efficient mobile sensing for automatic
user state recognition,” in Proceedings of the 7th international conference on
Mobile systems, applications, and services. ACM, 2009, pp. 179–192.
[150] J. Reinebold, H. Vathsangam, and G. Sukhatme, “Inactivity recognition: Separat-
ing moving phones from stationary users,” in PhoneSense 2011, 2011.
[151] L. Vicci, “Quaternions and rotations in 3-space: The algebra and its geometric
interpretation,” TR01-014, Dept. of Computer Science, University of North Car-
olina at Chapel Hill, 2001.
[152] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten,
“The WEKA data mining software: an update,” ACM SIGKDD Explorations
Newsletter, vol. 11, no. 1, pp. 10–18, 2009.
[153] J. A. Harris and F. G. Benedict, “A biometric study of human basal metabolism,”
Proceedings of the National Academy of Sciences of the United States of America,
vol. 4, no. 12, p. 370, 1918.
[154] M. D. Mifflin, S. St Jeor, L. A. Hill, B. J. Scott, S. A. Daugherty, and Y. Koh, “A
new predictive equation for resting energy expenditure in healthy individuals.”
The American journal of clinical nutrition, vol. 51, no. 2, pp. 241–247, 1990.
[155] O. E. Owen, E. Kavle, R. S. Owen, M. Polansky, S. Caprio, M. A. Mozzoli, Z. V.
Kendrick, M. Bushman, and G. Boden, “A reappraisal of caloric requirements in
healthy women.” The American journal of clinical nutrition, vol. 44, no. 1, pp.
1–19, 1986.
[156] O. E. Owen, J. L. Holup, D. A. D’Alessio, E. S. Craig, M. Polansky, K. J. Smalley,
E. C. Kavle, M. C. Bushman, L. R. Owen, and M. A. Mozzoli, “A reappraisal
of the caloric requirements of men.” The American journal of clinical nutrition,
vol. 46, no. 6, pp. 875–885, 1987.
[157] Joint FAO/WHO/UNU Expert Consultation, Energy and Protein Requirements:
Report of a Joint FAO/WHO/UNU Expert Consultation. World Health Organization,
1985, no. 724.
[158] D. Frankenfield, L. Roth-Yousey, C. Compher et al., “Comparison of predictive
equations for resting metabolic rate in healthy nonobese and obese adults: a sys-
tematic review.” Journal of the American Dietetic Association, vol. 105, no. 5, p.
775, 2005.
Appendix A
Data Collection
It is a capital mistake to theorize before one has data. Insensibly,
one begins to twist facts to suit theories, instead of theories to suit facts.
—Sherlock Holmes.
The results in this dissertation would not have been possible without two extensive
data collection studies. This section expands on the purpose, methodology
and protocol followed in collecting data. Over time, kinematic hardware was upgraded
to optimize for power consumption and ease-of-use. Informed consent was obtained
from each participant and all studies were approved by the Institutional Review Board,
University of Southern California.
A.1 Data Collection 1: Personal Energy Expenditure
This study focused on collecting data to examine in detail the relationship between
kinematic sensor features and energy expenditure for a single person. The principle was
to test various regression models under the assumption that infinite data would be
available per person.

Figure A.1: Illustration of hardware used to capture treadmill walking information.
(a) Sensor board used to collect data. (b) The controller board with RN-41 Bluetooth
module. (c) An example recording procedure for a single participant.
Acceleration information was collected with a Freescale MMA7260Q triple-axis
accelerometer. Rotational rates were collected with two Invensense IDG500 500°/s
gyroscopes mounted perpendicular to each other. The sensor hardware was modified to be
worn with a custom-designed harness on the right iliac crest. The yellow box indicates
sensor mounting. The red box indicates $\dot{V}O_2$ recording via the mask leading to
the metabolic unit. Original image source for (a) and (b): www.sparkfun.com
A.1.1 Participant Statistics
Seven healthy adults (three male, four female) participated in the study. Height and
weight of each participant were recorded using a Healthometer balance beam scale. The
participants had average age = 29 ± 6 years, average height = 1.67 ± 0.10 m, average
weight = 66 ± 17 kg.
A.1.2 Hardware Description
Human movement was captured with a modified version of the Sparkfun 6DoF Inertial
Measurement Unit (IMU) v4 [120]. Fig. A.1a illustrates the hardware used. The
v4 provides three axes of acceleration data, three axes of gyroscopic data, and three axes
of magnetic data with three sensors: a Freescale MMA7260Q triple-axis accelerometer,
set at 1.5 g sensitivity, and two Invensense IDG500 500°/s gyroscopes. At the time of
this study, the absence of triaxial gyroscopes required that two biaxial gyroscopes be
mounted perpendicular to each other and calibrated to function as one gyroscope.
Control was through an LPC2138 ARM7 processor. Custom firmware was used on the
controller board to stream sensor data continuously. Data were sampled at 100 Hz. The unit
used Bluetooth to transmit data to either a nearby PC or mobile phone using the RN-41
Bluetooth module set at 115200 bps. Maximum range of the transmitter was approximately
5 m in indoor conditions. The system was powered from a 3.3 V rechargeable
lithium-polymer battery power supply. The sensor was encased in a custom-designed
harness to be worn on the right iliac crest (participants were asked to wear the harness
tightly to prevent any slippage). The use of sensors in all three axes allowed the capture
of periodicity in all three planes: sagittal, frontal and transverse.
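
As a rough sanity check on the streaming configuration described above, the following sketch estimates how much of the 115200 bps serial link a 100 Hz stream would occupy. The sampling rate and baud rate come from the text. The frame layout (six channels of two bytes each plus four framing bytes) and the 8N1 serial framing are assumptions made purely for illustration, since the firmware's packet format is not described here.

```python
SAMPLE_RATE_HZ = 100       # stated in the text
BAUD_RATE_BPS = 115_200    # stated in the text

# Hypothetical frame layout; the actual firmware format is not documented here.
N_CHANNELS = 6             # assumed: 3 accelerometer + 3 gyroscope axes
BYTES_PER_CHANNEL = 2      # assumed: one 16-bit word per channel
FRAMING_BYTES = 4          # assumed: header, counter and checksum bytes
BITS_PER_SERIAL_BYTE = 10  # assumed 8N1: start bit + 8 data bits + stop bit


def frame_bytes(n_channels=N_CHANNELS):
    """Size in bytes of one transmitted sample frame under the assumed layout."""
    return n_channels * BYTES_PER_CHANNEL + FRAMING_BYTES


def link_utilization(sample_rate_hz=SAMPLE_RATE_HZ, baud_rate_bps=BAUD_RATE_BPS):
    """Fraction of the serial link consumed by the sample stream."""
    required_bps = sample_rate_hz * frame_bytes() * BITS_PER_SERIAL_BYTE
    return required_bps / baud_rate_bps


def max_sample_rate_hz(baud_rate_bps=BAUD_RATE_BPS):
    """Highest whole-number sample rate the link could carry for this frame size."""
    return baud_rate_bps // (frame_bytes() * BITS_PER_SERIAL_BYTE)


if __name__ == "__main__":
    print(f"frame size: {frame_bytes()} bytes")
    print(f"link utilization at {SAMPLE_RATE_HZ} Hz: {link_utilization():.1%}")
    print(f"highest sustainable rate: {max_sample_rate_hz()} Hz")
```

Under these assumed frame sizes the stream would use well under a fifth of the link, which suggests that the 100 Hz rate leaves ample headroom at 115200 bps.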
The treadmill used for the experiments was the research-quality NordicTrack A2550
PRO. Rate of oxygen consumption ($\dot{V}O_2$, ml/min) was used as the representation of
energy expenditure. This was measured using the MedGraphics Cardio II metabolic
system with BreezeSuite v6.1B (Medical Graphics Corporation). The metabolic system
outputs one measurement per breath. Before each test, the flow meter was
calibrated against a 3.0 L syringe and the system was calibrated against $\mathrm{O_2}$
and $\mathrm{CO_2}$ gases of known concentration. Fig. A.1c illustrates the recording
procedure.
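
Because the metabolic system reports one value per breath while the kinematic sensors sample at a fixed rate, the two streams typically need a common time base before they can be paired. One simple option, sketched below, is to average the breath-by-breath values inside fixed-length epochs. The 10-second epoch length, the function name and the sample values are assumptions for illustration only and are not the procedure used in this dissertation.

```python
from statistics import mean


def mean_vo2_per_epoch(breath_times_s, vo2_ml_min, epoch_s=10.0):
    """Average breath-by-breath VO2 readings inside fixed-length epochs.

    breath_times_s: time of each breath in seconds from the start of recording.
    vo2_ml_min:     VO2 reading (ml/min) reported for each breath.
    epoch_s:        epoch length in seconds (an assumed, illustrative value).
    Returns a dict mapping epoch index to the mean VO2 in that epoch.
    Epochs that contain no breath are simply absent from the result.
    """
    if len(breath_times_s) != len(vo2_ml_min):
        raise ValueError("times and readings must have the same length")
    buckets = {}
    for t, v in zip(breath_times_s, vo2_ml_min):
        buckets.setdefault(int(t // epoch_s), []).append(v)
    return {k: mean(vs) for k, vs in sorted(buckets.items())}


if __name__ == "__main__":
    # Made-up readings, used only to show the call pattern.
    times = [1.0, 4.5, 8.0, 12.0, 16.0]
    readings = [900.0, 1000.0, 1100.0, 1200.0, 1000.0]
    print(mean_vo2_per_epoch(times, readings))
```

Each epoch's mean can then be paired with features computed from the sensor samples that fall in the same time window.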
A.1.3 Protocol
Each participant walked at 11 predetermined speeds between 2.5 mph and 3.5 mph
in intervals of 0.1 mph. Speeds were chosen based on the Compendium of Physical
119
Activities [137]. The duration of walking data collected for each speed was 7 minutes
with two minutes of changeover time to allow for settling of $\dot{V}O_2$ consumption. For
each participant, data were recorded in two sessions with the first session consisting of
speeds 2.5 mph, 2.8 mph, 3 mph, 3.3 mph and 3.5 mph and the second session at the
remaining speeds.
A.2 Data Collection 2: Energy Expenditure across a large
population
This study focused on collecting data to closely examine how a map from movement
to energy expenditure could be generated with minimal data from a user. In contrast to
the previous dataset, this study focused on collecting limited amounts of data per person
but collecting more data across people. The set of sensor data was also expanded to
include heart rate information.
A.2.1 Participant Statistics
Data were collected on a total of 49 participants (29 male, 20 female). Figure A.2a de-
scribes participant statistics. Each participant’s height was measured with a wall-based
height chart and weight was measured with the EatSmart Precision Digital weighing
scale. Average height was 1.73 ± 0.07 m, average weight was 69.7 ± 7.5 kg, and average
BMI was 23.7 ± 3.8.
[Figure A.2 image, panel (a): scatter plot of Height (meters) against Weight (kgs) for Males and
Females, with Normal, Overweight and Obese regions marked.]
(a) Characteristics of study population plotted as height versus weight with normal and
overweight regions shown. Men are shown as triangles and women are shown as circles.
(b) Each participant walked on a treadmill for three speeds.
(c) Each participant walked on a track at a self-selected speed.
Figure A.2: Illustration of hardware, ground truth collection and population statistics.
Triaxial accelerations and rotational rates were recorded with a phone on the right iliac
crest. Energy expenditure was measured with the Oxycon™ portable metabolic unit.
Heart rate was measured with a Polar Heart Rate monitor that was time-synced to the
metabolic cart. GPS measurements were taken using a mobile phone.
A.2.2 Hardware Description
Each participant wore a Galaxy Nexus S phone running Android 2.3.3 on the right iliac
crest with a belt holder to record movement. Accelerometer data were captured with
a custom-built smart phone app - Movement Trackr as described in section A.3. The
app records triaxial accelerometer, triaxial gyroscope and GPS data (when available)
at a set sampling rate. For the purposes of this study, the accelerometer settings were
set at “Fastest” (50 Hz) and the gyroscope settings were set at “Game” (100 Hz).
All phone data were stored locally and synchronized in post-processing. A Butterworth
bandpass filter with 3dB cutoff between 0.75 Hz and 2.3 Hz was applied to the raw
accelerometer data. Energy expenditure was measured using the Oxycon™ Mobile
Metabolic unit from Carefusion. The unit was worn as a backpack fitted to the comfort
of the participant. The metabolic unit reports participant $\dot{V}O_2$ and $\dot{V}CO_2$ and derived
calorie data at the frequency of every breath. Heart rate data were also collected us-
ing the Polar Heart Rate Monitor and synced directly with the calorie counts from the
metabolic unit. Calories were estimated using the Weir equation [138].
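As a rough illustration of the calorie estimate, the sketch below applies the abbreviated Weir equation (the common form without the urinary nitrogen term) to one pair of gas-exchange readings. The function name, the ml/min input units and the sample values are assumptions made for illustration; the exact form applied by the metabolic unit is not stated here.

```python
def weir_kcal_per_min(vo2_ml_min, vco2_ml_min):
    """Abbreviated Weir equation: kcal/min from VO2 and VCO2 given in ml/min."""
    vo2_l = vo2_ml_min / 1000.0    # convert ml/min to L/min
    vco2_l = vco2_ml_min / 1000.0
    return 3.941 * vo2_l + 1.106 * vco2_l


# Hypothetical walking values: 1000 ml/min O2 uptake, 850 ml/min CO2 output.
print(weir_kcal_per_min(1000.0, 850.0))
```

Applying this per breath and averaging over a window would give a smoothed kcal/min trace comparable to the per-speed values discussed later.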
A.2.3 Protocol
Data collection was carried out in two sessions - indoor and outdoor. In the indoor
session, each participant was asked to sit still and meditate for five minutes while basal
metabolic rate and resting heart rate data were collected at the frequency of every breath.
Each participant then walked on a treadmill at three speeds - 2.5, 3.0 and 3.5 mph for
five minutes per speed with two minutes of settling time for each speed. In the outdoor
session, each participant was asked to walk on the university athletic track at a self-
selected speed for approximately 20 min.
A.2.4 Illustrations of energy expenditure across participants
The availability of these data allowed us to analyze variation in energy consumption
across people with activity intensity, in particular for walking. Figure A.3 illustrates the
trends in our population. In general, heart rate and energy expenditure increased with
speed of walk. However, for the same walking speed, women on average had a higher
heart rate and burned fewer calories than men. This could be attributed to the fact that the
women in the study were on average shorter and weighed less.
A.3 Movement Trackr Application
We created a custom Android application called “Movement Trackr” [136] to be able to
record kinematic data from a phone. Figure A.4 shows screen shots from the app. This
app was used on a Galaxy Nexus S running Android 2.3.3. The app is compatible with Android
versions 2.3.3 and above. The app is capable of recording triaxial accelerometer data,
triaxial gyroscope data and GPS data. The accelerometer and gyroscope can each
capture data at preset data rates of Fastest, Game, UI and Normal. GPS data were
recorded at the fastest rate possible, which was 1 Hz. Users have the option of free
recording or recording set to a timer.
The app also supports multiple profiles (stored on the phone) for many users. Within
each profile, a user can enter their name, age, gender, weight, height, leg length and race.
They can also set up a nickname and a profile photograph. Users also have the option
to display previously recorded data in the form of time series (for accelerometer and
gyroscope) or a map overlay (for GPS).
For a given user profile, users can also view accelerometer, gyroscope, orientation or
GPS data for each recording session. We used the achartengine API [139] to plot data.
[Figure A.3 image: panels (a) and (b) plot Energy (kcal/min) against Heart Rate (bpm), with (a)
grouped by Rest, 2.5 mph, 3.0 mph and 3.5 mph and (b) grouped by Males and Females; panels (c)
and (d) plot Energy (kcal/min) against Weight (kg).]
(a) Variation of heart rate versus energy expenditure. Heart rate and energy expenditure
increase with speed.
(b) Variation of heart rate versus energy expenditure. For the same walking speed, women on
average had a higher heart rate and burned fewer calories than men.
(c) At rest, weight and energy were poorly correlated (Corr. Coefficient = 0.31844).
(d) With higher activity, energy and weight showed a high correlation (Corr. Coefficient =
0.78956).
Figure A.3: Split of population showing the variation of energy expenditure with different
activities
(a) App Home Screen (b) Profile screen
(c) Settings screen (d) Plot screen
Figure A.4: Screen shots of Movement Trackr Android Application
Appendix B
Derivations of Learning Algorithms
Hobbes: Ooh, that’s a tricky one. You have to use
calculus and imaginary numbers for this.
Calvin: IMAGINARY NUMBERS ?!
Hobbes: You know, eleventeen, thirty-twelve, and
all those. It’s a little confusing at first.
—Calvin and Hobbes.
B.1 Least-Squared Regression
The optimal solution in a least-squared regression sense can be obtained by maximizing
the likelihood of the data. This is because the function has a global maximum with
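respect to $w_p$.
The log-likelihood for least-squared regression is:
$$L = \frac{N_p}{2}\log\beta_p - \frac{\beta_p}{2}\,\|Y_p - X_p w_p\|^2$$
The parameters to be estimated are $w_p$ and $\beta_p$. Differentiating w.r.t. $\beta_p$, we get:
$$\frac{\partial L}{\partial \beta_p} = 0
\implies 0 = \frac{N_p}{2\beta_p} - \frac{1}{2}\,\|Y_p - X_p w_p\|^2
\implies \frac{1}{\beta_p} = \frac{1}{N_p}\,\|Y_p - X_p w_p\|^2$$
Differentiating w.r.t. $w_p$, we get:
$$\frac{\partial L}{\partial w_p} = 0
\implies 0 = (Y_p - X_p w_p)^T X_p
\implies w_p = \left(X_p^T X_p\right)^{-1} X_p^T Y_p$$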
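The closed-form estimates above can be checked numerically. The following is a minimal sketch in plain Python, assuming a design matrix given as a list of rows with a leading bias column; the helper names (`transpose`, `matmul`, `solve`, `least_squares`) and the toy data are illustrative and do not come from the dissertation.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a z = b for a square matrix a and vector b (Gauss-Jordan with pivoting)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def least_squares(X, Y):
    """Return (w, beta): w from the normal equations, beta = N / ||Y - Xw||^2."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    XtY = [sum(x * y for x, y in zip(col, Y)) for col in Xt]
    w = solve(XtX, XtY)
    residual = sum((y - sum(wi * xi for wi, xi in zip(w, row))) ** 2
                   for row, y in zip(X, Y))
    beta = len(Y) / residual if residual > 0 else float("inf")
    return w, beta


# Toy data: each row is [bias, movement descriptor]; Y holds energy values.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
Y = [1.0, 3.1, 4.9, 7.0]
w, beta = least_squares(X, Y)
print(w, beta)
```

Solving the linear system directly, rather than forming the inverse of $X_p^T X_p$, is a common numerical choice and gives the same $w_p$.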
B.2 Bayesian Linear Regression
We have the following model definition:
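$$p(Y_p \mid w_p, \beta_p) = \mathcal{N}\left(Y_p;\; X_p w_p,\; \tfrac{1}{\beta_p} I\right)$$
$$p(w_p \mid \alpha_p) = \mathcal{N}\left(w_p;\; 0,\; \tfrac{1}{\alpha_p} I\right)$$
$$p(Y_p \mid \alpha_p, \beta_p) = \int p(Y_p \mid w_p, \beta_p)\, p(w_p \mid \alpha_p)\, dw_p$$
Under this model, using Bayes’ rule and properties of Gaussians, we can estimate the
posterior of $w_p$ as:
$$p(w_p \mid Y_p, \alpha_p, \beta_p) = \mathcal{N}(w_p;\; \mu_p,\; S_p)$$
$$\mu_p = \beta_p S_p X_p^T Y_p$$
$$S_p = \left(\beta_p X_p^T X_p + \alpha_p I\right)^{-1}$$
The marginal likelihood function is:
$$L_{\text{marginal}} = \log p(Y_p \mid \alpha_p, \beta_p)
= \frac{M}{2}\log\alpha_p + \frac{N_p}{2}\log\beta_p
- \frac{\beta_p}{2}\,\|Y_p - X_p w_p\|^2 - \frac{\alpha_p}{2}\,\|w_p\|^2$$
Unlike the least-squared case, the maximum likelihood method to estimate model
parameters will not work because of the appearance of cross terms in the derivatives.
Thus we resort to the EM algorithm. We consider the logarithm of the complete likelihood:
$$L = \log p(Y_p, w_p \mid \alpha_p, \beta_p)
= \frac{N_p}{2}\log\beta_p + \frac{M}{2}\log\alpha_p
- \left(\frac{\beta_p}{2}\,\|Y_p - X_p w_p\|^2 + \frac{\alpha_p}{2}\,\|w_p\|^2\right) + \text{const}$$
$$\langle L \rangle_{p(w_p \mid \alpha_p, \beta_p)}
= \frac{N_p}{2}\log\beta_p + \frac{M}{2}\log\alpha_p
- \left(\frac{\beta_p}{2}\,\|Y_p - X_p \mu_p\|^2 + \frac{\alpha_p}{2}\,\|\mu_p\|^2\right)
- \frac{\beta_p}{2}\,\mathrm{tr}\left[X_p^T S_p X_p\right] - \frac{\alpha_p}{2}\,\mathrm{tr}\left[S_p\right]$$
We derive the EM algorithm for this. The parameters to be estimated are $\alpha_p$ and $\beta_p$. The
hidden variable to be estimated is $w_p$. We estimate the parameters in the M-step.
$$\frac{\partial L}{\partial \alpha_p} = 0
\implies 0 = \frac{M}{2\alpha_p} - \frac{1}{2}\,\|\mu_p\|^2 - \frac{1}{2}\,\mathrm{tr}\left[S_p\right]
\implies \alpha_p = \frac{M}{\|\mu_p\|^2 + \mathrm{tr}\left[S_p\right]}$$
$$\frac{\partial L}{\partial \beta_p} = 0
\implies 0 = \frac{N_p}{2\beta_p} - \frac{1}{2}\,\|Y_p - X_p \mu_p\|^2 - \frac{1}{2}\,\mathrm{tr}\left[X_p^T S_p X_p\right]
\implies \beta_p = \frac{N_p}{\|Y_p - X_p \mu_p\|^2 + \mathrm{tr}\left[X_p^T S_p X_p\right]}$$
In the E-step, we estimate the posterior of $w_p$ given $\alpha_p$, $\beta_p$:
$$\mu_p = \beta_p S_p X_p^T Y_p$$
$$S_p = \left(\beta_p X_p^T X_p + \alpha_p I\right)^{-1}$$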
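To make the alternation between the two steps concrete, here is a minimal sketch of the EM updates for the simplest case of a single weight ($M = 1$, no bias term), where $S_p$ reduces to a scalar and the trace term becomes $S_p \sum x^2$. The function name, the initial values, the fixed iteration count and the toy data are all assumptions for illustration, not the implementation used in this work.

```python
def em_bayesian_regression_1d(x, y, alpha=1.0, beta=1.0, iterations=50):
    """EM for y = w * x + noise with a single weight (M = 1).

    E-step: posterior N(w; mu, s) with s = 1 / (beta * sum(x^2) + alpha)
            and mu = beta * s * sum(x * y).
    M-step: alpha = M / (mu^2 + s)
            beta  = N / (sum((y - x * mu)^2) + s * sum(x^2)).
    """
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    n = len(x)
    mu, s = 0.0, 1.0
    for _ in range(iterations):
        # E-step: posterior variance and mean of the weight.
        s = 1.0 / (beta * sxx + alpha)
        mu = beta * s * sxy
        # M-step: re-estimate the two precision parameters.
        residual = sum((yi - xi * mu) ** 2 for xi, yi in zip(x, y))
        alpha = 1.0 / (mu * mu + s)
        beta = n / (residual + s * sxx)
    return mu, s, alpha, beta


# Toy data, roughly y = 2x.
mu, s, alpha, beta = em_bayesian_regression_1d([1.0, 2.0, 3.0, 4.0],
                                               [2.1, 3.9, 6.2, 7.8])
print(mu, s, alpha, beta)
```

Because of the zero-mean prior, the posterior mean is shrunk slightly towards zero relative to the least-squares slope; the amount of shrinkage is set by the learned ratio of $\alpha_p$ to $\beta_p$.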
B.3 Hierarchical Linear Modeling
We have the following model definition:
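$$p(Y_p \mid w_p, \beta_p) = \mathcal{N}\left(Y_p;\; X_p w_p,\; \tfrac{1}{\beta_p} I\right)$$
$$p(w_p \mid \alpha_p, k) = \prod_{m=1}^{M} \mathcal{N}\left(w_{p,m};\; \mathrm{Phys}_p^T k_m,\; \tfrac{1}{\alpha_p} I\right)$$
$$p(Y_p \mid \alpha_p, \beta_p) = \int p(Y_p \mid w_p, \beta_p)\, p(w_p \mid \alpha_p, k)\, dw_p$$
The marginal likelihood function is:
$$L_{\text{marginal}} = \log p(Y_p \mid \alpha_p, \beta_p)
= \sum_{p=1}^{P} \left( \frac{M}{2}\log\alpha_p + \frac{N_p}{2}\log\beta_p
- \frac{\beta_p}{2}\,\|Y_p - X_p w_p\|^2
- \frac{\alpha_p}{2} \sum_{m=1}^{M} \left(w_{p,m} - \mathrm{Phys}_p^T k_m\right)^2 \right)$$
We consider the logarithm of the complete likelihood:
$$L = \log \prod_{p=1}^{P} p(Y_p, w_p \mid k, \alpha_p, \beta_p)
= \sum_{p=1}^{P} \left( \frac{N_p}{2}\log\beta_p + \frac{M}{2}\log\alpha_p
- \left( \frac{\beta_p}{2}\,\|Y_p - X_p w_p\|^2
+ \frac{\alpha_p}{2} \sum_{m=1}^{M} \left(w_{p,m} - \mathrm{Phys}_p^T k_m\right)^2 \right) + \text{const} \right)$$
$$\begin{aligned}
\langle L \rangle_{p(w_p \mid \alpha_p, \beta_p),\; p = 1, 2, \ldots, P}
&= \sum_{p=1}^{P} \frac{N_p}{2}\log\beta_p + \sum_{p=1}^{P} \frac{M}{2}\log\alpha_p
 - \sum_{p=1}^{P} \frac{\beta_p}{2} \left\langle \|Y_p - X_p w_p\|^2 \right\rangle
 - \sum_{p=1}^{P} \frac{\alpha_p}{2} \left\langle \sum_{m=1}^{M} \left(w_{p,m} - \mathrm{Phys}_p^T k_m\right)^2 \right\rangle \\
&= \sum_{p=1}^{P} \frac{N_p}{2}\log\beta_p + \frac{PM}{2}\log\alpha_p
 - \sum_{p=1}^{P} \frac{\beta_p}{2} \left( \|Y_p - X_p \mu_p\|^2 + \mathrm{Tr}\left[X_p^T S_p X_p\right] \right) \\
&\quad - \sum_{p=1}^{P} \frac{\alpha_p}{2} \left( \sum_{m=1}^{M} \left(\mu_{p,m} - \mathrm{Phys}_p^T k_m\right)^2 + \mathrm{Tr}\left(S_p\right) \right)
\end{aligned}$$
We derive the EM algorithm for this. The parameters to be estimated are $\alpha_p$ and $\beta_p$,
$p = 1, 2, \ldots, P$. The hidden variable to be estimated is $w_p$, $p = 1, 2, \ldots, P$. We estimate
the parameters in the M-step.
$$\frac{\partial L}{\partial \alpha_p} = 0
\implies 0 = \frac{MP}{2\alpha_p} - \frac{1}{2} \sum_{m=1}^{M} \left(\mu_{p,m} - \mathrm{Phys}_p^T k_m\right)^2 - \frac{1}{2}\,\mathrm{tr}\left[S_p\right]
\implies \alpha_p = \frac{MP}{\sum_{m=1}^{M} \left(\mu_{p,m} - \mathrm{Phys}_p^T k_m\right)^2 + \mathrm{tr}\left[S_p\right]}$$
$$\frac{\partial L}{\partial \beta_p} = 0
\implies 0 = \frac{N_p}{2\beta_p} - \frac{1}{2}\,\|Y_p - X_p \mu_p\|^2 - \frac{1}{2}\,\mathrm{Tr}\left[X_p^T S_p X_p\right]
\implies \beta_p = \frac{N_p}{\|Y_p - X_p \mu_p\|^2 + \mathrm{tr}\left[X_p^T S_p X_p\right]}$$
$$\frac{\partial L}{\partial k_m} = 0
\implies \mu_{p,m} = \mathrm{Phys}_p^T k_m \quad \forall p, \text{ for each } m
\implies k_m = \left(\mathrm{Phys}_p\, \mathrm{Phys}_p^T\right)^{-1} \mathrm{Phys}_p\, \mu_m$$
We see that the estimated value for $\beta_p$ remains the same. However, the estimated value
for $\alpha_p$ has a modification to incorporate top-down dependence of terms.
In the E-step, we estimate the posterior of each $w_p$ given the parameters $\alpha_p$, $\beta_p$, $k$.
We denote $\alpha = \{\alpha_p\}_{p=1}^{P}$ and $\beta = \{\beta_p\}_{p=1}^{P}$.
$$\begin{aligned}
p(w_p \mid Y_p, \alpha, \beta, k, \mathrm{Phys}_p)
&= \frac{p(Y_p \mid w_p, \alpha, \beta, k, \mathrm{Phys}_p)\; p(w_p \mid \alpha, \beta, k, \mathrm{Phys}_p)}{p(Y_p \mid \alpha_p, \beta_p, k, \mathrm{Phys}_p)} \\
&= \frac{p(Y_p \mid w_p, \beta_p)\; p(w_p \mid \alpha, k, \mathrm{Phys}_p)}{p(Y_p \mid \alpha_p, \beta_p, k, \mathrm{Phys}_p)}
 \quad \text{(using conditional independence)} \\
&= \frac{\mathcal{N}\left(Y_p;\; X_p w_p,\; \tfrac{1}{\beta_p} I\right) \prod_{m=1}^{M} \mathcal{N}\left(w_{p,m};\; \mathrm{Phys}_p^T k_m,\; \tfrac{1}{\alpha_p} I\right)}{p(Y_p \mid \alpha_p, \beta_p, k, \mathrm{Phys}_p)}
\end{aligned}$$
We substitute the Gaussian values and use the method of completion of squares to obtain
the resultant posterior distribution:
$$\mu_p = S_p \left(\beta_p X_p^T Y_p + \alpha_p m_p\right)$$
$$S_p = \left(\beta_p X_p^T X_p + \alpha_p I\right)^{-1}$$
$$m_{p,m} = \mathrm{Phys}_p^T k_m$$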
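The completion-of-squares step can be written out explicitly. The sketch below treats $w_p$ as a column vector, collects the prior means $m_{p,m} = \mathrm{Phys}_p^T k_m$ into a vector $m_p$, and drops every term that does not depend on $w_p$; it is a standard Gaussian identity offered as a reading aid, under those assumptions.

```latex
\begin{aligned}
\log p(w_p \mid Y_p, \alpha_p, \beta_p, k)
  &= -\frac{\beta_p}{2}\,\|Y_p - X_p w_p\|^2
     -\frac{\alpha_p}{2}\,\|w_p - m_p\|^2 + \text{const} \\
  &= -\frac{1}{2}\, w_p^T \left(\beta_p X_p^T X_p + \alpha_p I\right) w_p
     + w_p^T \left(\beta_p X_p^T Y_p + \alpha_p m_p\right) + \text{const} \\
  &= -\frac{1}{2}\,(w_p - \mu_p)^T S_p^{-1} (w_p - \mu_p) + \text{const},
\end{aligned}
\qquad
S_p^{-1} = \beta_p X_p^T X_p + \alpha_p I, \quad
\mu_p = S_p \left(\beta_p X_p^T Y_p + \alpha_p m_p\right).
```

Setting $m_p = 0$ recovers the posterior of the Bayesian linear regression model in Section B.2.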
Appendix C
Inactivity Recognition
A related topic to measuring energy expenditure is the detection of rest. Detecting
rest in free-living settings is a key component in monitoring sedentary lifestyles and
health [140]. It also has widespread applications in indoor localization [141]. We define
"at rest" to be when the user is standing (or sitting) in a fixed position with respect to the
world (specifically for this study taken to be not moving more than a meter in a three
second window). It is important to note that even when the user is at rest they may be
actively using the phone: interacting with it for applications or switching its position
relative to their body. An algorithm that claims to accurately detect rest must be robust
to these variations.
Work in rest detection has fallen under the broader field of activity recognition -
using mobile sensors and machine-learning techniques to recognize aspects of human
motion. Previous studies have enabled highly accurate classification of divergent activ-
ities such as folding laundry or running a vacuum cleaner [92, 142, 63, 143]. However,
unlike our approach, these studies assume a constant (or at least known) location for the
mobile phone in relation to the body of the user during the course of the experiments.
Similarly, work has been done on localization using sensors typically available on most
phones (GPS, Wi-Fi networks, GSM) [144, 145]. These systems rely, at least partially,
on signals transmitted over radio waves from known fixed-point locations. However,
relying on the constancy of these signals can be problematic. For example, GPS con-
nection can be lost inside buildings or in the "urban canyons" of major cities [146]. GPS
and Wi-Fi are also more power-hungry than other, internal sensors [147]. To avoid these
pitfalls, our approach uses only on-phone kinematic sensors.
Although in theory localization could be accomplished by integrating information
from accelerometers over time given an initial starting position ("dead reckoning"), in
practice sensor noise corrupts the calculations and such methods are inaccurate given a
significant length of time [148]. However, solving the simpler problem of determining
whether or not the user holding a phone is moving may be valuable information in its
own right. If this could be determined with a high degree of accuracy without knowing
in advance where the phone is stored on a person’s body, it could assist other, more
complicated, localization schemes.
Thiagarajan et al. [141] tried similar strategies using acceleration to detect move-
ment in cars but relied on preset thresholds and assumed a constant location for the
mobile device. Similarly, Wang et al. [149] detected movement but did not integrate
gyroscope data and relied on empirical thresholds. Data driven techniques allow us to
operate in non-linear spaces thus permitting flexibility in threshold design.
Our approach to rest detection treats rest as a binary classification problem in the
presence of non-rest data. Our study applies the established pattern developed for ac-
tivity recognition: sampling sensor hardware, extracting features, and using statistical
machine learning algorithms to classify unknown data points [92]. However, we expand
on these methods with two main contributions: our studies examine in detail which fea-
tures are most relevant for rest detection and show how correcting for phone orientation
improves accuracy. Our techniques do not require a fixed location or orientation of the
phone on the user’s person. We also pay particular attention to the problem of accurately
differentiating rest from the activity of walking (we choose to focus on walking as it is
the most typical form of human movement in office environments).
In Section C.1 we will cover the design, noting the features used (Section C.1.2) and
how we transform the sensor values reported by the phone into a global frame of reference to
provide useful training data for the machine learning algorithms. This work is based on
earlier research on inactivity recognition [150].
C.1 Design
To classify rest, a systematic way of sampling the kinematic sensors, extracting relevant
features, and applying these features as inputs to machine learning algorithms is needed.
Varying which features are trained on, whether or not rotational correction is performed,
and what machine learning algorithms are used can all affect the final performance of
the system. Our paper tests along these axes.
C.1.1 Hardware Sensors
For this experiment we used a standard Nexus-S phone equipped with the Android op-
erating system. The custom designed MovementTrackr App [136] was used to record
data from two kinds of kinematic sensors: accelerometers reporting triaxial accelerations
(m/s²) and triaxial rotational gyroscopes reporting angular speeds (rad/s). All
sensors recorded these values to text files at the fastest possible sampling frequency permitted
by the Android Sensor API (approximately 35 Hz for the accelerometers and 800
Hz for the gyroscopes).
C.1.2 Feature Extraction
Feature extraction replaces raw, potentially noisy data with statistically meaningful ag-
gregations across time intervals. In our study, training features were extracted from
the text logs across a sliding window of size three seconds with a one second overlap
between consecutive windows. We used the following 16 features to describe phone
movement:
Accelerometer Power (as in [63])
Accelerometer Means (along X, Y , Z axes)
Accelerometer Variances (along X, Y , Z axes)
Gyroscope Means (along X, Y , Z axes)
Gyroscope Variances (along X, Y , Z axes)
Covariance between acceleration and gyroscope rotation rates (along X, Y , Z axes)
These features were chosen for their ease of implementation and the fact that they can be
computed in O(n) time and have been used before with success for activity recognition
[92, 142]. The features were further grouped as: accelerometer power only (referred
to as “Power Only”), accelerometer power and covariance between acceleration and
rotation rates (referred to as "Partial"), and the entire set of sixteen (referred to as “Full”).
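The windowing and the sixteen features can be sketched as follows. This is an illustration under stated assumptions rather than the extraction code used in the study: the accelerometer and gyroscope streams are assumed to have been resampled to a common rate, "power" is assumed to mean the mean squared acceleration magnitude (the definition in [63] may differ), and the function names are hypothetical.

```python
from statistics import fmean, pvariance


def covariance(a, b):
    ma, mb = fmean(a), fmean(b)
    return fmean([(x - ma) * (y - mb) for x, y in zip(a, b)])


def window_features(acc, gyro):
    """acc, gyro: equal-length lists of (x, y, z) samples for one window.

    Returns 16 values: power, acc means (3), acc variances (3),
    gyro means (3), gyro variances (3), per-axis acc/gyro covariance (3).
    """
    acc_axes = list(zip(*acc))
    gyro_axes = list(zip(*gyro))
    # Assumed definition of power: mean squared acceleration magnitude.
    feats = [fmean([x * x + y * y + z * z for x, y, z in acc])]
    feats += [fmean(ax) for ax in acc_axes]
    feats += [pvariance(ax) for ax in acc_axes]
    feats += [fmean(ax) for ax in gyro_axes]
    feats += [pvariance(ax) for ax in gyro_axes]
    feats += [covariance(a, g) for a, g in zip(acc_axes, gyro_axes)]
    return feats


def sliding_windows(n_samples, rate_hz, window_s=3.0, overlap_s=1.0):
    """Return (start, end) sample indices of windows with the given overlap."""
    size = int(window_s * rate_hz)
    step = int((window_s - overlap_s) * rate_hz)
    return [(s, s + size) for s in range(0, n_samples - size + 1, step)]


# Toy usage: 10 s of a phone lying still, sampled at 10 Hz.
acc = [(0.0, 0.0, 9.8)] * 100
gyro = [(0.1, 0.0, 0.0)] * 100
rows = [window_features(acc[a:b], gyro[a:b]) for a, b in sliding_windows(100, 10)]
print(len(rows), len(rows[0]))
```

Each feature is a single pass over the window, which matches the O(n) cost noted above.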
C.1.3 Sensor Data Coordinate Transformation
A distinguishing aspect in our approach is the use of “world rotated features” to describe
and characterize rest. Conventional activity recognition algorithms use sensor readings
that are normally measured in the local coordinate system of the phone [92, 142, 141].
The Android API returns orientation information about the phone in the form of
a quaternion $Q(X, Y, Z, \theta) = [\cos(\theta/2),\ X\sin(\theta/2),\ Y\sin(\theta/2),\ Z\sin(\theta/2)]$,
where X, Y and Z are the direction cosines of the axis of rotation and $\theta$ specifies an
angle of rotation about that axis. Knowing the orientation quaternion allows us to rotate
the triaxial accelerometer and gyroscope sensor streams from the local coordinate frame
of the phone to a global coordinate frame [151]. At this point, sensor readings are
said to be “corrected” for phone orientation. Using a global coordinate frame ensures
that sensor readings corresponding to a particular axis remain so irrespective of phone
orientation. As such, local repetitive movements (as with circular motions of the device)
can be distinguished from movements associated with location changes.
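The rotation into the global frame can be sketched with a few lines of quaternion arithmetic. The code below assumes a unit quaternion in (w, x, y, z) order that maps phone coordinates to world coordinates; if the API's convention is the inverse, the conjugate would be used instead. The function names are illustrative and not part of the Android API.

```python
import math


def axis_angle_quaternion(x, y, z, theta):
    """Unit quaternion (w, x, y, z) for a rotation of theta about unit axis (x, y, z)."""
    half = theta / 2.0
    s = math.sin(half)
    return (math.cos(half), x * s, y * s, z * s)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rotate(q, v):
    """Rotate 3-vector v by unit quaternion q, i.e. compute q v q*."""
    w, u = q[0], q[1:]
    t = tuple(2.0 * c for c in cross(u, v))
    ut = cross(u, t)
    return tuple(v[i] + w * t[i] + ut[i] for i in range(3))


# A 90 degree rotation about the z axis maps the local x axis onto the y axis.
q = axis_angle_quaternion(0.0, 0.0, 1.0, math.pi / 2)
world = rotate(q, (1.0, 0.0, 0.0))
print(world)
```

Applying `rotate` to every accelerometer and gyroscope sample, using the orientation reported closest in time, yields the "corrected" streams from which the features are computed.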
C.2 Experiment Setup
Data collection was divided into three kinds of trials, grouped by type of movement:
constant movement, constant stationary behavior, and a mixture of both movement and
stationary behavior. The median age of the eight test participants was 27 years with a
standard deviation of 6. Participants had a median body weight of 70.40 kg (standard
deviation of 11.77) and a median height of 1.79 meters (standard deviation of 0.09).
All test participants responded that they regularly used mobile phones (although not
necessarily Android devices).
C.2.1 Trial 1: Constant movement
The aim of this trial was to test the accuracy of our algorithm in scenarios where the
user is always moving. Test subjects were given the phone and told to walk around the
USC campus for five minutes. The subjects were not given explicit instructions on how
to carry the phone (during the experiment we observed some subjects holding the phone
in their hand and others who kept the phone in a pocket). Ground truth was taken to be
the moving state for all data in this trial.
C.2.2 Trial 2: Constant stationary behavior
The aim of this trial was to test the accuracy of our algorithm in situations where the
user is always at rest. Test subjects were given the phone and told to not move outside
of a one-meter radius for five minutes. While inside the circle, they were instructed
to complete various tasks that involved small movements of the phone. Example tasks
included using the phone’s calculator App to solve a simple math expression, standing
up, reorienting the phone towards an object in the room to take a picture, and putting the
phone inside (and later removing it from) a drawer. The presence of these tasks ensured
that the users would not keep the phone still for the duration of the experiment and
produced motions similar to those encountered while using mobile phones for gaming
or office work. Ground truth was taken to be the stationary state for all data in this trial.
C.2.3 Trial 3: Mixture of behaviors
The aim of this trial was to test the accuracy of our algorithm in situations involving
a mixture of activities typical of daily lifestyles. Test subjects were given the phone
and told to complete a list of tasks in five minutes. In this trial, the tasks involved
both walking small distances (down a hallway and back) and using the phone to answer
questions on the survey as in the second trial. Video recordings made of the test subjects
during the trial were used to annotate the ground truth of the data collected as belonging
to either the stationary (at rest) or moving (walking) sets. For sliding windows that
spanned both classifications (i.e. took place during transitions between the two states),
a majority vote of readings taken was used to label the data.
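The majority-vote labeling of a window can be expressed in a couple of lines. This sketch assumes each sensor reading already carries a per-sample annotation from the video; how ties were broken is not stated, so here the label seen first wins.

```python
from collections import Counter


def label_window(sample_labels):
    """Return the most frequent per-sample annotation within one window."""
    return Counter(sample_labels).most_common(1)[0][0]


# A window that starts at rest and then transitions to walking.
print(label_window(["stationary"] * 2 + ["moving"] * 5))
```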
C.3 Results
Data from each of these trials formed the input to classification algorithms. Classifica-
tion of features as either stationary or moving was implemented using the open source machine
learning toolkit Weka [152]. Weka includes standard algorithms for k-nearest neighbors,
support vector machines, J48 decision trees, and Naive Bayes learning. With the excep-
tion of selecting k=5 for kNN, default parameters were used for each of the algorithms.
This was because the emphasis was more on finding the right feature spaces for the
algorithms and not the algorithms themselves. Results were evaluated with respect to
two categories of user behaviors: classification and training from constant behaviors
(the first two trials) and from mixed behaviors (the third trial). Leave-one-out cross val-
idation was used to generate the confusion matrices. Data was collected from a total
of eight volunteers for the first two trials and seven volunteers for the third trial (one
subject’s third trial log had corrupted data and could not be used).
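For readers unfamiliar with the evaluation loop, the following is a minimal sketch of k-nearest-neighbour classification with leave-one-out cross validation, written in plain Python rather than Weka. It leaves out one window at a time (whether the study left out windows or whole subjects is not specified here), uses Euclidean distance, and runs on made-up one-dimensional features; all names and data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=5):
    """train: list of (feature_vector, label). Majority label of the k nearest points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def leave_one_out_confusion(data, k=5):
    """Count (predicted, truth) pairs, predicting each point from all the others."""
    counts = Counter()
    for i, (features, truth) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        counts[(knn_predict(rest, features, k), truth)] += 1
    return counts


# Toy data: low "power" windows are stationary, high "power" windows are moving.
data = [((0.1 * i,), "stationary") for i in range(6)]
data += [((10.0 + 0.1 * i,), "moving") for i in range(6)]
confusion = leave_one_out_confusion(data, k=5)
print(confusion)
```

The resulting counts have the same layout as the confusion matrices reported in this section.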
                          Ground Truth
                          Stationary    Moving
Prediction   Stationary   2274          73
             Moving       47            2246
Table C.1: Confusion Matrix for Constant Behaviors (shown for kNN). Each value in
the confusion matrix represents one window of features that was assigned as either sta-
tionary or moving by the algorithm. All points from Trial 1 were assigned a ground
truth of moving and all points from Trial 2 were assigned a ground truth of stationary.
Data from both trials was used for this confusion matrix.
C.3.1 Classifiers Trained on Constant Behaviors
C.3.1.1 Classification Accuracies
Classification was achieved with a total accuracy of roughly 97.41% for kNN across the
subject pool with a sliding window size of length three seconds with a one second over-
lap. The training was performed with the full set of sixteen globally referenced features.
Of the 120 points classified incorrectly, a total of 94 of these occurred in consecutive
temporal groups of size >= 2 (78.33%). A total of 56 points out of those classified in-
correctly occurred in groups of size >= 3 (46.67%). The next best performing algorithm
was SVM with a classification accuracy of 96.68%.
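The reported accuracy follows directly from the confusion matrix. The short sketch below assumes the second prediction row of Table C.1 is the "Moving" row; the dictionary layout and function name are illustrative.

```python
def accuracy(confusion):
    """confusion[predicted][truth] -> fraction of windows classified correctly."""
    correct = sum(confusion[label][label] for label in confusion)
    total = sum(sum(row.values()) for row in confusion.values())
    return correct / total


table_c1 = {
    "stationary": {"stationary": 2274, "moving": 73},
    "moving": {"stationary": 47, "moving": 2246},
}
acc_c1 = accuracy(table_c1)
print(round(100 * acc_c1, 2))
```

Here 4520 of 4640 windows are correct, which corresponds to the roughly 97.41% quoted above, with the remaining 120 windows being the misclassified points.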
C.3.1.2 Effect of Different Feature Sets
Using all sixteen features outperformed using just accelerometer power by as much as
5% for kNN. Using the partial feature set (as defined in Section C.1.2) performed somewhere
between the full feature set and using just accelerometer power. One possible
explanation for this result is that some movements still associated with rest (i.e. putting
the phone in a drawer or moving it around in the air while repositioning it) can actually
generate accelerations of sufficient magnitude to confuse them with walking. Adding
in covariance between the angular rotation velocities provides additional insight on how
the movement is occurring. Surprisingly, adding more features hurt the Naive Bayes
classifier’s performance (possibly due to over-fitting).
[Figure C.1 image: bar chart of Accuracy Percentage for the Power, Partial and Full feature
sets; legend: kNN, SVM, Bayes, Decision Trees.]
Figure C.1: The relative accuracy ratings of using only power as a feature compared to a
partial feature set of power and covariance between accelerometer and angular rotation
speed and using all sixteen features noted in Section C.1.2. Using additional features
helped the algorithms separate between stationary and moving behaviors.
[Figure C.2 image: bar chart of Accuracy Percentage for kNN, SVM and Bayes; legend: Not
Rotated, Rotated.]
Figure C.2: Classification accuracies for the machine learning algorithms when trained
on features framed locally versus those framed globally. Training on globally framed
features resulted in more accurate classification.
                          Ground Truth
                          Stationary    Moving
Prediction   Stationary   909           33
             Moving       46            1042
Table C.2: Confusion Matrix for Mixed Behaviors (shown for kNN). Each value of
the confusion matrix represents one window of features that was assigned as either sta-
tionary or moving by the algorithm. Ground truth was obtained from annotating video
recordings of the subjects as they completed the tasks.
C.3.1.3 Effect of Coordinate Transformation to Global Coordinate Frame
Transforming to a global coordinate frame resulted in an accuracy increase of over 8% for some of
the algorithms. This implies that while maintaining the orientation of the device with
respect to global coordinates may not always work for accurate position estimation,
it is still capable of determining if displacement occurs. Rotating to a global frame of
reference for the sensors accounts for rotational changes in sensor streams. For example,
when a phone is rotated by 90 degrees about an axis, with respect to the local frame of
reference, axes of sensor streams will be switched to another axis. Accounting
for this rotation ensures that sensor streams always map to the same axis of rotation.
C.3.2 Classifiers Trained on Mixed Behaviors
C.3.2.1 Classification Accuracies
Transitions were handled without much loss of precision, resulting in a total accuracy
of roughly 96.11% for kNN when using the full feature set. Of the 79 points classified
incorrectly, a total of 59 of these occurred in consecutive temporal groups of size >= 2
(74.68%). A total of 43 points out of those classified incorrectly occurred in groups
of size >= 3 (54.43%). SVM out-performs kNN with a total accuracy of 96.40% when
trained on all sixteen features. Additional features once again improved the performance
of algorithms, as illustrated in Figure C.3. The improvement was larger than in the constant
behavior case (roughly 10% for decision trees and kNN). As in the previous section, the
full feature set performs the best, followed by accelerometer power and covariance, with using
only accelerometer power performing the worst.
[Figure C.3 image: bar chart of Accuracy Percentage for the Power, Partial and Full feature
sets; legend: kNN, SVM, Bayes, Decision Trees.]
Figure C.3: The relative accuracy ratings of using only power as a feature compared
to a partial feature set of power and covariance between accelerometer and rotation
speed and using all sixteen features. The additional features helped the machine learning
algorithms overcome the noisy data of mixed behaviors.
C.3.2.2 Effect of Different Feature Sets
Additional training features helped the algorithms more for mixed behaviors with tran-
sitions between the two states than when the behaviors were constant throughout data
recording. Data with transitions is noisier and has more periods where the user was at
rest as per our definition, but still moving in some way (such as when they are sitting
down or standing up). The additions to the feature vector helped to overcome these
complications by providing additional descriptive insight on how the motion occurred.
[Figure C.4 image: bar chart of Accuracy Percentage for kNN, SVM and Bayes; legend: Not
Rotated, Rotated.]
Figure C.4: Classification accuracies for the machine learning algorithms when trained
on locally versus globally framed features when the data included transitions. Once
again using globally framed coordinates aided classification.
C.3.2.3 Effect of Coordinate Transformation to Global Coordinate Frame
Figure C.4 illustrates the effect of rotation to a global frame on classification accuracy.
The difference between locally and globally referenced features is more pronounced for
the mixed behavior trial. Rotating to a global frame of reference provides additional
insight on whether or not the accelerations are being applied in world space.
C.3.3 Conclusion
We have shown that recognizing whether or not the user is moving can be done
with high accuracy (>95%) using only kinematic sensors from a single mobile phone
that was not kept in a constant location during the tests. Furthermore, we have demon-
strated it in a semi-naturalistic environment (with transitions between rest and non-rest)
representative of daily lifestyles of everyday users. By doing so, we have identified the
optimal features for high accuracy and also underscored the usefulness of rotating to a
global frame of reference in activity recognition. It should be noted that all approaches
used in this study, including using only accelerometer power, give usable classification
rates. However, for some applications higher classification rates might be necessary. In
particular, if data from the accelerometers and gyroscopes implies that motion is not
occurring, then there would be no reason to continually check the GPS or Wi-Fi sensors
to determine if the user is moving (thus saving power). Although in this experiment all
analysis was done via post-processing data collected from the mobile phones, the logi-
cal next step is to integrate the data collection and machine learning algorithms on the
phone hardware itself. The algorithm presented would still function as described in a
real-time setting; the main difference is that instead of writing the sensor readings
to file they would instead be stored in memory with classification decisions occurring
at the end of every window. Knowing whether or not the person holding a mobile de-
vice is at rest will enable richer applications to be developed with diverse goals from
documenting sedentary lifestyles to being built into indoor localization schemes.
Appendix D
Harris Benedict, 1918 [153]
    Men:    13.75 × weight + 5 × height − 6.75 × age + 66.47
    Women:  9.56 × weight + 1.84 × height − 4.67 × age + 665.09
Mifflin-St Jeor, 1990 [154]
    Men:    9.99 × weight + 6.25 × height − 4.92 × age + 5
    Women:  9.99 × weight + 6.25 × height − 4.92 × age − 161
Owen, 1986-87 [155, 156]
    Men:    879 + 10.2 × weight
    Women:  795 + 7.18 × weight
WHO, 1985 [157]
    Weight only (kg)
        Men     18-30   15.3 × weight + 679
                31-60   11.6 × weight + 879
                >60     13.5 × weight + 487
        Women   18-30   14.7 × weight + 496
                31-60   8.7 × weight + 829
                >60     10.5 × weight + 596
    Weight and Height (m)
        Men     18-30   15.4 × weight − 27 × height + 717
                31-60   11.3 × weight + 16 × height + 901
                >60     8.8 × weight + 1128 × height − 1071
        Women   18-30   13.3 × weight + 334 × height + 35
                31-60   8.7 × weight − 25 × height + 865
                >60     9.2 × weight + 637 × height − 302
Table D.1: Common predictive equations to predict resting energy expenditure. All
results are in kcal/day.
Energy Expenditure At Rest
Resting metabolic rate (RMR) is the number of calories expended when a person is at
rest. A typical person spends close to 16 hours a day sleeping or at rest, so tracking
energy expenditure due to rest is by far the most critical component in assessing
physical activity levels. As with calculating energy expenditure due to activities, there
is a need for indirect calorimetry techniques, in particular predictive equations, to
determine energy expenditure at rest [158]. Since there is no movement involved, RMR
depends solely on the morphological characteristics of the individual. RMR varies
between individuals due to differences in body composition, age, gender, ethnicity and
so on, and an accurate RMR equation needs to account for these differences.
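As one illustration of how a predictive equation can account for such differences, the weight-only WHO equations in Table D.1 switch coefficients by sex and age band. The sketch below encodes those rows as a lookup. The function name, the `"men"`/`"women"` keys and the treatment of ages strictly between 30 and 31 (assigned to the 31-60 band) are assumptions made for the example.

```python
# (slope, intercept) pairs for the 18-30, 31-60 and >60 bands,
# taken from the "Weight only (kg)" rows of Table D.1.
WHO_WEIGHT_ONLY = {
    "men": [(15.3, 679), (11.6, 879), (13.5, 487)],
    "women": [(14.7, 496), (8.7, 829), (10.5, 596)],
}


def who_ree_weight_only(sex, age_years, weight_kg):
    """Return resting energy expenditure in kcal/day (WHO, weight only)."""
    if age_years < 18:
        raise ValueError("Table D.1 lists no band below 18 years")
    bands = WHO_WEIGHT_ONLY[sex]
    if age_years <= 30:
        slope, intercept = bands[0]
    elif age_years <= 60:
        slope, intercept = bands[1]
    else:
        slope, intercept = bands[2]
    return slope * weight_kg + intercept


# A 25-year-old man weighing 70 kg: 15.3 * 70 + 679, about 1750 kcal/day.
example = who_ree_weight_only("men", 25, 70)
```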
Table D.1 lists common equations used to predict resting energy expenditure. The
Harris-Benedict equation [153] was derived from a predominantly normal-weight white
population (136 men, ages 16-63; 103 women, ages 15-74) over the period 1907-1917.
The Owen equation [155, 156] was based on a smaller sample (60 men, ages 18-82; 44
women, ages 18-65) and included both obese and non-obese individuals. The World Health
Organization equations [157] were derived from young European military and police
recruits, including 2279 men and 247 women, with a total age range of 19-82 years. The
Mifflin-St Jeor equations [154] were derived from a larger sample (251 males, ages 19-78
years; 247 females, ages 19-78 years) and are the most accurate, particularly for obese
individuals, because the sample contained a larger number of obese individuals.
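For reference, the Mifflin-St Jeor equations from Table D.1 can be written as a short function. This is a sketch under stated assumptions: the table does not give units, so the code assumes weight in kg, height in cm and age in years, as in the published equation; the function name and the `"men"`/`"women"` labels are illustrative.

```python
def mifflin_st_jeor(sex, weight_kg, height_cm, age_years):
    """Return resting energy expenditure in kcal/day (Mifflin-St Jeor)."""
    # Shared part of the men's and women's equations in Table D.1.
    base = 9.99 * weight_kg + 6.25 * height_cm - 4.92 * age_years
    if sex == "men":
        return base + 5
    if sex == "women":
        return base - 161
    raise ValueError("sex must be 'men' or 'women'")


# 70 kg, 175 cm, 30 years: roughly 1650 kcal/day (men), 1484 kcal/day (women).
ree_man = mifflin_st_jeor("men", 70, 175, 30)
ree_woman = mifflin_st_jeor("women", 70, 175, 30)
```

The two equations differ only in the constant term, so the result for a woman is always 166 kcal/day lower than for a man with the same weight, height and age.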
Appendix E
Model Terminology
This section lists the terms used in this document for easy reference.
Term       Description                                                             Dimension
p          Person p                                                                1 × 1
N_p        Total number of data points for person p                                1 × 1
n_p        n_p-th data point for person p out of a total of N_p data points        1 × 1
x_{np}     n_p-th movement descriptor for person p                                 (D + 1) × 1
Phys_p     Morphological descriptors for person p                                  (M + 1) × 1
X_p        Collection of input data for person p                                   N_p × (D + 1)
y_{np}     n-th energy expenditure value for person p corresponding to             1 × 1
           movement x_{np}
Y_p        Collection of energy expenditure values for person p                    N_p × 1
[?]_p      Precision parameter for the mapping from x_{np} to y_{np} for person p  1 × 1
w_p        Model parameter for person p                                            1 × (D + 1)
[?]_p      Precision parameter for confidence of model, w_p                        1 × 1
k          Population parameter matrix for generalization                          (D + 1) × M
X          Collection of all input data points                                     (Σ_p N_p) × (D + 1)
Y          Collection of all energy expenditure values                             (Σ_p N_p) × 1
PHYS       Collection of all morphological values                                  P × (M + 1)
Table E.2: Terms used in this dissertation
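The dimensions in the table can be checked with a small worked example. The sketch below assumes that the per-person mapping from x_{np} to y_{np} is linear in w_p and that the extra "+1" entry of each descriptor is a constant bias term; both are assumptions made for illustration, and the function name `predict` and the numeric values are hypothetical.

```python
def predict(X_p, w_p):
    """Return Y_p (N_p x 1) from X_p (N_p x (D + 1)) and w_p (1 x (D + 1))."""
    (weights,) = w_p  # w_p is a single row of D + 1 values
    Y_p = []
    for row in X_p:
        if len(row) != len(weights):
            raise ValueError("each row of X_p must have D + 1 entries")
        Y_p.append([sum(w * x for w, x in zip(weights, row))])
    return Y_p


# D = 2 movement features plus an assumed bias entry of 1.0, N_p = 3 points.
X_p = [[0.5, 1.0, 1.0],
       [2.0, 0.0, 1.0],
       [1.0, 1.0, 1.0]]
w_p = [[2.0, 3.0, 1.0]]
Y_p = predict(X_p, w_p)  # three rows, one column
```

Each row of X_p is one transposed descriptor x_{np}, so stacking N_p of them gives the N_p × (D + 1) shape listed above, and the product with the transposed 1 × (D + 1) parameter gives the N_p × 1 shape listed for Y_p.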
Asset Metadata
Creator: Vathsangam, Harshvardhan (author)
Core Title: Sense and sensibility: statistical techniques for human energy expenditure estimation using kinematic sensors
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Computer Science
Publication Date: 07/30/2014
Defense Date: 04/01/2013
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: accelerometer, Bayesian, energy expenditure, Fourier transform, Gaussian process regression, gyroscope, machine learning, mobile phone, OAI-PMH Harvest, physical activity, regression, statistical, Walking
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Sukhatme, Gaurav S. (committee chair), McNitt-Gray, Jill L. (committee member), Saponas, T. Scott (committee member), Sha, Fei (committee member)
Creator Email: hvathsangam@gmail.com, vathsang@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c3-308578
Unique identifier: UC11295228
Identifier: etd-Vathsangam-1903.pdf (filename), usctheses-c3-308578 (legacy record id)
Legacy Identifier: etd-Vathsangam-1903.pdf
Dmrecord: 308578
Document Type: Dissertation
Rights: Vathsangam, Harshvardhan
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA