AUTOMATIC TRACKING OF FLIES AND
THE ANALYSIS OF FLY BEHAVIOR
A DISSERTATION
PRESENTED TO THE FACULTY OF THE
UNIVERSITY OF SOUTHERN CALIFORNIA GRADUATE SCHOOL
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
(COMPUTATIONAL BIOLOGY AND BIOINFORMATICS)
By
Mohammad Abbasi
August 2017
© Copyright by Mohammad Abbasi 2017
All Rights Reserved
To Mom, Dad, Mehdi and Meysam.
Acknowledgements
First and foremost, I would like to thank my PhD advisors, Professors Simon
Tavaré and Paul Marjoram. Simon has always been supportive of my work, has given me
the freedom to pursue different projects and explore new ideas, and has encouraged me to
focus on long-term impact, for which I am very grateful. Paul has been a great role model
in my academic career and has provided me with endless support and guidance on a personal
level as well. As a mentor, he has always been engaged in all details of my projects,
encouraged me to think analytically and helped shape my statistical understanding. He
has always believed in me through the ups and downs of my research and instilled
confidence in me through his encouragement to push my own boundaries. He is my
primary resource for discussions regarding methodology and has been instrumental in
preparing this thesis. I am also very proud to have worked with Prof. Sergey Nuzhdin. He
has always been supportive of me and provided me with unlimited scientific advice and
help in carrying out my projects.
My gratitude goes to the other members of my qualification exam and defense
committee, Professors Kim Sigmund, Sandy Eckel and Peter Ralph for their timely
suggestions and advice in addition to inspirational interactions.
I would like to specially thank my officemates, Brad and Steven, for their
friendship and help in addition to all the invaluable lessons I learned through different
scientific discussions and interactions in the office. I would also like to thank my
colleague, Sarah, for formulating interesting hypotheses and for all her help in preparing
manuscripts in addition to setting up the biological experiments used in this thesis. I
would also like to thank all current and former members of MCB whom I’ve had the
pleasure of working with, including Reza, Asif, Wendy, Peter, Misagh, Zohreh, Ehsan,
Ben and Nate. I also want to thank the present and past members of the Nuzhdin lab for
their thoughtful comments on my projects and presentations. I would also like to extend
my gratitude to the present and past department administrators Katie, Hayley, Christina,
Linda, Doug and Miguel who have always been helpful and kind and for their continued
support. I would also like to acknowledge all my co-workers at Episona, Alan, Phil,
Mike, Latoya, Chris and Annalisa for their full support through completing this thesis.
Dealing with the hardships of graduate life while living away from my family
would not have been possible without the help and support of my beloved friends here in
Los Angeles. I will forever be grateful to my dear friends Amir, Paniz, Mahsa, Hesam,
Tina, Ashkan, Maani, Hannaneh, Nariman and Pedram, who have always been there for
me through the ups and downs of the past years with continued support. I also want to
especially thank Maral for bearing with me through the final months of preparing this
thesis and supporting me throughout the way. Also, my extended gratitude goes to my
dear friends Rad, Saeed and Mahnoosh, for their unlimited support, motivation and true
friendship throughout all my years of academic studies. All these people have provided
me with genuine love and support and been my family away from home.
Finally and most importantly, I would like to thank my family for their
unconditional support and endless love in every step of my life. My parents have
provided me with the opportunity to pursue my dreams and have taught me to always aim
for the stars and that nothing is beyond my reach. None of my accomplishments would
have been possible had it not been for their continued motivation and support. I am
forever grateful for their patience, encouragement and guidance throughout these years.
My brothers, Mehdi and Meysam, have always been there for me throughout the years
and paved the way for my success, and for this I will forever be thankful to them.
Table of Contents
Chapter 1: Introduction
Chapter 2: High Throughput Tracking of flies
2.1 Introduction
2.2 Methods
2.2.1 Experiment setup
2.2.2 Automatic video tracking of flies with movement
2.2.3 Arena Detection
2.2.4 Background subtraction
2.2.4.1 Correcting background using histogram methods (for places with little movement)
2.2.5 Extracting foreground
2.2.6 Threshold for background subtraction
2.2.7 Gaussian mixture models
2.2.8 Distinguishing males from females
2.2.9 Tracking
2.2.10 Hungarian algorithm
2.2.11 Adapting Hungarian algorithm with cost function
2.3 Results
2.3.1 Tracking validation
2.3.2 Analyzing fly movement using tracking data
2.4 Discussion
Chapter 3: Studying effects of social environment on the movement of male and females in Drosophila melanogaster
3.1 Introduction
3.2 Methods
3.2.1 Experimental Setup
3.2.2 Fly lines
3.2.3 Automatic tracking
3.2.4 Movement dataset
3.2.5 Validation of methods
3.2.6 Analysis of movement using mixed effect models
3.2.7 Relationship between male and female movement
3.2.8 Ψ between abiotic environments
3.2.9 Ψ_j for individual genotypes
3.3 Results
3.3.1 Analysis of movement
3.3.1.1 Movement and male genotype
3.3.1.2 Female movement with different male genotypes
3.3.1.3 Movement is sexually dimorphic
3.3.2 Relationship between male and female movement
3.3.2.1 Ψ between abiotic environments
3.3.2.2 Ψ_j for individual genotypes
3.4 Discussion
Chapter 4: Modeling social group structure of flies using recurrent event models for HMM corrected tracking data
4.1 Introduction
4.2 Methods
4.2.1 Experiment Setup
4.2.2 Tracking
4.2.3 Hidden Markov Models
4.2.4 Dataset for number of flies on patch
4.2.5 Modeling using recurrent event analysis
4.2.6 Regression modeling for survival data
4.2.6.1 Kaplan-Meier Survival Plots
4.2.6.2 Hazard rate
4.2.6.3 Cox Proportional Hazards Model
4.2.6.4 Assessing the Proportional Hazard Assumption
4.2.6.5 The counting process model
4.2.6.6 Conditional Model
4.2.6.7 Frailty Models
4.2.7 Using frailty models to analyze sojourn time of leaving events
4.3 Results
4.3.1 Results and validation of the constrained HMM algorithm
4.3.2 Kaplan-Meier Survival Plots
4.3.3 Assessing Proportional Hazard Assumption
4.3.4 Using recurrent events models
4.3.4.1 Leaving events
4.3.4.2 Joining events
4.4 Discussion
Chapter 5: MovTrack and Click-it as tools for studying the behavior of organisms in video-recorded setups
5.1 Introduction
5.2 Methods
5.2.1 MovTrack: A tool for high-throughput analysis of the behavior of video-recorded organisms
5.2.1.1 Using MovTrack
5.2.2 Click-it: User interface for manual low resolution-high accuracy object detection
5.2.2.1 Explaining the UI and how it works with images and steps
5.2.2.2 Using Click-it
5.3 Results
5.3.1 Examples of Use
5.3.2 Analyzing the sedation times of D. simulans using MovTrack
5.3.2.1 Experimental Design
5.3.2.2 Detecting overall movement using MovTrack
5.3.2.3 Moving Average and thresholding
5.3.2.4 Validation of Sedation time method
5.3.2.5 Genotypic variation in sedation times in ethanol environments
5.3.3 A new behavioral phenotyping strategy for Pacific oysters
5.3.4 High accuracy data using Click-it for the study of Drosophila melanogaster group structure
5.4 Discussion and Future development
Chapter 6: Conclusion
References
Chapter 1
Introduction
This thesis focuses on animal behavior, presenting a variety of work that first
identifies how and where animals (in this case mostly Drosophila) are moving, and then
aims to understand the underlying dynamics of the behaviors that are reflected in those
movements.
We have always been interested in the study of animal behavior, in part because
understanding animal behavior gives us insight into human behavior and helps us understand
the world around us better. Because animals represent more tractable systems than
humans, and are much more easily subject to experimental manipulation, it is natural to
start with animals when trying to understand many aspects of human behavior. For
example, in (Sumpter 2006), the concept of self-organization was studied in the collective
behavior of animals. Ant pheromone trails are an example of such collective behaviors.
After finding food, ants tend to leave a pheromone marking the trail from the food to the
nest so that other ants in the group can follow the smell and find the food source. This
trail of ants can be as small as tens of centimeters or as long as hundreds of kilometers
(Hölldobler and Wilson 1990). Other forms of animal collective and collaborative
behaviors can be seen in the V-shaped flocks of geese migrating from one place to
another (Hawkes et al. 2014) and schools of fish, which come in a variety of shapes and
sizes (Duffy and Wissel 1988).
Animals have also been used as model organisms to study a variety of diseases in
humans. In particular, Drosophila has been widely used to study diseases and behaviors
relevant to humans. This is in part due to the short life span of Drosophila, which allows
the study of genetic manipulation. Also, their small size makes them a good fit for use in
experimental labs. For example, Drosophila has been used within a variety of medical
and psychological studies, such as aging (Tan et al. 2008), drug abuse (Morozova et al.
2006) and aggressive behaviors (Nilsen et al. 2004). As part of this thesis, we use
Drosophila as a model organism to study the effect of ethanol on locomotor behaviors.
This will be described in Chapter 3.
In recent years, the study of animal behavior has become much easier due to the
use of video analysis methods (e.g., Gomez-Marin et al. 2012; Ardekani et al. 2012;
Perez-Escudero et al. 2014; Poiesi and Cavallaro 2015). Using these methods, the
researcher records videos of experiments and then analyzes those videos in order to study
the behavior of the animals. Video analysis can be undertaken in three ways: manually (e.g.,
Dailey et al. 1998), semi-automatically (e.g., Kavasidis et al. 2012) and automatically
(e.g., Gomez-Marin et al. 2012; Ardekani et al. 2012; Perez-Escudero et al. 2014;
Mersch, Crespi, and Keller 2013; Khan, Balch, and Dellaert 2005).
In manual video analysis, well-trained human experts analyze the videos taken
from experiments. The experts watch the videos and
annotate different behaviors throughout without the help of any automated
computational tool. It is common to use spreadsheets to record the time stamps of each
behavior in a video. This method has the benefit that individual
behaviors can be annotated accurately, but it is very time consuming. Moreover, annotating
large numbers of behaviors in an experiment often imposes an unreasonable time burden
and becomes prone to human error.
In semi-automated video analysis, the human expert uses a Graphical User
Interface (GUI) in order to watch the video and annotate the behaviors by clicking on the
screen using well-defined buttons in the GUI. This method has the benefit that it is likely
less prone to human error and can save time in annotation of behaviors. Some versions of
this method work in two steps. In the first step, they construct a first-level
characterization of the interesting features of the behavior and then flag potential
occurrences within a section of video-tracking data. In the second step, those occurrences
are validated using human experts. These methods have the benefit that experts will only
need to look at places where there is the potential for the behavior to actually occur. This
can save a lot of time while also allowing accurate prediction of behavior. However,
these methods are obviously prone to false-negative errors and are still not very feasible
in cases in which more complex behaviors are of interest. Therefore, they are primarily
useful in relatively small experiments with simple behaviors. They may also require a
large number of experts to annotate the videos manually, which is not always possible.
This has triggered the need for fast and reliable annotation methods that can handle a
large number of experiments in a short time. In Chapter 2 we develop a method for
automatic annotation of behaviors as a solution to this problem.
In automatic video analysis, videos of experiments are analyzed and reliable
predictions of animal behaviors are then made using computational and mathematical
methods. One important part of this method is automatic tracking of animal movement.
This approach has received much attention in the past years, resulting in the development
of tools for tracking the movement of fish (Butail and Paley 2011), ants (Perez-Escudero
et al. 2014), flies (Gomez-Marin et al. 2012; Ardekani et al. 2012) and other animals.
However, these automated approaches impose restrictions on experimental setups. These
restrictions can mean that the environment is very far from ‘normal’ for the animal of
interest, and thus bring into question any conclusions resulting from the analysis.
Moreover, some of these approaches work by defining unique features for the object of
interest in a video. This requires high-resolution video in order to be able to detect the
features of interest consistently throughout the video.
In this thesis, we introduce an automatic tracking method for the high-throughput
analysis of Drosophila behavior. This tracking method assumes that the flies move
substantially throughout the video. It works in three
steps. In the first step, the static background of the video is found using a global Gaussian
average method. Second, this background is then subtracted from each frame of the video
to form a foreground frame, which contains the moving flies. Finally, the foreground
frame is then thresholded so that the resulting binary image contains ones in the locations
of the flies and zeros everywhere else.
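The three steps above can be illustrated with a minimal sketch. It is not the implementation used in this thesis: the background is approximated here by a plain per-pixel temporal mean (an assumed stand-in for the Gaussian average), frames are small grayscale grids stored as nested lists, and the function names, the helper make_frame and the threshold value are all hypothetical.

```python
from statistics import mean

def estimate_background(frames):
    """Step 1: per-pixel temporal mean over all frames (assumed stand-in
    for the Gaussian average background model)."""
    height, width = len(frames[0]), len(frames[0][0])
    return [[mean(f[r][c] for f in frames) for c in range(width)]
            for r in range(height)]

def foreground_mask(frame, background, threshold):
    """Steps 2 and 3: subtract the background, then threshold the absolute
    difference so that moving pixels become 1 and everything else 0."""
    return [[1 if abs(p - b) > threshold else 0
             for p, b in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, background)]

def make_frame(fly_pos, size=5, bg_level=200, fly_level=40):
    """Hypothetical test frame: bright arena with one dark fly pixel."""
    frame = [[bg_level] * size for _ in range(size)]
    row, col = fly_pos
    frame[row][col] = fly_level
    return frame

# A single dark fly walking along the diagonal of a 5x5 arena.
frames = [make_frame((i, i)) for i in range(5)]
background = estimate_background(frames)
mask = foreground_mask(frames[2], background, threshold=100)
```

Because the fly keeps moving, each pixel shows the background in most frames, so the temporal mean stays close to the empty arena and only the fly's current position survives thresholding. A fly that stays still would instead be absorbed into the background, which is why the method assumes movement.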
The thresholded foreground contains connected components of pixels referred to
as blobs. Ideally each blob will refer to a single fly and we can collect the properties of
the object of interest by fitting an ellipse to the blob referring to that fly. However, in
many applications blobs in the foreground frame can correspond to more than one object.
Such instances are referred to as merged blobs. Merged blobs usually appear in places
where flies interact with each other. In our experiment, we use the fact that the number of
flies in each arena is fixed to better resolve blobs that contain more than one
fly. Gaussian mixture models (GMMs) are used to find the properties of flies in blobs. A
GMM is a weighted sum of the probability density functions of multiple Gaussians. Here
an EM algorithm is used to fit two-dimensional GMMs to the pixel intensities of a
merged blob.
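As a rough illustration of how EM can separate a merged blob, the sketch below fits a two-component, two-dimensional GMM to pixel coordinates weighted by foreground intensity. The weighting scheme, the initial centres, the small ridge term added to the variances and the synthetic blob are all assumptions made for this example; the details of the actual method are given in Chapter 2.

```python
import math

def gauss2d(x, y, mean, cov):
    """Density of a 2-D Gaussian with covariance ((sxx, sxy), (sxy, syy))."""
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    dx, dy = x - mean[0], y - mean[1]
    quad = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def fit_gmm(pixels, init_means, n_iter=50, ridge=1e-3):
    """EM for a GMM over (x, y, weight) pixels, one component per fly."""
    k = len(init_means)
    means = [tuple(m) for m in init_means]
    covs = [((1.0, 0.0), (0.0, 1.0)) for _ in range(k)]
    mix = [1.0 / k] * k
    total_weight = sum(w for _, _, w in pixels)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each pixel.
        resp = []
        for x, y, _ in pixels:
            dens = [mix[j] * gauss2d(x, y, means[j], covs[j]) for j in range(k)]
            norm = sum(dens) or 1e-300
            resp.append([d / norm for d in dens])
        # M-step: intensity-weighted updates of weight, mean and covariance.
        for j in range(k):
            w = [r[j] * p[2] for r, p in zip(resp, pixels)]
            wsum = sum(w)
            mx = sum(wi * p[0] for wi, p in zip(w, pixels)) / wsum
            my = sum(wi * p[1] for wi, p in zip(w, pixels)) / wsum
            sxx = sum(wi * (p[0] - mx) ** 2 for wi, p in zip(w, pixels)) / wsum + ridge
            syy = sum(wi * (p[1] - my) ** 2 for wi, p in zip(w, pixels)) / wsum + ridge
            sxy = sum(wi * (p[0] - mx) * (p[1] - my) for wi, p in zip(w, pixels)) / wsum
            means[j] = (mx, my)
            covs[j] = ((sxx, sxy), (sxy, syy))
            mix[j] = wsum / total_weight
    return mix, means, covs

def intensity(x, y):
    """Synthetic merged blob: two touching flies centred at (2, 1) and (7, 1)."""
    return 1.0 + sum(8.0 * math.exp(-((x - cx) ** 2 + (y - 1) ** 2) / 2.0)
                     for cx in (2, 7))

pixels = [(x, y, intensity(x, y)) for x in range(10) for y in range(3)]
mix, means, covs = fit_gmm(pixels, init_means=[(1.0, 1.0), (8.0, 1.0)])
```

Each fitted mean gives an estimated fly centre, and the eigenvectors of each covariance matrix would give the orientation and axis lengths of the ellipse for that fly. Knowing that the number of flies in an arena is fixed is what makes it possible to choose the number of components in advance.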
Having found the properties of the flies in each experiment, we use an adaptation
of the Hungarian algorithm (Kuhn 1955) to do the tracking. This algorithm finds the best
global matching of two sets of observations by minimizing the distance between pairs of
observations. We modify the Hungarian algorithm by using information about the sex
and movement direction of flies and use it to find the best matching between flies in
consecutive frames. A more complete definition of these methods can be found in
Chapter 2.
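The matching step can be sketched as follows. With only three flies per arena, an exhaustive search over all assignments returns the same minimum-cost matching that the Hungarian algorithm would, so the sketch uses brute force for brevity; for larger groups a true Hungarian solver (for example scipy.optimize.linear_sum_assignment) would be the natural choice. The cost function, its weights sex_penalty and turn_weight, and the dictionary layout of a detection are hypothetical and only indicate how sex and movement direction could enter the cost.

```python
import math
from itertools import permutations

def pair_cost(prev, curr, sex_penalty=50.0, turn_weight=2.0):
    """Hypothetical cost of linking a fly in the previous frame to a detection
    in the current frame: distance, plus a penalty if the inferred sex differs,
    plus a penalty proportional to the change in heading (radians)."""
    dist = math.hypot(curr["x"] - prev["x"], curr["y"] - prev["y"])
    sex = 0.0 if prev["sex"] == curr["sex"] else sex_penalty
    delta = curr["heading"] - prev["heading"]
    turn = abs(math.atan2(math.sin(delta), math.cos(delta)))  # wrapped to [0, pi]
    return dist + sex + turn_weight * turn

def best_matching(prev_flies, curr_flies):
    """Return match, where match[i] is the index of the current detection
    assigned to previous fly i, minimising the total cost over all pairs."""
    n = len(prev_flies)
    cost = [[pair_cost(p, c) for c in curr_flies] for p in prev_flies]
    best = min(permutations(range(n)),
               key=lambda perm: sum(cost[i][perm[i]] for i in range(n)))
    return list(best)

prev_flies = [
    {"x": 0.0, "y": 0.0, "sex": "M", "heading": 0.0},
    {"x": 10.0, "y": 0.0, "sex": "M", "heading": 0.0},
    {"x": 5.0, "y": 5.0, "sex": "F", "heading": 0.0},
]
curr_flies = [  # same flies one frame later, detected in a different order
    {"x": 5.5, "y": 5.0, "sex": "F", "heading": 0.1},
    {"x": 1.0, "y": 0.0, "sex": "M", "heading": 0.0},
    {"x": 10.0, "y": 1.0, "sex": "M", "heading": 0.2},
]
match = best_matching(prev_flies, curr_flies)  # [1, 2, 0]
```

Minimising the total cost over the whole assignment, rather than greedily linking each fly to its nearest detection, is what prevents two tracks from claiming the same detection when flies are close together.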
After finding the movement trajectories of animals, the goal is often to understand
their behavior. This is frequently done using mathematical models and machine learning
algorithms. Machine learning algorithms are mostly used in order to train classifiers to
detect animal behaviors using the tracking data (e.g., Kimura et al. 2014; Dankert et al.
2009). This approach uses manually-defined features extracted from the tracking data.
The resulting behavioral annotation can then be used in mathematical models, which aim
to better understand both the behaviors and the relationships between them. These
mathematical models range from simple linear regressions, which try to understand the
effect of different variables on the behavior of animals (e.g., Histed and Maunsell 2013),
to more complex models such as Markov Models (e.g., Patterson et al. 2008).
In our study we aim to use statistical analysis in order to understand the behavior
of Drosophila melanogaster. We explore the effects of ethanol exposure on social
behaviors. To do so, we first develop a tracking method in Chapter 2 in order to collect
movement data from high-throughput experiments on different genotypes of Drosophila
melanogaster in a variety of experimental settings and use these data in Chapter 3 to
understand the effects of social environment on locomotory behavior. Although this
tracking method is introduced for D. melanogaster, it can also be adjusted and then used
in the study of other animals such as fish and ants. Such tracking data of course contain
errors, so in Chapter 4 we aim to further refine the predictions of our tracking algorithm
to more accurately predict the number of flies on a patch (a simple, high-level summary
of social behavior, i.e., group size). We do this by modeling the number of flies on a
patch as the hidden state of a Hidden Markov Model (HMM), for which we estimate
appropriate transition and emission probabilities. We then use recurrent survival analysis,
and Cox proportional hazards models, to test for the determinants of the group
structures of flies. Finally, in Chapter 5 some novel computational tools for the study of
animal behavior are introduced in addition to some examples of the use of these tools.
These tools are useful in cases where the experiment setup does not allow for the simple
use of automatic tracking software.
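To make the HMM idea concrete, the sketch below treats the true number of flies on a patch as the hidden state and the count reported by the tracker as the noisy observation, then recovers the most likely state sequence with the Viterbi algorithm. The number of states, the banded transition and emission matrices, and the probabilities 0.95 and 0.7 are invented for illustration; in Chapter 4 the transition and emission probabilities are estimated rather than fixed by hand.

```python
import math

def safe_log(p):
    """Logarithm that maps zero probability to minus infinity."""
    return math.log(p) if p > 0 else -math.inf

def banded(n, stay):
    """Hypothetical n x n log-probability matrix: mass `stay` on the diagonal,
    the remainder split equally between the existing neighbours i-1 and i+1."""
    rows = []
    for i in range(n):
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < n]
        row = [0.0] * n
        row[i] = stay
        for j in neighbours:
            row[j] = (1.0 - stay) / len(neighbours)
        rows.append([safe_log(p) for p in row])
    return rows

def viterbi(obs, n_states, log_start, log_trans, log_emit):
    """Most likely hidden state sequence given the observed counts."""
    states = range(n_states)
    score = [log_start[s] + log_emit[s][obs[0]] for s in states]
    back = []
    for o in obs[1:]:
        prev, step, score = score, [], []
        for s in states:
            best = max(states, key=lambda r: prev[r] + log_trans[r][s])
            step.append(best)
            score.append(prev[best] + log_trans[best][s] + log_emit[s][o])
        back.append(step)
    path = [max(states, key=lambda s: score[s])]
    for step in reversed(back):  # follow the back-pointers to the start
        path.append(step[path[-1]])
    return path[::-1]

n_states = 4                                  # 0 to 3 flies on the patch
log_start = [safe_log(1.0 / n_states)] * n_states
log_trans = banded(n_states, stay=0.95)       # group size changes rarely
log_emit = banded(n_states, stay=0.7)         # tracker is off by one at times

observed = [2, 2, 2, 3, 2, 2, 2, 1, 1, 1, 1]  # tracker counts per frame
decoded = viterbi(observed, n_states, log_start, log_trans, log_emit)
# decoded == [2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1]
```

The isolated count of 3 is treated as a tracking error and smoothed away, whereas the sustained run of 1s is accepted as a genuine leaving event. Events of this second kind are what a recurrent event analysis would then take as input.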
Chapter 2
High Throughput Tracking of flies
2.1 Introduction
Gaining an understanding of biological systems has been the focus of many
studies in recent years. An example of this is attempts to understand the underlying
mechanisms of behaviour in a variety of organisms. Recently this has become more
feasible through the use of new technology and automatic tools. In particular, the use of
automatic tracking has grown rapidly in the past decade, and this has enabled scientists to
efficiently collect large amounts of data, allowing for many new analyses to be performed.
Examples include cases in which tracking has been used to understand the behaviours
of single molecules (Kusumi et al. 2014), ants (Mersch et al. 2013), honeybees (Kimura
et al. 2014), fish (Butail and Paley 2011), mice (Ohayon et al. 2013) and flies (Branson et
al. 2009).
In this thesis, we are focused on studying the behaviour of Drosophila using
automatic tracking methods. Drosophila have been widely used as a model organism to
study behaviour. This has mostly been due to the short life span of fruit flies, which
makes them a good target for studying the effect of genetic manipulation. Also,
Drosophila are a great fit for studies in controlled environments due to their small size
and ease of maintenance and manipulation. This has allowed analysis of a variety of
behaviours having importance in medical and physiological studies. Examples include
studies of aggression (Chan and Kravitz 2007), memory-loss (Tan et al. 2008), drug
abuse (Devineni and Heberlein 2009) and spatial preference (Soibam et al. 2012).
Ethanol, and the effect it has on human behaviour, has been a focus of many
studies because of uncertainty regarding the physiological causes of drunken behaviour.
This has led to many experiments using model organisms to study behaviour under
ethanol exposure. Observations of Drosophila under the influence of ethanol have been
used as the basis of studies of a variety of behaviours and diseases. In Lee et al. (2008)
the effect of ethanol on disinhibited sexual behaviour was studied using Drosophila
courtship behaviour. It was shown that the presence of ethanol increases disinhibited
courtship in Drosophila. In Devineni and Heberlein (2009) alcohol addiction in humans
was studied using Drosophila behaviour under ethanol exposure. Such studies are useful
because behaviours such as hyperactivity, and their underlying molecular pathways, are
conserved between humans and Drosophila. In the experiment described in this chapter,
we study the effect of ethanol exposure on the behaviour of flies in various genotypes of
Drosophila melanogaster. The details of the experiment are introduced in the Methods
section. The results will be discussed in full in Chapter 3.
To study fly behaviour most efficiently, we need to be able to monitor the flies
automatically using a tracking system. In the past, visual observation methods (e.g.,
Balakireva et al. 1998; Martin and Grotewiel 2006) were used to study the movement
of flies in the laboratory using well-trained investigators. However, these methods are
time-consuming and prone to human errors. Moreover, the data generated are not detailed
enough to enable simultaneous inference for many behaviours, and applications are
therefore usually limited to detecting one behaviour.
Recently, tracking software has become available. It offers the potential for
automatic tracking of flies, allowing us to collect vastly greater quantities of data. This
then allows us to make more refined inferences about behaviours. These methods
commonly rely on a variety of pre-existing video tracking methods to detect the object. In
Kain et al. (2013), marked flies are used to simplify the video tracking. However, this
marking can affect the behaviour of the flies. In Perez-Escudero et al. (2014), object
detection-based video tracking is performed by extracting features from individual flies.
This method requires high-resolution videos in order to be able to extract useful features
and then accurately detect the flies. In White et al. (2010), behaviours of
mid-age and old flies, such as walking activity and speed, were studied. These behaviours
were then analysed in order to study the effects of Parkinson's disease. In Ofstad et al.
(2011) the visual place learning ability of flies was measured by genetically silencing
neurons in their brains. In Branson et al. (2009), as is common, movement-based video
tracking is performed in an environment with controlled (infrared) lighting. Ctrax was
subsequently introduced in 2009 as an open source, automatic tracking software for use
in fly tracking. Since then it has been used in more than 100 different studies of fly
behaviour. The main goal of Ctrax is to use automatic tracking in order to quantify
behaviour. For example, the tracking data is used in Ctrax to annotate single fly
behaviours such as walk, stop, sharp turn, crabwalk, backup, and jump. These behaviours
can be detected from the video tracking data using machine-learning techniques. Using
Ctrax has helped the ethology community by reducing the time required to gain
understanding of different fly behaviours.
A further software package is JAABA (Kabra et al. 2013), a tool for training
classifiers that act on automatic tracking data to detect different behaviours. JAABA was
introduced in 2012 by Kabra et al. and has been used in the behavioural study of a
variety of animals since then (Ohyama et al. 2013).
However, the automated approaches discussed above typically impose restrictions
on the environment in which the fly can be placed. These restrictions may deny
experimenters the ability to study behaviour in environments that capture key features of
natural fly environments. For example, we wish to perform our experiments in close-to-
natural environments by minimizing lighting constraints. This is the main reason we
cannot use Ctrax for our tracking (since it requires infrared lighting to provide a good
contrast between background and moving flies, and this lighting is not suitable for
studying the behaviours of interest in our study).
Consequently, in this chapter, we present a high-throughput automatic tracking
method which allows for more realistic fly environments (e.g. by removing restrictions
on lighting), can detect moving flies even in low resolution (unlike existing methods), is
able to successfully process data in which flies may move very little, and allows for even
higher-throughput by enabling the tracking of many flies in a single video.
In our experimental setup, we aim to collect data from multiple replicates of the
experiment at the same time. However, this means that we need to include multiple
arenas in a single video. This will divide the resolution of the camera between the arenas
in the video, which results in lower resolution images for any single arena. For this
reason, we are not able to use object detection methods like idTracker for video tracking,
which require high resolution videos to be able to extract features from the tracking
object. Therefore, we developed our own image tracking methods, described below.
2.2 Methods
2.2.1 Experiment setup
Our goal is to understand the effect of ethanol on the behavior of flies. In our
experiment, we aimed to study genetic variation in behaviour using a variety of
genotypes of D. melanogaster. In this study, flies of six different genotypes were subjected
to varying lengths of ethanol exposure. For each exposure time we ran six replicates of
the experiment, three in environments with ethanol and three in environments without
ethanol. Each replicate consisted of 12 arenas, where each arena contained two males and
one female. Each arena was a closed circular arena containing food.
We set up the experimental assays for each replicate by placing flies in cold
temperatures for a maximum of 10 minutes, during which we placed the flies on the
arenas (cold temperatures make the flies docile). Afterward, we returned the assays to
room temperature and waited 10 minutes before starting the experiment. This allowed the
flies to get familiar with their environment. For each genotype we then ran the
experiment for a (common) set of durations of ethanol exposure (10, 20 and 30 minutes).
We did this because the response time to ethanol exposure may be different in each
genotype. Videos of each experimental replicate were collected using two cameras,
where each camera covers six arenas of the experiment. The first frame of a sample video
can be seen in Figure 1.
Figure 1: The first frame of a sample video
2.2.2 Automatic video tracking of flies with movement
We now wish to analyze the movement of the flies. We wish to do this
automatically, so we used cameras to record the movements of each fly throughout the
experiment. This means recording the spatial properties of the flies in every frame of a
video. We then process this data to produce tracks for individual fly movement.
There are two approaches that can be taken to track an object. In the first
approach, distinguishable features of the object are detected in each frame of the video,
and using a similarity threshold the object is identified in every frame (Kiryati et al.
1991). This can be undertaken using a machine-learning scheme, where the features are
calculated for a training dataset consisting of multiple images of the tracked object, and
then used to train a classifier, which identifies the object in other images. This method is
called the object detection method for tracking. This method requires high-resolution
videos so that meaningful features can be extracted from the objects. In our setup we
monitor multiple replicates of the experiment in one video. As a consequence of the
reduced field of view for each specific patch, we do not have enough resolution to use
object detection methods.
The second tracking method is based on the assumption that the tracking object
moves throughout the video. We divide the pixels in each frame into ‘foreground’ and
‘background’. Here, we also make the assumption that the background remains constant
during the video and use a background detection method, which will be explained later.
The foreground pixels will contain the tracking object, which will be extracted using
thresholding on the pixel intensities.
In this project we exploit the second method for tracking. The tracking consists of
a number of parts, which will be discussed separately. The steps taken to get the data
from the video can be seen in Figure 2.
Figure 2: This figure shows a flowchart of the steps taken to detect objects in the video.
Flowchart nodes, in order: Camera; Frame Acquisition; Background Image and Current
Frame; Background Subtraction; Foreground Frame; Thresholded Foreground Frame; Blob
Detection; Merged Blob? (Yes: GMM, then Ellipse Fitting; No: Ellipse Fitting).
2.2.3 Arena Detection
In the first step, Hough transforms were used to locate arenas in the video. The
Hough transform is a feature extraction method, which has been historically used to
extract lines from an image (Duda and Hart 1972). Nowadays, it is mostly used in image
processing and computer vision in order to extract imperfect shapes or, most commonly,
circles, from images (Duda and Hart 1972; McLaughlin 1998; Kiryati et al. 1991). This
method perfectly suits our context because the shapes of the arenas are imperfect (i.e.
distorted) circles.
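The voting idea behind a circular Hough transform can be illustrated with a minimal sketch. This is not the implementation used in this work: it assumes the arena radius is known in advance and that edge points have already been extracted, and the function name `hough_circle_votes`, the synthetic outline and all parameter values are hypothetical choices made for illustration.

```python
import math
from collections import Counter

def hough_circle_votes(edge_points, radius, n_angles=360):
    """Accumulate votes for the centre of a circle of known radius.

    Each edge point votes once for every integer cell lying at distance
    `radius` from it; the true centre collects votes from many edge points.
    """
    votes = Counter()
    for x, y in edge_points:
        cells = set()
        for k in range(n_angles):
            theta = 2.0 * math.pi * k / n_angles
            cells.add((round(x - radius * math.cos(theta)),
                       round(y - radius * math.sin(theta))))
        votes.update(cells)  # one vote per cell per edge point
    return votes

# Synthetic arena outline (assumption): 60 rounded points on a circle
# centred at (50, 40) with radius 20, so the outline is slightly imperfect.
true_centre, radius = (50, 40), 20
edge_points = [(round(true_centre[0] + radius * math.cos(2.0 * math.pi * k / 60)),
                round(true_centre[1] + radius * math.sin(2.0 * math.pi * k / 60)))
               for k in range(60)]

votes = hough_circle_votes(edge_points, radius)
(best_cx, best_cy), n_votes = votes.most_common(1)[0]
print("estimated centre:", (best_cx, best_cy), "votes:", n_votes)
```

Because every edge point contributes votes independently, a distorted outline still produces a clear peak near the true centre, which is the property that makes the approach suitable for imperfect circles.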
2.2.4 Background subtraction
Background detection has had wide use in the detection of moving objects in
stationary camera fields (Stauffer and Grimson 1999; Piccardi 2004; Ardekani et al.
2012). In movement-based object detection, the aim is to detect moving objects from the
static background in the video. Background subtraction is the first step in this method. It
is based on distinguishing the static background from the non-static foreground. We use a
global Gaussian average method to detect the background of the video. In this method, a
Gaussian probability density function is assumed to describe the distribution of the
intensity level of each pixel throughout the video (Piccardi 2004).
The mean intensity of the fitted Gaussian distribution, 𝜇 , is calculated by fitting
the distribution to the observed pixel intensity across all frames. This is effective because
flies represent a small fraction of each image from the video. We repeat this method for
all 800 x 600 pixels in the video to form a background image. Figure 3 (top row) shows a
sample frame (left) and the extracted background image (right) for an example video.
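As a minimal sketch of this step, note that the maximum-likelihood estimate of the mean of a Gaussian is the sample mean, so the background can be approximated by averaging each pixel over all frames. The code below assumes frames are stored as nested Python lists of grey-level intensities; the function name `estimate_background` and the toy video are hypothetical, and a practical implementation would likely use an array library.

```python
def estimate_background(frames):
    """Per-pixel mean intensity over all frames (each frame is a 2-D list)."""
    n_frames = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    return [[sum(frame[r][c] for frame in frames) / n_frames
             for c in range(width)]
            for r in range(height)]

# Toy video (assumption): bright background (200) and one dark "fly" pixel
# (50) that occupies a different pixel of row 1 in each of 10 frames.
frames = []
for t in range(10):
    frame = [[200] * 10 for _ in range(3)]
    frame[1][t] = 50
    frames.append(frame)

background = estimate_background(frames)
print(background[0][0], background[1][3])
```

In the toy video each pixel of row 1 is covered by the fly in only one frame out of ten, so its estimated background stays close to the true value; this is the sense in which the method relies on flies covering a small fraction of each image.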
2.2.4.1 Correcting background using histogram methods (for places
with little movement)
The method described in the previous section results in relatively few errors in
experiments in which the objects have high motility, for which it returns a close to perfect
background which can then be used down the pipeline for object detection. But in
experiments in which one or more objects have little to no movement, this method shows
reduced performance. The problem is that a tracking object that seldom moves will be
considered as background and cannot be detected in the later steps of the object detection
pipeline. Figure 4 (right) shows a sample of a background for a sample video that
contains flies that showed little to no movement.
In order to avoid this issue, we use a histogram method to define the background.
This exploits the assumption that pixels that correspond to flies that aren’t moving
represent a small proportion of pixels in the overall background image. Specifically, we
proceed as follows. First, for each arena we consider only the pixels inside the circle
corresponding to the arena. We then replace all pixel intensities that fall in the lower 1%
quantile of the distribution of all pixel intensities inside that arena, with the pixel
intensity corresponding to that 1% quantile. We repeat this process for all arenas in our
experiment to get the full background image. Figure 4 (left) shows such a corrected
background image. We then use this background to detect foreground objects and
perform the tracking of fly movement.
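The correction can be sketched as follows, under stated assumptions: the arena is described by a centre and radius (for example, as found by arena detection), the image is a nested list of intensities, and the 1% quantile is taken as a simple order statistic. The function name `correct_background` and the toy image are hypothetical.

```python
def correct_background(background, centre, radius, q=0.01):
    """Raise the darkest pixels inside one arena to the q-quantile intensity."""
    cx, cy = centre
    inside = [(r, c)
              for r in range(len(background))
              for c in range(len(background[0]))
              if (c - cx) ** 2 + (r - cy) ** 2 <= radius ** 2]
    values = sorted(background[r][c] for r, c in inside)
    # Simple empirical quantile (assumption): an order statistic of the
    # intensities inside the arena.
    floor = values[int(q * (len(values) - 1))]
    corrected = [row[:] for row in background]
    for r, c in inside:
        if corrected[r][c] < floor:
            corrected[r][c] = floor
    return corrected

# Toy background (assumption): uniform intensity 200 with a stationary fly
# (two dark pixels) that was wrongly absorbed into the background.
background = [[200] * 21 for _ in range(21)]
background[10][10] = 60
background[10][11] = 60
background[0][0] = 10  # dark pixel outside the arena, left untouched

corrected = correct_background(background, centre=(10, 10), radius=10)
print(corrected[10][10], corrected[10][11], corrected[0][0])
```

The sketch only works when the stationary fly occupies fewer pixels than the chosen quantile of the arena, which mirrors the assumption stated above.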
Figure 3: Shown here are frames illustrating the steps taken to obtain a thresholded
foreground image. A typical image is shown in the first panel, while the second contains
that image's background. We first extract the foreground of this same frame of a video,
and apply a threshold to obtain the location of the ‘blob’s corresponding to the fly
locations.
Figure 4: This figure shows the background image for an experiment in which flies have
little to no movement (right) alongside the corrected background image (left)
2.2.5 Extracting foreground
The foreground for each frame was extracted using the estimated background.
(2.1) $F(i) = \max(0, \mu - X(i))$
Equation (2.1) is applied pixel-by-pixel, where X(i) is the measured pixel intensity
and F(i) is the foreground pixel for that pixel in that frame. Note that, in our application,
foreground objects (i.e. flies) are darker than the background, and so will have lower
values of X(i).
To remove noise, a threshold is applied to the calculated F(i) values:
(2.2) $TF(i) = I(F(i) > \tau)$
where I(.) is the indicator function and τ is the threshold. TF(i) is an indicator
function which shows whether the pixel corresponds to foreground or background. Figure
3 (bottom row) shows the foreground image (left) and the thresholded foreground image
(right) for a sample frame of video. Several methods can be used to choose the threshold
τ. Here we used a cross-validation approach explained in the next section (Sezgin and
Sankur 2004). We then extracted the set of connected non-zero components of the set of
TF values for each frame.
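Equations (2.1) and (2.2), followed by the extraction of connected components, can be sketched as below. The sketch assumes nested-list images and 4-connectivity between pixels; the connectivity rule is an assumption, since it is not specified in the text, and the function names and toy frame are hypothetical.

```python
from collections import deque

def foreground(frame, background):
    """Equation (2.1): F(i) = max(0, mu - X(i)), applied pixel by pixel."""
    return [[max(0, mu - x) for x, mu in zip(frow, brow)]
            for frow, brow in zip(frame, background)]

def threshold(fg, tau):
    """Equation (2.2): TF(i) = 1 if F(i) > tau, else 0."""
    return [[1 if v > tau else 0 for v in row] for row in fg]

def connected_components(mask):
    """Group non-zero pixels into blobs using 4-connectivity (assumption)."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    blobs = []
    for r in range(height):
        for c in range(width):
            if mask[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, blob = deque([(r, c)]), []
                while queue:
                    y, x = queue.popleft()
                    blob.append((y, x))
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                blobs.append(blob)
    return blobs

# Toy frame (assumption): background 200, a two-pixel fly, a one-pixel fly
# and one slightly darker noise pixel that should not survive thresholding.
background = [[200] * 6 for _ in range(5)]
frame = [row[:] for row in background]
frame[1][1], frame[1][2] = 60, 60
frame[3][4] = 70
frame[0][5] = 190

blobs = connected_components(threshold(foreground(frame, background), tau=50))
print(len(blobs), sorted(len(b) for b in blobs))
```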
2.2.6 Threshold for background subtraction
The threshold, $\tau$, is determined in the following way. First, we choose a set of
possible values, $M = \{m_1, m_2, \ldots, m_n\}$, for the threshold. Each value $m \in M$ was
then separately applied and the thresholded foreground image was found for a large
number of randomly sampled frames. To determine a performance score these images
were then analysed manually and the score $s_{mi}$ for threshold $m \in M$ was defined for
frame $i$ as follows:
(2.3) $s_{mi} = \frac{n_{\text{flies-found}} - n_{\text{merged}} - n_{\text{non-flies}}}{n_{\text{frame-flies}}}$
where $n_{\text{flies-found}}$ represents the number of flies detected as a single connected
component in the thresholded foreground frame, $n_{\text{merged}}$ is the number of flies
found as merged blobs (blobs with more than one fly in a connected
component), $n_{\text{non-flies}}$ is the number of connected components found that are not
flies, and $n_{\text{frame-flies}}$ is the manually curated number of flies in the original frame
(which is always fixed for these experiments). We then define the global score for
threshold value $m$, $s_m$, to be
(2.4) $s_m = \operatorname{mean}_i(s_{mi},\ i \in V)$
where $i$ indexes frames in our test frame set $V$. The threshold is then chosen to be:
(2.5) $\tau = \arg\max_m (s_m)$
See Figure 5 for examples of thresholded images with different threshold values.
Figure 5: This is an example of how thresholding affects the extracted data for an
illustrative frame. The left image shows the results of applying a low threshold. The
middle frame shows results of using a threshold that is too high, such that data are lost.
The right image shows results using a well-chosen threshold
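The selection rule in Equations (2.3) to (2.5) can be sketched as follows. The manual annotation counts in the example are invented purely for illustration and are not results from this work; the function names and the dictionary layout are likewise hypothetical.

```python
def frame_score(n_flies_found, n_merged, n_non_flies, n_frame_flies):
    """Equation (2.3): per-frame score for one candidate threshold."""
    return (n_flies_found - n_merged - n_non_flies) / n_frame_flies

def choose_threshold(annotations, n_frame_flies=3):
    """Equations (2.4) and (2.5): mean score per threshold, then argmax.

    `annotations` maps each candidate threshold m to a list of manually
    curated (n_flies_found, n_merged, n_non_flies) tuples, one per frame.
    """
    scores = {
        m: sum(frame_score(*counts, n_frame_flies) for counts in frames) / len(frames)
        for m, frames in annotations.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical annotations for three candidate thresholds (illustration only).
annotations = {
    20: [(1, 2, 3), (2, 1, 2)],  # low threshold: merged blobs and noise blobs
    50: [(3, 0, 0), (3, 0, 0)],  # all three flies found as separate blobs
    90: [(1, 0, 0), (2, 0, 0)],  # high threshold: flies are lost
}

tau, scores = choose_threshold(annotations)
print(tau, scores)
```

The score rewards flies found as separate blobs and penalises both merged blobs and spurious components, so thresholds that are too low or too high both receive lower scores.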
2.2.7 Gaussian mixture models
Having defined the thresholded foreground of a frame, Gaussian mixture models
(GMMs) were then used to de-convolve blobs that represent more than one fly. A GMM,
$P(x)$, is a probability density function that consists of a weighted sum of multiple
Gaussian densities (Sheng et al. 2005). It is defined as:
(2.6) $P(x) = \sum_{i=1}^{M} w_i N(x \mid \mu_i, \Sigma_i)$
Here, $M$ represents the number of components in the mixture of Gaussians, $x$ is
the D-dimensional vector of pixel values, $P(x)$ represents the probability density function
of $x$, the $N(x \mid \mu_i, \Sigma_i)$ are the Gaussian mixture components, and the $w_i$ are the weights
assigned to each component of the mixture (so that $\sum_{i=1}^{M} w_i = 1$). Here we used a mixture
of two-dimensional Gaussian distributions, since the shape of a fly is reasonably captured
by an ellipse.
In our experiments, there will always be three flies in each arena. Therefore, if
there are two blobs in the thresholded foreground of a given frame of video, rather than
the expected three, the biggest blob was found and labelled as the merged blob. A two
component GMM was then fitted to that blob (or if there was just one blob, we fitted a
three component GMM). An EM algorithm was used to fit the GMMs. This was carried
out using the ‘GMdistribution’ function in Matlab. The EM algorithm is prone to getting
stuck at local maxima. To avoid this issue, the prediction of the centers of the closest
blobs in the previous frame was used as the starting point for the EM algorithm. After
fitting the GMMs, the Eigen decomposition of the covariance matrix 𝛴 of the 2-D
Gaussian distributions was used to find the properties of the ellipse fitted to each fly.
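The final step, going from a fitted 2-D covariance matrix to ellipse properties, can be sketched with the closed-form eigendecomposition of a symmetric 2x2 matrix. This is an illustrative Python sketch rather than the Matlab code used in this work; the function name `ellipse_from_covariance` and the scale factor `n_std` (how many standard deviations the semi-axes span) are assumptions.

```python
import math

def ellipse_from_covariance(sxx, sxy, syy, n_std=2.0):
    """Semi-axes and orientation of the ellipse for covariance [[sxx, sxy], [sxy, syy]].

    The eigenvalues of a symmetric 2x2 matrix are
    (sxx + syy) / 2 +/- sqrt(((sxx - syy) / 2)^2 + sxy^2),
    and the major axis makes the angle 0.5 * atan2(2 * sxy, sxx - syy)
    with the x axis.
    """
    half_trace = (sxx + syy) / 2.0
    root = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    lam_major = half_trace + root
    lam_minor = max(half_trace - root, 0.0)  # guard against rounding below zero
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return n_std * math.sqrt(lam_major), n_std * math.sqrt(lam_minor), angle

# Axis-aligned example: variance 4 along x and 1 along y.
major, minor, angle = ellipse_from_covariance(4.0, 0.0, 1.0)

# Same eigenvalues (4 and 1), rotated by 45 degrees.
major_r, minor_r, angle_r = ellipse_from_covariance(2.5, 1.5, 2.5)
print(major, minor, angle, major_r, minor_r, angle_r)
```

The orientation returned here corresponds to the angle between the major axis of the fitted ellipse and the x axis.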
2.2.8 Distinguishing males from females
The female flies in these experiments are white-eyed yellow body colour mutants.
These mutants have the property of being lighter in colour than males. After detecting the
three flies in our arena, males and females were distinguished by using the colour
intensity of the pixels of the image of each fly. In the first step, for each fly image in a
given frame we find the minimum of the pixel intensities (i.e. the darkest pixel). We then
label as the female the fly whose minimum intensity is highest (i.e. the lightest fly).
This is carried out for each frame, and the method performs with a relatively low error rate
in frames without merged blobs. In the presence of a merged blob we simply detect
whether the female fly is present in the merged blob. This is done by comparing the pixel
intensities of the two blobs and determining which one contains the female fly with the
use of the method just described.
We then considered the times during which a fly corresponds to a single ‘blob’
between instances of merged blobs to smooth these frame-by-frame calls of sex. A
reasonable assumption is that during the period that a blob corresponds to a single fly, its
sex should not change, so we measured the proportion of those frames in which a blob
was labelled as male (female). After performing this step for each blob, the blob with the
highest proportion of female labels was assigned as the female fly for that time period.
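The frame-by-frame call and the smoothing over a merge-free interval can be sketched as below. The sketch assumes that blob identities are already consistent within the interval and that each blob is given as a list of its pixel intensities; the function names, track labels and intensity values are hypothetical.

```python
def female_in_frame(blobs):
    """Per-frame call: the blob whose darkest pixel is lightest is the female.

    `blobs` maps a track label to the list of pixel intensities of that fly.
    """
    return max(blobs, key=lambda label: min(blobs[label]))

def smooth_sex_labels(frames):
    """Assign one sex per track over an interval without merged blobs.

    The track labelled female in the largest share of frames is called
    female for the whole interval; the remaining tracks are called male.
    """
    female_counts = {}
    for blobs in frames:
        for label in blobs:
            female_counts.setdefault(label, 0)
        female_counts[female_in_frame(blobs)] += 1
    female = max(female_counts, key=female_counts.get)
    return {label: ("female" if label == female else "male")
            for label in female_counts}

# Toy interval of three frames (assumption); the second frame is noisy and
# would wrongly call track "a" female if taken on its own.
frames = [
    {"a": [40, 60], "b": [35, 50], "c": [90, 120]},
    {"a": [95, 100], "b": [30, 45], "c": [80, 110]},
    {"a": [42, 70], "b": [38, 55], "c": [88, 130]},
]

labels = smooth_sex_labels(frames)
print(labels)
```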
2.2.9 Tracking
Tracking consists of connecting the blobs across frames in the video. We do this
using a Hungarian matching algorithm (Kuhn 1955) that we have modified to also use the
male/female labels in each frame.
2.2.10 Hungarian algorithm
The Hungarian algorithm is a well-known matching algorithm in the field of
computer vision and machine learning. It finds the best global matching between two sets
of observations, where “best” is defined as the match that minimizes the distance between
pairs of observations. The algorithm was introduced by Kuhn in 1955 and has been used
in many matching problems since then. Here we use it to match the flies in consecutive
frames and thereby produce trajectories for each fly. We now give details.
2.2.11 Adapting Hungarian algorithm with cost function
In order to determine the movement of a fly, we need to pair blobs across frames
(i.e. across time). A modified version of the Hungarian algorithm was applied throughout
the video to find the optimal correspondences of blobs in consecutive frames and to
thereby record the movement of each blob.
The Hungarian algorithm minimizes the Euclidean distance between pairs of
blobs (modelled by ellipses) fitted in consecutive frames. In our case, we applied the
standard Hungarian algorithm without modification in cases in which consecutive frames
contained no merged blobs, thus choosing the matching that minimized the Euclidean
distance between blobs. Matching using the standard Hungarian algorithm is less reliable
in cases in which a frame contains a merged blob and there are typically two local
maxima that correspond to the two matchings for assigning flies to the merged blob.
Therefore, an additional step was applied to choose between these two maxima, in which
we constructed a cost function that includes two terms: the first one penalizes matches of
blobs that have been labelled with different sexes (the ‘sex mismatch’ cost), while the
second term is proportional to the angular difference of the major axis of the ellipse
(fitted by the GMM) corresponding to matched flies in two consecutive frames (the
‘angle difference’ cost). The final matching of flies to blobs was chosen to be the one
with the lowest cost function value (See Algorithm 1 for a complete description).
Algorithm 1
Definitions:
Let $\{X_{ij} \mid i \in \{1, \ldots, N\}\}$ be the object of the flies in frame i containing the features of the
ellipse fitted to the blob of the fly. These features are:
(x,y) = the coordinates of the centroid of the ellipse.
ang() = the angle between the major axis of the fitted ellipse and the x axis.
sex() = the sex of the fly (male, female).
f = a flag indicating the presence of a merged blob in the frame.
Let $f_i = 1$ indicate the presence of a merged blob in frame i.
Let $\{M_{ij}\}$, an $N \times M$ matrix, contain the output trajectories of flies.
Let ang_cost($X_i$, $X_{i-1}$) be the sum of the angular differences of matched flies between
frame i and frame i-1, i.e. ang_cost($X_i$, $X_{i-1}$) = sum(abs(ang($X_i$) - ang($X_{i-1}$)))
Let sex_cost($X_i$, $X_{i-1}$) be the number of incorrect matchings of male/female flies between
frame i and frame i-1, i.e. sex_cost($X_i$, $X_{i-1}$) = M - sum(I(sex($X_i$) = sex($X_{i-1}$)))
Initiate $M_1 = X_1$
Loop:
for i=2:N
    If ($f_i = 0$)
        Let $X'_i$ be the Hungarian matching of the set $X_i$ to $X_{i-1}$.
        $M_i = X'_i$.
    end
    If ($f_i = 1$)
        Let $X'_i$ and $X''_i$ be the two best Hungarian matchings of the set $X_i$ to $X_{i-1}$.
        totcost' = $\alpha$ sex_cost($X'_i$, $X_{i-1}$) + $\beta$ ang_cost($X'_i$, $X_{i-1}$)
        totcost'' = $\alpha$ sex_cost($X''_i$, $X_{i-1}$) + $\beta$ ang_cost($X''_i$, $X_{i-1}$)
        $M_i$ = whichever of $X'_i$, $X''_i$ attains min(totcost', totcost'').
    end
end for
Output M
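One step of the loop in Algorithm 1 can be sketched as follows. For clarity the sketch enumerates all permutations instead of running the Hungarian algorithm; with three flies per arena this brute-force search returns the same minimum-distance matching, and it also gives the second-best matching directly. The data layout, the function names and the default weights `alpha` and `beta` are assumptions, and angles are compared with a plain absolute difference as in the definitions above.

```python
import math
from itertools import permutations

def distance_cost(prev, curr, perm):
    """Total centroid distance when prev[i] is matched to curr[perm[i]]."""
    return sum(math.hypot(prev[i]["xy"][0] - curr[j]["xy"][0],
                          prev[i]["xy"][1] - curr[j]["xy"][1])
               for i, j in enumerate(perm))

def sex_cost(prev, curr, perm):
    """Number of matched pairs whose sex labels disagree."""
    return sum(prev[i]["sex"] != curr[j]["sex"] for i, j in enumerate(perm))

def ang_cost(prev, curr, perm):
    """Sum of absolute differences in major-axis angle over matched pairs."""
    return sum(abs(prev[i]["ang"] - curr[j]["ang"]) for i, j in enumerate(perm))

def match_frame(prev, curr, merged, alpha=1.0, beta=1.0):
    """Return perm such that prev[i] continues as curr[perm[i]]."""
    ranked = sorted(permutations(range(len(curr))),
                    key=lambda p: distance_cost(prev, curr, p))
    if not merged:
        return ranked[0]  # plain minimum-distance matching
    # Merged blob present: choose between the two best distance matchings
    # using the sex-mismatch and angle-difference costs.
    return min(ranked[:2],
               key=lambda p: alpha * sex_cost(prev, curr, p)
               + beta * ang_cost(prev, curr, p))

# Toy frames (assumption): flies 0 and 1 have come together, so distance
# alone is ambiguous and slightly favours the wrong assignment.
prev = [{"xy": (0.0, 0.0), "ang": 0.0, "sex": "male"},
        {"xy": (10.0, 0.0), "ang": 1.0, "sex": "female"},
        {"xy": (30.0, 30.0), "ang": 0.5, "sex": "male"}]
curr = [{"xy": (4.8, 0.0), "ang": 1.0, "sex": "female"},
        {"xy": (5.2, 0.0), "ang": 0.1, "sex": "male"},
        {"xy": (30.0, 31.0), "ang": 0.5, "sex": "male"}]

print(match_frame(prev, curr, merged=False), match_frame(prev, curr, merged=True))
```

In the toy frames the distance-only matching pairs the previous male with the current female, whereas the two-term cost selects the alternative matching in which sex labels and ellipse orientations agree.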
2.3 Results
2.3.1 Tracking validation
In order to assess the performance of our tracking approach, we manually
annotated a test set of video data. This set consisted of a total of 36 videos, three videos
from each treatment. From each video, we randomly picked ten 20-second intervals of
video and recorded instances of two different measures of error; track switching accuracy
and sex identification accuracy. Sex identification errors occur when the female fly is
detected as a male fly (and vice-versa). Track switching errors occur when a track that is
supposed to annotate the movement of a single fly switches between two flies.
We recorded error rates in two different contexts: cases in which the flies were
visually distinct and cases in which the flies came into close contact with each other,
where merged blobs are likely to appear. The results for accuracy of the tracking can be
seen in Table 1. We see that inaccuracy in recording closely positioned flies was less than
5% for sex identification and 6% for track switching. When flies were not in close
contact, error rates were very low.
Table 1: The results for the validation of the tracking method
                         Track Switching    Sex Identification
Distinct Flies           99%                99%
Flies in close contact   94%                95%
2.3.2 Analyzing fly movement using tracking data
Our method was applied to track flies and extract tracking data from 1013 arenas
from our experiment. We then used the tracking data to study the effect of environment
and genotype on male and female movement. A complete report on the results of the full
study can be found in Chapter 3.
2.4 Discussion
In this study we introduced a high-throughput automatic tracking method to track
the movement of flies. Crucially, our method performs well in low-resolution videos in
arenas with little to no restriction on experimental environment. This allows high-
throughput analysis of behavioural data derived from more realistic (i.e. less constrained)
environments. In addition, our method allows experimenters to restrict the size of the
variation due to non-biological experimental factors in the behavioural data, since
multiple trials (i.e. arenas) can be performed in the same experimental unit. This can
allow for more precise statistical analysis and a better understanding of the biological
influences on the behavioral phenotype of interest.
Most experimental restrictions can affect the behavior of animals in unknown
ways and interfere with analysis of the biological questions of interest. For this reason,
studying animals in experimental situations that are as close as possible to being ‘natural’
is of great importance. Our method provides a platform for tracking flies (and other small
animals) with little to no restrictions. We also introduced a correction to the background
subtraction algorithm to allow for the detection of flies that undergo little to no
movement. This can be very important in the study of some behaviors. For example, we
might be interested in the behavior of different flies in an arena in which mating happens.
In these experiments, two flies can spend a large percentage of time mating and will
not be moving. This can cause the regular background subtraction methods to fail. Our
method allows for the tracking of flies before and during mating.
We will show in Chapter 3 that with the use of our tracking method, we can study
locomotory behaviors of flies in order to answer complex questions regarding the effect
of environment and genotype on fly behavior.
Chapter 3
Studying effects of social environment on
the movement of male and females in
Drosophila melanogaster
In this chapter we use the automatic tracking software described in Chapter 2 to
study effects of social environment on the movement of male and female D.
melanogaster. Much of the text for this chapter is taken from a paper we published in
Evolution (Signor et al. 2017).
3.1 Introduction
The expectation that directional selection should remove variation from
populations has long been at odds with the observation that there is abundant variation for
fitness related traits (Barton 1990; Houle 1998; T. Johnson and Barton 2005). There are
many hypothesized mechanisms for the maintenance of variation: balancing selection
(Connallon and Clark 2014), sexually antagonistic selection (Foerster et al. 2007),
context specific fitness (Mojica et al. 2012; Miller and Svensson 2014), and epistasis
(Arnqvist et al. 2014) among other possible mechanisms. Context specific fitness can
include spatially (Johnson et al. 2013) or temporally varying selection (Bergland et al.
2014), and social or sexual selection (Lyon and Montgomerie 2012). Here we will focus
on social context, in the form of indirect genetic effects (IGEs). IGEs occur when the
phenotype of a focal individual changes in response to alterations in the genetic
composition of its social environment (Wolf et al. 1998). For example, if trait values of
individuals in a population were decomposed into two additive genetic components the
first would originate from the focal individual, termed the direct genetic effect. The
remainder would come from the genes of its social partners, and this would be the IGE
(Moore et al. 1997; Bijma 2014; Dingemanse and Araya-Ajoy 2015). This can impact the
evolution of other traits, their genetic architecture, response to selection, and the direction
of selection (Moore et al. 1997; Wolf et al. 1998; Wolf 2000).
In this chapter we will focus on two aspects of IGEs: the effect of the abiotic
environment on IGEs, and the importance of IGEs for intersexual interactions. We use a
description of IGEs termed the ‘coefficient of interaction’, or Ψ (Moore et al. 1997;
Bleakley and Brodie 2009). Ψ is measured as the partial regression coefficient of the
behavior of an individual on its social partner. When Ψ is measured on standardized traits
it varies between -1 and 1 which provides an easily interpretable means of measuring the
strength and direction of social effects. For example, if the majority of movement in a
focal individual is due to the amount of movement in its social partners this would result
in a non-zero Ψ. Replication can then establish the genetic contribution to a non-zero Ψ.
We use Drosophila, as it easily lends itself to genetic replication in the laboratory
through the use of inbred lines. To investigate the effect of the abiotic environment on
IGEs and the importance of IGEs for intersexual interactions, we will estimate Ψ in two
abiotic environments for a sexually antagonistic trait, locomotion. Variation in Ψ between
abiotic environments would result in spatially varying social effects, which could
maintain variation for a trait (Harris et al. 2008).
The abiotic environment is expected to have large effects on IGEs, as it does on
the expression of many trait values (Hayes et al. 1993; David et al. 1994; Gurganus et al.
1998). It has been shown to be important for IGEs in one study (Bailey and Zuk 2012). To
investigate the abiotic context-specific effects of IGEs we varied our abiotic environment,
using ethanol as our variable because it affects a range of behaviors including
locomotion, it is a common component of the fermenting fruit that makes up the primary
habitat of D. melanogaster, and it has important fitness effects (Gibson et al. 1981;
Dorado and Barbancho 1984; Gibson and Wilks 1988). Further, ethanol concentrations
vary across natural substrates, including the fermented and alcoholic wine seepages that
are known habitats of D. melanogaster (Gibson et al. 1981; Milan et al. 2012). Areas of
higher ethanol concentrations provide caloric benefits and increased resistance to
parasitism, as well as negative effects such as slowed development time (McClure et al.
2011; Pohl et al. 2012; Milan et al. 2012).
While the potential importance of the abiotic environment for context specific
fitness is clear, we also wanted to investigate an interaction for which the importance of
the abiotic environment is less clear, sexually antagonistic selection. Sexually
antagonistic selection occurs whenever the fitness optimum for a given trait differs
between males and females, and may be responsible for much of the fitness variation in
adult Drosophila (Chippindale et al. 2001; Foerster et al. 2007; Cox and Calsbeek 2009;
Innocenti and Morrow 2010; Harano et al. 2010). Many sexual traits have different sex-
specific optima (Rostant et al. 2015), so if IGEs can interfere with the expression of sex-
specific phenotypes this could have large effects on the outcomes of evolution. For
example, if males that move more cause their social partners to move more this could
decrease the fitness of their social partners. If the abiotic environment has a large effect
on these sexually antagonistic interactions then the effect on social partners will be very
fine grained, resulting in the maintenance of variation in both sexes (Long and Rice 2007;
Cox and Calsbeek 2009).
Here we will integrate these two concepts by focusing on variation in a sexually
antagonistic trait (locomotion) that is affected by IGEs. In Drosophila melanogaster
locomotion is an important component of fitness, and it is highly sexually dimorphic,
with males being nearly three times more active than females (Long and Rice 2007). In
D. melanogaster, males are selected for increased activity because their fitness is
primarily determined by their ability to locate and court females (Bateman 1948;
Partridge et al. 1987; Jordan et al. 2006). In one instance, male genotypes that showed
high activity when not courting were the most reproductively successful in the population
(Partridge et al. 1987), while Long and Rice (2007) also found that males with higher
activity levels sired more offspring. In addition, it has been shown that females are more
likely to mate with males from populations selected for increased activity, and that the
inverse is true in low activity populations (Jordan et al. 2006).
Females are selected to behave in a different manner as a result of their own sex-
specific optima. Numerous studies have demonstrated that in D. melanogaster females
each successive mating results in a reduction in survival, with little evidence that this is
offset by increases in offspring quality (Tompkins et al. 1982; Brown et al. 2004; Orteiza
et al. 2005; Kuijper et al. 2006; Fiumera, Dumont, and Clark 2006; Stewart et al. 2008;
Slatyer et al. 2012). Higher activity is generally detrimental for females, with increasing
activity levels associated with lower fecundity (Long and Rice 2007). This may be
because female movement is a stimulus for male courtship, resulting in interference
during oviposition, and an increase in the energy spent by females to reject males
(Maklakov and Arnqvist 2009). In order to lessen the degree of male interference
stimulated by female locomotion, selection then results in decreased movement in
females (Long and Rice 2007).
We use movement in D. melanogaster, measured in groups of two males of the
same genotype and one female of another genotype, to create an exceedingly detailed
picture of IGEs in different abiotic environments. We will establish that there are IGEs
for locomotion in D. melanogaster by measuring Ψ. We will also consider the possibility
that Ψ are context dependent and vary between abiotic environments. It has been
demonstrated that IGEs are context dependent in different biotic environments
(Chenoweth et al. 2010), but little is known about how Ψ varies between abiotic
environments. We will discuss the potential impact of context specific Ψ and their
possible role in sexual antagonism (Wolf et al. 1998).
3.2 Methods
3.2.1 Experimental Setup
We used six male genotypes and a single female genotype. Female genotype was
invariant across all assays, while male genotype was the same within an assay and
different across assays. Flies were assayed in groups of two males and one female. The
flies were assayed within chambers that contained 12 isolated, circular arenas with a
diameter of 2.54 cm (VWR cat. no. 89093-496). The arenas contained either standard
grapefruit medium or medium in which 15% of the water has been replaced with ethanol.
Flies were filmed for 10, 20, or 30 minutes for three replicates of each of two conditions
(i.e. each genotype was filmed 18 times with 12 arenas per filming). The different
durations were used to obtain fly samples that had been exposed to ethanol for different
amounts of time.
The flies were sedated through exposure to cold for 10 minutes, and placed in the
behavioral chambers with a paintbrush (two males and one female per arena). They were
allowed to acclimate for 10 minutes prior to commencing recording using PointGrey
Grasshopper digital cameras. This acclimation period is standard and is long enough for
the initial startle response to ethanol to have concluded (Cho et al. 2004; Grosjean et al.
2011; Li et al. 2015). Recording was automated using VideoGrabber
(http://code.google.com/p/video-grabber/), and set-up of the assays was facilitated with
FlyCapture (PointGrey). Arenas in which flies were damaged in any way were excluded
from the analysis. The assays were conducted within a two hour window after dawn, the
period in which the flies are most active (Klarsfeld et al. 2003; Chiu et al. 2010; Allada
and Chung 2010). Replicates were conducted randomly across days under standardized
conditions (25 °C, 70% humidity).
3.2.2 Fly lines
The six male genotypes came from natural genotypes collected from an orchard in
Winters, California in 1998 and were made isogenic by at least 40 generations of full
sibling inbreeding (Yang and Nuzhdin 2003; Campo et al. 2013). Flies in natural
conditions are more heterozygous than inbred laboratory genotypes, so each genotype
was crossed to a reference strain (w1118, Bloomington stock number 3605) to create the
F1 flies used in the behavioral assays. With this design we have the ability to replicate
behavioral observations because the flies resemble wild flies but are genetically identical
(Wahlsten 2001; Brakefield 2003). Flies were reared on a standard medium at 25 °C with
a 12-h light/12-h dark cycle. To standardize offspring quality all F1 flies were produced
from females of the same age and held at the same density (10 individuals of each sex per
vial). Male F1 flies used for the assays were collected as virgins and reared in single sex
vials at a density of 24-30 individuals per vial.
The female genotype used as the focal individual in each group was an inbred
laboratory strain y1 w1 (Bloomington stock number 1495) that allows us to treat multiple
individuals as replicates of a genotype. The w1 refers to a mutation causing white eyes,
and while vision may be important for males in deciding whether or not to approach an
object, in D. melanogaster it does not have an important role in courtship or receptivity
(Agrawal et al. 2014; Bontonou and Wicker-Thomas 2014). Females were produced and
reared in the same manner as described for males. Both males and females were aged
three to four days prior to observation.
For the y1 w1 focal individuals we wanted to reduce unnecessary variation in
female phenotype and more closely mimic natural situations where most females will be
mated. To do this we added three to five males of a standard genotype to the virgin
females a day before the behavioral assays were conducted. While it is possible that not
all females were mated, this effect will be random across genotypes and would contribute
only to noise in the data.
3.2.3 Automatic tracking
We used a system of automatic tracking of fly movement to produce the data for
this paper (See Chapter 2 for a full explanation of the tracking algorithm). In brief, we
began by performing a background-subtraction step for each frame of the video data. The
resulting foreground image was then thresholded to reduce noise, after which the position
of flies was determined using Gaussian mixture models. We then formed movement
‘tracks’ by matching detected flies in consecutive frames. For each experiment, we
recorded the movement of flies as the number of pixels traversed each second (a pixel
equals 0.127 mm).
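The movement measure described above can be sketched as follows. This is a minimal Python illustration, assuming a track is available as one (x, y) pixel position per frame; the function name, the frame rate and the example track are hypothetical, and only the 0.127 mm per pixel conversion is taken from the text.

```python
import math

MM_PER_PIXEL = 0.127  # conversion factor stated in the text


def movement_per_second(positions, fps):
    """Sum frame-to-frame displacements (in pixels) within each second.

    positions: list of (x, y) pixel coordinates, one per frame.
    fps: frames per second (an assumed parameter; not given in the text).
    """
    per_second = []
    last = len(positions) - 1
    for start in range(0, last, fps):
        stop = min(start + fps, last)
        dist = sum(math.dist(positions[i], positions[i + 1])
                   for i in range(start, stop))
        per_second.append(dist)
    return per_second


# Hypothetical track at 2 frames per second: 3 px right, 4 px up, repeated.
track = [(0, 0), (3, 0), (3, 4), (6, 4), (6, 8)]
pixels = movement_per_second(track, fps=2)
millimetres = [round(p * MM_PER_PIXEL, 3) for p in pixels]
```

In this toy track the fly covers 7 pixels in each second, which corresponds to roughly 0.889 mm per second.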
3.2.4 Movement dataset
The videos were divided into 5-minute intervals and the rate of movement was
calculated as the average across each interval (6 windows spanning 30 minutes). The two
male measurements were averaged within each arena, for each time point, as they are the
same genotype. Thus, for each five-minute interval we have two measurements, one for
males and one for females (Figure 1). Note that all arenas in which mating occurred were
excluded from the analysis.
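The construction of the movement dataset can be illustrated with a short Python sketch. It assumes per-second movement series for the two males and the female of one arena; the function names and the toy series are hypothetical, and a 2-second window is used in the example only to keep it small.

```python
from statistics import mean

WINDOW_S = 5 * 60  # five-minute interval, in seconds


def window_means(per_second, window=WINDOW_S):
    """Average a per-second movement series over consecutive full windows."""
    n_windows = len(per_second) // window
    return [mean(per_second[w * window:(w + 1) * window])
            for w in range(n_windows)]


def arena_summary(male_a, male_b, female, window=WINDOW_S):
    """Return one (male, female) pair of mean movement values per window.

    The two males share a genotype, so their window means are averaged.
    """
    a, b, f = (window_means(s, window) for s in (male_a, male_b, female))
    males = [(x + y) / 2 for x, y in zip(a, b)]
    return list(zip(males, f))


# Hypothetical 4-second series, summarized with a 2-second window.
summary = arena_summary([2.0, 4.0, 6.0, 8.0],
                        [0.0, 2.0, 4.0, 6.0],
                        [1.0, 1.0, 3.0, 3.0],
                        window=2)
```

Each element of `summary` holds the two measurements described in the text for one interval: the averaged male value and the female value.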
3.2.5 Validation of methods
To validate our tracking algorithms, we visually inspected 360 20-second
intervals from randomly selected videos, comparing the video with the inferred tracking
data (See Figure 1 for an example of fly movement over time). Error frequency was
evaluated in two ways: i) the frequency with which a fly track switched between two flies
(i.e. the software changed its identification of a fly erroneously), and ii) the frequency of
sex misidentification (for results, see Chapter 2, Section 2.3.1).
Figure 1: The movement of three flies in an arena over a time period of 1000 seconds.
The x-axis describes time in seconds. The y-axis shows the movement of a fly in pixels.
The flies are represented by different colors; males (blue) and females (red).
3.2.6 Analysis of movement using mixed effect models
This analysis includes measurements of movement from 1013 arenas. Each arena
has measurements for every five-minute interval, thus a trial that was run for 30 minutes
has six observations per arena (72 observations per chamber, six for each of 12 arenas).
After removing outliers, we applied a logarithmic transformation to measured movement
values to improve normality of the data. We then fit a linear mixed effects model to
explain movement rates as a function of covariates, noting that we have repeated
measures (i.e. measures for several five-minute time intervals for each arena). The
genotype of the male fly was treated as a random effect, with a slope and intercept term
(for movement as a function of time) for each genotype. Arena ID was nested within
genotype as a random effect; however, it was included with only an intercept term to
reflect individual differences in baseline movement rate, but not response across time.
We also included day as a predictor in order to control for batch effects. Day might
ideally be treated as a random effect, but since we have data from a large number of days
- and days are not independent, because in consecutive series of days some variables
(such as experimenter and food texture) will be correlated - the model fit then becomes
unstable. Consequently, we took the pragmatic solution of treating day as a fixed effect on
baseline movement rates. The abiotic environment was included as a fixed effect.
We fitted this model using the lme (nlme) function in R. We assessed the
significance of the fixed effect variables using an F-test implemented by the anova
(nlme) function. The significance of the random effect variables was assessed by using a
likelihood ratio test to compare model fits in which the variable was included and
excluded (without changing any other terms), using the anova (nlme) function in R. In
addition, we explored the three-way interaction between genotype, abiotic environment,
and time to study genotype by abiotic environment effects on movement. We use REML
(restricted maximum likelihood) to fit the lme model parameters. This is appropriate
since our fixed effects are common to all models being compared. To analyze locomotory
behavior, we fit the model separately for male and female movement rates. The R
commands used to fit this can be seen below.
Male movement
lme(Male_Movement ~ Time * Environment + Day,
    random = list(Genotype = ~ 1 + Time * Environment, Arena_ID = ~ 1))
Female movement
lme(Female_Movement ~ Time * Environment + Day,
    random = list(Genotype = ~ 1 + Time * Environment, Arena_ID = ~ 1))
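The likelihood ratio test used for the random effect terms reduces to a simple calculation once the two models have been fitted. The Python sketch below shows that arithmetic for one degree of freedom, the value reported with the tests in this chapter. The log-likelihood values are hypothetical and were chosen only to illustrate the calculation; in practice they would be taken from the fitted lme objects.

```python
import math


def lrt_statistic(loglik_full, loglik_null):
    """Likelihood ratio statistic: twice the gain in log-likelihood."""
    return 2.0 * (loglik_full - loglik_null)


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    For df = 1 the tail has the closed form erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


# Hypothetical log-likelihoods, chosen only to illustrate the arithmetic.
stat = lrt_statistic(loglik_full=-1200.0, loglik_null=-1204.235)
p_value = chi2_sf_df1(stat)
print(round(stat, 2), round(p_value, 3))
```

A statistic of 8.47 on one degree of freedom gives a p-value of about 0.004.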
3.2.7 Relationship between male and female movement
Our interest is in understanding how female movement is affected by male
movement in different abiotic environments. We investigate this by regressing female
movement on male movement. This can be done for all genotypes in each environment,
resulting in an overall estimate of Ψ, using the mean male phenotype for each genotype.
For this we begin by calculating genotype-specific average male movement rates, and
regressing individual-level female movement on the calculated mean movement of the
male genotype in her arena. Ψ, the coefficient of interaction, is then defined as the
coefficient of the resulting regression. Note that for this portion of the analysis movement
is averaged across all time points for each arena.
In addition, it is also possible to perform the regression using individual-level
phenotype data separately for each male genotype, resulting in a (male) genotype-specific
estimate of Ψ, denoted by Ψj. Ψj indicates whether there is genetic variation for social
traits, that is, whether the specific males that females encounter alter female behavior in
different ways. However, in this case, the resulting estimates are confounded with
unmeasured environmental effects, and are likely to be inflated. This is an important
caveat for describing Ψj , and it should be interpreted with caution. However, with that
consideration, Ψj describes genetic variation in social effects, which has rarely been done,
and which is likely to impact the fitness and evolution of social partners in the
population.
3.2.8 Ψ between abiotic environments
We use Ψ as a measure of IGEs, modeling the effect of the phenotype of the focal
female’s male partners on her own movement phenotype. The full model for estimating Ψ
is:
(3.1)  $z_{jk} = \alpha + \Psi \bar{X}_j + \varepsilon$
Here $z_{jk}$ denotes the measured female movement for the kth trial with the jth male
genotype. $\bar{X}_j$ is the mean male movement across all trials containing male genotype j
(i.e., the genotype-specific estimate of male movement). $\alpha$ consists of both the effect of
female genotype and the effect of female environment. If, as is standard, we assume that
environmental deviations have mean zero and are uncorrelated with genotype, then the
resulting estimate of Ψ is not conflated with unmeasured environmental effects
(McGlothlin and Brodie 2009). Female genotype was invariant, thus $\alpha$ is a constant.
Finally, $\varepsilon$ is the error term. The model was fitted using the lme (nlme) function in R
while accounting for day effect and repeated measures. We used REML (restricted
maximum likelihood) to fit the lme model parameters. To estimate overall Ψ values for
each treatment (i.e., ethanol/non-ethanol), the fitted regression coefficient for female
partner phenotype was calculated as the estimate of Ψ in each environment (i.e. ethanol
exposure). The R code for this analysis is as follows:
lme(Female_Movement ~ Time + Day + GEMM, random = list(Arena_ID = ~ 1))
where GEMM is the average movement of male flies for each genotype in a given
environment. To test the effect of environment on Ψ, we assessed the significance of the
effect of interaction between the partner individual phenotype and environment on the
focal individual phenotype using the full dataset and a mixed model to account for day
and Arena_ID. The R code is as follows:
lme(Female_Movement ~ Time + Day + GEMM * Environment, random = list(Arena_ID = ~ 1))
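The core of the Ψ estimate, regressing female movement on the genotype mean of male movement (GEMM), can be sketched in a few lines of Python. This is a deliberately simplified illustration: it uses ordinary least squares and leaves out the Time and Day terms and the Arena_ID random intercept of the mixed model, so it is not the fitting procedure used for the reported results. The function names and data values are hypothetical.

```python
from statistics import mean


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def estimate_psi(trials):
    """trials: list of (genotype, male_movement, female_movement) per arena."""
    by_genotype = {}
    for genotype, male, _ in trials:
        by_genotype.setdefault(genotype, []).append(male)
    # Genotype mean of male movement (GEMM in the R code).
    gemm = {g: mean(v) for g, v in by_genotype.items()}
    x = [gemm[g] for g, _, _ in trials]
    y = [female for _, _, female in trials]
    return ols_slope(x, y)


# Hypothetical log-movement values: three male genotypes, two arenas each.
trials = [("g1", 1.0, 0.5), ("g1", 3.0, 0.7),
          ("g2", 3.0, 0.8), ("g2", 5.0, 1.0),
          ("g3", 5.0, 1.1), ("g3", 7.0, 1.3)]
psi = estimate_psi(trials)
```

Because every arena of a genotype receives the same predictor value, arena-level fluctuations in male movement do not enter the slope, which is the property that motivates using genotype means.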
3.2.9 Ψj for individual genotypes
Ψj describes the effect of a specific social environment on female behavior. To
estimate Ψj we regressed the phenotype of each female on the phenotype of the males in
her arena, rather than the mean of each genotype across trials. As noted earlier, estimates
of Ψj are likely to be inflated by the effects of unmeasured environmental effects, as we
are not using mean values that allow for the assumption that the mean of environmental
deviations is zero. There are many potential sources of unmeasured environmental
effects, such as covariance between male and female environment and feedback between
male and female phenotype. Extensive environmental controls, described previously,
were intended to minimize the effect of a shared environment, or render it roughly
equivalent across experiments. However, we cannot completely remove the effect of a
shared environment, nor can we rule out the possibility that different genotypes are more
sensitive to shared environment. Thus, our estimates of Ψj should be interpreted with
caution (see Discussion). The full model for estimating Ψj is as follows:
(3.2)  $z_{jk} = \alpha + \Psi_j X_{jk} + \varepsilon$
Variables are as defined previously, with the exception of $X_{jk}$, which is the measured
male movement for the kth trial with the jth male genotype. The model was fitted using
the lme (nlme) function in R while accounting for day effect and repeated measures. As
before, REML (restricted maximum likelihood) was used to fit the lme model parameters.
To estimate Ψj, we estimated the fitted slope of male phenotype for each genotype. This
analysis was performed for ethanol and non-ethanol environments separately. The R code
for this analysis is as follows:
lme(Female_Movement ~ Time + Day + Male_Movement,
    random = list(Genotype = ~ 1 + Male_Movement, Arena_ID = ~ 1))
In this model, we added a random effect for male genotype. This permits the genotype-
specific regression terms for male movement to have different intercept terms, allowing
for the possibility of differences in baseline female movement rates as a function of male
genotype. In order to test the hypothesis that Ψj varies between genotypes against the null
hypothesis that Ψj is constant, we remove the genotype-specific slope term from the
model, leaving just the baseline slope of male movement, to form the null model. We
then use a likelihood ratio test between the full and null models to assess the significance
of the male genotype by male movement interaction term. The null model is as follows:
lme(Female_Movement ~ Time + Day + Male_Movement,
    random = list(Genotype = ~ 1, Arena_ID = ~ 1))
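The idea of a genotype-specific slope can be illustrated by fitting a separate regression within each male genotype. The Python sketch below does this with ordinary least squares on hypothetical data. It is a simplification: the mixed model above estimates the genotype slopes as random effects (which shrinks them towards the common slope) and also includes Time, Day and Arena_ID, none of which appear here.

```python
from statistics import mean


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def estimate_psi_j(trials):
    """Per-genotype slope of female movement on arena-level male movement.

    trials: list of (genotype, male_movement, female_movement) per arena.
    """
    grouped = {}
    for genotype, male, female in trials:
        xs, ys = grouped.setdefault(genotype, ([], []))
        xs.append(male)
        ys.append(female)
    return {g: ols_slope(xs, ys) for g, (xs, ys) in grouped.items()}


# Hypothetical values: females respond weakly to g1 males, strongly to g2 males.
trials = [("g1", 1.0, 0.5), ("g1", 3.0, 0.7), ("g1", 5.0, 0.9),
          ("g2", 1.0, 0.2), ("g2", 3.0, 1.2), ("g2", 5.0, 2.2)]
psi_j = estimate_psi_j(trials)
```

Since the predictor is the male movement measured in the same arena as the female, any environmental factor shared by the arena contributes to these slopes, which is the source of the inflation discussed above.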
3.3 Results
3.3.1 Analysis of movement
There are several steps of analysis required to establish IGEs for locomotion that
vary in different environments. We must establish that there is variation for movement,
variation between genotypes in their environmental interaction, and that there are IGEs.
3.3.1.1 Movement and male genotype
In order for female movement to be affected by IGEs resulting from male
movement, male movement must be heritable. We found extensive genetic variation in
multiple components of male movement (Figure 2). Overall activity level varied between
males, with some genotypes displaying consistently higher activity levels than others
(see Figure 2 for the results of the full model) (male movement, genotype: χ²(df=1) =
12.77, p < 10^-3). When we looked at changes in movement during the 30 minutes the flies
were assayed there was also variation in the slope of activity between genotypes (male
movement, genotype x time: χ²(df=1) = 27.14, p < 10^-4). In ethanol, there was a general
trend towards increasing movement over the course of the 30-minute experiment (male
movement, environment x time: F(1,2931) = 53.67, p < 10^-4); however, this trend varied
between genotypes (male movement, genotype x environment x time, for models with
and without the three-way interaction term: χ²(df=1) = 8.47, p = 0.004) (Fig. 2).
Furthermore, ethanol affected the movement of different genotypes differently, i.e. there
was a significant genotype-by-environment interaction (genotype x environment:
χ²(df=1) = 28.81, p < 10^-4). This establishes that there are direct genetic effects for
locomotion in males.
Figure 2: A) Male movement over time for each of six genotypes in ethanol and non-
ethanol exposed environments. The x-axis describes time in minutes. The y-axis shows
the log-transformed movement of male flies. Each plot represents a separate genotype.
The solid and dashed lines represent the linear model fitted to the movement of male flies
over time in ethanol and non-ethanol environments respectively. N=330 measures of
movement. B) The results of the full model investigating variation in movement in males
of Drosophila melanogaster. The variables are time (T), environment (E), and genotype
(G). The interaction terms included in the full model were environment x time, genotype
x time, genotype x environment, and genotype x environment x time. The degrees of
freedom (df) for each variable and interaction are listed. For variables with fixed effects
the results of the F-test are shown; for variables with random effects the results of the
likelihood ratio test (LRT) comparing model fits are shown.
3.3.1.2 Female movement with different male genotypes
Female movement was different depending upon the presence of ethanol (female
movement, ethanol: F(1,959) = 6.14, p = 0.013). It was also different in the presence of
different male genotypes, and the effect of male genotype varied depending on the
presence of ethanol (Figure 3) (female movement, male genotype: χ²(df=1) = 7.94, p =
0.005) (female movement, male genotype x environment: χ²(df=1) = 7.12, p = 0.008).
Female movement also increased over time, though this effect varied among male
genotypes (female movement, time: F(1,2931) = 121.38, p < 10^-4) (female movement,
genotype x time: χ²(df=1) = 8.66, p = 0.003).
3.3.1.3 Movement is sexually dimorphic
Overall activity of males was higher than that for females (2.7x). We tested the
significance of this difference using a t-test to compare the movement of males and
females (t(1012) = 33.62, p < 10^-4). However, the magnitude of change in ethanol
environments was not different between males and females. Females were of a different
genotype than males and so we cannot disentangle the effect of sex and genotype;
however, this pattern of sexual dimorphism has been observed previously.
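The text does not state which form of t-test was used. The reported 1012 degrees of freedom equal the number of arenas (1013) minus one, which would be consistent with a paired comparison of male and female movement within each arena. Under that assumption, which is ours and not stated in the source, the statistic can be sketched in Python as follows. The data values are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for matched samples."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical per-arena mean movement for four arenas.
males = [5.0, 6.0, 7.0, 8.0]
females = [2.0, 2.0, 4.0, 4.0]
t_stat, df = paired_t(males, females)
```

With n arenas the paired test has n - 1 degrees of freedom, so four arenas give df = 3 in this toy example.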
3.3.2 Relationship between male and female movement
3.3.2.1 Ψ between abiotic environments
The estimate of Ψ is lower in environments with ethanol than in non-ethanol
environments (Ψ_ETOH = 0.04, Ψ_NON-ETOH = 0.11) (Figure 4a). We saw a significant effect of
the abiotic environment on Ψ (environment: F(1,962) = 7.16, p = 0.008) (Figure 4b).
Figure 3: A) Female movement over time in the presence of each of six male genotypes
in ethanol and non-ethanol exposed environments. The x-axis describes time in minutes.
The y-axis shows the log-transformed movement of female flies, with the plots
representing movement in the presence of each male genotype. The solid and dashed
lines represent the linear model fitted to the movement of female flies over time in ethanol
and non-ethanol environments respectively. N=330 measures of movement. B) The results
of the full model investigating variation in movement in females of Drosophila
melanogaster. The variables are time (T), environment (E), and genotype (G). The
interaction terms included in the full model were environment x time, genotype x time,
genotype x environment, and genotype x environment x time. The degrees of freedom
(df) for each variable and interaction are listed. For variables with fixed effects, the results of
the F-test are shown; for variables with random effects the results of the likelihood ratio
test (LRT) comparing model fits are shown.
3.3.2.2 Ψj for individual genotypes
The estimated social effect of different male genotypes was different, and varied
between abiotic environments in a genotype-specific manner (Figure 4c). For example, Ψj
was unchanged between environments for the first genotype, and much lower in the
presence of ethanol for the sixth genotype. We tested for variation in Ψj among genotypes
separately for each environment by comparing model fit with and without a genotype x
male movement term. Different genotypes had significantly different Ψj without ethanol
(Figure 4c) (genotype x male movement, result for models with and without a genotype x
male movement term: χ²(df=1) = 19.87, p < 10^-4). We also saw an effect of genotype in
the presence of ethanol on Ψj, though it is statistically less robust (genotype x male
movement, result for models with and without a genotype x male movement term:
χ²(df=1) = 3.90, p = 0.05) (Figure 4c).
3.4 Discussion
Deciphering the role of IGEs is a critical step in understanding both the
maintenance of variation and the importance of social environment in evolution. This
study is one of few to have measured Ψ. While the effect of biotic environment on IGEs
has been demonstrated (Chenoweth et al. 2010), this is the first case where the abiotic
environment was implicated in a genotype-specific manner (but see Bailey and Zuk
2012). We have shown that for locomotion in D. melanogaster Ψ varies in a
context-specific manner, changing between environments. This indicates that the abiotic
environment will change the social environment given the same genotype frequencies.
Figure 4: A) Ψ estimated for each environment. The fitted values were estimated using a
generalized mixed model (see Methods and Supp. Methods). B) The results of the full
model investigating Ψ in two abiotic environments. To calculate Ψ we included the
average movement of male flies for each genotype in a given environment (GEMM) as a
predictor of the movement phenotype of the partner individual. C) Ψj estimated for each
genotype individually, in each environment. Note that a shared environment may be
conflated with estimates of Ψj.
Context specific fitness has been receiving more attention of late, as it has become
clear that there are many traits that are context dependent. This includes diverse systems
such as diet in neriid flies (Adler et al. 2013) or resource allocation in plants (Delph et al.
2011). Context specific fitness is common for social traits, particularly when they are
sexual, where for example mate choice can vary by climate (Robinson et al. 2012), the
presence of congeners (Svensson et al. 2010), or population density (Krupa and Sih 1993;
Taff et al. 2013). Context specific fitness, varying geographically or temporally, is
expected to confer different evolutionary rates and directions among different populations
(Bailey and Zuk 2012). In the case of IGEs the importance of this is mediated by the
strength and direction of Ψ (Bailey and Zuk 2012). Here, we show that Ψ varies between
environments and that different male genotypes have different social effects on females.
Depending upon heterogeneity and isolation within social groups and populations this
could potentially retard or promote divergence between populations, an interesting future
avenue for modeling of IGEs.
When Ψj is measured for individual genotypes (with the aforementioned caveats),
Ψj changed unpredictably between environments, with some genotypes having a large
magnitude of effect in one environment and a very low impact in another. Many social
traits are context dependent, so this result is not surprising. However, despite values of Ψ
being consistently positive in this study, locomotion is a sexually antagonistic trait where
females are being selected for lower levels of movement and males for higher levels of
movement. The expectation is that male genotypes with positive values of Ψ that
correspond to higher (lower) levels of female activity have negative fitness consequences
for female (male) flies, and this is another illustration of sexual antagonism in this
system.
Attempting to estimate Ψj necessitates confounding shared environment with the
effect of different genes in the environment. As there is no way to estimate one without
the other, our work is likely to over-estimate Ψj. Estimates of Ψj are much higher than
estimates of Ψ, indicating that the confounding effects of shared environment may be
large and important. Extensive environmental controls were used to attempt to hold the
effect of shared environment constant across assays; however, this will not control for any
genetic variability in responsiveness to shared environment. Ψj illustrates important
patterns in our data, and does suggest that females respond to males of different
genotypes in different ways. However, we stress that absolute values of Ψj are not
interpretable and that there is likely a large effect of shared environment.
In this study, we observe that social effects are altering female phenotype such
that male flies that move more cause their female social partners to move more. There is
also a possibility that males who do not interfere with females will be favored as social
partners. This would result in a positive correlation between male and female phenotypes
in the absence of direct genetic correlations. We speculate that this could result in
selection for increased resistance in females to male social effects or selection on females
for social group choice, both potentially interesting areas of future research.
Studies to date have not explored how sexual selection by males changes between
environments (Miller and Svensson 2014), although there has been extensive
documentation of spatially varying changes in female preference and sexual selection
(Gilburn and Day 1994; Candolin et al. 2007; Chaine and Lyon 2008; Roulin and
Salamin 2010; Botero and Rubenstein 2012). Given that there is extensive variation in
male harm and resistance to male harm, male behavior is likely to also play an important
role in context specific differences in selection (Friberg 2005; Fiumera et al. 2006; Gay et
al. 2011). We show here that the effect of males on females varies in a context dependent
manner for a fitness related trait. This corroborates other studies that have found that sex-
specific optima may be very context dependent (Gosden and Svensson 2009; Bailey and
Zuk 2012; Robinson et al. 2012; Taff et al. 2013; Connallon 2015).
Locomotion in flies has a shared genetic basis between males and females which
is expected to hinder the evolution of sex-specific optima (Long and Rice 2007). What
we do not know, and what may alter the outcome of this conflict, is whether or not IGEs
for locomotion have a separate genetic basis from locomotion itself. If IGEs have a
separate genetic basis, this could be a means of circumventing the conflict over
locomotion. This could create both intralocus sexual conflict contributing to maintenance
of variation (locomotion) and interlocus sexual conflict (IGEs), resulting in an arms race
between males and females. Resolution of the intralocus conflict occurring for
locomotion may be complicated because locomotion is highly polygenic and related to
other aspects of physiology, including epistatic interactions with genes not involved in
the conflict (Badyaev 2002; Van Doorn 2009; Harano et al. 2010; Parsch and Ellegren
2013). IGEs are evolvable and have a genetic basis, so a potentially interesting avenue of
future research would be the ability of IGEs to circumvent a stalled intralocus sexually
antagonistic conflict (Bleakley and Brodie 2009; Chenoweth, Rundle, and Blows 2010).
In summary, we have shown that locomotion in Drosophila is affected by both
IGEs and IGE-by-environment interactions. We have demonstrated that Ψ varies
between environments, and that different male genotypes have different social effects on
females. We explored the possible fitness effects of IGEs and IGE-by-environment
interactions for a sexually antagonistic trait. Finally, we discussed the potential broader
consequences of IGEs on the maintenance of sexual conflict, although additional research
into this matter is needed. We also note that we have performed a similar analysis to
study effects of social environment in D. simulans. A complete comparison between
these effects in D. simulans and D. melanogaster has been carried out and used to prepare
a paper which at the time of writing this thesis is under review at The American
Naturalist. Since the details of the study are very similar to those above, we do not report
them here.
Chapter 4
Modeling social group structure of flies
using recurrent event models for HMM
corrected tracking data
4.1 Introduction
The study of social group composition of species is important for understanding
the ecological and evolutionary effects of social interactions (Saltz et al. 2011; Kohn et
al. 2011; Foley et al. 2015). These studies can help in understanding a variety of
behaviors such as mating, competition or cooperation (Whitehead 1996). Social group
structure also plays an important role in determining the population biology of a species
(Wilson 1996). As we have seen in Chapter 3, Drosophila has been used as a model
organism for studying social group composition. In Saltz and Foley (2011) aggressive
behavior was studied in male flies and it was shown that aggression causes reductions in
aggregation. The mating preferences of female flies were studied in Cabral et al. (2008)
and it was shown that females might have a tendency to mate with less aggressive male
flies.
In Foley et al. (2015), the social structure of Drosophila melanogaster was
studied using a model for individual social group preferences. This was performed using
agent-based simulation and approximate Bayesian computation, and they modeled the
leaving or joining rate of male and female flies along with the effect that other males and
females have on those rates. The effect of aggressive and non-aggressive genotypes on
leaving/joining rates was also studied.
In Chapter 2, we studied the importance of tracking software and introduced an
algorithm to track the movement of fruit flies. These algorithms track the position of
objects in a video and generate detailed data about the position of objects throughout the
video. In recent years, there has been vast improvement in the accuracy of tracking
algorithms. However, detecting the number of objects of interest in each frame of the
video still remains a challenging problem. Errors in detecting the correct number of
objects in a frame will result in incorrect tracking results. These errors can occur due to
two major events. In the first event, an object may be detected as two or more objects
(Huang and Essa 2005). This is mainly a problem in background subtraction methods. In
these methods, the background of the video is estimated (globally or locally) and is
removed from the frame to get the foreground frame. The foreground frame is then
thresholded to get the blobs (connected components) in the foreground image. In ideal
circumstances, each blob corresponds to a single object, though this might not be true
based on the threshold used for extracting the blobs from the foreground frame. If we use
a low threshold, our method will be prone to errors, such as detecting subtle lighting
changes as blobs. If we use a high threshold, we will be prone to errors such as dividing
an object into two or more blobs. This can happen in places where an object consists of
two or more colors. This can introduce errors in the estimate of the number of objects in
the frame. These errors can also occur in tracking algorithms that use feature-based object
detection (Han et al. 2007).
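The sensitivity of blob detection to the threshold can be seen in a small Python sketch. It assumes a grayscale frame stored as nested lists, a static background, and 4-connected components. The function names and pixel values are hypothetical, and this is not the tracking implementation described in Chapter 2.

```python
def foreground(frame, background):
    """Absolute per-pixel difference between a frame and the background."""
    return [[abs(f - b) for f, b in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, background)]


def count_blobs(image, threshold):
    """Count 4-connected components of pixels whose value exceeds threshold."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] <= threshold or seen[r][c]:
                continue
            blobs += 1
            seen[r][c] = True
            stack = [(r, c)]
            while stack:  # flood fill from the first pixel of the new blob
                y, x = stack.pop()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
    return blobs


# Hypothetical 1 x 7 strip: one two-tone object and a faint lighting change.
background = [[10, 10, 10, 10, 10, 10, 10]]
frame = [[10, 100, 50, 100, 10, 25, 10]]
fg = foreground(frame, background)

low = count_blobs(fg, 10)   # 2: the object plus the lighting change
mid = count_blobs(fg, 30)   # 1: the object only
high = count_blobs(fg, 60)  # 2: the object is split at its darker centre
```

Only the intermediate threshold recovers the single object, which mirrors the two failure modes described above.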
The second event that can cause errors in estimating the number of objects in a
frame is occlusion (Huang and Essa 2005). Occlusion happens when one object masks
another object in the frame, or two objects are in close contact with each other (Yilmaz,
Li, and Shah 2004). This can cause the blobs of two objects to merge and result in a blob
containing more than one object. It is referred to as the “merged blob” problem (Yong
Zhou 2014). Tracking objects in the presence of occlusion then becomes non-trivial due
to incorrect estimates of the number of objects in the frame. There have been many
studies that aim to solve this problem (Huang and Essa 2005; Zhou and Li 2014). A
Gaussian mixture model is used in Branson et al. (2009) to detect the number of objects
in a merged blob. These methods are often based on optimization algorithms and though
effective, can have many errors in estimating the number of objects in a frame in less
than ideal situations.
Here we aim to first correct the tracking data to get a more accurate estimate of
the number of flies on a patch and then use the more detailed information in our dataset
to model how group size changes with relation to genotype and sex ratio using group
joining and leaving events. We treat each patch as a group and model the time the group
size on a patch stays unchanged before a joining or leaving event happens.
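The quantity being modeled, the time a group size persists before a joining or leaving event, can be extracted from a series of per-frame fly counts. The Python sketch below is an illustration under simple assumptions: one count per fixed time step, any change in the count treated as a single event, and the final run discarded because no event ends it. The function name and data are hypothetical.

```python
def sojourn_events(counts, frame_interval_s=1.0):
    """Split a group-size series into (group_size, duration_s, event) records.

    event is 'join' if the size then increased and 'leave' if it decreased.
    The final run is dropped because it is censored: no event ends it.
    """
    records = []
    run_start = 0
    for i in range(1, len(counts)):
        if counts[i] != counts[i - 1]:
            event = "join" if counts[i] > counts[i - 1] else "leave"
            duration = (i - run_start) * frame_interval_s
            records.append((counts[i - 1], duration, event))
            run_start = i
    return records


# Hypothetical number of flies on one patch, sampled once per second.
counts = [2, 2, 2, 3, 3, 2, 2, 2, 2]
records = sojourn_events(counts)
print(records)  # [(2, 3.0, 'join'), (3, 2.0, 'leave')]
```

A full analysis would keep the censored final run and mark it as such, so that it can still contribute to the likelihood.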
4.2 Methods
4.2.1 Experiment Setup
A fixed number of flies (n=20) were placed in a closed arena with four patches of
food. PointGrey Grasshopper digital cameras were placed over each patch to record the
activities of flies. For each patch of food, two one-hour videos were recorded in the
morning (AM1 and AM2) and two videos were recorded in the evening of the same day
(PM1 and PM2). A schematic of the experiment setup can be seen in Figure 1. The six
genotypes of D. melanogaster for this experiment (three for males and three for females)
were previously used in Saltz and Foley (2011), and Foley et al. (2015). These genotypes
had different aggression levels. Four of these genotypes (W23, W58, W89, and W145)
were collected in 1998 in Winters, California, and were nearly isogenic lines. The other
genotypes were a neutral control and high-aggression selected genotypes provided by R.
Greenspan (Dierick and Greenspan 2006). The 20 flies in each experiment were chosen
with three sex ratios (1:1, 2:1, 1:2). The combination of sex-ratio and genotype of each
experiment is considered as the treatment.
4.2.2 Tracking
Here, we use a graph-based tracking algorithm. After finding the blobs containing
the flies (which is done using the methods described in the previous chapter), a distance
matrix D is formed between the blobs in frames i and i+1. The elements of this distance
matrix, D={dij}, are the distances between the blobs in two consecutive frames. These
distances are then classified into ‘small’ and ‘large’ (small and large are defined using a
threshold $\gamma$, which is fixed and is found by looking at the maximum movement of several
flies in consecutive frames). This matrix is then used in a Hungarian algorithm in order to
pair blobs in the two frames. This would be simple if there were no occlusions
(splits or merges of blobs) and no joining or leaving events. In the presence of these events
the Hungarian algorithm will fail, since no one-to-one matching between the blobs of the
two frames exists. Here we define a
set, O, that counts the possible occurrences. O can consist of Appearance (A),
Disappearance (D), Split (S) or Merge (M). The set O={A,D,S,M}, denotes the number of
times the events have happened between two consecutive frames.
Figure 1: Schematic of a sample experiment arena with 4 patches
If we know O, we can adjust the Hungarian algorithm to find the best matching,
and thus the tracks are formed. In order to find O, we look at the distance matrix D
between the blobs of the two consecutive frames. Six different outcomes are possible,
illustrated in Figure 2. Any row with two or more small distances (as defined by $d_{ij} < \gamma$)
indicates that two or more blobs in frame i+1 are very close to a single blob in frame i. This
represents a split event (c). Two or more small distances in a column suggest that a blob
in frame i+1 is very close to two or more blobs in frame i, and this represents a merge event (b).
Rows with zero small distances suggest that there are no blobs in frame i+1 which are
close to the blobs corresponding to those rows in frame i, and represent disappearances (e);
columns with zero small distances suggest that there are no blobs in frame i which
are close to the blobs corresponding to those columns in frame i+1, and represent appearances
(f). This is based on the assumption that flies do not move far between consecutive
frames. The rest of the rows will have one small distance, which will be the matching
assigned to them. After deriving our best estimate of the set O in this way, we use it to
adjust the Hungarian algorithm and perform the matching. This is carried out by adjusting
the distance matrix between consecutive frames i and i+1, denoted by D, in 3 steps. First
the blobs that appeared in frame i+1 and those that disappeared from frame i are found
and the corresponding columns and rows for these blobs are removed from matrix D.
Second, the split/merged blobs in frame i, and the blobs they have split/merged to in
frame i+1, are detected and their corresponding columns and rows are removed from
matrix D. What remains is a square matrix D. In the third step, we use the Hungarian
algorithm on the square matrix D to complete the matching. In this step, since D is a
square matrix, we will have a perfect matching between the remaining blobs of frame i
and frame i+1.
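The row and column counting described above can be sketched in a few lines of Python. This is a minimal illustration, not the thesis implementation: the function names `classify_events` and `match_remaining` are hypothetical, the distance matrix is a plain list of lists, and a brute-force search over permutations stands in for the Hungarian algorithm (which is only practical for the handful of blobs left after reduction).

```python
from itertools import permutations


def classify_events(dist, gamma):
    """Label events from the thresholded distance matrix.

    dist[r][c] is the distance between blob r (frame i) and blob c
    (frame i+1); gamma separates 'small' from 'large' distances.
    """
    n_rows = len(dist)
    n_cols = len(dist[0]) if dist else 0
    row_small = [sum(d < gamma for d in row) for row in dist]
    col_small = [sum(dist[r][c] < gamma for r in range(n_rows))
                 for c in range(n_cols)]
    return {
        "split": [r for r, k in enumerate(row_small) if k >= 2],
        "disappearance": [r for r, k in enumerate(row_small) if k == 0],
        "merge": [c for c, k in enumerate(col_small) if k >= 2],
        "appearance": [c for c, k in enumerate(col_small) if k == 0],
    }


def match_remaining(dist, gamma):
    """Remove blobs involved in events, then match the rest one-to-one."""
    events = classify_events(dist, gamma)
    n_rows = len(dist)
    n_cols = len(dist[0]) if dist else 0
    drop_rows = set(events["split"]) | set(events["disappearance"])
    drop_cols = set(events["merge"]) | set(events["appearance"])
    # Also drop the blobs on the other side of each split or merge.
    for r in events["split"]:
        drop_cols |= {c for c in range(n_cols) if dist[r][c] < gamma}
    for c in events["merge"]:
        drop_rows |= {r for r in range(n_rows) if dist[r][c] < gamma}
    rows = [r for r in range(n_rows) if r not in drop_rows]
    cols = [c for c in range(n_cols) if c not in drop_cols]
    if len(rows) != len(cols):
        raise ValueError("reduced matrix is not square")
    # Brute-force minimum-cost assignment (stand-in for the Hungarian algorithm).
    best = min(permutations(cols),
               key=lambda p: sum(dist[r][c] for r, c in zip(rows, p)))
    return events, list(zip(rows, best))


# Illustrative data: blobs 0 and 1 of frame i merge into blob 0 of frame i+1.
example = [[1.0, 9.0], [2.0, 9.0], [9.0, 1.5]]
events, pairs = match_remaining(example, gamma=3.0)
```

In this made-up example, column 0 has two small distances and is flagged as a merge, so rows 0 and 1 and column 0 are removed and only blob 2 is left to be matched with blob 1 of the next frame.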
Figure 2: We illustrate the different occurrences possible between frame i and frame i+1.
On the right side of the Figure, the formation of the patches in frames i and i+1 can be
seen. Each colored ellipse shows a fly in the patch. On the left side of the figure, the
corresponding distance matrix can be seen. In the matrix, S refers to small distances and
L refers to large distances, as distinguished by our threshold $\gamma$. a) No occurrences. b)
Merge. c) Split.
Figure 2 (continued): d) Split and Merge. e) Disappearance. f) Appearance.
In our study we are interested in the number of flies on each patch for each frame
of a video. This is estimated using the tracking algorithm. Due to the nature of the
tracking algorithm, there are errors in the estimates of these numbers. The results for the
validation of the tracking algorithm using one hundred 20-second intervals of video can
be seen in section 4.3.1. Error correction of these inferred estimates is then performed
using a Hidden Markov Model (HMM). We now introduce HMMs.
4.2.3 Hidden Markov Models
Hidden Markov Models (HMMs) have been used in many fields of science,
including speech recognition (Jelinek 1976) and DNA sequence analysis (Durbin et al.
1998). HMMs are statistical models in which the data are assumed to have the Markovian
property with unobserved ‘hidden’ states. A stochastic process $Z = \{Z_n, n \in N\}$ is said to
have the Markov property if the conditional probability distribution of the future state of
the process given the past, is only dependent on the present state of the process, that is:
(4.1)  $P(Z_n = z_n \mid Z_{n-1} = z_{n-1}, \ldots, Z_0 = z_0) = P(Z_n = z_n \mid Z_{n-1} = z_{n-1})$
In other words the process is memory-less. The difference between a Markov model
(Markov chain) and a hidden Markov model is that in a Markov model, the states are
visible and transition rates between states can be estimated directly by observing the data,
whilst in hidden Markov models the underlying states of the process are hidden, and one
only observes the outputs of a second ‘layer’, which is dependent on the hidden state.
Figure 3a, shows a schematic of a hidden Markov model. Importantly, the hidden state of
the model at time t is only dependent on the hidden state at time t-1. Figure 3b shows the
possible hidden states (X1, …, X3) and observations (Y1, …, Y4) for a HMM with three
possible hidden states and four possible observations in addition to the transition (aij) and
emission probabilities (bij). aij is the probability of transitioning from state Xi to Xj. These
form the transition probability matrix A={aij}. bij is the probability of observing Yj when
we are in state Xi. These bij’s form the emission probability matrix B={bij}. In order to
run the HMM, we also need the initial probability $\pi(i)$ of each state i. This is the
probability of starting the sequence of hidden states at state i. When fitting an HMM, we
estimate the parameters of the transition matrix and the emission matrix using observed
data. This is done using the Baum-Welch algorithm (Baum et al. 1970). This algorithm is
based on the Expectation-Maximization algorithm and alternates between two steps: the
expectation step and the maximization step. The expectation step uses the current
parameter estimates to compute the expected complete-data log likelihood, which serves as
a proxy for the log likelihood; the maximization step then maximizes this proxy to obtain
updated estimates of the parameters of the model. The two steps are repeated until the
estimates converge, moving closer to a (locally) optimal solution at each iteration. Using
this method, we estimate the transition and emission probabilities of the HMM. After
fitting the HMM, we can use the Viterbi algorithm (Viterbi 1967) to find the most probable
sequence of underlying states for a sequence of observations.
In our application the observed data will be a sequence of the number of flies on a
patch at each time point, extracted from the experimental videos using object tracking,
with consecutive time points of the observed sequence being 1/3 of a second apart. We
then estimate the HMM transition and emission matrices using the methods just
described.
In our analysis, we assume that the number of flies on a patch does
not change by large amounts between consecutive time points. For example, when we have
eight flies on a patch at one time point, the count does not jump to four flies at the
next; instead, the flies leave or join one by one at different time steps. To reflect this,
we use constrained HMMs. In constrained
HMMs, the transition probability matrix is constrained to be a semi-diagonal matrix (i.e.
a matrix with non-zero values on the diagonal and in the cells next to the diagonal, and zero
everywhere else) (Roweis 1999). The emission matrix is allowed to have non-zero values
in any position of the matrix, to allow for errors in the observations. Figure 4 shows the
structure of a constrained HMM. An initial value for the emission probability matrix is
provided, that has high values on the diagonal, and the value of the cells drops as the
distance to the diagonal increases. The emission probabilities within this matrix and the
probabilities for the allowed transitions in the transition probability matrix are estimated
as part of the analysis. In constrained HMMs, the parameters of the transition and
emission matrix are estimated using the same method used for HMMs (Baum-Welch
algorithm), with the difference that we do not update the transition probabilities that are
constrained to be zero (Roweis 1999).
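The structure of the two matrices can be illustrated with a short Python sketch. The function names and the numerical values (`p_stay`, `decay`) are assumptions chosen only for illustration; the thesis does not state the initial values it used, and in the analysis the non-zero entries are re-estimated from the data.

```python
def banded_transition(n_states, p_stay=0.9):
    """Transition matrix that is non-zero only on the diagonal and its neighbours.

    p_stay is an illustrative starting value, not a value from the thesis.
    """
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < n_states]
        if not neighbours:  # single-state edge case
            A[i][i] = 1.0
            continue
        A[i][i] = p_stay
        for j in neighbours:
            A[i][j] = (1.0 - p_stay) / len(neighbours)
    return A


def initial_emission(n_states, decay=0.3):
    """Emission matrix with high diagonal values that fall off with distance.

    Every entry is non-zero, so any observed count is possible in any state.
    """
    B = []
    for i in range(n_states):
        weights = [decay ** abs(i - j) for j in range(n_states)]
        total = sum(weights)
        B.append([w / total for w in weights])
    return B


A = banded_transition(5)
B = initial_emission(5)
```

Each row of both matrices sums to one. Entries of `A` more than one step from the diagonal are exactly zero and stay out of the estimation, while `B` only encodes a preference for observations close to the true state.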
Figure 3: a) The structure of a hidden Markov model. Here Zt is the hidden state at time t
and Ot shows the observed data at time t. b) A hidden Markov model with three hidden
states x1,…,x3 and four possible observations y1,…,y4. Here the aij are the transition
probabilities between the underlying states and the bij are the emission probabilities of
the observations at different states.
As a part of our analysis we also use the Viterbi algorithm (Viterbi 1967) to
obtain the most likely sequence of true underlying states (i.e., the most likely number of
flies on patch during each frame).
The hope is that by fitting this constrained Hidden Markov model to the observed
(i.e. emitted) sequence of the number of flies on patch, and then finding the true hidden
states for each observation, we can remove many errors from the data. These errors
typically look like brief jumps to an adjacent state before returning to the true state, and
usually occur when a fly is joining or leaving the patch or in places where the flies are
interacting and form a merged blob (Figure 5).
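The decoding step can be sketched as a small, self-contained Viterbi routine in Python. This is an illustration under stated assumptions rather than the code used in the thesis: the three-state matrices `A`, `B` and `pi` below are invented values, states and observations are coded as integers starting at zero, and the observation sequence is assumed to be non-empty.

```python
import math


def _log(p):
    """Logarithm that maps zero probability to minus infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, A, B, pi):
    """Most likely hidden state sequence for a non-empty observation list."""
    n = len(pi)
    score = [_log(pi[s]) + _log(B[s][obs[0]]) for s in range(n)]
    back = []
    for o in obs[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + _log(A[r][s]))
            pointers.append(best)
            score.append(prev[best] + _log(A[best][s]) + _log(B[s][o]))
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Illustrative constrained HMM with three states (values are assumptions).
A = [[0.95, 0.05, 0.0],
     [0.025, 0.95, 0.025],
     [0.0, 0.05, 0.95]]
B = [[0.8, 0.15, 0.05],
     [0.1, 0.8, 0.1],
     [0.05, 0.15, 0.8]]
pi = [1.0 / 3.0] * 3

observed = [1, 1, 2, 1, 1]           # a one-frame jump to the adjacent state
decoded = viterbi(observed, A, B, pi)
```

With these values the single-frame jump is explained more cheaply as an emission error than as two real transitions, so the decoded sequence stays in state 1 throughout, which is the kind of correction described above.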
Figure 4: The transition probability matrix and the emission probability matrix for a
constrained hidden Markov model. The white cells of the transition probability matrix
show zero entries and the color of the cells gets darker as the probabilities increase. The
black cells represent high probabilities. Panels: transition probabilities; emission probabilities.
Figure 5: The observed and hidden states for the number of flies on patch for a period of
time of a sample subject. The blue/red dots show places where the hidden and observed
states agree/do not agree.
4.2.4 Dataset for number of flies on patch
The data on the number of flies on a patch from each video are extracted using the
video tracking software explained in Section 4.2.2, and corrected for errors using the
constrained HMM algorithm explained in Section 4.2.3. The extracted data contain
information on the number of flies on the patch at each time point. As mentioned before,
the consecutive time points are 1/3 of a second apart. These data are then used to create a
dataset of the joining and leaving events on each patch. Each line of this dataset
corresponds to a joining or leaving event on a single patch. The different columns of this
dataset are as follows:
1- Patch ID: a unique identifier for each patch in the experiment.
2- Number of flies on patch: number of flies on patch right before the event happens.
3- tstart: Either the time of the previous event, or zero for the first event.
4- tstop: The time of the event or censoring.
5- Male Genotype: the genotype of the male flies on the patch. Male flies on each
patch are chosen from the same genotype. There are three possible values for
male genotypes, coded as E, F, G.
6- Female Genotype: the genotype of the female flies on the patch. Female flies on
each patch are chosen from the same genotype. There are three possible values for
female genotypes, coded as A, B, C.
7- Number of Females: The number of females in each experiment, which is an
indicator of the sex ratio of the males and females in the experiment. It can take
the values of (5, 10, 15).
8- Time Period: the time period of the experiment performed (am1, am2, pm1, pm2)
9- Leave: an indicator of whether a leaving event happened in the time frame or the
data point was censored due to a joining event happening or the study coming to
an end.
10- Join: an indicator of whether a joining event happened in the time frame or the
data point was censored due to a leaving event happening or the study coming to
an end.
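The event-level columns (number of flies, tstart, tstop, Leave, Join) could be derived from a corrected count sequence along the following lines. This is a hedged sketch: the function name `events_from_counts`, the dictionary keys, and the convention that an event is time-stamped at the first frame showing the new count are all assumptions, and the treatment columns (genotypes, number of females, time period) would be attached per patch separately.

```python
def events_from_counts(counts, dt=1.0 / 3.0):
    """Turn a per-frame fly count into one row per joining/leaving event.

    counts: HMM-corrected number of flies on one patch, one value per frame.
    dt: time between frames (1/3 s in the text).
    """
    rows = []
    t_start = 0.0
    for k in range(1, len(counts)):
        if counts[k] == counts[k - 1]:
            continue
        t_stop = k * dt  # assumed convention: first frame with the new count
        rows.append({
            "n_flies": counts[k - 1],          # count right before the event
            "tstart": t_start,
            "tstop": t_stop,
            "leave": int(counts[k] < counts[k - 1]),
            "join": int(counts[k] > counts[k - 1]),
        })
        t_start = t_stop
    # The last interval ends with the video, so it is censored for both events.
    t_end = (len(counts) - 1) * dt
    if counts and t_end > t_start:
        rows.append({"n_flies": counts[-1], "tstart": t_start,
                     "tstop": t_end, "leave": 0, "join": 0})
    return rows


rows = events_from_counts([2, 2, 3, 3, 3, 2, 2], dt=1.0)
```

For the short example sequence this yields a joining event, a leaving event, and a final censored interval. When modeling leaving, a row with `leave` equal to 0 is treated as censored, and likewise for `join`.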
4.2.5 Modeling using recurrent event analysis
Processes in which events occur repeatedly over time are referred to as recurrent
event processes (Lawless and Cook 2007). In some settings, we might observe a large
number of events from a relatively smaller number of processes. This can be the case
when observing stoppages in an assembly line (Sobaszek and Gola 2016) or when
studying incidences of injuries in manufacturing plants (Kubo et al. 2014, 2013). In other
settings, data might be available for a larger number of processes, which include a
smaller number of events. There are many examples of such settings in medical studies,
such as the occurrence of asthma attacks in respirology trials (Guo and Yokoyama 2012).
These types of recurrent event data also occur frequently in business, such as insurance
claims for policy-holders and the filing of warranty claims on automobiles. In a recurrent
event analysis setting, we aim to understand the individual event process and identify
sources of variation across a population of such processes. We also want to be able to
compare groups of processes and determine the relationship between event occurrence
and fixed or time varying covariates.
In our setting we observe recurrent event data when studying leaving and joining
events of flies from a patch of food. In our study, the flies are treated as identical (indeed,
they are clones) and the number of flies on each of a given number of food patches is
considered as a unique process, which exhibits multiple joining and leaving events per
trial. Using recurrent event analysis, we want to be able to model these events and study
the effect of covariates such as genotype, sex ratio and the number of flies on patch on
the rate at which joining and leaving events occur.
The characteristic that is common among all recurrent event processes is the
presence of an intrinsic correlation between events of the same process, caused by the
fact that the same “component” can “fail” repeatedly (Amorim and Cai 2015). Ignoring
this correlation can result in falsely narrow confidence intervals and very small p-values
for estimated effects. This causes the null hypothesis to be rejected more than it should
be. We need to correct for the correlations present within each process to have correct
results for the hypothesis tests.
The Cox proportional hazards model (Cox 1972) is the best-known method
for analyzing survival data. This model assumes independence between all events and
hence can only be used to model the time to the next event for each process (Cox 1972).
Several methods have been proposed to extend the Cox model to account for
recurrent events. These methods take different approaches to account for the
within-process correlation in the recurrent event data. Some methods, such as frailty models, use
a random effect to account for the dependency between events within each process
(Therneau and Grambsch 2000). Other methods use a Markov assumption stating that the
future events are only dependent on the current state. Examples of these methods are the
Andersen-Gill (AG) (Andersen and Gill 1982) and the Prentice, Williams and Peterson
(PWP) (Prentice, Williams, and Peterson 1981) models. In our study we focus on frailty
models, which will be discussed later. In Section 4.2.6, we give a brief introduction to
different methods and the approaches they take to account for within-individual
correlations and point the reader to additional references for full definitions of the
models.
To be able to describe the recurrent events models, we will first introduce the Cox
proportional hazard model as the most used regression model for frameworks in which
censoring is present. We will then explain the approach taken in the frailty model to
account for within-individual correlation, and use this model to estimate the effect of the
covariates on the joining and leaving events.
4.2.6 Regression modeling for survival data
Regression models have been used in many fields to study the effect of predictor
variables (covariates) on an outcome variable (Hosmer et al. 2008). The choice of correct
regression model is heavily based on the measurement scale of the outcome variable and
the goals of the analysis. In logistic regression, we study the relation between covariates
and an event happening or not. However, in some applications we are interested in the
elapsed time before an event happens and how that time relates to different covariates
(e.g. time until tumor recurrence, time until AIDS for HIV patients, time until a machine
fails) (Wang et al. 2013). Also there are situations in which an event does not happen by
the end of an experiment, and we are not sure whether the event might have happened if
the study continued or not (Prinja, Gupta, and Verma 2010). In these situations, we use
survival analysis to study data in which the main interest is the time until an event occurs.
The response of a survival analysis is usually referred to as ‘failure time’, ‘event time’ or
‘survival time’, depending upon context. The survival time has some common properties;
it is usually continuous and for some subjects might be incompletely determined. This
usually happens when we do not observe an event/failure by the end of the experiment,
so that we only know that the survival time is at least as long as the duration of the
experiment. These observations are referred to as censored observations. The survival
function, which gives the probability of surviving past time t, is denoted by the equation
below:
(4.2)  $S(t) = \Pr(T > t) = 1 - F(t)$
where T is the response variable (T>0), and F(t) is the cumulative distribution function.
As mentioned earlier, in our experiment an event is defined as a fly joining or
leaving a patch. The sojourn time or survival time is defined as the time flies remain on
patch in the current state before a joining or leaving event occurs.
Here, we first introduce the Kaplan-Meier product-limit estimator (Kaplan and
Meier 1958), which is a non-parametric estimator for survival functions in the presence
of censoring and plot the survival functions for different levels of each covariate.
4.2.6.1 Kaplan-Meier Survival Plots
The Kaplan-Meier estimator is used to estimate the survival function in a non-
parametric framework (Kaplan and Meier 1958; Kaplan 1983). The Kaplan-Meier
survival plot is constructed of declining horizontal steps, which will approach the true
survival function of a population if the sample size is large enough. This estimator uses
information from censored and uncensored observations by considering different points
in time as a series of steps, $t_s$, which are defined by the censored times and observed
failures. The value of the estimate is constant between consecutive steps. The
Kaplan-Meier estimator is defined as:
(4.3)  $\hat{S}(t) = \prod_{t_s \le t} \left(1 - \frac{d_s}{n_s}\right)$
where $n_s$ is the number of processes which have neither been censored nor
experienced the event before time $t_s$ (and therefore are still at risk of experiencing the event)
and $d_s$ is the number of processes which experience the event at that time.
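As a small worked illustration of Equation (4.3), the product-limit estimate can be computed directly in Python. This is a sketch on made-up data, assuming the usual convention that the number of events counted at each step is the number occurring exactly at that time; the function name `kaplan_meier` is hypothetical and is not the routine used for the figures in this chapter.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times: observed durations; events: 1 if the event was observed at that
    time, 0 if the observation was censored. Returns (time, S(time)) pairs
    at each time where at least one event occurs.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        d = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        c = sum(1 for ti, e in zip(times, events) if ti == t and e == 0)
        if d > 0:
            survival *= 1.0 - d / at_risk   # factor (1 - d_s / n_s)
            curve.append((t, survival))
        at_risk -= d + c                    # events and censorings leave the risk set
    return curve


# Made-up sojourn times; the third and fifth observations are censored.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Here the estimate steps down to 0.8, 0.6 and 0.3 at times 1, 2 and 3; the censored observations lower the number at risk without producing a step of their own.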
Kaplan-Meier plots have been used in a variety of different fields. For example,
they are used to reflect the fraction of patients living for a certain amount of time after
going through treatment in medical research. They are also used in other fields, to reflect
failure times of different machines, the length of unemployment time for people after
losing a job, etc. (Meyer 1990).
Kaplan-Meier plots are a powerful non-parametric method for estimating and
visualizing survival functions, but do not provide risk assessment for covariates.
Therefore, we use Cox proportional hazard models to assess the effect of covariates on
the hazard rate of the survival times. First we will define the hazard rate.
4.2.6.2 Hazard rate
The hazard rate, h(t), is the instantaneous rate at which the event happens
at time t, conditional on no event having been observed before time t, defined
using the equation below:
(4.4)  $h(t) = \lim_{\Delta t \to 0} \frac{\Pr(t < T < t + \Delta t \mid T > t)}{\Delta t} = \frac{f(t)}{S(t)}$
where f(t) is the probability density function of the survival times and S(t) is the survival
function, defined as in Equation (4.2). This leads to the following relationship between
hazard rate and survival function.
(4.5)  $h(t) = -\frac{d}{dt} \log(S(t))$
In parametric frameworks, the hazard rate can be obtained from the distribution of
the survival function. For example, for an exponential distribution, it is equal to the rate
parameter of the distribution.
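The exponential case can be checked in one line, writing $\lambda$ for the rate parameter (a symbol introduced here only for this illustration). Both the ratio of density to survival function and the derivative form in Equation (4.5) give the same constant hazard:

```latex
f(t) = \lambda e^{-\lambda t}, \qquad S(t) = e^{-\lambda t}
\quad\Longrightarrow\quad
h(t) = \frac{f(t)}{S(t)} = \lambda,
\qquad
-\frac{d}{dt}\log S(t) = -\frac{d}{dt}\left(-\lambda t\right) = \lambda .
```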
4.2.6.3 Cox Proportional Hazards Model
Cox proportional hazards models are the most commonly used multivariable
survival models and were introduced by Cox in 1972. They are used to model the effect
of multiple covariates on survival. This is done by conveniently separating the baseline
hazard from the effect of covariates as below:
(4.6)  $h(t, X) = h_0(t) \exp\left(\beta_0 + \sum_{j=1}^{p} \beta_j X_j\right)$
where $h_0(t)$ is the baseline hazard function, p is the number of covariates, $X_j$ is the value
of the jth covariate, $\beta_0$ is an intercept term and $\beta_j$ is the coefficient of the effect of
covariate j on the baseline hazard, which can be interpreted as a proportional change on
the log scale. As can be seen from the equation, this model is semi-parametric in the
sense that the effect of the covariates on the baseline hazard can be calculated without the
need to know the distribution of the baseline hazard function. This can be very useful in
cases in which the distribution of the hazard function is not known and cannot be
determined easily. In order to define the likelihood of the model, we first make some
definitions. We assume that each subject i has a time of event ti and times for events are
ordered as 0< t1 <…< tn with no ties. We define the risk set Ri of event at time ti to be all
the subjects that are at risk of event occurrence at time ti. Using the proportional hazards
assumption, the probability of subject i experiencing an event at time ti, conditional on an
event occurring at that time, can be defined as:
(4.7)  $P(\text{event for subject } i \mid \text{observe an event at } t_i) = \frac{\exp\left(\beta_0 + \sum_{j=1}^{p} \beta_j X_{ij}\right)}{\sum_{k \in R_i} \exp\left(\beta_0 + \sum_{j=1}^{p} \beta_j X_{kj}\right)}$
where p is the number of covariates, $X_{ij}$ is the value of the jth covariate for subject i, $\beta_0$ is
the intercept of the model and $\beta_j$ is the coefficient of the effect of covariate j on the
baseline hazard. The partial likelihood for the Cox proportional hazards model is defined
using the probability in Equation (4.7):
(4.8)  $PL(\beta) = \prod_{i=1}^{n} \left[ \frac{\exp\left(\beta_0 + \sum_{j=1}^{p} \beta_j X_{ij}\right)}{\sum_{k} I_k(t_i) \exp\left(\beta_0 + \sum_{j=1}^{p} \beta_j X_{kj}\right)} \right]^{\delta_i}$
where $I_k(t) = 1$ if the kth subject is in the risk set at time t and is zero otherwise. $\delta_i$ is an
indicator of whether the event happened ($\delta_i = 1$) or the event time was censored ($\delta_i = 0$).
In the Cox model, parameter estimates are obtained by maximizing the partial
likelihood, as opposed to the likelihood (Cox 1975).
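To make the partial likelihood concrete, the following Python sketch evaluates its logarithm for a single covariate on made-up data, assuming no tied event times. The function name and the data are illustrative assumptions. The intercept is left out of the code because the factor $\exp(\beta_0)$ appears in both the numerator and the denominator of each term and cancels.

```python
import math


def cox_log_partial_likelihood(beta, times, events, x):
    """Log partial likelihood for one covariate, assuming no tied times.

    times: event or censoring times; events: 1 = event, 0 = censored;
    x: covariate value of each subject.
    """
    log_pl = 0.0
    for i, (t_i, delta_i) in enumerate(zip(times, events)):
        if delta_i == 0:
            continue  # censored subjects only contribute through risk sets
        # Risk set: subjects still under observation at time t_i.
        risk_set = [k for k, t_k in enumerate(times) if t_k >= t_i]
        denom = sum(math.exp(beta * x[k]) for k in risk_set)
        log_pl += beta * x[i] - math.log(denom)
    return log_pl


# Made-up data: three subjects, the last one censored.
times, events, x = [1, 2, 3], [1, 1, 0], [0, 1, 0]
ll_null = cox_log_partial_likelihood(0.0, times, events, x)
ll_one = cox_log_partial_likelihood(1.0, times, events, x)
```

At `beta = 0` every subject in a risk set is equally likely to be the one experiencing the event, so the value reduces to minus the sum of the logarithms of the risk-set sizes. Maximizing this function over `beta` gives the partial-likelihood estimate.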
As stated before, in a Cox model, the effect of the covariates on the hazard rate is
assumed to be multiplicative with respect to the hazard rate. This is known as the
proportional hazard requirement (Breslow 1975).
4.2.6.4 Assessing the Proportional Hazard Assumption
In order to define the criteria to assess the proportional hazard we start with the
survival function S(t):
(4.9)  $S(t, X) = \exp\left(-\int_0^t h(u, X)\, du\right)$
where, from the Cox proportional hazard model in Equation (4.6) we have:
(4.10)  $h(t, X) = h_0(t) \exp\left(\beta_0 + \sum_{j=1}^{p} \beta_j X_j\right)$
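so, absorbing the constant factor $\exp(\beta_0)$ into the baseline hazard $h_0$, the survival function will be:
(4.11)  $S(t, X) = \exp\left(-\int_0^t h_0(u) \exp\left(\sum_{j=1}^{p} \beta_j X_j\right) du\right) = [S_0(t)]^{\exp\left(\sum_{j=1}^{p} \beta_j X_j\right)}$
where $S_0(t) = \exp\left(-\int_0^t h_0(u)\, du\right)$ is the baseline survival function.
Now, taking the natural logarithm twice, we get:
(4.12)  $LLS(t, X) = -\ln(-\ln(S(t, X))) = -\ln(-\ln(S_0(t))) - \sum_{j=1}^{p} \beta_j X_j$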
So, if we calculate the difference between LLS(t,X) for two sets of survival observations
from two levels of a covariate (X1 and X2), we have:
(4.13)  $\ln(-\ln(S(t, X_2))) - \ln(-\ln(S(t, X_1))) = \sum_{j=1}^{p} \beta_j (X_{2,j} - X_{1,j})$
which is independent of time. This means that under the proportional hazard assumption,
the space between the plots of $LLS(t, X)$, which represent the ln-ln of the survival
function, does not change over time for different levels of the covariate and results in
parallel curves.
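The constant gap can be verified numerically. The sketch below assumes, purely for illustration, an exponential baseline hazard and a single binary covariate; the rate 0.2 and coefficient 0.7 are arbitrary values, not estimates from this study.

```python
import math


def log_minus_log_survival(t, baseline_rate, beta, x):
    """ln(-ln S(t, x)) under a constant baseline hazard (assumed example)."""
    hazard = baseline_rate * math.exp(beta * x)   # proportional hazards
    survival = math.exp(-hazard * t)
    return math.log(-math.log(survival))


def gap(t, baseline_rate=0.2, beta=0.7):
    """Vertical distance between the curves for x = 1 and x = 0 at time t."""
    return (log_minus_log_survival(t, baseline_rate, beta, 1)
            - log_minus_log_survival(t, baseline_rate, beta, 0))


gaps = [gap(t) for t in (0.5, 1.0, 2.0, 5.0)]
```

Every entry of `gaps` equals the coefficient 0.7 up to rounding error, whatever the time point, which is the parallel-curves property used to assess the assumption.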
We will now introduce some extensions to the Cox proportional hazards model,
which account for recurrent events.
4.2.6.5 The counting process model
The Andersen-Gill (AG) model uses a counting process approach, where the
events are assumed to be independent and each subject is modeled as a multi-event
counting process (Ross 1995) with independent increments (Andersen and Gill 1982). A
counting process N(t) is defined as a non-negative, non-decreasing, integer-valued stochastic
process such that N(t) - N(s) represents the number of events which have occurred during
the interval (s,t]. In the AG model, if a subject is under observation at an event time, it
can contribute to the risk set of that event time. In the AG model a baseline hazard rate is
assumed for all events and global parameters are estimated for different covariates. This
is the simplest model for recurrent events, given that the assumptions hold. Here, the
within-subject correlation is accounted for by adjusting the estimates of the standard
errors using the robust variance estimator discussed in Lin and Wei (1989).
4.2.6.6 Conditional Model
The Prentice, Williams and Peterson (PWP) model is a conditional model, which
uses stratification to model the ordered recurrent events. The stratification is based on the
previous number of events that have occurred in the process. In this model, being at risk
for a subsequent event is dependent on observing the previous event. For example, a
process is not at risk for the 4th event until the 3rd event has occurred. In this model, both
event-specific and overall effects can be modeled for each covariate. There are two
conditional models, which differ in how the time-scale is used. One model defines the
time from the beginning of the study for each event as the survival time (PWP-CP) and
the other uses the time since the previous event (gap time, PWP-GT). In this setup, a
stratum variable is added to the dataset, which indicates the number of the events for
which the subject is at risk. The hazard function for the sth event under the PWP-CP
model can be defined as follows:
(4.14)  $h_s(t, X) = h_{0s}(t) \exp\left(\beta_0 + \sum_{j=1}^{p} \beta_j X_j\right)$
and under the PWP-GT model as:
(4.15)  $h_s(t, X) = h_{0s}(t - t_{s-1}) \exp\left(\beta_0 + \sum_{j=1}^{p} \beta_j X_j\right)$
where $t_{s-1}$ is the time of the previous event.
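The difference between the two time scales, and the role of the stratum variable, can be illustrated with a small Python sketch. The function name `pwp_rows` and the dictionary keys are hypothetical; the sketch only shows how the same event history would be laid out under each formulation.

```python
def pwp_rows(event_times, end_time):
    """Lay out one process for the two PWP formulations.

    event_times: increasing times of the recurrent events;
    end_time: end of observation. Each row carries the stratum (which
    event the process is at risk for) and the at-risk interval on both
    time scales.
    """
    rows = []
    prev = 0.0
    for s, t in enumerate(event_times, start=1):
        rows.append({
            "stratum": s,
            "cp": (prev, t),          # PWP-CP: time since start of study
            "gt": (0.0, t - prev),    # PWP-GT: time since previous event
            "event": 1,
        })
        prev = t
    if end_time > prev:
        # Still at risk for the next event when observation stops: censored.
        rows.append({
            "stratum": len(event_times) + 1,
            "cp": (prev, end_time),
            "gt": (0.0, end_time - prev),
            "event": 0,
        })
    return rows


rows = pwp_rows([2.0, 5.0], end_time=6.0)
```

For events at times 2 and 5, the second row has the interval (2, 5) on the PWP-CP scale and (0, 3) on the gap-time scale, and the final row is a censored interval in stratum 3.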
4.2.6.7 Frailty Models
The frailty model is used when observations can be clustered into groups such as
hospitals or cities, or when there are recurrent event times (Rondeau et al. 2003). In our
experiment each patch will be considered as a process or a group. This model uses a
random effect in order to account for the dependence between the recurrent events of a
process. This random effect, takes into account the unmeasured heterogeneity (between
subjects or processes), which cannot be measured using the covariates of the model
alone. In our experiment, this heterogeneity is present between different patches of the
experiment due to unmeasured covariates such as the properties of food on patches,
lighting, etc.
In the frailty model, the basic idea is to account for the heterogeneity by using an
unmeasured random effect in the hazard function (Hougaard 2000). The shared frailty
model with a random effect is the most commonly used frailty model (Rondeau et al.
2003). Using this model, we assume that the recurrent event times (joining/leaving times)
are independent conditional on the covariates and the random effect (Amorim and Cai
2015). The frailty model is defined as below, by extending Equation (4.6) for the Cox
proportional hazards model:
(4.16)  $h_f(t, X) = z_i h(t, X)$
where $z_i$ is the unmeasured random effect for the ith process (patch), which denotes the
frailty. The subscript f is used to distinguish the hazard function for the frailty model. A
process has a hazard that is larger or smaller than average according to whether the
value of its random effect is above or below one ($z_i > 1$ or $z_i < 1$).
4.2.7 Using frailty models to analyze sojourn time of leaving events
As we outlined in the previous section, the frailty model is used to analyze the
effect of different covariates on recurrent event data. Here we extend the frailty model to
include random effects for covariates of interest:
(4.17)  $h_m(t, X) = h_0(t) \exp\left(\beta_0 + \sum_{j=1}^{p} \beta_j X_j + \sum_{k=1}^{q} b_k Z_k\right)$
where $Z_k$ is the vector of the kth random effect covariate (including the random effect for
Patch ID) and $b_k$ is the coefficient of the effect of the random effect k. $X_j$ is the value of
the jth covariate, $\beta_0$ is the intercept of the model and $\beta_j$ is the coefficient of the effect of
covariate j on the baseline hazard. The subscript m denotes that this is the hazard for the
mixed effects model. The $b_k$ are iid samples drawn from a Gaussian distribution with
mean zero and variance $\Sigma$. The likelihood of the model (Rondeau et al. 2003) is based on
the Cox partial likelihood in Equation (4.8) for any fixed b and $\beta$ values and is defined as:
(4.18)  $\log[PL(\beta, b)] = \sum_{i=1}^{n} \int_0^{\infty} \left[ I_i(t) \eta_i(t) - \log\left( \sum_{j} I_j(t) e^{\eta_j(t)} \right) \right] dN_i(t)$
where $\eta_i(t) = \beta X_i(t) + b_i Z_i$ represents the linear score for each subject i at time t,
$N_i(t)$ counts the events of subject i up to time t (so that the integral picks out the
event times), and $I_i(t)$ is an indicator for subject i being in the risk set: $I_i(t) = 1$ if subject i is under
observation at time t, otherwise $I_i(t) = 0$ (Therneau and Grambsch 2000). A maximum
likelihood method is used to estimate the fixed and random effect coefficients of the
model.
In our study, we analyze the effect of covariates on joining and leaving events
separately. This is done by using an indicator variable for leaving and joining events in
our model. When modeling leaving, the joining events are treated as censoring times and
when modeling joining, the leaving events are treated as censoring times. We include
male and female genotype as random effects. Sex ratio, time period and number of flies
on patch are included as fixed effects in the model. This model was fit using the coxme
package in R. The significance of the fixed effects was assessed using a Z-test performed
by the anova.coxme function in R, and the significance of the random effects was tested
using a likelihood ratio test between the full model and a model excluding the factor of
interest by using the anova function in R. The models used were defined as:
Model 1:
Coxme ( Surv (Time to leaving event) ~ Number of Flies on Patch + Sex Ratio + Time
Period + (1|Male Genotype) + (1|Female Genotype) + (1|Patch ID) ).
Model 2:
Coxme ( Surv (Time to joining event) ~ Number of Flies on Patch + Sex Ratio + Time
Period + (1|Male Genotype) + (1|Female Genotype) + (1|Patch ID) ).
4.3 Results
4.3.1 Results and validation of the constrained HMM algorithm
We used the HMM to reduce the number of false join/leave events. We
used one hundred 20-second intervals of video to validate the method. These intervals
were chosen from places with the most difference between the tracking and HMM
output. Table 1 shows the total number of join/leaves for the 3 different datasets over the
intervals. As we can see, the number of false join/leaves was reduced 3.3-fold.
Table 1: Total number of join/leave events for three different datasets over one hundred
20-second intervals.

                                  HMM corrected   Tracking output   Validation data
Number of join/leaves recorded    580             1504              176
Using the constrained HMM algorithm, we are able to reduce the number of false
join/leave events (mostly those with small sojourn times). Figure 6 shows a histogram of
the sojourn time of the events in the observation data, and those inferred by the HMM for
a sample patch. In our experience with the data, we have seen that in the tracking data
false join/leave events tend to have small sojourn times.
Figure 6: Histogram of the sojourn time (waiting time) of flies before a join/leave event
happens (x-axis: length of waiting time in state before a jump; y-axis: count). The blue bars show
the observed raw tracking output data and the red bars show
the hidden states of the HMM, which are used as the HMM-corrected output.
4.3.2 Kaplan-Meier Survival Plots
As discussed earlier, Kaplan-Meier plots can be used to visually assess the
difference between the survival functions for different levels of a covariate. Figure 7
shows the survival function for different levels of the covariates, estimated using the
Kaplan-Meier estimator. We can see that there are small differences in survival at different
levels of the covariates; however, the significance of these differences is not clear. Therefore, we will
use the frailty model, which is an extension of the Cox proportional hazards model, to
test the significance of these differences and quantify the effect of each covariate on the
survival function.
4.3.3 Assessing Proportional Hazard Assumption
This is done by plotting the ln-ln of the survival function for different levels of the
covariates as defined in Equation (4.10) in the methods section. If the proportional
hazards assumption holds, these plots are almost parallel. The plots can be seen in Figure
8. Looking at Figure 8 we can see that the plots for different levels of each covariate are
approximately parallel which justifies the use of Cox proportional hazard models.
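As an illustration of this diagnostic, the following Python sketch computes a Kaplan-Meier estimate and its ln(-ln) transform. Under the Cox model S(t | x) = S_0(t)^exp(βx), so ln(-ln S(t | x)) = ln(-ln S_0(t)) + βx, and the curves for different covariate levels differ only by a vertical shift. The function names and toy data are illustrative assumptions, not the code used in this study.

```python
import math
from typing import List, Tuple


def kaplan_meier(times: List[float], status: List[int]) -> List[Tuple[float, float]]:
    """Return (event time, survival estimate) pairs.

    status is 1 for an observed event and 0 for a censored observation.
    """
    order = sorted(range(len(times)), key=lambda k: times[k])
    at_risk = len(times)
    survival = 1.0
    curve = []
    idx = 0
    while idx < len(order):
        t = times[order[idx]]
        deaths = 0
        removed = 0
        # Gather all observations tied at time t.
        while idx < len(order) and times[order[idx]] == t:
            deaths += status[order[idx]]
            removed += 1
            idx += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def log_minus_log(curve: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Transform S(t) to ln(-ln S(t)), skipping points where it is undefined."""
    return [(t, math.log(-math.log(s))) for t, s in curve if 0.0 < s < 1.0]


# Toy data (hypothetical): five waiting times, two of them censored.
km = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
cloglog = log_minus_log(km)
```

Computing `cloglog` separately for each level of a covariate and plotting the curves against time gives the kind of plot shown in Figure 8.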
Figure 7: Survival functions for the time before a leaving event for different levels of the
covariates a) genm=Male genotype b) genf=Female genotype c) sexratio= Number of
females in the arena d) Time of Day (Period)= the time of day the experiment was run
(1=am1, 2=am2, 3=pm1, 4=pm2)
Figure 8: ln (– ln (Survival )) plots for the time before a leaving event for different levels
of the covariates a) genm=Male genotype b) genf=Female genotype c) sexratio= Number
of females in the arena d) Time of Day (period)= the time of day the experiment was run
(1=am1, 2=am2, 3=pm1, 4=pm2). These plots can be used to assess the assumptions of
proportional effect of covariates on baseline hazard.
4.3.4 Using recurrent events models
4.3.4.1 Leaving events
The results of modeling leaving events using the frailty model for the HMM
corrected tracking data can be seen in Table 2. For this dataset, we have data from 112
patches of the experiment.
Table 2: Results for modeling leaving events using frailty models.

Fixed Effect                Coef      exp(Coef)   df   LRT-χ2      p-value
Number of Females           -0.087    0.92        1    13.503      $<10^{-4}$
Time Period                 -         -           3    4.503       0.212
Number of Flies on Patch    0.345     1.41        1    2953.679    $<10^{-4}$

Random Effect               df   LRT-χ2   p-value
Male Genotype               1    25.2     $<10^{-4}$
Female Genotype             1    25.83    $<10^{-4}$
Patch ID                    1    1947     $<10^{-4}$
Here we discuss the effect of each covariate on the time before leaving separately:
1- Sex Ratio
An increase of one in the number of females in an experiment decreases the hazard
(instantaneous rate) of a leaving event by 8% (exp(-0.087) ≈ 0.92, p-value $<10^{-4}$). This is in concordance with the
findings of Foley et al. (2015) that individual female flies have lower leaving rates than
male flies. In addition, more females in an experiment may have an effect on the leaving
rates of other flies in the experiment. We also saw in Chapter 3 that female flies have
lower activity levels than male flies in D. melanogaster, which in turn can lead to a lower
patch leaving rate.
2- Number of Flies on Patch
There is a significant increase in the leaving rate as the number of flies on patch
increases (the hazard increases by 41% per additional fly, p-value $<10^{-4}$). This means that having more flies on patch
increases the chance of flies leaving, which is what we would expect.
3- Male and Female Genotype
The effect of male and female genotype is significant which suggests that there is
variation between genotypes in the rate of flies leaving. In this experiment the genotypes
of flies that were used have varying levels of aggression. The results here suggest that
there is a relation between the aggressive behavior of flies and the rate at which flies
leave patches. This is in agreement with the findings of Foley et al. (2015) that more
aggressive genotypes tend to have a higher leaving and joining rate.
4- Patch ID
The Patch ID variable was added to account for the unmeasured heterogeneity
between different patches. On the experiment level, this heterogeneity can be due to
lighting and difference in food between patches, since experiments on patches were
performed on different days. It might also be due to the persistence of particular flies on a
given patch, or their repeated return to the same patch. Flies are assumed to behave in
identical ways in our analysis, but the Patch ID term can absorb any effects due to actual
heterogeneity in their behavior, if such heterogeneity does in fact exist.
5- Time Period
As expected, we did not observe significant differences between the leaving rates of
different time periods.
4.3.4.2 Joining events
The results of modeling joining events using the frailty model for the HMM
corrected tracking data can be seen in Table 3. For this dataset, we have data from 112
patches of the experiment.
Table 3: Results for modeling joining events using frailty models.

Fixed Effect                Coef      exp(Coef)   df   LRT-χ2      p-value
Number of Females           0.066     1.07        1    6.024       0.014
Time Period                 -         -           3    3.177       0.365
Number of Flies on Patch    -0.30     0.74        1    2432.632    $<10^{-4}$

Random Effect               df   LRT-χ2   p-value
Male Genotype               1    49.67    $<10^{-4}$
Female Genotype             1    49.75    $<10^{-4}$
Patch ID                    1    2316     $<10^{-4}$
Now we discuss the effect of each covariate on the time before joining separately:
1- Sex Ratio
An increase of one in the number of females in an experiment increases the hazard
(instantaneous rate) of a joining event by 7% (exp(0.066) ≈ 1.07, p-value = 0.014). This suggests that flies tend to join
patches more in experiments with more females. Having more females in an experiment
can result in an increase in the ratio of females in groups of flies on patch, which suggests
that flies have a higher joining rate for patches with more females on them. This is also in
agreement with the results in Foley et al. (2015), which suggest that male flies have a
higher tendency to drive other flies away from the patch and make the patch undesirable.
2- Number of Flies on Patch
There is a significant decrease in the joining rate as the number of flies on patch
increases (the hazard decreases by 26% per additional fly, p-value $<10^{-4}$). This means that having more flies on patch
decreases the chance of flies joining. This is what we expect, since the number of flies in
an experiment is fixed, so the more flies there are on a patch, the fewer remain off it to join.
3- Male and Female Genotype
As observed for leaving rates, the effects of male and female genotype are significant,
which suggests that there is variation between genotypes in the joining rates of flies. The results here
suggest that there is a relation between the aggressive behavior of flies and the rate at
which flies join patches. This is in agreement with the findings of Foley et al. (2015) that
more aggressive genotypes tend to have a higher joining rate.
4- Patch ID
Patch ID was significant for joining rates. As with leaving rates,
this significance might be due to unmeasured heterogeneity between different patches,
which in turn might be caused by differences in lighting or food or the long-term
presence of specific flies on a patch.
5- Time Period
As expected, we did not observe significant differences between the joining rates of
different time periods.
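The percentage changes quoted for the fixed effects follow from exponentiating the coefficients in Tables 2 and 3: a coefficient b corresponds to a hazard ratio exp(b), that is, a change of (exp(b) - 1) × 100% in the hazard per unit increase in the covariate. A short Python check of this arithmetic (the function and dictionary names are illustrative):

```python
import math


def percent_change(coef: float) -> float:
    """Percent change in the hazard for a one-unit increase in a covariate."""
    return (math.exp(coef) - 1.0) * 100.0


# Coefficients as reported in Tables 2 and 3.
coefficients = {
    "leaving: number of females": -0.087,
    "leaving: number of flies on patch": 0.345,
    "joining: number of females": 0.066,
    "joining: number of flies on patch": -0.30,
}
changes = {name: round(percent_change(c)) for name, c in coefficients.items()}
```

Rounded to whole percentages, these reproduce the 8% and 26% decreases and the 41% and 7% increases reported in the text.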
4.4 Discussion
Automatic tracking has been widely used in behavioral studies for small
organisms. Despite the fact that many methods for automatic tracking have been
introduced, the difficulties of identifying the number of objects in open field tracking
problems still persist. In this chapter, a constrained HMM was introduced as a post-
processing step to correct the estimates of the number of objects in a given frame. We
showed that this method produces better estimates for the number of objects throughout
the frames of a video by removing many of the errors, particularly those associated with
short sojourn times.
In many behavioral studies involving tracking, we are interested in measuring the
presence of certain behaviors in our subjects over time and how particular treatments
affect these outcomes. In this chapter, we used recurrent event models in order to study
these treatment effects on movement behavior.
We studied the leaving and joining rate of flies from patches of food in different
social settings and observed a high variation between leaving and joining rates for
different genotypes of flies. In Chapter 3, we observed that in D. melanogaster, female
flies tend to have much lower movement rates than male flies and have a tendency to
remain in the same place. Here we observed a decrease in leaving rate as the ratio of
females in the arena increased, which is in line with those results and which can therefore
be an important factor in shaping the group structure of flies.
Frailty models are used when there is within-subject heterogeneity that can be
characterized using a random effect. Caution needs to be taken when using these models
with a small number of events or subjects, since the estimates may then be unstable. A smaller
number of events seems to be sufficient if the random effect is large; otherwise it is
necessary to have a large number of events (Amorim and Cai 2015). In our setup, for
each patch we have a fairly large number of events, so the model should be able to produce
stable estimates for both small and large random effects.
Kaplan-Meier plots were used to visualize the difference between survival
functions of different levels of covariates. These plots can be utilized as a tool for
estimating the rate of survival in a given state. This can help us get a simple descriptive
measure of the difference between populations and different treatments. Using these
plots, we were able to visualize the difference between genotypes of male and female
flies in their leaving rates from patches.
In each arena of our experiment, flies can move freely between four patches of
food in the arena, which can cause correlation between summary statistics from these
patches. In our analysis, we used the assumption of independence between the four
patches of an arena in modeling leaving/joining rates. However, we can use information
from other patches of an arena to define summary statistics for joining and leaving to be
able to perform a more accurate analysis. We also fit HMMs separately for each patch to
correct the estimates for the number of flies on patch. An alternative way of utilizing the
HMMs is by fitting the HMMs jointly, simultaneously using the information of all four
patches of an experiment. The rationale behind this method is that joining and leaving
events on different patches of a four-patch experiment might be correlated. However, it is
important to recall that flies do have an off patch state which is not observed, but which
must be passed through between changing patches. This will reduce any correlation
between the number of flies on the four patches.
In summary, we have shown the use of recurrent events time series analysis to
study movement behavior and produced results that are consistent with our beliefs
about fly movement behavior. Our source data was automatic tracking data, and we
studied the effects of genotype and sex ratio on social structures of fruit flies. We also
introduced a post-processing method to correct calls of the number of objects in any
given frame of a video by utilizing an HMM. As such we have demonstrated a reasonably
high-throughput analysis pipeline for such data.
Chapter 5
MovTrack and Click-it as tools for studying the behavior of organisms in video-recorded setups
5.1 Introduction
In Chapter 2, we introduced an automatic tracking software, which can be used
for behavioral studies of small organisms. This software was used in Chapter 3 and
Chapter 4 to study the behaviors of fruit flies. Here we introduce Click-it and MovTrack
as additional novel software for studying the behavior of organisms in video-recorded
setups.
Click-it is a graphical user interface (GUI) based software application for semi-
manual tracking of objects, developed in Matlab. The GUI allows for simple use by those
not familiar with working in coding environments. It provides an alternative to automatic
video tracking in experiments in which that is not feasible. MovTrack is a tool that allows
for high-throughput analysis of behavior from video-recorded organisms. MovTrack is
implemented in Matlab, and produces summaries of animal movement from video input.
While, in an ideal setting, automatic tracking utilities can provide high-resolution
data on the position of objects throughout a video, automated tracking algorithms have
several shortcomings which limit their utility. First, automatic video tracking tools can
rarely be used out of the box and require tuning to be able to produce reliable tracking
data for new tracking problems. Second, these algorithms typically require specific (and
highly constrained) experimental setups in order to be able to provide accurate tracking
data. For example, in Branson et al. (2009), the algorithm used requires high contrast
between subject and background, and therefore uses backlit videos. In Ardekani et al.
(2012), small shadows and changes in lighting are shown to affect the accuracy of the
tracking data. These constraints on the setup of the experiment, as well as being non-
trivial to achieve, can also significantly affect the results of the experiment. For example,
in animal behavior studies, changing the lighting or restricting the arena can have effects
on the behavior of the animals and result in inferences that may be misleading when used
in the context of animals in their natural habitats (Baldauf et al. 2008). Furthermore, in
many biological studies experiments are actually performed in the natural environment of
the animal, where we have little to no control over the experimental environment in terms
of lighting and background (Mench 1998). Click-it requires no restrictions to the
experimental setup and can be used with any video. Third, automatic tracking algorithms
tend to have low accuracy in settings involving many moving objects (e.g., Ardekani et al.
2012), or significant numbers of interactions (‘occlusions’) between objects. These issues
can lead to incorrect estimates of the number of objects, and can also lead to inadvertent
switching of tracks between objects. By using Click-it as a semi-manual annotation
device, such errors can be minimized.
Click-it can also be used to generate training data by quickly and accurately
labelling the location of objects in a large number of frames, regardless of lighting. This
training data can then be used to develop machine learning algorithms or to assess the
accuracy of automated tracking algorithms.
In some setups it is sufficient to use the overall behavior of a group of individuals
rather than the behavior of individual organisms. MovTrack provides researchers with the
ability to measure the overall behavior of a group of individuals over time in studies for
which automatic tracking is not feasible. Because different experimental setups produce
videos that vary in features such as luminance, color, contrast and frame rate, and
because animals vary in their rate of movement, MovTrack allows for the adjustment of
settings such as luminance thresholding and frame-rate resolution.
In this chapter, we describe our implementation of the Click-it and MovTrack
software and give a brief overview of how they work. We will then give examples of
their use in the study of fruit flies and the behavior of Pacific oyster larvae.
5.2 Methods
5.2.1 MovTrack: A tool for high-throughput analysis of the behavior of video-recorded organisms
This software allows for high-throughput analysis of behavior from video-
recorded organisms. MovTrack was implemented in Matlab (The MathWorks, Inc.,
Natick, Massachusetts, United States) and uses video input to produce summaries of
animal movement. MovTrack summarizes movement as follows. First, the video frames
are subsampled at a fixed interval specified by the user. A short
sampling interval is appropriate if animals are moving rapidly, because a longer interval
risks reaching saturation (i.e., all animals move more than one body length in each
interval). If animals are moving slowly, however, a longer interval is more appropriate. In
the example studies of Pacific oyster larvae and flies described below, we used a
resolution of one frame per second, which was sufficient to detect movement changes
among treatments.
The difference in pixel intensity between pairs of consecutive sampled frames,
calculated for each pixel, is then defined as the “difference frame”. This difference frame
is the measure of change between time points. The difference frame is then transformed
to an 8-bit gray-scale image, and thresholded to filter out minor changes in frames not
caused by movement of organisms (such as light fluctuations, or minor changes in non-
focal features). It is best to maximize difference in light intensity between foreground
(target) and background, within the experimental setup, to facilitate threshold selection.
We apply a threshold of 30 (based on the 8-bit grayscale intensity spectrum 0-255).
The threshold is then applied to the pixel intensities of the grayscale image, to
convert grayscale to binary “different” or “not different” pixels. The measure of
movement between two consecutive frames is then calculated as the number of
“different” pixels between frames. By repeating this calculation across the video, we
produce a time series estimate of movement. We will now give step-by-step
instructions on how to use MovTrack.
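Before turning to the interface, the frame-differencing measure described above can be sketched in a few lines of Python. This is not the Matlab implementation of MovTrack: frames are represented as plain nested lists of 8-bit gray values, the absolute difference and the strict "greater than" comparison are assumptions, and the function names are hypothetical.

```python
from typing import List

Frame = List[List[int]]  # 8-bit grayscale image as rows of pixel values (0-255)


def movement_between(frame_a: Frame, frame_b: Frame, threshold: int = 30) -> int:
    """Count pixels whose absolute intensity change exceeds the threshold."""
    changed = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for pix_a, pix_b in zip(row_a, row_b):
            if abs(pix_a - pix_b) > threshold:
                changed += 1
    return changed


def movement_series(frames: List[Frame], step: int = 30, threshold: int = 30) -> List[int]:
    """Subsample every `step`-th frame and measure movement between
    consecutive sampled frames."""
    sampled = frames[::step]
    return [movement_between(a, b, threshold) for a, b in zip(sampled, sampled[1:])]


# Toy 2x3 "video": one pixel changes strongly, one only slightly.
f0 = [[10, 10, 10], [10, 10, 10]]
f1 = [[10, 200, 10], [10, 25, 10]]
f2 = [[10, 200, 10], [10, 25, 10]]
series = movement_series([f0, f1, f2], step=1)
```

In the toy example only the pixel that changes by 190 gray levels is counted; the change of 15 falls below the threshold of 30 and is filtered out as noise.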
5.2.1.1 Using MovTrack
This software was developed in Matlab R2015b. It utilizes Matlab functions from
the image processing toolbox. Use of the MovTrack software requires the following
steps:
1- Install Matlab software.
2- Open the MovTrack_Gui.m using Matlab.
3- Click the Run button (green play button) from the Editor tab in Matlab (Figure 1).
4- If prompted, choose the “Add to path” button. This will put the folder containing
the ‘m’ files in the Matlab path so the functions in the folder can be used.
Figure 1: The Editor tab in Matlab
5- After completing the previous steps, the software user interface will open up,
which can be seen in Figure 2. This interface has two panels. The top panel
(Single video) can be used to extract movement of a single video. The bottom
panel (Batch Video) can be used to extract movement data from multiple videos.
6- Single Video panel: Using this panel movement is extracted in the following
steps.
a. Use the Browse button to load the video.
b. Determine the desired frame rate to analyze the video. Default is 30,
which results in a resolution of 1 frame/sec in the movement time series
for a 30 frame/sec video.
Figure 2: The user interface for MovTrack
c. Click the Run button. This will start the analysis process. After the process
has concluded, you will see a dialog box. Press ok to continue.
d. The movement track file will be stored as a csv file in the same folder as
the video with the name “[video_name]_ _movement_frame_rate_X.csv”.
7- Batch Process:
a. Load a txt file containing the name of videos to be processed.
b. Determine the desired frame rate to analyze the video. Default is 30
frames, which results in resolution 1 frame/sec in the movement time
series for a 30 frame/sec video.
c. Click the Run button. This will start the analysis process. After the process
has concluded, you will see a dialog box. Press ok to continue.
d. The movement track files will be stored as csv files in the same folder as
the videos with the name “[video_name]_movement_frame_rate_X.csv”.
5.2.2 Click-it: User interface for manual low resolution-high accuracy object detection
We now describe “Click-it”, a software platform that can be used to create
low-resolution, high-accuracy tracking data using semi-manual annotation. Click-it was
developed in Matlab R2015b and uses the Image Processing Toolbox (Version 9.3) (The
MathWorks, Inc., Natick, Massachusetts, United States). Click-it takes as input the
frame rate at which the video should be sampled and the start and stop frames of the section of
video to be annotated. The user then uses the mouse cursor and
mouse clicks to indicate (‘detect’) objects in each frame. The keyboard can also be used
to label detected objects in frames, which is useful when more than three objects
are present and the three mouse buttons are not enough to distinguish them. The output is a
csv file containing the frame id, the x and y coordinates and the
ASCII code of the button clicked for each object detected. A complete tutorial on using
Click-it is provided in the next section.
5.2.2.1 Explaining the UI and how it works with images and steps
In this section we will illustrate the use of Click-it with an example using a
sample video from the study of Drosophila behavior. This video is a sample from the
experimental setup introduced in Chapter 4. In this study, different combinations of
genotypes and sex ratios of Drosophila melanogaster were introduced to a complex arena
with multiple food patches. Flies were marked by small dots of colored paint to
distinguish sex and genotype. Each patch was videotaped, and the spontaneously-forming
social groups were characterized. We will now give a step-by-step explanation of how to
use Click-it.
5.2.2.2 Using Click-it
Step 1: Initiation
Open the Click_It.m file using Matlab and run the program. When prompted, click
on the Add Path option to place the folder containing the program in the Matlab path. The
GUI of the program will then open. The GUI can be seen in Figure 3.
The buttons and text boxes function as follows:
- ‘Browse’ button: Use this button to open the video file you want to track.
- ‘Frame Rate’ textbox: Enter the frame rate you want to use to sample your video.
(e.g., to sample one frame per second from a 30 frame/sec video, use 30).
- ‘Start Frame’ textbox: Enter the starting frame for the portion of video from
which you want to sample.
- ‘Stop Frame’ textbox: Enter the end frame of the portion of video from which you
want to sample.
- ‘Run’ button: After filling all the fields, press ‘Run’ to begin annotation.
Step 2: Object Detection
Figure 3: The GUI of the Click-it program
After pressing ‘Run’, you will see the first frame of the portion of video you wish
to annotate (Figure 4). Use the cursor in order to click on the objects you wish to detect.
The program stores the x-y position of each object in addition to the mouse button
clicked. If you have three or fewer objects, you can use the different mouse buttons to
distinguish between them. If there are more than three objects, you can use keyboard
buttons to distinguish between them. To use the keyboard, point the cursor to the object,
and press the keyboard button annotated to that object.
After detecting all the objects in the frame, click the “return/enter” button on the
keyboard. The detected objects will be shown on the frame using an alphabetical
character (either the key pressed or M1, M2 or M3 for the three mouse buttons). An
example can be seen in Figure 5.
You will be prompted to continue to the next frame and can continue by clicking
‘Yes’. At this point, if you have made a mistake, you can redo the current frame by
clicking ‘No’.
Completion: After annotating the last frame and clicking “Yes”, no new frame
will pop up. Upon completion, results will be saved in a csv file with the same name as
the video file, but with ‘_Objects.csv’ appended to the name. This file will be saved to
the same folder as the video. Table 1 shows a sample of the saved file.
Figure 4: A sample frame opened after pressing the ‘Run’ button in Click-it. The cursor
can be used to detect each object in the video frame.
Figure 5: The resulting frame with detected objects shown and with alphabetical
characters in green.
Table 1: A sample of the resulting data extracted from a video. The first column in the
table is a unique index for each entry in the program. This table shows the annotation of
16 flies over 2 frames (108 frames were processed in total). The second column shows the
Frame_ID and the third column is the ASCII code of the key entered for each entry. The
fourth and fifth columns show the x and y coordinates of the entries.
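A file with this layout can be read back for further analysis. The Python sketch below groups the entries by frame; it assumes the column order given in the caption (entry index, frame id, ASCII code, x, y), no header row, and that codes 1 to 3 stand for the three mouse buttons. These details, and the function name, are assumptions made for illustration and should be checked against an actual ‘_Objects.csv’ file.

```python
import csv
import io
from collections import defaultdict


def load_annotations(handle):
    """Group Click-it style rows by frame.

    Assumed column order: entry index, frame id, ASCII code of the
    button, x, y. No header row is assumed.
    """
    by_frame = defaultdict(list)
    for index, frame_id, code, x, y in csv.reader(handle):
        code = int(float(code))
        # Assumption: codes 1-3 stand for the mouse buttons; anything else
        # is the ASCII code of a keyboard key.
        label = f"M{code}" if code in (1, 2, 3) else chr(code)
        by_frame[int(float(frame_id))].append((label, float(x), float(y)))
    return dict(by_frame)


# Hypothetical file content with three annotated objects over two frames.
sample = io.StringIO("1,1,97,120.5,88.0\n2,1,1,300.2,45.7\n3,2,98,121.0,90.3\n")
annotations = load_annotations(sample)
```

For a file on disk, the same function can be called with a handle opened as `open(path, newline="")`.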
5.3 Results
5.3.1 Examples of Use
MovTrack can be used in the study of the behavior of a wide variety of organisms.
Here we give two examples of its use, for analyzing the behavior of Drosophila and
Pacific oyster larvae. Specifically, we give a detailed description of the use of
MovTrack in finding the sedation times of Drosophila simulans and proceed to analyze
the genetic variation in sedation times under ethanol exposure. We also give a brief
description of the use of MovTrack in the study of Pacific oyster larvae and direct the
reader to our published paper (Hall et al. 2016) for a complete overview of this study.
5.3.2 Analyzing the sedation times of D. simulans using MovTrack
In this experiment we are interested in the genetic variation observed in sedation
times of different genotypes of D. simulans, under the effect of ethanol. The study of
sedation due to ethanol sensitivity in fruit flies has been of much interest (Linde et al.
2014; Maples and Rothenfluh 2011; Kaun et al. 2012). Here we show the utility of
MovTrack in quantifying sedation times of fruit flies and study the effect of genotype on
the sedation times of D. simulans. MovTrack provides a measure of movement of flies,
which can be used to measure sedation times.
5.3.2.1 Experimental Design
Our experimental setup, which can be seen in Figure 6, is as follows. First we
place flies in an open test tube, which has food at the bottom. We close the other end of
the tube with a tip, which in some experiments contains ethanol. We place multiple test
tubes horizontally on a tube holder and run the experiment for two hours while recording
the experiment using a camera. In each tube we only have flies from a single genotype.
During the experiment, the flies eventually become sedated. Our goal is to calculate the
sedation time for each genotype and study variation in the sedation time between
different genotypes and under the effect of ethanol. We ran the experiment with
943 tubes, which in total contained flies from 154 genotypes, in two treatments (ethanol
and non-ethanol). We define the sedation time of flies as the time at which the movement
of flies inside the test tube drops below a certain threshold. So, in order to analyze
sedation times, we first need a measure of the flies’ movement inside the test
tubes.
Figure 6: The setup of the experiment for finding the sedation time of flies under ethanol
exposure.
5.3.2.2 Detecting overall movement using MovTrack
To measure the movement of the flies inside the tubes, we use image-processing
methods. The process is as follows:
1 - In the first step, we need to find the position of each tube inside the video. To
do this we first find the position of flies for a sub-sample of the frames of the videos (we
sub-sample by a factor of 300). This is done by finding the background image of the
video (using the first 5 minutes of the video before the flies are sedated) and calculating
the position of the flies inside each frame in the sub-sample using the foreground frame
and blob detection (explained in detail in Chapter 2). Here we are not concerned about
merged blobs since we only want to find the approximate position of the flies to estimate
the position of the tube. We save the blob positions into a dataset.
2 - After finding the position of blobs, we use an agglomerative hierarchical
clustering algorithm (Ward 1963), based on Euclidean distance, in order to cluster the
blobs using their y coordinates. An agglomerative hierarchical clustering algorithm is a
bottom-up approach in which each observation starts off as its own cluster and as we
move up in the hierarchy, pairs of clusters merge to form new clusters. At each level of
the hierarchy, the blobs are assigned to different clusters. Our aim is to choose the level
of the hierarchy where all blobs of the same tube cluster together. This is done by using a
manually defined threshold on the distance between the clusters in the hierarchy. The
distance between two clusters is defined as the average Euclidean distance between each
pair of their members (group average). We use a threshold of 40 for the distance between
clusters, meaning that any two clusters with a distance less than 40 are merged together.
The threshold was chosen by looking at a training set of 20 different videos and
determining the distance threshold at which the number of clusters is equal to the number
of tubes. We then find the clusters at the cut-off threshold and assign each blob to
the cluster that contains it.
3 - In each cluster we find the maximum and minimum y and x coordinates of the
blobs in that cluster, which correspond to the position of the test tube. The length of the
tubes is constrained to be the same and is calculated using the median of the x coordinates
of the leftmost and rightmost points of the tubes. This is done to reduce errors in finding
the position of the tubes. Figure 7 shows the tubes found for a sample video.
Figure 7: The tubes detected using an agglomerative hierarchical clustering algorithm.
4 - Now in order to estimate the movement of the flies inside the test tubes, we
find the change in the pixel intensities between consecutive frames 1 second apart. By
summing up these differences in pixel intensities inside the mask of each tube, as
described earlier, we construct a measure for the movement per second inside that tube.
We save the movement per second as a time series for each genotype, which can then be
used to find the sedation time. Figure 8 shows a sample of the time series of the
movement per second inside the test tube for a sample genotype.
Figure 8: The time series of movement of flies inside a single tube.
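The clustering in step 2 of the procedure above can be sketched as follows. This is a naive, pure-Python version of group-average agglomerative clustering on the blob y coordinates, written only to illustrate the stopping rule with a distance threshold of 40; the function name and the toy coordinates are hypothetical, and it is not the implementation used in this study.

```python
from typing import List


def cluster_by_y(y_coords: List[float], threshold: float = 40.0) -> List[List[float]]:
    """Average-linkage agglomerative clustering of 1-D coordinates.

    Clusters are merged, closest pair first, until no two clusters have an
    average pairwise distance below the threshold.
    """
    clusters = [[y] for y in y_coords]

    def average_distance(a: List[float], b: List[float]) -> float:
        return sum(abs(p - q) for p in a for q in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] >= threshold:
            break  # remaining clusters are all at least `threshold` apart
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Blob y coordinates from three hypothetical tubes.
tubes = cluster_by_y([100, 105, 98, 200, 207, 310, 305, 299])
```

Each resulting cluster corresponds to one tube, and the minimum and maximum coordinates of its blobs give the tube's extent. The pairwise search is slow for large inputs, which is acceptable here because only a sub-sample of frames is used.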
5.3.2.3 Moving Average and thresholding
To find the sedation times, we use the time series of the movement per second in
each tube. The idea is that if the average movement of the flies, over a window, drops
below a certain threshold and remains there, the flies are annotated as being sedated. The
sedation time is found using the following steps: First we use a simple moving average
method with a window length of 300 seconds in order to smooth the time series of the
data.
(5.1) $MA_i = \frac{1}{300} \sum_{j=i-300}^{i} X_j, \quad \text{for } i \ge 300$
where $X_i$ is the measure of movement at time i and $MA_i$ is the smoothed measure
of movement at time i. In the second step, we determine the threshold of movement for
sedation. This threshold is determined using a training dataset. For each tube in this
dataset we first observe the videos and manually determine the sedation times. Then we
read off the movement inside each tube from the time series of movement at the manually
determined time of sedation. The average of these values over the
tubes in our training dataset is denoted $MA_s$, and the threshold $\tau$ is:
(5.2) $\tau = 1.2 \times MA_s$
The coefficient 1.2 is added to account for the noise in the movement time series. In the next
step, we apply this threshold to the moving average time series to get the thresholded time
series $TMA_i$, defined as:
(5.3) $TMA_i = I(MA_i > \tau)$
The thresholded time series will be a series of ones and zeros. The time of
sedation is defined as the time when the thresholded time series reaches zero and does not
change back to one. This is equal to the time at which the moving average of the time series
drops below the threshold and stays there until the end of the experiment. This is done for
each tube and the resulting sedation time is stored in a dataset. In cases in which the
movement time series does not drop below the threshold, or the movement drops and
again increases over the threshold, sedation is determined not to have happened within
the experiment, so the observation is right censored. In these cases, we flag the case as
censored and return the length of the experiment as the sedation time.
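The smoothing, thresholding and sedation-time steps can be sketched as below. This is an illustrative sketch under stated assumptions, not the code used in the thesis: the function names are hypothetical, the movement series is assumed to hold one value per second, and a tube is treated as censored whenever its thresholded series ends at one.

```python
from typing import List, Sequence, Tuple

WINDOW = 300  # window length in seconds, as in the text


def moving_average(x: Sequence[float], window: int = WINDOW) -> List[float]:
    """Eq. 5.1. Element k of the result is MA_i for i = window + k.

    As written in Eq. 5.1, the sum runs over j = i - window, ..., i.
    """
    return [sum(x[i - window : i + 1]) / window for i in range(window, len(x))]


def threshold_from_training(ma_at_sedation: Sequence[float], coeff: float = 1.2) -> float:
    """Eq. 5.2: tau = coeff * MA_s, with MA_s the training-set average."""
    ma_s = sum(ma_at_sedation) / len(ma_at_sedation)
    return coeff * ma_s


def sedation_time(x: Sequence[float], tau: float, window: int = WINDOW) -> Tuple[int, bool]:
    """Return (sedation time in seconds, censored flag) for one tube."""
    # Eq. 5.3: TMA_i = I(MA_i > tau)
    tma = [1 if m > tau else 0 for m in moving_average(x, window)]
    if not tma or tma[-1] == 1:
        # Never dropped below tau, or rose above it again: report the
        # experiment length and flag the tube as censored.
        return len(x), True
    k = len(tma) - 1
    while k > 0 and tma[k - 1] == 0:
        k -= 1  # walk back to the start of the final run of zeros
    return window + k, False


# Toy series with a 3-second window so the numbers are easy to follow.
toy = [5, 5, 5, 5, 5, 1, 1, 1, 1, 1, 1, 1]
print(sedation_time(toy, tau=2.0, window=3))
print(sedation_time([5] * 12, tau=2.0, window=3))
```

Scanning backwards from the end of the thresholded series finds the last drop below the threshold directly, which matches the requirement that the series "does not change back to one".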
We also calculate the slope of movement before sedation:
(5.4) $\text{slope} = \dfrac{MA_{\text{Sedation time}} - MA_{300}}{\text{Sedation time}}$
Using a moving average with a window of 300 seconds makes the
method unreliable when sedation occurs within the first five minutes. In our
analysis we set the sedation time to 300 seconds for these experiments; however,
we observed few experiments in which the flies sedated in less than five minutes.
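As a small worked sketch of Equation 5.4 and of the early-sedation rule, the function below computes the slope from a raw movement series. The name `slope_before_sedation` is hypothetical, and the guard against sedation times beyond the recording is an assumption added for safety rather than something described in the text.

```python
from typing import Sequence


def slope_before_sedation(x: Sequence[float], sedation_t: int, window: int = 300) -> float:
    """Eq. 5.4: (MA at the sedation time - MA at `window`) / sedation time."""
    # Sedation inside the first window is reported as `window` seconds.
    t = max(sedation_t, window)
    if t >= len(x):
        raise ValueError("sedation time must fall inside the recording")

    def ma(i: int) -> float:
        # Moving average of Eq. 5.1 evaluated at time i.
        return sum(x[i - window : i + 1]) / window

    return (ma(t) - ma(window)) / t


# Toy series with a 3-second window; sedation was scored at t = 8.
toy = [5, 5, 5, 5, 5, 1, 1, 1, 1, 1, 1, 1]
print(slope_before_sedation(toy, sedation_t=8, window=3))
```

With this definition an experiment whose sedation time is clamped to the window length has a slope of zero, which may be worth filtering out before comparing genotypes.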
5.3.2.4 Validation of Sedation time method
In order to validate the sedation time calculation method, we randomly selected
80 experiments with D. simulans flies and observed the videos at the calculated sedation
times. We observed that the flies in all of these experiments were sedated within a one-
minute interval of the sedation time.
5.3.2.5 Genotypic variation in sedation times in ethanol environments
Figure 9 shows the boxplot of the sedation times over different genotypes for
ethanol experiments. We can see that there is genotypic variation in the sedation times
of D. simulans under ethanol exposure (sedation time ~ genotype: $\chi^2$(df = 1) = 49.15,
$p < 10^{-3}$). We tested this effect using a linear mixed effect model. In this model genotype was
treated as a random effect, and we used the log of the sedation times to normalize the
data.
Figure 9: Boxplot of sedation times of ethanol experiments for different genotypes (x-axis: Genotype; y-axis: Sedation_Time).
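One way to write down the model described in this section is sketched below. The symbols $\mu$, $g_i$, $\varepsilon_{ij}$, $\sigma_g^2$ and $\sigma_\varepsilon^2$ are not defined in the text and are introduced here for illustration only. The text does not state how the $\chi^2$ statistic was obtained; a likelihood-ratio test of the genotype variance component, which has one degree of freedom, is assumed here as one common choice. Censoring is ignored in this sketch.

```latex
\begin{align}
\log T_{ij} &= \mu + g_i + \varepsilon_{ij}, \qquad
g_i \sim \mathcal{N}(0, \sigma_g^2), \quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2), \\
\Lambda &= 2\left[\ell\left(\hat{\mu}, \hat{\sigma}_g^2, \hat{\sigma}_\varepsilon^2\right)
 - \ell\left(\hat{\mu}_0, 0, \hat{\sigma}_{\varepsilon,0}^2\right)\right]
\end{align}
```

Here $T_{ij}$ is the sedation time of replicate $j$ of genotype $i$, $\ell$ is the maximized log-likelihood, and $\Lambda$ is compared with a $\chi^2$ distribution with one degree of freedom.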
5.3.3 A new behavioral phenotyping strategy for Pacific oysters
Copper is among the most studied marine metallotoxins (Rivera-Duarte et al.
2005). It is both heavily utilized in commercial applications (e.g. industrial discharges
and antifouling hull coatings) and readily bioavailable in the water column. In bivalve
mollusks, common responses to copper toxicity include increased mortality rates and
disruption of normal development, especially during early life history stages (Mai et al.
2012; Ivanina et al. 2014). In this example study we profile the movement
characteristics of 48-hour-old larval Pacific oysters, Crassostrea gigas (C. gigas), under
increasing concentrations of Cu2+. C. gigas full-sibling families were subjected to a
series of increasing Cu2+ concentrations in filtered sea water (FSW), from 0 ppb to 36
ppb, for n = 10 conditions. MovTrack was used to show quantitatively that the total
movement of larvae does not depend on Cu2+ concentration. A familial
component of the Cu2+ stress reaction was potentially observed, with some genetic lines
showing significant differences in movement metrics, supporting the hypothesis that
the Cu2+ toxicity response may have a heritable component. This work provides a proof of
concept for the MovTrack software as a reliable phenotyping strategy for quantitative
measurement of marine larval behavior. For full methods and results refer to Hall et al.
(2016).
Click-it:
Click-it can also be used in a wide range of applications. Here we will illustrate
this with two examples of the use of Click-it in the study of Drosophila behavior.
5.3.4 High accuracy data using Click-it for the study of Drosophila melanogaster group structure
The first example of the use of Click-it is based on the experimental setup
explained in Chapter 4. In brief, different combinations of genotypes and sex ratios of D.
melanogaster were introduced to a complex arena with multiple food patches. Flies were
marked by small dots of colored paint to distinguish sex and genotype. Each patch was
videotaped, and the spontaneously-forming social groups were characterized. A picture
of a sample experimental patch in one of these trials was shown in Figure 4. Automatic
detection of the color of these flies proved to be unreliable. Instead, Click-it was used to
generate reliable data for both the number of flies on each experimental patch, and their
color (a different keyboard button was used when capturing the position of each color of
fly). The average time for an inexperienced researcher (undergraduate) to process 100
frames from a video, containing an average of 4 flies, was approximately 15 minutes.
The data obtained - XY coordinate information, sex, and color information - can
be used to gather samples to train machine learning classifiers, for example.
5.4 Discussion and Future development
Click-it is a simple tool for generating high-accuracy/low-resolution data in setups
in which automatic tracking is either non-trivial or impossible. However, being manual,
the process of data collection using Click-it, while extremely accurate, is time
consuming. When automated tracking is accurate, it is clearly preferable. Click-it,
however, can be useful in generating training data for such algorithms, or for machine
learning or other classification techniques. It can also work together with automatic
algorithms to produce unbiased estimates of reliability. As such, in future versions of
Click-it automatic tracking might be used in addition to semi-manual annotation to add
an intelligent guess for the position of objects in each frame, from which cursor
movement can proceed where necessary. This has the potential to decrease the time
required for data generation.
MovTrack is a useful tool in the study of group behavior for a variety of
organisms. Here we observed the utility of MovTrack in the study of fruit flies and in
aquatic settings. MovTrack is simple to work with and can be easily adapted to be useful
in other experimental settings. It can also be used in the primary steps of an experiment to
determine the experimental settings. In the study of Pacific oysters, we used MovTrack in
the early stages of the experiment to determine the camera zoom (i.e. 2x, 3x, …) needed
to get the best measure of movement. It was also used as an initial screening step to
determine the range of Cu2+ concentration needed for the experiment.
In summary, in this chapter we have introduced novel software for the study of
behavior in organisms, which can help in further improving the quality and simplicity of
behavior studies.
Chapter 6:
Conclusion
In this thesis we studied animal behavior. More specifically, we studied the
behavior of Drosophila, which is among the most widely-used model organisms for the
study of human disease and behavior. Here we used computer vision software to quantify
the movement of flies. This software ranged from automatic tracking to semi-manual
annotation software. The measured movement was then used to understand the dynamics
of aspects of Drosophila behavior.
One of the main focuses of this thesis was automatic tracking. In Chapter 2, we
introduced software for high-throughput tracking of flies to collect movement data based
on the well-known Hungarian algorithm (Kuhn 1955). Here we used information from
the flies (i.e. sex, movement direction) and extended the Hungarian algorithm to develop
a more accurate tracking algorithm. Our method imposes minimal restrictions on lighting
conditions and is able to detect flies in low-resolution videos, which in turn allows for the
high throughput study of fly behavior in more natural environments (such as the
experiment in Chapter 3). We also saw in Chapter 2 that with an extra background
correction step, the software can allow for the detection of flies which seldom move
through an experiment. This is very useful in experimental setups in which the interest is
in the effect of a stimulus that highly reduces the movement of the flies. An example of
this was the study of Drosophila movement under the effect of ethanol, which has been
shown to sedate flies given enough exposure time. Our tracking software, however,
had the limitation that the number of objects in a given frame needs to be known a priori.
Though this did not cause any restriction in our studies, in which the number of flies is
fixed and known, in future extensions an extra step could be used to estimate the number
of objects in a given frame. One might use different methods for this estimation, based on
a trade-off between accuracy and speed. For example, an extra optimization step could be
added to the GMM algorithm to detect the number of objects in a blob. This could be
done by calculating a goodness-of-fit measure for GMMs with different numbers of
Gaussians fit to the blob of interest and using the result with the best fit. This will, however,
produce some errors, which might be fixed in a post-processing step by using
the data from all the frames in a video (similar to the HMM method in Chapter 4).
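The assignment problem at the core of the tracker can be illustrated with a small sketch. The code below is a brute-force stand-in that tries every matching and is only practical for a handful of flies; the Hungarian algorithm (Kuhn 1955) solves the same problem in polynomial time. The Euclidean distance cost and the name `assign_tracks` are assumptions for illustration, and the extended cost of Chapter 2 (sex, movement direction) is not reproduced here.

```python
from itertools import permutations
from math import hypot
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def assign_tracks(prev: Sequence[Point], curr: Sequence[Point]) -> List[int]:
    """Return `a` such that prev[i] is matched to curr[a[i]] at minimal total distance."""
    if len(prev) != len(curr):
        raise ValueError("this sketch assumes the same number of flies in both frames")
    best_cost, best = float("inf"), []
    for perm in permutations(range(len(curr))):
        # Total Euclidean distance of this candidate matching.
        cost = sum(
            hypot(p[0] - curr[j][0], p[1] - curr[j][1]) for p, j in zip(prev, perm)
        )
        if cost < best_cost:
            best_cost, best = cost, list(perm)
    return best


# Two flies whose detections are listed in swapped order in the second frame.
prev_positions = [(0.0, 0.0), (10.0, 0.0)]
curr_positions = [(9.0, 1.0), (1.0, 0.0)]
print(assign_tracks(prev_positions, curr_positions))
```

The equal-length check mirrors the limitation noted above, namely that the number of objects in each frame must be known in advance.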
This tracking software was used in Chapter 3 to study the effect of social
environment on the locomotory behavior of Drosophila melanogaster. Using automatic
tracking, we were able to measure the effect of genotype and abiotic environment on the
movement of male and female flies in closed experimental arenas. We used mixed effect
models to measure the “coefficient of interaction”, Ψ, and show that it varies between
abiotic environments for locomotion in D. melanogaster. We observed that females tend
to move more when partnered with more active males. This is one of the few studies to
have measured Ψ. We also note that a similar analysis has been performed to study the
effects of social environment on D. simulans. The results of the comparison of these
effects in the two species have been prepared as a paper, which at the time of preparation of
this thesis is under review at The American Naturalist.
In Chapter 4 we focused on the study of social group structure in Drosophila. We
proposed a constrained HMM method to fix errors in identifying the number of objects in
an open field tracking problem. Using this method, we are able to produce more accurate
estimates for the number of flies on a patch of food in our experiment setup. The
corrected data was then used to study the joining and leaving rates of flies of different
genotypes from patches of food in various social settings. This analysis was done using
frailty models. Frailty models are able to account for the within-‘individual’
heterogeneity in recurrent event data using a random effect. In many behavioral studies,
we are interested in studying the recurrence of a specific behavior, for example the rate
of courtship before mating happens. In that chapter we introduced a high-throughput
pipeline that could be used for the analysis of data collected from these studies in the future.
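The idea of correcting noisy per-frame counts with a constrained HMM can be sketched with a short Viterbi decoder. Everything specific in this sketch is an assumption made for illustration: the constraint (the true count changes by at most one fly between frames), the noise model, the probabilities `p_stay` and `p_correct`, and the name `viterbi_counts`. Chapter 4 defines the constraints actually used. The sketch requires `max_count` to be at least 1.

```python
from math import log
from typing import List, Sequence

NEG_INF = float("-inf")


def viterbi_counts(observed: Sequence[int], max_count: int,
                   p_stay: float = 0.9, p_correct: float = 0.8) -> List[int]:
    """Most likely sequence of true fly counts given noisy per-frame counts."""
    if not observed:
        return []
    states = range(max_count + 1)

    def log_trans(a: int, b: int) -> float:
        # Assumed constraint: the count changes by at most one fly per frame.
        # Rows are left unnormalised at the boundaries for brevity.
        if a == b:
            return log(p_stay)
        if abs(a - b) == 1:
            return log((1 - p_stay) / 2)
        return NEG_INF

    def log_emit(state: int, obs: int) -> float:
        # Assumed noise model: correct with p_correct, otherwise uniform.
        if state == obs:
            return log(p_correct)
        return log((1 - p_correct) / max_count)

    score = [log_emit(s, observed[0]) for s in states]
    back: List[List[int]] = []
    for obs in observed[1:]:
        new_score, ptr = [], []
        for b in states:
            # Best predecessor of state b at this frame.
            a_best = max(states, key=lambda a: score[a] + log_trans(a, b))
            new_score.append(score[a_best] + log_trans(a_best, b) + log_emit(b, obs))
            ptr.append(a_best)
        score = new_score
        back.append(ptr)

    # Trace the best path back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# A single-frame jump from 2 to 5 flies is treated as a detection error.
print(viterbi_counts([2, 2, 5, 2, 2], max_count=5))
```

Because a jump of three flies in one frame has zero probability under the assumed constraint, the decoder smooths it away while still following gradual changes in the count.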
In Chapter 4, we assumed independence between the four food patches of an
experiment. This assumption seems a reasonable pragmatic step, given that flies can also
be “off-patch” (i.e., unobserved) in that experimental set-up. In future extensions to that
work, we could simultaneously use the counts of flies on all four patches of an
experiment. This would more accurately reflect the experimental data, but the resulting
Markov Chain would have a significantly more complex state-space, and it is not obvious
that the resulting analysis would be improved.
In Chapter 5 we introduced two novel pieces of software for the analysis of
animal behavior. This software can be used in video-based setups in which automatic
tracking is not feasible due to restrictions that automatic tracking software would impose
on the experimental set-up. In many behavioral studies, we are interested in how animals
behave in their natural environment. The restrictions imposed by existing automatic
tracking routines require changes to the study environment and so results might not apply
to flies in their natural environment. Click-it allows for semi-manual tracking of objects
using a user friendly GUI. Click-it is easy to use and provides high-accuracy data for use
in studying behavior. It can also be used to create training data for assessing the accuracy
of automated tracking algorithms or machine learning algorithms.
MovTrack was introduced to measure the overall behavior of a group of
individuals over time. MovTrack is able to adjust for different lighting between
experiment setups by allowing for the adjustment of luminance thresholding and hence
does not impose significant lighting restrictions on the experimental setup. In Chapter 5,
we showed the utility of this software with different organisms and in a variety of
experimental settings.
In summary, in this thesis we reported on the development of novel software and
methods for the study of animal behavior. Here, this software was mostly used for the
study of Drosophila behavior, though it could easily be used for the study of other
organisms given minor adjustments. In future work, the software could be further
extended to account for more complex setups. The constrained HMM introduced in
Chapter 4 could be used to supply the number of objects in a frame to the tracking
software introduced in Chapter 2, extending this software for use in general
setups where the number of objects of interest is not known a priori. In addition, the
tracking data produced by our software could be used together with Click-it to build an
automatic platform for detecting complex behaviors of Drosophila based on machine
learning algorithms.
References
Adler, M. I., Cassidy, E. J., Fricke, C., & Bonduriansky, R. (2013). The Lifespan-
Reproduction Trade-Off Under Dietary Restriction Is Sex-Specific And Context-
Dependent. Experimental Gerontology, 48(6), 539–548.
Agrawal, S., Safarik, S., & Dickinson, M. (2014). The Relative Roles Of Vision And
Chemosensation In Mate Recognition Of Drosophila Melanogaster. The Journal of
Experimental Biology, 217(15), 2796–2805.
Allada, R., & Chung, B. Y. (2010). Circadian Organization of Behavior and Physiology
in Drosophila. Annual Review of Physiology, 72, 605–624.
Amorim, L. D. A. F., & Cai, J. (2015). Modelling recurrent events: a tutorial for analysis
in epidemiology. International Journal of Epidemiology, 44(1), 324–333.
Andersen, P. K., & Gill, R. D. (1982). Cox’s Regression Model for Counting Processes:
A Large Sample Study. The Annals of Statistics, 10(4), 1100–1120.
Ardekani, R., Biyani, A., Dalton, J. E., Saltz, J. B., Arbeitman, M. N., Tower, J., …
Tavaré, S. (2012). Three-Dimensional Tracking And Behaviour Monitoring Of Multiple
Fruit Flies. Journal of The Royal Society Interface, 10(78), 20120547.
Arnqvist, G., Vellnow, N., & Rowe, L. (2014). The Effect Of Epistasis On Sexually
Antagonistic Genetic Variation. Proceedings of the Royal Society B: Biological Sciences,
281(1787), 20140489.
Badyaev, A. V. (2002). Growing Apart: An Ontogenetic Perspective On The Evolution
Of Sexual Size Dimorphism. Trends in Ecology & Evolution, 17(8), 369–378.
Bailey, N. W., & Zuk, M. (2012). Socially Flexible Female Choice Differs Among
Populations Of The Pacific Field Cricket: Geographical Variation In The Interaction
Coefficient Psi (Ψ). Proceedings of the Royal Society B: Biological Sciences, 279(1742),
3589–3596.
Balakireva, M., Stocker, R. F., Gendre, N., & Ferveur, J.-F. (1998). Voila, a New
Drosophila Courtship Variant that Affects the Nervous System: Behavioral, Neural, and
Genetic Characterization. The Journal of Neuroscience, 18(11), 4335–4343.
Baldauf, S. A., Kullmann, H., & Bakker, T. C. M. (2008). Technical Restrictions of
Computer-Manipulated Visual Stimuli and Display Units for Studying Animal
Behaviour. Ethology, 114(8), 737–751.
Barton, N. H. (1990). Pleiotropic Models of Quantitative Variation. Genetics, 124(3),
773–782.
Bateman, A. J. (1948). Intra-Sexual Selection In Drosophila. Heredity, 2(3), 349–368.
Baum, L. E., Petrie, T., Soules, G., & Weiss, N. (1970). A Maximization Technique
Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains. The
Annals of Mathematical Statistics, 41, 164–171.
Bergland, A. O., Behrman, E. L., O’Brien, K. R., Schmidt, P. S., & Petrov, D. A. (2014).
Genomic Evidence of Rapid and Stable Adaptive Oscillations over Seasonal Time Scales
in Drosophila. PLoS Genetics, 10(11), e1004775.
Bijma, P. (2014). The Quantitative Genetics Of Indirect Genetic Effects: A Selective
Review Of Modelling Issues. Heredity, 112(1), 61–69.
Bleakley, B. H., & Brodie, E. D. (2009). Indirect Genetic Effects Influence Antipredator
Behavior in Guppies: Estimates of the Coefficient of Interaction Psi and the Inheritance
of Reciprocity. Evolution, 63(7), 1796–1806.
Bontonou, G., & Wicker-Thomas, C. (2014). Sexual Communication in the Drosophila
Genus. Insects, 5(2), 439–458.
Botero, C. A., & Rubenstein, D. R. (2012). Fluctuating Environments, Sexual Selection
and the Evolution of Flexible Mate Choice in Birds. PLoS ONE, 7(2), e32311.
Brakefield, P. M. (2003). Artificial Selection And The Development Of Ecologically
Relevant Phenotypes. Ecology, 84(7), 1661–1671.
Branson, K., Robie, A. A., Bender, J., Perona, P., & Dickinson, M. H. (2009). High-
throughput ethomics in large groups of Drosophila. Nature Methods, 6(6), 451–457.
Breslow, N. E. (1975). Analysis of Survival Data under the Proportional Hazards Model.
International Statistical Review / Revue Internationale de Statistique, 43(1), 45–57.
Brown, W. D., Bjork, A., Schneider, K., & Pitnick, S. (2004). No Evidence That
Polyandry Benefits Females in Drosophila melanogaster. Evolution, 58(6), 1242–1250.
Butail, S., & Paley, D. A. (2011). Three-dimensional reconstruction of the fast-start
swimming kinematics of densely schooling fish. Journal of The Royal Society Interface,
9(66), 77–88.
Cabral, L. G., Foley, B. R., & Nuzhdin, S. V. (2008). Does Sex Trade with Violence
among Genotypes in Drosophila melanogaster? PLOS ONE, 3(4), e1986.
Campo, D., Lehmann, K., Fjeldsted, C., Souaiaia, T., Kao, J., & Nuzhdin, S. V. (2013).
Whole Genome Sequencing Of Two North American Drosophila Melanogaster
Populations Reveals Genetic Differentiation And Positive Selection. Molecular Ecology,
22(20), 5084–5097.
Candolin, U., Salesto, T., & Evers, M. (2007). Changed Environmental Conditions
Weaken Sexual Selection In Sticklebacks. Journal of Evolutionary Biology, 20(1), 233–
239.
Chaine, A. S., & Lyon, B. E. (2008). Adaptive Plasticity in Female Mate Choice
Dampens Sexual Selection on Male Ornaments in the Lark Bunting. Science, 319(5862),
459–462.
Chan, Y.-B., & Kravitz, E. A. (2007). Specific Subgroups Of FruM Neurons Control
Sexually Dimorphic Patterns Of Aggression In Drosophila Melanogaster. Proceedings of
the National Academy of Sciences, 104(49), 19577–19582.
Chenoweth, S. F., Rundle, H. D., & Blows, M. W. (2010). Experimental Evidence For
The Evolution Of Indirect Genetic Effects: Changes In The Interaction Effect Coefficient,
Psi (Ψ), Due To Sexual Selection. Evolution, 64(6), 1849–1856.
Chippindale, A. K., Gibson, J. R., & Rice, W. R. (2001). Negative Genetic Correlation
For Adult Fitness Between Sexes Reveals Ontogenetic Conflict In Drosophila.
Proceedings of the National Academy of Sciences of the United States of America, 98(4),
1671–1675.
Chiu, J. C., Low, K. H., Pike, D. H., Yildirim, E., & Edery, I. (2010). Assaying
Locomotor Activity to Study Circadian Rhythms and Sleep Parameters in Drosophila.
Journal of Visualized Experiments: JoVE, (43), 2157.
Cho, W., Heberlein, U., & Wolf, F. W. (2004). Habituation Of An Odorant-Induced
Startle Response In Drosophila. Genes, Brain and Behavior, 3(3), 127–137.
Connallon, T. (2015). The Geography Of Sex-Specific Selection, Local Adaptation, And
Sexual Dimorphism. Evolution, 69(9), 2333–2344.
Connallon, T., & Clark, A. G. (2014). Balancing Selection in Species with Separate
Sexes: Insights from Fisher’s Geometric Model. Genetics, 197(3), 991–1006.
Cox, D. R. (1972). Regression Models and Life-Tables. Journal of the Royal Statistical
Society. Series B (Methodological), 34(2), 187–220.
Cox, R. M., & Calsbeek, R. (2009). Sexually Antagonistic Selection, Sexual
Dimorphism, and the Resolution of Intralocus Sexual Conflict. The American Naturalist,
173(2), 176–187.
Dankert, H., Wang, L., Hoopfer, E. D., Anderson, D. J., & Perona, P. (2009). Automated
Monitoring And Analysis Of Social Behavior In Drosophila. Nature Methods, 6(4), 297–
303.
David, J. R., Moreteau, B., Gauthier, J. P., Pétavy, G., Stockel, A., & Imasheva, A. G.
(1994). Reaction Norms Of Size Characters In Relation To Growth Temperature In
Drosophila Melanogaster: An Isofemale Lines Analysis. Genetics, Selection, Evolution:
GSE, 26(3), 229–251.
Delph, L. F., Andicoechea, J., Steven, J. C., Herlihy, C. R., Scarpino, S. V, & Bell, D. L.
(2011). Environment-dependent intralocus sexual conflict in a dioecious plant. New
Phytologist, 192(2), 542–552.
Devineni, A. V, & Heberlein, U. (2009). Preferential Ethanol Consumption in Drosophila
Models Features of Addiction. Current Biology, 19(24), 2126–2132.
Dingemanse, N. J., & Araya-Ajoy, Y. G. (2015). Interacting Personalities: Behavioural
Ecology Meets Quantitative Genetics. Trends in Ecology & Evolution, 30(2), 88–97.
Dorado, G., & Barbancho, M. (1984). Differential Responses In Drosophila Melanogaster
To Environmental Ethanol: Modification Of Fitness Components At The Adh Locus.
Heredity, 53(2), 309–320.
Duda, R. O., & Hart, P. E. (1972). Use of the Hough Transformation to Detect Lines and
Curves in Pictures. Commun. ACM, 15(1), 11–15.
Duffy, D. C., & Wissel, C. (1988). Models Of Fish School Size In Relation To
Environmental Productivity. Ecological Modelling, 40(3), 201–211.
Durbin, R., Eddy, S., Krogh, A., & Mitchison, G. (1998). Biological Sequence Analysis:
Probabilistic Models Of Proteins And Nucleic Acids. Cambridge University Press,
Cambridge.
Fiumera, A. C., Dumont, B. L., & Clark, A. G. (2006). Natural Variation In Male-
Induced “Cost-Of-Mating” And Allele-Specific Association With Male Reproductive
Genes In Drosophila Melanogaster. Philosophical Transactions of the Royal Society B:
Biological Sciences, 361(1466), 355–361.
Foerster, K., Coulson, T., Sheldon, B. C., Pemberton, J. M., Clutton-Brock, T. H., &
Kruuk, L. E. B. (2007). Sexually Antagonistic Genetic Variation For Fitness In Red Deer.
Nature, 447(7148), 1107–1110.
Foley, B. R., Saltz, J. B., Marjoram, P., & Nuzhdin, S. (2015). A Bayesian Approach to
Social Structure Uncovers Cryptic Regulation of Group Dynamics in Drosophila
Melanogaster. The American Naturalist, 185(6), 797–808.
Friberg, U. (2005). Genetic Variation in Male and Female Reproductive Characters
Associated with Sexual Conflict in Drosophila melanogaster. Behavior Genetics, 35(4),
455–462.
Gay, L., Brown, E., Tregenza, T., Pincheira-Donoso, D., Eady, P. E., Vasudev, R., …
Hosken, D. J. (2011). The Genetic Architecture Of Sexual Conflict: Male Harm And
Female Resistance In Callosobruchus Maculatus. Journal of Evolutionary Biology, 24(2),
449–456.
Gibson, J. B., May, T. W., & Wilks, A. V. (1981). Genetic Variation At The Alcohol
Dehydrogenase Locus In Drosophila Melanogaster In Relation To Environmental
Variation: Ethanol Levels In Breeding Sites And Allozyme Frequencies. Oecologia,
51(2), 191–198.
Gibson, J. B., & Wilks, A. V. (1988). The Alcohol Dehydrogenase Polymorphism Of
Drosophila Melanogaster In Relation To Environmental Ethanol, Ethanol Tolerance And
Alcohol Dehydrogenase Activity. Heredity, 60(3), 403–414.
Gilburn, A. S., & Day, T. H. (1994). Evolution of Female Choice in Seaweed Flies:
Fisherian and Good Genes Mechanisms Operate in Different Populations. Proceedings of
the Royal Society of London. Series B: Biological Sciences, 255(1343), 159–165.
Gomez-Marin, A., Partoune, N., Stephens, G. J., & Louis, M. (2012). Automated
Tracking of Animal Posture and Movement during Exploration and Sensory Orientation
Behaviors. PLOS ONE, 7(8), e41642.
Gosden, T. P., & Svensson, E. I. (2009). Density-Dependent Male Mating Harassment,
Female Resistance, and Male Mimicry. The American Naturalist, 173(6), 709–721.
Grosjean, Y., Rytz, R., Farine, J.-P., Abuin, L., Cortot, J., Jefferis, G. S. X. E., & Benton,
R. (2011). An Olfactory Receptor For Food-Derived Odours Promotes Male Courtship In
Drosophila. Nature, 478(7368), 236–240.
Guo, P., & Yokoyama, K. (2012). Survival Analysis of Victims of Sulfur Oxide Air
Pollution Suffering from COPD or Asthma in Yokkaichi, Japan, in Relation to
Predisposing Exposure. Journal of Environmental Protection, 3, 1251–1259.
Gurganus, M. C., Fry, J. D., Nuzhdin, S. V, Pasyukova, E. G., Lyman, R. F., & Mackay,
T. F. (1998). Genotype-Environment Interaction At Quantitative Trait Loci Affecting
Sensory Bristle Number In Drosophila Melanogaster. Genetics, 149(4), 1883–1898.
Hall, M., Foley, B., Cheung, E., Abbasi, M., & Churches, N. D. (2016). A New
Behavioral Phenotyping Strategy for Pacific Oyster (Crassostrea gigas) Larvae Reveals
Cohort-Level Effects on Copper Toxicity Swimming Response. Annals of Aquaculture
and Research, 3(3), 1025.
Han, B., Roberts, W., Wu, D., & Li, J. (2007). Robust Feature-based Object Tracking.
Proc. of SPIE, 6568.
Harano, T., Okada, K., Nakayama, S., Miyatake, T., & Hosken, D. J. (2010). Intralocus
Sexual Conflict Unresolved by Sex-Limited Trait Expression. Current Biology, 20(22),
2036–2039.
Harris, W. E., McKane, A. J., & Wolf, J. B. (2008). The Maintenance Of Heritable
Variation Through Social Competition. Evolution, 62(2), 337–347.
Hawkes, L. A., Butler, P. J., Frappell, P. B., Meir, J. U., Milsom, W. K., Scott, G. R., &
Bishop, C. M. (2014). Maximum Running Speed of Captive Bar-Headed Geese Is
Unaffected by Severe Hypoxia. PLOS ONE, 9(4), e94015.
Hayes, P. M., Liu, B. H., Knapp, S. J., Chen, F., Jones, B., Blake, T., … Kleinhofs, A.
(1993). Quantitative Trait Locus Effects And Environmental Interaction In A Sample Of
North American Barley Germ Plasm. Theoretical and Applied Genetics, 87(3), 392–401.
Histed, M. H., & Maunsell, J. H. R. (2014). Cortical Neural Populations Can Guide
Behavior By Integrating Inputs Linearly, Independent Of Synchrony. Proceedings of the
National Academy of Sciences of the United States of America, 111(1), 178–187.
Hölldobler, B., & Wilson, E. O. (1990). The Ants. Harvard University Press.
Hosmer, D. W., Lemeshow, S., & May, S. (2008). Applied Survival Analysis: Regression
Modeling of Time to Event Data. Wiley (2nd ed.).
Hougaard, P. (2000). Analysis of Multivariate Survival Data. Springer.
Houle, D. (1998). How Should We Explain Variation In The Genetic Variance Of Traits?
Genetica, 102(0), 241.
Huang, Y., & Essa, I. (2005). Tracking Multiple Objects Through Occlusions. 2005 IEEE
Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05).
Innocenti, P., & Morrow, E. H. (2010). The Sexually Antagonistic Genes of Drosophila
melanogaster. PLOS Biology, 8(3), e1000335.
Ivanina, A. V, Hawkins, C., & Sokolova, I. M. (2014). Immunomodulation By The
Interactive Effects Of Cadmium And Hypercapnia In Marine Bivalves Crassostrea
Virginica And Mercenaria Mercenaria. Fish & Shellfish Immunology, 37(2), 299–312.
Jelinek, F. (1976). Continuous Speech Recognition by Statistical Methods. Proceedings
of the IEEE, 64(4), 532–556.
Johnson, D. W., Monro, K., & Marshall, D. J. (2013). The Maintenance Of Sperm
Variability: Context-Dependent Selection On Sperm Morphology In A Broadcast
Spawning Invertebrate. Evolution, 67(5), 1383–1395.
Johnson, T., & Barton, N. (2005). Theoretical models of selection and mutation on
quantitative traits. Philosophical Transactions of the Royal Society B: Biological
Sciences, 360(1459), 1411–1425.
Jordan, K. W., Morgan, T. J., & Mackay, T. F. C. (2006). Quantitative Trait Loci for
Locomotor Behavior in Drosophila melanogaster. Genetics, 174(1), 271–284.
Kabra, M., Robie, A. A., Rivera-Alba, M., Branson, S., & Branson, K. (2013). JAABA:
Interactive Machine Learning For Automatic Annotation Of Animal Behavior. Nature Methods,
10(1), 64–67.
Kain, J., Stokes, C., Gaudry, Q., Song, X., Foley, J., Wilson, R., & de Bivort, B. (2013).
Leg-Tracking And Automated Behavioural Classification In Drosophila. Nature
Communications, 4, 1910.
Kaplan, M. (1983). Commentary 1. Pharmacotherapy: The Journal of Human
Pharmacology and Drug Therapy, 3(4), 191.
Kaplan, E. L., & Meier, P. (1958). Nonparametric Estimation from Incomplete Observations.
Journal of the American Statistical Association, 53(282), 457–481.
Kaun, K. R., Devineni, A. V, & Heberlein, U. (2012). Drosophila Melanogaster As A
Model To Study Drug Addiction. Human Genetics, 131(6), 959–975.
Kavasidis, I., Palazzo, S., Di Salvo, R., Giordano, D., & Spampinato, C. (2012). A Semi-
automatic Tool for Detection and Tracking Ground Truth Generation in Videos. In
Proceedings of the 1st International Workshop on Visual Interfaces for Ground Truth
Collection in Computer Vision Applications (pp. 6:1–6:5). New York, NY, USA.
Khan, Z., Balch, T., & Dellaert, F. (2005). MCMC-based Particle Filtering For Tracking
A Variable Number Of Interacting Targets. IEEE Transactions on Pattern Analysis and
Machine Intelligence.
Kimura, T., Ohashi, M., Crailsheim, K., Schmickl, T., Okada, R., Radspieler, G., &
Ikeno, H. (2014). Development of a New Method to Track Multiple Honey Bees with
Complex Behaviors on a Flat Laboratory Arena. PLOS ONE, 9(1), e84656.
Kiryati, N., Eldar, Y., & Bruckstein, A. M. (1991). A Probabilistic Hough Transform.
Pattern Recognition, 24(4), 303–316.
Klarsfeld, A., Leloup, J.-C., & Rouyer, F. (2003). Circadian Rhythms Of Locomotor
Activity In Drosophila. Behavioural Processes, 64(2), 161–175.
Kohn, G. M., King, A. P., Scherschel, L. L., & West, M. J. (2011). Social Niches And
Sex Assortment: Uncovering The Developmental Ecology Of Brown-Headed Cowbirds,
Molothrus Ater. Animal Behaviour, 82(5), 1015–1022.
Krupa, J. J., & Sih, A. (1993). Experimental Studies On Water Strider Mating Dynamics:
Spatial Variation In Density And Sex Ratio. Behavioral Ecology and Sociobiology, 33(2),
107–120.
Kubo, J., Cullen, M. R., Desai, M., & Modrek, S. (2013). Associations Between
Employee And Manager Gender: Impacts On Gender-Specific Risk Of Acute
Occupational Injury In Metal Manufacturing. BMC Public Health, 13(1), 1053.
Kubo, J., Goldstein, B. A., Cantley, L. F., Tessier-Sherman, B., Galusha, D., Slade, M.
D., … Cullen, M. R. (2014). Contribution Of Health Status And Prevalent Chronic
Disease To Individual Risk For Workplace Injury In The Manufacturing Environment.
Occupational and Environmental Medicine, 71(3), 159–166.
Kuhn, H. W. (1955). The Hungarian Method For The Assignment Problem. Naval
Research Logistics Quarterly, 2(1–2), 83–97.
Kuijper, B., Stewart, A. D., & Rice, W. R. (2006). The Cost Of Mating Rises Nonlinearly
With Copulation Frequency In A Laboratory Population Of Drosophila Melanogaster.
Journal of Evolutionary Biology, 19(6), 1795–1802.
Kusumi, A., Tsunoyama, T. A., Hirosawa, K. M., Kasai, R. S., & Fujiwara, T. K. (2014).
Tracking Single Molecules At Work In Living Cells. Nat Chem Biol, 10(7), 524–532.
Lawless, J. F., & Cook, R. J. (2007). The Statistical Analysis of Recurrent Events. Springer.
Lee, H.-G., Kim, Y.-C., Dunning, J. S., & Han, K.-A. (2008). Recurring Ethanol
Exposure Induces Disinhibited Courtship in Drosophila. PLOS ONE, 3(1), e1391.
Li, Y., Fink, C., El-Kholy, S., & Roeder, T. (2015). The Octopamine Receptor Octβ2R Is
Essential For Ovulation And Fertilization In The Fruit Fly Drosophila melanogaster.
Archives of Insect Biochemistry and Physiology, 88(3), 168–178.
Lin, D. Y., & Wei, L. J. (1989). The Robust Inference for the Cox Proportional Hazards
Model. Journal of the American Statistical Association, 84(408), 1074–1078.
Long, T. A. F., & Rice, W. R. (2007). Adult Locomotory Activity Mediates Intralocus
Sexual Conflict In A Laboratory-Adapted Population Of Drosophila melanogaster.
Proceedings of the Royal Society B: Biological Sciences, 274(1629), 3105–3112.
Lyon, B. E., & Montgomerie, R. (2012). Sexual Selection Is A Form Of Social Selection.
Philosophical Transactions of the Royal Society B: Biological Sciences, 367(1600),
2266–2273.
Mai, H., Cachot, J., Brune, J., Geffard, O., Belles, A., Budzinski, H., & Morin, B. (2012).
Embryotoxic And Genotoxic Effects Of Heavy Metals And Pesticides On Early Life
Stages Of Pacific Oyster (Crassostrea gigas). Marine Pollution Bulletin, 64(12), 2663–
2670.
Maklakov, A. A., & Arnqvist, G. (2009). Testing for Direct and Indirect Effects of Mate
Choice by Manipulating Female Choosiness. Current Biology, 19(22), 1903–1906.
Maples, T., & Rothenfluh, A. (2011). A Simple Way to Measure Ethanol Sensitivity in
Flies. Journal of Visualized Experiments: JoVE, (48), 2541.
Martin, I., & Grotewiel, M. S. (2006). Distinct Genetic Influences On Locomotor
Senescence In Drosophila Revealed By A Series Of Metrical Analyses. Experimental
Gerontology, 41(9), 877–881.
McClure, K. D., French, R. L., & Heberlein, U. (2011). A Drosophila Model For Fetal
Alcohol Syndrome Disorders: Role For The Insulin Pathway. Disease Models
& Mechanisms, 4(3), 335–346.
McGlothlin, J. W., & Brodie III, E. D. (2009). How To Measure Indirect Genetic Effects:
The Congruence Of Trait-Based And Variance-Partitioning Approaches. Evolution,
63(7), 1785–1795.
McLaughlin, R. A. (1998). Randomized Hough Transform: Improved Ellipse Detection
With Comparison. Pattern Recognition Letters, 19(3–4), 299–305.
Mench, J. (1998). Why It Is Important to Understand Animal Behavior. ILAR Journal,
39(1), 20–26.
Mersch, D. P., Crespi, A., & Keller, L. (2013). Tracking Individuals Shows Spatial
Fidelity Is a Key Regulator of Ant Social Organization. Science, 340(6136), 1090–1093.
Meyer, B. D. (1990). Unemployment Insurance and Unemployment Spells.
Econometrica, 58(4), 757–782.
Milan, N. F., Kacsoh, B. Z., & Schlenke, T. A. (2012). Alcohol Consumption as Self-
Medication against Blood-Borne Parasites in the Fruit Fly. Current Biology, 22(6), 488–
493.
Miller, C. W., & Svensson, E. I. (2014). Sexual Selection in Complex Environments.
Annual Review of Entomology, 59(1), 427–445.
Mojica, J. P., Lee, Y. W. H. A., Willis, J. H., & Kelly, J. K. (2012). Spatially And
Temporally Varying Selection On Intrapopulation Quantitative Trait Loci For A Life
History Trade-Off In Mimulus Guttatus. Molecular Ecology, 21(15), 3718–3728.
Moore, A. J., Brodie, E. D., & Wolf, J. B. (1997). Interacting Phenotypes and the
Evolutionary Process: I. Direct and Indirect Genetic Effects of Social Interactions.
Evolution, 51(5), 1352–1362.
Morozova, T. V., Anholt, R. R. H., & Mackay, T. F. C. (2006). Transcriptional Response
To Alcohol Exposure In Drosophila Melanogaster. Genome Biology, 7(10), R95.
Nilsen, S. P., Chan, Y.-B., Huber, R., & Kravitz, E. A. (2004). Gender-Selective Patterns
Of Aggressive Behavior In Drosophila Melanogaster. Proceedings of the National
Academy of Sciences of the United States of America, 101(33), 12342–12347.
Ofstad, T. A., Zuker, C. S., & Reiser, M. B. (2011). Visual Place Learning in Drosophila
melanogaster. Nature, 474(7350), 204–207.
Ohayon, S., Avni, O., Taylor, A. L., Perona, P., & Roian Egnor, S. E. (2013). Automated
Multi-Day Tracking Of Marked Mice For The Analysis Of Social Behaviour. Journal of
Neuroscience Methods, 219(1), 10–19.
Ohyama, T., Jovanic, T., Denisov, G., Dang, T. C., Hoffmann, D., Kerr, R. A., & Zlatic,
M. (2013). High-Throughput Analysis of Stimulus-Evoked Behaviors in Drosophila
Larva Reveals Multiple Modality-Specific Escape Strategies. PLOS ONE, 8(8), e71706.
Orteiza, N., Linder, J. E., & Rice, W. R. (2005). Sexy Sons From Re-Mating Do Not
Recoup The Direct Costs Of Harmful Male Interactions In The Drosophila Melanogaster
Laboratory Model System. Journal of Evolutionary Biology, 18(5), 1315–1323.
Parsch, J., & Ellegren, H. (2013). The Evolutionary Causes And Consequences Of Sex-
Biased Gene Expression. Nat Rev Genet, 14(2), 83–87.
Partridge, L., Green, A., & Fowler, K. (1987). Effects Of Egg-Production And Of
Exposure To Males On Female Survival In Drosophila Melanogaster. Journal of Insect
Physiology, 33(10), 745–749.
Patterson, T. A., Thomas, L., Wilcox, C., Ovaskainen, O., & Matthiopoulos, J. (2008).
State-Space Models Of Individual Animal Movement. Trends in Ecology & Evolution,
23(2), 87–94.
Perez-Escudero, A., Vicente-Page, J., Hinz, R. C., Arganda, S., & de Polavieja, G. G.
(2014). idTracker: Tracking Individuals In A Group By Automatic Identification Of
Unmarked Animals. Nat Meth, 11(7), 743–748.
Piccardi, M. (2004). Background Subtraction Techniques: A Review. 2004 IEEE
International Conference on Systems, Man and Cybernetics (IEEE Cat. No.04CH37583).
Pohl, J. B., Baldwin, B. A., Dinh, B. L., Rahman, P., Smerek, D., Prado, F. J., …
Atkinson, N. S. (2012). Ethanol Preference in Drosophila melanogaster is Driven by Its
Caloric Value. Alcoholism: Clinical and Experimental Research, 36(11), 1903–1912.
Poiesi, F., & Cavallaro, A. (2015). Tracking Multiple High-Density Homogeneous
Targets. IEEE Transactions on Circuits and Systems for Video Technology.
Prentice, R. L., Williams, B. J., & Peterson, A. V. (1981). On The Regression Analysis
Of Multivariate Failure Time Data. Biometrika, 68(2), 373–379.
Prinja, S., Gupta, N., & Verma, R. (2010). Censoring In Clinical Trials: Review Of
Survival Analysis Techniques. Indian Journal of Community Medicine, 35(2), 217–221.
Rivera-Duarte, I., Rosen, G., Lapota, D., Chadwick, D. B., Kear-Padilla, L., & Zirino, A.
(2005). Copper Toxicity to Larval Stages of Three Marine Invertebrates and Copper
Complexation Capacity in San Diego Bay, California. Environmental Science &
Technology, 39(6), 1542–1546.
Robinson, M. R., van Doorn, G., Gustafsson, L., & Qvarnström, A. (2012). Environment-
Dependent Selection On Mate Choice In A Natural Population Of Birds. Ecology Letters,
15(6), 611–618.
Rondeau, V., Commenges, D., & Joly, P. (2003). Maximum Penalized Likelihood
Estimation in a Gamma-Frailty Model. Lifetime Data Analysis, 9(2), 139–153.
Rostant, W. G., Kay, C., Wedell, N., & Hosken, D. J. (2015). Sexual Conflict Maintains
Variation At An Insecticide Resistance Locus. BMC Biology, 13(1), 34.
Roulin, A., & Salamin, N. (2010). Insularity And The Evolution Of Melanism, Sexual
Dichromatism And Body Size In The Worldwide-Distributed Barn Owl. Journal of
Evolutionary Biology, 23(5), 925–934.
Roweis, S. T. (1999). Constrained Hidden Markov Models. In Proc. of the International
Conference of Advances in Neural Information Processing System (pp. 782–788).
Denver, USA.
Saltz, J. B., Foley, B. R., Hoffmann, A. E. A. A., & Bronstein, E. J. L. (2011). Natural
Genetic Variation in Social Niche Construction: Social Effects of Aggression Drive
Disruptive Sexual Selection in Drosophila melanogaster. The American
Naturalist, 177(5), 645–654.
Sezgin, M., & Sankur, B. (2004). Survey Over Image Thresholding Techniques And
Quantitative Performance Evaluation. Journal of Electronic Imaging, 13(1), 146–168.
Sheng, X., Hu, Y. H., & Ramanathan, P. (2005). Distributed Particle Filter With GMM
Approximation For Multiple Targets Localization And Tracking In Wireless Sensor
Network. IPSN 2005. Fourth International Symposium on Information Processing in
Sensor Networks, 2005.
Signor, S. A., Abbasi, M., Marjoram, P., & Nuzhdin, S. V. (2017). Social Effects For
Locomotion Vary Between Environments In Drosophila Melanogaster Females.
Evolution, in press.
Slatyer, R. A., Mautz, B. S., Backwell, P. R. Y., & Jennions, M. D. (2012). Estimating
Genetic Benefits Of Polyandry From Experimental Studies: A Meta-Analysis. Biological
Reviews, 87(1), 1–33.
Sobaszek, L., & Gola, A. (2016). Survival Analysis Method As A Tool For
Predicting Machine Failures. Actual Problems of Economics, 177(3), 421–428.
Soibam, B., Mann, M., Liu, L., Tran, J., Lobaina, M., Kang, Y. Y., … Roman, G. (2012).
Open-Field Arena Boundary Is A Primary Object Of Exploration For Drosophila. Brain
and Behavior, 2(2), 97–108.
Stauffer, C., & Grimson, W. E. L. (1999). Adaptive Background Mixture Models For Real-
Time Tracking. In CVPR (pp. 246–252).
Stewart, A. D., Hannes, A. M., Mirzatuny, A., & Rice, W. R. (2008). Sexual Conflict Is
Not Counterbalanced By Good Genes In The Laboratory Drosophila Melanogaster Model
System. Journal of Evolutionary Biology, 21(6), 1808–1813.
Sumpter, D. J. T. (2006). The Principles Of Collective Animal Behaviour. Philosophical
Transactions of the Royal Society B: Biological Sciences, 361(1465), 5–22.
Svensson, E. I., Eroukhmanoff, F., Karlsson, K., Runemark, A., & Brodin, A. (2010). A
Role For Learning In Population Divergence Of Mate Preferences. Evolution, 64(11),
3101–3113.
Taff, C. C., Freeman-Gallant, C. R., Dunn, P. O., & Whittingham, L. A. (2013). Spatial
Distribution Of Nests Constrains The Strength Of Sexual Selection In A Warbler.
Journal of Evolutionary Biology, 26(7), 1392–1405.
Tan, L., Schedl, P., Song, H.-J., Garza, D., & Konsolaki, M. (2008). The Toll→NFκB
Signaling Pathway Mediates the Neuropathological Effects of the Human Alzheimer’s
Aβ42 Polypeptide in Drosophila. PLOS ONE, 3(12), e3966.
Therneau, T. M., & Grambsch, P. M. (2000). Modeling Survival Data: Extending the Cox
Model. Springer.
Tompkins, L., Gross, A. C., Hall, J. C., Gailey, D. A., & Siegel, R. W. (1982). The Role
Of Female Movement In The Sexual Behavior Of Drosophila Melanogaster. Behavior
Genetics, 12(3), 295–307.
van der Linde, K., Fumagalli, E., Roman, G., & Lyons, L. C. (2014). The FlyBar:
Administering Alcohol to Flies. Journal of Visualized Experiments, (87), e50442.
Van Doorn, G. S. (2009). Intralocus Sexual Conflict. Annals of the New York Academy of
Sciences, 1168(1), 52–71.
Viterbi, A. J. (1967). Error Bounds for Convolutional Codes and an Asymptotically
Optimum Decoding Algorithm. IEEE Transactions on Information Theory, 13(2), 260–269.
Wahlsten, D. (2001). Standardizing Tests Of Mouse Behavior: Reasons,
Recommendations, And Reality. Physiology & Behavior, 73(5), 695–704.
Wang, B., Zhang, S., Yue, K., & Wang, X.-D. (2013). The Recurrence And Survival Of
Oral Squamous Cell Carcinoma: A Report Of 275 Cases. Chinese Journal of Cancer,
32(11), 614–618.
Ward, J. H. (1963). Hierarchical Grouping To Optimize An Objective Function. Journal
of the American Statistical Association, 58(301), 236–244.
White, K. E., Humphrey, D. M., & Hirth, F. (2010). The Dopaminergic System in the
Aging Brain of Drosophila. Frontiers in Neuroscience, 4, 205.
Whitehead, H. (1996). Babysitting, Dive Synchrony, And Indications Of Alloparental
Care In Sperm Whales. Behavioral Ecology and Sociobiology, 38(4), 237–244.
Wilson, E. O. (1996). Sociobiology: The New Synthesis. Cambridge, MA: Belknap of
Harvard UP.
Wolf, J. B. (2000). Indirect Genetic Effects And Gene Interactions. Epistasis And The
Evolutionary Process. New York N.Y.: Oxford Univ. Press.
Wolf, J. B., Brodie III, E. D., Cheverud, J. M., Moore, A. J., & Wade, M. J. (1998).
Evolutionary Consequences Of Indirect Genetic Effects. Trends in Ecology & Evolution,
13(2), 64–69.
Yang, H.-P., & Nuzhdin, S. V. (2003). Fitness Costs of Doc Expression Are Insufficient
to Stabilize Its Copy Number in Drosophila melanogaster. Molecular Biology and
Evolution, 20(5), 800–804.
Yilmaz, A., Li, X., & Shah, M. (2004). Contour-Based Object Tracking With Occlusion
Handling In Video Acquired Using Mobile Cameras. IEEE Transactions on Pattern
Analysis and Machine Intelligence.
Yong Zhou, Y. L. (2014). A Traversing and Merging Algorithm of Blobs in Moving
Object Detection. Natural Sciences, 8(1L), 327–331.
Abstract
This thesis focuses on animal behavior. It presents work that first identifies how and where animals (here, mostly Drosophila) are moving, and then aims to understand the dynamics of the behaviors reflected in those movements. We introduce an automatic tracking method for the high-throughput analysis of Drosophila behavior; the method assumes that the flies move significantly throughout the video. Once the movement trajectories of the animals have been found, the goal is often to understand their behavior, frequently by means of mathematical models and machine learning algorithms. Machine learning algorithms are mostly used to train classifiers that detect animal behaviors from manually defined features extracted from the tracking data. The resulting behavioral annotation can then be used in mathematical models, which aim to better understand both the behaviors and the relationships between them. These models range from simple linear regressions, which estimate the effect of different variables on the behavior of animals, to more complex models such as Markov models. In our study we use statistical analysis to understand the behavior of Drosophila melanogaster, and we explore the effects of ethanol exposure on social behaviors. To do so, we first develop a tracking method in Chapter 2 to collect movement data from high-throughput experiments on different genotypes of D. melanogaster in a variety of experimental settings, and in Chapter 3 we use these data to understand the effects of social environment on locomotory behavior. Although the tracking method is introduced for D. melanogaster, it can be adjusted for the study of other animals such as fish and ants.
Such tracking data inevitably contain errors, so in Chapter 4 we refine the predictions of our tracking algorithm to estimate more accurately the number of flies on a patch (a simple, high-level summary of social behavior, i.e., group size). We do this by modeling the number of flies on a patch as the hidden state of a Hidden Markov Model (HMM), for which we estimate appropriate transition and emission probabilities. We then use recurrent-event survival analysis with Cox proportional hazards models to test for the determinants of the group structures of flies. Finally, Chapter 5 introduces some novel computational tools for the study of animal behavior, together with examples of their use. These tools are useful when the experimental setup does not allow for the simple use of automatic tracking software.
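To make the HMM idea concrete, the following is a minimal sketch of how noisy per-frame fly counts could be smoothed by Viterbi decoding. It is not the thesis's implementation: the function name `viterbi` and the parameters `stay_prob` and `hit_prob` are hypothetical, and the transition and emission probabilities are fixed by assumption here, whereas the abstract states that they are estimated from data.

```python
import math


def viterbi(observed_counts, n_states, stay_prob=0.9, hit_prob=0.8):
    """Return the most likely sequence of true fly counts (0..n_states-1).

    Assumed model (illustrative only; requires n_states >= 2):
      - transitions: the count stays the same with probability stay_prob,
        otherwise it moves to an adjacent count (one fly arrives or leaves);
      - emissions: the tracker reports the true count with probability
        hit_prob, otherwise any other count uniformly at random.
    """
    states = range(n_states)

    def log_trans(i, j):
        if i == j:
            return math.log(stay_prob)
        if abs(i - j) == 1:
            neighbours = (i > 0) + (i < n_states - 1)
            return math.log((1 - stay_prob) / neighbours)
        return -math.inf  # jumps of two or more flies are disallowed

    def log_emit(state, obs):
        if obs == state:
            return math.log(hit_prob)
        return math.log((1 - hit_prob) / (n_states - 1))

    # Uniform prior over the initial count.
    score = [math.log(1.0 / n_states) + log_emit(s, observed_counts[0])
             for s in states]
    back = []
    for obs in observed_counts[1:]:
        prev, pointers, score = score, [], []
        for j in states:
            best = max(states, key=lambda i: prev[i] + log_trans(i, j))
            pointers.append(best)
            score.append(prev[best] + log_trans(best, j) + log_emit(j, obs))
        back.append(pointers)

    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    # A single-frame dropout (count 0 between 2s) is treated as tracker noise.
    print(viterbi([2, 2, 2, 0, 2, 2], n_states=4))
```

Under these assumed probabilities, an isolated implausible count is smoothed away, while a sustained change in the observed count is kept as a genuine change in group size.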
Asset Metadata
Creator: Abbasi, Mohammad (author)
Core Title: Automatic tracking of flies and the analysis of fly behavior
School: College of Letters, Arts and Sciences
Degree: Doctor of Philosophy
Degree Program: Computational Biology and Bioinformatics
Publication Date: 07/21/2018
Defense Date: 06/20/2017
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: animal behavior, automatic tracking, computer vision, Cox proportional hazard model, Drosophila behavior, hidden Markov models, mixed effect model, OAI-PMH Harvest
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Tavaré, Simon (committee chair), Eckel, Sandy (committee member), Marjoram, Paul (committee member), Nuzhdin, Sergey (committee member)
Creator Email: abbaside@usc.edu, m.e.abbasi.d@gmail.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c40-415063
Unique identifier: UC11214581
Identifier: etd-AbbasiMoha-5620.pdf (filename), usctheses-c40-415063 (legacy record id)
Legacy Identifier: etd-AbbasiMoha-5620-1.pdf
Dmrecord: 415063
Document Type: Dissertation
Rights: Abbasi, Mohammad
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA