Assessing the value of crowdsourced data in aiding first responders: a case study of the 2013 Boston Marathon
Assessing the Value of Crowdsourced Data in Aiding First Responders:
A Case Study of the 2013 Boston Marathon
by
Devlin Quinlan Howieson
A Thesis Presented to the
Faculty of the USC Graduate School
University of Southern California
In Partial Fulfillment of the
Requirements for the Degree
Master of Science
(Geographic Information Science and Technology)
August 2018
Copyright © 2018 by Devlin Howieson
To my children: Donovan, Fiona, and Malcolm
Table of Contents
List of Figures
List of Tables
Acknowledgements
List of Abbreviations
Abstract
Introduction
1.1. Definitions
1.2. Motivation
1.3. Objectives
1.4. Boston Marathon and Bombing Events
1.5. Study Area
1.6. Thesis Outline
Related Work
2.1. Crowdsourced Data
2.1.1. OpenStreetMap
2.1.2. Twitter
2.2. Event Detection
2.3. Terrorism
2.4. Data Quality
2.5. Conclusion
Methodology
3.1. Twitter Data Analysis
3.1.1. Obtaining IRB Approval and Requesting Twitter Data
3.1.2. Twitter Data Receipt, Conversion, and Filtering
3.2. Keyword Development
3.2.1. Text Analysis
3.2.2. Interviews
3.2.3. Finalize Keywords and Search Tweets
3.3. Spatial and Temporal Analysis
3.3.1. Kernel Density
3.3.2. Spatio-Temporal Analysis
Results
4.1. Establishing Keywords
4.1.1. Text Analysis
4.1.2. Interview Responses
4.1.3. Keywords for Searches
4.2. Spatial Analysis
4.2.1. Daily Results
4.2.2. Spatial Analysis Takeaways
4.3. Spatio-Temporal Analysis
4.3.1. Analysis of Monday’s Tweets
4.3.2. Spatio-Temporal Analysis Takeaways
Conclusion
5.1. Findings
5.2. Study Limitations
5.2.1. Data Limitations
5.2.2. Method Limitations
5.3. Moving Forward
References
Appendix A: Interview Protocol
List of Figures
Figure 1 Events Surrounding the 2013 Boston Marathon Bombing
Figure 2 Twenty-Five Mile Radius Around Boston Marathon Finish Line
Figure 3 Tweets in JSON Format
Figure 4 Geotagged Tweets Monday 15 April - Friday 19 April 2013
Figure 5 Density of Tweets on Monday 15 April 2013
Figure 6 Density of Tweets on Tuesday 16 April 2013
Figure 7 Density of Tweets on Wednesday 17 April 2013
Figure 8 Density of Tweets on Thursday 18 April 2013
Figure 9 Density of Tweets on Friday 19 April 2013
Figure 10 First Tweets Mentioning Events, 15 April 2013
Figure 11 Monday Tweets From First 15 Minutes After Bombing
Figure 12 Monday Tweets Between 16 Minutes and 1 Hour After Bombing
Figure 13 Monday Tweets Over an Hour After Bombing
Figure 14 Tweets Within First Hour Containing Fuck
List of Tables
Table 1 Tweets by Day Following Boston Marathon Bombing 2013
Table 2 First Responder Interviewees
Table 3 Monday’s Ten Most Frequent and Words of Interest
Table 4 Tuesday’s Ten Most Frequent and Words of Interest
Table 5 Wednesday’s Ten Most Frequent and Words of Interest
Table 6 Thursday’s Ten Most Frequent and Words of Interest
Table 7 Friday’s Ten Most Frequent and Words of Interest
Table 8 Combined Days Ten Most Frequent and Words of Interest
Table 9 Words Relating to Events Provided by First Responders
Table 10 Final List of Keywords for Each Day
Table 11 Relevance of Tweets with the Word Fuck One Hour Following the Bombing
Acknowledgements
I am grateful to my advisor, Dr. Darren Ruddell, for his guidance and to my other faculty who
gave me assistance when needed. I am also indebted to the first responder interviewees who
generously shared their experiences and opinions. Finally, I would like to thank my wife,
Susannah Howieson, who pushed me to always do better; without her support I would not have
made it this far.
List of Abbreviations
AGI Ambient Geographic Information
API Application Program Interface
EMS Emergency Medical Services
FBI Federal Bureau of Investigation
GIS Geographic Information System
GISci Geographic Information Science
IED Improvised Explosive Device
IRB Institutional Review Board
JMA Japan Meteorological Agency
MIT Massachusetts Institute of Technology
NED New Event Detection
OSM OpenStreetMap
RED Retrospective Event Detection
SMI Social Media Ingestor
SSI Spatial Sciences Institute
SVM Support Vector Machine
US United States
USC University of Southern California
USGS United States Geological Survey
VGI Volunteered Geographic Information
Abstract
Terrorism continues to be one of the most significant security threats of our time. Recent terrorism
events include mass shootings and bombings in the U.S. and worldwide. First responders—law
enforcement, emergency medical services, and fire services—are responsible for managing the
chaos in the immediate aftermath of a terrorism event. Providing first responders with high
quality, detailed information as quickly as possible could greatly enhance their ability to respond
effectively. Recently, crowdsourced data available through platforms such as Twitter, Facebook,
and other social media outlets has emerged as a potential source of information to aid first responders
following a terrorism event. The focus of this thesis is to determine if Twitter posts are a useful
source of intelligence for first responders. Mining this readily available data could also be useful
following a natural disaster.
The utility of Twitter data for first responders was explored using a case study of the
events following the Boston Marathon bombing in 2013. Twitter data was collected via GNIP, a
social media API aggregation company. Through text analysis and interviews with first
responders, a list of relevant keywords was developed. Kernel density was used to determine
density of tweets in relation to events that took place from April 15th through April 19th, 2013.
Spatio-temporal analysis was conducted to show when and from where tweets were being sent
on April 15th, 2013. Results show that on Monday through Thursday the greatest density of
tweets was surrounding the bombsites; when events related to the suspects occurred on Thursday
and Friday, the density of tweets around those events increased. The spatio-temporal results
show that as the day progressed, the majority of tweets spread throughout the Boston
Metropolitan area. The overall finding of this thesis is that crowdsourced data, such as Twitter,
can provide potentially useful information to aid first responders following a terrorism event.
Introduction
According to the Global Terrorism Database (GTD), over 150,000 incidents of terrorism occurred
between 1970 and 2015. Some recent terrorism events include the May 2017 bombing of an
arena in Manchester, England; the Pulse Nightclub shooting in Orlando, Florida; the
November 2015 attacks in Paris, France; and the 2015 attack in San Bernardino,
California. Since it is difficult to prevent these tragic events, assistance should be
provided to those who respond to mitigate the damage. First responders are those who arrive first
at an emergency to render assistance in the capacity of their duties. Time is a valuable asset for
first responders. The faster first responders receive information, the better prepared they are
to respond. Geospatial information in the form of locations or places is one of
the important pieces of data needed to respond to an emergency situation.
The communication of pertinent information via social media could be beneficial to first
responders. In the current smartphone age, geospatial data can be shared instantaneously on
social media. Social media is a new source of news and information. When incidents occur
people immediately post about what they have witnessed or are still witnessing. It is this quick
release of data that could be taken advantage of by first responders.
There are many areas in which geospatial intelligence can be beneficial, including
military support, cybersecurity, resource mapping, and disaster response and management. This
thesis focuses on whether geospatial intelligence can benefit first responders after a terrorism
event using crowdsourced data from Twitter.
1.1. Definitions
In 2003, President George W. Bush released Homeland Security Presidential Directive 8
which defined the term first responder as “individuals who in the early stages of an incident are
responsible for the protection and preservation of life, property, evidence, and the environment,
including emergency response providers as defined in section 2 of the Homeland Security Act of
2002 (6 U.S.C. 101), as well as emergency management, public health, clinical care, public
works, and other skilled support personnel (such as equipment operators) that provide immediate
support services during prevention, response, and recovery operations” (HSPD-8). The
Homeland Security Act of 2002 defines emergency response providers as “Federal, State, and
local governmental and nongovernmental emergency public safety, fire, law enforcement,
emergency response, emergency medical (including hospital emergency facilities), and related
personnel, agencies, and authorities” (6 U.S.C. § 101(6)).
Social media conveys geospatial information in a variety of ways, including through
posts on Facebook, Twitter, YouTube, or other platforms. Volunteered geographic information
(VGI) describes any geospatial information that is volunteered through any number of sources,
from OpenStreetMap (OSM) to social media. Goodchild (2007) refers to VGI as user-generated
geographic information that is volunteered by non-professionals, and therefore may not be as
accurate as professionally created spatial data. Stefanidis et al. (2013) argue that geospatial
information transmitted by reference only should be called ambient geographic information
(AGI). This is because people are not directly volunteering geographic data via social media, but
are simply referencing it. These references are the key to gathering spatial information that can
provide intelligence to first responders. Crowdsourced data falls into the category of VGI. Many
of the terms that describe volunteered involvement relating to geographic information science
(VGI, crowdsourcing, AGI) are used interchangeably. This paper uses the term crowdsourced
data.
1.2. Motivation
This research is beneficial for spatial sciences because it looks at a relatively new area in
which spatial data can be collected. In Stefanidis et al. (2013), the authors suggest that humans
are transformed into sensors by reporting real time events through the use of Twitter. Having
hundreds if not thousands of different sensors reporting real time events in an emergency
situation could give first responders more on-the-ground intelligence.
There may be some drawbacks to this data, however, which is why it is an important topic
to research thoroughly. The main issue that affects the utility of crowdsourced data is its
accuracy. Goodchild and Li (2012) note that the quality of the data is variable and highly
undocumented. The unknown quality of data does not preclude using crowdsourced data as a data
source so long as it is verified. According to Comber et al. (2013), VGI can be accurate if it can
be cross-referenced with control data. However, identifying control data in an emergency may be
complicated.
It has only been a decade since the term VGI was coined by Goodchild. In that span there
have been many studies involving VGI and crowdsourcing. However, there appears to be a gap
in the research involving gathering intelligence from social media for use by first responders.
There is a lot of research related to the emergency response to and management of
environmental hazards, but not relating to terrorism or other homeland security threats. For
example, Crooks et al. (2013) used Twitter data to determine the epicenter and reach of an
earthquake in Virginia in 2011. Another paper, by Mills et al. (2009), discusses how Twitter can
be used as an emergency communication system. The authors describe first responders using
social media to communicate to the public, rather than the other way around. There is a lack of
research conducted on the effectiveness of using crowdsourced data to inform first responders.
1.3. Objectives
It is important for first responders to use any and all sources of potential information. It
could be beneficial for first responders to use crowdsourced data that contains either volunteered
or referenced geospatial information. The overall goal of this study is to determine if
crowdsourced data could aid first responders during a terrorism event. The study addresses the
following questions:
(1) What type of information is found in crowdsourced data during a terrorism event?
(2) What type of information within crowdsourced data could be useful to first
responders during a terrorism event?
(3) How is geospatial information included in crowdsourced data during a terrorism
event?
(4) How timely is the information found in crowdsourced data during a terrorism event?
This study uses Twitter data from the Boston Metropolitan Area during the events
related to the 2013 Boston Marathon bombing. Through text analysis, interviews, spatial
analysis, and spatio-temporal analysis this study examines if crowdsourced data can aid first
responders during a terrorism event. If this paper shows that crowdsourced data contains
valuable geospatial intelligence, first responder agencies could set up systems to automatically
harvest relevant information in case of a terrorism or other homeland security event.
1.4. Boston Marathon and Bombing Events
The Boston Marathon is an annual race held on the third Monday of April. The Boston
Athletic Association sponsors the 26-mile, 385-yard race, which started in 1897. In 2013, the race
had its 117th running with 26,839 entrants. The event attracts 500,000 spectators, making it New
England’s most viewed sporting event (Boston Athletic Association).
This thesis uses the Boston Marathon bombing events in Boston, Massachusetts
(Figure 1) as a case study. The events surrounding the Boston Marathon bombing lasted five days,
from April 15th, 2013 to April 19th, 2013. On April 15th at 2:49 p.m., during the annual running
of the marathon, the first of two bombs was detonated at 671 Boylston Street; the second bomb
was detonated thirteen seconds later at 755 Boylston Street (After Action Report 2014).
Homemade improvised explosive devices (IEDs) hidden in backpacks caused the explosions.
The bombs took the lives of three individuals and injured 264 more (After Action Report 2014).
On April 18th, the Federal Bureau of Investigation (FBI) released photos of the two suspects.
That evening a Massachusetts Institute of Technology (MIT) Police Officer was fatally shot in
his marked police vehicle on the MIT campus in Cambridge, MA (After Action Report 2014).
Later that same night, the suspects committed a carjacking in the Allston neighborhood of
Boston. Before 1 a.m. on the 19th, Watertown, MA police responded to the location of the
carjacked vehicle and engaged in a firefight with the suspects, in which one of the suspects was
killed and the other escaped. On the evening of April 19th, Watertown police received a
911 call from a resident reporting that he saw the suspect in a boat parked in his yard at 67 Franklin
Street (After Action Report 2014). Police responded and took the suspect into custody at 8:41
p.m.
Figure 1 Events Surrounding the 2013 Boston Marathon Bombing
1.5. Study Area
The major events took place in Boston, Cambridge, and Watertown, Massachusetts.
These cities are part of the Boston metropolitan area, which has a population of over 4 million. The
events are all within a 10-mile radius of the bombing event. This thesis studies a 25-mile radius
around the Boston Marathon finish line (Figure 2). This area was chosen because the suspects’
location was unknown at the time of the bombing and all the events fell within this area.
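As a rough illustration of how a radius-based study area like this one can be applied to point data, the sketch below keeps only the points that fall within 25 miles of a center point, using the haversine great-circle distance. It is not the procedure used in this thesis. The finish-line coordinates and the sample points are approximate, illustrative values rather than thesis data, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Assumption: approximate location of the finish line on Boylston Street.
FINISH_LINE = (42.3499, -71.0787)


def within_study_area(points, center=FINISH_LINE, radius_miles=25.0):
    """Return the (lat, lon) points that lie inside the circular study area."""
    return [
        p for p in points
        if haversine_miles(center[0], center[1], p[0], p[1]) <= radius_miles
    ]


# Illustrative points only (approximate city locations, not tweet data).
sample_points = [
    (42.3601, -71.0942),  # Cambridge, near MIT
    (42.3709, -71.1828),  # Watertown
    (41.8240, -71.4128),  # Providence, RI, outside the radius
]
inside = within_study_area(sample_points)
```

A circular filter of this kind is simple to reason about, but it treats the Earth as a sphere; a GIS buffer in a projected coordinate system would be the more usual choice for map production.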
Figure 2 Twenty-Five Mile Radius Around Boston Marathon Finish Line
1.6. Thesis Outline
Following this introduction, the thesis presents a review of the related literature, the
methodology for assessing the use of Twitter data by first responders, the results of the analyses, and
a discussion of the findings.
In Chapter 2, related literature is reviewed. To be able to assess the usefulness of Twitter
in this study, previous studies in crowdsourcing are discussed. These studies focus on event
detection using Twitter and similar platforms. This chapter ends with a section on data quality,
which is a key concern when working with VGI, such as crowdsourced data.
In Chapter 3, the methodology is presented. First, the chapter discusses the gathering of
Twitter data from a third party. Approval from USC’s Institutional Review Board (IRB) was needed
to conduct interviews. Keyword development was based on text analysis of the Twitter data and
responses from interviews. Finally, the chapter describes using kernel density for spatial analysis and
tweet time stamps and coordinates for spatio-temporal analysis.
Chapter 4 shows the results of text analysis and interviews used to establish keywords for
searches of geotagged tweets. The spatial analysis shows the density of geotagged tweets relevant to
the events. The spatio-temporal analysis presents the relationship between time and space of the
relevant tweets.
Chapter 5 discusses conclusions from the results and takeaways from the overall process
of the thesis. This chapter also explores the limitations of the methods and suggests
improvements for future work. Lastly, it gives recommendations for future study and describes
how this project can potentially provide insight to future efforts to use Twitter for intelligence
gathering.
Related Work
Crowdsourced data has been shown to be a useful source of information for event detection.
However, there does not seem to be literature on crowdsourcing for intelligence gathering
purposes as it relates to first responders. This thesis follows processes similar to some of the
research discussed in this chapter.
2.1. Crowdsourced Data
Crowdsourced data falls into the category of VGI. Goodchild (2007) coined
the term VGI, which describes an information-gathering process that uses people who have no
formal qualifications to voluntarily collect geographic information. Examples of VGI include
OpenStreetMap (OSM), Twitter, and “Did You Feel It?”.
Building on Goodchild’s concept, Stefanidis et al. (2013) suggest that the use of social media
produces a new form of VGI that they call ambient geographic information (AGI). People
using social media are not directly providing the geospatial information, but still transmit it in a
variety of ways. Geospatial information in social media posts can be geotagged (contain
coordinates) or can refer to a place (e.g., Boston, MA). It is through crowdsourced data that
geospatial information can be harvested for first responders.
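To make the distinction between geotagged posts and posts that only refer to a place concrete, the following sketch classifies a few invented tweet records. It assumes a tweet is a JSON object whose `coordinates` field, when present, is a GeoJSON point in longitude, latitude order, as in Twitter's standard API; the data actually delivered for this thesis may have been laid out differently. The place list, the sample tweets, and the function name are all hypothetical.

```python
import json

# Hypothetical mini-gazetteer; a real system would use a far larger place list.
PLACE_NAMES = ("boston", "cambridge", "watertown", "boylston")


def geospatial_info(tweet):
    """Return ('geotag', (lat, lon)), ('place', name), or ('none', None)."""
    coords = tweet.get("coordinates")
    if coords and coords.get("type") == "Point":
        lon, lat = coords["coordinates"]  # GeoJSON order is longitude, latitude
        return ("geotag", (lat, lon))
    # No embedded coordinates: fall back to a simple place-name match in the text.
    text = tweet.get("text", "").lower()
    for name in PLACE_NAMES:
        if name in text:
            return ("place", name)
    return ("none", None)


# Invented sample records for illustration only (not from the thesis data).
raw = """[
  {"text": "Loud bang near the finish line",
   "coordinates": {"type": "Point", "coordinates": [-71.0787, 42.3499]}},
  {"text": "Something just happened in Boston", "coordinates": null},
  {"text": "Great day for a run", "coordinates": null}
]"""
results = [geospatial_info(t) for t in json.loads(raw)]
```

Substring matching is deliberately naive here; it would, for example, miss misspellings and match unrelated words that happen to contain a place name.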
There have been multiple studies in which crowdsourcing geographic information is
discussed. Crowdsourcing is a process of acquisition and analysis of big data generated by a
diverse number of sources (Xu et al. 2015; Xu et al. 2016). With advances in technology such as
the smartphone, millions of people globally now have the capability to create and share
geospatial data, whether it is active (updating OSM) or passive (geotagged tweet) in nature.
Many of the terms that describe volunteer involvement relating to geographic information
science (GISci) (VGI, crowdsourcing, AGI) are used interchangeably (See et al. 2016; Spyratos
et al. 2014). This paper uses the term crowdsourced data for geotagged tweets.
2.1.1. OpenStreetMap
One example of the power of VGI is OpenStreetMap. OSM began as an open-source
street map of the world. It is an open-source application that anyone can contribute to. The
contributions come in the form of adding and updating data about roads, trails, cafés, railway
stations, and more. OSM is one of the best-known forms of crowdsourcing of geospatial data
(Heipke 2010; Goodchild and Li 2012). Over the last decade, OSM’s number of users has grown
to more than 4 million (https://www.openstreetmap.org/stats/data_stats.html). This allows for
large amounts of crowdsourced geospatial data to be shared and edited.
OSM can aid the response to natural disasters. After the 2010 Haiti earthquake, OSM was
used to provide rescue workers with geospatial data. A detailed city map of the Haitian capital,
Port-au-Prince, containing bridges, functioning infrastructure, and damaged buildings was
completed a few days after the earthquake (Heipke 2010). The ability to have a volunteer service
with firsthand knowledge of an area can allow for more comprehensive mapped areas. However,
completeness of an area depends on the population density, and more affluent areas are better
mapped than more deprived areas (Heipke 2010). In many situations, OSM can produce a
detailed, user-generated up-to-date map.
2.1.2. Twitter
Twitter is a micro-blogging application that began in 2006. It is currently the most
popular micro-blogging service, and as of 2012 it was the eighth most popular site in the world
according to the three-month Alexa traffic rankings (Atefeh and Khreich 2015). Twitter allows
users to submit posts (tweets) of up to 140 characters. These tweets are automatically posted
as a stream on the user’s profile and shared with the user’s network of followers. Posts can be
made using Twitter’s mobile app or via its website. These posts cover a wide variety of
topics, from sports to politics to natural disasters. According to Twitter’s website
(https://about.twitter.com/company), there are currently 328 million active users, with 79 percent
outside the United States. Like other crowdsourced social media feeds, Twitter can
provide geographic information, whether it is from geotagged tweets or a reference to a location
in a tweet. Geotagged tweets have coordinates embedded in them. However, this
feature is off by default, so users have to turn it on.
Unlike OSM, social media feeds are not a source where people purposely contribute
geographic information to update or expand a geographic database (Crooks et al. 2013). Twitter
users share geographic information actively by mentioning a place or passively through geotagged
tweets. This is why Stefanidis et al. (2013) coined the term AGI, because people are not directly
volunteering geographic data but they are still creating it. Twitter has consequently become a
new potentially valuable source of geographic information. Twitter users are the sensors that
generate this information (Crooks et al. 2013; Sakaki et al. 2013). With hundreds of millions of
users, there is potential to collect geographic information about a single event from multiple
sensors.
2.2. Event Detection
Being able to detect an emergency or major event is not a simple task. It takes a lot of
information to know when, what, and where an event is taking place. One of the areas in which
social media data is currently being used is event/emergency detection (Gulnerman and
Karaman 2017). The increased use of social media during crises allows for information to be
used by authorities during emergencies (Yin et al. 2012). Spatial and temporal information can
be extracted from social media to detect real-time events (Xu et al. 2015). Since this is a newer area
of study, event detection from Twitter creates new challenges (Atefeh and Khreich 2015). It is by
overcoming these challenges that this emerging field can prove helpful in event detection.
Environmental disasters affect every part of the globe, and come in many different forms.
Geospatial information is required to prepare for, monitor, and respond to them. Kongthon et
al. (2012) mention that Twitter has had a growing role as a collector and distributor of
information regarding emergencies and disasters such as wildfires, floods, hurricanes,
earthquakes, and tsunamis. Through the use of crowdsourced data like Twitter posts, pertinent
information can be gathered about environmental disasters. This data can then be used by
agencies to respond to these disasters. The focus of this thesis will be on geospatial information
taken from Twitter.
Earthquakes are one of the most unpredictable disasters, which is why it is important to
collect information on the extent of an event. One of the earliest forms of crowdsourced
geospatial data relates to earthquakes. This is the United States Geological Survey’s (USGS)
“Did You Feel It?” project (http://earthquake.usgs.gov/earthquakes/dyfi/) that gathers
information from people who felt an earthquake (Heipke 2010; Crooks et al. 2013). Through the
use of social media, data can be gathered in real time from those affected by earthquakes.
Twitter can also be a used to gather data on earthquakes according to a number of sources
(Sakaki et al. 2010; Crooks et al. 2013; Sakaki et al. 2013; Stefanidis et al. 2013). These sources
refer to users as sensors sending out information in regards to the earthquakes they felt.
Sakaki et al. (2010) investigate the real-time nature of Twitter as it pertains to
earthquakes. Data was gathered using Twitter’s application programming interface (API) and then
classified as positive or negative via a support vector machine (SVM) algorithm based on
the content of the tweets. They use particle filtering to estimate the locations of earthquakes
that occurred between August and October 2009. Particle filtering is an algorithm that maintains a
probability distribution for the location estimate. They were able to create location estimates for 25
earthquakes from 621 tweets. They then developed an earthquake detection system. In Sakaki et
al. (2013), an update to their previous research, they reported on their system “Toretter”. Their
system was able to detect 93 percent of the stronger earthquakes as measured on the Japan
Meteorological Agency (JMA) seismic intensity scale. However, the precision is low and the system
can produce many false positives. The authors admit they need to change certain conditions to gain
better precision.
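The idea of maintaining a probability distribution over a location can be illustrated with a minimal particle filter sketch. This is not the authors' implementation: the function name, the Gaussian likelihood, the planar coordinates, and all parameter values are assumptions made for illustration only.

```python
import math
import random


def estimate_location(observations, bounds, n_particles=3000,
                      sigma=0.5, jitter=0.1, seed=0):
    """Estimate one (x, y) source location from noisy point observations.

    All parameters are illustrative assumptions, not values from the
    literature reviewed here.
    """
    rng = random.Random(seed)
    (min_x, max_x), (min_y, max_y) = bounds
    # Start with particles spread uniformly over the study area.
    particles = [(rng.uniform(min_x, max_x), rng.uniform(min_y, max_y))
                 for _ in range(n_particles)]
    for obs_x, obs_y in observations:
        # Weight each particle by a Gaussian likelihood of the observation.
        weights = [math.exp(-((px - obs_x) ** 2 + (py - obs_y) ** 2)
                            / (2.0 * sigma ** 2))
                   for px, py in particles]
        if sum(weights) == 0.0:
            continue  # observation too far from every particle; skip it
        # Resample in proportion to weight, then jitter to keep diversity.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        particles = [(px + rng.gauss(0.0, jitter), py + rng.gauss(0.0, jitter))
                     for px, py in particles]
    # The estimate is the mean of the surviving particles.
    mean_x = sum(px for px, _ in particles) / n_particles
    mean_y = sum(py for _, py in particles) / n_particles
    return mean_x, mean_y
```

Each observation here stands in for the location of one tweet classified as positive; the particle cloud narrows toward the area where such tweets cluster.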
Crooks et al. (2013) analyze the spatial and temporal characteristics of Twitter activity
responding to an earthquake that occurred on the east coast of the United States in August 2011.
Their analysis consists of a three-step harvesting process: extracting data through
a provider’s API, storing the data in a local database, and analyzing the data for information of interest.
They received a 1 percent random sample filtered within the contiguous 48 states over an eight-
hour period after the earthquake. From the sample 144,892 tweets referenced earthquakes and
21,362 were geolocated with coordinates. They show that tweets on the event appeared in
locations two to three minutes before the seismic wave. Through further analysis of tweets
within the first ten minutes they are able to approximate an impact area (Crooks et al. 2013).
This study shows that social media can be used to analyze the extent of the earthquake from its
epicenter.
Harvesting AGI from social media is the topic of the Stefanidis et al. (2013) paper. They
gather data through provider APIs and process it with their prototype system Social Media
Ingestor (SMI). The SMI analyzes the data to extract content of interest (tweets on earthquakes),
and then inserts it into a database. Twitter data was collected on the Sendai earthquake in March
of 2011. Their analysis was able to identify two major distributors of information (users:
NHK_PR, asahi_tokyo); NHK_PR is a national news organization, and asahi_tokyo tweets about
local information. The data analyzed can identify clusters of users who share interests; with these
shared interests the main providers of information can distribute that information to large groups
of users. It is the distribution of tweets from popular users that allows for the collection
of useful information on natural disasters.
Floods affect populations all over the world every year. As a hazard, floods pose a
seasonal and extended threat (Palen et al. 2010). Because of this seasonal and
extended threat, social media like Twitter can be a successful tool in preparing for and
responding to floods. Kongthon et al. (2012) suggest that Twitter has played an increasing role
in distributing information for emergencies such as floods. Floods are thus another disaster about
which Twitter can be used to gather and distribute information.
Starbird et al. (2010) and Palen et al. (2010) use the same data in researching the flooding
of the Red River Valley in early 2009. Both articles refer to the micro-blogging of Twitter as
computer mediated communication (CMC). The purpose of their research is to analyze CMC in
relation to an emergency event (Red River Valley flood). They collected data from a 51-day
period between March and April 2009 via Twitter’s API. The keywords red river and redriver
returned 4,983 unique authors and a second search to collect the entire stream for each user
resulted in 4,592,466 tweets. Data was produced in different geographical locations and at different
relative distances from the event. Local users or those connected another way to the area provided data
about the floods. Some regular users posted almost exclusively on flood-related matters during
critical times (Palen et al. 2010). This information ranged from original posts to information
shared from other posts/users. The authors suggest that official information remains
important and is complemented by social media communications.
Analyzing the role Twitter played in the 2011 floods in Thailand was the focus of the Kongthon
et al. (2012) paper. Thailand is a country prone to flooding during the rainy season. During this
period Twitter traffic in Thailand grew over 50 percent. For their research they collected 175,551
tweets with the keyword hashtag #thaiflood; this was narrowed down to 64,582 after removing
retweets and duplicates. The tweets then were separated into five categories: situational
announcements and alerts (39.1 percent), support announcements (10.2 percent), requests for
assistance (8.4 percent), requests for information (5 percent), and other (37.3 percent).
Government agencies receiving real-time information could use it in combination with requests
for assistance to provide help in a timely manner. Most of the top users during this period are
flood/disaster related government or private organizations. This allowed for citizens to choose
which sources of information they would follow during the flood to obtain the most relevant,
up-to-date, and credible information. The authors conclude that Twitter is an effective tool for Thai
citizens to obtain and distribute real-time information during a disaster. However, as with other
forms of VGI, data quality is a main concern.
Social media allows information to be distributed faster than standard forms of
communication. The aim of Yin et al. (2012) was to use Twitter data to provide near-real-time
information on emergency situations. Using the Twitter API to search the Australia and
New Zealand regions, they collected 66 million tweets. Their data covered a wide variety of
topics such as cyclones, earthquakes, and floods. The data was processed using various methods,
including burst detection, text classification, online clustering, and geotagging. The burst-detection
method continuously monitors a Twitter feed and raises an alert when it detects an incident. This method
achieved a detection rate of 72.13 percent with a false alarm rate below 2 percent. Text
classification used a naive Bayes algorithm (86.2 percent detection rate) and an SVM (87.5 percent
detection rate) to extract useful data, while also removing a list of stop words. Online clustering
uses an algorithm that groups tweets by similar topics. For geotagging they employ a tweet’s
coordinates or the location information from the user’s profile. Through the public’s collective
intelligence, proper authorities could better understand critical situations, and make the best
decisions for deploying aid, rescue, and recovery operations.
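The burst-detection idea of watching a feed and raising an alert on a sudden jump can be sketched as follows. This is a simplified stand-in, not the method of Yin et al. (2012): a rolling mean and standard deviation threshold is swapped in, and the function name, window length, and threshold are assumptions.

```python
from statistics import mean, pstdev


def detect_bursts(counts, history=5, threshold=3.0):
    """Return indices of time windows whose tweet count is a burst.

    counts    -- tweets per fixed-length time window, in time order
    history   -- number of preceding windows used as the baseline (assumed)
    threshold -- standard deviations above the baseline that count as a burst
    """
    bursts = []
    for i in range(history, len(counts)):
        baseline = counts[i - history:i]
        mu = mean(baseline)
        # Floor the spread at 1.0 so a flat baseline does not alert on noise.
        sd = max(pstdev(baseline), 1.0)
        if counts[i] > mu + threshold * sd:
            bursts.append(i)
    return bursts
```

For a series of per-window counts for one keyword, the function returns the positions of the windows that stand out from the preceding activity.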
Atefeh and Khreich (2015) discuss different types of techniques that have been used for
event detection with Twitter. The first distinction is between unspecified and specified event
detection. Unspecified detection relies on the temporal signal of Twitter to detect an event,
while specified detection relies on specific information about a known event. The authors also
classify techniques into retrospective event detection (RED) and new event detection.
Last, they categorize techniques as supervised, unsupervised, or both. Unspecified
events are generally detected by exploiting the temporal patterns in Twitter streams. These
streams can have a sudden increase in the use of specific keywords. Specified events include
known or planned events, and they could be partially or fully specified in relation to
content and metadata (location, time, and venue). New event detection involves continuous
monitoring of Twitter streams for new events in near real time. This is best suited for detecting
unknown events. Twitter’s historical data is best suited for RED techniques. The unsupervised
technique focuses on clustering, while the supervised technique uses classification algorithms
such as SVM. The authors believe that Twitter provides valuable user generated content.
However, all the data has to be filtered and classified into the many event techniques.
Xu et al. (2015) focus on emergency event detection via crowdsourcing. Weibo, a
Chinese micro-blogging service similar to Twitter, was used to conduct the research. The authors
propose their 5W model of What, Where, When, Who, and Why. This model offers a basic way of
identifying an emergency event. “What” identifies what happened during the event. “Where”
locates the event. “When” creates a timeline of the event. “Who” identifies persons
in different roles during the event. “Why” allows for information to be given for response. In
identifying a fire in Guangzhou, their search returned 246 messages relating to a fire. Of these 246,
21 messages satisfied the “What” criteria and had location information, check-in information, and
an image. The “Where” was taken from 12 of the 21 messages that identified Beijing Road as the
location. The “When” was taken from the message timeline, with the first message appearing at 15:24
and the last message at 16:41. The reason (“Why”) for the fire was damage to the electric wiring in an old
house; an official user (the police of Guangzhou) posted the information. The authors show that spatial
data can be taken from social media for event detection and that their 5W method can be
accurate and effective.
Xu et al. (2016) conducted a follow-up study using a knowledge base approach to
detect an emergency event. A knowledge base system uses an algorithm that can
filter the noise and redundant information of social media. The knowledge base design is based
on keywords, patterns, sentences, keyword graphs, and temporal features. The authors suggest a
three-layer method (social sensor, crowdsourcing, and knowledge base). The method is then
separated into three steps: selecting candidate messages; creating semantic, temporal, and spatial
information from the messages; and adding temporal features to the knowledge base. There are four
principles that guide candidate message selection. Messages are selected from Weibo using
keywords; these words are weighted depending on their importance. A fire event occurred on
Beijing Road in Guangzhou on May 29, 2014. Five keywords were used, which returned 246
messages, with 21 messages complying with the four principles. These messages were used to
build a knowledge base. Spatial and temporal data was taken from the 21 messages, giving five
spatial locations and six time stamps that coincide with the changing of the event. The authors’
proposed knowledge base algorithm showed good performance and effectiveness in the detection
of emergency events.
Another example of how Twitter can be used for event detection is the attempted coup
on July 15, 2016 in the Republic of Turkey. The coup failed because the rogue military
members lacked access to mainstream and social media (Esen and Gumuscu, 2017). It was the
Turkish people who helped stop the coup. They took to Twitter to mention jet and tank sightings
occurring in Ankara and Istanbul (Unver and Alassaad, 2016). It was through the use of social
media that the locations of rogue troops were known.
Unver and Alassaad (2016) compared social media data to data on mosques that used
loud speakers to broadcast salah prayers to mobilize against the coup. They used algorithms that
collected social media and open source data with a high level of spatial and temporal accuracy.
Their analysis showed that there was a notable mobilization against the coup an hour before
President Erdogan suggested it. The digital resistance to the coup over Twitter started because of a
military blockade of a bridge in Istanbul, at the same time that mosques started to play the salah
prayers. Their analysis showed that the social media mobilization had roughly the same
geospatial networks as the mosques’ 300-meter loudspeaker radii. Their conclusion was that
social media played a role in the victory over the attempted coup.
Monitoring the coup attempt in Turkey by comparing social media data to traditional
media data is the focus of Gulnerman and Karaman (2017). They use geotagged tweets from a
ten-hour period from within the boundary of Istanbul, comparing them to spatial data that was
published in the news. Thirty-nine events were identified. They used hot-spot
analysis to compare tweet density to traditional media data. They conducted text analysis to
create a list of keywords. Some of the thirty-nine events had a limited number of keywords.
The results of their study show that the social media density does not always match up with the
events mentioned in traditional media. They conclude that social media can be a source for event
detection, but there needs to be further study on accuracy control.
2.3. Terrorism
The reaches of terrorism have no boundaries; it can happen any place and any time. There
are a multitude of reasons for an individual to commit an act of terrorism. According to the FBI,
terrorism is split into two categories, international and domestic. They define international
terrorism as being “perpetrated by individuals and/or groups inspired by or associated with
designated foreign terrorist organizations or nations (state-sponsored),” while domestic terrorism
is “perpetrated by individuals and/or groups inspired by or associated with primarily U.S.-based
movements that espouse extremist ideologies of a political, religious, social, racial, or
environmental nature” (FBI, 2016).
With emerging technologies, terrorist events can be both detected and planned. When dealing
with terrorism, Twitter can be beneficial in identifying threats, but it can also be a facilitator in
coordinating terrorist activities (Cheong and Lee, 2011). Oh, Agrawal, and Rao
(2010) showed that Twitter indirectly contributed to the situational awareness of terrorists
during the 2008 Mumbai terrorist attacks. This was because operationally sensitive information
was exposed via Twitter. In another study, Cheong and Lee (2011) simulated terrorist events
by randomly injecting terrorist-related keywords into original tweets surrounding two
events. The clustering of tweets around and during their scenarios can chronicle civilian
response. Their framework can be used in real-world situations by agencies to immediately
record and respond to terror events. This can give first responders the information
they require to respond appropriately.
2.4. Data Quality
The use of VGI as a tool that gathers geospatial information from multiple sources can be
useful. The question is whether that information is actually usable. Data quality has
been a concern with VGI from the beginning, because volunteered
information does not have the assurances that officially created data has (Goodchild & Glennon, 2010).
Volunteered geographic information offers an alternative for the acquisition and gathering of
geographic information. Goodchild & Li (2012) believe VGI suffers from a general lack of
quality assurance. The reason data quality is an issue is that anyone can delete or modify data,
and the data entered may not be accurate. Comber et al. (2013) mention that the main issue with VGI
is its unknown quality, but in their study of land cover they found it can be used as long as it is linked to
control data.
While the quality of VGI may always be questionable, social media data is full of
misinformation. This is one area that may hinder the effectiveness of VGI as an intelligence
source. In the aftermath of the 2013 Boston Marathon bombing, there was some misinformation.
Starbird et al. (2014) showed that misinformation was circulated via Twitter after the bombing.
Tapia et al. (2014) showed that a crowdsourced investigation fueled misinformation and was
further amplified by the nature of retweeting and social media. The problem that will be faced is
that people will continue to post information that is not accurate. This can delay the response to
the actual emergency. While relying on the public to give spatial information, the concern is
whether it is accurate.
The research that is currently available on VGI covers a range of issues. However, there
is little focused on intelligence gathering for first responders. The research on using social media
for intelligence highlights the issues with volunteered information. That issue is the accuracy.
While VGI can be a source of geospatial information, it needs to be evaluated further for
authenticity.
2.5. Conclusion
With the advancement of technology over recent years, VGI has given geographic information
systems (GIS) a new sensor for collecting spatial information. The wide variety of forms in which VGI is
available allows for many types of user-driven content, from OSM to Twitter. Volunteered
geographic information has taken a step further into the realm of crowdsourcing. With social
media accessible around the world, millions of people are able to share geospatial information.
Crowdsourced information is slightly different from basic VGI, as it is not actively
shared. One of the leading sources of this data is Twitter, with millions of active users and
billions of tweets shared. Through crowdsourcing, natural disasters can be
monitored, responded to, and even detected in near real time. It is through the use of Twitter that this
thesis will attempt to gather pertinent information that can be valuable to first responders.
Methodology
This chapter details the methods used to determine if crowdsourced Twitter data can be a useful
source of intelligence for first responders. The methods used in this thesis follow
a mixed methods approach by using both qualitative and quantitative data. First, Twitter data was
collected and filtered for time-relevant and geotagged data. Then, a text analysis of the data and
interviews with first responders were conducted for keyword development. With this information,
spatial and spatio-temporal analyses were conducted.
3.1. Twitter Data Analysis
There are two ways in which Twitter data could be collected for this thesis. The first
option was to write custom code against Twitter’s API to harvest tweets, users, entities, and places
directly from Twitter. However, Twitter’s API only allows searches going back seven days. The other option
was to get Twitter data through a third party. This study acquired Twitter data through GNIP,
which is Twitter’s enterprise API platform (gnip.com). Using a third party to collect Twitter data
allowed for easier acquisition of data, because it did not require learning a programming language
for an API request.
GNIP is a social media API aggregation company that collects data from various social
networks (GNIP 2017). GNIP allows access to Twitter’s full archive via their PowerTrack.
PowerTrack provides customers with the ability to filter a data source’s full archive, and only
receive the data that they are interested in. Using GNIP’s PowerTrack filtering language allows
matching of tweets based on a range of attributes. These include user information, geolocation,
language, and others. This allows for historic and real-time data analysis.
3.1.1. Obtaining IRB Approval and Requesting Twitter Data
Twitter data has possible identifying information (username, location, etc.). To be able to
research this data correctly a USC IRB application was submitted. IRB approval was obtained
for the Twitter data analysis. Data was obtained through a third party, GNIP.
In order to obtain the data needed for this thesis, review of PowerTrack was needed to
determine the best filter(s). Careful consideration was made in the selection of the filter(s) used
for data collection; the filter that was selected was point radius. Point radius matches against the
exact coordinates (x, y) of a tweet or against a “Place” geo-polygon, where the Place is contained within the
defined region (GNIP 2017). This was chosen because the events took place within a ten-mile
radius of the finish line (site of explosions). The maximum radius of 25 miles was used with the
center point being the finish line with the coordinates coming from Google Earth.
Historical data was requested with a start date of April 15, 2013, the day of the marathon,
and an end date of April 20, 2013, the day after the suspect’s apprehension. The later end date was used
because GNIP’s timestamps are in Greenwich Mean Time (GMT), which is five hours ahead of Eastern
Standard Time (EST). For format, Twitter’s native format, JSON, was selected. The filter that
was applied was point_radius:[ -71.07861111 42.34972222 25.0mi].
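The exact-coordinate part of a point radius match can be sketched with a great-circle distance check. This is only an illustration: it does not reproduce GNIP's matching of Place polygons, and the function names, the haversine formula, and the Earth radius value are assumptions rather than part of the PowerTrack service.

```python
import math

# Center and radius taken from the filter above (longitude first in the filter).
CENTER_LON = -71.07861111
CENTER_LAT = 42.34972222
RADIUS_MILES = 25.0
EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, an assumed constant


def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_point_radius(lat, lon):
    """True if a tweet's exact coordinates fall inside the 25-mile circle."""
    return distance_miles(CENTER_LAT, CENTER_LON, lat, lon) <= RADIUS_MILES
```

A tweet located a couple of miles from the finish line passes this check, while one roughly 37 miles to the west does not.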
3.1.2. Twitter Data Receipt, Conversion, and Filtering
Assessing whether Twitter can be valuable using keywords from geotagged posts began with
the receipt of 864 gzip (.gz) compressed files from GNIP. Combining and uncompressing the files was the
next step in the process; this yielded 154,915 individual tweets in JavaScript Object Notation
(JSON) format (Figure 3). With the data in JSON, it was filtered for the timeframe (14:40
4/15/13 to 21:10 4/19/13) surrounding the bombing events. It was then converted to comma
separated values (CSV) format using JSON-CSV.com Desktop Edition. The CSV files were
split by day, Monday through Friday. There were 117,003 total tweets during the
researched time period.
Figure 3 Tweets in JSON Format
Each row representing one tweet potentially contained user name, user screen name, user
location, language, tweet, and latitude and longitude. Tweets without latitude and longitude were
excluded by filtering in Microsoft Excel. This left a total of 99,756 geotagged tweets during this
time period. A breakdown of total tweets and geotagged tweets for each day can be seen in Table
1.
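The exclusion of tweets without coordinates was done in Microsoft Excel; a scripted equivalent is sketched below for readers who prefer a repeatable step. It assumes a simplified, flat record with one JSON object per line and hypothetical field names. The actual GNIP payload is nested and uses different keys.

```python
import csv
import io
import json

# Hypothetical flat field names; the real GNIP records are nested.
FIELDS = ["user_name", "screen_name", "user_location", "language",
          "tweet", "latitude", "longitude"]


def geotagged_rows(json_lines):
    """Yield one flat row per tweet that has both latitude and longitude."""
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        lat, lon = record.get("latitude"), record.get("longitude")
        if lat is None or lon is None or lat == "" or lon == "":
            continue  # not geotagged, so excluded
        yield {field: record.get(field, "") for field in FIELDS}


def write_geotagged_csv(json_lines, handle):
    """Write geotagged tweets to an open text handle; return the row count."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for row in geotagged_rows(json_lines):
        writer.writerow(row)
        count += 1
    return count


# Made-up sample records, for illustration only.
sample = [
    '{"user_name": "A", "tweet": "hi", "latitude": 42.35, "longitude": -71.08}',
    '{"user_name": "B", "tweet": "no geo"}',
]
buffer = io.StringIO()
kept = write_geotagged_csv(sample, buffer)
```

Only the first sample record carries coordinates, so only that record is written to the CSV buffer.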
Table 1 Tweets by Day Following Boston Marathon Bombing 2013
                  Monday             Tuesday  Wednesday  Thursday  Friday            Total
                  4/15/13            4/16/13  4/17/13    4/18/13   4/19/13
                  (Begins at 14:40)                                (Ends at 21:10)
Total Tweets      17,653             19,959   21,699     21,876    35,816            117,003
Geotagged Tweets  14,892             17,602   18,899     19,117    29,246            99,756
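As a quick arithmetic check on Table 1, the share of geotagged tweets can be computed per day and overall. The counts below are copied from the table; the variable and function names are illustrative.

```python
# Counts copied from Table 1.
TOTAL_TWEETS = {"Monday": 17653, "Tuesday": 19959, "Wednesday": 21699,
                "Thursday": 21876, "Friday": 35816}
GEOTAGGED_TWEETS = {"Monday": 14892, "Tuesday": 17602, "Wednesday": 18899,
                    "Thursday": 19117, "Friday": 29246}


def geotagged_share(geotagged, total):
    """Percentage of tweets that carry latitude and longitude."""
    return 100.0 * geotagged / total


daily_share = {day: geotagged_share(GEOTAGGED_TWEETS[day], TOTAL_TWEETS[day])
               for day in TOTAL_TWEETS}
overall_share = geotagged_share(sum(GEOTAGGED_TWEETS.values()),
                                sum(TOTAL_TWEETS.values()))
```

The daily counts sum to the totals reported in the table, and the overall geotagged share comes to roughly 85 percent.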
Geotagged tweets were separated into a new database and visualized in ArcMap 10.5
(Figure 4). The new databases were loaded into NVivo 11 to identify frequency of keywords and
Microsoft SQL Server to run queries of keywords. Keywords relevant to Boston Marathon
bombing events were determined using the methods described in Section 3.2.
Figure 4 Geotagged Tweets Monday 15 April - Friday 19 April 2013
3.2. Keyword Development
To collect useful Twitter data, keywords need to be determined. The purpose of the
interviews is to vet the type and characteristics of the information that would be useful to first
responders. Based on the background research and literature review, an initial list was developed.
Once the list was revised based on input from interviewees, it was verified using text analysis. A
final list of keywords for the Twitter data search was then determined.
3.2.1. Text Analysis
In order to validate the interview results, text analysis of the Twitter data (see the previous
section for more information on obtaining and initially processing the Twitter data) was performed
using NVivo 11. After uploading tweets into NVivo separated by day, searches were run to
identify the 500 most frequently used words. Frequency searches were conducted for each day
individually and for all five days together. Before the most frequently used words were used for
keyword determination, irrelevant words were removed, including http and lol (text speak for
“laugh out loud”).
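The frequency search was run in NVivo 11; the sketch below only illustrates the underlying idea of counting words and dropping irrelevant ones. It does not group word variants the way NVivo does, and the function name, tokenization rule, and sample handling are assumptions.

```python
import re
from collections import Counter

# Irrelevant words named in the text; a fuller list would be needed in practice.
IRRELEVANT = {"http", "lol"}


def top_words(tweets, n=500, irrelevant=IRRELEVANT):
    """Return the n most frequent words across tweets as (word, count) pairs."""
    counts = Counter()
    for text in tweets:
        # Treat hashtags and mentions as the bare word (#bomb -> bomb).
        for token in re.findall(r"[#@]?\w+", text.lower()):
            word = token.lstrip("#@")
            if word and word not in irrelevant:
                counts[word] += 1
    return counts.most_common(n)
```

Running the function once per day and once on all five days combined mirrors the individual and combined searches described above.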
3.2.2. Interviews
Since determining if Twitter data will be useful to first responders is the goal of the
project, they were the targeted interviewees. Interviews were conducted with nine first responders
from various agencies across the country (Table 2). Interviewees were from local, state, and
Federal agencies that covered law enforcement, EMS, and firefighters. The interviewees
possessed different skill sets and experience levels. This diversity allowed for a variety of
perspectives, not just the point of view from personnel at a single organization or experience
level. Initial candidates were former military law enforcement colleagues. These colleagues are
now members of local and Federal agencies, including the Quincy Police Department (Quincy,
Massachusetts), St. Louis Fire Department (St. Louis, Missouri), Army Criminal Investigative
Division, and FBI. In order to identify additional interviewees, the research relied on the chain-
referral approach to sampling interviewees, whereby initial interviewees identify additional ones
(Atkinson and Flint 2001).
Table 2 First Responder Interviewees
Date      Position                  Organization                                        Experience
10/12/17  Police Officer            Marine Corps Police Department                      5 years
10/15/17  Military Police Sergeant  United States Marine Corps                          8 years
10/19/17  Fire Private              St. Louis Fire Department (St. Louis, MO)           1 year 10 months
10/21/17  Special Agent             Federal Bureau of Investigation                     1 year
10/27/17  Sheriff’s Deputy          Franklin County Sheriff’s Office (North Carolina)   4 ½ years
10/29/17  Special Agent             US Army Criminal Investigation Command              2 ½ years; 10 years other law enforcement
10/30/17  Sheriff’s Deputy          Duchess County Sheriff’s Office                     2 years 3 months; 3 years other law enforcement
11/2/17   Patrolman                 Quincy Police Department                            5 years; 2 years other law enforcement
11/5/17   Patrol Officer            Owatonna Police Department (Owatonna, MN)           4 years; 2 years other law enforcement
3.2.2.1. Creating Protocol, Obtaining IRB Approval, Piloting, and Revisions
An interview protocol for piloting was created based on the background research and
literature review. The protocol began with an introduction to the purpose and background of the
research project. Interviewees were informed that the interview was completely voluntary and
asked for permission to audio record. The topics of the questions were:
• Background, current agency, and position of interviewee
• Experience with emergencies, number and type
• Current use of social media as an information source during an emergency
• Types of information interviewee found/would find useful during an emergency,
e.g. location of event, extent of affected area, type of damage (e.g. fire, gas,
flooding, active shooter, etc.), severity of damage, location and number of
casualties, presence of other first responders (Fire, Police, Medical), etc.
• Characteristics of information interviewee would find useful during an
emergency, e.g. more definite location, more detail, more timely, etc.
The protocol was targeted to last between 30 minutes and 1 hour.
Since this research involved interviewing human subjects (first responders) review from
USC’s IRB was needed. An application for an exemption for human subjects research was
submitted to USC’s IRB. The research methodology and interview protocol were submitted. After a
review from IRB this thesis’s research was approved for exemption.
Pilot interviews were conducted with former military law enforcement colleagues. Based
on their input, revisions to the questions were made. These revisions included clarifying
confusing questions and cutting or adding questions (see Appendix A for the final protocol).
3.2.2.2. Conduct Interviews
These interviews were held in October and November 2017. After obtaining permission,
each interview was audio recorded. Once the interviews were completed, they were transcribed
into Word files, and each interview file was then organized by topic. For each topic, the responses
were compiled and summarized from all the interviews.
3.2.3. Finalize Keywords and Search Tweets
In order to be considered a keyword for this study, a word had to appear in both the
text analysis results and the list created through interviews. Based on the compiled interview
responses and text analysis, a revised list of keywords for the Twitter data search was created.
Tweets were searched using these keywords.
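The rule that a keyword must appear in both sources is a set intersection, and the search is a membership test on each tweet's words. The sketch below assumes exact, case-insensitive word matches and a hypothetical "text" field. The thesis ran its searches in Microsoft SQL Server, and matching of word variants is not handled here.

```python
import re


def final_keywords(text_analysis_words, interview_words):
    """Keep only words that appear in both the text analysis and interviews."""
    return ({word.lower() for word in text_analysis_words}
            & {word.lower() for word in interview_words})


def search_tweets(tweets, keywords):
    """Return tweets whose text contains at least one keyword as a whole word."""
    matches = []
    for tweet in tweets:
        tokens = set(re.findall(r"\w+", tweet["text"].lower()))
        if tokens & keywords:
            matches.append(tweet)
    return matches
```

A word that shows up only in the frequency tables or only in the interview responses is dropped by the intersection before any tweets are searched.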
3.3. Spatial and Temporal Analysis
In order to evaluate the use of Twitter to aid first responders, spatial and temporal data
needs to be reviewed. An emergency event such as the bombing is not only spatially important but
also time sensitive. The distribution of tweets will be compared to event locations. The Twitter
data timeline can be compared to event time to determine temporal accuracy.
3.3.1. Kernel Density
Geotagged tweets that contain at least one of the keywords will be used to show the
spatial distribution of information. This thesis will use the kernel density tool in ArcMap to
analyze the data. The Calculate Distance Band from Neighbor Count tool in ArcMap will be
used to determine the average distance between geotagged tweets. The average distance will be
used as the search radius for the kernel density tool. This will show how tweets are distributed
around event locations. Figures will display the geographic density distribution of keyword
tweets.
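The two steps described above, an average nearest-neighbor distance followed by a kernel density estimate that uses it as the search radius, can be sketched in plain Python. This is not the ArcMap implementation: it assumes projected planar coordinates, uses a quartic kernel as an approximation of what the ArcMap tool does, and the function names are illustrative.

```python
import math


def average_nearest_neighbor_distance(points):
    """Mean distance from each point to its closest other point."""
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        total += min(math.hypot(x1 - x2, y1 - y2)
                     for j, (x2, y2) in enumerate(points) if j != i)
    return total / len(points)


def kernel_density(points, x, y, radius):
    """Quartic-kernel density at (x, y) from points within the search radius."""
    density = 0.0
    for px, py in points:
        scaled_sq = ((px - x) ** 2 + (py - y) ** 2) / radius ** 2
        if scaled_sq < 1.0:
            # Quartic kernel, normalized so each point integrates to 1.
            density += 3.0 / (math.pi * radius ** 2) * (1.0 - scaled_sq) ** 2
    return density
```

Evaluating kernel_density over a regular grid, with the radius set to the value returned by average_nearest_neighbor_distance, gives a surface comparable in spirit to the density maps described here.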
3.3.2. Spatio-Temporal Analysis
Whenever a tweet is posted it has a time stamp of when it occurred. With these time
stamps a timeline can be created from the information relating to the events. This timeline will
start with the first and end with the last tweet that contains a keyword. This will help show the
spatio-temporal data that is associated with each tweet following the bombings. Through these
data, tweets can be analyzed to show where and when they occurred. This will allow for a
timeline showing each individual tweet and where those tweets came from. These tweets will be
mapped based on the times they occurred. Tweets will be grouped together based on time sent.
This information will be visualized showing where and when tweets were sent for Monday.
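Grouping tweets by time sent can be sketched as binning timestamps into fixed intervals that start at the first keyword tweet. The 30-minute interval, the "time" field name, and the function name are assumptions. The thesis does not specify an interval at this point.

```python
from collections import defaultdict
from datetime import timedelta


def group_by_interval(tweets, minutes=30):
    """Group tweets into consecutive time bins starting at the first tweet.

    Each tweet is a dict with a datetime under the hypothetical key "time".
    Returns {bin_start_datetime: [tweets in that bin]}.
    """
    if not tweets:
        return {}
    ordered = sorted(tweets, key=lambda tweet: tweet["time"])
    start = ordered[0]["time"]
    width = timedelta(minutes=minutes)
    groups = defaultdict(list)
    for tweet in ordered:
        index = (tweet["time"] - start) // width  # whole bins since the start
        groups[start + index * width].append(tweet)
    return dict(groups)
```

Each bin can then be mapped separately, so that the sequence of maps shows where tweets were sent as the event unfolded.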
Results
This thesis looks at crowdsourced data as a possible aid to first responders. Over one hundred
thousand tweets surrounding the time of the Boston Marathon bombing and subsequent related
events were analyzed. This chapter describes the results of a text analysis of the tweets and
interviews with nine first responders that were used to establish a list of keywords. After
identifying tweets with the keywords, spatio-temporal analysis was conducted.
4.1. Establishing Keywords
The final set of keywords (Table 10) was the result of comparing the text analysis to the
interview responses. Words that appeared in both were used for this study. There is also a brief
discussion of the presence of fuck as one of the consistently most frequently used words.
4.1.1. Text Analysis
Tables were compiled for each day analyzing the frequency of words found in tweets.
NVivo 11 placed similar words into one ranking (Tables 3-8). Tables were compiled for each
individual day and one for Monday through Friday combined. For this study, Boston will be
disregarded even though it holds the number one spot on all days. This is because Boston is too
general of a location. In each table, there are the ten most frequent words followed by the
frequency rankings of possible keywords.
Monday’s results were compared to interview results focusing on the bombings. Table 3
shows that variants of Boston appeared 2113 times, taking the number one spot, while
variants of exploding (ranked 498th) appeared only 26 times. Bomb and its variants rank thirteenth with 534
appearances. Other possible keywords include marathon (and variants), which appeared 649
times, and explosives (and variants), which appeared 576 times; these are the seventh and
ninth most frequently used words, respectively. A total of fourteen possible keywords
with variants were pulled from the top 500.
Table 3 Monday’s Ten Most Frequent and Words of Interest
Rank Word Count Similar Words
1 boston 2113 #boston, @boston, @bostons, boston, bostons
2 thank 1031 #thankful, #thanks, thank, thankful, thankfully, thanking, thanks
3 just 947 @just, just, juste
4 prayforboston 762 #prayforboston, prayforboston
5 people 750 people, peoples
6 safely 665 #safe, safe, safely
7 marathon 649 #marathon, @marathon, marathon, marathoner, marathoners, marathons
8 loving 607 #love, #lovely, @lovely, love, loved, lovee, lovely, loves, loving
9 explosives 576 #explosion, #explosions, explose, explosion, explosions, explosive, explosives
10 fuck 564 #fuck, #fucked, fuck, fucked, fucking, fucks
13 bomb 534 #bomb, #bombing, #bombings, #bombs, bomb, bombe, bombed, bombes, bombing, bombings, bombs
17 bostonmarathon 473 #bostonmarathon, @bostonmarathon, bostonmarathon
68 injured 174 injure, injured, injuring
72 police 162 #police, police
96 dead 133 #dead, dead, deadly
101 dying 128 #die, #dying, die, died, dies, dying
146 attack 93 attack, attacked, attacking, attacks
173 terrorists 82 #terrorist, terrorist, terroristes, terrorists
194 fired 74 fire, fired
207 boylston 70 #boylston, boylston
271 terrorism 51 #terror, terror, terrorism, terrorized
280 cops 49 cop, copped, cops
293 blasts 47 blast, blasting, blasts
498 exploding 26 explode, exploded, explodes, exploding
Tuesday’s results (Table 4) are similar to Monday’s but not identical. Bomb (and
variants) drops from thirteenth to forty-fourth. Variants of marathon dropped to the thirty-fourth
spot and variants of explosive to the two hundred and twelfth spot. The ten additional possible
keywords range from thirty-fourth to four hundred and seventy-third. No words relating to
‘suspect’ were in the top 500, which may not be surprising since no suspects had been identified
or apprehended at this point.
Table 4 Tuesday’s Ten Most Frequent and Words of Interest
Rank Word Count Similar Words
1 boston 1748 #boston, @boston, @bostons, boston, boston', bostoner, bostons
2 just 967 just
3 liking 700 like, liked, likely, likes, liking
4 loving 687 #love, #lovely, love, loved, lovee, lovely, lovelys, loves, loving, 'loving
5 getting 682 @get, get, gets, getting
6 today 514 #today, today, todays
7 days 502 #day, day, days
8 thanks 492 #thankful, #thanks, thank, thanked, thankful, thankfully, thanks
9 ones 479 one, 'one, ones
10 now 468 #now, now
34 marathoners 240 #marathon, marathon, marathoner, marathoners, marathons
39 @bostonmarathons 226 #bostonmarathon, @bostonmarathon, @bostonmarathons
44 bombs 209 #bomb, #bombing, bomb, bombe, bombed, bombes, bombing, bombings, bombs
192 police 78 #police, police
195 boylston 76 #boylston, boylston
212 explosive 72 #explosion, #explosions, explosion, explosions, explosive, explosives
425 terrorize 39 terror, terror’, terrorism, terrorize
434 cops 38 #cops, cop, copped, cops
461 attack 36 attack, attacked, attacking, attacks
473 blast 35 blast, blasted, blasts
Continuing on to the results of Wednesday’s text analysis (Table 5), there is no direct
mention of words relating to the bombing events in the top ten. Variants of marathon fell to the
sixty-ninth spot and explosives did not even make it into the top 500 most frequent words. There
are seven other possible keywords ranging from thirty-third to four hundred and twenty-ninth.
Variants of suspect broke into the top 500 for the first time at one hundred and sixtieth with 97
occurrences, even though there were still no suspects publicly identified or in custody.
Table 5 Wednesday’s Ten Most Frequent and Words of Interest
Rank Word Count Similar Words
1 boston 1620 #boston, @boston, boston, boston', bostoner, bostons
2 just 1007 just
3 liking 856 like, liked, likee, likely, likes, liking
4 getting 827 get, gets, getting
5 loving 644 #love, @love, @lovely, love, loved, lovee, lovely, loves, loving
6 now 547 now
7 fucks 543 #fuck, fuck, fucked, fucking, fucks
8 days 539 day, days
9 knows 487 know, knowing, knows
10 timing 448 @time, time, time', timee, times, timing
33 bombs 275 #bombings, #bombs, bomb, 'bomb, bombe, bombed, bombing, bombings, bombs
69 marathoners 168 #marathon, @marathon, marathon, marathoners, marathons
121 @bostonmarathon 112 #bostonmarathon, @bostonmarathon
160 suspects 97 #suspect, suspect, 'suspect, suspecte, suspected, suspects
175 police 91 police
264 boylston 64 #boylston, boylston
429 blasts 42 blast, blasted, blasting, blasts
On Thursday two events took place that added possible keywords: the release of the
suspects’ identities and the shooting of the MIT police officer. The results (Table 6) had
eleven possible keywords, all of them outside the top ten. Variants of marathon fell to the one
hundred and third spot and explosives to the three hundred and fiftieth spot. Variants of the
word suspect climbed slightly to the one hundred and twenty-seventh spot with 113 occurrences.
Words such as MIT (ranked 76th), Cambridge (93rd), and shooting (259th) saw spikes that were
likely related to the shooting.
Table 6 Thursday’s Ten Most Frequent and Words of Interest
Rank Word Count Similar Words
1 boston 1431 #boston, #boston#bostonstrong#marathon#backbay#prayforboston, @boston, @bostons, boston, 'boston
2 just 996 @just, just, justing
3 liking 848 like, liked, likee, likely, likes, liking
4 getting 830 #get, @get, get, gets, getting
5 loving 672 #love, #lovely, @love, @lovely, love, loved, lovely, loves, loving
6 days 531 #day, day, days
7 now 517 now, 'now, nows
8 fucks 499 #fuck, fuck, fucked, fucking, fucks
9 ones 491 @one, one, ones
10 goodness 477 #good, @goode, good, goodness, goods
58 bombs 176 #bombing, #bombings, bomb, bombed, bombing, bombings, bombs
76 mit 159 #mit, @mit, mit, mits
93 cambridge 139 #cambridge, cambridge
103 marathoners 129 #marathon, marathon, marathoners
127 suspects 113 #suspect, #suspects, suspect, 'suspect, suspects
163 police 98 #police, police
210 officer 78 office, officer, officers, offices
214 boylston 75 #boylston, boylston
259 shooting 65 #shooting, shoot, shooting, shootings, shoots
262 @bostonmarathon 64 #bostonmarathon, @bostonmarathon, @bostonmarathons
350 explosion 48 explosion, explosions, explosive
Friday was the final day of the events surrounding the Boston Marathon bombing and the
subsequent identification and capture of the suspects alleged to be responsible. Several events
took place, starting in the early hours. There was a carjacking, a firefight during which one of the
suspects died, and later the other suspect was apprehended. The full results can be seen in Table
7. Suspects jumped from the one hundred and twenty-seventh spot all the way to ninth. Other
words in the top ten were Watertown (sixth), the location of the firefight, and police (seventh).
There were 21 other possible keywords outside of the top ten. Even the living suspect’s name
appeared in the top 500, with 101 mentions of Tsarnaev (ranked 291st) and 94 of Dzhokhar
(ranked 309th).
Table 7 Friday’s Ten Most Frequent and Words of Interest
Rank Word Count Similar Words
1 bostons 2826 #boston, @boston, @bostons, boston, 'boston, bostons
2 just 1801 @just, just
3 getting 1474 get, 'get, gets, getting
4 now 1354 #now, now, 'now, now#realheroes, nows
5 liking 1313 like, liked, liked’, likee, likely, likes, liking
6 watertown 1188 #watertown, watertown
7 police 1114 #police, polic, police
8 fucks 1059 #fuck, fuck, fucked, fucking, fucks
9 suspects’ 1027 #suspect, #suspects, suspect, suspect#2, suspect’, suspected, suspects, suspects', suspects’
10 thanks 1018 #thank, #thankful, #thanks, thank, thankful, thankfully, thanking, thanks
40 bombs 483 #bomb, #bombing, bomb, bombed, bombing, bombings, bombs
65 cambridge 325 #cambridge, @cambridge, cambridge
66 manhunts 316 #manhunt, @manhunt, manhunt, manhunting, manhunts
72 mit 309 #mit, @mit, mit
78 cops 297 #cops, cop, cops
95 shots 273 shot, shots, shots', shotting
100 custody 251 #custody, custody
112 marathon 221 #marathon, marathon
146 bombers' 188 #bomber, bomber, bombers, bombers'
148 terrorists 183 #terrorist, #terrorists, terrorist, 'terrorist, terroriste, terroristing, terrorists
159 explosives 177 #explosion, #explosions, explosion, explosion', explosions, explosive, explosives
166 sirens 172 siren, sirens
188 shooting #shooting, shoot, 'shoot, shooting, shootings, shoots
222 #bostonbombings 140 #bostonbomb, #bostonbomber, #bostonbombers, #bostonbombing, #bostonbombings
228 guns 138 gun, gunned, gunning, guns
277 helicopters 110 #helicopter, helicopter, helicopters
291 tsarnaev 101 #tsarnaev, tsarnaev
300 enforcement 97 enforcement, enforcements, enforcers
309 dzhokhar 94 #dzhokhar, @dzhokhar, dzhokhar
319 #bostonmarathon 91 #bostonmarathon
326 campus 88 campus, campuses
327 captured 88 #captured, capture, captured, captures, capturing
481 arrests 59 #arrest, arrest, arrested, arresting, arrests
The text analysis of all five days combined shows that there are no possible keywords in
the top ten most frequent (Table 8). However, words that appeared in the top 500 on only a
couple of days made the combined day list because they appeared so often on those days.
Suspect only appeared Wednesday through Friday, but was used enough to make the combined
day list. Watertown only appeared on Friday’s top 500 but was fifty-second in the combined day
list. Then there are other examples such as bomb (twenty-eighth), which showed up every day.
Table 8 Combined Days Ten Most Frequent and Words of Interest
Rank Word Count Similar Words
1 bostons 9738 #boston, #boston#bostonstrong#marathon#backbay#prayforboston, @boston, @bostons, boston, boston', 'boston, bostoner, bostons
2 just 5718 @just, just, juste, justing
3 getting 4344 #get, @get, get, 'get, gets, getting
4 liking 4275 like, liked, liked’, likee, likely, likes, liking
5 now 3440 #now, now, 'now, now#realheroes, nows
6 loving 3322 #love, #lovely, @love, @lovely, love, loved, lovee, lovely, lovelys, loves, loving, 'loving
7 thanks 3172 #thank, #thankful, #thanks, thank, thanke, thanked, thankful, thankfully, thanking, thanks
8 fucks 3120 #fuck, #fucked, fuck, fucked, fucking, fucks
9 peoples 2696 peopl, people, peoples, peoples'
10 knows 2652 @know, know, knowing, knows
28 bombs 1677 #bomb, #bombing, #bombings, #bombs, bomb, 'bomb, bombe, bombed, bombes, bombing, bombings, bombs
33 police 1543 #police, polic, police
39 marathons 1407 #marathon, @marathon, marathon, marathoner, marathoners, marathons
46 suspects’ 1267 #suspect, #suspects, suspect, 'suspect, suspect#2, suspect’, suspecte, suspected, suspecting, suspects, suspects', suspects’
52 watertown 1199 #watertown, watertown
68 bostonmarathon 966 #bostonmarathon, @bostonmarathon, @bostonmarathons, bostonmarathon
75 explosives 904 #explosion, #explosions, explose, explosion, explosion', explosions, explosive, explosives
113 cambridge 715 #cambridge, @cambridge, cambridge
165 mits 539 #mit, @mit, mit, mits
211 cops 438 #cops, cop, copped, cops
218 shots 423 #shots, shot, shots, shots', shotted, shotting
285 manhunts 323 #manhunt, @manhunt, manhunt, manhunting, manhunts
305 terrorists 310 #terrorist, #terrorists, terrorist, 'terrorist, terroriste, terroristes, terroristing, terrorists
311 boylston 304 #boylston, boylston
338 sirens 275 #sirens, siren, sirene, sirens
387 attack 238 attack, attacked, attackers, attacking, attacks
Only possible keywords that showed up in the top 500 most frequent words each day
were considered for the tweet searches. Some of the highest were in the top ten while the lowest
was four hundred ninety-eighth. The text analysis allowed us to establish a baseline for suitable
keywords.
4.1.2. Interview Responses
Interviews were conducted with nine first responders over a four-and-a-half-week period
in October and November 2017. Eight of the nine interviewees were in law enforcement and
the other was a firefighter; all considered themselves to be first responders. Their experience
ranged from just a year to over a decade, and their responses varied accordingly.
Every interviewee has responded to an emergency, and two have responded to a major
emergency (a boiler explosion and a Blue Angel crash). None of the interviewees has responded to
a terrorist event. However, they all train for these types of events.
When responding to an emergency, all interviewees said location is one of the most
important pieces of information needed. “Without a location, how are we going to know where
to go,” one interviewee said. The next most common response was the type of emergency they
are responding to. The remaining responses related to more detailed information on the event
(e.g., suspect information, injuries, site security). The specific words that interviewees
provided for each day can be found in Table 9.
Table 9 Words Relating to Events provided by First Responders
Monday               Tuesday              Wednesday            Thursday             Friday
Attack               Attack               Attack               Attack               Attack
Back Bay             Black Hat            Black Hat            Bomb/Bombing         Bomb/Bombing
Blast                Bomb/Bombing         Bomb/Bombing         Boston               Boston
Bomb/Bombing         Boston               Boston               Boylston             Boylston
Boom                 Boylston             Boylston             Cambridge            Cambridge
Boston               Cops                 Cops                 Cops                 Cops
Boylston             Explosive/Explosion  Explosive/Explosion  Dzhokhar Tsarnaev    Dzhokhar Tsarnaev
Cops                 Finish Line          Finish Line          Explosive/Explosion  Explosive/Explosion
Explosive            Marathon             Marathon             Finish Line          Finish Line
Explosion            Police               Police               Marathon             Marathon
Finish Line          Suspects             White Hat            MIT                  MIT
Fire                 Terrorist/Terrorism                       Police               Police
Injured              White Hat                                 Shooting             Shooting
Marathon                                                       Suspect              Suspects
Police                                                         Tamerlan Tsarnaev    Terrorist/Terrorism
Suspects                                                                            Watertown
Terrorist/Terrorism
4.1.3. Keywords for Searches
A list of keywords was developed for each individual day through text analysis and
interviews. To be a keyword, a word had to be among the 500 most frequent words from the
text analysis and be mentioned in an interview. Not every possible keyword from the text
analysis was used, and the same can be said of the words provided by interviewees. Boston was
the most frequent word on every day and was brought up by interviewees, but it was omitted
from the keyword list for being too broad a location.
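The selection rule above amounts to a set intersection followed by an exclusion. The Python sketch below shows that logic under stated assumptions: the function name `select_keywords` is hypothetical, the word lists are an abridged, hand-normalized subset of Tuesday's entries from Tables 4 and 9 (for example, Terrorist/Terrorism reduced to "terror"), and the normalization itself is assumed rather than taken from the thesis.

```python
def select_keywords(top_words, interview_words, excluded=("boston",)):
    """Keep words that are both frequent and named by interviewees, minus exclusions."""
    frequent = {word.lower() for word in top_words}
    mentioned = {word.lower() for word in interview_words}
    return sorted((frequent & mentioned) - set(excluded))


# Abridged, hand-normalized Tuesday inputs (illustrative only).
tuesday_top_500 = ["boston", "marathon", "bostonmarathon", "bomb", "police",
                   "boylston", "explosive", "terror", "cops", "attack", "blast"]
tuesday_interviews = ["Attack", "Black Hat", "Bomb", "Boston", "Boylston", "Cops",
                      "Explosive", "Finish Line", "Marathon", "Police", "Terror"]

print(select_keywords(tuesday_top_500, tuesday_interviews))
# ['attack', 'bomb', 'boylston', 'cops', 'explosive', 'marathon', 'police', 'terror']
```

The eight words returned correspond to the eight Tuesday keyword variants reported for the searches.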
The keywords (Table 10) are separated by day. Monday and Friday had the most keywords;
these are the two days when major events took place. Monday had the bombing, while Friday had
a firefight and an arrest. Thursday had the next most, which coincides with the release of the
suspects’ identities and the shooting of the MIT police officer. Monday through Wednesday
focus on the marathon bombings. Thursday and Friday show the need for keywords covering all
of the week’s events.
Table 10 Final List of Keywords for Each Day
Monday               Tuesday              Wednesday            Thursday             Friday
Attack               Attack               Bomb/Bombing         Bomb/Bombing         Bomb/Bombing
Blast                Bomb/Bombing         Boylston             Boylston             Cambridge
Bomb/Bombing         Boylston             Marathon             Cambridge            Cops
Boom                 Cops                 Police               Cops                 Dzhokhar Tsarnaev
Boylston             Explosive/Explosion  Suspect              Explosive/Explosion  Explosive/Explosion
Cops                 Marathon                                  Marathon             Marathon
Explosive/Explosion  Police                                    MIT                  MIT
Fire                 Terrorist/Terrorism                       Police               Police
Injured                                                        Shooting             Shooting
Marathon                                                       Suspect              Suspects
Police                                                                              Terrorist/Terrorism
Terrorist/Terrorism                                                                 Watertown
Microsoft SQL Server searches were conducted using the keywords in Table 10. A search
of Monday’s twelve keyword variants resulted in 2410 geotagged tweets from 2:40 p.m. to 11:59
p.m. Tuesday’s eight keyword variants produced 949 geotagged tweets. Wednesday, which had
only five keyword variants, produced the fewest results at 672 geotagged tweets. With other
events occurring on Thursday, the number of keywords increased to ten, but the search still
returned only 979 geotagged tweets. Friday, which had the most events take place, had twelve
keyword variants and returned the most geotagged tweets at 4453.
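The searches themselves were run in Microsoft SQL Server and the queries are not reproduced here. As a stand-in, the Python sketch below shows the same kind of filter: keep geotagged tweets inside a time window whose text contains any keyword. The record layout (`text`, `sent`, `lat`, `lon`), the function name `search_tweets`, and the sample tweets are all hypothetical; substring matching is an assumption used so that a keyword such as "bomb" also catches "bombing".

```python
from datetime import datetime


def search_tweets(tweets, keywords, start, end):
    """Return geotagged tweets sent within [start, end] that contain any keyword."""
    terms = [keyword.lower() for keyword in keywords]
    hits = []
    for tweet in tweets:
        if tweet.get("lat") is None or tweet.get("lon") is None:
            continue  # not geotagged
        if not (start <= tweet["sent"] <= end):
            continue  # outside the search window
        text = tweet["text"].lower()
        if any(term in text for term in terms):
            hits.append(tweet)
    return hits


sample = [
    {"text": "Explosion on Boylston!", "sent": datetime(2013, 4, 15, 14, 52),
     "lat": 42.35, "lon": -71.08},
    {"text": "Lovely day for a run", "sent": datetime(2013, 4, 15, 15, 0),
     "lat": 42.35, "lon": -71.08},
    {"text": "bombing at the finish line", "sent": datetime(2013, 4, 15, 15, 5),
     "lat": None, "lon": None},
]
window = (datetime(2013, 4, 15, 14, 40), datetime(2013, 4, 15, 23, 59))
print(len(search_tweets(sample, ["Bomb", "Explosion", "Boylston"], *window)))  # 1
```

Substring matching also produces false positives (for example, "fire" inside "fired"), which is one reason manual review of the returned tweets remains useful.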
4.1.4. Text Anomaly Analysis
Fuck was in the top ten every day but Tuesday (when it ranked 11th), and was in the top
ten for the aggregate results. As an expression of raw emotion, it could also be related to the
terrorism event. It may be worth adding fuck to the list of keywords if it can be determined that it
was used frequently to reference the terrorism event and the other keywords were not also
included in the tweet. For example, on Monday there were 600 tweets containing the word after
the bombing occurred; 109 also included keywords and 491 did not.
There were 120 tweets containing the word fuck in the first hour following the bombing.
Thirty-eight of these tweets contained another keyword, and the remaining tweets were reviewed
for relevance to the terrorism event. For example, the word was first used 2 minutes after the
bombing, when someone tweeted, “WHAT THE FUCK,” in all capital letters. Though there are
no details, it could have been related to the bombing because it was sent from about two blocks
away. However, it is hard to know for sure since we cannot be certain of the context. Some
tweets are obviously not related to the bombing (@vanillaice. Word to ya mother brother fucka),
and a portion are simply hard to tell. Table 11 displays the analysis of the tweets containing the
word fuck for the first hour following the bombing.
Table 11 Relevance of Tweets with the Word Fuck One Hour Following the Bombing
        Keyword  Probably Relevant  Unclear  Not Relevant  Total
Tweets  38       26                 41       15            120
4.2. Spatial Analysis
The results from the keyword searches were used to produce images to show the
distribution of tweets along with their density. In this section, images for each individual day are
presented and examined.
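To give a feel for how a tweet density surface can be derived from point locations, the Python sketch below bins coordinates into a regular grid and counts points per cell. This is a simple stand-in chosen for illustration, not the density tool used to produce the figures; the function name `grid_density`, the 0.01-degree cell size, and the sample coordinates are all assumptions.

```python
import math
from collections import Counter


def grid_density(points, cell_deg=0.01):
    """Count (lat, lon) points per square cell of cell_deg degrees."""
    cells = Counter()
    for lat, lon in points:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells[cell] += 1
    return cells


# Illustrative coordinates only, not taken from the thesis data.
points = [(42.355, -71.075), (42.356, -71.074), (42.405, -71.105)]
density = grid_density(points)
print(density.most_common(1))  # [((4235, -7108), 2)]
```

Cells with the highest counts correspond to the high-density pockets described for each day; a kernel-based method would smooth these counts across neighboring cells.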
4.2.1. Daily Results
Looking at Monday’s tweet density (Figure 5), the majority of the study area is blue,
meaning it has a low density. However, there are pockets of higher tweet density areas
throughout the study area. The largest pocket of high tweet density is made up of neighborhoods
such as Fenway, Back Bay, Beacon Hill, and Downtown. This high tweet density area is
approximately 3.4 square miles, and includes the marathon finish line and the sites of the two
explosions.
There are roughly 800 geotagged tweets occurring from pre-explosion to just before
midnight in this area. These tweets make up a third of tweets returned from the keyword search.
The largest clusters of tweets surround the explosion sites.
Figure 5 Density of Tweets on Monday 15 April 2013
Moving on to Tuesday (Figure 6), it can be seen that the majority of the study area’s
tweet density is low. There are fewer pockets of higher density locations, but on average, they
are larger than Monday’s. The largest pocket occurs in roughly the same location as Monday’s,
but covers only approximately 2 square miles. There is another high tweet density pocket that is
separate from the others. It is in the Fenway neighborhood and is a little over 0.5 square miles.
There is also a high density of tweets at Logan International Airport.
The decrease in density size of the main location can partially be attributed to the smaller
number of keywords used and therefore smaller number of tweets returned by the search. The
overall number of geotagged tweets between Monday and Tuesday dropped by more than half.
The largest density area contained about 440 tweets, or about 46 percent of the entire day. The
largest clusters of tweets were still around the bombing locations. The other two large clusters,
in the Fenway neighborhood and at Logan International Airport, had just 50 and 20 tweets,
respectively. The majority of tweets focus on the events of the previous day.
Figure 6 Density of Tweets on Tuesday 16 April 2013
Wednesday’s results (Figure 7) show similar trends as seen in Tuesday’s results. There is
another decline in keywords, which results in fewer geotagged tweets than the previous day. The
largest density cluster is still in the same neighborhoods as before, and the Fenway cluster from
Tuesday is now reconnected to the main density cluster. The area of the largest density cluster is
approximately 2.8 square miles, and is still smaller than Monday’s results. There are other
noticeable clusters at Logan International Airport, East Boston, and Cambridge, all under 0.25
square miles.
The geotagged tweets continued to drop between Tuesday and Wednesday, with a
difference of 277. Over half (53 percent) of the daily geotagged tweets were in the largest
density cluster. The largest tweet clusters for the third day were still around the bombing
locations. Logan International airport had 9 tweets, East Boston had 6 tweets, and Cambridge
had 13 tweets. Most tweets were supportive in nature, and still related to the events that took
place on Monday.
Figure 7 Density of Tweets on Wednesday 17 April 2013
On Thursday, with the occurrence of a new event (MIT Police Shooting), the results
shifted to reflect this (Figure 8). With the addition of this event, keywords increased and
changed. Correspondingly, the number of geotagged tweets increased. However, it is still fewer
than Monday’s results. The biggest change from the previous day is that there is a large density
cluster in Cambridge where MIT is located. This density cluster is approximately 1.8 square
miles. The largest cluster that contains the bombing locations is approximately 3.7 square miles,
which is slightly larger than Monday’s results. This shows the tweet density increased in both
areas with the addition of the shooting.
Between Wednesday and Thursday, the number of geotagged tweets from keywords
increased by 307. The main cluster density contained 376 geotagged tweets, about 38 percent of
the daily total. The Cambridge cluster around the shooting location contained 141 tweets, about
14 percent of the daily total. Over half of the tweets from Thursday came from these two density
clusters, meaning there are groupings of tweets around both the bombing and shooting sites. In
addition to the shooting, an increase in tweets came from the release of the identities of the two
suspects.
Figure 8 Density of Tweets on Thursday 18 April 2013
The final day of the events surrounding the Boston Marathon bombing took place on
Friday, April 19th, 2013. This was the day in which the keywords returned the most geotagged
tweets at 4453, and they were spread throughout 21 of the 24 hours in the day. The results show
that there is no single massive density cluster like those of the four previous days (Figure 9).
Instead there are multiple smaller clusters throughout the study area. The largest pocket,
approximately 0.6 square miles, is near the explosion sites but no longer includes them. The
other event locations fall within or adjacent to high tweet density locations. These events took
place within a 5-mile radius of each other within the Boston Metropolitan area. The largest
density cluster near the explosion sites represents 226 geotagged tweets, about 5 percent of the
daily total. The cluster around the shooting/carjacking site contained 33 tweets. Near the
firefight location, there were 146 geotagged tweets. These occurred from the midnight hour all
the way to 8:00 p.m. The final cluster location, where the suspect was apprehended, had only a
handful of tweets; these referenced the street on which it occurred.
Figure 9 Density of Tweets on Friday 19 April 2013
4.2.2. Spatial Analysis Takeaways
Comparing Figures 5 through 9 shows the change in the spatial distribution of the
tweets. Although each day varied in the number of tweets returned from keywords, the densities
from which they came stayed relatively the same. When other events occurred the spatial
distribution of tweets sent changed with them. The higher density of tweets appeared around
each new event that took place. Geotagged tweets are following the course of events, giving
locations and information.
Another point to note is that when fewer keywords were used, fewer tweets resulted from
the searches. This means useful tweets could have been excluded simply because their particular
keywords were not mentioned frequently enough on the day in question.
4.3. Spatio-Temporal Analysis
In this section, the temporal data surrounding the bombing events that occurred on
Monday was analyzed. Temporal data is a key piece of information needed to respond to an
emergency event.
4.3.1. Analysis of Monday’s Tweets
The temporal analysis of tweets resulting from the keyword searches started at the point
when the first explosion took place at 2:49 p.m. There were 2387 geotagged tweets from the time
the bomb went off until midnight. It can be seen in Figure 10 that the first tweet referencing the
explosion did not occur until 3 minutes after the bomb. There were ten non-bombing related
tweets before 2:51 p.m. that were removed from the dataset, meaning a total of 2377 tweets were
analyzed. The figures that follow show the relevant tweets in the first 15 minutes, 16 minutes to
one hour, and more than one hour.
Figure 10 First Tweets Mentioning Events, 15 April 2013
The spatio-temporal results of the first 15 minutes show where the tweets immediately
after the bombing came from (Figure 11). The first tweet was 0.12 miles away from the
explosion site or approximately two city blocks. Within the first 15 minutes, 56 tweets were sent,
the most (7) occurring 6 minutes after. The distances from the bomb locations ranged from 252
feet to 22 miles. There are 33 tweets within a one-mile radius of the bombing sites, making up 59
percent of the tweets from the first 15 minutes. Looking at the content along with the location
shows that these are informative tweets sent from around the bombing locations. While
the tweets from farther out help relay information, they would not be useful in
communicating information to first responders.
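Distances such as those reported above can be computed from geotag coordinates with the haversine great-circle formula. The Python sketch below is a minimal version under stated assumptions: the thesis does not say which distance method was used, the helper names `haversine_miles` and `share_within` are hypothetical, and the coordinates are illustrative placeholders rather than the actual bombing sites or tweet locations.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def share_within(tweets, sites, radius_miles=1.0):
    """Return (count, fraction) of tweets within radius_miles of the nearest site."""
    near = sum(
        1 for lat, lon in tweets
        if min(haversine_miles(lat, lon, s_lat, s_lon) for s_lat, s_lon in sites) <= radius_miles
    )
    return near, near / len(tweets)


# Illustrative placeholder coordinates.
sites = [(42.3498, -71.0787)]
tweets = [(42.3498, -71.0787), (42.3505, -71.0780), (42.4500, -71.0800)]
print(share_within(tweets, sites)[0])  # 2

# The share reported in the text: 33 of 56 tweets within one mile.
print(round(100 * 33 / 56))  # 59
```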
Figure 11 Monday Tweets From First 15 Minutes After Bombing
The results from the second time period, 16 minutes to 1 hour (Figure 12), show a large
increase in tweets. There were 588 keyword tweets in this 45-minute period, a 950 percent
increase over the first 15 minutes. This period was divided into five time ranges. The first was
16-20 minutes, which had 43 tweets. In the next 5-minute range, 59 tweets were sent. The third
range, which ended at 30 minutes, had 76 tweets. There were 235 tweets sent between 31 and 45
minutes. The hour ended with an additional 175 tweets. Within the first hour of the bombing a
total of 644 tweets were sent. This accounts for 27 percent of the relevant tweets found through
the Monday keyword search.
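The arithmetic behind these figures can be checked directly. The Python sketch below bins tweets by whole minutes elapsed since the first explosion and reproduces the totals and percentages from the counts given in the text; the function names `bin_counts` and `percent_increase` are hypothetical, and the treatment of each range's upper bound as inclusive is an assumption.

```python
from bisect import bisect_left

# Inclusive upper bounds, in whole minutes after the first explosion, of the ranges in the text.
BIN_EDGES = [15, 20, 25, 30, 45, 60]


def bin_counts(minutes_after, edges=BIN_EDGES):
    """Count tweets per time range; tweets later than the last edge are ignored."""
    counts = [0] * len(edges)
    for minutes in minutes_after:
        index = bisect_left(edges, minutes)
        if index < len(edges):
            counts[index] += 1
    return counts


def percent_increase(before, after):
    """Percentage change from before to after."""
    return 100 * (after - before) / before


# Counts reported in the text for the first hour.
first_15 = 56
next_45 = [43, 59, 76, 235, 175]

print(sum(next_45))                                   # 588
print(percent_increase(first_15, sum(next_45)))       # 950.0
print(round(100 * (first_15 + sum(next_45)) / 2377))  # 27
```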
Like those from the first 15 minutes, tweets sent in the next 45 minutes came from a variety
of locations. The tweets display a similar density pattern to the tweets over the entire day, as
seen earlier in Figure 5. The closest tweets to the two bombsites were 26 feet and 152 feet away,
respectively. The farthest tweet during this time period was 24 miles away. There were 155
keyword tweets that fell within a one-mile radius of the bombing sites. This accounted for 26
percent of the tweets within this time period.
Figure 12 Monday Tweets Between 16 Minutes and 1 Hour After Bombing
The remaining temporal information that this thesis analyzed was the set of tweets sent more
than an hour after the explosions occurred (Figure 13). The results show six time ranges: five in
one-hour intervals up to 5 hours, then one for more than 5 hours. In the remaining eight hours
and ten minutes of the day, 1735 keyword tweets were returned, an average of over 200
tweets an hour. The breakdown of tweets per range was 596 for 1-2 hours, 375 for 2-3 hours,
222 for 3-4 hours, 178 for 4-5 hours, and 364 for over 5 hours. The largest number of
tweets was sent in the first hour after the explosions.
As with the previous two temporal analyses, the keyword tweets all fall within the original
25-mile search radius. The tweets display a similar density pattern to the tweets over the
entire day, as seen earlier in Figure 5. Tweets were still sent from the bombing sites, but they
referred to what had happened earlier that day. Around 20 percent of the tweets sent after the
first hour came from within a one-mile radius of the bombing sites. It can be seen that the
informative tweets regarding the bombing events spread throughout the Boston Metropolitan
area after the first hour.
Figure 13 Monday Tweets Over an Hour After Bombing
4.3.2. Text Anomaly Spatio-Temporal Analysis
There were 120 tweets containing the word fuck in the first hour following the bombing
(Figure 14). Thirty-eight of these tweets contained another keyword (red) and so were already
captured by the previous spatio-temporal analysis. The remaining tweets containing this word
fall into three other classifications. Probable (orange) tweets probably reference the bombings,
as determined through context or other words pertaining to the events that are not keywords.
Unclear (yellow) tweets contain fuck and may or may not relate to the bombing. Not related
(blue) tweets contain the word fuck, but definitely in another context. Twenty-one of the tweets
(or 17.5%) with the word fuck within the first hour were located within a one-mile radius of the
events. None of these twenty-one tweets was classified as not related; they were either keyword,
probable, or unclear. The first tweet (WHAT THE FUCK) came two minutes after the explosion
and was classified unclear.
Figure 14 Tweets Within First Hour Containing Fuck
4.3.3. Spatio-Temporal Analysis Takeaways
Comparing the three spatio-temporal figures shows that, over time, keyword tweets spread
farther out from the bombing sites. Within the first 15 minutes the majority of tweets were
focused around the bombing sites. In the remaining 45 minutes of the first hour two things are
noticed. The first is that there is more Twitter activity surrounding the bombing sites and the
second is that a larger number of tweets have spread throughout the 25-mile radius study area.
With just the spatial and temporal data it is certain that an event took place, where it occurred,
and that information spread over time throughout the Boston Metropolitan Area.
Looking at the content along with the location shows that there were informative tweets
in an extremely timely manner (just 3 minutes after the first bomb). These geotagged tweets
came from users who were near the bombing locations. If first responders had access to
these tweets, they could have learned valuable information about the explosions and known
approximately where they took place.
Conclusion
Crowdsourced data can be a useful tool in aiding first responders in an emergency event. This
work shows various ways in which spatially referenced Twitter data offers unique insight into a
terrorism event using the 2013 Boston Marathon bombing as a case study. People become
sensors by following their natural tendency to share information during an emergency. Instead of
a single source of intelligence for an emergency, crowdsourced data has the potential to have
hundreds, if not thousands, of sources.
The results of this thesis show: 1) what type of information was found in tweets during
the Boston Marathon bombing terrorism event, 2) what type of information would be useful for
first responders during the event, 3) how geospatial information was included in the tweets, and
4) the timeliness of the information. While there are some limitations to this study and the data
used, Twitter has the potential to be a great source of intelligence, though it will likely require an
automated process to sift through all the crowdsourced data in real time.
5.1. Findings
The overall finding of this thesis is that crowdsourced data, such as Twitter, can provide
potentially useful information to aid first responders following a terrorism event.
This study determined what type of information is found in crowdsourced data during a
terrorism event through text analysis of Twitter data. Many of the top 500 words resulting from
the text analysis were irrelevant. The text analysis did lead to keywords useful for the case study,
but only because targeted searches were possible since the events were known. Text analysis of
the most frequent words would not necessarily lead to useful information in a real time situation.
All the keywords used for Twitter data searches were located in the top 500 words. In addition,
due to the different events that occurred through the week (a bombing, a shooting, a carjacking,
and a firefight), useful keywords were different each day.
The word “fuck” also appeared in the top ten most frequent words every day except Tuesday, as well as in the
aggregate results. Human interpretation suggested that some
of the tweets containing the word “fuck” but no other keywords were likely related to the terrorism event.
However, without context, it will be very difficult for a machine to determine whether a simple
statement like “what the fuck” is related to some event or part of regular speech. The only way
relevance was determined was by reviewing both the location and timing of the tweet, and this
was only possible because we already knew an event took place.
Interviews with first responders provided data on what type of information would be
useful for first responders during a terrorism event. The results of the interviews confirmed that
there are multiple pieces of information needed to respond to an emergency. The most common
answer was location, because first responders need to know where to go during an
emergency. With regard to the case study, the interviewees referred to the marathon finish line,
Boylston Street, and other location identifiers. Interviewees also said that the kind of incident and all
information pertaining to it were important. In reference to the Boston Marathon
bombing, interviewees mentioned bomb, explosion, firefight, suspect names, etc. These findings
were combined with the findings of the text analysis to establish keywords for the Twitter data
search.
This study showed that geospatial information was present in Twitter data around the
Boston Marathon in two ways. All geotagged tweets had absolute location by including a set of
coordinates in their metadata. A subset of tweets also included relative location by referencing
locations of the events, such as Boylston Street or Watertown. Tweets showed the particular
street where the bomb went off and that it was the finish line of the Boston Marathon. The spatial
analysis shows that both location and proximity matter in crowdsourced data. The largest
density of tweets was centered on the bombing location. As other events occurred, the density of
tweets around those events increased in turn. First responders could track spikes in tweet density,
which could serve as a type of alarm that something was happening in the area. If this were
coupled with a broad array of keywords relevant to emergencies, then first responders would
know where and what was happening.
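The alarm idea described above can be sketched in a few lines. This is a minimal illustration, not the method used in this thesis: it assumes tweets have already been binned into per-minute counts for one area, and the function name `find_spikes`, the window length, the threshold, and the sample counts are all hypothetical choices.

```python
from statistics import mean, pstdev

def find_spikes(counts, window=5, threshold=3.0):
    """Return indices whose count exceeds the trailing-window mean by
    more than `threshold` standard deviations (an assumed rule)."""
    spikes = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        # Use a floor of 1.0 so a nearly flat history does not flag tiny changes.
        limit = mu + threshold * max(sigma, 1.0)
        if counts[i] > limit:
            spikes.append(i)
    return spikes

# Made-up per-minute tweet counts for one area, with one sudden jump.
counts = [4, 5, 3, 4, 5, 4, 40, 6, 5]
print(find_spikes(counts))
```

A flagged minute would only indicate that something unusual may be happening in that area; pairing it with keyword matches is what would suggest the type of event.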
Not only was the information found in crowdsourced data during the Boston Marathon
useful, but it was also timely. The spatio-temporal analysis was performed for Twitter data
from Monday. Within three minutes, the data showed that an explosion occurred in the Back Bay
neighborhood of Boston. As time passed, more and more tweets were produced; there was a 950
percent increase between the first 15 minutes and the next 45 minutes. This shows that first
responders could access information using crowdsourced data following a terrorism event within
just a few minutes.
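For reference, a percent increase is computed as (later - earlier) / earlier × 100, so a 950 percent increase means the later count is 10.5 times the earlier one. The short sketch below shows the arithmetic; the two counts are hypothetical values chosen only to reproduce that ratio and are not the study's tweet counts.

```python
def percent_increase(earlier, later):
    """Percent change from an earlier count to a later count."""
    if earlier <= 0:
        raise ValueError("earlier count must be positive")
    return (later - earlier) / earlier * 100.0

# Hypothetical counts, not taken from the thesis data.
first_15_minutes = 20
next_45_minutes = 210
print(percent_increase(first_15_minutes, next_45_minutes))
```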
5.2. Recommendations for First Responders
Based on the results of this study, social media data appears to be a valuable resource for
first responders to examine during a terrorism event. As such, this thesis leads to several
recommendations for how first responders can use crowdsourced data as a source of intelligence:
• Agencies need to develop or acquire the ability to collect crowdsourced data in
real time.
• Agencies need to develop a list of keywords that are specific to their area of
operations (events, locations, possible targets).
• Agencies need to develop a list of keywords that pertain to events they may be
responding to (bomb, shooting, etc.).
• Agencies should create a buffer around their area of operations to focus on things
that are occurring in their response area.
• Agencies should also develop or acquire machine-learning programs to identify
spikes in keywords or anomaly words.
• Agencies should use social media as a tool to disseminate information to the
public during an emergency situation.
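The keyword and buffer recommendations can be illustrated with a short sketch. Everything in it is an assumption made for illustration: the tweet records, the keyword list, the approximate coordinates, and the function names are hypothetical, and the 25-mile radius simply mirrors the radius cited elsewhere in this thesis.

```python
import math
import re

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def filter_tweets(tweets, keywords, center, radius_miles):
    """Keep tweets inside the buffer that contain at least one keyword."""
    wanted = {k.lower() for k in keywords}
    hits = []
    for tweet in tweets:
        # Split text into lowercase words; "#bomb" becomes "bomb".
        words = set(re.findall(r"[a-z']+", tweet["text"].lower()))
        distance = haversine_miles(center[0], center[1], tweet["lat"], tweet["lon"])
        if distance <= radius_miles and words & wanted:
            hits.append(tweet)
    return hits

# Hypothetical records; coordinates are approximate.
tweets = [
    {"text": "Explosion at the finish line", "lat": 42.3497, "lon": -71.0785},
    {"text": "Great lunch today", "lat": 42.3500, "lon": -71.0800},
    {"text": "Just heard about the explosion", "lat": 40.7128, "lon": -74.0060},
]
center = (42.3601, -71.0589)  # approximate downtown Boston
hits = filter_tweets(tweets, ["bomb", "explosion", "shooting"], center, 25)
print([t["text"] for t in hits])
```

This sketch matches single words only, so a multi-word keyword such as “finish line” would need phrase matching, and a keyword match alone does not establish that a tweet is relevant.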
5.3. Study Limitations
While this study successfully assessed crowdsourced data during an emergency, there are
areas for improvement. The limitations include the fact that only 1 percent of the tweets from this time period
were studied, the uncertain accuracy of tweet contents, and the keywords used. Learning from
the limitations confronted in this study will help improve the methodology in future
research of this kind.
5.3.1. Data Limitations
One of the biggest limitations encountered in this study was the data received.
Twitter data received from the third-party source GNIP is technically incomplete; only
1 percent of the tweets from the requested time period is provided, up to a maximum of one million.
Based on the number of tweets received, there were potentially 15,491,500 tweets during this
time period. Accessing the full number of tweets has the possibility of giving even more
information to assist first responders.
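The estimate above follows from scaling the sample count by the sampling rate. As a minimal sketch, assuming the sample is exactly 1 percent of all tweets: the sample size of 154,915 used below is not stated in this section but is back-calculated from the 15,491,500 figure, and the function name is hypothetical.

```python
def estimate_total(sample_count, sampling_percent=1.0):
    """Scale a sample count up to an estimated full count."""
    if not 0 < sampling_percent <= 100:
        raise ValueError("sampling_percent must be in (0, 100]")
    return round(sample_count * 100 / sampling_percent)

# 154,915 is inferred from the stated estimate, assuming an exact 1 percent sample.
print(estimate_total(154_915))
```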
The next data limitation is data quality. As with any form of crowdsourced data, what
someone tweets may not be accurate. Starbird et al. (2014) explained this well with analysis of
Twitter data following the Boston Marathon bombing. They discussed three rumors: the death of a
young girl, a false flag attack, and the release of the wrong identities of the suspects. Anyone can say
anything on a social media platform, regardless of truth, much like speech in general. There are
certain limited situations during which speech is not protected by the First Amendment, e.g.,
falsely yelling “Fire!” in a crowded theater. However, so far no one has been prosecuted for
spreading false information on Twitter. This means that first responders will have to check the
accuracy of tweeted information, using up valuable time in which they could be acting.
5.3.2. Method Limitations
While the methods used to determine keywords provided useful results, they did
constrain the search criteria. The text analysis performed using NVivo 11 had certain limitations.
NVivo 11 produced the most frequently used words; this study looked at the top 500, but the
program can return more. However, it does not recognize open compound words, such as “finish
line”; these words would be separated as “finish” and “line” in the search results. This means
that some words that should have been important may not have been recognized as such because
“finish” and “line” may not be deemed relevant on their own.
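One possible workaround, which is not part of the NVivo workflow used in this study, is to count adjacent word pairs (bigrams) alongside single words so that a phrase such as “finish line” surfaces as a unit. The sketch below is illustrative only; the sample tweets and function names are invented.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase a tweet and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def count_terms(tweets):
    """Count single words and adjacent word pairs across all tweets."""
    unigrams, bigrams = Counter(), Counter()
    for text in tweets:
        tokens = tokenize(text)
        unigrams.update(tokens)
        # Pair each token with its successor to form two-word phrases.
        bigrams.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return unigrams, bigrams

# Invented sample tweets.
tweets = [
    "Explosion at the finish line",
    "Something happened near the finish line",
    "Stay away from the finish line area",
]
unigrams, bigrams = count_terms(tweets)
print(bigrams["finish line"], unigrams["finish"], unigrams["line"])
```

Bigram counts also pick up uninformative pairs such as “the finish”, so some filtering of common words would likely still be needed.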
The creation of the final keyword list depended on a word both appearing as a top word in the
text analysis and being included as an interview response. This caused the keywords to vary from
day to day depending on frequency, and left a whole list of possible keywords provided by
the text analysis and interviews unused. The fact that only 1 percent of tweets were searched
for keywords could also affect the results. Lastly, a keyword might show up in a tweet but have
no connection to the events taking place; for example, “slurpees are so underrated #bomb” was
returned from Monday’s keyword search.
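The selection rule described above amounts to a set intersection: a word becomes a keyword only if it is both a top word in the text analysis and an interview response. The sketch below illustrates this with hypothetical word lists, which are stand-ins and not the study's actual lists; the function names are also assumptions.

```python
def select_keywords(top_words, interview_terms):
    """Return words present in both the text-analysis and interview lists."""
    top = {w.lower() for w in top_words}
    interview = {t.lower() for t in interview_terms}
    return sorted(top & interview)

def unused_terms(top_words, interview_terms):
    """Return interview terms that never became keywords."""
    top = {w.lower() for w in top_words}
    return sorted({t.lower() for t in interview_terms} - top)

# Hypothetical lists for a single day.
top_words_monday = ["marathon", "boston", "bomb", "explosion", "love"]
interview_terms = ["bomb", "explosion", "firefight", "suspect"]

print(select_keywords(top_words_monday, interview_terms))
print(unused_terms(top_words_monday, interview_terms))
```

Because the top-word list changes each day, the intersection changes too, which is why the resulting keywords varied from day to day.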
5.4. Moving Forward
This study shows that crowdsourced data such as Twitter can be used in gathering
information in an emergency. Through the analysis, the results show where and when these
tweets were issued. After examining the findings of this study, it appears that there is a need for a
meta-analysis of terrorist events and the use of social media. The events surrounding the Boston
Marathon were uncommon for a terrorist event because they took place over multiple days, while
most are discrete single-day events. This meta-analysis should look at a number of topics,
including single- and multi-day events; different social media platforms; whether first responders were
on scene (as at the Boston Marathon) or not; and events that occurred in other countries. Another area
that could be examined is mass-casualty events such as the 2017 Las Vegas shooting.
Examining studies related to these topics has implications for how best to utilize social media
data for first responders.
Given the volume of data produced on social media (potentially 15 million tweets over
five days within a 25-mile radius), an automated process would be needed to review all the Twitter
data in real time. There are currently companies such as Dataminr that use machine learning
for detecting, classifying, and determining the significance of public information in real time.
Machine learning could be the key to being able to use crowdsourced data for intelligence
purposes. There is too much social media traffic for an analyst to search. Also, there are many
different types of emergency events, so keywords would need to be different for each one. An
automated process could provide real time information for any number of events, and filter out
unnecessary data. Analysis could also be done to determine if the frequency of the use of the
word fuck increased dramatically after the bombing and other events. Even though the exact
event would not be immediately obvious from many of the tweets, if there was a sudden uptick
of emotive tweets, a machine could potentially detect it.
As the day and then the week went on, more tweets appeared away from the event sites.
This shows that Twitter is also a good way to disseminate information to the public. People miles
away from the explosions shared information within minutes. In this way, Twitter can assist
first responders not only as a source of harvested intelligence but also as a channel for
communicating important information to the public. It could ultimately help intelligence gathering if a member of
the public who is informed via Twitter has important intelligence to share later.
While the Boston Marathon bombing was a major emergency event, it may not have been
the best choice for assessing the value of crowdsourced data for first responders. The main
reason is that there were already first responders on the scene for the marathon. These first
responders would be providing on-the-ground intelligence about the incident. Intelligence
gathered from tweets may prove to be even more useful during an unknown event where there
are no first responders initially, such as the Pulse Night Club shooting in Orlando, Florida or the
Las Vegas mass casualty event in Nevada. In those cases, geotagged tweets could have led first
responders to the exact site of the emergency faster than 911 calls or other more conventional
methods. Crowdsourced data would also be less useful during the firefights with the Boston
Bombing suspects. Law enforcement on the ground would have better intelligence than any
bystanders providing information through Twitter.
References
"After Action Report for the Response to the 2013 Boston Marathon Bombings." December
2014. http://www.mass.gov/eopss/docs/mema/after-action-report-for-the-response-to-the-
2013-boston-marathon-bombings.pdf.
Atefeh, Farzindar, and Wael Khreich. “A survey of techniques for event detection in
twitter.” Computational Intelligence 31, no. 1 (2015): 132-164.
Atkinson, and J. Flint. 2001. “Accessing Hidden and Hard-to-Reach Populations: Snowball
Research Strategies.” Social Research Update (33).
Boston Marathon. Accessed March 26, 2018. http://www.baa.org/races/boston-marathon/.
Cheong, Marc, and Vincent C. S. Lee. "A microblogging-based approach to terrorism
informatics: Exploration and chronicling civilian sentiment and response to terrorism
events via Twitter." Information Systems Frontiers 13, no. 1 (2010): 45-59.
Crooks, Andrew, Arie Croitoru, Anthony Stefanidis, and Jacek Radzikowski. “#Earthquake:
Twitter as a Distributed Sensor System.” Transactions in GIS 17, no. 1 (2013): 124-47.
Esen, Berk, and Sebnem Gumuscu. "Turkey: How the Coup Failed." Journal of Democracy 28,
no. 1 (2017): 59-73.
Goodchild, Michael F. “Citizens as sensors: the world of volunteered geography.” GeoJournal,
no. 4 (2007): 211-21.
Goodchild, Michael F., and J. Alan Glennon. “Crowdsourcing geographic information for
disaster response: a research frontier.” International Journal of Digital Earth 3, no. 3
(2010): 231-241.
Goodchild, Michael F., and Linna Li. “Assuring the quality of volunteered geographic
information.” Spatial Statistics 1 (2012): 110-20.
Gulnerman, Ayse Giz, and Himmet Karaman. "Social Media Spatial Monitor of Coup Attempt in
the Republic of Turkey." 2017 17th International Conference on Computational Science
and Its Applications (ICCSA), 2017.
Heipke, Christian. “Crowdsourcing geospatial data.” ISPRS Journal of Photogrammetry and
Remote Sensing 65, no. 6 (2010): 550-57.
Homeland Security Presidential Directive / HSPD-8, “National Preparedness.” December 17,
2003.
“Information on more than 170,000 Terrorist Attacks.” Global Terrorism Database. Accessed
March 12, 2018. https://www.start.umd.edu/gtd/.
Kongthon, Alisa, Choochart Haruechaiyasak, Jaruwat Pailai, and Sarawoot Kongyoung. “The
Role of Social Media During a Natural Disaster: A Case Study of the 2011 Thai
Flood.” International Journal of Innovation and Technology Management 11, no. 03
(2012): 2227-232.
Oh, Onook, Manish Agrawal, and H. Raghav Rao. "Information control and terrorism: Tracking
the Mumbai terrorist attack through twitter." Information Systems Frontiers 13, no. 1
(2010): 33-43.
Palen, Leysia, Kate Starbird, Sarah Vieweg, and Amanda Hughes. “Twitter-based information
distribution during the 2009 Red River Valley flood threat.” Bulletin of the Association
for Information Science and Technology 36, no. 5 (2010): 13-17.
Sakaki, Takeshi, Makoto Okazaki, and Yutaka Matsuo. “Earthquake Shakes Twitter Users: Real-
Time Event Detection by Social Sensors,” 19th Int’l Conf. World Wide Web (WWW ’10),
(2010) 851-866.
Sakaki, Takeshi, Makoto Okazaki, and Yutaka Matsuo. “Tweet Analysis for Real-Time Event
Detection and Earthquake Reporting System Development.” IEEE Transactions on
Knowledge and Data Engineering 25, no. 4 (April 2013): 919-31.
See, Linda, Peter Mooney, Giles Foody, Lucy Bastin, Alexis Comber, Jacinto Estima, Steffen
Fritz, Norman Kerle, Bin Jiang, Mari Laakso, Hai-Ying Liu, Grega Milčinski, Matej
Nikšič, Marco Painho, Andrea Pődör, Ana-Maria Olteanu-Raimond, and Martin
Rutzinger. “Crowdsourcing, Citizen Science or Volunteered Geographic Information?
The Current State of Crowdsourced Geographic Information.” ISPRS International
Journal of Geo-Information 5, no. 5 (2016): 55.
Starbird, Kate, Leysia Palen, Amanda L. Hughes, and Sarah Vieweg. “Chatter on the red: what
hazards threat reveals about the social life of microblogged information.” In Proceedings
of the 2010 ACM conference on Computer supported cooperative work, pp. 241-250.
ACM, 2010.
Starbird, Kate, Jim Maddock, Mania Orand, Peg Achterman, and Robert M. Mason. “Rumors,
false flags, and digital vigilantes: Misinformation on twitter after the 2013 boston
marathon bombing.” iConference 2014 Proceedings (2014).
Stefanidis, Anthony, Andrew Crooks, and Jacek Radzikowski. “Harvesting ambient geospatial
information from social media feeds.” GeoJournal, no. 2 (2013): 319-38.
Spyratos, Spyridon, Michael Lutz, and Francesco Pantisano. “Characteristics of Citizen-
contributed Geographic Information.” 17th AGILE Conference on Geographic
Information Science, (2014).
Tapia, Andrea H., and Nicolas J. LaLone. “Crowdsourcing Investigations: Crowd Participation
in Identifying the Bomb and Bomber from the Boston Marathon Bombing.” International
Journal of Information Systems for Crisis Response and Management (IJISCRAM) 6, no.
4 (2014): 60-75.
"Terrorism." FBI. May 03, 2016. Accessed March 16, 2018.
https://www.fbi.gov/investigate/terrorism.
Unver, H. Akin, and Hassan Alassaad. "How Turks Mobilized Against the Coup." Foreign
Affairs. April 24, 2018. Accessed April 24, 2018.
https://www.foreignaffairs.com/articles/2016-09-14/how-turks-mobilized-against-coup.
Xu, Zheng, Yunhuai Liu, Neil Y. Yen, Lin Mei, Xiangfeng Luo, Xiao Wei, and Chuanping Hu.
“Crowdsourcing based Description of Urban Emergency Events using Social Media Big
Data.” IEEE Transactions on Cloud Computing PP, no. 99 (2015).
Xu, Zheng, Hui Zhang, Chuanping Hu, Lin Mei, Junyu Xuan, Kim-Kwang Raymond Choo,
Vijayan Sugumaran, and Yiwei Zhu. “Building knowledge base of urban emergency
events based on crowdsourcing of social media.” Concurrency and Computation:
Practice and Experience 28, no. 15 (2016): 4038-052.
Yin, Jie, Andrew Lampert, Mark Cameron, Bella Robinson, and Robert Power. “Using social
media to enhance emergency situation awareness.” IEEE Intelligent Systems 27, no. 6
(2012): 52-59.
Appendix A: Interview Protocol
Introduction
Terrorism continues to be one of the most significant security threats of our time. Recent
terrorism events include mass shootings and bombings in the U.S. and worldwide. First
responders—law enforcement, emergency medical services, and fire services—are responsible
for managing the chaos in the immediate aftermath of a terrorism event. Providing first
responders with high quality, detailed information as quickly as possible could greatly enhance
their ability to respond effectively. The focus of this thesis is to determine if Twitter posts are a
useful source of intelligence for first responders.
The utility of Twitter data for first responders is being explored using a case study of Twitter
posts immediately following the Boston Marathon bombing in 2013. The goal is to determine whether tweets can
help first responders determine the location of an event, the extent of the affected area, the type of damage,
the severity of damage, the location and number of casualties, and the presence of other first responders.
Information mined from tweets will be compared to the information actually available at the time to
determine whether it could have provided a more definite location, provided more detail, and been timelier. This
thesis also discusses related research questions about using crowdsourced data for intelligence
purposes, including the relative accuracy of the data, the potential for automating searches and
pushing the information to first responders in case of an emergency, and whether there are ways
to encourage or boost social media use to augment available information following an incident.
1. Do you consider yourself a first responder?
2. How would you define the term first responder?
3. What organization are you a first responder for?
4. How long have you been with this agency?
5. What is your current position/title?
6. Do you have other experience as a first responder, and in what capacity?
7. Do you have experience as a first responder in an emergency situation?
8. Does your organization classify emergency events? How?
9. How do you receive information for responding to an emergency?
10. What type of information do you receive?
11. Do you know if your organization uses social media during an emergency, either to
gather information or disseminate information to the public?
a. If yes, is there a designated person in your agency responsible for social media?
12. What information would you find useful during an emergency?
a. Location of event
b. Extent of affected area
c. Type of damage (e.g. fire, gas, flooding, active shooter, etc.)
d. Severity of damage
e. Number of casualties
f. Presence of other first responders (Fire, Police, Medical), etc.
13. Have you ever responded to a terrorist event?
14. Do your answers from above change if you are considering a terrorist event?
15. Do your answers from above change if you are considering a multi-day event such as the
Boston Marathon bombing events?
16. What characteristics of information would you find useful during an emergency?
a. More definite location
b. More detail
c. Timelier, etc.
17. Do your answers from above change if you are considering a terrorist event?
18. Do your answers from above change if you are considering a multi-day event such as the
Boston Marathon bombing events?
19. How often do you train to prepare for a major emergency/terrorist event?
20. Is there anyone you would recommend for this interview?
Abstract
Terrorism continues to be one of the most significant security threats of our time. Recent terrorism events include mass shootings and bombings in the U.S. and worldwide. First responders (law enforcement, emergency medical services, and fire services) are responsible for managing the chaos in the immediate aftermath of a terrorism event. Providing first responders with high-quality, detailed information as quickly as possible could greatly enhance their ability to respond effectively. Recently, crowdsourced data available through platforms such as Twitter, Facebook, and other social media outlets has emerged as a potential source to aid first responders following a terrorism event. The focus of this thesis is to determine if Twitter posts are a useful source of intelligence for first responders. Mining this readily available data could also be useful following a natural disaster.
The utility of Twitter data for first responders was explored using a case study of the events following the Boston Marathon bombing in 2013. Twitter data was collected via GNIP, a social media API aggregation company. Through text analysis and interviews with first responders, a list of relevant keywords was developed. Kernel density was used to determine density of tweets in relation to events that took place from April 15th through April 19th, 2013. Spatio-temporal analysis was conducted to show when and from where tweets were being sent on April 15th, 2013. Results show that on Monday through Thursday the greatest density of tweets was surrounding the bombsites
Asset Metadata
Creator
Howieson, Devlin Quinlan (author)
Core Title
Assessing the value of crowdsourced data in aiding first responders: a case study of the 2013 Boston Marathon
School
College of Letters, Arts and Sciences
Degree
Master of Science
Degree Program
Geographic Information Science and Technology
Publication Date
06/28/2018
Defense Date
04/19/2018
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
Boston Marathon,crowdsourced data,first responders,geospatial intelligence,GIS,OAI-PMH Harvest,terrorism,Twitter
Format
application/pdf (imt)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Ruddell, Darren (committee chair), Fleming, Steven (committee member), Tao, Ran (committee member)
Creator Email
howieson@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c89-12952
Unique identifier
UC11671717
Identifier
etd-HowiesonDe-6356.pdf (filename),usctheses-c89-12952 (legacy record id)
Legacy Identifier
etd-HowiesonDe-6356.pdf
Dmrecord
12952
Document Type
Thesis
Format
application/pdf (imt)
Rights
Howieson, Devlin Quinlan
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA