User-Generated Mobile Video Dataset with Fine Granularity Spatial Metadata

Ying Lu†, Hien To†, Abdullah Alfarrarjeh†, Seon Ho Kim†, Yifang Yin‡, Roger Zimmermann‡, Cyrus Shahabi†

† Integrated Media Systems Center, University of Southern California, Los Angeles, CA 90089
‡ School of Computing, National University of Singapore, Singapore 117417
† {ylu720, hto, alfarrar, seonkim, shahabi}@usc.edu
‡ {yifang, rogerz}@comp.nus.edu.sg
ABSTRACT
When analyzing and processing videos, it has become increasingly important in many applications to also consider contextual information in addition to the content. With the ubiquity of sensor-rich smartphones, acquiring a continuous stream of geo-spatial metadata that includes the location and orientation of a camera together with the video frames has become practical. However, no such detailed dataset is publicly available. In this paper we present an extensive geo-tagged video dataset that has been collected as part of the MediaQ [3] and GeoVid [1] projects. The key feature of the dataset is that each video file is accompanied by a metadata sequence of geo-tags consisting of both GPS locations and compass directions at fine-grained intervals. The dataset has been collected by volunteer users and its statistics can be summarized as follows: 2,397 videos containing 208,978 geo-tagged video frames, collected by 289 users in more than 20 cities across the world over a period of 10 years (2007–2016). We hope that this dataset will be useful for researchers, scientists and practitioners alike in their work.
Keywords
Multimedia; Dataset; Mobile Video; Metadata
1. INTRODUCTION
Due to the ubiquitous availability of smartphones, a number of trends have recently emerged with respect to mobile videos. First, we are experiencing unprecedented growth in the amount of mobile video content that is being collected with smartphones. Creating, sharing and viewing videos are immensely popular activities among mobile users. This is facilitated by the ease with which it is possible to record and play back videos on mobile devices. Second, each smartphone includes a plethora of different built-in sensors such as cameras, a global positioning system (GPS) receiver, and a compass. This facilitates the modeling of video content
through its geo-spatial properties at the fine granular level of a frame, a concept referred to as a Field-Of-View (FOV) [6]. The FOV model has been shown to be very useful for various media applications such as online mobile media management systems, for example MediaQ [12] and GeoVid [4]. The video dataset (http://mediaq.usc.edu/dataset/) presented in this paper has been collected with their respective mobile apps [3, 1].
Offering a dataset of user-generated, geo-tagged videos can help researchers with different undertakings. (1) In advanced spatiotemporal video search, with the geo-metadata attached to this video dataset, it is possible to convert the challenging problem of user-generated video indexing and querying into the problem of indexing and querying FOV spatial objects. Large-scale video data management using spatial indexing and querying of FOVs is a challenging problem, especially to maximally harness the geographical properties of FOVs. To attack this challenge, several indexes such as a grid-based index [15, 16] and the OR-tree [14] have been proposed. (2) Correlating video content with its spatial information adds a different perspective to video analytics (e.g., the acquisition of user behavior). (3) 3D model reconstruction or creating panoramas at specific locations from user-generated videos can provide updated and "fresh" immersive user experiences. Spatial information can be effectively utilized for filtering irrelevant video frames [13, 22]. (4) The geo-metadata associated with videos can facilitate the selection of the most relevant videos / images for down-stream computer vision applications. For example, a persistent tracking application was studied with GIFT [7] (Geospatial Image Filtering Tool) to select key video frames and thereby significantly reduce communication and processing costs. (5) Videos collected at different times for the same location enable the study of before and after situations for viewable scenes.
To the best of our knowledge, no existing public dataset contains user-generated, finely geo-tagged videos. Google Street View [2, 25] provides a set of images with locations and compass directions captured using professional-grade equipment. However, Google's data is collected by cars driving through every street once, which treats every location "equally" without reflecting the "popularity" of a place. In our user-generated dataset, by contrast, more videos are collected at popular locations. Flickr provides an image dataset with locations, but it does not provide camera viewing directions. The Stanford mobile video dataset [8] provides user-generated videos without spatial information. Ay et al. [5] have presented a synthesized dataset using a random walk model, and it includes only metadata without the actual video content. Our dataset, on the other hand, is novel for the following two reasons: (1) it fuses spatial metadata (GPS locations and compass directions) with a set of sampled frames in the recorded video files, and (2) the data is crowd-sourced (also referred to as user-generated videos, or UGVs) with a wide variety of mobile devices, i.e., it reflects a very heterogeneous set of hardware and software.
The remainder of this paper is organized as follows. Section 2 explains the data collection mechanism and describes the dataset. Section 3 provides statistics about our dataset, followed by examples of the use of such a dataset in Section 4. Finally, we conclude in Section 5.
2. DATASET
2.1 Spatial Model for Geo-Tagged Mobile Videos
We represent each video as a sequence of video frames, and the visible scene of each video frame is modeled as a Field of View (FOV) [6]. As shown in Figure 1, each FOV f is in the form of ⟨p, θ, R, α⟩, where p is the camera position consisting of the latitude and longitude coordinates read from the GPS sensor in a mobile device, θ is the angle of the camera viewing direction d⃗ with respect to north obtained from the digital compass sensor, R is the maximum visible distance within which an object can be recognizable, and α denotes the visible angle obtained from the camera lens property at the current zoom level. FOVs in two dimensions, considering only the camera azimuth, are shaped like circular sectors, while 3-dimensional FOVs are cone shaped when the other two camera rotation types, pitch and roll, are also considered. We define F as the video frame set {f | ∀f ∈ v, ∀v ∈ V} over the set of all videos, V.
Figure 1: 2D Field-of-View (FOV) model.
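As an illustration of how the 2D FOV model can be used computationally, the following Python sketch tests whether a target point lies inside an FOV ⟨p, θ, R, α⟩. It is not part of the dataset or its tools; the haversine/bearing helpers and all names are our own illustrative choices.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in meters."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, clockwise from north, in [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    return (math.degrees(math.atan2(y, x)) + 360) % 360

def point_in_fov(cam_lat, cam_lon, theta_deg, R_m, alpha_deg, tgt_lat, tgt_lon):
    """True if the target lies inside the 2D FOV sector defined by (p, theta, R, alpha)."""
    if haversine_m(cam_lat, cam_lon, tgt_lat, tgt_lon) > R_m:
        return False                      # farther than the maximum visible distance
    diff = abs(bearing_deg(cam_lat, cam_lon, tgt_lat, tgt_lon) - theta_deg)
    diff = min(diff, 360 - diff)          # smallest angular difference
    return diff <= alpha_deg / 2          # within half the viewable angle
```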
2.2 Data Collection
We have collected the geo-tagged mobile video data with two mobile applications, GeoVid [4, 1] and MediaQ [12, 3]. GeoVid is a system for collecting, indexing, and searching geo-tagged videos. GeoVid was extended into the MediaQ system, which collects more metadata about video content and additionally exploits the idea of spatial crowdsourcing, termed GeoCrowd [11], to collect on-demand media content on behalf of users. The GeoVid and MediaQ mobile apps (iOS and Android) were developed to collect videos and capture geo-metadata along with the videos. They share common geospatial metadata, such as location and direction, in their respective mobile video collections.
While recording a video on a mobile device, various sensors (e.g., GPS, compass) are used to collect geospatial metadata (e.g., locations, camera viewing directions). Each time-stamped sensor reading is recorded whenever a change is reported during the video recording. Note that we do not collect sensor readings if their values do not change over time. This update-only policy reduces the sampling rate significantly. Since the compass sensor generates readings very frequently, we limit its acquisition frequency (e.g., to no more than 5 readings per second) to avoid unnecessary redundancy without losing accuracy. GPS sensor data is sampled approximately once per second. After recording a video, we generate the geo-metadata file by synchronizing the sampled GPS and compass sensor data. Specifically, for each GPS record, we combine it with the compass reading that has the closest timestamp. Then, we enhance the accuracy of the location metadata using a post-processing filtering step immediately after generating the metadata. We apply a data correction algorithm based on Kalman filtering and weighted linear least squares regression [23] to manage potentially large variance in the GPS signals.
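To make the synchronization step concrete, the following sketch pairs each GPS fix with the compass reading whose timestamp is closest, mirroring the procedure described above. It is a reconstruction under assumed record layouts (simple timestamped tuples), not the authors' actual pipeline.

```python
import bisect

def sync_gps_compass(gps_samples, compass_samples):
    """Pair each GPS fix with the compass reading closest in time.

    gps_samples:     list of (timestamp_ms, lat, lon), sorted by timestamp
    compass_samples: list of (timestamp_ms, azimuth_deg), sorted by timestamp
    Returns a list of FOV-like records (timestamp_ms, lat, lon, azimuth_deg).
    """
    compass_ts = [t for t, _ in compass_samples]
    fovs = []
    for ts, lat, lon in gps_samples:
        i = bisect.bisect_left(compass_ts, ts)
        # candidates: the compass reading just before and just after the GPS fix
        candidates = [j for j in (i - 1, i) if 0 <= j < len(compass_samples)]
        if not candidates:
            continue  # no compass data available; skip this GPS fix
        j = min(candidates, key=lambda k: abs(compass_ts[k] - ts))
        fovs.append((ts, lat, lon, compass_samples[j][1]))
    return fovs
```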
2.3 Dataset Description
Our proposed dataset consists of two parts, the videos and their geospatial metadata. The metadata is stored in two files: the first file (i.e., VideoMetadata.txt) contains the video file and recording-device information for each video, and the second file (i.e., FOVMetadata.txt) includes the geo-information about the sampled geo-frames (i.e., FOVs) of the videos. Both files are in a tab-separated value format.
2.3.1 Video Metadata
Each record in the VideoMetadata.txt file contains the information of one video and is composed of seven fields:

VideoFileName: the physical video file name, which serves as the identifier of the video record.

DeviceID: an identifier of the smartphone used for recording the video. In our system, we used the subscriber identity module (SIM) identifier to generate a unique identifier for a smartphone. For privacy reasons, we report anonymized device IDs to protect user identity.

DeviceModel: the model of the smartphone. Example values of this field include Galaxy Nexus, GT-I9190, and Nexus 5.

IsVideoContentUploaded: a flag indicating whether the actual video file exists in the dataset. Because the size of the metadata is small, our app sends it to the server automatically, immediately after a video recording is done, and the user can upload the recorded video later whenever good network bandwidth is available. In our dataset, 15.43% of the videos have only metadata without video files. The value of this field is 1 or 0; 1 means both the actual video and its metadata exist, and 0 means only the video metadata is available.

VideoResolution: the resolution of the recorded video in pixels (e.g., 1920×1080).

VideoLength: the temporal length of the video file in seconds.

FrameCount: the total number of frames included in the video file.

The values of the first five fields are captured at video recording time, while the remaining fields are computed, for records with both metadata and actual video files, after processing the video files. The data type of all fields is string (i.e., a sequence of characters) except the fields VideoLength, IsVideoContentUploaded, and FrameCount, which are numeric.
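Given the tab-separated layout and field order described above, a record can be loaded roughly as follows. This is a hedged sketch: the presence or absence of a header row and the exact column order are assumptions to be verified against the downloaded files.

```python
import csv

VIDEO_FIELDS = ["VideoFileName", "DeviceID", "DeviceModel",
                "IsVideoContentUploaded", "VideoResolution",
                "VideoLength", "FrameCount"]

def read_video_metadata(path="VideoMetadata.txt"):
    """Yield one dict per video record from the tab-separated metadata file."""
    with open(path, newline="") as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) < len(VIDEO_FIELDS):
                continue  # skip malformed or partial rows
            rec = dict(zip(VIDEO_FIELDS, row))
            # numeric fields per the description; the remaining fields stay as strings
            rec["IsVideoContentUploaded"] = int(rec["IsVideoContentUploaded"])
            rec["VideoLength"] = float(rec["VideoLength"]) if rec["VideoLength"] else None
            rec["FrameCount"] = int(rec["FrameCount"]) if rec["FrameCount"] else None
            yield rec
```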
2.3.2 Spatial Video Metadata
In this metadata file (i.e., FOVMetadata.txt), we provide fine granularity geospatial metadata about the collected videos, consisting of a set of frame records (i.e., FOVs). Each record represents an FOV comprised of eleven fields:

VideoFileName: the name of the video file that contains the FOV.

FOVNumber: the sequential number of the FOV in the video named VideoFileName. The tuple (VideoFileName, FOVNumber) uniquely identifies an FOV (i.e., a record).

Latitude: the latitude of the camera location, based on the GPS coordinates, when the frame was recorded. The value of Latitude is in the range [−90°, 90°] in double precision (i.e., rounded to up to 15 decimal places).

Longitude: the longitude of the camera location. The value of Longitude is within the range [−180°, 180°] in double precision.

Z: the azimuth angle of the camera viewing direction. In two dimensions, it expresses the angular distance from north, as shown in Figure 1. In 3D, it is the rotation angle around the Z axis of the world coordinate system (see http://developer.android.com/reference/android/hardware/SensorEvent.html). The Z axis points towards the center of the Earth and is perpendicular to the ground; the Y axis is tangential to the ground and points towards the magnetic North Pole of the Earth; the X axis is defined as the vector product of Y and Z. Z is a double precision value within the range [0°, 360°].

X: the pitch angle of the camera viewing direction. It refers to the rotation angle around the X axis.

Y: the roll angle of the camera viewing direction. It refers to the rotation angle around the Y axis. Y is a double precision value within the range [−180°, 180°].

R: the maximum visible distance of the FOV within which an object can be recognizable. In our dataset, the value of R is a predefined constant (e.g., 0.1 kilometers).

α: the visible angle of the FOV obtained from the camera lens property at the zoom level when the frame is recorded. The value of α is also set as a predefined constant (e.g., 51°).

Timestamp: the number of elapsed milliseconds since 1970-01-01 00:00:00.
Keywords: a set of keywords associated with the FOV, separated by semicolons (e.g., school; new high street; spring street). Our apps provide two types of keywords: automatic and manual. The automatic keywords are extracted from OpenStreetView (http://openstreetview.org/) based on the FOV covering area [18]. The manual keywords are provided by users when uploading a video; the manual keywords for a video are propagated to all FOVs belonging to it. If there is no keyword, the field is NULL.

Total number of videos with geo-metadata: 2,397
Total number of videos with both geo-metadata and contents: 1,924
Total length of videos with contents (hours): 38.54
Average length per video with content (sec): 72.14
Percentage of videos which have keywords: 22.78%
Average camera moving speed (km/h): 4.5
Average camera rotation speed (degrees/sec): 10
Total number of users: 289
Average number of videos per user: 8.29
Total number of FOVs: 208,978
Total number of FOVs for videos with contents: 142,687
Average number of FOVs per second: 1.03
Average number of FOVs per video: 74.16
Table 1: Overview of the dataset.
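Analogously, the FOV records in FOVMetadata.txt can be loaded and grouped per video. As before, this is only a sketch: the column order follows the field list in Section 2.3.2, and the handling of the Keywords and α columns (the latter named Alpha here) is an assumption.

```python
import csv
from collections import defaultdict

FOV_FIELDS = ["VideoFileName", "FOVNumber", "Latitude", "Longitude",
              "Z", "X", "Y", "R", "Alpha", "Timestamp", "Keywords"]

def read_fovs_by_video(path="FOVMetadata.txt"):
    """Return {VideoFileName: [FOV dicts sorted by FOVNumber]}."""
    by_video = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) < len(FOV_FIELDS):
                continue  # skip malformed or partial rows
            rec = dict(zip(FOV_FIELDS, row))
            for k in ("Latitude", "Longitude", "Z", "X", "Y", "R", "Alpha"):
                rec[k] = float(rec[k])
            rec["FOVNumber"] = int(rec["FOVNumber"])
            rec["Timestamp"] = int(rec["Timestamp"])
            # keywords are semicolon-separated; NULL means no keyword
            rec["Keywords"] = [] if rec["Keywords"] == "NULL" else rec["Keywords"].split(";")
            by_video[rec["VideoFileName"]].append(rec)
    for fovs in by_video.values():
        fovs.sort(key=lambda r: r["FOVNumber"])
    return by_video
```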
3. MOBILE VIDEO STATISTICS
In this section, we present the statistics of our video dataset. Table 1 shows the overview statistics of our dataset, collected during a period of 10 years (2007–2016). There are 2,397 videos in total, of which 1,924 videos have both video contents and geo-metadata. These 1,924 videos total 41.33 hours, and each video is 66.33 seconds long on average. As discussed in Sec. 2.3.2, videos can be annotated with manually typed keywords and auto-tagged keywords based on their FOV covering area. In our dataset, 22.78% of the videos have keywords. Most of the videos are recorded casually by users while walking. The camera moving speed is 4.5 km/h on average, and the camera rotation speed is 10 degrees/sec (i.e., the rate of change of the azimuth angle Z). In addition, our dataset is collected by 289 users (e.g., students, faculty), and each user collects 8.29 videos on average. Moreover, there are 208,978 FOVs sampled from the 2,397 videos and 142,687 FOVs sampled from the 1,924 videos with contents. Therefore, the average FOV sampling rate is 1.03 FOVs per second, and each video is associated with 74.16 FOVs on average.
We next discuss the detailed statistics of the dataset. Figure 2 displays the location distribution of the collected videos on Google Maps. As we can see, the videos are collected from all over the world. Specifically, as shown in Figure 3a, most of the videos are recorded in Los Angeles (45%), Singapore (28%), and Munich (13%), and the remaining 14% of the videos are from 18 other cities. We also plot the distribution of the number of videos collected in different years (2007–2016) in Figure 3b. Most of the data was collected during the five years 2011–2015. As mentioned above, 289 users contributed to this dataset, and the number of videos collected by each user is summarized in Figure 3c. Around one third of the users recorded only one video, while 18% of the users collected more than 10 videos. In addition, we display the distribution of the number of videos collected by devices with different OS types (e.g., Android, iOS) in Figure 3d. Note that we do not know the OS type for 62% of the videos because old versions of our apps did not collect this information. Figure 4 shows the distribution of video length. It is worth noting that more than 40% of the videos are shorter than 30 seconds, and around 4% of the videos are longer than 300 seconds. Furthermore, as shown in Figure 5, the resolutions of most of the videos are 640×480 (36%) and 720×480 (50%), which are standard resolutions for mobile digital videos.
Figure 2: Locations of the videos.
Figure 3: The distributions of the number of videos: (a) per city, (b) per year, (c) per user, (d) per OS type.
Figure 4: Video length distribution.
Figure 5: Video resolution distribution.
Places Video#
p1 Near RTH Building @ USC, Los Angeles 13
p2 37th Street on USC Campus @ Los Angeles 10
p3 Chinesischer Turm @ Munich 8
p4 Tommy Trojan @ USC, Los Angeles 7
p5 The lawn outside Leavey Library @ USC, Los Angeles 6
Figure 6: Top 5 video dense locations
Places FOV#
p6 Merlion Park @ Singapore 293
p7 Esplanade Theatres @ Singapore 234
p8 Singapore Cruise Center 156
p9 The Float @ Marina Bay, Singapore 138
p10 Bedok Reservoir Park @ Singapore 101
Figure 7: Top 5 FOV dense locations
In our dataset, the locations of the videos are skewed: in some popular places, there are many videos. This dataset includes the geo-tagged videos used in our previous panoramic image generation [13] and 3D model reconstruction [22] work. To find the dense locations (i.e., places with many videos / FOVs), we calculate the numbers of videos and FOVs that are located within each "place" (i.e., a 100 meter × 100 meter cell) in the dataset; a sketch of this computation appears below. Note that the videos we consider in this calculation contain both video contents and geo-metadata. Further, for each video, we use the camera location (Latitude, Longitude) of its first FOV as the location of the video, and we refer to the camera location of an FOV as the FOV location. After the calculation, we find that there are 14 places with more than 5 videos, and the top-5 video-densest places are listed in Figure 6. Meanwhile, there are 62 places with more than 20 FOVs, and the top-5 FOV-densest places are listed in Figure 7. In Figure 6, four of the places are on the USC campus in Los Angeles and one is in Munich. In Figure 7, all the places are in Singapore. The top-5 densest places differ mainly because the videos collected in Singapore are longer. In these ten dense places, the videos can be used for panorama generation, structure from motion, 3D model reconstruction, etc. For example, p2 is a quiet street on the USC campus where route panoramic images [13] can be generated. p3, p5, p6, p7 and p9 are outdoor open areas where point panoramas [13] can be created. In p4 and p6, there are recognizable figures (e.g., Tommy Trojan, the Merlion) from which 3D models can be reconstructed [22].
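The dense-place computation described above can be approximated with a simple grid hash, as sketched here. The meters-per-degree conversion is a rough local approximation, the 100 m cell size follows the text, and the function names are illustrative.

```python
import math
from collections import Counter

CELL_M = 100.0  # cell side length in meters

def cell_key(lat, lon):
    """Map a lat/lon to the (row, col) index of an approximately 100 m x 100 m grid cell."""
    m_per_deg_lat = 111_320.0                                   # approx. meters per degree latitude
    m_per_deg_lon = 111_320.0 * math.cos(math.radians(lat))     # shrinks with latitude
    return (int(lat * m_per_deg_lat // CELL_M), int(lon * m_per_deg_lon // CELL_M))

def dense_places(fovs_by_video, top_k=5):
    """Count videos (located at their first FOV) and FOVs per cell; return the top-k of each."""
    video_counts, fov_counts = Counter(), Counter()
    for fovs in fovs_by_video.values():
        if not fovs:
            continue
        first = fovs[0]
        video_counts[cell_key(first["Latitude"], first["Longitude"])] += 1
        for f in fovs:
            fov_counts[cell_key(f["Latitude"], f["Longitude"])] += 1
    return video_counts.most_common(top_k), fov_counts.most_common(top_k)
```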
4. EXAMPLES OF USE
The dataset can be used for a variety of purposes. This section describes some of the authors' past use cases of the spatial metadata.
4.1 Advanced Spatiotemporal Video Search
It is very challenging to index and search videos and pictures at a large scale. Traditional techniques, such as annotating videos with keywords and content-based retrieval, are often unsatisfactory due to the lack of appropriate keywords and the limited accuracy of search results, respectively. While manually annotated videos can be searched, annotating large video collections is laborious and time-consuming. On the other hand, content-based video retrieval is challenging and computationally expensive, and the results are often associated with uncertainty. With the availability of sensor-rich metadata (e.g., location, direction), it becomes efficient to search video data at the high semantic level preferred by humans [12]. For example, region queries (i.e., rectangular or circular queries) can easily retrieve all FOVs that overlap with a given region using spatial database technologies. A directional query searches for all video segments whose FOV direction angles are within an allowable error margin of a user-specified input direction angle. Figure 8 shows the different results of two directional range queries over the same spatial region.
Figure 8: Examples of directional queries, (a) facing north and (b) facing south. The red icons represent video segments with a specific compass direction.
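A straightforward linear-scan version of the combined rectangular and directional range query reads as follows; a real system would use the spatial indexes cited earlier [14, 15, 16], and the 15° error margin is only an example value.

```python
def directional_range_query(fovs, min_lat, max_lat, min_lon, max_lon,
                            query_dir_deg, margin_deg=15.0):
    """Return FOVs whose camera location lies in the rectangle and whose
    azimuth Z is within `margin_deg` of the queried compass direction."""
    def ang_diff(a, b):
        d = abs(a - b) % 360
        return min(d, 360 - d)          # smallest angular difference on the compass circle
    return [f for f in fovs
            if min_lat <= f["Latitude"] <= max_lat
            and min_lon <= f["Longitude"] <= max_lon
            and ang_diff(f["Z"], query_dir_deg) <= margin_deg]
```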
4.2 Video Data Analytics
Using geospatial metadata, video data analytics can become easier. For example, one can detect events (or hotspots) from a large number of videos. Figure 9 shows the FOV coverage of 315 videos collected during the two years 2014 and 2015 at the University of Southern California. The figure shows that several hotspots can be automatically recognized. There are two types of hotspots: points of interest that many people visit frequently (e.g., the RTH and PHE buildings and Tommy Trojan) and actual events on campus where many videos were collected during a short time period (e.g., the two-day LA book festival). It has been shown that detecting hotspots is also helpful during disasters for quickly assessing the situation [21]. In particular, the coverage map would help decision makers better understand the disaster situation and take timely actions accordingly. For example, on-site volunteers can be sent to the hotspots to check the situation, or off-site analysts may want to collect more data in sparsely covered areas.
Figure 9: A heatmap of the FOV coverage of 315 videos, overlaid on top of Google Maps. This shows that it is possible to automatically detect hotspots from geo-tagged mobile video metadata.
Another application of metadata analytics is to quickly detect informative, significant or important video segments (i.e., sequences of FOVs with more useful visual information) from a large collection of videos. One may define a significant segment using a specific pattern or rule. For example, if a user stops, shoots a video for a while, and points the phone in a specific direction, it is likely that there is an interesting event in front of them. Therefore, by searching for a specific pattern in camera movements (e.g., a period of at least 20 seconds during which movement is minimal), we might be able to efficiently find a subset of significant video segments using only metadata; a sketch of this pattern search is shown after Figure 10. For example, we found one event, shown in Figure 10, where someone stops and records a group of people singing on a stage. The pattern can be relaxed to search for panoramic scenes (i.e., the location does not change while the direction increases or decreases) [13] or scenes where the camera direction does not change much (e.g., |maximum angle − minimum angle| < 10 degrees).
Figure 10: An actual event automatically found from a large collection of geo-tagged mobile videos.
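A minimal sketch of the stationary-camera pattern search mentioned above, operating on metadata only; the 5 m movement threshold is an assumption, and haversine_m refers to the helper defined in the earlier FOV sketch.

```python
def stationary_segments(fovs, min_duration_s=20, max_move_m=5.0):
    """Find runs of consecutive FOVs where the camera barely moves for at least
    `min_duration_s` seconds (candidate 'significant' segments).
    `fovs` is one video's FOV list, sorted by FOVNumber."""
    segments, start = [], 0
    for i in range(1, len(fovs) + 1):
        moved = (i < len(fovs) and
                 haversine_m(fovs[start]["Latitude"], fovs[start]["Longitude"],
                             fovs[i]["Latitude"], fovs[i]["Longitude"]) > max_move_m)
        if i == len(fovs) or moved:
            # close the current run [start, i-1] and check its duration
            duration_s = (fovs[i - 1]["Timestamp"] - fovs[start]["Timestamp"]) / 1000.0
            if duration_s >= min_duration_s:
                segments.append((start, i - 1))
            start = i
    return segments
```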
4.3 Geospatial Filtering for Computer Vision
Applications
In the computer vision field, researchers mainly focus on the analysis of a given set of input images and videos, without considering what the most effective input dataset for the analysis would be. However, an unprecedented number of videos / images are currently being collected, so efficiently selecting the relevant input videos / images for computer vision becomes a challenging problem. The geo-metadata associated with the videos in our dataset can facilitate the selection of the most relevant videos / images for down-stream computer vision applications (e.g., object tracking, feature extraction, people counting). For example, a persistent tracking application was studied with GIFT [7] (Geospatial Image Filtering Tool) to select key video frames and significantly reduce communication and processing costs.
4.4 Content-Based Video Retrieval
Although video search has been widely studied in the research community, identifying an object in a video database is still a challenging problem. Video search has been studied under different paradigms: content-based, keyword-based, semantic-based, and spatial-based. In content-based search, low-level features (e.g., color histograms, Gabor texture features, and motion vectors) are extracted [19]. User-generated annotations (i.e., tags) [20, 9] can also be utilized to understand video content. To avoid the laborious work of user tagging, video context has also been used to probe the semantics [17, 10]. On the other hand, spatial metadata enables video search [24, 14, 16]. Our dataset, consisting of geo-referenced videos tagged with keywords, can improve video retrieval by leveraging different paradigms of video search.
5. CONCLUSION & FUTURE WORK
In this paper, we have presented a new dataset of user-generated mobile videos. The key feature of this dataset is that each video file is accompanied by a metadata sequence of geo-tags consisting of both GPS locations and compass directions at fine-grained intervals. To the best of our knowledge, no existing public dataset contains user-generated, finely geo-tagged videos. With the additional spatial information, this video dataset can be used for advanced spatiotemporal video search, accelerating video content analysis technologies, video mining, and video retrieval and ranking. Future extensions of this dataset will mainly target the following directions: 1) providing more precise camera setting properties, e.g., zoom level and camera lens parameters; and 2) incorporating social information, e.g., the numbers of users who view, like, and share the videos, for social media applications.
6. REFERENCES
[1] GeoVid Project. http://geovid.org/.
[2] Google Street View Dataset. http://crcv.ucf.edu/data/GMCP_Geolocalization/#Dataset.
[3] MediaQ Project. http://mediaq1.cloudapp.net/home/.
[4] S. Arslan Ay, L. Zhang, S. H. Kim, M. He, and R. Zimmermann. GRVS: A georeferenced video search engine. In Proceedings of the 17th ACM International Conference on Multimedia, pages 977–978, 2009.
[5] S. A. Ay, S. H. Kim, and R. Zimmermann. Generating synthetic meta-data for georeferenced video management. In ACM SIGSPATIAL/GIS, pages 280–289, 2010.
[6] S. A. Ay, R. Zimmermann, and S. H. Kim. Viewable scene modeling for geospatial video search. In Proceedings of the 16th ACM International Conference on Multimedia, pages 309–318, 2008.
[7] Y. Cai, Y. Lu, S. H. Kim, L. Nocera, and C. Shahabi. GIFT: A geospatial image and video filtering tool for computer vision applications with geo-tagged mobile videos. In Multimedia and Expo Workshops, pages 1–6, 2015.
[8] V. R. Chandrasekhar, D. M. Chen, S. S. Tsai, N.-M. Cheung, H. Chen, G. Takacs, Y. Reznik, R. Vedantham, R. Grzeszczuk, J. Bach, et al. The Stanford mobile visual search data set. In Proceedings of the 2nd Annual ACM Conference on Multimedia Systems, pages 117–122, 2011.
[9] Y. Gao, M. Wang, Z.-J. Zha, J. Shen, X. Li, and X. Wu. Visual-textual joint relevance learning for tag-based social image search. IEEE Transactions on Image Processing, 22(1):363–376, 2013.
[10] L. Jiang, S.-I. Yu, D. Meng, T. Mitamura, and A. G. Hauptmann. Bridging the ultimate semantic gap: A semantic search engine for internet videos. In Proceedings of the 5th ACM International Conference on Multimedia Retrieval, pages 27–34, 2015.
[11] L. Kazemi and C. Shahabi. GeoCrowd: Enabling query answering with spatial crowdsourcing. In ACM SIGSPATIAL/GIS, pages 189–198, 2012.
[12] S. H. Kim, Y. Lu, G. Constantinou, C. Shahabi, G. Wang, and R. Zimmermann. MediaQ: Mobile multimedia management system. In Proceedings of the 5th ACM Multimedia Systems Conference, pages 224–235, 2014.
[13] S. H. Kim, Y. Lu, J. Shi, A. Alfarrarjeh, C. Shahabi, G. Wang, and R. Zimmermann. Key frame selection algorithms for automatic generation of panoramic images from crowdsourced geo-tagged videos. In Web and Wireless Geographical Information Systems, pages 67–84. Springer, 2014.
[14] Y. Lu, C. Shahabi, and S. H. Kim. An efficient index structure for large-scale geo-tagged video databases. In ACM SIGSPATIAL/GIS, pages 465–468, 2014.
[15] H. Ma, S. A. Ay, R. Zimmermann, and S. H. Kim. A grid-based index and queries for large-scale geo-tagged video collections. In DASFAA Workshops, pages 216–228, 2012.
[16] H. Ma, S. A. Ay, R. Zimmermann, and S. H. Kim. Large-scale geo-tagged video indexing and queries. GeoInformatica, Dec. 2013.
[17] G.-J. Qi, X.-S. Hua, Y. Rui, J. Tang, T. Mei, and H.-J. Zhang. Correlative multi-label video annotation. In Proceedings of the 15th International Conference on Multimedia, pages 17–26, 2007.
[18] Z. Shen, S. Arslan Ay, S. H. Kim, and R. Zimmermann. Automatic tag generation and ranking for sensor-rich outdoor videos. In Proceedings of the 19th ACM International Conference on Multimedia, pages 93–102, 2011.
[19] J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In Proceedings of the 9th IEEE International Conference on Computer Vision, pages 1470–1477, 2003.
[20] X. Tian, L. Yang, J. Wang, Y. Yang, X. Wu, and X.-S. Hua. Bayesian video search reranking. In Proceedings of the 16th ACM International Conference on Multimedia, pages 131–140, 2008.
[21] H. To, S. H. Kim, and C. Shahabi. Effectively crowdsourcing the acquisition and analysis of visual data for disaster response. In 2015 IEEE International Conference on Big Data, pages 697–706, 2015.
[22] G. Wang, Y. Lu, L. Zhang, A. Alfarrarjeh, R. Zimmermann, S. H. Kim, and C. Shahabi. Active key frame selection for 3D model reconstruction from crowdsourced geo-tagged videos. In Multimedia and Expo (ICME), pages 1–6, 2014.
[23] G. Wang, B. Seo, and R. Zimmermann. Automatic positioning data correction for sensor-annotated mobile videos. In ACM SIGSPATIAL/GIS, pages 470–473, 2012.
[24] Y. Yin, Y. Yu, and R. Zimmermann. On generating content-oriented geo features for sensor-rich outdoor video search. IEEE Transactions on Multimedia, 17(10):1760–1772, 2015.
[25] A. R. Zamir and M. Shah. Image geo-localization based on multiple nearest neighbor feature matching using generalized graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(8):1546–1558, 2014.