UNDERSTANDING SEMANTIC RELATIONSHIPS BETWEEN DATA OBJECTS

by

Na Chen

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)

August 2013

Copyright 2013 Na Chen
To my family
Acknowledgments

A PhD is a journey. Like every journey in life, there are many people who have helped me along the way. I would like to express my sincere gratitude.

First, to my advisor Prof. Viktor Prasanna, for his insightful guidance and generous support, and for challenging me to take my research to the next level. I would like to thank Prof. Prasanna, Prof. Dennis McLeod, and Prof. Raghu Raghavendra for taking their precious time to serve on my dissertation committee.

Second, to my husband and parents, for their love, encouragement and patience. I could not have completed my PhD without them.

Third, to p-group, for working with me, and for all the helpful discussions. Special thanks to Vikram Sorathia, Hao Wu, Charalampos Chelmis, Om Patri, Yinuo Zhang, Xing Shi, Gregory Harris, Jing Zhao and Janice Thompson.

Last but not least, I would like to thank Chevron Corporation and the Center for Interactive Smart Oilfield Technologies (CiSoft) for funding my research.

Once again, thank you all.
Table of Contents

Acknowledgments
List of Tables
List of Figures
Abstract

Chapter 1: Introduction
  1.1 Overview
    1.1.1 Understanding image semantics
    1.1.2 Semantic relationship mining
  1.2 Contributions
  1.3 Organization

Chapter 2: Related Work
  2.1 Image Understanding
  2.2 Image Clustering and Categorization
  2.3 Ranking on the Semantic Web
  2.4 Learning to Rank

Chapter 3: Understanding Web Images by Object Relation Network
  3.1 System Overview
  3.2 Guide Ontology
  3.3 Directed Graphical Model
    3.3.1 Ontology based energy
    3.3.2 Visual feature based energy
    3.3.3 Energy optimization
    3.3.4 ORN generation
  3.4 Experimental Results
    3.4.1 Energy function evaluation
    3.4.2 System evaluation

Chapter 4: Applications of Object Relation Network
  4.1 Automatic image tagging
  4.2 Automatic image description generation
  4.3 Image search by image
  4.4 Semantic Image Clustering
    4.4.1 Overview
    4.4.2 Bag-of-Semantics Model
    4.4.3 Image Clustering
    4.4.4 Experimental results

Chapter 5: Learning to Rank Semantic Relationships
  5.1 Semantic Association Features
    5.1.1 Association Length
    5.1.2 Topic Features
    5.1.3 Relation Complexity
    5.1.4 Property Frequency Features
    5.1.5 Popularity Features
  5.2 Learning to Rank
    5.2.1 The Ranking Framework
    5.2.2 The Ranking SVM Algorithm
    5.2.3 User Preferences and Weight Vector
  5.3 System Implementation
  5.4 Evaluation
    5.4.1 Experimental Setup
    5.4.2 Ranking Results
    5.4.3 Comparison 1: Ranking Quality per User
    5.4.4 Comparison 2: Overall Ranking Quality

Chapter 6: Rankbox: Adaptively Mining Semantic Relationships Using User Feedback
  6.1 System Overview
  6.2 Semantic Association Search
  6.3 Ranking Search Results
    6.3.1 Semantic association features
    6.3.2 Learning $w_u$ using LDA
  6.4 Adaptively Learning User Preferences
  6.5 System Implementation
  6.6 Experimental Results
    6.6.1 Experimental setup
    6.6.2 Evaluations

Chapter 7: Conclusion and Future Work
  7.1 Conclusion
  7.2 Future work

Bibliography
List of Tables

1.1 Typical results of the semantic association search between Harry Potter and James Potter
3.1 Evaluation results of the energy functions. The first column contains data for Figure 3.6. The last two columns show the gain in accuracy obtained by using the complete energy function E(L), where k is the number of detected objects.
3.2 Human evaluation of ORN and detection: possible scores range from 5 (perfect) to 1 (failure). k is the number of detected objects.
4.1 Example rules for inferring implicit knowledge from ORNs
5.1 Features sorted in descending order of importance for different users
5.2 Top 10 topics with the largest number of entities
5.3 Query set and result statistics
5.4 Average cumulative loss ratio, average total rank of users' top-10, precision@10, and nDCG$_{10}$. Our method outperforms the other two methods under all four metrics.
6.1 Example results of a semantic association search between two fictional characters, Harry Potter and Ginny Weasley
6.2 Test queries
List of Figures

1.1 Image understanding
1.2 Results of running person and ball detectors on a web image
1.3 Images and their Object Relation Networks automatically generated by our system
1.4 A small fraction of the RDF graph we created from Freebase data [31] under the topic "fictional universe". The color of each instance node denotes its class.
3.1 System pipeline for an example image: (a) the input to our system is an image with no metadata; (b) object detectors find generic objects; (c) the guide ontology contains background knowledge related to the detected objects and their relations; (d) a directed graphical model is constructed; (e) the best labeling of the graph model is predicted using energy minimization; (f) the output Object Relation Network represents the most probable yet ontologically-consistent class assignments for the directed graph model; (g) typical applications of ORNs.
3.2 Given an image with detected objects shown on the left, and the cardinality constraint Throw $\xleftarrow{1}$ Basketball, we add an edge between node pair $(r_{1,3}, r_{2,3})$ and penalize the energy function if both nodes are assigned as Throw.
3.3 Edges are added between object nodes that have the potential to form a collection. An energy bonus is given when the labeling results in a large and informative collection.
3.4 The probability distribution over person1's potential class assignments is estimated in a top-down manner.
3.5 The probability distributions are organized in a network to predict a most probable yet consistent labeling, which may in return improve the classification result.
3.6 Error rate of class assignments under three different scenarios.
3.7 The ontology we use for system evaluation. Constraints are only shown in the root layer for clarity.
3.8 Examples of "good" ORNs generated by our system
3.9 Examples of "bad" ORNs generated by our system
4.1 Tags and natural language descriptions automatically generated by our ORN-based approaches. Annotation results from the ALIPR system (http://alipr.com/) are also shown for reference. (Part 1)
4.2 Tags and natural language descriptions automatically generated by our ORN-based approaches. Annotation results from the ALIPR system (http://alipr.com/) are also shown for reference. (Part 2)
4.3 Image search results of our approach and Google Image Search by Image (http://images.google.com/), Part 1
4.4 Image search results of our approach and Google Image Search by Image (http://images.google.com/), Part 2
4.5 Images and their bag-of-semantics descriptions automatically generated by our system. ORNs are shown in the middle as intermediate results. The left two columns show the views of the bag-of-semantics models through two different lenses.
4.6 An example guide ontology
4.7 Different choices of lens result in different cluster hierarchies. Lenses are chosen to cluster images according to relation types and person types in (A) and (B) respectively.
4.8 Several "good" clustering results obtained by using very fine lenses, i.e., splitting subclasses of the Person class and all the relation classes. Each row contains example images and the label of a cluster.
4.9 Examples of "bad" clustering results. The clustering errors are usually caused by detection errors (e.g., false detection and missing detection).
5.1 A small fraction of our Freebase knowledge base schema with three topic regions
5.2 An example of a complex relation node
5.3 Support vector machine algorithm: a linear classifier is learned to separate positive and negative samples with maximal margin.
5.4 Architecture of our ranking system (∗: each user has his own ranking function.)
5.5 Screenshots of ranking results from our implementation. The number "#i" ahead of each result is the rank produced by the corresponding ranking approach. URank denotes the user-assigned rank. According to our experiment setup, only the user's 10 most favored results have URank values.
5.6 Cumulative loss ratio for each of the 20 users. This figure indicates that our approach (LtR) and LtR_CA always perform better than the baseline in capturing user preferences.
5.7 Total rank of each user's top 10 favorite results: before having seen all of his 10 most favored results, the number of results a user needs to examine in our method (LtR) is far smaller than that in the baseline.
5.8 Precision@k for all three methods
5.9 nDCG$_k$ for all three methods
6.1 Our adaptive ranking system for semantic association search. Components in bold text are specific to each user.
6.2 A simple RDF model with two topic regions at the schema level
6.3 LDA maximizes the distance between the projection means while at the same time minimizing the scatter within the set (illustrated in a 2-dimensional space).
6.4 The web graphical user interface of Rankbox
6.5 Iteration 1: a user who is interested in family and romantic relationships selects some search results and gives her feedback (user feedback is in blue).
6.6 Iteration 2: the user's preference has been reflected in the ranking results.
6.7 Precision@10 per iteration
6.8 The number of active users per iteration
Abstract

Semantic Web technologies are a standard, non-proprietary set of languages and tools that enable modeling, sharing, and reasoning about information. Words, terms and entities on the Semantic Web are connected through meaningful relationships, and thus enable a graph representation of knowledge with rich semantics (also known as an ontology). Understanding the semantic relationships between data objects is a critical step towards getting useful semantic information for better integration, search and decision-making. This thesis addresses the problem of semantic relationship understanding from two aspects: first, given an ontology schema, an automatic method is proposed to understand the semantic relationships between image objects using the schema as a useful semantic source; second, given a large ontology with both schema and instances, a learning-to-rank based ranking system is developed to identify the most relevant semantic relationships from the ontology according to user preferences.

The first part of this thesis presents an automatic method for understanding and interpreting the semantics of unannotated web images. We observe that the relations between objects in an image carry important semantics about the image. To capture and describe such semantics, we propose the Object Relation Network (ORN), a graph model representing the most probable meaning of the objects and their relations in an image. Guided and constrained by an ontology, the ORN transfers the rich semantics in the ontology to image objects and the relations between them, while maintaining semantic consistency (e.g., a soccer player can kick a soccer ball, but cannot ride it). We present an automatic system which takes a raw image as input and creates an ORN based on the image's visual appearance and the guide ontology. Our system is evaluated on a dataset containing over 26,000 web images. We demonstrate various useful web applications enabled by ORNs, such as automatic image tagging, automatic image description generation, image search by image, and semantic image clustering.

In the second part of this thesis, a learning-to-rank based ranking system is proposed for mining complex relationships on the Semantic Web. Our objective is to provide an effective ranking method for complex relationship mining, which can 1) automatically personalize ranking results according to user preferences, 2) be continuously improved to more precisely capture user preferences, and 3) hide as many technical details from end users as possible. We observe that a user's opinions on search results carry important information regarding his interests and search intentions. Based on this observation, our system allows each user to give simple feedback about the current search results, and employs a machine-learning based ranking algorithm to learn the user's preferences from his feedback. A personalized ranking function is then generated and used to sort the results of each subsequent query by the user. The user can keep teaching the system his preferences by giving feedback through several iterations until he is satisfied with the search results. Our system is evaluated on a large RDF knowledge base created from Freebase linked open data. The experimental results demonstrate the effectiveness of our method compared with the state-of-the-art.
Chapter 1

Introduction

Semantic Web [10] technologies, proposed by the W3C (World Wide Web Consortium), are a standard, non-proprietary set of languages and tools that enable modeling, sharing, and reasoning about information. The Semantic Web aims at creating a web of data that can be understood and processed by computers.

An ontology, as one of the fundamental components of the Semantic Web, is a data model representing data objects and the semantic relations between them. An ontology provides a formal and explicit graph representation of information, in which the nodes represent concepts and entities, while the edges represent meaningful relationships between the nodes. Ontologies have been used as the structural frameworks for organizing information in various domains, such as artificial intelligence, social networks, biomedical informatics, systems engineering, and energy informatics.

Semantic relationships are at the heart of the Semantic Web and ontologies. They connect words, terms and entities through meaning, and thus enable a web of linked data with rich semantics. Understanding the semantic relationships between data objects is a critical step towards getting useful semantic information for better integration, search and decision-making.

This thesis addresses the problem of semantic relationship understanding from two aspects: first, given a metadata ontology, an automatic method is proposed to understand the semantic relationships between image objects using the metadata ontology as a useful semantic source; second, given a large ontology with both metadata and instance data, a learning-to-rank based ranking system is developed to identify the most relevant semantic relationships from the ontology according to user preferences.

Figure 1.1: Image understanding
1.1 Overview

1.1.1 Understanding image semantics

Understanding the semantics of images has been a critical component in many applications, such as automatic image description and image search. The goal of image understanding is to enable computers to tell the meaning of an image, such as the scene, actions and objects in the image (Figure 1.1). Manual annotation, particularly tagging, has been considered a reliable source of image semantics due to its human origins. Yet manual annotation can be very time-consuming and expensive when dealing with web-scale image data. Advances in the Semantic Web have made ontologies another useful source for describing image semantics (e.g., [59]). An ontology builds a formal and explicit representation of semantic hierarchies for the concepts and their relationships in images, and allows reasoning to derive implicit knowledge. However, the gap between ontological semantics and image visual appearance is still a hindrance to automated ontology-driven image annotation. With the rapid growth of image resources on the world-wide-web, vast amounts of images with no metadata have emerged. Thus automatically understanding raw images solely based on their visual appearance becomes an important yet challenging problem.

Figure 1.2: Results of running person and ball detectors on a web image: (a) input image; (b) bounding boxes and detected objects (Person1, Ball1).
Advances in computer vision have offered computers an eye to see the objects in images. In particular, object detection [26] can automatically detect what is in the image and where it is. For example, given Fig. 1.2(a) as the input image to object detectors, Fig. 1.2(b) shows the detected objects and their bounding boxes. However, current detection techniques have two main limitations. First, detection is limited to isolated objects and cannot see through to the relations between them. Second, only generic objects are detected; detection quality can drop significantly when detectors attempt to assign more specific meaning to these objects. For instance, detectors successfully detected one person and one ball in Fig. 1.2(b), but cannot further tell whether the person is throwing or kicking the ball, or whether the person is playing with a basketball or a soccer ball.
The first part of this thesis presents an automatic system for understanding raw web images (represented in pixels, without any annotations), by taking advantage of both ontologies and object detection. Given a raw image as input, our system adopts object detection as an eye in pre-processing to find generic objects in the image, and employs a guide ontology as a semantic source of background knowledge. We propose the Object Relation Network (ORN) to transfer rich semantics in the guide ontology to the detected objects and their relations. In particular, an ORN is defined as a graph model representing the most probable ontological class assignments for the objects and their relations. Our method automatically generates the ORN for an image by solving an energy optimization problem over a directed graphical model. The output ORN can be regarded as an instantiation of the guide ontology with respect to the input image. Fig. 1.3 illustrates three web images and their ORNs automatically generated by our system.

Figure 1.3: Images and their Object Relation Networks automatically generated by our system
Object Relation Networks can be applied to many web applications that need automatic image understanding. In particular, this thesis demonstrates four applications:

• Automatic image tagging: With a few simple inference rules, ORNs can automatically produce informative tags describing entities, actions, and even scenes in images.

• Automatic image description generation: A natural language description of an image can be automatically generated based on its ORN, using a simple template-based approach.

• Image search by image: Given a query image, the objective is to find images semantically similar to the query image in an image library. We show that the distance between ORN graphs is an effective measurement of image semantic similarity (a small sketch of this idea follows the list). Search results consist of images with ORNs that are close to the query image's ORN, ranked by ORN distances.

• Semantic image clustering: We propose a novel method to organize a collection of images into a hierarchy of clusters based on image semantics. Our method describes the semantics of each image with a bag-of-semantics model (i.e., a set of meaningful descriptors) derived from the image's ORN. We adopt the class hierarchies in a guide ontology as different levels of lenses to view the bag-of-semantics models. Image clusters are automatically extracted by grouping images with the same bag-of-semantics viewed through a certain lens. In addition, our method allows each user to control the clustering process while browsing, and thus dynamically adjusts the clustering result according to the user's preferences.
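As a concrete illustration of the image-search application, the sketch below ranks a small library by a distance between ORN graphs. The triple-set encoding and the Jaccard-style distance are hypothetical stand-ins chosen for brevity; the ORN distance actually used by the system is defined in Chapter 4.3.

```python
# A minimal sketch of ranking library images by ORN distance. The triple-overlap
# distance is an assumed measure, not the thesis's actual graph distance.

def orn_triples(orn):
    """Represent an ORN as a set of (subject_class, relation_class, object_class) triples."""
    return {(s, r, o) for (s, r, o) in orn}

def orn_distance(orn_a, orn_b):
    """Jaccard distance between the triple sets of two ORNs (illustrative only)."""
    a, b = orn_triples(orn_a), orn_triples(orn_b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

# Usage: rank a library by semantic similarity to the query image's ORN.
query_orn = [("BasketballPlayer", "throw", "Basketball")]
library = {
    "img1": [("BasketballPlayer", "hold", "Basketball")],
    "img2": [("SoccerPlayer", "kick", "SoccerBall")],
    "img3": [("BasketballPlayer", "throw", "Basketball")],
}
ranked = sorted(library, key=lambda k: orn_distance(query_orn, library[k]))
print(ranked)  # img3 first: identical semantics to the query
```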
1.1.2 Semantic relationship mining

One of the most fundamental tasks in data mining on the Semantic Web is to find complex semantic relationships between entities. A complex semantic relationship, also known as a semantic association, is defined as an undirected path that connects two resource entities in an RDF graph [5]. A semantic association search seeks to obtain all meaningful relations between two given entities. Table 1.1 shows a few example results of a semantic association search on the RDF graph in Figure 1.4.
Figure 1.4: A small fraction of the RDF graph we created from Freebase data [31] under the topic "fictional universe". The color of each instance node denotes its class.
1. Harry Potter −has_parent→ James Potter
2. Harry Potter −education→ Hogwarts School −student_graduate→ James Potter
3. Harry Potter −married_to→ Ginny Weasley −education→ Hogwarts School −student_graduate→ James Potter
4. Harry Potter −has_possessed→ Elder Wand −owner→ Albus Dumbledore −founder_of→ Order of the Phoenix −has_member→ James Potter
5. Harry Potter −power_or_ability→ Magic −character_with_this_ability→ James Potter

Table 1.1: Typical results of the semantic association search between Harry Potter and James Potter
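To make the search itself concrete, the following sketch enumerates undirected paths up to a length bound between two entities in a small RDF graph. The triple encoding and the depth-first strategy are illustrative assumptions, not the query engine used later in this thesis.

```python
# A minimal sketch of a semantic association search: enumerate all undirected
# simple paths up to max_len between two entities in an RDF graph.

def associations(triples, src, dst, max_len):
    """Yield paths (lists of (entity, property, entity) hops) from src to dst."""
    # Index edges in both directions, since associations are undirected paths.
    adj = {}
    for s, p, o in triples:
        adj.setdefault(s, []).append((p, o))
        adj.setdefault(o, []).append((p, s))

    def dfs(node, path, visited):
        if node == dst and path:
            yield list(path)
            return                      # dst is never an intermediate node
        if len(path) >= max_len:
            return
        for prop, nxt in adj.get(node, []):
            if nxt not in visited:      # keep paths simple (no cycles)
                visited.add(nxt)
                path.append((node, prop, nxt))
                yield from dfs(nxt, path, visited)
                path.pop()
                visited.remove(nxt)

    yield from dfs(src, [], {src})

triples = [
    ("Harry Potter", "has_parent", "James Potter"),
    ("Harry Potter", "education", "Hogwarts School"),
    ("James Potter", "student_graduate", "Hogwarts School"),
]
for path in associations(triples, "Harry Potter", "James Potter", max_len=3):
    print(path)   # finds rows 1 and 2 of Table 1.1
```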
As the volume of semantic data increases, a semantic association search is very likely to return too many results for a human user to digest. For example, we parsed the entire fictional universe domain of Freebase linked open data [31] into an RDF knowledge base containing 192K resources and 411K properties. We observed that in such a knowledge base, even a simple query (e.g., between Harry Potter and James Potter) with a strict path length restriction (e.g., 10) returns thousands of semantic associations. To ensure that the most relevant results are shown first to the user, an effective ranking method has become an important necessity for semantic association mining.
In addition, similar to web search, understanding user preferences is a key challenge in producing personalized semantic association search results. Different users can have different preferences in terms of personal interests and search intentions. Given that such preferences are difficult to express explicitly in current semantic association query languages [8, 39], it is the ranking method's responsibility to cater for each individual user's specific preferences.

Many ranking methods have been proposed towards producing personalized semantic association search results. Some methods (e.g., [6, 7, 38]) allow users to manually configure certain ranking parameters. But manual tuning requires users to have a good understanding of the ranking scheme, and can be very difficult for inexperienced users.
To alleviate these problems, the second part of this thesis proposes two ranking methods for complex semantic relationship search based on user preferences. The first method employs a learning-to-rank algorithm to capture each user's preferences. Using this, it automatically constructs a personalized ranking function for the user. The ranking function is then used to sort the results of each subsequent query by the user. Query results that more closely match the user's preferences gain higher ranks.

However, this method has two limitations. First, user labeling is a tedious and time-consuming task; a user has to examine thousands of results during the labeling process. Second, the ranking function cannot be improved once it is learned from the user-labeled results, which may cause unsatisfactory ranking results if the training data is insufficient to cover all preferences of the user.
To improve upon our first method, we present Rankbox, an adaptive ranking system for mining complex relationships on the Semantic Web. Our objective is to provide a more effective and more user-friendly ranking method for complex relationship mining, which can 1) automatically personalize ranking results according to user preferences, 2) be continuously improved to more precisely capture user preferences, and 3) hide as many technical details from end users as possible. We observe that a user's opinions on search results carry important information regarding his interests and search intentions. Based on this observation, our system allows each user to give simple feedback about the current search results, and employs a machine-learning based ranking algorithm to learn the user's preferences from his feedback. A personalized ranking function is then generated and used to sort the results of the user's subsequent queries. The user can keep teaching the system his preferences by giving feedback through several iterations until he is satisfied with the search results. Our system is implemented and deployed on a web server that can be easily accessed through web browsers.
1.2 Contributions

The main contributions of this thesis, summarized below, lie in two areas: understanding image semantics and semantic relationship mining.

Understanding image semantics: We propose and exploit the Object Relation Network towards automatic web image understanding. The ORN is an intuitive and expressive graph model for representing the semantics of web images. It presents the most probable meaning of image objects and their relations, while maintaining semantic consistency throughout the network.

We combine ontological knowledge and image visual features into a probabilistic graphical model. By solving an optimization problem on this graphical model, our method automatically transfers rich semantics in the ontology to a raw image, generating an ORN in which the isolated objects detected from the image are connected through meaningful relations.

We propose and demonstrate four application scenarios that can benefit from ORNs: automatic image tagging, automatic image description generation, image search by image, and semantic image clustering.

Semantic relationship mining: To the best of our knowledge, we are the first to apply learning-to-rank methods for ranking semantic association search results. Our methods can automatically capture user preferences and generate personalized ranking results without any manual tuning.

We propose an adaptive ranking system for semantic association search. The system adapts to user preferences by continuously collecting and learning from user feedback. To the best of our knowledge, our system presents the first interactive learning-to-rank method for semantic association search.

Our system acts as a black box to end users, as it hides all technical details of the ranking scheme from them. Rather than dealing with tedious parameter tuning or training data labeling, users only need to provide simple and intuitive feedback to the system if the current search results need improvement.

Our system can be easily accessed from any device with a modern web browser. The evaluation based on a large linked-open-data dataset demonstrates the advantages of our system compared to the state-of-the-art.
1.3 Organization

The rest of the thesis is organized as follows. Chapter 2 reviews related work in image understanding and semantic relationship mining. Chapter 3 proposes the Object Relation Network (ORN) and presents our automatic system for understanding web images. Chapter 4 demonstrates four application scenarios of ORNs. Chapter 5 presents our first learning-to-rank method for mining complex semantic relationships based on user preferences. A more efficient and more user-friendly adaptive semantic relationship mining system (Rankbox) is discussed in Chapter 6. Chapter 7 concludes the thesis and discusses possible future work.
Chapter 2

Related Work

The work in this thesis is related to the following areas: image understanding, image clustering and categorization, ranking on the Semantic Web, and learning to rank.
2.1 Image Understanding

Image understanding with keywords and text

Some research achievements have been made in the web community towards understanding web image semantics with keywords and text, such as tag recommendation, tag ranking, and transfer learning from text to images. Tag recommendation [60, 67] enriches the semantics carried by existing tags by suggesting similar tags. Tag ranking [45, 66] identifies the most relevant semantics among existing tags. Transfer learning from text to images [54] builds a semantic linkage between text and images based on their co-occurrence. These methods all require images to have meaningful initial tags or relevant surrounding text, and thus do not work for untagged images or images surrounded by irrelevant text.

Different from these methods, our image understanding system can interpret the semantics of raw images with no keywords or surrounding text.
Image understanding using visual appearance

The computer vision community has made great progress in automatically identifying static objects in images, also known as object detection. The PASCAL visual object classes challenge (VOC) is an annual competition to evaluate the performance of detection approaches. For instance, in the VOC 2011 challenge [23], detectors are required to detect twenty object classes from over 10,000 flickr images. These efforts have made object detectors a robust and practical tool for extracting generic objects from images (e.g., [26]). On the other hand, detection and segmentation are usually localized operations and thus lose information about the global structure of an image. Therefore, contextual information is introduced to connect the localized operations and the global structure. In particular, researchers implicitly or explicitly introduce a probabilistic graphical model to organize pixels, regions, or detected objects. The probabilistic graphical model can be a hierarchical structure [63, 44], a directed graphical model [62], or a conditional random field [34, 55, 40, 41]. These methods are similar in spirit to our method; however, there are two key differences: (1) we introduce an ontology to provide semantics for both relations and objects, while previous research (even with an ontology [52]) focuses on spatial relationships, such as on-top-of and beside, which are insufficient to satisfy the semantic demands of web applications; (2) previous research usually focuses on improving local operations or providing a general description of the entire scene; it does not explicitly reveal the semantic relations between objects and is thus less informative than our Object Relation Network model.
Ontology-aided image annotation

A number of annotation ontologies (e.g., [59, 57]) have been proposed to provide description templates for image and video annotation. Concept ontology [24] characterizes the inter-concept correlations to help image classification. Lexical ontologies, particularly the WordNet [25] ontology, describe the semantic hierarchies of words. WordNet groups words into sets of synonyms and records different semantic relations between them, such as antonymy, hypernymy and meronymy. The WordNet ontology has been used to: (1) improve or extend existing annotations of an image [61, 20], (2) provide knowledge about the relationships between object classes for object category recognition [50], and (3) organize the structure of image databases [21].

Different from the prior work, we exploit ontologies to provide background knowledge for automatic image understanding. In particular, the key difference between our guide ontology and the ontology in [50] is that our guide ontology contains semantic hierarchies of both object classes and relation classes, and supports various semantic constraints.
2.2 Image Clustering and Categorization

Image Clustering

Pioneering image clustering research [56, 17, 32] extracts low-level visual features from input images, and applies different clustering algorithms based on these visual features. These algorithms include distance-based clustering [56, 51], Ncut [17], locality-preserving clustering [70], and agglomerative clustering [32]. In particular, trees are suggested to be a natural organization of clusters [17]. But in these pioneering efforts, there is no correspondence between the cluster tree and the structure of image semantics.

For web images, textual context is believed to be a useful addition to the visual features. Co-clustering approaches are introduced to integrate visual features and multiple context features such as surrounding text [15, 29], links [15, 65], and attributes of various data objects [65]. In addition, Jing et al. [36] identify semantic clusters related to a given query, and assign the result images to the clusters. These methods work well for specific web applications, but lose generality and accuracy when dealing with images with limited or irrelevant web context.

The recent work by Biswas and Jacobs [12] develops an image clustering algorithm that allows humans to improve the initial visual-feature-based pairwise image similarity. Although their method also considers user control, there are two key differences with our method: (1) in our method, rather than comparing the similarity of many pairs of images, a user can control the clustering hierarchy by splitting nodes in the current lens; (2) our method can support concrete semantics across various object and relation categories.
Image Categorization

In computer vision, image categorization aims to label images with one of a number of predefined categories [16]. Instead of directly using low-level visual features (e.g., colors and textures), intermediate representations are frequently introduced to capture image semantics. For example, the well-known bag-of-words model [42, 13, 30] describes an image as a bag of visual codewords and provides various measurements of image similarity. Another popular intermediate representation consists of image regions created from segmentation. With this representation, the image categorization problem can be formulated as a multiple-instance learning (MIL) problem by viewing an image as a bag of instances [16, 11, 47]. Ontologies have also been used to provide formal taxonomic hierarchies for various image categories [64].

Although these image categorization methods share some similarities with our approach, we are the first to exploit the relations between objects in images. By exploring the relations between image objects, our bag-of-semantics model can express concrete semantics such as "basketball player" and "soccer player". In contrast, the bag-of-words model uses visual codewords that are directly derived from the raw image. Thus, it can hardly express concrete semantics. In addition, we are the first to enable user control in the clustering process. A user can make intuitive adjustments to intermediate clustering results to ensure the clusters faithfully capture his preferences.
2.3 Ranking on the Semantic Web

Ranking knowledge on the Semantic Web has recently received a great amount of research interest. Many efforts have been made to address this problem at three different levels: the ontology level, the resource level and the relationship level.

For ontology- and resource-level ranking, the objective is to determine the relevance of each individual ontology and resource respectively. For example, AKTiveRank [4] applies a number of analytic methods to rank ontologies based on how well they represent the given search terms. Harth et al. [33] and Ding et al. [22] use methods similar to PageRank [53] to rank ontologies and resources by analyzing links and referrals between them.

For relationship-level ranking, the objective is to determine the relevance of semantic relationships between a pair of resources; examples include SemRank [7] and Aleman-Meza et al.'s method [6] as mentioned in Section 1.1.2. Another example at the relationship level is NAGA [38], which ranks semantic associations based on three metrics: confidence, informativeness and compactness. NAGA employs two configurable parameters that can be manually tuned for better ranking quality.

The study in this thesis stays at the relationship level. Compared to other ranking methods for semantic relationships, we introduce a novel machine-learning based ranking approach. Our method can provide personalized ranking results with minimal user interference, while the others all require a certain amount of manual tuning to achieve such results.
2.4 Learning to Rank

Learning to rank, also known as machine-learning based ranking, refers to a set of machine learning algorithms that can automatically construct a ranking model from training data [46]. The training data consists of queries and ranked lists of results for those queries. A ranking model is learned from the training data, and then applied to rank the results of unseen queries. The objective of the ranking model is to sort the results of unseen queries in a way that is similar to the rankings in the training data.

Current learning-to-rank algorithms can be classified into three types: pointwise (e.g., [19]), pairwise (e.g., [37]), and listwise (e.g., [68]). Pointwise approaches usually use regression algorithms to predict a score for each single query-result pair. Pairwise approaches attack the ranking problem as a classification problem: given a pair of query results, a binary classifier is learned to tell which result is better, with the goal of minimizing the average number of inversions in the ranking. Listwise algorithms produce ranking models by directly optimizing the ranking quality over all queries in the training data.

The learning-to-rank algorithms we develop in this thesis belong to the pairwise category. Because our semantic relationship mining system does not rely on a particular learning-to-rank algorithm, algorithms of the other categories can also be used in our system.
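To illustrate the pairwise category, the sketch below shows the standard reduction used by algorithms such as Ranking SVM: every pair of results with different relevance for the same query becomes a binary classification sample on the feature difference. The feature vectors and relevance labels are made-up illustrations, not data from our experiments.

```python
# A minimal sketch of the pairwise learning-to-rank reduction.

import numpy as np

def pairwise_samples(features, relevance):
    """Turn one query's results into (x_i - x_j, +/-1) classification pairs."""
    X, y = [], []
    n = len(features)
    for i in range(n):
        for j in range(n):
            if relevance[i] > relevance[j]:
                X.append(features[i] - features[j])  # i should rank above j
                y.append(+1)
                X.append(features[j] - features[i])  # mirrored negative sample
                y.append(-1)
    return np.array(X), np.array(y)

features = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.2]])  # one row per result
relevance = [2, 1, 0]                                       # higher = better
X, y = pairwise_samples(features, relevance)
# A linear classifier trained on (X, y) yields a weight vector w; sorting
# results by w . x recovers a ranking consistent with the training preferences.
```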
Chapter 3

Understanding Web Images by Object Relation Network

This chapter presents an automatic method for understanding and interpreting the semantics of unannotated web images. We observe that the relations between objects in an image carry important semantics about the image. To capture and describe such semantics, we propose the Object Relation Network (ORN), a graph model representing the most probable meaning of the objects and their relations in an image. Guided and constrained by an ontology, the ORN transfers the rich semantics in the ontology to image objects and the relations between them, while maintaining semantic consistency (e.g., a soccer player can kick a soccer ball, but cannot ride it). We present an automatic system which takes a raw image as input and creates an ORN based on the image's visual appearance and the guide ontology.
3.1 System Overview

An overview of our system is illustrated in Figure 3.1. Taking an unannotated image (Figure 3.1(a)) as input, our system first employs a number of object detectors to detect generic objects in the input image (Figure 3.1(b)). The guide ontology (Figure 3.1(c), detailed in Section 3.2) contains useful background knowledge such as semantic hierarchies and constraints related to the detected objects and their relations. Our system then constructs a directed graphical model as a primitive relation network among the detected objects (Figure 3.1(d)). We define a set of energy functions to transfer two kinds of knowledge to the graphical model: (1) background knowledge from the guide ontology, and (2) probabilities of potential ontological class assignments to each node, estimated from the visual appearance of the node. Definitions of these energy functions are detailed in Section 3.3. By solving an energy minimization problem (Figure 3.1(e)), we obtain the best labeling over the graphical model, i.e., the most probable yet ontologically-consistent class assignments over the entire node set of the graphical model. The Object Relation Network (ORN) is generated by applying the best labeling to the graphical model (Figure 3.1(f)), as the output of our system. The ORN can also be regarded as an instantiation of the guide ontology. Finally, we propose and demonstrate three application scenarios of ORNs, including automatic image tagging, automatic image description generation, and image search by image (Figure 3.1(g)).

Figure 3.1: System pipeline for an example image: (a) the input to our system is an image with no metadata; (b) object detectors find generic objects; (c) the guide ontology contains background knowledge related to the detected objects and their relations; (d) a directed graphical model is constructed; (e) the best labeling of the graph model is predicted using energy minimization; (f) the output Object Relation Network represents the most probable yet ontologically-consistent class assignments for the directed graph model; (g) typical applications of ORNs.
3.2 Guide Ontology

The source of semantics in our system is a guide ontology. It provides useful background knowledge about image objects and their relations. An example guide ontology is shown in Figure 3.1(c).

In general, guide ontologies should have three layers. The root layer contains three general classes, Object, OO-Relation, and Object Collection, denoting the class of image objects, the class of binary relations between image objects, and the class of image object collections, respectively. The detection layer contains classes of the generic objects that can be detected by the object detectors in our system. Each of these classes is an immediate subclass of Object, and corresponds to a generic object (e.g., person and ball). The semantic knowledge layer contains background knowledge about the semantic hierarchies of object classes and relation classes, and the constraints on relation classes. Each object class at this layer must have a superclass in the detection layer, while each relation class must be a subclass of OO-Relation. For implementation convenience, we require each relation class at the semantic knowledge layer to have a superclass in the detection layer, denoting a general relation class between two generic object classes. If no such relation class exists, we create a dummy class to satisfy the requirement, e.g., P-B Relation in Figure 3.1(c).
Conceptually, any ontology regarding the detectable objects and their relations can be adapted into our system as part of the semantic knowledge layer, ranging from a tiny ontology which contains only one relation class with a domain class and a range class, to large ontologies such as WordNet [25]. However, ontologies with more hierarchical information and more restrictions are always preferred since they carry more semantic information. Our system supports four typical types of background knowledge in the guide ontology (see the sketch after this list):

• Subsumption is a relationship where one class is a subclass of another, denoted as $A \sqsubseteq B$. E.g., BasketballPlayer is a subclass of Athlete.

• Domain/range constraints assert the domain or range object class of a relation class, denoted as $domain(C)$ and $range(C)$. E.g., in Figure 3.1(c), the domain and the range of Kick must be SoccerPlayer and SoccerBall respectively.

• Cardinality constraints limit the maximum number of relations of a certain relation class that an object can have, where the object's class is a domain/range of the relation class. E.g., in Figure 3.1(c), BasketballPlayer $\xrightarrow{1}$ Throw means that a BasketballPlayer can have at most one Throw relation.

• Collection refers to a set of image objects belonging to the same object class, denoted as $collection(C)$.
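The sketch below shows one possible in-memory encoding of these four kinds of background knowledge. The class and relation names mirror Figure 3.1(c); the dataclass layout itself is an illustrative assumption, not the thesis's implementation.

```python
# A minimal sketch of a guide-ontology data structure (assumed encoding).

from dataclasses import dataclass, field

@dataclass
class RelationClass:
    name: str
    domain: str                  # required domain object class
    range: str                   # required range object class
    max_per_domain: int = None   # cardinality constraint (None = unbounded)

@dataclass
class GuideOntology:
    subclass_of: dict = field(default_factory=dict)   # subsumption edges
    relations: dict = field(default_factory=dict)     # relation constraints
    collections: set = field(default_factory=set)     # classes that form collections

    def is_subclass(self, a, b):
        """True if a == b or a is (transitively) a subclass of b."""
        while a is not None:
            if a == b:
                return True
            a = self.subclass_of.get(a)
        return False

ont = GuideOntology(
    subclass_of={"Athlete": "Person", "SoccerPlayer": "Athlete",
                 "BasketballPlayer": "Athlete", "SoccerBall": "Ball"},
    relations={"Kick": RelationClass("Kick", "SoccerPlayer", "SoccerBall"),
               "Throw": RelationClass("Throw", "BasketballPlayer", "Basketball",
                                      max_per_domain=1)},
    collections={"Person", "SoccerPlayer"},
)
assert ont.is_subclass("SoccerPlayer", "Person")
```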
3.3 Directed Graphical Model

The core of our system is a directed graphical model $G = (V, E)$. It is a primitive relation network connecting the detected objects through relations. In particular, given a set of detected objects $\{O_i\}$ in the input image, we create one object node $o_i$ for each object $O_i$, and one relation node $r_{i,j}$ for each object pair $\langle O_i, O_j \rangle$ that has a corresponding relation class in the detection layer of the guide ontology (e.g., object pair $\langle person1, ball1 \rangle$ corresponding to class P-B Relation), indicating that the two objects have potential relations. For each relation node $r_{i,j}$, we create two directed edges $(o_i, r_{i,j})$ and $(r_{i,j}, o_j)$. These nodes and edges form the basic structure of $G$.

We now consider a labeling problem over the node set $V$ of graph $G$: for each node $v \in V$, we label it with a class assignment from the subclass tree rooted at the generic class corresponding to $v$. In particular, we denote the potential class assignments for object node $o_i$ as $\mathcal{C}(o_i) = \{C_o \mid C_o \sqsubseteq C_g(O_i)\}$, where $C_g(O_i)$ is the generic class of object $O_i$ (e.g., Person for object person1). Similarly, the set of potential class assignments for relation node $r_{i,j}$ is defined as $\mathcal{C}(r_{i,j}) = \{C_r \mid C_r \sqsubseteq C_g(O_i, O_j)\}$, where $C_g(O_i, O_j)$ is the corresponding relation class in the detection layer (e.g., P-B Relation). A labeling $L : \{v \rightsquigarrow C(v)\}$ is feasible when we have $C(v) \in \mathcal{C}(v)$ for each node $v \in V$.

The best feasible labeling $L_{optimal}$ is required to (1) satisfy the ontology constraints, (2) be as informative as possible, and (3) maximize the probability of the class assignment on each node regarding visual appearance. We predict $L_{optimal}$ by minimizing an energy function $E$ over labeling $L$ with respect to an image $Img$ and a guide ontology $Ont$:

$$E(L) = E_c(L; Ont) + E_i(L; Ont) + E_v(L; Img) \quad (3.1)$$

representing the sum of the constraint energy, the informative energy, and the visual energy, which are detailed in the following subsections respectively.
3.3.1 Ontology based energy

We define energy functions based on background knowledge in the guide ontology $Ont$.

Domain/range constraints restrict the potential class assignments of a relation's domain or range. Thus, we define a domain-constraint energy for each edge $e = (o_i, r_{i,j})$ and a range-constraint energy for each edge $e = (r_{i,j}, o_j)$:

$$E_c^D(o_i \rightsquigarrow C_o, r_{i,j} \rightsquigarrow C_r) = \begin{cases} 0 & \text{if } C_o \sqsubseteq domain(C_r) \\ \infty & \text{otherwise} \end{cases}$$

$$E_c^R(r_{i,j} \rightsquigarrow C_r, o_j \rightsquigarrow C_o) = \begin{cases} 0 & \text{if } C_o \sqsubseteq range(C_r) \\ \infty & \text{otherwise} \end{cases}$$

Intuitively, they add a strong penalty to the energy function when any of the domain/range constraints is violated.

Cardinality constraints restrict the number of instances of a certain relation class that an object can take. We are particularly interested in a cardinality constraint of 1, since it is the most common case in practice. In order to handle this type of constraint, we add additional edges as shown in Figure 3.2. In particular, if a relation class $C_r$ has a cardinality constraint of 1 with its domain (or range), we create an edge between any relation node pair $(r_{i,j}, r_{i,k})$ (or $(r_{i,j}, r_{k,j})$ when dealing with range) in which both nodes have the same domain node $o_i$ (or range node $o_j$) and both nodes have the potential of being labeled with relation class $C_r$. A cardinality constraint energy is defined on these additional edges:

$$E_c^{D\text{-}Card}(r_{i,j} \rightsquigarrow C_1, r_{i,k} \rightsquigarrow C_2) = \begin{cases} \infty & \text{if } C_1 = C_2 = C_r \\ 0 & \text{otherwise} \end{cases}$$

$$E_c^{R\text{-}Card}(r_{i,j} \rightsquigarrow C_1, r_{k,j} \rightsquigarrow C_2) = \begin{cases} \infty & \text{if } C_1 = C_2 = C_r \\ 0 & \text{otherwise} \end{cases}$$

Intuitively, they penalize the energy function when two relations are assigned as $C_r$ and have the same domain object (or range object).

Figure 3.2: Given an image with detected objects shown on the left, and the cardinality constraint Throw $\xleftarrow{1}$ Basketball, we add an edge between node pair $(r_{1,3}, r_{2,3})$ and penalize the energy function if both nodes are assigned as Throw.
Depth information is defined as the depth of a class assignment in the subclass tree rooted at its generic class. Intuitively, we prefer deep class assignments, which are more specific and more informative. In contrast, general class assignments with small depth should be penalized since they are less informative and thus may be of less interest to the user. In the extremely general case where generic object classes are assigned to the object nodes and OO-Relation is assigned to all the relation nodes, the labeling is feasible but should be avoided since little information is revealed. Therefore, we add an energy function for each node $o_i$ or $r_{i,j}$ concerning depth information:

$$E_i^O(o_i \rightsquigarrow C_o) = -\omega_{dep} \cdot depth(C_o)$$

$$E_i^R(r_{i,j} \rightsquigarrow C_r) = -\omega_{dep} \cdot depth(C_r)$$
Collection refers to a set of object nodes with the same class assignment. Intuitively, we prefer collections with larger size, as they tend to group more objects in the same image into the same class. For example, in Figure 3.3, when the two persons in the front are labeled with SoccerPlayer due to the strong observation that they may Kick a SoccerBall (how to make observations from visual features is detailed in Section 3.3.2), it is quite natural to label the third person with SoccerPlayer as well, since the three of them will form a relatively larger SoccerPlayers Collection. In addition, we give a bonus to collections that are deeper in the ontology, e.g., we prefer SoccerPlayers Collection to Person Collection. Integrating collection information into our energy minimization framework is a bit complicated, since we do not explicitly have graph nodes for collections. Therefore, we add edges between object nodes $(o_i, o_j)$ when they belong to the same generic object class that has the potential to form a collection (e.g., Figure 3.3 right), and define an energy function for each such edge:

$$E_i^{Col}(o_i \rightsquigarrow C_1, o_j \rightsquigarrow C_2) = \begin{cases} -\omega_{col} \cdot \frac{2}{N-1} \cdot depth(collection(C_o)) & \text{if } C_1 = C_2 = C_o \\ 0 & \text{otherwise} \end{cases}$$

where $\omega_{col}$ is a weight, and $\frac{2}{N-1}$ is a normalization factor with $N$ representing the number of object nodes that can potentially be labeled with $C_o$.
Figure 3.3: Edges are added between object nodes that have the potential to form a collection. An energy bonus is given when the labeling results in a large and informative collection.
Finally, the ontology based constraint energy $E_c(L; Ont)$ and informative energy $E_i(L; Ont)$ are the sums of these energy functions:

$$E_c(L; Ont) = \sum_{(o_i, r_{i,j})} E_c^D + \sum_{(r_{i,j}, o_j)} E_c^R + \sum_{(r_{i,j}, r_{i,k})} E_c^{D\text{-}Card} + \sum_{(r_{i,j}, r_{k,j})} E_c^{R\text{-}Card} \quad (3.2)$$

$$E_i(L; Ont) = \sum_{o_i} E_i^O + \sum_{r_{i,j}} E_i^R + \sum_{(o_i, o_j)} E_i^{Col} \quad (3.3)$$
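To make the ontology-based terms concrete, the sketch below evaluates the domain/range part of Eqn. (3.2) and the depth part of Eqn. (3.3) for a candidate labeling. Cardinality and collection terms are omitted for brevity, and the dict-based ontology encoding and the weight value are illustrative assumptions.

```python
# A minimal sketch of evaluating the ontology-based energies for one labeling.

INF = float("inf")
W_DEP = 1.0  # omega_dep, assumed value

subclass_of = {"Athlete": "Person", "SoccerPlayer": "Athlete", "SoccerBall": "Ball"}
rel_constraints = {"Kick": ("SoccerPlayer", "SoccerBall")}  # relation -> (domain, range)

def is_subclass(a, b):
    while a is not None:
        if a == b:
            return True
        a = subclass_of.get(a)
    return False

def depth(c):
    """Depth of class c in the subclass tree rooted at its generic class."""
    d = 0
    while c in subclass_of:
        c, d = subclass_of[c], d + 1
    return d

def constraint_energy(labeling, relation_edges):
    """E_c (domain/range terms): infinite penalty for any violated constraint."""
    for oi, rij, oj in relation_edges:
        if labeling[rij] in rel_constraints:
            dom, rng = rel_constraints[labeling[rij]]
            if not is_subclass(labeling[oi], dom) or not is_subclass(labeling[oj], rng):
                return INF
    return 0.0

def informative_energy(labeling):
    """E_i (depth terms): deeper, more specific assignments lower the energy."""
    return -W_DEP * sum(depth(c) for c in labeling.values())

labeling = {"o1": "SoccerPlayer", "o2": "SoccerBall", "r12": "Kick"}
edges = [("o1", "r12", "o2")]
print(constraint_energy(labeling, edges) + informative_energy(labeling))  # -3.0
```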
3.3.2 Visual feature based energy

Besides background knowledge from the ontology, we believe that the visual appearance of objects can give us additional information in determining class assignments. E.g., a Ball with white color is more likely to be a SoccerBall, while the relation between two spatially close objects is more probable to be Interact than Non-interact. Thus, we define visual feature based energy functions for object nodes and relation nodes respectively.

Visual feature based energy on object nodes: for each object node $o_i$, we collect a set of visual features $F_o(O_i)$ of the detected object $O_i$ in the input image, and calculate a probability distribution over the potential assignment set $\mathcal{C}(o_i)$ based on $F_o(O_i)$. Intuitively, the conditional probability function $P(o_i \rightsquigarrow C_o \mid F_o(O_i))$ denotes the probability of $o_i$ being assigned class $C_o$ when $F_o(O_i)$ is observed from the image. Thus, we define the visual feature based energy on an object node as:

$$E_v^O(o_i \rightsquigarrow C_o) = -\omega_{obj} \cdot P(o_i \rightsquigarrow C_o \mid F_o(O_i))$$

We choose eight visual features of $O_i$ to form the feature set $F_o(O_i)$, including: the width and height of $O_i$'s bounding box (which is part of the output from the detectors); the averages of the H, S, V values from the HSV color space; and the standard deviations of H, S, V.

Given these eight feature values on an object node $o_i$, a probability distribution over the potential assignment set $\mathcal{C}(o_i)$ is estimated, which satisfies:

$$\sum_{C \in \mathcal{C}(o_i)} P(o_i \rightsquigarrow C \mid F_o(O_i)) = P(o_i \sqsubseteq C_g(O_i) \mid F_o(O_i)) = 1$$

where $C_g(O_i)$ is the generic class of $O_i$, and $o_i \sqsubseteq C_g(O_i)$ is the notation for "$o_i$ is assigned a subclass of $C_g(O_i)$".
We take advantage of the hierarchical structure of the subclass tree rooted at $C_g(O_i)$, and compute the probability distribution in a top-down manner. Assume $P(o_i \sqsubseteq C_o \mid F_o(O_i))$ is known for a certain object class $C_o$; if $C_o$ is a leaf node in the ontology (i.e., $C_o$ has no subclass), we have $P(o_i \rightsquigarrow C_o \mid F_o(O_i)) = P(o_i \sqsubseteq C_o \mid F_o(O_i))$; otherwise, given $C_o$'s immediate subclass set $\mathcal{I}(C_o)$, we have a propagation equation:

$$P(o_i \sqsubseteq C_o \mid F_o(O_i)) = P(o_i \rightsquigarrow C_o \mid F_o(O_i)) + \sum_{C_k \in \mathcal{I}(C_o)} P(o_i \sqsubseteq C_k \mid F_o(O_i)) \quad (3.4)$$
Figure 3.4: The probability distribution over person1's potential class assignments is estimated in a top-down manner.
We can view the right-hand side of this equation from the perspective of multi-class
classification: given conditions o
i
⊑ C
o
andF
o
(O
i
), the assignment of o
i
falls into
|I(C
o
)| + 1 categories: o
i
C
o
, or o
i
⊑ C
k
where C
k
∈I(C
o
);k = 1;:::;|I(C
o
)|.
Thus we can train a multi-class classifier (based on object visual features) to assign a
classificationscoreforeachcategory,andapplythecalibrationmethodproposedin[69]
totransformthesescoresintoaprobabilitydistributionoverthese|I(C
o
)|+1categories.
Multiplied by the priorP(o
i
⊑ C
o
|F
o
(O
i
)), this probability distribution determines the
probability functions on the right-hand side of Eqn.(3.4). Thus, the probabilities recur-
sivelypropagatefromtherootclassdowntotheentiresubclasstree,asdemonstratedin
Figure3.4.
In order to train a classifier for each non-leaf object class C_o, we collect a set of objects O_train belonging to class C_o from our training images with ground-truth labeling, and calculate their feature sets. The training samples are then split into |I(C_o)| + 1 categories according to their labeling: assigned as C_o, or belonging to one of C_o's immediate subclasses. We follow the suggestions in [69] to train one-against-all SVM classifiers using a radial basis function as kernel [58] for each category, apply isotonic regression (PAV [9]) to calibrate the classifier scores, and normalize the probability estimates to make them sum to 1. This training process is performed for every non-leaf object class once and for all.
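As an illustration of this training step, the sketch below uses scikit-learn as a stand-in (the actual procedures follow [69], [58] and [9]); X is assumed to hold the eight-dimensional feature vectors and y the |I(C_o)| + 1 category labels for one non-leaf class C_o.

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.calibration import CalibratedClassifierCV

    def train_node_classifier(X, y):
        # One-vs-all RBF-kernel SVMs with isotonic (PAV) calibration; the
        # calibrated classifier returns probabilities that sum to 1 over the
        # |I(C_o)| + 1 categories, as required by the propagation equation.
        base = SVC(kernel='rbf', decision_function_shape='ovr')
        clf = CalibratedClassifierCV(base, method='isotonic', cv=3)
        clf.fit(X, y)
        return clf

    def calibrated_scores(clf, features):
        # Map category labels to calibrated probabilities for one node.
        p = clf.predict_proba(np.asarray(features).reshape(1, -1))[0]
        return dict(zip(clf.classes_, p))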
Visual feature based energy on relation nodes can be handled in a similar manner to that on object nodes. The only difference is the feature set F_r(O_i, O_j). For relations, the relative spatial information is most important. Therefore, F_r(O_i, O_j) contains eight features of the object pair <O_i, O_j>: the width, height and center (both x and y coordinates) of O_i's bounding box; and the width, height and center of O_j's bounding box.
Similarly, training samples are collected and classifiers are trained for each non-leaf relation class C_r in the ontology. With these classifiers, probabilities propagate from each generic relation class to its entire subclass tree to form a distribution over the potential assignment set C(r_{i,j}). The visual feature based energy on r_{i,j} is defined as:

E^R_v(r_{i,j} ⇝ C_r) = −ω_rel · P(r_{i,j} ⇝ C_r | F_r(O_i, O_j))
In summary, the visual feature based energy is defined as:

E_v(L, Img) = ∑_{o_i} E^O_v + ∑_{r_{i,j}} E^R_v    (3.5)
3.3.3 Energy optimization

Finding the best labeling L_optimal to minimize the energy function E(L) is an intractable problem, since the search space of labeling L is on the order of |C|^{|V|}, where |C| is the number of possible class assignments for a node and |V| is the number of nodes in graph G. However, we observe that this space can be greatly reduced by taking the ontology constraint energies into account. The brute-force search is pruned by the following rules:
1. For node v, when a labeling v ⇝ C, C ∈ C(v), is to be searched, we immediately check the constraint energies on the edges touching v, and cut off this search branch if any of these energies is infinite.

2. We want to apply rule 1 as early as possible. Thus, to pick the next search node, we always choose the unlabeled node with the largest number of labeled neighbors.

3. On each node, we sort the potential class assignments by their visual feature based probabilities in descending order. Class assignments with large probabilities are searched first, and those with very small probabilities (empirically, < 0.1) are only searched when no appropriate labeling can be found in previous searches.
In our experiments, the constructed graphical model is relatively small (usually containing a few object nodes and no more than 10 relation nodes). The energy optimization process executes in less than 1 second per image.
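The pruned search can be written as a small backtracking procedure. The sketch below is only illustrative (all helper names are hypothetical), but it mirrors the three rules above: callers enumerate the consistent labelings it yields and keep the one with minimal energy.

    def search(nodes, neighbors, candidates, constraint_violated, labels=None):
        labels = labels if labels is not None else {}
        if len(labels) == len(nodes):
            yield dict(labels)       # a fully labeled, consistent assignment
            return
        # Rule 2: expand the unlabeled node with the most labeled neighbors.
        v = max((n for n in nodes if n not in labels),
                key=lambda n: sum(1 for m in neighbors[n] if m in labels))
        for c in candidates(v):      # Rule 3: sorted by visual probability
            labels[v] = c
            # Rule 1: cut the branch if a touching constraint energy is infinite.
            if not any(m in labels and constraint_violated(v, m, labels)
                       for m in neighbors[v]):
                yield from search(nodes, neighbors, candidates,
                                  constraint_violated, labels)
            del labels[v]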
3.3.4 ORN generation

Given the best labeling L_optimal over graph G, an Object Relation Network (ORN) is generated as the output of our system in three steps:

1. We apply the labeling L_optimal over graph G to produce object and relation nodes with the most probable yet semantically consistent class assignments for the ORN.

2. A collection of objects is detected by finding object nodes with the same class assignment in L_optimal. A collection node is created accordingly, which is linked to its members by adding edges representing the isMemberOf relationship.

3. We finally drop meaningless relation nodes (in particular, Non-interact), together with the edges touching them.

After these steps, an intuitive and expressive ORN is automatically created by our system to interpret the semantics of the objects and their relations in the input image. Examples are shown in Figure 1.3, Figure 3.8 and Figure 3.9.
3.4 Experimental Results

3.4.1 Energy function evaluation

We first demonstrate how the visual feature based energy functions work together with the ontology based energy functions, using the example in Figure 3.5. We observe that the probability distributions shown in the middle tend to give a good estimation for each node, i.e., they provide a relatively high probability for the true labeling. But there is no guarantee that the probability of the true labeling is always the highest (e.g., Ball1 has a higher probability of being assigned as Ball than Basketball, highlighted in red). By combining the energy functions together, the ontology constraints provide a strict framework that restricts the possible labelings over the entire graph by penalizing inappropriate assignments (e.g., a Basketball Player Throwing a Ball, given that the range of the relation Throw is limited to Basketball). The probabilities are organized into a tightly interrelated network, which in return improves the prediction for each single node (e.g., in the labeling with minimal energy, Ball1 is correctly assigned as Basketball).

To quantitatively evaluate the energy functions, we collect 1,200 images from ImageNet [21] from the categories soccer, basketball and ball. A person detector [27] and a ball detector using the Hough Circle Transform in OpenCV [14] are applied on the entire image set to detect persons and balls. The detected objects and the relations between them are manually labeled with classes from the guide ontology in Figure 3.1(c). We then randomly select 600 images as training data, and use the rest as test data. Three different scenarios are compared: (1) using only visual feature based energy, (2) using both
Figure 3.5: The probability distributions are organized in a network to predict a most probable yet consistent labeling, which may in return improve the classification result.
Figure 3.6: Error rate of class assignments under three different scenarios.
visual feature based energy and ontology constraints, and (3) using the complete energy function E(L) in Eqn. (3.1). Our system minimizes the energy cost in each of the scenarios, and calculates the error rate by comparing the system output with the ground truth. As Figure 3.6 and Table 3.1 suggest, the ontology based energy transfers background knowledge from the guide ontology to the relation network, and thus significantly improves the quality of class assignments.
Generic class | Using E_v | Using E_v + E_c | Using E(L) | Gain   | Gain (k > 3)
Person        | 0.5188    | 0.4644          | 0.3766     | 0.1423 | 0.1783
Ball          | 0.2907    | 0.2693          | 0.2640     | 0.0267 | 0.0571
P-B Rel.      | 0.3222    | 0.2887          | 0.2259     | 0.0962 | 0.0775

Table 3.1: Evaluation results of the energy functions. The first three columns contain the data plotted in Figure 3.6. The last two columns show the gain in accuracy from using the complete energy function E(L), where k is the number of detected objects.
3.4.2 System evaluation

To further evaluate the robustness and generality of our system, we adopt a more complicated guide ontology (Figure 3.7) in the system. The detection layer contains 6 generic object classes: Person, Horse, Motorbike, Chair, Bicycle, and Ball, while the semantic layer contains simplified semantic hierarchies from the WordNet ontology [25]. Moreover, we extend our image set with images from VOC2011 [23], containing over 28,000 web images. We randomly choose 2,000 images that have at least one generic object, manually label ground-truth class assignments for objects and relations, and use them to train the visual feature based classifiers and the weight set (ω_dep, ω_col, ω_obj, ω_rel). We adopt the detectors in [27, 14] to perform object detection.

Time complexity: The most time-consuming operation of our system is detection, which usually takes around one minute per test image. After this pre-processing, our system automatically creates an ORN for each image within a second. All experiments are run on a laptop with an Intel i7 CPU at 1.60GHz and 6GB memory.
Qualitative results: Most of our ORNs are of good quality. Example results are shown in Figure 3.8. The "good" ORNs successfully interpret the semantics of the objects and their relations. We also demonstrate some "bad" examples in Figure 3.9. Note that the "bad" results are usually caused by detection errors (e.g., the top image has a false alarm from the person detector, while the other two images both miss certain objects). Nevertheless, the "bad" ORNs still interpret reasonable image semantics.
Figure 3.7: The ontology we use for system evaluation. Constraints are only shown in the root layer for clarity.
                | k = 1 | k = 2 | k = 3 | k > 3 | overall
ORN score       | 3.69  | 3.38  | 3.77  | 3.95  | 3.65
Detection score | 4.31  | 3.93  | 3.10  | 3.38  | 3.69

Table 3.2: Human evaluation of ORN and detection: possible scores range from 5 (perfect) to 1 (failure). k is the number of detected objects.
Human evaluation: We perform human judgement on the entire test dataset. Scores on a scale of 5 (perfect) to 1 (failure) are given by human judges to reflect the quality of the ORNs, as shown in Table 3.2. First, we notice that the ORNs are quite satisfactory, as the overall score is 3.65. Second, ORN scores for images of a single object are relatively high because detection is reliable when k = 1. As the number of objects increases, the relation network becomes larger and thus more ontology knowledge is brought into the optimization process. The quality of the ORNs keeps improving despite the quality drop of detection.
Figure 3.8: Examples of "good" ORNs generated from our system.
Figure 3.9: Examples of "bad" ORNs generated from our system.
Chapter 4

Applications of Object Relation Network

This chapter demonstrates various useful web applications enabled by Object Relation Networks, including automatic image tagging, automatic image description generation, image search by image, and semantic image clustering.
4.1 Automatic image tagging

We develop an automatic image tagging approach by combining ORNs and a set of inference rules. Given a raw image as input, our system automatically generates its ORN, which contains semantic information about the objects, relations, and collections. Thus, we directly output the ontological class assignments in the ORN as tags regarding entities, actions, and entity groups. In addition, with a few simple rules, implicit semantics about the global scene can also be easily inferred from the ORN and translated into tags. Table 4.1 shows some example inference rules; a sketch of how such a rule can be checked against an ORN follows below. Results from our method and a reference approach, ALIPR [43], are illustrated in the third row of Figure 4.1 and Figure 4.2. Note that even with imperfect ORNs (the 5th and 6th image), our approach is still capable of producing relevant tags.

1 | ∃x SoccerPlayerCollection(x) ∧ ∃y SoccerBall(y) ∧ ∃z SoccerPlayer(z) ∧ (kick(z,y) ∨ head(z,y)) ∧ Tag(t) → t = "soccer game"
2 | ∃x CyclistCollection(x) ∧ ∃y Cyclist(y) ∧ ∃z Bicycle(z) ∧ ride(y,z) ∧ Tag(t) → t = "bicycle race"
3 | ∃x BasketballPlayerCollection(x) ∧ ∃y Basketball(y) ∧ ∃z BasketballPlayer(z) ∧ (throw(z,y) ∨ hold(z,y)) ∧ Tag(t) → t = "basketball game"

Table 4.1: Example rules for inferring implicit knowledge from ORNs
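For illustration, the first rule in Table 4.1 could be checked against an ORN along the following lines; the ORN accessors (nodes mapping node ids to class assignments, relations listing (subject, relation, object) triples) are hypothetical stand-ins rather than our actual data structures.

    def infer_soccer_game_tag(orn):
        # Rule 1 of Table 4.1: a SoccerPlayerCollection plus a SoccerPlayer
        # kicking or heading a SoccerBall implies the tag "soccer game".
        classes = set(orn.nodes.values())
        if 'SoccerPlayerCollection' not in classes:
            return []
        for s, rel, o in orn.relations:
            if (orn.nodes[s] == 'SoccerPlayer' and rel in ('Kick', 'Head')
                    and orn.nodes[o] == 'SoccerBall'):
                return ['soccer game']
        return []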
Figure 4.1: Tags and natural language descriptions automatically generated by our ORN-based approaches. Annotation results from the ALIPR system (http://alipr.com/) are also shown for reference. (Part 1)
Figure 4.2: Tags and natural language descriptions automatically generated by our ORN-based approaches. Annotation results from the ALIPR system (http://alipr.com/) are also shown for reference. (Part 2)
4.2 Automatic image description generation

Natural language generation for images is an open research problem. We propose to exploit ORNs to automatically generate natural language descriptions for images. We extend our automatic tagging approach by employing a simple template based model (inspired by [40]) to transform tags into concise natural language sentences. In particular, the image descriptions begin with a sentence regarding the global scene, followed by another sentence enumerating the entities (and entity groups, if there are any) in the image. The last few sentences are derived from the relation nodes in the ORN together with their domain and range information. Examples are shown in the last row of Figure 4.1 and Figure 4.2.
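A minimal rendering of this template model (an illustration, not our exact implementation; the inputs would be derived from the ORN and the tags of Section 4.1) could look as follows.

    def describe(scene_tags, entity_counts, relations):
        # Sentence 1: global scene; sentence 2: entity enumeration;
        # remaining sentences: one per relation node with domain and range.
        sentences = []
        if scene_tags:
            sentences.append("This is a picture of a %s." % scene_tags[0])
        parts = ["%s %s%s" % (n, e, "" if n == "one" else "s")
                 for e, n in entity_counts.items()]
        sentences.append("There are %s." % " and ".join(parts))
        for subject, verb, obj in relations:
            sentences.append("%s is %s %s." % (subject, verb, obj))
        return " ".join(sentences)

    # describe(["soccer game"], {"soccer player": "five", "soccer ball": "one"},
    #          [("Soccer player 3", "kicking", "the soccer ball")])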
4.3 Image search by image

The key to image search by image is the similarity measurement between two images. Since an ORN is a graph model that carries informative semantics about an image, the graph distance between ORNs can serve as an effective measurement of the semantic similarity between images. Given that an ORN is an ontology instantiation, we employ the ontology distance measurement in [48] to compute ORN distances. In particular, we first pre-compute the ORNs for the images in our image library, which contains over 30,000 images. Then, for each query image, we automatically generate its ORN, and retrieve the images with the most similar ORNs from the image library. The result images are sorted by ORN distance. Figure 4.3 and Figure 4.4 illustrate several search results of our approach. Search results from Google Image Search by Image are also included for reference.
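The retrieval step itself is then a straightforward nearest-neighbor search over the pre-computed library. In the sketch below, orn_distance stands in for the ontology distance measurement of [48], and the (image_id, orn) pairing is an assumed data layout.

    def search_by_image(query_orn, library, orn_distance, top_k=4):
        # library: list of (image_id, orn) pairs whose ORNs were pre-computed.
        scored = sorted(((orn_distance(query_orn, orn), image_id)
                         for image_id, orn in library),
                        key=lambda pair: pair[0])   # smaller distance = more similar
        return [image_id for _, image_id in scored[:top_k]]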
Figure 4.3: Image search results of our approach and Google Image Search by Image (http://images.google.com/) - Part 1
Figure 4.4: Image search results of our approach and Google Image Search by Image (http://images.google.com/) - Part 2
4.4 Semantic Image Clustering

4.4.1 Overview

Image clustering is an important tool in processing large collections of images. The goal of image clustering is to organize a large set of images into clusters, such that images within the same cluster have similar meaning. Image clustering provides a high-level summarization of large image collections, and thus has many useful applications. For example, clustered web image search results and image repositories are more convenient for users to browse. In addition, the efficiency of image search in a large image database can be significantly improved by retrieving clustered image groups rather than individual images.
Many research efforts tackle the complicated problem of image clustering by solving three subproblems. Given a collection of images, first, a set of features is extracted from each image as its description. The features can be low-level visual features (e.g., [56, 17, 32]), web context features (e.g., [15, 29, 65]), or region-based features such as the well-known bag-of-words model [42, 13, 30]. Second, a clustering algorithm (e.g., k-means, NCut, kNN) is applied, based on certain distance measurements defined in the feature space, to group the image collection into multiple clusters. Finally, each cluster is labeled with either a text description or a representative image.
Although prior work has offered partial solutions to this problem and has been successfully used in many applications, we notice two major limitations. First, current visual feature based clustering methods usually use local features that do not have semantic meaning. Thus, given two images, there is no significant correspondence between their semantic distance and their visual feature distance. These methods risk grouping images with different semantics into the same cluster, which is unsatisfactory from the perspective of human users. Although supervised machine learning approaches can be introduced to reduce the gap between local visual features and image semantics, they may fail when dealing with specific semantics. E.g., they can hardly tell the semantic difference between the ball-playing scenes in the first column of Figure 4.5.
Figure 4.5: Images and their bag-of-semantics descriptions automatically generated by our system. ORNs are shown in the middle as intermediate results. The last two columns show the views of the bag-of-semantics models through two different lenses.
The second limitation of current image clustering methods is that they usually act as a black box to users. Thus, users have no control over the clustering performance. However, we observe that different users can have very different purposes for clustering. E.g., a user focused on ball types wants to group the top two images in Figure 4.5 together, since both of them contain a soccer ball, while a user focused on person types wants to group the bottom two images together, as they are both about athletes. Thus, users should have control over the image clustering process.
We present a novel image clustering method to address the above two issues. Our approach is based on the Object Relation Network (ORN), a graph model representing informative and consistent semantics for objects and their relations in an image (e.g., Figure 4.5, middle column). Given an ORN automatically generated for each image, we propose an image feature model named bag-of-semantics, which contains a set of semantic descriptors for the image based on its ORN (e.g., Figure 4.5, right column). Since the ORN is derived from a guide ontology (e.g., Figure 4.6, bottom), the class hierarchies in the guide ontology can serve as different levels of lenses through which to view the bag-of-semantics model. In particular, a lens consists of a set of ontology classes that can be distinguished under it. For example, viewed from a coarse lens containing only Person and Ball, the bag-of-semantics models for all three images in Figure 4.5 become the same set {Person, Ball}. In contrast, viewed through a finer lens containing Basketball and Soccer Ball, the semantic difference in ball types between the bottom image and the other two images can be easily identified. Therefore, we cluster images by grouping images with the same bag-of-semantics viewed through a certain lens. We achieve hierarchical image clustering by going top-down through the class hierarchies in the guide ontology (and thus a series of coarse-to-fine lenses). In addition, user preferences in clustering can be captured by choosing different lenses at certain levels (e.g., splitting Person into subclasses and splitting Ball into subclasses lead to different clustering results for the images in Figure 4.5). Finally, each image cluster is labeled with its bag-of-semantics under the corresponding lens.
Our main contributions include:

1. We propose a bag-of-semantics model to describe images for the image clustering problem. The model explicitly reveals the semantics of an image. Thus, our clustering algorithm is guaranteed to group semantically-similar images into the same cluster.

2. We present a top-down hierarchical image clustering algorithm that views the bag-of-semantics model through levels of lenses with different semantic granularities, based on the class hierarchies in the guide ontology.

3. We enable user control in the image clustering problem. We provide a mechanism for users to browse through the image collection and make intuitive adjustments to the clustering results.
4.4.2 Bag-of-Semantics Model

We model an image as a collection of semantic descriptors for both the static image objects and the binary relations between them. In particular, we adopt the Object Relation Network (ORN) to capture image semantics. The ORN is a graphical model that links the objects in an image through meaningful relations (Figure 4.5, middle column). Guided and constrained by a guide ontology, the ORN represents the most probable meaning of the objects and their relations, by assigning each graph node to the most probable class in the guide ontology. Therefore, an image can be described by the ontology class assignments in its ORN, e.g., Figure 4.5, right column. This image description model captures the semantics of both objects and their relations, and thus we call it bag-of-semantics.

Note that the bag-of-semantics models in Figure 4.5 can be easily extended to carry more semantics. If the domain and/or range information of a relation needs to be preserved, we can associate such information with the relation descriptor. E.g., for the first image in Figure 4.5, if we want to keep the domain information of relations, its bag-of-semantics model becomes {person, person hold, soccer ball}. In addition, cardinality can also be expressed in the bag-of-semantics model by adding numbers to objects. E.g., for the second image in Figure 4.5, its bag-of-semantics model becomes {two soccer players, kick, one soccer ball, soccer player collection}.
Figure 4.6: An example guide ontology.
4.4.2.1 Bag of Semantics

Given an ORN automatically created for an image I, the image's bag-of-semantics description is the set of class assignments in its ORN. Thus, the bag-of-semantics model S(I) is a subset of the guide ontology classes G (formally, G is the node set of the guide ontology). Classes in the guide ontology are organized into semantic hierarchies. The subclassOf property (denoted as ⊑ and illustrated as thick arrows at the bottom of Figure 4.6) links the class nodes in G into a forest rooted at three general classes: Object, O-O Relation, and Object Collection. Therefore, each semantic descriptor s ∈ S(I) corresponds to a series of concepts in the class hierarchies, {t | s ⊑ t, t ∈ G}. E.g., a soccer player can be regarded as a soccer player, an athlete, a person, or an object, according to the ontology in Figure 4.6.

This hierarchical structure of the semantic descriptors is very useful in our image clustering method, because it can describe an image with semantics from a very general level to a very specific level. Clustering is achieved by grouping images with the same semantics under a certain semantic granularity. In the next section, we formally define lenses to control the semantic granularity and present our hierarchical image clustering algorithm based on the bag-of-semantics model.
4.4.3 Image Clustering

4.4.3.1 Lenses

Lenses characterize the semantic granularity at which the bag-of-semantics model expresses image semantics. We define a lens as a set of ontology classes L ⊆ G, where G is the node set of the guide ontology. Viewed through a lens L, a semantic descriptor s is regarded as its closest ancestor in L, denoted as s_L. For completeness, s_L is ∅ if s has no ancestor in L.

Intuitively, L determines the ontology classes that can be distinguished by the lens. The coarsest lens contains only the three general classes, i.e., Object, O-O Relation, and Object Collection. With this lens, every semantic descriptor is mapped to one of the general classes, and little difference can be found between the descriptors. On the contrary, under a fine lens containing many specific semantic concepts such as Basketball and Soccer Ball, the corresponding semantic descriptors (e.g., the balls in Figure 4.5) are expressed with specific concepts and distinguished accordingly.
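The projection s_L can be written down directly from this definition. In the minimal sketch below, ancestors(s) is a hypothetical helper that returns s followed by its superclasses, ordered from most specific to most general along the subclassOf hierarchy.

    def project(s, lens, ancestors):
        # s_L: the closest ancestor of s that lies in the lens, or None (∅).
        for t in ancestors(s):
            if t in lens:
                return t
        return None

    def view(bag, lens, ancestors):
        # S_L(I): the bag-of-semantics of image I viewed through lens L.
        return frozenset(project(s, lens, ancestors) for s in bag) - {None}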
4.4.3.2 Image Clustering with Lenses

Viewed through a lens L, the bag-of-semantics S(I) of an image I is expressed as the set S_L(I) = {s_L | s ∈ S(I)}. Two images with the same bag-of-semantics expression, S_L(I) = S_L(J), are indistinguishable under the lens L. We thus group images with the same bag-of-semantics expression under a certain lens into the same cluster.

At first glance, this clustering algorithm may produce as many as 2^{|L|} clusters, since S_L(I) ⊆ L and there are 2^{|L|} possible subsets of L. However, since ORNs are created following the semantic constraints in the guide ontology, many of L's subsets do not have feasible ORNs, and thus the image clusters corresponding to these subsets are empty. For instance, given a lens L = {Soccer Player, Kick, Soccer Ball}, no image has the bag-of-semantics representation {Kick}, because the domain/range constraints force an image with a kick relation to have a soccer player as its domain and a soccer ball as its range as well.
4.4.3.3 Hierarchical Clustering

We propose a top-down hierarchical image clustering algorithm that goes through a series of coarse-to-fine lenses. We start with the coarsest lens, containing only the general classes, and cluster images accordingly. With more specific semantic concepts added to the lens, we divide each cluster into subclusters according to the refined lens. In particular, we take advantage of the class hierarchies of the guide ontology G. In each lens refinement step, we apply a split operator to a class node f in L that has not been split. The split operator adds f's child class nodes in G to L, and divides the image clusters according to the refined lens. The hierarchical image clustering algorithm stops when there is a sufficient number of clusters.

In practice, we found that not all the nodes in G are helpful. Lenses including some insignificant nodes may over-segment the image collection. For example, lenses with a Non-interact node can tell the difference between images with and without a non-interact relationship. This difference, however, is hardly interesting to users. To avoid such insignificant nodes, we introduce a hide operator that hides a class node together with its descendants from being selected into lenses. By limiting the lenses to the "visible" domain of the guide ontology, only significant class nodes are employed to segment image clusters.
48
The pseudocode of the automatic hierarchical image clustering algorithm is shown
inAlgorithm1.
Algorithm1:HierarchicalImageClustering
Input: ImagecollectionIwithbags-of-semanticsS(I) ={S(I)|I∈I},guide
ontologyG
Output: ImageclustersC ={C
i
}
Initialization: L;F←{visible general classes}
C←clustersofIgeneratedusingL
whileF̸=∅ and|C|< do
findf∈F withthesmallestdepthinG
findf’svisiblechildnodesetChild
v
(f)inG
F←F
∪
Child
v
(f)\{f}
ifChild
v
(f)̸=∅then
L←L
∪
Child
v
(f)
foreachC
i
∈Cdo
divideC
i
intosubclustersusingL
replaceC
i
inCwithitssubclusters
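A compact executable rendering of Algorithm 1, reusing the view() helper from the sketch in Section 4.4.3.1, might look as follows. The ontology accessors (visible_general_classes, depth, children, visible, ancestors) and the threshold max_clusters are hypothetical placeholders for the corresponding pieces of the guide ontology.

    from collections import defaultdict

    def cluster(bags, lens, ancestors):
        # Group images whose bags-of-semantics are identical under the lens.
        groups = defaultdict(list)
        for image, bag in bags.items():
            groups[view(bag, lens, ancestors)].append(image)
        return groups

    def hierarchical_clustering(bags, ontology, max_clusters):
        lens = set(ontology.visible_general_classes)       # coarsest lens
        frontier = set(lens)                               # unsplit classes F
        clusters = cluster(bags, lens, ontology.ancestors)
        while frontier and len(clusters) < max_clusters:
            f = min(frontier, key=ontology.depth)          # smallest depth in G
            kids = [c for c in ontology.children(f) if ontology.visible(c)]
            frontier = (frontier | set(kids)) - {f}        # split operator on f
            if kids:
                lens |= set(kids)
                clusters = cluster(bags, lens, ontology.ancestors)  # refine
        return clusters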
4.4.3.4 User Control in Image Clustering

Since the bag-of-semantics model carries rich semantics of images, user preferences can be captured by choosing different coarse-to-fine paths for the lens. We design a user control mechanism that allows each user to modify the sequence of class nodes to be split. In particular, the user identifies the class nodes that are most important in his opinion. By bringing the subclasses of these class nodes into the lens at an early stage, the cluster hierarchy can faithfully capture the user's preference. Figure 4.7 shows an example of different cluster hierarchies created for two users. User A is more interested in the various relations, and thus he chooses to split O-O Relation's subclasses first. On the contrary, user B is more focused on person types. By evolving the lens through two different paths (Figure 4.7, right), different cluster hierarchies are generated according to the two users' preferences (Figure 4.7, left).
Figure 4.7: Different choices of lens result in different cluster hierarchies. Lenses are chosen to cluster images according to relation types and person types in (A) and (B) respectively.
In our implementation, the image clustering system first applies the automatic clustering algorithm (Algorithm 1) to generate an initial cluster hierarchy for the user to browse through. In each lens refinement step, the user has the option to participate and choose the class node he considers most important. Our system splits the specified node, finds clusters based on the refined lens, and then applies Algorithm 1 to update the subsequent cluster hierarchy that is used for further browsing.

Finally, each image cluster is labeled with its bag-of-semantics under the corresponding lens, e.g., the descriptions under each image cluster in Figure 4.7, left.
4.4.4 Experimental results

The dataset used in our experiment contains over 28,000 images from VOC2011 [23] and ImageNet [21]. We randomly choose 2,000 images for the training process of ORN generation. Our guide ontology contains class hierarchies and constraints for 6 generic object classes (Person, Bicycle, Motorbike, Horse, Chair, Ball) and their relation classes. We adopt the detectors in [26] to perform object detection.

Besides the user-controlled clustering results in Figure 4.7, we show more qualitative results produced by our automatic clustering algorithm in Figure 4.8 and Figure 4.9. Figure 4.8 illustrates several "good" results obtained by splitting the subclasses of the Person class and all the relation classes (i.e., we hide the Object Collection class and the Non-interact relation class). Each row contains 5 example images from a result cluster (with its bag-of-semantics descriptors shown in the first column). These result clusters demonstrate that our clustering algorithm has successfully classified semantically-similar images into the same cluster, even though the visual features of some images are quite different from each other. Figure 4.9 shows a few "bad" clustering results. We notice that most of the clustering errors are caused by missing detections (the first example) or false detections (the rest of the examples). The accuracy of our algorithm mainly depends on the quality of detection and ORN generation.
Quantitative evaluation is not included in our experiments for the following reason. For image clustering, deciding whether an image is classified into the right cluster is a subjective matter. Different users can have different judgments on a clustering result, due to the diversity of their preferences. It is hard to obtain clustering ground truth for each user on a large image dataset like the one we used, which makes it impractical to compute quantitative measurements such as precision and false alarm rate.

The most time-consuming step in our method is ORN generation, which usually takes one minute per test image. The computation of each bag-of-semantics model and the clustering after each class split both finish within a couple of seconds. These times are measured on a laptop with an Intel i7 CPU at 1.60GHz and 6GB memory.
Figure 4.8: Several "good" clustering results obtained by using very fine lenses, i.e., splitting the subclasses of the Person class and all the relation classes. Each row contains example images and the label of a cluster.
Figure 4.9: Examples of "bad" clustering results. The clustering errors are usually caused by detection errors (e.g., false detection and missing detection).
Chapter 5

Learning to Rank Semantic Relationships

This chapter presents a novel ranking method for complex semantic relationship (semantic association) search based on user preferences. The goal of our work is to automatically capture user preferences and effectively leverage these preferences to personalize semantic association search results. Our method is motivated by the notion that the user who initiates a semantic association query should be the best judge of the relevance of a query result. Thus, user assessments of the search results, as user-specific information, can serve as a valuable source of user preferences. We present a machine-learning based ranking method to automatically capture a user's preferences from his assessments of the search results. Our method creates a personalized ranking function for each user based on the learned preferences. The ranking function is then used to improve the relevance of search results for the user's subsequent queries. In particular, our method allows each user to assess the results of a small set of randomly selected queries, by assigning ranks to a few of his favorite results. Each result semantic association is characterized by a set of quantitative features. We use an SVM-based learning-to-rank model to capture a user's preferences on these features from the user-assessed results.
5.1 Semantic Association Features

To formulate the ranking problem, we first need a set of features to characterize semantic associations from various aspects. Prior semantic association ranking approaches [6, 38] use a limited number of features (usually fewer than 10), and require users to manually specify the weights of the features. In order to get desirable results through manual specification, a user must have a good understanding of the ranking scheme. However, users who are not familiar with the ranking scheme are likely to get unsatisfactory results because of improperly tuned ranking criteria. In addition, as more features are used to comprehensively describe a semantic association, it becomes very time-consuming and tedious, even for experienced users, to explicitly specify their preferences by manually tuning the weights of tens of features.

The goal of our learning-to-rank method is to automatically capture user preferences on association features, and to generate a personalized ranking function for each user accordingly. Thus, users are relieved of the burden of manually tuning the parameters. In addition, our learning-to-rank method does not rely on a specific set of features. There is no limitation on the number of features used to describe an association. In general, any association feature set (e.g., those proposed in prior research such as [6]) containing any number of features can be adopted by our method. Thus, our method can support comprehensive description of associations without incurring any overhead for users. We choose to use a relatively large feature set that can be directly calculated from semantic associations. These semantic association features are detailed in the following subsections.
5.1.1 Association Length

Association length is a metric measuring the number of properties contained in a semantic association. The longer a semantic association is, the more properties it contains. Assume A is a semantic association, and let E_A = {e | e ∈ A} and P_A = {p | p ∈ A} denote the entities (nodes) and properties (edges) in A, respectively. The length of A is defined as the number of its properties:

L_A = |P_A|.    (5.1)
5.1.2 Topic Features

Topic features quantitatively characterize the topics covered by the entities of a semantic association. In a large RDF model, classes at the schema level can be categorized into several topic regions based on the knowledge domain they describe. Thus, the topic of an entity can be determined by the topic region of its corresponding schema-level class. For example, Figure 5.1 illustrates three topic regions in a small schema. Based on this region division, the entities Harry Potter and Magic are about the topic character, while the topic of the entity Order of the Phoenix is organization. Intuitively, the topic of a semantic association A is decided by the topics of its entities. Thus, we define the topic feature of A with respect to a topic S_i as

C_A(S_i) = |E_i| / (L_A + 1),    (5.2)

where E_i = {e | e ∈ E_A ∧ typeOf(e) ∈ S_i} consists of the entities that belong to topic S_i, and typeOf(e) is the class of entity e.
Figure 5.1: A small fraction of our Freebase knowledge base schema with three topic regions
In practice, the topic regions of schema-level classes can be determined either by letting users specify them through an ontology visualization tool, or by analyzing the provenance of the classes.
5.1.3 Relation Complexity

In a large linked dataset like Freebase, it is common that the relation between two entities is complex and has its own properties. A typical solution in ontology engineering is to create a relation node to represent such a complex relation and assign properties to this node. Figure 5.2 illustrates an example of a complex relation node. We use the proportion of complex relation nodes to define the relation complexity PC_A of a semantic association A:

PC_A = |M| / (L_A + 1),    (5.3)

where M = {e | e ∈ E_A ∧ e is a complex relation node}.
Figure 5.2: An example of a complex relation node
5.1.4 Property Frequency Features

The frequency of a property in a semantic association is an important hint about the rarity or commonness of the property. For example, the organization Order of the Phoenix has eleven has member properties and one has founder property, which indicates that has member is a common property for this organization while has founder is rare. The frequencies of the individual properties collectively decide the rarity and commonness of the entire semantic association. Given a property from e_x to e_y, denoted by p_i : (e_x, e_y), we define the outgoing frequency and incoming frequency of p_i as:

f_out(p_i) = |P^out_i| / d_out(e_x)    (5.4)

f_in(p_i) = |P^in_i| / d_in(e_y)    (5.5)
where P^out_i = {p | p : (e_x, e) ∧ typeOf(p) = typeOf(p_i)} and P^in_i = {p | p : (e, e_y) ∧ typeOf(p) = typeOf(p_i)}, e is an arbitrary entity, and typeOf(p) denotes the class of property p; d_in(e) and d_out(e) denote the number of incoming and outgoing properties of e, respectively.
Let A denote a semantic association between entities e_s and e_t, with d_in(e_s) = 0 and d_out(e_t) = 0, and let PF_A = {f(p) = f_in(p) + f_out(p) | p ∈ P_A}. We use four statistical features (the average μ, the standard deviation σ, the minimum and the maximum) to measure the overall property frequency of A:

F^p_A = {μ(PF_A), σ(PF_A), min(PF_A), max(PF_A)}.    (5.6)
5.1.5 Popularity Features

The number of incoming and outgoing properties of an entity can be viewed as a hint of its popularity [6]. A semantic association with many popular entities is also likely to be popular. Assume D_A = {d(e) = d_in(e) + d_out(e) | e ∈ E_A}. Similar to equation (5.6), we use the following statistical features to describe the popularity of a semantic association A:

F^e_A = {μ(D_A), σ(D_A), min(D_A), max(D_A)}.    (5.7)
Feature vector: Based on the features defined in the above sections, we define the feature vector of a semantic association A as

x_A = (L_A, C_A(S_1), ..., C_A(S_k), PC_A, μ(PF_A), σ(PF_A), min(PF_A), max(PF_A), μ(D_A)/max(D_A), σ(D_A)/max(D_A), min(D_A)/max(D_A)),    (5.8)

where k is the number of topics in the schema. Each feature vector x corresponds to a point in the feature space X. X is a (k + 9)-dimensional space.
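Putting the features together, a minimal sketch of the feature-vector computation is given below; the accessors on assoc (properties, entities, topic, is_complex, f_in, f_out, degree) are hypothetical stand-ins for the knowledge-base queries behind Eqns. (5.1)-(5.7).

    import numpy as np

    def feature_vector(assoc, num_topics):
        L = len(assoc.properties)                               # L_A, Eqn. (5.1)
        topics = [sum(1 for e in assoc.entities
                      if assoc.topic(e) == i) / (L + 1)
                  for i in range(num_topics)]                   # C_A(S_i), (5.2)
        pc = sum(assoc.is_complex(e)
                 for e in assoc.entities) / (L + 1)             # PC_A, (5.3)
        pf = np.array([assoc.f_in(p) + assoc.f_out(p)
                       for p in assoc.properties])              # PF_A
        d = np.array([assoc.degree(e) for e in assoc.entities]) # D_A
        return np.array([L, *topics, pc,
                         pf.mean(), pf.std(), pf.min(), pf.max(),
                         d.mean() / d.max(), d.std() / d.max(),
                         d.min() / d.max()])                    # (k + 9) dims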
5.2 Learning to Rank

5.2.1 The Ranking Framework

The core of our system is a machine-learned ranking algorithm, which learns a personalized ranking function for each user. In particular, for a specific user, we ask him/her to assign ranks to the semantic associations returned from a query q_i, which belongs to a small query set, q_i ∈ Q. Each user-assigned rank r_i is paired with the feature vector x_i of the corresponding semantic association. A training sample consists of all such pairs:

S(q_i) = {(x_1, r_1), (x_2, r_2), ..., (x_n, r_n)}.    (5.9)

Given the training dataset D = ∪ S(q_i), our objective is to learn a linear ranking function h : X ↦ R which calculates a score from a feature vector:

h(x) = w^T · x,    (5.10)

where w is a vector with the same length as x. The vector w is learned for each particular user a, and the corresponding personalized ranking function is used for all of a's subsequent queries. Given a list of semantic associations returned from a query, we first calculate a score for each semantic association by applying h to its feature vector, and then sort the entire list based on the scores. Thus a personalized ranking based on user preferences is produced.
5.2.2 The Ranking SVM Algorithm

We choose to employ a pairwise machine-learned ranking algorithm for the learning process. There are two major reasons behind this choice: first, pairwise machine-learned ranking algorithms benefit from advanced binary classifiers, and thus are adequate for our problem; second, the distribution-skew problem [46] of pairwise machine-learned ranking algorithms does not exist in our case, since the training samples in our training set are of a fixed size (detailed in Section 5.4).

In particular, for a pair of user-ranked associations (x_u, r_u) and (x_v, r_v) from the same training sample S(q_i), we compare their ranks and denote the preference value by

y_{u,v} = 1 if r_u > r_v; −1 otherwise.    (5.11)

Therefore, the ranking problem is reduced to a binary classification problem with a target classification function h*(x_u, x_v) = y_{u,v} ∈ {±1}. We adopt a soft-margin support vector machine (SVM) algorithm [18] to learn a linear classifier which separates the positive samples and negative samples in the space X × X as cleanly as possible, while maximizing the margin between them, as demonstrated in Figure 5.3.
In our ranking problem, positive samples and negative samples are always created in pairs, i.e., h*(x_u, x_v) = −h*(x_v, x_u), which are symmetric with respect to the hyperplane L : x_u = x_v. Therefore, the classification plane always contains this hyperplane, and thus can be expressed in the form L_c : (w, −w)^T · (x_u, x_v) = 0, where w is the only unknown variable, which is learned by the SVM algorithm. The classification problem is equivalent to calculating w · x_u and w · x_v respectively and comparing the two scores. Ranking can be achieved by sorting h(x) = w^T · x.
Figure 5.3: Support vector machine algorithm: a linear classifier is learned to separate positive and negative samples with maximal margin.
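Concretely, this pairwise formulation reduces to a standard linear SVM on difference vectors x_u − x_v. The sketch below is illustrative only, using scikit-learn's LinearSVC as a stand-in for the soft-margin SVM implementation cited in the text.

    import numpy as np
    from sklearn.svm import LinearSVC

    def train_ranking_function(training_samples):
        # training_samples: list of S(q_i), each a list of (x, r) pairs.
        X, y = [], []
        for sample in training_samples:
            for xu, ru in sample:
                for xv, rv in sample:
                    if ru != rv:
                        X.append(np.asarray(xu) - np.asarray(xv))
                        y.append(1 if ru > rv else -1)   # Eqn. (5.11)
        clf = LinearSVC().fit(np.array(X), np.array(y))
        w = clf.coef_[0]                                 # learned weight vector
        return lambda x: float(w @ np.asarray(x))        # h(x) = w^T x, (5.10)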
5.2.3 User Preferences and Weight Vector

The linear ranking function h can also be viewed as a weighted sum of the feature vector. The weight vector w reflects the importance of each feature for a particular user. Since w is learned per user, different users may have different weight vectors. The weight of a particular feature represents the importance of this feature to a user, i.e., the user's preference for this feature. Table 5.1 demonstrates the preferences of three different users, in which User 1 and User 2 are most interested in semantic associations with more complex relations, while User 3 is most interested in results about a particular topic.
User ID | Features sorted by importance (corresponding weights |w_i|)
1       | PC_A ≻ C_A(S_3) ≻ L_A ≻ C_A(S_35) ≻ ···
2       | PC_A ≻ σ(D_A)/max(D_A) ≻ C_A(S_5) ≻ C_A(S_16) ≻ ···
3       | C_A(S_17) ≻ σ(D_A)/max(D_A) ≻ C_A(S_3) ≻ ···

Table 5.1: Features sorted in descending order of importance for different users
Figure 5.4: Architecture of our ranking system (*: each user has his own ranking function).
5.3 System Implementation

To validate the effectiveness of our ranking method, we designed and implemented the semantic association search and ranking system shown in Figure 5.4. It consists of five components:

1. Data Module: The original Freebase data is in the form of .tsv files. We create a parser to automatically parse the Freebase data into a large RDF ontology knowledge base.

2. User Module: A user can have three types of interactions with the system: 1) initiating a semantic association query regarding a pair of resources; 2) viewing the personalized ranked results of the query; 3) training the system to better understand his preferences by assigning ranks to his favorite results of a training query.

3. Query Module: The query processor takes the input pair of resources and retrieves all the semantic associations between them from the RDF knowledge base.

4. Ranking Module: The features calculator computes the features defined in Section 5.1 for each query result. The personalized ranking function takes in the query results with their features, and ranks them according to the user's preferences.

5. Learning Module: The training data consists of user-ranked results for a training query and the features of these results. A few unranked results and their features are also selected to reduce the skew of the ranking function (detailed in Section 5.4). We adopt a soft-margin support vector machine algorithm [37] to implement our ranking SVM algorithm.
5.4 Evaluation

The objective of incorporating learning capability into our ranking method is to improve the relevance of semantic association search results for each individual user. To this end, we evaluate the per-user and the overall ranking quality of our method on the ranking system described in Section 5.3. In particular, we compare the ranking quality of our method to two other methods under various qualitative and quantitative metrics.

5.4.1 Experimental Setup

5.4.1.1 Benchmarks

Our experiments were conducted on a consumer-level laptop (Intel i7 CPU at 1.60GHz with 6GB memory). Our dataset is an RDF ontology knowledge base we created from the entire fictional universe domain of the Freebase linked open data. The RDF knowledge base covers information about all types of fictional works, especially the characters and organizations that appear in them. Our dataset contains 192K entities (185K regular entity nodes and 7K complex relation nodes) and 411K properties. The schema of our RDF knowledge base is available at [1]. In addition, our dataset covers 36 topics that describe the fictional universe domain from various aspects. Table 5.2 shows the top 10 topics with the most entities.
Topic          | Fictional character | Work of fiction | Romantic involvement | Fictional setting | Character creator
# of instances | 150,832             | 22,551          | 3,179                | 2,149             | 1,685

Topic          | Sibling relationship | Fictional organization | Employment tenure | Person in fiction | Character species
# of instances | 1,561                | 1,247                   | 1,166             | 1,097             | 1,064

Table 5.2: The top 10 topics with the most entities
For each semantic association search query q_i, our search engine employs a depth-limited search algorithm to find the semantic associations that satisfy q_i with a length smaller than a given threshold. The search engine then outputs the first K results as the result set T(q_i). In our experiments, we retrieve the first 2000 results whose length is smaller than 10, i.e., the length threshold is 10 and K = 2000.
5.4.1.2 Ranking methods compared

We compare our method (LtR) with two other approaches: the baseline and LtR_CA. The baseline approach is presented in [6]. This approach defines a set of semantic association features, and requires users to manually assign weights to the features. We adopt the default weight assignment used in an official implementation (the SemDis project [2]). Our feature set and the feature set of the baseline approach both use association length and topic features. But for the topic features, we analyze the provenance information of classes to decide the topic regions, rather than letting the users specify them. The rest of the features in the two sets are all different. To ensure the soundness of our evaluation, we also test our learning-to-rank framework with the feature set of the baseline. This method is denoted as LtR_CA, in which the feature weights are learned using our SVM ranking model.

Note that we do not compare with SemRank (Anyanwu et al., 2005) because: first, the ranking criteria used in SemRank are not based on features of semantic associations, while our method (LtR), the baseline approach, and LtR_CA are all feature-based methods; second, SemRank aims at providing a tool for users to look through different lenses between Conventional and Discovering, while this work is focused on capturing different users' preferences. Thus, we have not included SemRank in the comparison.
5.4.1.3 Training queries and test queries

We invited 20 graduate students to participate in the evaluation. Given that the fiction Harry Potter and its characters are known to most of our participants, we designed a query set containing 30 queries between the characters in Harry Potter. The entire query set and detailed statistics can be found in Table 5.3. For each user, 10 queries q_1 ∼ q_10 are randomly selected from the query set. The user is instructed to label the corresponding result sets T(q_1) ∼ T(q_10). In particular, for each result set, the user needs to read through the entire set, pick the 10 most interesting results, and assign them ranks 1 ∼ 10 based on his preferences. To consider the effect of unranked semantic associations, M unranked results are randomly selected and assigned rank 11, which creates 10M more pairs to ensure that a user-ranked result is more important than an unranked one. These pairs complement the pairs among the top-10 ranked results, but should not dominate the training set. Thus, we choose a small M in our experiments (M = 5), i.e., each labeled result set S(q_i) contains 15 results. Finally, from the 10 queries q_1 ∼ q_10, we randomly choose 5 as training queries, and use the remaining 5 as test queries.
Query | Entity 1          | Entity 2           | Min. L_A | Max. L_A | # of results
1     | Albus Dumbledore  | James Potter       | 2 | 5 | 1852
2     | Albus Dumbledore  | Hermione Granger   | 2 | 5 | 1700
3     | Draco Malfoy      | Fred Weasley       | 2 | 5 | 2000
4     | Fred Weasley      | Lord Voldemort     | 2 | 5 | 2000
5     | George Weasley    | Fred Weasley       | 2 | 5 | 2000
6     | George Weasley    | Albus Dumbledore   | 2 | 5 | 2000
7     | Ginny Weasley     | Cho Chang          | 2 | 5 | 2000
8     | Ginny Weasley     | George Weasley     | 2 | 5 | 2000
9     | Harry Potter      | James Potter       | 1 | 5 | 2000
10    | Harry Potter      | Ginny Weasley      | 2 | 5 | 2000
11    | Harry Potter      | Lord Voldemort     | 2 | 5 | 2000
12    | Harry Potter      | Hermione Granger   | 1 | 5 | 2000
13    | Harry Potter      | Sirius Black       | 2 | 5 | 2000
14    | James Potter      | Severus Snape      | 2 | 5 | 2000
15    | James Potter      | Lucius Malfoy      | 3 | 5 | 2000
16    | James Potter      | Lord Voldemort     | 2 | 5 | 2000
17    | James Potter      | Lucius Malfoy      | 3 | 5 | 2000
18    | Lily Evans Potter | Neville Longbottom | 2 | 5 | 2000
19    | Lord Voldemort    | James Potter       | 2 | 5 | 1856
20    | Lord Voldemort    | Ginny Weasley      | 2 | 5 | 1827
21    | Luna Lovegood     | Lucius Malfoy      | 4 | 6 | 1603
22    | Luna Lovegood     | Sirius Black       | 4 | 6 | 2000
23    | Luna Lovegood     | Fred Weasley       | 2 | 5 | 1291
24    | Remus Lupin       | James Potter       | 2 | 5 | 2000
25    | Ronald Weasley    | Cho Chang          | 2 | 5 | 1912
26    | Ronald Weasley    | Ginny Weasley      | 2 | 5 | 1832
27    | Severus Snape     | Ginny Weasley      | 2 | 5 | 1650
28    | Sirius Black      | Remus Lupin        | 2 | 6 | 1555
29    | Tom Riddle Sr.    | Lily Evans Potter  | 3 | 5 | 502
30    | Tom Riddle Sr.    | Remus Lupin        | 3 | 5 | 400

Table 5.3: Query set and result statistics
5.4.2 Ranking Results

Figure 5.5 includes four screenshots taken from our implementation. It illustrates the ranking results of our method and the baseline for two users on the same query (i.e., searching semantic associations between Ginny Weasley and Cho Chang). Each screenshot shows the top six ranking results from one method for one user. Our method demonstrates a significant advantage in capturing different users' preferences. For User 1, the first six results of our method successfully capture his top six favorite semantic associations, while for the baseline, only two of his ten favorite results, with relatively low ranks, are shown in the screenshot. For User 2, none of her ten favorite results is captured by the baseline, but our method is still able to capture her top two favorite results at Rank 1 and Rank 2.
5.4.3 Comparison 1: Ranking Quality per User

We compare the ranking quality of our method (LtR), the baseline approach and LtR_CA, in terms of efficiency and user preference capture for each individual user, based on the following metrics.

5.4.3.1 Time complexity

The training process takes only a few seconds, and it is performed only once, i.e., once the personalized ranking function is learned, it performs as fast as a weighted-sum function for all subsequent queries. In the testing phase, for all three methods, most of the time is consumed by path finding, which takes a couple of seconds. The additional overhead of feature analysis and ranking is very small.
5.4.3.2 Cumulative loss ratio

r_loss = (1/N_L) · L = (1/N_L) ∑_{(u,v)} L(h*, x_u, x_v, y_{u,v}) = (1/N_L) ∑_{(u,v)} |h*(x_u, x_v) − y_{u,v}| / 2,    (5.12)

where L is the cumulative loss of a user's ranking function, and N_L denotes the number of all comparable pairs. In particular, L counts the number of swapped pairs between the ground truths (user-assigned ranks) and the ranks produced by the ranking function, over all the test queries of the user. Therefore the cumulative loss ratio r_loss can be regarded as the false alarm rate of the linear classifier. We use r_loss to measure the quality of user preference capture: the smaller r_loss is, the better a ranking function is at capturing user preferences.
Figure 5.5: Screenshots of ranking results from our implementation. The number "#i" ahead of each result is the rank produced by the corresponding ranking approach. URank denotes the user-assigned rank. According to our experiment setup, only the user's 10 most favorite results have URank values.
Figure 5.6: Cumulative loss ratio for each of the 20 users. This figure indicates that our approach (LtR) and LtR_CA always perform better than the baseline in capturing user preferences.
Figure 5.6 illustrates the cumulative loss ratio of all three methods for each user. The loss ratios of both LtR and LtR_CA are always better than that of the baseline approach. In general, as shown in Table 5.4, the average loss ratio of LtR is better than that of LtR_CA. However, we find that LtR is not always better than LtR_CA for all users. This is because some users are not interested in our exclusive features such as property complexity; instead, they are more interested in the features our method shares with [6], e.g., context features. For these users, both methods show similar loss ratios because the same learning-to-rank algorithm is applied.
5.4.3.3 Total rank of a user's top-10
S = \sum_{s \in U} rank(s)    (5.13)
where U denotes the set containing a user's top-10 favorite semantic association search results for a given query. This measurement is the number of results a user needs to examine in order to retrieve all of his 10 favorite results. It assesses the effectiveness of a ranking approach in identifying the results a user is most interested in. The lower S is, the more effective the ranking function is.
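As a small illustration (with hypothetical names), S can be computed by summing the positions at which the user's favorites appear in the ranked list:

```python
def total_rank_of_top10(ranked_results, favorites):
    """Sum of the 1-based positions at which a user's favorite results
    appear in the ranked list; `favorites` is the set U of his top-10."""
    return sum(pos for pos, r in enumerate(ranked_results, start=1)
               if r in favorites)
```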
Figure 5.7 illustrates the total rank of the 20 individual users' top-10 favorite results. Our method is always more effective than the baseline in capturing a user's top-10 favorite results, and demonstrates better effectiveness than LtR_CA for most users.
5.4.4 Comparison 2: Overall Ranking Quality
In addition to the evaluation on an individual-user basis, we also evaluate the overall ranking quality of the three methods. We adopt two standard information retrieval metrics: precision@k and nDCG_k [35]. In order to make a quantitative evaluation based on these typical information retrieval metrics, we randomly create 20 additional queries. Each user is asked to judge the top 10 records that each of the three methods generated for him, by explicitly assigning each record a score on a six-point scale ranging from 0 (Bad) to 5 (Perfect). A record is considered relevant with a label of 3 (Good) or better, and non-relevant otherwise. We collect 12,000 records with user judgments, and evaluate the three methods using the following metrics:

Precision@k refers to the ratio of records ranked in the top k results that are labeled as relevant.
Figure 5.7: Total rank of each user's top-10 favorite results (x-axis: user ID; series: Baseline, LtR_CA, LtR): before having seen all of his 10 most favorite results, the number of results a user needs to examine with our method (LtR) is far less than with the baseline.
nDCG_k, the normalized discounted cumulative gain at position k, is

nDCG_k = M_k \sum_{i=1}^{k} \frac{2^{r_i} - 1}{\log_2(1+i)}    (5.14)
where M_k is a normalization factor that ensures the nDCG_k of a perfect ordering is 1. nDCG_k is designed specifically for evaluating ranking results: it rewards relevant records ranked at the top more heavily than those ranked lower [3].
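A minimal sketch of both evaluation metrics follows; the names are ours, and `labels` is assumed to hold the user-assigned 0-5 scores of the returned records in ranked order.

```python
import math

def precision_at_k(labels, k):
    """Fraction of the top-k records labeled relevant (score >= 3)."""
    return sum(1 for r in labels[:k] if r >= 3) / k

def ndcg_at_k(labels, k):
    """Normalized discounted cumulative gain at position k."""
    dcg = sum((2 ** r - 1) / math.log2(1 + i)
              for i, r in enumerate(labels[:k], start=1))
    # M_k is the reciprocal of the DCG of a perfect (descending) ordering
    ideal = sorted(labels, reverse=True)[:k]
    idcg = sum((2 ** r - 1) / math.log2(1 + i)
               for i, r in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0
```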
Figure 5.8 and Figure 5.9 show that among the three methods, our approach (LtR) always has the best precision and nDCG for the top-10 ranking results.
In addition, we compute the averages of the cumulative loss ratio and of the total rank of a user's top-10, as two further metrics to assess the overall ranking quality. Table 5.4 contains the evaluation results for all four metrics, showing that our method has the best overall ranking quality under all of them.
Figure 5.8: Precision@k for all three methods (x-axis: k from 1 to 10; series: Baseline, LtR_CA, LtR).
Figure 5.9: nDCG_k for all three methods (x-axis: k from 1 to 10; series: Baseline, LtR_CA, LtR).
                   Avg. loss ratio   Avg. total rank@10   Precision@10   nDCG_10
Baseline               28.91%              2807              32.25%       36.12%
LtR_CA                 17.92%               550              67.17%       75.73%
LtR (our method)       15.44%               336              77.42%       87.17%

Table 5.4: Average cumulative loss ratio, average total rank of users' top-10, precision@10, and nDCG_10. Our method outperforms the other two methods under all four metrics.
Chapter 6
Rankbox: Adaptively Mining Semantic Relationships Using User Feedback
A learning-to-rank method for semantic relationship mining was proposed in Chapter 5. The method automatically captures a user's preferences by asking the user to assign ranks to his favorite search results. However, it has two limitations. First, user labeling is a tedious and time-consuming task: a user has to examine thousands of results during the labeling process. Second, the ranking function cannot be improved once it is learned from the user-labeled results, which may cause unsatisfactory ranking results if the training data is insufficient to cover all preferences of the user.
To address the above issues and challenges, this chapter presents Rankbox, an adaptive ranking system for mining semantic associations with three key features:

• Personalized: in Rankbox, each user has his own ranking function that represents his specific preferences.

• Improvable: Rankbox supports a refinement mechanism, allowing users to continuously teach the system their preferences.

• End-user friendly: users can access Rankbox from any digital device with a web browser, and conveniently interact with the system through just a few clicks. Users no longer have to deal with manual parameter tuning or tedious labeling.
We make an important observation: users' opinions about search results can serve as a valuable source of their preferences. Based on this observation, our system allows each user to provide simple feedback about the current search results, learns the user's preferences from this feedback, and incorporates these preferences into a user-specific ranking function. In particular, the front end of our system is a web graphical user interface (GUI) that presents semantic association search results to the user, ranked by his current ranking function; meanwhile, it allows the user to send his subjective opinion on each result (e.g., like or dislike) back to the system with a few clicks on the web page. The back end of our system then automatically interprets the user's feedback and employs an adaptive learning-to-rank algorithm to continuously refine his ranking function. In general, the interaction between Rankbox and users iterates through a "searching-ranking-feedback-refinement" cycle. During our user study, most users were satisfied with the ranking results after a few iterations, which usually take only 20 to 50 clicks. The users' burden is significantly reduced compared to examining thousands of results during the labeling process of Chapter 5.
6.1 System Overview
Figure 6.1 illustrates an overview of our system. As the system front end, the user module provides a web GUI that supports three types of user interaction with the system: initiating a semantic association search query through the query interface, getting ranked search results, and sending feedback about the current search results back to the system. The query module is part of the system back end: given a semantic association query from the user, the semantic association query processor retrieves results from the semantic knowledge base. The core of our system is the adaptive ranking module, which analyzes the feedback from the user and incorporates it into his ranking function. In particular, given a set of query results, the user's current ranking function is employed to sort the results.
Figure 6.1: Our adaptive ranking system for semantic association search. The user module (web GUI with a query interface) sends queries to the query module (a query processor over the semantic data) and feedback (like/dislike) to the adaptive ranking module, where a feedback analyzer generates new training data and a learning-to-rank algorithm produces a refined ranking function that replaces the current one. Components in bold text are specific to each user.
If the user is not satisfied with the sorted results, he can provide feedback to inform the system of what he likes or dislikes. The feedback is parsed by a feedback analyzer and added as new training data for the user. A pointwise learning-to-rank algorithm is employed to learn the user's preferences from both the new and the past training data. A refined ranking function is then produced according to the user's current preferences, and replaces the user's current ranking function. The user can continuously provide feedback to improve the quality of his ranking function over several iterations. System components that incorporate user-specific information (e.g., user preferences, user feedback) are highlighted with bold text in Figure 6.1.
6.2 Semantic Association Search
Semantic associations were first introduced in [5] as sequences of consecutive properties that link two resource entities in the RDF data model [49]. Semantic associations describe the existence and the meaning of relations between entities.
1  Harry Potter --married_to--> Ginny Weasley
2  Harry Potter --has_child--> Lily Luna Potter --has_parent--> Ginny Weasley
3  Harry Potter --is_founder_of--> Dumbledore's Army --has_member--> Ginny Weasley
4  Harry Potter --has_parent--> James Potter --education--> Hogwarts School --student_graduate--> Ginny Weasley

Table 6.1: Example results of a semantic association search between the two fictional characters Harry Potter and Ginny Weasley
As many researchers have observed, the discovery of semantic associations is key to information retrieval on the Semantic Web.
A semantic association search takes an entity node pair q = (e_i, e_j) as the input query, and aims to find paths in the RDF graph that connect e_i and e_j through labeled edges. The result set is an unordered collection of semantic associations, denoted as A(q). Table 6.1 shows a few example results for the query (HarryPotter, GinnyWeasley), in an RDF knowledge base we created from the Freebase data [31]. Note that the result set of a given semantic association query is solely determined by the entity pair and the RDF data model, and thus is identical for different users.

In our system, any state-of-the-art semantic association search algorithm (e.g., ρ-queries [8] and SPARQLeR [39]) can serve as the query processor. For efficiency, we adopt a depth-limited search algorithm to find the result set A(q) of a given query q under a certain path length restriction.
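To illustrate, a depth-limited path search over an in-memory adjacency list might look like the following sketch; the data layout and names are our assumptions, not the system's actual query processor.

```python
def depth_limited_paths(graph, source, target, max_len):
    """Find all acyclic paths from `source` to `target` with at most
    `max_len` edges. `graph` maps each entity to a list of
    (property_label, neighbor) pairs; a path is returned as a tuple of
    alternating entities and property labels."""
    results = []

    def dfs(node, path, visited):
        if node == target:
            results.append(tuple(path))
            return
        if (len(path) - 1) // 2 >= max_len:   # edges used so far
            return
        for prop, nxt in graph.get(node, []):
            if nxt not in visited:            # keep paths acyclic
                visited.add(nxt)
                path += [prop, nxt]
                dfs(nxt, path, visited)
                del path[-2:]
                visited.remove(nxt)

    dfs(source, [source], {source})
    return results
```

For the query (HarryPotter, GinnyWeasley) with max_len = 4, such a search would enumerate associations like those in Table 6.1.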
6.3 Ranking Search Results
In a large and complicated knowledge base such as Freebase, a semantic association search can return thousands of records even with a strict path length restriction (e.g., a maximum length of 10). Therefore, an effective ranking method is required to sort the search results
according to their relevance with respect to a user's preferences. To formulate the ranking problem, we introduce a ranking function f(a), a ∈ A(q), which determines a relevance score for each search result. By sorting these relevance scores over the entire search result set A(q), more relevant results are shown to the user first.

In particular, we calculate a set of features for each semantic association a ∈ A(q), and denote the feature vector as x(a). The ranking function is defined as the scalar product of the feature vector x(a) and a user-specific weight vector w_u:

f_u(a) = x(a) \cdot w_u    (6.1)

where w_u is a vector with the same length as x(a), and the subscript u denotes that the ranking function f_u together with the weight vector w_u specifically reflect user u's preferences. The semantic association feature vector x(a) and the user-specific weight vector w_u are detailed individually in the following subsections.
6.3.1 Semantic association features
An appropriate feature vector x(a) consists of a series of real numbers that characterize a semantic association from various perspectives, e.g., length, contexts, and popularity. These features are calculated automatically for each semantic association in the search result set. The calculation is performed in the back end of our system, and thus is completely invisible to the user. In general, any feature set (e.g., those proposed in previous research such as [5]) can be adopted by our system. Our Rankbox system currently uses the following features to characterize each semantic association search result:

Association length l(a) is the number of entities contained in a semantic association a.
Figure 6.2: A simple RDF model with two topic regions at the schema level. The topic region "character" contains the schema-level classes Fictional Character and Occupation; the region "education" contains School and School Type. The instances Harry Potter, Student, and Hogwarts School are connected through the properties has_occupation, education, and type.
Topic features quantitatively characterize the topics covered by the entities of a semantic association. In RDF models, schema-level classes and properties can be categorized into several topic regions based on the knowledge domain they describe. Thus, the topic of an entity is determined by the topic region of its corresponding schema-level class. For example, Figure 6.2 shows a simple RDF model with two topic regions at the schema level. Based on this region division, the topics of the entities Harry Potter and Student are determined as character, while the entity Hogwarts School is about the topic of education. We define the topic feature of semantic association a regarding topic i as

t_i(a) = \frac{|E_i|}{l(a)},

where E_i is the set of a's entities that cover topic i.
The feature vector x(a) of semantic association a is then defined as

x(a) = (l(a), t_1(a), t_2(a), \ldots, t_n(a))    (6.2)

where n is the number of topics.
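As an illustration of the feature computation (the helper names and the entity-to-topic mapping are our assumptions), x(a) could be built as follows:

```python
def feature_vector(entities, topic_of, topics):
    """Build x(a) = (l(a), t_1(a), ..., t_n(a)) for one semantic association.

    entities: the entities along the association, in order
    topic_of: maps an entity to the topic region of its schema-level class
    topics:   ordered list of the n topic regions
    """
    length = len(entities)                    # l(a): number of entities
    x = [float(length)]
    for topic in topics:
        covered = sum(1 for e in entities if topic_of[e] == topic)
        x.append(covered / length)            # t_i(a) = |E_i| / l(a)
    return x
```

The relevance score of equation (6.1) is then just the dot product of this vector with the user-specific weight vector w_u.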
6.3.2 Learning w_u using LDA

The weight vector w_u in user u's ranking function f_u represents his preferences on the semantic association features. To automatically capture w_u, our system collects two sets of feedback D_u^+ and D_u^- from user u, which contain the semantic associations liked and disliked by him, respectively. By representing each semantic association in D_u^+ and D_u^- by its feature vector, we get a set of positive samples X_u^+ and a set of negative samples X_u^- in the k-dimensional feature space (k is the number of features in a feature vector):

X_u^+ = { x(a) | a ∈ D_u^+ }
X_u^- = { x(a) | a ∈ D_u^- }

We then employ Linear Discriminant Analysis (LDA) [28] to learn w_u from X_u^+ and X_u^-.
LDA is a supervised machine learning algorithm for classifying two or more groups of objects. Intuitively, LDA seeks to maximize the separation between groups, while preserving as much of the group discriminatory information as possible. Given X_u^+ and X_u^- as training data, LDA separates the positive samples from the negative samples by a discriminant hyperplane. After being projected onto the normal of the discriminant hyperplane, samples from the same group lie very close to each other, while at the same time the projection means are as far apart as possible. Figure 6.3 illustrates this process in a 2-dimensional feature space.
Therefore, the projection of a semantic association feature vector onto the normal of the discriminant hyperplane can yield a quantitative preference value in terms of whether user u likes or dislikes the corresponding semantic association.
Figure 6.3: LDA maximizes the distance between the projection means while at the same time minimizing the scatter within each set (illustrated in a 2-dimensional feature space with user-liked positive examples, user-disliked negative examples, the discriminant hyperplane, and its normal w_u).
We thus use the normal of the discriminant hyperplane as w_u. According to Fisher's linear discriminant, w_u can be estimated as

w_u = (\Sigma_u^+ + \Sigma_u^-)^{-1} (\mu_u^+ - \mu_u^-),

where \mu_u^+, \mu_u^-, \Sigma_u^+, and \Sigma_u^- are the means and the covariances of X_u^+ and X_u^-, respectively.
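A minimal NumPy sketch of this estimate follows; the function name and array layout (one feature vector per row) are our assumptions, and the small ridge term is an added safeguard for the case where the feedback sets are still tiny.

```python
import numpy as np

def learn_weight_vector(X_pos, X_neg):
    """Estimate w_u by Fisher's linear discriminant:
    w_u = (Sigma+ + Sigma-)^(-1) (mu+ - mu-)."""
    mu_pos, mu_neg = X_pos.mean(axis=0), X_neg.mean(axis=0)
    # rowvar=False: rows are samples, columns are features
    sigma_pos = np.cov(X_pos, rowvar=False)
    sigma_neg = np.cov(X_neg, rowvar=False)
    # a ridge term keeps the pooled covariance invertible with few samples
    pooled = sigma_pos + sigma_neg + 1e-6 * np.eye(X_pos.shape[1])
    return np.linalg.solve(pooled, mu_pos - mu_neg)
```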
6.4 Adaptively Learning User Preferences
In this section we detail how to collect user feedback and how to adapt the user's ranking function to his preferences based on that feedback. In particular, we develop an interactive learning-to-rank algorithm which iterates through a "searching-ranking-feedback-refining" cycle, as shown in Algorithm 2.
Algorithm 2: Interactive LtR for user u

Initialization: w_u ← w_default; D_u^+ ← ∅; D_u^- ← ∅
Data: an RDF knowledge base M_rdf
foreach query q from user u do
    find A(q) by DepthLimitedSearch(M_rdf, q)
    for a ∈ A(q) do
        calculate feature vector x(a)
        f_u(a) ← x(a) · w_u
    sort A(q) by f_u(a)
    get u's feedback F_u^+ (results liked by u) and F_u^- (results disliked by u)
    if F_u^+ ≠ ∅ or F_u^- ≠ ∅ then
        D_u^+ ← D_u^+ ∪ F_u^+
        D_u^- ← D_u^- ∪ F_u^-
        apply LDA on (D_u^+, D_u^-) to learn a new w_u
In the searching and ranking phases, given a semantic association query q from user u, a set of search results A(q) is retrieved from the RDF knowledge base using depth-limited search, and then sorted by the user's current ranking function f_u. If it is the first time the user uses our system, a default weight vector w_default is adopted to construct f_u.
In the feedback phase, if user u thinks the ordering of A(q) produced by his current f_u needs improvement, our system allows him to give two types of simple and intuitive comments on each semantic association search result: like and dislike. The system maintains two feedback sets D_u^+ and D_u^- for user u. These sets cumulatively collect all the results liked or disliked by user u from both his past and present feedback phases.
Each result commented on by user u in the present feedback phase is added to either the positive feedback set D_u^+ or the negative feedback set D_u^-, based on the comment type (liked or disliked).
In the refining phase, the goal is to refine user u's current ranking function f_u based on the updated feedback sets D_u^+ and D_u^-. Thus, Linear Discriminant Analysis (LDA) is employed to learn a new weight vector w_u from the updated D_u^+ and D_u^-. The ranking function f_u is updated with the new w_u, and is used to rank the results of subsequent queries from user u. A user can continuously refine his ranking function by going through the "searching-ranking-feedback-refining" cycle for many iterations, until he is satisfied with the ranking of the search results.
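Putting the pieces together, one pass of this cycle could be sketched as follows, reusing the illustrative helpers from the earlier sketches (depth_limited_paths, feature_vector, learn_weight_vector); all names are ours, and `get_feedback` stands in for the web GUI interaction.

```python
import numpy as np

def search_rank_feedback_refine(graph, query, w_u, D_pos, D_neg,
                                topic_of, topics, get_feedback, max_len=10):
    """One searching-ranking-feedback-refining pass for user u.
    D_pos / D_neg are the accumulated feedback sets; associations are
    tuples of alternating entities and properties, so a[0::2] picks out
    the entities."""
    e_i, e_j = query
    results = depth_limited_paths(graph, e_i, e_j, max_len)
    # rank by f_u(a) = x(a) . w_u, most relevant first
    ranked = sorted(results, key=lambda a: -np.dot(
        feature_vector(a[0::2], topic_of, topics), w_u))
    liked, disliked = get_feedback(ranked)    # like/dislike clicks
    if liked or disliked:
        D_pos |= liked                        # accumulate positive feedback
        D_neg |= disliked                     # accumulate negative feedback
        X_pos = np.array([feature_vector(a[0::2], topic_of, topics)
                          for a in D_pos])
        X_neg = np.array([feature_vector(a[0::2], topic_of, topics)
                          for a in D_neg])
        w_u = learn_weight_vector(X_pos, X_neg)   # refined ranking function
    return ranked, w_u
```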
6.5 System Implementation
Our Rankbox system is developed and deployed on a Tomcat 7 web server. Rankbox can be easily accessed from any digital device with a modern web browser; no software installation is required on the client side. On the server side, the system maintains a semantic association query processor, an RDF knowledge base, and user profiles. Each user profile consists of the user's ranking function and all the historical feedback from the user.
The interactions between users and the system go through the friendly web GUI illustrated in Figure 6.4. The major components of this GUI include: (A) a login bar: a user must log in to use and update his profile; (B) a search bar to start a semantic association query; (C) search results; (D) result ranks produced by the user's current ranking function; (E) like/dislike buttons to comment on each result; (F) a button to submit user feedback to the system; (G) a configuration bar to view and change current GUI settings, e.g., to turn the feedback mode on or off; (H) shortcuts to start the user's past search queries.
6.6 Experimental Results
6.6.1 Experimental setup
Dataset: Our dataset is an RDF knowledge base we created from the fictional universe domain of the Freebase linked open data [31]. The RDF knowledge base covers information about all types of fictional works, especially the characters and organizations that appear in them. In particular, our dataset contains 36 topics, 340K instances, and 590K properties. The schema of our RDF knowledge base is available at [1].
Ranking methods compared: We compare Rankbox with two state-of-the-art semantic association ranking methods: SemDis and SVMLtR. SemDis refers to the approach proposed in [6], which requires users to manually configure the weights of several ranking metrics; we adopt the recommended weight configuration of its official implementation (the SemDis project [2]). SVMLtR is our SVM-based learning-to-rank method from Chapter 5, which requires users to examine thousands of results and assign ranks to their 10 favorite results.
Users and test queries: We invited 20 human users to participate in our user study. Given that the fiction Harry Potter and its characters are known to most of our participants, we designed 20 test queries between the characters in Harry Potter (Table 6.2) for each user to go through.
Figure 6.4: The web graphical user interface of Rankbox, with components (A) through (H) labeled as described above.
Query   Entity 1                     Entity 2
1       Ronald Weasley               Cho Chang
2       Professor Albus Dumbledore   James Potter
3       Lily Evans Potter            Neville Longbottom
4       Tom Riddle Sr.               Lily Evans Potter
5       Sirius Black                 Remus Lupin
6       Ginny Weasley                George Weasley
7       Professor Severus Snape      Ginny Weasley
8       Fred Weasley                 Lord Voldemort
9       James Potter                 Lucius Malfoy
10      Lord Voldemort               James Potter
11      Luna Lovegood                Fred Weasley
12      Draco Malfoy                 Fred Weasley
13      Tom Riddle Sr.               Remus Lupin
14      Lord Voldemort               Ginny Weasley
15      Professor Albus Dumbledore   Hermione Granger
16      Ronald Weasley               Ginny Weasley
17      James Potter                 Lord Voldemort
18      George Weasley               Fred Weasley
19      James Potter                 Lucius Malfoy
20      George Weasley               Professor Albus Dumbledore

Table 6.2: Test queries
6.6.2 Evaluations
Time complexity: The most time-consuming operation in our system is semantic association search, which usually takes 15 to 20 seconds to retrieve 500 results per query. The feature calculation, LDA, and sorting all complete within a couple of seconds. Experimental results are measured with the Rankbox server deployed on a computer with an Intel i7 CPU at 1.60 GHz and 6 GB of memory.
Qualitative results: Figure 6.5 and Figure 6.6 show that ranking quality can be significantly improved after incorporating user feedback on only 8 results.
Figure 6.5: Iteration 1: a user who is interested in family and romantic relationships selects some search results and gives her feedback (user feedback is shown in blue).
Figure 6.6: Iteration 2: the user's preference has been reflected in the ranking results.
User study: To quantitatively evaluate our system, we ask each user to interact with the system for 20 iterations. In the i-th iteration, each user is instructed to give feedback on at most 10 results. In addition, he is asked to judge the relevance of the top-10 results by labeling each as relevant or irrelevant. These labels are used to evaluate the precision of our system. We employ the following two evaluation metrics:

Precision@10 refers to the ratio of records ranked in the top 10 results that are labeled as relevant.

Number of active users@query k is the number of users who give feedback at query k, i.e., in the k-th iteration.
Figure 6.7 and Figure 6.8 illustrate the evaluation results under the above metrics. Compared to the other two approaches, Rankbox is the only method that can continuously improve the ranking precision through iterations. After 7 iterations, Rankbox achieves the best precision@10 among all three methods. In addition, as the ranking quality increases, more and more users stop giving feedback to the system. They only become active again when they feel that some ranking results are "not good"; such results usually come from a "difficult query" such as Queries 7, 10, and 12.
Figure 6.7: Precision@10 per iteration (x-axis: iteration / query id from 1 to 20; series: SemDis, SVM LtR, Rankbox).
Figure 6.8: The number of active users per iteration (x-axis: iteration / query id; y-axis: number of active users, 0 to 20).
Chapter 7
Conclusion and Future Work

7.1 Conclusion
This thesis studied the semantic relationships between data objects from two aspects. First, we observed that the relations between objects in an image carry important semantics. With the help of a guide schema ontology and object detectors, our system can automatically understand the semantic meaning of objects and their relations. Second, we worked on the semantic relationship mining problem on a large-scale ontology with both schema and instances. We proposed two learning-to-rank based methods to provide personalized semantic relationship search results for each user.
Chapter 3 presented the Object Relation Network (ORN) to carry semantic information for web images. By solving an optimization problem, the ORN is automatically created from a graphical model to represent the most probable and informative ontological class assignments for the objects detected in the image and their relations, while maintaining semantic consistency.
Benefiting from the strong semantic expressiveness of ORN, Chapter 4 proposed automatic solutions for four typical yet challenging image understanding problems. Our experiments showed the effectiveness and robustness of our system. In particular, for the fourth application, we presented a hierarchical image clustering method that groups semantically similar images into the same cluster. We proposed a bag-of-semantics model to describe the semantic features of images. Viewed through a series of coarse-to-fine lenses, images with the same bag-of-semantics under a certain lens are clustered in a top-down hierarchical manner. Our method allows each user to control the clustering process while browsing, and dynamically adjusts the clustering result according to his purpose.
Chapter 5 presented a learning-to-rank method to rank semantic association search results. We use a feature vector to characterize each semantic association. A personalized ranking function is automatically created for each user by learning his preferences on these features. We evaluated the per-user and the overall ranking quality of our method under various qualitative and quantitative metrics using a real-world data set. Compared with the state-of-the-art, our method demonstrated significant advantages in terms of precision and effectiveness.
To improve on the method of Chapter 5, Chapter 6 proposed Rankbox, an adaptive ranking system for mining semantic associations from large-scale semantic data. The core of our system is an interactive learning-to-rank algorithm, which automatically captures a user's preferences from his feedback about the search results. A user can continuously improve the ranking quality by sending more feedback to the system. Our Rankbox system can be accessed from any device with a web browser. The evaluation results demonstrated the effectiveness and advantages of our system.
7.2 Future work
Based on the work presented in this thesis, we propose the following possible future directions.
Improving our image understanding system

With the fast advances in object detection techniques, more and more high-quality object detectors have become available, such as detectors for cars, plants, cats, dogs, aeroplanes, and televisions. After adopting a larger ontology with objects and relations for new categories, how to make our current system adaptable to these new categories is a challenging problem. It is also worth exploring other segmentation tools, to incorporate background information (e.g., grass, sky, sea, ground) into our system.
The machine learning component in our system can be further improved. For example, the visual feature based energy is currently computed by means of a binary SVM for each subcategory in combination with a recalibration to likelihoods. It would be interesting to see whether other techniques, such as a simple multiclass logistic regression, would yield better results.
Our idea was to add semantic structures to image object detection. The same idea could also be applied to improve the quality of other computer vision tasks, such as video object detection, segmentation, and even biological image interpretation.
Another application of the Object Relation Network: image search by keywords

We demonstrated image search by image as one application of ORN. In fact, ORN can also be applied to traditional keyword-based image search: because we can automatically generate tags for an image based on its ORN, these tags can be used as keywords for the image. This method would be particularly useful for searching raw images and images with poor annotations or irrelevant surrounding text.
Improving our semantic relationship mining system

The learning-to-rank framework we proposed in this thesis can take other machine learning algorithms as its learning component, such as decision trees and neural networks. It would be interesting to study which algorithms perform better on which feature sets. In addition, it is worth considering implicit user feedback to improve ranking quality, such as monitoring a user's browsing history or measuring the time a user spends on a particular result.
Bibliography
[1] Freebase schema. http://www.freebase.com/schema/fictional_universe.
[2] SemDis project. http://lsdis.cs.uga.edu:8080/rankingah/, 2012.
[3] E. Agichtein, E. Brill, and S. Dumais. Improving web search ranking by incorporating user behavior information. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2006.
[4] H. Alani, C. Brewster, and N. Shadbolt. Ranking ontologies with AKTiveRank. In The Semantic Web - ISWC 2006, 2006.
[5] B. Aleman-Meza, C. Halaschek, I. Arpinar, and A. Sheth. Context-aware semantic association ranking. In SWDB'03, pages 33-50, Berlin, Germany, 2003.
[6] B. Aleman-Meza, C. Halaschek-Wiener, I. B. Arpinar, C. Ramakrishnan, and A. Sheth. Ranking complex relationships on the semantic web. IEEE Internet Computing, 2005.
[7] K. Anyanwu, A. Maduko, and A. Sheth. SemRank: Ranking complex relationship search results on the semantic web. In the 14th International World Wide Web Conference, 2005.
[8] K. Anyanwu and A. Sheth. ρ-Queries: Enabling querying for semantic associations on the semantic web. In the 12th International World Wide Web Conference, 2003.
[9] M. Ayer, H. Brunk, G. Ewing, W. Reid, and E. Silverman. An empirical distribution function for sampling with incomplete information. Annals of Mathematical Statistics, 1955.
[10] T. Berners-Lee, J. Hendler, and O. Lassila. The semantic web. Scientific American, 284(5):34-43, 2001.
[11] J. Bi, Y. Chen, and J. Z. Wang. A sparse support vector machine approach to region-based image categorization. In CVPR, 2005.
[12] A. Biswas and D. Jacobs. Active image clustering: Seeking constraints from humans to complement algorithms. In CVPR, 2012.
[13] A. Bosch, A. Zisserman, and X. Muñoz. Scene classification via pLSA. In ECCV, 2006.
[14] G. Bradski and A. Kaehler. Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly, Cambridge, MA, 2008.
[15] D. Cai, X. He, Z. Li, W.-Y. Ma, and J.-R. Wen. Hierarchical clustering of WWW image search results using visual, textual and link information. In ACM Multimedia, 2004.
[16] Y. Chen and J. Z. Wang. Image categorization by learning and reasoning with regions. J. Mach. Learn. Res., 2004.
[17] Y. Chen, J. Z. Wang, and R. Krovetz. CLUE: Cluster-based retrieval of images by unsupervised learning. IEEE Transactions on Image Processing, 2003.
[18] C. Cortes and V. N. Vapnik. Support-vector networks. Machine Learning Journal, 1995.
[19] K. Crammer and Y. Singer. Pranking with ranking. In Advances in Neural Information Processing Systems 14, pages 641-647, 2001.
[20] R. Datta, W. Ge, J. Li, and J. Wang. Toward bridging the annotation-retrieval gap in image search. IEEE Multimedia, 2007.
[21] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
[22] L. Ding, R. Pan, T. Finin, A. Joshi, Y. Peng, and P. Kolari. Finding and ranking knowledge on the semantic web. In The Semantic Web - ISWC 2005, 2005.
[23] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2011 (VOC2011) Results. http://www.pascal-network.org/challenges/VOC/voc2011/workshop/index.html.
[24] J. Fan, Y. Gao, and H. Luo. Integrating concept ontology and multitask learning to achieve more effective classifier training for multilevel image annotation. IEEE Transactions on Image Processing, 2008.
[25] C. Fellbaum. WordNet: An Electronic Lexical Database. The MIT Press, Cambridge, MA; London, 1998.
[26] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part based models. IEEE TPAMI, 32(9), 2010.
[27] P. F. Felzenszwalb, R. B. Girshick, and D. McAllester. Discriminatively trained deformable part models, release 4. http://www.cs.brown.edu/~pff/latent-release4/.
[28] D. A. Forsyth and J. Ponce. Computer Vision: A Modern Approach. Prentice Hall, 1999.
[29] B. Gao, T.-Y. Liu, T. Qin, X. Zheng, Q.-S. Cheng, and W.-Y. Ma. Web image clustering by consistent utilization of visual features and surrounding texts. In ACM Multimedia, 2005.
[30] J. C. Gemert, J.-M. Geusebroek, C. J. Veenman, and A. W. Smeulders. Kernel codebooks for scene categorization. In ECCV, 2008.
[31] Google. Freebase data dumps. http://download.freebase.com/datadumps/, 2012.
[32] S. Gordon, H. Greenspan, and J. Goldberger. Applying the information bottleneck principle to unsupervised clustering of discrete and continuous image representations. In ICCV, 2003.
[33] A. Harth, S. Kinsella, and S. Decker. Using naming authority to rank data and ontologies for web search. In The Semantic Web - ISWC 2009, 2009.
[34] X. He, R. S. Zemel, and M. A. Carreira-Perpinan. Multiscale conditional random fields for image labeling. In CVPR, 2004.
[35] K. Järvelin and J. Kekäläinen. IR evaluation methods for retrieving highly relevant documents. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2000.
[36] F. Jing, C. Wang, Y. Yao, K. Deng, L. Zhang, and W.-Y. Ma. IGroup: Web image search results clustering. In ACM Multimedia, 2006.
[37] T. Joachims. Optimizing search engines using clickthrough data. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002.
[38] G. Kasneci, F. M. Suchanek, G. Ifrim, M. Ramanath, and G. Weikum. NAGA: Searching and ranking knowledge. In International Conference on Data Engineering, pages 953-962, 2008.
[39] K. J. Kochut and M. Janik. SPARQLeR: Extended SPARQL for semantic association discovery. In 4th European Conference on The Semantic Web: Research and Applications, 2007.
[40] G. Kulkarni, V. Premraj, S. Dhar, S. Li, Y. Choi, A. C. Berg, and T. L. Berg. Baby Talk: Understanding and generating image descriptions. In CVPR, 2011.
[41] L. Ladicky, P. Sturgess, K. Alahari, C. Russell, and P. H. Torr. What, where and how many? Combining object detectors and CRFs. In ECCV, 2010.
[42] F.-F. Li and P. Perona. A Bayesian hierarchical model for learning natural scene categories. In CVPR, 2005.
[43] J. Li and J. Z. Wang. Real-time computerized annotation of pictures. In Proceedings of the 14th Annual ACM International Conference on Multimedia, 2006.
[44] L.-J. Li, R. Socher, and L. Fei-Fei. Towards total scene understanding: Classification, annotation and segmentation in an automatic framework. In CVPR, 2009.
[45] D. Liu, X.-S. Hua, L. Yang, M. Wang, and H.-J. Zhang. Tag ranking. In WWW, 2009.
[46] T.-Y. Liu. Learning to rank for information retrieval. Found. Trends Inf. Retr., 2009.
[47] Y. Liu, X. Chen, C. Zhang, and A. Sprague. Semantic clustering for region-based image retrieval. J. Vis. Commun. Image Represent., 2009.
[48] A. Maedche and S. Staab. Measuring similarity between ontologies. In EKAW, 2002.
[49] F. Manola and E. Miller. RDF primer. W3C Recommendation, 2004.
[50] M. Marszalek and C. Schmid. Semantic hierarchies for visual object recognition. In CVPR, 2007.
[51] B. P. Nguyen, W.-L. Tay, C.-K. Chui, and S.-H. Ong. A clustering-based system to automate transfer function design for medical image visualization. Vis. Comput., 2012.
[52] I. Nwogu, V. Govindaraju, and C. Brown. Syntactic image parsing using ontology and semantic descriptions. In CVPR, 2010.
[53] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, 1999.
[54] G.-J. Qi, C. Aggarwal, and T. Huang. Towards semantic knowledge propagation from text corpus to web images. In WWW, 2011.
[55] A. Rabinovich, A. Vedaldi, C. Galleguillos, E. Wiewiora, and S. Belongie. Objects in context. In ICCV, 2007.
[56] K. Rodden, W. Basalaj, D. Sinclair, and K. Wood. Does organisation by similarity assist image browsing? In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2001.
[57] C. Saathoff and A. Scherp. Unlocking the semantics of multimedia presentations in the web with the multimedia metadata ontology. In WWW, 2010.
[58] B. Schölkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. The MIT Press, Cambridge, MA, 2002.
[59] A. T. G. Schreiber, B. Dubbeldam, J. Wielemaker, and B. Wielinga. Ontology-based photo annotation. IEEE Intelligent Systems, 2001.
[60] B. Sigurbjörnsson and R. van Zwol. Flickr tag recommendation based on collective knowledge. In WWW, 2008.
[61] M. Srikanth, J. Varner, M. Bowden, and D. Moldovan. Exploiting ontologies for automatic image annotation. In SIGIR, 2005.
[62] A. Torralba, K. Murphy, and W. T. Freeman. Using the forest to see the trees: Exploiting context for visual object detection and localization. Commun. ACM, 2010.
[63] Z. Tu, X. Chen, A. Yuille, and S. Zhu. Image parsing: Unifying segmentation, detection, and recognition. IJCV, 2005.
[64] H. Wang, X. Jiang, L.-T. Chia, and A.-H. Tan. Wikipedia2Onto: Building concept ontology automatically, experimenting with web image retrieval. Informatica, 2009.
[65] X.-J. Wang, W.-Y. Ma, L. Zhang, and X. Li. Iteratively clustering web images based on link and attribute reinforcements. In ACM Multimedia, 2005.
[66] J. Weston, S. Bengio, and N. Usunier. Large scale image annotation: Learning to rank with joint word-image embeddings. In European Conference on Machine Learning, 2010.
[67] L. Wu, L. Yang, N. Yu, and X.-S. Hua. Learning to tag. In WWW, 2009.
[68] F. Xia, T.-Y. Liu, J. Wang, W. Zhang, and H. Li. Listwise approach to learning to rank: Theory and algorithm. In Proceedings of the 25th International Conference on Machine Learning, 2008.
[69] B. Zadrozny and C. Elkan. Transforming classifier scores into accurate multiclass probability estimates. In ACM SIGKDD, 2002.
[70] X. Zheng, D. Cai, X. He, W.-Y. Ma, and X. Lin. Locality preserving clustering for image database. In ACM Multimedia, 2004.