ADVANCED KNOWLEDGE GRAPH EMBEDDING TECHNIQUES: THEORY AND APPLICATIONS

by

Xiou Ge

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)

August 2023

Copyright 2023 Xiou Ge
Dedication

To my parents.
Acknowledgements

I am very grateful to my doctoral advisor, Prof. C.-C. Jay Kuo, for giving me the opportunity to collaborate with him when I knew very little about artificial intelligence and machine learning. His support has helped me overcome many hurdles and challenges. The weekly meetings and reports have facilitated my rapid growth and enabled me to track my progress. I am also fortunate to have a group of very friendly fellow students who are always ready to help one another and progress together. I would like to pay special tribute to my collaborators, Joe Yun Cheng Wang and Bin Wang, who have consistently helped me brainstorm ideas and provided genuine advice. Starting with no background in Knowledge Graphs, I would not have been able to achieve so much without the discussions with Joe, Bin, and Prof. Kuo. I have also learned software implementation skills from Max Hong-Shuo Chen. Through collaboration on many side projects, my implementation skills have improved significantly, which accelerated my research progress in the later part of my Ph.D. I would also like to thank Prof. Keith Chugg and Prof. Aiichiro Nakano for serving on my thesis committee. Their constructive questions and suggestions have prompted me to think critically about my work. I extend my gratitude to Prof. Bart Kosko and my TA, Akash Panda, for offering a clear understanding of Probability Theory, which was previously challenging for me to comprehend. The course EE503 at USC has transformed my understanding of Random Variables, and I now grasp why neural networks can be used to model probability distributions. I would also like to thank Prof. Brandon Franzke and Dr. Arash Saifhashemi for introducing me to many industry-standard software development tools and practices through my experience of working as a teaching assistant for them. These experiences have helped me stand out in an extremely competitive job market. I am forever thankful to USC for generously offering me the prestigious Annenberg Fellowship, allowing me access to world-class education and research resources. My admission to USC's Ph.D. program has given me a chance to prove myself and accomplish my dream of becoming an AI expert.

Throughout my entire education in the US, I have been very fortunate to work with many mentors, advisors, and friends who have helped me at different points. I express my gratitude to Prof. Lav Varshney, Prof. Wen-mei Hwu, Prof. Jinjun Xiong, Prof. Jose Schutt-Aine, Prof. Xu Chen, Dr. Richard Goodwin, Tom Savarino, and Alan Porter. Without their strong references and willingness to collaborate, my career would not have advanced so far. I would also like to thank my close friends, Dr. Tinghao Guo, Dr. Thong Nguyen, and Zhichun Wan, for providing me with unreserved emotional support that helped me survive my Ph.D.
Table of Contents

Dedication
Acknowledgements
List of Tables
List of Figures
Abstract

Chapter 1: Introduction
1.1 Significance of the Research
1.2 Contributions of the Research
1.2.1 Compound Operation Embedding
1.2.2 CompoundE3D
1.2.3 Complex Space Regression and Embedding for Entity Type Prediction
1.2.4 Type-Associated Embedding for Knowledge Graph Entity Alignment
1.3 Organization of the Thesis

Chapter 2: Research Background
2.1 Major Knowledge Graphs and Datasets
2.2 Knowledge Graph Embedding
2.2.1 Distance-based Models
2.2.2 Semantic Matching Models
2.2.3 Classifier-based Models
2.2.4 Loss Functions
2.3 CompoundE3D
2.3.1 2D Geometric Transformations
2.3.1.1 Advanced Transformations
2.3.2 Classification-based Models
2.3.2.1 Simple Neural Networks
2.3.2.2 Advanced Neural Networks
2.3.2.3 Lightweight Classification Model
2.3.3 Advanced Relation Modeling
2.3.4 Model Ensembles
2.4 Entity Classification
2.4.1 Statistical Approach
2.4.2 Classification Approach
2.4.3 Embedding Approach
2.5 Entity Alignment
2.5.1 Translational-embedding-based Methods
2.5.2 GNN-based Methods

Chapter 3: CompoundE: Knowledge Graph Embedding with Translation, Rotation and Scaling Compound Operations
3.1 Introduction
3.2 Method
3.2.1 Definition of CompoundE
3.2.2 CompoundE as An Affine Group
3.2.3 Relation with Other Distance-based KGE Models
3.2.4 Properties of CompoundE
3.3 Experiments
3.3.1 Link Prediction
3.3.1.1 Datasets
3.3.1.2 Evaluation Protocol
3.3.1.3 Performance Benchmarking
3.3.1.4 Implementation and Optimal Configurations
3.3.1.5 Hyperparameter Tuning
3.3.2 Path Query Answering
3.3.3 Task 2: KG Entity Typing
3.3.4 Ablation Studies on CompoundE Variants
3.3.5 Performance on Complex Relation Types
3.3.6 Complex Relation Modeling and Histograms of Embedding Values
3.3.7 Performance Comparison for Different Variations of CompoundE
3.3.8 Implementation Details
3.3.9 Comparing CompoundE and STaR
3.4 Conclusion and Future Work

Chapter 4: Knowledge Graph Embedding with 3D Compound Geometric Transformations
4.1 Introduction
4.2 Proposed Method
4.2.1 CompoundE3D
4.2.1.1 Translation
4.2.1.2 Scaling
4.2.1.3 Rotation
4.2.1.4 Reflection
4.2.1.5 Shear
4.2.2 Beam Search for Best CompoundE3D Variant
4.2.3 Model Ensembles
4.2.3.1 Weighted-Distances-Sum (WDS) Strategy
4.2.3.2 Rank Fusion Strategy
4.2.4 Optimization
4.3 Experiments
4.3.1 Experimental Setup
4.3.1.1 Datasets
4.3.1.2 Evaluation Protocol
4.3.1.3 Hyper-parameter Search
4.3.1.4 Other Implementation Details
4.3.2 Experimental Results
4.3.2.1 Performance Evaluation
4.3.2.2 Model Ensembles
4.3.2.3 Effectiveness of Beam Search
4.3.2.4 Modeling of Symmetric Relations
4.3.2.5 Modeling of Multiplicity
4.3.2.6 Modeling of Hierarchical Relations
4.3.2.7 Model Efficiency
4.4 Conclusion and Future Work

Chapter 5: CORE: A Knowledge Graph Entity Type Prediction Method via Complex Space Regression and Embedding
5.1 Introduction
5.2 Proposed CORE Methods
5.2.1 Complex Space KG Embedding
5.2.2 Complex Space Type Embedding
5.2.3 Solving Complex Space Regression
5.2.4 Type Prediction
5.2.5 Optimization
5.2.6 Complexity
5.3 Experiments
5.3.1 Datasets
5.3.2 Hyperparameter Setting
5.3.3 Benchmarking Methods
5.3.3.1 Statistical-based Method
5.3.3.2 Classifier-based Method
5.3.4 Experimental Results
5.4 Conclusion and Future Work

Chapter 6: TypeEA: Type-Associated Embedding for Knowledge Graph Entity Alignment
6.1 Introduction
6.2 The TypeEA Method
6.2.1 Problem Formulation
6.2.2 Type Acquisition
6.2.3 Type Association Embedding
6.2.4 Entity Representation and Alignment
6.2.5 Inference
6.3 Experiments
6.3.1 Datasets
6.3.2 Implementation Details
6.3.3 Results
6.4 Conclusion and Future Work

Chapter 7: Conclusion and Future Work
7.1 Summary of the Research
7.2 Future Research Directions

Bibliography
List of Tables

2.1 Survey papers.
2.2 Major open Knowledge Graphs.
2.3 Datasets.
2.4 Datasets (continued).
2.5 Distance-based KGE models.
2.6 Semantic matching-based KGE models. (Footnote: there are orthonormal and commutative constraints on the matrix.)
3.1 Comparing relation patterns that CompoundE can model with other popular KGE models.
3.2 Dataset statistics.
3.3 Filtered ranking of link prediction on ogbl-wikikg2.
3.4 Filtered ranking of link prediction for FB15k-237 and WN18RR.
3.5 Optimal configuration.
3.6 Path query answering dataset statistics.
3.7 Performance comparison for path query answering.
3.8 Entity typing performance comparison for the FB15k-ET and YAGO43k-ET datasets. Best results are in bold and second best results are underlined.
3.9 Entity typing dataset statistics.
3.10 Optimal configurations for entity typing. B denotes the batch size and N denotes the negative sample size.
3.11 Filtered ranking of link prediction for ogbl-wikikg2 and FB15k-237.
3.12 Filtered MRR on four relation types of FB15k-237.
3.13 Filtered MRR on each relation type of WN18RR.
3.14 Complexity comparison of KGE models.
3.15 Link prediction search space of five hyper-parameters.
3.16 Path query answering search space of five hyper-parameters.
3.17 Entity typing search space of five hyper-parameters.
3.18 Preliminary comparison after adding reflection and shear operators.
4.1 A list of rank fusion functions under consideration.
4.2 Statistics of four link prediction datasets.
4.3 The search space of six hyper-parameters.
4.4 Optimal configurations for link prediction tasks, where B and N denote the batch size and the negative sample size, respectively.
4.5 Comparison of the link prediction performance under the filtered rank setting for DB100K.
4.6 Comparison of the link prediction performance under the filtered rank setting for ogbl-wikikg2.
4.7 Comparison of the link prediction performance under the filtered rank setting for YAGO3-10.
4.8 Comparison of different weighted-distances-sum (WDS) strategies for DB100K and YAGO3-10.
4.9 Performance comparison of different rank fusion methods for DB100K and YAGO3-10.
4.10 Ablation study on CompoundE3D for DB100K.
4.11 Ablation study on CompoundE3D for YAGO3-10.
4.12 Comparison of filtered MRR performance on each relation type of WN18RR.
4.13 Complexity comparison of KGE models on ogbl-wikikg2 under a similar testing MRR.
5.1 Example of types in the YAGO43K dataset.
5.2 Example of types in the DB111K-174 dataset.
5.3 Statistics of three KG datasets used in our experiments.
5.4 Performance comparison of various entity type prediction methods in terms of filtered ranking for FB15k-ET and YAGO43k-ET, where the best and the second best performance numbers are shown in bold face and with an underscore, respectively.
5.5 Performance comparison of entity type prediction for DB111K-174, where the best and the second best performance numbers are shown in bold face and with an underscore, respectively.
5.6 Performance comparison of coarse-grained and fine-grained type prediction for the YAGO43k-ET dataset.
5.7 An illustrative example of type prediction for the YAGO43k-ET dataset.
5.8 Hyperparameter setting.
6.1 Proportion of H@1 prediction errors due to mismatched types by different entity alignment methods for the D-W 15K V1 dataset.
6.2 The statistics of type association pairs for various datasets.
6.3 Ranking of associated type pair prediction.
6.4 An example from D-Y 15K V1 illustrating the advantage of using type for alignment.
6.5 Comparison of entity alignment performance of TypeEA with baselines for cross-KG (DBpedia to Wikidata) alignment with 15k entities.
6.6 Comparison of entity alignment performance of TypeEA with baselines for cross-lingual (English to French) alignment with 15k entities.
6.7 Comparison of entity alignment performance of TypeEA with baselines for cross-lingual (English to French) alignment with 100k entities.
List of Figures

1.1 Illustration of a Knowledge Graph. Source: Amazon Web Services.
1.2 Illustration of the Knowledge Panel from Google Search.
2.1 Distance-based knowledge graph embedding models over the years.
2.2 Hits@10 link prediction performance on the most popular benchmarking datasets.
2.3 Illustration of the statistical approach.
2.4 Illustration of the RotatE entity space, the RotatE type space, and the regression linking these two spaces.
3.1 An illustration of previous distance-based KGE models and CompoundE.
3.2 Illustration of different ways of composing compound operations.
3.3 Heatmap of test MRR scores obtained from learning rate and dimension grid search.
3.4 t-SNE visualization of entity embeddings in the 2D space for some major entity types in FB15K-237.
3.5 MRR scores on the ogbl-wikikg2 dataset.
3.6 Distribution of relation embedding values for the "friends" relation in FB15k-237, obtained using ∥S_r · R_r · T_r · h − t∥.
3.7 FB15k-237 "friends" relation embedding obtained using ∥S · R · T · h − Ŝ · R̂ · T̂ · t∥: (a) distribution of head translation values, (b) distribution of tail translation values, (c) distribution of head scaling values, (d) distribution of tail scaling values, and (e) distribution of rotation angle values.
3.8 WN18RR "instance_hypernym" relation: (a) distribution of head translation values, (b) distribution of tail translation values, (c) distribution of head scaling values, and (d) distribution of tail scaling values.
3.9 WN18RR "similar_to" relation: (a) distribution of head translation values, (b) distribution of tail translation values, (c) distribution of head scaling values, and (d) distribution of tail scaling values.
3.10 Comparing the performance of different CompoundE forms.
4.1 Composing different geometric operations in the 3D subspace.
4.2 The ensemble of multiple CompoundE3D variants.
4.3 The distribution of the MRR performance versus the operator number of various model variants for the different datasets.
4.4 Effects of rotation and reflection operators on symmetric relations.
4.5 Illustration of CompoundE3D's capability in multiplicity modeling.
4.6 Comparing different models' MRR performance across different dimensions.
5.1 A KG with entity type information.
5.2 Illustration of the CORE model, where the blue and red dots denote entities and types in their complex embedding spaces, respectively.
5.3 Comparison of the MRR performance for FB15k-ET as a function of the type dimension.
6.1 An illustrative example of the idea behind the proposed TypeEA method.
6.2 Illustration of using the type association embedding for identifying relevant candidates in entity alignment, where u and v denote two type embeddings in two KGs, respectively, and W denotes their association embedding using the bilinear product.
6.3 Comparison of the H@1 entity alignment performance with or without the type information for different datasets and models as a function of the fraction of seed alignment.
6.4 Comparison of the H@1 entity alignment performance with or without the type information for different datasets.
7.1 An illustrative example of the transformer-based approach for KG completion.
Abstract

A Knowledge Graph (KG) encodes human-readable information and knowledge in a graph format. Triples, denoted by (h, r, t), are the basic elements of a KG, where h and t are head and tail entities while r is the relation connecting them. Both manual effort by domain experts and automated information extraction algorithms have contributed to the creation of many existing Knowledge Graphs today. However, given the limited information accessible to each individual and the limitations of algorithms, it is nearly impossible for a Knowledge Graph to perfectly capture every single fact about the world. As such, Knowledge Graphs are often incomplete, and many researchers have developed different algorithms to predict missing facts in Knowledge Graphs. Knowledge Graph Embedding models were first proposed mainly to solve the Knowledge Graph completion problem. Beyond that, embedding models can also be useful in solving many downstream tasks such as entity classification and entity alignment. In this thesis, we focus on two research problems: 1) designing novel and effective knowledge graph embeddings using geometric manipulations; 2) leveraging type label information to improve entity classification and entity alignment performance.

Translation, rotation, and scaling are three commonly used geometric manipulation operations in image processing. Moreover, some of them are successfully used in developing effective knowledge graph embedding (KGE) models such as TransE, RotatE, and PairRE. Inspired by this synergy, we propose a new KGE model, CompoundE, by leveraging all three operations in this thesis. To demonstrate the effectiveness of CompoundE, we conduct experiments on three popular knowledge graph completion datasets. Experimental results show that CompoundE consistently achieves state-of-the-art performance.

The cascade of 2D geometric transformations was exploited to model relations between entities in a knowledge graph (KG), leading to an effective KG embedding (KGE) model, CompoundE. Furthermore, rotation in the 3D space was proposed as a new KGE model, Rotate3D, by leveraging its non-commutative property. Inspired by CompoundE and Rotate3D, we leverage 3D compound geometric transformations, including translation, rotation, scaling, reflection, and shear, and propose a family of KGE models, named CompoundE3D, in this thesis. CompoundE3D allows multiple design variants to match the rich underlying characteristics of a KG. Since each variant has its own advantages on a subset of relations, an ensemble of multiple variants can yield superior performance. The effectiveness and flexibility of CompoundE3D are experimentally verified on four popular link prediction datasets.

Entity type prediction is also an important problem in knowledge graph (KG) research. The goal of entity type prediction is to predict type labels for entities. The main challenge is that there are tens of thousands of fine-grained type labels and each entity may have multiple different type labels. A new KG entity type prediction method, named CORE (COmplex space Regression and Embedding), is proposed in this thesis. Experiments show that CORE outperforms benchmarking methods on representative KG entity type inference datasets. Strengths and weaknesses of various entity type prediction methods are analyzed.

Entity alignment is commonly used to link different knowledge graphs and augment facts about entities. The main objective is to identify the counterpart of a source entity in the target knowledge graph. In this work, we demonstrate that the entity type information, which is commonly available in knowledge graphs, is very helpful to knowledge graph alignment, and we propose a new method called Type-associated Entity Alignment (TypeEA) accordingly. Experimental results show that the proposed TypeEA consistently outperforms state-of-the-art baselines across all OpenEA entity alignment datasets with different experimental settings.
Chapter 1
Introduction

1.1 Significance of the Research
Knowledge Graph representation learning, also known as knowledge graph embedding (KGE), has been intensively studied in recent years. Yet, it remains one of the most fundamental problems in Artificial Intelligence (AI) and Data Engineering research. Triples, denoted by (h, r, t), are the basic elements of a KG, where h and t are head and tail entities while r is the relation connecting them. For instance, the fact "Los Angeles is located in the USA" can be encoded as a triple (Los Angeles, isLocatedIn, USA). An example of a knowledge graph is shown in Fig. 1.1.

Figure 1.1: Illustration of a Knowledge Graph. Source: Amazon Web Services.

The link prediction (or KG completion) task is often used to evaluate the effectiveness of KGE models. That is, the task is to predict t given h and r, or to predict h given r and t. KGE models are evaluated based on how well the prediction matches the ground truth. There are several challenges in the design of good KGE models. First, real-world KGs often contain a large number of entities. It is impractical to have high-dimensional embeddings due to device memory constraints. Yet, the performance of KGE models may deteriorate significantly in low-dimensional settings. How to design a KGE model that is effective in low-dimensional settings is not trivial. Second, complex relation patterns (e.g., 1-to-N, N-to-1, N-to-N, antisymmetric, transitive, and hierarchical relations) remain difficult to model. Third, each of the extant KGE models has its own strengths and weaknesses. It is desired yet unclear how to design a KGE model that leverages the strengths of some models and complements the weaknesses of others. KGE is critical to many downstream applications such as multihop reasoning [123, 36], KG alignment [21, 48], and entity classification [52].
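To make the evaluation protocol concrete, the following is a minimal sketch (in NumPy, with hypothetical helper names such as `score_fn` and `all_true_tails`) of the filtered-rank evaluation commonly used for link prediction, which underlies metrics such as MRR and Hits@10.

```python
# A minimal sketch of filtered-rank evaluation for tail prediction.
# `score_fn` and the input structures are hypothetical stand-ins for any
# KGE scoring function and dataset, not a specific implementation.
import numpy as np

def filtered_rank_eval(score_fn, test_triples, all_true_tails, num_entities):
    """score_fn(h, r, candidates) -> scores for candidate tails (higher = better)."""
    ranks = []
    for h, r, t in test_triples:
        scores = score_fn(h, r, np.arange(num_entities))
        # Filtered setting: mask other known true tails for (h, r) so they
        # do not push the test tail down in the ranking.
        for other_t in all_true_tails[(h, r)]:
            if other_t != t:
                scores[other_t] = -np.inf
        rank = 1 + np.sum(scores > scores[t])   # 1-indexed rank of ground truth
        ranks.append(rank)
    ranks = np.array(ranks, dtype=float)
    return {"MRR": float(np.mean(1.0 / ranks)), "Hits@10": float(np.mean(ranks <= 10))}
```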
In addition to the above-mentioned completion problem, knowledge graphs also have a vast number of different and important applications. A KG enables fast retrieval of structured data about target entities upon user search. For example, when you search for a famous person, a place, or a popular topic on Google, you will see the Google Knowledge Panel pop up alongside the search results, as shown in Fig. 1.2. The Knowledge Panel helps people understand the subject of interest quickly. The data source for the Knowledge Panel is the Google Knowledge Graph launched in 2012. The Google Knowledge Graph was initially created from Freebase, which was a popular open-source KG acquired by Google in 2010. The Google Knowledge Graph was later augmented by integrating many other data sources such as Wikidata.

Indeed, KG information integration is not limited to the construction of the Google Knowledge Graph, but has many more applications in different domains. E-commerce companies seek to build user and product KGs and merge them with other companies' KGs in order to gain business intelligence and better sell their products to the right person. Hospitals and clinics share medical conditions about patients in the form of KGs to facilitate better treatment in case the patients move to different places. Financial institutions also integrate knowledge bases to track down illegal activities such as money laundering. Ridesharing companies such as Uber are also embarking on similar efforts to crack down on collusion between ill-intentioned drivers. Identifying the matching entity, i.e., entity alignment, is therefore very important under the above application scenarios.

Figure 1.2: Illustration of the Knowledge Panel from Google Search.
Knowledge graphs are also an important source of information for AI-powered virtual assistants such as Siri, Alexa, and Google Assistant. Dialogs are first analyzed with Natural Language Understanding (NLU) algorithms to extract a few keywords as cues for locating the subgraph of the KG that contains the useful information. By traversing a few hops on the KG, sensible responses can be generated using Natural Language Generation models. Other applications of KGs include music recommendation systems based on music KGs and event forecasting systems based on temporal KGs.
Entity type offers a valuable piece of information for KG learning tasks. Better results in KG-related tasks have been achieved with the help of entity types. On the other hand, entity type prediction is challenging for several reasons. First, collecting additional type labels for entities is expensive. Second, type information is often incomplete, especially for large-scale datasets. Third, KGs are ever-evolving and type information is often corrupted by noisy facts. Thus, there is a need to design algorithms that predict missing type labels. Entity type information can also be helpful for knowledge graph entity alignment. Intuitively, the entity type can improve the performance of entity alignment models since we do not need to align entities of mismatched types.
1.2 Contributions of the Research

1.2.1 Compound Operation Embedding

In this work, we cascade translation, rotation, and scaling operations to form a new model named CompoundE. By casting CompoundE in the framework of group theory, we show that quite a few distance-based KGE models are special cases of CompoundE. CompoundE extends simple distance-based scoring functions to relation-dependent compound operations on head and/or tail entities. The four main contributions of this work are summarized below.
• We present a novel KG embedding model called CompoundE, which combines three fundamental operations in the affine group and offers a wide range of designs.
• It is proved mathematically that CompoundE can handle complex relation types in KGs thanks to unique properties of the affine group.
• We conduct extensive experiments on three popular KG completion datasets, namely FB15k-237, WN18RR, and ogbl-wikikg2. Experimental results show that CompoundE achieves state-of-the-art performance across all three datasets.
• On large-scale datasets containing millions of entities under memory constraints, CompoundE outperforms other benchmarking methods by a big margin with fewer parameters.
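As a concrete illustration of the compound operation, the following is a minimal sketch of one CompoundE variant of the form ∥S_r · R_r · T_r · h − t∥ (cf. the histograms in Chapter 3), with relation-specific translation, rotation, and scaling applied to the head. The 2D block rotation, shapes, and names are illustrative assumptions, not the thesis implementation.

```python
# A minimal sketch of one CompoundE variant: translate, then rotate,
# then scale the head, and measure the distance to the tail.
import numpy as np

def rotate_2d_blocks(x, angles):
    """Rotate consecutive coordinate pairs of x by per-block angles."""
    x = x.reshape(-1, 2)                          # (d/2, 2) blocks
    c, s = np.cos(angles), np.sin(angles)         # (d/2,)
    return np.stack([c * x[:, 0] - s * x[:, 1],
                     s * x[:, 0] + c * x[:, 1]], axis=1).reshape(-1)

def compound_score(h, t, scale, angles, trans):
    """Negative distance ||S_r . R_r . T_r . h - t||; higher = more plausible."""
    return -np.linalg.norm(scale * rotate_2d_blocks(h + trans, angles) - t)

rng = np.random.default_rng(0)
d = 8
h, t, scale, trans = rng.normal(size=(4, d))
score = compound_score(h, t, scale, rng.uniform(-np.pi, np.pi, size=d // 2), trans)
```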
1.2.2 CompoundE3D

In this work, we extend CompoundE [50] along three directions. First, we include more affine operations beyond translation, rotation, and scaling, such as reflection and shear. Second, we extend these geometric transformations from the 2D space to the 3D space and propose a family of KGE models, CompoundE3D. Third, CompoundE3D allows multiple design variants to match the rich underlying characteristics of a KG. Since each variant has its own advantages on a subset of relations, an ensemble of multiple variants can yield superior performance. The effectiveness of CompoundE3D is experimentally verified on four popular link prediction datasets. To determine a scoring function that performs the best for a given dataset, we propose an adapted beam search algorithm that gradually builds more complex scoring functions from simple but effective ones. Furthermore, although ensemble learning is a popular strategy, it remains under-explored when it comes to building KGE models. In this work, we explore two ensemble strategies that potentially boost link prediction performance and allow different CompoundE3D variants to work together and complement each other. First, we implement a weighted sum of different scoring functions for link prediction. Second, we apply unsupervised rank aggregation functions to unify rank predictions from individual model variants. Both strategies help boost the ranking of valid candidate entities and reduce the impact of outliers.

The major contributions of this work are summarized below.
• We examine affine operations in the 3D space, instead of the 2D space, to allow more versatile relation representations. Besides the translation, rotation, and scaling used in CompoundE, we include reflection and shear transformations, which allow an even larger design space.
• We propose an adapted beam search algorithm to discover better model variants. Such a procedure avoids unnecessary exploration of poor variants and zooms into more effective ones to strike a good balance between model complexity and prediction performance.
• We analyze the properties of each operation and its advantages in modeling different relations. Our analysis is backed by empirical results on four datasets.
• To reduce errors of an individual model variant and boost the overall link prediction performance, we aggregate decisions from different variants with two approaches, namely, the sum of weighted distances and rank fusion.
1.2.3 Complex Space Regression and Embedding for Entity Type Prediction

In this work, we propose a new entity type prediction method named CORE which leverages the expressive power of two complex space embedding models, namely, the RotatE and ComplEx models. It embeds entities and types in two different complex spaces using either RotatE or ComplEx. We derive a complex regression model to link these two spaces and introduce a mechanism to optimize embedding and regression parameters jointly. The contributions of our work are summarized as follows:

• We present a new method for entity type prediction named CORE (COmplex space Regression and Embedding). CORE leverages the expressive power of complex space embedding models, including RotatE [133] and ComplEx [140], to represent entities and types. To capture the relatedness of entities and types, a complex regression model is built between the entity space and the type space.
• We conduct experiments on three major KG datasets and compare the performance of CORE with state-of-the-art entity type prediction methods. CORE outperforms extant methods in most evaluation metrics.
• We study and compare statistical-based, classifier-based, and embedding-based methods for entity type prediction. Strengths and weaknesses of different approaches are discussed. We also introduce a better statistical baseline named SDType-Cond.
1.2.4 Type-Associated Embedding for Knowledge Graph Entity Alignment

In this work, we propose a type-associated embedding for entity alignment. Although auxiliary information such as textual, visual, and temporal features has been leveraged to improve entity alignment performance in the past, entity type information is rarely considered in existing entity alignment models. TypeEA exploits the entity type information to guide entity alignment models so that they can focus on entities with matching types. A type embedding model based on semantic matching is developed in TypeEA to capture the association between types in different knowledge graphs. The main contributions of this work can be summarized as follows.
• We present a simple and low-memory-cost embedding model to capture the type association in different KGs and leverage this information to improve the performance of entity alignment models. We use far fewer free parameters compared to complex models that use large pretrained neural models, such as EVA [90] and BERT-INI [136].
• We prepare a type pair dataset for DBP v1.1 by querying the DBpedia English (EN), the DBpedia German (DE), the DBpedia French (FR), the Wikidata, and the YAGO public endpoint KGs. A subset of entity types is selected to learn high-quality type association embeddings. The dataset is released to facilitate future research.
• We conduct extensive experiments on all entity alignment datasets in DBP v1.1, which contains cross-lingual and cross-KG alignment tasks. We observe a consistent improvement when combining TypeEA with different embedding-based entity alignment models.
1.3 Organization of the Thesis

The rest of the thesis is organized as follows. In Chapter 2, we review the research background for Knowledge Graph Embedding models, as well as their applications including entity classification and entity alignment. In Chapter 3, we propose a unified distance-based Knowledge Graph Embedding model based on compounding geometric operations. In Chapter 4, we extend the compound geometric transformations to the 3D space and propose the CompoundE3D family of models. In Chapter 5, we propose two methods for entity classification in knowledge graphs: a statistical-based method named SDType-Cond and a complex space embedding model named CORE. In Chapter 6, we propose a type embedding approach for enhancing entity alignment performance. Finally, we give concluding remarks and envision future research directions in Chapter 7.
Chapter 2
Research Background

2.1 Major Knowledge Graphs and Datasets
Before we discuss Knowledge Graph (KG) embedding models, it is equally important to understand basic information about the data sources. We collect information about major KGs and show it in Table 2.2. Specifically, we investigate the time when each KG was created, the number of entities and facts in each KG, how the information in each KG is collected, the special features of each KG, and finally the link to access each KG. In addition, we also collect information about different KG completion datasets and show their statistics in Table 2.3. Besides basic statistics such as the number of entities and relations and the train/valid/test split, we also include information about the average degree of each node and the domain that each KG describes. Over the years, many different datasets have been created to test different aspects of knowledge graph embedding models and to fix loopholes such as the inverse edge leakage problem in FB15K and WN18.
2.2 Knowledge Graph Embedding

After reading the survey papers on Knowledge Graphs that we collect in Table 2.1, we observe that researchers often categorize Knowledge Graph embedding models based on the scoring functions and tools applied to model entity-relation interactions and representations. Knowledge graph embedding models can be grouped into three major classes, namely 1) distance-based models, 2) semantic matching models, and 3) neural network models. In particular, we collect all distance-based KG embedding models that we are aware of and present them in chronological order in Fig. 2.1. In Fig. 2.2, we compare the link prediction performance of KGE models invented over the years on the most popular benchmarking datasets.
2.2.1 Distance-based Models

The distance-based scoring function is one of the most popular strategies for learning knowledge graph embeddings (KGE). The intuition behind this strategy is that relations are modeled as transformations that place head entity vectors in the proximity of their corresponding tail entity vectors, or vice versa. For a given triple (h, r, t), the goal is to minimize the distance between the h and t vectors after the transformation introduced by r.
TransE [9] is one of the first KGE models that interpret relations between entities as translation operations in vector space. Let h, r, t ∈ R^d denote the embeddings of the head, relation, and tail of a triple, respectively. The TransE scoring function is defined as

f_r(h, t) = ∥h + r − t∥_p,    (2.1)

where p = 1 or p = 2 denotes the 1-norm or the 2-norm, respectively. However, this efficient model has difficulty modeling complex relations such as 1-N, N-1, N-N, symmetric, and transitive relations.
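For concreteness, the following is a minimal sketch of the TransE score of Eq. (2.1), paired with a margin ranking loss over a corrupted triple; the margin value and names are illustrative assumptions, not the original implementation.

```python
# A minimal sketch of the TransE score and a margin ranking loss over
# one negative (corrupted-tail) sample.
import numpy as np

def transe_score(h, r, t, p=1):
    """Distance ||h + r - t||_p; smaller means more plausible."""
    return np.linalg.norm(h + r - t, ord=p)

def margin_loss(pos, neg, margin=1.0):
    """Push positive triples to score at least `margin` below negatives."""
    return max(0.0, margin + transe_score(*pos) - transe_score(*neg))

rng = np.random.default_rng(0)
h, r, t, t_corrupt = rng.normal(size=(4, 8))
loss = margin_loss((h, r, t), (h, r, t_corrupt))
```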
Many later works attempt to overcome these shortcomings. For example, TransH projects entity embeddings onto relation-specific hyperplanes so that complex relations can be modeled by the translation embedding model. Formally, let w_r be the normal vector of a relation-specific hyperplane. Then the head and tail representations in the hyperplane can be written as

h_⊥ = h − w_r^⊤ h w_r,    t_⊥ = t − w_r^⊤ t w_r.    (2.2)
The projected representations are then linked together using the same translation relationship,

f_r(h, t) = ∥h_⊥ + r − t_⊥∥^2_2.    (2.3)

However, this orthogonal projection prevents the model from encoding inverse and composition relations. A similar idea called TransR transforms entities into a relation-specific space instead. The TransR scoring function can be written as

f_r(h, t) = ∥M_r h + r − M_r t∥^2_2.    (2.4)

However, the relation-specific transformation introduced in TransR requires a number of additional parameters that is quadratic in the embedding dimension. To save the additional parameters introduced, TransD uses entity projection vectors to populate the mapping matrices, instead of using a dense matrix. TransD reduces the additional parameters from quadratic to linear in the embedding dimension. The scoring function can be written as
f_r(h, t) = ∥(r_p h_p^⊤ + I) h + r − (r_p t_p^⊤ + I) t∥^2_2,    (2.5)

where h_p, t_p, and r_p are projection vectors. With the same goal of saving additional parameters, TranSparse enforces the transformation matrix to be a sparse matrix. The scoring function can be written as

f_r(h, t) = ∥M_r(θ_r) h + r − M_r(θ_r) t∥^2_{1/2},    (2.6)

where θ_r ∈ [0, 1] is the sparseness degree of the mapping matrix M_r. A variant of TranSparse includes separate mapping matrices for the head and the tail. TransM assigns different weights to complex relations for better encoding power. TransMS attempts to consider multidirectional semantics using nonlinear functions and linear bias vectors. TransF mitigates the burden of relation projection by explicitly modeling the basis of projection matrices. ITransF makes use of concept projection matrices and sparse attention vectors to discover hidden concepts within relations.
In recent years, researchers have expanded their focus to spaces other than Euclidean geometry. TorusE [39] projects embeddings into an n-dimensional torus space, where [h], [r], [t] ∈ T^n denote the projected representations of the head, relation, and tail. TorusE models relational translation in the torus space by optimizing the objective

min_{(x, y) ∈ ([h] + [r]) × [t]} ∥x − y∥_i.    (2.7)

The Multi-Relational Poincaré model (MuRP) [5] embeds KG entities in a Poincaré ball of hyperbolic space. It transforms entity embeddings using relation-specific Möbius matrix-vector multiplication and Möbius addition. The negative curvature introduced by hyperbolic space is empirically better at capturing the hierarchical structure of knowledge graphs. However, MuRP has difficulty encoding relation patterns and only uses a constant curvature. RotH [16] improves over MuRP by introducing a relation-specific curvature.
RotatE [133] models entities in the complex vector space and interprets relations as rotations instead of translations. Formally, let h, r, t ∈ C^d denote the representations of the head, relation, and tail of a triple in the complex vector space. The RotatE scoring function can be defined as

f_r(h, t) = ∥h ◦ r − t∥.    (2.8)

The self-adversarial negative sampling strategy also contributes to RotatE's significant performance improvement compared to its predecessors.
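A minimal sketch of the RotatE score of Eq. (2.8) follows, assuming relation embeddings are parameterized by per-dimension rotation angles so that each element has unit modulus; names are illustrative.

```python
# A minimal sketch of the RotatE score: relations act as element-wise
# rotations (unit-modulus complex numbers) applied to the head.
import numpy as np

def rotate_score(h, r_phase, t):
    """h, t: complex vectors; r_phase: one rotation angle per dimension."""
    r = np.exp(1j * r_phase)                  # |r_i| = 1 by construction
    return -np.linalg.norm(h * r - t, ord=1)  # higher = more plausible

rng = np.random.default_rng(0)
d = 8
h, t = (rng.normal(size=d) + 1j * rng.normal(size=d) for _ in range(2))
score = rotate_score(h, rng.uniform(-np.pi, np.pi, size=d), t)
```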
Quite a few models attempt to extend RotatE. MRotatE adds an entity rotation constraint to the optimization objective to handle multifold relations. HAKE rewrites the rotation formula in polar coordinates and separates the scoring function into two components, namely a phase component and a modulus component. The scoring function of HAKE can be written as

f_r(h, t) = d_{r,p}(h_p, t_p) + λ d_{r,m}(h_m, t_m),    (2.9)

where

d_{r,p}(h_p, t_p) = ∥sin((h_p + r_p − t_p)/2)∥_1    (2.10)

and

d_{r,m}(h_m, t_m) = ∥h_m ◦ ((r_m + r′_m)/(1 − r′_m)) − t_m∥_2.    (2.11)
This modification leads to better modeling capability for hierarchical structures in knowledge graphs. Rotate3D performs quaternion rotation in the 3D space and enables the model to encode non-commutative relations. Rot-Pro extends RotatE by transforming entity embeddings using an orthogonal projection that is also idempotent. This change enables Rot-Pro to model transitive relations. PairRE also tries to improve over RotatE. Instead of rotating the head to match the tail, PairRE [17] performs transformations on both the head and the tail. The scoring function can be defined as

f_r(h, t) = ∥h ◦ r^H − t ◦ r^T∥,    (2.12)

where h, t ∈ R^d are the head and tail entity embeddings, r^H, r^T ∈ R^d are relation-specific weight vectors for the head and tail vectors, respectively, and ◦ is the elementwise product. In fact, this elementwise multiplication is simply a scaling operation. One advantage of PairRE compared to previous models is that it is capable of modeling subrelation structures in knowledge graphs. LinearRE [110] is a similar model that adds a translation component between the scaled head and tail embeddings. The transformation strategy can still be effective when added to the entity embeddings involved in relation rotation. SFBR [87] introduces a semantic filter which includes a scaling and a shift component. HousE [85] and ReflectE [193] model relations as Householder reflections. In Table 2.5, we compare different distance-based KGE models and their space complexity.
2.2.2 Semantic Matching Models

Another related idea for developing KGE models is to measure a semantic matching score. RESCAL [105] adopts a bilinear scoring function as the objective in solving a three-way, rank-d matrix factorization problem. Formally, let h, t ∈ R^d denote the head and tail embeddings, and let M_r ∈ R^{d×d} be the representation of the relation. Then the RESCAL scoring function can be defined as

f_r(h, t) = h^⊤ M_r t.    (2.13)
However, one obvious limitation of this approach is that it uses a dense matrix to represent each relation, which requires an order of magnitude more parameters compared to models that use vectors. DistMult [174] reduces the number of free parameters by enforcing the relation embedding matrix to be diagonal. Let r ∈ R^d be the relation vector. Then diag(r) ∈ R^{d×d} is the diagonal matrix constructed from r, and the DistMult scoring function can be written as

f_r(h, t) = h^⊤ diag(r) t.    (2.14)

However, because the diagonal matrix is symmetric, DistMult has difficulty modeling antisymmetric relations.
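The symmetry limitation can be seen directly from a minimal sketch of the DistMult score of Eq. (2.14): the score is invariant to swapping h and t.

```python
# A minimal sketch of the DistMult score. The score is a symmetric
# function of h and t, which is why antisymmetric relations are hard
# for DistMult to model.
import numpy as np

def distmult_score(h, r, t):
    return float(np.sum(h * r * t))   # h^T diag(r) t

rng = np.random.default_rng(0)
h, r, t = rng.normal(size=(3, 8))
assert np.isclose(distmult_score(h, r, t), distmult_score(t, r, h))
```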
ANALOGY [91] has the same scoring function as RESCAL, but it attempts to incorporate antisymmetric configurations by imposing two regularization constraints: 1) M_r M_r^⊤ = M_r^⊤ M_r, which requires the relation matrix to be orthonormal; and 2) M_r M_{r′} = M_{r′} M_r, which requires the relation matrices to be commutative. HolE [106] introduces circular correlation between head and tail vectors, which can be interpreted as a compressed tensor product, to capture richer interactions. The HolE scoring function can be written as

f_r(h, t) = r^⊤ (h ★ t).    (2.15)
ComplEx [140] extends the bilinear product score to the complex vector space so as to model antisymmetric relations more effectively. Formally, let h, r, t ∈ C^d be the head, relation, and tail complex vectors, and let t̄ denote the complex conjugate of t. The ComplEx scoring function can be defined as

f_r(h, t) = Re(⟨r, h, t̄⟩),    (2.16)

where ⟨·,·,·⟩ denotes the trilinear product and Re(·) means taking the real part of a complex value. However, relation compositions remain difficult for ComplEx to encode.
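A minimal sketch of the ComplEx score of Eq. (2.16) follows; conjugating the tail breaks the head/tail symmetry, which is what allows antisymmetric relations to be modeled. Names are illustrative.

```python
# A minimal sketch of the ComplEx score: the real part of a trilinear
# product with the conjugated tail.
import numpy as np

def complex_score(h, r, t):
    """h, r, t: complex-valued embedding vectors of equal dimension."""
    return float(np.real(np.sum(r * h * np.conj(t))))

rng = np.random.default_rng(0)
d = 8
h, r, t = (rng.normal(size=d) + 1j * rng.normal(size=d) for _ in range(3))
# Unlike DistMult, the forward and backward scores generally differ.
s_forward, s_backward = complex_score(h, r, t), complex_score(t, r, h)
```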
SimplE [75] models the inverse of relations with an enhanced version of the Canonical Polyadic decomposition. The scoring function of SimplE is defined as

f_r(h, t) = (1/2) (⟨h, r, t⟩ + ⟨t, r′, h⟩).    (2.17)

TuckER [6] extends the semantic matching model to a 3D tensor factorization of the binary tensor representation of knowledge graph triples. The scoring function is defined as

f_r(h, t) = W ×_1 h ×_2 r ×_3 t.    (2.18)
QuatE [197] and DualE [13] extend from the complex representation to the hypercomplex representation with 4 degrees of freedom to gain more expressive rotational capability. Let Q_h, W_r, Q_t ∈ H^d be the representations of the head, relation, and tail in quaternion space, of the form Q = a + b i + c j + d k. Then the QuatE scoring function is defined as

f_r(h, t) = Q_h ⊗ W_r^⊳ · Q_t.    (2.19)

Specifically, the normalization of the relation vector in quaternion space is defined as

W_r^⊳ = W_r / |W_r| = (a_r + b_r i + c_r j + d_r k) / √(a_r^2 + b_r^2 + c_r^2 + d_r^2),    (2.20)

the Hamiltonian product in quaternion space is computed as

Q_h ⊗ W_r^⊳ = (a_h ◦ a_r − b_h ◦ b_r − c_h ◦ c_r − d_h ◦ d_r)
            + (a_h ◦ b_r + b_h ◦ a_r + c_h ◦ d_r − d_h ◦ c_r) i
            + (a_h ◦ c_r − b_h ◦ d_r + c_h ◦ a_r + d_h ◦ b_r) j
            + (a_h ◦ d_r + b_h ◦ c_r − c_h ◦ b_r + d_h ◦ a_r) k,    (2.21)

and the inner product in quaternion space is computed as

Q_1 · Q_2 = ⟨a_1, a_2⟩ + ⟨b_1, b_2⟩ + ⟨c_1, c_2⟩ + ⟨d_1, d_2⟩.    (2.22)
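A small sketch of the element-wise Hamiltonian product of Eq. (2.21) follows, illustrating the non-commutativity that quaternion-based models such as QuatE and Rotate3D exploit; array names are illustrative.

```python
# A minimal sketch of the element-wise Hamiltonian product on quaternion
# embeddings represented as (a, b, c, d) component arrays.
import numpy as np

def hamilton(q1, q2):
    a1, b1, c1, d1 = q1
    a2, b2, c2, d2 = q2
    return np.stack([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,   # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,   # i part
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,   # j part
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,   # k part
    ])

rng = np.random.default_rng(0)
q, w = rng.normal(size=(2, 4, 3))
# Non-commutative: the i, j, k parts differ when the order is swapped.
assert not np.allclose(hamilton(q, w), hamilton(w, q))
```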
However, one disadvantage of these models is that they require very high-dimensional spaces to work well, and therefore it is difficult to scale them to large knowledge graphs. CrossE introduces crossover interactions to better represent the bidirectional interactions between entities and relations. The scoring function of CrossE is defined as

f_r(h, t) = σ( tanh(c_r ◦ h + c_r ◦ h ◦ r + b) t^⊤ ),    (2.23)

where the relation-specific interaction vector c_r is obtained by looking up the interaction matrix C. Dihedral [167] constructs elements in a dihedral group using rotation and reflection operations over a 2D symmetric polygon; the advantage of this model lies in encoding relation composition. SEEK [171] and AutoSF [199] identify the underlying similarities among popular KGE models and propose an automatic framework for designing new bilinear scoring functions while also unifying many previous models. However, the search space of AutoSF is computationally intractable, and it is difficult to know whether one configuration will be better than another unless the model is trained and tested on the dataset. Therefore, the AutoSF search can be time-consuming. In Table 2.6, we compare different semantic matching-based KGE models and their space complexity.
Figure 2.1: Distance-based knowledge graph embedding models over the years.
2013: TransE [9]
2014: TransH [155]
2015: TransR [89], TransD [68], RTransE [46], PTransE [88], KG2E [60]
2016: FTransE [43], STransE [104], TransG [161], TransA [71], lppTransE [183], TranSparse [69], ManifoldE [160]
2017: puTransE [138], ITransF [162], CombinE [135], TransE-RS [209]
2018: TransC [95], TransAt [112], TorusE [40]
2019: MuRP [5], QuatE [197], RotatE [133], KEC [53], TransGate [187], TransMS [179]
2020: ATTH [16], HyperKG [78], Rotate3D [45], GeomE [168], HAKE [201], BoxE [1], OTE [137], RatE [63], MDE [117], AprilE [92], TransRHS [192], SpacE [103], LineaRE [110]
2021: HBE [108], RotL [148], GrpKG [175], HopfE [7], PairRE [17], MRotatE [64], HA-RotatE [151], MQuadE [184], CyclE [176], 5*E [24], DualE [13], BiQUE [54], FieldE [102]
2022: StructurE [194], ReflectE [193], DensE [94]
Table 2.1: Survey papers.

1. 2013. Representation Learning: A Review and New Perspectives. Yoshua Bengio, Aaron C. Courville, Pascal Vincent. IEEE Transactions on Pattern Analysis and Machine Intelligence.
2. 2016. A Review of Relational Machine Learning for Knowledge Graphs. Maximilian Nickel, Kevin Murphy, Volker Tresp, Evgeniy Gabrilovich. Proceedings of the IEEE.
3. 2017. Knowledge Graph Embedding: A Survey of Approaches and Applications. Quan Wang, Zhendong Mao, Bin Wang, Li Guo. IEEE Transactions on Knowledge and Data Engineering.
4. 2018. A Comprehensive Survey of Graph Embedding: Problems, Techniques, and Applications. HongYun Cai, Vincent W. Zheng, Kevin Chen-Chuan Chang. IEEE Transactions on Knowledge and Data Engineering.
5. 2020. A review: Knowledge reasoning over knowledge graph. Xiaojun Chen, Shengbin Jia, Yang Xiang. Expert Systems with Applications.
6. 2021. Knowledge Graphs. Aidan Hogan, Eva Blomqvist, Michael Cochez, Claudia D'amato, Gerard De Melo, Claudio Gutierrez, Sabrina Kirrane, José Emilio Labra Gayo, Roberto Navigli, Sebastian Neumaier, Axel-Cyrille Ngonga Ngomo, Axel Polleres, Sabbir M. Rashid, Anisa Rula, Lukas Schmelzeisen, Juan Sequeda, Steffen Staab, Antoine Zimmermann. ACM Computing Surveys.
7. 2021. Knowledge graph representation and reasoning. Erik Cambria, Shaoxiong Ji, Shirui Pan, Philip S. Yu. Neurocomputing.
8. 2021. Knowledge Graphs. Claudio Gutierrez, Juan F. Sequeda. Communications of the ACM.
9. 2021. Knowledge graph embedding for link prediction: A comparative analysis. Andrea Rossi, Denilson Barbosa, Donatella Firmani, Antonio Matinata, Paolo Merialdo. ACM Transactions on Knowledge Discovery from Data.
10. 2022. Knowledge graphs as tools for explainable machine learning: A survey. Ilaria Tiddi, Stefan Schlobach. Artificial Intelligence.
11. 2022. A Survey on Knowledge Graphs: Representation, Acquisition, and Applications. Shaoxiong Ji, Shirui Pan, Erik Cambria, Pekka Marttinen, Philip S. Yu. IEEE Transactions on Neural Networks and Learning Systems.
Table 2.2: Major open Knowledge Graphs (links as of 2022).

WordNet (1980s): 155K entities, 207K facts. Curated by experts. A lexical database of semantic relations between words. https://wordnet.princeton.edu
ConceptNet (1999): 44M entities, 2.4B facts. Crowdsourced human data and structured feeds. Commonsense concepts and relations. https://conceptnet.io
Freebase (2007): 44M entities, 2.4B facts. Crowdsourced human data and structured feeds. One of the first public KBs/KGs. https://developers.google.com/freebase/
DBpedia (2007): 4.3M entities, 70M facts. Structured data automatically extracted from Wikipedia. Multilingual, cross-domain. https://dbpedia.org/sparql
YAGO (2008): 10M entities, 120M facts. Derived from Wikipedia, WordNet, and GeoNames. Multilingual, fine-grained entity types. https://yago-knowledge.org/sparql
Wikidata (2012): 50M entities, 500M facts. Crowdsourced human curation. Collaborative, multilingual, structured. https://query.wikidata.org
OpenCyc (2001): 2.4M entities, 240K facts. Created by domain experts. Commonsense concepts and relations. http://www.cyc.com
NELL (2010): 50M facts. Extracted from the ClueWeb09 corpus (1B web pages). Each fact is given a confidence score; 2.8M beliefs have a high confidence score. http://rtw.ml.cmu.edu/rtw/
Table 2.3: Datasets.

Dataset | #Ent | #Rel | #Train | #Valid | #Test | Avg. Deg. | Remarks
Kinship | 104 | 26 | 8,544 | 1,068 | 1,074 | 82.15 | Information about the complex relational structure among members of a tribe.
UMLS | 135 | 49 | 5,216 | 652 | 661 | 38.63 | Biomedical relationships between categorized concepts of the Unified Medical Language System.
Countries | 272 | 2 | 1,111 | 24 | 24 | 4.35 | Relationships between countries, regions, and subregions.
FB15K | 14,951 | 1,345 | 483,142 | 50,000 | 59,071 | 13.2 | A subset of the Freebase knowledge graph. Textual descriptions of entities are available.
FB15K-237 | 14,951 | 237 | 272,115 | 17,535 | 20,466 | 19.74 | Derived from FB15K by removing inverse relations to avoid the test leakage problem.
WN18 | 40,943 | 18 | 141,442 | 5,000 | 5,000 | 1.2 | A subset of the WordNet knowledge graph.
WN18RR | 40,943 | 11 | 86,835 | 3,034 | 3,134 | 2.19 | Derived from WN18 by removing inverse relations to avoid the test leakage problem.
YAGO3-10 | 123,182 | 37 | 1,079,040 | 5,000 | 5,000 | 9.6 | Subset of YAGO3 (an extension of YAGO) that contains entities associated with at least 10 different relations, describing citizenship, gender, and profession of people.
Table 2.4: Datasets (continued).

Dataset | #Ent | #Rel | #Train | #Valid | #Test | Avg. Deg. | Remarks
DB100K | 99,604 | 470 | 597,572 | 50,000 | 50,000 | 12 | A subset of the DBpedia knowledge graph. Each entity appears in at least 20 different relations.
CoDEx-S | 2,034 | 42 | 32,888 | 1,827 | 1,828 | 21.47 | Extracted from the Wikidata knowledge graph. Each entity has degree at least 15. Hard negative samples are provided.
CoDEx-M | 17,050 | 51 | 185,584 | 10,310 | 10,311 | 13.45 | Extracted from the Wikidata knowledge graph. Each entity has degree at least 10. Hard negative samples are provided.
CoDEx-L | 77,951 | 69 | 551,193 | 30,622 | 30,622 | 25.62 | Extracted from the Wikidata knowledge graph. Each entity has degree at least 5.
OGB-Wikikg2 | 2,500,604 | 535 | 16,109k | 429k | 598k | 8.79 | Extracted from the Wikidata knowledge graph. Triple split based on timestamp rather than random split.
OGB-Biokg | 93,773 | 51 | 4,763k | 163k | 163k | 47.5 | Created from a large number of biomedical data repositories. Contains information about diseases, proteins, drugs, side effects, and protein functions.
2.2.3 Classifier-based Models
Another popular approach to building KGE models is based on the classification framework. For example, a Multilayer Perceptron (MLP) [35] is used to measure the likelihood of unseen triples for link prediction. NTN [124] adopts a bilinear tensor neural layer to model interactions between entities and relations of triples. ConvE [32] reshapes and stacks the head entity and the relation vector to form 2D data, applies Convolutional Neural Networks (CNNs) to extract features, and uses the extracted features to interact with the tail embedding. R-GCN [120] applies a Graph Convolutional Network (GCN) and considers the neighborhood of each entity equally. CompGCN [143] performs a composition operation over each edge in the neighborhood of a central node. The composed embeddings are then convolved with specific filters representing the original and the inverse relations, respectively. KBGAN [11] optimizes a generative adversarial network to generate negative samples. KBGAT [101] applies graph attention networks to capture both entity and relation features in any given entity's neighborhood. ConvKB [31] applies 1D convolution on the stacked entity and relation embeddings of a triple to extract feature maps, and applies a nonlinear classifier to predict the likelihood of the triple. However, the performance of ConvKB fluctuates across different datasets and metrics. The Structure-Aware Convolutional Network (SACN) [122] uses a weighted-GCN encoder and a Conv-TransE decoder to extract the embedding. This synergy successfully leverages graph connectivity structure information. InteractE [142] introduces network design ideas, including feature permutation, a novel feature reshaping, and circular convolution, that improve upon ConvE and outperform baseline models significantly. ParamE [19] uses neural networks instead of relation embeddings to model the relational interaction between head and tail entities. MLP, CNN, and gated structure layers are experimented with, and the gated layer turns out to be far more effective than embedding approaches. ReInceptionE [164] applies the Inception network to increase the interactions between head and relation embeddings. A relation-aware attention mechanism in the model aggregates the local neighborhood features and the global entity information. M-DCN [202] adopts a multi-scale dynamic convolutional network to model complex relations such as 1-N, N-1, and N-N relations. All of the above-mentioned examples are based on neural networks. Yet, there is a non-neural-network classification method, called KGBoost [154], developed recently. KGBoost proposes a novel negative sampling method and uses the XGBoost classifier for link prediction.
2.2.4 Loss Functions
The loss function is an important part of KGE learning. Loss functions are designed to effectively distinguish valid triples from negative samples. The ultimate goal of optimizing the loss function is to get valid triples ranked as high as possible. In the early days of KGE learning, the margin-based ranking loss was widely adopted. The pairwise max-margin loss can be formally defined as

$$L = \sum_{(h,r,t)\in\mathcal{G},\,(h',r,t')\in\mathcal{G}'} \max\big(0,\; \gamma + f_r(h,t) - f_r(h',t')\big), \qquad (2.24)$$

where $(h,r,t)$ denotes a ground-truth triple from the set of all valid triples $\mathcal{G}$, and $(h',r,t')$ denotes a negative sample from the set of corrupted triples $\mathcal{G}'$. $\gamma$ is the margin parameter, which specifies how different $f_r(h,t)$ and $f_r(h',t')$ should be at the optimum. In fact, a similar loss function is applied to optimize the multiclass Support Vector Machine (SVM) [157]. Both distance-based embedding models such as TransE, TransH, TransR, and TransD, and semantic matching-based models such as LFM, NTN, and SME, have successfully leveraged this loss function. [209] proposes a limit-based scoring loss to limit the score of positive triples so that the translation relation in positive triples can be guaranteed. The limit-based loss can be defined as

$$L = \sum_{(h,r,t)\in\mathcal{G},\,(h',r,t')\in\mathcal{G}'} \big\{[\gamma + f_r(h,t) - f_r(h',t')]_+ + \lambda\,[f_r(h,t) - \delta]_+\big\}, \qquad (2.25)$$
More recently, a double limit scoring loss was proposed by [208] to independently control the golden triplets' scores and the negative samples' scores. It can be defined as

$$L = \sum_{(h,r,t)\in\mathcal{G},\,(h',r,t')\in\mathcal{G}'} \big\{[f_r(h,t) - \delta_p]_+ + \lambda\,[\delta_n - f_r(h',t')]_+\big\}, \qquad (2.26)$$

where $\delta_n > \delta_p > 0$. This loss function intends to encourage low distance scores for positive triplets and high distance scores for negative triplets. We can also trace the usage of a similar contrastive loss in the Deep Triplet Network [61] for different image classification tasks.
Self-adversarial negative sampling was proposed in RotatE [133] and can be defined as

$$L = -\log\sigma\big(\gamma - f_r(h,t)\big) - \sum_{i=1}^{n} p(h'_i, r, t'_i)\,\log\sigma\big(f_r(h'_i, t'_i) - \gamma\big), \qquad (2.27)$$

Instead of giving equal weight to all possible negative triples, self-adversarial negative sampling assigns more weight to difficult negative triples. In particular, difficult negative triples are defined as triples that are more likely to be mistakenly identified as true triples by using embeddings from the current iteration, since their computed distances are small. The self-adversarial negative sampling loss is especially effective for distance-based embedding models. Many KGE models, including our proposed CompoundE [50] and CompoundE3D [51], adopt this loss function.
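To make the mechanics of Eq. (2.27) concrete, the following is a minimal PyTorch sketch of the self-adversarial negative sampling loss. It assumes that the model returns distance scores (lower is better), as in the distance-based models above; the function name and tensor shapes are illustrative and not taken from any particular codebase.

```python
import torch
import torch.nn.functional as F

def self_adversarial_loss(pos_score, neg_scores, gamma=6.0, alpha=1.0):
    """Sketch of the self-adversarial negative sampling loss in Eq. (2.27).

    pos_score:  (batch,)   distances f_r(h, t) of positive triples.
    neg_scores: (batch, n) distances f_r(h'_i, t'_i) of n negatives each.
    gamma: margin; alpha: sampling temperature.
    """
    # Positive term: -log sigmoid(gamma - f_r(h, t)).
    pos_term = -F.logsigmoid(gamma - pos_score)
    # Weights p(h'_i, r, t'_i): softmax of -alpha * distance over negatives,
    # so harder negatives (smaller distance) get larger weight. detach()
    # keeps the weights out of the gradient computation.
    p = torch.softmax(-alpha * neg_scores, dim=1).detach()
    # Negative term: -sum_i p_i * log sigmoid(f_r(h'_i, t'_i) - gamma).
    neg_term = -(p * F.logsigmoid(neg_scores - gamma)).sum(dim=1)
    return (pos_term + neg_term).mean()
```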
Cross entropy or negative log-likelihood losses are known to be more effective for semantic matching scoring functions such as CP [81] and ComplEx-DURA [200]. The negative log-likelihood loss can be defined as

$$L = \sum_{(h,r,t)\in\mathcal{G}\cup\mathcal{G}'} \log\big\{1 + \exp\big[-y_{(h,r,t)} \cdot f_r(h,t)\big]\big\}, \qquad (2.28)$$

where $y_{(h,r,t)} \in \{-1, +1\}$ is the label of the triple. Binary cross entropy, or the Bernoulli negative log-likelihood of the logistic model, is often adopted in classifier-based models such as ConvE [32] as well as tensor-based models such as TuckER [6]. The binary cross-entropy can be defined as

$$L = -\frac{1}{N}\sum_{i=1}^{N} \big[y_i \log p_i + (1 - y_i)\log(1 - p_i)\big]. \qquad (2.29)$$
2.3 CompoundE3D
2.3.1 2D Geometric Transformations
Quite a few KGE models are inspired by 2D geometric transformations such as translation, rotation, and
scaling in the 2D plane. TransE [9] models the relation as a translation between head and tail entities.
This simple model is not able to model symmetric relations effectively. RotatE [133] treats relations as
certain rotations in the complex space, which works well for symmetric relations. Furthermore, RotatE
introduces a self-adversarial negative sampling loss that improves distance-based KGE model performance significantly. PairRE [18] models relations with the scaling operation to allow variable margins. This is helpful in encoding complex relations. The unitary constraint on entity embeddings in PairRE is also effective in practice. CompoundE [50] adopts compound geometric transformations, including translation, rotation, and scaling, to model different relations. It offers a superior KGE model without increasing the overall complexity much.
2.3.1.1 Advanced Transformations
NagE [180] introduces generic group theory to the design of KGE models and gives a generic recipe for their construction. QuatE [197] extends the KGE design to the quaternion space, which enables more compact interactions between entities and relations while introducing more degrees of freedom. To model non-commutativeness in relation composition more effectively, both RotatE3D [45] and DensE [94] leverage quaternion rotations, but in different forms. ROTH [16] adopts hyperbolic curvature to capture the hierarchical structure in KGs. On the other hand, it is questioned in [148] whether the introduction of hyperbolic geometry in KGE is necessary.
2.3.2 Classification-based Models
Another family of models is built by classifying an unseen triple into two classes, "valid" (or positive) and "invalid" (or negative), and then using the soft decision to measure the likelihood of the triple.

2.3.2.1 Simple Neural Networks
A multilayer perceptron (MLP) network [35] is used to measure the likelihood of unseen triples for link prediction. The neural tensor network (NTN) [124] adopts a bilinear tensor neural layer to model interactions between entities and relations of triples. ConvE [32] stacks head entities and relations, reshapes them to 2D arrays, and uses a convolutional neural network (CNN) to extract information from them. The resulting feature map interacts with tail entities through dot products. R-GCN [120] uses the graph convolutional network (GCN) with relation-specific weights to obtain entity representations, which are subsequently fed to DistMult [174] for link prediction. Despite its potential for handling the inductive setting, its performance is not on par with the embedding-based approach.

2.3.2.2 Advanced Neural Networks
KG-BERT [182] uses the pretrained language model, BERT [33], to obtain the entity representation from textual descriptions (rather than from KG links). However, its inference time is much longer compared to embedding-based models. SimKGC [149] improves transformer-based classification methods by constructing contrastive pairs. It uses BERT to estimate the semantic similarity and treats triples of higher similarity scores as positive sample pairs, and vice versa. However, its performance is sensitive to the language model quality, and its required computational resources are high.
2.3.2.3 Lightweight Classification Model
KGBoost [154] proposes a novel negative sampling scheme and uses the XGBoost [22] classifier for link prediction. Inspired by Discriminant Feature Learning (DFT) [181, 79], which extracts the most discriminative features from trained embeddings, GreenKGC [153] is a lightweight and modularized classification method that trains a binary classifier to classify unseen triples.
2.3.3 Advanced Relation Modeling
Special techniques have been developed to model complex relations. For example, to model relations such as 1-to-N, N-to-1, and N-to-N effectively, TransH [155] projects the embedded entity space onto relation-specific hyperplanes. TransR [89] learns a relation-specific projection that maps entity vectors to a certain relation space. TransD [68] derives dynamic mappings based on relation and entity projection vectors. TranSparse [69] enforces the relation projection matrix to be sparse. Recently, many KGE models, including X+AT [177], SFBR [87], and STaR [84], apply translation and scaling operations to both distance-based and semantic-matching-based [150] models to improve performance. The inclusion of translation has proven effective in improving KGEs in the quaternion space, such as DualE [13] and BiQUE [54]. ReflectE [193] models each relation as the normal vector of a hyperplane that reflects entity vectors. It can be used to model symmetric and inverse relations well. So far, the cascade of various affine operations is a natural yet unexplored idea to pursue.
2.3.4 Model Ensembles
Although ensemble learning is a prevailing strategy in machine learning, it remains under-explored for knowledge graph completion. Link prediction evaluation is essentially a ranking problem. It is desired to optimize an ensemble decision so that valid triples get ranked higher than invalid ones among all candidates. Rank aggregation is a classical problem in information retrieval. Both supervised methods [12, 22] and unsupervised methods [77, 28] have been studied. Since the ground-truth ranking in KG link prediction is not available (except the top-1 triple), the unsupervised setting is more relevant. Yet, the use of rank aggregation to boost link prediction performance has received limited attention. Several examples are given below. KEnS [23] performs ensemble inference to combine predictions from multiple language-based KGEs for multilingual knowledge graph completion. AutoSF [199] develops an algorithm to search for the best scoring functions from multiple semantic matching models. The ensemble of multiple identical low-dimensional KGE models is adopted in [169] to boost link prediction performance. Recently, DuEL [74] treats link prediction as a classification problem and aggregates binary decisions from several different classifiers using unsupervised techniques.
2.4 Entity Classification
Work on entity type prediction can be categorized into three types, as elaborated below.

2.4.1 Statistical Approach.
Before machine learning was introduced to KG entity type prediction, type inference was often performed using RDF rules and graph pattern matching [44]. Although these handcrafted rules can make type predictions with high precision, they tend to miss a lot of possible cases since it is almost impossible to manually create all patterns. Hence, this approach is not scalable for large-scale datasets. As such, researchers apply basic statistics to solve the KG entity type prediction problem. An early method, called SDType [109], predicts missing entity types by estimating the empirical probability distribution P(type | relation) and aggregating all such conditional probabilities generated by the neighboring relations of the target entity. Although this approach is robust to noise, it cannot predict unseen combinations of types and relations. Another shortcoming of the statistical approach is that its performance deteriorates as the number of entity types becomes larger (say, with thousands of entity types). Furthermore, it does not exploit the type information of neighboring entities. In this work, we will show that type prediction performance can be further improved by conditioning on the type information of neighboring entities. Fig. 2.3 illustrates the statistical approach for entity type prediction. In this figure, arrows with different colors denote different relation types. The red nodes with different numbers denote different entity type labels. Each histogram denotes the type distribution of the tail entity given a particular type of head entity. Suppose we want to predict the type of the target node; then, given the information of each neighbor relation and each neighbor node's type, we can estimate the type distribution of the target node.
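The aggregation step described above can be sketched in a few lines of Python. This is only an illustration of the statistical approach, not SDType's exact implementation; the data structures (a triple list, a lookup of known neighbor types, and pre-estimated conditional type distributions) are hypothetical.

```python
from collections import Counter

def predict_type_distribution(target, triples, neighbor_type, cond_dist):
    """Estimate the type distribution of `target` from its neighborhood.

    triples:       iterable of (head, relation, tail) tuples.
    neighbor_type: dict mapping an entity to its known type label.
    cond_dist:     dict mapping (relation, neighbor_type, direction) to a
                   Counter over candidate types, estimated on training data.
    """
    agg, num_votes = Counter(), 0
    for h, r, t in triples:
        if h == target and t in neighbor_type:
            ctx = (r, neighbor_type[t], "head")   # target appears as head
        elif t == target and h in neighbor_type:
            ctx = (r, neighbor_type[h], "tail")   # target appears as tail
        else:
            continue
        dist = cond_dist.get(ctx)
        if dist:
            total = sum(dist.values())
            for label, count in dist.items():
                agg[label] += count / total       # accumulate P(type | ctx)
            num_votes += 1
    # Average the conditional distributions contributed by all neighbors.
    return {lb: w / num_votes for lb, w in agg.items()} if num_votes else {}
```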
2.4.2 Classification Approach.
Node classification is a common task in graph learning. One solution is to train a classifier with node features as the input and corresponding labels as the output. This idea can be applied to entity type prediction as well. Researchers [173, 165, 125] conducted experiments with entity textual descriptions, in the form of word embeddings, as features and neural networks such as MLP, LSTM, and CNN as classifiers. [154] uses the XGBoost classifier [22] for link prediction and proposes a novel negative sampling method. An end-to-end architecture that learns entity embedding and type prediction jointly was proposed in [72]. The correlation between entity attributes and KG link structures was taken into account in the learning of distributed entity representations. In this work, with pretrained KG embeddings as features, we test a couple of classifiers such as SVM and XGBoost. Again, we find that this approach does not perform well for large datasets. In fact, a comparative study on statistical and classifier approaches was conducted in [66]. By analyzing results from selected entity type classes, the authors concluded that KG embeddings fail to capture the semantics of entities and that statistical approaches are often superior in type prediction. While we concur with their experimental finding that the combination of entity embedding features and classifiers tends to yield poor results, we do not agree that KG embedding models cannot be useful for entity type prediction, as elaborated below.
2.4.3 Embedding Approach.
Before introducing the embedding approach for entity type prediction, it is worthwhile to briefly review KG embedding models. According to [70], a KG embedding model can be categorized based on its representation space and scoring function. Among several representation spaces, real and complex vector spaces are the two most common ones. TransE models triples of the form (subject, relation, object), or (s, r, o), in a d-dimensional real vector space with the translational principle e_s + w_r ≈ e_o, where e_s, w_r, and e_o denote the vector representations of the subject, relation, and object, respectively. Yet, TransE is not suitable for modeling asymmetric and many-to-one relations. To overcome this weakness, researchers venture into the complex vector space and design models with greater expressive power. ComplEx [140] and RotatE [133] are two prominent examples. ComplEx is motivated by low-rank matrix factorization in the complex space to model both symmetric and asymmetric relations effectively. Inspired by Euler's identity, RotatE models relations as rotations in the complex vector space to remedy ComplEx's inability to model composition patterns in KGs. Both TransE and RotatE adopt distance-based scoring functions, while ComplEx has a semantic matching score.

Besides KG embedding, embedding-based entity type prediction approaches learn a distributed representation of entity types. For example, a distance-based scoring function can be used to measure the relevance of a particular type to the target entity. The ETE model [100] embeds entities and entity types in the same space. ConnectE [205] embeds entities and types in two different spaces and learns a mapping from the entity space to the type space. In addition, it leverages neighbor label information to boost the performance further. Based on a similar idea, JOIE [205] adds an intra-view component to model the hierarchical structure of the type ontology. ETE, ConnectE, and JOIE all adopt TransE embedding to represent KG entities. JOIE targets better results for link prediction, whereas ETE and ConnectE focus on improving entity type prediction. TransE is known to suffer from a few problems, e.g., not being able to model asymmetric relations. Poor relation representation leads to poor entity representation. Since the quality of entity representation affects entity type prediction performance, we exploit the expressive power of complex-space KG embedding to achieve better results.
2.5 Entity Alignment
Entity alignment is a long-standing problem in KG research. Prior to embedding-based models, traditional methods align entities using strategies such as string similarity [114], schema similarity [128], and neighborhood similarity [80]. These methods are hardly applicable when the textual and ontological information is not uniform across different KGs.

Recently, several surveys on embedding-based entity alignment have been published with comprehensive codebases and sampled datasets [132, 204, 188, 196]. These codebases integrate different entity alignment models together so that fair performance comparisons of different models on these datasets can be carried out. Based on the entity embedding techniques, embedding-based entity alignment models fall into two major categories [196]: translation-embedding-based methods and Graph Neural Network (GNN)-based methods. They are reviewed below.
2.5.1 Translational-embedding-based Methods
MTransE [20], BootEA [131], JAPE [130], MultiKE [195], AttrE [139], and COTSAE [178] belong to this category. They use TransE [9] or a variant of TransE. MTransE proposes several score functions for alignment, including distance-based axis calibration, translation vectors, and linear transformations. BootEA learns a classifier through bootstrapping using the negative log-likelihood loss.

Auxiliary features such as entity attributes have also been extensively investigated. JAPE represents attribute features using the Skip-gram word embedding. AttrE represents attribute values through different character embedding aggregation strategies such as LSTM. MultiKE models the association between entity embedding and attribute embedding using CNNs. COTSAE uses a pseudo-Siamese network to learn attribute predicate and value embeddings. Apart from attribute features, visual features [90] generated from entity images using ResNet are leveraged in EVA to overcome the bottleneck of very few alignment seeds in training. In this work, we propose to leverage entity type features. Although a recent method, known as JTMEA [93], also considered the entity type, it was benchmarked against a few weaker and earlier baselines. We will demonstrate that the proposed TypeEA model can outperform stronger baselines with the help of type features.
2.5.2 GNN-based Methods
It is also possible to use graph neural networks to learn representations of entities. GCN-Align [156] uses graph convolutional networks (GCNs) to embed both the structural and attribute information of two KGs in a common space with shared weight matrices. RDGCN [159] extends GCNs with highway gates to capture the neighborhood information and includes the relation information through attentive interaction between a primal graph and a dual graph. Graph attention networks (GATs) are also explored. For example, NAEA [210] embeds the neighborhood information in addition to attribute relations and attribute values. A time-aware GNN-based model is proposed in TEA-GNN [170] to handle the alignment of KGs with temporal information. According to [132], BootEA and RDGCN are the top-performing models on different tasks of the DBP v1.1 dataset. In this work, we add the type information to these models and verify whether TypeEA can outperform the previous best models.
Figure 2.2: Hits@10 link prediction performance on the most popular benchmarking datasets: (a) FB15K, (b) FB15K-237, (c) WN18, and (d) WN18RR.
Table 2.5: Distance-based KGE models.

Model | Ent. emb. | Rel. emb. | Scoring Function | Space
TransE [9] | h, t ∈ R^d | r ∈ R^d | −∥h + r − t∥_{1/2} | O(nd + md)
TransR [89] | h, t ∈ R^d | r ∈ R^k, M_r ∈ R^{k×d} | −∥M_r h + r − M_r t∥²₂ | O(nd + mdk)
TransH [155] | h, t ∈ R^d | r, w_r ∈ R^d | −∥(h − w_r⊤h w_r) + r − (t − w_r⊤t w_r)∥²₂ | O(nd + md)
TransA [71] | h, t ∈ R^d | r ∈ R^d, W_r ∈ R^{d×d} | (|h + r − t|)⊤ W_r (|h + r − t|) | O(nd + md²)
TransF [43] | h, t ∈ R^d | r ∈ R^d | (h + r)⊤t + (t − r)⊤h | O(nd + md)
TransD [68] | h, h_p, t, t_p ∈ R^d | r, r_p ∈ R^k | −∥(r_p h_p⊤ + I)h + r − (r_p t_p⊤ + I)t∥²₂ | O(nd + mk)
TransM [41] | h, t ∈ R^d | r ∈ R^d, w_r ∈ R | −w_r ∥h + r − t∥_{1/2} | O(nd + md)
TranSparse [69] | h, t ∈ R^d | r ∈ R^k, M_r(θ_r) ∈ R^{k×d} | −∥M_r(θ_r)h + r − M_r(θ_r)t∥²_{1/2} | O(nd + mdk)
 | | M_r¹(θ_r¹), M_r²(θ_r²) ∈ R^{k×d} | −∥M_r¹(θ_r¹)h + r − M_r²(θ_r²)t∥²_{1/2} |
ManifoldE [160] | h, t ∈ R^d | r ∈ R^d | ∥M(h, r, t) − D_r²∥²₂ | O(nd + md)
TorusE [40] | [h], [t] ∈ T^d | [r] ∈ T^d | min_{(x,y)∈([h]+[r])×[t]} ∥x − y∥ | O(nd + md)
RotatE [133] | h, t ∈ C^d | r ∈ C^d | −∥h ◦ r − t∥ | O(nd + md)
PairRE [17] | h, t ∈ R^d | r^H, r^T ∈ R^d | −∥h ◦ r^H − t ◦ r^T∥ | O(nd + md)
Table 2.6: Semantic matching-based KGE models. ¹There are orthonormal and commutative constraints on the matrix.

Model | Ent. emb. | Rel. emb. | Scoring Function | Space
RESCAL [105] | h, t ∈ R^d | M_r ∈ R^{d×d} | h⊤ M_r t | O(nd + md²)
DistMult [174] | h, t ∈ R^d | r ∈ R^d | h⊤ diag(r) t | O(nd + md)
HolE [106] | h, t ∈ R^d | r ∈ R^d | r⊤(h ★ t) | O(nd + md)
ANALOGY [91] | h, t ∈ R^d | M̂_r ∈ R^{d×d} ¹ | h⊤ M̂_r t | O(nd + md)
ComplEx [140] | h, t ∈ C^d | r ∈ C^d | Re(⟨r, h, t̄⟩) | O(nd + md)
SimplE [75] | h, t ∈ R^d | r, r′ ∈ R^d | ½(⟨h, r, t⟩ + ⟨t, r′, h⟩) | O(nd + md)
Dihedral [167] | h^(l), t^(l) ∈ R² | R^(l) ∈ D_K | Σ_l h^(l)⊤ R^(l) t^(l) | O(nd + md)
TuckER [6] | h, t ∈ R^{d_e} | r ∈ R^{d_r} | W ×₁ h ×₂ r ×₃ t | O(nd_e + md_r + d_e²d_r)
QuatE [197] | Q_h, Q_t ∈ H^d | W_r ∈ H^d | Q_h ⊗ W_r^◁ · Q_t | O(nd + md)
DualE [13] | Q_h, Q_t ∈ H^d | W_r, T_r ∈ H^d | (Q_h ⊗ W_r^◁ + T_r) · Q_t | O(nd + md)
CrossE [198] | h, t ∈ R^d | r ∈ R^d | σ(tanh(c_r ◦ h + c_r ◦ h ◦ r + b) t⊤) | O(nd + md)
SEEK [171] | h, t ∈ R^d | r ∈ R^d | Σ_{x,y} s_{x,y}⟨r_x, h_y, t_{w_{x,y}}⟩ | O(nd + md)
Figure 2.3: Illustration of the statistical approach.

Figure 2.4: Illustration of the RotatE entity space, the RotatE type space, and the regression linking these two spaces.
Chapter 3
CompoundE: Knowledge Graph Embedding with Translation, Rotation and Scaling Compound Operations

3.1 Introduction
Knowledge graphs (KGs) such as DBpedia [3], YAGO [129], NELL [15], Wikidata [144], Freebase [8], and ConceptNet [127] have been created and made available to the public to facilitate research on KG modeling and applications. KG representation learning, also known as knowledge graph embedding (KGE), has been intensively studied in recent years. Yet, it remains one of the most fundamental problems in Artificial Intelligence (AI) and Data Engineering research. KGE is critical to many downstream applications such as multihop reasoning [123, 36], KG alignment [21, 48], entity classification [52], etc.

Triples, denoted by (h, r, t), are the basic elements of a KG, where h and t are the head and tail entities while r is the relation connecting them. For instance, the fact "Los Angeles is located in the USA" can be encoded as the triple (Los Angeles, isLocatedIn, USA). The link prediction (or KG completion) task is often used to evaluate the effectiveness of KGE models. That is, the task is to predict t given h and r, or to predict h given r and t. KGE models are evaluated based on how well the prediction matches the ground truth.
There are several challenges in the design of good KGE models. First, real-world KGs often contain a large number of entities. It is impractical to have high-dimensional embeddings due to device memory constraints. Yet, the performance of KGE models may deteriorate significantly in low-dimensional settings. How to design a KGE model that is effective in low-dimensional settings is not trivial. Second, complex relation types (e.g., hierarchical relations, surjective relations, antisymmetric relations, etc.) remain difficult to model. Link prediction performance on 1-N, N-1, and N-N relations is challenging for many existing KGE models. The relation "isLocatedIn" is an example of an N-1 relation. Since there are many cities other than "Los Angeles" also located in the USA, it is not easy to encode these relations effectively. Third, each of the extant KGE models has its own strengths and weaknesses. It is desired, yet unclear, how to design a KGE model that leverages the strengths of some models and complements the weaknesses of others.

Geometric manipulation operations such as translation and rotation have been used to build effective knowledge graph embedding (KGE) models (e.g., TransE, RotatE). Inspired by their success, we look for generalized geometric manipulations in image processing [111]. To this end, translation, rotation, and scaling are three common geometric manipulation operations. Furthermore, they can be cascaded to yield a generic compound operation that finds numerous applications. Examples include image warping [158], image morphing [121], and robot motion planning [83]. Motivated by this synergy, we propose a new KGE model to address the above-mentioned challenges. Since translation, rotation, and scaling operations are cascaded to form a compound operation, the proposed KGE model is named CompoundE. Compound operations inherit many desirable properties from the affine group, allowing CompoundE to model complex relations in different KGs. Moreover, since geometric transformations can be composed in any order, CompoundE has a large number of design variations to choose from. One can select the optimal CompoundE variant that best suits the characteristics of an individual dataset.
3.2 Method
Translation, rotation, and scaling transformations appear frequently in engineering applications. In image processing, a cascade of translation, rotation, and scaling operations offers a set of image manipulation techniques.

Figure 3.1: An illustration of previous distance-based KGE models and CompoundE.

Such compound operations can be used to develop a new KGE model called CompoundE. We define CompoundE in Section 3.2.1, explain that it belongs to the affine group in Section 3.2.2, and show that it is a generalized form of TransE, RotatE, PairRE, and a few other distance-based embedding models in Section 3.2.3. Furthermore, we discuss the properties of CompoundE in Section 3.2.4.
3.2.1 Definition of CompoundE
Three forms of the CompoundE scoring function can be written as

• CompoundE-Head:
$$f_r(h, t) = \|T_r \cdot R_r \cdot S_r \cdot h - t\|, \qquad (3.1)$$

• CompoundE-Tail:
$$f_r(h, t) = \|h - \hat{T}_r \cdot \hat{R}_r \cdot \hat{S}_r \cdot t\|, \qquad (3.2)$$

• CompoundE-Full:
$$f_r(h, t) = \|T_r \cdot R_r \cdot S_r \cdot h - \hat{T}_r \cdot \hat{R}_r \cdot \hat{S}_r \cdot t\|, \qquad (3.3)$$

where h, t denote the head and tail entity embeddings; T_r, R_r, S_r denote the translation, rotation, and scaling operations applied to the head entity embedding; and T̂_r, R̂_r, Ŝ_r denote their counterparts applied to the tail entity embedding, respectively. These constituent operators are relation-specific. To generalize, any order or subset of the translation, rotation, and scaling components can be a valid instance of CompoundE. Since matrix multiplications are non-commutative, different orders of cascading the constituent operators result in distinct CompoundE operators. We illustrate different ways of cascading geometric transformations to compose distinct CompoundE operators in Fig. 3.2.
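A minimal PyTorch sketch of the CompoundE-Head score in Eq. (3.1) is given below, assuming the block-diagonal 2D parameterization introduced later in Eq. (3.17): entity vectors are split into 2D subspaces, and each relation carries a translation vector, a per-block rotation angle, and a diagonal scaling. The function and argument names are illustrative, not taken from the released implementation.

```python
import torch

def compound_e_head(h, t, trans, theta, scale):
    """Sketch of the CompoundE-Head distance of Eq. (3.1).

    h, t:  (batch, d) entity embeddings, d even; consecutive pairs of
           dimensions form the 2D subspaces of Eq. (3.17).
    trans: (batch, d)   relation translation T_r.
    theta: (batch, d/2) rotation angle of R_r in each 2D block.
    scale: (batch, d)   diagonal entries of S_r.
    """
    b, d = h.shape
    x = (scale * h).view(b, d // 2, 2)              # apply S_r first
    cos, sin = torch.cos(theta), torch.sin(theta)
    rot = torch.stack((cos * x[..., 0] - sin * x[..., 1],
                       sin * x[..., 0] + cos * x[..., 1]), dim=-1)  # R_r
    out = rot.reshape(b, d) + trans                  # then T_r
    return torch.norm(out - t, dim=1)                # distance f_r(h, t)
```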
We follow RotatE's negative sampling loss and the self-adversarial training strategy. The loss function of CompoundE can be written as

$$L_{\mathrm{KGE}} = -\log\sigma\big(\gamma_1 - f_r(h,t)\big) - \sum_{i=1}^{n} p(h'_i, r, t'_i)\,\log\sigma\big(f_r(h'_i, t'_i) - \gamma_1\big), \qquad (3.4)$$

where $\sigma$ is the sigmoid function, $\gamma_1$ is a fixed margin hyperparameter, $(h'_i, r, t'_i)$ is the $i$-th negative triple, and $p(h'_i, r, t'_i)$ is the probability of drawing negative triple $(h'_i, r, t'_i)$. Given a positive triple $(h_i, r_i, t_i)$, the negative sampling distribution is

$$p(h'_j, r, t'_j \mid \{(h_i, r_i, t_i)\}) = \frac{\exp\big(-\alpha_1 f_r(h'_j, t'_j)\big)}{\sum_i \exp\big(-\alpha_1 f_r(h'_i, t'_i)\big)}, \qquad (3.5)$$

where $\alpha_1$ is the temperature of sampling.
Figure 3.2: Illustration of different ways of composing compound operations.
3.2.2 CompoundE as an Affine Group
Most analysis in previous work was restricted to the special Euclidean group SE(n) [14]. Yet, we will show that CompoundE is not a special Euclidean group but an affine group. To proceed, we first formally introduce the Lie group and three special groups below.

Definition 3.2.1. A Lie group is a continuous group that is also a differentiable manifold.

Several Lie group examples are given below.
• The real vector space, R^n, with the canonical addition as the group operation.
• The real vector space excluding zero, (R\{0})^n, with the element-wise multiplication as the group operation.
• The general linear group, GL_n(R), with the canonical matrix multiplication as the group operation.
Furthermore, the following three special groups are commonly used.

Definition 3.2.2. The special orthogonal group is defined as

$$\mathrm{SO}(n) = \left\{ A \,\middle|\, A \in \mathrm{GL}_n(\mathbb{R}),\; A^\top A = I,\; \det(A) = 1 \right\}. \qquad (3.6)$$

Definition 3.2.3. The special Euclidean group is defined as

$$\mathrm{SE}(n) = \left\{ A \,\middle|\, A = \begin{bmatrix} R & v \\ 0 & 1 \end{bmatrix},\; R \in \mathrm{SO}(n),\; v \in \mathbb{R}^n \right\}. \qquad (3.7)$$

Definition 3.2.4. The affine group is defined as

$$\mathrm{Aff}(n) = \left\{ M \,\middle|\, M = \begin{bmatrix} A & v \\ 0 & 1 \end{bmatrix},\; A \in \mathrm{GL}_n(\mathbb{R}),\; v \in \mathbb{R}^n \right\}. \qquad (3.8)$$

By comparing Eqs. (3.7) and (3.8), we see that SE(n) is a subset of Aff(n).
Without loss of generality, consider n = 2. If M ∈ Aff(2), we have

$$M = \begin{bmatrix} A & v \\ 0 & 1 \end{bmatrix}, \quad A \in \mathbb{R}^{2\times 2}, \quad v \in \mathbb{R}^2. \qquad (3.9)$$

The 2D translational matrix can be written as

$$T = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix}, \qquad (3.10)$$

while the 2D rotational matrix can be expressed as

$$R = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}. \qquad (3.11)$$

It is easy to verify that they both belong to special Euclidean groups (i.e., T ∈ SE(2) and R ∈ SE(2)). On the other hand, the 2D scaling matrix is of the form

$$S = \begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{bmatrix}. \qquad (3.12)$$

It is not a special Euclidean group but an affine group of n = 2 (i.e., S ∈ Aff(2)).

Compounding translation and rotation operations, we get a transformation in the special Euclidean group,

$$T \cdot R = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta & t_x \\ \sin\theta & \cos\theta & t_y \\ 0 & 0 & 1 \end{bmatrix} \in \mathrm{SE}(2). \qquad (3.13)$$

Yet, if we add the scaling operation, the compound will belong to the affine group. One such compound operator can be written as

$$T \cdot R \cdot S = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} s_x\cos\theta & -s_y\sin\theta & t_x \\ s_x\sin\theta & s_y\cos\theta & t_y \\ 0 & 0 & 1 \end{bmatrix} \in \mathrm{Aff}(2). \qquad (3.14)$$
When s_x ≠ 0 and s_y ≠ 0, the compound operator is invertible. Its inverse can be written in the form

$$M^{-1} = \begin{bmatrix} A^{-1} & -A^{-1}v \\ 0 & 1 \end{bmatrix}. \qquad (3.15)$$

Furthermore, a high-dimensional relation operator can be represented as a block-diagonal matrix of the form

$$M_r = \mathrm{diag}(O_{r,1}, O_{r,2}, \ldots, O_{r,n}), \qquad (3.16)$$

where $O_{r,i}$ is the compound operator at the $i$-th stage. We can multiply $M_r \cdot v$ in the following manner,

$$\begin{bmatrix} O_{r,1} & 0 & \cdots & 0 \\ 0 & O_{r,2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & O_{r,n} \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \\ x_2 \\ y_2 \\ \vdots \\ x_n \\ y_n \end{bmatrix}, \qquad (3.17)$$

where $v = [x_1, y_1, x_2, y_2, \ldots, x_n, y_n]^\top$ is a $2n$-dimensional entity vector that is split into multiple 2D subspaces.
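The 2D derivations above are easy to verify numerically. The following numpy sketch, illustrative only, builds the homogeneous matrices of Eqs. (3.10)-(3.12), checks the invertibility condition around Eq. (3.15), and confirms that the composition order matters.

```python
import numpy as np

def T(tx, ty):   # 2D translation in homogeneous coordinates, Eq. (3.10)
    return np.array([[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]])

def R(theta):    # 2D rotation, Eq. (3.11)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def S(sx, sy):   # 2D scaling, Eq. (3.12)
    return np.diag([sx, sy, 1.0])

# Compound operator of Eq. (3.14); invertible since sx, sy != 0.
M = T(2.0, 3.0) @ R(np.pi / 4) @ S(0.5, 2.0)
assert np.allclose(M @ np.linalg.inv(M), np.eye(3))

# Cascading in a different order yields a different operator:
# the affine group is non-commutative.
M2 = S(0.5, 2.0) @ R(np.pi / 4) @ T(2.0, 3.0)
print(np.allclose(M, M2))   # False
```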
3.2.3 Relation with Other Distance-based KGE Models
CompoundE is actually a general form of quite a few distance-based KGE models. That is, we can derive their scoring functions from that of CompoundE by setting the translation, scaling, and rotation operations to certain forms. Four examples are given below.

Derivation of TransE. We begin with CompoundE-Head and set its rotation component to the identity matrix I and its scaling parameters to 1. Then, we get the scoring function of TransE as

$$f_r(h, t) = \|T_r \cdot I \cdot \mathrm{diag}(\mathbf{1}) \cdot h - t\| = \|h + r - t\|. \qquad (3.18)$$

Derivation of RotatE. We can derive the scoring function of RotatE from CompoundE-Head by setting the translation component to I (translation vector t = 0) and the scaling component to 1,

$$f_r(h, t) = \|I \cdot R_r \cdot \mathrm{diag}(\mathbf{1}) \cdot h - t\| = \|h \circ r - t\|. \qquad (3.19)$$

Derivation of PairRE. CompoundE-Full can be reduced to PairRE by setting both the translation and rotation components to I, for both the head and tail transformations,

$$f_r(h, t) = \|I \cdot I \cdot S_r \cdot h - I \cdot I \cdot \hat{S}_r \cdot t\| = \|h \odot r^H - t \odot r^T\|. \qquad (3.20)$$

Derivation of LinearRE. We can add back the translation component for the head transformation:

$$f_r(h, t) = \|T_r \cdot I \cdot S_r \cdot h - I \cdot I \cdot \hat{S}_r \cdot t\| = \|h \odot r^H + r - t \odot r^T\|. \qquad (3.21)$$
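These reductions can also be checked numerically. The sketch below, with illustrative variable names, reuses the block-diagonal CompoundE-Head form from Section 3.2.1 and verifies that fixing the rotation angle to 0 and the scaling to 1 reproduces the TransE distance of Eq. (3.18).

```python
import numpy as np

def compound_head(h, t, trans, theta, scale):
    """Block-diagonal CompoundE-Head distance (d even, 2D blocks)."""
    x = (scale * h).reshape(-1, 2)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.stack((c * x[:, 0] - s * x[:, 1],
                    s * x[:, 0] + c * x[:, 1]), axis=1).reshape(-1)
    return np.linalg.norm(rot + trans - t)

rng = np.random.default_rng(0)
h, t, r = rng.normal(size=(3, 8))    # toy 8-dimensional embeddings

# Identity rotation (theta = 0) and unit scaling reduce to TransE.
reduced = compound_head(h, t, trans=r, theta=np.zeros(4), scale=np.ones(8))
assert np.isclose(reduced, np.linalg.norm(h + r - t))
```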
3.2.4 Properties of CompoundE
CompoundE has a richer set of operations and, therefore, is more powerful than previous KGE models in modeling complex relations such as 1-to-N, N-to-1, and N-to-N relations in KG datasets. Modeling these relations is important since more than 98% of the triples in the FB15k-237 and WN18RR datasets involve complex relations. The importance of complex relation modeling is illustrated by two examples below.
Inference pattern | TransE | RotatE | DistMult | ComplEx | CompoundE
1-to-N: r(x, y₁) ∧ r(x, y₂), y₁ ≠ y₂ | ✗ | ✗ | ✓ | ✓ | ✓
N-to-1: r(x₁, y) ∧ r(x₂, y), x₁ ≠ x₂ | ✗ | ✗ | ✓ | ✓ | ✓
N-to-N: r(x_i, y_j), x_i, y_j ∈ E | ✗ | ✗ | ✓ | ✓ | ✓
Symmetry: r₁(x, y) ⇒ r₁(y, x) | ✗ | ✓ | ✓ | ✓ | ✓
Antisymmetry: r₁(x, y) ⇒ ¬r₁(y, x) | ✓ | ✓ | ✗ | ✓ | ✓
Inversion: r₁(x, y) ⇔ r₂(y, x) | ✓ | ✓ | ✗ | ✓ | ✓
Composition: r₁(x, y) ∧ r₂(y, z) ⇒ r₃(x, z) | ✓ | ✓ | ✗ | ✗ | ✓
Non-commutative: r₂(x, y) ∧ r₁(y, z) ⇒ r₄(x, z) | ✗ | ✗ | ✗ | ✗ | ✓
Multiplicity: r₁(x, y) ∧ r₂(x, y) | ✗ | ✓ | ✗ | ✗ | ✓

Table 3.1: Comparing relation patterns that CompoundE can model with other popular KGE models.
First, there is a need to distinguish different outcomes of relation compositions when modeling non-commutative relations. That is, r₁ · r₂ → r₃ while r₂ · r₁ → r₄. For instance, r₁, r₂, r₃, and r₄ denote isFatherOf, isMotherOf, isGrandfatherOf, and isGrandmotherOf, respectively. TransE and RotatE cannot make such a distinction since they are based on commutative relation embeddings. Second, to capture the hierarchical structure of relations, it is essential to build a good model for sub-relations. For example, r₁ and r₂ denote isCapitalCityOf and cityLocatedInCountry, respectively. Logically, isCapitalCityOf is a sub-relation of cityLocatedInCountry because if (h, r₁, t) is true, then (h, r₂, t) must be true.

Let M and M̂ denote the compound operations for the head and tail entity embeddings, respectively. In the following, we will prove that CompoundE is capable of modeling symmetric/antisymmetric, inversion, transitive, commutative/non-commutative, and sub-relations.
Proposition 3.2.1. CompoundE can model 1-N relations.
Proof. A relation r is a 1-N relation iff there exist at least two distinct tail entities t₁ and t₂ such that (h, r, t₁) and (h, r, t₂) both hold. Then we have:

$$M \cdot h = \hat{M} \cdot t_1, \quad M \cdot h = \hat{M} \cdot t_2 \;\Longrightarrow\; \hat{M} \cdot (t_1 - t_2) = 0. \qquad (3.22)$$

Since t₁ ≠ t₂, CompoundE can model 1-N relations when M̂ is singular. □

Proposition 3.2.2. CompoundE can model N-1 relations.
Proof. A relation r is an N-1 relation iff there exist at least two distinct head entities h₁ and h₂ such that (h₁, r, t) and (h₂, r, t) both hold. Then we have:

$$M \cdot h_1 = \hat{M} \cdot t, \quad M \cdot h_2 = \hat{M} \cdot t \;\Longrightarrow\; M \cdot (h_1 - h_2) = 0. \qquad (3.23)$$

Since h₁ ≠ h₂, CompoundE can model N-1 relations when M is singular. □

Proposition 3.2.3. CompoundE can model N-N relations.
Proof. By the proofs of Props. 3.2.1 and 3.2.2, N-N relations can be modeled when both M and M̂ are singular. □
Proposition 3.2.4. CompoundE can model symmetric relations.
Proof. A relation r is a symmetric relation iff (h, r, t) and (t, r, h) hold simultaneously. Then we have:

$$\begin{aligned} M \cdot h = \hat{M} \cdot t &\;\Longrightarrow\; h = M^{-1}\hat{M} \cdot t \\ M \cdot t = \hat{M} \cdot h &\;\Longrightarrow\; M \cdot t = \hat{M}M^{-1}\hat{M} \cdot t \\ &\;\Longrightarrow\; M\hat{M}^{-1} = \hat{M}M^{-1} \end{aligned} \qquad (3.24)$$

Therefore, CompoundE can model symmetric relations when $M\hat{M}^{-1} = \hat{M}M^{-1}$. □

Proposition 3.2.5. CompoundE can model antisymmetric relations.
Proof. A relation r is an antisymmetric relation iff (h, r, t) holds but (t, r, h) does not. By a similar argument to the proof of Proposition 3.2.4, CompoundE can model antisymmetric relations when $M\hat{M}^{-1} \neq \hat{M}M^{-1}$. □
Proposition 3.2.6. CompoundE can model inversion relations.
Proof. A relation r₂ is the inverse of relation r₁ iff (h, r₁, t) and (t, r₂, h) hold simultaneously. Then we have:

$$\begin{aligned} M_1 \cdot h = \hat{M}_1 \cdot t &\;\Longrightarrow\; h = M_1^{-1}\hat{M}_1 \cdot t \\ M_2 \cdot t = \hat{M}_2 \cdot h &\;\Longrightarrow\; M_2 \cdot t = \hat{M}_2 M_1^{-1}\hat{M}_1 \cdot t \\ &\;\Longrightarrow\; \hat{M}_2^{-1}M_2 = M_1^{-1}\hat{M}_1 \end{aligned} \qquad (3.25)$$

Therefore, CompoundE can model inversion relations when $\hat{M}_2^{-1}M_2 = M_1^{-1}\hat{M}_1$. □
Proposition 3.2.7. CompoundE can model relation compositions.
Proof. r₃ is a composition of r₁ and r₂ iff (e₁, r₁, e₂), (e₂, r₂, e₃), and (e₁, r₃, e₃) hold simultaneously. Then we have:

$$\begin{aligned} M_1 \cdot e_1 = \hat{M}_1 \cdot e_2 &\;\Longrightarrow\; e_1 = M_1^{-1}\hat{M}_1 \cdot e_2 \\ M_2 \cdot e_2 = \hat{M}_2 \cdot e_3 &\;\Longrightarrow\; e_3 = \hat{M}_2^{-1}M_2 \cdot e_2 \\ M_3 \cdot e_1 = \hat{M}_3 \cdot e_3 &\;\Longrightarrow\; M_3 M_1^{-1}\hat{M}_1 \cdot e_2 = \hat{M}_3\hat{M}_2^{-1}M_2 \cdot e_2 \\ &\;\Longrightarrow\; \hat{M}_3^{-1}M_3 = (\hat{M}_2^{-1}M_2)(\hat{M}_1^{-1}M_1) \end{aligned} \qquad (3.26)$$

Therefore, CompoundE can model relation compositions when $\hat{M}_3^{-1}M_3 = (\hat{M}_2^{-1}M_2)(\hat{M}_1^{-1}M_1)$. □
Proposition 3.2.8. CompoundE can model both commutative and non-commutative relations.
Proof. Since the general form of the affine group is non-commutative, our proposed CompoundE is non-commutative, i.e.,

$$(M_1\hat{M}_1^{-1})(M_2\hat{M}_2^{-1}) \neq (M_2\hat{M}_2^{-1})(M_1\hat{M}_1^{-1}), \qquad (3.27)$$

where each M consists of translation, rotation, and scaling components. However, in special cases, when our relation embedding has only one of the translation, rotation, or scaling components, the relation embedding becomes commutative again. □
Proposition 3.2.9. CompoundE can model sub-relations.
Proof. A relation r₁ is a sub-relation of r₂ if (h, r₂, t) implies (h, r₁, t). Without loss of generality, suppose our compounding operation takes the following form

$$M = T \cdot R \cdot S, \quad \hat{M} = \hat{T} \cdot \hat{R} \cdot \hat{S}, \qquad (3.28)$$

and suppose

$$T_{r_1} = T_{r_2}, \quad \hat{T}_{r_1} = \hat{T}_{r_2}, \quad R_{r_1} = R_{r_2}, \quad \hat{R}_{r_1} = \hat{R}_{r_2}, \quad S_{r_1} = \alpha S_{r_2}, \quad \hat{S}_{r_1} = \alpha \hat{S}_{r_2}, \quad \alpha \leq 1. \qquad (3.29)$$

With these conditions, we can compare the CompoundE scores generated by (h, r₁, t) and (h, r₂, t) as follows:

$$\begin{aligned} f_{r_1}(h,t) - f_{r_2}(h,t) &= \|T_{r_1}\!\cdot\! R_{r_1}\!\cdot\! S_{r_1}\!\cdot\! h - \hat{T}_{r_1}\!\cdot\! \hat{R}_{r_1}\!\cdot\! \hat{S}_{r_1}\!\cdot\! t\| - \|T_{r_2}\!\cdot\! R_{r_2}\!\cdot\! S_{r_2}\!\cdot\! h - \hat{T}_{r_2}\!\cdot\! \hat{R}_{r_2}\!\cdot\! \hat{S}_{r_2}\!\cdot\! t\| \\ &= \|T_{r_2}\!\cdot\! R_{r_2}\!\cdot\! (\alpha S_{r_2})\!\cdot\! h - \hat{T}_{r_2}\!\cdot\! \hat{R}_{r_2}\!\cdot\! (\alpha \hat{S}_{r_2})\!\cdot\! t\| - \|T_{r_2}\!\cdot\! R_{r_2}\!\cdot\! S_{r_2}\!\cdot\! h - \hat{T}_{r_2}\!\cdot\! \hat{R}_{r_2}\!\cdot\! \hat{S}_{r_2}\!\cdot\! t\| \\ &= \|\alpha(T_{r_2}\!\cdot\! R_{r_2}\!\cdot\! S_{r_2}\!\cdot\! h - \hat{T}_{r_2}\!\cdot\! \hat{R}_{r_2}\!\cdot\! \hat{S}_{r_2}\!\cdot\! t)\| - \|T_{r_2}\!\cdot\! R_{r_2}\!\cdot\! S_{r_2}\!\cdot\! h - \hat{T}_{r_2}\!\cdot\! \hat{R}_{r_2}\!\cdot\! \hat{S}_{r_2}\!\cdot\! t\| \leq 0 \end{aligned} \qquad (3.30)$$

This means that (h, r₁, t) generates a smaller error score than (h, r₂, t). If (h, r₂, t) holds, then (h, r₁, t) must also hold. Therefore, r₁ is a sub-relation of r₂. □
Proposition 3.2.10. CompoundE can model transitive relations.
Proof. r is a transitive relation iff (e₁, r, e₂), (e₂, r, e₃), and (e₁, r, e₃) hold simultaneously. Consider the following CompoundE variant, and let R = R̂ and S be an idempotent matrix:

$$\begin{aligned} f_r(h,t) &= \|S \cdot R \cdot h - \hat{R} \cdot t\| \\ &= \|R \cdot (R^{-1} S R \cdot h - t)\| \\ &= \|R^{-1} S R \cdot h - t\| \end{aligned} \qquad (3.31)$$

Let $M_s = R^{-1} S R$. Then it is easy to see that

$$M_s \cdot M_s \cdots M_s = (R^{-1} S R)(R^{-1} S R)\cdots(R^{-1} S R) = R^{-1} S R = M_s, \qquad (3.32)$$

since S is idempotent. Therefore, CompoundE can model transitive relations. □
3.3 Experiments

3.3.1 Link Prediction

3.3.1.1 Datasets.
We conduct experiments on three widely used benchmarking datasets: ogbl-wikikg2, FB15k-237, and WN18RR. ogbl-wikikg2 is a challenging Open Graph Benchmark dataset [62] extracted from the Wikidata [144] KG. Its challenge is to design embedding models that can scale to large KGs. FB15k-237 and WN18RR are extracted from Freebase [8] and WordNet [99], respectively. Inverse relations are removed from both to avoid the data leakage problem. Their main challenge lies in modeling symmetry/antisymmetry and composition relation patterns. The detailed statistics of the three datasets are shown in Table 3.2.

Table 3.2: Dataset Statistics.

Dataset | #Entities | #Relations | #Training | #Validation | #Test
FB15k-237 | 14,541 | 237 | 272,115 | 17,535 | 20,466
WN18RR | 40,943 | 11 | 86,835 | 3,034 | 3,134
ogbl-wikikg2 | 2,500,604 | 535 | 16,109,182 | 429,456 | 598,543
3.3.1.2 Evaluation Protocol.
To evaluate the link prediction performance of CompoundE, we compute the rank of the ground-truth entity in the list of top candidates. Since embedding models tend to rank entities observed in the training set higher, we compute the filtered rank to prioritize candidates that would result in unseen triples. We follow the convention and adopt the Mean Reciprocal Rank (MRR) and Hits@k metrics to compare the quality of different KGE models. The MRR can be computed as

$$\mathrm{MRR} = \frac{1}{|\mathcal{D}|}\sum_{i\in\mathcal{D}} \frac{1}{\mathrm{Rank}_i}, \qquad (3.33)$$

while Hits@k can be computed as

$$\mathrm{Hits@}k = \frac{1}{|\mathcal{D}|}\sum_{i\in\mathcal{D}} \mathbb{1}\{\mathrm{Rank}_i \leq k\}. \qquad (3.34)$$

Higher MRR and Hits@k values indicate better model performance.
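As an illustration of this protocol, the following sketch computes a filtered rank for one test query and the two metrics of Eqs. (3.33) and (3.34); the helper name and toy numbers are ours, not taken from any benchmark code.

```python
import numpy as np

def filtered_rank(scores, true_idx, known_idx):
    """Filtered rank of the ground-truth entity for one test query.

    scores:    (num_entities,) distances, lower = more plausible.
    known_idx: set of entity indices that would form triples already seen
               in train/valid/test; they are excluded from the ranking.
    """
    keep = np.ones(len(scores), dtype=bool)
    keep[list(known_idx - {true_idx})] = False
    # Rank = 1 + number of remaining candidates with a strictly better score.
    return 1 + int(np.sum(scores[keep] < scores[true_idx]))

ranks = np.array([1, 3, 12, 2])           # toy ranks of four test triples
mrr = float(np.mean(1.0 / ranks))         # Eq. (3.33)
hits_at_10 = float(np.mean(ranks <= 10))  # Eq. (3.34)
```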
3.3.1.3 Performance Benchmarking.
Tables 3.3 and 3.4 show the best performance of CompoundE and other benchmarking models for the ogbl-wikikg2 and FB15k-237/WN18RR datasets, respectively. The best results are shown in bold font. CompoundE consistently outperforms all benchmarking models across all three datasets. As shown in Table 3.3, the results of CompoundE are much better than those of previous KGE models, while the embedding dimension and the number of model parameters are significantly lower for the ogbl-wikikg2 dataset. This implies lower computation and memory costs for CompoundE. We see from Table 3.4 that CompoundE achieves significant improvement over distance-based KGE models that use a single operation, whether translation (TransE), rotation (RotatE), or scaling (PairRE). This confirms that cascading geometric transformations is an effective strategy for designing KG embeddings.
Table 3.3: Filtered ranking of link prediction on ogbl-wikikg2.

Model | Dim | Valid MRR | Test MRR | Test Hit@1 | Test Hit@3 | Test Hit@10
AutoSF+NodePiece [199] | - | 0.5806 | 0.5703 | - | - | -
ComplEx-RP [25] | 50 | 0.6561 | 0.6392 | - | - | -
TransE [9] | 500 | 0.4272 | 0.4256 | - | - | -
DistMult [174] | 500 | 0.3506 | 0.3729 | - | - | -
ComplEx [140] | 250 | 0.3759 | 0.4027 | - | - | -
RotatE [133] | 250 | 0.4353 | 0.4353 | - | - | -
PairRE [17] | 200 | 0.5423 | 0.5208 | - | - | -
TripleRE | 200 | 0.6045 | 0.5794 | - | - | -
CompoundE | 100 | 0.6704 | 0.6515 | 0.5843 | 0.6781 | 0.7872
Table 3.4: Filtered ranking of link prediction for FB15k-237 and WN18RR.
Datasets FB15K-237 WN18RR
Metrics MRR Hit@1 Hit@3 Hit@10 MRR Hit@1 Hit@3 Hit@10
TransE [9] 0.294 - - 0.465 0.226 - - 0.501
DistMult [174] 0.241 0.155 0.263 0.419 0.430 0.390 0.440 0.490
ComplEx [140] 0.247 0.158 0.275 0.428 0.440 0.410 0.460 0.510
RotatE [133] 0.338 0.241 0.375 0.533 0.476 0.428 0.492 0.571
TorusE [38] 0.316 0.217 0.335 0.484 0.453 0.422 0.464 0.512
TuckER [6] 0.358 0.266 0.394 0.544 0.470 0.443 0.482 0.526
AutoSF [199] 0.360 0.267 - 0.552 0.490 0.451 - 0.567
RotatE3D [45] 0.347 0.250 0.385 0.543 0.489 0.442 0.505 0.579
MQuadE [184] 0.356 0.260 0.392 0.549 - - - -
PairRE [17] 0.351 0.256 0.387 0.544 - - - -
M-DCN [202] 0.345 0.255 0.380 0.528 0.475 0.440 0.485 0.540
GIE [14] 0.362 0.271 0.401 0.552 0.491 0.452 0.505 0.575
ReflectE [193] 0.358 0.263 0.396 0.546 0.488 0.450 0.501 0.559
CompoundE 0.367 0.275 0.402 0.555 0.493 0.451 0.507 0.578
3.3.1.4 Implementation and Optimal Configurations.
The statistics of the three datasets used in our experiments are summarized in Table 3.2. In the experiments, we normalize all entity embeddings to unit vectors before applying compound operations. The optimal configurations of CompoundE are given in Table 3.5. The implementation of the rotation operation in the optimal CompoundE configuration for the WN18RR dataset is adapted from RotatE.
3.3.1.5 Hyperparameter Tuning.
We conduct two sets of controlled experiments to find the best model configurations for the FB15k-237, WN18RR, and ogbl-wikikg2 datasets. For the first set, we evaluate the effect of different combinations of learning rates and embedding dimensions while keeping the batch size, the negative sample size, and other parameters constant. For the second set, we evaluate the effect of different combinations of the training batch size and the negative sample size, while keeping the learning rate, the embedding dimension, and other parameters constant. Figs. 3.3 (a)-(c) show MRR scores of CompoundE under different learning rate and embedding dimension settings for ogbl-wikikg2, FB15k-237, and WN18RR, respectively. We see that CompoundE does not require a high embedding dimension to achieve optimal performance, indicating that its model size can be small. This is attractive for training and inference on mobile/edge devices with limited memory capacity. Figs. 3.3 (d)-(f) show MRR scores of CompoundE under different batch size and negative sample size settings for ogbl-wikikg2, FB15k-237, and WN18RR, respectively. Based on these results, we are able to make some dataset-specific observations. For ogbl-wikikg2, increasing the batch size to a reasonably large size is helpful for improving the overall ranking score, while the model performance is relatively insensitive to changes in negative sample size. For FB15k-237 and WN18RR, setting optimal learning rates is essential for obtaining good results. The optimal model configurations for the three datasets are given in Table 3.5.
Table 3.5: Optimal Configurations. B denotes the batch size and N the negative sample size; the last two columns list the sampling temperature α and the margin γ of Eqs. (3.4)-(3.5).

Dataset | CompoundE Variant | #Dim | lr | B | N | α | γ
ogbl-wikikg2 | ∥h − Ŝ·T̂·R̂·t∥ | 100 | 0.005 | 4096 | 250 | 1 | 7
FB15k-237 | ∥S·R·T·h − Ŝ·R̂·T̂·t∥ | 1500 | 0.00005 | 1024 | 125 | 1 | 6
WN18RR | ∥R·S·T·h − Ŝ·T̂·t∥ | 200 | 0.00005 | 1024 | 256 | 0.5 | 6
Figure 3.3: Heatmaps of test MRR scores from the hyperparameter grid searches: (a)-(c) learning rate vs. embedding dimension and (d)-(f) batch size vs. negative sample size, for ogbl-wikikg2, FB15k-237, and WN18RR, respectively.

3.3.2 Path Query Answering
Path queries are important since it is often desired to perform complex queries on a knowledge graph. For example, one might ask "Where did Michelle Obama's spouse live?". To obtain the answer, a model first needs to correctly predict the fact (Michelle Obama, spouse, Barack Obama), and then predict (Barack Obama, livedIn, Chicago). CompoundE has the right properties to perform well on this task since it is capable of modeling non-commutative relation compositions.

In Path Query Answering (PQA), a tuple (s, P, t) is given, where s and t denote the source and target entities and P = {r₁, ..., r_k} denotes the relation path, i.e., the sequence of relations that links s to t as s → r₁ → r₂ → ··· → r_k → t. PQA tests whether, after traversing the relation path from a given source entity, the model is able to predict the correct target entity. During testing, the ground truth t is hidden; we compute the score for all candidate target entities and evaluate the quantile of the ground truth, which is the fraction of irrelevant candidates ranked lower than the ground truth. The mean quantile (MQ) over all test paths is reported. In particular, type match paths are excluded since those are trivial to predict. Specifically, we use both the KG triples and sampled paths of length |P| ∈ {2, 3, 4, 5} to train the embedding, which is also referred to as the "comp" setting [57]. We use CompoundE to perform PQA on the Freebase and WordNet datasets prepared by [57]. Statistics of these two datasets are shown in Table 3.6. A performance comparison with previous models on the PQA task under the "comp" setting is shown in Table 3.7. Results show that CompoundE is very competitive on the PQA task among pure embedding models.
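Under the "comp" setting, scoring a path query amounts to cascading the relation operators along the path before comparing against candidate targets. The sketch below illustrates this idea with head-side compound operators in homogeneous coordinates; it is a simplification of the actual training and inference pipeline, with illustrative names.

```python
import numpy as np

def path_score(source, target, path_ops):
    """Score a path query (s, {r_1, ..., r_k}, t); lower = better.

    source, target: entity vectors in homogeneous coordinates.
    path_ops: relation operator matrices M_r, applied in path order; with
              CompoundE these are compositions of T, R, and S blocks.
    """
    x = source
    for M in path_ops:
        x = M @ x                      # traverse one relation hop
    return np.linalg.norm(x - target)  # distance to the candidate target
```

Because matrix products are non-commutative, reversing the order of path_ops generally changes the score, which is exactly the property needed to distinguish different outcomes of relation compositions.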
Table 3.6: Path Query Answering Dataset Statistics.

Dataset | #Ent | #Rel | #KG Triples (Train / Valid / Test) | #Paths (Train / Valid / Test)
WordNet | 38,551 | 11 | 110,361 / 2,602 / 10,462 | 2,129,539 / 11,277 / 46,577
Freebase | 75,043 | 13 | 316,232 / 5,908 / 23,733 | 6,266,058 / 27,163 / 109,557
Table 3.7: Performance comparison for path query answering.

Model | WordNet MQ | WordNet H@10 | Freebase MQ | Freebase H@10
Bilinear | 0.894 | 0.543 | 0.835 | 0.421
TransE | 0.933 | 0.435 | 0.880 | 0.505
DistMult | 0.904 | 0.311 | 0.848 | 0.386
RotatE | 0.947 | 0.653 | 0.901 | 0.601
Rotate3D | 0.949 | 0.671 | 0.905 | 0.621
CompoundE | 0.951 | 0.674 | 0.913 | 0.650

3.3.3 Task 2: KG Entity Typing
KG entity typing predicts class labels for nodes in a knowledge graph. Entity types provide semantic signals for information extraction tasks such as relation extraction [172], entity linking [56, 27], and coreference
resolution [37]. Entity typing is challenging since each entity may be associated with a large number of type labels. We show that CompoundE can also be effective for entity typing.
We perform entity typing using CompoundE embeddings on the FB15k-ET and YAGO43k-ET datasets prepared by [100]. Statistics of these datasets are shown in Table 3.9. In addition to RDF triples (h, r, t), entity and entity type pairs (e, t) are added to these entity typing datasets. Since the type can be interpreted as an "isA" relationship between e and t, we add a "type" relation between each (e, t) pair and treat it as a special triple. A performance comparison with existing work is shown in Table 3.8. The optimal configuration is shown in Table 3.10. Similar to link prediction, we also report the MRR and Hits@k scores. Results show that CompoundE achieves significant improvement over other models, especially for the YAGO43k-ET dataset, even without giving special treatment to the representation of entity types. This observation supports the claim that CompoundE is strongly capable of representing entity semantics.
Datasets FB15k-ET YAGO43k-ET
Metrics MRR H@1 H@3 H@10 MRR H@1 H@3 H@10
TransE [9] 0.45 31.51 51.45 73.93 0.21 12.63 23.24 38.93
TransE-ET [100] 0.46 33.56 52.96 71.16 0.18 9.19 19.41 35.58
ETE [100] 0.50 38.51 55.33 71.93 0.23 13.73 26.28 42.18
HMGCN [73] 0.51 39.02 54.75 72.36 0.25 14.21 27.34 43.69
ConnectE [205] 0.59 49.55 64.32 79.92 0.28 16.01 30.85 47.92
CORE [52] 0.60 48.91 66.30 81.60 0.35 24.17 39.18 54.95
AttEt [213] 0.62 51.66 67.68 82.13 0.35 24.43 41.31 56.48
CompoundE 0.64 52.49 71.88 85.89 0.48 36.36 55.80 70.31
Table 3.8: Entity typing performance comparison for the FB15k-ET and YAGO43k-ET datasets. Best results are in bold and second-best results are underlined.
Table 3.9: Entity Typing Dataset Statistics.

Dataset | #Ent | #Rel | #Type | #KG Triples (Train / Valid / Test) | #Entity Type Pairs (Train / Valid / Test)
FB15k-ET | 14,951 | 1,345 | 3,851 | 483,142 / 50,000 / 59,071 | 136,618 / 15,749 / 15,780
YAGO43k-ET | 42,335 | 37 | 45,182 | 331,687 / 29,599 / 29,593 | 375,853 / 42,739 / 42,750
Table 3.10: Optimal Configurations for Entity Typing. B denotes the batch size and N denotes the negative sample size; the last two columns list the sampling temperature α and the margin γ.

Dataset | CompoundE Variant | #Dim | lr | B | N | α | γ
FB15k-ET | ∥R·T·S·h − R̂·T̂·Ŝ·t∥ | 1500 | 0.00005 | 2048 | 512 | 1 | 10
YAGO43k-ET | ∥h − T̂·Ŝ·R̂·t∥ | 1000 | 0.00005 | 1024 | 256 | 1 | 6
3.3.4 Ablation Studies on CompoundE Variants.
We conduct ablation studies on three variants of CompoundE, namely CompoundE-Full, CompoundE-Head, and CompoundE-Tail, as described in Section 3.2.1. The goal is to determine the variant that performs best on the FB15k-237 and ogbl-wikikg2 datasets. Moreover, we test different ways of composing CompoundE by shuffling the order of the translation, scaling, and rotation operations. Since geometric transformations are not commutative, different orders of cascading yield different models. Hence, we expect the results of different CompoundE variants to differ. We present results of distinct CompoundE variants on ogbl-wikikg2 and FB15k-237 in Table 3.11. The main results are summarized here. For ogbl-wikikg2, the best-performing scoring function is ∥h − Ŝ·T̂·R̂·t∥, while the best-performing scoring function for FB15k-237 is ∥S·R·T·h − Ŝ·R̂·T̂·t∥. For simplicity, we set the hyperparameters to be the same across experiments in all variants.
3.3.5 Performance on Complex Relation Types.
To gain insights into the superior performance of CompoundE, we examine its performance on each type of relation. KG relations can be categorized into 4 types: 1) 1-to-1, 2) 1-to-N, 3) N-to-1, and 4) N-to-N. We can classify relations by counting the co-occurrences of their respective head and tail entities.
Table 3.11: Filtered ranking of link prediction for ogbl-wikikg2 and FB15k-237.

Model | ogbl-wikikg2: MRR | Hit@1 | Hit@3 | Hit@10 | FB15k-237: MRR | Hit@1 | Hit@3 | Hit@10
∥T·R·S·h − t∥ | 0.6001 | 0.5466 | 0.6187 | 0.7043 | 0.3373 | 0.2455 | 0.3720 | 0.5217
∥T·S·R·h − t∥ | 0.5972 | 0.5431 | 0.6157 | 0.7002 | 0.3359 | 0.2467 | 0.3685 | 0.5171
∥S·T·R·h − t∥ | 0.6019 | 0.5459 | 0.6211 | 0.7091 | 0.3354 | 0.2461 | 0.3680 | 0.5169
∥R·T·S·h − t∥ | 0.5838 | 0.5288 | 0.6016 | 0.6880 | 0.3356 | 0.2456 | 0.3687 | 0.5167
∥S·R·T·h − t∥ | 0.6006 | 0.5460 | 0.6185 | 0.7043 | 0.3342 | 0.2449 | 0.3662 | 0.5141
∥R·S·T·h − t∥ | 0.5834 | 0.5239 | 0.6039 | 0.6979 | 0.3355 | 0.2460 | 0.3698 | 0.5165
∥h − T̂·R̂·Ŝ·t∥ | 0.6440 | 0.5777 | 0.6688 | 0.7787 | 0.3326 | 0.2393 | 0.3681 | 0.5183
∥h − T̂·Ŝ·R̂·t∥ | 0.6497 | 0.5827 | 0.6758 | 0.7858 | 0.3302 | 0.2384 | 0.3658 | 0.5123
∥h − Ŝ·T̂·R̂·t∥ | 0.6515 | 0.5844 | 0.6781 | 0.7873 | 0.3313 | 0.2397 | 0.3680 | 0.5113
∥h − R̂·T̂·Ŝ·t∥ | 0.6434 | 0.5779 | 0.6678 | 0.7763 | 0.3312 | 0.2394 | 0.3674 | 0.5136
∥h − Ŝ·R̂·T̂·t∥ | 0.6474 | 0.5802 | 0.6739 | 0.7842 | 0.3290 | 0.2384 | 0.3640 | 0.5090
∥h − R̂·Ŝ·T̂·t∥ | 0.6442 | 0.5777 | 0.6699 | 0.7788 | 0.3298 | 0.2383 | 0.3650 | 0.5114
∥T·R·S·h − T̂·R̂·Ŝ·t∥ | 0.5479 | 0.4918 | 0.5661 | 0.6549 | 0.3426 | 0.2535 | 0.3770 | 0.5201
∥T·S·R·h − T̂·Ŝ·R̂·t∥ | 0.5776 | 0.5210 | 0.5975 | 0.6852 | 0.3613 | 0.2702 | 0.3948 | 0.5462
∥S·T·R·h − Ŝ·T̂·R̂·t∥ | 0.5782 | 0.5249 | 0.5962 | 0.6783 | 0.3597 | 0.2687 | 0.3937 | 0.5450
∥R·T·S·h − R̂·T̂·Ŝ·t∥ | 0.5611 | 0.5053 | 0.5802 | 0.6660 | 0.3402 | 0.2506 | 0.3721 | 0.5213
∥S·R·T·h − Ŝ·R̂·T̂·t∥ | 0.5736 | 0.5175 | 0.5918 | 0.6805 | 0.3634 | 0.2718 | 0.3984 | 0.5500
∥R·S·T·h − R̂·Ŝ·T̂·t∥ | 0.5586 | 0.5057 | 0.5743 | 0.6592 | 0.3493 | 0.2593 | 0.3836 | 0.5301
A relation is classified as 1-to-1 if each head entity co-occurs with at most one tail entity, 1-to-N if each head entity can co-occur with multiple tail entities, N-to-1 if multiple head entities can co-occur with the same tail entity, and N-to-N if multiple head entities can co-occur with multiple tail entities. We make the decision based on the following rule. For each relation r, we compute the average number of subject (head) entities per object (tail) entity as hpt_r, and the average number of object entities per subject as tph_r. Then, with a specific threshold η,

$$\begin{aligned} hpt_r < \eta \text{ and } tph_r < \eta &\;\Longrightarrow\; r \text{ is 1-to-1} \\ hpt_r < \eta \text{ and } tph_r \geq \eta &\;\Longrightarrow\; r \text{ is 1-to-N} \\ hpt_r \geq \eta \text{ and } tph_r < \eta &\;\Longrightarrow\; r \text{ is N-to-1} \\ hpt_r \geq \eta \text{ and } tph_r \geq \eta &\;\Longrightarrow\; r \text{ is N-to-N}. \end{aligned} \qquad (3.35)$$

We set η = 1.5 as a logical threshold by following the convention (a small sketch of this classification procedure is given below). Table 3.12 compares the MRR scores of CompoundE with benchmarking models on 1-to-1, 1-to-N, N-to-1, and N-to-N relations in head and tail entity prediction for the FB15k-237 dataset. We see that CompoundE consistently outperforms benchmarking models in all relation categories. We show the performance of CompoundE on each relation type for the WN18RR dataset in Table 3.13. Generally, CompoundE has a significant advantage over benchmarking models for certain 1-N relations (e.g., "member_of_domain_usage" and "member_of_domain_region") and for some N-1 relations (e.g., "synset_domain_topic_of"). CompoundE is more effective than traditional KGE models in modeling complex relations.
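The rule of Eq. (3.35) is straightforward to implement. A minimal sketch, assuming the triples are available as (h, r, t) tuples, follows.

```python
from collections import defaultdict

def classify_relations(triples, eta=1.5):
    """Classify each relation into 1-to-1/1-to-N/N-to-1/N-to-N, Eq. (3.35)."""
    heads, tails, pairs = defaultdict(set), defaultdict(set), defaultdict(set)
    for h, r, t in triples:
        heads[r].add(h)
        tails[r].add(t)
        pairs[r].add((h, t))
    categories = {}
    for r in pairs:
        hpt = len(pairs[r]) / len(tails[r])   # avg head entities per tail
        tph = len(pairs[r]) / len(heads[r])   # avg tail entities per head
        if hpt < eta and tph < eta:
            categories[r] = "1-to-1"
        elif hpt < eta:
            categories[r] = "1-to-N"          # tph >= eta
        elif tph < eta:
            categories[r] = "N-to-1"
        else:
            categories[r] = "N-to-N"
    return categories
```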
Figure 3.4: t-SNE visualization of entity embeddings in the 2D space for some major entity types in FB15K-237.

Figure 3.5: MRR scores on the ogbl-wikikg2 dataset.
Table 3.12: Filtered MRR on four relation types of FB15k-237.
Task Predicting Head Predicting Tail
Rel. Category 1-to-1 1-to-N N-to-1 N-to-N 1-to-1 1-to-N N-to-1 N-to-N
TransE 0.492 0.454 0.081 0.252 0.485 0.072 0.740 0.367
RotatE 0.493 0.471 0.088 0.259 0.491 0.072 0.748 0.370
PairRE 0.496 0.476 0.117 0.274 0.492 0.073 0.763 0.387
CompoundE 0.501 0.488 0.123 0.279 0.497 0.074 0.783 0.394
Table 3.13: Filtered MRR on each relation type of WN18RR.
Relation Category TransE RotatE CompoundE
similar_to 1-to-1 0.294 1.000 1.000
verb_group 1-to-1 0.363 0.961 0.974
member_meronym 1-to-N 0.179 0.259 0.230
has_part 1-to-N 0.117 0.200 0.190
member_of_domain_usage 1-to-N 0.113 0.297 0.332
member_of_domain_region 1-to-N 0.114 0.217 0.280
hypernym N-to-1 0.059 0.156 0.155
instance_hypernym N-to-1 0.289 0.322 0.337
synset_domain_topic_of N-to-1 0.149 0.339 0.367
also_see N-to-N 0.227 0.625 0.629
derivationally_related_form N-to-N 0.440 0.957 0.956
3.3.6 Complex Relation Modeling and Histograms of Embedding Values
The filtered MRR scores on each relation type of WN18RR are given in Table 3.13. We see that CompoundE has a significant advantage over benchmarking models for certain 1-N relations such as "member_of_domain_usage" (+11.8%) and "member_of_domain_region" (+29.0%), and for some N-1 relations such as "synset_domain_topic_of" (+8.3%).
Besides the histograms shown in the main paper, we add more plots to visualize CompoundE relation embedding values. In Fig. 3.7, we show the embedding values for the "Friends" relation in FB15k-237. We use the CompoundE-Full variant (∥S_r·R_r·T_r·h − Ŝ_r·R̂_r·T̂_r·t∥) to generate the embedding. We plot the translation and scaling components for both the head and the tail. We only show a single plot for the rotation component since the rotation parameter is shared between the head and the tail. Different from
Table 3.14: Complexity comparison of KGE models.

Model | Ent. emb. | Rel. emb. | Scoring Function | Space | #Params
TransE | h, t ∈ R^d | r ∈ R^d | −∥h + r − t∥_{1/2} | O((n+m)d) | 1251M
ComplEx | h, t ∈ C^d | r ∈ C^d | Re(Σ_{i=1}^{d} r_i h_i t̄_i) | O((n+m)d) | 1251M
RotatE | h, t ∈ C^d | r ∈ C^d | −∥h ◦ r − t∥ | O((n+m)d) | 1250M
PairRE | h, t ∈ R^d | r^H, r^T ∈ R^d | −∥h ⊙ r^H − t ⊙ r^T∥ | O((n+m)d) | 500M
CompoundE-Head | h, t ∈ R^d | T[:, d−1], diag(S) ∈ R^d, θ ∈ R^{d/2} | −∥T·R(θ)·S·h − t∥ | O((n+m)d) | 250.1M
CompoundE-Tail | h, t ∈ R^d | T̂[:, d−1], diag(Ŝ) ∈ R^d, θ ∈ R^{d/2} | −∥h − T̂·R̂(θ)·Ŝ·t∥ | O((n+m)d) | 250.1M
CompoundE-Full | h, t ∈ R^d | T[:, d−1], diag(S), T̂[:, d−1], diag(Ŝ) ∈ R^d, θ ∈ R^{d/2} | −∥T·R(θ)·S·h − T̂·R̂(θ)·Ŝ·t∥ | O((n+m)d) | 250.3M
Figure 3.6: Distribution of relation embedding values for the "friends" relation in FB15k-237, obtained using ∥S_r·R_r·T_r·h − t∥: (a) T, (b) R(θ), and (c) S.
CompoundE-Head (∥S_r·R_r·T_r·h − t∥), we see two modes (instead of only one mode) in CompoundE-Full's plots. One conjecture for this difference is that, since CompoundE-Full has a pair of operations on both the head and the tail, the distribution of values needs to have two modes to maintain the symmetry. Similar to CompoundE-Head, the scaling parameters of CompoundE-Full have a large number of zeros to maintain the singularity of the compounding operators and to help learn the N-to-N complex relations.

Fig. 3.8 and Fig. 3.9 display the histograms of relation embeddings for the "instance_hypernym" relation and the "similar_to" relation in WN18RR, respectively. The real (in blue) and the imaginary (in orange) parts are overlaid in each plot. Notice that "instance_hypernym" is an antisymmetric relation while "similar_to"
64
Figure 3.7: FB15k-237 "Friends" relation embedding obtained using ∥S·R·T·h − Ŝ·R̂·T̂·t∥: (a) distribution of head translation values, (b) distribution of tail translation values, (c) distribution of head scaling values, (d) distribution of tail scaling values, and (e) distribution of rotation angle values.
Figure 3.8: WN18RR "instance_hypernym" relation: (a) distribution of head translation values, (b) distribution of tail translation values, (c) distribution of head scaling values, and (d) distribution of tail scaling values.
This relation pattern is reflected in the embedding histograms: the translation and the scaling histograms for the head and the tail are different for "instance_hypernym". In contrast, the translation and scaling histograms for the head and the tail are almost identical for "similar_to".
Complexity Analysis. We compare the computational complexity of CompoundE and several popular KGE models in Table 3.14. The last column gives the estimated number of free parameters used by different models to achieve the best performance on the ogbl-wikikg2 dataset. CompoundE cuts the number of parameters at least by half while achieving much better performance.
Figure 3.9: WN18RR "similar_to" relation: (a) distribution of head translation values, (b) distribution of tail translation values, (c) distribution of head scaling values, and (d) distribution of tail scaling values.
In Table 3.14, n, m, and d denote the entity number, the relation number, and the embedding dimension, respectively. Since n ≫ m in most datasets, we can afford to increase the complexity of the relation embedding for a better link prediction result without significantly increasing the overall model complexity.
Entity Semantics and Relation Component Values. We provide a 2D t-SNE visualization of the entity embeddings generated by CompoundE for FB15k-237 in Fig. 3.4. Each entity is colored with its respective entity type. As shown in the figure, some entity type classes are well separated while others are not. This scatter plot shows that the entity representations extracted by CompoundE capture the semantics of the entity. Thus, their embeddings can be used in various downstream tasks such as KG entity typing and similarity-based recommendations. In Fig. 3.6, we visualize the relation embedding for the "friend" relation in FB15k-237 by plotting the histograms of translation, scaling, and rotation parameter values. Since "friend" is a symmetric relation, we expect the translation values to be close to zero, which is consistent with Fig. 3.6 (a). Also, since "friend" is an N-to-N relation, we expect the compound operation to be singular; indeed, most of the scaling values are zero, as shown in Fig. 3.6 (c). These observations support our theoretical analysis of CompoundE's properties.
3.3.7 Performance Comparison for Different Variations of CompoundE
We investigate the performance differences among CompoundE variants. Specifically, the different forms of CompoundE show visible differences across relation types. We conduct experiments on the YAGO3-10 dataset and compare the performance of CompoundE-left, CompoundE-right, and CompoundE-Complete for 1-to-1, 1-to-N, and N-to-1 relations. In particular, when evaluating the 1-to-N relations, we focus on predicting (?, r, t), while for N-to-1 relations we focus on predicting (h, r, ?) to correctly reflect the performance on the respective relation types. The performance comparison is shown in Fig. 3.10. We observe that CompoundE-Complete has an advantage over the other forms for 1-to-1 relations. CompoundE-left and CompoundE-right are the better performing forms for 1-to-N and N-to-1 relations, respectively. This observation is consistent with the discussion of the modeling capability of CompoundE. It remains an open question how different orders of operator composition affect the performance of CompoundE; we will address that in future work.
3.3.8 Implementation Details
All experiments were conducted on an NVIDIA V100 GPU with 32GB of memory. GPUs with larger memory, such as the NVIDIA A100 (40GB) and NVIDIA A40 (48GB), are only needed for hyper-parameter sweeps when the dimension, the negative sample size, and the batch size are high. We should point out that such settings are not essential for CompoundE to obtain good results; they were only used to search for the optimal configurations.
Figure 3.10: Comparing the performance of different CompoundE forms: (a) 1-to-1, (b) 1-to-N, and (c) N-to-1 relations.
We considered the following sets of values as our hyper-parameter search space to obtain the best performance for each dataset and task.
Table 3.15: Link prediction search space of five hyper-parameters.

Dataset | ogbl-wikikg2 | FB15k-237 | WN18RR
Dim | {50, 100, 150, 200, 250, 300, 400} | {100, 200, 300, 400} | {100, 200, 300, 400}
lr | {0.0005, 0.001, 0.005, 0.01} | {0.1, 0.5, 1, 5}×10^−4 | {0.1, 0.5, 1, 5}×10^−4
B | {256, 512, 1024, 2048} | {256, 512, 1024, 2048} | {256, 512, 1024, 2048}
N | {256, 512, 1024, 2048} | {256, 512, 1024, 2048} | {256, 512, 1024, 2048}
γ | {5, 6, 7, 8, 9} | {4, 5, 6, 7, 8, 9} | {5, 6, 7, 8, 9}
Table 3.16: Path query answering search space of five hyper-parameters.

Dataset | Freebase | WordNet
Dim | {500, 1000, 1500, 2000} | {500, 1000, 1500, 2000}
lr | {1, 2, 5, 10}×10^−5 | {1, 2, 5, 10}×10^−5
B | {512, 1024} | {512, 1024}
N | {256, 512} | {256, 512}
γ | {6, 9, 12, 15} | {6, 9, 12, 15}
Table 3.17: Entity typing search space of five hyper-parameters.

Dataset | FB15k-ET | YAGO43k-ET
Dim | {500, 1000, 1500} | {500, 1000, 1500}
lr | {0.1, 0.5, 1, 5}×10^−4 | {0.1, 0.5, 1, 5}×10^−4
B | {1024, 2048, 4096, 8192} | {1024, 2048}
N | {256, 512, 1024, 2048} | {256, 512}
γ | {8, 9, 10, 11} | {19, 20, 21, 22}
3.3.9 Comparing CompoundE and STaR
The main difference between CompoundE and STaR is that STaR embedding uses a bilinear product and adopts a semantic matching approach, while CompoundE's scoring function is a distance-based metric. Because of this, the optimization strategy for CompoundE is the self-adversarial negative sampling loss, whereas STaR uses the regularized cross-entropy loss. More importantly, CompoundE embedding has clear and intuitive geometric interpretations, whereas the design of STaR is less intuitive since it is unclear what the composition of operators means in the context of a bilinear product. We also shed light on the superior capability of CompoundE to model relation compositions and entity semantics through the PQA and entity typing experiments. Lastly, we can incorporate the reflection and shear operators below, which also belong to the affine operator family. The reflection matrix can be defined as

F = \begin{bmatrix} \cos(\theta) & \sin(\theta) & 0 \\ \sin(\theta) & -\cos(\theta) & 0 \\ 0 & 0 & 1 \end{bmatrix},   (3.36)
Table 3.18: Preliminary comparison after adding reflection and shear operators on ogbl-wikikg2.

Scoring Function | Dim | Valid MRR | Test MRR | Test Hit@1 | Test Hit@3 | Test Hit@10
∥h − Ŝ·T̂·R̂·t∥ | 100 | 0.6704 | 0.6515 | 0.5843 | 0.6781 | 0.7872
∥h − Ŝ·T̂·F̂·R̂·t∥ | 100 | 0.6694 | 0.6509 | 0.5844 | 0.6760 | 0.7865
∥h − Ŝ·Ĥ_x·Ĥ_y·T̂·R̂·t∥ | 100 | 0.6701 | 0.6539 | 0.5865 | 0.6805 | 0.7906
And the shear matrices in two different directions can be defined as

H_x = \begin{bmatrix} 1 & \tan(\psi_x) & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},   (3.37)

H_y = \begin{bmatrix} 1 & 0 & 0 \\ \tan(\psi_y) & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.   (3.38)

We have conducted preliminary experiments on ogbl-wikikg2 with the reflection and shear operators; the results are shown in Table 3.18. We will further improve these results in the future.
3.4 Conclusion and Future Work
A new KGE model called CompoundE was proposed in this work. We showed that quite a few distance-based KGE models are special cases of CompoundE. Extensive experiments were conducted on three datasets to demonstrate the effectiveness of CompoundE. CompoundE achieves state-of-the-art link prediction performance with a memory-saving solution for large KGs. Visualization of entity semantics and relation embedding values was given to shed light on the superior performance of CompoundE.

We are interested in exploring two topics as future extensions. First, we may consider more complex operations in CompoundE. For example, there is a recent trend to extend 2D rotations to 3D rotations for rotation-based embeddings, such as RotatE3D [45] and SU2E [180]. It is worthwhile to explore CompoundE3D. Second, CompoundE is expected to be useful in many downstream tasks. This conjecture has to be verified. If this is the case, CompoundE can offer a low-memory solution to these tasks in realistic settings.
Chapter 4
Knowledge Graph Embedding with 3D Compound Geometric Transformations
4.1 Introduction
Knowledge graphs (KGs) find rich applications in knowledge management and discovery [186, 118, 189], recommendation systems [147, 207], fraud detection [191, 211], chatbots [59, 2], etc. KGs are directed relational graphs. They are formed by a collection of triples in the form of (h, r, t), where h, r, and t denote head, relation, and tail, respectively. Heads and tails are called entities and represented by nodes, while relations are links in KGs. KGs are often incomplete. One critical task in knowledge graph (KG) management is "missing link prediction". Knowledge graph embedding (KGE) methods have received a lot of attention in recent years due to their effectiveness in missing link prediction. Many KGs such as DBpedia [3], YAGO [129], Freebase [8], NELL [15], Wikidata [144], and ConceptNet [127] have been created and made publicly available for KGE model development and evaluation.

One family of KGE models builds a high-dimensional embedding space, where each entity is a vector. The relation is modeled by a certain geometric manipulation such as translation and rotation. To evaluate the likelihood of a candidate triple, the geometric manipulation associated with the relation is applied to the head entity, and then the distance between the manipulated head and the tail is measured. The shorter the
distance, the higher the likelihood of the triple. To this end, these KGE models are called distance-based KGEs. Examples of distance-based KGEs include TransE [9], RotatE [133], and PairRE [18]. Each of them uses a single geometric transformation to represent relations between entities. Specifically, translation, rotation, and scaling operations are adopted by TransE, RotatE, and PairRE, respectively.

The above-mentioned KGE models achieve reasonably good performance in link prediction with only a single geometric transformation. The cascade of multiple 2D geometric transformations offers a powerful tool in image manipulation [111]. This idea was exploited to develop a new KGE model, called CompoundE, in [50]. TransE, RotatE, and PairRE are all degenerate cases of CompoundE. Thus, CompoundE outperforms them in link prediction performance. CompoundE unifies translation, rotation, and scaling operations under one common framework. It has several mathematically provable properties that facilitate the modeling of different complex relation types in KGE. The effectiveness of these composite operators has been successfully demonstrated through extensive experiments and applications in downstream tasks such as entity typing and multihop query answering in [50]. Furthermore, borrowing the concept of rotation in the 3D space, Rotate3D [45, 180, 94] achieves a more effective parameterization and endows a model with greater modeling power than RotatE, which is based on 2D rotation. That is, Rotate3D can model non-commutative relations better than RotatE.

Inspired by the success of CompoundE and Rotate3D, we wonder whether it would be beneficial to look for compound geometric transformations in the 3D space in the KGE model design.
4.2 Proposed Method
4.2.1 CompoundE3D
In this work, we use 3D affine transformations, including translation, scaling, rotation, reflection, and shear, as illustrated in Fig. 4.1, to model different relations in KGs. This large set of transformation operators offers
Figure 4.1: Composing different geometric operations in the 3D subspace: (a) translation, (b) scaling, (c) rotation, (d) reflection, (e) shear, and (f) compound.
immense flexibility in the KGE design against different characteristics of KG datasets. Below, we formally define each of the 3D affine operators in homogeneous coordinates.

4.2.1.1 Translation
Component T ∈ SE(3), illustrated by Fig. 4.1a, is defined as

T = \begin{bmatrix} 1 & 0 & 0 & v_x \\ 0 & 1 & 0 & v_y \\ 0 & 0 & 1 & v_z \\ 0 & 0 & 0 & 1 \end{bmatrix},   (4.1)
4.2.1.2 Scaling
Component S ∈ Aff(3), illustrated by Fig. 4.1b, is defined as

S = \begin{bmatrix} s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & s_z & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},   (4.2)
4.2.1.3 Rotation
Component R ∈ SO(3), illustrated by Fig. 4.1c, is defined as

R = R_z(\alpha) R_y(\beta) R_x(\gamma) = \begin{bmatrix} a & b & c & 0 \\ d & e & f & 0 \\ g & h & i & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},   (4.3)

where

a = \cos(\alpha)\cos(\beta),
b = \cos(\alpha)\sin(\beta)\sin(\gamma) − \sin(\alpha)\cos(\gamma),
c = \cos(\alpha)\sin(\beta)\cos(\gamma) + \sin(\alpha)\sin(\gamma),
d = \sin(\alpha)\cos(\beta),
e = \sin(\alpha)\sin(\beta)\sin(\gamma) + \cos(\alpha)\cos(\gamma),
f = \sin(\alpha)\sin(\beta)\cos(\gamma) − \cos(\alpha)\sin(\gamma),
g = −\sin(\beta),
h = \cos(\beta)\sin(\gamma),
i = \cos(\beta)\cos(\gamma).   (4.4)

This general 3D rotation operator is the result of compounding yaw, pitch, and roll rotations. They are, respectively, defined as

• Yaw rotation component:

R_z(\alpha) = \begin{bmatrix} \cos(\alpha) & −\sin(\alpha) & 0 & 0 \\ \sin(\alpha) & \cos(\alpha) & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},   (4.5)

• Pitch rotation component:

R_y(\beta) = \begin{bmatrix} \cos(\beta) & 0 & \sin(\beta) & 0 \\ 0 & 1 & 0 & 0 \\ −\sin(\beta) & 0 & \cos(\beta) & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},   (4.6)

• Roll rotation component:

R_x(\gamma) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos(\gamma) & −\sin(\gamma) & 0 \\ 0 & \sin(\gamma) & \cos(\gamma) & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.   (4.7)
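As a quick numerical sanity check (illustrative NumPy code of our own, not from the thesis implementation), composing the 3×3 Cartesian forms of the yaw, pitch, and roll rotations above reproduces the closed-form entries of Eq. (4.4):

```python
import numpy as np

def rot_z(a):  # yaw, Eq. (4.5)
    return np.array([[np.cos(a), -np.sin(a), 0],
                     [np.sin(a),  np.cos(a), 0],
                     [0, 0, 1]])

def rot_y(b):  # pitch, Eq. (4.6)
    return np.array([[ np.cos(b), 0, np.sin(b)],
                     [0, 1, 0],
                     [-np.sin(b), 0, np.cos(b)]])

def rot_x(g):  # roll, Eq. (4.7)
    return np.array([[1, 0, 0],
                     [0, np.cos(g), -np.sin(g)],
                     [0, np.sin(g),  np.cos(g)]])

a, b, g = 0.3, -0.7, 1.1  # arbitrary yaw, pitch, roll angles
R = rot_z(a) @ rot_y(b) @ rot_x(g)

# Closed-form entries a..i from Eq. (4.4).
R_closed = np.array([
    [np.cos(a)*np.cos(b),
     np.cos(a)*np.sin(b)*np.sin(g) - np.sin(a)*np.cos(g),
     np.cos(a)*np.sin(b)*np.cos(g) + np.sin(a)*np.sin(g)],
    [np.sin(a)*np.cos(b),
     np.sin(a)*np.sin(b)*np.sin(g) + np.cos(a)*np.cos(g),
     np.sin(a)*np.sin(b)*np.cos(g) - np.cos(a)*np.sin(g)],
    [-np.sin(b), np.cos(b)*np.sin(g), np.cos(b)*np.cos(g)],
])
assert np.allclose(R, R_closed)  # the composition matches Eq. (4.4)
```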
4.2.1.4 Reflection
Component F ∈ SO(3), illustrated by Fig. 4.1d, is defined as

F = \begin{bmatrix} 1 − 2n_x^2 & −2 n_x n_y & −2 n_x n_z & 0 \\ −2 n_x n_y & 1 − 2n_y^2 & −2 n_y n_z & 0 \\ −2 n_x n_z & −2 n_y n_z & 1 − 2n_z^2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.   (4.8)

The above expression is derived from the Householder reflection, F = I − 2nn^T. In the 3D space, n is a 3D unit vector that is perpendicular to the reflecting hyper-plane, n = [n_x, n_y, n_z]^T.
4.2.1.5 Shear
Component H ∈ Aff(3), illustrated by Fig. 4.1e, is defined as

H = H_x H_y H_z = \begin{bmatrix} 1 & Sh_x^{(y)} & Sh_x^{(z)} & 0 \\ Sh_y^{(x)} & 1 & Sh_y^{(z)} & 0 \\ Sh_z^{(x)} & Sh_z^{(y)} & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.   (4.9)

The shear operator is the result of compounding three operators, H_x, H_y, and H_z. They are mathematically defined as

H_x = \begin{bmatrix} 1 & 0 & 0 & 0 \\ Sh_y^{(x)} & 1 & 0 & 0 \\ Sh_z^{(x)} & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},   (4.10)

H_y = \begin{bmatrix} 1 & Sh_x^{(y)} & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & Sh_z^{(y)} & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},   (4.11)

H_z = \begin{bmatrix} 1 & 0 & Sh_x^{(z)} & 0 \\ 0 & 1 & Sh_y^{(z)} & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.   (4.12)

Matrix H_x has a physical meaning: it is the shear transformation that shifts the y- and z-components by a factor of the x component. Similar physical interpretations apply to H_y and H_z.
The above transformations can be cascaded to yield a compound operator, e.g.,

O = T · S · R · F · H.   (4.13)

In the actual implementation, we use the operator's representation in regular Cartesian coordinates instead of homogeneous coordinates. Furthermore, a high-dimensional relation operator can be represented as a block-diagonal matrix in the form of

M_r = diag(O_{r,1}, O_{r,2}, ..., O_{r,n}),   (4.14)

where O_{r,i} is the compound operator acting on the i-th 3D subspace. We can multiply M_r · v in the following manner:

\begin{bmatrix} O_{r,1} & 0 & \cdots & 0 \\ 0 & O_{r,2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & O_{r,n} \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \\ z_1 \\ \vdots \\ x_n \\ y_n \\ z_n \end{bmatrix},   (4.15)

where v = [x_1, y_1, z_1, x_2, y_2, z_2, ..., x_n, y_n, z_n]^T is the 3n-dimensional entity vector split into multiple 3D subspaces.
We can define the following three scoring functions for CompoundE3D:

• CompoundE3D-Head:
f_r^{(h)}(h, t) = ∥M_r · h − t∥,   (4.16)

• CompoundE3D-Tail:
f_r^{(t)}(h, t) = ∥h − M̂_r · t∥,   (4.17)

• CompoundE3D-Complete:
f_r^{(h,t)}(h, t) = ∥M_r · h − M̂_r · t∥,   (4.18)

where h and t denote the head and tail entity embeddings, and M_r and M̂_r denote the relation-specific operators that act on the head and tail entities, respectively.
Generally speaking, we have five different affine operations available, i.e., translation, scaling, rotation, reflection, and shear. Each operator can be applied to 1) the head entity, 2) the tail entity, or 3) both the head and the tail. Hence, we have in total 15 different ways of applying operators at each stage. All these possible choices are called CompoundE3D variants. For a given KG dataset, there is a huge search space for finding the optimal CompoundE3D variant. It is essential to develop a simple yet effective mechanism to find a variant that gives the best performance under a certain complexity constraint.
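To make the head scoring function of Eq. (4.16) concrete, the following PyTorch sketch applies a translation-scaling-rotation compound independently in each 3D subspace. The tensor layout, parameter names, and the choice of the L1 distance are our own assumptions rather than the authors' released implementation:

```python
import torch

def rotation_matrices(angles):
    """Batched 3D rotation R_z(a) R_y(b) R_x(g) from Eqs. (4.3)-(4.4).
    angles: (..., 3) tensor of (yaw, pitch, roll); returns (..., 3, 3)."""
    a, b, g = angles.unbind(-1)
    ca, sa, cb, sb = a.cos(), a.sin(), b.cos(), b.sin()
    cg, sg = g.cos(), g.sin()
    row0 = torch.stack([ca * cb, ca * sb * sg - sa * cg, ca * sb * cg + sa * sg], -1)
    row1 = torch.stack([sa * cb, sa * sb * sg + ca * cg, sa * sb * cg - ca * sg], -1)
    row2 = torch.stack([-sb, cb * sg, cb * cg], -1)
    return torch.stack([row0, row1, row2], -2)

def compound3d_head_score(h, t, trans, scale, angles):
    """Distance ||T.S.R.h - t|| of Eq. (4.16); lower = more plausible.
    h, t:   (batch, n, 3) entity embeddings split into n 3D subspaces
    trans:  (batch, n, 3) per-relation translation components
    scale:  (batch, n, 3) per-relation diagonal scaling components
    angles: (batch, n, 3) per-relation yaw/pitch/roll angles
    """
    R = rotation_matrices(angles)                 # (batch, n, 3, 3)
    h_rot = torch.einsum('bnij,bnj->bni', R, h)   # rotate first,
    h_out = scale * h_rot + trans                 # then scale, then translate
    return (h_out - t).abs().sum((-1, -2))        # L1 distance over all dims

# Toy usage: a batch of 2 triples, entity dimension 12 (n = 4 subspaces).
b, n = 2, 4
args = [torch.randn(b, n, 3) for _ in range(5)]
print(compound3d_head_score(*args).shape)  # torch.Size([2])
```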
4.2.2 Beam Search for the Best CompoundE3D Variant
In this subsection, we present a beam search algorithm to find the optimal CompoundE3D variant. For the i-th stage, the set of all operator pairs that can be applied at a certain step is

P ∈ {(T, I), (S, I), (R, I), (F, I), (H, I),
     (I, T̂), (I, Ŝ), (I, R̂), (I, F̂), (I, Ĥ),
     (T, T̂), (S, Ŝ), (R, R̂), (F, F̂), (H, Ĥ)},   (4.19)
where I is the identity operator. First, we apply all operator pairs in P and calculate scoring functions for all intermediate variants. Each variant is optimized for m iterations using the training set, and its performance is evaluated on the validation dataset. Then, we choose the top-k best-performing variants as starting points for further exploration in the next step. The same process is repeated until one of the terminating conditions is triggered. Afterward, we proceed to the (i+1)-th stage. The whole search is completed after the final stage is reached. The total number of stages is a user-selected hyper-parameter.
The beam search process of building more complex KGE models from simpler ones is described in Algorithm 1. Additional comments are given below.

• We initialize the algorithm by setting up a loop that iterates over the set P of all possible operator combinations, trains and evaluates them, and finds the top-k variants as starting points.
• In the next loop, we have two stopping criteria to terminate the beam search: 1) #operators > N_max, meaning that we stop the search when the number of operators exceeds the upper bound N_max; 2) ΔMRR/ΔParam < λ, meaning that the ratio of the increase in MRR to the increase in free parameters falls below the threshold λ, so it is no longer worthwhile to increase the model complexity for the marginal gain in model performance.
• P × W denotes the Cartesian product between the operator pair set P and the top-k variant set W from the last step, while applying the operator pair (M_i, M̂_i) to the previous optimal scoring function ḟ_{i−1}(h, t) yields the candidate f̃_i(h, t).
• For example, if ḟ_{i−1}(h, t) = ∥R·h − t∥ and (M_i, M̂_i) = (S, Ŝ), then f̃_i(h, t) = ∥S·R·h − Ŝ·t∥.
• After the loop terminates because a terminating condition is triggered, we select the top-1 performing variant from the explored variant set W as the best choice.
Algorithm 1: Beam Search for the Best CompoundE3D Variant
  initialize i ← 1, U ← {}
  for (M_i, M̂_i) ∈ P do
      f̃(h, t) ← ∥M_i·h − M̂_i·t∥
      train f̃(h, t) for m iterations
      MRR ← evaluate f̃(h, t) with the valid set
      U.insert({MRR, f̃(h, t)})
  end for
  W ← top-k variants from U
  i ← i + 1
  ΔMRR ← ∞, ΔParam ← 1
  while #operators < N_max and max ΔMRR/ΔParam ≥ λ do
      initialize V ← {}
      for {(M_i, M̂_i), ḟ_{i−1}(h, t)} ∈ P × W do
          f̃_i(h, t) ← apply (M_i, M̂_i) to ḟ_{i−1}(h, t)
          train f̃_i(h, t) for m iterations
          evaluate f̃_i(h, t) with the valid set
          ΔMRR ← MRR of f̃_i(h, t) − MRR of ḟ_{i−1}(h, t)
          ΔParam ← Param of f̃_i(h, t) − Param of ḟ_{i−1}(h, t)
          V.insert((MRR, ΔMRR, ΔParam, f̃_i(h, t)))
      end for
      W ← top-k variants from V
  end while
  f*(h, t) ← best variant from W
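The control flow of Algorithm 1 can be summarized by the following Python skeleton. Here `train_and_eval` and `param_count` are hypothetical stand-ins for training a variant for m iterations and counting its free parameters; a variant is represented as a tuple of operator pairs applied stage by stage:

```python
def beam_search(operator_pairs, train_and_eval, param_count,
                beam_width=3, max_operators=3, min_gain_ratio=1e-8):
    """Skeleton of Algorithm 1 (hypothetical helper callables).

    operator_pairs : the set P of (head_op, tail_op) choices
    train_and_eval : trains a variant, returns its validation MRR
    param_count    : returns the number of free parameters of a variant
    """
    # First stage: train and rank every single-operator variant.
    scored = [((p,), train_and_eval((p,))) for p in operator_pairs]
    beam = sorted(scored, key=lambda x: x[1], reverse=True)[:beam_width]

    for _ in range(max_operators - 1):
        candidates = []
        for variant, prev_mrr in beam:
            for p in operator_pairs:          # Cartesian product P x W
                new_variant = variant + (p,)
                mrr = train_and_eval(new_variant)
                d_mrr = mrr - prev_mrr
                d_param = param_count(new_variant) - param_count(variant)
                candidates.append((new_variant, mrr, d_mrr / max(d_param, 1)))
        # Terminate when even the best gain-per-parameter is below threshold.
        if max(c[2] for c in candidates) < min_gain_ratio:
            break
        beam = [(v, mrr) for v, mrr, _ in
                sorted(candidates, key=lambda x: x[1], reverse=True)[:beam_width]]
    return max(beam, key=lambda x: x[1])[0]   # top-1 variant found
```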
4.2.3 Model Ensembles
4.2.3.1 Weighted-Distances-Sum (WDS) Strategy
We choose the top-k performing CompoundE3D variants and conduct a weighted average of their predicted scores. The following three weighting schemes are considered.

• Uniform Weights. This scheme takes an equal weight for all selected variants:

f̂(h, t) = (1/k) Σ_{i=1}^{k} f_i(h, t),   (4.20)

where f_i(h, t) is the scoring function of the i-th variant.
Figure 4.2: The ensemble of multiple CompoundE3D variants.
• Geometric Weights. This scheme sorts the variants by their MRR performance on the validation dataset in descending order and assigns weight μ^i, 0 < μ < 1, to the i-th variant. That is,

f̂(h, t) = (1/Σ_{i=1}^{k} μ^i) Σ_{i=1}^{k} μ^i f_i(h, t).   (4.21)

Since μ^i > μ^{i+1}, we assign a higher weight to a better performer in computing the aggregated distance.
• Learnable Weights. This scheme trains a set of learnable weights, w_i > 0, on the training dataset to minimize the following weighted score:

f̂(h, t) = (1/Σ_{i=1}^{k} w_i) Σ_{i=1}^{k} w_i f_i(h, t).   (4.22)

The learnable weights are implemented as parameters in the optimization process, under the same learning rate and optimizer used in finding the best variants.

For each relation, we compare the three weighting schemes and choose the one that offers the best performance.
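A minimal PyTorch sketch of the learnable-weights scheme in Eq. (4.22) follows; the softplus reparameterization that keeps the weights positive and the placeholder loss are our own choices, not details from the thesis:

```python
import torch
import torch.nn.functional as F

k, batch = 3, 128
# Stand-in for the distances f_i(h, t) produced by k frozen variants.
base_scores = torch.rand(batch, k)
raw_w = torch.zeros(k, requires_grad=True)       # one weight per variant

opt = torch.optim.Adam([raw_w], lr=1e-2)
for _ in range(100):
    w = F.softplus(raw_w)                            # enforce w_i > 0
    ensemble = (base_scores * w).sum(-1) / w.sum()   # weighted sum, Eq. (4.22)
    loss = ensemble.mean()   # placeholder; the real objective is the KGE loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```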
4.2.3.2 Rank Fusion Strategy
Link prediction is a list-ranking problem, so rank fusion can be exploited to boost performance. A few simple rank fusion methods can be applied to score-based KGE methods. For example, we can take the maximum, minimum, median, sum, and L2 distance of the candidates' ranks. They are denoted by CombMAX, CombMIN, CombMEDIAN, CombSUM, and Euclidean in Table 4.1, respectively. Three advanced rank fusion methods are also considered and included in the table: Borda Count [141], Reciprocal Rank Fusion (RRF) [28], and RBC (Rank Biased Centroid) [4]. Borda Count awards points to candidates based on their positions in an individual preference list, where the top candidate gets the most points and the last candidate gets the least points. RRF aggregates the reciprocal rank to discount the importance of lower-ranked candidates; the factor η in the table mitigates the impact of high rankings given by outliers. RBC discounts the weights of lower-ranked candidates using a geometric distribution. The mathematical formulas of all rank fusion functions are given in the second column of Table 4.1, where r_i is the rank from the i-th base model (or variant), 1 ≤ i ≤ q, c ∈ E represents an entity in the entity set E, and η and φ are hyper-parameters.
Table 4.1: A list of rank fusion functions under consideration.

Name | Function
CombMAX | max{r_1(c), ..., r_q(c)}
CombMIN | min{r_1(c), ..., r_q(c)}
CombMEDIAN | median{r_1(c), ..., r_q(c)}
CombSUM | Σ_{i=1}^{q} r_i(c)
Euclidean | √(r_1(c)^2 + ... + r_q(c)^2)
Borda Count [141] | Σ_{i=1}^{q} (|E| − r_i(c) + 1) / |E|
RRF [28] | Σ_{i=1}^{q} 1 / (η + r_i(c))
RBC [4] | Σ_{i=1}^{q} (1 − φ) φ^{r_i(c) − 1}
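As an illustration of the fused ranking, here is a small sketch of RRF from Table 4.1 (our own helper; η = 60 is a common default in the rank-fusion literature, not necessarily the value tuned in this work):

```python
def reciprocal_rank_fusion(rank_lists, eta=60):
    """Fuse candidate rankings from several base models with RRF:
    score(c) = sum_i 1 / (eta + r_i(c)); better (smaller) ranks
    contribute more, and eta damps the influence of any single list."""
    fused = {}
    for ranks in rank_lists:            # each is a dict: candidate -> rank
        for cand, r in ranks.items():
            fused[cand] = fused.get(cand, 0.0) + 1.0 / (eta + r)
    return sorted(fused, key=fused.get, reverse=True)

# Two base variants ranking three tail candidates (rank 1 = best).
m1 = {"Paris": 1, "Lyon": 2, "Nice": 3}
m2 = {"Paris": 1, "Lyon": 3, "Nice": 2}
print(reciprocal_rank_fusion([m1, m2]))  # ['Paris', 'Lyon', 'Nice']
```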
4.2.4 Optimization
Following RotatE's negative sampling loss and self-adversarial training strategy, we choose the following loss function for CompoundE3D:

L_KGE = −log σ(γ₁ − f_r(h, t)) − Σ_{i=1}^{n} p(h'_i, r, t'_i) log σ(f_r(h'_i, t'_i) − γ₁),   (4.23)

where σ is the sigmoid function, γ₁ is a preset margin hyper-parameter, (h'_i, r, t'_i) is the i-th negative triple, and p(h'_i, r, t'_i) is the probability of drawing negative triple (h'_i, r, t'_i). Given a positive triple (h_i, r, t_i), the negative sampling distribution is

p(h'_j, r, t'_j | {(h_i, r, t_i)}) = exp(α₁ f_r(h'_j, t'_j)) / Σ_i exp(α₁ f_r(h'_i, t'_i)),   (4.24)

where α₁ is the temperature in the softmax function.
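A PyTorch sketch of this loss under the distance-based convention of Eqs. (4.16)-(4.18) follows; the variable names and defaults are our own. Since lower distances indicate harder negatives, common implementations (e.g., RotatE's released code) compute the self-adversarial weights with a softmax over negated distances so that harder negatives count more, and we follow that convention here:

```python
import torch
import torch.nn.functional as F

def self_adversarial_loss(pos_score, neg_scores, margin=9.0, alpha=1.0):
    """pos_score:  (batch,) distances f_r(h, t) of the positive triples.
    neg_scores: (batch, n) distances of the n negative triples."""
    pos_term = F.logsigmoid(margin - pos_score)
    # Self-adversarial weights: harder negatives (smaller distance) get
    # larger weight; detached so they act as constants, and the constant
    # margin shift cancels inside the softmax.
    weights = torch.softmax(alpha * (margin - neg_scores), dim=-1).detach()
    neg_term = (weights * F.logsigmoid(neg_scores - margin)).sum(dim=-1)
    return -(pos_term + neg_term).mean()

loss = self_adversarial_loss(torch.rand(8), torch.rand(8, 16))
```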
4.3 Experiments
Table 4.2: Statistics of four link prediction datasets.
Dataset #Entities #Relations #Training #Validation #Test Ave. Degree
DB100K 99,604 470 597,572 50,000 50,000 12
ogbl-wikikg2 2,500,604 535 16,109,182 429,456 598,543 12.2
YAGO3-10 123,182 37 1,079,040 5,000 5,000 9.6
WN18RR 40,943 11 86,835 3,034 3,134 2.19
4.3.1 Experimental Setup
4.3.1.1 Datasets
We evaluate the link prediction performance of CompoundE3D and compare it with several benchmarking methods on the following four KG datasets.

• DB100K [34]. It is a subset of the DBpedia KG. The dataset contains information related to music content such as genre, band, and musical artists. It is a relatively dense KG since each entity appears in at least 20 different relations.
• YAGO3-10 [98]. It is a subset of YAGO3, which describes the citizenship, gender, and profession of people. YAGO3-10 contains entities associated with at least 10 different relations.
• WN18RR [9, 32]. It is a subset of the WordNet lexical database. The inverse relations are removed from WN18RR to avoid test leakage.
• ogbl-wikikg2 [62]. It is extracted from Wikipedia. It contains 2.5M entities and is the largest among the four selected datasets.

The statistics of the four KG datasets are given in Table 4.2.
Table 4.3: The search space of six hyper-parameters.

Dataset | DB100K | ogbl-wikikg2 | YAGO3-10 | WN18RR
Dim | {150, 300, 450, 600} | {90, 150, 180, 240, 300} | {450, 600, 750, 900} | {180, 240, 360, 480, 600}
lr | {2, 3, 4, 5, 6, 7, 8, 9}×10^−5 | {0.0005, 0.001, 0.005, 0.01} | {3, 4, 5, 6, 7}×10^−4 | {4, 5, 6, 7, 8}×10^−4
B | {256, 512, 1024, 2048} | {2048, 4096, 8192} | {512, 1024, 2048, 4096} | {512, 1024, 2048, 4096}
N | {256, 512, 1024, 2048} | {125, 250, 500} | {256, 512, 1024, 2048} | {256, 512, 1024, 2048}
γ | {4, 5, 6, 7, 8, 9, 10, 11, 12, 13} | {5, 6, 7, 8, 9} | {11, 12, 13, 13.1, 13.3, 13.5} | {5, 6, 7, 8, 9}
α | {0.5, 0.7, 0.9, 1.0, 1.2} | {0.5, 1.0} | {0.8, 0.9, 1.0, 1.1, 1.2} | {0.5, 0.7, 0.9, 1.0, 1.2}
Table 4.4: Optimal configurations for link prediction tasks, where B and N denote the batch size and the negative sample size, respectively.

Dataset | CompoundE3D Variant | #Dim | lr | B | N | γ | α
DB100K | ∥S·h − T̂·R̂·Ŝ·t∥ | 600 | 0.00005 | 1024 | 512 | 9 | 1
ogbl-wikikg2 | ∥T·h − Ĥ·t∥ | 300 | 0.001 | 8192 | 125 | 8 | 1
YAGO3-10 | ∥T·S·R·h − t∥ | 600 | 0.0005 | 1024 | 1024 | 13.3 | 1.1
WN18RR | ∥R·S·T·h − t∥ | 480 | 0.00005 | 512 | 256 | 6 | 1
4.3.1.2 Evaluation Protocol
The commonly used evaluation protocol for the link prediction task is explained below. For every triple (h, r, t) in the test set, we corrupt either the head entity h or the tail entity t to generate test examples (?, r, t) and (h, r, ?). Then, for every head candidate that forms a triple (ĥ, r, t) and every tail candidate that forms a triple (h, r, t̂), we compute the distance-based scoring functions f_r(ĥ, t) and f_r(h, t̂), respectively. A lower score value indicates that the generated triple is more likely to be true. Then, we sort the scores of all candidate triples in ascending order and locate the rank of the ground-truth triple. Furthermore, we evaluate the link prediction performance under the filtered rank setting [9], which gives salience to unseen triple predictions since embedding models tend to give observed triples better ranks. We adopt Hits@k and the mean reciprocal rank (MRR) as evaluation metrics to compare the quality of KGE models.
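The filtered evaluation can be sketched as follows (our own helper code; `known_idx` holds the other entities already observed as true answers for the same query, excluding the target itself):

```python
import torch

def filtered_rank(scores, true_idx, known_idx):
    """Filtered rank of the ground-truth entity for one query.

    scores:    (num_entities,) distance scores for all candidates
               (lower = more plausible)
    true_idx:  index of the ground-truth entity
    known_idx: indices of other observed true entities to filter out
    """
    scores = scores.clone()
    scores[known_idx] = float('inf')      # remove already-observed triples
    true_score = scores[true_idx]
    # rank = 1 + number of candidates that score strictly better
    return int((scores < true_score).sum()) + 1

def mrr_hits(ranks, k=10):
    ranks = torch.tensor(ranks, dtype=torch.float)
    return (1.0 / ranks).mean().item(), (ranks <= k).float().mean().item()

# Toy query over five entities; entity 2 is the answer, entity 0 is a
# known true triple and gets filtered out.
r = filtered_rank(torch.tensor([0.1, 0.5, 0.3, 0.2, 0.9]), 2, [0])
print(r)  # 2  (only entity 3 scores better after filtering)
```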
4.3.1.3 Hyper-parameter Search
We perform an extensive search on six hyper-parameters of CompoundE3D with respect to the different KG datasets. They are: 1) the dimension of the embedding space (Dim), 2) the learning rate (lr), 3) the batch size (B), 4) the negative sample size (N), 5) the margin hyper-parameter (γ), and 6) the sampling temperature (α). Their search values are listed in Table 4.3.

In the search process, we first compute the scoring functions with a certain hyper-parameter setting that allows a few variants to have decent performance, where the number of training iterations for each variant is set to m = 30,000. After locating the optimal variant, we fine-tune the hyper-parameters under the optimal variant.
The optimal configurations are shown in Table 4.4. The Adam optimizer [76] is employed for all parameter tuning. For the ensemble experiments, we adopt the same optimal configuration for each base variant model.

4.3.1.4 Other Implementation Details
We run experiments and perform hyper-parameter tuning on a variety of GPUs, including Nvidia P100 (16G), V100 (32G), A100 (40G), and A40 (48G), depending on the GPU memory requirement of a job. Typically, we request 8 CPU cores with less than 70G RAM for each job. The results of each optimal configuration in Table 4.4 can be reproduced on one single V100 for all datasets. For the WN18RR dataset, we adopt the rotation implementation from Rotate3D [45].
Table 4.5: Comparison of the link prediction performance under the filtered rank setting for DB100K.

Model | MRR | H@1 | H@3 | H@10
TransE [9] | 0.111 | 0.016 | 0.164 | 0.27
DistMult [174] | 0.233 | 0.115 | 0.301 | 0.448
HolE [106] | 0.26 | 0.182 | 0.309 | 0.411
ComplEx [140] | 0.242 | 0.126 | 0.312 | 0.44
Analogy [91] | 0.252 | 0.142 | 0.323 | 0.427
RUGE [55] | 0.246 | 0.129 | 0.325 | 0.433
ComplEx-NNE+AER [34] | 0.306 | 0.244 | 0.334 | 0.418
SEEK [171] | 0.338 | 0.268 | 0.37 | 0.467
AcrE (Parallel) [116] | 0.413 | 0.314 | 0.472 | 0.588
PairRE [18] | 0.412 | 0.309 | 0.472 | 0.600
TransSHER [86] | 0.431 | 0.345 | 0.476 | 0.589
CompoundE [50] | 0.405 | 0.306 | 0.461 | 0.588
CompoundE3D | 0.450 | 0.373 | 0.488 | 0.594
CompoundE3D RRF | 0.457 | 0.376 | 0.497 | 0.607
CompoundE3D WDS | 0.462 | 0.378 | 0.506 | 0.616
4.3.2 Experimental Results
4.3.2.1 Performance Evaluation
We compare the link prediction performance of a few benchmarking KGE methods with that of CompoundE3D using the optimal configurations given in Table 4.4. The performance benchmarking results for DB100K,
Table 4.6: Comparison of the link prediction performance under the filtered rank setting for ogbl-wikikg2.

Model | Dim | Valid MRR | Test MRR
AutoSF+NodePiece | 100 | 0.5806 | 0.5703
ComplEx-N3-RP | 100 | 0.6701 | 0.6481
TransE [9] | 500 | 0.4272 | 0.4256
DistMult [174] | 500 | 0.3506 | 0.3729
ComplEx [140] | 250 | 0.3759 | 0.4027
RotatE [133] | 250 | 0.4353 | 0.4353
Rotate3D [45] | 100 | 0.5685 | 0.5568
PairRE [18] | 200 | 0.5423 | 0.5208
TripleRE [185] | 200 | 0.6045 | 0.5794
CompoundE [50] | 100 | 0.6704 | 0.6515
CompoundE3D | 90 | 0.6994 | 0.6826
CompoundE3D | 180 | 0.7146 | 0.6962
CompoundE3D | 300 | 0.7175 | 0.7006
Table 4.7: Comparison of the link prediction performance under the filtered rank setting for YAGO3-10.
Datasets YAGO3-10
Metrics MRR Hit@1 Hit@3 Hit@10
DistMult [174] 0.34 0.24 0.38 0.54
ComplEx [140] 0.36 0.26 0.4 0.55
DihEdral [167] 0.472 0.381 0.523 0.643
ConvE [32] 0.44 0.35 0.49 0.62
RotatE [133] 0.495 0.402 0.55 0.67
InteractE [142] 0.541 0.462 - 0.687
HAKE [201] 0.545 0.462 0.596 0.694
DensE [94] 0.541 0.465 0.585 0.678
Rot-Pro [126] 0.542 0.443 0.596 0.699
CompoundE [50] 0.477 0.376 0.538 0.664
CompoundE3D 0.542 0.450 0.602 0.701
CompoundE3D RRF 0.541 0.446 0.607 0.707
CompoundE3D WDS 0.551 0.463 0.608 0.703
ogbl-wikikg2, and YAGO3-10 datasets are shown in Table 4.5, Table 4.6, and Table 4.7, respectively. Furthermore, the best and the second-best results in each column are indicated by boldface and underline, respectively. CompoundE3D has a significant performance improvement over CompoundE and other recent models. We see a clear advantage of CompoundE3D from including more affine operators and extending affine transformations from 2D to 3D in the new framework.
Table 4.8: Comparison of different weighted-distances-sum (WDS) strategies for DB100K and YAGO3-10.
Datasets DB100K YAGO3-10
Strategies MRR H@1 H@3 H@10 MRR H@1 H@3 H@10
Learnable Weights 0.462 0.378 0.506 0.616 0.545 0.451 0.586 0.696
Uniform Weights 0.460 0.376 0.503 0.614 0.551 0.463 0.608 0.703
Geometric Weights 0.446 0.348 0.503 0.618 0.531 0.439 0.580 0.691
Table 4.9: Performance comparison of different rank fusion methods for DB100K and YAGO3-10.
Datasets DB100K YAGO3-10
AggregationFunction MRR Hit@1 Hit@3 Hit@10 MRR Hit@1 Hit@3 Hit@10
CombMAX 0.455 0.375 0.496 0.603 0.536 0.440 0.600 0.701
CombMIN 0.452 0.369 0.496 0.606 0.527 0.427 0.597 0.702
CombMEDIAN 0.456 0.376 0.497 0.606 0.541 0.445 0.605 0.705
CombSUM 0.456 0.376 0.497 0.606 0.540 0.446 0.606 0.704
Euclidean 0.455 0.375 0.496 0.605 0.540 0.445 0.603 0.702
Borda 0.456 0.376 0.497 0.606 0.540 0.446 0.606 0.704
RRF 0.457 0.376 0.497 0.607 0.541 0.446 0.607 0.707
RBC 0.456 0.376 0.497 0.606 0.540 0.445 0.604 0.703
Table 4.10: Ablation study on CompoundE3D for DB100K.

Variant | MRR | Hit@1 | Hit@3 | Hit@10
∥S·h − Ŝ·t∥ | 0.417 | 0.323 | 0.471 | 0.590
∥S·h − R̂·Ŝ·t∥ | 0.447 | 0.364 | 0.492 | 0.600
∥S·h − T̂·R̂·Ŝ·t∥ | 0.450 | 0.373 | 0.488 | 0.594
Table 4.11: Ablation study on CompoundE3D for YAGO3-10.

Variant | MRR | Hit@1 | Hit@3 | Hit@10
∥R·h− t∥ 0.496 0.402 0.547 0.676
∥S·R·h− t∥ 0.501 0.404 0.554 0.680
∥T·S·R·h− t∥ 0.542 0.450 0.602 0.701
To verify the effectiveness of model ensembles, we examine two different ensemble strategies for the DB100K and YAGO3-10 datasets.

• For the DB100K dataset, we select the two best-performing variants: ∥S·h − R̂·Ŝ·t∥ and ∥S·h − T̂·R̂·Ŝ·t∥.
• For the YAGO3-10 dataset, we select the three best-performing variants: ∥T·S·R·h − t∥, ∥S·R·T·h − t∥, and ∥S·T·R·h − t∥.
4.3.2.2 Model Ensembles
As discussed in Sec. 4.2.3, we have two strategies for conducting model ensembles: weighted-distances-sum (WDS) and rank fusion. Among the three WDS strategies, the learnable weight strategy is the most effective one for DB100K, while uniform weights perform the best for YAGO3-10. We use CompoundE3D WDS to denote the best WDS scheme in Tables 4.5 and 4.7 and document the performance of the other weighting strategies in Table 4.8. Among all eight rank fusion strategies, we observe that reciprocal rank fusion (RRF) is the most effective one for both DB100K and YAGO3-10. Thus, we use CompoundE3D RRF to denote the best rank fusion scheme in Tables 4.5 and 4.7, and document the performance of the other rank fusion strategies in Table 4.9.
4.3.2.3 Effectiveness of Beam Search
Figure 4.3: The distribution of the MRR performance versus the number of operators of various model variants for different datasets: (a) DB100K, (b) YAGO3-10.
We conduct ablation studies on the DB100K and YAGO3-10 datasets to shed light on the effects of different transformation operators on model performance. We begin with the variant of the simplest configuration and add additional operators at each stage. Good simple models that lead to optimal variants and their performance numbers are reported in Tables 4.10 and 4.11. Furthermore, we visualize the distribution of the MRR performance as more operators are added for DB100K and YAGO3-10 in Figs. 4.3a and 4.3b, respectively. To interpret the box plots: the yellow bar represents the median, the box represents the interquartile range, the two end-bars denote the lower and upper whiskers, and the dots are outliers. Both figures show the effectiveness of the proposed beam search algorithm.
Table 4.12: Comparison of the filtered MRR performance on each relation type of WN18RR.

Relation | Type | Khs_r | ξ_r | TransE | RotatE | CompoundE | CompoundE3D
similar_to | 1-to-1 | 0.07 | −1.00 | 0.294 | 1.000 | 1.000 | 1.000
verb_group | 1-to-1 | 0.07 | −0.50 | 0.363 | 0.961 | 0.974 | 0.898
member_meronym | 1-to-N | 1.00 | −0.50 | 0.179 | 0.259 | 0.230 | 0.246
has_part | 1-to-N | 1.00 | −1.43 | 0.117 | 0.200 | 0.190 | 0.202
member_of_domain_usage | 1-to-N | 1.00 | −0.74 | 0.113 | 0.297 | 0.332 | 0.378
member_of_domain_region | 1-to-N | 1.00 | −0.78 | 0.114 | 0.217 | 0.280 | 0.413
hypernym | N-to-1 | 1.00 | −2.64 | 0.059 | 0.156 | 0.155 | 0.182
instance_hypernym | N-to-1 | 1.00 | −0.82 | 0.289 | 0.322 | 0.337 | 0.356
synset_domain_topic_of | N-to-1 | 0.99 | −0.69 | 0.149 | 0.339 | 0.367 | 0.396
also_see | N-to-N | 0.36 | −2.09 | 0.227 | 0.625 | 0.629 | 0.622
derivationally_related_form | N-to-N | 0.07 | −3.84 | 0.440 | 0.957 | 0.956 | 0.959
Table 4.13: Complexity comparison of KGE models on ogbl-wikikg2 under a similar testing MRR.

Model | No. of Parameters
TransE [9] | 1,251M
DistMult [174] | 1,251M
ComplEx [140] | 1,251M
ComplEx-RP [26] | 250.1M
RotatE [133] | 500M
RotatE3D [45] | 750.4M
PairRE [18] | 500M
CompoundE [50] | 250.1M
CompoundE3D | 225.2M
Figure 4.4: Effects of rotation and reflection operators on symmetric relations.
Figure 4.5: Illustration of CompoundE3D's capability in multiplicity modeling.
4.3.2.4 Modeling of Symmetric Relations
Rotation and reflection are isometric operations. As stated in [133, 193], their 2D versions can handle symmetric relations well in some cases. It is our conjecture that the same property holds for their corresponding 3D operators. To check it, we perform ablation studies and evaluate the base scoring functions that use only translation and scaling versus those that include rotation and reflection as well. The MRR performance numbers of different model variants for symmetric and asymmetric relations in DB100K are compared in Fig. 4.4. In this figure, we choose the most frequently observed relation types for a meaningful comparison. As expected, rotation and reflection operators indeed bring a more significant performance improvement on symmetric relations than on asymmetric relations. This supports our conjecture that rotation and reflection operators are intrinsically advantageous for the modeling of symmetric relations.
Figure 4.6: Comparing different models' MRR performance across different embedding dimensions.
4.3.2.5 Modeling of Multiplicity
Multiplicity is the scenario where multiple relations co-exist between two entities; namely, the triples (h, r_1, t), ..., (h, r_n, t) hold simultaneously. Generally, it is challenging to model multiplicity in traditional KGE models due to their limited power in relational modeling. In contrast, CompoundE3D is capable of modeling multiplicity relation patterns well since it can use multiple distinct sets of transformations that map from the head to the tail. We present two examples to illustrate CompoundE3D's capability in modeling multiplicity relations in Fig. 4.5. They are taken from actual link prediction examples in DB100K. There are three different relations held for a fixed (head, tail) pair. The top three tail predictions for each relation in the two examples are shown in the figure. We see that CompoundE3D can handle multiplicity well due to its rich set of variants.
4.3.2.6 Modeling of Hierarchical Relations
We would like to investigate CompoundE3D's capability in modeling hierarchical relations. WN18RR offers a representative dataset containing hierarchical relations. Two metrics can be used to measure the hierarchical behavior of a relation r [16]: 1) the Krackhardt score, denoted by Khs_r, and 2) the curvature estimate, denoted by ξ_r. If relation r has a high Khs_r score and a low ξ_r score, then it has a stronger hierarchical behavior, and vice versa. We compare the filtered MRR performance of different baseline models, namely TransE, RotatE, CompoundE (the 2D version), and CompoundE3D, in Table 4.12. In the same table, we also list the Khs_r and ξ_r values for each relation to indicate whether it has a stronger hierarchical behavior. We see from the table that CompoundE and CompoundE3D have better performance than TransE and RotatE on almost all relations. Furthermore, CompoundE3D outperforms CompoundE on all hierarchical relations except "member_meronym". This result indicates that CompoundE3D can model hierarchical relations more effectively than CompoundE by including more diverse 3D transformations.
4.3.2.7 Model Efficiency
It is important to investigate the relationship between the model performance and the model dimension. The model dimension reflects memory and computational complexities. To illustrate the advantage of CompoundE3D over prior models across a wide range of embedding dimensions, we plot the MRR performance of link prediction on the ogbl-wikikg2 dataset in Fig. 4.6, where the dimension values are set to 12, 24, 48, 102, 150, 198, 252, and 300. We see from the figure that CompoundE3D consistently outperforms all benchmarking models in all dimensions. Furthermore, we analyze the complexity of different KGE models in terms of the number of free parameters. Table 4.13 compares the number of free parameters of different KGE models for the ogbl-wikikg2 dataset.

We refer to the ogbl-wikikg2 leaderboard when reporting the number of free parameters used by the baseline models. The reported number of parameters for CompoundE3D corresponds to an embedding dimension of 90. As shown in Fig. 4.6 and Table 4.13, CompoundE3D offers the best performance among all benchmarking models while having the smallest number of free parameters.
4.4 Conclusion and Future Work
A novel and effective KGE model based on composite affine transformations in the 3D space, named CompoundE3D, was proposed in this work. A beam search procedure was devised to build a desired KGE model from the simplest configuration to more complicated ones. The ensemble of the top-k model variants was also explored to further boost link prediction performance. Extensive experimental results were provided to demonstrate the superior performance of CompoundE3D. We conducted ablation studies to assess the effect of each operator and performed case studies to shed light on the modeling power of CompoundE3D for several relation types such as multiplicity, symmetric relations, and hierarchical relations.

As to future research directions, it will be interesting to explore the effectiveness of CompoundE3D in other important KG problems such as entity typing [52] and entity alignment [49]. Besides, research on performance boosting in low-dimensional embedding spaces is valuable in practical real-world applications and worth further investigation.
Chapter 5
CORE: A Knowledge Graph Entity Type Prediction Method via Complex Space Regression and Embedding

5.1 Introduction
Research on knowledge graph (KG) construction, completion, inference, and applications has grown rapidly in recent years since it offers a powerful tool for modeling human knowledge in graph forms. Nodes in KGs denote entities and links represent relations between entities. The basic building blocks of KGs are entity-relation triples in the form of (subject, predicate, object), introduced by the Resource Description Framework (RDF). Learning representations for entities and relations in low-dimensional vector spaces is one of the most active research topics in the field.
Entity type offers a valuable piece of information for KG learning tasks. Better results in KG-related tasks have been achieved with the help of entity types. For example, TKRL [163] uses a hierarchical type encoder for KG completion by incorporating entity type information. RLKB [42] proposes a novel embedding model that leverages entity descriptions. AutoETER [107] adopts a similar approach but encodes the type information with projection matrices. Based on DistMult [174] and ComplEx [140] embedding, [67] proposes an improved factorization model without explicit type supervision. JOIE [58] attempts to embed entities and types in two separate spaces by learning instance-view embedding and ontology-view embedding. Similar to JOIE, TaRP [29] leverages the hierarchical type ontology structure for relation prediction. Instead of learning embeddings for all types, TaRP develops a heuristic weighting mechanism to rank the prior probability of a relation given its head and tail type information. A similar idea was examined in TransT [97]. Besides, entity type is also important for information extraction tasks, including entity linking [56, 47] and relation extraction [206, 30].

Figure 5.1: A KG with the entity type information.
On the other hand, entity type prediction is challenging for several reasons. First, collecting additional type labels for entities is expensive. Second, type information is often incomplete, especially for large-scale datasets. Fig. 5.1 shows a snapshot of a KG where missing entity types need to be inferred. Third, KGs are ever-evolving, and type information is often corrupted by noisy facts. Thus, there is a need to design algorithms to predict missing type labels. Quite a few approaches have been proposed to predict missing entity types in KGs. They can be classified into three different categories: namely, statistical-based, classifier-based, and embedding-based methods.
Table 5.1: Example of types in the YAGO43k dataset.

Entity | Coarse-grained | Fine-grained
Albert Einstein | Scientist | Fellows of the German Academy of Sciences Leopoldina
 | Cosmologists | 20th-century astronomers
 | Refugees | People with acquired Swiss citizenship
 | Pacifists | 20th-century American writers
 | ... | ... (55 type labels in total)

Table 5.2: Example of types in the DB111K-174 dataset.

Entity | Type label
Manchu people | Ethnic group
Discovery Channel | Television station
Let It Bleed | Album
5.2 Proposed CORE Methods
To leverage the expressive power of complex-space KG embedding, we propose to learn a set of embeddings for both the entity space and the type space, as shown in Fig. 2.4. The blue dots denote the representations of entities and the red dots denote the representations of types. The blue solid arrow denotes the relation rotation vector in the entity space and the red solid arrow denotes the relation rotation vector in the type space. The black solid arrows denote the regression mappings from entity vectors to type vectors. Specifically, we experiment with the ComplEx and RotatE embedding models. On top of the entity embedding and the type embedding, we learn a regression between these two spaces. Finally, we make type predictions using a distance-based scoring function based on the embeddings and the regression parameters. The high-level concept of the proposed CORE model is given in Fig. 5.2.
102
5.2.1 ComplexSpaceKGEmbedding.
Let(,,) be a KG triple and e
s
, w
r
, e
o
∈C
denote the complex space representation of triple’s subject,
relation,and object. For ComplEx, the scorefunctionis
(,)=− Re(⟨w
r
, e
s
, e
o
⟩),
where⟨·,·,·⟩denotesanelement-wisemulti-lineardotproduct[140],and· denotestheconjugateforcomplex
vectors. ForRotatE embedding, the scorefunctionis
(,)=∥e
s
◦ w
r
− e
o
∥
1
,
where◦ denotes the element-wise product.
We follow RotatE's negative sampling loss and self-adversarial training strategy to train the embedding. The loss function for the KG embedding model can be written as

L_KGE = −log σ(γ₁ − f_r(s, o)) − Σ_{i=1}^{n} p(s'_i, r, o'_i) log σ(f_r(s'_i, o'_i) − γ₁),

where σ is the sigmoid function, γ₁ is a fixed margin hyperparameter for training the KG embedding, (s'_i, r, o'_i) is the i-th negative triple, and p(s'_i, r, o'_i) is the probability of drawing negative triple (s'_i, r, o'_i). Given a positive triple (s_i, r, o_i), the negative sampling distribution is

p(s'_j, r, o'_j | {(s_i, r, o_i)}) = exp(α₁ f_r(s'_j, o'_j)) / Σ_i exp(α₁ f_r(s'_i, o'_i)),

where α₁ is the temperature of sampling.
5.2.2 Complex Space Type Embedding
Similar to the definitions in the KG embedding space, we use (s_t, r, o_t) to denote a type triple and t_s, v_r, t_o ∈ C^l to denote the representations of the subject type, the relation, and the object type in the type embedding space. Type triples can be generated by inspecting the entity types of the head and tail entities of an existing entity triple. Take an example from Fig. 5.1. Suppose we have the entity triple "(Joe Biden, was_born_in, Scranton_Pennsylvania)". Then, a type triple can be inferred as "(Officeholder, was_born_in, Location)". For ComplEx embedding, the score function is

f_r(s_t, o_t) = −Re(⟨v_r, t_s, t̄_o⟩).

For RotatE embedding, the score function is

f_r(s_t, o_t) = ∥t_s ∘ v_r − t_o∥₁.
To train the type embedding, we use the self-adversarial negative sampling loss in the form of

L_TPE = −log σ(γ₂ − f_r(s_t, o_t)) − Σ_{i=1}^{n} p(s'_{t,i}, r, o'_{t,i}) log σ(f_r(s'_{t,i}, o'_{t,i}) − γ₂),

where γ₂ is a fixed margin hyperparameter for training the type embedding, (s'_{t,i}, r, o'_{t,i}) is the i-th negative triple, and p(s'_{t,i}, r, o'_{t,i}) is the probability of drawing that negative triple. Given a positive triple (s_{t,i}, r, o_{t,i}), the negative sampling distribution is

p(s'_{t,j}, r, o'_{t,j} | {(s_{t,i}, r, o_{t,i})}) = exp(α₂ f_r(s'_{t,j}, o'_{t,j})) / Σ_i exp(α₂ f_r(s'_{t,i}, o'_{t,i})),

where α₂ is the temperature of sampling.
Figure 5.2: Illustration of the CORE model, where the blue and red dots denote entities and types in their complex embedding spaces, respectively.
5.2.3 Solving Complex Space Regression
To propagate information from the entity space to the type space, we learn a regression between two complex spaces. A feasible and logical way of solving the complex regression is to cast the problem as a multivariate regression problem in a real vector space. Formally, let e ∈ C^d and t ∈ C^l denote the representations of the entity and its type. We divide the real and the imaginary parts of every complex entity vector into two real vectors; namely, Re(e) ∈ R^d and Im(e) ∈ R^d. We do the same to divide the complex type vector into two real vectors: Re(t) ∈ R^l and Im(t) ∈ R^l.

As shown in Fig. 5.2, the regression process consists of four different real block matrices: {A_{Re→Re}, A_{Im→Re}, A_{Re→Im}, A_{Im→Im}} ∈ R^{l×d}. The real part of the output vector depends on both the real and imaginary parts of the input vector. Similarly, the imaginary part of the output vector also depends on both the real and imaginary parts of the input vector. The regression problem can be rewritten as
\begin{pmatrix} Re(t) \\ Im(t) \end{pmatrix} = \begin{pmatrix} A_{Re→Re} & A_{Im→Re} \\ A_{Re→Im} & A_{Im→Im} \end{pmatrix} \begin{pmatrix} Re(e) \\ Im(e) \end{pmatrix} + ε,

where ε ∈ R^{2l} denotes the error vector. Fig. 5.2 gives an illustration of the model. To minimize ε, we use the following score function:

g(e, t) = ∥A_{Re→Re}·Re(e) + A_{Im→Re}·Im(e) − Re(t)∥₂ + ∥A_{Re→Im}·Re(e) + A_{Im→Im}·Im(e) − Im(t)∥₂.
We find that the self-adversarial negative sampling strategies are useful in optimizing the regression coefficients. The loss function for learning these coefficients is set to

L_RR = −log σ(γ₃ − g(e, t)) − Σ_{i=1}^{n} p(e, t'_i) log σ(g(e, t'_i) − γ₃),

where γ₃ is a fixed margin hyperparameter in the regression, (e, t'_i) is the i-th negative pair, and p(e, t'_i) is the probability of drawing negative pair (e, t'_i). Given a positive pair (e, t_i), the negative sampling distribution is

p(e, t'_j | {(e, t_i)}) = exp(α₃ g(e, t'_j)) / Σ_i exp(α₃ g(e, t'_i)),

where α₃ is the temperature of sampling. Since the objective of our model is to predict the type given an entity, we only generate negative samples for types but not for entities.
5.2.4 Type Prediction
We use the distance-based scoring function g(e, t) from the regression to predict entity types. The type prediction function can be written as

t̂ = argmin_{t ∈ T} g(e, t),

where T denotes the set of all types.
5.2.5 Optimization
We first initialize the embedding and regression parameters by sampling from the standard uniform distribution. The three parts of our model are optimized sequentially. First, we optimize the KG embeddings using KG triples and negative triples. Next, we train the regression and the type space embedding parameters; we freeze the KG embedding to ensure that the regression learns important information at this stage. Last, we further optimize the type space embeddings using type triples. To avoid overfitting of the regression model in the early training stage, we alternate the optimization for each part of the model every 1000 iterations.
5.2.6 Complexity
The memory and space complexities of CORE are both O(N_e × d_e + N_r × d_r + N_t × d_t + d_e × d_t), where N denotes the number of objects, d denotes the dimension of vectors, and the subscripts e, r, and t denote entity, relation, and type, respectively.
Table 5.3: Statistics of the three KG datasets used in our experiments.
Dataset #Ent #Rel #Type
#KG Triples #Entity Type Pairs
#Train #Valid #Test #Train #Valid #Test
FB15k-ET 14,951 1,345 3,851 483,142 50,000 59,071 136,618 15,749 15,780
YAGO43k-ET 42,335 37 45,182 331,687 29,599 29,593 375,853 42,739 42,750
DB111K-174 111,762 305 242 527,654 65,000 65,851 57,969 1,000 39,371
5.3 Experiments
5.3.1 Datasets
We evaluate the proposed CORE model by conducting experiments on several well-known KG datasets with entity type information. They include FB15k-ET, YAGO43k-ET [100], and DB111K-174 [58], which are subsets of the Freebase [8], YAGO [129], and DBpedia [3] KGs, respectively. [205] further cleaned the FB15k-ET and YAGO43k-ET datasets by removing triples in the training sets from the validation and test sets. They also created a script to generate type triples (s_t, r, o_t) by enumerating all possible combinations of relations and the types of their subjects and objects. We use the same script to generate type triples for training the type embedding. The statistics of these datasets are shown in Table 5.3.

Figure 5.3: Comparison of the MRR performance for FB15k-ET as a function of the type dimension: (a) RotatE, (b) ComplEx.
5.3.2 Hyperparameter Setting
We list the hyperparameter settings for each of the benchmarking datasets in Table 5.8. In this table, d_e and d_t denote the dimensions of the entity embedding and the type embedding, respectively. B_e, B_t, and N denote the entity batch size, the type batch size, and the negative sample size, respectively. α, γ, and lr denote the sampling temperature, the margin parameter, and the learning rate, respectively. In addition, we show the MRR and Hits@k results for RotatE and ComplEx with different type dimensions for FB15k-ET in Fig. 5.3 (a) and Fig. 5.3 (b), respectively.
5.3.3 Benchmarking Methods
To the best of our knowledge, we are the first to compare embedding-based methods with statistical-based and classifier-based methods, since we would like to understand the strengths and weaknesses of the different models.
5.3.3.1 Statistical-based Method.
We compare the performance of the proposed CORE model with that of a statistical-based method named SDType-Cond on the FB15k-ET dataset. SDType-Cond is a variant of SDType. The neighbor type is readily available for many entities, yet SDType ignores this important piece of information. SDType-Cond is capable of estimating the type distribution more precisely by leveraging the known neighbor type information. Instead of estimating the type distribution given a particular relation, i.e., P(T_s | r = r̃) or P(T_o | r = r̃), we estimate P(T_s | r = r̃, T_o = t̃) or P(T_o | r = r̃, T_s = t̃), where r̃ ∈ R and t̃ ∈ T, and R and T denote the set of all relations and the set of all entity types, respectively. The two probabilities represent the two cases where the target entity serves as a subject and as an object, respectively. By aggregating the probabilities generated by all possible (r̃, t̃) combinations in the neighborhood of the target entity, we can rank the type candidates using the following function:

t̂ = argmax_{t̂ ∈ T} (1/|N|) [ Σ_{(r̃, t̃) ∈ N} P(T_s = t̂ | r = r̃, T_o = t̃) + Σ_{(r̃, t̃) ∈ N} P(T_o = t̂ | r = r̃, T_s = t̃) ],

where N denotes the set of all (r̃, t̃) combinations in the target entity's neighborhood.
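A sketch of the SDType-Cond aggregation follows (the data layout and helper names are hypothetical; in practice the conditional probabilities are estimated by counting co-occurrences on the training KG):

```python
from collections import Counter

def sdtype_cond(neighbors, cond_prob):
    """Rank candidate types for a target entity.

    neighbors: list of (relation, neighbor_type, direction) facts around
               the target entity; direction is 'subject' if the target
               entity is the subject of the triple, else 'object'.
    cond_prob: dict mapping (relation, neighbor_type, direction) to a
               dict {candidate_type: probability}.
    Returns candidate types sorted by averaged conditional probability.
    """
    scores = Counter()
    for key in neighbors:
        for cand, p in cond_prob.get(key, {}).items():
            scores[cand] += p / len(neighbors)   # average over neighborhood
    return [t for t, _ in scores.most_common()]

# Toy example: the entity is the subject of was_born_in with a Location
# object, so person-like types receive high conditional mass.
probs = {("was_born_in", "Location", "subject"):
             {"Person": 0.7, "Officeholder": 0.25, "City": 0.05}}
print(sdtype_cond([("was_born_in", "Location", "subject")], probs))
# ['Person', 'Officeholder', 'City']
```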
Table 5.4: Performance comparison of various entity type prediction methods in terms of filtered ranking for FB15k-ET and YAGO43k-ET, where the best and the second-best performance numbers are shown in boldface and with an underscore, respectively.
Datasets FB15k-ET YAGO43k-ET
Metrics MRR H@1 H@3 H@10 MRR H@1 H@3 H@10
RESCAL [105] 0.19 9.71 19.58 37.58 0.08 4.24 8.31 15.31
RES.-ET[100] 0.24 12.17 27.92 50.72 0.09 4.32 9.62 19.40
HOLE[106] 0.22 13.29 23.35 38.16 0.16 9.02 17.28 29.25
HOLE-ET[100] 0.42 29.40 48.04 66.73 0.18 10.28 20.13 34.90
TransE[9] 0.45 31.51 51.45 73.93 0.21 12.63 23.24 38.93
TransE-ET[100] 0.46 33.56 52.96 71.16 0.18 9.19 19.41 35.58
ETE[100] 0.50 38.51 55.33 71.93 0.23 13.73 26.28 42.18
ConnectE-E2T[205] 0.57 45.53 62.31 78.12 0.24 13.54 26.20 44.51
ConnectE-E2T-TRT[205] 0.59 49.55 64.32 79.92 0.28 16.01 30.85 47.92
ConnectE-E2T-TRT(Actual) 0.58 47.45 64.33 77.55 0.14 8.04 17.59 24.62
SDType-Cond 0.42 27.56 50.09 71.23 - - - -
CORE-RotatE 0.60 49.32 65.25 81.09 0.32 22.96 36.55 51.00
CORE-ComplEx 0.60 48.91 66.30 81.60 0.35 24.17 39.18 54.95
Table 5.5: Performance comparison of entity type prediction for DB111K-174, where the best and the second-best performance numbers are shown in boldface and with an underscore, respectively.
Datasets DB111K-174
Metrics MRR H@1 H@3
JOIE-HATransE-CT 0.857 75.55 95.91
SDType 0.861 78.53 92.67
ConnectE-E2T 0.88 81.63 94.19
ConnectE-E2T-TRT 0.90 82.96 96.07
SDType-Cond 0.879 80.99 94.05
CORE-RotatE 0.889 82.02 95.36
CORE-ComplEx 0.900 84.25 95.42
TransE+XGBoost 0.878 81.38 94.07
TransE+SVM 0.917 86.77 96.33
Table 5.6: Performance comparison of coarse-grained and fine-grained type prediction for the YAGO43k-ET
dataset.
Metric H@1
YAGO Types | CORE | ConnectE
Club | 69.87 | 10.89
Award | 60.71 | 46.42
Artist | 56.03 | 31.91
Administrative district | 43.67 | 0.97
Cities/towns in California | 86.95 | 52.17
Brazilian footballers | 73.63 | 60.00
American actresses | 56.41 | 23.07
20th-century American novelists | 50.00 | 22.73
5.3.3.2 Classifier-based Method.
We also explore the node classification approach to solve the type prediction problem as a benchmarking method. Specifically, we experiment with pretrained TransE embeddings as entity features and use SVM and XGBoost as classifiers.
Table 5.7: An illustrative example of type prediction for the YAGO43k-ET dataset.

Entity | Model | Top 3 Type Predictions
Albert Einstein | ConnectE | Islands of Sicily, Swiss singers, Heads of state of Canada
 | CORE | Nobel laureates in Physics, Fellows of the Royal Society, 20th-century mathematicians
Warsaw | ConnectE | Defunct political parties in Poland, Political parties in Poland, Universities in Poland
 | CORE | Administrative district, Port cities, Cities in Europe
George Michael | ConnectE | United Soccer League players, People from Stourbridge, Fortuna Düsseldorf managers
 | CORE | British singers, English musicians, Rock singers
Table 5.8: Hyperparameter settings.

Dataset | Model | d_e | d_t | B_e | B_t | N | α | γ | lr
FB15k-ET | CORE-RotatE | 1000 | 700 | 1024 | 4096 | 256 | 1 | 24 | 0.0001
FB15k-ET | CORE-ComplEx | 500 | 550 | 1024 | 4096 | 400 | 1 | 24 | 0.0002
YAGO43k-ET | CORE-RotatE | 500 | 350 | 1024 | 4096 | 400 | 1 | 24 | 0.0002
YAGO43k-ET | CORE-ComplEx | 500 | 350 | 1024 | 4096 | 400 | 1 | 24 | 0.0002
DB111K-174 | CORE-RotatE | 1000 | 250 | 1024 | 4096 | 256 | 1 | 24 | 0.0005
DB111K-174 | CORE-ComplEx | 1000 | 250 | 1024 | 4096 | 400 | 1 | 24 | 0.0002
5.3.4 Experimental Results
To evaluate the performance of the CORE method and several benchmarking methods, we use the mean reciprocal rank (MRR) and Hits@k as the performance metrics. Specifically, the MRR can be computed as

MRR = (1/|S|) Σ_{i ∈ S} 1/Rank_i,

where S is the set of all entity-type pairs and Rank_i denotes the rank of the ground truth for the i-th sample. Hits@k is the proportion of ground-truth labels that appear in the top-k candidate list. Higher Hits@k and MRR imply better performance of the model. Since the models are trained to favor observed types as top choices, we filter the observed types from all possible type candidates when computing the ranking based on a scoring function.
First, we show the MRR and Hits@k results for RotatE and ComplEx with different type dimensions for FB15k-ET in Fig. 5.3 (a) and Fig. 5.3 (b), respectively. The optimal type dimensions for the RotatE and ComplEx embeddings are 700 and 550, respectively. We adopt this setting in the following experiments.

Table 5.4 shows the results for FB15k-ET and YAGO43k-ET. We see from the table that our proposed CORE models offer state-of-the-art performance on both datasets. CORE-ComplEx achieves the best performance in all categories except Hits@1 for FB15k-ET. It outperforms the previous best method, ConnectE-E2T-TRT, by a significant margin (about 51%) on the YAGO43k-ET dataset.

We also experiment with classifier-based methods and observe that they do not scale well for large datasets such as FB15k-ET and YAGO43k-ET. The training time for the classifier grows significantly with the size of the label set and the feature dimension. In addition, the performance of the classifier-based method is much lower than that of the statistical-based and embedding-based methods.
On the other hand, for datasets with a small number of distinct types, classifier-based methods do have some advantages. Table 5.5 compares the performance of several type prediction methods for DB111K-174. The SVM classifier with pretrained TransE embedding features outperforms all other methods. The idea of incorporating hierarchical structural information of types in JOIE does not seem effective for type prediction, since even a simple statistical-based method such as SDType can outperform the JOIE baseline on the MRR and Hits@1 metrics. By conditioning on neighbor type label information, SDType-Cond can further boost the performance. The performance gap among the different benchmarking methods is smaller on DB111K-174.
In addition, we analyze and compare the performance of CORE and ConnectE on both coarse-grained and fine-grained types found in the YAGO43k-ET dataset. Coarse-grained types are the ones with a broader scope, whereas fine-grained types are the ones with more restrictive definitions. In particular, we compare the Hits@1 metric for CORE-ComplEx and ConnectE-E2T-TRT in Table 5.6. As shown in the table, CORE outperforms ConnectE for most type classes. We again attribute this performance improvement to the better representation of KG entities and relations by complex-space embedding models.
To gain insights into entity type prediction, we provide illustrative examples of type prediction on the YAGO43k-ET dataset. In Table 5.7, we compare the top three type predictions by ConnectE and CORE for some well-known people and places. Although there are some lapses in CORE's predictions, the model can make the right decisions for most queries, and a majority of the top three candidates are in fact valid type labels for the corresponding target entity. These results demonstrate the impressive prediction power of our proposed CORE model, given the enormous number of unique type labels.
5.4 Conclusion and Future Work
A complex regression and embedding method, called CORE, was proposed in this work to solve the entity type prediction problem by exploiting the expressive power of the RotatE and ComplEx models. It embeds entities and types in two different complex spaces and uses a complex regression model to measure the relatedness of entities and types. Finally, it optimizes the embedding and regression parameters jointly to reach the optimal performance under this framework. Experimental results demonstrate that CORE offers great performance on KG entity type inference datasets and outperforms the state-of-the-art models by a large margin on the YAGO43k-ET dataset.
There are several research directions worth exploring in the future. First, the use of textual descriptions and transformer models to extract features for entity type prediction can be investigated. Second, we can examine the multilabel classification framework for entity type prediction since it shares a similar problem formulation. Although both tasks try to predict multiple target labels, there are differences. For multilabel classification, objects in the training set and the test set are disjoint. That is, we train the classifier using the training set and test it on a different set. For entity type prediction, the two sets are not disjoint. In training, a model is trained with a set of entity feature vectors and their corresponding labels. In inference, it is often used to infer missing type labels for the same set of entities. Third, a binary classifier can also be used for entity type prediction. Yet, there exist far more negative samples than positive ones, and a good selection of negative examples is required to handle the data imbalance problem.
Chapter 7 is preceded by this chapter: Chapter 6
TypeEA: Type-Associated Embedding for Knowledge Graph Entity Alignment
6.1 Introduction
The entity type offers an important piece of side information. It indicates what class an entity belongs to. Besides, ontological structures between types allow us to group entities together at different levels of granularity. Intuitively, the entity type can improve the performance of entity alignment models since we do not need to align entities of mismatched types.
We use an example in Fig. 6.1 to illustrate the underlying idea. Suppose "Home Alone" is the same entity to be aligned between the DBpedia [3] and Wikidata [144] KGs. In DBpedia, the entity "Home Alone" has type labels such as "CreativeWork", "Movie", etc. In the Wikidata KG, entities with the "Film" type should be ranked higher than entities with the "Actor" type or "Film director" type, although these type labels are closely related concepts.
After inspecting recent entity alignment models, we observe that a large fraction of errors in the predicted entity alignment pairs have mismatched types between entities. Since these predictions are unlikely to be the correct ones, such errors can be avoided by taking type information into consideration. We collect the statistics of the proportion of H@1 prediction errors due to mismatched types for different entity alignment models in Table 6.1. Based on the statistics, we could potentially reduce up to 30%-50% of the top-1 prediction errors when considering type information.
Figure 6.1: An illustrative example of the idea behind the proposed TypeEA method.
Table 6.1: Proportion of H@1 prediction errors due to mismatched types by different entity alignment methods for the D-W 15K V1 dataset
Model MTransE JAPE BootEA MultiKE RDGCN
Ratio 45.240% 42.678% 40.884% 34.588% 57.038%
Prior to the deep learning era, type information had already been leveraged for entity alignment. For instance, PBA [212] is a partition-and-blocking-based alignment method that uses the type information as the blocking key. However, since different KGs have disjoint sets of type labels, solving the type resolution problem can be challenging. To address this problem, we propose a method to train an embedding model to capture the type association, and call it the Type-associated Entity Alignment (TypeEA) method. Seed alignments allow us to generate some associated type pairs. Based on them, we can train the TypeEA model to capture more associations. Hence, given the type information of an entity in the source KG, instead of performing rule-based blocking, we use the TypeEA model to identify its most relevant counterpart in the target KG automatically. In this work, we first show that the bilinear product embedding for the proposed TypeEA can capture the type association well. For entity alignment, we make the alignment ranking and decisions by considering the alignment score and the type association score jointly so that TypeEA can better focus on entities with matched types.
6.2 The TypeEA Method
To perform entity alignment, the proposed TypeEA method consists of two parts: 1) how to effectively train the type association embedding and 2) how to select the subset of entity types to learn the representation. Then, we integrate the trained type association embedding with the state-of-the-art entity alignment models to correct the type mismatch problem in their models.
6.2.1 Problem Formulation
Let $\mathcal{G}=(\mathcal{E},\mathcal{R},\mathcal{L},\mathcal{T})$ denote a knowledge graph, where $\mathcal{E}$, $\mathcal{R}$, and $\mathcal{L}$ represent the sets of all entities, relations, and type labels, respectively. $\mathcal{T}$ denotes the set of all relation triples $\{(h,r,t)\,|\,h,t\in\mathcal{E},\,r\in\mathcal{R}\}$.
To align entities in two KGs, denoted by
$$\mathcal{G}_1=(\mathcal{E}_1,\mathcal{R}_1,\mathcal{L}_1,\mathcal{T}_1), \quad (6.1)$$
$$\mathcal{G}_2=(\mathcal{E}_2,\mathcal{R}_2,\mathcal{L}_2,\mathcal{T}_2), \quad (6.2)$$
we need to identify all pairs of equivalent entities
$$\mathcal{P}=\{(e_1,e_2)\,|\,e_1\in\mathcal{E}_1,\,e_2\in\mathcal{E}_2\} \quad (6.3)$$
from the two KGs. Seed entity pairs are often given in entity alignment datasets. Since entity type labels are available from KG queries, we can infer label pairs
$$\mathcal{P}_L=\{(l_1,l_2)\,|\,l_1\in\mathcal{L}_1,\,l_2\in\mathcal{L}_2\} \quad (6.4)$$
from the entity pair set. Our goal is to design embedding models to encode the type information and investigate whether the type embedding can improve the entity alignment performance.
Table 6.2: The statistics of type association pairs for various datasets.
Dataset
V1 (Sparse) V2 (Dense)
Train Valid Test KG1 KG2 Train Valid Test KG1 KG2
D-W 15K 2,609 1,328 9,273 101 1,185 2,696 1,350 9,435 51 563
D-W 100K 17,313 8,680 60,844 163 3,883 17,995 8,915 62,620 104 2,682
D-Y 15K 2,884 1,437 10,062 387 407 2,903 1,472 10,178 178 75
D-Y 100K 19,138 9,532 66,777 1,117 1,306 19,162 9,565 66,961 690 791
EN-DE 15K 2,884 1,455 10,189 497 103 2,877 1,452 10,176 284 53
EN-DE 100K 19,466 9,777 68,257 1,619 149 19,386 9,705 68,005 1,179 109
EN-FR 15K 2,511 1,246 8,799 565 208 2,594 1,302 9,166 340 115
EN-FR 100K 16,090 8,003 56,236 1,757 335 16,660 8,259 58,199 1,422 290
6.2.2 Type Acquisition
One contribution of this work is to add the type label information of entities to existing datasets. The DBpedia (EN) KG has the most abundant type labels for each entity. However, there are two challenges in choosing an appropriate subset of type labels for modeling. First, many type labels are acquired from different sources and are often redundant. Second, a large number of type labels are too fine-grained. With a limited amount of seed entity pairs, it is difficult to generate enough type label pairs to train the type embedding well. To solve this problem, we obtain non-overlapping subsets of types and their association pairs for both source and target KGs. Details on type information acquisition are discussed in Sec. 6.3.1. We train and evaluate the type association embedding using the type pairs dataset.
6.2.3 Type Association Embedding
The goal of training the type association embedding is to model the relationship between type labels from two KGs. Since the type sets of the two KGs are disjoint, we essentially use the type association embedding to align the types from the two KGs before aligning the entities. Source and target entities whose type labels can generate higher type association scores are more likely to be aligned. To model the type association, we adopt two scoring functions: 1) the cosine similarity and 2) the bilinear product.
Cosine Similarity. We first experiment with the cosine similarity as the score function to capture the association between types. This can be written in the form of
$$f_{\mathrm{type}}(l_1,l_2)=\cos(\mathbf{u},\mathbf{v})=\frac{\mathbf{u}^{\mathsf{T}}\mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|}, \quad (6.5)$$
where $l_1$ and $l_2$ denote two types associated with two different KGs, and $\mathbf{u}$ and $\mathbf{v}$ are their embeddings. The use of the cosine similarity score function is intuitive since the goal is to have entity types that frequently appear in a type pair receive higher scores while those pairs that never appeared before receive lower scores. However, the cosine similarity score does
Figure 6.2: Illustration of using the type association embedding for identifying relevant candidates in entity alignment, where u and v denote two type embeddings in two KGs, respectively, and W denotes their association embedding using the bilinear product.
not generate satisfactory results in retrieving the most relevant types in practice. We perform experiments on type label pairs for the D-W 15K V1/V2 datasets and show the results in Table 6.3.
Bilinear Product. Since the cosine similarity measure is not effective in modeling the relationship between types, we propose a more expressive bilinear product score function to model the type association. The semantic-matching-based score is defined as
$$f_{\mathrm{type}}(l_1,l_2)=\mathbf{u}^{\mathsf{T}}\mathbf{W}\mathbf{v}, \quad (6.6)$$
where $l_1$ and $l_2$ denote the type of one KG1 entity and the type of one KG2 entity, respectively. Also, $\mathbf{u}\in\mathbb{R}^{d_1}$ and $\mathbf{v}\in\mathbb{R}^{d_2}$ denote the representations of $l_1$ and $l_2$ in the type space, respectively. Similar to RESCAL [105] and DistMult [174], we construct an embedding matrix, $\mathbf{W}\in\mathbb{R}^{d_1\times d_2}$, which is shared among all type pairs.
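A minimal PyTorch sketch of this bilinear score is given below; the class name, initialization scale, and the default dimensions of 200 (matching Sec. 6.3.2) are our assumptions rather than the released TypeEA code.

```python
import torch
import torch.nn as nn

class BilinearTypeScore(nn.Module):
    """Bilinear type association score f(l1, l2) = u^T W v (Eq. 6.6)."""

    def __init__(self, n_types1, n_types2, d1=200, d2=200):
        super().__init__()
        self.u = nn.Embedding(n_types1, d1)   # KG1 type embeddings
        self.v = nn.Embedding(n_types2, d2)   # KG2 type embeddings
        # shared association matrix W (small random init is an assumption)
        self.W = nn.Parameter(torch.randn(d1, d2) * 0.01)

    def forward(self, l1, l2):
        # l1, l2: (batch,) index tensors of KG1 and KG2 type labels
        u, v = self.u(l1), self.v(l2)         # (batch, d1), (batch, d2)
        return torch.einsum("bi,ij,bj->b", u, self.W, v)
```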
We find that the self-adversarial negative sampling strategy introduced in RotatE [134] is particularly useful in learning the type association embedding parameters. The objective function for learning these parameters is set to
$$\mathcal{L}_T = -\log\sigma\!\left(f_{\mathrm{type}}(l_1,l_2)-\gamma\right) - \sum_{i=1}^{n} p(l_1,l'_i)\,\log\sigma\!\left(\gamma - f_{\mathrm{type}}(l_1,l'_i)\right), \quad (6.7)$$
where $\gamma$ is a fixed margin hyper-parameter, $(l_1,l'_i)$ is the $i$-th negative type pair, and $p(l_1,l'_i)$ is the probability of drawing the negative type pair $(l_1,l'_i)$. Given a corrupted type pair $(l_1,l'_i)$, the sampling distribution can be written as
$$p\!\left((l_1,l'_j)\,\middle|\,\{(l_1,l'_i)\}\right)=\frac{\exp\!\left(\alpha\, f_{\mathrm{type}}(l_1,l'_j)\right)}{\sum_i \exp\!\left(\alpha\, f_{\mathrm{type}}(l_1,l'_i)\right)}, \quad (6.8)$$
where $\alpha$ is the sampling temperature.
This self-adversarial negative sampling scheme has two advantages. First, hard negative samples are more likely to be chosen for training. The embedding model can be fine-tuned more effectively by hard negative examples than by easy negative samples. Second, since hard negative samples carry a higher weight in the objective function, their loss is given more attention in optimization. The performance on predicting the associated type pairs for various datasets is given in Table 6.3.
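The following hedged PyTorch sketch illustrates this objective as reconstructed in Eqs. (6.7)-(6.8); the default values of gamma and alpha follow the settings reported in Sec. 6.3.2, while the function signature is our own.

```python
import torch
import torch.nn.functional as F

def self_adversarial_loss(pos_score, neg_scores, gamma=24.0, alpha=1.0):
    """Self-adversarial negative sampling loss (Eqs. 6.7-6.8).

    pos_score:  (batch,) scores of observed type pairs.
    neg_scores: (batch, n) scores of n corrupted pairs per positive.
    gamma:      margin; alpha: sampling temperature.
    """
    pos_loss = -F.logsigmoid(pos_score - gamma)
    # hard negatives get larger weights; weights are not back-propagated
    weights = torch.softmax(alpha * neg_scores, dim=1).detach()
    neg_loss = -(weights * F.logsigmoid(gamma - neg_scores)).sum(dim=1)
    return (pos_loss + neg_loss).mean()
```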
6.2.4 Entity Representation and Alignment
We experiment on three model types with different feature representations below.
Translation Embedding. Translation-based EA techniques mainly use the translation embedding model to extract structural features for entities and relations. The well-known TransE scoring function is defined as
$$f_{\mathrm{triple}}(h,r,t)=\|\mathbf{h}+\mathbf{r}-\mathbf{t}\|, \quad (6.9)$$
where $\mathbf{h}$, $\mathbf{r}$, $\mathbf{t}$ are the low-dimensional space representations of the head entity, the relation, and the tail entity of a triple, respectively. To pull entity vectors from KG1 and KG2 into a unified space, we generate new triples by swapping aligned entities in the corresponding triples. For example, given an aligned entity pair $(e_1,e_2)$, where $e_1$ and $e_2$ come from KG1 and KG2, respectively, we can generate the following set of triples:
$$\mathcal{T}_{\mathrm{swap}}=\{(e_2,r,t)\,|\,(e_1,r,t)\in\mathcal{T}_1\}\cup\{(h,r,e_2)\,|\,(h,r,e_1)\in\mathcal{T}_1\}\cup\{(e_1,r,t)\,|\,(e_2,r,t)\in\mathcal{T}_2\}\cup\{(h,r,e_1)\,|\,(h,r,e_2)\in\mathcal{T}_2\}. \quad (6.10)$$
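A small Python sketch of this swapping rule follows; the function and variable names are illustrative only.

```python
def swap_triples(triples1, triples2, seed_pairs):
    """Generate swapped triples (Eq. 6.10) from seed alignments.

    triples1/triples2: lists of (h, r, t) triples from KG1 and KG2.
    seed_pairs:        list of aligned entity pairs (e1, e2).
    """
    to_kg2 = dict(seed_pairs)                      # e1 -> e2
    to_kg1 = {e2: e1 for e1, e2 in seed_pairs}     # e2 -> e1
    swapped = []
    for h, r, t in triples1:                       # replace e1 by e2
        if h in to_kg2:
            swapped.append((to_kg2[h], r, t))
        if t in to_kg2:
            swapped.append((h, r, to_kg2[t]))
    for h, r, t in triples2:                       # replace e2 by e1
        if h in to_kg1:
            swapped.append((to_kg1[h], r, t))
        if t in to_kg1:
            swapped.append((h, r, to_kg1[t]))
    return swapped
```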
Moreover, instead of using the max-margin loss function adopted by the original TransE model, we use the limit-based loss function [209] to optimize the embedding. The loss function can be expressed as
$$\mathcal{L}_e=\sum_{(h,r,t)\in\mathcal{T}}\max\!\left(0,\,f_{\mathrm{triple}}(h,r,t)-\gamma_1\right)+\mu_1\sum_{(h',r',t')\in\mathcal{T}'}\max\!\left(0,\,\gamma_2-f_{\mathrm{triple}}(h',r',t')\right), \quad (6.11)$$
where $\mathcal{T}=\mathcal{T}_1\cup\mathcal{T}_2\cup\mathcal{T}_{\mathrm{swap}}$, and $\mathcal{T}'$ contains all corrupted triples generated by uniform negative sampling.
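Below is a minimal PyTorch sketch of this limit-based loss; the default margin values gamma1, gamma2 and the weight mu1 are placeholders, since the thesis does not report them here.

```python
import torch

def limit_based_loss(pos_dist, neg_dist, gamma1=0.01, gamma2=2.0, mu1=0.2):
    """Limit-based loss (Eq. 6.11) over TransE distances ||h + r - t||.

    pos_dist: (batch,) distances of observed triples.
    neg_dist: (batch,) distances of corrupted triples.
    Positive distances are pushed below gamma1 and negative ones above
    gamma2 (hyper-parameter values here are assumptions).
    """
    pos_term = torch.clamp(pos_dist - gamma1, min=0.0).sum()
    neg_term = torch.clamp(gamma2 - neg_dist, min=0.0).sum()
    return pos_term + mu1 * neg_term
```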
Based on the learned entity representation, an alignment module is further proposed and trained to identify the counterpart of a target entity in the other KG. Among different alignment modules, Bootstrapping [131] is one of the best-performing strategies. In our experiment, we also include the Bootstrapping strategy to facilitate the alignment. In particular, we minimize the following cross-entropy objective:
$$\mathcal{L}_a=-\sum_{e_1\in\mathcal{E}_1}\sum_{e_2\in\mathcal{E}_2}\mathbb{1}_{e_1}(e_2)\,\log p(e_2\,|\,e_1;\Theta), \quad (6.12)$$
where $\mathbb{1}_{e_1}(e_2)$ is an indicator function that denotes the labeling probability of entity $e_1$, and $p(e_2\,|\,e_1;\Theta)$ is the function that computes the likelihood of labeling the counterpart of entity $e_1$ as $e_2$, given the embedding parameters $\Theta$ obtained from TransE. In our experiment, the cosine similarity is used as the similarity function. The alignment decision is made using the following function
$$f_{\mathrm{align}}(e_1,e_2)\propto p(e_2\,|\,e_1;\Theta)=\cos(\mathbf{e}_1,\mathbf{e}_2). \quad (6.13)$$
Attribute Auxiliary Features. In this line of work, auxiliary features, such as textual information from entity names and numeric attributes from entity property literals such as 'date' and 'age', are leveraged to improve entity alignment performance. These auxiliary features were not considered in learning the structural embedding. Yet, they provide important information for identifying matching entities. In our experiments, we choose MultiKE [195], which uses auxiliary features, as one of our baselines and verify whether our type association method can improve the performance of MultiKE. In the baseline, pre-trained word and character embeddings are used to encode entity names. To model the attribute-value information, separate embedding matrices are trained for attribute labels and values, respectively. To learn the attribute label and value embedding, we use the following scoring function
$$f_{\mathrm{attr}}(e,a,v)=\|\mathbf{e}-\mathrm{CNN}([\mathbf{a}\|\mathbf{v}])\|, \quad (6.14)$$
where $e$, $a$, $v$ represent the entity, attribute label, and attribute value, respectively, in an attribute literal triple. $[\mathbf{a}\|\mathbf{v}]$ denotes the concatenation of the attribute label and attribute value vectors. The concatenated feature vector is passed into a Convolutional Neural Network (CNN), and the error between the resulting vector and the entity vector $\mathbf{e}$ is minimized. We use the logistic-based objective function to optimize the model
$$\mathcal{L}_v=\sum_{(e,a,v)\in\mathcal{T}_a}\log\!\left(1+\exp\!\left(f_{\mathrm{attr}}(e,a,v)\right)\right), \quad (6.15)$$
where $\mathcal{T}_a$ is the set of all attribute triples. The obtained structural, textual, and attribute embeddings are combined to form the representation of entities. Alignment inferences are performed through nearest neighbor search.
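A simplified PyTorch sketch of this attribute encoder is shown below; MultiKE's actual CNN configuration (kernel sizes, depth, dimensions) differs, so this is only an illustration of Eq. (6.14).

```python
import torch
import torch.nn as nn

class AttrEncoder(nn.Module):
    """Score an attribute triple (e, a, v) as ||e - CNN([a ; v])|| (Eq. 6.14)."""

    def __init__(self, d=75):
        super().__init__()
        # 1D convolution over the concatenated attribute/value vector
        self.conv = nn.Conv1d(1, 1, kernel_size=3, padding=1)
        self.proj = nn.Linear(2 * d, d)

    def forward(self, e, a, v):
        # e, a, v: (batch, d) embeddings of entity, attribute label, value
        x = torch.cat([a, v], dim=1).unsqueeze(1)   # (batch, 1, 2d)
        x = torch.relu(self.conv(x)).squeeze(1)     # (batch, 2d)
        x = self.proj(x)                            # (batch, d)
        return torch.norm(e - x, p=2, dim=1)        # f_attr(e, a, v)
```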
Graph Neural Networks. Another approach is to use graph convolutional networks (GCNs) to represent the entity. The message passing process in GCNs can be formulated as
$$\mathbf{H}^{(l+1)}=\sigma\!\left(\tilde{\mathbf{D}}^{-\frac{1}{2}}\,\tilde{\mathbf{A}}\,\tilde{\mathbf{D}}^{-\frac{1}{2}}\,\mathbf{H}^{(l)}\,\mathbf{W}^{(l)}\right), \quad (6.16)$$
where $\tilde{\mathbf{A}}=\mathbf{A}+\mathbf{I}$ is the adjacency matrix of the graph, $\mathbf{I}$ is the identity matrix that denotes the self-connection, $\tilde{\mathbf{D}}$ is a diagonal matrix of node degrees where $\tilde{D}_{ii}=\sum_j\tilde{A}_{ij}$, $\mathbf{W}^{(l)}$ is the weight matrix at the $l$-th layer to be optimized, and $\mathbf{H}^{(l)}$ is the node embedding at the $l$-th layer. The particular variant of the GCN-based approach adopted as a baseline in our experiment is RDGCN [159]. RDGCN uses a coupled GCN to incorporate the relation information through attentive interactions between the original graph and its dual relation graph. A max-margin loss is used to optimize the model. We can obtain the vector representation of each node from the output layer of the RDGCN and compute the alignment score as
$$f_{\mathrm{align}}(e_1,e_2)\propto 1-\|\mathbf{e}_1-\mathbf{e}_2\|. \quad (6.17)$$
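The following PyTorch sketch implements one step of Eq. (6.16) with a dense adjacency matrix for readability; practical implementations (including RDGCN) use sparse operations, and the choice of ReLU for the nonlinearity is our assumption.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One GCN message-passing step (Eq. 6.16) with a dense adjacency."""

    def __init__(self, d_in, d_out):
        super().__init__()
        self.W = nn.Linear(d_in, d_out, bias=False)

    def forward(self, H, A):
        # H: (n, d_in) node features; A: (n, n) dense adjacency matrix
        A_tilde = A + torch.eye(A.size(0))            # add self-connections
        d = A_tilde.sum(dim=1)                        # node degrees
        D_inv_sqrt = torch.diag(d.pow(-0.5))
        A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt     # normalized adjacency
        return torch.relu(self.W(A_hat @ H))          # sigma(...) as ReLU
```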
6.2.5 Inference
During inference, we take both the type score and the alignment score into account. Suppose $e_1$ is the source entity. To infer the matching target entity $e_2$, we choose the entity that maximizes a linear combination of the type score and the alignment score. Mathematically, we have
$$\hat{e}_2=\operatorname*{arg\,max}_{e_2\in\mathcal{E}_2}\;\lambda\cdot f_{\mathrm{type}}\!\left(l(e_1),l(e_2)\right)+(1-\lambda)\cdot f_{\mathrm{align}}(e_1,e_2), \quad (6.18)$$
where $\lambda\in[0,1]$ is the balancing parameter and $l(\cdot)$ is a type label lookup function for the entity.
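A minimal sketch of this combined inference rule is given below; the default lambda of 0.5 is a placeholder, and the two score functions are assumed to be provided by the trained models.

```python
import numpy as np

def align_with_types(f_align, f_type, type_of, cand_ids, e1, lam=0.5):
    """Pick the KG2 entity maximizing Eq. (6.18) for source entity e1.

    f_align(e1, e2) and f_type(l1, l2) return scalar scores;
    type_of maps an entity to its type label; lam is the balance weight.
    """
    scores = [
        lam * f_type(type_of(e1), type_of(e2))
        + (1.0 - lam) * f_align(e1, e2)
        for e2 in cand_ids
    ]
    return cand_ids[int(np.argmax(scores))]
```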
6.3 Experiments
6.3.1 Datasets
We perform experiments on the DBP v1.1 entity alignment datasets from [132], which include both cross-KB and cross-lingual settings. To be specific, under the cross-KB setting, there are D-W and D-Y, which denote DBpedia-Wikidata and DBpedia-YAGO, respectively. Under the cross-lingual setting, there are EN-FR and EN-DE, which denote DBpedia English-DBpedia French and DBpedia English-DBpedia German, respectively. For each of the above tasks, there are also variants with different sizes, 15K and 100K, and variants of sparse (V1) and dense (V2) subgraphs. These datasets were generated using a method called iterative degree-based sampling (IDS). The detailed statistics of the DBP v1.1 dataset can be found in the original paper [132].
We obtain the type data by querying the DBpedia∗ [3], Wikidata† [144], and YAGO‡ [129] public endpoints using SPARQL queries. From all the type labels obtained from queries, we select a subset of type labels in order to get a reliable type embedding. For example, to make the alignment between DBpedia and
∗https://dbpedia.org/sparql
†https://query.wikidata.org/
‡https://yago-knowledge.org/sparql
Table 6.3: Ranking of associated type pair prediction.
Dataset
V1 (Sparse) V2 (Dense)
MRR MR H@1 H@3 H@10 MRR MR H@1 H@3 H@10
D-W 15K (cos) 0.312 42.38 0 50.39 86.14 0.368 67.47 2.35 56.68 91.73
D-W 15K 0.915 55.14 91.44 91.52 91.63 0.957 13.96 95.68 95.68 95.71
D-W 100K 0.953 97.50 95.31 95.33 95.34 0.970 41.65 97.01 97.03 97.05
D-Y 15K 0.944 17.21 94.16 94.44 94.97 0.985 2.15 98.26 98.85 98.94
D-Y 100K 0.966 29.22 96.54 96.61 96.79 0.981 10.25 98.03 98.10 98.22
EN-DE 15K 0.957 6.52 95.26 95.72 96.67 0.978 2.29 97.58 97.90 98.54
EN-DE 100K 0.977 13.48 97.54 97.73 98.11 0.983 6.13 98.16 98.34 98.68
EN-FR 15K 0.930 18.28 92.62 93.08 93.75 0.964 5.76 96.05 96.41 97.04
EN-FR 100K 0.962 23.90 95.90 96.24 96.85 0.969 16.71 96.71 96.98 97.40
The first row shows the results of preliminary experiments using the cosine similarity score function, while the results in all other rows are generated using the bilinear product score.
Table 6.4: An example from D-Y 15K V1 illustrating the advantage of using type for alignment.
Rank | w/o Type: Entity, Type | w/ Type: Entity, Type
1 | Chris Hayward, Human | Chris Hayward, Human
2 | Lou Grant (TV series), TV Series | Lou Grant, N/A
3 | Lou Grant, N/A | James Coco, Human
4 | The Munsters, TV Series | Ed. Weinberger, Human
5 | Mr. Smith (TV series), TV Series | Carol Sobieski, Human
6 | The Toy (1982 film), Movie | Teresa Ganzel, Human
... | ... | ...
10 | Ed. Weinberger, Human | Ernest Kinoy, Human
The source entity Ed. Weinberger has type label 'Person' in DBpedia. The target entity candidates together with their corresponding types in the YAGO knowledge graph are listed by their ranks.
Figure 6.3: Comparison of the H@1 entity alignment performance with or without the type information for different datasets and models as a function of the fraction of seed alignment. Panels: (a) D-W 15K V1 (BootEA), (b) D-W 15K V2 (BootEA), (c) EN-FR 15K V1 (RDGCN), (d) EN-FR 15K V2 (RDGCN).
Wikidata, we only use type labels that have the 'http://www.wikidata.org/' prefix and filter out other type labels. For DBpedia to YAGO alignment, we use only type labels with the 'http://dbpedia.org/' prefix. For DBpedia EN to DE and EN to FR, we use labels with the prefixes 'http://schema.org/', 'http://dbpedia.org/ontology', 'http://de.dbpedia.org/', and 'http://fr.dbpedia.org/'. Statistics of the type pair datasets are shown in Table 6.2. Specifically, the Train, Valid, and Test columns indicate the number of type pairs in the training, validation, and testing sets, respectively. The KG1 and KG2 columns indicate the number of distinct types in KG1 and KG2, respectively.
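A hedged sketch of such a type query, using the SPARQLWrapper Python package, is shown below; the exact queries used in this work may differ, and fetch_types is an illustrative name.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def fetch_types(entity_uri, endpoint="https://dbpedia.org/sparql",
                prefix="http://dbpedia.org/"):
    """Query the rdf:type labels of an entity and keep one namespace."""
    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery(f"SELECT ?t WHERE {{ <{entity_uri}> <{RDF_TYPE}> ?t }}")
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    types = [b["t"]["value"] for b in results["results"]["bindings"]]
    # keep only labels from the desired namespace, as described above
    return [t for t in types if t.startswith(prefix)]
```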
Figure 6.4: Comparison of the H@1 entity alignment performance with or without the type information for different datasets. Panels: (a) D-W 15K V1 (KG1→KG2), (b) D-W 15K V1 (KG2→KG1), (c) EN-FR 15K V1 (KG1→KG2), (d) EN-FR 15K V1 (KG2→KG1).
6.3.2 Implementation Details
Model Configuration We set the type embedding dimensions $d_1=200$ and $d_2=200$ for the source and target KGs, respectively, and the type pair batch size is set to 4096. To train the type embedding, we use the Adam optimizer with the learning rate $\eta=1\mathrm{e}{-4}$. We set the batch size to 1024, the number of negative samples to 256, the sampling temperature $\alpha=1$, and the margin parameter $\gamma=24$. The parameters of the entity alignment baseline models are kept the same as provided in OpenEA [132]. We use a server with an Intel(R) Xeon(R) E5-2620 CPU and an Nvidia Quadro M6000 GPU to run all of our experiments.
Table 6.5: Comparison of entity alignment performance of TypeEA with baselines for cross-KG (DBpedia to Wikidata) alignment with 15K entities.
Models
D-W 15K V1 D-W 15K V2
MRR H@1 H@5 MRR H@1 H@5
MTransE 0.352 25.45 46.03 0.365 25.84 48.03
JAPE 0.339 24.32 44.42 0.364 25.71 48.06
GCNAlign 0.467 37.33 58.26 0.618 51.27 74.95
AttrE 0.383 30.18 46.91 0.586 48.99 69.32
BootEA 0.655 57.8 75.01 0.87 82.22 92.45
RDGCN 0.587 51.83 67.26 0.678 61.71 75.36
MultiKE 0.483 42.27 54.30 0.574 50.08 64.86
TypeEA-B 0.681 60.79 77.05 0.889 83.04 94.01
TypeEA-R 0.656 59.04 73.80 0.729 66.75 80.91
TypeEA-M 0.522 45.53 58.85 0.612 53.41 70.26
Table 6.6: Comparison of entity alignment performance of TypeEA with baselines for cross-lingual (English to French) alignment with 15K entities.
Models
EN-FR 15K V1 EN-FR 15K V2
MRR H@1 H@5 MRR H@1 H@5
MTransE 0.35 24.6 46.67 0.34 24.47 44.04
JAPE 0.374 26.66 49.96 0.404 29.44 52.65
GCNAlign 0.446 33.45 57.91 0.545 41.89 70.17
AttrE 0.558 46.90 66.05 0.651 55.61 76.71
BootEA 0.597 50.31 71.02 0.747 66.13 85.41
RDGCN 0.799 75.45 85.25 0.881 84.84 91.87
MultiKE 0.776 74.2 81.26 0.884 86.13 90.85
TypeEA-B 0.643 54.28 76.91 0.909 88.21 93.93
TypeEA-R 0.832 79.44 87.91 0.930 90.57 95.91
TypeEA-M 0.827 79.61 86.08 0.908 89.07 92.91
Evaluation Metrics Following the convention, we use Hits@$k$ and Mean Reciprocal Rank (MRR) as our evaluation metrics to evaluate the performance of both the type association embedding and the entity alignment models. Hits@$k$ is the proportion of ground-truth entities that appear in the top-$k$ candidate list. Higher Hits@$k$ and MRR imply better performance of the model. To evaluate the performance of the type association embedding, we also include the Mean Rank (MR) metric.
Table 6.7: Comparison of entity alignment performance of TypeEA with baselines for cross-lingual (English to French) alignment with 100K entities.
Models
EN-FR 100K V1 EN-FR 100K V2
MRR H@1 H@5 MRR H@1 H@5
MTransE 0.203 13.74 26.46 0.131 8.60 16.95
JAPE 0.243 16.92 31.20 0.183 12.35 23.94
GCNAlign 0.321 23.14 41.33 0.351 25.76 45.21
AttrE 0.509 42.96 59.73 0.541 45.70 63.59
BootEA 0.475 39.03 56.30 0.715 63.97 80.52
RDGCN 0.682 63.81 72.96 0.751 71.77 79.03
MultiKE 0.654 62.85 67.94 0.669 64.21 69.52
TypeEA-B 0.483 40.65 58.21 0.722 65.56 82.42
TypeEA-R 0.689 65.65 74.51 0.758 73.62 80.57
TypeEA-M 0.701 67.52 72.84 0.701 67.38 72.85
Baselines To evaluate the performance improvement contributed by TypeEA, we compare it with 7 highly-cited strong baseline models. According to the results in [132], BootEA [131], MultiKE [195], and RDGCN [159] are the top performers across different dataset settings in DBP v1.1.
6.3.3 Results
Type Association Embedding As shown in Table 6.3, our proposed bilinear product type embedding model consistently achieves good results in predicting the associated types in the target KG across all the datasets and settings. The Hits@$k$ scores are all above 90 and the MRR scores are all above 0.9. This means that our proposed model can predict the most relevant associated types accurately and reliably. The bilinear product score is more effective than the cosine similarity score. One possible reason behind this is that the shared embedding matrix $\mathbf{W}\in\mathbb{R}^{d_1\times d_2}$ has more modeling power and improves the expressiveness of the model.
Entity Alignment Tables 6.5, 6.6, and 6.7 present a comprehensive performance comparison of our proposed TypeEA with the previous best baseline models for both cross-KG (D-W) and cross-lingual (EN-FR) EA datasets, for both small (15K) and large (100K) subgraphs, and under both sparse (V1) and dense (V2) sampling settings. In particular, TypeEA-B, TypeEA-R, and TypeEA-M denote the results generated from applying the type association embedding to the baseline models BootEA, RDGCN, and MultiKE, respectively. We use the given split, where the train, valid, and test sets have 20%, 10%, and 70% of the entity pairs, respectively. We observe consistent performance improvements as compared to previous results. Among all, the largest performance improvement is observed for the EN-FR 15K V1 and EN-FR 15K V2 datasets, where the Hit@1 scores are improved by 4.16 and 4.44, respectively.
Table 6.4 shows an entity alignment example from the D-Y 15K V1 dataset comparing the ranking of candidate entities. In this example, the source entity in DBpedia is "Ed. Weinberger" and we are trying to find its counterpart in YAGO. Without using the type information, the baseline model BootEA makes a few erroneous predictions in the top candidate list. Among the incorrect predictions, many have mismatched types such as "TV Series" and "Movie", and they are not the matching candidates that we are looking for. The ground-truth target has a relatively low rank. After applying the type information, entities with wrong type labels are ranked lower in the predictions. This confirms our intuition that predicting entities with mismatched types is indeed a problem of the baseline models. With the help of the type association embedding, the ground-truth target can be ranked higher in the final alignment predictions.
In Fig. 6.3, we show plots of entity alignment accuracy as a function of the alignment seed fraction. We conduct experiments with seed fractions of {0.1, 0.2, 0.3, 0.4} and observe consistent improvement even if only 10% of alignment seeds are provided. When we have more alignment seeds, the advantage of adding the type information seems to diminish, perhaps because the entity alignment models make fewer mistakes that can be corrected using type information when predicting the counterpart of a target entity.
In Fig. 6.4, we compare the fraction of H@1 prediction errors due to mismatched type labels for three baseline models: BootEA, MultiKE, and RDGCN. In particular, we run experiments on the D-W 15K V1 (Fig. 6.4 (a)(b)) and EN-FR 15K V1 (Fig. 6.4 (c)(d)) datasets. We show the error reduction for alignment in both directions: from KG1 to KG2 and from KG2 to KG1. We observe that our TypeEA approach is effective in reducing the error for all three baseline models. The largest reduction is observed for the RDGCN baseline, perhaps because RDGCN makes the highest percentage of type mismatch errors.
6.4 Conclusion and Future Work
In this work, we present a new approach called Type-Associated Entity Alignment (TypeEA) for helping entity alignment models make decisions. We experiment with different scoring functions for modeling TypeEA and find that the bilinear product is the best for capturing the type association. We also employ the self-adversarial negative sampling strategy, which is very effective in learning the embedding. We integrate the type association embedding with entity alignment models and demonstrate better alignment performance on the DBP v1.1 dataset. Moreover, we collect and prepare entity type label pair datasets complementary to all sub-datasets of DBP v1.1 so that the type association embedding can be learned. One limitation of our work is that the subset of types is still heuristically selected for training a reliable type association embedding. In the future, we will further investigate how to use embedding to model more diverse entity types and more complex relationships in entity type pairs.
Chapter 7
Conclusion and Future Work
7.1 Summary of the Research
In this thesis, we have described two major aspects of my research on Knowledge Graphs. First, we draw inspiration from image processing research, where geometric operations are compounded for manipulating images, and thereby propose the CompoundE embedding. Second, we focus on designing type embedding models for entity type prediction as well as entity alignment.
CompoundE In this work, we combine geometric operations, including translation, rotation, and scaling, to form the CompoundE embedding. We have shown that quite a few distance-based KGE models are special cases of CompoundE. Extensive experiments were conducted on three datasets to demonstrate the effectiveness of CompoundE. CompoundE achieves state-of-the-art link prediction performance with a memory-saving solution for large KGs. Visualization of entity semantics and relation embedding values was given to shed light on the superior performance of CompoundE.
CompoundE3D In this work, we have examined affine operations in the 3D space, instead of the 2D space, to allow more versatile relation representations. Besides the translation, rotation, and scaling used in CompoundE, we include reflection and shear transformations, which allow an even larger design space. We have proposed an adapted beam search algorithm to discover better model variants. Such a procedure avoids unnecessary exploration of poor variants but zooms into more effective ones to strike a good balance between model complexity and prediction performance. We have also analyzed the properties of each operation and its advantage in modeling different relations. Our analysis is backed by empirical results on four datasets. In order to reduce errors of an individual model variant and boost the overall link prediction performance, we aggregate decisions from different variants with two approaches; namely, the sum of weighted distances and rank fusion.
CORE In this work, we have proposed a new KG entity type prediction method, named CORE (COmplex space Regression and Embedding). The proposed CORE method leverages the expressive power of two complex space embedding models, namely the RotatE and ComplEx models. It embeds entities and types in two different complex spaces using either RotatE or ComplEx. Then, we derive a complex regression model to link these two spaces. Finally, a mechanism to optimize embedding and regression parameters jointly is introduced. Experiments show that CORE outperforms benchmarking methods on representative KG entity type inference datasets. Strengths and weaknesses of various entity type prediction methods are analyzed.
TypeEA In this work, we have demonstrated that the entity type information, which is commonly available in knowledge graphs, is very helpful to knowledge graph alignment, and have proposed a new method called the Type-associated Entity Alignment (TypeEA) accordingly. Although auxiliary information such as textual, visual, and temporal features was leveraged to improve entity alignment performance in the past, the entity type information is rarely considered in existing entity alignment models. TypeEA exploits the entity type information to guide entity alignment models so that they can focus on entities with matching types. A type embedding model based on semantic matching is developed in TypeEA to capture the association between types in different knowledge graphs. Experimental results show that the proposed TypeEA consistently outperforms state-of-the-art baselines across all OpenEA entity alignment datasets with different experimental settings.
7.2 Future Research Directions
With the advent of large language models (LLMs) in recent years, more and more NLP tasks are significantly improved by pre-trained transformer-based models. Researchers have also started to think about using transformer-based solutions to perform knowledge graph construction [65]. However, initial results from early papers have not fully demonstrated the effectiveness of language-model-based solutions. Such an approach not only requires significantly more computational resources for training than KG embedding models but also has a very slow inference speed. There are still many issues with the language model approach that are yet to be solved.
Figure 7.1: An illustrative example of the transformer-based approach for KG completion.
The key difference between the traditional KG embedding approach and the language-model-based approach is that the former focuses on local structural information in the graph, whereas the latter relies on pre-trained language models to decide the contextual relatedness between entity names and descriptions. Textual descriptions of entities are usually available in many knowledge graphs such as Freebase, WordNet, and Wikidata. Triples are created from a large corpus of entity descriptions through information extraction. These entity descriptions are often stored in the knowledge base together with the entity entry. These textual descriptions are very useful information, and one can use pre-trained language models (PLMs) to extract textual features from these descriptions. PLMs are large-scale transformer-based models that are trained over large-scale corpora. PLMs are known to be able to generate good features for various NLP tasks. One of the famous PLMs is BERT. BERT is a pre-trained bidirectional language model that is built on the transformer architecture. Since one of the tasks used to pre-train BERT is next sentence prediction, it naturally generates good features for characterizing whether two sentences are closely related. KG-BERT [182] is one of the first models we know of that uses PLMs to extract linguistic features from entity descriptions. It leverages the advantage of BERT, which is trained using next sentence prediction, to determine the association between head entity and tail entity descriptions for link prediction, and also for triple classification tasks using the same logic.
ERNIE [203] proposes a transformer-based architecture that leverages lexical, syntactic, and knowledge information simultaneously by encoding textual description tokens through cascaded transformer blocks, as well as concatenating textual features and entity embedding features to achieve information fusion. K-BERT alleviates the knowledge noise issue by introducing soft position and a visible matrix. StAR [146] and SimKGC [149] both use two separate transformer encoders to extract the textual representations of (head entity, relation) and (tail entity). However, they adopt very different methods to model the interaction between the two encodings. StAR learns from previous NLP literature [10, 115] to apply interactive concatenation of features, such as multiplication, subtraction, and the embedding vectors themselves, to represent the semantic relationship between two parts of triples. On the other hand, SimKGC computes the cosine similarity between the two textual encodings. SimKGC also proposes new negative sampling methods, including in-batch negatives, pre-batch negatives, and self negatives, to improve the performance. Graph structural information is also considered in SimKGC to boost the scores of entities that appear in the k-hop neighborhood. KEPLER [152] uses the textual descriptions of head and tail entities as initialization for entity embeddings and uses the TransE embedding as a decoder. The masked language modeling (MLM) loss is added to the knowledge embedding (KE) loss for overall optimization. InductivE [145] uses features from pre-trained BERT as graph embedding initialization for inductive learning on commonsense knowledge graphs (CKGs). Experimental results on CKGs show that fastText features can exhibit comparable performance to those from BERT. BERTRL [190] fine-tunes a pre-trained language model by using relation instances and possible reasoning paths in the local neighborhood as training samples. Relation paths in the local neighborhood are known to carry useful information for predicting the direct relation between two entities [82, 166, 123]. In BERTRL, each relation path is linearized into a sequence of tokens. The target triple and the linearized relation paths are each fed to a pre-trained BERT model to produce a likelihood score. The final link prediction decisions are made by aggregating these scores. The basic logic in this work is to perform link prediction through the relation paths around the target triple. KGT5 [119] proposes to use a popular seq2seq transformer model named T5 [113] to pre-train on the link prediction task and perform KGQA. During link prediction training, a textual signal "predict tail" or "predict head" is prepended to the concatenated entity and relation sequence that is divided by a separation token. This sequence is fed to the encoder, and the decoder's objective is to autoregressively predict the corresponding tail or head based on the textual signal. To perform question answering, the textual signal "predict answer" is prepended to the query, and we expect the decoder to autoregressively generate the corresponding answer to the query. This approach claims to significantly reduce the model size and inference time compared to other models. PKGC [96] proposes a new evaluation metric that is more accurate under the open-world assumption (OWA) setting. We believe that a simple decoding scheme and efficient fine-tuning of pre-trained language models will be the key to the success of this approach.
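As an illustration of the sequence formatting described above, the KGT5-style input/target construction can be sketched as follows; the exact signal tokens and separator used by KGT5 may differ from the ones assumed here.

```python
def kgt5_examples(h_text, r_text, t_text):
    """Format a triple into KGT5-style (input, target) text pairs."""
    sep = " | "                              # separation token (assumed)
    return [
        ("predict tail: " + h_text + sep + r_text, t_text),
        ("predict head: " + t_text + sep + r_text, h_text),
    ]
```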
However, the transformer-based approach is yet to be competitive for all KG completion settings. Although we observe some success in applying transformer-based approaches to datasets like WN18RR [149], it is still not as competitive for FB15k-237, in which the textual descriptions carry a significant amount of noise. In addition, the inference time for some of the early transformer-based approaches, such as KG-BERT, is too long to be feasible for real-world applications. The computational resource requirement for reproducing the results of transformer-based approaches is also significantly higher than that for embedding-based approaches. Indeed, several bottlenecks are yet to be conquered before transformer-based approaches can become a realistic solution for KG completion.
Bibliography
[1] Ralph Abboud, Ismail Ceylan, Thomas Lukasiewicz, and Tommaso Salvatori. “Boxe: A box embedding model for knowledge base completion”. In: Proceedings of the 34th International Conference on Neural Information Processing Systems 33 (2020), pp. 9649–9661.
[2] Addi Ait-Mlouk and Lili Jiang. “KBot: A Knowledge graph based chatBot for natural language understanding over linked data”. In: IEEE Access 8 (2020), pp. 149220–149230.
[3] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. “Dbpedia: A nucleus for a web of open data”. In: The Semantic Web. Springer, 2007, pp. 722–735.
[4] Peter Bailey, Alistair Moffat, Falk Scholer, and Paul Thomas. “Retrieval consistency in the presence of query variations”. In: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2017, pp. 395–404.
[5] Ivana Balažević, Carl Allen, and Timothy Hospedales. “Multi-relational Poincaré Graph Embeddings”. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vol. 32. 2019.
[6] Ivana Balažević, Carl Allen, and Timothy Hospedales. “TuckER: Tensor Factorization for Knowledge Graph Completion”. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019, pp. 5185–5194.
[7] Anson Bastos, Kuldeep Singh, Abhishek Nadgeri, Saeedeh Shekarpour, Isaiah Onando Mulang, and Johannes Hoffart. “Hopfe: Knowledge graph representation learning using inverse hopf fibrations”. In: Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 2021, pp. 89–99.
[8] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. “Freebase: a collaboratively created graph database for structuring human knowledge”. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. 2008, pp. 1247–1250.
[9] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. “Translating embeddings for modeling multi-relational data”. In: Proceedings of the 27th International Conference on Neural Information Processing Systems 26 (2013), pp. 2787–2795.
[10] Samuel Bowman, Gabor Angeli, Christopher Potts, and Christopher D Manning. “A large annotated corpus for learning natural language inference”. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. 2015, pp. 632–642.
[11] Liwei Cai and William Yang Wang. “KBGAN: Adversarial Learning for Knowledge Graph Embeddings”. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 2018, pp. 1470–1480.
[12] Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li. “Learning to rank: from pairwise approach to listwise approach”. In: Proceedings of the 24th international conference on Machine learning. 2007, pp. 129–136.
[13] Zongsheng Cao, Qianqian Xu, Zhiyong Yang, Xiaochun Cao, and Qingming Huang. “Dual quaternion knowledge graph embeddings”. In: Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence. 2021, pp. 6894–6902.
[14] Zongsheng Cao, Qianqian Xu, Zhiyong Yang, Xiaochun Cao, and Qingming Huang. “Geometry Interaction Knowledge Graph Embeddings”. In: Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence. 2022.
[15] Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam R Hruschka, and Tom M Mitchell. “Toward an architecture for never-ending language learning”. In: Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence. 2010, pp. 1306–1313.
[16] Ines Chami, Adva Wolf, Da-Cheng Juan, Frederic Sala, Sujith Ravi, and Christopher Ré. “Low-Dimensional Hyperbolic Knowledge Graph Embeddings”. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020, pp. 6901–6914.
[17] Linlin Chao, Jianshan He, Taifeng Wang, and Wei Chu. “PairRE: Knowledge Graph Embeddings via Paired Relation Vectors”. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021, pp. 4360–4369.
[18] Linlin Chao, Jianshan He, Taifeng Wang, and Wei Chu. “PairRE: Knowledge Graph Embeddings via Paired Relation Vectors”. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021, pp. 4360–4369.
[19] Feihu Che, Dawei Zhang, Jianhua Tao, Mingyue Niu, and Bocheng Zhao. “Parame: Regarding neural network parameters as relation embeddings for knowledge graph completion”. In: Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence. Vol. 34. 2020, pp. 2774–2781.
[20] Muhao Chen, Yingtao Tian, Mohan Yang, and Carlo Zaniolo. “Multilingual Knowledge Graph Embeddings for Cross-lingual Knowledge Alignment”. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence. 2017.
[21] Muhao Chen, Yingtao Tian, Mohan Yang, and Carlo Zaniolo. “Multilingual knowledge graph embeddings for cross-lingual knowledge alignment”. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence. 2017, pp. 1511–1517.
[22] Tianqi Chen and Carlos Guestrin. “XGBoost: A scalable tree boosting system”. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016.
[23] Xuelu Chen, Muhao Chen, Changjun Fan, Ankith Uppunda, Yizhou Sun, and Carlo Zaniolo. “Multilingual Knowledge Graph Completion via Ensemble Knowledge Transfer”. In: Findings of the Association for Computational Linguistics: EMNLP 2020. 2020, pp. 3227–3238.
[24] Yao Chen, Jiangang Liu, Zhe Zhang, Shiping Wen, and Wenjun Xiong. “MöbiusE: Knowledge Graph Embedding on Möbius Ring”. In: Knowledge-Based Systems 227 (2021), p. 107181.
[25] Yihong Chen, Pasquale Minervini, Sebastian Riedel, and Pontus Stenetorp. “Relation Prediction as an Auxiliary Training Objective for Improving Multi-Relational Graph Representations”. In: Proceedings of the 3rd Conference on Automated Knowledge Base Construction. 2021.
[26] Yihong Chen, Pasquale Minervini, Sebastian Riedel, and Pontus Stenetorp. “Relation Prediction as an Auxiliary Training Objective for Improving Multi-Relational Graph Representations”. In: 3rd Conference on Automated Knowledge Base Construction. 2021.
[27] Eunsol Choi, Omer Levy, Yejin Choi, and Luke Zettlemoyer. “Ultra-Fine Entity Typing”. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2018, pp. 87–96.
[28] Gordon V Cormack, Charles LA Clarke, and Stefan Buettcher. “Reciprocal rank fusion outperforms condorcet and individual rank learning methods”. In: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval. 2009, pp. 758–759.
[29] Zijun Cui, Pavan Kapanipathi, Kartik Talamadupula, Tian Gao, and Qiang Ji. “Type-augmented relation prediction in knowledge graphs”. In: Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence. 2021.
[30] Aron Culotta and Jeffrey Sorensen. “Dependency tree kernels for relation extraction”. In: Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics. 2004.
[31] Dai Quoc Nguyen, Tu Dinh Nguyen, Dat Quoc Nguyen, and Dinh Phung. “A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network”. In: Proceedings of NAACL-HLT. 2018, pp. 327–333.
[32] Tim Dettmers, Pasquale Minervini, Pontus Stenetorp, and Sebastian Riedel. “Convolutional 2D knowledge graph embeddings”. In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence. 2018, pp. 1811–1818.
[33] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019, pp. 4171–4186.
[34] Boyang Ding, Quan Wang, Bin Wang, and Li Guo. “Improving knowledge graph embedding using simple constraints”. In: arXiv preprint arXiv:1805.02408 (2018).
[35] Xin Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, and Wei Zhang. “Knowledge vault: A web-scale approach to probabilistic knowledge fusion”. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2014, pp. 601–610.
[36] Zhengxiao Du, Chang Zhou, Jiangchao Yao, Teng Tu, Letian Cheng, Hongxia Yang, Jingren Zhou, and Jie Tang. “CogKR: Cognitive Graph for Multi-hop Knowledge Reasoning”. In: IEEE Transactions on Knowledge and Data Engineering (2021).
[37] Greg Durrett and Dan Klein. “A Joint Model for Entity Analysis: Coreference, Typing, and Linking”. In: Transactions of the Association for Computational Linguistics 2 (2014), pp. 477–490.
[38] Takuma Ebisu and Ryutaro Ichise. “Generalized Translation-Based Embedding of Knowledge Graph”. In: IEEE Transactions on Knowledge and Data Engineering 32.5 (2019), pp. 941–951.
[39] Takuma Ebisu and Ryutaro Ichise. “Generalized translation-based embedding of knowledge graph”. In: IEEE Transactions on Knowledge and Data Engineering 32.5 (2019), pp. 941–951.
[40] Takuma Ebisu and Ryutaro Ichise. “Toruse: Knowledge graph embedding on a lie group”. In: Proceeding of the Thirty-second AAAI conference on artificial intelligence. 2018.
[41] Miao Fan, Qiang Zhou, Emily Chang, and Fang Zheng. “Transition-based knowledge graph embedding with relational mapping properties”. In: Proceedings of the 28th Pacific Asia conference on Language, Information and Computing. 2014, pp. 328–337.
[42] Miao Fan, Qiang Zhou, Thomas Fang Zheng, and Ralph Grishman. “Distributed representation learning for knowledge graphs with entity descriptions”. In: Pattern Recognition Letters (2017).
[43] Jun Feng, Minlie Huang, Mingdong Wang, Mantong Zhou, Yu Hao, and Xiaoyan Zhu. “Knowledge graph embedding by flexible translation”. In: Fifteenth International Conference on the Principles of Knowledge Representation and Reasoning. 2016.
[44] Aldo Gangemi, Andrea Giovanni Nuzzolese, Valentina Presutti, Francesco Draicchio, Alberto Musetti, and Paolo Ciancarini. “Automatic typing of DBpedia entities”. In: Proceeding of the International Semantic Web Conference. 2012.
[45] Chang Gao, Chengjie Sun, Lili Shan, Lei Lin, and Mingjiang Wang. “Rotate3d: Representing relations as rotations in three-dimensional space for knowledge graph embedding”. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 2020, pp. 385–394.
[46] Alberto García-Durán, Antoine Bordes, and Nicolas Usunier. “Composing Relationships with Translations”. In: Conference on Empirical Methods in Natural Language Processing (EMNLP 2015). 2015, pp. 286–290.
[47] Bhaskar Gautam, Oriol Ramos Terrades, Joana Maria Pujadas-Mora, and Miquel Valls. “Knowledge graph based methods for record linkage”. In: Pattern Recognition Letters (2020).
[48] Congcong Ge, Xiaoze Liu, Lu Chen, Yunjun Gao, and Baihua Zheng. “LargeEA: aligning entities for large-scale knowledge graphs”. In: Proceedings of the VLDB Endowment 15.2 (2021), pp. 237–245.
[49] Xiou Ge, Yun Cheng Wang, Bin Wang, C-C Jay Kuo, et al. “TypeEA: Type-Associated Embedding for Knowledge Graph Entity Alignment”. In: APSIPA Transactions on Signal and Information Processing 12.1 (2023).
[50] Xiou Ge, Yun-Cheng Wang, Bin Wang, and C-C Jay Kuo. “CompoundE: Knowledge graph embedding with translation, rotation and scaling compound operations”. In: arXiv preprint arXiv:2207.05324 (2022).
[51] Xiou Ge, Yun-Cheng Wang, Bin Wang, and C-C Jay Kuo. “Knowledge Graph Embedding with 3D Compound Geometric Transformations”. In: arXiv preprint arXiv:2304.00378 (2023).
[52] Xiou Ge, Yun-Cheng Wang, Bin Wang, and C-C Jay Kuo. “CORE: A knowledge graph entity type prediction method via complex space regression and embedding”. In: Pattern Recognition Letters 157 (2022), pp. 97–103.
[53] Niannian Guan, Dandan Song, and Lejian Liao. “Knowledge graph embedding with concepts”. In: Knowledge-Based Systems 164 (2019), pp. 38–44.
[54] Jia Guo and Stanley Kok. “BiQUE: Biquaternionic Embeddings of Knowledge Graphs”. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2021, pp. 8338–8351.
[55] Shu Guo, Quan Wang, Lihong Wang, Bin Wang, and Li Guo. “Knowledge graph embedding with iterative guidance from soft rules”. In: Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 32. 1. 2018.
[56] Nitish Gupta, Sameer Singh, and Dan Roth. “Entity linking via joint encoding of types, descriptions, and context”. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 2017, pp. 2681–2690.
[57] Kelvin Guu, John Miller, and Percy Liang. “Traversing Knowledge Graphs in Vector Space”. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. 2015, pp. 318–327.
[58] Junheng Hao, Muhao Chen, Wenchao Yu, Yizhou Sun, and Wei Wang. “Universal representation learning of knowledge bases by jointly embedding instances and ontological concepts”. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2019.
[59] He He, Anusha Balakrishnan, Mihail Eric, and Percy Liang. “Learning Symmetric Collaborative Dialogue Agents with Dynamic Knowledge Graph Embeddings”. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2017, pp. 1766–1776.
[60] Shizhu He, Kang Liu, Guoliang Ji, and Jun Zhao. “Learning to represent knowledge graphs with gaussian embedding”. In: Proceedings of the 24th ACM international on conference on information and knowledge management. 2015, pp. 623–632.
[61] Elad Hoffer and Nir Ailon. “Deep metric learning using triplet network”. In: Similarity-Based Pattern Recognition: Third International Workshop, SIMBAD 2015, Copenhagen, Denmark, October 12-14, 2015. Proceedings 3. Springer. 2015, pp. 84–92.
[62] Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. “Open graph benchmark: datasets for machine learning on graphs”. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. 2020, pp. 22118–22133.
[63] Hao Huang, Guodong Long, Tao Shen, Jing Jiang, and Chengqi Zhang. “RatE: Relation-Adaptive Translating Embedding for Knowledge Graph Completion”. In: Proceedings of the 28th International Conference on Computational Linguistics. 2020, pp. 556–567.
[64] Xuqian Huang, Jiuyang Tang, Zhen Tan, Weixin Zeng, Ji Wang, and Xiang Zhao. “Knowledge graph embedding by relational and entity rotation”. In: Knowledge-Based Systems 229 (2021), p. 107310.
[65] Ihab F Ilyas, Theodoros Rekatsinas, Vishnu Konda, Jeffrey Pound, Xiaoguang Qi, and Mohamed Soliman. “Saga: A Platform for Continuous Construction and Serving of Knowledge At Scale”. In: Proceedings of the 2022 International Conference on Management of Data. 2022, pp. 2259–2272.
[66] Nitisha Jain, Jan-Christoph Kalo, Wolf-Tilo Balke, and Ralf Krestel. “Do embeddings actually capture knowledge graph semantics?” In: Proceedings of the European Semantic Web Conference. 2021.
[67] Prachi Jain, Pankaj Kumar, Soumen Chakrabarti, et al. “Type-sensitive knowledge base inference without explicit type supervision”. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. 2018.
[68] Guoliang Ji, Shizhu He, Liheng Xu, Kang Liu, and Jun Zhao. “Knowledge graph embedding via dynamic mapping matrix”. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (volume 1: Long papers). Vol. 1. 2015, pp. 687–696.
[69] Guoliang Ji, Kang Liu, Shizhu He, and Jun Zhao. “Knowledge graph completion with adaptive sparse transfer matrix”. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. 2016, pp. 985–991.
[70] Shaoxiong Ji, Shirui Pan, Erik Cambria, Pekka Marttinen, and S Yu Philip. “A survey on knowledge graphs: Representation, acquisition, and applications”. In: IEEE Transactions on Neural Networks and Learning Systems 33.2 (2021), pp. 494–514.
[71] Yantao Jia, Yuanzhuo Wang, Hailun Lin, Xiaolong Jin, and Xueqi Cheng. “Locally adaptive translation for knowledge graph embedding”. In: Proceeding of the Thirtieth AAAI conference on artificial intelligence. 2016.
[72] Hailong Jin, Lei Hou, Juanzi Li, and Tiansi Dong. “Attributed and predictive entity embedding for fine-grained entity typing in knowledge bases”. In: Proceedings of the 27th International Conference on Computational Linguistics. 2018, pp. 282–292.
[73] Hailong Jin, Lei Hou, Juanzi Li, and Tiansi Dong. “Fine-grained entity typing via hierarchical multi graph convolutional networks”. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019, pp. 4969–4978.
[74] Unmesh Joshi and Jacopo Urbani. “Ensemble-Based Fact Classification with Knowledge Graph Embeddings”. In: The Semantic Web: 19th International Conference, ESWC 2022, Hersonissos, Crete, Greece, May 29–June 2, 2022, Proceedings. 2022, pp. 147–164.
[75] Seyed Mehran Kazemi and David Poole. “Simple embedding for link prediction in knowledge graphs”. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems 31 (2018), pp. 4289–4300.
[76] Diederik P Kingma and Jimmy Ba. “Adam: A method for stochastic optimization”. In: arXiv preprint arXiv:1412.6980 (2014).
[77] Alexandre Klementiev, Dan Roth, and Kevin Small. “Unsupervised rank aggregation with distance-based models”. In: Proceedings of the 25th international conference on Machine learning. 2008, pp. 472–479.
[78] Prodromos Kolyvakis, Alexandros Kalousis, and Dimitris Kiritsis. “Hyperbolic knowledge graph embeddings for knowledge base completion”. In: Proceedings of the European Semantic Web Conference. Springer. 2020, pp. 199–214.
[79] C-C Jay Kuo and Azad M Madni. “Green learning: Introduction, examples and outlook”. In: J. Vis. Commun. Image. Represent. (2022), p. 103685.
[80] Simon Lacoste-Julien, Konstantina Palla, Alex Davies, Gjergji Kasneci, Thore Graepel, and Zoubin Ghahramani. “Sigma: Simple greedy matching for aligning large knowledge bases”. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2013.
[81] Timothée Lacroix, Nicolas Usunier, and Guillaume Obozinski. “Canonical tensor decomposition for knowledge base completion”. In: International Conference on Machine Learning. PMLR. 2018, pp. 2863–2872.
[82] Ni Lao, Tom Mitchell, and William Cohen. “Random walk inference and learning in a large scale knowledge base”. In: Proceedings of the 2011 conference on empirical methods in natural language processing. 2011, pp. 529–539.
[83] Steven M LaValle. Planning Algorithms. Cambridge University Press, 2006.
[84] Jiayi Li and Yujiu Yang. “STaR: Knowledge Graph Embedding by Scaling, Translation and Rotation”. In: Proc. 11th Artif. Intell. Mob. Serv. (AIMS 2022). Springer. 2022, pp. 31–45.
[85] Rui Li, Jianan Zhao, Chaozhuo Li, Di He, Yiqi Wang, Yuming Liu, Hao Sun, Senzhang Wang, Weiwei Deng, Yanming Shen, et al. “House: Knowledge graph embedding with householder parameterization”. In: International Conference on Machine Learning. PMLR. 2022, pp. 13209–13224.
[86] Yizhi Li, Wei Fan, Chao Liu, Chenghua Lin, and Jiang Qian. “TranSHER: Translating Knowledge Graph Embedding with Hyper-Ellipsoidal Restriction”. In: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Dec. 2022, pp. 8517–8528.
[87] Zongwei Liang, Junan Yang, Hui Liu, and Keju Huang. “A semantic filter based on relations for knowledge graph completion”. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2021, pp. 7920–7929.
[88] Yankai Lin, Zhiyuan Liu, Huanbo Luan, Maosong Sun, Siwei Rao, and Song Liu. “Modeling Relation Paths for Representation Learning of Knowledge Bases”. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. 2015, pp. 705–714.
[89] Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. “Learning entity and relation embeddings for knowledge graph completion”. In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence. 2015, pp. 2181–2187.
[90] Fangyu Liu, Muhao Chen, Dan Roth, and Nigel Collier. “Visual Pivoting for (Unsupervised) Entity Alignment”. In: Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence. 2021.
[91] Hanxiao Liu, Yuexin Wu, and Yiming Yang. “Analogical inference for multi-relational embeddings”. In: Proceedings of the 34th International Conference on International Conference on Machine Learning. Vol. 70. 2017, pp. 2168–2178.
[92] Yuzhang Liu, Peng Wang, Yingtai Li, Yizhan Shao, and Zhongkai Xu. “AprilE: attention with pseudo residual connection for knowledge graph embedding”. In: Proceedings of the 28th International Conference on Computational Linguistics. 2020, pp. 508–518.
[93] Guoming Lu, Lizong Zhang, Minjie Jin, Pancheng Li, and Xi Huang. “Entity alignment via knowledge embedding and type matching constraints for knowledge graph inference”. In: Journal of Ambient Intelligence and Humanized Computing (2021), pp. 1–11.
[94] Haonan Lu, Hailin Hu, and Xiaodong Lin. “DensE: An enhanced non-commutative representation for knowledge graph embedding with adaptive semantic hierarchy”. In: Neurocomputing 476 (2022), pp. 115–125.
[95] Xin Lv, Lei Hou, Juanzi Li, and Zhiyuan Liu. “Differentiating Concepts and Instances for Knowledge Graph Embedding”. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018, pp. 1971–1979.
149
[96] XinLv, Yankai Lin, Yixin Cao, LeiHou,JuanziLi,ZhiyuanLiu,PengLi,andJieZhou.“Do
pre-trained models benefit knowledgegraphcompletion?areliableevaluationandareasonable
approach”. In: Association for ComputationalLinguistics.2022.
[97] Shiheng Ma, Jianhui Ding, WeijiaJia,KunWang,andMinyiGuo.“Transt:Type-basedmultiple
embeddingrepresentationsfor knowledgegraphcompletion”.In: Proceedingofthe JointEuropean
Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD). Springer.
2017, pp. 717–733.
[98] Farzaneh Mahdisoltani, Joanna Biega,andFabianSuchanek.“Yago3:Aknowledgebasefrom
multilingual wikipedias”. In: Proc. 7th Biennial Conf. Innov. Data Syst. Research. CIDR Conference.
2014.
[99] GeorgeAMiller.“WordNet:alexicaldatabaseforEnglish”.In:CommunicationsoftheACM 38.11
(1995), pp. 39–41.
[100] Changsung Moon, Paul Jones, NagizaFSamatova,etal.“Learningentitytypeembeddingsfor
knowledgegraph completion”.In:Proceedingsofthe26thACMonConferenceonInformationand
Knowledge Management. 2017.
[101] Deepak Nathani, Jatin Chauhan, CharuSharma,andManoharKaul.“LearningAttention-based
Embeddings for Relation PredictioninKnowledgeGraphs”.In:Proceedingsofthe57thAnnual
Meeting of the Association for ComputationalLinguistics.2019,pp.4710–4723.
[102] Mojtaba Nayyeri, Chengjin Xu, FrancaHoffmann,MirzaMohtashimAlam,JensLehmann,and
SaharVahdati.“KnowledgeGraphRepresentationLearning usingOrdinary DifferentialEquations”.
In:Proceedings of the 2021 ConferenceonEmpiricalMethodsinNaturalLanguageProcessing.
2021, pp. 9529–9548.
[103] Mojtaba Nayyeri, Chengjin Xu, SaharVahdati,NadezhdaVassilyeva,EmanuelSallinger,
Hamed Shariat Yazdi, and Jens Lehmann.“Fantasticknowledgegraphembeddingsandhowto find
theright space for them”. In: ProceedingoftheInternationalSemanticWebConference.Springer.
2020, pp. 438–455.
[104] Dat Quoc Nguyen, Kairit Sirts, Lizhen Qu, and Mark Johnson. “STransE: a novel embedding model
ofentities and relationships in knowledgebases”.In:Proceedingsofthe2016Conferenceofthe
North American Chapter of the AssociationforComputationalLinguistics:HumanLanguage
Technologies. 2016, pp. 460–466.
[105] MaximilianNickel,VolkerTresp,andHans-PeterKriegel.“Athree-waymodelforcollectivelearning
onmulti-relational data”. In:Proceedingsofthe28thInternationalConferenceonInternational
Conference on Machine Learning.2011,pp.809–816.
[106] Nickel, Maximilian others, LorenzoRosasco,andTomasoPoggio.“Holographicembeddings of
knowledge graphs”. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence . 2016.
[107] Guanglin Niu,BoLi, YongfeiZhang, Shiliang Pu, and Jingyang Li. “AutoETER: Automated Entity
Type Representation for KnowledgeGraphEmbedding”.In:FindingsoftheAssociationfor
Computational Linguistics: EMNLP2020.2020,pp.1172–1181.
150
[108] ZhePan and Peng Wang. “HyperbolicHierarchy-AwareKnowledgeGraphEmbeddingforLink
Prediction”. In: Findings of the AssociationforComputationalLinguistics:EMNLP2021.2021,
pp.2941–2948.
[109] Heiko Paulheim and Christian Bizer.“TypeinferenceonnoisyRDFdata”.In:Proceedingof the
International Semantic Web Conference.2013.
[110] Yanhui Peng and Jing Zhang. “Lineare:Simplebutpowerfulknowledgegraphembeddingfor link
prediction”. In: Proceedings of the2020IEEEInternationalConferenceonDataMining(ICDM).
IEEE. 2020, pp. 422–431.
[111] William K Pratt.Introduction to digitalimageprocessing.CRCpress,2013.
[112] WeiQian, Cong Fu, Yu Zhu, DengCai,andXiaofeiHe.“TranslatingEmbeddingsforKnowledge
Graph Completion with Relation AttentionMechanism.”In:Proceedingsofthe27thInternational
JointConference on Artificial Intelligence .2018,pp.4286–4292.
[113] Colin Raffel, Noam Shazeer, AdamRoberts,KatherineLee,SharanNarang,MichaelMatena,
Yanqi Zhou, Wei Li, and Peter J Liu.“Exploringthelimitsoftransferlearningwithaunified
text-to-text transformer”. In: The Journal of Machine Learning Research 21.1 (2020), pp. 5485–5551.
[114] YvesRaimond, Christopher Sutton,andMarkBSandler.“Automaticinterlinkingofmusicdatasets
onthe semantic web”. In:LDOW.2008.
[115] NilsReimers and Iryna Gurevych.“Sentence-BERT:SentenceEmbeddingsusingSiamese
BERT-Networks”. In: Proceedingsofthe2019ConferenceonEmpiricalMethodsinNatural
Language Processing and the 9thInternationalJointConferenceonNaturalLanguageProcessing
(EMNLP-IJCNLP). 2019, pp. 3982–3992.
[116] Feiliang Ren, Juchen Li, Huihui Zhang,ShileiLiu,BochaoLi,RuichengMing,andYujiaBai.
“Knowledge Graph Embedding with Atrous Convolution and Residual Learning”. In: Proceedings of
the28th International ConferenceonComputationalLinguistics.2020,pp.1532–1543.
[117] Afshin Sadeghi, Damien Graux, Hamed Shariat Yazdi, and Jens Lehmann. “MDE: Multiple Distance
Embeddings for Link Prediction inKnowledgeGraphs”.In:ECAI2020.IOSPress,2020,
pp.1427–1434.
[118] Shengtian Sang, Zhihao Yang, LeiWang,XiaoxiaLiu,HongfeiLin,andJianWang.“SemaTyP: a
knowledge graph based literature mining method for drug discovery”. In: BMC Bioinform. 19 (2018),
pp.1–11.
[119] Apoorv Saxena, Adrian Kochsiek,andRainerGemulla.“Sequence-to-SequenceKnowledgeGraph
Completion and Question Answering”.In:Proceedingsofthe60thAnnualMeetingofthe
Association for Computational Linguistics.2022.
[120] Michael Schlichtkrull, Thomas NKipf,PeterBloem,RiannevandenBerg,IvanTitov,and
Max Welling. “Modeling relational data with graph convolutional networks”. In: Proceedings of the
European Semantic Web Conference.Springer.2018,pp.593–607.
151
[121] Steven M Seitz and Charles R Dyer.“Viewmorphing”.In:Proceedingofthe23rdAnnual
Conference on Computer GraphicsandInteractiveTechniques.1996,pp.21–30.
[122] Chao Shang, Yun Tang, Jing Huang,JinboBi,XiaodongHe,andBowenZhou.“End-to-end
structure-aware convolutional networksforknowledgebasecompletion”.In:Proceedingsofthe
Thirty-Third AAAI Conference on ArtificialIntelligence .Vol.33.2019,pp.3060–3067.
[123] Ying Shen, Ning Ding, Hai-Tao Zheng,YaliangLi,andMinYang.“Modelingrelationpathsfor
knowledge graph completion”. In: IEEETransactionsonKnowledgeandDataEngineering33.11
(2020), pp. 3607–3617.
[124] Richard Socher, Danqi Chen, ChristopherDManning,andAndrewNg.“Reasoningwithneural
tensor networks for knowledge basecompletion”.In:Proceedingsofthe27thInternational
Conference on Neural InformationProcessingSystems26(2013).
[125] Radina Sofronova, Russa Biswas,MehwishAlam,andHaraldSack.“Entitytypingbasedon
RDF2Vecusingsupervisedandunsupervisedmethods”.In: Proceedingsofthe EuropeanSemantic
WebConference. 2020.
[126] Tengwei Song, Jie Luo, and Lei Huang.“Rot-pro:Modelingtransitivitybyprojectioninknowledge
graph embedding”. In: Advances inNeuralInformationProcessingSystems34(2021),
pp.24695–24706.
[127] Robyn Speer, Joshua Chin, and CatherineHavasi.“ConceptNet5.5:Anopenmultilingualgraph of
general knowledge”. In:ProceedingsoftheThirty-FirstAAAIConferenceonArtificialIntelligence .
2017, pp. 4444–4451.
[128] Fabian M Suchanek, Serge Abiteboul,andPierreSenellart.“Paris:Probabilisticalignmentof
relations, instances, and schema”.In:ProceedingsoftheVLDBEndowment (2011).
[129] FabianMSuchanek,GjergjiKasneci,andGerhardWeikum.“Yago:acoreofsemanticknowledge”.
In:Proceedings of the 16th InternationalConferenceonWorldWideWeb.2007,pp.697–706.
[130] Zequn Sun, Wei Hu, and ChengkaiLi.“Cross-LingualEntityAlignmentviaJoint
Attribute-Preserving Embedding”.In:ProceedingoftheInternationalSemanticWebConference.
2017.
[131] Zequn Sun, Wei Hu, Qingheng Zhang,andYuzhongQu.“BootstrappingEntityAlignmentwith
Knowledge Graph Embedding”. In:Proceedingsofthe27thInternationalJointConferenceon
Artificial Intelligence . 2018.
[132] Zequn Sun, Qingheng Zhang, WeiHu,ChengmingWang,MuhaoChen,FarahnazAkrami,and
Chengkai Li. “A Benchmarking StudyofEmbedding-basedEntityAlignmentforKnowledge
Graphs”. In: Proceedings of the VLDBEndowment 13.11(2020).
[133] Zhiqing Sun, Zhi-Hong Deng, Jian-Yun Nie, and Jian Tang. “RotatE: Knowledge Graph Embedding
byRelational Rotation in ComplexSpace”.In:ProceedingsoftheInternationalConferenceon
Learning Representations (ICLR)2019.2019,pp.1–18.
152
[134] Zhiqing Sun, Zhi-Hong Deng, Jian-Yun Nie, and Jian Tang. “Rotate: Knowledge graph embedding by
relational rotation in complex space”.In: ProceedingsoftheInternationalConferenceonLearning
Representations (ICLR) 2019(2019).
[135] Zhen Tan, Xiang Zhao, and Wei Wang. “Representation learning of large-scale knowledge graphs via
entity feature combinations”. In: Proceedingsofthe26thACMonConferenceonInformation and
Knowledge Management. 2017, pp.1777–1786.
[136] Xiaobin Tang, Jing Zhang, Bo Chen,YangYang,HongChen,andCuipingLi.“BERT-INT:A
BERT-based Interaction Model ForKnowledgeGraphAlignment”.In:Proceedingsofthe29th
International Joint Conference onArtificialIntelligence .2020.
[137] YunTang, Jing Huang, GuangtaoWang,XiaodongHe,andBowenZhou.“OrthogonalRelation
Transforms with Graph Context Modeling for Knowledge Graph Embedding”. In: Proceedings of the
58th Annual Meeting of the AssociationforComputationalLinguistics.2020,pp.2713–2722.
[138] Yi Tay, Anh Tuan Luu, and Siu Cheung Hui. “Non-parametric estimation of multiple embeddings for
link prediction on dynamic knowledge graphs”. In: Proceeding of the Thirty-first AAAI conference on
artificial intelligence . 2017.
[139] Bayu Distiawan Trisedya, JianzhongQi,andRuiZhang.“Entityalignmentbetweenknowledge
graphs using attribute embeddings”.In: ProceedingsoftheThirty-ThirdAAAIConferenceon
Artificial Intelligence . 2019.
[140] Théo Trouillon, Johannes Welbl, SebastianRiedel,ÉricGaussier,andGuillaumeBouchard.
“Complex embeddings for simplelinkprediction”.In:Proceedingsofthe33rdInternational
Conference on International ConferenceonMachineLearning.Vol.48.2016,pp.2071–2080.
[141] Merijn Van Erp and Lambert Schomaker. “Variants of the borda count method for combining ranked
classifier hypotheses”. In: 7th InternationalWorkshoponfrontiersinhandwritingrecognition.
International Unipen Foundation.2000,pp.443–452.
[142] Shikhar Vashishth, Soumya Sanyal,VikramNitin,NileshAgrawal,andParthaTalukdar.“Interacte:
Improving convolution-based knowledgegraphembeddingsbyincreasingfeatureinteractions”. In:
Proceedings of the Thirty-Fourth AAAIConferenceonArtificialIntelligence .Vol.34.2020,
pp.3009–3016.
[143] Shikhar Vashishth, Soumya Sanyal,VikramNitin,andParthaTalukdar.“Composition-based
Multi-Relational Graph ConvolutionalNetworks”.In:ProceedingsoftheInternationalConference
onLearning Representations (ICLR)2020.2020.
[144] Denny Vrandečić and Markus Krötzsch.“Wikidata:afreecollaborativeknowledgebase”.In:
Communications of the ACM 57.10(2014),pp.78–85.
[145] BinWang, Guangtao Wang, Jing Huang,JiaxuanYou,JureLeskovec,andC-CJayKuo.“Inductive
learning on commonsense knowledge graph completion”. In: 2021 International Joint Conference on
Neural Networks (IJCNN). IEEE. 2021,pp.1–8.
153
[146] Bo Wang, Tao Shen, Guodong Long, Tianyi Zhou, Ying Wang, and Yi Chang. “Structure-augmented
textrepresentation learning for efficientknowledgegraphcompletion”.In: Proceedingsofthe Web
Conference 2021. 2021, pp. 1737–1748.
[147] Hongwei Wang, Fuzheng Zhang, JialinWang,MiaoZhao,WenjieLi,XingXie,andMinyiGuo.
“Ripplenet: Propagating user preferencesontheknowledgegraphforrecommendersystems”. In:
Proceedings of the 27th ACM internationalconferenceoninformationandknowledgemanagement.
2018, pp. 417–426.
[148] KaiWang, Yu Liu, Dan Lin, andMichaelSheng.“HyperbolicGeometryisNotNecessary:
Lightweight Euclidean-Based ModelsforLow-DimensionalKnowledgeGraphEmbeddings”. In:
Findings of the Association for ComputationalLinguistics:EMNLP2021.2021,pp.464–474.
[149] LiangWang,WeiZhao,ZhuoyuWei, andJingming Liu.“SimKGC: SimpleContrastive Knowledge
Graph Completion with Pre-trained Language Models”. In: Proceedings of the 60th Annual Meeting
ofthe Association for ComputationalLinguistics(Volume1:LongPapers).2022,pp.4281–4294.
[150] Quan Wang, Zhendong Mao, Bin Wang,andLiGuo.“Knowledgegraphembedding:Asurvey of
approaches and applications”. In: IEEETransactionsonKnowledgeandDataEngineering29.12
(2017), pp. 2724–2743.
[151] Shensi Wang, Kun Fu, Xian Sun, Zequn Zhang, Shuchao Li, and Li Jin. “Hierarchical-aware relation
rotational knowledge graph embeddingforlinkprediction”.In:Neurocomputing458(2021),
pp.259–270.
[152] XiaozhiWang,TianyuGao,ZhaochengZhu,ZhengyanZhang,ZhiyuanLiu,JuanziLi,andJianTang.
“KEPLER: A unified model for knowledgeembeddingandpre-trainedlanguagerepresentation”. In:
Transactions of the Association forComputationalLinguistics9(2021),pp.176–194.
[153] Yun-Cheng Wang, Xiou Ge, Bin Wang,andC-CJayKuo.“Greenkgc:Alightweightknowledge
graph completion method”. In:arXivpreprintarXiv:2208.09137 (2022).
[154] Yun-Cheng Wang, Xiou Ge, Bin Wang,andC-CJayKuo.“KGBoost:Aclassification-based
knowledge base completion methodwithnegativesampling”.In:PatternRecognitionLetters157
(2022), pp. 104–111.
[155] Zhen Wang, Jianwen Zhang, JianlinFeng,andZhengChen.“Knowledgegraphembeddingby
translating on hyperplanes”. In:ProceedingsoftheTwenty-EighthAAAIConferenceonArtificial
Intelligence. 2014, pp. 1112–1119.
[156] Zhichun Wang, Qingsong Lv, XiaohanLan,andYuZhang.“Cross-lingualKnowledgeGraph
Alignment via Graph ConvolutionalNetworks”.In:EMNLP.2018.
[157] JWESTON.“SupportVectorMachinesforMulti-ClassPatternRecognition”.In:Proc.7thEuropean
Symposium on Artificial Neural Networks .1999.
[158] George Wolberg.Digital image warping.Vol.10662.IEEEComputerSocietyPressLosAlamitos,
CA,1990.
154
[159] Yuting Wu, Xiao Liu, Yansong Feng,ZhengWang,RuiYan,andDongyanZhao.“Relation-aware
entity alignment for heterogeneousknowledgegraphs”.In:Proceedingsofthe28thInternational
JointConference on Artificial Intelligence (2019).
[160] HanXiao, Minlie Huang, and XiaoyanZhu.“Fromonepointtoamanifold:knowledgegraph
embeddingforpreciselinkprediction”.In:Proceedingsofthe25thInternationalJointConferenceon
Artificial Intelligence . 2016, pp. 1315–1321.
[161] HanXiao, Minlie Huang, and XiaoyanZhu.“TransG:AGenerativeModelforKnowledgeGraph
Embedding”. In:Proceedings of the54thAnnualMeetingoftheAssociationforComputational
Linguistics (Volume 1: Long Papers).2016,pp.2316–2325.
[162] Qizhe Xie, Xuezhe Ma, Zihang Dai, and Eduard Hovy. “An Interpretable Knowledge Transfer Model
for Knowledge Base Completion”. In: Proceedings of the 55th Annual Meeting of the Association for
Computational Linguistics (Volume1:LongPapers).2017,pp.950–962.
[163] RuobingXie,ZhiyuanLiu,MaosongSun,etal.“Representationlearningofknowledgegraphs with
hierarchical types”. In:Proceedingsofthe25thInternationalJointConferenceonArtificial
Intelligence. 2016.
[164] ZhiwenXie,Guangyou Zhou,JinLiu,and XiangjiHuang.“ReInceptionE:relation-aware inception
network with joint local-global structuralinformationforknowledgegraphembedding”.In:
Proceedings of the 58th Annual MeetingoftheAssociationforComputationalLinguistics.2020,
pp.5929–5939.
[165] JiXin, Yankai Lin, Zhiyuan Liu, andMaosongSun.“Improvingneuralfine-grainedentitytyping
with knowledge attention”. In:ProceedingsoftheThirty-SecondAAAIConferenceonArtificial
Intelligence. 2018.
[166] Wenhan Xiong, Thien Hoang, andWilliamYangWang.“DeepPath:AReinforcementLearning
Method for Knowledge Graph Reasoning”.In:Proceedingsofthe2017ConferenceonEmpirical
Methods in Natural Language Processing.2017,pp.564–573.
[167] Canran Xu and Ruijiang Li. “RelationEmbeddingwithDihedralGroupinKnowledgeGraph”. In:
Proceedings of the 57th Annual MeetingoftheAssociationforComputationalLinguistics.2019,
pp.263–272.
[168] Chengjin Xu, Mojtaba Nayyeri, Yung-Yu Chen, and Jens Lehmann. “Knowledge Graph Embeddings
inGeometric Algebras”. In: Proceedingsofthe28thInternationalConferenceonComputational
Linguistics. 2020, pp. 530–544.
[169] Chengjin Xu, Mojtaba Nayyeri, Sahar Vahdati, and Jens Lehmann. “Multiple run ensemble learning
withlow-dimensional knowledge graphembeddings”.In: 2021InternationalJointConference on
Neural Networks (IJCNN). IEEE. 2021,pp.1–8.
[170] Chengjin Xu, Fenglong Su, and JensLehmann.“Time-awareGraphNeuralNetworkforEntity
Alignment between Temporal KnowledgeGraphs”.In: Proceedingsofthe2021Conferenceon
Empirical Methods in Natural LanguageProcessing.2021,pp.8999–9010.
155
[171] Wentao Xu, Shun Zheng, Liang He,BinShao,JianYin,andTie-YanLiu.“SEEK:Segmented
Embedding of Knowledge Graphs”.In:Proceedingsofthe58thAnnualMeetingoftheAssociation
forComputational Linguistics. 2020,pp.3888–3897.
[172] Yadollah Yaghoobzadeh, Heike Adel,andHinrichSchütze.“NoiseMitigationforNeuralEntity
Typing and Relation Extraction”. In: Proceedings of the 15th Conference of the European Chapter of
theAssociation for ComputationalLinguistics:Volume1,LongPapers.2017,pp.1183–1194.
[173] Yadollah Yaghoobzadeh and HinrichSchütze.“Corpus-levelFine-grainedEntityTypingUsing
Contextual Information”. In:Proceedingsofthe2015ConferenceonEmpiricalMethodsinNatural
Language Processing. 2015, pp. 715–725.
[174] Bishan Yang, Wen-tau Yih, XiaodongHe,JianfengGao,andLiDeng.“Embeddingentitiesand
relations for learning and inferenceinknowledgebases”.In:ProceedingsoftheInternational
Conference on Learning Representations(ICLR)2015.2015,pp.1–13.
[175] HanYang and Junfei Liu. “Knowledgegraphrepresentationlearningasgroupoid:unifyingTransE,
RotatE, QuatE, ComplEx”. In:Proceedingsofthe30thACMInternationalConferenceon
Information & Knowledge Management.2021,pp.2311–2320.
[176] Han Yang, Leilei Zhang, BingningWang, Ting Yao,and Junfei Liu. “Cycle or Minkowski: Which is
More Appropriate for Knowledge GraphEmbedding?”In:Proceedingsofthe30thACM
International Conference on Information&KnowledgeManagement.2021,pp.2301–2310.
[177] JinfaYang, Yongjie Shi, Xin Tong,RobinWang,TaiyanChen,andXianghuaYing.“Improving
knowledge graph embeddingusing affinetransformations ofentities corresponding toeach relation”.
In:Findings of the Association forComputationalLinguistics:EMNLP2021.2021,pp.508–517.
[178] Kai Yang, Shaoqin Liu, Junfeng Zhao, Yasha Wang, and Bing Xie. “Cotsae: Co-training of structure
andattribute embeddings for entityalignment”.In:ProceedingsoftheThirty-FourthAAAI
Conference on Artificial Intelligence .2020.
[179] Shihui Yang, Jidong Tian, HonglunZhang,JunchiYan,HaoHe,andYaohuiJin.“TransMS:
Knowledge Graph Embedding forComplexRelationsbyMultidirectionalSemantics.”In:
Proceedings of the 28th InternationalJointConferenceonArtificialIntelligence .2019,
pp.1935–1942.
[180] Tong Yang, Long Sha, and PengyuHong.“NagE:Non-Abeliangroupembeddingforknowledge
graphs”. In:Proceedings of the 29thACMInternationalConferenceonInformation&Knowledge
Management. 2020, pp. 1735–1742.
[181] Yijing Yang, Wei Wang, Hongyu Fu, C-C Jay Kuo, et al. “On supervised feature selection from high
dimensional feature spaces”. In:APSIPATransactionsonSignalandInformationProcessing11.1
(2022).
[182] LiangYao,ChengshengMao,andYuanLuo.“KG-BERT:BERTforknowledgegraphcompletion”.
In:arXiv preprint arXiv:1909.03193(2019).
156
[183] Hee-Geun Yoon, Hyun-Je Song, Seong-BaePark,andSe-YoungPark.“Atranslation-based
knowledge graph embedding preservinglogicalpropertyofrelations”.In:proceedingsofthe2016
conference of the North Americanchapteroftheassociationforcomputationallinguistics:human
language technologies. 2016, pp.907–916.
[184] JinxingYu,YunfengCai,MingmingSun,andPingLi.“MQuadE:aunifiedmodelforknowledgefact
embedding”. In:Proceedings of the30thInternationalConferenceonWorldWideWeb.2021,
pp.3442–3452.
[185] Long Yu, Zhicong Luo, HuanyongLiu,DengLin,HongzhuLi,andYafengDeng.“Triplere:
Knowledge graph embeddings viatripledrelationvectors”.In:arXivpreprintarXiv:2209.08271
(2022).
[186] Tong Yu, Jinghua Li, Qi Yu, Ye Tian,XiaofengShun,LiliXu,LingZhu,andHongjieGao.
“Knowledge graph for TCM healthpreservation:Design,construction,andapplications”.In:Artif.
Intell. Med.77 (2017), pp. 48–52.
[187] JunYuan, Neng Gao, and Ji Xiang.“Transgate:knowledgegraphembeddingwithsharedgate
structure”. In: Proceedings of the Thirty-ThirdAAAIConferenceonArtificialIntelligence .Vol. 33.
2019, pp. 3100–3107.
[188] Kaisheng Zeng, Chengjiang Li, LeiHou,JuanziLi,andLingFeng.“Acomprehensivesurvey of
entity alignment for knowledge graphs”.In:AIOpen(2021).
[189] Xiangxiang Zeng, Xinqi Tu, YuanshengLiu,XiangzhengFu,andYansenSu.“Towardbetterdrug
discovery with knowledge graph”.In:Curr.Opin.Struct.Biol. 72(2022),pp.114–126.
[190] HanwenZha,ZhiyuChen,andXifengYan.“InductiverelationpredictionbyBERT”.In:Proceedings
ofthe AAAI Conference on ArtificialIntelligence .Vol.36.5.2022,pp.5923–5931.
[191] QingZhanandHangYin.“Aloanapplicationfrauddetectionmethodbasedonknowledgegraphand
neural network”. In: Proc. 2nd Int.Conf.Innov.Artif.Intell.2018,pp.111–115.
[192] Fuxiang Zhang, Xin Wang, Zhao Li, and Jianxin Li. “TransRHS: A Representation Learning Method
forKnowledge Graphs with RelationHierarchicalStructure.”In:Proceedingsofthe29th
International Joint Conference onArtificialIntelligence .2020,pp.2987–2993.
[193] Qianjin Zhang, Ronggui Wang, JuanYang,andLixiaXue.“Knowledgegraphembeddingby
reflection transformation”. In: Knowledge-BasedSystems238(2022),p.107861.
[194] Qianjin Zhang, Ronggui Wang, JuanYang,andLixiaXue.“Structuralcontext-basedknowledge
graph embedding for link prediction”.In:Neurocomputing470(2022),pp.109–120.
[195] Qingheng Zhang, Zequn Sun, WeiHu,MuhaoChen,LingbingGuo,andYuzhongQu.“Multi-view
Knowledge Graph Embedding for Entity Alignment”. In: Proceedings of the 28th International Joint
Conference on Artificial Intelligence .2019.
157
[196] RuiZhang, Bayu Distiawan Trisedya,MiaoLi,YongJiang,andJianzhongQi.“Abenchmark and
comprehensive survey on knowledgegraphentityalignmentviarepresentationlearning”.In:The
VLDB Journal (2022), pp. 1–26.
[197] Shuai Zhang, Yi Tay, Lina Yao, andQiLiu.“Quaternionknowledgegraphembeddings”.In:
Proceedingsofthe 33rdInternationalConferenceon Neural InformationProcessingSystems.2019,
pp.2735–2745.
[198] WenZhang, Bibek Paudel, Wei Zhang,AbrahamBernstein,andHuajunChen.“Interaction
embeddings for prediction and explanationinknowledgegraphs”.In:ProceedingsoftheTwelfth
ACM International Conference onWebSearchandDataMining.2019,pp.96–104.
[199] YongqiZhang,QuanmingYao,WenyuanDai,andLeiChen.“AutoSF:Searchingscoringfunctions
forknowledge graph embedding”.In: Proceedingsofthe2020IEEE36thInternationalConference
onData Engineering (ICDE). IEEE.2020,pp.433–444.
[200] ZhanqiuZhang,JianyuCai,andJieWang.“Duality-inducedregularizerfortensorfactorizationbased
knowledge graph completion”. In: AdvancesinNeuralInformationProcessingSystems33(2020),
pp.21604–21615.
[201] Zhanqiu Zhang, Jianyu Cai, Yongdong Zhang, and Jie Wang. “Learning hierarchy-aware knowledge
graph embeddings for link prediction”.In: ProceedingsoftheThirty-FourthAAAIConference on
Artificial Intelligence . Vol. 34. 2020,pp.3065–3072.
[202] ZhaoliZhang,ZhifeiLi,HaiLiu,andNealNXiong.“Multi-scaledynamicconvolutionalnetworkfor
knowledge graph embedding”. In: IEEETransactionsonKnowledgeandDataEngineering34.5
(2022), pp. 2335–2347.
[203] Zhengyan Zhang, Xu Han, Zhiyuan Liu, Xin Jiang, Maosong Sun, and Qun Liu. “ERNIE: Enhanced
LanguageRepresentationwithInformativeEntities”.In:Proceedingsofthe57thAnnualMeeting of
theAssociation for ComputationalLinguistics.2019,pp.1441–1451.
[204] Xiang Zhao,Weixin Zeng, Jiuyang Tang, Wei Wang,and Fabian Suchanek.“An experimental study
ofstate-of-the-art entity alignmentapproaches”.In:IEEETransactionsonKnowledgeandData
Engineering (2020).
[205] YuZhao, AnxiangZhang, Ruobing Xie, Kang Liu, and Xiaojie Wang. “Connecting embeddings for
knowledge graph entity typing”. In:Proceedingsofthe58thAnnualMeetingoftheAssociation for
Computational Linguistics. 2020.
[206] GuoDong Zhou, Jian Su, Jie Zhang,andMinZhang.“Exploringvariousknowledgeinrelation
extraction”. In:Proceedings of the43rdAnnualMeetingoftheAssociationforComputational
Linguistics. 2005.
[207] KunZhou, Wayne Xin Zhao, ShuqingBian,YuanhangZhou,Ji-RongWen,andJingsongYu.
“Improving conversational recommendersystemsviaknowledgegraphbasedsemanticfusion”. In:
Proceedings of the 26th ACM SIGKDDinternationalconferenceonknowledgediscovery&data
mining. 2020, pp. 1006–1014.
158
[208] Xiaofei Zhou, Lingfeng Niu, QiannanZhu,XingquanZhu,PingLiu,JianlongTan,andLiGuo.
“Knowledge graph embedding bydoublelimitscoringloss”.In:IEEETransactionsonKnowledge
andData Engineering34.12 (2021),pp.5825–5839.
[209] Xiaofei Zhou, Qiannan Zhu, Ping Liu, and Li Guo. “Learning knowledge embeddings by combining
limit-based scoring loss”. In:Proceedingsofthe26thACMonConferenceonInformationand
Knowledge Management. 2017.
[210] Qiannan Zhu, Xiaofei Zhou, Jia Wu,JianlongTan,andLiGuo.“Neighborhood-AwareAttentional
Representation for Multilingual Knowledge Graphs.” In: Proceedings of the 28th International Joint
Conference on Artificial Intelligence .2019.
[211] Xiaoqian Zhu, Xiang Ao, Zidi Qin, Yanpeng Chang, Yang Liu, Qing He, and Jianping Li. “Intelligent
financial fraud detection practices inpost-pandemicera”.In: TheInnov. 2.4(2021),p.100176.
[212] YanZhuang, Guoliang Li, ZhuojianZhong,andJianhuaFeng.“PBA:Partitionandblockingbased
alignment for large knowledge bases”.In:DatabaseSystemsforAdvancedApplications:21st
International Conference, DASFAA2016,Dallas,TX,USA,April16-19,2016,Proceedings,Part I
21.Springer. 2016, pp. 415–431.
[213] Jianhuan Zhuo, Qiannan Zhu, YinliangYue,YuhongZhao,andWeisiHan.“A
Neighborhood-Attention Fine-grainedEntityTypingforKnowledgeGraphCompletion”.In:
Proceedings ofthe Fifteenth ACM International Conferenceon WebSearch and DataMining. 2022,
pp.1525–1533.
159