Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Learning facet-specific entity embeddings

Alshaikh, Rana 2021. Learning facet-specific entity embeddings. PhD Thesis, Cardiff University.
Item availability restricted.

PDF (Rana Alshaikh, PhD, Thesis) - Accepted Post-Print Version
Available under License Creative Commons Attribution No Derivatives.


Abstract

An entity embedding is a vector space representation of entities in which similar entities have similar representations. However, similarity is a multi-faceted notion; for example, a person may be similar to one group of people because they graduated from the same university, and similar to another group because they share a nationality or play the same sport. Our hypothesis in this thesis is that learning a single entity embedding is a sub-optimal way to faithfully capture these different facets of similarity. Therefore, this thesis aims to learn facet-specific entity embeddings that capture different facets of similarity, taking inspiration from the conceptual spaces framework, which is widely known in cognitive science. Conceptual spaces [48] are vector space models designed to represent entities of a given kind (e.g. movies), together with their associated properties (e.g. scary) and concepts (e.g. thrillers). As such, they are similar in spirit to the vector space models that have been proposed in natural language processing, but there are also notable differences. First, the dimensions of conceptual spaces, referred to as quality dimensions, are interpretable, as they correspond to semantically meaningful features. Second, conceptual spaces are organized into sets of semantic domains or facets (e.g. genre, language), which are formed by grouping the quality dimensions. Each facet is associated with its own low-dimensional vector space, which intuitively captures similarity with respect to the corresponding facet. For instance, the vector space for the budget facet would only capture whether two movies had similar budgets. From an application point of view, the fact that conceptual spaces are structured into facets is appealing because it allows us to model the different facets of similarity in a more flexible and cognitively more plausible way.
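The contrast between a single embedding and facet-specific embeddings can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the movie names, the two facets, the toy vectors, and the helper functions `facet_similarity` and `single_space_similarity` are hypothetical and are not taken from the thesis. The sketch only shows that two entities can be near-identical in one facet and unrelated in another, while a single concatenated space blurs the two.

```python
import numpy as np

# Hypothetical toy data: each movie has one low-dimensional vector per facet.
movies = {
    "movie_a": {"genre": np.array([0.9, 0.1]), "budget": np.array([1.0, 0.0])},
    "movie_b": {"genre": np.array([0.8, 0.2]), "budget": np.array([0.0, 1.0])},
}

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def facet_similarity(x, y, facet):
    """Similarity of two movies with respect to one facet only."""
    return cosine(movies[x][facet], movies[y][facet])

def single_space_similarity(x, y):
    """Similarity in one space built by concatenating all facet vectors."""
    facets = sorted(movies[x])
    vx = np.concatenate([movies[x][f] for f in facets])
    vy = np.concatenate([movies[y][f] for f in facets])
    return cosine(vx, vy)

genre_sim = facet_similarity("movie_a", "movie_b", "genre")
budget_sim = facet_similarity("movie_a", "movie_b", "budget")
overall_sim = single_space_similarity("movie_a", "movie_b")
print(genre_sim, budget_sim, overall_sim)
```

In this toy setting the genre similarity is close to 1, the budget similarity is 0, and the single-space similarity falls in between, so it reflects neither facet faithfully.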
Based on this, we hypothesize that learning facet-specific entity embeddings that are similar in spirit to conceptual spaces will allow us to predict the properties and categories of entities more reliably than from standard single space representations. Learning data-driven conceptual spaces, especially in an unsupervised way, has received very limited attention to date. Therefore, in this thesis, we learn facet-specific entity embeddings that are similar in spirit to conceptual spaces. This includes learning quality dimensions and then grouping them into facets. In particular, we propose three unsupervised models to learn this type of vector space representation for a set of entities from their textual descriptions. In two of these models, we convert traditional vector space embeddings into facet-specific entity embeddings, using features that resemble quality dimensions. In these cases, we rely on an existing method to learn these features. In our first proposed model, we structure the vector space representations implicitly into meaningful facets by identifying the quality dimensions in a two-level hierarchy: the first level corresponds to the facets, and the second level corresponds to the facet-specific features. In our second model, using the quality dimensions and pre-trained word embeddings, we decompose the vector space representations into low-dimensional facets in an incremental way. In both of these models, we depend on clustering algorithms to find facet-specific features. In contrast, our third proposed model uses a mixture-of-experts formulation to find the features that describe each facet, and it simultaneously learns the facet-specific embeddings directly from the bag-of-words representation. We evaluate our models on several datasets, each of which contains a set of entities with their textual descriptions and a number of classification tasks, using a range of different classifiers.
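The idea of grouping features into facets by clustering can be sketched roughly as follows. This is not the thesis's algorithm: it assumes, purely for illustration, that each learned feature is available as a direction vector in the entity embedding space, and it swaps in a plain two-cluster spherical k-means with a deterministic start. The toy directions, the entity vector, and the helper names `normalize` and `cluster_directions` are all hypothetical.

```python
import numpy as np

def normalize(rows):
    """Scale each row to unit length."""
    return rows / np.linalg.norm(rows, axis=1, keepdims=True)

def cluster_directions(directions, n_iter=10):
    """Group feature directions into two facets with spherical k-means.

    Deterministic start: the first direction and the one least similar to it.
    Assumes no cluster becomes empty, which holds for the toy data below.
    """
    d = normalize(directions)
    centroids = np.stack([d[0], d[np.argmin(d @ d[0])]])
    for _ in range(n_iter):
        labels = np.argmax(d @ centroids.T, axis=1)  # nearest centroid by cosine
        centroids = normalize(
            np.stack([d[labels == k].mean(axis=0) for k in range(2)])
        )
    return d, labels

# Hypothetical feature directions in a 4-dimensional embedding space.
directions = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.9, 0.1],
])
unit_dirs, labels = cluster_directions(directions)

# Facet-specific representation of one entity: its projections onto the
# feature directions that were assigned to each facet.
entity = np.array([0.5, 0.2, -0.3, 0.1])
facet_vectors = {k: unit_dirs[labels == k] @ entity for k in range(2)}
print(labels, facet_vectors)
```

Each facet vector here has one coordinate per feature in that facet, so the original embedding is replaced by several lower-dimensional, facet-specific ones.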
The experimental results support our hypothesis that, by capturing different facets of similarity, facet-specific vector space representations improve a model’s ability to predict the categories and properties of entities.

Item Type: Thesis (PhD)
Date Type: Acceptance
Status: Unpublished
Schools: Computer Science & Informatics
Subjects: Q Science > Q Science (General)
T Technology > T Technology (General)
Date of First Compliant Deposit: 15 July 2021
Date of Acceptance: April 2021
Last Modified: 19 May 2023 01:07
URI: https://orca.cardiff.ac.uk/id/eprint/142483
