Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Jointly learning word embeddings and latent topics

Shi, Bei, Lam, Wai, Jameel, Shoaib, Schockaert, Steven and Lai, Kwun Ping 2017. Jointly learning word embeddings and latent topics. Presented at: 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Tokyo, Japan, 7-11 August 2017. SIGIR '17 Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, pp. 375-384. doi:10.1145/3077136.3080806

PDF - Accepted Post-Print Version

Abstract

Word embedding models such as Skip-gram learn a vector-space representation for each word, based on the local word collocation patterns that are observed in a text corpus. Latent topic models, on the other hand, take a more global view, looking at the word distributions across the corpus to assign a topic to each word occurrence. These two paradigms are complementary in how they represent the meaning of word occurrences. While some previous work has already looked at using word embeddings for improving the quality of latent topics, and conversely, at using latent topics for improving word embeddings, such "two-step" methods cannot capture the mutual interaction between the two paradigms. In this paper, we propose STE, a framework which can learn word embeddings and latent topics in a unified manner. STE naturally obtains topic-specific word embeddings, and thus addresses the issue of polysemy. At the same time, it also learns the term distributions of the topics, and the topic distributions of the documents. Our experimental results demonstrate that the STE model can indeed generate useful topic-specific word embeddings and coherent latent topics in an effective and efficient way.

Item Type: Conference or Workshop Item (Paper)
Date Type: Publication
Status: Published
Schools: Computer Science & Informatics
Publisher: ACM
ISBN: 978-1-4503-5022-8
Date of First Compliant Deposit: 11 July 2017
Last Modified: 08 Aug 2019 10:33
URI: http://orca.cf.ac.uk/id/eprint/100911

Citation Data

Cited 29 times in Scopus.
