Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

A deep learning approach to self-expansion of abbreviations based on morphology and context distance

Chopard, Daphne and Spasic, Irena 2019. A deep learning approach to self-expansion of abbreviations based on morphology and context distance. Presented at: SLSP 2019: 7th International Conference on Statistical Language and Speech Processing, Ljubljana, Slovenia, 14-16 October 2019. Published in: Martín-Vide, Carlos, Purver, Matthew and Pollak, Senja eds. Statistical Language and Speech Processing: 7th International Conference, SLSP 2019, Ljubljana, Slovenia, October 14–16, 2019, Proceedings. Springer, pp. 71-82. 10.1007/978-3-030-31372-2_6

Full text not available from this repository.

Abstract

Abbreviations and acronyms are shortened forms of words or phrases that are commonly used in technical writing. In this study we focus specifically on abbreviations and introduce a corpus-based method for their expansion. The method divides the processing into three key stages: abbreviation identification, full form candidate extraction, and abbreviation disambiguation. First, potential abbreviations are identified by combining pattern matching and named entity recognition. Acronyms and abbreviations exhibit similar orthographic properties, so additional processing is required to distinguish between them. To this end, we implement a character-based recurrent neural network (RNN) that analyses the morphology of a given token in order to classify it as an acronym or an abbreviation. A siamese RNN that learns the morphological process of word abbreviation is then used to select a set of full form candidates. Having considerably constrained the search space, we take advantage of the Word Mover’s Distance (WMD) to assess semantic compatibility between an abbreviation and each full form candidate based on their contextual similarity. This step does not require any corpus-based training, thus making the approach highly adaptable to different domains. Unlike the vast majority of existing approaches, our method does not rely on external lexical resources for disambiguation, yet with a macro F-measure of 96.27% it is comparable to the state of the art.
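
The disambiguation stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps the exact Word Mover's Distance for the relaxed WMD, a cheaper lower bound that matches each word to its nearest neighbour in the other context, and it assumes uniform word weights. The toy 2-D word vectors, the example contexts and the function names (`relaxed_wmd`, `rank_candidates`) are all hypothetical and chosen only for illustration.

```python
import math

# Toy 2-D word vectors, invented for illustration only (not taken from the paper).
TOY_VECTORS = {
    "patient": (1.0, 0.0),
    "dose": (0.9, 0.1),
    "tablet": (0.8, 0.2),
    "street": (0.0, 1.0),
    "road": (0.1, 0.9),
    "house": (0.2, 0.8),
}


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def one_sided_cost(source, target, vectors):
    """Average distance from each source word to its nearest target word."""
    total = 0.0
    for s in source:
        total += min(euclidean(vectors[s], vectors[t]) for t in target)
    return total / len(source)


def relaxed_wmd(context_a, context_b, vectors=TOY_VECTORS):
    """Relaxed WMD (a lower bound on the exact WMD), assuming uniform word weights.

    Words without a vector are ignored; if either context ends up empty,
    the distance is treated as infinite.
    """
    a = [w for w in context_a if w in vectors]
    b = [w for w in context_b if w in vectors]
    if not a or not b:
        return float("inf")
    return max(one_sided_cost(a, b, vectors), one_sided_cost(b, a, vectors))


def rank_candidates(abbreviation_context, candidate_contexts, vectors=TOY_VECTORS):
    """Sort full form candidates by increasing context distance to the abbreviation."""
    return sorted(
        candidate_contexts,
        key=lambda c: relaxed_wmd(abbreviation_context, candidate_contexts[c], vectors),
    )


# Hypothetical example: the context words around "tab." versus the context
# words around two candidate full forms found elsewhere in the corpus.
abbreviation_context = ["patient", "dose"]
candidate_contexts = {
    "table": ["house", "street"],
    "tablet": ["dose", "tablet"],
}
ranking = rank_candidates(abbreviation_context, candidate_contexts)
print(ranking)
```

In this toy setting the candidate whose context lies closest to the abbreviation's context is ranked first. No training is involved at this step, only pre-existing word vectors, which is consistent with the abstract's point that the step adapts easily to other domains.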

Item Type: Conference or Workshop Item (Paper)
Date Type: Published Online
Status: Published
Schools: Computer Science & Informatics
Data Innovation Research Institute (DIURI)
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
Publisher: Springer
ISBN: 9783030313715
Last Modified: 08 Oct 2019 15:00
URI: http://orca.cf.ac.uk/id/eprint/125811
