Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Unsupervised multi-word term recognition in Welsh

Spasic, Irena, Owen, David, Knight, Dawn and Artemiou, Andreas 2019. Unsupervised multi-word term recognition in Welsh. Presented at: Celtic Language Technology Workshop 2019, Dublin, Ireland, 19 August 2019. Published in: Lynn, Teresa, Prys, Delyth, Batchelor, Colin and Tyers, Francis eds. Proceedings of the Celtic Language Technology Workshop. European Association for Machine Translation.

PDF - Published Version
Available under License Creative Commons Attribution.

Abstract

This paper investigates the adaptation to Welsh of an existing system for multi-word term recognition originally developed for English. We outline the modifications required, with a special focus on an important difference between these two representatives of the Germanic and Celtic language families: the directionality of noun phrases. We successfully modelled this difference by means of lexico-syntactic patterns, which are parameters of the system and therefore required no re-implementation of the core algorithm. The performance of the Welsh version was compared against that of the English version. For this purpose, we assembled three parallel domain-specific corpora. The results were compared in terms of precision and recall. Comparable performance was achieved across the three domains in terms of these two measures (P = 68.9%, R = 55.7%), as well as in the ranking of automatically extracted terms, measured by the weighted kappa coefficient (κ = 0.7758). These early results indicate that our approach to term recognition can provide a basis for machine translation of multi-word terms.
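The abstract states that noun-phrase directionality is handled by lexico-syntactic patterns supplied as parameters, so the core extraction algorithm stays unchanged. The following is a minimal Python sketch of that idea only, assuming a hypothetical three-tag scheme (N = noun, A = adjective, O = other) and two toy regular-expression patterns; the tagset, the `PATTERNS` table and the `candidate_terms` function are illustrative assumptions, not the patterns or code used in the paper.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # (surface word, simplified POS tag)

# Hypothetical single-letter tags: "N" = noun, "A" = adjective, "O" = other.
# These toy patterns are assumptions for illustration, not the paper's patterns.
# Each matches a noun phrase of two or more tokens over the tag string.
PATTERNS = {
    # English NPs are head-final: modifiers precede the head noun.
    "en": re.compile(r"[AN]+N"),
    # Welsh NPs are head-initial: modifiers follow the head noun.
    "cy": re.compile(r"N[NA]+"),
}


def candidate_terms(tokens: List[Token], lang: str) -> List[str]:
    """Return multi-word candidates whose tag sequence matches the pattern for lang."""
    tags = "".join(tag for _, tag in tokens)  # one character per token
    return [
        " ".join(word for word, _ in tokens[m.start():m.end()])
        for m in PATTERNS[lang].finditer(tags)
    ]


if __name__ == "__main__":
    english = [("high", "A"), ("blood", "N"), ("pressure", "N"),
               ("is", "O"), ("common", "A")]
    welsh = [("mae", "O"), ("pwysedd", "N"), ("gwaed", "N"), ("uchel", "A"),
             ("yn", "O"), ("gyffredin", "A")]
    print(candidate_terms(english, "en"))  # ['high blood pressure']
    print(candidate_terms(welsh, "cy"))    # ['pwysedd gwaed uchel']
    print(candidate_terms(welsh, "en"))    # ['pwysedd gwaed'] (adjective lost)
```

Only the `PATTERNS` entry differs between the two languages while `candidate_terms` is shared, which mirrors the parameterisation described in the abstract. Applying the English pattern to the Welsh sentence returns the truncated `pwysedd gwaed`, dropping the post-posed adjective `uchel`.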

Item Type: Conference or Workshop Item (Paper)
Date Type: Published Online
Status: Published
Schools: Mathematics
English, Communication and Philosophy
Computer Science & Informatics
Data Innovation Research Institute (DIURI)
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
Publisher: European Association for Machine Translation
Date of First Compliant Deposit: 8 October 2019
Last Modified: 08 Oct 2019 14:15
URI: http://orca.cf.ac.uk/id/eprint/125820
