Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Automatically identifying instances of change in diachronic corpus data

Buerki, Andreas 2013. Automatically identifying instances of change in diachronic corpus data. Presented at: Corpus Linguistics 2013, Lancaster University, UK, 22 - 26 July 2013.

PDF - Accepted Post-Print Version

Abstract

With the increasing availability of diachronic corpora, machine-aided identification of linguistic items that have undergone significant change is set to become an important task. Its importance is heightened further if, as Hilpert and Gries (2009:386) have argued, approaching linguistic change in a data-driven manner can reveal otherwise unnoticed phenomena. Key to this endeavour is the ability to distinguish relevant change from noise and from random or other synchronic variation. This non-trivial task differs in important ways from the much more widely investigated comparison of linguistic features between two (usually contemporary) corpora, and it has to date not received the attention it perhaps deserves. In this paper, a number of methods for identifying relevant change are reviewed, and a procedure is suggested that has not so far been documented. The new procedure is based on a simple chi-square test for goodness of fit, combined with additional parameters. Its operation is illustrated with a study of the motivation of recent and ongoing change in Multi-word Expressions (MWEs), using data from the 20-million-word Swiss Text Corpus (STC), a corpus of 20th-century written German as used in Switzerland (Bickel et al. 2009). Results indicate that the procedure yields high-quality instances of significant change in the data and is applicable to MWEs as well as to a range of other linguistic items. It identifies instances of change with fewer arbitrary decisions, and captures a wider range of types of change, than other suggested methods. The results also show that both the structure of the data and particular research interests will guide the choice of method used to identify relevant change.
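To make the core idea concrete, the following is a minimal sketch of a chi-square goodness-of-fit test applied to one item's frequencies across time periods. It is an illustration under stated assumptions, not the procedure of the paper. The abstract does not specify the "additional parameters", so they are omitted here. The function names, the four equally sized periods, the 0.05 significance level and all counts are hypothetical. Expected frequencies are assumed to be proportional to the size of each period's sub-corpus.

```python
# Sketch only: a plain chi-square goodness-of-fit check on one item's
# frequencies across time periods. The paper's additional parameters are
# not given in the abstract and are NOT modelled here.

# Standard chi-square critical values at alpha = 0.05, keyed by degrees of freedom.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_square_gof(observed, period_sizes):
    """Return (statistic, degrees_of_freedom) for one item.

    Assumption: with no change over time, the item's occurrences are
    distributed across periods in proportion to each period's word count.
    """
    if len(observed) != len(period_sizes) or len(observed) < 2:
        raise ValueError("need matching lists covering at least two periods")
    total_observed = sum(observed)
    total_size = sum(period_sizes)
    expected = [total_observed * size / total_size for size in period_sizes]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1


def shows_change(observed, period_sizes):
    """True if the distribution deviates significantly at alpha = 0.05."""
    statistic, df = chi_square_gof(observed, period_sizes)
    return statistic > CRITICAL_05[df]


# Hypothetical data: four periods of 5 million words each (invented split).
sizes = [5_000_000] * 4
rising = [10, 20, 40, 80]   # invented counts for an item that grows over time
stable = [36, 38, 37, 39]   # invented counts for an item that barely varies

print(shows_change(rising, sizes))  # True
print(shows_change(stable, sizes))  # False
```

A test of this kind flags any significant deviation from an even distribution, whatever its shape, which is presumably why further parameters are needed to separate relevant diachronic change from noise.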

Item Type: Conference or Workshop Item (Paper)
Date Type: Completion
Status: Unpublished
Schools: English, Communication and Philosophy
Subjects: P Language and Literature > P Philology. Linguistics
Date of First Compliant Deposit: 30 March 2016
Last Modified: 04 Jun 2017 08:26
URI: http://orca.cf.ac.uk/id/eprint/77956
