Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Frequency consolidation among word N-grams: A practical procedure

Buerki, Andreas 2017. Frequency consolidation among word N-grams: A practical procedure. Presented at: EUROPHRAS 2017: Computational and Corpus-based Phraseology, London, UK, 13-14 November 2017.

Full text not available from this repository.

Abstract

This paper considers the issue of frequency consolidation in lists of different length word n-grams (i.e. recurrent word sequences) extracted from the same underlying corpus. A simple algorithm (enhanced by a preparatory stage) is proposed which allows the consolidation of frequencies among lists of different length n-grams, from 2-grams to 6-grams and beyond. The consolidation adjusts the frequency count of each n-gram to the number of its occurrences minus its occurrences as part of longer n-grams. Among other uses, such a procedure aids linguistic analysis and allows the non-inflationary counting of word tokens that are part of frequent n-grams of various lengths, which in turn allows an assessment of the proportion of running text made up of recurring chunks. The proposed procedure delivers frequency consolidation and substring reduction among word n-grams and is independent of any particular method of n-gram extraction and filtering, making it applicable also in situations where full access to underlying corpora is unavailable.
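As an illustration of the adjustment described in the abstract (occurrences minus occurrences as part of longer n-grams), the following is a minimal Python sketch. It is not the procedure proposed in the paper, whose details and preparatory stage are not given in this record. The function name `consolidate`, the tuple-based input format, and the example counts are assumptions made for illustration only. The sketch also assumes that overlapping longer n-grams do not cause the same occurrence of a shorter n-gram to be subtracted twice; where that assumption fails, counts are simply clamped at zero.

```python
def consolidate(freqs):
    """Return consolidated frequencies for a mixed-length n-gram list.

    freqs maps n-grams (tuples of words) to raw corpus frequencies.
    Hypothetical helper for illustration; not the paper's algorithm.
    """
    result = dict(freqs)
    # Process longest n-grams first, so each n-gram's own count is final
    # before it is subtracted from the shorter n-grams it contains.
    for gram in sorted(freqs, key=len, reverse=True):
        remaining = result[gram]
        if remaining <= 0:
            continue  # nothing left to attribute to shorter n-grams
        n = len(gram)
        for k in range(n - 1, 0, -1):        # substring lengths n-1 .. 1
            for i in range(n - k + 1):       # every start position
                sub = gram[i:i + k]
                if sub in result:
                    result[sub] -= remaining
    # Simplification: clamp over-subtracted counts at zero.
    return {g: max(c, 0) for g, c in result.items()}


# Invented example counts, for illustration only.
raw = {
    ("in", "the", "end"): 3,
    ("in", "the"): 10,
    ("the", "end"): 4,
}
consolidated = consolidate(raw)
```

With these invented counts, "in the" is reduced from 10 to 7 and "the end" from 4 to 1, because 3 occurrences of each are already accounted for by "in the end". Since the function works only on the frequency lists, it needs no access to the corpus itself, in line with the abstract's remark about applicability where the corpus is unavailable.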

Item Type: Conference or Workshop Item (Paper)
Date Type: Completion
Status: Unpublished
Schools: English, Communication and Philosophy
Subjects: P Language and Literature > P Philology. Linguistics
Last Modified: 24 Nov 2017 11:51
URI: http://orca.cf.ac.uk/id/eprint/106952
