Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Korpusgeleitete Extraktion von Mehrwortsequenzen aus (diachronen) Korpora: Vorgehenswege für deutschsprachige Daten [‘Corpus-led extraction of multiword sequences from (diachronic) corpora: procedures for work on German-language data’]

Buerki, Andreas ORCID: https://orcid.org/0000-0003-2151-3246 2012. Korpusgeleitete Extraktion von Mehrwortsequenzen aus (diachronen) Korpora: Vorgehenswege für deutschsprachige Daten [‘Corpus-led extraction of multiword sequences from (diachronic) corpora: procedures for work on German-language data’]. Aspekte der historischen Phraseologie und Phraseographie, Heidelberg: Universitätsverlag Winter.

Full text not available from this repository.

Abstract

With the increasing availability of diachronic corpora, the automatic extraction of phraseological phenomena is becoming an important concern of diachronic phraseological research. To date, there have been few suggestions for corpus-driven extraction procedures specifically tailored to linguistic research on German corpora, and the very prospect of a useful corpus-driven extraction is challenged by a number of remaining problems. In this paper we sought to assess the feasibility of an entirely corpus-driven approach to multiword sequence extraction and the influence of factors such as the part-lemmatisation of source data, various filters and the incorporation of sequence-internal variable slots. Using a subcorpus of the Swiss Text Corpus as test data, we first developed an operationalisation of multiword sequences and then devised a procedure which is able to extract them with a precision of upwards of 70% while also providing adequate recall and transparency of results. The best results were obtained with a frequency-based filter combined with a lexico-structural filter, part-lemmatisation and the incorporation of optional variable slots.
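
The general idea of combining a frequency-based filter with a lexico-structural filter can be illustrated with a minimal sketch. This is not the procedure described in the paper, whose details are not given in the abstract. The function names, the tiny function-word list, the whitespace tokenisation and the thresholds below are all assumptions made purely for illustration, and part-lemmatisation and variable slots are left out.

```python
from collections import Counter

# Hypothetical function-word list for the lexico-structural filter
# (an assumption for illustration, not taken from the paper).
FUNCTION_WORDS = {"der", "die", "das", "und", "in", "zu", "von", "mit"}


def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_sequences(sentences, n=3, min_freq=2):
    """Count n-grams within sentences, then apply two simple filters."""
    counts = Counter()
    for sentence in sentences:
        # Naive tokenisation: lowercase and split on whitespace.
        tokens = sentence.lower().split()
        counts.update(ngrams(tokens, n))

    kept = {}
    for seq, freq in counts.items():
        # Frequency-based filter: drop sequences seen fewer than min_freq times.
        if freq < min_freq:
            continue
        # Lexico-structural filter (simplified): drop sequences that begin
        # or end with a function word.
        if seq[0] in FUNCTION_WORDS or seq[-1] in FUNCTION_WORDS:
            continue
        kept[seq] = freq
    return kept


def precision(extracted, gold):
    """Share of extracted sequences that also appear in a gold-standard set."""
    if not extracted:
        return 0.0
    return len(set(extracted) & set(gold)) / len(extracted)


sentences = [
    "Er kommt zum Schluss und geht",
    "Sie kommt zum Schluss",
    "Wir gehen in die Stadt",
]
result = extract_sequences(sentences, n=3, min_freq=2)
```

In this toy input, only the trigram that recurs across sentences survives the frequency filter. Lowering `min_freq` to 1 shows the effect of the structural filter on its own, since sequences that start or end with a listed function word are still discarded.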

Item Type: Book Section
Date Type: Publication
Status: Published
Schools: English, Communication and Philosophy
Subjects: P Language and Literature > P Philology. Linguistics
Publisher: Universitätsverlag Winter
Last Modified: 28 Oct 2022 10:23
URI: https://orca.cardiff.ac.uk/id/eprint/78124
