Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Knowledge extraction from a small corpus of unstructured safeguarding reports

Edwards, Aleksandra, Preece, Alun D. (ORCID: https://orcid.org/0000-0003-0349-9057) and De Ribaupierre, Helene. 2019. Knowledge extraction from a small corpus of unstructured safeguarding reports. Presented at: 16th ESWC 2019, Portoroz, Slovenia, 2-6 June 2019.

PDF - Accepted Post-Print Version

Abstract

This paper presents results on the performance of a range of analysis tools for extracting entities and sentiments from a small corpus of unstructured safeguarding reports. We use sentiment analysis to identify strongly positive and strongly negative segments, in an attempt to attribute patterns in the extracted sentiments to specific entities, and we use entity extraction to identify key entities. We evaluate tool performance against non-specialist human annotators. An initial study comparing inter-human agreement with inter-machine agreement shows higher overall scores for the human annotators than for the software tools. However, the degree of consensus between the human annotators for entity extraction is lower than expected, which suggests a need for trained annotators. For sentiment analysis, the annotators reached higher agreement on descriptive sentences than on reflective sentences, while the inter-tool agreement was similarly low for the two sentence types. The poor performance of the entity extraction and sentiment analysis approaches points to the need for domain-specific approaches to knowledge extraction from these kinds of documents. However, there is currently a lack of pre-existing ontologies in the safeguarding domain. Our future work will therefore focus on developing such a domain-specific ontology.
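The comparison of inter-human and inter-machine agreement described above can be illustrated with a small sketch. The abstract does not say which agreement measure the authors used; Cohen's kappa is assumed here purely for illustration, and the function name `cohen_kappa` and all label lists are hypothetical, not data from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical sentence-level sentiment labels (assumed, not from the paper).
human_1 = ["pos", "neg", "neg", "pos", "neg", "pos"]
human_2 = ["pos", "neg", "pos", "pos", "neg", "pos"]
tool_1 = ["pos", "pos", "neg", "neg", "neg", "pos"]
tool_2 = ["neg", "pos", "neg", "pos", "pos", "pos"]

human_kappa = cohen_kappa(human_1, human_2)
tool_kappa = cohen_kappa(tool_1, tool_2)
print(f"inter-human kappa: {human_kappa:.2f}")
print(f"inter-tool kappa:  {tool_kappa:.2f}")
```

Kappa corrects raw percentage agreement for the agreement expected by chance, which matters on a small corpus where one label may dominate. With more than two annotators or tools, a multi-rater measure such as Fleiss' kappa would be the usual alternative.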

Item Type: Conference or Workshop Item (Paper)
Date Type: Completion
Status: Unpublished
Schools: Computer Science & Informatics
Crime and Security Research Institute (CSURI)
Date of First Compliant Deposit: 14 August 2019
Last Modified: 04 Nov 2022 12:25
URI: https://orca.cardiff.ac.uk/id/eprint/123018
