Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Extracting topic-sensitive content from textual documents - A hybrid topic model approach

Liang, Yan, Liu, Ying ORCID: https://orcid.org/0000-0001-9319-5940, Chen, Chong and Jiang, Zhigang 2018. Extracting topic-sensitive content from textual documents - A hybrid topic model approach. Engineering Applications of Artificial Intelligence 70, pp. 81-91. 10.1016/j.engappai.2017.12.010

PDF - Accepted Post-Print Version

Abstract

When exploring information on a topic, users are often concerned with its different aspects. For instance, product designers seek information on specific topic aspects, such as technical challenges and usability, from online consumer opinions, while potential buyers wish to obtain the general sentiment of public opinion. In this paper, we study a problem called topic-sensitive content extraction (TSCE). TSCE aims to extract, from a single document in a given text collection, the contents that are relevant to samples of topic aspects highlighted by users. To tackle TSCE, we propose a new hybrid topic model which integrates different structures in both topic space and context space. It focuses on identifying contents associated with a specified topic aspect from each document. By modeling gradient documents via term profiles for context modeling, and by leveraging local and global differences between probability distributions over words in both topic modeling and context modeling, it better captures the features of various language patterns. Hence, sentence relevance ranking according to a specific topic aspect is largely improved. Experimental studies on extracting critical contents of specific aspects, including motivation and design solution, from technical patents for design analysis show the merits of the proposed model.
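
The idea of ranking sentences by how closely their word distribution matches a user-supplied topic aspect can be illustrated with a minimal sketch. This is not the hybrid topic model of the paper: it swaps in plain smoothed unigram distributions and the Jensen-Shannon divergence as a stand-in for the "differences between probability distributions over words". The function names (`word_dist`, `js_divergence`, `rank_sentences`), the whitespace tokenizer, and the smoothing constant `alpha` are all assumptions made for illustration.

```python
import math
from collections import Counter


def word_dist(text, vocab, alpha=1.0):
    """Additively smoothed unigram distribution of `text` over `vocab`.

    Hypothetical helper: whitespace tokenization and alpha=1.0 are
    illustrative assumptions, not choices taken from the paper.
    """
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def js_divergence(p, q):
    """Jensen-Shannon divergence between two distributions on the same vocab."""
    m = {w: 0.5 * (p[w] + q[w]) for w in p}

    def kl(a, b):
        # Smoothing guarantees every probability is strictly positive.
        return sum(a[w] * math.log(a[w] / b[w]) for w in a)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def rank_sentences(sentences, aspect_samples):
    """Order sentences from most to least similar to the aspect samples."""
    vocab = sorted(
        {w for s in sentences + aspect_samples for w in s.lower().split()}
    )
    aspect = word_dist(" ".join(aspect_samples), vocab)
    scored = [(js_divergence(word_dist(s, vocab), aspect), s) for s in sentences]
    # Lower divergence means the sentence is closer to the aspect samples.
    return [s for _, s in sorted(scored, key=lambda pair: pair[0])]
```

Given a list of sentences from one document and a few sample sentences for an aspect such as "motivation", `rank_sentences` returns the document's sentences with the closest matches first. A unigram baseline like this ignores context entirely, which is the kind of limitation the context modeling described in the abstract is meant to address.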

Item Type: Article
Date Type: Publication
Status: Published
Schools: Engineering
Publisher: Elsevier / International Federation of Automatic Control (IFAC)
ISSN: 0952-1976
Date of First Compliant Deposit: 2 January 2018
Date of Acceptance: 27 December 2017
Last Modified: 07 Nov 2023 04:52
URI: https://orca.cardiff.ac.uk/id/eprint/107835

Citation Data

Cited 11 times in Scopus.
