Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Scalable real-time classification of data streams with concept drift

Tennant, Mark, Stahl, Frederic, Rana, Omer (ORCID: https://orcid.org/0000-0003-3597-2646) and Gomes, João Bártolo 2017. Scalable real-time classification of data streams with concept drift. Future Generation Computer Systems 75, pp. 187-199. 10.1016/j.future.2017.03.026

PDF - Published Version
Available under License Creative Commons Attribution.


Abstract

Inducing adaptive predictive models in real time from high-throughput data streams is one of the most challenging areas of Big Data Analytics. Data streams are unbounded and may contain concept drifts (changes over time in the pattern encoded in the stream), which imposes unique challenges compared with predictive data mining from batch data. Several real-time predictive data stream algorithms exist; however, most are not naturally parallel and are thus limited in their scalability. This paper highlights the Micro-Cluster Nearest Neighbour (MC-NN) data stream classifier. MC-NN is based on statistical summaries of the data stream and a nearest neighbour approach, which makes it naturally parallel. In its serial version, MC-NN processes data streams incrementally, so the data does not need to reside in memory. MC-NN is also able to adapt to concept drifts. This paper provides an empirical study of the serial algorithm’s speed, adaptivity and accuracy. Furthermore, it discusses the new parallel implementation of MC-NN and its parallel properties, and provides an empirical scalability study.
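
The general idea of classifying from statistical summaries rather than stored instances can be illustrated with a minimal sketch. It is not the MC-NN algorithm from the paper: the abstract does not specify the summary layout or update rules, so the class names, the fields (count and per-feature linear sum) and the choice of one summary per class label are all assumptions made for illustration. Concept drift adaptation and parallel execution are left out entirely.

```python
from dataclasses import dataclass, field
from math import dist


@dataclass
class MicroCluster:
    """Hypothetical statistical summary: instance count and per-feature sum."""
    label: str
    n: int = 0
    linear_sum: list = field(default_factory=list)

    def absorb(self, x):
        # Update the summary incrementally; the instance itself is not stored.
        if not self.linear_sum:
            self.linear_sum = [0.0] * len(x)
        self.linear_sum = [s + v for s, v in zip(self.linear_sum, x)]
        self.n += 1

    def centroid(self):
        return [s / self.n for s in self.linear_sum]


class SummaryNearestNeighbour:
    """Illustrative classifier: one summary per class (an assumption)."""

    def __init__(self):
        self.clusters = {}

    def learn(self, x, label):
        # Fold a labelled instance into the summary for its class.
        cluster = self.clusters.setdefault(label, MicroCluster(label))
        cluster.absorb(x)

    def predict(self, x):
        # Return the label of the summary whose centroid is nearest to x.
        if not self.clusters:
            return None
        nearest = min(self.clusters.values(),
                      key=lambda c: dist(c.centroid(), x))
        return nearest.label


model = SummaryNearestNeighbour()
stream = [((0.0, 0.0), "a"), ((10.0, 10.0), "b"),
          ((0.0, 2.0), "a"), ((12.0, 10.0), "b")]
for features, label in stream:
    model.learn(features, label)

print(model.predict((1.0, 1.0)))
print(model.predict((11.0, 9.0)))
```

Because each summary holds only a count and running sums, memory use in this sketch does not grow with the length of the stream, and the distance computations against separate summaries are independent of one another.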

Item Type: Article
Date Type: Publication
Status: Published
Schools: Computer Science & Informatics
Uncontrolled Keywords: Parallel data stream classification; Adaptation to concept drift; High velocity data streams
Additional Information: This is an open access article under the CC BY license
Publisher: Elsevier
ISSN: 0167-739X
Date of First Compliant Deposit: 5 May 2017
Date of Acceptance: 22 March 2017
Last Modified: 10 May 2023 13:25
URI: https://orca.cardiff.ac.uk/id/eprint/100328

Citation Data

Cited 59 times in Scopus.
