Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

A framework for categorising AI evaluation instruments

Cohn, Anthony G, Hernández-Orallo, José, Mboli, Julius Sechang, Moros-Daval, Yael, Xiang, Zhiliang (ORCID: https://orcid.org/0000-0002-0263-7289) and Zhou, Lexin 2022. A framework for categorising AI evaluation instruments. Presented at: Workshop on AI Evaluation Beyond Metrics, Vienna, Austria, 24 July 2022. Published in: Hernandez-Orallo, Jose, Cheke, Lucy, Tenenbaum, Joshua, Ullman, Tomer, Martinez-Plumed, Fernando, Rutar, Danaja, Burden, John, Burnell, Ryan and Schellaert, Wout eds. Proceedings of the Workshop on AI Evaluation Beyond Metrics co-located with the 31st International Joint Conference on Artificial Intelligence (IJCAI-ECAI 2022). CEUR Workshop Proceedings, vol. 3169.

PDF - Published Version
Available under License Creative Commons Attribution.


Abstract

The current and future capabilities of Artificial Intelligence (AI) are typically assessed with an ever-increasing number of benchmarks, competitions, tests and evaluation standards, which are meant to work as AI evaluation instruments (EIs). These EIs are increasing not only in number but also in complexity and diversity, making it hard to understand this evaluation landscape in a meaningful way. In this paper we present an approach for categorising EIs using a set of 18 facets, accompanied by a rubric that allows anyone to apply the framework to any existing or new EI. We apply the rubric to 23 EIs in different domains through a team of raters, and analyse how consistent the rubric is and how well it works to distinguish between EIs and map the evaluation landscape in AI.

Item Type: Conference or Workshop Item (Paper)
Date Type: Publication
Status: Published
Schools: Computer Science & Informatics
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Publisher: CEUR Workshop Proceedings
ISSN: 1613-0073
Date of First Compliant Deposit: 8 August 2022
Date of Acceptance: 3 June 2022
Last Modified: 24 Aug 2022 11:00
URI: https://orca.cardiff.ac.uk/id/eprint/151802
