Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Panoptic segmentation-based attention for image captioning

Cai, Wenjie, Xiong, Zheng, Sun, Xianfang, Rosin, Paul L., Jin, Longcun and Peng, Xinyi 2020. Panoptic segmentation-based attention for image captioning. Applied Sciences 10 (1) , 391. 10.3390/app10010391

PDF - Published Version (4MB)
Available under License Creative Commons Attribution.

Abstract

Image captioning is the task of generating textual descriptions of images. To obtain a better image representation, attention mechanisms have been widely adopted in image captioning. However, in existing models with detection-based attention, the rectangular attention regions are not fine-grained: they contain irrelevant regions (e.g., background or overlapping objects) around the object, which makes the model generate inaccurate captions. To address this issue, we propose panoptic segmentation-based attention, which performs attention at the mask level (i.e., over the shape of the main part of an instance). Our approach extracts feature vectors from the corresponding segmentation regions, which gives a more fine-grained representation than current attention mechanisms. Moreover, to process features of different classes independently, we propose a dual-attention module that is generic and can be applied to other frameworks. Experimental results showed that our model could recognize overlapping objects and understand the scene better. Our approach achieved competitive performance against state-of-the-art methods. Our code is publicly available.
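The difference between box-level and mask-level regions can be illustrated with a small pure-Python sketch. This is not the authors' released code: the function names (`masked_average_pool`, `box_to_mask`, `attend`, `dual_attention`) are hypothetical, average pooling over a segment's pixels is assumed as the way a mask-level feature vector is formed, and scaled dot-product attention is used as a stand-in for whatever attention function the paper uses. The dual-attention part assumes that "different classes" means panoptic "thing" and "stuff" segments, each attended separately with the two context vectors concatenated; treat that split as an assumption.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
FeatureMap = List[List[Vector]]  # H x W grid of C-dimensional feature vectors
Mask = List[List[bool]]          # H x W binary mask for one segment (hypothetical input)


def masked_average_pool(features: FeatureMap, mask: Mask) -> Vector:
    """Average the feature vectors of only the pixels covered by the mask."""
    channels = len(features[0][0])
    total = [0.0] * channels
    count = 0
    for feat_row, mask_row in zip(features, mask):
        for vec, inside in zip(feat_row, mask_row):
            if inside:
                for c in range(channels):
                    total[c] += vec[c]
                count += 1
    if count == 0:
        raise ValueError("mask covers no pixels")
    return [t / count for t in total]


def box_to_mask(height: int, width: int, box: Tuple[int, int, int, int]) -> Mask:
    """Turn a half-open box (y0, x0, y1, x1) into a mask, mimicking a detection region."""
    y0, x0, y1, x1 = box
    return [[y0 <= y < y1 and x0 <= x < x1 for x in range(width)]
            for y in range(height)]


def softmax(scores: Sequence[float]) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def attend(query: Vector, regions: Sequence[Vector]) -> Tuple[Vector, Vector]:
    """Scaled dot-product attention over region vectors; returns (weights, context)."""
    scale = math.sqrt(len(query))
    scores = [sum(q * r for q, r in zip(query, region)) / scale for region in regions]
    weights = softmax(scores)
    context = [sum(w * region[c] for w, region in zip(weights, regions))
               for c in range(len(query))]
    return weights, context


def dual_attention(query: Vector, thing_regions: Sequence[Vector],
                   stuff_regions: Sequence[Vector]) -> Vector:
    """Hypothetical dual attention: attend to 'thing' and 'stuff' segments
    independently, then concatenate the two context vectors."""
    _, thing_ctx = attend(query, thing_regions)
    _, stuff_ctx = attend(query, stuff_regions)
    return thing_ctx + stuff_ctx


if __name__ == "__main__":
    obj, bg = [1.0, 0.0], [0.0, 1.0]
    # An L-shaped object occupying 3 of 4 pixels; the fourth pixel is background.
    features = [[obj, obj], [obj, bg]]
    object_mask = [[True, True], [True, False]]
    print(masked_average_pool(features, object_mask))                       # [1.0, 0.0]
    print(masked_average_pool(features, box_to_mask(2, 2, (0, 0, 2, 2))))  # [0.75, 0.25]
```

In this toy case the bounding box of the L-shaped object must also cover the background pixel, so the pooled vector is diluted by background features, while the mask-level vector contains only object features. This mirrors the motivation given in the abstract for attending over segmentation regions instead of rectangles.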

Item Type: Article
Date Type: Publication
Status: Published
Schools: Computer Science & Informatics
Publisher: MDPI
ISSN: 2076-3417
Date of First Compliant Deposit: 2 April 2020
Date of Acceptance: 1 January 2020
Last Modified: 03 Apr 2020 15:24
URI: http://orca.cf.ac.uk/id/eprint/130681
