
What is the CLEVR dataset?

CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning

When building artificial intelligence systems that can reason and answer questions about visual data, we need diagnostic tests to analyze our progress and discover shortcomings. Existing benchmarks for visual question answering can help, but have strong biases that models can exploit to correctly answer questions without reasoning.

https://cs.stanford.edu/people/jcjohns/clevr/

The CLEVR dataset pairs each image with questions and answers such as the following. Question: Are there an equal number of large things and metal spheres? Answer: No

Main Dataset

This is the main dataset used in the paper. It consists of:

• A training set of 70,000 images and 699,989 questions

• A validation set of 15,000 images and 149,991 questions

• A test set of 15,000 images and 14,988 questions

• Answers for all train and val questions

• Scene graph annotations for train and val images giving ground-truth locations, attributes, and relationships for objects

• Functional program representations for all training and validation images
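To make the scene graph and functional program annotations more concrete, here is a minimal sketch of how a question like "Are there an equal number of large things and metal spheres?" can be answered by composing small functions over a list of objects. The scene below is hypothetical and heavily simplified, and the helper names (`filter_objects`, `count`, `equal_integer`) are illustrative assumptions, not the dataset's actual file format or program vocabulary.

```python
# Hypothetical, simplified scene: each object is a dict of attributes.
# Real CLEVR scene annotations carry more information (e.g. locations,
# relationships); this is only an illustration.
scene = [
    {"size": "large", "material": "metal", "shape": "sphere", "color": "red"},
    {"size": "large", "material": "rubber", "shape": "cube", "color": "blue"},
    {"size": "small", "material": "metal", "shape": "cube", "color": "cyan"},
    {"size": "small", "material": "rubber", "shape": "cylinder", "color": "gray"},
]


def filter_objects(objects, **attrs):
    """Keep objects whose attributes match every given key/value pair."""
    return [o for o in objects if all(o[k] == v for k, v in attrs.items())]


def count(objects):
    """Number of objects in the set."""
    return len(objects)


def equal_integer(a, b):
    """Compare two counts and return a yes/no answer."""
    return "yes" if a == b else "no"


def answer_question(objects):
    """Compose the steps: two filter+count branches, then a comparison."""
    large_things = count(filter_objects(objects, size="large"))
    metal_spheres = count(filter_objects(objects, material="metal", shape="sphere"))
    return equal_integer(large_things, metal_spheres)


print(answer_question(scene))  # 2 large things vs. 1 metal sphere -> "no"
```

The point of the sketch is that the answer follows from a chain of elementary steps, which is what makes the dataset useful for diagnosing where a model's reasoning fails.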

CLEVR-CoGenT

CoGenT stands for Compositional Generalization Test.

There are two conditions, and each uses a different composition of shapes and colors. A model is therefore trained on Condition A and evaluated on Condition B. This is harder than CLEVR because the model has to reason about visual combinations it has not seen during training.

Condition A

• Cubes are gray, blue, brown, or yellow

• Cylinders are red, green, purple, or cyan

• Spheres can have any color

Condition B

• Cubes are red, green, purple, or cyan

• Cylinders are gray, blue, brown, or yellow

• Spheres can have any color
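The two conditions can be written down as a small lookup table. The sketch below encodes the color restrictions listed above and checks whether a given shape/color pair may appear under a condition; the names `COGENT_COLORS` and `allowed` are made up for illustration and are not part of any official tooling.

```python
# Color palettes per condition, as listed above.
# Spheres are intentionally absent: they can have any color in both conditions.
COGENT_COLORS = {
    "A": {
        "cube": {"gray", "blue", "brown", "yellow"},
        "cylinder": {"red", "green", "purple", "cyan"},
    },
    "B": {
        "cube": {"red", "green", "purple", "cyan"},
        "cylinder": {"gray", "blue", "brown", "yellow"},
    },
}


def allowed(condition, shape, color):
    """Return True if a (shape, color) pair may appear under the condition."""
    palette = COGENT_COLORS[condition].get(shape)
    # No palette entry means the shape is unrestricted (spheres).
    return palette is None or color in palette


print(allowed("A", "cube", "red"))      # False: red cubes never appear in A
print(allowed("B", "cube", "red"))      # True: red cubes appear only in B
print(allowed("A", "sphere", "red"))    # True: spheres are unrestricted
```

A model trained only on Condition A never sees a red cube, so evaluating on Condition B tests whether it has learned shape and color as separate concepts rather than memorizing their co-occurrence.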

CLEVR variations

CLEVR-Ref+

CVPR 2019 Open Access Repository

CLEVR-Ref+: Diagnosing Visual Reasoning With Referring Expressions. Runtao Liu, Chenxi Liu, Yutong Bai, Alan L. Yuille. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 4185-4194. Referring object detection and referring image segmentation are important tasks that require joint understanding of visual information and natural language.

https://openaccess.thecvf.com/content_CVPR_2019/html/Liu_CLEVR-Ref_Diagnosing_Visual_Reasoning_With_Referring_Expressions_CVPR_2019_paper.html

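CLEVR is a VQA dataset in which a question is asked and an answer is given. CLEVR-Ref+ instead provides a description of an object (a referring expression), and the target is that object marked with a bounding box or a segmentation mask.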

CLEVR-Dialog


https://arxiv.org/pdf/1903.03166.pdf

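CLEVR-Dialog extends CLEVR to multi-turn reasoning carried out through a dialog. One shortcoming is that it does not cover scenarios in which the model receives an ambiguous question and asks a clarifying question back to arrive at a more certain answer; it is closer to several VQA questions lined up in sequence.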

CLEVR-X

Question: There is a purple metallic ball; what number of cyan objects are right of it? Answer: 1 Explanation: There is a cyan cylinder which is on the right side of the purple metallic ball.

CLEVR-X extends the original CLEVR by attaching an Explanation to each sample: a factual explanation of why the Answer is correct.

GitHub - ExplainableML/CLEVR-X: CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations

By Leonard Salewski, A. Sophia Koepke, Hendrik Lensch and Zeynep Akata. Published in Springer LNAI xxAI. A preprint is available on arXiv. This repository is the official implementation of CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations. It contains code to generate the CLEVR-X dataset and a PyTorch dataset implementation.

https://github.com/ExplainableML/CLEVR-X

CLEVR-Math

Question: Take away 2 matte cylinders. How many objects are left? Answer: 7

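CLEVR-Math poses questions of the form "what happens if ...?". The model first has to understand the scene through visual perception and then apply arithmetic reasoning to produce the answer, which makes it a somewhat harder problem.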
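The two stages (perceive the scene, then do arithmetic) can be sketched as follows for the example question "Take away 2 matte cylinders. How many objects are left?". The scene is hypothetical, the function `take_away` is an illustrative name, and treating "matte" as the rubber material is an assumption about CLEVR's question vocabulary.

```python
# Hypothetical scene with 9 objects; in practice this would come from
# visual perception of the image. "matte" is assumed to mean rubber.
scene = (
    [{"shape": "cylinder", "material": "rubber"}] * 3
    + [{"shape": "cube", "material": "metal"}] * 2
    + [{"shape": "sphere", "material": "rubber"}] * 2
    + [{"shape": "cylinder", "material": "metal"}]
    + [{"shape": "sphere", "material": "metal"}]
)


def take_away(objects, n, **attrs):
    """Return how many objects remain after removing n objects matching attrs."""
    matching = [o for o in objects if all(o[k] == v for k, v in attrs.items())]
    if len(matching) < n:
        # The action is only valid if enough matching objects exist.
        raise ValueError("not enough matching objects to remove")
    return len(objects) - n


# Step 1 (perception): the scene holds 9 objects, 3 of them matte cylinders.
# Step 2 (arithmetic): 9 - 2 = 7.
print(take_away(scene, 2, shape="cylinder", material="rubber"))  # 7
```

The perception step matters even for simple subtraction: the total object count and the check that enough matte cylinders exist both have to be read off the image.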

CLEVR-Math: A Dataset for Compositional Language, Visual and Mathematical Reasoning

We introduce CLEVR-Math, a multi-modal math word problems dataset consisting of simple math word problems involving addition/subtraction, represented partly by a textual description and partly by an image illustrating the scenario. The text describes actions performed on the scene that is depicted in the image.

https://arxiv.org/abs/2208.05358