[WIP] Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts

## In a nutshell
A dataset of 12 million image-text pairs for vision-and-language pre-training.

### Paper link
https://arxiv.org/pdf/2102.08981.pdf

### Authors/Affiliations
Google Research

### Submission date (yyyy/MM/dd)
CVPR2021

## Motivation

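Until now, vision-and-language pre-training has relied on data from downstream tasks such as image captioning and VQA.
Repurposing these task-specific datasets for pre-training has been very effective. However, because their data could only be collected under constraints that matched the original task, both the scale and the diversity of those datasets were limited.

This work removes those constraints and builds a large-scale dataset specifically for vision-and-language pre-training.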

## Composition

<img width="521" alt="Screen Shot 2021-06-12 at 2 16 07" src="https://user-images.githubusercontent.com/10952293/121725886-cc31a980-cb24-11eb-9ffe-ea6dd358fc64.png">

<img width="473" alt="Screen Shot 2021-06-12 at 2 16 19" src="https://user-images.githubusercontent.com/10952293/121725906-d0f65d80-cb24-11eb-91e6-4a2b503fa270.png">

<img width="513" alt="Screen Shot 2021-06-12 at 2 16 24" src="https://user-images.githubusercontent.com/10952293/121725914-d3f14e00-cb24-11eb-9b51-ec6b9f9760ce.png">


## Collection Process

## Benchmarks

<img width="499" alt="Screen Shot 2021-06-12 at 2 16 32" src="https://user-images.githubusercontent.com/10952293/121725922-d6ec3e80-cb24-11eb-8f17-d3c7628aa6af.png">

<img width="1013" alt="Screen Shot 2021-06-12 at 2 16 45" src="https://user-images.githubusercontent.com/10952293/121725996-e8cde180-cb24-11eb-8ee1-89a5d6682eda.png">

<img width="1010" alt="Screen Shot 2021-06-12 at 2 16 54" src="https://user-images.githubusercontent.com/10952293/121726001-ebc8d200-cb24-11eb-82fc-fa4f564da9e9.png">

<img width="1019" alt="Screen Shot 2021-06-12 at 2 17 10" src="https://user-images.githubusercontent.com/10952293/121726010-eec3c280-cb24-11eb-9098-9d0ef311aca9.png">


## Comments
