Language Model / Multimodal

Number of openings

1 person

Applicants with an integrated understanding of both natural language processing and computer vision

The papers below are reference material; the paper to work on can be decided after discussion.

References

Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language

Large pretrained (e.g., "foundation") models exhibit distinct capabilities depending on the domain of data they are trained on. While these domains are generic, they may only barely overlap. For example, visual-language models (VLMs) are trained on Internet-scale image captions, but large language models (LMs) are further trained on Internet-scale text with no images (e.g., spreadsheets, SAT questions, code).

https://socraticmodels.github.io/

ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks

We present ALFRED (Action Learning From Realistic Environments and Directives), a benchmark for learning a mapping from natural language instructions and egocentric vision to sequences of actions for household tasks. ALFRED includes long, compositional tasks with non-reversible state changes to shrink the gap between research benchmarks and real-world applications.

https://arxiv.org/abs/1912.01734

Emergent Abilities of Large Language Models

Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks. This paper instead discusses an unpredictable phenomenon that we refer to as emergent abilities of large language models. We consider an ability to be emergent if it is not present in smaller models but is present in larger models.

https://arxiv.org/abs/2206.07682

Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change)

The recent advances in large language models (LLMs) have transformed the field of natural language processing (NLP). From GPT-3 to PaLM, the state-of-the-art performance on natural language tasks is being pushed forward with every new large language model.

https://arxiv.org/abs/2206.10498

Learning to Play Minecraft with Video PreTraining (VPT)

We trained a neural network to play Minecraft by Video PreTraining (VPT) on a massive unlabeled video dataset of human Minecraft play, while using only a small amount of labeled contractor data. With fine-tuning, our model can learn to craft diamond tools, a task that usually takes proficient humans over

https://openai.com/blog/vpt/