
Commit d12ea0f

Rebuild
1 parent a0bdb48 commit d12ea0f

File tree: 368 files changed (+253393 / -246019 lines changed)


docs/_downloads/032d653a4f5a9c1ec32b9fc7c989ffe1/seq2seq_translation_tutorial.ipynb

Lines changed: 1 addition & 1 deletion
@@ -423,7 +423,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.8.10"
+"version": "3.10.11"
 }
 },
 "nbformat": 4,

docs/_downloads/03a48646520c277662581e858e680809/model_parallel_tutorial.ipynb

Lines changed: 1 addition & 1 deletion
@@ -204,7 +204,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.8.10"
+"version": "3.10.11"
 }
 },
 "nbformat": 4,
Lines changed: 48 additions & 60 deletions
@@ -1,36 +1,37 @@
 # -*- coding: utf-8 -*-
 """
-TorchMultimodal Tutorial: Finetuning FLAVA
+TorchMultimodal Tutorial: FLAVA Finetuning
 ============================================
+
+**Translation:** `κΉ€μ°¬ <https://github.com/chanmuzi>`__
+
 """
 
+
 ######################################################################
-# Multimodal AI has recently become very popular owing to its ubiquitous
-# nature, from use cases like image captioning and visual search to more
-# recent applications like image generation from text. **TorchMultimodal
-# is a library powered by Pytorch consisting of building blocks and end to
-# end examples, aiming to enable and accelerate research in
-# multimodality**.
-#
-# In this tutorial, we will demonstrate how to use a **pretrained SoTA
-# model called** `FLAVA <https://arxiv.org/pdf/2112.04482.pdf>`__ **from
-# TorchMultimodal library to finetune on a multimodal task i.e. visual
-# question answering** (VQA). The model consists of two unimodal transformer
-# based encoders for text and image and a multimodal encoder to combine
-# the two embeddings. It is pretrained using contrastive, image text matching and
-# text, image and multimodal masking losses.
+# Multimodal AI has recently seen rapidly growing use, from image captioning
+# and visual search to more recent applications such as generating images
+# from text. **TorchMultimodal is a library powered by PyTorch that provides
+# building blocks and end-to-end examples to enable and accelerate multimodal research**.
+#
+# In this tutorial we demonstrate how to use **the pretrained SoTA model** `FLAVA <https://arxiv.org/pdf/2112.04482.pdf>`__
+# **from the TorchMultimodal library to finetune on a multimodal task, visual question answering (VQA).**
+# The model consists of two unimodal transformer-based encoders for text and image
+# and a multimodal encoder that combines the two embeddings.
+# It is pretrained using contrastive, image-text matching, and text, image, and multimodal masking losses.
+
 
 
 ######################################################################
-# Installation
+# Installation
 # -----------------
-# We will use TextVQA dataset and ``bert tokenizer`` from Hugging Face for this
-# tutorial. So you need to install datasets and transformers in addition to TorchMultimodal.
+# For this tutorial we will use the TextVQA dataset and the ``bert tokenizer`` from Hugging Face.
+# So you need to install datasets and transformers in addition to TorchMultimodal.
 #
 # .. note::
-#
-#    When running this tutorial in Google Colab, install the required packages by
-#    creating a new cell and running the following commands:
+#
+#    When running this tutorial in Google Colab, install the required packages
+#    by creating a new cell and running the following commands:
 #
 # .. code-block::
 #
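The install cell itself falls outside the hunk above, so the exact commands are not shown here. As a minimal, hedged sketch (not part of the tutorial), the prerequisites it names can be checked from Python before continuing; the package names torchmultimodal, datasets, and transformers are taken from the note above and from the imports later in this diff.

# Illustrative check only: confirm the packages named above are importable.
import importlib.util

for pkg in ("torchmultimodal", "datasets", "transformers"):
    found = importlib.util.find_spec(pkg) is not None
    print(f"{pkg}: {'installed' if found else 'missing -- install it before continuing'}")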
@@ -40,32 +41,27 @@
 #
 
 ######################################################################
-# Steps
+# Steps
 # -----
 #
-# 1. Download the Hugging Face dataset to a directory on your computer by running the following command:
+# 1. Download the Hugging Face dataset to a directory on your computer by running the following command:
 #
 # .. code-block::
 #
 #    wget http://dl.fbaipublicfiles.com/pythia/data/vocab.tar.gz
 #    tar xf vocab.tar.gz
 #
 # .. note::
-#    If you are running this tutorial in Google Colab, run these commands
-#    in a new cell and prepend these commands with an exclamation mark (!)
+#    If you are running this tutorial in Google Colab, run these commands in a new cell and prepend them with an exclamation mark (!).
 #
 #
-# 2. For this tutorial, we treat VQA as a classification task where
-#    the inputs are images and question (text) and the output is an answer class.
-#    So we need to download the vocab file with answer classes and create the answer to
-#    label mapping.
+# 2. In this tutorial we treat VQA as a classification task where the inputs are an image and a question (text) and the output is an answer class.
+#    So we need to download the vocab file with the answer classes and create the answer-to-label mapping.
 #
-# We also load the `textvqa
-# dataset <https://arxiv.org/pdf/1904.08920.pdf>`__ containing 34602 training samples
-# (images,questions and answers) from Hugging Face
+# We also load the `textvqa dataset <https://arxiv.org/pdf/1904.08920.pdf>`__ from Hugging Face,
+# which contains 34602 training samples (images, questions, and answers).
 #
-# We see there are 3997 answer classes including a class representing
-# unknown answers.
+# We can see that there are 3997 answer classes, including a class representing unknown answers.
 #
 
 with open("data/vocabs/answers_textvqa_more_than_1.txt") as f:
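The body of this vocab-loading block is truncated by the hunk. A minimal sketch of the answer-to-label mapping described in step 2, assuming the vocab file lists one answer class per line; the name answer_to_idx is an illustrative assumption, while vocab is the name used later in the diff.

# Sketch only: build the answer -> label index mapping from the vocab file.
with open("data/vocabs/answers_textvqa_more_than_1.txt") as f:
    vocab = [line.strip() for line in f if line.strip()]

answer_to_idx = {answer: idx for idx, answer in enumerate(vocab)}
print(f"{len(vocab)} answer classes")  # the prose above reports 3997 classes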
@@ -81,7 +77,7 @@
 dataset = load_dataset("textvqa")
 
 ######################################################################
-# Lets display a sample entry from the dataset:
+# Let's display a sample entry from the dataset:
 #
 
 import matplotlib.pyplot as plt
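The plotting code itself is outside the hunk. A rough sketch of displaying one training sample, assuming the TextVQA records expose "image", "question", and "answers" fields (the field names are assumptions about the dataset schema, not taken from the diff).

# Sketch: show the first training sample's image alongside its question and answers.
import matplotlib.pyplot as plt

sample = dataset["train"][0]
print("Question:", sample["question"])
print("Answers:", sample["answers"])
plt.imshow(sample["image"])
plt.axis("off")
plt.show()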
@@ -95,12 +91,10 @@
 
 
 ######################################################################
-# 3. Next, we write the transform function to convert the image and text into
-# Tensors consumable by our model - For images, we use the transforms from
-# torchvision to convert to Tensor and resize to uniform sizes - For text,
-# we tokenize (and pad) them using the ``BertTokenizer`` from Hugging Face -
-# For answers (i.e. labels), we take the most frequently occurring answer
-# as the label to train with:
+# 3. Next, we write a transform function to convert the image and text into Tensors that our model can consume.
+# - For images, we use the transforms from torchvision to convert to Tensor and resize to a uniform size.
+# - For text, we tokenize (and pad) it using the ``BertTokenizer`` from Hugging Face.
+# - For answers (that is, labels), we take the most frequently occurring answer as the label to train with:
 #
 
 import torch
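The full transform body is not shown in these hunks; only its signature, def transform(tokenizer, input), appears in later hunk headers. A condensed sketch of the three bullet points above. The image size, padding length, and dataset field names are assumptions, and answer_to_idx is carried over from the earlier sketch.

# Sketch of step 3: images -> resized Tensors, questions -> tokenized/padded ids,
# answers -> label index of the most frequent answer. Sizes and field names are assumed.
from collections import Counter
from functools import partial

import torch
from torchvision import transforms
from transformers import BertTokenizer

IMAGE_SIZE = 224      # assumed uniform image size
MAX_TEXT_LEN = 512    # assumed padding length

image_transform = transforms.Compose(
    [transforms.ToTensor(), transforms.Resize([IMAGE_SIZE, IMAGE_SIZE])]
)

def transform(tokenizer, input):
    batch = {}
    # images: convert to Tensor and resize to a uniform size
    batch["image"] = [image_transform(img.convert("RGB")) for img in input["image"]]
    # text: tokenize and pad the questions
    batch.update(
        tokenizer(
            input["question"],
            return_tensors="pt",
            padding="max_length",
            max_length=MAX_TEXT_LEN,
            truncation=True,
        )
    )
    # answers: use the most frequently occurring answer as the training label
    batch["answers"] = torch.tensor(
        [answer_to_idx.get(Counter(a).most_common(1)[0][0], 0) for a in input["answers"]]
    )
    return batch

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
dataset.set_transform(partial(transform, tokenizer))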
@@ -133,25 +127,21 @@ def transform(tokenizer, input):
 
 
 ######################################################################
-# 4. Finally, we import the ``flava_model_for_classification`` from
-# ``torchmultimodal``. It loads the pretrained FLAVA checkpoint by default and
-# includes a classification head.
+# 4. Finally, we import ``flava_model_for_classification`` from ``torchmultimodal``.
+# By default it loads the pretrained FLAVA checkpoint and includes a classification head.
 #
-# The model forward function passes the image through the visual encoder
-# and the question through the text encoder. The image and question
-# embeddings are then passed through the multimodal encoder. The final
-# embedding corresponding to the CLS token is passed through a MLP head
-# which finally gives the probability distribution over each possible
-# answers.
+# The model's forward function passes the image through the visual encoder and the question through the text encoder.
+# The image and question embeddings are then passed through the multimodal encoder.
+# The final embedding corresponds to the CLS token and is passed through an MLP head, which gives the probability distribution over each possible answer.
 #
 
 from torchmultimodal.models.flava.model import flava_model_for_classification
 model = flava_model_for_classification(num_classes=len(vocab))
 
 
 ######################################################################
-# 5. We put together the dataset and model in a toy training loop to
-# demonstrate how to train the model for 3 iterations:
+# 5. We put the dataset and model together in a toy training loop
+# to demonstrate how to train the model for 3 iterations:
 #
 
 from torch import nn
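The loop body itself follows in lines not shown here. A hedged sketch of the toy loop described in step 5, assuming the classification wrapper accepts text, image, and labels keyword arguments and returns an object with a loss field; those names, the batch size, and the optimizer choice are assumptions about the interface rather than details confirmed by the diff.

# Sketch of step 5: a toy training loop that stops after 3 iterations.
import torch
from torch.utils.data import DataLoader

BATCH_SIZE = 2   # assumed toy batch size
MAX_STEPS = 3    # "3 iterations" from the text above

train_loader = DataLoader(dataset["train"], batch_size=BATCH_SIZE)
optimizer = torch.optim.AdamW(model.parameters())
model.train()

for step, batch in enumerate(train_loader):
    optimizer.zero_grad()
    # forward pass: question through the text encoder, image through the visual
    # encoder, then the multimodal encoder and classification head
    out = model(text=batch["input_ids"], image=batch["image"], labels=batch["answers"])
    out.loss.backward()
    optimizer.step()
    print(f"step {step}: loss = {out.loss.item():.4f}")
    if step + 1 >= MAX_STEPS:
        break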
@@ -177,14 +167,12 @@ def transform(tokenizer, input):
 
 
 ######################################################################
-# Conclusion
+# Conclusion
 # -------------------
 #
-# This tutorial introduced the basics around how to finetune on a
-# multimodal task using FLAVA from TorchMultimodal. Please also check out
-# other examples from the library like
-# `MDETR <https://github.com/facebookresearch/multimodal/tree/main/torchmultimodal/models/mdetr>`__
-# which is a multimodal model for object detection and
-# `Omnivore <https://github.com/facebookresearch/multimodal/blob/main/torchmultimodal/models/omnivore.py>`__
-# which is multitask model spanning image, video and 3d classification.
+# This tutorial introduced the basics of how to finetune on a
+# multimodal task using FLAVA from TorchMultimodal. Please also check out other examples from the library,
+# such as `MDETR <https://github.com/facebookresearch/multimodal/tree/main/torchmultimodal/models/mdetr>`__, a multimodal model for object detection, and
+# `Omnivore <https://github.com/facebookresearch/multimodal/blob/main/torchmultimodal/models/omnivore.py>`__, a multitask model spanning image, video, and 3D classification.
+#
 #

docs/_downloads/09dab7b70298bcb798ab79840558b800/maskedtensor_sparsity.ipynb

Lines changed: 1 addition & 1 deletion
@@ -316,7 +316,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.8.10"
+"version": "3.10.11"
 }
 },
 "nbformat": 4,

docs/_downloads/0e6615c5a7bc71e01ff3c51217ea00da/tensorqs_tutorial.ipynb

Lines changed: 1 addition & 1 deletion
@@ -358,7 +358,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.8.10"
+"version": "3.10.11"
 }
 },
 "nbformat": 4,

docs/_downloads/11f1adacb7d237f2041ce267ac38abb6/saveloadrun_tutorial.ipynb

Lines changed: 1 addition & 1 deletion
@@ -139,7 +139,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.8.10"
+"version": "3.10.11"
 }
 },
 "nbformat": 4,
