168 commits
d909ae0
feat: add basic webdataset
spravil Mar 21, 2024
e233676
fix: dim of cls token
spravil Apr 22, 2024
9986691
feat: simple console logging
spravil Apr 22, 2024
c47b6c1
fix: add attention mask to cross entropy loss
spravil Apr 22, 2024
70823e1
feat: allow multiple loss functions
spravil Apr 22, 2024
0c87d91
fix: register nce loss
spravil Apr 22, 2024
b652a7d
feat: add dataloader for webdataset
spravil Apr 22, 2024
b0e933a
chore: add config
spravil Apr 22, 2024
d8d5a5f
feat: add nicer logging to wandb
spravil May 7, 2024
4ea65c8
fix: hardcoded batches in web loader
spravil May 7, 2024
dfe88c9
chore: update coca config
spravil May 7, 2024
5a3e844
fix: rebase
spravil May 7, 2024
043384d
fix: print only on main rank in component factory
spravil May 9, 2024
3cd9244
fix: total loss average logging
spravil May 9, 2024
b09d20e
fix: cuda env and run script
spravil May 9, 2024
e09745d
chore: update coca config
spravil May 9, 2024
5a74dee
fix: print parameters and done only on main rank
spravil May 9, 2024
f7b725c
chore: update coca wds config
spravil May 9, 2024
c7308e2
fix: tokenizer config of coca
spravil May 9, 2024
32d0b19
fix: add multinode splitter to webdataset
spravil May 9, 2024
55c039f
fix: webdataset slow loading
spravil May 9, 2024
966d237
fix: add batching
spravil May 9, 2024
dacf639
fix: add more options to webloader
spravil May 9, 2024
b40ecd5
fix: webloader
spravil May 9, 2024
a9ce132
fix: dataset factory
spravil May 9, 2024
63ef47c
fix: webdataset
spravil May 9, 2024
d6d84dc
fix: loss accumulation
spravil May 10, 2024
3d04f78
fix: loss average for eval
spravil May 10, 2024
e65a3cd
refactor: remove unused code from coca collator
spravil May 10, 2024
622570d
fix: coca collator
spravil May 10, 2024
efedc77
fix: loss normalization
spravil May 10, 2024
eba23a9
feat: add clip loss
spravil May 10, 2024
9bdc830
fix: use clip loss in coca
spravil May 10, 2024
b3aff91
chore: update coca webdataset config
spravil May 10, 2024
dffe644
chore: update coca webdataset config
spravil May 10, 2024
d8bdebc
fix: gradient accumulation
spravil May 13, 2024
244fac9
fix: normalize cls token of coca
spravil May 13, 2024
0524acd
feat: add weight option to loss config
spravil May 13, 2024
df48b62
fix: log weighted loss
spravil May 13, 2024
2945a03
feat: add cosine scheduler with warmup
spravil May 13, 2024
2d0e3f3
fix: loss logging
spravil May 13, 2024
1c0993f
feat: add local clip loss
spravil May 13, 2024
0e3b239
fix: clip loss
spravil May 13, 2024
7459ee0
fix: add barrier to eval
spravil May 13, 2024
1c0f456
feat: print global batch size
spravil May 13, 2024
a04bf97
fix: force integer for rank env variable
spravil May 13, 2024
9086af6
fix: validation set loading
spravil May 13, 2024
351990a
fix: webdataset splitter
spravil May 13, 2024
c90df9d
feat: add drop_last to webloader
spravil May 13, 2024
860b8fd
fix: val dataset with webdataset
spravil May 13, 2024
962c4ca
refactor: evaluator
spravil May 13, 2024
2d1ea92
fix: nodesplitter in webdataset
spravil May 13, 2024
1fb4fba
feat: add wandb grouping
spravil May 13, 2024
22c77cd
chore: update coca config
spravil May 13, 2024
6d980e7
fix: coca config
spravil May 13, 2024
dfb7884
feat: add universal multimodal dataset
spravil Jun 10, 2024
da27f0d
feat: add flatten_dict function
spravil Jun 10, 2024
016bed3
fix: webdataset
spravil Jun 10, 2024
faeec41
fix: web dataset integration
spravil Jun 11, 2024
1c6fcff
chore: add todo statement
spravil Jun 11, 2024
e90a8d5
chore: merge remote-tracking branch 'origin/main' into exp/vision_lan…
spravil Jun 11, 2024
42f469b
chore: update start script
spravil Jun 11, 2024
03082b8
chore: merge remote-tracking branch 'origin/main' into exp/vision_lan…
spravil Jun 11, 2024
78efe14
fix: coca webdataset config
spravil Jun 11, 2024
deb0788
feat: extend vision transformer model to video data
SogolHaghighat Jun 7, 2024
4a9695d
test: extend vision transformer test to video data
SogolHaghighat Jun 7, 2024
f04243a
test: add and update config for testing vision transformer with image…
SogolHaghighat Jun 7, 2024
cad8d11
feat: add video transforms
spravil Jun 11, 2024
3653922
chore: add video config
spravil Jun 11, 2024
3b72cd2
fix: video coca
spravil Jun 11, 2024
0c0317a
feat: add conformer audio encoder
manasMauryax Mar 27, 2024
5f63246
feat: make CoCa audio compatible
manasMauryax Mar 27, 2024
857867e
test: change config and dummy dataset for E2E CoCa test
manasMauryax Mar 28, 2024
3f10bfa
fix: webdataset with multiple dataset builders
spravil Jun 17, 2024
599ed66
fix: vision transformer config
spravil Jun 17, 2024
2d3c3fa
refactor: add pyav dependency
SogolHaghighat Jul 8, 2024
88ae41f
fix: block_size for video coca
SogolHaghighat Jul 8, 2024
fe23a69
test: add more test cases and merge video and image tests for vision …
SogolHaghighat Jul 8, 2024
256f6b2
fix: hard coded num_frames in video transform
SogolHaghighat Jul 12, 2024
28a3b82
refactor: use decord for loading videos
sthoduka Aug 5, 2024
c0e8d1f
fix: incorrect variable usage and audio input shape
May 14, 2024
83accc1
fix: to avoid torch.tensor(tensor)
May 14, 2024
54906d6
fix: add argument to ignore padding indices
May 14, 2024
67d3778
test: uptate tests to comply with changes
May 14, 2024
0b1698d
chore: add configs
May 14, 2024
b89bff8
feat: allow masking of "pad" keys
May 28, 2024
d31d0a7
feat: implement Conformer from scratch
May 28, 2024
4c42b27
test: fix to comply to changes
May 28, 2024
1e09561
test: remove deprecated test
May 28, 2024
68f6e5e
chore: fix configs to comply to changes
May 28, 2024
94efdb8
fix: accelerate import
thomaschhh Jun 3, 2024
b13c8cb
refactor: introduce global constants
thomaschhh Jun 3, 2024
70652f1
fix: constant renaming
thomaschhh Jun 10, 2024
f07cfda
fix: disable mamba imports
Jun 12, 2024
11a28e2
chore: update audio coca arrow dataset config
Jun 12, 2024
ced54c8
feat: add audio transform
thomaschhh Jun 13, 2024
8ad8e86
fix: cross entropy loss ignore index
thomaschhh Jun 13, 2024
31d72d0
fix: prepare_sample
thomaschhh Jun 13, 2024
3a14b95
chore: apply changes from origin/feat/audio_coca
thomaschhh Jun 13, 2024
68a920f
Merge branch 'exp/vision_language_coca' into feat/coca
sthoduka Sep 17, 2024
789f4f4
Merge branch 'main' into feat/coca
sthoduka Sep 17, 2024
2335dcd
chore: add separate vision and audio configs
spravil Jun 20, 2024
420041b
fix: coca n_query parameter
spravil Jun 20, 2024
9236a46
fix: copy paste error
manasMauryax Jul 2, 2024
edbc1ac
feat: allow for training all modalities
thomaschhh Aug 1, 2024
845f961
fix: revert back to multiple builders assertion
manasMauryax Aug 6, 2024
35e6812
refactor: copy decord from github exp/vision_languauge_coca branch
manasMauryax Aug 6, 2024
f6f7023
feat: add video-audio-text sample generation
manasMauryax Aug 6, 2024
21c9ee4
fix: video-audio-text forward pass
manasMauryax Aug 6, 2024
9a78231
fix: gathered embeddings
manasMauryax Aug 6, 2024
44fa306
fix: use torchaudio to load audio from videos too
sthoduka Aug 7, 2024
a2112da
refactor: simplify if statement
manasMauryax Aug 9, 2024
8c12b8a
refactor: improve readability
manasMauryax Aug 9, 2024
efae855
feat: add first draft of single batch mixed modality
manasMauryax Aug 9, 2024
c114506
fix: load audio from video file only if it exists, and behave the sam…
sthoduka Aug 9, 2024
414031a
fix: keep original dimension for audio when averaging channels
sthoduka Aug 12, 2024
7770b1c
fix: group input_ids based on modality; the order is determined by th…
sthoduka Aug 15, 2024
d025dd4
refactor: separate forward pass for audio-image and video-audio
manasMauryax Aug 16, 2024
8826758
fix: audio-image forward o/p for contrastive loss
manasMauryax Aug 16, 2024
5435d4b
refactor: revert back to wds.torch_audio
manasMauryax Aug 16, 2024
7c3c4ba
fix: only collect text samples once for case where a single dataset h…
sthoduka Aug 19, 2024
bc28749
refactor: coca: separate vision into image and video, and refactor fo…
sthoduka Aug 19, 2024
c8b9f65
fix: webdataset builder: remove constraint that all dataset builders …
sthoduka Aug 19, 2024
ea8910c
fix: add config parameter for video-audio-text dataset as a special case
sthoduka Aug 19, 2024
aa66bc6
fix: only use audio from video-audio sample if specified
sthoduka Aug 19, 2024
9efecdb
chore: remove comment
thomaschhh Aug 20, 2024
6d53b39
fix: set correct type for mixing ratios (float)
sthoduka Aug 26, 2024
21d1751
fix: gather all in clip loss
spravil Aug 27, 2024
a014b25
fix: use maximum batch size of samples as batch length
sthoduka Aug 29, 2024
1788b15
fix: webdataset: use a fixed round robin sampling strategy to get a f…
sthoduka Sep 2, 2024
7f31684
refactor: make batch_size mandatory only if using multiple builders
sthoduka Sep 9, 2024
2e84732
chore: remove unused configs; add new config
sthoduka Sep 17, 2024
1664744
fix: misc fixes after merging main into feat/coca
sthoduka Sep 17, 2024
f31be51
chore: remove comment and unused file
sthoduka Sep 17, 2024
91934fd
refactor: out_put -> output
sthoduka Sep 17, 2024
cdbb956
fix: reset cumulated losses using function
sthoduka Sep 17, 2024
9bcc475
fix: scaled weight initialization for residual layers of coca
sthoduka Sep 20, 2024
25130cd
refactor: replace CosineAnnealingWithWarmupLR with OneCycleLR
sthoduka Sep 20, 2024
a97f448
refactor: set weight decay groups for coca
sthoduka Sep 20, 2024
3b89853
fix: update path for coca tokenizer
sthoduka Sep 23, 2024
8f7a114
chore: use built-in types
thomaschhh Sep 23, 2024
3eb77a3
chore: refactor loss related items to match main
manasMauryax Sep 24, 2024
3366bc9
docs: add docstrings and type hints for audio-related classes and fun…
thomaschhh Sep 24, 2024
61cfe51
test: update coca model test and add coca collator test
sthoduka Sep 25, 2024
56ea087
refactor: verify correctness of coca model config
sthoduka Sep 25, 2024
8f5aea4
feat: multiple loss functions
manasMauryax Sep 24, 2024
f1f0fe5
Merge branch 'feat/multiple_loss_functions' into feat/coca
manasMauryax Sep 27, 2024
c291fcc
test: add more tests for loss functions
manasMauryax Sep 27, 2024
f1dbe91
revert: add back default values for NCELoss
sthoduka Sep 27, 2024
146682f
refactor: use composition to wrap the pytorch DataLoader using LLMDat…
sthoduka Sep 27, 2024
b985b9b
refactor: rename WebLoader to WebDataLoader
sthoduka Sep 30, 2024
9113f8a
docs: update docs for WebDataLoader and MultimodalWebDataset
sthoduka Sep 30, 2024
2a1cfa1
docs: update docs for coca model
sthoduka Sep 30, 2024
b9bfaea
docs: update docs for vision transforms and make video transform para…
sthoduka Oct 1, 2024
d6da583
test: add test for webdataset dataset and dataloader
sthoduka Oct 1, 2024
8312ed6
Merge branch 'main' into feat/coca
sthoduka Oct 7, 2024
152ebf2
docs: add docstring to WebDataloader, typehints to _init_modality
davidkaczer Oct 7, 2024
0353969
fix: mask creation for audio inputs
manasMauryax Oct 7, 2024
a12ff8a
test: add tests for audio_transformer
manasMauryax Oct 7, 2024
79c6c6b
docs: misc. docstrings and type hints for VideoTransform, web dataset
sthoduka Oct 7, 2024
23267e8
fix: update simple progress subscriber config
sthoduka Oct 8, 2024
29352b7
refactor: rename norm layers for easier regex for weight initializati…
sthoduka Oct 11, 2024
2aa4fc0
test: fix weight initialization and weight decay tests for coca
sthoduka Oct 11, 2024
c84bbda
fix: update directory name for getting started example
sthoduka Oct 11, 2024
e33076f
docs: update changelog with info about CoCa PR
sthoduka Oct 11, 2024
71a5bd1
chore: fix linting
sthoduka Oct 12, 2024
de9baab
docs: fix minor docstring inconsistencies
thomaschhh Oct 23, 2024
6c99264
Merge branch 'main' into feat/coca
sthoduka Nov 20, 2024
16 changes: 16 additions & 0 deletions CHANGELOG_DEV.md
@@ -85,3 +85,19 @@ This PR mainly addresses the warmstart of model training, e.g., after GPU crashe

**Breaking Changes**
* the settings part of the configs have been completely refactored


## PR #263 CoCa model updates

This PR adds updates to the CoCa model:


**General Changes**
* add AudioTransformer model
* update the VisionTransformer model for video
* add the MultimodalWebDataset dataset for loading audio-text, image-text and video-text data in the webdataset format
* add a multi-loss function for specifying a weighted-sum of different losses
* update the CoCa model to include encoders for video and audio
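
The weighted-sum multi-loss mentioned above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the implementation from this PR: `WeightedSumLoss`, `mse` and `mae` are hypothetical names, and the real loss classes operate on model outputs and tensors rather than on lists of floats.

```python
from typing import Callable, Sequence

# Hypothetical signature: a loss maps (predictions, targets) to a scalar.
LossFn = Callable[[Sequence[float], Sequence[float]], float]


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error over two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mae(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error over two equally long sequences."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


class WeightedSumLoss:
    """Combines several loss functions into one scalar: sum_i w_i * loss_i.

    Illustrative only; the class name and constructor are assumptions.
    """

    def __init__(self, losses: Sequence[tuple[LossFn, float]]):
        if not losses:
            raise ValueError("at least one (loss, weight) pair is required")
        self.losses = list(losses)

    def __call__(self, pred: Sequence[float], target: Sequence[float]) -> float:
        # Each component is evaluated on the same inputs and scaled by its weight.
        return sum(weight * fn(pred, target) for fn, weight in self.losses)


# Example: total = 1.0 * MSE + 2.0 * MAE
combined = WeightedSumLoss([(mse, 1.0), (mae, 2.0)])
total = combined([1.0, 2.0], [0.0, 0.0])
```

Keeping the weights next to each loss makes it easy to log the individual weighted terms as well as their sum.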

**Breaking Changes**
* the LLMDataLoader now contains a PyTorch DataLoader object instead of inheriting from it.
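
To illustrate what switching from inheritance to composition means for callers, here is a small sketch. It assumes nothing about the actual LLMDataLoader API: `BaseLoader` is a stand-in for the PyTorch DataLoader so the example runs without PyTorch, and `TaggedLoader` with its `dataloader_tag` argument is a hypothetical wrapper.

```python
from typing import Any, Iterator


class BaseLoader:
    """Stand-in for a framework data loader: yields fixed-size batches."""

    def __init__(self, data: list, batch_size: int):
        self.data = data
        self.batch_size = batch_size

    def __iter__(self) -> Iterator[list]:
        for start in range(0, len(self.data), self.batch_size):
            yield self.data[start:start + self.batch_size]

    def __len__(self) -> int:
        # Number of batches, counting a final partial batch.
        return (len(self.data) + self.batch_size - 1) // self.batch_size


class TaggedLoader:
    """Hypothetical wrapper that holds a loader instead of subclassing it."""

    def __init__(self, dataloader_tag: str, dataloader: Any):
        self.dataloader_tag = dataloader_tag
        self._dataloader = dataloader  # composition: the loader is an attribute

    def __iter__(self) -> Iterator[list]:
        # Iteration is delegated to the wrapped loader.
        return iter(self._dataloader)

    def __len__(self) -> int:
        return len(self._dataloader)

    @property
    def batch_size(self) -> int:
        # Only the attributes exposed explicitly are reachable from outside.
        return self._dataloader.batch_size


loader = TaggedLoader("train", BaseLoader(list(range(5)), batch_size=2))
batches = list(loader)
```

One consequence of this design is that `isinstance` checks against the wrapped loader's class no longer succeed for the wrapper, and any attribute that is not delegated explicitly is no longer available. That is why such a change is listed as breaking.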