# Phenaki CViViT - Obvious Research

<p align="center">
<img src="assets_readme/obvious_research.png" alt="obvious_research" width="600"/>
</p>

# Reproduction of the first step in the [text-to-video model Phenaki](https://arxiv.org/pdf/2210.02399.pdf).

## Code and model weights for the Transformer-based autoencoder for videos called CViViT.

<p align="center">
<img src="assets_readme/phenaki.png" alt="phenaki" width="600"/>
</p>
## * Code, based on lucidrains' repo

The code is heavily based [on the reproduction of Phenaki](https://github.com/lucidrains/phenaki-pytorch) by the one and only [lucidrains](https://github.com/lucidrains). However, to actually train the model we had to make several modifications. Here is the list of changes compared to the original repo:

- added an I3D video loss
- brought loss weights, architecture parameters, and optimizer parameters closer to the paper
- added learning rate schedulers (warmup + annealing)
- added webdataset integration
- added video data preprocessing (8 fps, 11 frames per video, as in the paper)
- added VQ factorized codes with L2 normalization (once again thanks to lucidrains)
- made the code compatible with multi-GPU and multi-node training
- added Accelerate and wandb integration
- added visualisation scripts
- minor bug fixes
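
Of the changes above, the warmup-plus-annealing learning-rate schedule can be sketched as a small standalone function. The name `lr_at_step` and the default hyperparameters below are illustrative, not the repo's actual values:

```python
import math

def lr_at_step(step, base_lr=1e-4, warmup_steps=1000, total_steps=100000):
    """Linear warmup from 0 to base_lr, then cosine annealing down to 0."""
    if step < warmup_steps:
        # Warmup phase: scale the learning rate linearly with the step count
        return base_lr * step / warmup_steps
    # Annealing phase: cosine decay over the remaining steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The schedule peaks at `base_lr` exactly when warmup ends and decays smoothly to zero at `total_steps`.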

## * Model weight release, on Huggingface

We release the model weights of our best training run. The model was trained on the WebVid-10M dataset in a multi-node, multi-GPU setup.

As CViViT is an autoencoder for videos, here are examples of videos and their reconstructions produced by the model:
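
For intuition on what the tokenizer produces: following the Phenaki paper, C-ViViT embeds the first frame on its own and the remaining frames in small temporal groups, so the number of discrete tokens per clip can be estimated as below. The function name and the default resolution/patch sizes are illustrative assumptions, not the released model's exact configuration:

```python
def cvivit_token_count(num_frames=11, image_size=128,
                       spatial_patch=8, temporal_patch=2):
    """Estimate the number of discrete tokens C-ViViT produces per clip.

    Per the Phenaki paper, the first frame is tokenized on its own and the
    remaining frames are grouped temporal_patch at a time, which lets the
    model also encode single images.
    """
    assert (num_frames - 1) % temporal_patch == 0, \
        "remaining frames must divide evenly into temporal groups"
    spatial_tokens = (image_size // spatial_patch) ** 2  # tokens per time step
    temporal_steps = 1 + (num_frames - 1) // temporal_patch
    return temporal_steps * spatial_tokens
```

With the defaults (11 frames: the first frame alone plus five groups of two), a clip maps to 6 time steps of spatial tokens.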

## * Next steps

We are working on the second training stage of Phenaki, which yields the full text-to-video model.

We appreciate any help, feel free to reach out! You can contact us:

- On Twitter: [@obv_research](https://twitter.com/obv_research)
- By mail: research.obvious@gmail.com

## * About Obvious Research

Obvious Research is an Artificial Intelligence research laboratory dedicated to creating new AI artistic tools, initiated by the artists' trio [Obvious](https://obvious-art.com/), in partnership with La Sorbonne Université.