bokyeong1015 committed on
Commit 7f29ed9
1 Parent(s): 103ac4e

Update README.md

Files changed (1): README.md +20 -17
README.md CHANGED
@@ -72,23 +72,25 @@ image = pipe(prompt).images[0]
72
  image.save("example.png")
73
  ```
74
 
75
- The above examples have been tested on a single NVIDIA GeForce RTX 3090 GPU with the following versions:
76
-
77
- ```
78
- torch 1.13.1+cu117
79
- transformers 4.29.2
80
- diffusers 0.15.0
81
- ```
82
 
83
 
84
 
85
  ## Compression Method
86
 
87
  ### U-Net Architecture
88
- We removed several residual and attention blocks from the 0.86B-parameter U-Net in the 1.04B-param SDM-v1.4, and our compressed models are summarized as follows.
89
- - 0.76B-param **BK-SDM-Base** (0.58B-param U-Net): obtained with ① fewer blocks in outer stages.
90
- - 0.66B-param **BK-SDM-Small** (0.49B-param U-Net): obtained with ① and ② mid-stage removal.
91
- - 0.50B-param **BK-SDM-Tiny** (0.33B-param U-Net): obtained with ①, ②, and further inner-stage removal.
92
 
93
 
94
  ### Distillation Pretraining
@@ -96,7 +98,7 @@ The compact U-Net was trained to mimic the behavior of the original U-Net. We le
96
 
97
 
98
  <center>
99
- <img alt="U-Net architectures and KD-based pretraining" src="https://huggingface.co/spaces/nota-ai/compressed-stable-diffusion/resolve/e6fb31631f0b2948cf6ec54006ea050d6c83e940/docs/fig_model.png" width="100%">
100
  </center>
101
 
102
 
@@ -116,23 +118,24 @@ The following table shows the zero-shot results on 30K samples from the MS-COCO
116
 
117
  | Model | FID↓ | IS↑ | CLIP Score↑<br>(ViT-g/14) | # Params,<br>U-Net | # Params,<br>Whole SDM |
118
  |:---:|:---:|:---:|:---:|:---:|:---:|
119
- | Stable Diffusion v1.4 | 13.05 | 36.76 | 0.2958 | 0.86B | 1.04B |
120
- | BK-SDM-Base (Ours) | 15.76 | 33.79 | 0.2878 | 0.58B | 0.76B |
121
- | BK-SDM-Small (Ours) | 16.98 | 31.68 | 0.2677 | 0.49B | 0.66B |
122
- | BK-SDM-Tiny (Ours) | 17.12 | 30.09 | 0.2653 | 0.33B | 0.50B |
123
 
124
  <br/>
125
 
126
  The following figure depicts synthesized images with some MS-COCO captions.
127
 
128
  <center>
129
- <img alt="Visual results" src="https://huggingface.co/spaces/nota-ai/compressed-stable-diffusion/resolve/e6fb31631f0b2948cf6ec54006ea050d6c83e940/docs/fig_results.png" width="100%">
130
  </center>
131
 
132
 
133
  <br/>
134
 
135
 
 
136
  # Uses
137
  _Note: This section is taken from the [Stable Diffusion v1 model card](https://huggingface.co/CompVis/stable-diffusion-v1-4) (which was based on the [DALLE-MINI model card](https://huggingface.co/dalle-mini/dalle-mini)) and applies in the same way to BK-SDMs_.
138
 
 
72
  image.save("example.png")
73
  ```
74
 
75
 
76
 
77
 
78
  ## Compression Method
79
 
80
  ### U-Net Architecture
81
+ Certain residual and attention blocks were eliminated from the U-Net of SDM-v1.4:
82
+
83
+ - 1.04B-param [SDM-v1.4](https://huggingface.co/CompVis/stable-diffusion-v1-4) (0.86B-param U-Net): the original source model.
84
+ - 0.76B-param [**BK-SDM-Base**](https://huggingface.co/nota-ai/bk-sdm-base) (0.58B-param U-Net): obtained with ① fewer blocks in outer stages.
85
+ - 0.66B-param [**BK-SDM-Small**](https://huggingface.co/nota-ai/bk-sdm-small) (0.49B-param U-Net): obtained with ① and ② mid-stage removal.
86
+ - 0.50B-param [**BK-SDM-Tiny**](https://huggingface.co/nota-ai/bk-sdm-tiny) (0.33B-param U-Net): obtained with ①, ②, and ③ further inner-stage removal.
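
To put the sizes listed above in perspective, the following is a minimal sketch that computes how much of SDM-v1.4 each compressed model removes. The parameter counts are copied from the list; the names `PARAMS_B`, `reduction_pct`, and `summarize` are hypothetical helpers introduced only for this illustration, and the percentages inherit the rounding of the published two-decimal counts.

```python
# Parameter counts in billions, as listed above: (whole SDM, U-Net only).
PARAMS_B = {
    "SDM-v1.4": (1.04, 0.86),
    "BK-SDM-Base": (0.76, 0.58),
    "BK-SDM-Small": (0.66, 0.49),
    "BK-SDM-Tiny": (0.50, 0.33),
}


def reduction_pct(original: float, compressed: float) -> float:
    """Percentage of parameters removed relative to the original model."""
    return 100.0 * (original - compressed) / original


def summarize(params=PARAMS_B, baseline="SDM-v1.4"):
    """Return {model: (whole-SDM reduction %, U-Net reduction %)}, rounded to 0.1."""
    base_total, base_unet = params[baseline]
    summary = {}
    for name, (total, unet) in params.items():
        if name == baseline:
            continue
        summary[name] = (
            round(reduction_pct(base_total, total), 1),
            round(reduction_pct(base_unet, unet), 1),
        )
    return summary


if __name__ == "__main__":
    for name, (whole, unet) in summarize().items():
        print(f"{name}: whole SDM -{whole}%, U-Net -{unet}%")
```

Because only the U-Net is compressed, the U-Net reduction is always larger than the whole-model reduction: the remaining components of the pipeline keep their original size.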
87
+
88
+
89
+ <center>
90
+ <img alt="U-Net architectures" src="https://netspresso-research-code-release.s3.us-east-2.amazonaws.com/assets-bk-sdm/fig_arch.png" width="100%">
91
+ </center>
92
+
93
+
94
 
95
 
96
  ### Distillation Pretraining
 
98
 
99
 
100
  <center>
101
+ <img alt="KD-based pretraining" src="https://netspresso-research-code-release.s3.us-east-2.amazonaws.com/assets-bk-sdm/fig_kd_bksdm.png" width="100%">
102
  </center>
103
 
104
 
 
118
 
119
  | Model | FID↓ | IS↑ | CLIP Score↑<br>(ViT-g/14) | # Params,<br>U-Net | # Params,<br>Whole SDM |
120
  |:---:|:---:|:---:|:---:|:---:|:---:|
121
+ | [Stable Diffusion v1.4](https://huggingface.co/CompVis/stable-diffusion-v1-4) | 13.05 | 36.76 | 0.2958 | 0.86B | 1.04B |
122
+ | [BK-SDM-Base](https://huggingface.co/nota-ai/bk-sdm-base) (Ours) | 15.76 | 33.79 | 0.2878 | 0.58B | 0.76B |
123
+ | [BK-SDM-Small](https://huggingface.co/nota-ai/bk-sdm-small) (Ours) | 16.98 | 31.68 | 0.2677 | 0.49B | 0.66B |
124
+ | [BK-SDM-Tiny](https://huggingface.co/nota-ai/bk-sdm-tiny) (Ours) | 17.12 | 30.09 | 0.2653 | 0.33B | 0.50B |
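
One way to read the table above is to express each score as a relative change against Stable Diffusion v1.4. The sketch below does only that arithmetic on the published numbers; the relative change is not a metric reported by the authors, and `RESULTS`, `METRICS`, and `relative_change_pct` are hypothetical names used for illustration.

```python
# Zero-shot MS-COCO results copied from the table above: (FID, IS, CLIP score).
RESULTS = {
    "Stable Diffusion v1.4": (13.05, 36.76, 0.2958),
    "BK-SDM-Base": (15.76, 33.79, 0.2878),
    "BK-SDM-Small": (16.98, 31.68, 0.2677),
    "BK-SDM-Tiny": (17.12, 30.09, 0.2653),
}
METRICS = ("FID", "IS", "CLIP")


def relative_change_pct(results=RESULTS, baseline="Stable Diffusion v1.4"):
    """Return {model: {metric: % change vs. the baseline}} for non-baseline models."""
    base = results[baseline]
    changes = {}
    for model, scores in results.items():
        if model == baseline:
            continue
        changes[model] = {
            metric: 100.0 * (score - ref) / ref
            for metric, score, ref in zip(METRICS, scores, base)
        }
    return changes


if __name__ == "__main__":
    for model, delta in relative_change_pct().items():
        print(model, {k: f"{v:+.1f}%" for k, v in delta.items()})
```

Since lower FID is better while higher IS and CLIP score are better, a positive FID change and negative IS or CLIP changes all indicate some quality loss relative to the original model.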
125
 
126
  <br/>
127
 
128
  The following figure depicts synthesized images with some MS-COCO captions.
129
 
130
  <center>
131
+ <img alt="Visual results" src="https://netspresso-research-code-release.s3.us-east-2.amazonaws.com/assets-bk-sdm/fig_results.png" width="100%">
132
  </center>
133
 
134
 
135
  <br/>
136
 
137
 
138
+
139
  # Uses
140
  _Note: This section is taken from the [Stable Diffusion v1 model card](https://huggingface.co/CompVis/stable-diffusion-v1-4) (which was based on the [DALLE-MINI model card](https://huggingface.co/dalle-mini/dalle-mini)) and applies in the same way to BK-SDMs_.
141