zer0int committed
Commit 3300319
1 Parent(s): 79869a4

Update README.md

Files changed (1):
  1. README.md +62 -3
README.md CHANGED
@@ -1,3 +1,62 @@
- ---
- license: mit
- ---
+ ## A fine-tune of [BeichenZhang/LongCLIP-L](https://huggingface.co/BeichenZhang/LongCLIP-L): Long-CLIP ViT-L/14 expanded to 248 tokens.
+
+ The fine-tune has an improved ImageNet/ObjectNet accuracy of 0.89 (original Long-CLIP by the authors: ~0.81)**.
+
+
+ Made possible with Geometric Parametrization (GmP):
+
+ ```
+
+ "Normal" CLIP MLP (multi-layer perceptron):
+
+ (mlp): Sequential(
+   |-(c_fc): Linear(in_features=1024, out_features=4096, bias=True)
+   | (gelu): QuickGELU()
+ |-}-(c_proj): Linear(in_features=4096, out_features=1024, bias=True)
+ | |
+ | |-- visual.transformer.resblocks.0.mlp.c_fc.weight
+ | |-- visual.transformer.resblocks.0.mlp.c_fc.bias
+ |
+ |---- visual.transformer.resblocks.0.mlp.c_proj.weight
+ |---- visual.transformer.resblocks.0.mlp.c_proj.bias
+
+
+ GmP CLIP MLP:
+
+ Weight decomposition into:
+ - radial component 'r' as norm of pre-trained weights
+ - angular component 'theta' as normalized direction
+ -> preserves weight vectors' directionality and magnitude
+
+ (mlp): Sequential(
+   |-(c_fc): GeometricLinear()
+   | (gelu): QuickGELU()
+ |-}-(c_proj): GeometricLinear()
+ | |
+ | |-- visual.transformer.resblocks.0.mlp.c_fc.r
+ | |-- visual.transformer.resblocks.0.mlp.c_fc.theta
+ | |-- visual.transformer.resblocks.0.mlp.c_fc.bias
+ |
+ |---- visual.transformer.resblocks.0.mlp.c_proj.r
+ |---- visual.transformer.resblocks.0.mlp.c_proj.theta
+ |---- visual.transformer.resblocks.0.mlp.c_proj.bias
+
+ (Same thing for [text] transformer.resblocks)
+
+ ```
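For illustration, here is a minimal PyTorch sketch of the decomposition described above. This is not the code from the linked repository: the class name `GeometricLinear` comes from the diagram, but the constructor, the re-normalization of `theta` in the forward pass, and all other details are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeometricLinear(nn.Module):
    """Illustrative GmP layer: weight stored as magnitude 'r' and direction 'theta'."""
    def __init__(self, linear: nn.Linear):
        super().__init__()
        w = linear.weight.data                  # shape: (out_features, in_features)
        norm = w.norm(dim=1, keepdim=True)      # per-row L2 norm
        self.r = nn.Parameter(norm)             # radial component: magnitude
        self.theta = nn.Parameter(w / norm)     # angular component: unit-norm direction
        self.bias = nn.Parameter(linear.bias.data.clone()) if linear.bias is not None else None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Recompose the effective weight as r * direction; re-normalizing here keeps
        # 'theta' purely directional as it drifts during training (an assumption).
        weight = self.r * F.normalize(self.theta, dim=1)
        return F.linear(x, weight, self.bias)

# Quick check with a c_fc-shaped layer from the diagram above.
fc = GeometricLinear(nn.Linear(1024, 4096))
print(fc(torch.randn(2, 1024)).shape)  # torch.Size([2, 4096])
```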
+
+
+ ✅ The model / state_dict I am sharing was converted back to .weight after fine-tuning, so it can be used in the same manner as any state_dict, e.g. as the SDXL / SD3 Text Encoder in ComfyUI via the [SeaArtLab/ComfyUI-Long-CLIP](https://github.com/SeaArtLab/ComfyUI-Long-CLIP) custom nodes! 🤗
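To clarify what "converted back to .weight" means, here is a hedged sketch of such a conversion, using the `.r` / `.theta` key names shown in the diagram above. The shared checkpoint already ships in converted form; the function and the filenames below are hypothetical, not the conversion script from the linked repo.

```python
import torch

def gmp_to_weight(state_dict: dict) -> dict:
    """Recompose '<prefix>.weight' = r * theta for every GmP-parametrized tensor."""
    out = {}
    for key, value in state_dict.items():
        if key.endswith(".theta"):
            prefix = key[: -len(".theta")]
            r = state_dict[prefix + ".r"]        # per-row magnitude
            out[prefix + ".weight"] = r * value  # scale direction back to a full weight
        elif key.endswith(".r"):
            continue                             # consumed together with '.theta'
        else:
            out[key] = value                     # biases and all other tensors pass through
    return out

sd = torch.load("longclip-gmp-finetune.pt", map_location="cpu")  # hypothetical filename
torch.save(gmp_to_weight(sd), "longclip-finetune-weights.pt")    # standard state_dict
```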
+
+ ** For details on training, the evaluation behind those numbers, or fine-tuning the model yourself, see: [https://github.com/zer0int/Long-CLIP](https://github.com/zer0int/Long-CLIP)
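For a quick inference test outside ComfyUI, a usage sketch along these lines should work, assuming the module layout of the linked Long-CLIP repository (`model/longclip.py` providing `load` and `tokenize`); the checkpoint and image paths are hypothetical.

```python
import torch
from PIL import Image
from model import longclip  # module from the Long-CLIP repository

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = longclip.load("checkpoints/Long-ViT-L-14-GmP-ft.pt", device=device)  # hypothetical path

# The 248-token limit accepts far longer prompts than CLIP's usual 77 tokens.
texts = longclip.tokenize([
    "A photo of a cat sitting on a windowsill at sunset, next to a potted plant.",
    "A diagram of a multi-layer perceptron.",
]).to(device)
image = preprocess(Image.open("demo.png")).unsqueeze(0).to(device)  # hypothetical image

with torch.no_grad():
    img_emb = model.encode_image(image)
    txt_emb = model.encode_text(texts)
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    print(img_emb @ txt_emb.T)  # cosine similarities, shape (1, 2)
```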
+
+ ```
+ @article{zhang2024longclip,
+   title={Long-CLIP: Unlocking the Long-Text Capability of CLIP},
+   author={Beichen Zhang and Pan Zhang and Xiaoyi Dong and Yuhang Zang and Jiaqi Wang},
+   journal={arXiv preprint arXiv:2403.15378},
+   year={2024}
+ }
+ ```
+
+ Pre-trained CLIP model by OpenAI, License: [MIT License](https://github.com/openai/CLIP/blob/main/LICENSE)