All HF Hub posts

yushun0410 posted an update 1 day ago
Hi Huggingfacers!

Thrilled to introduce Adam-mini, an optimizer that achieves on-par or better performance than AdamW with a 45-50% smaller memory footprint. Adam-mini also achieves 49.5% higher throughput than AdamW on Llama2-7B pre-training.

The design of Adam-mini is inspired by certain Hessian structures we observed on Transformers.

Feel free to try it out! Switch to Adam-mini with the same hyperparameters as AdamW and it should work with only half the memory. Hope Adam-mini can help save time, cost, and energy in your tasks!

Paper: "Adam-mini: Use Fewer Learning Rates To Gain More" https://arxiv.org/abs/2406.16793

Code: https://github.com/zyushun/Adam-mini
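A minimal sketch of the switch, assuming the adam_mini package exposes an Adam_mini class as in the repo README (constructor arguments may vary between versions, so double-check there):

```python
# pip install adam-mini  -- see the GitHub repo above. This is a sketch only:
# the toy model stands in for a real Transformer, and the 'dim'/'n_heads'
# arguments are assumed from the README rather than guaranteed.
import torch
import torch.nn as nn
from adam_mini import Adam_mini  # assumed import path

model = nn.Linear(1024, 1024)  # toy stand-in for your Transformer

# Drop-in switch: keep the hyperparameters you already use for AdamW.
optimizer = Adam_mini(
    named_parameters=model.named_parameters(),  # Adam-mini assigns lrs per Hessian block
    lr=1e-4,
    betas=(0.9, 0.95),
    weight_decay=0.1,
    dim=1024,   # hidden size of the Transformer (assumed argument)
    n_heads=8,  # number of attention heads (assumed argument)
)

x = torch.randn(4, 1024)
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```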

Xenova posted an update 2 days ago
Florence-2, the new vision foundation model by Microsoft, can now run 100% locally in your browser on WebGPU, thanks to Transformers.js! πŸ€—πŸ€―

It supports tasks like image captioning, optical character recognition, object detection, and many more! 😍 WOW!
- Demo: Xenova/florence2-webgpu
- Models: https://huggingface.co/models?library=transformers.js&other=florence2
- Source code: https://github.com/xenova/transformers.js/tree/v3/examples/florence2-webgpu
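For reference, a hedged sketch of the equivalent server-side usage with the Python transformers library (the browser demo above uses Transformers.js instead; the task tokens follow the Microsoft model card):

```python
# Hedged sketch: Florence-2 via the Python transformers library. The image
# path is hypothetical; other task tokens include <OCR> and <OD>.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Florence-2-base"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("example.jpg")  # hypothetical local image
prompt = "<CAPTION>"  # image captioning task token

inputs = processor(text=prompt, images=image, return_tensors="pt")
generated_ids = model.generate(**inputs, max_new_tokens=128)
raw = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
# post_process_generation parses the raw output into a task-specific result.
print(processor.post_process_generation(raw, task=prompt, image_size=image.size))
```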
DmitryRyumin posted an update 2 days ago
πŸš€πŸŽ­πŸŒŸ New Research Alert - Portrait4D-v2 (Avatars Collection)! πŸŒŸπŸŽ­πŸš€
πŸ“„ Title: Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer πŸ”

πŸ“ Description: Portrait4D-v2 is a novel method for one-shot 4D head avatar synthesis using pseudo multi-view videos and a vision transformer backbone, achieving superior performance without relying on 3DMM reconstruction.

πŸ‘₯ Authors: Yu Deng, Duomin Wang, and Baoyuan Wang

πŸ“„ Paper: Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer (2403.13570)

🌐 GitHub Page: https://yudeng.github.io/Portrait4D-v2/
πŸ“ Repository: https://github.com/YuDeng/Portrait-4D

πŸ“Ί Video: https://www.youtube.com/watch?v=5YJY6-wcOJo

πŸš€ CVPR-2023-24-Papers: https://github.com/DmitryRyumin/CVPR-2023-24-Papers

πŸ“š More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers curated by @DmitryRyumin

πŸš€ Added to the Avatars Collection: DmitryRyumin/avatars-65df37cdf81fec13d4dbac36

πŸ” Keywords: Portrait4D #4DAvatar #HeadSynthesis #3DModeling #TechInnovation #DeepLearning #ComputerGraphics #ComputerVision #Innovation
ehristoforu posted an update 3 days ago
πŸ€— Hello from the Project Fluently team!

πŸ₯ We are ready to announce a new series of Supple Diffusion models, these are new generation diffusion models (about 1-2 weeks left before release).

🦾 The new series aims to take diffusion models to the next level, with performance and versatility as the main goal.

🧐 How will our models be better than others? Firstly, we worked on the CLIP models, now they understand your requests better, it will become easier to process. Secondly, we trained the models with high quality, even better than all our previous ones. Thirdly, you won’t have to keep 20 models on your disk; only 4-6 will be enough.

πŸ—ΊοΈ Roadmap:
1. Create Supple Diffusion Small
2. Create Supple Diffusion Medium
3. Create Supple Diffusion Large

πŸŽ† Our models are universal for realism, and for cartoons, and for anime, and for caricatures.

πŸ’– The project really needs your support and your recommendations and reviews, please do not hesitate to write comments under this post, thank you!

πŸ–ΌοΈ Below are demo images made with the pre-release version of Supple Diffusion Small.
Β·
as-cle-bert posted an update 1 day ago
πŸ€— Hi HF Community!
🧬 As you may know, EvolutionaryScale recently released the EvolutionaryScale/esm3-sm-open-v1 model here on the Hub, "a frontier generative model for biology, able to jointly reason across three fundamental biological properties of proteins: sequence, structure, and function", as it is described on the dedicated GitHub page.
⚡ If you are curious about it and want to try it out, you can do so with a Space I built: as-cle-bert/proteins-with-esm
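If you'd rather script it yourself, here is a hedged sketch with EvolutionaryScale's esm package (import paths and class names as in their GitHub README; verify against the current release):

```python
# pip install esm  -- EvolutionaryScale's package. The weights are gated, so
# accept the license on the Hub and log in with huggingface-cli first.
from esm.models.esm3 import ESM3
from esm.sdk.api import ESMProtein, GenerationConfig

model = ESM3.from_pretrained("esm3_sm_open_v1").to("cuda")

# Start from a partially masked sequence ("_" marks positions to fill in)
# and let the model generate the rest. The sequence here is hypothetical.
protein = ESMProtein(sequence="MPRTKEIN____GLVAAE")
protein = model.generate(protein, GenerationConfig(track="sequence", num_steps=8))
print(protein.sequence)
```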
Hope this helps with your research!πŸš€
louisbrulenaudet posted an update 3 days ago
I am delighted to announce the publication of LegalKit, a labeled French dataset built for legal ML training 🤗

This dataset comprises more than 50,000 query-document pairs curated for training sentence embedding models within the domain of French law.

The labeling process follows a systematic approach to ensure consistency and relevance:
- Initial Query Generation: Three instances of the LLaMA-3-70B model independently generate three different queries based on the same document.
- Selection of Optimal Query: A fourth instance of the LLaMA-3-70B model, using a dedicated selection prompt, evaluates the generated queries and selects the most suitable one.
- Final Label Assignment: The chosen query is used to label the document, aiming to ensure that the label accurately reflects the content and context of the original text.
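A hedged sketch of that three-step pipeline (the prompts and endpoint wrapper are illustrative, not the actual labeling code):

```python
# Illustrative reconstruction of the labeling pipeline described above;
# the prompts are hypothetical and the real pipeline may differ.
from huggingface_hub import InferenceClient

client = InferenceClient("meta-llama/Meta-Llama-3-70B-Instruct")

def complete(prompt: str) -> str:
    return client.text_generation(prompt, max_new_tokens=128)

def label_document(document: str) -> str:
    # 1. Initial query generation: three independent candidate queries.
    gen = f"Write one search query a lawyer might use to retrieve this text:\n{document}"
    candidates = [complete(gen) for _ in range(3)]

    # 2. Selection of the optimal query via a dedicated selection prompt.
    numbered = "\n".join(f"{i + 1}. {q}" for i, q in enumerate(candidates))
    sel = (
        f"Document:\n{document}\n\nCandidate queries:\n{numbered}\n\n"
        "Reply with only the number of the query that best fits the document."
    )
    best = int(complete(sel).strip()[0]) - 1

    # 3. Final label assignment: the chosen query becomes the label.
    return candidates[best]
```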

Dataset: louisbrulenaudet/legalkit

Stay tuned for further updates and release information πŸ”₯

@clem, if we can create an "HF for Legal" organization, similar to what exists for journalists, I am available!

Note: my special thanks to @alvdansen for their illustration models ❤️
as-cle-bert posted an update 2 days ago
Hi HuggingFacers!πŸ€—
πŸ’₯ If you are Bioinformaticians or Biologists, you may be familiar with BLAST, a search algorithm that allows researchers to identify the group of organisms (species, taxa...) from which DNA/Protein sequences come.
πŸ₯± You may also be familiar with the difficulties to interpret long and multi-parametric results coming out from BLAST searches: here's where we can operate with LLMs, summarizing the outputs and/or replying to queries about them!
🧬 You can now run BLAST for 16S rRNA bacterial sequences here on HF, summarizing and/or asking questions about the results, or make sense of your online BLAST searches by uploading description tables, using the latest Space I built: as-cle-bert/BLAST-SummarAIzer
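As a rough illustration of the idea (not the Space's actual code), one can run a remote BLAST search with Biopython and hand the top hits to an LLM for summarization:

```python
# Illustrative sketch: remote BLAST via Biopython, then LLM summarization.
# The query sequence, model choice, and prompt are all hypothetical.
from Bio.Blast import NCBIWWW, NCBIXML
from huggingface_hub import InferenceClient

sequence = "AGAGTTTGATCCTGGCTCAGATTGAACGCTGGCGG"  # hypothetical 16S rRNA fragment
result_handle = NCBIWWW.qblast("blastn", "nt", sequence)  # remote NCBI search
record = NCBIXML.read(result_handle)

# Flatten the top hits into plain text the LLM can digest.
hits = "\n".join(
    f"{aln.title} (e-value: {aln.hsps[0].expect})" for aln in record.alignments[:10]
)
client = InferenceClient("mistralai/Mistral-7B-Instruct-v0.2")  # hypothetical model
print(client.text_generation(f"Summarize these BLAST hits:\n{hits}", max_new_tokens=256))
```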
Have fun and may this be helpful to your research!πŸ’»
mitkox posted an update 1 day ago
I started Friday with decentralized AI using Gemma-2, and it all works without blockchain. This is what I did:

1. Pinned Gemma-2 9B, along with its LoRA fine-tuning adapters, to the InterPlanetary File System (IPFS).
2. Set up a llama-ipfs server to fetch and cache the model and adapters on the fly and run inference locally.
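A hedged sketch of step 1 against a local IPFS node's HTTP API (filenames and ports are illustrative; the llama-ipfs server from step 2 is a separate component not shown here):

```python
# Illustrative sketch: add and pin a GGUF model file to a local IPFS node
# via its HTTP API (default port 5001). The filename is hypothetical.
import requests

API = "http://127.0.0.1:5001/api/v0"

with open("gemma-2-9b-q4_k_m.gguf", "rb") as f:
    resp = requests.post(f"{API}/add", files={"file": f})
cid = resp.json()["Hash"]  # content identifier any peer can use to fetch the model
print(f"Model pinned at CID {cid}")

# Pin explicitly so the node's garbage collector keeps the blocks around.
requests.post(f"{API}/pin/add", params={"arg": cid})
```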

Now, I can use my on device AI platform across:

β€’ All my macOS automation workflows
β€’ All my browsers
β€’ My Copilot++ in VSCode
β€’ My Open Apple Intelligence (OAI, not to be confused with the other closed OAI owned by a nonprofit foundation and BigTech)

The llama-ipfs server's RPC support lets me decentralize inference across all my devices, supercharging compute and energy efficiency.

Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.
Steelskull posted an update 3 days ago
@Steelskull (myself) and @elinas have been working on a new rendition of the Aethora-15B model. It's built on the Llama 3 architecture, and we've optimized it especially for creative writing tasks (both kinds ;D) while maintaining strong general intelligence capabilities.

Model: L3-Aethora-15B-V2
ZeusLabs/L3-Aethora-15B-V2

Dataset: Aether-Lite-v1.8.1
TheSkullery/Aether-Lite-v1.8.1

What we've built:
A modified DUS (Depth Up-Scaling) model (originally created by Elinas), built via passthrough merging to create a 15B model, with specific adjustments (zeroing) to 'o_proj' and 'down_proj' that enhance its efficiency and reduce perplexity
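A conceptual sketch of that zeroing step in PyTorch (layer indices and checkpoint paths are illustrative, not the actual L3-Aethora-15B-V2 recipe):

```python
# Conceptual sketch only. After a passthrough merge duplicates decoder layers,
# zeroing 'o_proj' and 'down_proj' in the inserted copies makes their attention
# and MLP branches output zero, so each new block initially passes its input
# straight through the residual stream -- plausibly why perplexity stays low.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("path/to/merged-15b")  # hypothetical path
inserted = range(16, 24)  # hypothetical indices of the duplicated layers

with torch.no_grad():
    for i in inserted:
        layer = model.model.layers[i]
        layer.self_attn.o_proj.weight.zero_()
        layer.mlp.down_proj.weight.zero_()

model.save_pretrained("path/to/adjusted-15b")  # hypothetical output path
```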

Trained for 17.5 hours on 4 x A100 GPUs (huge thanks to g4rg for sponsoring the compute!)

Uses our Aether-Lite-V1.8.1 dataset with 125k high-quality samples
Focuses on creative writing and storytelling, with robust general intelligence

What makes L3-Aethora-15B v2 unique:
Creative Writing: We've really pushed its capabilities in generating engaging narratives and poetry, and in adapting to various writing styles, RP formats, and genres.

Versatile Intelligence: While we focused on creative tasks, it still handles scientific discussions, problem-solving, and educational content creation like a champ.

Long Context Understanding: Trained on the full sequence length of 8192 tokens, it maintains coherent conversations over extended interactions.

Carefully Curated Dataset: A lot of work was put into Aether-Lite-V1.8.1, our training dataset. It combines creative writing, instructional content, and specialized knowledge from various high-quality sources, all brought together by a custom data pipeline. (More information on the process is available on the dataset page.)

Open Source: We've made both the model and the full dataset available to the community.

We'd love your ideas and recommendations for further improvements!