General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model Paper • 2409.01704 • Published 17 days ago • 72
Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models Article • Jun 24 • 166
Announcing Finance Commons and the Bad Data Toolbox: Pioneering Open Data and Advanced Document Processing Article • By Pclanglais • Jul 19 • 17
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks Paper • 2311.06242 • Published Nov 10, 2023 • 77
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions Paper • 2406.04325 • Published Jun 6 • 71
📀 Dataset comparison models Collection 1.8B models trained on 350BT to compare different pretraining datasets • 8 items • Updated Jun 12 • 27
From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting Paper • 2309.04269 • Published Sep 8, 2023 • 32
FocalFormer3D : Focusing on Hard Instance for 3D Object Detection Paper • 2308.04556 • Published Aug 8, 2023 • 8
JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models Paper • 2308.04729 • Published Aug 9, 2023 • 31
Shepherd: A Critic for Language Model Generation Paper • 2308.04592 • Published Aug 8, 2023 • 29
PDE-Refiner: Achieving Accurate Long Rollouts with Neural PDE Solvers Paper • 2308.05732 • Published Aug 10, 2023 • 8
Alexa, play with robot: Introducing the First Alexa Prize SimBot Challenge on Embodied AI Paper • 2308.05221 • Published Aug 9, 2023 • 9
Flexible Isosurface Extraction for Gradient-Based Mesh Optimization Paper • 2308.05371 • Published Aug 10, 2023 • 10
Follow Anything: Open-set detection, tracking, and following in real-time Paper • 2308.05737 • Published Aug 10, 2023 • 11
OpenProteinSet: Training data for structural biology at scale Paper • 2308.05326 • Published Aug 10, 2023 • 10
Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment Paper • 2308.05374 • Published Aug 10, 2023 • 27
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining Paper • 2308.05734 • Published Aug 10, 2023 • 36
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models Paper • 2308.01390 • Published Aug 2, 2023 • 31
Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization Paper • 2308.02151 • Published Aug 4, 2023 • 18
Mirror-NeRF: Learning Neural Radiance Fields for Mirrors with Whitted-Style Ray Tracing Paper • 2308.03280 • Published Aug 7, 2023 • 6
Tiny LVLM-eHub: Early Multimodal Experiments with Bard Paper • 2308.03729 • Published Aug 7, 2023 • 9
TPTU: Task Planning and Tool Usage of Large Language Model-based AI Agents Paper • 2308.03427 • Published Aug 7, 2023 • 14
Seeing through the Brain: Image Reconstruction of Visual Perception from Human Brain Signals Paper • 2308.02510 • Published Jul 27, 2023 • 21
UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition Paper • 2308.03279 • Published Aug 7, 2023 • 21
AvatarVerse: High-quality & Stable 3D Avatar Creation from Text and Pose Paper • 2308.03610 • Published Aug 7, 2023 • 23
ReCLIP: Refine Contrastive Language Image Pre-Training with Source Free Domain Adaptation Paper • 2308.03793 • Published Aug 4, 2023 • 10
SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore Paper • 2308.04430 • Published Aug 8, 2023 • 9
3D Gaussian Splatting for Real-Time Radiance Field Rendering Paper • 2308.04079 • Published Aug 8, 2023 • 165
Simple synthetic data reduces sycophancy in large language models Paper • 2308.03958 • Published Aug 7, 2023 • 21
Ambient Adventures: Teaching ChatGPT on Developing Complex Stories Paper • 2308.01734 • Published Aug 3, 2023 • 6
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World Paper • 2308.01907 • Published Aug 3, 2023 • 10
HANDAL: A Dataset of Real-World Manipulable Object Categories with Pose Annotations, Affordances, and Reconstructions Paper • 2308.01477 • Published Aug 2, 2023 • 11
Multimodal Neurons in Pretrained Text-Only Transformers Paper • 2308.01544 • Published Aug 3, 2023 • 15
DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales Paper • 2308.01320 • Published Aug 2, 2023 • 44
Training Data Protection with Compositional Diffusion Models Paper • 2308.01937 • Published Aug 2, 2023 • 5
Scaling Clinical Trial Matching Using Large Language Models: A Case Study in Oncology Paper • 2308.02180 • Published Aug 4, 2023 • 9
Getting the Ball Rolling: Learning a Dexterous Policy for a Biomimetic Tendon-Driven Hand with Rolling Contact Joints Paper • 2308.02453 • Published Aug 4, 2023 • 8
Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP Paper • 2308.02487 • Published Aug 4, 2023 • 12
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities Paper • 2308.02490 • Published Aug 4, 2023 • 16
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks Paper • 2306.04362 • Published Jun 7, 2023 • 2
MobileNMT: Enabling Translation in 15MB and 30ms Paper • 2306.04235 • Published Jun 7, 2023 • 3
ARTIC3D: Learning Robust Articulated 3D Shapes from Noisy Web Image Collections Paper • 2306.04619 • Published Jun 7, 2023 • 4
LLMZip: Lossless Text Compression using Large Language Models Paper • 2306.04050 • Published Jun 6, 2023 • 4
M^3IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning Paper • 2306.04387 • Published Jun 7, 2023 • 8
Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts Paper • 2306.04845 • Published Jun 8, 2023 • 4
Modular Visual Question Answering via Code Generation Paper • 2306.05392 • Published Jun 8, 2023 • 2
Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models Paper • 2306.05357 • Published Jun 8, 2023 • 3
Optimizing ViViT Training: Time and Memory Reduction for Action Recognition Paper • 2306.04822 • Published Jun 7, 2023 • 2
LU-NeRF: Scene and Pose Estimation by Synchronizing Local Unposed NeRFs Paper • 2306.05410 • Published Jun 8, 2023 • 2
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models Paper • 2306.05424 • Published Jun 8, 2023 • 7
Improving Open Language Models by Learning from Organic Interactions Paper • 2306.04707 • Published Jun 7, 2023 • 3
MIMIC-IT: Multi-Modal In-Context Instruction Tuning Paper • 2306.05425 • Published Jun 8, 2023 • 11
SyncDiffusion: Coherent Montage via Synchronized Joint Diffusions Paper • 2306.05178 • Published Jun 8, 2023 • 6
Embodied Executable Policy Learning with Language-based Scene Summarization Paper • 2306.05696 • Published Jun 9, 2023 • 3