phanes (William Lamkin)

upvoted 2 papers 1 day ago

Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think

Paper • 2409.11355 • Published 2 days ago • 23

OSV: One Step is Enough for High-Quality Image to Video Generation

Paper • 2409.11367 • Published 2 days ago • 11

upvoted 7 papers 2 days ago

Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion

Paper • 2409.11406 • Published 2 days ago • 19

EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer

Paper • 2409.10819 • Published 3 days ago • 11

upvoted a paper 3 days ago

Seed-Music: A Unified Framework for High Quality and Controlled Music Generation

Paper • 2409.09214 • Published 6 days ago • 36

upvoted a paper 4 days ago

Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos

Paper • 2409.08353 • Published 7 days ago • 9

upvoted 3 papers 7 days ago

DreamHOI: Subject-Driven Generation of 3D Human-Object Interactions with Diffusion Priors

Paper • 2409.08278 • Published 7 days ago • 10

Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models

Paper • 2409.07452 • Published 8 days ago • 18

Generative Hierarchical Materials Search

Paper • 2409.06762 • Published 9 days ago • 6

upvoted 2 papers 14 days ago

Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency

Paper • 2409.02634 • Published 15 days ago • 84

FLUX that Plays Music

Paper • 2409.00587 • Published 19 days ago • 31

upvoted a collection 19 days ago

Sapiens

Collection

Foundation models for human tasks. Code: https://github.com/facebookresearch/sapiens • 72 items • Updated 1 day ago • 21

upvoted a paper 21 days ago

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

Paper • 2408.15998 • Published 22 days ago • 81

upvoted 2 papers 22 days ago

Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation

Paper • 2408.14819 • Published 24 days ago • 18

Diffusion Models Are Real-Time Game Engines

Paper • 2408.14837 • Published 24 days ago • 119

upvoted a paper 24 days ago

LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation

Paper • 2408.13252 • Published 27 days ago • 23

upvoted 3 papers 27 days ago

Real-Time Video Generation with Pyramid Attention Broadcast

Paper • 2408.12588 • Published 28 days ago • 13

Subsurface Scattering for 3D Gaussian Splatting

Paper • 2408.12282 • Published 29 days ago • 5

Sapiens: Foundation for Human Vision Models

Paper • 2408.12569 • Published 28 days ago • 84

upvoted a paper 28 days ago

xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations

Paper • 2408.12590 • Published 28 days ago • 33

upvoted a paper 30 days ago

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Paper • 2408.11039 • Published about 1 month ago • 54

upvoted 19 papers about 1 month ago

Segment Anything with Multiple Modalities

Paper • 2408.09085 • Published Aug 17 • 20

Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering

Paper • 2408.09702 • Published Aug 19 • 9

TraDiffusion: Trajectory-Based Training-Free Image Generation

Paper • 2408.09739 • Published Aug 19 • 7

SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views

Paper • 2408.10195 • Published Aug 19 • 12

JPEG-LM: LLMs as Image Generators with Canonical Codec Representations

Paper • 2408.08459 • Published Aug 15 • 44

Towards flexible perception with visual memory

Paper • 2408.08172 • Published Aug 15 • 19

3D Gaussian Editing with A Single Image

Paper • 2408.07540 • Published Aug 14 • 10

Body Transformer: Leveraging Robot Embodiment for Policy Learning

Paper • 2408.06316 • Published Aug 12 • 8

FruitNeRF: A Unified Neural Radiance Field based Fruit Counting Framework

Paper • 2408.06190 • Published Aug 12 • 17

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

Paper • 2408.06072 • Published Aug 12 • 35

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

Paper • 2408.06292 • Published Aug 12 • 114

RayGauss: Volumetric Gaussian-Based Ray Casting for Photorealistic Novel View Synthesis

Paper • 2408.03356 • Published Aug 6 • 8

Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks

Paper • 2408.03615 • Published Aug 7 • 30

Openstory++: A Large-scale Dataset and Benchmark for Instance-aware Open-domain Visual Storytelling

Paper • 2408.03695 • Published Aug 7 • 11

Achieving Human Level Competitive Robot Table Tennis

Paper • 2408.03906 • Published Aug 7 • 26

IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts

Paper • 2408.03209 • Published Aug 6 • 21

ProCreate, Don't Reproduce! Propulsive Energy Diffusion for Creative Generation

Paper • 2408.02226 • Published Aug 5 • 10

MeshAnything V2: Artist-Created Mesh Generation With Adjacent Mesh Tokenization

Paper • 2408.02555 • Published Aug 5 • 28

ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools

Paper • 2406.12793 • Published Jun 18 • 31

upvoted 10 papers about 2 months ago

TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling

Paper • 2408.01291 • Published Aug 2 • 11

Medical SAM 2: Segment medical images as video via Segment Anything Model 2

Paper • 2408.00874 • Published Aug 1 • 40

Berkeley Humanoid: A Research Platform for Learning-based Control

Paper • 2407.21781 • Published Jul 31 • 7

Tora: Trajectory-oriented Diffusion Transformer for Video Generation

Paper • 2407.21705 • Published Jul 31 • 25

Expressive Whole-Body 3D Gaussian Avatar

Paper • 2407.21686 • Published Jul 31 • 7

WalkTheDog: Cross-Morphology Motion Alignment via Phase Manifolds

Paper • 2407.18946 • Published Jul 11 • 12

SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency

Paper • 2407.17470 • Published Jul 24 • 14

MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence

Paper • 2407.16655 • Published Jul 23 • 28

A Simulation Benchmark for Autonomous Racing with Large-Scale Human Data

Paper • 2407.16680 • Published Jul 23 • 11

MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation

Paper • 2407.15060 • Published Jul 21 • 9

upvoted 5 papers 2 months ago

Shape of Motion: 4D Reconstruction from a Single Video

Paper • 2407.13764 • Published Jul 18 • 19

Grasping Diverse Objects with Simulated Humanoids

Paper • 2407.11385 • Published Jul 16 • 5

DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation

Paper • 2407.11394 • Published Jul 16 • 11

Scaling Diffusion Transformers to 16 Billion Parameters

Paper • 2407.11633 • Published Jul 16 • 25

GRUtopia: Dream General Robots in a City at Scale

Paper • 2407.10943 • Published Jul 15 • 23
