
SEE-2-SOUND🔊: Zero-Shot Spatial Environment-to-Spatial Sound

Rishit Dagli¹ · Shivesh Prakash¹ · Rupert Wu¹ · Houman Khosravani¹,²,³

¹University of Toronto    ²Temerty Centre for Artificial Intelligence Research and Education in Medicine    ³Sunnybrook Research Institute


This work presents SEE-2-SOUND, a method to generate spatial audio from images, animated images, and videos to accompany the visual content. Check out our website to view some results of this work.

[Teaser figure]

These checkpoints are meant to be used with our code: SEE-2-SOUND.

Installation

First, install the pip package and download these checkpoints (needs Git LFS):

pip install -e git+https://github.com/see2sound/see2sound.git#egg=see2sound
git clone https://huggingface.co/rishitdagli/see-2-sound
cd see-2-sound
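
If Git LFS hooks were not set up before cloning, the checkpoint files will be small pointer stubs rather than the actual weights. These are standard Git LFS commands (not specific to this repository) to fetch them:

# Set up Git LFS hooks (once per machine)
git lfs install
# Fetch the actual checkpoint files if the clone only contains LFS pointers
git lfs pull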

View the full installation instructions, as well as tips on dependencies, in the repository README.

Running the Models

Now, create a configuration file called config.yaml:

codi_encoder: 'codi/codi_encoder.pth'
codi_text: 'codi/codi_text.pth'
codi_audio: 'codi/codi_audio.pth'
codi_video: 'codi/codi_video.pth'

sam: 'sam/sam.pth'
# H, L or B in decreasing performance
sam_size: 'H'

depth: 'depth/depth.pth'
# L, B, or S in decreasing performance
depth_size: 'L'

download: False

# Change to True if your GPU has < 40 GB VRAM
low_mem: False
fp16: False
gpu: True
steps: 500
num_audios: 3
prompt: ''
verbose: True
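
Since download is set to False, the checkpoint paths in the config must already exist on disk. Below is a minimal sanity-check sketch, assuming PyYAML is installed and config.yaml sits next to the cloned checkpoints; the key names are the ones from the config above:

import os

import yaml  # PyYAML: pip install pyyaml

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

# Check every checkpoint path referenced by the config before running inference.
for key in ("codi_encoder", "codi_text", "codi_audio", "codi_video", "sam", "depth"):
    status = "found" if os.path.exists(cfg[key]) else "MISSING"
    print(f"{key}: {cfg[key]} ({status})")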

Now, we can start running inference:

import see2sound

config_file_path = "config.yaml"

model = see2sound.See2Sound(config_path=config_file_path)
model.setup()
model.run(path="test.png", output_path="test.wav")
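
Because setup() loads all of the checkpoints, it is worth reusing a single model instance across many inputs. Here is a minimal batch sketch, assuming a hypothetical inputs/ directory of PNG images; run is called with the same signature as above:

from pathlib import Path

import see2sound

model = see2sound.See2Sound(config_path="config.yaml")
model.setup()

# `inputs/` and `outputs/` are placeholder directory names for this sketch.
out_dir = Path("outputs")
out_dir.mkdir(exist_ok=True)
for image in sorted(Path("inputs").glob("*.png")):
    # Write one spatial-audio file per input image.
    model.run(path=str(image), output_path=str(out_dir / f"{image.stem}.wav"))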

More Information

Feel free to take a look at the full documentation for extra information and tips on running the model.


Evaluation results

All metrics below are reported on the SEE-2-SOUND Evaluation Dataset (as reported in the arXiv paper):

  • AViTAR Marginal Scene Guidance - Mel-Frequency Cepstral Coefficient - Dynamic Time Warping: 0.03 × 10^-3
  • AViTAR Marginal Scene Guidance - Zero Crossing Rate: 0.950
  • Chroma Feature: 0.770
  • AViTAR Marginal Scene Guidance - Spectral Score: 0.950