Text to Audio: Stable-Audio-Open

Stable Audio Open is an open-source model optimised for generating short audio samples, sound effects and production elements from text prompts.

What is Stable Audio Open?

Stable Audio Open allows anyone to generate up to 47 seconds of high-quality audio data from a simple text prompt. Its specialised training makes it ideal for creating drum beats, instrument riffs, ambient sounds, foley recordings and other audio samples for music production and sound design.
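
To make the 47-second ceiling concrete, it helps to convert between seconds and samples. The sketch below assumes a 44.1 kHz sample rate and a generation window of 2,097,152 samples. Both numbers are assumptions for illustration (neither is stated above), and in practice they should be read from the model's own configuration.

```python
# Assumed values for illustration only; read the real ones from the model config.
ASSUMED_SAMPLE_RATE = 44_100      # samples per second (assumption)
ASSUMED_SAMPLE_SIZE = 2_097_152   # samples per generation window (assumption)


def max_duration_seconds(sample_size: int, sample_rate: int) -> float:
    """Longest clip, in seconds, that fits in one generation window."""
    return sample_size / sample_rate


def samples_for(seconds: float, sample_rate: int) -> int:
    """Number of samples needed to hold `seconds` of audio."""
    return int(round(seconds * sample_rate))


limit = max_duration_seconds(ASSUMED_SAMPLE_SIZE, ASSUMED_SAMPLE_RATE)
print(f"Window holds about {limit:.1f} s of audio")
print(f"A 10 s clip needs {samples_for(10, ASSUMED_SAMPLE_RATE)} samples")
```

Under these assumptions the window is slightly longer than 47 seconds, which is consistent with the limit quoted above.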

Features

How to use Stable Audio Open?

Let's get started with Stable Audio Open in just a few simple steps.

Download the model from Hugging Face

git clone https://huggingface.co/stabilityai/stable-audio-open-1.0

Install Dependencies

pip install torch torchaudio stable_audio_tools einops
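
Before moving on, it can be useful to confirm that the install succeeded. The following is a minimal sketch using only the standard library; the helper name missing_packages is hypothetical, and the list holds the import names of the four packages installed above.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names of the packages installed above.
required = ["torch", "torchaudio", "stable_audio_tools", "einops"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All dependencies are importable.")
```

The check only locates each package without importing it, so it runs quickly even when torch is installed.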

Import Required Libraries


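      import torch
      import torchaudio
      from einops import rearrange
      from stable_audio_tools import get_pretrained_model
      from stable_audio_tools.inference.generation import generate_diffusion_cond

      # Run on the GPU when one is available, otherwise fall back to the CPU
      device = "cuda" if torch.cuda.is_available() else "cpu"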

Load model


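      model, model_config = get_pretrained_model("stabilityai/stable-audio-open-1.0")
      # The config supplies the sample rate and window size used in the later steps
      sample_rate = model_config["sample_rate"]
      sample_size = model_config["sample_size"]
      model = model.to(device)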

Generate Audio


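      # Text and timing conditioning: the prompt plus the clip's start and total length in seconds
      conditioning = [{
          "prompt": "128 BPM tech house drum loop",
          "seconds_start": 0,
          "seconds_total": 30
      }]

      # Run the conditioned diffusion sampler
      output = generate_diffusion_cond(
          model,
          steps=100,
          cfg_scale=7,
          conditioning=conditioning,
          sample_size=sample_size,
          sigma_min=0.3,
          sigma_max=500,
          sampler_type="dpmpp-3m-sde",
          device=device
      )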
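
The conditioning argument passed above is a list of dictionaries, one per batch item. As a hedged sketch, a small helper can build that list and reject durations beyond the 47-second limit mentioned earlier. The helper make_conditioning is hypothetical and not part of stable_audio_tools; the key names "prompt", "seconds_start" and "seconds_total" follow the model's published usage example.

```python
def make_conditioning(prompt, seconds_total, seconds_start=0, max_seconds=47):
    """Build a one-item conditioning list for a single text prompt.

    Hypothetical helper; `max_seconds` reflects the 47-second limit
    described in this guide.
    """
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if not 0 < seconds_total <= max_seconds:
        raise ValueError(f"seconds_total must be in (0, {max_seconds}]")
    if seconds_start < 0:
        raise ValueError("seconds_start must not be negative")
    return [{
        "prompt": prompt,
        "seconds_start": seconds_start,
        "seconds_total": seconds_total,
    }]


conditioning = make_conditioning("128 BPM tech house drum loop", seconds_total=30)
print(conditioning)
```

Validating the duration up front surfaces a bad request before the comparatively slow sampling step starts.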

Save the output audio


      # Rearrange audio batch to a single sequence
      output = rearrange(output, "b d n -> d (b n)")

      # Peak normalize, clip, convert to int16, and save to file
      output = output.to(torch.float32).div(torch.max(torch.abs(output))).clamp(-1, 1).mul(32767).to(torch.int16).cpu()
      torchaudio.save("output.wav", output, sample_rate)
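
To show what the two steps above do, here is a pure-Python sketch on plain nested lists instead of tensors. It is an illustration rather than the library's implementation: flatten_batch mimics the "b d n -> d (b n)" rearrangement, and peak_normalize_to_int16 mimics the normalise, clip and scale chain. Both function names are hypothetical, and the guard for silent audio is an addition that the one-line version above does not have.

```python
def flatten_batch(batch):
    """Concatenate a batch along time: [b][d][n] -> [d][b * n]."""
    channels = len(batch[0])
    return [[s for item in batch for s in item[ch]] for ch in range(channels)]


def peak_normalize_to_int16(channels):
    """Scale so the loudest sample hits full scale, clip, and map to the int16 range."""
    peak = max(abs(s) for ch in channels for s in ch)
    if peak == 0:
        # Silent input: skip the division to avoid dividing by zero.
        return [[0 for _ in ch] for ch in channels]
    out = []
    for ch in channels:
        scaled = (max(-1.0, min(1.0, s / peak)) * 32767 for s in ch)
        out.append([int(v) for v in scaled])  # int() truncates toward zero
    return out


batch = [
    [[0.5, -0.25], [0.1, 0.0]],   # item 0: two channels, two samples each
    [[0.25, 0.0], [-0.5, 0.2]],   # item 1
]
flat = flatten_batch(batch)
pcm = peak_normalize_to_int16(flat)
print(flat)
print(pcm)
```

Because the scale factor is the single largest absolute sample across all channels, the relative balance between channels is preserved.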

FAQs

Here are some of the most frequently asked questions.