text to audio Stable-Audio-Open

Stable Audio Open is an open-source model optimised for generating short audio samples, sound effects, and production elements from text prompts.





Free Online Stable-Audio-Open


What is Stable Audio Open?

Stable Audio Open allows anyone to generate up to 47 seconds of high-quality audio data from a simple text prompt. Its specialised training makes it ideal for creating drum beats, instrument riffs, ambient sounds, foley recordings and other audio samples for music production and sound design.

Features

Open Source Model

Free to use, generating samples and sound effects up to 47 seconds long

Specialized Training

High-quality and diverse audio generation

Customizable

Fine-tune with your own data.

Difference from Commercial Stable Audio

Focused on short audio clips rather than full tracks

Community and Feedback

The model is available on Hugging Face, and you can deploy it yourself.

How to use Stable Audio Open?


Let's get started with Stable Audio Open in just a few simple steps.

1

Download the Model from Hugging Face

git clone https://huggingface.co/stabilityai/stable-audio-open-1.0
2

Install Dependencies

pip install torch torchaudio stable_audio_tools einops
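
Before moving on, it can help to confirm that the packages from the install command are actually importable. The following is a minimal sketch using only the Python standard library. The helper name `missing_modules` is hypothetical, and the module names are simply the top-level imports used in the next step.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Top-level modules imported by the snippets in this guide.
REQUIRED = ["torch", "torchaudio", "einops", "stable_audio_tools"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

If anything is reported missing, rerun the install command above inside the same Python environment.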
3

Import Required Libraries


      import torch
      import torchaudio
      from einops import rearrange
      from stable_audio_tools import get_pretrained_model
      from stable_audio_tools.inference.generation import generate_diffusion_cond
      import gradio as gr
4

Load model


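      device = "cuda" if torch.cuda.is_available() else "cpu"  # fall back to CPU without a GPU
      model, model_config = get_pretrained_model('stabilityai/stable-audio-open-1.0')
      sample_rate, sample_size = model_config["sample_rate"], model_config["sample_size"]
      model = model.to(device)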
5

Generate Audio


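      # Text and timing conditioning: an example prompt and the clip length in seconds
      conditioning = [{
          "prompt": "128 BPM tech house drum loop",
          "seconds_start": 0,
          "seconds_total": 30
      }]

      output = generate_diffusion_cond(
          model,
          steps=100,
          cfg_scale=7,
          conditioning=conditioning,
          sample_size=sample_size,
          sigma_min=0.3,
          sigma_max=500,
          sampler_type="dpmpp-3m-sde",
          device=device
      )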
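
The `conditioning` argument passed to `generate_diffusion_cond` pairs a text prompt with timing information. Here is a minimal sketch of a helper that builds it and checks the requested length against the 47-second limit mentioned on this page. The dictionary keys `prompt`, `seconds_start`, and `seconds_total` are assumed from the model's published example, and the helper names `build_conditioning` and `samples_for` are hypothetical.

```python
MAX_SECONDS = 47  # clip length limit stated on this page


def build_conditioning(prompt, seconds_total=30, seconds_start=0):
    """Build a one-item conditioning list for a single text prompt."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    if not 0 <= seconds_start < seconds_total <= MAX_SECONDS:
        raise ValueError(
            f"need 0 <= seconds_start < seconds_total <= {MAX_SECONDS}"
        )
    return [{
        "prompt": prompt.strip(),
        "seconds_start": seconds_start,
        "seconds_total": seconds_total,
    }]


def samples_for(seconds, sample_rate):
    """Number of audio samples that cover `seconds` at `sample_rate` Hz."""
    return int(round(seconds * sample_rate))
```

The returned list can be passed as `conditioning=...`. `samples_for` shows the arithmetic that links a duration to a sample count; the sample rate should come from the loaded model configuration rather than being hard-coded.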
6

Save Output Audio


      # Rearrange audio batch to a single sequence
      output = rearrange(output, "b d n -> d (b n)")

      # Peak normalize, clip, convert to int16, and save to file
      output = output.to(torch.float32).div(torch.max(torch.abs(output))).clamp(-1, 1).mul(32767).to(torch.int16).cpu()
      torchaudio.save("output.wav", output, sample_rate)
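
To make the normalization arithmetic above easier to follow, here is a sketch of the same steps (peak normalize, clamp, scale to int16) in plain Python, written to a WAV file with the standard-library `wave` module. It is a stand-in for the tensor operations, not a replacement for them. The function names are hypothetical, and the zero-peak guard is an addition: for an all-silent signal, the one-line version above would divide by zero.

```python
import io
import struct
import wave


def to_int16_pcm(samples):
    """Peak-normalize floats, clamp to [-1, 1], and scale to int16 values."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # Silent input: nothing to normalize, avoid dividing by zero.
        return [0] * len(samples)
    return [int(max(-1.0, min(1.0, s / peak)) * 32767) for s in samples]


def write_mono_wav(target, pcm, sample_rate):
    """Write int16 samples as a mono WAV to a path or a binary file object."""
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
```

For example, `write_mono_wav("demo.wav", to_int16_pcm([0.0, 0.25, -0.5]), 44100)` writes three samples whose loudest value is scaled to full range.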
      

FAQs

Here are some of the most frequently asked questions.

Stable Audio Open is an open-source text-to-audio model for generating audio samples and sound effects. It allows users to create up to 47 seconds of high-quality audio from simple text prompts.

Stable Audio Open focuses on generating short audio clips and sound effects, while the commercial version can create full tracks and complex compositions up to three minutes in length.

Yes, users can fine-tune Stable Audio Open with their own audio data to generate personalized sound effects and audio samples.

You can create drum beats, instrument riffs, ambient sounds, foley recordings, and production elements.

The model weights are available on Hugging Face.

Yes, it is completely free and open-source.

The model was trained on audio data from FreeSound and the Free Music Archive.

Yes, it can be used for both personal and commercial purposes, subject to the terms of the license published with the model on Hugging Face.

The model generates audio based on text prompts, so it supports any language input that the user provides.

You can start by downloading the model from Hugging Face and following the tutorials and documentation available.

The model can run on any system that supports PyTorch and has enough GPU or CPU resources.

Yes, you can join the community on Discord for support and discussions.

It is released under an open-source license.

Yes, you can contribute by providing feedback, reporting issues, and submitting pull requests on GitHub.

Developers can access documentation, community forums, and direct support through the Discord channel.

While it can generate short musical clips, it is not optimized for full songs, melodies, or vocals.

The model is trained on diverse datasets and fine-tuned for high-quality audio generation.

For now, no dedicated code repository has been officially released; only the model itself has been published.

You can integrate the model into your applications using its API.

Audio-to-audio generation modifies existing audio, while text-to-audio generation creates new audio from text prompts.