Text-to-Sound-Effects with Stable-Audio-Open

A free, open-source, and powerful text-to-audio model that lets you generate a wide range of sound effects.





Free Online Stable-Audio-Open


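What is Stable Audio Open?

Stable Audio Open lets anyone generate up to 47 seconds of high-quality audio from a simple text prompt. Its specialized training makes it well suited to creating drum beats, instrument riffs, ambient sounds, foley recordings, and other audio samples for music production and sound design.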

Feature

How to use Stable Audio Open?


Let's get started with Stable Audio Open in just a few simple steps.

1. Download the model from Hugging Face

git clone https://huggingface.co/stabilityai/stable-audio-open-1.0

2. Install dependencies

pip install torch torchaudio stable_audio_tools einops
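
Before moving on, it can help to confirm that the install succeeded. The following is a minimal sketch that uses only the standard library; the helper name `missing_packages` is hypothetical, and the module names are simply the import names of the packages installed above.

```python
from importlib.util import find_spec

# Import names of the packages installed in this step.
REQUIRED = ["torch", "torchaudio", "stable_audio_tools", "einops"]


def missing_packages(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies are importable.")
```

If anything is reported missing, rerun the pip command in the same Python environment you plan to use for generation.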

3. Import required libraries


      import torch
      import torchaudio
      from einops import rearrange
      from stable_audio_tools import get_pretrained_model
      from stable_audio_tools.inference.generation import generate_diffusion_cond

4. Load the model


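      # Pick a GPU when available, then load the pretrained weights and config
      device = "cuda" if torch.cuda.is_available() else "cpu"
      model, model_config = get_pretrained_model("stabilityai/stable-audio-open-1.0")
      model = model.to(device)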
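
The 47-second limit mentioned earlier follows from the model configuration. As a rough sketch, assuming a 44,100 Hz sample rate and a sample size of 2,097,152 samples (these values are assumptions here; read the real ones from `model_config`), the maximum clip length can be worked out as below. The helper name `max_duration_seconds` is hypothetical.

```python
def max_duration_seconds(sample_size: int, sample_rate: int) -> float:
    """Length in seconds of a clip holding `sample_size` samples per channel."""
    return sample_size / sample_rate


# Assumed values; in practice use model_config["sample_size"]
# and model_config["sample_rate"] from the loaded model.
assumed_config = {"sample_rate": 44100, "sample_size": 2097152}

duration = max_duration_seconds(
    assumed_config["sample_size"], assumed_config["sample_rate"]
)
print(f"Maximum clip length: {duration:.2f} s")
```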

5. Generate audio


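      # Text and timing conditioning (the prompt here is only an example)
      conditioning = [{
          "prompt": "128 BPM tech house drum loop",
          "seconds_start": 0,
          "seconds_total": 30
      }]

      # Run the diffusion sampler to produce a batch of stereo audio
      output = generate_diffusion_cond(
          model,
          steps=100,
          cfg_scale=7,
          conditioning=conditioning,
          sample_size=model_config["sample_size"],
          sigma_min=0.3,
          sigma_max=500,
          sampler_type="dpmpp-3m-sde",
          device=device
      )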
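
The `conditioning` argument carries the text prompt together with timing information, passed as a list of dictionaries. When prompts come from user input, a small helper can validate them first. This is a minimal sketch: `make_conditioning` and the `MAX_SECONDS` cap are illustrative assumptions, not part of `stable_audio_tools`, and the prompt is only an example.

```python
MAX_SECONDS = 47  # assumed cap, based on the "up to 47 seconds" figure


def make_conditioning(prompt: str, seconds_total: float, seconds_start: float = 0.0):
    """Build a single-item conditioning list for one text prompt."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if not 0 < seconds_total <= MAX_SECONDS:
        raise ValueError(f"seconds_total must be in (0, {MAX_SECONDS}]")
    return [{
        "prompt": prompt,
        "seconds_start": seconds_start,
        "seconds_total": seconds_total,
    }]


# Example prompt (hypothetical): a short foley-style clip.
conditioning = make_conditioning("Footsteps on gravel", seconds_total=10)
```

The resulting list can be passed as `conditioning=conditioning` in the call to `generate_diffusion_cond`.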

6. Save the output audio


      # Rearrange audio batch to a single sequence
      output = rearrange(output, "b d n -> d (b n)")

      # Peak normalize, clip, convert to int16, and save to file
      output = output.to(torch.float32).div(torch.max(torch.abs(output))).clamp(-1, 1).mul(32767).to(torch.int16).cpu()
      torchaudio.save("output.wav", output, model_config["sample_rate"])
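
The peak-normalize-and-convert step can also be illustrated without a GPU or the model. The sketch below swaps tensors and `torchaudio` for plain Python lists and the standard-library `wave` module; `to_int16` is a hypothetical helper, and unlike the one-liner above it guards against an all-silent clip, where the peak is zero and the division would fail. The 44.1 kHz sample rate and the test tone are assumptions chosen for the example.

```python
import math
import struct
import wave


def to_int16(samples):
    """Peak-normalize floats to [-1, 1] and scale them to 16-bit integers."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return [0] * len(samples)  # silent clip: nothing to normalize
    return [int(max(-1.0, min(1.0, s / peak)) * 32767) for s in samples]


# One second of a quiet 440 Hz test tone at an assumed 44.1 kHz sample rate.
sample_rate = 44100
tone = [0.25 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
pcm = to_int16(tone)

# Write the samples as a mono 16-bit WAV file.
with wave.open("tone.wav", "wb") as wav_file:
    wav_file.setnchannels(1)
    wav_file.setsampwidth(2)  # 2 bytes per sample = int16
    wav_file.setframerate(sample_rate)
    wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
```

Because the loudest sample is scaled to full range, a quiet input such as the tone above comes out at maximum volume, which is the same effect the normalization has on the generated audio.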
      

FAQs

Here are some of the most frequently asked questions.