VibeVoice (TTS)

VibeVoice (TTS) is a text-to-speech model that generates natural conversational dialogue from text. It supports up to 4 speakers and up to 90 minutes of audio in a single recording.

Key Features:

Two models: small and large
Up to 4 speakers in a single recording
Up to 90 minutes of generated audio
Language support: English (default) and Chinese are officially supported. Other languages have been found to work reasonably well, but quality is not guaranteed.

How to use the model

Write the text in English or Chinese; quality is not guaranteed for other languages. The maximum text length is 5000 characters. Avoid special characters. Each line must be labeled with its speaker in the following format:

Correct format:

Speaker 1: Hello! How are you today?
Speaker 2: I'm doing great, thanks for asking!
Speaker 1: That's wonderful to hear.
Speaker 3: Hey everyone, sorry I'm late!

Incorrect format:

Hello! How are you today?
I'm doing great!

Important:

Each line must start with Speaker N: (where N is a number from 1 to 4)
Speaker numbering: Speaker 1, Speaker 2, Speaker 3, Speaker 4
You can use from 1 to 4 speakers
Case does not matter: Speaker 1: = speaker 1: = SPEAKER 1:

Exception: for a monologue (a single voice throughout), the speaker label can be omitted.
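The formatting rules above can be checked before text is submitted. The following is a minimal Python sketch and not part of VibeVoice itself: the function name validate_script is hypothetical, and two behaviors are assumptions drawn from the rules on this page rather than confirmed model behavior (text with no labels at all is accepted as a monologue, and extra whitespace around the label is tolerated).

```python
import re

MAX_CHARS = 5000      # maximum text length stated on this page
MAX_SPEAKERS = 4      # speakers are numbered 1 to 4

# "Speaker N:" at the start of a line, in any letter case.
# Tolerating extra whitespace around the label is an assumption.
_LABEL = re.compile(r"^\s*speaker\s+(\d+)\s*:", re.IGNORECASE)


def validate_script(text: str) -> list[str]:
    """Return a list of problems found; an empty list means the text looks valid."""
    problems = []
    if len(text) > MAX_CHARS:
        problems.append(f"text is {len(text)} characters; the limit is {MAX_CHARS}")

    # Blank lines are ignored; line numbers below count non-blank lines only.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    matches = [_LABEL.match(ln) for ln in lines]

    # Assumption: text with no speaker labels at all is an unlabeled monologue.
    if not any(matches):
        return problems

    for number, match in enumerate(matches, start=1):
        if match is None:
            problems.append(f"line {number}: missing 'Speaker N:' prefix")
        elif not 1 <= int(match.group(1)) <= MAX_SPEAKERS:
            problems.append(
                f"line {number}: speaker number {match.group(1)} "
                f"is outside 1 to {MAX_SPEAKERS}"
            )
    return problems
```

Calling validate_script on the text under "Correct format" returns an empty list. For a script that mixes labeled and unlabeled lines, it reports each unlabeled line.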

Example scenarios:

Monologue (1 speaker):

Speaker 1: Today I want to talk about artificial intelligence.
Speaker 1: It's changing our world in incredible ways.
Speaker 1: From healthcare to entertainment, AI is everywhere.

Dialogue (2 speakers):

Speaker 1: Have you tried the new restaurant downtown?
Speaker 2: Not yet, but I've heard great things about it!
Speaker 1: We should go there this weekend.
Speaker 2: That sounds like a perfect plan!

Group conversation (3-4 speakers):

Speaker 1: Welcome to our podcast, everyone!
Speaker 2: Thanks for having us!
Speaker 3: It's great to be here.
Speaker 4: I'm excited to share our thoughts today.
Speaker 1: Let's start with introductions.
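Scripts like the scenarios above can also be assembled programmatically from a list of turns. This is a hedged sketch under the rules stated on this page: build_script is a hypothetical helper, not a VibeVoice API, and collapsing line breaks inside a turn is an assumption made so that each turn stays on one labeled line.

```python
MAX_CHARS = 5000      # maximum text length stated on this page
MAX_SPEAKERS = 4      # speakers are numbered 1 to 4


def build_script(turns):
    """Build 'Speaker N: text' lines from (speaker_number, text) pairs."""
    lines = []
    for speaker, text in turns:
        if not 1 <= speaker <= MAX_SPEAKERS:
            raise ValueError(f"speaker number {speaker} is outside 1 to {MAX_SPEAKERS}")
        # Collapse line breaks and repeated spaces so each turn is one line.
        clean = " ".join(text.split())
        lines.append(f"Speaker {speaker}: {clean}")

    script = "\n".join(lines)
    if len(script) > MAX_CHARS:
        raise ValueError(f"script is {len(script)} characters; the limit is {MAX_CHARS}")
    return script


if __name__ == "__main__":
    print(build_script([
        (1, "Have you tried the new restaurant downtown?"),
        (2, "Not yet, but I've heard great things about it!"),
    ]))
```

Raising an error for an oversized script, instead of truncating it silently, leaves the choice of where to split a long text to the caller.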
