Whisper ASR with Pyannote Speaker Diarization

Upload an audio file to get a transcript with speaker labels. This demo uses openai/whisper-small for ASR and pyannote/speaker-diarization-3.1 for diarization. A Hugging Face token with access to pyannote/speaker-diarization-3.1 is required; set it as the HF_TOKEN environment variable before launching (see script comments).
Note: For long audio files or many concurrent users, consider running on a GPU and using a larger model such as whisper-large-v3.
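The demo's script is not reproduced on this page, so the following is only a minimal sketch of how the two models named above might be combined. The function names (assign_speakers, transcribe_with_speakers) and the maximum-overlap rule for matching transcript chunks to speaker turns are assumptions made for illustration, not the demo's confirmed logic. The sketch also assumes transformers and pyannote.audio 3.x are installed; newer pyannote.audio releases may use a different keyword for the token.

```python
import os

ASR_MODEL = "openai/whisper-small"
DIARIZATION_MODEL = "pyannote/speaker-diarization-3.1"


def assign_speakers(chunks, turns):
    """Label each ASR chunk with the speaker whose turn overlaps it the most.

    chunks: list of {"timestamp": (start, end), "text": str}, the shape returned
            by the transformers ASR pipeline with return_timestamps=True.
    turns:  list of (start, end, speaker) tuples from the diarization step.
    Maximum overlap is an illustrative choice, not necessarily the demo's rule.
    """
    labeled = []
    for chunk in chunks:
        start, end = chunk["timestamp"]
        # Whisper can leave the final end timestamp as None; treat it as open-ended.
        end = float("inf") if end is None else end
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for turn_start, turn_end, speaker in turns:
            overlap = min(end, turn_end) - max(start, turn_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        labeled.append((best_speaker, chunk["text"].strip()))
    return labeled


def transcribe_with_speakers(audio_path, batch_size=8, chunk_length_s=30, language=None):
    """Hypothetical end-to-end helper: run ASR, run diarization, merge the results."""
    # Imported lazily so assign_speakers can be used without the heavy dependencies.
    from transformers import pipeline
    from pyannote.audio import Pipeline

    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError("Set HF_TOKEN to a token with access to " + DIARIZATION_MODEL)

    asr = pipeline("automatic-speech-recognition", model=ASR_MODEL)
    generate_kwargs = {"language": language} if language else {}
    result = asr(
        audio_path,
        chunk_length_s=chunk_length_s,
        batch_size=batch_size,
        return_timestamps=True,
        generate_kwargs=generate_kwargs,
    )

    # use_auth_token is the keyword in pyannote.audio 3.x (assumption for other versions).
    diarizer = Pipeline.from_pretrained(DIARIZATION_MODEL, use_auth_token=token)
    diarization = diarizer(audio_path)
    turns = [
        (turn.start, turn.end, speaker)
        for turn, _, speaker in diarization.itertracks(yield_label=True)
    ]
    return assign_speakers(result["chunks"], turns)
```

Calling transcribe_with_speakers("meeting.wav", language="english") would return a list of (speaker, text) pairs; the file name here is a placeholder. The merging step is kept separate from model loading so it can be checked on its own with hand-written timestamps.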

Inputs: Upload Audio File (WAV, MP3, FLAC, etc.); ASR Batch Size (1 to 32); ASR Chunk Length in seconds (1 to 30); ASR Language.