Upload an audio file to get a transcript with speaker diarization. This demo uses openai/whisper-small for ASR and pyannote/speaker-diarization-3.1 for diarization. A Hugging Face token with access to pyannote/speaker-diarization-3.1 is required; set it as the HF_TOKEN environment variable before launching (see script comments).
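As a minimal sketch of the token requirement, the launch script could check HF_TOKEN up front and fail with a clear message instead of erroring later during model download. The helper name `require_hf_token` and the error text are hypothetical and not taken from the demo script.

```python
import os
from typing import Mapping, Optional


def require_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Hugging Face token from HF_TOKEN, or raise if it is unset.

    `require_hf_token` is a hypothetical helper for illustration; the demo
    script may read the variable differently.
    """
    # Default to the real process environment; a mapping can be passed for testing.
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Export a Hugging Face token that has access "
            "to pyannote/speaker-diarization-3.1 before launching."
        )
    return token
```

The returned token would then be handed to whatever loads the diarization pipeline; the exact keyword argument for that depends on the installed pyannote.audio version, so it is left out here.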
Note: For long audio files or many concurrent users, consider running on a GPU and using a larger model such as whisper-large-v3.
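One way to act on this note is to pick the ASR checkpoint based on whether a GPU is present. The sketch below assumes PyTorch is the backend and that the larger model is only worth loading on a GPU; the function names are hypothetical and the policy is an illustration, not the demo's actual behavior.

```python
def gpu_available() -> bool:
    """Report whether a CUDA GPU is visible to PyTorch (False if torch is absent)."""
    try:
        import torch  # assumed backend; imported lazily so the sketch runs without it
    except ImportError:
        return False
    return torch.cuda.is_available()


def choose_asr_model(has_gpu: bool) -> str:
    """Pick a Whisper checkpoint: the larger model on GPU, the small one on CPU."""
    return "openai/whisper-large-v3" if has_gpu else "openai/whisper-small"
```

Calling `choose_asr_model(gpu_available())` at startup would keep the CPU default of openai/whisper-small and switch to whisper-large-v3 only when a GPU is detected.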