diff --git a/README.md b/README.md
index 75fe28e148e9d692cb14b43d6ab2dc998426d067..5dff2349830c266d7142a9d088280d116f90b1ed 100644
--- a/README.md
+++ b/README.md
@@ -19,8 +19,8 @@
 ### Structure
 This repository implements a speech-to-speech cascaded pipeline with consecutive parts:
 1. **Voice Activity Detection (VAD)**: [silero VAD v5](https://github.com/snakers4/silero-vad)
-2. **Speech to Text (STT)**: Whisper models (including distilled versions)
-3. **Language Model (LM)**: Any instruct model available on the [Hugging Face Hub](https://huggingface.co/models)! 🤗
+2. **Speech to Text (STT)**: Whisper checkpoints (including [distilled versions](https://huggingface.co/distil-whisper))
+3. **Language Model (LM)**: Any instruct model available on the [Hugging Face Hub](https://huggingface.co/models?pipeline_tag=text-generation&sort=trending)! 🤗
 4. **Text to Speech (TTS)**: [Parler-TTS](https://github.com/huggingface/parler-tts)
 
 ### Modularity
@@ -70,6 +70,8 @@ python s2s_pipeline.py --recv_host localhost --send_host localhost
 python listen_and_play.py --host localhost
 ```
 
+You can pass `--device mps` to run it locally on a Mac.
+
 ## Command-line Usage
 
 ### Model Parameters