Commit ac1dd412 authored by Andres Marafioti

improve readme (not according to cursor :( )

parent 129cd11b
@@ -79,27 +79,28 @@ https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install
### Server/Client Approach
1. Run the pipeline on the server:
```bash
python s2s_pipeline.py --recv_host 0.0.0.0 --send_host 0.0.0.0
```
2. Run the client locally to handle microphone input and receive generated audio:
```bash
python listen_and_play.py --host <IP address of your server>
```
### Local Approach (Mac)
1. For optimal settings on Mac:
```bash
python s2s_pipeline.py --local_mac_optimal_settings
```
This setting:
- Adds `--device mps` to use MPS for all models.
- Sets LightningWhisperMLX for STT
- Sets MLX LM for the language model
- Sets MeloTTS for TTS
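To confirm MPS is actually usable before forcing `--device mps`, a quick plain-PyTorch check (nothing pipeline-specific):
```python
import torch

# MPS requires Apple Silicon, macOS 12.3+, and a PyTorch build with MPS support.
if torch.backends.mps.is_available():
    print("MPS is available; --device mps should work.")
elif torch.backends.mps.is_built():
    print("PyTorch was built with MPS, but this machine/macOS cannot use it.")
else:
    print("This PyTorch build has no MPS support; fall back to --device cpu.")
```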
### Recommended usage with CUDA
@@ -117,6 +118,57 @@ python s2s_pipeline.py \
For the moment, the compile modes that capture CUDA Graphs (`reduce-overhead`, `max-autotune`) are not compatible with streaming Parler-TTS.
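If you compile models yourself, the safe choice for the streaming TTS path is a mode that does not capture CUDA Graphs. A toy sketch of that mode selection (the `Linear` stand-in is purely illustrative, not a pipeline model):
```python
import torch

# Toy stand-in module; the point here is the compile mode, not the model.
model = torch.nn.Linear(8, 8).cuda()

# "reduce-overhead" and "max-autotune" capture CUDA Graphs, which currently
# breaks streaming Parler-TTS, so stick to "default" for that component.
compiled = torch.compile(model, mode="default")
out = compiled(torch.randn(1, 8, device="cuda"))
```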
### Multi-language Support
The pipeline supports multiple languages, allowing for automatic language detection or specific language settings. Here are examples for both local (Mac) and server setups:
#### Server Setup
For automatic language detection:
```bash
python s2s_pipeline.py \
    --stt_model_name large-v3 \
    --language auto \
    --mlx_lm_model_name mlx-community/Meta-Llama-3.1-8B-Instruct
```
Or for one language in particular, Chinese in this example:
```bash
python s2s_pipeline.py \
    --stt_model_name large-v3 \
    --language zh \
    --mlx_lm_model_name mlx-community/Meta-Llama-3.1-8B-Instruct
```
#### Local Mac Setup
For automatic language detection:
```bash
python s2s_pipeline.py \
    --local_mac_optimal_settings \
    --device mps \
    --stt_model_name large-v3 \
    --language auto \
    --mlx_lm_model_name mlx-community/Meta-Llama-3.1-8B-Instruct-4bit
```
Or for one language in particular, Chinese in this example:
```bash
python s2s_pipeline.py \
    --local_mac_optimal_settings \
    --device mps \
    --stt_model_name large-v3 \
    --language zh \
    --mlx_lm_model_name mlx-community/Meta-Llama-3.1-8B-Instruct-4bit
```
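For intuition on what `--language` controls in the STT stage, here is the equivalent choice in the reference `openai-whisper` package — a hedged illustration, not the pipeline's own code, and `sample.wav` is a placeholder file:
```python
import whisper

model = whisper.load_model("large-v3")

# language=None lets Whisper auto-detect the spoken language (like --language auto);
# language="zh" would pin transcription to Chinese (like --language zh).
result = model.transcribe("sample.wav", language=None)
print(result["language"], result["text"])
```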
## Command-line Usage
### Model Parameters