Projects with this topic
https://github.com/modelscope/FunASR A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing
https://github.com/Picovoice/cheetah On-device streaming speech-to-text engine powered by deep learning
https://github.com/Picovoice/rhino On-device Speech-to-Intent engine powered by deep learning
https://github.com/Picovoice/cobra On-device voice activity detection (VAD) powered by deep learning
https://github.com/m-bain/whisperX WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
https://github.com/Picovoice/falcon On-device speaker diarization powered by deep learning
https://github.com/modelscope/FunClip Open-source, accurate and easy-to-use video speech recognition & clipping tool, with LLM-based AI clipping integrated.
https://github.com/bytedance/SALMONN SALMONN: Speech Audio Language Music Open Neural Network
https://github.com/speechbrain/speechbrain.github.io The SpeechBrain project aims to build a novel speech toolkit fully based on PyTorch. With SpeechBrain, users can easily create speech processing systems, ranging from speech recognition (both HMM/DNN and end-to-end) to speaker recognition, speech enhancement, speech separation, multi-microphone speech processing, and many others.
https://github.com/Picovoice/web-voice-processor A library for real-time voice processing in web browsers.
https://github.com/Picovoice/porcupine On-device wake word detection powered by deep learning
https://github.com/homebrewltd/AudioBench AudioBench: A Universal Benchmark for Audio Large Language Models
https://arxiv.org/abs/2406.16020
huggingface.co/transformers https://github.com/huggingface/transformers Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://github.com/markovka17/digit-recognition A small model for recognition of digits in audio clips
https://github.com/Cinnamon/whisper-jargon [SIGDIAL'24] Improving Speech Recognition with Jargon Injection
https://github.com/YuanGongND/ltu Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".