dataset

Projects with this topic

mirrored_repos / awesomeness / Awesome Selfhosted Data

https://github.com/awesome-selfhosted/awesome-selfhosted-data Machine-readable data for https://awesome-selfhosted.net

awesome-list self-hosted dataset data

Updated Jun 10, 2026

mirrored_repos / MachineLearning / modelscope / Data Juicer

https://github.com/modelscope/data-juicer Making data higher-quality, juicier, and more digestible for foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

nlp data-science opendata data-visuali... pytorch dataset data-analysis Llama gpt modelscope multi-modality pre-training streamlit sora Large Langua... instruction-... llava

Updated Jun 09, 2026

mirrored_repos / MachineLearning / run-llama / Llama Datasets

https://github.com/run-llama/llama-datasets GitHub repo for storing LlamaDatasets

dataset llamaindex

Updated May 05, 2026

mirrored_repos / MachineLearning / xinjli / Ucla Phonetic Corpus

UCLA Phonetic Corpus
https://github.com/xinjli/ucla-phonetic-corpus

Dataset from the ICASSP 2021 paper "Multilingual Phonetic Dataset for Low Resource Speech Recognition"

speech-recog... speech dataset phonetics

Updated Sep 02, 2025

mirrored_repos / MachineLearning / HumanSignal / Label Studio

https://github.com/HumanSignal/label-studio

Label Studio is a multi-type data labeling and annotation tool with a standardized output format.

computer-vision image-annota... annotation annotations dataset yolo image-classi... datasets semantic-seg... annotation-tool text-annotation boundingbox image-labeling labeling-tool mlops image-labell... data-labeling label-studio Deep Learning

Updated Jul 12, 2025

mirrored_repos / MachineLearning / joanrod / Ocr Vqgan

https://github.com/joanrod/ocr-vqgan OCR-VQGAN, a discrete image encoder (tokenizer and detokenizer) for figure images in the Paper2Fig100k dataset. Implementation of OCR Perceptual loss for clear text-within-im

ocr Deep Learning image-recons... dataset image-genera... deep-generat... taming-trans... vqgan ocr-vqgan paper2fig paper2fig100k text-reconst...

Updated Mar 26, 2025

mirrored_repos / MachineLearning / MultiTonic / Thinking Dataset

https://github.com/MultiTonic/thinking-dataset

Creating a thinking dataset.

Python tonic dataset multi-tonic

Updated Mar 15, 2025

mirrored_repos / MachineLearning / huggingface / Dataspeech

https://github.com/huggingface/dataspeech

automatic-sp... dataset tagging

Updated Sep 03, 2024

mirrored_repos / MachineLearning / coqui-ai / TTS Recipes

https://github.com/coqui-ai/TTS-recipes 🐸TTS recipes for different datasets

Deep Learning recipe speech tts dataset coqui-ai tts-recipes

Updated Jun 09, 2024

mirrored_repos / MachineLearning / lm-sys / Llm Decontaminator

https://github.com/lm-sys/llm-decontaminator Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples"

llm-training dataset lm-sys

Updated May 23, 2024

mirrored_repos / MachineLearning / Farama-Foundation / Minari

https://github.com/Farama-Foundation/Minari A standard format for offline reinforcement learning datasets, with popular reference datasets and related utilities

reinforcemen... dataset datasets gymnasium offline-rein... offline

Updated May 06, 2024
