Projects with this topic
🔧 🔗 https://github.com/om-ai-lab/VLM-R1 Solve Visual Understanding with Reinforced VLMs
🔧 🔗 https://github.com/Doriandarko/MLX-GRPO MLX-GRPO is a training framework for large language models (LLMs) built exclusively on Apple’s MLX framework. Designed to run natively on Apple Silicon using the Metal backend, this project implements Group-based Relative Policy Optimization (GRPO) with a chain-of-thought prompting structure. The pipeline includes dataset preparation, reward function definitions, and GRPO training, all running in a pure MLX environment (no CUDA).
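The MLX-GRPO description mentions reward functions and GRPO training without showing how they fit together. As a rough sketch only, and not the project's actual code, the snippet below shows one common way a GRPO-style step scores a group of sampled completions: each completion gets a scalar reward, and rewards are then standardized within the group to give relative advantages. The `<reasoning>`/`<answer>` tag format and the names `format_reward` and `group_relative_advantages` are assumptions made for illustration. Plain Python is used instead of MLX so the example runs anywhere.

```python
import re
import statistics


def format_reward(completion: str) -> float:
    """Hypothetical reward: 1.0 if the completion follows an assumed
    chain-of-thought layout (<reasoning>...</reasoning><answer>...</answer>)."""
    pattern = r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>"
    return 1.0 if re.search(pattern, completion, flags=re.DOTALL) else 0.0


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of samples for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so a
    completion is judged relative to its siblings rather than in isolation.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


# One prompt, four sampled completions (made-up strings for illustration).
completions = [
    "<reasoning>2 + 2 is 4</reasoning><answer>4</answer>",
    "The answer is 4.",
    "<reasoning>add the numbers</reasoning>\n<answer>4</answer>",
    "4",
]
rewards = [format_reward(c) for c in completions]
advantages = group_relative_advantages(rewards)
```

Completions that follow the assumed format end up with positive advantages and the others with negative ones. If every completion in a group receives the same reward, all advantages are zero and that group contributes no learning signal.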