Projects with this topic
https://github.com/modelscope/ms-swift SWIFT (Scalable lightWeight Infrastructure for Fine-Tuning): Use PEFT or Full-parameter to finetune 400+ LLMs or 100+ MLLMs.
https://github.com/modelscope/modelscope-agent ModelScope-Agent: An agent framework connecting models in ModelScope with the world
https://github.com/om-ai-lab/VLM-R1 Solve Visual Understanding with Reinforced VLMs
https://github.com/InternLM/HuixiangDou HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance
https://github.com/Stability-AI/stability-sdk SDK for interacting with stability.ai APIs (e.g. stable diffusion inference)
https://github.com/lucidrains/transfusion-pytorch Pytorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI
https://github.com/InternLM/InternLM-XComposer InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
https://github.com/princeton-nlp/CharXiv CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs
https://github.com/joanrod/star-vector StarVector is a foundation model for SVG generation that transforms vectorization into a code generation task. Using a vision-language modeling architecture, StarVector processes both visual and textual inputs to produce high-quality SVG code with remarkable precision.
https://github.com/om-ai-lab/OmAgent Build multimodal language agents for fast prototype and production
https://github.com/om-ai-lab/VL-CheckList Evaluating Vision & Language Pretraining Models with Objects, Attributes and Relations. [EMNLP 2022]
https://github.com/om-ai-lab/GroundVLP GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024)
https://github.com/deepseek-ai/Janus Janus-Series: Unified Multimodal Understanding and Generation Models
https://github.com/FoundationVision/Groma [ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
https://github.com/forhaoliu/language-quantized-autoencoders Language Quantized AutoEncoders: This is a Jax implementation of our work Language Quantized AutoEncoders.