multimodal

Projects with this topic

mirrored_repos / MachineLearning / modelscope / Ms Swift

🔧🔗https://github.com/modelscope/ms-swift

SWIFT (Scalable lightWeight Infrastructure for Fine-Tuning): use PEFT or full-parameter training to fine-tune 400+ LLMs or 100+ MLLMs.

agent deployment lora Llama liger peft multimodal modelscope sft dpo pre-training Large Langua... llava vllm qwen lmdeploy minicpm-v internvl

Updated Jun 10, 2026

mirrored_repos / MachineLearning / Stability AI / Stability Sdk

https://github.com/Stability-AI/stability-sdk SDK for interacting with stability.ai APIs (e.g. stable diffusion inference)

generative-art multimodal ai-art latent-diffu... stable-diffu...

Updated May 14, 2026

mirrored_repos / MachineLearning / OM AI Lab / GroundVLP

🔧🔗https://github.com/om-ai-lab/GroundVLP GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024)

object-detec... zero-shot-le... multimodal vision-langu...

Updated Apr 10, 2026

mirrored_repos / MachineLearning / OM AI Lab / VL CheckList

🔧🔗https://github.com/om-ai-lab/VL-CheckList Evaluating Vision & Language Pretraining Models with Objects, Attributes and Relations. [EMNLP 2022]

evaluation metrics multimodal vision language Deep Learning

Updated Apr 10, 2026

mirrored_repos / MachineLearning / OM AI Lab / VLM R1

🔧🔗https://github.com/om-ai-lab/VLM-R1 Solve Visual Understanding with Reinforced VLMs

vlm multimodal Large Langua... qwen deepseek grpo vlm-r1

Updated Mar 24, 2026

mirrored_repos / MachineLearning / modelscope / Modelscope Agent

🔧🔗https://github.com/modelscope/modelscope-agent

ModelScope-Agent: An agent framework connecting models in ModelScope with the world

modelscope agents Python data-science code chatbot android multi-agent Retrieval Au... mobile-agents gpt multimodal Large Langua... qwen assistant codexgraph data-science...

Updated Mar 24, 2026

mirrored_repos / MachineLearning / lucidrains / Transfusion Pytorch

🔧🔗https://github.com/lucidrains/transfusion-pytorch

PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI

Python pytorch Deep Learning transformers pytorch-tran... Synthetic In... attention multimodal flow-matching

Updated Jan 27, 2026

mirrored_repos / MachineLearning / InternLM / HuixiangDou

https://github.com/InternLM/HuixiangDou HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance

application robot pipeline dsl chatbot assistant wechat group-chat lark image-retrieval ai-assistant multimodal Retrieval Au... Large Langua...

Updated Nov 24, 2025

mirrored_repos / MachineLearning / joanrod / Star Vector

🔧🔗https://github.com/joanrod/star-vector StarVector is a foundation model for SVG generation that transforms vectorization into a code generation task. Using a vision-language modeling architecture, StarVector processes both visual and textual inputs to produce high-quality SVG code with remarkable precision.

svg vlm Large Langua... multimodal

Updated Nov 07, 2025

mirrored_repos / MachineLearning / InternLM / InternLM XComposer

https://github.com/InternLM/InternLM-XComposer InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

foundation gpt language-model multimodal multi-modality vision-trans... gpt-4 visual-langu... Large Langua... chatgpt instruction-... supervised-f... mllm vision-langu... large-vision...

Updated May 20, 2025

mirrored_repos / MachineLearning / princeton-nlp / CharXiv

https://github.com/princeton-nlp/CharXiv CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs

benchmark Machine Lear... multimodal vision-langu... chart-unders...

Updated Apr 22, 2025

mirrored_repos / MachineLearning / OM AI Lab / OmAgent

🔧🔗https://github.com/om-ai-lab/OmAgent Build multimodal language agents for fast prototype and production

agent Python workflow chatbot OpenAI Llama gradio gpt multimodal Retrieval Au... Large Langua... llava language-agent multimodal-a...

Updated Mar 19, 2025

mirrored_repos / MachineLearning / deepseek-ai / Janus

🔧🔗https://github.com/deepseek-ai/Janus

Janus-Series: Unified Multimodal Understanding and Generation Models

deepseek multimodal unified-model any-to-any foundation-m... Large Langua... vision-langu...

Updated Feb 01, 2025

mirrored_repos / MachineLearning / FoundationVision / Groma

🔧🔗https://github.com/FoundationVision/Groma

[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization

🕸️🔗groma-mllm.github.io/

Llama multimodal grounding foundational... Large Langua... mllm vision-langu... llama2

Updated Oct 19, 2024

mirrored_repos / MachineLearning / forhaoliu / Language Quantized Autoencoders

🔧🔗https://github.com/forhaoliu/language-quantized-autoencoders

Language Quantized AutoEncoders: a JAX implementation of our work Language Quantized AutoEncoders.

autoencoders bert vq multimodal roberta vqvae Large Langua...

Updated Oct 12, 2024