multimodal
Projects with this topic
- Groma (https://github.com/FoundationVision/Groma): [ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization