
Om AI Lab
Open Multimodal AGI Research
@OmAI_lab https://huggingface.co/omlab
https://github.com/om-ai-lab/GroundVLP GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024)
https://github.com/om-ai-lab/OmAgent Build multimodal language agents for fast prototyping and production
https://github.com/om-ai-lab/OmChat A suite of multimodal language models that are powerful and efficient
https://github.com/om-ai-lab/OmDet Real-time and accurate open-vocabulary end-to-end object detection
https://github.com/om-ai-lab/OmModel A collection of strong multimodal models for building multimodal AGI agents
https://github.com/om-ai-lab/OVDEval A Comprehensive Evaluation Benchmark for Open-Vocabulary Detection (AAAI 2024)
https://github.com/om-ai-lab/RS5M RS5M: A large-scale vision-language dataset for remote sensing [TGRS]
https://github.com/om-ai-lab/VL-CheckList Evaluating Vision & Language Pretraining Models with Objects, Attributes and Relations [EMNLP 2022]
https://github.com/om-ai-lab/ZoomEye ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration