Large Language Models

Projects with this topic

mirrored_repos / MachineLearning / InternLM / Lmdeploy

https://github.com/InternLM/lmdeploy LMDeploy is a toolkit for compressing, deploying, and serving LLMs. 🔗 lmdeploy.readthedocs.io/en/latest/

Llama cuda-kernels deepspeed
+ 8 more

Updated Dec 04, 2024

mirrored_repos / MachineLearning / JanHQ / Cortex.Tensorrt Llm

https://github.com/janhq/cortex.tensorrt-llm Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It includes NVIDIA's TensorRT-LLM as a submodule for GPU-accelerated inference on NVIDIA GPUs.

nvidia jan tensorrt
+ 2 more

Updated Nov 01, 2024

mirrored_repos / MachineLearning / FoundationVision / Groma

🔧🔗https://github.com/FoundationVision/Groma

[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization

🕸️🔗groma-mllm.github.io/

Llama multimodal grounding
+ 6 more

Updated Oct 19, 2024

mirrored_repos / MachineLearning / IST-DASLabs / Marlin

🔧🔗https://github.com/IST-DASLab/marlin FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

kernel quantization Large Langua...
+ 1 more

Updated Oct 12, 2024

mirrored_repos / MachineLearning / thukeg / APAR

https://github.com/THUDM/APAR APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding

Large Langua... parallel auto-regressive

Updated Sep 30, 2024

