Projects with this topic
- https://github.com/vllm-project/vllm
  A high-throughput and memory-efficient inference and serving engine for LLMs
- https://github.com/flashinfer-ai/flashinfer
  FlashInfer: Kernel Library for LLM Serving
- https://github.com/google/gemma.cpp
  Lightweight, standalone C++ inference engine for Google's Gemma models.
- https://github.com/InternLM/lmdeploy
  LMDeploy is a toolkit for compressing, deploying, and serving LLMs. Documentation: lmdeploy.readthedocs.io/en/latest/
- https://github.com/modelscope/dash-infer
  DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including x86 and ARMv9.
- https://github.com/bytedance/ShadowKV
  ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
- https://github.com/coqui-ai/inference-engine
  Coqui Inference Engine