Projects with this topic
- https://github.com/vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs
- https://github.com/sgl-project/sglang: SGLang is a fast serving framework for large language models and vision language models.
- https://github.com/vllm-project/vllm-ascend: Community-maintained hardware plugin for vLLM on Ascend
- https://github.com/roboflow/inference: A fast, easy-to-use, production-ready inference server for computer vision supporting deployment of many popular model architectures and fine-tuned models.
- https://github.com/SYSTRAN/faster-whisper: Faster Whisper transcription with CTranslate2
- https://github.com/nbursa/inception-core-server: A modular, extensible Rust-based server providing short-term, long-term, and latent memory services, a chat endpoint with a BaseAgent + Sentience DSL, and integration with ChromaDB and LLM services.
- https://github.com/flashinfer-ai/debug-print: Debug print operator for CUDA graph debugging