This project is mirrored from https://github.com/ggerganov/llama.cpp. Pull mirroring updated Sep 20, 2024.
llama-bpw · a08e1a92 · llama.cpp : show model size and BPW on load · Sep 17, 2023
custom-attention-mask-no-roped-cache · 784d14ed · llama : store non-RoPEd K cache (WIP) · Sep 17, 2023
cam-cuda · 93352769 · Merge branch 'custom-attention-mask' into cam-cuda · Sep 19, 2023
cam-cuda-2 · d30ab79b · fix rope shift · Sep 20, 2023
cont-reshape · 0fd462fd · ggml : revert change to ggml_cpy, add ggml_cont_Nd instead · Sep 20, 2023
llama-bench-readme · 3f9a4830 · llama-bench : add README · Sep 23, 2023
cam-simple-fix · 72e7ef4e · simple : fixes · Sep 26, 2023
cublas-f16 · 7d5674dd · restrict fp16 mat mul to volta and up · Sep 28, 2023
custom-attention-mask · c5650ed4 · server : avoid context swaps by shifting the KV cache · Sep 28, 2023
ci-disable-freebsd · 666ca5ae · ci : disable freeBSD builds due to lack of VMs · Sep 28, 2023
llama-model-params · c8a9658e · Merge remote-tracking branch 'origin/master' into llama-model-params · Sep 28, 2023
train-fix-kq-pos · 1eb4de0f · make sure KQ_pos is not reallocated in finetune · Sep 29, 2023
cparams-doc · 1e3781cd · add notice to hot topics · Sep 29, 2023
cuda-cmath · 64beaf76 · ggml-cuda : explicitly use cmath for log2 · Sep 29, 2023
cublas-q-f16 · 39ddda27 · disable fp16 mat mul completely with multi GPU · Sep 30, 2023
fix-sessions · 5418932b · llama : fix comments for llama_kv_cache API · Oct 03, 2023
server-parallel · 5ab6c213 · server-parallel : add "--reverse-prompt" + compiler warning fixes · Oct 06, 2023
per-layer-kv · f4f9367f · less code duplication, offload k and v separately · Oct 06, 2023
gguf-fix-publish · ba44776d · bump version · Oct 07, 2023
metal-improve-batching · 6b9554a7 · metal : print more GPU info + disable mul_mm for MTLGPUFamiliy < Apple7 · Oct 08, 2023