This project is mirrored from https://github.com/ggerganov/llama.cpp. Pull mirroring updated Sep 20, 2024.
Branch · Commit · Last commit message · Updated

llava-fix-offloading · 932589c0 · Honor -ngl option for Cuda offloading in llava · Oct 14, 2023
ttfs-alloc-fix · 32fe1a58 · train-text-from-scratch : fix assert failure in ggml-alloc · Oct 13, 2023
ggml-enum-finetune-fix · a85229c4 · ggml : add context enumeration functions · Oct 12, 2023
rev-sampling · 5261aee8 · sampling : one sequence per sampling context · Oct 12, 2023
llava · 0bd7e69d · do not use Wno-cast-qual for MSVC · Oct 12, 2023
fix-server-kv-cache-manage · 058e83ca · server : fix kv cache management · Oct 12, 2023
batched-bench · 2fcdf869 · batched-bench : add mmq CLI arg · Oct 11, 2023
alloc-assert-fix · ee745692 · ggml-alloc : fix assert in debug builds · Oct 09, 2023
fix-metal-mul-mm · fdd5ad9a · metal : do not use mul_mm kernels when ne00 < 64 · Oct 08, 2023
fix-kv-cache-access · ee268b54 · llama : no longer perform uninitialized access to the KV cache · Oct 08, 2023
fix-refact · acead654 · Merge branch 'master' into fix-refact · Oct 08, 2023
metal-improve-batching · 6b9554a7 · metal : print more GPU info + disable mul_mm for MTLGPUFamiliy < Apple7 · Oct 08, 2023
gguf-fix-publish · ba44776d · bump version · Oct 07, 2023
per-layer-kv · f4f9367f · less code duplication, offload k and v separately · Oct 06, 2023
server-parallel · 5ab6c213 · server-parallel : add "--reverse-prompt" + compiler warning fixes · Oct 06, 2023
fix-sessions · 5418932b · llama : fix comments for llama_kv_cache API · Oct 03, 2023
cublas-q-f16 · 39ddda27 · disable fp16 mat mul completely with multi GPU · Sep 30, 2023
cuda-cmath · 64beaf76 · ggml-cuda : explicitly use cmath for log2 · Sep 29, 2023
cparams-doc · 1e3781cd · add notice to hot topics · Sep 29, 2023
train-fix-kq-pos · 1eb4de0f · make sure KQ_pos is not reallocated in finetune · Sep 29, 2023