Branches
This project is mirrored from https://github.com/ggerganov/llama.cpp. Pull mirroring updated Sep 19, 2024.
gg/flash-attn-online · a9681feb · ggml : online attention (CPU) · Jan 20, 2024
ik/faster_imatrix · 3aa56562 · imatrix: add --no-ppl option to skip PPL calculations altogether · Jan 20, 2024
ik/q3_k_xs · 29c41d49 · Q3_K_XS: quanize first 1/8 of ffn_down layers with Q4_K · Jan 21, 2024
ik/kl-divergence · c0e9d27b · Add ability to compute KL-divergence · Jan 22, 2024
ik/keep_imatrix · c53a10ca · Be able to keep intermediate imatrix results · Jan 22, 2024
gg/flash-attn-wip (diverged from upstream) · 0f018b7e · wip · Jan 22, 2024
sl/qwen-fix · f0bb1052 · llama : fix not enough space in buffer with Qwen · Jan 22, 2024
ik/kl-divergence-2 · 0b59931c · perplexity: a better organized KL-divergence statistics output · Jan 23, 2024
gg/minor · ea88e2a4 · minor : clean-up some warnings and style · Jan 23, 2024
pydantic-fixups · bdf770b3 · examples : make pydantic scripts pass mypy and support py3.8 · Jan 23, 2024
gg/flash-attn-wip2 · 06c2d0d1 · wip · Jan 23, 2024
sl/graph-inputs · eaa7722a · llama : pre-allocate input tensors in a separate buffer · Jan 23, 2024
gg/flash-attn-wip4 · da23b56f · wip : no ic 8 step · Jan 24, 2024
ik/bucket_sort · 26dde91a · Bucket sort: another minor improvement · Jan 24, 2024
gg/flash-attn-wip3 · 6ccbd177 · wip · Jan 24, 2024
ik/fix_q3k_xs · baa70cd7 · Fix Q3_K_XS for MoE models · Jan 24, 2024
ceb/nomic-vulkan · 637366b5 · fix q4_0/q4_1 mmv, 1009 -> 993 failures (16 less) · Jan 24, 2024
gg/flash-attn-simd · 2bf91c53 · metal : clean up · Jan 25, 2024
sl/cuda-alloc-size-fix · fbe20453 · cuda : fix tensor size calculation for non-split buffer · Jan 26, 2024
sl/alloc-margin · 0c979ca3 · ggml-alloc : add 10% margin to the buffer sizes · Jan 26, 2024