Branches
This project is mirrored from https://github.com/ggerganov/llama.cpp. Pull mirroring updated Sep 19, 2024.
gg/flash-attn-online · a9681feb · ggml : online attention (CPU) · Jan 20, 2024
ik/faster_imatrix · 3aa56562 · imatrix: add --no-ppl option to skip PPL calculations altogether · Jan 20, 2024
ik/q3_k_xs · 29c41d49 · Q3_K_XS: quanize first 1/8 of ffn_down layers with Q4_K · Jan 21, 2024
ik/kl-divergence · c0e9d27b · Add ability to compute KL-divergence · Jan 22, 2024
ik/keep_imatrix · c53a10ca · Be able to keep intermediate imatrix results · Jan 22, 2024
gg/flash-attn-wip (diverged from upstream) · 0f018b7e · wip · Jan 22, 2024
sl/qwen-fix · f0bb1052 · llama : fix not enough space in buffer with Qwen · Jan 22, 2024
ik/kl-divergence-2 · 0b59931c · perplexity: a better organized KL-divergence statistics output · Jan 23, 2024
gg/minor · ea88e2a4 · minor : clean-up some warnings and style · Jan 23, 2024
pydantic-fixups · bdf770b3 · examples : make pydantic scripts pass mypy and support py3.8 · Jan 23, 2024
gg/flash-attn-wip2 · 06c2d0d1 · wip · Jan 23, 2024
sl/graph-inputs · eaa7722a · llama : pre-allocate input tensors in a separate buffer · Jan 23, 2024
gg/flash-attn-wip4 · da23b56f · wip : no ic 8 step · Jan 24, 2024
ik/bucket_sort · 26dde91a · Bucket sort: another minor improvement · Jan 24, 2024
gg/flash-attn-wip3 · 6ccbd177 · wip · Jan 24, 2024
ik/fix_q3k_xs · baa70cd7 · Fix Q3_K_XS for MoE models · Jan 24, 2024
ceb/nomic-vulkan · 637366b5 · fix q4_0/q4_1 mmv, 1009 -> 993 failures (16 less) · Jan 24, 2024
gg/flash-attn-simd · 2bf91c53 · metal : clean up · Jan 25, 2024
sl/cuda-alloc-size-fix · fbe20453 · cuda : fix tensor size calculation for non-split buffer · Jan 26, 2024
sl/alloc-margin · 0c979ca3 · ggml-alloc : add 10% margin to the buffer sizes · Jan 26, 2024