This project is mirrored from https://github.com/ggerganov/llama.cpp. Pull mirroring updated Sep 19, 2024.
ceb/kompute-llama-bench · 3536cf60 · llama : remove obsolete set of n_threads=1 · Jan 30, 2024
ik/fix_iq3xxs_metal · 719a0871 · iq3_xxs: forgotten update of the grid points · Jan 30, 2024
ik/iq3_xxs · fb6576bc · Add IQ3_XXS to test-backend-ops · Jan 30, 2024
ik/faster_iq2xs_avx2 · aa6698a8 · Speed up computing sign bits in AVX2 iq2_xs dot product · Jan 30, 2024
null-dereference-on-alloc-failure · 9581b8cf · Fixed the fix of the fix · Jan 29, 2024
gg/server-fix-shift · 5f62e231 · server : take system_tokens into account · Jan 29, 2024
gg/fix-py · 9ef9316f · py : fix except (#5189) · Jan 29, 2024
sl/max-buf-size · b45620d3 · add max buffer sizes to opencl and metal backends · Jan 29, 2024
gg/script-parse-wtype · fb75fc04 · scripts : parse wtype in server-llm.sh · Jan 28, 2024
ceb/nvcc-cpu-arch · a5d7765a · cmake : pass ARCH_FLAGS to -Xcompiler on MSVC · Jan 26, 2024
0cc4m/fix-opencl-bias-tensors · f6f540e1 · Put add kernel into different string to stay within MSVC string length limit,... · Jan 26, 2024
sl/alloc-margin · 0c979ca3 · ggml-alloc : add 10% margin to the buffer sizes · Jan 26, 2024
sl/cuda-alloc-size-fix · fbe20453 · cuda : fix tensor size calculation for non-split buffer · Jan 26, 2024
gg/flash-attn-simd · 2bf91c53 · metal : clean up · Jan 25, 2024
ceb/nomic-vulkan · 637366b5 · fix q4_0/q4_1 mmv, 1009 -> 993 failures (16 less) · Jan 24, 2024
ik/fix_q3k_xs · baa70cd7 · Fix Q3_K_XS for MoE models · Jan 24, 2024
gg/flash-attn-wip3 · 6ccbd177 · wip · Jan 24, 2024
ik/bucket_sort · 26dde91a · Bucket sort: another minor improvement · Jan 24, 2024
gg/flash-attn-wip4 · da23b56f · wip : no ic 8 step · Jan 24, 2024
sl/graph-inputs · eaa7722a · llama : pre-allocate input tensors in a separate buffer · Jan 23, 2024