This project is mirrored from https://github.com/ggerganov/llama.cpp. Pull mirroring updated Sep 19, 2024.
ceb/kompute-llama-bench · 3536cf60 · llama : remove obsolete set of n_threads=1 · Jan 30, 2024
ik/fix_iq3xxs_metal · 719a0871 · iq3_xxs: forgotten update of the grid points · Jan 30, 2024
ik/iq3_xxs · fb6576bc · Add IQ3_XXS to test-backend-ops · Jan 30, 2024
ik/faster_iq2xs_avx2 · aa6698a8 · Speed up computing sign bits in AVX2 iq2_xs dot product · Jan 30, 2024
null-dereference-on-alloc-failure · 9581b8cf · Fixed the fix of the fix · Jan 29, 2024
gg/server-fix-shift · 5f62e231 · server : take system_tokens into account · Jan 29, 2024
gg/fix-py · 9ef9316f · py : fix except (#5189) · Jan 29, 2024
sl/max-buf-size · b45620d3 · add max buffer sizes to opencl and metal backends · Jan 29, 2024
gg/script-parse-wtype · fb75fc04 · scripts : parse wtype in server-llm.sh · Jan 28, 2024
ceb/nvcc-cpu-arch · a5d7765a · cmake : pass ARCH_FLAGS to -Xcompiler on MSVC · Jan 26, 2024
0cc4m/fix-opencl-bias-tensors · f6f540e1 · Put add kernel into different string to stay within MSVC string length limit,... · Jan 26, 2024
sl/alloc-margin · 0c979ca3 · ggml-alloc : add 10% margin to the buffer sizes · Jan 26, 2024
sl/cuda-alloc-size-fix · fbe20453 · cuda : fix tensor size calculation for non-split buffer · Jan 26, 2024
gg/flash-attn-simd · 2bf91c53 · metal : clean up · Jan 25, 2024
ceb/nomic-vulkan · 637366b5 · fix q4_0/q4_1 mmv, 1009 -> 993 failures (16 less) · Jan 24, 2024
ik/fix_q3k_xs · baa70cd7 · Fix Q3_K_XS for MoE models · Jan 24, 2024
gg/flash-attn-wip3 · 6ccbd177 · wip · Jan 24, 2024
ik/bucket_sort · 26dde91a · Bucket sort: another minor improvement · Jan 24, 2024
gg/flash-attn-wip4 · da23b56f · wip : no ic 8 step · Jan 24, 2024
sl/graph-inputs · eaa7722a · llama : pre-allocate input tensors in a separate buffer · Jan 23, 2024