This project is mirrored from https://github.com/ggerganov/llama.cpp. Pull mirroring updated Sep 19, 2024.
compilade/smaller-output-buffer · 5f33a675 · perplexity : make hellaswag and multiple-choice outputs identical to master · Mar 20, 2024
sl/blas-backend · fa012a95 · move BLAS to a separate backend · Mar 20, 2024
jg/flash-attn-4 · 82ae7f33 · fused attention kernel for batch size 1 · Mar 20, 2024
compilade/fix-server-tests-penalty · 9a424a38 · server : fix tests expecting old repeat penalty · Mar 19, 2024
jg/flash-attn · 7fca4586 · pragma unroll, use_mask template parameter · Mar 19, 2024
gg/repeng · 0a9bc301 · control-vectors : minor code style updates · Mar 14, 2024
gg/metal-embed · abf0afd0 · ci : fix iOS builds to use embedded library · Mar 14, 2024
ik/try_fix_iq1s_sycl · 9f805264 · Attempt 2 · Mar 12, 2024
ik/even_better_iq1s · 5440a127 · iq1_s: fix dequantize on the CPU · Mar 11, 2024
gg/try-fix-sycl-iq1_s · 76be02ae · sycl : fix grid type · Mar 11, 2024
sycl_q3s_q1s · 989e15b3 · Merge branch 'master' into sycl_q3s_q1s · Mar 11, 2024
gritlm-pr · b54afce9 · mostly style fixes; fix KQ_mask comment · Mar 09, 2024
gg/bert-f16 · 0ba20ed9 · llama : compute BERT graph with F16 K, V · Mar 07, 2024
revert-5901-fix_set_gpu · b5b02703 · Revert "[SYCL] fix error when set main gpu to non-zero (#5901)" · Mar 07, 2024
ik/iq3_s_multiplier · 31cecc87 · iq3_s_mult_shuffle: use lookup table on Metal · Mar 05, 2024
gg/fix-embeddings-wip · 4ec0e9ab · wip · Mar 04, 2024
sl/fix-cuda-soft-max-race · 6564fbab · cuda : fix data race in soft max · Mar 03, 2024
tests/server/passkey · 0c7f5b26 · server: tests: passkey add a negative test · Mar 02, 2024
feature/server/init-http-threads-with-n-slots · 65e013b6 · server: init server http requests threads pool with max of hardware_concurrency -1 or n_slots + 2 · Mar 02, 2024
gg/fix-iq3_s-avx · 55ac610c · ggml: fix IQ3_S AVX implementation · Mar 02, 2024