Branches · mirrored_repos / MachineLearning / Llama.Cpp · GitLab

This project is mirrored from https://github.com/ggerganov/llama.cpp. Pull mirroring updated Sep 19, 2024.

compilade/smaller-output-buffer

5f33a675 · perplexity : make hellaswag and multiple-choice outputs identical to master · Mar 20, 2024
sl/blas-backend

fa012a95 · move BLAS to a separate backend · Mar 20, 2024
jg/flash-attn-4

82ae7f33 · fused attention kernel for batch size 1 · Mar 20, 2024
compilade/fix-server-tests-penalty

9a424a38 · server : fix tests expecting old repeat penalty · Mar 19, 2024
jg/flash-attn

7fca4586 · pragma unroll, use_mask template parameter · Mar 19, 2024
gg/repeng

0a9bc301 · control-vectors : minor code style updates · Mar 14, 2024
gg/metal-embed

abf0afd0 · ci : fix iOS builds to use embedded library · Mar 14, 2024
ik/try_fix_iq1s_sycl

9f805264 · Attempt 2 · Mar 12, 2024
ik/even_better_iq1s

5440a127 · iq1_s: fix dequantize on the CPU · Mar 11, 2024
gg/try-fix-sycl-iq1_s

76be02ae · sycl : fix grid type · Mar 11, 2024
sycl_q3s_q1s

989e15b3 · Merge branch 'master' into sycl_q3s_q1s · Mar 11, 2024
gritlm-pr

b54afce9 · mostly style fixes; fix KQ_mask comment · Mar 09, 2024
gg/bert-f16

0ba20ed9 · llama : compute BERT graph with F16 K, V · Mar 07, 2024
revert-5901-fix_set_gpu

b5b02703 · Revert "[SYCL] fix error when set main gpu to non-zero (#5901)" · Mar 07, 2024
ik/iq3_s_multiplier

31cecc87 · iq3_s_mult_shuffle: use lookup table on Metal · Mar 05, 2024
gg/fix-embeddings-wip

4ec0e9ab · wip · Mar 04, 2024
sl/fix-cuda-soft-max-race

6564fbab · cuda : fix data race in soft max · Mar 03, 2024
tests/server/passkey

0c7f5b26 · server: tests: passkey add a negative test · Mar 02, 2024
feature/server/init-http-threads-with-n-slots

65e013b6 · server: init server http requests threads pool with max of hardware_concurrency -1 or n_slots + 2 · Mar 02, 2024
gg/fix-iq3_s-avx

55ac610c · ggml: fix IQ3_S AVX implementation · Mar 02, 2024

