This project is mirrored from https://github.com/ggerganov/llama.cpp. Pull mirroring updated Sep 19, 2024.
sycl_q3s_q1s · 989e15b3 · Merge branch 'master' into sycl_q3s_q1s · Mar 11, 2024
gg/try-fix-sycl-iq1_s · 76be02ae · sycl : fix grid type · Mar 11, 2024
ik/even_better_iq1s · 5440a127 · iq1_s: fix dequantize on the CPU · Mar 11, 2024
ik/try_fix_iq1s_sycl · 9f805264 · Attempt 2 · Mar 12, 2024
gg/metal-embed · abf0afd0 · ci : fix iOS builds to use embedded library · Mar 14, 2024
gg/repeng · 0a9bc301 · control-vectors : minor code style updates · Mar 14, 2024
jg/flash-attn · 7fca4586 · pragma unroll, use_mask template parameter · Mar 19, 2024
compilade/fix-server-tests-penalty · 9a424a38 · server : fix tests expecting old repeat penalty · Mar 19, 2024
jg/flash-attn-4 · 82ae7f33 · fused attention kernel for batch size 1 · Mar 20, 2024
sl/blas-backend · fa012a95 · move BLAS to a separate backend · Mar 20, 2024
compilade/smaller-output-buffer · 5f33a675 · perplexity : make hellaswag and multiple-choice outputs identical to master · Mar 20, 2024
ik/fix_k_cache_backend_tests · 68e4fed4 · Now fix test-quantize-fns · Mar 21, 2024
sl/cuda-f16-fix2 · 4f7e57a2 · cuda : fix LLAMA_CUDA_F16 build · Mar 21, 2024
0cc4m/vulkan-improvements · 1fceeb90 · Fix Intel dequant issue · Mar 21, 2024
ik/try_fix_rocm_k_cache · a710d58d · Try fix quantized k-cache on ROCm · Mar 21, 2024
gg/metal-dequant-align · 072c56fc · metal : fix the fix · Mar 22, 2024
gg/enable-cb-default · 31f2d03f · server : enable continuous batching by default · Mar 22, 2024
patch-1 · 12aa74ba · minor : spacing · Mar 22, 2024
gg/hf-args · 8c3d5b5a · common : remove defaults · Mar 22, 2024
ik/quantize_not_repeating · 0e826d12 · quantize: be able to specify the token embedding tensor type · Mar 22, 2024