This project is mirrored from https://github.com/ggerganov/llama.cpp. Pull mirroring updated Sep 20, 2024.
0cc4m/fix-opencl-bias-tensors · f6f540e1 · Put add kernel into different string to stay within MSVC string length limit,... · Jan 26, 2024
ceb/nvcc-cpu-arch · a5d7765a · cmake : pass ARCH_FLAGS to -Xcompiler on MSVC · Jan 26, 2024
gg/script-parse-wtype · fb75fc04 · scripts : parse wtype in server-llm.sh · Jan 28, 2024
sl/max-buf-size · b45620d3 · add max buffer sizes to opencl and metal backends · Jan 29, 2024
gg/fix-py · 9ef9316f · py : fix except (#5189) · Jan 29, 2024
gg/server-fix-shift · 5f62e231 · server : take system_tokens into account · Jan 29, 2024
null-dereference-on-alloc-failure · 9581b8cf · Fixed the fix of the fix · Jan 29, 2024
ik/faster_iq2xs_avx2 · aa6698a8 · Speed up computing sign bits in AVX2 iq2_xs dot product · Jan 30, 2024
ik/iq3_xxs · fb6576bc · Add IQ3_XXS to test-backend-ops · Jan 30, 2024
ik/fix_iq3xxs_metal · 719a0871 · iq3_xxs: forgotten update of the grid points · Jan 30, 2024
ceb/kompute-llama-bench · 3536cf60 · llama : remove obsolete set of n_threads=1 · Jan 30, 2024
0cc4m/vulkan-fixes · 7c8cf299 · Fix small matrix multiplication errors in AMD GPUs on Windows or with amdvlk · Jan 31, 2024
sl/tasks-threads · 7a04afde · ggml : limit n_threads to the max n_tasks · Jan 31, 2024
gg/remove-max-devices · 1139b66f · readme : change deprecation notice to "remove" and fix url · Jan 31, 2024
gg/flash-attn-mask-f16 · 1ad42b1f · ggml : ggml_soft_max uses F16 mask · Jan 31, 2024
flash-attn-cuda · ac26f270 · cuda : increase C to 128 for better performance · Feb 01, 2024
sl/cuda-f16-fix · 3d2eb9bb · cuda : fix LLAMA_CUDA_F16 · Feb 01, 2024
gg/flash-attn-cuda · b957b8f5 · cuda : add flash_attn kernel (wip) · Feb 01, 2024
ceb/rope-scaling-type · 95d91fd3 · YaRN : store rope scaling type as int32_t in memory · Feb 02, 2024
ik/imatrix_tools · 4e0d6dd9 · imatrix: be able to start from a specific chunk · Feb 03, 2024