This project is mirrored from https://github.com/ggerganov/llama.cpp. Pull mirroring updated Sep 19, 2024.
gg/hellaswag-batched · 9df62c25 · perplexity : remove HellaSwag restruction for n_batch · Jan 18, 2024
ik/winogrande · e3a17dcb · winogrande: add dataset instructions · Jan 18, 2024
ceb/nomic-vulkan-fixes · 681f6a1f · kompute : fix rope_f32 and scale ops · Jan 17, 2024
gg/imatrix-gpu-4931 · 2917e6b5 · Merge branch 'master' into gg/imatrix-gpu-4931 · Jan 17, 2024
gg/fix-autorelease-4952 · 06b49791 · test : simplify · Jan 17, 2024
gg/fix-spm-added-tokens-dict-4958 · 23742deb · py : fix padded dummy tokens (I hope) · Jan 17, 2024
ik/better_q2_k_s · 9fd1e83f · Use Q4_K for attn_v for Q2_K_S when n_gqa >= 4 · Jan 17, 2024
gg/iq2-refactor-and-tests · 49bafe09 · tests : avoid creating RNGs for each tensor · Jan 17, 2024
cd/test-ggml-ci-run · 29927a60 · ggml-ci · Jan 17, 2024
gg/hellaswag-clear-kv-cache · 27f5fc6d · perplexity : fix kv cache handling for hellaswag · Jan 16, 2024
ik/imatrix_legacy_quants · bb9abb5c · imatrix: guard Q4_0/Q5_0 against ffn_down craziness · Jan 16, 2024
crasm_segfault-on-pthread · e6e34b2a · add test to tests/CMakeLists.txt · Jan 15, 2024
gg/sched-eval-callback-4931 · 40cdb397 · backend : clean-up the implementation · Jan 15, 2024
ik/quantize_iq2_notcompatible · dccaec76 · The check for 256 divisibility was missing for IQ2_XS, IQ2_XXS · Jan 15, 2024
ik/cuda_faster_legacy_dequantize · 08b89f7e · CUDA: faster dequantize kernels for Q4_0 and Q4_1 · Jan 14, 2024
ik/imatrix_k_quants · 90096a5f · Add ability to use importance matrix for all k-quants · Jan 14, 2024
gg/llama-trace · 0abbe2fc · llama : check LLAMA_TRACE env for extra logging · Jan 14, 2024
ik/fix_qxm_moe · 121eb066 · Fix the fix · Jan 14, 2024
gg/metal-rm-api · 96cf0282 · metal : remove old API · Jan 13, 2024
sl/micro-batching · 40b3c5ef · pipeline parallelism demo · Jan 13, 2024