This project is mirrored from https://github.com/ggerganov/llama.cpp. Pull mirroring updated Sep 19, 2024.
gg/py-minor-fixes · 56c04715 · py : minor fixes · Feb 22, 2024
gg/simplify-fp16 · 19377a3f · ggml : more FP16 -> FP32 conversion fixes · Feb 22, 2024
ceb/mpt-tied-output · 549fe807 · mpt : do not duplicate token_embd.weight on disk · Feb 22, 2024
ik/iq3_xs_new2 · e1b8efb9 · Will this fix ROCm? · Feb 23, 2024
gg/float-pos · 608f4498 · swift : fix build · Feb 23, 2024
ceb/gemma-convert-ftype · fc252ea6 · convert : fix missing ftype for gemma · Feb 23, 2024
gg/refactor-k-shift · c5ae0946 · llama : cont k-shift refactoring + normalize type names · Feb 24, 2024
gg/normalize-enums · 0db4cb0f · code : cont · Feb 24, 2024
hotfix/server-issue-5655-concurrent-embedding-final · 04f4cbbd · server: tests: adding OAI compatible embedding with multiple inputs · Feb 24, 2024
feature/server-logs-improvment · a69c446f · server: logs PR feedback: change text log format to: LEVEL [function_name]... · Feb 25, 2024
doc/server-refresh-documentation · 18239fa7 · server: docs - refresh and tease a little bit more the http server · Feb 25, 2024
hotfix/server-test-increase-timeout-in-idle · 0037c628 · server: tests - longer inference timeout for CI · Feb 25, 2024
gg/unicode-opt · f47d82c1 · unicode : reuse iterator · Feb 26, 2024
ik/iq2_s_new2 · ec0abd2d · Update examples/quantize/quantize.cpp · Feb 26, 2024
gg/defrag · a5446c2f · llama : fix defrag bugs + enable by default · Feb 26, 2024
ik/iq4_nl_xs · d7bb4b6d · iq4_xs: Added forgotten check for 256 divisibility · Feb 27, 2024
gg/kv-compress · 14d75706 · llama : add llama_kv_cache_compress (EXPERIMENTAL) · Feb 27, 2024
ik/fix-android · 6bc2e4b8 · Attempt to fix android build · Feb 27, 2024
ik/i-quants-64 · f0cbb6dd · iq1_s: turn off SIMD implementation for QK_K = 64 (it does not work) · Feb 28, 2024
gg/remove-awq · 3921ff5c · awq-py : remove · Feb 28, 2024