This project is mirrored from https://github.com/ggerganov/llama.cpp. Pull mirroring updated Sep 19, 2024.
doc/server-refresh-documentation · 18239fa7 · server: docs - refresh and tease a little bit more the http server · Feb 25, 2024
feature/server-logs-improvment · a69c446f · server: logs PR feedback: change text log format to: LEVEL [function_name]... · Feb 25, 2024
hotfix/server-issue-5655-concurrent-embedding-final · 04f4cbbd · server: tests: adding OAI compatible embedding with multiple inputs · Feb 24, 2024
gg/normalize-enums · 0db4cb0f · code : cont · Feb 24, 2024
gg/refactor-k-shift · c5ae0946 · llama : cont k-shift refactoring + normalize type names · Feb 24, 2024
ceb/gemma-convert-ftype · fc252ea6 · convert : fix missing ftype for gemma · Feb 23, 2024
gg/float-pos · 608f4498 · swift : fix build · Feb 23, 2024
ik/iq3_xs_new2 · e1b8efb9 · Will this fix ROCm? · Feb 23, 2024
ceb/mpt-tied-output · 549fe807 · mpt : do not duplicate token_embd.weight on disk · Feb 22, 2024
gg/simplify-fp16 · 19377a3f · ggml : more FP16 -> FP32 conversion fixes · Feb 22, 2024
gg/py-minor-fixes · 56c04715 · py : minor fixes · Feb 22, 2024
gg/improve-gemma-quants · 488bd973 · llama : quantize token_embd.weight using output type · Feb 22, 2024
gg/add-gemma-conversion · 7ad7da6a · Update convert-hf-to-gguf.py · Feb 22, 2024
sl/fix-quant-kv-shift · 5271c756 · llama : fix K-shift with quantized K (wip) · Feb 22, 2024
sl/gemma-offload-output · 22ca4ddb · gemma : allow offloading the output tensor · Feb 21, 2024
ceb/fix-n-keep · f921fc3e · examples : do not assume BOS when shifting context · Feb 20, 2024
fix-convert-modelname · 941de117 · convert : get general.name from model dir, not its parent · Feb 20, 2024
ik/iq4_nl_no_superblock · daacf6ca · It was the ggml_vdotq thing missed inside the brackets · Feb 20, 2024
sl/fix-cuda-peer-access · 62d3263f · fix hip · Feb 19, 2024
gg/flash-attn-sync · f249c997 · llama : adapt to F16 KQ_pos · Feb 19, 2024