This project is mirrored from https://github.com/ggerganov/llama.cpp. Pull mirroring updated Sep 19, 2024.
gg/remove-deprecated-api · 5834217d · llama : remove deprecated API · Feb 28, 2024
gg/ci-less-chunks · 5fc98fb8 · ci : reduce 3b chunks to 1 to avoid timeout · Feb 28, 2024
feature/server-mul-mat-q · b4b0d533 · server: docs: --no-mul-mat-q,-nommq · Feb 28, 2024
feature/server-http-threads · ccad4253 · server: allow to override threads server pool with --threads-server · Feb 29, 2024
gg/fix-embeddings · 008f3fc7 · llama : fix embeddings · Feb 29, 2024
gg/remove-oai-proxy · 0231bbc5 · server : remove api_like_OAI.py proxy script · Mar 01, 2024
gg/fix-starcoder2 · 9862d59c · llama : change starcoder2 rope type · Mar 01, 2024
ceb/convert-vocab-fallback · f8ab5391 · convert : update help string · Mar 01, 2024
cd/fix-workflow-check-requirements · 3572aad5 · workflows : use cleanup option while running check-requirements.sh · Mar 01, 2024
ik/iq3_s_faster · d4dfc250 · Fix ARM_NEON · Mar 02, 2024
ceb/convert-hf-refactor · 0b673ca1 · s/_MODEL_CLASSES/_model_classes/ · Mar 02, 2024
gg/fix-iq3_s-avx · 55ac610c · ggml: fix IQ3_S AVX implementation · Mar 02, 2024
feature/server/init-http-threads-with-n-slots · 65e013b6 · server: init server http requests threads pool with max of hardware_concurrency -1 or n_slots + 2 · Mar 02, 2024
tests/server/passkey · 0c7f5b26 · server: tests: passkey add a negative test · Mar 02, 2024
sl/fix-cuda-soft-max-race · 6564fbab · cuda : fix data race in soft max · Mar 03, 2024
gg/fix-embeddings-wip · 4ec0e9ab · wip · Mar 04, 2024
ik/iq3_s_multiplier · 31cecc87 · iq3_s_mult_shuffle: use lookup table on Metal · Mar 05, 2024
revert-5901-fix_set_gpu · b5b02703 · Revert "[SYCL] fix error when set main gpu to non-zero (#5901)" · Mar 07, 2024
gg/bert-f16 · 0ba20ed9 · llama : compute BERT graph with F16 K, V · Mar 07, 2024
gritlm-pr · b54afce9 · mostly style fixes; fix KQ_mask comment · Mar 09, 2024