This project is mirrored from https://github.com/ggerganov/llama.cpp. Pull mirroring updated Sep 20, 2024.
server-rev · c0f4d548 · server : add comment about changing slot_state to bool · Oct 22, 2023
upd-issue-templates · b9bb4cbe · Separate bug and enhancement template + no default title · Oct 23, 2023
cuda-batched-gemm-deq · 69664749 · cuda : play with faster Q4_0 dequantization · Oct 24, 2023
cuda-batched-gemm · d798a17c · cuda : add TODO for calling cublas from kernel + using mem pool · Oct 24, 2023
fix-server-system · 4b32c65d · server : minor · Oct 24, 2023
cuda-quantum-batch · 49af767f · build : add compile option to force use of MMQ kernels · Oct 27, 2023
cuda-multi-gpu · cd3e20fb · cuda : fix multi-gpu with tensor cores · Oct 27, 2023
starcoder-cuda · fb2f5f5b · starcoder : offload layers to GPU · Oct 28, 2023
sampling-greedy-with-probs · bbfc62ac · sampling : temp == 0.0 -> no probs, temp < 0.0 -> probs · Oct 28, 2023
apply-3585 · de7e0912 · convert : ignore tokens if their IDs are within [0, vocab_size) · Oct 28, 2023
fix-kv-shift · fb645834 · llama : fix kv shift bug · Oct 28, 2023
ggml-quants · 8a86b95e · quantize : --pure option for disabling k-quant mixtures · Oct 28, 2023
scratch · 15267192 · llama : refactor tensor offloading as callback · Oct 29, 2023
llama-refactor-ffn · 3b778a4a · llama : add llm_build_ffn helper function · Oct 29, 2023
lto · bc28aaa8 · make : use -lfto=auto to avoid warnings and maintain perf · Oct 30, 2023
ggml-impl · 4b3cb98d · ggml-impl : move extern "C" to start of file · Oct 30, 2023
llama-refactor-norm · 7923b70c · llama : add llm_build_inp_embd helper · Oct 31, 2023
deploy · dab42893 · scripts : working curl pipe · Oct 31, 2023
test-mmv · 29fe5169 · wip · Oct 31, 2023
try-fix-3869 · 22cc9bef · cuda : check if this fixes Pascal card regression · Oct 31, 2023