Tags
Tags give the ability to mark specific points in history as being important.
This project is mirrored from https://github.com/lucidrains/self-rewarding-lm-pytorch (pull mirroring last updated Sep 19, 2024).
0.0.16 · 8f3d7529 · expose spin lambda hyperparameter · Jan 28, 2024
0.0.15 · 53a975ae · better naming · Jan 28, 2024
0.0.14 · b88cd64d · bump · Jan 28, 2024
0.0.12 · 17fb85a9 · allow for dpo and its trainer to be exported and used independently · Jan 28, 2024
0.0.11 · e41ac42b · just create a new DPO trainer per iteration, so that scheduler and optimizer is reset · Jan 28, 2024
0.0.10 · 9a3721c2 · oops · Jan 27, 2024
0.0.9 · 7a7b54fb · make it simpler · Jan 27, 2024
0.0.7 · bd2efb59 · demonstrate ability to define own reward prompt. project management · Jan 27, 2024
0.0.6 · 45a74ef9 · use latest einx get_at for clarity · Jan 27, 2024
0.0.5 · 2e899a0d · allow for learning rate annealing for spin as well · Jan 27, 2024
0.0.4 · de8d325b · spin trainer works by itself · Jan 27, 2024
0.0.2 · 5d6e8434 · make the sampling performant, SPIN should be finished · Jan 26, 2024
0.0.1 · 67922a13 · cast before decoding candidate response tensors back to string · Jan 25, 2024