Tags

Tags give the ability to mark specific points in history as being important

This project is mirrored from https://github.com/lucidrains/self-rewarding-lm-pytorch. Pull mirroring updated Sep 19, 2024.

0.0.16 · 8f3d7529 · expose spin lambda hyperparameter · Jan 28, 2024
0.0.15 · 53a975ae · better naming · Jan 28, 2024
0.0.14 · b88cd64d · bump · Jan 28, 2024
0.0.12 · 17fb85a9 · allow for dpo and its trainer to be exported and used independently · Jan 28, 2024
0.0.11 · e41ac42b · just create a new DPO trainer per iteration, so that scheduler and optimizer is reset · Jan 28, 2024
0.0.10 · 9a3721c2 · oops · Jan 27, 2024
0.0.9 · 7a7b54fb · make it simpler · Jan 27, 2024
0.0.7 · bd2efb59 · demonstrate ability to define own reward prompt. project management · Jan 27, 2024
0.0.6 · 45a74ef9 · use latest einx get_at for clarity · Jan 27, 2024
0.0.5 · 2e899a0d · allow for learning rate annealing for spin as well · Jan 27, 2024
0.0.4 · de8d325b · spin trainer works by itself · Jan 27, 2024
0.0.2 · 5d6e8434 · make the sampling performant, SPIN should be finished · Jan 26, 2024
0.0.1 · 67922a13 · cast before decoding candidate response tensors back to string · Jan 25, 2024
