Tags
This project is mirrored from https://github.com/lucidrains/self-rewarding-lm-pytorch. Pull mirroring updated Sep 19, 2024.
0.0.37 · f5058f26 · Jan 31, 2024 · when carrying out iterative spin, need to update reference model with policy after each iteration
0.0.36 · 1191a62b · Jan 30, 2024 · 0.0.36
0.0.35 · 7ecb8a73 · Jan 30, 2024 · 0.0.35
0.0.34 · 5d3a3e6b · Jan 30, 2024 · 0.0.34
0.0.33 · 74769c74 · Jan 30, 2024 · handle the logic for no ema decay externally
0.0.32 · 2105450d · Jan 30, 2024 · bring in own ema library, default to frozen ref model
0.0.31 · af570628 · Jan 29, 2024 · handle tensor output from eval module for early stopping
0.0.30 · e0059e4e · Jan 29, 2024 · finish first pass at early stopping, assuming evaluation module returns a...
0.0.29 · 34d58722 · Jan 29, 2024 · move early stopping break signal logic into the module
0.0.28 · a4468df2 · Jan 29, 2024 · some prep work to allow for external reward module
0.0.27 · e8788c36 · Jan 29, 2024 · bump
0.0.26 · d8d7d1f7 · Jan 29, 2024 · also allow dpo trainer to be used independently
0.0.25 · d8d7d1f7 · Jan 29, 2024 · also allow dpo trainer to be used independently
0.0.24 · 01c1bc1e · Jan 29, 2024 · protect against less than 2 valid candidate responses
0.0.23 · 2d6b4d93 · Jan 29, 2024 · they used different sets of prompts for each self-reward iteration
0.0.22 · fd060413 · Jan 28, 2024 · move all the logic for picking preference pairs to torch, to ready for...
0.0.20 · 22e0c217 · Jan 28, 2024 · validation loop for spin
0.0.19 · f9a2a50f · Jan 28, 2024 · more cleanup
0.0.18 · fed19f40 · Jan 28, 2024 · update einx
0.0.17 · 9b5f218d · Jan 28, 2024 · allow for iterative spin