Tags
This project is mirrored from https://github.com/lucidrains/self-rewarding-lm-pytorch. Pull mirroring updated Sep 19, 2024.
0.0.37 · f5058f26 · Jan 31, 2024 · when carrying out iterative spin, need to update reference model with policy after each iteration
0.0.36 · 1191a62b · Jan 30, 2024 · 0.0.36
0.0.35 · 7ecb8a73 · Jan 30, 2024 · 0.0.35
0.0.34 · 5d3a3e6b · Jan 30, 2024 · 0.0.34
0.0.33 · 74769c74 · Jan 30, 2024 · handle the logic for no ema decay externally
0.0.32 · 2105450d · Jan 30, 2024 · bring in own ema library, default to frozen ref model
0.0.31 · af570628 · Jan 29, 2024 · handle tensor output from eval module for early stopping
0.0.30 · e0059e4e · Jan 29, 2024 · finish first pass at early stopping, assuming evaluation module returns a...
0.0.29 · 34d58722 · Jan 29, 2024 · move early stopping break signal logic into the module
0.0.28 · a4468df2 · Jan 29, 2024 · some prep work to allow for external reward module
0.0.27 · e8788c36 · Jan 29, 2024 · bump
0.0.26 · d8d7d1f7 · Jan 29, 2024 · also allow dpo trainer to be used independently
0.0.25 · d8d7d1f7 · Jan 29, 2024 · also allow dpo trainer to be used independently
0.0.24 · 01c1bc1e · Jan 29, 2024 · protect against less than 2 valid candidate responses
0.0.23 · 2d6b4d93 · Jan 29, 2024 · they used different sets of prompts for each self-reward iteration
0.0.22 · fd060413 · Jan 28, 2024 · move all the logic for picking preference pairs to torch, to ready for...
0.0.20 · 22e0c217 · Jan 28, 2024 · validation loop for spin
0.0.19 · f9a2a50f · Jan 28, 2024 · more cleanup
0.0.18 · fed19f40 · Jan 28, 2024 · update einx
0.0.17 · 9b5f218d · Jan 28, 2024 · allow for iterative spin