Tags
Tags mark specific points in a repository's history as important.

This project is mirrored from https://github.com/lucidrains/self-rewarding-lm-pytorch. Pull mirroring updated Sep 19, 2024.
0.2.12 · ebeca908 · 0.2.12 · Apr 10, 2024
0.2.11 · 2db4fed1 · 0.2.11 · Mar 29, 2024
0.2.10 · 41cb1772 · 0.2.10 · Mar 29, 2024
0.2.9 · f8b26fbf · patch · Mar 26, 2024
0.2.8 · 0649dfb9 · fix type error · Feb 18, 2024
0.2.7 · 81fc3df9 · patch · Feb 08, 2024
0.2.6 · 04ec0667 · allow for an external LLM to play as reward model, as in DAP · Feb 08, 2024
0.2.5 · ec8b9112 · address https://github.com/lucidrains/self-rewarding-lm-pytorch/issues/15 · Feb 03, 2024
0.2.4 · bf546cdd · fix misnamed hyperparameter, and add validation function for parsed reward, project management · Feb 01, 2024
0.2.3 · 51b991c4 · make sure nucleus sampling and its threshold is customizable · Feb 01, 2024
0.2.2 · bded2ccf · sft trainer auto concats multiple datasets · Feb 01, 2024
0.2.1 · e9a582c5 · save an import for researcher · Feb 01, 2024
0.2.0 · 7c4ba1a8 · generalize the system · Feb 01, 2024
0.1.1 · 0dcd7f2e · allow the creation of the self-reward dpo dataset to be called within dpo... · Feb 01, 2024
0.1.0 · f33215b7 · gradient accumulation · Jan 31, 2024
0.0.42 · 1acf9f32 · del memmap · Jan 31, 2024
0.0.41 · 602049de · use separate folders for each iteration of dpo w/ early stopping · Jan 31, 2024
0.0.40 · 113c0ce2 · one more step towards arbitrary ordering of fine-tuning · Jan 31, 2024
0.0.39 · 9673d9b9 · update reference model with policy within the spin and dpo trainers... · Jan 31, 2024
0.0.38 · a83249f8 · able to customize SPIN if SPINTrainer is instantiated with model directly · Jan 31, 2024