Tags
Tags mark specific points in a repository's history as important.

This project is mirrored from https://github.com/lucidrains/self-rewarding-lm-pytorch. Pull mirroring updated Sep 19, 2024.
0.2.12 · ebeca908 · 0.2.12 · Apr 10, 2024
0.2.11 · 2db4fed1 · 0.2.11 · Mar 29, 2024
0.2.10 · 41cb1772 · 0.2.10 · Mar 29, 2024
0.2.9 · f8b26fbf · patch · Mar 26, 2024
0.2.8 · 0649dfb9 · fix type error · Feb 18, 2024
0.2.7 · 81fc3df9 · patch · Feb 08, 2024
0.2.6 · 04ec0667 · allow for an external LLM to play as reward model, as in DAP · Feb 08, 2024
0.2.5 · ec8b9112 · address https://github.com/lucidrains/self-rewarding-lm-pytorch/issues/15 · Feb 03, 2024
0.2.4 · bf546cdd · fix misnamed hyperparameter, and add validation function for parsed reward, project management · Feb 01, 2024
0.2.3 · 51b991c4 · make sure nucleus sampling and its threshold is customizable · Feb 01, 2024
0.2.2 · bded2ccf · sft trainer auto concats multiple datasets · Feb 01, 2024
0.2.1 · e9a582c5 · save an import for researcher · Feb 01, 2024
0.2.0 · 7c4ba1a8 · generalize the system · Feb 01, 2024
0.1.1 · 0dcd7f2e · allow the creation of the self-reward dpo dataset to be called within dpo... · Feb 01, 2024
0.1.0 · f33215b7 · gradient accumulation · Jan 31, 2024
0.0.42 · 1acf9f32 · del memmap · Jan 31, 2024
0.0.41 · 602049de · use separate folders for each iteration of dpo w/ early stopping · Jan 31, 2024
0.0.40 · 113c0ce2 · one more step towards arbitrary ordering of fine-tuning · Jan 31, 2024
0.0.39 · 9673d9b9 · update reference model with policy within the spin and dpo trainers... · Jan 31, 2024
0.0.38 · a83249f8 · able to customize SPIN if SPINTrainer is instantiated with model directly · Jan 31, 2024