Tags
This project is mirrored from https://github.com/lucidrains/st-moe-pytorch. Pull mirroring updated Sep 19, 2024.
0.0.17 · 5f6b9929 · keep cleaning · Aug 21, 2023
0.0.16 · 25440fc3 · oops · Aug 21, 2023
0.0.15 · 1b03b1fd · remove 2-level, simplify code, thank @arankomat for consultation. likely... · Aug 21, 2023
0.0.14 · f1a79592 · cleanup · Aug 21, 2023
0.0.12 · 8c1ca310 · just fix to the best performing policy · Aug 21, 2023
0.0.11 · cfa0e510 · fix the router z loss · Aug 20, 2023
0.0.10 · 49bde96f · allow dispatch tensor to pass gradients back · Aug 20, 2023
0.0.9 · c5f92a82 · allow dispatch tensor to pass gradients back · Aug 20, 2023
0.0.8 · 4b22fac8 · without importance from the hierarchical moe, can do topk of 2 all at once · Aug 20, 2023
0.0.7 · 70f4081e · more cleanup · Aug 20, 2023
0.0.6 · 95d5c087 · remove dropout, as in the paper, they show it is unhelpful (and also input... · Aug 20, 2023
0.0.5 · 7ac3f112 · when doing eval, turn off balance and router z loss calculations · Aug 20, 2023
0.0.3 · 95d6de3d · init expert weights and biases · Aug 20, 2023
0.0.2 · 84e5b907 · first pass for router z loss · Aug 19, 2023
0.0.1 · 566a05fb · start cleaning up, add the ff geglu based experts with multiplicative bias for... · Aug 19, 2023