Tags
Tags give the ability to mark specific points in history as being important
This project is mirrored from https://github.com/lucidrains/st-moe-pytorch. Pull mirroring updated Sep 19, 2024.
0.1.8 · d7669d43 · 0.1.8 · Jun 04, 2024
0.1.7 · 6b7f7fbb · remove erroneous backwards for split_by_rank · Feb 29, 2024
0.1.6 · 8eb41cc5 · address https://github.com/lucidrains/st-moe-pytorch/issues/4 · Jan 24, 2024
0.1.5 · 19577711 · make sure contiguous · Dec 14, 2023
0.1.4 · 51727d00 · router z loss should be calculated on the unnoised gating logits · Sep 21, 2023
0.1.2 · d9f5f089 · allow for noising of gates · Sep 20, 2023
0.1.1 · 977ee550 · researcher will want to log the unweighted auxiliary losses · Sep 11, 2023
0.1.0 · 5d5f0714 · rename loss_coef to balance_loss_coef, sum the balance and router z-loss and... · Sep 11, 2023
0.0.30 · 2bb762de · handle variable sequence lengths if `allow_var_seq_len = True` on `Experts` · Sep 11, 2023
0.0.29 · 00be3460 · any combination of number of experts and world size should not break · Sep 10, 2023
0.0.28 · 52b5c8a7 · oops · Sep 10, 2023
0.0.27 · 83d75b83 · chip away at edge cases · Sep 10, 2023
0.0.25 · 54188734 · another micro optimization for communication · Sep 10, 2023
0.0.24 · 666d2fd4 · in split by rank function, cache the sizes so on backwards there is not an extra call · Sep 10, 2023
0.0.23 · 085d5118 · start journeying into distributed mixture of experts implementation · Sep 09, 2023
0.0.22 · 97a56888 · add ability to use differentiable topk · Aug 25, 2023
0.0.21 · 22dfd4da · allow for different thresholds between second and third expert · Aug 21, 2023
0.0.20 · f9b8ce34 · multiply gates by mask_flat twice, as in mesh tensorflow code for top-n gating · Aug 21, 2023
0.0.19 · 1ca8170a · better naming · Aug 21, 2023
0.0.18 · 5ef273bb · generalize to top-n gating, parallelize as much as possible · Aug 21, 2023