Tags

Tags give the ability to mark specific points in history as being important.

This project is mirrored from https://github.com/lucidrains/st-moe-pytorch. Pull mirroring updated Sep 19, 2024.

0.1.8

d7669d43 · 0.1.8 · Jun 04, 2024
0.1.7

6b7f7fbb · remove erroneous backwards for split_by_rank · Feb 29, 2024
0.1.6

8eb41cc5 · address https://github.com/lucidrains/st-moe-pytorch/issues/4 · Jan 24, 2024
0.1.5

19577711 · make sure contiguous · Dec 14, 2023
0.1.4

51727d00 · router z loss should be calculated on the unnoised gating logits · Sep 21, 2023
0.1.2

d9f5f089 · allow for noising of gates · Sep 20, 2023
0.1.1

977ee550 · researcher will want to log the unweighted auxiliary losses · Sep 11, 2023
0.1.0

5d5f0714 · rename loss_coef to balance_loss_coef, sum the balance and router z-loss and... · Sep 11, 2023
0.0.30

2bb762de · handle variable sequence lengths if `allow_var_seq_len = True` on `Experts` · Sep 11, 2023
0.0.29

00be3460 · any combination of number of experts and world size should not break · Sep 10, 2023
0.0.28

52b5c8a7 · oops · Sep 10, 2023
0.0.27

83d75b83 · chip away at edge cases · Sep 10, 2023
0.0.25

54188734 · another micro optimization for communication · Sep 10, 2023
0.0.24

666d2fd4 · in split by rank function, cache the sizes so on backwards there is not an extra call · Sep 10, 2023
0.0.23

085d5118 · start journeying into distributed mixture of experts implementation · Sep 09, 2023
0.0.22

97a56888 · add ability to use differentiable topk · Aug 25, 2023
0.0.21

22dfd4da · allow for different thresholds between second and third expert · Aug 21, 2023
0.0.20

f9b8ce34 · multiply gates by mask_flat twice, as in mesh tensorflow code for top-n gating · Aug 21, 2023
0.0.19

1ca8170a · better naming · Aug 21, 2023
0.0.18

5ef273bb · generalize to top-n gating, parallelize as much as possible · Aug 21, 2023
