Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity | Synapse