The MAMBA model transformer with a language modeling head on top (linear layer with weights tied to the input embeddings).
It starts with a linear projection to expand the input embeddings. Then, a convolution before the ...
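
The two ideas above (an expanding linear projection followed by a convolution, and an output head whose weights are tied to the input embeddings) can be sketched in plain Python. This is only an illustrative sketch under stated assumptions, not the library's implementation: the sizes `vocab_size`, `d_model`, `expand`, and `d_conv`, the helper names, and the choice of a causal depthwise convolution are all assumptions made for the example, and the rest of the block between the convolution and the head is skipped.

```python
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def causal_depthwise_conv(seq, kernels):
    """Causal 1D depthwise convolution (assumed variant for this sketch).

    seq:     list of T vectors, each with C channels.
    kernels: list of C kernels, each of length K (one kernel per channel).
    The output at step t depends only on inputs at steps t-K+1 .. t;
    positions before the start of the sequence are treated as zeros.
    """
    T, C, K = len(seq), len(kernels), len(kernels[0])
    out = []
    for t in range(T):
        row = []
        for c in range(C):
            acc = 0.0
            for k in range(K):
                src = t - (K - 1) + k  # index of the input step being read
                if src >= 0:
                    acc += kernels[c][k] * seq[src][c]
            row.append(acc)
        out.append(row)
    return out

# Hypothetical toy sizes, chosen only for illustration.
random.seed(0)
vocab_size, d_model, expand, d_conv = 5, 4, 2, 3
d_inner = expand * d_model

def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

embedding = rand_matrix(vocab_size, d_model)  # input embedding table
in_proj = rand_matrix(d_inner, d_model)       # expands d_model -> d_inner
kernels = rand_matrix(d_inner, d_conv)        # one short kernel per channel

token_ids = [1, 3, 0, 2]
hidden = [embedding[i] for i in token_ids]        # T vectors of size d_model
expanded = [matvec(in_proj, h) for h in hidden]   # T vectors of size d_inner
mixed = causal_depthwise_conv(expanded, kernels)  # T vectors of size d_inner

def lm_head(h):
    """Tied head: reuse the embedding table as the output projection."""
    return matvec(embedding, h)  # one logit per vocabulary entry

# Applied directly to `hidden` here because the intermediate layers are skipped.
logits = [lm_head(h) for h in hidden]
```

Because the head reuses `embedding` rather than holding its own matrix, it adds no parameters, and any update to the embedding table changes the output projection as well.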