A novel Transformer architecture that is much cheaper and faster than the standard Transformer while matching or outperforming it. Handles sequence lengths of 100K+ on a single GPU. Intuition, evaluation, and code are available in the repository.
Show HN: ELiTA: Linear-Time Attention Done Right