Much of the code is adapted from the Paint Transformer paper (https://arxiv.org/abs/2108.03798). In this code base, I treat emojis like “brushes” and use a Gumbel-Softmax-based lookup to choose which emoji to paste onto the canvas during training. This is similar to how attention uses a softmax to weight keys: the choice stays differentiable, so gradients can flow back to the scores that drive the selection.
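To make the lookup idea concrete, here is a minimal sketch in plain Python. It is not the code from this repository: the function names (`gumbel_softmax`, `soft_lookup`, `hard_lookup`), the temperature `tau`, and the tiny two-value "brushes" are all assumptions chosen for illustration. It adds Gumbel noise to per-emoji scores, applies a temperature-scaled softmax, and then uses the resulting weights either to blend the brushes (soft) or to pick one (hard).

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Return soft selection weights (summing to 1) over the choices.

    Hypothetical helper for illustration; `tau` is the softmax temperature.
    """
    noisy = []
    for logit in logits:
        # Clamp the uniform sample away from 0 and 1 so both logs are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / tau)
    # Numerically stable softmax over the noisy, temperature-scaled scores.
    top = max(noisy)
    exps = [math.exp(v - top) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def soft_lookup(weights, brushes):
    """Blend all brushes by their weights (the differentiable path)."""
    size = len(brushes[0])
    return [sum(w * b[i] for w, b in zip(weights, brushes)) for i in range(size)]


def hard_lookup(weights, brushes):
    """Pick the single brush with the largest weight."""
    index = max(range(len(weights)), key=weights.__getitem__)
    return index, brushes[index]


if __name__ == "__main__":
    # Three stand-in "emoji brushes", each flattened to two pixel values.
    brushes = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    scores = [0.2, 1.5, 0.1]  # assumed per-emoji scores from a network
    weights = gumbel_softmax(scores, tau=0.5, rng=random.Random(0))
    print("weights:", weights)
    print("soft blend:", soft_lookup(weights, brushes))
    print("hard pick:", hard_lookup(weights, brushes))
```

Lower values of `tau` push the weights toward a one-hot choice, while higher values spread them more evenly. In a PyTorch code base the same sampling step is available as `torch.nn.functional.gumbel_softmax(logits, tau=..., hard=...)`, where `hard=True` returns a one-hot sample in the forward pass while using the soft weights for gradients.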