I would like to share some recent project work done as part of Prof. Rob Fergus's computer vision course. To make image captioning results more reproducible, my colleague Nicholas Choma and I implemented a recently published state-of-the-art method, popularly known as the Bottom-Up and Top-Down Attention model.
The code is available on GitHub: https://github.com/njchoma/transformer_image_caption
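For readers unfamiliar with the model, the core "top-down" step can be sketched as soft attention over a set of "bottom-up" region feature vectors (in the original paper by Anderson et al., these come from a Faster R-CNN detector). Each region v_i gets a score a_i = w_a · tanh(W_v v_i + W_h h), where h is a query vector (in the paper, the hidden state of the top-down attention LSTM); the scores are softmax-normalized, and the attended feature is the weighted sum of the regions. This is a minimal sketch in plain Python rather than PyTorch so it runs without dependencies. The names `top_down_attention`, `W_v`, `W_h` and `w_a` are hypothetical and are not taken from the linked repository.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_down_attention(regions: List[Vector], h: Vector,
                       W_v: Matrix, W_h: Matrix,
                       w_a: Vector) -> Tuple[Vector, Vector]:
    """Hypothetical sketch: attend over bottom-up region features given query h.

    a_i = w_a . tanh(W_v v_i + W_h h); alpha = softmax(a); v_hat = sum_i alpha_i v_i
    Returns (alpha, v_hat).
    """
    query = matvec(W_h, h)  # project the query once; it is shared by all regions
    scores = []
    for v in regions:
        proj = matvec(W_v, v)
        hidden = [math.tanh(p + q) for p, q in zip(proj, query)]
        scores.append(sum(w * x for w, x in zip(w_a, hidden)))
    alpha = softmax(scores)
    dim = len(regions[0])
    # Weighted sum of the original region features, one output per feature dimension.
    attended = [sum(a * v[d] for a, v in zip(alpha, regions)) for d in range(dim)]
    return alpha, attended


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    alpha, v_hat = top_down_attention(
        regions=[[1.0, 0.0], [0.0, 1.0]], h=[1.0, 0.0],
        W_v=identity, W_h=identity, w_a=[1.0, 1.0])
    print(alpha, v_hat)
```

In a real model the weights are learned and the computation is batched over all regions and time steps as tensor operations; the loop here only makes each term of the formula explicit.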
Implementation of SOTA Image Captioning Model in PyTorch | Heykuki News