MaMMUT: A simple vision-encoder text-decoder architecture for multimodal tasks | Heykuki News


