Show HN: Emu2 – A Gemini-like open-source 37B Multimodal Model

2 years ago

Hello HN, I'm excited to introduce Emu2, the latest generative multimodal model developed by the Beijing Academy of Artificial Intelligence (BAAI). Emu2 is an open-source project that reflects BAAI's commitment to open, secure, and responsible AI research. It is designed to handle tasks across modalities given only a few examples and simple instructions.

Emu2 outperforms other large-scale models such as Flamingo-80B on few-shot multimodal understanding tasks. It also serves as a base model that developers can build on for specialized multimodal applications.

Key features of Emu2 include:

- A more streamlined modeling framework than its predecessor, Emu.

- A decoder capable of reconstructing images from the encoder's semantic space.

- An expansion to 37 billion parameters, boosting both capabilities and generalization.
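
For a rough sense of what 37 billion parameters means for hardware, here is a back-of-the-envelope sketch in Python. It counts the weights only (no activations, no attention cache, no framework overhead), and the list of precisions is an assumption for illustration, not a statement about which formats Emu2 officially supports.

```python
# Rough, weights-only memory estimate for a 37B-parameter model.
# The precisions below are illustrative assumptions, not Emu2-specific facts.

NUM_PARAMS = 37_000_000_000  # parameter count stated in the post

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16/bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(num_params: int, bytes_per_param: float) -> float:
    """Return the weight storage size in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, size in BYTES_PER_PARAM.items():
        print(f"{name:>9}: {weight_memory_gb(NUM_PARAMS, size):.1f} GB")
```

Actual usage will be higher once activations and runtime overhead are included, so treat these figures as a lower bound when sizing GPUs.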

BAAI has also released two fine-tuned versions: Emu2-Chat for visual understanding and Emu2-Gen for visual generation. We believe they are among the most capable open-source multimodal models available at the time of release.

Here are the resources for those interested in exploring or contributing to Emu2:

- Project: https://baaivision.github.io/emu2/

- Model: https://huggingface.co/BAAI/Emu2

- Code: https://github.com/baaivision/Emu/tree/main/Emu2

- Demo: https://huggingface.co/spaces/BAAI/Emu2

- Paper: https://arxiv.org/abs/2312.13286

We're eager to see how the HN community uses Emu2, and we welcome your feedback to help us improve it. Let's collaborate to push the boundaries of multimodal AI!
