NExT-GPT is the first end-to-end multimodal large language model (MM-LLM) that perceives input and generates output in arbitrary combinations (any-to-any) of text, image, video, audio, and beyond.
Show HN: NExT-GPT – First LLM working with multimodal input and output | Heykuki News