October 2nd, 2024

NVLM 1.0: NVIDIA's new open-source model

NVIDIA's NVLM 1.0 introduces multimodal large language models excelling in vision-language tasks, with the 72B version showing improved text performance, novel architecture, and open-sourced resources for community benefit.


NVIDIA has introduced NVLM 1.0, a new family of multimodal large language models (LLMs) that achieve state-of-the-art performance in vision-language tasks, competing with both proprietary models like GPT-4o and open-access models such as Llama 3-V. The NVLM 1.0 model, particularly the 72B version, shows significant improvements in text-only tasks after multimodal training, with an average accuracy increase of 4.3 points. It outperforms or matches leading models across various benchmarks, including MathVista and OCRBench, and demonstrates strong instruction-following capabilities, excelling in tasks that require reasoning, localization, and coding.

Notably, NVLM 1.0 employs a novel architecture that enhances training efficiency and multimodal reasoning, alongside a unique 1-D tile-tagging design for high-resolution images. The training data is meticulously curated, emphasizing the importance of dataset quality and task diversity over sheer scale. The open-sourcing of model weights and training code aims to benefit the broader community.
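To make the 1-D tile-tagging idea concrete, here is a minimal sketch of how a high-resolution image might be split into a grid of fixed-size tiles, with a text tag inserted before each tile's visual tokens. The tag strings, the 448-pixel tile size, and the leading global-thumbnail tile are illustrative assumptions for this sketch, not NVLM's documented values.

```python
import math

def tile_tags(img_w: int, img_h: int, tile: int = 448) -> list[str]:
    """Return the 1-D sequence of text tags that would precede each
    tile's visual tokens for an image of size img_w x img_h.

    Assumptions (not taken from the NVLM paper): a 448-px tile grid,
    a global thumbnail tile first, then row-major numbered tiles.
    """
    cols = math.ceil(img_w / tile)   # tiles per row
    rows = math.ceil(img_h / tile)   # tile rows
    tags = ["<tile_global_thumbnail>"]
    tags += [f"<tile_{i}>" for i in range(1, rows * cols + 1)]
    return tags
```

For example, a 1344x896 image yields a 3x2 grid, so the sequence holds one thumbnail tag followed by six tile tags; at inference the decoder sees these tags interleaved with each tile's image tokens, which tells it where each tile sits in the original layout.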

- NVLM 1.0 achieves state-of-the-art results in vision-language tasks.

- The 72B model shows improved performance in text-only tasks post-multimodal training.

- It outperforms or matches leading models on key benchmarks.

- The model features a novel architecture for enhanced multimodal reasoning.

- Open-sourcing of model weights and training code is intended for community use.

2 comments
By @dlojudice - about 2 months
With its massive new open AI model, they're coming straight for GPT-4's throne! This could ignite a whole new wave of competition in generative AI.
By @beanjuiceII - about 2 months
where's the open source?