October 16th, 2024

Nvidia releases weights for Llama-3.1-Nemotron-70B-Instruct

NVIDIA launched the Llama-3.1-Nemotron-70B-Instruct model, which ranks first in three automatic alignment benchmarks. The model was trained with RLHF, requires significant hardware to deploy, and ships with NVIDIA's guidance on ethical AI development and responsible usage.

NVIDIA has introduced the Llama-3.1-Nemotron-70B-Instruct, a large language model designed to enhance the helpfulness of responses generated for user queries. As of October 1, 2024, it ranks first in three automatic alignment benchmarks: Arena Hard (85.0), AlpacaEval 2 LC (57.6), and GPT-4-Turbo MT-Bench (8.98), surpassing other leading models like GPT-4o and Claude 3.5 Sonnet. The model was developed using Reinforcement Learning from Human Feedback (RLHF) techniques, specifically REINFORCE, and is based on the Llama-3.1-70B-Instruct model. It is optimized for deployment using NVIDIA's NeMo Framework, requiring significant hardware resources, including multiple NVIDIA GPUs. The model can handle a maximum of 128k tokens for input and 4k tokens for output. Ethical considerations are emphasized, with NVIDIA advocating for responsible AI development and usage. Users are encouraged to follow the terms of service and collaborate with model teams to ensure compliance with industry standards. The model is available for free hosted inference, and detailed instructions for setup and usage are provided.

- Llama-3.1-Nemotron-70B-Instruct is optimized for helpfulness in user queries.

- It ranks first in three key alignment benchmarks as of October 2024.

- The model utilizes RLHF techniques for training and evaluation.

- Deployment requires substantial hardware resources, including NVIDIA GPUs.

- Ethical AI development and responsible usage are emphasized by NVIDIA.
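
For readers who want to try the weights locally, below is a minimal sketch using the Hugging Face transformers library. The repository ID nvidia/Llama-3.1-Nemotron-70B-Instruct-HF and the chat-template call are assumptions based on how Llama-derived checkpoints are typically published, not setup steps taken from the article; a real 70B deployment needs multiple GPUs or aggressive quantization.

```python
# Hedged sketch: load the model and ask it a question via transformers.
# The repo ID and hardware requirements are assumptions, per the note above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF"  # assumed HF-format repo

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # shard the 70B weights across available GPUs
)

messages = [{"role": "user", "content": "How many r in strawberry?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```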

3 comments
By @robertritz - 4 months
I gave it a softball question and it got it horribly wrong.

"Cidr /17 is what subnet mask?"

Here is its answer:

"A simple subnetting question!

Given: CIDR /17

To determine the corresponding subnet mask, we need to understand how CIDR notation works. CIDR (Classless Inter-Domain Routing) notation is a compact way to express the relationship between an IP address and its associated routing prefix. The number after the slash (/) represents the number of leading 1-bits in the subnet mask when it's converted to binary.

A subnet mask for IPv4 is 32 bits long. Here's how to convert /17 to a subnet mask:

Determine the number of 1-bits: /17 means the first 17 bits are 1.

Determine the number of 0-bits: Since the total length is 32 bits, and 17 are 1-bits, the remaining bits are 0. So, 32 - 17 = 15 bits are 0.

Convert to binary and then to dotted decimal:

Binary: The first 17 bits are 1, and the last 15 bits are 0.
11111111111111111 000000000000000

Dotted Decimal: Divide the binary into four parts (each 8 bits long, known as octets) and convert each to decimal.
11111111 = 255
11111111 = 255
00000000 = 0
000000000000000 (last 8 bits of the 15 zeros) = 0

So, the subnet mask for CIDR /17 in dotted decimal notation is: 255.255.0.0."

Yes, it's a simple subnetting question and it got it wrong. The right answer is 255.255.128.0.
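
For reference, the conversion is pure bit arithmetic: a /17 prefix sets the 17 leading bits of a 32-bit mask to 1, so the third octet is 10000000 in binary, i.e. 128. A quick Python check (the helper name is illustrative):

```python
# Convert a CIDR prefix length to a dotted-decimal subnet mask.
def cidr_to_mask(prefix: int) -> str:
    # Build a 32-bit integer with `prefix` leading 1-bits:
    # /17 -> 11111111 11111111 10000000 00000000
    mask = (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF
    # Split into four octets and render as dotted decimal.
    return ".".join(str((mask >> shift) & 0xFF) for shift in (24, 16, 8, 0))

print(cidr_to_mask(17))  # 255.255.128.0 -- the correct answer
print(cidr_to_mask(16))  # 255.255.0.0   -- what the model answered for /17
```

Python's standard ipaddress module agrees: str(ipaddress.ip_network("0.0.0.0/17").netmask) is "255.255.128.0".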

By @rvnx - 4 months
> As of 1 Oct 2024, this model is #1 on all three automatic alignment benchmarks (verified tab for AlpacaEval 2 LC), edging out strong frontier models such as GPT-4o and Claude 3.5 Sonnet.

> this model can correctly answer the question How many r in strawberry? without specialized prompting or additional reasoning tokens

Can be tested here: https://huggingface.co/chat/models/nvidia/Llama-3.1-Nemotron...