New Phi-3.5 Models from Microsoft, including new MoE
Phi-3.5-MoE-instruct is a Microsoft model for text generation and reasoning, featuring 6.6 billion active parameters, a 128,000-token context window, multilingual support, and rigorous safety evaluations. It is available in Azure AI Studio.
Phi-3.5-MoE-instruct is a state-of-the-art open model from Microsoft, designed for high-quality text generation and reasoning. It uses a mixture-of-experts architecture with 6.6 billion active parameters and supports a context length of 128,000 tokens. The model was trained on a diverse mix of synthetic data and publicly available documents, with an emphasis on multilingual coverage and strong reasoning, particularly in code, math, and logic. It has undergone extensive fine-tuning and safety evaluations to improve instruction following and mitigate risks. Phi-3.5-MoE-instruct is intended for commercial and research applications, especially in environments with memory and compute constraints. While it performs competitively against larger models on various benchmarks, its limited size may affect factual accuracy. The model is integrated into the official transformers library and is also available in Azure AI Studio. Users are advised to consider the model's limitations and adhere to relevant laws when deploying it in high-risk scenarios.
- Phi-3.5-MoE-instruct is designed for high-quality text generation and reasoning tasks.
- It supports multilingual capabilities and has a context length of 128,000 tokens.
- The model is suitable for memory-constrained environments and has undergone rigorous safety evaluations.
- It performs competitively against larger models but may have limitations in factual accuracy.
- The model is available in Azure AI Studio and is integrated into the official transformers library (a minimal loading sketch follows this list).
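Since the model card highlights integration with the official transformers library, here is a minimal sketch of how loading and prompting the MoE checkpoint might look. The model ID comes from the Hugging Face link below; the dtype, device placement, and generation settings are illustrative assumptions rather than Microsoft's reference snippet.

```python
# Minimal sketch: load Phi-3.5-MoE-instruct with Hugging Face transformers.
# Assumes a recent transformers release with Phi-3.5 MoE support; dtype,
# device_map, and generation arguments are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "microsoft/Phi-3.5-MoE-instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to reduce memory pressure
    device_map="auto",           # spread expert weights across available GPUs
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

messages = [
    {"role": "user", "content": "Summarize mixture-of-experts routing in two sentences."},
]
out = generator(messages, max_new_tokens=128, do_sample=False)
print(out[0]["generated_text"])
```

Note that only a subset of experts is activated per token, so inference cost tracks the 6.6B active parameters, while all expert weights still need to fit in memory.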
Large MoE with impressive benchmarks: https://huggingface.co/microsoft/Phi-3.5-MoE-instruct
Vision: https://huggingface.co/microsoft/Phi-3.5-vision-instruct
Related
NuExtract: A LLM for Structured Extraction
NuExtract is a structure extraction model by NuMind, offering tiny and large versions. NuMind also provides NuNER Zero and sentiment analysis models. Mistral 7B, by Mistral AI, excels in benchmarks with innovative attention mechanisms.
Mistral NeMo
Mistral AI introduces Mistral NeMo, a powerful 12B model developed with NVIDIA. It features a large context window, strong reasoning abilities, and FP8 inference support. Available under Apache 2.0 license for diverse applications.
Large Enough – Mistral AI
Mistral AI released Mistral Large 2, enhancing code generation, reasoning, and multilingual support with 123 billion parameters. It outperforms competitors and is available for research use via various cloud platforms.
Llama 3 Secrets Every Engineer Must Know
Llama 3 is an advanced open-source language model trained on 15 trillion multilingual tokens, featuring 405 billion parameters, improved reasoning, and multilingual capabilities, while exploring practical applications and limitations.
Mixture of a Million Experts
The paper "Mixture of A Million Experts" introduces a sparse MoE architecture, PEER, which improves transformer efficiency by enabling retrieval from over a million experts, enhancing performance without high computational costs.