New Phi-3.5 Models from Microsoft, including new MoE
Phi-3.5-MoE-instruct is a Microsoft model for text generation and reasoning, featuring 6.6 billion active parameters, a 128,000-token context window, multilingual support, and rigorous safety evaluations. It is available in Azure AI Studio.
Phi-3.5-MoE-instruct is a state-of-the-art open model from Microsoft, designed for high-quality text generation and reasoning. It uses a mixture-of-experts architecture with 6.6 billion active parameters and supports a context length of 128,000 tokens. The model was trained on a diverse mix of synthetic data and publicly available documents, with an emphasis on multilingual coverage and strong reasoning, particularly in code, math, and logic. It has undergone extensive fine-tuning and safety evaluations to improve instruction following and mitigate risks. Phi-3.5-MoE-instruct is intended for commercial and research applications, especially in environments with memory and compute constraints. While it performs competitively against larger models on various benchmarks, its limited size may affect factual accuracy. The model is integrated into the official transformers library and is also available in Azure AI Studio. Users are advised to consider the model's limitations and adhere to relevant laws when deploying it in high-risk scenarios.
- Phi-3.5-MoE-instruct is designed for high-quality text generation and reasoning tasks.
- It supports multilingual capabilities and has a context length of 128,000 tokens.
- The model is suitable for memory-constrained environments and has undergone rigorous safety evaluations.
- It performs competitively against larger models but may have limitations in factual accuracy.
- The model is available in Azure AI Studio and is integrated into the official transformers library (a minimal loading sketch follows this list).
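Since the model card highlights integration with the official transformers library, here is a minimal sketch of how loading and prompting the MoE checkpoint might look. The model ID comes from the Hugging Face link below; the dtype, device placement, and generation settings are illustrative assumptions rather than Microsoft's reference snippet.

```python
# Minimal sketch: load Phi-3.5-MoE-instruct with Hugging Face transformers.
# Assumes a recent transformers release with Phi-3.5 MoE support; dtype,
# device_map, and generation arguments are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "microsoft/Phi-3.5-MoE-instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to reduce memory pressure
    device_map="auto",           # spread expert weights across available GPUs
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

messages = [
    {"role": "user", "content": "Summarize mixture-of-experts routing in two sentences."},
]
out = generator(messages, max_new_tokens=128, do_sample=False)
print(out[0]["generated_text"])
```

Note that only a subset of experts is activated per token, so inference cost tracks the 6.6B active parameters, while all expert weights still need to fit in memory.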
Large MoE with impressive benchmarks: https://huggingface.co/microsoft/Phi-3.5-MoE-instruct
Vision: https://huggingface.co/microsoft/Phi-3.5-vision-instruct
Related
NuExtract: A LLM for Structured Extraction
NuExtract is a structure extraction model by NuMind, offering tiny and large versions. NuMind also provides NuNER Zero and sentiment analysis models. Mistral 7B, by Mistral AI, excels in benchmarks with innovative attention mechanisms.
Mistral NeMo
Mistral AI introduces Mistral NeMo, a powerful 12B model developed with NVIDIA. It features a large context window, strong reasoning abilities, and FP8 inference support. Available under Apache 2.0 license for diverse applications.
Large Enough – Mistral AI
Mistral AI released Mistral Large 2, enhancing code generation, reasoning, and multilingual support with 123 billion parameters. It outperforms competitors and is available for research use via various cloud platforms.
Llama 3 Secrets Every Engineer Must Know
Llama 3 is an advanced open-source language model trained on 15 trillion multilingual tokens, featuring 405 billion parameters, improved reasoning, and multilingual capabilities, while exploring practical applications and limitations.
Mixture of a Million Experts
The paper "Mixture of A Million Experts" introduces a sparse MoE architecture, PEER, which improves transformer efficiency by enabling retrieval from over a million experts, enhancing performance without high computational costs.