LeanDojo: Theorem Proving in Lean Using Language Models
LeanDojo enhances theorem proving in Lean with retrieval-augmented language models, introducing Lean Copilot for tactic suggestions and proof search. It includes benchmarks and enables interactive proof state observation, improving automation and discovering new proofs.
LeanDojo is a framework designed to enhance theorem proving in Lean using retrieval-augmented large language models (LLMs). It introduces Lean Copilot, which assists users by suggesting tactics and searching for proofs, and supports both local and cloud-based models. LeanDojo extracts proofs from Lean into datasets for training machine learning models, and it lets those models interact with Lean's proof environment. The framework includes two benchmarks, LeanDojo Benchmark and LeanDojo Benchmark 4, consisting of extensive collections of theorems, proofs, tactics, and premises sourced from Lean's math libraries.
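For a sense of how this looks in practice, Lean Copilot is invoked as ordinary tactics inside a Lean proof. A minimal sketch, assuming the published LeanCopilot package and its documented suggest_tactics and search_proof tactics:

```lean
import LeanCopilot

-- `suggest_tactics` queries the model and lists candidate next steps
-- in the infoview; `search_proof` instead runs a best-first search
-- for a complete proof of the goal.
example (a b : Nat) : a + b = b + a := by
  suggest_tactics
```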
Key features of LeanDojo include fine-grained premise annotations and a challenging data split that requires models to generalize to theorems relying on premises never seen in training, rather than memorize proofs. The environment created by LeanDojo allows for interactive proof state observation and tactic execution, which is essential for training and evaluating provers. Experiments show that ReProver, a retrieval-augmented prover trained on LeanDojo Benchmark, outperforms existing proof automation tactics, including ones based on GPT-4.
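LeanDojo's programmatic interface wraps a theorem in a Python "Dojo" environment in which a prover observes proof states and executes tactics. A minimal sketch along the lines of the project's README; the repository, commit, file path, and theorem name here are illustrative placeholders:

```python
from lean_dojo import Dojo, LeanGitRepo, Theorem, ProofFinished

# Illustrative repo/commit and theorem; substitute any traceable Lean repo.
repo = LeanGitRepo("https://github.com/yangky11/lean4-example", "main")
theorem = Theorem(repo, "Lean4Example.lean", "hello_world")

# Entering the Dojo yields a handle plus the initial proof state;
# run_tac applies a tactic string and returns the resulting state.
with Dojo(theorem) as (dojo, init_state):
    print(init_state.pp)  # pretty-printed goals
    result = dojo.run_tac(init_state, "rw [Nat.add_comm]")
    if isinstance(result, ProofFinished):
        print("Proof complete!")
```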
Additionally, LeanDojo has successfully discovered new proofs and identified formalization bugs in theorem statements. A ChatGPT plugin has been developed to enable theorem proving through interaction with Lean, allowing for a blend of informal and formal mathematics, although it faces challenges in proof accuracy compared to specialized models. The team behind LeanDojo includes researchers from various institutions, contributing to advancements in automated theorem proving.
Related
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
A decoding strategy named DoLa reduces hallucinations in large language models without external knowledge. It contrasts logits from different layers to enhance truthfulness, improving factual generation by 12-17% in tasks like TruthfulQA.
Self hosting a Copilot replacement: my personal experience
The author shares their experience self-hosting a GitHub Copilot replacement using local Large Language Models (LLMs). Results varied, with none matching Copilot's speed and accuracy. Despite challenges, the author plans to continue using Copilot.
Can the New Mathstral LLM Accurately Compare 9.11 and 9.9?
Mathstral is a new 7B model by Mistral AI for math reasoning, with a 32k context window and Apache 2.0 license. It aims to improve common sense in math problem-solving, deployable locally with LlamaEdge and shareable via GaiaNet for customization and integration.
AI solves IMO problems at silver medal level
Google DeepMind's AI systems, AlphaProof and AlphaGeometry 2, solved four out of six International Mathematical Olympiad problems, achieving a silver medalist level, marking a significant milestone in AI mathematical reasoning.
Google DeepMind's AI systems can now solve complex math problems
Google DeepMind's AI systems, AlphaProof and AlphaGeometry 2, solved four of six problems from the International Mathematical Olympiad, achieving a silver medal and marking a significant advancement in AI mathematics capabilities.