Automated reasoning to remove LLM hallucinations
Amazon Web Services has launched Automated Reasoning checks in Amazon Bedrock Guardrails to enhance large language model accuracy, allowing organizations to validate outputs against established facts, currently in preview in Oregon.
Amazon Web Services (AWS) has introduced Automated Reasoning checks as a new feature in Amazon Bedrock Guardrails, aimed at improving the accuracy of responses generated by large language models (LLMs) and mitigating factual errors caused by hallucinations. The feature applies mathematical, logic-based verification to ensure that the outputs of generative AI applications align with established facts. Automated Reasoning checks join a broader suite of safeguards that includes content filtering and personally identifiable information (PII) redaction.

The checks let organizations encode their rules and guidelines into structured policies, which are then used to validate the accuracy of LLM-generated content. This approach is particularly valuable where factual accuracy is critical, such as in human resources or operational workflows. The system analyzes uploaded documents to create initial reasoning policies, which can be tested and refined through a user-friendly interface. Automated Reasoning checks are currently available in preview in the US West (Oregon) AWS Region, with broader access planned. The launch positions AWS as a leader in responsible AI capabilities among major cloud providers.
- AWS has launched Automated Reasoning checks to improve the accuracy of LLM outputs.
- The feature is part of Amazon Bedrock Guardrails, which includes various content safety measures.
- Organizations can create structured policies to validate LLM-generated content.
- Automated Reasoning checks are particularly useful for applications requiring high factual accuracy.
- The feature is currently in preview in the US West (Oregon) AWS Region.
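To make the idea concrete, here is a minimal sketch of what "validating LLM output against a structured policy" can look like. This is not the Bedrock API; the policy format, rule names, and `validate` helper are invented for illustration:

```python
# Hypothetical sketch: check claims extracted from an LLM answer against
# a structured policy expressed as logical rules. Not the Bedrock API.

def validate(claims, rules):
    """Return the names of the rules the claims violate."""
    return [name for name, rule in rules.items() if not rule(claims)]

# Example (made-up) HR policy: employees with under one year of tenure
# are entitled to at most 10 PTO days.
rules = {
    "pto_cap_first_year": lambda c: not (c.get("tenure_years", 0) < 1
                                         and c.get("pto_days", 0) > 10),
}

# Claims extracted from a (hypothetical) LLM-generated answer.
claims = {"tenure_years": 0.5, "pto_days": 15}
print(validate(claims, rules))  # -> ['pto_cap_first_year']
```

The hard part in practice, which the sketch skips, is the extraction step: turning free-form model output into structured claims that rules like these can evaluate.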
Related
Leveraging AI for efficient incident response
Meta has developed an AI-assisted system for root cause analysis, achieving 42% accuracy by combining heuristic retrieval and LLM ranking, significantly improving investigation efficiency while addressing potential risks through feedback and explainability.
Looming Liability Machines (LLMs)
The use of Large Language Models for root cause analysis in cloud incidents raises concerns about undermining human expertise, leading to superficial analyses, systemic failures, and risks from unexpected automated behaviors.
AWS AI Stack – Ready-to-Deploy Serverless AI App on AWS and Bedrock
The AWS AI Stack is a boilerplate for serverless AI applications, featuring backend services, a React frontend, AI chat functionality, and easy deployment with the Serverless Framework. A live demo is available.
Apple study proves LLM-based AI models are flawed because they cannot reason
Apple's study reveals significant reasoning shortcomings in large language models from Meta and OpenAI, introducing the GSM-Symbolic benchmark and highlighting issues with accuracy due to minor query changes and irrelevant context.
Amazon Nova
Amazon has launched Amazon Nova, a suite of foundation models for generative AI, featuring understanding and creative models, customization options, and safety controls to enhance productivity and reduce costs.
And for a use-case simple enough for this system to work (e.g. regurgitate a policy), it seems like the LLM is unnecessary. After all, if your system can perfectly interpret the question and answer and see if this rule set applies, then you can likely just use the rule set to generate the answer rather than wasting resources with a giant language model.
What Amazon appears to have done here is use a transformer-based neural network (a.k.a. an LLM) to translate natural language into symbolic logic rules, which are then used together in what amounts to an expert system.
Full Circle. Hilarious.
For reference to those on the younger side: The Computer Chronicles (1984) https://www.youtube.com/watch?v=_S3m0V_ZF_Q
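For anyone who hasn't met an expert system: the core mechanism is forward chaining over if-then rules, a sketch of which fits in a few lines. The rules and facts below are made up to echo the HR example; nothing here reflects Amazon's actual implementation:

```python
# Toy expert system: forward-chaining inference over if-then rules,
# the kind of symbolic machinery the comment above alludes to.

def forward_chain(facts, rules):
    """Apply rules repeatedly until no new facts can be derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

rules = [
    ({"employee", "tenure_under_1_year"}, "pto_capped_at_10"),
    ({"pto_capped_at_10", "requested_15_days"}, "request_denied"),
]
derived = forward_chain(
    {"employee", "tenure_under_1_year", "requested_15_days"}, rules)
print("request_denied" in derived)  # -> True
```

Which is the commenter's point: once the rules exist in this form, the answer falls out of the rule set directly, no language model required.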
The approaches seem very different though. I'm curious if anyone here has used either or both and can share feedback.
By constraining the field it is trying to solve it makes grounding the natural language question in a knowledge graph tractable.
An analogy is type inference in a computer language: it can't solve every problem but it's very useful much of the time (actually this is a lot more than an analogy because you can view a knowledge graph as an actual type system in some circumstances).
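A tiny sketch of that analogy, with an invented schema and entities: each relation in the graph has a domain and range type, and grounding a question amounts to type-checking the entities it mentions.

```python
# "Knowledge graph as type system": relations carry a (domain, range)
# type signature, and grounding a statement is a type check.
# The schema and entities here are invented for illustration.

schema = {  # relation -> (domain type, range type)
    "employs": ("Company", "Person"),
    "located_in": ("Company", "City"),
}
entity_types = {"AWS": "Company", "Alice": "Person", "Seattle": "City"}

def well_typed(subject, relation, obj):
    dom, rng = schema[relation]
    return entity_types.get(subject) == dom and entity_types.get(obj) == rng

print(well_typed("AWS", "employs", "Alice"))      # -> True
print(well_typed("Alice", "employs", "Seattle"))  # -> False
```

As with type inference, this rejects a whole class of nonsense groundings cheaply, without claiming to resolve every ambiguity.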
I get the vibe VC money is being burned on promises of an AGI that may never eventuate and to which there's no clear path.
---
and yet, the paper that went around in March:
Paper Link: https://arxiv.org/pdf/2401.11817
Paper Title: Hallucination is Inevitable: An Innate Limitation of Large Language Models
---
Instead of trying to trick people into thinking we can ignore the flaws of post-LLM "AI" by layering on the still-flawed pre-LLM "AI", why don't we cut the salesman BS and just tell people not to use "AI" for the range of tasks it's not suited for.