Show HN: FiddleCube – Generate Q&A to test your LLM
FiddleCube on GitHub helps create question-answer datasets for Large Language Models. It includes a guide, examples, and details on generating ideal datasets for testing, evaluating, and training LLMs. For more information, visit the GitHub page.
The GitHub URL contains information about FiddleCube, a tool designed to create optimal question-answer datasets for testing Large Language Models (LLMs). It offers a quickstart guide, usage examples, and details on generating ideal QnA datasets for LLM testing, evaluation, and training purposes. For further details or specific inquiries, users are encouraged to refer to the GitHub page for comprehensive documentation.
Related
GitHub – Karpathy/LLM101n: LLM101n: Let's Build a Storyteller
The GitHub repository "LLM101n: Let's build a Storyteller" offers a course on creating a Storyteller AI Large Language Model using Python, C, and CUDA. It caters to beginners, covering language modeling, deployment, programming, data types, deep learning, and neural nets. Additional chapters and appendices are available for further exploration.
Show HN: Python lib to run evals across providers: OpenAI, Anthropic, etc.
The GitHub repository provides details on LLM Safety Evals, accessible on evals.gg. It features a bar chart, a Twitter post, setup guidelines, and code execution commands. Contact for further support.
Detecting hallucinations in large language models using semantic entropy
Researchers devised a method to detect hallucinations in large language models like ChatGPT and Gemini by measuring semantic entropy. This approach enhances accuracy by filtering unreliable answers, improving model performance significantly.
Show HN: Qq: like jq, but can transcode between many formats
The GitHub repository hosts `qq`, a tool using `jq` query syntax and `gojq` for configuration format transcoding. It offers interactive query building, multiple format support, and encoding performance focus. Installation options include source or releases. Contributions welcome.
LLMs on the Command Line
Simon Willison presented a Python command-line utility for accessing Large Language Models (LLMs) efficiently, supporting OpenAI models and plugins for various providers. The tool enables running prompts, managing conversations, accessing specific models like Claude 3, and logging interactions to a SQLite database. Willison highlighted using LLM for tasks like summarizing discussions and emphasized the importance of embeddings for semantic search, showcasing LLM's support for content similarity queries and extensibility through plugins and OpenAI API compatibility.
If you have trained your own LLM, it may have sensitive or private data "in" it from the training set. From a legal point of view, you may not be allowed to send that model's data to this service.
This is where RAG systems consistently fall down. The end user, by definition, doesn’t know what you’ve got in your data. They won’t ask questions carefully cherry-picked from it. They’ll ask questions they need to know the answer to, and more often than you think, those answers won’t be in your data. You absolutely must know how your system behaves when they do that.
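One way to check the behavior described above is to feed the system questions that are deliberately absent from its data and count how often it declines to answer. The following is a minimal sketch, not FiddleCube's method: `toy_rag`, `ABSTAIN_MARKERS`, and `abstention_rate` are hypothetical names invented for illustration, and the string-matching refusal check is an assumption that would need replacing with whatever refusal signal your own pipeline emits.

```python
from typing import Callable, Iterable

# Hypothetical phrases treated as a refusal; tune these for your own system.
ABSTAIN_MARKERS = ("i don't know", "not in the provided", "cannot find")


def is_abstention(answer: str) -> bool:
    """Return True if the answer looks like an explicit refusal."""
    lowered = answer.lower()
    return any(marker in lowered for marker in ABSTAIN_MARKERS)


def abstention_rate(answer_fn: Callable[[str], str], questions: Iterable[str]) -> float:
    """Fraction of out-of-corpus questions the system declines to answer."""
    questions = list(questions)
    if not questions:
        raise ValueError("need at least one question")
    abstained = sum(is_abstention(answer_fn(q)) for q in questions)
    return abstained / len(questions)


def toy_rag(question: str) -> str:
    """Stand-in for a real RAG pipeline: answers only what its tiny corpus holds."""
    corpus = {"what is the refund window?": "30 days."}
    return corpus.get(
        question.lower(),
        "I don't know; that is not in the provided documents.",
    )


# Questions chosen because their answers are NOT in the corpus.
out_of_corpus = ["Who won the 1998 World Cup?", "What is the CEO's home address?"]
rate = abstention_rate(toy_rag, out_of_corpus)
print(f"abstention rate on out-of-corpus questions: {rate:.0%}")
```

A low rate on such a set suggests the system is answering from something other than your data, which is the failure mode the comment warns about.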