June 25th, 2024

Show HN: FiddleCube – Generate Q&A to test your LLM

FiddleCube on GitHub helps create question-answer datasets for Large Language Models. It includes a guide, examples, and details on generating ideal datasets for testing, evaluating, and training LLMs. For more information, visit the GitHub page.

The GitHub repository describes FiddleCube, a tool for generating question-answer datasets to test Large Language Models (LLMs). It provides a quickstart guide, usage examples, and a description of what the project considers an ideal Q&A dataset for LLM testing, evaluation, and training. See the GitHub page for the full documentation.

GitHub – Karpathy/LLM101n: LLM101n: Let's Build a Storyteller

The GitHub repository "LLM101n: Let's build a Storyteller" offers a course on creating a Storyteller AI Large Language Model using Python, C, and CUDA. It caters to beginners, covering language modeling, deployment, programming, data types, deep learning, and neural nets. Additional chapters and appendices are available for further exploration.

Show HN: Python lib to run evals across providers: OpenAI, Anthropic, etc.

The GitHub repository provides details on LLM Safety Evals, accessible on evals.gg. It features a bar chart, a Twitter post, setup guidelines, and commands for running the code. Contact information is provided for further support.

Detecting hallucinations in large language models using semantic entropy

Researchers devised a method to detect hallucinations in large language models such as ChatGPT and Gemini by measuring semantic entropy, the uncertainty over the meanings of a model's sampled answers rather than over their exact wording. Filtering out answers flagged as unreliable improves the accuracy of the answers that remain.
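
As a rough illustration of the idea, the sketch below groups several sampled answers by meaning and computes the entropy of the resulting groups. It is a minimal sketch under stated assumptions, not the researchers' implementation: the function names are hypothetical, and `toy_same_meaning` is a stand-in that only compares normalized strings, where a real system would need a proper semantic-equivalence check.

```python
import math
from typing import Callable, List


def cluster_by_meaning(
    answers: List[str], same_meaning: Callable[[str, str], bool]
) -> List[List[str]]:
    """Greedily group answers; each answer joins the first cluster it matches."""
    clusters: List[List[str]] = []
    for answer in answers:
        for cluster in clusters:
            # Compare against the cluster's first member as its representative.
            if same_meaning(cluster[0], answer):
                cluster.append(answer)
                break
        else:
            clusters.append([answer])
    return clusters


def semantic_entropy(
    answers: List[str], same_meaning: Callable[[str, str], bool]
) -> float:
    """Entropy (in nats) of the empirical distribution over meaning clusters."""
    clusters = cluster_by_meaning(answers, same_meaning)
    total = len(answers)
    probabilities = [len(cluster) / total for cluster in clusters]
    return -sum(p * math.log(p) for p in probabilities)


def toy_same_meaning(a: str, b: str) -> bool:
    """ASSUMPTION: placeholder equivalence check based on normalized text only."""
    def normalize(text: str) -> str:
        return text.strip().lower().rstrip(".")
    return normalize(a) == normalize(b)


if __name__ == "__main__":
    consistent = ["Paris", "paris.", " Paris "]
    scattered = ["Paris", "Lyon", "Marseille", "Nice"]
    print(semantic_entropy(consistent, toy_same_meaning))
    print(semantic_entropy(scattered, toy_same_meaning))
```

Answers that all land in one cluster give an entropy of zero, while answers that disagree in meaning give a higher value, which is the signal used to flag a response as potentially unreliable.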

Show HN: Qq: like jq, but can transcode between many formats

The GitHub repository hosts `qq`, a tool that uses `jq` query syntax and `gojq` to transcode between configuration formats. It offers interactive query building, support for multiple formats, and a focus on encoding performance. It can be installed from source or from releases. Contributions are welcome.

LLMs on the Command Line

Simon Willison presented LLM, a Python command-line utility for working with Large Language Models. It supports OpenAI models out of the box and uses plugins to reach other providers. The tool can run prompts, manage conversations, access specific models such as Claude 3, and log interactions to a SQLite database. Willison highlighted using the tool for tasks like summarizing discussions and emphasized the importance of embeddings for semantic search, showcasing its support for content similarity queries and its extensibility through plugins and OpenAI API compatibility.

7 comments

By @Loic - 10 months

For the people wondering, the GitHub repo is only hosting a couple of lines of Python to connect to their API.

If you have your own LLM, you may have sensitive/private data "in" it from your training. You may not be allowed to use this service from a legal point of view.

By @mistercow - 10 months

The bulleted list of what constitutes “ideal” is missing one of the most important types of questions: questions that aren’t answered by the knowledge set, but which seem like they should/might be.

This is where RAG systems consistently fall down. The end user, by definition, doesn’t know what you’ve got in your data. They won’t ask questions carefully cherry-picked from it. They’ll ask questions they need to know the answer to, and more often than you think, those answers won’t be in your data. You absolutely must know how your system behaves when they do that.

By @johnsutor - 10 months

How does this differ from Ragas? https://docs.ragas.io/en/latest/index.html

By @cruxcode - 10 months

Can it generate HTML as part of the prompt?

By @praveenkumarnew - 10 months

Can I plug this into a Ragas pipeline?

By @aditikothari - 10 months

This is super cool!

By @arjun9642 - 10 months

I want to hack
