First Principles in Evaluating LLM Systems
– DRAFT –
Retrieval-augmented generation (RAG) has become a marketing buzzword ever since ChatGPT sparked widespread interest in Generative AI and Large Language Models (LLMs). In its common definition, RAG refers to an LLM application that injects knowledge retrieved from external databases into the model's context to generate answers.
However, this definition diverges significantly from the original Retrieval-Augmented Generation paper, which describes a system trained end-to-end.
The current understanding of a RAG system is essentially a two-stage process:
- Retrieve
- Generate
where newly minted AI engineers often rely on vector databases for the retrieval step before generating a response.
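The two-stage process above can be sketched in a few lines of Python. This is a toy illustration, not a real vector-database or LLM integration: the corpus, the keyword-overlap scoring, and the prompt template are all hypothetical placeholders standing in for embedding search and a model call.

```python
# Minimal sketch of the retrieve-then-generate loop.
# The corpus, scoring function, and prompt format are illustrative
# placeholders, not any specific vector database or LLM API.

CORPUS = [
    "RAG injects retrieved documents into the LLM context.",
    "Vector databases store embeddings for similarity search.",
    "Evaluation-driven development measures each stage separately.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Stage 1: rank documents by naive keyword overlap with the query."""
    terms = set(query.lower().split())
    scored = sorted(
        CORPUS,
        key=lambda doc: len(terms & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def generate(query: str, context: list[str]) -> str:
    """Stage 2: build the augmented prompt an LLM would receive.
    A real system would send this prompt to a model; here we just return it."""
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {query}\nAnswer:"

query = "What does RAG inject into the context?"
prompt = generate(query, retrieve(query))
```

Because the two stages are cleanly separated, each can be evaluated in isolation: retrieval quality against labeled relevant documents, and generation quality against reference answers given fixed contexts.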
In this blog post, we will first generalize this simplified process and then apply first principles to distill this basic building block of LLM systems, to facilitate evaluation-driven development.
LLM Applications
First Principles
Full Talk Slide Deck
@article{leehanchung,
author = {Lee, Hanchung},
title = {First Principles in Evaluating LLM Systems},
year = {2024},
month = {05},
howpublished = {\url{https://leehanchung.github.io}},
url = {https://leehanchung.github.io/blogs/2024/05/22/first-principles-eval/}
}