Evaluation and Alignment: The Seminal Papers
Manning Publications
A practical guide to designing and implementing AI evaluation systems, grounded in seminal research papers.
About the Book
Building AI systems is only half the battle — knowing whether they actually work is the other half. Evaluation and Alignment bridges the gap between foundational research and practical engineering, walking you through the seminal papers that shaped how we measure and align AI systems today.
Whether you’re building LLM-powered applications, fine-tuning models, or designing evaluation pipelines, this book gives you the conceptual and practical tools to do it rigorously.
What You’ll Learn
- Evaluation fundamentals — From classical metrics to modern LLM-as-judge approaches, understand what “good” means for your system and how to measure it.
- Alignment techniques — RLHF, constitutional AI, and the research lineage behind today’s alignment methods.
- Reading research effectively — Each chapter is anchored to seminal papers, teaching you how to extract practical insights from academic work.
- Building evaluation pipelines — Design end-to-end evaluation systems that catch regressions, measure progress, and build confidence in your deployments.
Table of Contents
- The Landscape of LLM Evaluation and Alignment: An Evolving Field
- The Dawn of Automatic Evaluation: BLEU and ROUGE
- Bridging the Semantic Gap with Learned Metrics: BERTScore and COMET
- LLM-as-a-Judge: The New Paradigm for Evaluation
- Detecting and Quantifying Hallucinations
- Evaluating Retrieval-Augmented Generation (RAG): The RAGAS Framework
- Reinforcement Learning from Human Preferences: The RLHF Foundation
- Constitutional AI: Alignment Through Principles
- Advanced Alignment and Safety: Red Teaming and Beyond
- Deliberative Alignment: Reasoning for Safety at Inference Time
- The Evolving Landscape of LLM Evaluation and Alignment
Table of contents is preliminary and subject to change during Early Access.
Why This Book?
Knowledge about AI evaluation is scattered across hundreds of papers and blog posts, and much of it survives only as tribal knowledge. This book distills the work that matters most into a single, coherent narrative, so you can spend less time reading papers and more time building systems that work.