Tag: AI

2 articles

Mar 8, 2026 · 7 min read

A model scores 92% on MMLU, but did it learn the concepts or memorize the answers? Four detection strategies, from first principles.

Mar 4, 2026 · 11 min read

LLM-as-a-judge from first principles: when to use it, how to design rubrics, the three biases that skew scores, and when something simpler will do.