Research

Papers, conjectures, and open questions.

May 19, 2026
Process of Elimination: zebras, logic, and locks
A logic puzzle handed to six frontier thinking models.
May 12, 2026
The Three-Cylinders Problem: when models choose beauty over truth
Four frontier models tackle a clean geometry problem with a non-obvious answer. Three get it wrong, and how they get it wrong is more diagnostic than the score.
April 27, 2026
MathDuels: a self-play benchmark for mathematical reasoning
A benchmark in which each frontier model both solves problems and authors them for the others, yielding two ratings that avoid the saturation typical of fixed test sets.