Research
Papers, conjectures, and open questions.
Process of Elimination: zebras, logics, and locks
A logic puzzle handed to six frontier thinking models.
The Three-Cylinders Problem: when models choose beauty over truth
Four frontier models tackle a clean geometry problem with a non-obvious answer. Three get it wrong — and how they get it wrong is more diagnostic than the score.
MathDuels: a self-play benchmark for mathematical reasoning
A benchmark in which each frontier model both solves problems and authors problems for others, yielding two ratings that avoid the saturation typical of fixed test sets.