Mapping the Frontier of Scientific Reasoning
Pattern-matching plateaus.
Reasoning generalizes.
We build large-scale math and physics reasoning datasets for training and evaluating frontier AI systems.
◣ Our Product
We create math and physics datasets for training and evaluating frontier AI models. Even as these models improve rapidly, we develop original problems at the frontier of difficulty, with verifiable solutions, at scale.
As models improve, we stay ahead of them. That's the job.
◤ Our Process
We systematically study how leading models behave at the jagged frontier of intelligence. What kinds of reasoning steps break them? Where does pattern-matching stop working? That frontier shifts as models improve, and we track it.
This informs everything we build. Our problems are designed to require actual thought: the kind of work where getting the right answer is strong evidence you understood the problem.
◣ Our Vision
We are on a mission to help AI understand the world. We believe the path to AI that reasons – not just recognizes – runs through mathematics. Math is the domain where genuine inference can be distinguished from fluent approximation, because solutions are verifiable and novelty is constructible. We build the data that makes training for that kind of reasoning possible.
◤ Who We Are
We're a team of mathematicians, computer scientists, and engineers at the University of Pennsylvania, with access to a network of faculty, postdocs, and PhD students across multiple domains. We bring academic rigor to a problem that requires it, along with the infrastructure to deliver at the scale frontier labs need.
◣ Working With Us
We deliver research-grade datasets at volume, with fast turnaround. Every dataset can be configured to your specifications: difficulty distribution, target domains, evaluation format, and the specific capability gaps you're trying to address.
If you're training or evaluating frontier models and you need problems that actually test reasoning, get in touch with us.
“Rabdos” is Greek for “rod,” referencing Napier's rods – handcrafted tools that supercharged calculation.