Challenge the newest AI models with your hardest PhD-level exercises

— and learn how to use AI in your math research —

Sep 1, 2025: We are releasing our first public benchmark (Press Release)
While all models fail the majority of the problems posed by professional researchers, the results reveal wide variation in performance.
Aug 15, 2025: Mathematics in the Context Window by Thomas Kahle & Christian Stump
Large language models are slowly making their way into math research. We share some thoughts about this process.

Why Join Our Community?

Let the best models solve your exercises
Study and solve the exercises of others
Show that your exercises go far beyond the capabilities of LLMs
Challenge the following models with your prompts:
DeepSeek-V3.1 GPT-5 Claude Opus 4.1 Grok-4 Gemini 2.5 Pro o3

Example Submissions

Number Theory: How many mutually non-isomorphic extensions of degree $4$ with Galois group of order at most $8$ does the field $\mathbb{Q}_2$ of $2$-adic numbers admit?
Why is this a good prompt? The prompt is clear and short, but still requires a deep understanding of the theory as well as the use of computer algebra systems and databases.
Representation Theory: Let $W$ be the Weyl group of type $F_4 \times E_7$. Determine the number of pairs $(\beta, \gamma)$ of positive roots in the corresponding root system such that the difference $\beta - \gamma$ is a simple root.
Why is this a good prompt? This prompt is well-defined, has a simple unique answer, and requires a deep understanding of root systems. There is one theory that applies only to simply-laced types (so to $E_7$ but not to $F_4$), and another that covers all finite types but requires nontrivial computations. The reasoning takes multiple steps, including the analysis of which theory can be applied in which situation.
Algebraic Statistics: Let $G$ be the graph on $9$ vertices that consists of a $7$-cycle and a $4$-cycle glued along one edge. What is the maximum degree among the elements of the Markov basis of the corresponding binary hierarchical model?
Why is this a good prompt? This prompt gives a short description of a simple structure, and then asks for a simple unique answer based on a deep understanding of hierarchical models in algebraic statistics.
Algebraic Geometry: Consider the projective variety formed by all principal and almost principal minors of a symmetric $n \times n$-matrix. What is the degree of this variety for $n=5$?
Why is this a good prompt? While defined in elementary terms, the proper understanding of this prompt requires connecting this definition to a family of well-studied geometric structures and their properties.
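To give a feel for the kind of computation the Representation Theory example calls for, here is a minimal brute-force sketch in Python. It is not the theory-based route described above and not necessarily how a submitter or a model would proceed. The function names, the convention `cartan[j][i]` $= \langle \alpha_j, \alpha_i^\vee \rangle$, and the Bourbaki-style labeling of $E_7$ are assumptions made for this illustration. Roots are stored as coefficient vectors over the simple roots, and the positive roots are generated by applying simple reflections. Since a positive root of one factor minus a positive root of the other factor is never a simple root, the count for $F_4 \times E_7$ is the sum of the counts for the two factors.

```python
def positive_roots(cartan):
    """All positive roots, as coefficient tuples over the simple roots.

    Assumed convention: cartan[j][i] = <alpha_j, alpha_i^vee>.
    """
    n = len(cartan)
    simple = [tuple(int(i == j) for j in range(n)) for i in range(n)]
    roots = set(simple)
    frontier = list(simple)
    while frontier:
        beta = frontier.pop()
        for i in range(n):
            if beta == simple[i]:
                continue  # s_i sends alpha_i to -alpha_i, which is negative
            pairing = sum(beta[j] * cartan[j][i] for j in range(n))
            # s_i(beta) = beta - <beta, alpha_i^vee> * alpha_i
            image = tuple(b - pairing * (k == i) for k, b in enumerate(beta))
            if image not in roots:
                roots.add(image)
                frontier.append(image)
    return roots


def count_simple_differences(cartan):
    """Number of pairs (beta, gamma) of positive roots with beta - gamma simple."""
    roots = positive_roots(cartan)
    count = 0
    for gamma in roots:
        for i in range(len(cartan)):
            # beta = gamma + alpha_i
            beta = gamma[:i] + (gamma[i] + 1,) + gamma[i + 1:]
            if beta in roots:
                count += 1
    return count


def simply_laced_cartan(n, edges):
    """Cartan matrix of a simply-laced diagram given by 1-based edges."""
    cartan = [[2 * int(i == j) for j in range(n)] for i in range(n)]
    for a, b in edges:
        cartan[a - 1][b - 1] = cartan[b - 1][a - 1] = -1
    return cartan


# F_4, with the double bond between nodes 2 and 3.
F4 = [[2, -1, 0, 0], [-1, 2, -2, 0], [0, -1, 2, -1], [0, 0, -1, 2]]
# E_7 in Bourbaki-style labeling: chain 1-3-4-5-6-7 with node 2 attached to 4.
E7 = simply_laced_cartan(7, [(1, 3), (3, 4), (4, 5), (5, 6), (6, 7), (2, 4)])

# The product type splits into its factors, so the counts add up.
total = count_simple_differences(F4) + count_simple_differences(E7)
print(total)
```

A quick sanity check on a small case: in type $A_2$ the positive roots are $\alpha_1$, $\alpha_2$, and $\alpha_1 + \alpha_2$, so exactly two pairs have a simple root as their difference, and `count_simple_differences([[2, -1], [-1, 2]])` returns 2.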

For Companies & Institutions

Want to benchmark your models against PhD-level math problems from professional researchers?