Challenge the newest AI models with your hardest PhD-level exercises

— and learn how to use AI in your math research —

Mar 1, 2026: We are introducing the chat feature

Chat with the latest AI models in parallel about your mathematics research.

Nov 26, 2025: We are releasing our newest public benchmark

It includes 140 research-level mathematics problems and covers Gemini 3 Pro, GPT-5.1, and Claude Opus 4.5.

View the public benchmark, or browse several sample prompts together with their model answers.

Study and solve the exercises of others

Let the best models solve your exercises

Chat with the best AI models about mathematics

Show that your exercises go far beyond the capabilities of LLMs

Chat with the following models and challenge them with your problems:

GPT-5.2, DeepSeek-V3.2, Claude Opus 4.5, Gemini 3 Pro, Grok-4.1

Have questions or feedback, or want to contribute to the project? We'd love to hear from you.

contact@science-bench.ai