About the job
About Our Team
Join the Frontier Evaluations & Environments team at OpenAI, where we build transformative model environments that pave the way for safe artificial general intelligence (AGI) and artificial superintelligence (ASI). Our team constructs ambitious evaluation environments that not only measure but also enhance the capabilities of our models, creating self-improvement loops that inform our training, safety, and deployment strategies. Our notable open-source evaluations include GDPval, SWE-bench Verified, MLE-bench, PaperBench, and SWE-Lancer. We have also run frontier evaluations for models such as GPT-4o, o1, o3, GPT-4.5, ChatGPT Agent, and GPT-5. If you are passionate about experiencing firsthand the rapid advancement of our models and guiding them toward a positive impact, this is the opportunity for you.
Your Role
We are seeking exceptional research engineers who are eager to push the limits of our frontier models. Ideal candidates will play a vital role in shaping our empirical understanding of AI capabilities across a broad spectrum and will own specific projects from conception to execution.
Key Responsibilities:
Design and implement ambitious reinforcement learning environments to maximize our models' potential.
Conduct assessments of frontier model capabilities, skills, and behaviors.
Create innovative methodologies for the automated exploration of model behaviors.
Guide training processes for our largest model training initiatives, gaining insight into the future of AI.
Collaborate with cross-functional teams to align model evaluations with organizational objectives.