About the job
About Bespoke Labs
Bespoke Labs is an applied AI research lab focused on curating data and reinforcement learning (RL) environments for training and evaluating intelligent agents. We curated Open Thoughts, recognized as one of the premier open reasoning datasets and used by many advanced research labs. We have also built state-of-the-art models such as Bespoke-MiniChart-7B and Bespoke-MiniCheck, and trained agents to perform multi-turn tool calling through reinforcement learning.
Bespoke Labs is well positioned to lead the emerging market for data and RL environment curation.
About The Role
We are seeking an RL Environment Research Engineer to drive our work in designing, evaluating, and benchmarking training environments for AI agents. You will develop systematic methodologies for environment design, identify agent failures, and turn those insights into high-quality training data and benchmarks.
This position combines research intuition with practical execution. A deep understanding of agent behavior is essential: identifying reward hacking, analyzing failure modes, and discerning why certain environments yield better training outcomes. You will turn this understanding into repeatable processes and benchmark suites suitable for external release.
You thrive on both analytical work (reviewing agent rollouts, finding patterns in failures) and creative work (designing environments, building evaluation pipelines), and you can move fluidly between studying what makes environments effective and producing them at scale.
What You'll Do
- Develop systematic strategies and frameworks for creating high-quality RL environments that effectively train and evaluate agents.
- Investigate how large language models (LLMs) and agents fail across various task types, identifying patterns that inform superior environment design.
- Create benchmark environments that assess specific agent capabilities, preparing them for external release on our evaluation platform.
- Validate environment quality through hands-on testing—training small-scale agents, checking for reward hacking, and scrutinizing training dynamics.

