About the job
About Anthropic
At Anthropic, we are dedicated to developing reliable, interpretable, and controllable AI systems. Our goal is to ensure that AI technology is safe and beneficial for both users and society. Our rapidly expanding team consists of passionate researchers, engineers, policy experts, and business leaders collaborating to build beneficial AI systems.
About the Teams
The Reinforcement Learning teams at Anthropic lead our research and development in reinforcement learning and play an essential role in improving our AI systems. We have contributed to all Claude models, with particular impact on the autonomy and coding capabilities of Claude Sonnet 4.5 and Opus 4.5. Our work spans several areas:
- Creating systems that enable models to use computers effectively.
- Enhancing code generation through reinforcement learning techniques.
- Conducting pioneering RL research for large language models.
- Establishing scalable RL infrastructure and training methodologies.
- Improving model reasoning capabilities.
We work closely with Anthropic's alignment and frontier red teams to ensure our systems are both capable and secure. Additionally, we collaborate with the applied production training team to seamlessly integrate research advancements into deployed models, demonstrating our commitment to implementing research at scale. Our Reinforcement Learning teams operate at the intersection of cutting-edge research and engineering excellence, dedicated to building high-quality, scalable systems that expand the possibilities of AI.
About the Role
As a Research Engineer in the Reinforcement Learning domain, you will partner with a diverse group of researchers and engineers to enhance the capabilities and safety of large language models. This position merges research and engineering responsibilities, requiring you to implement innovative approaches while contributing to the research strategy. You will engage in fundamental research in reinforcement learning. This includes developing 'agentic' models capable of tool use for open-ended tasks such as computer use and autonomous software generation, improving reasoning skills in disciplines like mathematics, and creating prototypes for internal applications, productivity, and evaluation.
Representative Projects:
- Design and optimize core reinforcement learning infrastructure, from clean training abstractions to distributed experiment management across GPU clusters, scaling our systems to manage increasingly complex research workflows.
- Invent, implement, and evaluate novel training environments, evaluations, and methodologies for reinforcement learning.

