About the job
Our Mission
At Reflection AI, we are dedicated to creating open superintelligence that is accessible to everyone.
Our innovative team is developing open-weight models tailored for individuals, agents, enterprises, and even nation states. Our group of AI researchers and industry pioneers hails from top organizations including DeepMind, OpenAI, Google Brain, Meta, Character.AI, and Anthropic.
Your Role's Mission
As a crucial member of our team, you will bridge the divide between research and production by converting advanced algorithms into scalable training systems. Your expertise will help design and optimize the foundational infrastructure that supports cutting-edge AI models, including reinforcement learning training loops, distributed GPU training, and large-scale data pipelines.
Our systems are engineered to train models across thousands of GPUs while managing petabyte-scale datasets. We prioritize numerical stability, high throughput, and reproducibility in our processes.
Team Overview
This team is responsible for building and maintaining the core infrastructure that underpins our training systems.
Our focus areas include:
- Reinforcement learning training infrastructure
- Distributed training and inference systems
- Experiment infrastructure and reproducibility
- Large-scale data pipelines
Our goal is to construct an engineering foundation that enables researchers to iterate swiftly while training models at a massive scale.
Role Overview
You will be responsible for architecting and optimizing the core training infrastructure that powers our models, including RL training loops, distributed GPU systems, and large-scale data pipelines.
Collaborating closely with researchers, you will assist in transforming innovative ideas into reliable, scalable training systems.
Key Responsibilities:
- Design and optimize large-scale training loops and data pipelines.
- Implement cutting-edge techniques while ensuring numerical stability and computational efficiency.
- Develop internal tools for launching, monitoring, and reproducing complex experiments.
- Identify and resolve deep bottlenecks across the training stack (e.g., GPU memory issues, communication overhead, dataloader stalls).
- Translate research prototypes into reusable, production-grade infrastructure.