
Technical Staff Member - Research Software Engineer

On-site · Full-time





Qualifications

To succeed in this role, you should have a strong background in software engineering, particularly in distributed systems and machine learning. Experience with reinforcement learning and GPU computing is highly desirable. A proven ability to collaborate with researchers and engineers to develop scalable and efficient systems is essential.

About the job

Our Mission

At Reflection AI, we are dedicated to creating open superintelligence that is accessible to everyone.

Our innovative team is developing open-weight models tailored for individuals, agents, enterprises, and even nation states. Our group of AI researchers and industry pioneers hail from top organizations including DeepMind, OpenAI, Google Brain, Meta, Character.AI, and Anthropic.

Your Role's Mission

As a key member of our team, you will bridge the divide between research and production by converting advanced algorithms into scalable training systems. Your expertise will help design and optimize the foundational infrastructure that supports cutting-edge AI models — including reinforcement learning training loops, distributed GPU training, and extensive data pipelines.

Our systems are engineered to train models across thousands of GPUs while managing petabyte-scale datasets. We prioritize numerical stability, high throughput, and reproducibility in our processes.

Team Overview

This team is responsible for the evolution and management of the core infrastructure that underpins our training systems.

Our focus areas include:

  • Reinforcement learning training infrastructure
  • Distributed training and inference systems
  • Experiment infrastructure and reproducibility
  • Large-scale data pipelines

Our goal is to construct an engineering foundation that enables researchers to iterate swiftly while training models at a massive scale.

Role Overview

You will be responsible for architecting and optimizing the core training infrastructure that powers our models, including RL training loops, distributed GPU systems, and large-scale data pipelines.

Collaborating closely with researchers, you will assist in transforming innovative ideas into reliable, scalable training systems.

Key Responsibilities:

  • Design and optimize large-scale training loops and data pipelines.
  • Implement cutting-edge techniques while ensuring numerical stability and computational efficiency.
  • Develop internal tools for launching, monitoring, and reproducing complex experiments.
  • Identify and resolve deep bottlenecks across the training stack (e.g., GPU memory issues, communication overhead, dataloader stalls).
  • Translate research prototypes into reusable, production-grade infrastructure.

About Reflection AI

Reflection AI is at the forefront of AI research, committed to democratizing access to superintelligent systems. Our diverse team of experts is united by a shared vision to innovate and build robust technologies that empower users across various domains.
