About the job
At Magic, we are committed to creating safe AGI that propels humanity forward in addressing the world’s most pressing challenges. We believe that the key to achieving safe AGI is through the automation of research and code generation, which enhances model performance and ensures alignment more reliably than human efforts alone. Our innovative approach integrates cutting-edge pre-training techniques, domain-specific reinforcement learning, ultra-long context, and advanced inference-time computation.
Role Overview
As a Software Engineer on our RL Research & Environments team, you will design and manage the data systems, evaluation frameworks, and training environments that extend model capabilities after pre-training.
This position centers on post-training: your responsibilities will include identifying capability gaps, creating specialized datasets, designing reward structures, and executing iterative training cycles that deliver noticeable improvements for users. You will own the infrastructure and experimental workflows that connect product objectives to measurable capability gains.
Our long-context models present unique post-training challenges, including long-horizon reasoning, maintaining coherence over extended tasks, optimizing context utilization, and enabling tool-assisted behaviors. You will develop systems that reveal failure modes, produce high-value training data, and facilitate rapid reinforcement learning iterations at scale.
This role offers the potential for growth, allowing you to take ownership of key capability domains, delve deeper into RL systems, or influence post-training strategies as we enhance the performance and reliability of long-context models.
Key Responsibilities
Design and construct post-training datasets leveraging synthetic generation, targeted data collection, and self-play methodologies.
Implement filtering, scoring, and mixture strategies for reinforcement learning and post-training datasets.
Develop and sustain evaluation frameworks that identify long-context failure modes.
Create reward signals and training environments aimed at specific capability advancements.
Conduct ablation studies across various data sources, reward configurations, and long-horizon task designs.
Enhance the reliability and observability of post-training data and environment pipelines.
Collaborate closely with Product and Research teams to translate capability objectives into quantifiable iteration cycles.
Ideal Candidate Profile
Solid foundation in software engineering principles.
Proven experience in developing or managing large-scale data or machine learning systems.
Strong analytical skills and a passion for tackling complex problems.