Software Engineer - Reinforcement Learning Research & Environments

Magic.dev, San Francisco
On-site, Full-time

Experience Level

Entry Level

Qualifications

The ideal candidate will possess a strong background in software engineering and demonstrate expertise in large-scale data or machine learning systems. Your analytical skills and passion for solving complex issues will be crucial in our mission to enhance AGI capabilities.

About the job

At Magic, we are committed to creating safe AGI that propels humanity forward in addressing the world’s most pressing challenges. We believe that the key to achieving safe AGI is through the automation of research and code generation, which enhances model performance and ensures alignment more reliably than human efforts alone. Our innovative approach integrates cutting-edge pre-training techniques, domain-specific reinforcement learning, ultra-long context, and advanced inference-time computation.

Role Overview

As a Software Engineer on our RL Research & Environments team, you will design and maintain the data systems, evaluation frameworks, and training environments that improve model capabilities after pre-training.

This position focuses on post-training: you will identify capability gaps, create specialized datasets, design reward structures, and run iterative training cycles that lead to significant improvements in user interactions. You will own the infrastructure and experimental workflows that connect product objectives to measurable capability gains.

Our long-context models present unique post-training challenges, including long-horizon reasoning, maintaining coherence over extended tasks, optimizing context utilization, and enabling tool-assisted behaviors. You will develop systems that reveal failure modes, produce high-value training data, and facilitate rapid reinforcement learning iterations at scale.

This role offers the potential for growth, allowing you to take ownership of key capability domains, delve deeper into RL systems, or influence post-training strategies as we enhance the performance and reliability of long-context models.

Key Responsibilities

  • Design and construct post-training datasets leveraging synthetic generation, targeted data collection, and self-play methodologies.

  • Implement filtering, scoring, and mixture strategies for reinforcement learning and post-training datasets.

  • Develop and sustain evaluation frameworks that identify long-context failure modes.

  • Create reward signals and training environments aimed at specific capability advancements.

  • Conduct ablation studies across various data sources, reward configurations, and long-horizon task designs.

  • Enhance the reliability and observability of post-training data and environment pipelines.

  • Collaborate closely with Product and Research teams to translate capability objectives into quantifiable iteration cycles.

Ideal Candidate Profile

  • Solid foundation in software engineering principles.

  • Proven experience in developing or managing large-scale data or machine learning systems.

  • Strong analytical skills and a passion for tackling complex problems.

About Magic.dev

Magic.dev is at the forefront of artificial intelligence research, dedicated to building safe AGI that accelerates progress on the world's most pressing problems. Our innovative approach is driven by a commitment to leveraging cutting-edge technology and methodologies to achieve our mission.
