About the job
At Magic, our mission is to develop safe Artificial General Intelligence (AGI) that helps humanity address its most critical challenges. We believe the path to safe AGI runs through automating research and code generation, letting us improve models and tackle alignment problems more effectively than humans alone could. Our approach combines cutting-edge pre-training, domain-specific reinforcement learning (RL), ultra-long context, and efficient inference-time computation.
Position Overview
As a Software Engineer on the Inference & RL Systems team, you will design and operate the distributed systems that run our models in production and support large-scale post-training workflows.
This position sits at the intersection of model execution and distributed infrastructure, focusing on the systems that determine inference latency, throughput, and stability, as well as the reliability of RL and post-training loops.
Our long-context models impose significant execution demands: KV-cache scaling, memory management for very long sequences, batching strategies, long-horizon trajectory rollouts, and consistent throughput under real-world workloads. You will own the infrastructure that keeps both production inference and large-scale RL iteration efficient and dependable.
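As a rough illustration of why long-context inference is memory-bound, here is a back-of-envelope KV-cache size estimate. The model configuration below (layer count, KV heads, head dimension) is hypothetical for illustration, not Magic's actual architecture:

```python
def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, dtype_bytes: int = 2) -> int:
    """Per-sequence KV-cache size: keys + values for every layer and token."""
    return seq_len * n_layers * 2 * n_kv_heads * head_dim * dtype_bytes

# Hypothetical config: 32 layers, 8 KV heads (grouped-query attention),
# head_dim 128, fp16 (2 bytes per element).
total = kv_cache_bytes(seq_len=1_000_000, n_layers=32,
                       n_kv_heads=8, head_dim=128, dtype_bytes=2)
print(f"{total / 2**30:.1f} GiB")  # ~122 GiB for a single 1M-token sequence
```

Under these assumptions a single million-token sequence needs on the order of a hundred gigabytes of KV cache, which is why paging, eviction, and batching strategies dominate long-context serving design.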
Key Responsibilities
Design and scale high-performance inference serving systems.
Optimize KV-cache management, batching methods, and scheduling processes.
Enhance throughput and latency for long-context tasks.
Develop and maintain distributed RL and post-training infrastructure.
Boost reliability across rollout, evaluation, and reward pipelines.
Automate fault detection and recovery mechanisms for serving and RL systems.
Analyze and eliminate performance bottlenecks across GPU, networking, and storage components.
Collaborate with Kernel and Research teams to ensure alignment between execution systems and model architecture.
Qualifications
Solid foundation in software engineering and distributed systems.
Proven experience in building or managing large-scale inference or training systems.
In-depth understanding of GPU execution constraints and memory trade-offs.
Experience troubleshooting performance issues in production machine learning systems.
Capability to analyze system-level trade-offs between latency, throughput, and cost.