About Our Team
Join the Inference team at OpenAI, where we leverage cutting-edge research and technology to deliver exceptional AI products to consumers, enterprises, and developers. Our mission is to empower users to harness the full potential of our advanced AI models, enabling unprecedented capabilities. We prioritize efficient and high-performance model inference while accelerating research advancements.
About the Role
We are seeking a passionate Software Engineer to optimize some of the world's largest and most sophisticated AI models for deployment in high-volume, low-latency, and highly available production and research environments.
Key Responsibilities
Collaborate with machine learning researchers, engineers, and product managers to transition our latest technologies into production.
Work closely with researchers to enable advanced research initiatives through innovative engineering solutions.
Implement new techniques, tools, and architectures that enhance the performance, latency, throughput, and effectiveness of our model inference stack.
Develop tools to identify bottlenecks and instability sources, designing and implementing solutions for priority issues.
Optimize our code and Azure VM fleet to maximize every FLOP and GB of GPU RAM available.
You Will Excel in This Role If You:
Possess a solid understanding of modern machine learning architectures and an intuitive grasp of performance optimization strategies, especially for inference.
Take ownership of problems end-to-end, demonstrating a willingness to acquire any necessary knowledge to achieve results.
Bring at least 5 years of professional software engineering experience.
Have, or can quickly develop, expertise in PyTorch, NVIDIA GPUs, and relevant optimization software stacks (such as NCCL and CUDA), along with HPC technologies like InfiniBand, MPI, and NVLink.
Have experience architecting, building, monitoring, and debugging production distributed systems, with bonus points for experience on performance-critical systems.
Have successfully rebuilt or significantly refactored production systems multiple times to accommodate rapid scaling.
Are self-driven and enjoy the challenge of identifying and addressing the most critical problems.