About the job
Join Our Innovative Team
At OpenAI, our Kernels team is at the forefront of developing cutting-edge software that drives our most ambitious AI research initiatives.
We operate at the intersection of hardware and software, crafting high-performance kernels and implementing distributed system optimizations to enhance the efficiency of large-scale training and inference processes.
Our mission is to empower OpenAI to push the boundaries of AI by ensuring that various models—from large language models (LLMs) to recommendation systems—operate seamlessly on state-of-the-art supercomputing infrastructures. This includes adapting our software stack for new accelerator technologies, optimizing overall system performance, and eliminating bottlenecks throughout the architecture.
Your Role
As a member of the Kernels team, you will play a crucial role in evaluating and integrating new computing platforms designed to support large-scale AI training and inference.
Your projects will encompass everything from prototyping system software on emerging accelerators to implementing performance enhancements across our AI applications.
You will engage with both hardware and software components, focusing on kernel development, sharding strategies, distributed systems scalability, and performance modeling.
This position emphasizes the integration of machine learning algorithms with system performance optimization—particularly in large-scale environments—rather than solely compiler development.
Key Responsibilities
Prototype and enable OpenAI's AI software stack on emerging accelerator platforms.
Enhance the performance of large-scale models (LLMs, recommender systems, distributed AI workloads) across varied hardware setups.
Design kernels, sharding strategies, and system scaling solutions optimized for new accelerator technologies.
Collaborate on code-level optimizations (e.g., in PyTorch) and lower-level enhancements to improve performance on unconventional hardware.
Conduct system-level performance modeling, identify bottlenecks, and drive end-to-end optimization.
Partner with hardware teams and vendors to assess alternatives to current platforms and adapt our software stack accordingly.
Contribute to runtime improvements, compute-communication overlap, and scaling strategies for next-generation AI workloads.
Ideal Candidate Profile
A strong background in software engineering, with a focus on system performance and large-scale applications.
Experience with AI workloads and optimizing performance across both hardware and software layers.
Familiarity with distributed systems and the ability to work collaboratively with hardware teams.
A passion for advancing AI technologies and a desire to tackle challenging problems in a fast-paced environment.