
Software Engineer, Inference – AMD GPU Enablement

OpenAI · San Francisco
On-site · Full-time



Qualifications

  • Experience in GPU kernel development or porting using HIP, CUDA, or Triton.

  • Familiarity with communication libraries such as NCCL/RCCL and their application in large-scale model serving.

  • Proven background in distributed inference systems and model scaling across accelerator fleets.

  • Passion for addressing end-to-end performance challenges across various technical layers.

  • Enthusiasm for contributing to a fast-paced, innovative team environment.

About the job

About Our Team
The Inference team at OpenAI is dedicated to translating our cutting-edge research into accessible, transformative technology for consumers, enterprises, and developers. By leveraging our advanced AI models, we enable users to achieve unprecedented levels of innovation and productivity. Our primary focus lies in enhancing model inference efficiency and accelerating progress in research through optimized inference capabilities.

About the Role
We are seeking talented engineers to expand and optimize OpenAI's inference infrastructure, specifically targeting emerging GPU platforms. This role encompasses a wide range of responsibilities from low-level kernel optimization to high-level distributed execution. You will collaborate closely with our research, infrastructure, and performance teams to ensure seamless operation of our largest models on cutting-edge hardware.

This position offers a unique opportunity to influence and advance OpenAI’s multi-platform inference capabilities, with a strong emphasis on optimizing performance for AMD accelerators.

Your Responsibilities Include:

  • Overseeing the deployment, accuracy, and performance of the OpenAI inference stack on AMD hardware.

  • Integrating our internal model-serving infrastructure (e.g., vLLM, Triton) into diverse GPU-backed systems.

  • Debugging and optimizing distributed inference workloads across memory, network, and compute layers.

  • Validating the correctness, performance, and scalability of model execution on extensive GPU clusters.

  • Collaborating with partner teams to design and optimize high-performance GPU kernels for accelerators utilizing HIP, Triton, or other performance-centric frameworks.

  • Working with partner teams to develop, integrate, and fine-tune collective communication libraries (e.g., RCCL) to parallelize model execution across multiple GPUs.

Ideal Candidates Will:

  • Possess experience in writing or porting GPU kernels using HIP, CUDA, or Triton, with a strong focus on low-level performance.

  • Be familiar with communication libraries like NCCL/RCCL, understanding their importance in high-throughput model serving.

  • Have experience with distributed inference systems and be adept at scaling models across multiple accelerators.

  • Enjoy tackling end-to-end performance challenges across hardware, system libraries, and orchestration layers.

  • Be eager to join a dynamic, agile team focused on building innovative infrastructure from the ground up.

About OpenAI

At OpenAI, we are on a mission to ensure that artificial general intelligence (AGI) benefits all of humanity. Our team is composed of leading experts in AI research and technology, working collaboratively to develop cutting-edge solutions that drive progress and empower users globally. Join us in shaping the future of AI and making a lasting impact on the world.
