Inference Technical Lead - On-Device Transformers

OpenAI · San Francisco
Hybrid · Full-time

Experience Level: Manager

About the job

About Our Team

Join the Future of Computing Research team at OpenAI, an innovative applied research group within the Consumer Devices division. Our mission is to pioneer new methods and models that contribute to our overarching goal of developing Artificial General Intelligence (AGI) for the betterment of humanity.

Role Overview

As the Inference Technical Lead, you will collaborate with world-class machine learning researchers and top design talent to push the boundaries of model capabilities. The position is based in San Francisco, CA, on a hybrid schedule of 4 days per week in the office, with relocation assistance available for new hires.

Key Responsibilities

  • Assess and select silicon platforms, including GPUs, NPUs, and specialized accelerators, for the deployment of OpenAI models on-device and at the edge.

  • Collaborate closely with research teams to co-design model architectures that satisfy real-world constraints such as latency, memory, power, and bandwidth.

  • Conduct system performance analyses to identify trade-offs in model design, memory hierarchy, compute throughput, and hardware capabilities.

  • Work hand-in-hand with hardware vendors and internal infrastructure teams to launch new accelerators, ensuring efficient execution of transformer workloads.

  • Lead a team of engineers in implementing the low-level inference stack, encompassing kernel development and runtime systems.

  • Navigate open-ended challenges to transform emerging research capabilities into scalable solutions.

Ideal Candidate Profile

  • Proven experience in evaluating or deploying workloads on GPUs, NPUs, or other specialized accelerators.

  • Strong understanding of transformer model performance characteristics, including attention mechanisms, KV-cache behaviors, and memory bandwidth requirements.

  • Experience designing or optimizing high-performance computing systems, such as inference engines, distributed runtimes, or hardware-aware ML pipelines.

  • Background in building or leading teams focused on low-level performance-critical software, including CUDA kernels, compilers, or ML runtimes.

  • Demonstrated ability to thrive in a fast-paced, innovative environment.

About OpenAI

OpenAI is at the forefront of artificial intelligence research and development, dedicated to advancing digital intelligence while ensuring it benefits all of humanity. Our mission encompasses a commitment to creating safe AGI and promoting its responsible use.
