About Our Team
Join OpenAI’s Inference team, where we power the deployment of cutting-edge AI models, including our GPT models, advanced Image Generation capabilities, and Whisper, across diverse platforms. Our mission is to ensure these models are not only high-performing and scalable but also ready for real-world applications. Collaborating closely with our Research team, we’re committed to bringing the next generation of AI innovations to fruition. As a compact, agile team, we prioritize delivering an exceptional developer experience while continuously pushing the frontiers of artificial intelligence.
As we expand our focus into multimodal inference, we are building the necessary infrastructure to support models that process images, audio, and other non-text modalities. This work involves tackling diverse model sizes and interactions, managing complex input/output formats, and ensuring seamless collaboration between product and research teams.
About The Role
We are seeking a passionate Software Engineer to help deploy OpenAI’s multimodal models at large scale. You will join a small yet impactful team dedicated to building robust, high-performance infrastructure for real-time audio, image, and other multimodal workloads in production environments.
This position is inherently collaborative; you will work directly with researchers who develop these models and with product teams to define novel interaction modalities. Your contributions will enable users to generate speech, interpret images, and engage with models in innovative ways that extend beyond traditional text-based interactions.
Key Responsibilities:
- Design and implement advanced inference infrastructure for large-scale multimodal models.
- Optimize systems for high-throughput, low-latency processing of image and audio inputs and outputs.
- Facilitate the transition of experimental research workflows into dependable production services.
- Engage closely with researchers, infrastructure teams, and product engineers to deploy state-of-the-art capabilities.
- Contribute to system-level improvements, including GPU utilization, tensor parallelism, and hardware abstraction layers.
You May Excel In This Role If You:
- Have a proven track record of building and scaling inference systems for large language models or multimodal architectures.
- Possess experience with GPU-based machine learning workloads and a solid understanding of the performance dynamics associated with large models, particularly with intricate data types like images or audio.
- Thrive in a fast-paced, experimental environment and enjoy collaborating with cross-functional teams to drive impactful results.