About the Team
At OpenAI, our Core Services team plays a pivotal role in building and operating the essential online services that our product teams depend on in production. We specialize in foundational distributed systems and platform capabilities that ensure reliable execution, deliver high-performance services, and address large-scale data and file storage needs across our diverse product suite. Unlike the developer and data infrastructure teams, we focus on establishing robust production service foundations and core runtime services.
About the Role
We are seeking an Engineering Manager for our Core Services division. In this leadership position, you will guide teams responsible for designing and maintaining reliable, high-scale distributed systems that are vital for OpenAI’s product ecosystem. Your team will oversee the foundational production systems that serve as the backbone for our product engineering initiatives. You’ll work closely with product and infrastructure partners to expedite the delivery of dependable services and facilitate the scaling of both systems and teams as OpenAI continues to grow. Collaborating with senior engineering leaders, you will help evolve organizational structures, enhance operational processes, and spearhead significant platform initiatives. A strong technical background is essential for success in this role.
Your Responsibilities Include:
Leading and nurturing a high-performing team of infrastructure engineers.
Overseeing teams that build and manage critical production platforms with a focus on cluster reliability, scalability, and rollout safety.
Developing and operating mission-critical distributed systems with a strong emphasis on operational excellence (SLOs, incident response, capacity planning, reliability).
Defining technical direction for foundational platforms, including workflow/orchestration capabilities, large-scale storage solutions, and core service infrastructures.
Collaborating with a wide range of stakeholders, including product engineering teams, adjacent infrastructure teams, and finance/cost partners as necessary.
Mentoring and developing engineers and aspiring leaders within the organization.
You Will Excel in This Role If You:
Possess extensive experience leading teams that manage mission-critical infrastructure in production environments.
Have a proven track record in operating mission-critical services or building core distributed systems.
Have experience in developing platform-like systems (e.g., orchestration/workflow execution, service platforms) and/or large-scale storage solutions.
Are adept at building systems that improve operational efficiency and reliability.

