About the job
About Our Team
The Infrastructure Engineering team operates within the IT department, dedicated to the reliable construction, deployment, and management of critical on-premises and hybrid environments that empower our internal services and vital research and development projects.
This newly established team is committed to implementing rigorous Site Reliability Engineering (SRE) practices in environments where uptime, safety, recoverability, and security are paramount. We aim to replace unique, one-off infrastructure with standardized infrastructure-as-code components that enhance reliability and operational efficiency as OpenAI continues to grow.
About This Role
We are in search of an Infrastructure Engineering Lead who will architect, build, and maintain reliable, secure, and scalable infrastructure that supports identity, access, endpoint, and shared platform services throughout the organization.
You will take full ownership of infrastructure and identity systems from conceptual design and provisioning to policy enforcement, upgrades, recovery, and ongoing operations. Your goal will be to develop robust, production-grade platforms that minimize operational hurdles, enforce security by default, and empower teams to work more effectively and confidently.
This position is ideal for a senior engineer who excels in navigating ambiguity, relishes the challenge of overseeing complex systems from start to finish, and enhances reliability and security by transforming fragile implementations into standardized, repeatable infrastructure.
This role is based at our San Francisco headquarters and requires in-office attendance.
Key Responsibilities:
- Define and refine infrastructure patterns for on-prem and hybrid environments, including self-hosted platforms, vendor-supported systems, and lab settings.
- Establish standardized, production-grade deployment and operational models that replace custom-built solutions.
- Collaborate with IT, Security, Identity, and Network teams to ensure infrastructure is designed to meet reliability, security, and access standards.
- Design and enhance the production architecture for Identity and Access Management (IAM) adjacent platforms, such as Microsoft Entra, utilizing SRE principles.
- Develop common management protocols and shared resources within Azure subscriptions to ensure uniformity and policy compliance in operations.

