About the Role
Join the Engineering Acceleration Delivery / Continuous Deployment team at OpenAI, where we develop and maintain systems designed to securely deploy OpenAI’s infrastructure and product code into production.
Our team is responsible for the deployment platform, release pipelines, and safety mechanisms that empower engineers across OpenAI to make rapid changes while minimizing operational risks. Our goal is to streamline production deployments, enhancing speed, safety, and autonomy.
This position offers a unique opportunity to work at the intersection of developer productivity, distributed systems reliability, and large-scale infrastructure orchestration.
In This Role, You Will
Architect and implement continuous deployment infrastructure that efficiently manages changes across multiple Kubernetes clusters and global regions.
Create systems for progressive delivery, incorporating techniques like canary releases, staged rollouts, and automated rollback processes.
Enhance engineering velocity by reducing friction within the release pipeline and automating operational workflows.
Collaborate with product and infrastructure teams to ensure their services are deployable, observable, and resilient at scale.
Adopt and refine deployment methodologies such as GitOps, infrastructure-as-code, and progressive delivery patterns.
Develop systems that automatically assess deployment health through metrics, logs, traces, and alerts to identify regressions and initiate safe rollbacks.
Create systems that facilitate agent-assisted or fully autonomous deployment workflows using cutting-edge AI tools.
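As a flavor of the automated health assessment described above, here is a minimal sketch in Python of a canary evaluation step: compare a canary's error rate against the stable baseline and decide whether to promote or roll back. All names, types, and thresholds are illustrative, not part of any real OpenAI or vendor API.

```python
from dataclasses import dataclass


@dataclass
class DeploymentMetrics:
    """Hypothetical aggregate of request metrics for one deployment."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def evaluate_canary(canary: DeploymentMetrics,
                    baseline: DeploymentMetrics,
                    max_regression: float = 0.01) -> str:
    """Return 'promote' if the canary's error rate stays within
    max_regression of the baseline, otherwise 'rollback'."""
    if canary.error_rate > baseline.error_rate + max_regression:
        return "rollback"
    return "promote"


# Example: the canary regresses from 0.5% to 3% errors, so we roll back.
decision = evaluate_canary(DeploymentMetrics(1000, 30),
                           DeploymentMetrics(1000, 5))
print(decision)  # rollback
```

In practice this comparison would run repeatedly during a staged rollout, with signals drawn from metrics, logs, and traces rather than a single error counter.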
Technologies you will work with include:
Kubernetes for large-scale container orchestration and runtime infrastructure
Python and FastAPI for internal services
Terraform for infrastructure as code
GitOps-based deployment workflows (e.g., ArgoCD, Flux, or similar systems)
Buildkite for CI orchestration
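The GitOps workflows listed above share one core idea: desired state lives in git, and an agent continuously reconciles the live cluster toward it. A minimal sketch of that reconciliation loop, with purely illustrative names (this is not the ArgoCD or Flux API):

```python
def reconcile(desired: dict[str, str], live: dict[str, str]) -> dict[str, str]:
    """Compute the actions needed to converge live state to desired state.

    Both inputs map a service name to its deployed version; the result maps
    each out-of-sync service to a human-readable action.
    """
    actions: dict[str, str] = {}
    # Anything missing or at the wrong version gets (re)applied.
    for name, version in desired.items():
        if live.get(name) != version:
            actions[name] = f"apply {name}@{version}"
    # Anything running that git no longer declares gets pruned.
    for name in live.keys() - desired.keys():
        actions[name] = f"delete {name}"
    return actions


desired = {"api": "v2", "worker": "v1"}   # state declared in git
live = {"api": "v1", "legacy": "v1"}      # state observed in the cluster
print(reconcile(desired, live))
# {'api': 'apply api@v2', 'worker': 'apply worker@v1', 'legacy': 'delete legacy'}
```

Real controllers diff full Kubernetes manifests rather than version strings, but the declare-observe-converge loop is the same.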

