Software Engineer, Reliability

OpenAI, San Francisco
On-site, Full-time

Qualifications

  • Proven experience in software engineering with a focus on system reliability, scalability, and performance.

  • Proficiency in programming languages such as Python, Java, or Go.

  • Familiarity with infrastructure as code (IaC) tools like Terraform or CloudFormation.

  • Strong understanding of cloud computing platforms (AWS, GCP, Azure).

  • Experience with monitoring and incident response tools.

  • Excellent problem-solving skills and the ability to work collaboratively in a fast-paced environment.

  • Strong communication skills and a passion for ensuring the safe deployment of AI technologies.

About the job

Become a vital part of the engineering teams that responsibly bring OpenAI’s transformative technologies to the world!

At OpenAI, our Applied Engineering team collaborates across research, engineering, product management, and design to deliver AI solutions to both consumers and businesses. We are committed to learning from our deployments, maximizing the benefits of AI, and ensuring that this powerful technology is utilized both safely and ethically. Our priority is safety over unchecked growth.

About the Role

As OpenAI continues to expand, we are seeking seasoned engineers who excel in problem-solving to enhance the scalability of our systems. Our achievements hinge on our ability to rapidly iterate on product development while ensuring optimal performance and reliability. You will thrive in a collaborative, fast-paced environment, playing a key role in delivering our technology to millions globally, with a focus on safety and reliability.

As a reliability engineer, you will lead efforts to maintain and improve the stability, scalability, and performance of our dynamic infrastructure. You will collaborate closely with cross-functional teams, including software engineers, product managers, and data scientists, to construct and sustain robust systems capable of accommodating our growing user base and workload.

Your Responsibilities Include:

  • Designing and implementing solutions to scale our infrastructure to meet increasing demands effectively.

  • Developing and maintaining load, chaos, and synthetic testing software that enhances the reliability of systems designed by development teams.

  • Creating and managing automation tools to streamline repetitive tasks and bolster system reliability.

  • Overseeing the lifecycle management platform for CPU/storage, GPU, and network resources to foster efficiency and support dynamic optimization.

  • Implementing fault-tolerant and resilient design patterns to minimize service interruptions.

  • Establishing and maintaining service level objectives (SLOs) and service level indicators (SLIs) to ensure system reliability.

  • Collaborating with researchers, engineers, product managers, and designers to introduce new features and research advancements to the world.

  • Participating in an on-call rotation to address critical incidents and ensure 24/7 system availability.

Your Impact: Your contributions will be essential in guaranteeing the reliability and performance of our platforms as we continue to scale our operations.

About OpenAI

OpenAI is at the forefront of artificial intelligence research and deployment, dedicated to ensuring that AI benefits all of humanity. Our mission is to develop advanced AI technologies while prioritizing safety and ethical considerations. As pioneers in the field, we strive to create AI that is both powerful and responsible, making significant contributions to the global AI landscape.
