Site Reliability Engineer - Platform at CodeRabbit | San Francisco

CodeRabbit, San Francisco
On-site, Full-time

Qualifications

  • Proven experience in Site Reliability Engineering or similar roles.

  • Strong proficiency in cloud computing platforms, particularly Google Cloud Platform.

  • Hands-on experience with Infrastructure as Code tools, especially Terraform.

  • Understanding of SRE principles, including SLI/SLO frameworks.

  • Familiarity with monitoring tools like Datadog.

  • Excellent problem-solving skills and a proactive mindset.

About the job

About CodeRabbit

CodeRabbit is a pioneering research and development firm dedicated to creating highly efficient human-machine collaboration systems. Our mission is to develop the next generation of AI-driven code review tools, fostering a harmonious partnership between human creativity and advanced algorithms that far exceed the capabilities of individual engineers. By merging language models with human innovation, we aim to elevate the standards of efficiency and quality in software development.

The Role

We are in search of a talented Site Reliability Engineer (SRE) to become a vital part of our Platform Engineering team located in the Bay Area. In this role, you will play a crucial part in maintaining the high availability, performance, and scalability of CodeRabbit's AI-enhanced code review platform. This position lies at the nexus of software engineering and systems operations, where you will construct the foundational platforms and automation that empower our engineering teams to deploy, monitor, and scale our services with reliability.

As a Site Reliability Engineer at CodeRabbit, your responsibilities will include improving the reliability of our essential services that handle millions of code reviews, developing sophisticated automation platforms, and managing the infrastructure that drives our AI analysis engine. You will engage with cutting-edge technologies such as large language models, real-time processing systems, and distributed architectures that function at scale.

Key Responsibilities

Infrastructure & Platform Ownership

  • Design, implement, and maintain scalable infrastructure on Google Cloud Platform to accommodate CodeRabbit's expanding user base and processing needs.

  • Take ownership of and operate essential platform services.

  • Develop and manage Infrastructure as Code using Terraform to guarantee consistent, reproducible, and version-controlled infrastructure deployments.

Reliability & Performance Engineering

  • Establish and uphold SLI/SLO frameworks for all critical services, ensuring we fulfill our reliability commitments to users.

  • Implement comprehensive monitoring, alerting, and observability solutions utilizing Datadog and custom instrumentation.

  • Conduct in-depth incident response, root cause analysis, and post-mortem processes to continually enhance system reliability.

  • Optimize application and infrastructure performance to manage millions of pull request analyses with minimal latency.

About CodeRabbit

CodeRabbit is at the forefront of innovation in research and development, specifically targeting the enhancement of human-machine collaboration systems. By harnessing the power of AI, we aim to redefine software development practices and create tools that empower engineers to work more efficiently and effectively.
