Senior Site Reliability Engineer - Remote

ClickHouse, Canada (remote)
Remote Full-time

Experience Level

Senior

Qualifications

To excel in this position, you should possess strong expertise in site reliability engineering principles and practices, with a proven track record in managing cloud infrastructure. A solid understanding of distributed systems and experience with incident management processes are essential. Strong programming skills are also required to develop and maintain tools that improve operational efficiency. You should be a proactive problem solver with excellent communication skills, capable of working collaboratively in a dynamic team environment.

About the job

About ClickHouse

Recognized on the 2025 Forbes Cloud 100 list, ClickHouse is a leading private cloud company. With more than 3,000 customers and annual recurring revenue (ARR) growing over 250% year-on-year, ClickHouse is at the forefront of real-time analytics, data warehousing, observability, and AI workloads.

Our recent $400M Series D financing round validates our sustained momentum. Notable clients such as Capital One, Lovable, Decagon, Polymarket, and Airwallex have recently adopted or expanded their use of our platform, joining a prestigious roster of AI pioneers and global brands including Meta, Cursor, Sony, and Tesla.

Join us in our mission to revolutionize the way companies leverage data!

About the Role

As we enhance our commitment to delivering dependable and secure services, we are expanding our Site Reliability Engineering team. In this role, you will spearhead initiatives to maintain and improve the reliability, availability, scalability, and performance of our cloud infrastructure. You will collaborate with teams across the company, including Control Plane, Data Plane, Core, Security, Support, and Operations, to design and implement robust, secure, and highly available distributed systems. You will take charge of incident management and response processes, conducting blameless postmortems and driving continuous improvements in our Cloud services. Your software engineering expertise will be vital in developing tools and platforms to enhance operational and engineering efficiencies within ClickHouse Cloud. This is a unique opportunity to make a substantial impact on our high-performance, elastic ClickHouse Cloud.

Your Responsibilities

  • Collaborate with diverse engineering teams at ClickHouse to architect and implement scalable, secure, and high-availability systems.
  • Establish and manage service level objectives (SLOs) and service level agreements (SLAs) for ClickHouse Cloud.
  • Ensure all infrastructure components within ClickHouse Cloud, including Data Plane, Control Plane, and ClickHouse Core, have effective monitoring and alerting systems in place for timely incident detection and resolution.
  • Refine incident response processes and post-mortem analyses for outages in ClickHouse Cloud, including communication with impacted customers through the support team.
  • Continuously enhance the reliability and performance of ClickHouse services.
