About the job
About ClickHouse
Featured in the 2025 Forbes Cloud 100, ClickHouse is a leader in innovation and growth among private cloud companies. With a customer base exceeding 3,000 and an annual recurring revenue (ARR) that has surged over 250% year-over-year, we excel in real-time analytics, data warehousing, observability, and AI workloads.
Our remarkable growth was recently validated by a $400 million Series D funding round. In just three months, we've welcomed new customers like Capital One, Lovable, Decagon, Polymarket, and Airwallex, joining our existing roster of AI trailblazers and global brands, including Meta, Cursor, Sony, and Tesla.
Join us as we revolutionize the way companies harness data!
About the Team
The Release Team is dedicated to the seamless and secure delivery of ClickHouse Cloud, a managed database platform that operates tens of thousands of ClickHouse clusters. Our responsibilities include upgrading and maintaining these clusters at scale, developing the internal tools that make this possible, and serving as the final line of defense when unexpected issues arise.
About the Role
This position offers a balanced mix of operational execution and software development. You will run day-to-day operations: coordinating upgrades, handling the cases that fall outside the normal path, and ensuring the health of thousands of clusters in production. At the same time, you will improve the systems that make each rollout safer and more automated than the last. If you enjoy both designing processes and carrying them out, messy edge cases included, we want you on our team!
What You'll Do
- Plan and implement rolling upgrades across a vast array of ClickHouse clusters, prioritizing safety, accuracy, and minimal disruption to customers.
- Manage the entire release pipeline, from pre-upgrade validation and staged rollouts to post-upgrade monitoring and incident response.
- Analyze and resolve production issues as part of a regular on-call schedule, including one-off clusters and scenarios that current automation cannot handle.
- Develop and refine the internal tools and automation that ensure reliable and repeatable large-scale database operations.
- Collaborate closely with core database and cloud infrastructure teams to identify and resolve operational challenges.
- Provide support and training to other engineering teams that use our internal tools.

