Principal Site Reliability Engineer

Orgvue, London, England, United Kingdom
On-site, Full-time

Experience Level

Senior

About the job

At Orgvue, we are at the forefront of organizational design and planning software, harnessing the transformative power of data visualization and modeling to help organizations become more adaptable and high-performing. Our platform empowers HR, finance, and business leaders to make swift, informed workforce decisions in an ever-evolving landscape.

Trusted by some of the world's largest enterprises and renowned management consulting firms, Orgvue enables organizations to visualize and proactively shape their futures. Headquartered in London, we also have offices in Philadelphia, The Hague, Toronto, and Sydney.

We are currently on the lookout for a Principal Site Reliability Engineer to join our team as a senior technical leader specializing in scaling and fortifying our AWS and Kubernetes-based infrastructure.

Role Overview

In this pivotal role, you will collaborate with product, platform, and operations teams to ensure our systems are reliable, observable, and resilient, even at scale. This position marries hands-on technical proficiency with strategic foresight, enabling us to cultivate a world-class reliability culture and a strong engineering framework for growth. We seek an individual with robust technical skills, exceptional communication abilities, and a passion for cross-team collaboration.

Key Responsibilities

  • Establish and uphold SLOs, SLIs, and error budgets across vital services
  • Design and execute a comprehensive cloud infrastructure and tooling strategy
  • Elevate SRE practices organization-wide
  • Implement effective observability across metrics, logs, and traces using our existing tooling
  • Lead the team in creating automated, self-healing systems
  • Manage and refine our incident response protocols, including on-call practices and a post-mortem culture
  • Mentor engineers throughout the organization on reliability best practices, operational readiness, and scalable infrastructure
  • Drive Infrastructure as Code (IaC) initiatives using Terraform, Kubernetes, CloudFormation, and GitOps methodologies
  • Work closely with security, DevOps, and software teams to guarantee compliance, scalability, and operational excellence
  • Assess and introduce tools, patterns, and practices that enhance the performance and reliability of our SaaS platform

Qualifications

  • Proven experience leading SRE transformations
  • Extensive hands-on expertise with Kubernetes (EKS preferred) in production settings
  • Strong proficiency with AWS core services (EC2, EKS, RDS, S3, ALB/NLB, IAM, CloudWatch, etc.)
  • Expertise in Infrastructure as Code using tools such as Terraform, and familiarity with GitOps workflows
  • Solid background in observability: metrics, visualization, logging, and tracing
  • Underst...

About Orgvue

Orgvue is a leader in organizational design and planning software, helping businesses harness the power of data visualization and modeling to foster adaptability and high performance. Our platform is utilized by top enterprises and consulting firms, enabling them to effectively visualize and architect their future business landscapes.
