Principal Site Reliability Engineer

Orgvue

London, England, United Kingdom

On-site Full-time

Experience Level

Senior

About the job

Trusted by some of the world's largest enterprises and renowned management consulting firms, Orgvue enables organizations to visualize and proactively shape their futures. Headquartered in London, we also have offices in Philadelphia, The Hague, Toronto, and Sydney.

We are currently on the lookout for a Principal Site Reliability Engineer to join our team as a senior technical leader specializing in scaling and fortifying our AWS and Kubernetes-based infrastructure.

Role Overview

In this pivotal role, you will collaborate with product, platform, and operations teams to ensure our systems are reliable, observable, and resilient, even at scale. This position marries hands-on technical proficiency with strategic foresight, enabling us to cultivate a world-class reliability culture and a strong engineering framework for growth. We seek an individual with robust technical skills, exceptional communication abilities, and a passion for cross-team collaboration.

Key Responsibilities

Establish and uphold SLOs, SLIs, and error budgets across vital services
Design and execute a comprehensive cloud infrastructure and tooling strategy
Elevate SRE practices organization-wide
Implement effective observability (metrics, logs, and traces) using our observability tools
Lead the team in creating automated, self-healing systems
Manage and refine our incident response protocols, including on-call practices and a post-mortem culture
Mentor engineers throughout the organization on reliability best practices, operational readiness, and scalable infrastructure
Drive Infrastructure as Code (IaC) initiatives using Terraform, Kubernetes, CloudFormation, and GitOps methodologies
Work closely with security, DevOps, and software teams to guarantee compliance, scalability, and operational excellence
Assess and introduce tools, patterns, and practices that enhance the performance and reliability of our SaaS platform

Qualifications

Proven experience leading SRE transformations
Extensive hands-on expertise with Kubernetes (EKS preferred) in production settings
Strong proficiency with AWS core services (EC2, EKS, RDS, S3, ALB/NLB, IAM, CloudWatch, etc.)
Expertise in Infrastructure as Code utilizing tools such as Terraform, with familiarity in GitOps workflows
Solid background in observability: metrics, visualization, logging, and tracing
Underst...

About Orgvue

Orgvue is a leader in organizational design and planning software, helping businesses harness the power of data visualization and modeling to foster adaptability and high performance. Our platform is utilized by top enterprises and consulting firms, enabling them to effectively visualize and architect their future business landscapes.

Similar jobs

DevOps Engineer / Site Reliability Engineer

Freelancer.com

Full-time|On-site|London, England, United Kingdom

Join our dynamic Systems Engineering team as a pivotal and trusted DevOps Engineer / Site Reliability Engineer. Collaborating closely with software engineers, you will design and implement mission-critical services and systems. Your role will involve managing infrastructure and services at scale, employing a diverse array of cutting-edge technologies that support our high-traffic, real-time Freelancer.com marketplace as well as various other business products deployed on Amazon Web Services. Our technology stack includes Nginx, MySQL, Redis, ElasticSearch, RabbitMQ, Consul, Docker, and Kubernetes. We aim to build highly resilient, dynamically scaling, self-healing systems by automating and monitoring all processes using tools such as Terraform, Puppet, Prometheus, Grafana, Kibana, and Jenkins.

Dec 3, 2025

Principal Site Reliability Engineer

Orgvue

Full-time|On-site|London, England, United Kingdom

At Orgvue, we are at the forefront of organizational design and planning software, harnessing the transformative power of data visualization and modeling to help organizations become more adaptable and high-performing. Our platform empowers HR, finance, and business leaders to make swift, informed workforce decisions in an ever-evolving landscape.

Trusted by some of the world's largest enterprises and renowned management consulting firms, Orgvue enables organizations to visualize and proactively shape their futures. Headquartered in London, we also have offices in Philadelphia, The Hague, Toronto, and Sydney.

We are currently on the lookout for a Principal Site Reliability Engineer to join our team as a senior technical leader specializing in scaling and fortifying our AWS and Kubernetes-based infrastructure.

Role Overview

In this pivotal role, you will collaborate with product, platform, and operations teams to ensure our systems are reliable, observable, and resilient, even at scale. This position marries hands-on technical proficiency with strategic foresight, enabling us to cultivate a world-class reliability culture and a strong engineering framework for growth. We seek an individual with robust technical skills, exceptional communication abilities, and a passion for cross-team collaboration.

Key Responsibilities

Establish and uphold SLOs, SLIs, and error budgets across vital services
Design and execute a comprehensive cloud infrastructure and tooling strategy
Elevate SRE practices organization-wide
Implement effective observability metrics, logs, and traces using our observability tools
Lead the team in creating automated, self-healing systems
Manage and refine our incident response protocols, including on-call practices and a post-mortem culture
Mentor engineers throughout the organization on reliability best practices, operational readiness, and scalable infrastructure
Drive Infrastructure as Code (IaC) initiatives using Terraform, Kubernetes, CloudFormation, and GitOps methodologies
Work closely with security, DevOps, and software teams to guarantee compliance, scalability, and operational excellence
Assess and introduce tools, patterns, and practices that enhance the performance and reliability of our SaaS platform

Qualifications

Proven experience leading SRE transformations
Extensive hands-on expertise with Kubernetes (EKS preferred) in production settings
Strong proficiency with AWS core services (EC2, EKS, RDS, S3, ALB/NLB, IAM, CloudWatch, etc.)
Expertise in Infrastructure as Code utilizing tools such as Terraform, with familiarity in GitOps workflows
Solid background in observability: metrics, visualization, logging, and tracing
Underst...

Feb 6, 2026

Site Reliability Engineer at Wheely | London

Wheely

Full-time|On-site|London, England, United Kingdom

About Wheely

Wheely is revolutionizing premium transportation in major cities across Europe, the United States, and the Middle East. We seamlessly integrate cutting-edge technology with the artistry of five-star chauffeuring to provide an unparalleled experience that has earned the trust of over 100,000 active riders and 1,200 corporate clients.

As a profitable and rapidly growing scale-up, we have raised $43M and surpassed $100M in annual revenue. Following our recent launch in New York City, we are swiftly expanding across the US and EMEA. If you take pride in your craft and are eager to contribute to our next phase of growth, we invite you to connect with us.

Our infrastructure has been rebuilt almost from the ground up over the past few years, and we are now seeking to further expand our infrastructure team.

As a valued member of our team, you will focus on minimizing incidents related to availability, performance, and security. You will accelerate the delivery of new features to customers by building flexible, highly available, and secure infrastructure, ensuring a smooth journey for every customer.

Apr 9, 2026

Lead Site Reliability Engineer

getground

Full-time|Hybrid|London

Location: London, Waterloo (Hybrid, 4 days in-office - Wednesday is our designated work from home day, though you are welcome to join us in the office on Wednesdays if you prefer)

At getground, we are revolutionizing one of the world's most significant asset classes: property. With over £2 billion in assets on our platform and a community of more than 30,000 users across 70 countries, we are shaping the future of asset ownership and tackling wealth inequality.

Our innovative product streamlines property investing from start to finish, making real estate investment accessible to everyone.

Your Key Responsibilities:

Collaborating within cross-functional product teams to transition infrastructure and reliability initiatives from concept to live deployment.
Thriving in a dynamic environment where autonomy and ownership are fundamental to our operations.
Developing and sustaining a robust, scalable infrastructure within our GCP cloud ecosystem. Utilizing Kubernetes, Terraform, Cloudflare, and cutting-edge observability tools to ensure seamless platform functionality.
Working closely with engineering teams to formulate CI/CD pipelines, enhance deployment methodologies, and advocate for reliability as a core engineering principle.
Contributing to the establishment of SRE practices for a rapidly growing fintech platform. Mentoring fellow engineers as we expand our teams and influence.

Your Day-to-Day Activities:

Designing, implementing, and maintaining cloud infrastructure on Google Cloud Platform (GCP), ensuring it meets scalability, reliability, and security standards.
Taking ownership of our Kubernetes clusters and containerization strategy, including Docker image optimization, cluster management, and deployment orchestration.
Creating and optimizing Infrastructure as Code using Terraform, producing modular, testable, and well-documented configurations that adapt to our rapid growth.
Managing and enhancing our Cloudflare infrastructure, including Workers for edge computing, DNS, CDN, security policies, and performance optimization.
Implementing AI-powered product features in isolated and secure serverless environments.
Establishing comprehensive monitoring and observability with Prometheus and Grafana, defining SLIs/SLOs, and proactively identifying potential issues before they affect users.
Designing and maintaining CI/CD pipelines with appropriate quality gates, testing strategies, and deployment methodologies (blue-green, canary) to facilitate rapid deployments.

Feb 27, 2026

Site Reliability Engineer at Air Apps | London

Air Apps

Full-time|On-site|London Metropolitan Area

Air Apps builds technology to help people plan, work, and live better. Founded in Lisbon in 2018 and still family-led, the company has grown to San Francisco and now London, remaining self-funded and reaching over 100 million downloads. Every day, teams at Air Apps challenge assumptions and develop AI-powered products that make a difference for users worldwide. The company values creativity and aims to improve how resources are managed and lives are impacted.

Role Overview

The Site Reliability Engineer (SRE) will focus on keeping Air Apps systems reliable, available, and scalable. This role connects software development and operations, using automation, monitoring, and performance tuning to reduce downtime and strengthen system resilience. This is a fully onsite position based in the London Metropolitan Area. Air Apps will consider relocation support for the right candidate. The SRE will work closely with cross-functional teams in a busy office setting.

What You Will Do

Design and implement systems that are scalable, reliable, and fault-tolerant across cloud platforms.
Develop and maintain observability tools for monitoring, logging, and alerting (such as Prometheus, Grafana, Datadog, ELK).
Automate infrastructure provisioning, deployment, and incident response using Infrastructure as Code tools like Terraform or CloudFormation.
Improve system performance, scalability, and incident response processes to maximize uptime.
Work with development and DevOps teams to strengthen system designs for reliability.
Conduct root cause analysis and implement steps to prevent future failures.
Design and maintain strategies for load balancing, failover, and disaster recovery to ensure high availability.

Apr 17, 2026

Senior Site Reliability Engineer I

Axon Enterprise, Inc.

Full-time|Hybrid|London, England, United Kingdom

Become a Force for Good with Axon.

At Axon, our mission is to Protect Life. We are innovators tackling society’s most pressing safety and justice challenges through our integrated ecosystem of devices and cloud software. Like our products, we thrive on collaboration, connecting with transparency and empathy, and embracing diverse perspectives from our customers, communities, and each other.

Working at Axon is fast-paced, challenging, and purposeful. Here, you will take the initiative and make a tangible impact. Constantly develop your skills as you dedicate yourself to a mission that matters within a company that values your contributions.

Your Contribution

Join us in revolutionizing infrastructure automation for critical law enforcement systems. As a Senior Site Reliability Engineer, you will lead the creation of a cutting-edge infrastructure provisioning and automation platform. This platform allows engineering teams to independently access cloud infrastructure, ensuring safety and efficiency while minimizing manual interventions and operational risks.

Your role will involve hands-on contributions to build and enhance systems leveraging automation and intelligent agents to generate, validate, test, and manage infrastructure at scale. We seek an engineer with a strong software development background, proficiency in programming languages such as Go or Python, and extensive experience in designing and operating cloud platforms, with a drive to enhance developer productivity, reliability, and platform robustness.

Work Location:

This position is based in our London office and follows a hybrid work schedule. We emphasize in-person collaboration, requiring team members to be onsite from Tuesday to Friday, with the option to work remotely on Mondays, unless a workplace accommodation has been approved. We believe that connection fuels innovation, and our in-office culture is designed to promote meaningful teamwork, mentorship, and collective success.

Key Responsibilities

Develop robust, user-friendly foundational platforms and tools that enable engineering teams to provision infrastructure quickly, consistently, and securely across diverse cloud providers.
Write efficient, maintainable, and clear code in Go.
Promote and uphold Infrastructure as Code (IaC) best practices and coding standards.
Utilize strong problem-solving skills to troubleshoot issues in cloud-native distributed systems.
Influence and educate the engineering organization on adopting new and improved architectural patterns.
Provide comprehensive documentation to facilitate self-service by engineers.

Mar 27, 2026

Site Reliability Engineer at Trainline | London

Trainline

Full-time|On-site|London

About Us

At Trainline, we are passionate advocates for rail travel, dedicated to creating a greener and more sustainable future for travel. Our platform empowers millions of travelers to discover and book the most cost-effective tickets across various carriers, fares, and journey options through our highly-rated mobile app, website, and B2B partner channels.

Join Us in Creating Great Journeys

As Europe’s leading rail app, we attract over 125 million monthly visits and facilitate £5.9 billion in annual ticket sales. Collaborating with over 270 rail and coach companies across more than 40 countries, we envision a world where travel is simple, seamless, eco-friendly, and affordable. Today, as a FTSE 250 company, our diverse team of over 1,000 Trainliners from 50+ nationalities is based in London, Paris, Barcelona, Milan, Edinburgh, and Madrid. With a strong focus on growth in the UK and Europe, this is an exciting time to join us on this high-speed journey.

Introducing the Reliability & Operations Engineering Team

Trainline is at the forefront of technology, delivering world-class digital experiences for millions of customers. Our platform predominantly operates on AWS, utilizing cloud-native architecture, modern CI/CD pipelines, and robust DevOps and SRE methodologies.

The Reliability & Operations Engineering team (ReliabilityOps) integrates SRE, Incident Management, and Database Reliability to ensure our platform remains observable, reliable, scalable, and resilient. We work closely with product engineering teams to facilitate safe delivery, respond to incidents, and consistently enhance system reliability.

We are in search of a mid-level Site Reliability Engineer who will be instrumental in driving our initiatives forward. You will bring solid production experience, a growth mindset, and a readiness to challenge and be challenged, all while contributing to platform reliability and developing broader technical ownership with guidance from senior engineers.

Feb 25, 2026

Software Engineer - Site Reliability Engineering

Neo4j

Full-time|On-site|London

About Neo4j

Neo4j builds a graph intelligence platform used by 84 of the Fortune 100 and supported by the world’s largest graph community. The platform powers knowledge graphs for AI, delivers reliable graph capabilities across cloud environments, and integrates with a wide range of systems. Neo4j’s technology is designed for precision, accountability, and governance, helping organizations turn data into actionable insights for intelligent applications and AI systems. Engineered for seamless operation in any cloud, Neo4j supports dynamic, personalized, and autonomous AI solutions. The focus is on delivering swift results, contextual knowledge, and solutions that improve both customer and employee experiences.

Our Vision

Neo4j’s mission is to help the world understand data. As business and society become more interconnected, Neo4j’s technology enables organizations to find and understand relationships within their data. The company pioneered the graph database category and continues to lead in helping teams innovate and stay competitive.

About the Site Reliability Engineering Team

The Site Reliability Engineering (SRE) team supports Neo4j’s Database as a Service (DBaaS) product, Neo4j Aura. Aura operates globally across all major cloud providers, running hundreds of Kubernetes clusters and managing thousands of Neo4j instances in production. This team is redefining SRE within Neo4j Aura. Rather than simply reacting to incidents, the SRE group empowers teams to design for reliability from the start. The work centers on building tools, practices, and a culture that embed SRE principles into the foundation of Aura’s operations. Collaboration with product teams and a commitment to resilience and engineering excellence are central to the team’s approach.

What You Will Do

Automate for insight and scale: Build systems that enable fast, safe, and scalable troubleshooting across thousands of Neo4j instances. This includes developing internal tools that provide actionable insights.

Location

London

Apr 20, 2026

Site Reliability Engineer - Release Engineering

Kaluza

Full-time|£40K/yr - £60K/yr|Hybrid|Bristol, England, United Kingdom; Edinburgh, Scotland, United Kingdom; London, England, United Kingdom

Join our dynamic Release Engineering team at Kaluza as a Site Reliability Engineer. In this pivotal role, you will play a crucial part in enhancing our software development lifecycle by developing innovative engineering solutions that empower our software teams to deploy high-quality code efficiently. Your efforts will significantly boost engineering productivity through the optimization of testing, deployment, and release processes across all Kaluza engineering teams.

Feb 23, 2026

Staff Cloud Site Reliability Engineer

Wayve Technologies Ltd.

Full-time|On-site|London

Join Wayve Technologies as a Staff Cloud Site Reliability Engineer and play a pivotal role in shaping the future of autonomous driving technology. In this position, you will leverage your expertise to enhance the reliability, performance, and scalability of our cloud infrastructure. Collaborate with cross-functional teams to design robust systems that can handle high traffic and ensure seamless operation.

Mar 11, 2026

Site Reliability Engineer (SRE)

xAI

Full-time|On-site|London, UK

About xAI

At xAI, our mission is to develop advanced AI systems that can comprehend the universe and assist humanity in its quest for knowledge. Our dedicated team is small, highly motivated, and committed to engineering excellence, making it an ideal environment for individuals who thrive on challenges and curiosity. We foster a flat organizational structure where every employee plays a crucial role in driving our mission forward. We value initiative and excellence, rewarding those who consistently demonstrate strong work ethic and prioritization skills. Effective communication is essential, and all team members are expected to share their insights clearly and concisely.

About the Team

You will join a team responsible for the backend services that power our innovative products, including grok.com and our API. Our focus is on developing and maintaining highly scalable and reliable services capable of efficiently processing tens of thousands of queries per second, hosted across multiple Kubernetes clusters in both on-premises and cloud environments.

About the Role

We are looking for a candidate who meets the following criteria:

In-depth expertise in Kubernetes.
Proficiency with continuous deployment systems, including Buildkite and ArgoCD.
Extensive experience with monitoring tools such as Prometheus, Grafana, and PagerDuty.
Strong knowledge of infrastructure as code practices utilizing tools like Pulumi or Terraform.
Familiarity with systems programming languages such as Rust, C++, or Go.
Experience in traffic management and HTTP proxies, such as nginx and envoy.

Location

This position requires in-person attendance in London, UK. While we typically work from the office five days a week, we do provide flexibility for remote work when necessary. Candidates should be prepared to attend late meetings at least once a week to coordinate with our global teams.

Feb 4, 2026

Site Reliability Engineer - Core at Blockchain | London

Blockchain.com

Full-time|On-site|London

Blockchain.com is at the forefront of revolutionizing finance, providing millions globally with secure access to cryptocurrency. Established in 2011, we have gained the trust of over 90 million wallet holders and more than 40 million verified users, facilitating over $1 trillion in crypto transactions.

Blockchain is the world's premier software platform for digital assets. We operate the largest production blockchain platform globally, driven by our passion for coding and building an open, accessible, and equitable financial future, one innovative software solution at a time.

We are seeking a Site Reliability Engineer to join our Core team. This role involves advocating for infrastructure best practices across our organization, enabling us to securely scale a distributed financial platform that serves millions daily.

Our distributed financial platform addresses some of the most fascinating challenges in the crypto space for our vast customer base and is experiencing rapid growth. The Site Reliability Engineering (SRE) team at Blockchain merges software and systems engineering to create a platform that simplifies complexity, enhancing security, reliability, and swift product delivery.

The SRE organization at Blockchain is a dynamic environment focused on continual improvement. We foster a culture where team members can propose, discuss, design, and implement changes with a high degree of autonomy. We value abstract thinking to develop exceptionally effective tools and strive to eliminate toil.

As a member of the Core team, you will gain a comprehensive understanding of our products' infrastructure needs. Your role will include establishing and maintaining innovative engineering solutions to enhance our customers' experience through the development of essential tools. Importantly, you will also mentor and guide developer teams to deliver new features in a rapid, secure, and scalable manner.

Jan 29, 2026

Senior Site Reliability Engineer

Bumble Inc.

On-site|On-site|UK London

Join our dynamic team as a Senior Site Reliability Engineer at Bumble Inc., where your expertise in Linux and system-level operations will be pivotal in managing complex production environments. We seek a proactive engineer capable of independently troubleshooting incidents, leading post-incident recovery efforts, and implementing enhancements to boost overall system stability, performance, and observability. This role is ideal for hands-on SREs with a solid foundation in Linux infrastructure and third-party system operations, focusing on optimizing large-scale environments of over 5,000 hosts utilizing technologies such as Kafka, Redis, and Kubernetes. Please note, this position centers on operational excellence rather than application development, requiring deep technical acumen and advanced troubleshooting capabilities.

Nov 19, 2025

Senior Site Reliability Engineer at ClearScore | London

ClearScore Technology Limited

Full-time|On-site|London, England, United Kingdom

Senior Site Reliability Engineer

At ClearScore, we pride ourselves on being a unique workplace that has revolutionized the financial services industry over the past decade. With millions of users benefiting from our services, our success is driven by a collaborative culture that values hard work, adaptability, and mutual respect. This environment empowers our team members to realize their full potential and achieve outcomes that profoundly impact our users' lives. Our mission is to enhance the financial wellbeing of our users by placing their needs at the forefront of our innovations. Leveraging advanced technology, insightful analytics, and stunning design, we help our users gain financial confidence and make informed decisions. We believe in fostering an environment where our employees can thrive, which is why we prioritize output over hours logged. We embrace an inclusive culture that encourages personal wellness while supporting career growth and development.

Your Responsibilities:

Drive architectural advancements by participating in RFCs, architecture forums, and company-wide initiatives to enhance reliability, scalability, and efficiency.
Lead and advance ClearScore’s Kubernetes platform, focusing on designing, upgrading, and optimizing clusters at scale while shaping our Kubernetes usage across the organization.
Independently troubleshoot and resolve complex production issues, utilizing a profound understanding of distributed systems and containerization to prevent and mitigate incidents.
Design and contribute to Kubernetes controllers and automation tools that enhance our infrastructure and developer experience.
Improve our AWS estate, ensuring cost-effectiveness, security, and scalability while promoting best practices across teams.
Collaborate with developers to enhance service observability, implement strategic metrics and alerting, and create informative dashboards for intricate systems.
Construct and maintain CI/CD pipelines from inception for new use cases, manage migrations, and introduce new tooling as necessary.
Engage with open-source projects by providing fixes, feedback, or developing new tools aligned with our mission.
Mentor mid-level SREs and other engineers, fostering their growth in technical mastery and operational excellence.

Jan 19, 2026

Site Reliability Engineer at Xceptor | London

Xceptor

Full-time|On-site|London

ABOUT XCEPTOR

At Xceptor, data is the cornerstone of our operations. Our platform is intricately designed to facilitate comprehensive data manipulation. We harness data from diverse sources, ensuring it is curated, normalized, validated, repaired, and enriched. The data produced by Xceptor is reliable and trustworthy, meeting the high standards of our clients.

Recognized as a leader in the Financial Services sector, we closely collaborate with Business Users in Middle and Back-Office teams, empowering them to address their data challenges independently, removing the dependency on technology-led projects.

Our mission is to empower business users within financial institutions to create automated processes that yield trustworthy data.

Our values include:

Client Centricity
One Team
Impactful

Your Role:

As a Site Reliability Engineer (SRE), you will play a vital role in ensuring the reliability, performance, security, and operability of our services in production. You will collaborate with various teams across Xceptor, setting and evolving standards for SLOs/SLIs, observability, incident response, and operational controls. Your focus will be on building automation that reduces operational toil and enables safe and swift deployments in both cloud and on-premises environments.

Xceptor employs an AI-first product development lifecycle, where AI agents act as digital partners, enhancing our design, build, testing, documentation, deployment, and operational processes. We engineer reliability through established standards, automation, and measurable signals, while maintaining human oversight for intent, constraints, verification, and accountability.

What You’ll Be Doing:

As a Site Reliability Engineer, you will enhance the functionality and reliability of our services, ensuring they operate optimally in production.

Mar 22, 2026

Site Reliability Engineer at Orbital | London

Orbital

Full-time|On-site|London

Join us on our mission to revolutionize real estate transactions, making them smarter, faster, and more seamless. The real estate sector is the largest asset class globally, yet its legal processes are still slow, manual, and underfunded. Lawyers often find themselves sifting through complex documents and fragmented information, all while clients demand quicker and more transparent due diligence. That’s where we come in. Orbital Copilot is an AI-powered assistant crafted specifically for commercial real estate law. In collaboration with experienced real estate lawyers, we’ve designed a tool that accelerates intricate due diligence processes by up to 70%, ensuring legal-grade accuracy. We have recently secured a $60 million Series B funding round to expedite our expansion in the UK and US markets. Our trusted partners include leading law firms such as Goodwin and BCLP, who rely on us to eliminate mundane tasks, allowing their legal teams to focus on delivering exceptional client service and expediting deal closures. At Orbital, you will be part of a dynamic team that is redefining the future of real estate transactions, embracing speed, collaboration, and empowerment from day one.

Role Overview

We are seeking our second Site Reliability Engineer to enhance our platform and reliability initiatives. This position presents an exciting chance to design, implement, and oversee our infrastructure, CI/CD pipelines, and production operations from scratch. You will have the autonomy to shape our technological framework, establish best practices, and develop scalable systems that will lay the groundwork for our future engineering endeavors. If you thrive in a startup culture and enjoy the intersection of software engineering, operations, and infrastructure, we invite you to connect with us.

Your Responsibilities Include:

Infrastructure Management:

Design and maintain a robust, cloud-based infrastructure on Azure.
Develop and manage infrastructure as code (IaC) using tools like Terraform.
Take ownership of our system’s reliability and scalability, establishing a solid foundation for our engineering environment.

Container Deployment and Orchestration:

Manage deployment and orchestration of containerized applications, ensuring efficient performance and reliability.

Nov 25, 2025

Apply

Senior Site Reliability Engineer - Edge

On Running

Full-time|On-site|London

Application Instructions: Please provide your full legal name in the fields below.

Mar 17, 2026

Apply

Senior Site Reliability Engineer I

Axon Enterprise, Inc.

Full-time|Hybrid|London, England, United Kingdom

About Axon

Axon’s mission is to safeguard life. The company develops devices and cloud-based software focused on public safety and justice. Teams at Axon work together to address complex challenges, valuing transparency, empathy, and a range of perspectives from users, communities, and colleagues.

Role Overview: Senior Site Reliability Engineer I

This position sits within the Site Reliability Engineering (SRE) team. The main focus: tackle real-time challenges across Axon’s mission-critical, cloud-native services. The work centers on maintaining the reliability and quality customers expect. Collaboration is key, both within the SRE group and across the wider engineering organization, to help product teams deliver new features consistently.

Work Location and Flexibility

This role is based in London, England, United Kingdom. Axon uses a hybrid working model. Team members are expected onsite from Tuesday to Friday, with remote work on Mondays (unless a workplace accommodation is granted). The company emphasizes in-person collaboration to support teamwork, mentorship, and shared success.

Apr 17, 2026

Apply

Observability Platform Engineer (Site Reliability Focus)

YouLend

Full-time|On-site|London, England, United Kingdom

About YouLend

YouLend is an innovative and swiftly expanding FinTech firm, recognized as the leading embedded financing platform for top-tier e-commerce platforms, technology companies, and Payment Service Providers globally. Our advanced software platform empowers partners to enhance their value propositions by offering customizable financing solutions under their own brand, allowing them to serve their merchants without any capital risks.

Backed by EQT, a prominent Private Equity firm, our company has experienced remarkable growth, boasting a +100% year-over-year increase since 2020. Our headquarters are in London, UK, with a presence in various European countries and the United States, supporting esteemed partners such as eBay, Amazon, Just Eat, Shopify, and Stripe.

Role Overview

As we establish a premier Observability function, we seek a passionate individual dedicated to uptime, insightful alerts, and sophisticated dashboards. If you have experience with on-call duties, managing alert noise, or debugging elusive issues across microservices during off-hours, we want you on our team!

This position transcends a typical “Platform Engineer” role; you will be intensely focused on observability, system reliability, and empowering developers. You will collaborate closely with teams to understand not just when failures occur, but also why.

Key Responsibilities:

Designing and scaling on-call systems that engineers will appreciate being a part of.
Enhancing Datadog monitoring, alerting, dashboards, and log pipelines for Kubernetes environments.
Defining and managing SLOs, SLIs, and error budgets, ensuring teams adhere to them.
Developing scorecards and software catalogs so engineers can easily track system health and ownership.
Mentoring and enabling development teams to take charge of their own observability, alerts, and incident responses.
Implementing chaos engineering practices to intentionally identify weaknesses.
Fostering a culture of reliability through incident reviews, shared learnings, and transparency.

Ideal Candidate Qualifications:

Proven production experience with observability tools, particularly Datadog, in cloud-native settings.
Experience establishing monitoring and alerting across Kubernetes services.
Demonstrated ability in building or scaling on-call systems within startup or large-scale environments.
Expertise in minimizing alert fatigue and a passion for effective monitoring.

Feb 13, 2026

Apply

Senior Site Reliability Engineer for Vehicle Software

Wayve

Full-time|On-site|London

At Wayve, we are dedicated to fostering a diverse, fair, and respectful workplace culture that values the unique skills and perspectives of every individual, irrespective of sex, race, religion, belief, ethnic or national origin, disability, age, citizenship, marital status, domestic partnership, sexual orientation, gender identity, veteran status, pregnancy or related conditions (including breastfeeding), or any other basis protected by applicable law.

About Us

Established in 2017, Wayve is at the forefront of developing Embodied AI technology. Our cutting-edge AI software and foundational models empower vehicles to perceive, interpret, and navigate complex environments, significantly improving the usability and safety of automated driving systems.

Our mission is to create autonomous solutions that drive the world forward. Our intelligent, mapless, and hardware-agnostic AI products cater to automakers, facilitating the shift from assisted to fully automated driving.

We thrive on the challenges posed by a fast-paced environment, embracing uncertainty and tackling complex problems to unlock innovative solutions. We hold ourselves to high standards while remaining humble in our pursuit of excellence, constantly evolving to pave the way for a smarter, safer future.

Your contributions at Wayve will truly make a difference. We celebrate diversity, embrace new ideas, and cultivate an inclusive work environment where we support each other to make an impact.

Join us at Wayve and let your career take flight!

The Role

As a Senior Site Reliability Engineer in Vehicle Software, you will ensure the reliability, observability, and safety of Wayve’s autonomous driving fleet while operating on public roads. You will collaborate at the intersection of software, hardware, and operations, transforming real-world incidents and performance bottlenecks into sustainable engineering enhancements. This role provides a clear connection between your work and the delivery of safer deployments, accelerated iterations, and expanded fleet capabilities.

Feb 24, 2026
