Site Reliability Operations Analyst at Palantir | London, UK
About Palantir Technologies Inc.
Palantir Technologies is a leader in software solutions that empower organizations to make informed decisions through data. Our tools help tackle some of the most pressing challenges faced by businesses and governments today.
Similar jobs
Palantir Technologies Inc.
Join a Transformative Organization
At Palantir, we create the premier software solution for data-driven decision-making and operational efficiency. Our innovative platforms enable our partners to make significant impacts, from developing life-saving medications to predicting supply chain issues and even reuniting families with missing children.
About the Position
As a Site Reliability Operations Analyst, you will be pivotal in ensuring the smooth deployment of Palantir's solutions. Your role encompasses crafting, implementing, and executing processes designed to optimize workflows, eliminate bottlenecks, and enhance project tracking. You will proactively identify and address customer needs, allowing our engineers to devote their expertise to complex technical challenges. This role demands a blend of project management prowess, process improvement capabilities, and diligent execution. You are a problem-solver at heart and are always open to embracing diverse ideas to drive efficiency and effectiveness.
Palantir Technologies
Join a Transformative Organization
At Palantir, we are at the forefront of developing cutting-edge software that facilitates data-driven decision-making and operational excellence. Our innovative platforms empower our partners to address critical challenges, from discovering life-saving medications and predicting supply chain interruptions to reuniting families with missing children.
Position Overview
The Finance Team plays a vital role in supporting and empowering our team members, known as Palantirians, through seamless operations and inventive problem-solving. As an International Payroll Analyst, you will take charge of payroll administration while simultaneously identifying inefficiencies, proposing solutions, and driving projects to fruition. This role requires a proactive approach to designing, implementing, and enhancing payroll workflows. If you are passionate about process improvement and are looking to challenge the status quo, we invite you to apply for this exciting opportunity.
Palantir Technologies Inc.
Join Palantir Technologies as an Information Security Engineer in London, where you will play a critical role in safeguarding our cutting-edge technology and sensitive data. Your expertise will help us develop and implement robust security strategies, ensuring that our platforms remain secure against evolving threats.
Palantir Technologies
Join a Pioneering Company
At Palantir Technologies, we are at the forefront of creating world-class software that empowers organizations to make data-informed decisions. Our innovative platforms enable partners to tackle critical challenges, from developing groundbreaking medical treatments to optimizing supply chains and reuniting families.
Position Overview
The Finance Team at Palantir is dedicated to equipping our team members with the tools and insights needed for effective operations and proactive problem-solving. As a vital member of the Revenue Team, you will play a crucial role in analyzing and auditing complex revenue transactions while continuously improving our revenue recognition policies. You will manage the entire quote-to-cash cycle for your designated customer portfolio, collaborating closely with Deal and Finance teams to ensure the smooth and accurate progression of accounting and operational facets of your deals. Your contribution is essential to our global revenue close process. You are a collaborative team member who possesses a strong desire for learning and self-improvement, along with meticulous attention to detail. The role also demands robust operational, organizational, and communication skills.
Palantir Technologies Inc.
Role overview
The Forward Deployed Reliability Engineer at Palantir Technologies in London plays a key role in supporting the reliability and performance of Palantir's software as it becomes part of client operations. This position centers on ensuring that solutions remain stable and effective after deployment.
What you will do
Partner with clients to help integrate Palantir's technology into their daily workflows.
Troubleshoot and resolve complex technical challenges to keep systems stable.
Work to optimize performance and apply established reliability engineering practices.
Collaborate with teams across disciplines to enhance system functionality and deliver results for clients.
About Wheely
Wheely is revolutionizing premium transportation in major cities across Europe, the United States, and the Middle East. We seamlessly integrate cutting-edge technology with the artistry of five-star chauffeuring to provide an unparalleled experience that has earned the trust of over 100,000 active riders and 1,200 corporate clients.
As a profitable and rapidly growing scale-up, we have raised $43M and surpassed $100M in annual revenue. Following our recent launch in New York City, we are swiftly expanding across the US and EMEA. If you take pride in your craft and are eager to contribute to our next phase of growth, we invite you to connect with us.
Our infrastructure has been rebuilt almost from the ground up over the past few years, and we are now seeking to further expand our infrastructure team.
As a valued member of our team, you will focus on minimizing incidents related to availability, performance, and security. You will accelerate the delivery of new features to customers by building flexible, highly available, and secure infrastructure, ensuring a smooth journey for every customer.
Kaplan Professional
Business Analyst
Reference Number: JR254146
Location: Hybrid working model - a blend of home-based and office work at Borough High Street (60/40)
Working Pattern: Full Time, 37.5 hours per week
Contract Type: Permanent
Number of Roles: 2
Grade: IT5A
Salary: £51,000 - £54,800 per annum
We are seeking a motivated and skilled Business Analyst to join our dynamic team.
Role Summary
In this role, you will be pivotal in managing customer expectations and demands. You will collaborate closely with various IT functions and business units to assess their needs, ensuring they receive top-notch service and facilitating the prioritization of project deliveries.
What You Will Bring to This Role
If you are passionate about driving change, influencing decisions, and thriving in a fast-paced, collaborative atmosphere, this position is perfect for you.
Proven experience in delivering multiple projects within a matrix systems environment.
Proficient in organizing and leading workshops, as well as negotiating with diverse stakeholders, up to 10 at a time.
Adept at identifying genuine business needs, exploring solution alternatives, and articulating proposals that yield business value. Confident in presenting ideas and recommendations.
Experience in developing and maintaining business process maps.
Outstanding communication skills – capable of translating technical concepts into business-friendly language, both in written and verbal formats.
Results-oriented with a strong focus on managing the delivery timeline of business solutions.
A collaborative team player, skilled in working effectively with others.
Self-motivated with a strong emphasis on customer and end-user delivery.
Knowledge of data governance frameworks and protocols.
Familiarity with ITIL processes and standards.
About Us
Kaplan Professional stands as a premier provider of apprenticeships, accountancy, tax, finance, and banking courses, alongside professional assessments.
For nearly 80 years, we have played a significant role in shaping the careers of finance professionals.
ABOUT XCEPTOR
At Xceptor, data is the cornerstone of our operations. Our platform is intricately designed to facilitate comprehensive data manipulation. We harness data from diverse sources, ensuring it is curated, normalized, validated, repaired, and enriched. The data produced by Xceptor is reliable and trustworthy, meeting the high standards of our clients.
Recognized as a leader in the Financial Services sector, we closely collaborate with Business Users in Middle and Back-Office teams, empowering them to address their data challenges independently, removing the dependency on technology-led projects.
Our mission is to empower business users within financial institutions to create automated processes that yield trustworthy data.
Our values include:
Client Centricity
One Team
Impactful
Your Role:
As a Site Reliability Engineer (SRE), you will play a vital role in ensuring the reliability, performance, security, and operability of our services in production. You will collaborate with various teams across Xceptor, setting and evolving standards for SLOs/SLIs, observability, incident response, and operational controls. Your focus will be on building automation that reduces operational toil and enables safe and swift deployments in both cloud and on-premises environments.
Xceptor employs an AI-first product development lifecycle, where AI agents act as digital partners, enhancing our design, build, testing, documentation, deployment, and operational processes. We engineer reliability through established standards, automation, and measurable signals, while maintaining human oversight for intent, constraints, verification, and accountability.
What You’ll Be Doing:
As a Site Reliability Engineer, you will enhance the functionality and reliability of our services, ensuring they operate optimally in production.
Blockchain.com
Blockchain.com is at the forefront of revolutionizing finance, providing millions globally with secure access to cryptocurrency. Established in 2011, we have gained the trust of over 90 million wallet holders and more than 40 million verified users, facilitating over $1 trillion in crypto transactions.
Blockchain is the world's premier software platform for digital assets. We operate the largest production blockchain platform globally, driven by our passion for coding and building an open, accessible, and equitable financial future, one innovative software solution at a time.
We are seeking a Site Reliability Engineer to join our Core team. This role involves advocating for infrastructure best practices across our organization, enabling us to securely scale a distributed financial platform that serves millions daily.
Our distributed financial platform addresses some of the most fascinating challenges in the crypto space for our vast customer base and is experiencing rapid growth. The Site Reliability Engineering (SRE) team at Blockchain merges software and systems engineering to create a platform that simplifies complexity, enhancing security, reliability, and swift product delivery.
The SRE organization at Blockchain is a dynamic environment focused on continual improvement. We foster a culture where team members can propose, discuss, design, and implement changes with a high degree of autonomy. We value abstract thinking to develop exceptionally effective tools and strive to eliminate toil.
As a member of the Core team, you will gain a comprehensive understanding of our products' infrastructure needs. Your role will include establishing and maintaining innovative engineering solutions to enhance our customers' experience through the development of essential tools. Importantly, you will also mentor and guide developer teams to deliver new features in a rapid, secure, and scalable manner.
Join us on our mission to revolutionize real estate transactions, making them smarter, faster, and more seamless. The real estate sector is the largest asset class globally, yet its legal processes are still slow, manual, and underfunded. Lawyers often find themselves sifting through complex documents and fragmented information, all while clients demand quicker and more transparent due diligence. That’s where we come in. Orbital Copilot is an AI-powered assistant crafted specifically for commercial real estate law. In collaboration with experienced real estate lawyers, we’ve designed a tool that accelerates intricate due diligence processes by up to 70%, ensuring legal-grade accuracy. We have recently secured a $60 million Series B funding round to expedite our expansion in the UK and US markets. Our trusted partners include leading law firms such as Goodwin and BCLP, who rely on us to eliminate mundane tasks, allowing their legal teams to focus on delivering exceptional client service and expediting deal closures. At Orbital, you will be part of a dynamic team that is redefining the future of real estate transactions, embracing speed, collaboration, and empowerment from day one.
Role Overview
We are seeking our second Site Reliability Engineer to enhance our platform and reliability initiatives. This position presents an exciting chance to design, implement, and oversee our infrastructure, CI/CD pipelines, and production operations from scratch. You will have the autonomy to shape our technological framework, establish best practices, and develop scalable systems that will lay the groundwork for our future engineering endeavors. If you thrive in a startup culture and enjoy the intersection of software engineering, operations, and infrastructure, we invite you to connect with us.
Your Responsibilities Include:
Infrastructure Management:
Design and maintain a robust, cloud-based infrastructure on Azure.
Develop and manage infrastructure as code (IaC) using tools like Terraform.
Take ownership of our system’s reliability and scalability, establishing a solid foundation for our engineering environment.
Container Deployment and Orchestration:
Manage deployment and orchestration of containerized applications, ensuring efficient performance and reliability.
Palantir Technologies Inc.
Join the prestigious UK Meritocracy Fellowship at Palantir Technologies, where we empower the next generation of leaders through innovation and collaboration. As a fellow, you will engage in transformative projects, work alongside industry experts, and contribute to building cutting-edge solutions that drive meaningful change across various sectors.
About Us
At Trainline, we are passionate advocates for rail travel, dedicated to creating a greener and more sustainable future for travel. Our platform empowers millions of travelers to discover and book the most cost-effective tickets across various carriers, fares, and journey options through our highly-rated mobile app, website, and B2B partner channels.
Join Us in Creating Great Journeys
As Europe’s leading rail app, we attract over 125 million monthly visits and facilitate £5.9 billion in annual ticket sales. Collaborating with over 270 rail and coach companies across more than 40 countries, we envision a world where travel is simple, seamless, eco-friendly, and affordable. Today, as a FTSE 250 company, our diverse team of over 1,000 Trainliners from 50+ nationalities is based in London, Paris, Barcelona, Milan, Edinburgh, and Madrid. With a strong focus on growth in the UK and Europe, this is an exciting time to join us on this high-speed journey.
Introducing the Reliability & Operations Engineering Team
Trainline is at the forefront of technology, delivering world-class digital experiences for millions of customers. Our platform predominantly operates on AWS, utilizing cloud-native architecture, modern CI/CD pipelines, and robust DevOps and SRE methodologies.
The Reliability & Operations Engineering team (ReliabilityOps) integrates SRE, Incident Management, and Database Reliability to ensure our platform remains observable, reliable, scalable, and resilient. We work closely with product engineering teams to facilitate safe delivery, respond to incidents, and consistently enhance system reliability.
We are in search of a mid-level Site Reliability Engineer who will be instrumental in driving our initiatives forward. You will bring solid production experience, a growth mindset, and a readiness to challenge and be challenged, all while contributing to platform reliability and developing broader technical ownership with guidance from senior engineers.
ClearScore Technology Limited
Senior Site Reliability Engineer
At ClearScore, we pride ourselves on being a unique workplace that has revolutionized the financial services industry over the past decade. With millions of users benefiting from our services, our success is driven by a collaborative culture that values hard work, adaptability, and mutual respect. This environment empowers our team members to realize their full potential and achieve outcomes that profoundly impact our users' lives. Our mission is to enhance the financial wellbeing of our users by placing their needs at the forefront of our innovations. Leveraging advanced technology, insightful analytics, and stunning design, we help our users gain financial confidence and make informed decisions. We believe in fostering an environment where our employees can thrive, which is why we prioritize output over hours logged. We embrace an inclusive culture that encourages personal wellness while supporting career growth and development.
Your Responsibilities:
Drive architectural advancements by participating in RFCs, architecture forums, and company-wide initiatives to enhance reliability, scalability, and efficiency.
Lead and advance ClearScore’s Kubernetes platform, focusing on designing, upgrading, and optimizing clusters at scale while shaping our Kubernetes usage across the organization.
Independently troubleshoot and resolve complex production issues, utilizing a profound understanding of distributed systems and containerization to prevent and mitigate incidents.
Design and contribute to Kubernetes controllers and automation tools that enhance our infrastructure and developer experience.
Improve our AWS estate, ensuring cost-effectiveness, security, and scalability while promoting best practices across teams.
Collaborate with developers to enhance service observability, implement strategic metrics and alerting, and create informative dashboards for intricate systems.
Construct and maintain CI/CD pipelines from inception for new use cases, manage migrations, and introduce new tooling as necessary.
Engage with open-source projects by providing fixes, feedback, or developing new tools aligned with our mission.
Mentor mid-level SREs and other engineers, fostering their growth in technical mastery and operational excellence.
Air Apps builds technology to help people plan, work, and live better. Founded in Lisbon in 2018 and still family-led, the company has grown to San Francisco and now London, remaining self-funded and reaching over 100 million downloads. Every day, teams at Air Apps challenge assumptions and develop AI-powered products that make a difference for users worldwide. The company values creativity and aims to improve how resources are managed and lives are impacted.
Role Overview
The Site Reliability Engineer (SRE) will focus on keeping Air Apps systems reliable, available, and scalable. This role connects software development and operations, using automation, monitoring, and performance tuning to reduce downtime and strengthen system resilience. This is a fully onsite position based in the London Metropolitan Area. Air Apps will consider relocation support for the right candidate. The SRE will work closely with cross-functional teams in a busy office setting.
What You Will Do
Design and implement systems that are scalable, reliable, and fault-tolerant across cloud platforms.
Develop and maintain observability tools for monitoring, logging, and alerting (such as Prometheus, Grafana, Datadog, ELK).
Automate infrastructure provisioning, deployment, and incident response using Infrastructure as Code tools like Terraform or CloudFormation.
Improve system performance, scalability, and incident response processes to maximize uptime.
Work with development and DevOps teams to strengthen system designs for reliability.
Conduct root cause analysis and implement steps to prevent future failures.
Design and maintain strategies for load balancing, failover, and disaster recovery to ensure high availability.
Palantir Technologies Inc.
Join Palantir Technologies as a Forward Deployed Infrastructure Engineer, where you will play a critical role in supporting the UK Government. In this position, you will leverage your technical expertise to implement innovative infrastructure solutions that address complex challenges. Collaborate with cross-functional teams to deliver high-impact projects and enhance national security capabilities.
Legal & General
Join Legal & General as a Graduate Site Reliability Engineer in London, where you will play a vital role in enhancing our technology operations. This entry-level position offers a unique opportunity to work alongside a dynamic tech team, ensuring our systems operate smoothly and efficiently.
As a Graduate in this role, you will be involved in monitoring system performance, troubleshooting issues, and collaborating with various departments to implement automation solutions. Your contributions will help improve our infrastructure and deliver exceptional service to our clients.
About Us
At Heidi Health, we believe that healthcare deserves a more harmonious approach, one that ensures care remains continuous and deeply personalized. Our innovative AI Care Partner collaborates with healthcare providers to enhance the care experience for patients and clinicians alike.
Our diverse team includes doctors, engineers, designers, researchers, and creatives, all dedicated to creating tools that empower clinicians to focus on what matters most: their patients.
In just 18 months, we've reclaimed over 18 million hours for healthcare professionals, facilitating 73 million patient visits across 116 countries. Currently, our technology supports more than two million patient visits weekly worldwide.
With nearly $100 million in funding, we are expanding our presence in the US, UK, Canada, and Europe, partnering with prestigious health systems such as the NHS, Beth Israel Lahey Health, and Monash Health.
The Opportunity
Join our core Platform/SRE team, where you will take charge of production reliability. This role involves active incident response, on-call duties, system reliability, and daily operational oversight of Heidi’s platform.
We welcome applications from mid-level SREs eager to embrace greater responsibility, as well as senior SREs who relish hands-on operational roles. This position emphasizes operational involvement and aims to maintain the health of real systems in production.
Your Responsibilities
Engage in on-call and incident response: Address production incidents, assist in service restoration, and facilitate clear communication during incidents, escalating to leading incidents over time.
Enhance operational reliability: Identify recurring issues and reliability risks, driving improvements through better alerting, automation, system enhancements, and process refinements.
Manage production environment components: Operate and enhance Kubernetes clusters, cloud infrastructure, and core platform services, increasing responsibility as expertise grows.
Boost observability: Refine dashboards, alerts, logs, and traces to enable earlier detection and faster diagnosis of issues, concentrating on actionable insights.
Minimize operational toil: Automate repetitive tasks, streamline runbooks, and enhance tooling to facilitate smoother and safer on-call and daily operations.
About Neo4j
Neo4j builds a graph intelligence platform used by 84 of the Fortune 100 and supported by the world’s largest graph community. The platform powers knowledge graphs for AI, delivers reliable graph capabilities across cloud environments, and integrates with a wide range of systems. Neo4j’s technology is designed for precision, accountability, and governance, helping organizations turn data into actionable insights for intelligent applications and AI systems. Engineered for seamless operation in any cloud, Neo4j supports dynamic, personalized, and autonomous AI solutions. The focus is on delivering swift results, contextual knowledge, and solutions that improve both customer and employee experiences.
Our Vision
Neo4j’s mission is to help the world understand data. As business and society become more interconnected, Neo4j’s technology enables organizations to find and understand relationships within their data. The company pioneered the graph database category and continues to lead in helping teams innovate and stay competitive.
About the Site Reliability Engineering Team
The Site Reliability Engineering (SRE) team supports Neo4j’s Database as a Service (DBaaS) product, Neo4j Aura. Aura operates globally across all major cloud providers, running hundreds of Kubernetes clusters and managing thousands of Neo4j instances in production. This team is redefining SRE within Neo4j Aura. Rather than simply reacting to incidents, the SRE group empowers teams to design for reliability from the start. The work centers on building tools, practices, and a culture that embed SRE principles into the foundation of Aura’s operations. Collaboration with product teams and a commitment to resilience and engineering excellence are central to the team’s approach.
What You Will Do
Automate for insight and scale: Build systems that enable fast, safe, and scalable troubleshooting across thousands of Neo4j instances. This includes developing internal tools that provide actionable insights.
Location: London
About xAI
At xAI, our mission is to develop advanced AI systems that can comprehend the universe and assist humanity in its quest for knowledge. Our dedicated team is small, highly motivated, and committed to engineering excellence, making it an ideal environment for individuals who thrive on challenges and curiosity. We foster a flat organizational structure where every employee plays a crucial role in driving our mission forward. We value initiative and excellence, rewarding those who consistently demonstrate strong work ethic and prioritization skills. Effective communication is essential, and all team members are expected to share their insights clearly and concisely.
About the Team
You will join a team responsible for the backend services that power our innovative products, including grok.com and our API. Our focus is on developing and maintaining highly scalable and reliable services capable of efficiently processing tens of thousands of queries per second, hosted across multiple Kubernetes clusters in both on-premises and cloud environments.
About the Role
We are looking for a candidate who meets the following criteria:
In-depth expertise in Kubernetes.
Proficiency with continuous deployment systems, including Buildkite and ArgoCD.
Extensive experience with monitoring tools such as Prometheus, Grafana, and PagerDuty.
Strong knowledge of infrastructure as code practices utilizing tools like Pulumi or Terraform.
Familiarity with systems programming languages such as Rust, C++, or Go.
Experience in traffic management and HTTP proxies, such as nginx and envoy.
Location
This position requires in-person attendance in London, UK. While we typically work from the office five days a week, we do provide flexibility for remote work when necessary. Candidates should be prepared to attend late meetings at least once a week to coordinate with our global teams.
Join our dynamic Release Engineering team at Kaluza as a Site Reliability Engineer. In this pivotal role, you will play a crucial part in enhancing our software development lifecycle by developing innovative engineering solutions that empower our software teams to deploy high-quality code efficiently. Your efforts will significantly boost engineering productivity through the optimization of testing, deployment, and release processes across all Kaluza engineering teams.
Sign in to browse more jobs
Create account — see all 10,199 results
Palantir Technologies Inc.
Join a Transformative OrganizationAt Palantir, we create the premier software solution for data-driven decision-making and operational efficiency. Our innovative platforms enable our partners to make significant impacts, from developing life-saving medications to predicting supply chain issues and even reuniting families with missing children.About the Position As a Site Reliability Operations Analyst, you will be pivotal in ensuring the smooth deployment of Palantir's solutions. Your role encompasses crafting, implementing, and executing processes designed to optimize workflows, eliminate bottlenecks, and enhance project tracking. You will proactively identify and address customer needs, allowing our engineers to devote their expertise to complex technical challenges. This role demands a blend of project management prowess, process improvement capabilities, and diligent execution. You are a problem-solver at heart and are always open to embracing diverse ideas to drive efficiency and effectiveness.
Palantir Technologies
Join a Transformative OrganizationAt Palantir, we are at the forefront of developing cutting-edge software that facilitates data-driven decision-making and operational excellence. Our innovative platforms empower our partners to address critical challenges, from discovering life-saving medications and predicting supply chain interruptions to reuniting families with missing children.Position OverviewThe Finance Team plays a vital role in supporting and empowering our team members, known as Palantirians, through seamless operations and inventive problem-solving. As an International Payroll Analyst, you will take charge of payroll administration while simultaneously identifying inefficiencies, proposing solutions, and driving projects to fruition. This role requires a proactive approach to designing, implementing, and enhancing payroll workflows. If you are passionate about process improvement and are looking to challenge the status quo, we invite you to apply for this exciting opportunity.
Palantir Technologies Inc.
Join Palantir Technologies as an Information Security Engineer in London, where you will play a critical role in safeguarding our cutting-edge technology and sensitive data. Your expertise will help us develop and implement robust security strategies, ensuring that our platforms remain secure against evolving threats.
Palantir Technologies
Join a Pioneering CompanyAt Palantir Technologies, we are at the forefront of creating world-class software that empowers organizations to make data-informed decisions. Our innovative platforms enable partners to tackle critical challenges, from developing groundbreaking medical treatments to optimizing supply chains and reuniting families.Position Overview The Finance Team at Palantir is dedicated to equipping our team members with the tools and insights needed for effective operations and proactive problem-solving. As a vital member of the Revenue Team, you will play a crucial role in analyzing and auditing complex revenue transactions while continuously improving our revenue recognition policies. You will manage the entire quote-to-cash cycle for your designated customer portfolio, collaborating closely with Deal and Finance teams to ensure the smooth and accurate progression of accounting and operational facets of your deals. Your contribution is essential to our global revenue close process. You are a collaborative team member who possesses a strong desire for learning and self-improvement, along with meticulous attention to detail. The role also demands robust operational, organizational, and communication skills.
Palantir Technologies Inc.
Role overview The Forward Deployed Reliability Engineer at Palantir Technologies in London plays a key role in supporting the reliability and performance of Palantir's software as it becomes part of client operations. This position centers on ensuring that solutions remain stable and effective after deployment. What you will do Partner with clients to help integrate Palantir's technology into their daily workflows. Troubleshoot and resolve complex technical challenges to keep systems stable. Work to optimize performance and apply established reliability engineering practices. Collaborate with teams across disciplines to enhance system functionality and deliver results for clients.
About Wheely Wheely is revolutionizing premium transportation in major cities across Europe, the United States, and the Middle East. We seamlessly integrate cutting-edge technology with the artistry of five-star chauffeuring to provide an unparalleled experience that has earned the trust of over 100,000 active riders and 1,200 corporate clients. As a profitable and rapidly growing scale-up, we have raised $43M and surpassed $100M in annual revenue. Following our recent launch in New York City, we are swiftly expanding across the US and EMEA. If you take pride in your craft and are eager to contribute to our next phase of growth, we invite you to connect with us. Our infrastructure has been rebuilt almost from the ground up over the past few years, and we are now seeking to further expand our infrastructure team. As a valued member of our team, you will focus on minimizing incidents related to availability, performance, and security. You will accelerate the delivery of new features to customers by building flexible, highly available, and secure infrastructure, ensuring a smooth journey for every customer.
Kaplan Professional
Business Analyst
Reference Number: JR254146
Location: Hybrid working model - a blend of home-based and office work at Borough High Street (60/40)
Working Pattern: Full Time, 37.5 hours per week
Contract Type: Permanent
Number of Roles: 2
Grade: IT5A
Salary: £51,000 - £54,800 per annum
We are seeking a motivated and skilled Business Analyst to join our dynamic team. Role Summary In this role, you will be pivotal in managing customer expectations and demands. You will collaborate closely with various IT functions and business units to assess their needs, ensuring they receive top-notch service and facilitating the prioritization of project deliveries. What You Will Bring to This Role If you are passionate about driving change, influencing decisions, and thriving in a fast-paced, collaborative atmosphere, this position is perfect for you. Proven experience in delivering multiple projects within a matrix systems environment. Proficient in organizing and leading workshops, as well as negotiating with diverse stakeholders, up to 10 at a time. Adept at identifying genuine business needs, exploring solution alternatives, and articulating proposals that yield business value. Confident in presenting ideas and recommendations. Experience in developing and maintaining business process maps. Outstanding communication skills – capable of translating technical concepts into business-friendly language, both in written and verbal formats. Results-oriented with a strong focus on managing the delivery timeline of business solutions. A collaborative team player, skilled in working effectively with others. Self-motivated with a strong emphasis on customer and end-user delivery. Knowledge of data governance frameworks and protocols. Familiarity with ITIL processes and standards. About Us Kaplan Professional stands as a premier provider of apprenticeships, accountancy, tax, finance, and banking courses, alongside professional assessments.
For nearly 80 years, we have played a significant role in shaping the careers of finance professionals.
ABOUT XCEPTOR At Xceptor, data is the cornerstone of our operations. Our platform is intricately designed to facilitate comprehensive data manipulation. We harness data from diverse sources, ensuring it is curated, normalized, validated, repaired, and enriched. The data produced by Xceptor is reliable and trustworthy, meeting the high standards of our clients. Recognized as a leader in the Financial Services sector, we closely collaborate with Business Users in Middle and Back-Office teams, empowering them to address their data challenges independently, removing the dependency on technology-led projects. Our mission is to empower business users within financial institutions to create automated processes that yield trustworthy data. Our values include: Client Centricity, One Team, Impactful. Your Role: As a Site Reliability Engineer (SRE), you will play a vital role in ensuring the reliability, performance, security, and operability of our services in production. You will collaborate with various teams across Xceptor, setting and evolving standards for SLOs/SLIs, observability, incident response, and operational controls. Your focus will be on building automation that reduces operational toil and enables safe and swift deployments in both cloud and on-premises environments. Xceptor employs an AI-first product development lifecycle, where AI agents act as digital partners, enhancing our design, build, testing, documentation, deployment, and operational processes. We engineer reliability through established standards, automation, and measurable signals, while maintaining human oversight for intent, constraints, verification, and accountability. What You’ll Be Doing: As a Site Reliability Engineer, you will enhance the functionality and reliability of our services, ensuring they operate optimally in production.
Blockchain.com
Blockchain.com is at the forefront of revolutionizing finance, providing millions globally with secure access to cryptocurrency. Established in 2011, we have gained the trust of over 90 million wallet holders and more than 40 million verified users, facilitating over $1 trillion in crypto transactions. Blockchain is the world's premier software platform for digital assets. We operate the largest production blockchain platform globally, driven by our passion for coding and building an open, accessible, and equitable financial future, one innovative software solution at a time. We are seeking a Site Reliability Engineer to join our Core team. This role involves advocating for infrastructure best practices across our organization, enabling us to securely scale a distributed financial platform that serves millions daily. Our distributed financial platform addresses some of the most fascinating challenges in the crypto space for our vast customer base and is experiencing rapid growth. The Site Reliability Engineering (SRE) team at Blockchain merges software and systems engineering to create a platform that simplifies complexity, enhancing security, reliability, and swift product delivery. The SRE organization at Blockchain is a dynamic environment focused on continual improvement. We foster a culture where team members can propose, discuss, design, and implement changes with a high degree of autonomy. We value abstract thinking to develop exceptionally effective tools and strive to eliminate toil. As a member of the Core team, you will gain a comprehensive understanding of our products' infrastructure needs. Your role will include establishing and maintaining innovative engineering solutions to enhance our customers' experience through the development of essential tools. Importantly, you will also mentor and guide developer teams to deliver new features in a rapid, secure, and scalable manner.
Join us on our mission to revolutionize real estate transactions, making them smarter, faster, and more seamless. The real estate sector is the largest asset class globally, yet its legal processes are still slow, manual, and underfunded. Lawyers often find themselves sifting through complex documents and fragmented information, all while clients demand quicker and more transparent due diligence. That’s where we come in. Orbital Copilot is an AI-powered assistant crafted specifically for commercial real estate law. In collaboration with experienced real estate lawyers, we’ve designed a tool that accelerates intricate due diligence processes by up to 70%, ensuring legal-grade accuracy. We have recently secured a $60 million Series B funding round to expedite our expansion in the UK and US markets. Our trusted partners include leading law firms such as Goodwin and BCLP, who rely on us to eliminate mundane tasks, allowing their legal teams to focus on delivering exceptional client service and expediting deal closures. At Orbital, you will be part of a dynamic team that is redefining the future of real estate transactions, embracing speed, collaboration, and empowerment from day one. Role Overview We are seeking our second Site Reliability Engineer to enhance our platform and reliability initiatives. This position presents an exciting chance to design, implement, and oversee our infrastructure, CI/CD pipelines, and production operations from scratch. You will have the autonomy to shape our technological framework, establish best practices, and develop scalable systems that will lay the groundwork for our future engineering endeavors.
If you thrive in a startup culture and enjoy the intersection of software engineering, operations, and infrastructure, we invite you to connect with us. Your Responsibilities Include: Infrastructure Management: Design and maintain a robust, cloud-based infrastructure on Azure. Develop and manage infrastructure as code (IaC) using tools like Terraform. Take ownership of our system’s reliability and scalability, establishing a solid foundation for our engineering environment. Container Deployment and Orchestration: Manage deployment and orchestration of containerized applications, ensuring efficient performance and reliability.
Palantir Technologies Inc.
Join the prestigious UK Meritocracy Fellowship at Palantir Technologies, where we empower the next generation of leaders through innovation and collaboration. As a fellow, you will engage in transformative projects, work alongside industry experts, and contribute to building cutting-edge solutions that drive meaningful change across various sectors.
About Us At Trainline, we are passionate advocates for rail travel, dedicated to creating a greener and more sustainable future for travel. Our platform empowers millions of travelers to discover and book the most cost-effective tickets across various carriers, fares, and journey options through our highly-rated mobile app, website, and B2B partner channels. Join Us in Creating Great Journeys As Europe’s leading rail app, we attract over 125 million monthly visits and facilitate £5.9 billion in annual ticket sales. Collaborating with over 270 rail and coach companies across more than 40 countries, we envision a world where travel is simple, seamless, eco-friendly, and affordable. Today, as a FTSE 250 company, our diverse team of over 1,000 Trainliners from 50+ nationalities is based in London, Paris, Barcelona, Milan, Edinburgh, and Madrid. With a strong focus on growth in the UK and Europe, this is an exciting time to join us on this high-speed journey. Introducing the Reliability & Operations Engineering Team Trainline is at the forefront of technology, delivering world-class digital experiences for millions of customers. Our platform predominantly operates on AWS, utilizing cloud-native architecture, modern CI/CD pipelines, and robust DevOps and SRE methodologies. The Reliability & Operations Engineering team (ReliabilityOps) integrates SRE, Incident Management, and Database Reliability to ensure our platform remains observable, reliable, scalable, and resilient. We work closely with product engineering teams to facilitate safe delivery, respond to incidents, and consistently enhance system reliability. We are in search of a mid-level Site Reliability Engineer who will be instrumental in driving our initiatives forward. You will bring solid production experience, a growth mindset, and a readiness to challenge and be challenged, all while contributing to platform reliability and developing broader technical ownership with guidance from senior engineers.
ClearScore Technology Limited
Senior Site Reliability Engineer At ClearScore, we pride ourselves on being a unique workplace that has revolutionized the financial services industry over the past decade. With millions of users benefiting from our services, our success is driven by a collaborative culture that values hard work, adaptability, and mutual respect. This environment empowers our team members to realize their full potential and achieve outcomes that profoundly impact our users' lives. Our mission is to enhance the financial wellbeing of our users by placing their needs at the forefront of our innovations. Leveraging advanced technology, insightful analytics, and stunning design, we help our users gain financial confidence and make informed decisions. We believe in fostering an environment where our employees can thrive, which is why we prioritize output over hours logged. We embrace an inclusive culture that encourages personal wellness while supporting career growth and development. Your Responsibilities: Drive architectural advancements by participating in RFCs, architecture forums, and company-wide initiatives to enhance reliability, scalability, and efficiency. Lead and advance ClearScore’s Kubernetes platform, focusing on designing, upgrading, and optimizing clusters at scale while shaping our Kubernetes usage across the organization. Independently troubleshoot and resolve complex production issues, utilizing a profound understanding of distributed systems and containerization to prevent and mitigate incidents. Design and contribute to Kubernetes controllers and automation tools that enhance our infrastructure and developer experience. Improve our AWS estate, ensuring cost-effectiveness, security, and scalability while promoting best practices across teams. Collaborate with developers to enhance service observability, implement strategic metrics and alerting, and create informative dashboards for intricate systems. 
Construct and maintain CI/CD pipelines from inception for new use cases, manage migrations, and introduce new tooling as necessary. Engage with open-source projects by providing fixes, feedback, or developing new tools aligned with our mission. Mentor mid-level SREs and other engineers, fostering their growth in technical mastery and operational excellence.
Air Apps builds technology to help people plan, work, and live better. Founded in Lisbon in 2018 and still family-led, the company has grown to San Francisco and now London, remaining self-funded and reaching over 100 million downloads. Every day, teams at Air Apps challenge assumptions and develop AI-powered products that make a difference for users worldwide. The company values creativity and aims to improve how resources are managed and lives are impacted. Role Overview The Site Reliability Engineer (SRE) will focus on keeping Air Apps systems reliable, available, and scalable. This role connects software development and operations, using automation, monitoring, and performance tuning to reduce downtime and strengthen system resilience. This is a fully onsite position based in the London Metropolitan Area. Air Apps will consider relocation support for the right candidate. The SRE will work closely with cross-functional teams in a busy office setting. What You Will Do Design and implement systems that are scalable, reliable, and fault-tolerant across cloud platforms. Develop and maintain observability tools for monitoring, logging, and alerting (such as Prometheus, Grafana, Datadog, ELK). Automate infrastructure provisioning, deployment, and incident response using Infrastructure as Code tools like Terraform or CloudFormation. Improve system performance, scalability, and incident response processes to maximize uptime. Work with development and DevOps teams to strengthen system designs for reliability. Conduct root cause analysis and implement steps to prevent future failures. Design and maintain strategies for load balancing, failover, and disaster recovery to ensure high availability.
Palantir Technologies Inc.
Join Palantir Technologies as a Forward Deployed Infrastructure Engineer, where you will play a critical role in supporting the UK Government. In this position, you will leverage your technical expertise to implement innovative infrastructure solutions that address complex challenges. Collaborate with cross-functional teams to deliver high-impact projects and enhance national security capabilities.
Legal & General
Join Legal & General as a Graduate Site Reliability Engineer in London, where you will play a vital role in enhancing our technology operations. This entry-level position offers a unique opportunity to work alongside a dynamic tech team, ensuring our systems operate smoothly and efficiently. As a Graduate in this role, you will be involved in monitoring system performance, troubleshooting issues, and collaborating with various departments to implement automation solutions. Your contributions will help improve our infrastructure and deliver exceptional service to our clients.
About Us At Heidi Health, we believe that healthcare deserves a more harmonious approach, one that ensures care remains continuous and deeply personalized. Our innovative AI Care Partner collaborates with healthcare providers to enhance the care experience for patients and clinicians alike. Our diverse team includes doctors, engineers, designers, researchers, and creatives, all dedicated to creating tools that empower clinicians to focus on what matters most: their patients. In just 18 months, we've reclaimed over 18 million hours for healthcare professionals, facilitating 73 million patient visits across 116 countries. Currently, our technology supports more than two million patient visits weekly worldwide. With nearly $100 million in funding, we are expanding our presence in the US, UK, Canada, and Europe, partnering with prestigious health systems such as the NHS, Beth Israel Lahey Health, and Monash Health. The Opportunity Join our core Platform/SRE team, where you will take charge of production reliability. This role involves active incident response, on-call duties, system reliability, and daily operational oversight of Heidi’s platform. We welcome applications from mid-level SREs eager to embrace greater responsibility, as well as senior SREs who relish hands-on operational roles.
This position emphasizes operational involvement and aims to maintain the health of real systems in production. Your Responsibilities Engage in on-call and incident response: Address production incidents, assist in service restoration, and facilitate clear communication during incidents, escalating to leading incidents over time. Enhance operational reliability: Identify recurring issues and reliability risks, driving improvements through better alerting, automation, system enhancements, and process refinements. Manage production environment components: Operate and enhance Kubernetes clusters, cloud infrastructure, and core platform services, increasing responsibility as expertise grows. Boost observability: Refine dashboards, alerts, logs, and traces to enable earlier detection and faster diagnosis of issues, concentrating on actionable insights. Minimize operational toil: Automate repetitive tasks, streamline runbooks, and enhance tooling to facilitate smoother and safer on-call and daily operations.
About Neo4j Neo4j builds a graph intelligence platform used by 84 of the Fortune 100 and supported by the world’s largest graph community. The platform powers knowledge graphs for AI, delivers reliable graph capabilities across cloud environments, and integrates with a wide range of systems. Neo4j’s technology is designed for precision, accountability, and governance, helping organizations turn data into actionable insights for intelligent applications and AI systems. Engineered for seamless operation in any cloud, Neo4j supports dynamic, personalized, and autonomous AI solutions. The focus is on delivering swift results, contextual knowledge, and solutions that improve both customer and employee experiences. Our Vision Neo4j’s mission is to help the world understand data. As business and society become more interconnected, Neo4j’s technology enables organizations to find and understand relationships within their data. The company pioneered the graph database category and continues to lead in helping teams innovate and stay competitive. About the Site Reliability Engineering Team The Site Reliability Engineering (SRE) team supports Neo4j’s Database as a Service (DBaaS) product, Neo4j Aura. Aura operates globally across all major cloud providers, running hundreds of Kubernetes clusters and managing thousands of Neo4j instances in production. This team is redefining SRE within Neo4j Aura. Rather than simply reacting to incidents, the SRE group empowers teams to design for reliability from the start. The work centers on building tools, practices, and a culture that embed SRE principles into the foundation of Aura’s operations. Collaboration with product teams and a commitment to resilience and engineering excellence are central to the team’s approach. What You Will Do Automate for insight and scale: Build systems that enable fast, safe, and scalable troubleshooting across thousands of Neo4j instances. 
This includes developing internal tools that provide actionable insights. Location London
About xAI At xAI, our mission is to develop advanced AI systems that can comprehend the universe and assist humanity in its quest for knowledge. Our dedicated team is small, highly motivated, and committed to engineering excellence, making it an ideal environment for individuals who thrive on challenges and curiosity. We foster a flat organizational structure where every employee plays a crucial role in driving our mission forward. We value initiative and excellence, rewarding those who consistently demonstrate strong work ethic and prioritization skills. Effective communication is essential, and all team members are expected to share their insights clearly and concisely. About the Team You will join a team responsible for the backend services that power our innovative products, including grok.com and our API. Our focus is on developing and maintaining highly scalable and reliable services capable of efficiently processing tens of thousands of queries per second, hosted across multiple Kubernetes clusters in both on-premises and cloud environments. About the Role We are looking for a candidate who meets the following criteria: In-depth expertise in Kubernetes. Proficiency with continuous deployment systems, including Buildkite and ArgoCD. Extensive experience with monitoring tools such as Prometheus, Grafana, and PagerDuty. Strong knowledge of infrastructure as code practices utilizing tools like Pulumi or Terraform. Familiarity with systems programming languages such as Rust, C++, or Go. Experience in traffic management and HTTP proxies, such as nginx and envoy. Location This position requires in-person attendance in London, UK. While we typically work from the office five days a week, we do provide flexibility for remote work when necessary. Candidates should be prepared to attend late meetings at least once a week to coordinate with our global teams.
Join our dynamic Release Engineering team at Kaluza as a Site Reliability Engineer. In this pivotal role, you will play a crucial part in enhancing our software development lifecycle by developing innovative engineering solutions that empower our software teams to deploy high-quality code efficiently. Your efforts will significantly boost engineering productivity through the optimization of testing, deployment, and release processes across all Kaluza engineering teams.

