Observability Platform Engineer (Site Reliability Focus)

YouLend
London, England, United Kingdom
On-site, Full-time

Qualifications

To excel in this role, you should have:

  • Strong experience with observability frameworks and tools.
  • A knack for creating intuitive dashboards and alerting systems.
  • A collaborative spirit to work closely with developers and engineers.
  • A willingness to adopt and promote chaos engineering practices.

About the job

About YouLend

YouLend is an innovative and swiftly expanding FinTech firm, recognized as the leading embedded financing platform for top-tier e-commerce platforms, technology companies, and Payment Service Providers globally. Our advanced software platform empowers partners to enhance their value propositions by offering customizable financing solutions under their own brand, allowing them to serve their merchants without any capital risks.

Backed by EQT, a prominent Private Equity firm, our company has grown by more than 100% year over year since 2020. Our headquarters are in London, UK, with a presence in various European countries and the United States, supporting esteemed partners such as eBay, Amazon, Just Eat, Shopify, and Stripe.

Role Overview

As we establish a premier Observability function, we seek a passionate individual dedicated to uptime, insightful alerts, and sophisticated dashboards. If you have experience with on-call duties, managing alert noise, or debugging elusive issues across microservices during off-hours, we want you on our team!

This position transcends a typical “Platform Engineer” role; you will be intensely focused on observability, system reliability, and empowering developers. You will collaborate closely with teams to understand not just when failures occur, but also why.

Key Responsibilities:

  • Designing and scaling on-call systems that engineers will appreciate being a part of.
  • Enhancing Datadog monitoring, alerting, dashboards, and log pipelines for Kubernetes environments.
  • Defining and managing SLOs, SLIs, and error budgets, ensuring teams adhere to them.
  • Developing scorecards and software catalogs so engineers can easily track system health and ownership.
  • Mentoring and enabling development teams to take charge of their own observability, alerts, and incident responses.
  • Implementing chaos engineering practices to intentionally identify weaknesses.
  • Fostering a culture of reliability through incident reviews, shared learnings, and transparency.

Ideal Candidate Qualifications:

  • Proven production experience with observability tools, particularly Datadog, in cloud-native settings.
  • Experience establishing monitoring and alerting across Kubernetes services.
  • Demonstrated ability to build or scale on-call systems in startup or large-scale environments.
  • Expertise in minimizing alert fatigue and a passion for effective monitoring.

About YouLend

YouLend is at the forefront of FinTech innovation, providing cutting-edge embedded financing solutions that empower e-commerce platforms, tech firms, and Payment Service Providers to enhance their offerings without financial risk. With exceptional growth and a strong backing from EQT, we are transforming the landscape of financing in digital commerce.
