About YouLend
YouLend is a fast-growing FinTech firm and the leading embedded financing platform for top-tier e-commerce platforms, technology companies, and Payment Service Providers globally. Our software platform lets partners strengthen their value proposition by offering customizable financing solutions under their own brand, so they can serve their merchants without taking on capital risk.
Backed by EQT, a prominent private equity firm, we have grown by more than 100% year over year since 2020. Our headquarters are in London, UK, with a presence in several European countries and the United States, supporting partners such as eBay, Amazon, Just Eat, Shopify, and Stripe.
Role Overview
As we establish a premier Observability function, we are looking for someone who cares deeply about uptime, meaningful alerts, and well-designed dashboards. If you have been on call, fought alert noise, or debugged elusive issues across microservices in the middle of the night, we want you on our team!
This is not a typical “Platform Engineer” role; you will focus squarely on observability, system reliability, and empowering developers. You will work closely with teams to understand not just when failures occur, but also why.
Key Responsibilities:
- Designing and scaling on-call systems that engineers will actually want to be part of.
- Enhancing Datadog monitoring, alerting, dashboards, and log pipelines for Kubernetes environments.
- Defining and managing SLOs, SLIs, and error budgets, ensuring teams adhere to them.
- Developing scorecards and software catalogs so engineers can easily track system health and ownership.
- Mentoring and enabling development teams to take charge of their own observability, alerts, and incident responses.
- Implementing chaos engineering practices to deliberately expose weaknesses before they cause incidents.
- Fostering a culture of reliability through incident reviews, shared learnings, and transparency.
Ideal Candidate Qualifications:
- Proven production experience with observability tools, particularly Datadog, in cloud-native settings.
- Experience establishing monitoring and alerting across Kubernetes services.
- Demonstrated ability in building or scaling on-call systems within startup or large-scale environments.
- Expertise in minimizing alert fatigue and a passion for effective monitoring.

