About the job
Elevate Your Career with Loopio!
At Loopio, we are searching for a visionary Senior Engineering Leader to spearhead our Site Reliability Engineering (SRE), Infrastructure, and MLOps teams. In this pivotal role, you will lay the foundation for the reliability, scalability, and cost efficiency of our platform.
You will guide teams responsible for designing, building, and operating our production infrastructure, ensuring our services remain resilient, observable, and primed for expansion as we incorporate cutting-edge AI and automated workflows. Collaborating closely with Product Engineering, Security, and Data teams, you will facilitate rapid, secure delivery while upholding operational excellence.
Note: This position is an existing vacancy within our team.
Key Responsibilities
Leadership & Team Development
Lead and nurture multiple teams across SRE, Cloud Infrastructure, and MLOps.
Mentor engineering managers and senior contributors, cultivating a culture of ownership and high standards.
Foster a 'Platform-as-a-Product' mindset, ensuring infrastructure and ML tools empower the wider engineering organization.
Collaborate with Recruiting to attract and retain top-tier talent in cloud, reliability, and machine learning infrastructure.
Reliability & Operational Excellence
Oversee the operational health of production systems, focusing on availability, latency, and durability.
Define and refine SLIs, SLOs, and error budgets to promote data-driven reliability decisions.
Lead incident response efforts, championing blameless postmortems and systemic improvements to minimize

