Site Reliability Engineer at Compass Education | Hawthorn, Victoria
Compass Education
Full-time | On-site | Hawthorn, Victoria, Australia

Join us in reshaping the future of education.

At Compass Education, we are dedicated to revolutionizing the school experience for all stakeholders - educators, students, families, and administrators. Our innovative technology solutions empower educational institutions to prioritize what truly matters: fostering learning, growth, and success.

Our commitment to this mission has propelled us into a global scale-up, currently supporting over 5,000 schools across three countries with a dedicated team of more than 300 professionals. Our comprehensive school management platform is transforming how educational communities connect, communicate, and operate.

We are seeking a skilled Site Reliability Engineer (SRE) to join our Technology team in Melbourne to help maintain the stability, scalability, and security of our platform as we expand.

Role Overview & Team Dynamics

Reporting directly to the Head of Technology, you will concentrate on production operations, incident response, and ongoing reliability enhancements.

Your role will involve hands-on work within live environments, collaborating closely with Engineering, Product, and Support teams to ensure our services are reliable, observable, and safe to modify.

This position is ideal for an engineer who is passionate about solving genuine production challenges, improving systems over time, and taking ownership of reliability outcomes.

Key Responsibilities:

Infrastructure & Automation
- Manage and enhance our cloud infrastructure to guarantee the stability, scalability, and security of systems as user demand increases.
- Enhance environment consistency and deployment safety through better configuration and automation.
- Minimize operational toil by automating repetitive tasks and improving tools.

Observability & Monitoring
- Develop and refine monitoring, alerting, and logging systems to identify issues early and lessen customer impact.
- Enhance dashboards and production visibility for Engineering teams.
- Set higher standards for observability before services go live.

Production & Incident Management
- Engage in on-call duties and respond to incidents in a systematic and composed manner.
- Lead less complex incidents from start to finish while supporting more significant events.
- Participate in post-incident reviews and implement systemic improvements.

Reliability, Resilience & Risk
- Assist in enhancing service reliability targets and decreasing repeat incidents.
- Support capacity planning, performance optimization, and disaster recovery readiness.
- Identify operational and security risks, contributing to preventive controls.
Feb 27, 2026