About the job
Future Secure AI creates AI Co-Workers that help enterprises automate operational tasks. The team builds production systems designed to handle real-world scale and reliability. Daily work emphasizes disciplined engineering, resilience, and a culture that encourages both collaboration and individual growth. Leadership maintains an entrepreneurial approach and remains accessible to employees who need support.
Role overview
The Site Reliability Engineer, based in Sydney, will design, build, and maintain the infrastructure that powers AI Co-Workers. This position works closely with product, AI, and engineering teams and is responsible for system reliability across the entire lifecycle.
Key responsibilities
- Design, build, and manage reliable production infrastructure for AI Co-Workers.
- Oversee Kubernetes-based platforms for deploying and running AI workloads.
- Create and maintain infrastructure as code using Terraform.
- Implement and manage Helm-based deployment workflows.
- Define, measure, and improve system reliability using SLIs, SLOs, and SLAs.
- Participate in on-call rotations, handle incident response, conduct root cause analysis, and contribute to post-mortem reviews.
- Reduce operational toil through automation and engineering improvements.
- Develop and enhance observability, including monitoring, logging, and alerting.
- Work with engineers to keep systems resilient, scalable, and secure.
- Manage tasks across the build, deploy, and operate phases of the software lifecycle.

