About the job
Okta, Inc. helps organizations manage identity securely in a rapidly changing landscape. The technical operations team is dedicated to keeping systems available and resilient, with a strong focus on automation and reliability.
This Site Reliability Engineering Manager position is based in Bengaluru. The role leads a team of SREs responsible for maintaining and improving Okta’s core infrastructure. Success in this position requires a hands-on leader who values automation, learns quickly, and is committed to both reliability and security.
What you will do
- Mentor, manage, and guide a diverse team of SREs.
- Promote security best practices and drive projects that strengthen Okta’s infrastructure security.
- Respond to production incidents, resolve issues rapidly, and find ways to prevent future problems.
- Diagnose and troubleshoot complex production issues to maintain system reliability and performance.
- Collaborate with stakeholders across Okta to ensure new capabilities meet goals for reliability, security, and delivery speed.
- Work with recruiting and HR to help attract and retain top SRE talent.
- Track key metrics such as vulnerability scan results, security posture, cloud costs, recovery point objectives (RPO), recovery time objectives (RTO), and toil overhead, and make sure projects improve them.
- Support a 24/7 online environment as part of an on-call rotation.
What sets you apart
- Proactive mindset: anticipate problems where possible and resolve them quickly when they arise.
- Commitment to helping engineering peers grow, leading by example.
- Extensive experience managing teams in large-scale production environments, especially those running Java/Tomcat and containerized services on AWS (using services such as EC2, ECS, KMS, Kinesis, and RDS) or a similar cloud platform.

