Empower Every Identity, from AI to Human
Identity is the cornerstone of unlocking AI's potential. At Okta, we secure AI by creating a trustworthy, neutral infrastructure that allows organizations to confidently navigate this transformative era. This mission demands an unwavering commitment to addressing intricate challenges with significant real-world implications. We seek innovative builders who act with speed and urgency and execute with exceptional proficiency.
This is your chance to engage in work that can define your career. We are fully dedicated to this mission. If you share this passion, we want to hear from you.
Join Us in Securing Every Identity, from AI to Human
Okta is at the forefront of providing a superior authentication experience for hundreds of millions of people globally. Reliability is the bedrock of our product, and surpassing customer expectations for availability is a fundamental engineering priority. As a Senior Site Reliability Engineer, you will be part of our SRE team, ensuring our production systems are not only fully operational but also resilient, scalable, and ready for continued growth. This role goes beyond maintenance; you will play a significant part in strengthening the core robustness and resilience of our platform. You will be a proactive builder, developing solutions that inherently boost our system's reliability.
Your Responsibilities:
- Design and develop custom software in Go to bolster the platform’s reliability and resilience.
- Collaborate with engineering teams to integrate reliability principles, enhancing the availability, performance, and observability of our services.
- Use your deep understanding of infrastructure and observability to pinpoint improvement opportunities within the product and implement effective solutions.
- Participate in our on-call rotation, responding swiftly and effectively to critical incidents and using your expertise to troubleshoot, mitigate, or appropriately escalate production issues.
- Enhance our SRE tooling and processes, focusing on automation and operational efficiency.
- Establish, document, and promote reliability best practices throughout the organization.

