Software Engineer, Site Reliability (SRE)

Sierra, San Francisco, CA
On-site, Full-time

About the job

About Us

  • At Sierra, we are building a platform that helps businesses create authentic customer experiences with AI. We are headquartered in San Francisco, with a presence in Atlanta, New York, London, France, Singapore, and Japan.

  • Our operations are anchored in core values that shape our culture: Trust, Customer Obsession, Craftsmanship, Intensity, and Family. These principles guide our actions and are integral to our mission.

  • Our founders, Bret Taylor and Clay Bavor, bring deep industry experience. Bret, currently the Board Chair of OpenAI, previously served as co-CEO of Salesforce and CTO of Facebook, while Clay led numerous initiatives at Google, including AR/VR projects and Google Workspace.

Your Role

As a Software Engineer on the Site Reliability team, you will establish and improve the reliability, observability, and scalability of Sierra’s AI-centric infrastructure. Working closely with our engineering and product teams, you will ensure our systems remain highly available, efficient, and ready for growth.

  • Lead the development of Sierra’s observability stack—including monitoring, alerting, logging, and tracing—to provide engineers with critical insights into system health and performance.

  • Collaborate with product and platform engineers to architect systems that prioritize reliability and scalability from the outset, not as an afterthought.

  • Design and implement robust, scalable, and secure cloud infrastructure on AWS, employing Terraform and cutting-edge DevOps tools.

  • Enhance the reliability and scalability of our LLM deployments, ensuring they operate efficiently and cost-effectively.

  • Drive improvements in deployment pipelines, CI/CD tooling, and incident management processes to minimize downtime and accelerate response times.

  • Define and cultivate SRE practices within Sierra, shaping culture, tooling, and best practices across the engineering organization.

Qualifications

  • Bachelor's degree in Computer Science or a related field, or equivalent experience.

  • Proven experience in Site Reliability Engineering or a similar role, with a strong understanding of cloud infrastructure (AWS).

  • Proficiency in Terraform and modern DevOps practices.

  • Experience with observability tools and techniques—monitoring, alerting, logging, and tracing.

  • Strong problem-solving skills with a focus on scalability and performance optimization.

  • Excellent collaboration and communication skills, with the ability to work effectively in a team environment.

About Sierra

At Sierra, we are at the forefront of innovation, leveraging AI to transform customer experiences. Our diverse and inclusive culture thrives on core values such as Trust, Customer Obsession, Craftsmanship, Intensity, and Family. Join us as we shape the future of business interactions around the globe.
