Site Reliability Engineer at Pendo | Herzliya, IL

Pendo, Herzliya, IL
On-site | Full-time

Qualifications

Minimum Qualifications

  • Experience with cloud infrastructure management tools such as Ansible or Terraform.
  • Proficiency in programming languages such as Go or Python, with a willingness to learn additional languages as required.
  • Ability to conceptualize and implement solutions for complex reliability and performance challenges.
  • Strong analytical and critical thinking skills.
  • Experience with incident management and troubleshooting in production environments.

About the job

The Site Reliability Engineering (SRE) team at Pendo provisions and oversees cloud infrastructure throughout the development and production lifecycle for all product initiatives. We collaborate closely with developers and product managers to ensure that our products are reliable, high-performing, and cost-efficient. Our platform runs on Google Kubernetes Engine (GKE) alongside other Google technologies, including Memorystore, Cloud Datastore, Pub/Sub, Cloud Functions, BigQuery, and Vertex AI, in addition to services from other vendors such as Amazon SES.

In the development phase, SREs ensure that developers have stable and efficient continuous integration and release pipelines, as well as development environments enabling swift delivery of new features. In production, SREs handle Tier 1 on-call duties and incident management, supporting a high-throughput platform that processes over 35 billion events daily. To maintain reliability for our customers, SREs work in tandem with developers and product managers to define service level objectives, analyze failure scenarios, and design systems that effectively balance cost with reliability. Additionally, SREs partner with the Information Security team to secure our cloud infrastructure, ensuring compliance with industry standards like SOC 2.

Key Responsibilities

  • Develop high-quality infrastructure-as-code that automates the provisioning, deployment, scaling, and monitoring of Pendo’s infrastructure to ensure reliability and performance.
  • Create maintainable code focused on product functionality, with an emphasis on operations, scalability, resilience, and monitoring.
  • Collaborate with fellow engineers to ensure that new services are well-designed, properly monitored, and accompanied by clear SLIs and achievable SLOs.
  • Troubleshoot production issues, quickly identify mitigation strategies, and implement preventive measures.
  • Maintain and automate runbooks for manual tasks wherever feasible.
  • Proactively monitor our capacity, quotas, and other performance limits to plan for growth effectively.
  • Engage in a 24x7 on-call rotation to manage product availability issues and urgent customer support escalations.

About Pendo

Pendo is a forward-thinking technology company dedicated to providing innovative solutions that enhance the user experience. Our team is passionate about leveraging advanced technology to create impactful products that drive success for our clients.
