
Production Engineer/Site Reliability Engineer (Shift Basis)

Rubrik, Bangalore
On-site, Full-time

Qualifications

Experience You’ll Need:

  • Solid understanding of distributed systems concepts.
  • Hands-on experience working with production systems and environments, preferably within public cloud infrastructure.
  • Familiarity with container orchestration platforms, particularly Kubernetes.
  • Practical experience with infrastructure management tools such as CloudFormation and Terraform.
  • Strong analytical and problem-solving skills for diagnosing and resolving system and application issues.
  • Proficiency in data structures and algorithms, UNIX, networking, operating systems, and database systems such as MySQL.
  • Proficiency in Python programming.
  • Excellent verbal and written communication skills.

About the job

About the Role:

Production Engineer
The Production Engineer at Rubrik plays an essential role in achieving operational excellence. The position involves managing alerts, responding to outages, and leading incident resolution as an Incident Manager. The ideal candidate has hands-on experience maintaining highly available critical services across multi-cloud environments and continuously improves processes through automation and intelligent monitoring.

What You’ll Do:

  • Become a vital part of a 24/7 Production Operations team dedicated to managing and supporting critical infrastructure and services in multi-cloud environments.
  • Supervise staging and production environments to ensure optimal uptime and reliability.
  • Implement and uphold comprehensive observability solutions for real-time monitoring, alerting, and metrics collection.
  • Lead incident management initiatives by promptly responding to alerts and outages, coordinating teams for timely resolutions.
  • Investigate recurring incidents to identify root causes, minimize toil, and enhance system resilience.
  • Design and develop automation tools to proactively detect, triage, and remediate production issues.
  • Maintain and update runbooks to facilitate incident response and address recurring issues.
  • Exhibit strong decision-making skills under pressure, effectively managing critical situations with urgency and composure.

About Rubrik

Join Rubrik in our mission to secure the world’s data. With our innovative Zero Trust Data Security™, we empower organizations to achieve resilience against cyber threats and malicious insiders.
