Qualifications
Proven experience in Linux system administration and network troubleshooting. Strong background in automation and scripting (e.g., Python, Bash). Experience with cloud infrastructure and services. Ability to work collaboratively in a fast-paced environment. Excellent problem-solving skills and attention to detail.
About the job
As a Staff Site Reliability Engineer at Zscaler, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based security services. You will engage in troubleshooting complex Linux and network issues while implementing automation solutions to enhance operational efficiency.
Your expertise will contribute to our mission of delivering unparalleled security solutions to our clients.
About Zscaler
Zscaler is a leading cloud security company that enables organizations to securely transform their networks and applications. Our innovative solutions are trusted by thousands of customers globally. Join us to be part of a pioneering team that is redefining the future of security.
Similar jobs
Search for Production Engineer Site Reliability Engineer Shift Basis
About the Role: Production Engineer
The Production Engineer at Rubrik is essential for achieving operational excellence. This position involves managing alerts, addressing outages, and leading incident resolution as an Incident Manager. The ideal candidate will possess hands-on experience in maintaining highly available critical services across multi-cloud environments while continuously enhancing processes through automation and intelligent monitoring.
What You’ll Do:
Become a vital part of a 24/7 Production Operations team dedicated to managing and supporting critical infrastructure and services in multi-cloud environments. Supervise staging and production environments to ensure optimal uptime and reliability. Implement and uphold comprehensive observability solutions for real-time monitoring, alerting, and metrics collection. Lead incident management initiatives by promptly responding to alerts and outages, coordinating teams for timely resolutions. Investigate recurring incidents to identify root causes, minimize toil, and enhance system resilience. Design and develop automation tools to proactively detect, triage, and remediate production issues. Maintain and update runbooks to facilitate incident response and address recurring issues. Exhibit strong decision-making skills under pressure, effectively managing critical situations with urgency and composure.
About the Role: Production Engineer
The Production Engineer at Rubrik is pivotal in ensuring operational excellence, managing alerts, addressing outages, and spearheading incident resolution as an Incident Manager. This position demands hands-on expertise in maintaining highly available critical services across multi-cloud environments while fostering continuous improvements through automation and intelligent monitoring.
What You Will Do:
Become a key member of a 24/7 Production Operations team dedicated to managing and supporting vital infrastructure and services across multi-cloud environments. Supervise staging and production environments to guarantee maximum uptime and reliability. Deploy and maintain comprehensive observability solutions for real-time monitoring, alerting, and metrics collection. Lead incident management initiatives by promptly responding to alerts and outages, coordinating teams for swift resolution. Investigate recurring incidents to identify root causes, mitigate toil, and enhance system resilience. Design and develop automation tools to proactively detect, triage, and rectify production issues. Update and maintain runbooks to facilitate incident response and address recurring issues. Exhibit strong decision-making abilities under pressure, managing critical situations with urgency and composure.
Sumo Logic seeks a Staff Site Reliability Engineer based in Bangalore, Karnataka, India. The main focus of this position is to maintain and enhance the reliability and performance of company systems. Collaboration with development teams is central, especially when resolving operational issues and building solutions that keep systems stable.
Key Responsibilities
Partner with engineers to boost system reliability and maximize uptime. Create and improve monitoring and automation tools to support operational goals. Diagnose and resolve operational challenges as they occur. Contribute to optimizing performance throughout the infrastructure.
Roles and Responsibilities
Guarantee the reliability, availability, and optimal performance of our systems and services. Automate and optimize operations and processes for greater efficiency. Continuously monitor system health, identify bottlenecks, and proactively resolve potential issues. Collaborate with development teams to enhance system architecture and performance. Conduct thorough post-incident reviews and implement necessary improvements. Develop and maintain infrastructure as code using industry-standard tools like Terraform and Ansible.
Role Overview
Black Duck Software is looking for a Senior Site Reliability Engineer in Bangalore. This role focuses on maintaining the reliability, availability, and performance of our systems. Collaboration with development teams is central to the work, with an emphasis on building and supporting scalable infrastructure.
What You Will Do
Work with developers to design, implement, and maintain scalable systems. Troubleshoot production issues and identify long-term solutions. Strengthen the resilience of our platform through process and technical improvements. Promote a culture of continuous improvement across teams.
Join the UiPath Team
The team at UiPath is passionate about harnessing the transformative potential of automation to redefine the way the world operates. We are dedicated to developing industry-leading enterprise software that empowers organizations. To realize this vision, we seek individuals who are inquisitive, motivated, generous, and authentic. We value those who thrive in a dynamic, fast-paced environment and who genuinely care about their colleagues, the mission of UiPath, and the broader impact of our work. Are you ready to make a difference?
Your Role
As a Principal Site Reliability Engineer at UiPath, you will play a pivotal role in enhancing the reliability of our expansive, cloud-native systems. This position requires a comprehensive understanding of the full reliability spectrum, going beyond any single domain. You will define and drive the architecture, scalability, measurement, and automation of reliability across our systems. This role focuses on shaping the reliability practices at UiPath rather than merely reacting to outages or coding. You will collaborate with engineering and platform teams to integrate reliability into our systems, workflows, and organizational culture. Your contributions will elevate our standards for monitoring, automation, and ensuring our systems can withstand real-world loads and failures. You will take ownership of service reliability, observability, automation, and continuous improvement initiatives, partnering with teams in Romania and India as necessary.
Your Responsibilities at UiPath
Comprehensive Reliability Ownership: Develop and refine the reliability strategy for our distributed systems, ensuring a balance of availability, performance, velocity, and cost through well-defined SLIs/SLOs and error budgets.
Incident Management & Operational Excellence: Lead and actively participate in high-severity incidents, driving structured troubleshooting in uncertain situations and ensuring sustainable systemic enhancements.
Observability & Operational Insights: Advocate for robust observability practices to make service health and performance risks visible and actionable.
Automation, Tooling & Engineering Discipline: Automate manual operational tasks through effective tooling and self-service options while applying disciplined engineering methodologies.
Infrastructure, Cloud & IaC: Champion reliable and scalable cloud infrastructure utilizing Infrastructure as Code, collaborating with platform teams to establish best practices.
Technical Leadership & Organizational Impact: Influence strategic decisions to improve reliability outcomes and mentor team members to foster a culture of excellence.
Please note that we will only accept candidates who possess the appropriate rights and documentation for employment in India.
About Us:
Axi is a premier global provider specializing in margin and deliverable Foreign Exchange, Contracts for Difference (CFDs), and Financial Spread Betting. Our evolution into a world-class, multifaceted brokerage is marked by a presence across six regions and significant investments in cutting-edge trading technology, designed to deliver the most comprehensive trading experience for clients ranging from novices to institutional investors.
Your Role:
As a Site Reliability Engineer, you will be pivotal in ensuring the availability, reliability, and operational excellence of Axi's technology infrastructure. You will design, implement, and maintain sophisticated monitoring, alerting, and log management solutions. Collaborating closely with Technology teams throughout the Development and Operations phases, your goal is to proactively identify and address any business-impacting incidents before they are reported by affected users, ensuring thorough observability and analysis through effective log management.
Your Responsibilities:
Act as the Product Owner for Monitoring and Observability within Axi's Technology Operations Environment. Evaluate the current environment and propose a roadmap for optimizing product offerings while managing the lifecycle of existing products. Support technology delivery teams through all product delivery phases by gathering requirements, producing detailed designs, conducting PoCs, and architecting solutions. Tweak and refine health rules while maintaining existing monitoring solutions. Minimize toil by documenting and automating repeatable processes. Communicate ideas and designs effectively to both technical and non-technical stakeholders. Consistently document processes and maintain an up-to-date knowledge base of your product expertise.
At Emergent Labs Inc., we are pioneering the future of software development by creating autonomous coding agents that revolutionize traditional programming methods. Our innovative systems can generate, test, and deploy production applications directly from plain-language commands, allowing for a seamless development experience.
Since our public launch, we have achieved remarkable milestones, reaching $100 million in Annual Recurring Revenue (ARR) within just 8 months. Our platform has empowered over 6 million users across more than 190 countries to build over 6.5 million applications. With the backing of renowned investors like Khosla Ventures, SoftBank, Google, Lightspeed, Prosus, Together, and Y Combinator, we have raised over $100 million to further our mission.
We are committed to tackling the complexities of AI-driven software creation, ensuring correctness, reliability, security, and scalability in production environments. Our team consists of seasoned professionals, including repeat founders, Olympiad medalists, and alumni from IIT and IIM, as well as leaders from tech giants like Google, Amazon, and Dropbox.
If you are a builder eager to have ownership, work at speed, and make a global impact, we want you on our team!
Veeam is a leading provider of data and AI solutions, dedicated to helping organizations protect and manage their data effectively. Recognized as a pioneer in data resilience and security posture management, we empower businesses to navigate the complexities of identity, data, security, and AI risk. With our headquarters in Seattle and operations in over 30 countries, Veeam proudly safeguards the operations of more than 550,000 customers globally. Join our dynamic team and be part of a transformative journey as we advance together, fostering growth, learning, and making a significant impact for renowned brands around the world.
About the Role
As a Staff Site Reliability Engineer, you will take on a pivotal role as a hands-on technical leader within our Site Reliability Engineering (SRE) team. Your expertise will guide senior engineers, influence product development efforts, and ensure our systems are constructed to be reliable, scalable, and observable from the ground up. You will spearhead strategic initiatives, mentor peers in SRE practices, and help define architectural best practices across our platform. This role is crucial for aligning teams, enforcing high standards, and scaling SRE principles globally at Veeam.
What You’ll Do
Reliability Engineering & Resilience:
Serve as a technical authority, mentoring senior engineers and guiding design decisions to enhance service reliability and resilience. Lead the establishment and enforcement of Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets; ensure adherence across engineering teams. Collaborate with fellow staff members across teams to unify strategy and promote shared reliability standards and objectives. Engage with development and product teams to proactively design for failure, construct resilient architectures, and operationalize reliability from inception.
Observability & Operational Excellence:
Promote the organization-wide adoption of observability best practices and tools. Ensure that metrics, logs, and traces yield deep, actionable insights throughout systems. Lead complex incident responses, conduct postmortems, and drive systemic reliability enhancements. Encourage and uphold a blameless culture of learning and continuous improvement.
Veeam is recognized as the premier Data and AI Trust Company, dedicated to assisting organizations in comprehending, securing, and fortifying their data and AI systems. As the leading entity in data resilience and security posture management, Veeam is designed to address the convergence of identity, data, security, and AI risk. Our headquarters are in Seattle, and we operate in over 30 countries, safeguarding the data of more than 550,000 customers globally who rely on Veeam to maintain business continuity. Join us as we advance together, fostering growth, learning, and making a significant impact for some of the world’s most renowned brands.
We are seeking a Senior Software Engineer - Reliability to take on a pivotal role as a hands-on technical leader within our Site Reliability Engineering (SRE) team. In this position, you will mentor senior engineers, influence product development, and ensure that our operational systems are designed for reliability, scalability, and observability from the ground up. Your responsibilities will include driving strategic initiatives, mentoring others in SRE practices, and defining architectural best practices across our platform. This role is crucial for aligning teams, maintaining high standards, and scaling SRE principles globally within Veeam.
Your tasks will include:
Reliability Engineering & Resilience
Design and enhance infrastructure to ensure high availability, fault tolerance, and scalability across public clouds, starting with Azure and planning expansion to other providers. Establish and uphold Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets to define and enforce reliability goals. Lead incident response initiatives, conduct thorough analysis, facilitate blameless postmortems, and host sharing sessions to maximize learning throughout our engineering team, driving improvements across the socio-technical engineering ecosystem.
Observability & Operational Excellence
Promote deep observability practices, ensuring telemetry, logs, and metrics are effectively utilized to enhance our operational insights.
Join our dynamic team of innovators at New Relic, where we are committed to redefining the future of observability. Our platform empowers organizations to excel in an AI-driven landscape by providing deep insights into their complex systems. As we broaden our global presence, we are seeking dedicated individuals passionate about optimizing digital applications for top-tier companies. Embark on your career journey with us!
Your Opportunity
As a Senior Software Engineer in the Container Fabric (CF) organization, you will play a vital role in enhancing New Relic's global internal platform. We are searching for an operations-focused engineer with 5-7 years of experience to connect high-performance Go development with expansive Kubernetes orchestration. You will assume a leadership role, guiding critical projects and mentoring junior engineers while ensuring the reliability of our global fleet.
What You'll Do
Architectural Leadership: Spearhead the design and implementation of internal tools using Golang, with a focus on Kubernetes Operators and Controllers to streamline resource management.
Platform Orchestration: Navigate complex infrastructure transitions.
Operational Excellence: Own incident responses, create comprehensive retrospectives, and implement systemic safeguards using advanced overcommit strategies.
ABOUT THE TEAM
The Production Engineering team at Rubrik plays a crucial role in ensuring the availability and reliability of mission-critical platforms across expansive, multi-cloud environments. We serve as the cornerstone of operational excellence, managing incident responses, outages, observability, and continuous enhancement. Our team collaborates closely with Site Reliability Engineering (SRE) and Engineering units to proactively identify risks, minimize operational toil, mitigate outages, and construct resilient systems. We thrive in complex, high-pressure situations, utilizing sound technical judgment while relentlessly improving through learning, ownership, and accountability.
ABOUT THE ROLE
As a Senior Engineering Manager for Production Engineering, you will lead and nurture a high-impact team responsible for sustaining highly available, business-critical services. You will drive the technical roadmap, strategy, and execution while acting as a senior escalation point during significant incidents. Your role involves setting a compelling vision for the team, coaching engineers to handle high-pressure scenarios confidently. This position demands strong technical expertise, decisive decision-making skills, and the capability to collaborate across teams to enhance system reliability and operational maturity.
What You’ll Do
Lead the Production Engineering team, supporting critical infrastructure and services across multi-cloud environments, aligning with EST timezone (4:00 PM – 1:00 AM IST). Own the operational excellence for production environments by establishing robust processes, standards, and accountability for availability and reliability. Promote a metrics-driven culture focused on continuous improvement, driving initiatives to enhance operational KPIs such as MTTA/MTTR. Encourage a strong collaborative mindset by working closely with leadership to align operational priorities with business objectives and reliability targets. Develop team members through coaching, mentoring, and career advancement, empowering senior contributors to achieve maximum impact. Plan and manage on-call rotations, escalation protocols, and resource availability to guarantee sustainable support for mission-critical systems. Implement strong execution rigor through sprint planning, prioritization, and accountability, holding teams to elevated standards of delivery and performance.
Experience You’ll Need
A passionate leader dedicated to building technically proficient teams, with over 12 years of experience in Software Development Engineering.
About Tekion:
Tekion is at the forefront of transforming the automotive industry, introducing unprecedented innovation to a sector that has remained stagnant for over half a century. Our pioneering cloud-native automotive platform, featuring the groundbreaking Automotive Retail Cloud (ARC), Automotive Enterprise Cloud (AEC), and Automotive Partner Cloud (APC), seamlessly connects every facet of the automotive retail ecosystem. By leveraging state-of-the-art technology, big data analytics, machine learning, and AI, we foster deep engagement among OEMs, retailers, and consumers, elevating the automotive retail experience to unparalleled levels. With a dynamic workforce of nearly 3,000 professionals across North America, Asia, and Europe, Tekion is redefining the landscape of automotive technology.
Our Vision:
At Tekion, we are passionate about shaping a future where technology simplifies, secures, and revolutionizes the buying, selling, and servicing of vehicles. Our culture thrives on collaboration and innovation, driven by an unwavering commitment to security, reliability, and scalability as we expand our global presence.
Team Overview:
Join our high-impact Cloud Infrastructure Automation and SRE team within Cloud Engineering, where you'll be pivotal in constructing and maintaining secure, reliable, scalable, resilient, and self-service infrastructure platforms for Tekion. Collaborating closely with the Cloud Security, Product Security, and Application teams, you'll establish and uphold security best practices and automation-first principles across all cloud environments.
Join ConnectWise, a global leader in software solutions, with a dedicated team of over 3,000 professionals across North America, EMEA, and APAC. We empower more than 45,000 technology solution partners to enhance their business management, streamline sales, automate service delivery, and control technology remotely, ensuring they deliver outstanding customer experiences.
Our strength lies in our connections, among colleagues and within our community. We welcome diverse talents and perspectives, fostering a culture of innovation and collaboration.
We value game-changers, innovators, and those passionate about culture. Here, every moment is an opportunity for growth and recognition.
Are you ready to discover how you can make a significant impact at ConnectWise?
Join our dynamic team at Bosch Group as a Senior SAP Basis Consultant. In this role, you will leverage your expertise in SAP Basis to support and optimize our SAP landscape, ensuring high availability and performance of our systems. You will collaborate with cross-functional teams to implement best practices and innovative solutions.
Join our dynamic team at Gleanwork as a Technical Support Engineer during the night shift (9:30 PM - 6:30 AM IST). In this role, you will be the first line of support for our clients, troubleshooting technical issues and ensuring customer satisfaction. This position is ideal for tech-savvy individuals who enjoy problem-solving and providing excellent customer service.
As an SAP Basis Consultant at Bosch Group, you will play a crucial role in managing and optimizing our SAP environments. You will be responsible for system installations, configuration, and performance tuning, ensuring the stability and efficiency of SAP landscapes. This is an exciting opportunity for individuals looking to advance their careers in a dynamic and innovative company.
Join us at Pure Storage as we revolutionize the data storage landscape. This is an unparalleled opportunity to lead with innovative ideas, evolve alongside a talented team, and make a significant impact in the tech world. If you're passionate about harnessing technology to create solutions that matter, we invite you to be a part of our journey.
THE ROLE
As a Senior Production Engineer, you'll architect a seamless developer experience. Your mission is to create and implement high-performance, scalable productivity tools that enable our engineers to innovate at full speed. This senior leadership position involves transforming legacy systems into modern, cohesive solutions, ensuring our development ecosystem matches the sophistication of the technology we produce.
WHAT YOU'LL DO
Architect the Unified "Golden Path": Spearhead the modernization and integration of legacy developer tools into high-performance solutions utilizing Go and Python.
Engineer OS-Agnostic Tooling: Create and uphold essential tools that work flawlessly across Windows, Linux, and Mac, providing a consistent experience for our global engineering team.
Scale Testing Capabilities: Collaborate with testing and product teams to develop tools that can manage extreme scale and corruption testing across diverse storage protocols.
Embed Diagnostic Intelligence: Ensure each tool is built with robust testability and a "triage-first" design for swift, automated diagnosis of complex issues within distributed systems.
Lead Technical Evolution: Identify gaps in tooling proactively and mentor junior engineers while conducting high-level design and code reviews that elevate the technical standards across the Platform Engineering organization.
WHAT YOU BRING
Deep Systems Engineering Expertise: Extensive experience in complex systems software, including Distributed Systems, Linux Kernel development, and more.
As a Product Support Engineer III at Zscaler, you will play a crucial role in ensuring customer satisfaction by providing top-notch technical support for our innovative security solutions. You will troubleshoot complex issues, collaborate with cross-functional teams, and work directly with customers to resolve their queries efficiently. Your responsibilities will include analyzing customer problems, developing solutions, and documenting processes. You will leverage your expertise to improve product functionality and contribute to the overall enhancement of our services.
Mar 27, 2026