Staff Software Engineer Site Reliability Engineering SRE jobs in Bengaluru – Browse 2,448 openings on RoboApply Jobs
Open roles matching “Staff Software Engineer Site Reliability Engineering SRE” with location signals for Bengaluru. 2,448 active listings on RoboApply Jobs.
Staff Software Engineer - Site Reliability Engineering (SRE)
Qualifications
To be successful in this role, you should have: Proven experience as a Software Engineer with a focus on SRE. Expertise in cloud platforms (AWS, GCP, Azure). Strong programming skills in languages such as Python, Go, or Java. Experience with CI/CD pipelines and automation tools. Excellent problem-solving skills and a proactive attitude.
About the job
Join our dynamic team at Harvey as a Staff Software Engineer specializing in Site Reliability Engineering (SRE). In this pivotal role, you will be responsible for ensuring the reliability, availability, and performance of our services. You will collaborate closely with software development teams to build and maintain scalable systems, implement automation strategies, and drive incident response and post-mortem analysis.
The ideal candidate will possess a strong background in software engineering, a deep understanding of SRE principles, and a passion for operational excellence. You will play a crucial role in optimizing system performance and enhancing user experience.
About Harvey
Harvey is a leading technology company dedicated to delivering innovative solutions that elevate the operational capabilities of businesses. Our commitment to excellence and a collaborative work environment fosters creativity and growth, making us a preferred employer in the tech industry.
About AlphaSense: AlphaSense is the trusted partner for the world's leading companies, providing cutting-edge market intelligence that removes uncertainty from decision-making. Our platform leverages advanced AI to deliver critical insights from a vast array of trusted content, including equity research, company filings, event transcripts, expert calls, news, and trade journals. The recent acquisition of Tegus by AlphaSense in 2024 enhances our mission to empower professionals with AI-driven market insights. This collaboration will drive growth, innovation, and content expansion, enabling users to discover deeper insights from extensive content sets. Trusted by over 6,000 enterprise clients, including a significant portion of the S&P 500, AlphaSense was founded in 2011 and is headquartered in New York City, with a global workforce exceeding 2,000 employees across offices in the U.S., U.K., Finland, India, Singapore, Canada, and Ireland. Join us in shaping the future! About The Role: As we expand our Site Reliability Engineering (SRE) team, we are seeking an accomplished Staff Site Reliability Engineer to drive the future of reliability, scalability, and performance at AlphaSense. This high-impact, hands-on role will involve architecting core reliability platforms, leading incident responses, and fostering the adoption of SRE best practices throughout our global engineering organization. Your mission will be to configure our platform to meet the reliability standards of mission-critical systems, aiming for 99.99% uptime while continuously improving our systems and processes. This role transcends traditional system maintenance; it focuses on pioneering the platforms, practices, and culture that facilitate effective engineering scaling. You will serve as a mentor to fellow engineers, influence architectural decisions, and establish the technical standards for reliability throughout the organization.
About AlphaSense: AlphaSense is the trusted partner for leading companies across the globe, providing them with the intelligence needed to make informed decisions. Harnessing cutting-edge AI technology, AlphaSense aggregates valuable insights from a plethora of trustworthy sources, including equity research, company filings, event transcripts, expert calls, news articles, trade journals, and proprietary client research. In 2024, AlphaSense's acquisition of Tegus marks a significant step towards enhancing our mission of empowering professionals with AI-driven market intelligence. This strategic collaboration aims to innovate and expand content capabilities, enabling our users to discover even deeper insights from extensive datasets. With a clientele that includes over 6,000 enterprise customers, including a majority of the S&P 500, AlphaSense was founded in 2011 and is headquartered in New York City, employing over 2,000 professionals across various global offices in the U.S., U.K., Finland, India, Singapore, Canada, and Ireland. Join our dynamic team! About The Role: We are expanding our Site Reliability Engineering (SRE) team and seeking a highly skilled Staff Site Reliability Engineer. In this pivotal role, you will be instrumental in shaping the future of reliability, scalability, and performance at AlphaSense. This hands-on position involves designing and implementing core reliability platforms, leading incident response efforts, and fostering a culture of SRE best practices throughout our global engineering teams. Your mission will be to develop our platform to the rigorous reliability standards of mission-critical systems, targeting an ambitious 99.99% uptime while continuously refining our systems and processes. This role transcends traditional maintenance; you will be at the forefront of pioneering platforms, practices, and a culture that facilitates effective engineering scalability.
You will serve as a mentor to fellow engineers, influence architectural decisions, and set the benchmark for reliability across the organization.
Cerebras Systems is at the forefront of AI innovation, engineering the world's largest AI chip, which is 56 times larger than traditional GPUs. Our revolutionary wafer-scale architecture delivers the computational power of dozens of GPUs on a single chip, simplifying programming and enabling users to run extensive ML applications seamlessly without managing multiple GPUs or TPUs. We proudly serve a diverse range of customers, including leading model laboratories, global corporations, and pioneering AI startups. Recently, we established a multi-year collaboration with OpenAI, aiming to scale up to 750 megawatts and revolutionize workloads with ultra-fast inference. Leveraging our innovative wafer-scale architecture, Cerebras Inference offers the fastest Generative AI solution globally, boasting speeds over 10 times quicker than conventional GPU-based hyperscale cloud inference services. This significant speed enhancement is transforming how users experience AI applications, facilitating real-time iterations and boosting intelligence through advanced computation. About The Role: We are looking for a seasoned IT SRE Team Lead to establish and manage the reliability function for Cerebras' internal technology infrastructure. As the IT SRE Team Lead, you will oversee the availability, performance, and operational quality of the systems that Cerebras employees depend on daily, which include identity management, endpoint management, collaboration tools, SaaS applications, and internal networking. The ideal candidate will adopt a software engineering perspective in IT operations, treating corporate infrastructure as code, defining measurable SLOs, automating remediation processes, and relentlessly minimizing toil. You will build and lead a small, high-impact team of engineers responsible for developing tools, writing automation scripts, and troubleshooting issues as they arise.
You will work closely with our security, networking, and infrastructure teams to ensure seamless operations.
6sense builds technology to help organizations grow, retain customers, and work more efficiently. The company encourages teams and individuals to reach their potential by supporting their goals with thoughtful tools and systems. Core values at 6sense include working as one team, staying curious, doing the right thing, owning outcomes, and creating a sense of belonging. Team members are expected to show initiative, act with integrity, and focus on delivering real value to customers. Meeting challenges directly and influencing the direction of the company’s technology are encouraged. Role overview: The Site Reliability Engineering Manager, based in Bengaluru, leads a team dedicated to the scalability, reliability, and performance of 6sense’s main infrastructure and customer-facing services. This position blends technical leadership with operational oversight and people management. What you will do: Mentor and guide a team of SREs who maintain and enhance critical systems. Establish the direction for reliability engineering and define best practices. Oversee system availability and performance, working to minimize downtime. Encourage a culture of proactive problem-solving and continuous improvement. Collaborate with engineering, product, and security teams to design systems that can scale with customer growth. Impact and collaboration: This leadership role is highly visible within 6sense. The SRE Manager helps shape how infrastructure supports both rapid company expansion and evolving customer needs.
Your Role: As the Engineering Manager for Site Reliability (SRE) at Moveworks, you will merge software and systems engineering to create and maintain large-scale, distributed, and fault-tolerant systems. Join us as a pivotal member of our SRE team in Bengaluru, where you will be instrumental in architecting and overseeing Moveworks' AI cloud infrastructure and strategy. In a rapidly growing environment, you will design and manage resilient and secure cloud infrastructure, enabling our products to operate reliably and allowing our engineering teams to rapidly build and release customer-facing features. You will collaborate with teams across platform, infrastructure, machine learning, search, data, DevOps, and frontend, building systems that empower these teams to deliver high-quality software promptly. This may involve enhancing CI/CD pipelines, enabling blue/green deployments, creating and managing canary environments, and reducing the risk of faulty code reaching production. Enhance the observability and reliability of Moveworks systems by developing and managing monitoring and alerting infrastructure. Improve debuggability by creating systems that facilitate issue resolution in production and analyze performance. Architect, design, and lead projects aimed at bolstering the reliability of our applications and systems. Serve as a technical leader for adjacent teams based in Bengaluru.
Sandisk is seeking a Staff Engineer specializing in Software Reliability Engineering based in Bengaluru. The position focuses on enhancing the reliability and performance of software products throughout the company. Key responsibilities: Work closely with cross-functional teams to identify software issues and develop solutions. Drive initiatives that improve system reliability and scalability. Support ongoing efforts to ensure Sandisk’s software remains dependable and efficient. Role overview: This role centers on maintaining and improving the quality of software systems. Collaboration and problem-solving are essential, as the work involves addressing technical challenges that impact software stability and user experience.
Join Valtech as a Site Reliability Engineer - Monitoring Specialist and be at the forefront of experience innovation. Here, we empower you to challenge the norm and explore uncharted territories in technology. With 6+ years of expertise, you will play a crucial role in shaping digital solutions that transform industries. Our workplace is designed for continuous learning and meaningful impact. You will collaborate with a dynamic team, develop cutting-edge customer experiences, and drive innovation. Why Choose Valtech? We are the experience innovation company and a trusted partner for the world’s leading brands. We offer growth opportunities, a values-driven culture, and global career paths that allow you to shape the future of experience.
About Us: At The Economist Group (TEG), we are committed to fostering progress through innovation, independence, and analytical rigor. Our mission is to empower individuals and organizations to navigate the complex challenges and changes in the world. With our analytical expertise and evidence-based insights, we provide clarity and guidance to our clients and subscribers across 170 countries through our esteemed brands, including The Economist, Economist Impact, Economist Intelligence, and Economist Education. We are currently on the lookout for a dedicated and detail-oriented Site Reliability Engineer to join our expanding TechOps/SRE team. In this vital role, you will engage closely with Product, Engineering, and Software teams to enhance real-time visibility into our infrastructure, applications, and data systems.
Join Saviynt as a Staff Site Reliability Engineer, where you will play a critical role in enhancing the reliability and performance of our systems. You will work collaboratively with cross-functional teams to ensure seamless operations and create innovative solutions that drive efficiency.
Roku’s Platform Infrastructure team supports the systems behind one of the largest TV streaming platforms, serving over 100 million users and enabling billions in annual transactions. The team’s work underpins the reliability and performance of Roku’s services across the U.S., Canada, and Mexico. Role overview: The Senior Software Engineer - Site Reliability Engineering position is based in Bengaluru, India. This role centers on applying SRE principles to maintain and improve cloud infrastructure and drive automation across Roku’s platform. The team works with technologies such as Kubernetes, Istio, Envoy, and various observability tools to operate at internet scale. What you will do: Design and develop large-scale, reliable systems that support Roku’s streaming platform. Apply SRE best practices to enhance system reliability and efficiency. Automate infrastructure management and operational tasks. Collaborate with engineers across departments to deliver solutions that impact the entire company. Requirements: Significant experience in Site Reliability Engineering or related software engineering roles. Proven ability to design and build large-scale systems. Hands-on experience with cloud infrastructure and automation. Familiarity with technologies such as Kubernetes, Istio, Envoy, and observability tools. Strong organizational skills, curiosity, and a drive to learn.
Okta, Inc. helps organizations manage identity securely in a rapidly changing landscape. The technical operations team is dedicated to keeping systems available and resilient, with a strong focus on automation and reliability. This Site Reliability Engineering Manager position is based in Bengaluru. The role leads a team of SREs responsible for maintaining and improving Okta’s core infrastructure. Success in this position requires a hands-on leader who values automation, learns quickly, and is committed to both reliability and security. What you will do: Mentor, manage, and guide a diverse team of SREs. Promote security best practices and drive projects that strengthen Okta’s infrastructure security. Respond to production incidents, resolve issues rapidly, and find ways to prevent future problems. Diagnose and troubleshoot complex production issues to maintain system reliability and performance. Collaborate with stakeholders across Okta to ensure new capabilities meet goals for reliability, security, and delivery speed. Work with recruiting and HR to help attract and retain top SRE talent. Monitor key metrics such as vulnerability scans, security posture, cloud costs, recovery point objectives (RPO), recovery time objectives (RTO), and toil overhead, making sure projects improve these measures. Support a 24/7 online environment as part of an on-call rotation. What sets you apart: Proactive mindset: identify and resolve problems as they arise. Commitment to helping engineering peers grow, leading by example. Extensive experience managing teams in large-scale production environments, especially with Java/Tomcat and containerized services on AWS (such as EC2, ECS, KMS, Kinesis, RDS) or similar cloud platforms.
Why Choose Headout? We’re a Rocketship: 9-Figure Revenue, Record Growth, and Profitable. With an impressive revenue of $130M and a presence in over 100 cities, Headout has achieved 18 months of profitability, making it the fastest-growing marketplace in the travel industry. We’ve secured over $60M from leading investors and are committed to building a sustainable business for the long term. Our growth story is just beginning! Our Mission Matters: In today's digital age, enhancing our human experiences is crucial. At Headout, we aim to provide the easiest, quickest, and most enjoyable way to explore real-life experiences. From immersive tours to museums and live events, we cover it all. Why Join Us Now? With a solid foundation and tremendous potential ahead, this is an exciting time to join Headout. Having reached profitability and gained momentum, we have only just begun to build. If you're seeking a role where your contributions will make a significant impact, now is the perfect time to join our team! Our Culture: Revolutionizing the travel industry is challenging but incredibly rewarding. We value ownership, craftsmanship, and impact, and we're dedicated to doing the best work of our careers. If you're a builder who thrives on solving complex problems, you'll fit right in. Discover more about our unique values here. The Role: As a Senior Site Reliability Engineer, you will oversee infrastructure management, working with Kubernetes clusters in the cloud, and optimizing workloads. Your responsibilities will include managing CI/CD pipelines, developing reusable workflows using GitHub Actions (or similar tools), conducting canary releases, and enhancing observability. You will design service-level dashboards, fine-tune alerts, and handle incident management across the organization. Additionally, you will enhance application performance through backend changes to optimize API and page performance, improve database efficiency, and eliminate bottlenecks.
You will also contribute to platform tools by architecting scalable and efficient platforms for cross-pod use cases and improve developer velocity by building tools and workflows that enhance efficiency across engineering teams. Security responsibilities will include establishing guardrails on...
Join Point72 as a Site Reliability Engineer in Bengaluru, where you will play a pivotal role in ensuring the reliability, performance, and scalability of our systems. You will collaborate with talented engineers to automate processes, monitor system health, and implement best practices in site reliability engineering.
Role Overview: Nexthink is hiring a Senior Site Reliability Engineer in Bengaluru. This position focuses on improving the reliability, performance, and scalability of Nexthink’s systems to support smooth client operations. What You Will Do: Work with teams across the company to design and build solutions that strengthen system stability. Monitor systems and respond to incidents to minimize downtime and disruptions. Contribute to ongoing efforts that keep our infrastructure resilient and efficient. Location: This role is based in Bengaluru.
Welcome to Okta: At Okta, we are redefining the future of digital identity. As The World’s Identity Company, we empower individuals to securely access technology, anytime, anywhere, across any device or application. Our innovative solutions, including the Okta and Auth0 Platforms, provide robust access management, secure authentication, and automation, placing identity at the forefront of business security and growth. We value diverse perspectives and experiences, seeking lifelong learners who can enrich our team with their unique insights. Join us in our mission to create a world where identity is truly yours. Our Workforce Identity Cloud Security Engineering group is on the lookout for a Senior Staff Site Reliability Engineer with a strong passion for DevSecOps, Infrastructure Security, and Site Reliability Engineering (SRE). You will be a part of a pioneering team that is not only delivering exceptional solutions but also setting new benchmarks in cloud security. If you possess a solid background in safeguarding large-scale, mission-critical infrastructure, we want to connect with you. As a Senior Staff Site Reliability Engineer, you will be instrumental in designing and developing security solutions that fortify our cloud infrastructure. We foster a culture of innovation, encouraging you to advocate for defense-in-depth strategies, adhere to industry security standards, and implement the principle of least privilege to elevate our security posture. Our Infrastructure Security team is distinguished by its unique blend of security expertise and the ability to design, implement, and deploy infrastructure across various cloud environments without compromising product performance. We are dedicated to enhancing our customers' safety and privacy by integrating security services with core Okta products. This role is critical in a dynamic, security-focused organization poised for substantial growth.
You will serve as a liaison between the Security and Engineering teams, leveraging technical expertise to influence the security roadmap and focus on engineering security aspects across our services. Join us in revolutionizing the industry and making a significant impact!
About EarnIn: At EarnIn, we are trailblazers in earned wage access, dedicated to creating solutions that provide immediate financial flexibility for individuals facing the challenge of living paycheck to paycheck. Our community members enjoy the freedom to access their earnings as they are earned, with opportunities to spend, save, and enhance their financial future without the burden of fees, interest, or credit checks. Our leadership team boasts a wealth of experience, backed by prestigious investors such as A16Z, Matrix Partners, DST, and Ribbit Capital, alongside a robust core business poised for significant growth. We are on an exciting trajectory and are eager to welcome world-class talent to join us in shaping our future. POSITION SUMMARY: We are committed to delivering an exceptional product experience for our community members. Collaborating closely with all teams, we share the responsibility of rapidly delivering production-ready features. Our focus includes building and contributing to infrastructure, reliability tools, and best practices that enable swift and safe deployments. We emphasize aspects such as effective alert management, comprehensive runbooks, clear Service Level Objectives (SLOs), and ensuring that deployments are seamless and uneventful. As a Senior Site Reliability Engineer, you will serve as a technical leader, designing, monitoring, and operating our production systems. Your attention will be on the overall service behavior, including reliability, performance, failure modes, and enhancing the engineering experience. This role is hybrid, based in our Bengaluru office, as part of our expanding operations. EarnIn offers a comprehensive benefits package, including healthcare, internet and cell phone reimbursements, a learning and development stipend, and opportunities for collaboration and travel to our Palo Alto HQ and Bangkok site. Our salary ranges are determined based on role, level, and location.
Join our team as a Staff Site Reliability Engineer dedicated to improving the reliability and performance of our systems. Collaborate with cross-functional teams to develop and implement innovative solutions that enhance our infrastructure and service offerings. Your expertise will help us create a robust platform that meets the needs of our users while ensuring scalable and efficient operations.
Join Saviynt as a Senior Site Reliability Engineer, where you will play a crucial role in ensuring the reliability, availability, and performance of our innovative solutions. You will collaborate with cross-functional teams to implement and maintain scalable infrastructure, automate deployments, and develop monitoring systems that enhance our operational resilience.
Join Saviynt as a Site Reliability Engineer and play a pivotal role in ensuring the reliability and performance of our cloud services. You will collaborate with development and operations teams to build and maintain scalable systems while implementing monitoring solutions to enhance system resilience. This position offers you the opportunity to contribute to a dynamic environment focused on continuous improvement and operational excellence.
Sandisk is seeking a Staff Engineer specializing in Software Reliability Engineering based in Bengaluru. The position focuses on enhancing the reliability and performance of software products throughout the company. Key responsibilities Work closely with cross-functional teams to identify software issues and develop solutions Drive initiatives that improve system reliability and scalability Support ongoing efforts to ensure Sandisk’s software remains dependable and efficient Role overview This role centers on maintaining and improving the quality of software systems. Collaboration and problem-solving are essential, as the work involves addressing technical challenges that impact software stability and user experience.
Join Valtech as a Site Reliability Engineer - Monitoring Specialist and be at the forefront of experience innovation. Here, we empower you to challenge the norm and explore uncharted territories in technology. With 6+ years of expertise, you will play a crucial role in shaping digital solutions that transform industries. Our workplace is designed for continuous learning and meaningful impact. You will collaborate with a dynamic team, develop cutting-edge customer experiences, and drive innovation. Why Choose Valtech? We are the experience innovation company and a trusted partner for the world’s leading brands. We offer growth opportunities, a values-driven culture, and global career paths that allow you to shape the future of experience.
About Us At The Economist Group (TEG), we are committed to fostering progress through innovation, independence, and analytical rigor. Our mission is to empower individuals and organizations to navigate the complex challenges and changes in the world. With our analytical expertise and evidence-based insights, we provide clarity and guidance to our clients and subscribers across 170 countries through our esteemed brands, including The Economist, Economist Impact, Economist Intelligence, and Economist Education. We are currently on the lookout for a dedicated and detail-oriented Site Reliability Engineer to join our expanding TechOps/SRE team. In this vital role, you will engage closely with Product, Engineering, and Software teams to enhance real-time visibility into our infrastructure, applications, and data systems.
Join Saviynt as a Staff Site Reliability Engineer, where you will play a critical role in enhancing the reliability and performance of our systems. You will work collaboratively with cross-functional teams to ensure seamless operations and create innovative solutions that drive efficiency.
Roku’s Platform Infrastructure team supports the systems behind one of the largest TV streaming platforms, serving over 100 million users and enabling billions in annual transactions. The team’s work underpins the reliability and performance of Roku’s services across the U.S., Canada, and Mexico. Role overview The Senior Software Engineer - Site Reliability Engineering position is based in Bengaluru, India. This role centers on applying SRE principles to maintain and improve cloud infrastructure and drive automation across Roku’s platform. The team works with technologies such as Kubernetes, Istio, Envoy, and various observability tools to operate at internet scale. What you will do Design and develop large-scale, reliable systems that support Roku’s streaming platform Apply SRE best practices to enhance system reliability and efficiency Automate infrastructure management and operational tasks Collaborate with engineers across departments to deliver solutions that impact the entire company Requirements Significant experience in Site Reliability Engineering or related software engineering roles Proven ability to design and build large-scale systems Hands-on experience with cloud infrastructure and automation Familiarity with technologies such as Kubernetes, Istio, Envoy, and observability tools Strong organizational skills, curiosity, and a drive to learn
Okta, Inc. helps organizations manage identity securely in a rapidly changing landscape. The technical operations team is dedicated to keeping systems available and resilient, with a strong focus on automation and reliability. This Site Reliability Engineering Manager position is based in Bengaluru. The role leads a team of SREs responsible for maintaining and improving Okta’s core infrastructure. Success in this position requires a hands-on leader who values automation, learns quickly, and is committed to both reliability and security. What you will do Mentor, manage, and guide a diverse team of SREs. Promote security best practices and drive projects that strengthen Okta’s infrastructure security. Respond to production incidents, resolve issues rapidly, and find ways to prevent future problems. Diagnose and troubleshoot complex production issues to maintain system reliability and performance. Collaborate with stakeholders across Okta to ensure new capabilities meet goals for reliability, security, and delivery speed. Work with recruiting and HR to help attract and retain top SRE talent. Monitor key metrics such as vulnerability scans, security posture, cloud costs, recovery point objectives (RPO), recovery time objectives (RTO), and toil overhead, making sure projects improve these measures. Support a 24/7 online environment as part of an on-call rotation. What sets you apart Proactive mindset: identify and resolve problems as they arise. Commitment to helping engineering peers grow, leading by example. Extensive experience managing teams in large-scale production environments, especially with Java/Tomcat and containerized services on AWS (such as EC2, ECS, KMS, Kinesis, RDS) or similar cloud platforms.
Why Choose Headout? We’re a Rocketship: 9-Figure Revenue, Record Growth, and Profitable With an impressive revenue of $130M and a presence in over 100 cities, Headout has achieved 18 months of profitability, making it the fastest-growing marketplace in the travel industry. We’ve secured over $60M from leading investors and are committed to building a sustainable business for the long term. Our growth story is just beginning! Our Mission Matters In today's digital age, enhancing our human experiences is crucial. At Headout, we aim to provide the easiest, quickest, and most enjoyable way to explore real-life experiences—from immersive tours to museums and live events, we cover it all. Why Join Us Now? With a solid foundation and tremendous potential ahead, this is an exciting time to join Headout. Having reached profitability and gained momentum, we have only just begun to build. If you're seeking a role where your contributions will make a significant impact, now is the perfect time to join our team! Our Culture Revolutionizing the travel industry is challenging but incredibly rewarding. We value ownership, craftsmanship, and impact, and we're dedicated to doing the best work of our careers. If you're a builder who thrives on solving complex problems, you'll fit right in. Discover more about our unique values here. The Role As a Senior Site Reliability Engineer, you will oversee infrastructure management, working with Kubernetes clusters in the cloud, and optimizing workloads. Your responsibilities will include managing CI/CD pipelines, developing reusable workflows using GitHub Actions (or similar tools), conducting canary releases, and enhancing observability. You will design service-level dashboards, fine-tune alerts, and handle incident management across the organization. Additionally, you will enhance application performance through backend changes to optimize API and page performance, improve database efficiency, and eliminate bottlenecks.
You will also contribute to platform tools by architecting scalable and efficient platforms for cross-pod use cases and improve developer velocity by building tools and workflows that enhance efficiency across engineering teams. Security responsibilities will include establishing guardrails on...
Join Point72 as a Site Reliability Engineer in Bengaluru, where you will play a pivotal role in ensuring the reliability, performance, and scalability of our systems. You will collaborate with talented engineers to automate processes, monitor system health, and implement best practices in site reliability engineering.
Role Overview Nexthink is hiring a Senior Site Reliability Engineer in Bengaluru. This position focuses on improving the reliability, performance, and scalability of Nexthink’s systems to support smooth client operations. What You Will Do Work with teams across the company to design and build solutions that strengthen system stability. Monitor systems and respond to incidents to minimize downtime and disruptions. Contribute to ongoing efforts that keep our infrastructure resilient and efficient. Location This role is based in Bengaluru.
Welcome to Okta At Okta, we are redefining the future of digital identity. As The World’s Identity Company, we empower individuals to securely access technology, anytime, anywhere, across any device or application. Our innovative solutions, including the Okta and Auth0 Platforms, provide robust access management, secure authentication, and automation, placing identity at the forefront of business security and growth. We value diverse perspectives and experiences, seeking lifelong learners who can enrich our team with their unique insights. Join us in our mission to create a world where identity is truly yours. Our Workforce Identity Cloud Security Engineering group is on the lookout for a Senior Staff Site Reliability Engineer with a strong passion for DevSecOps, Infrastructure Security, and Site Reliability Engineering (SRE). You will be a part of a pioneering team that is not only delivering exceptional solutions but also setting new benchmarks in cloud security. If you possess a solid background in safeguarding large-scale, mission-critical infrastructure, we want to connect with you. As a Senior Staff Site Reliability Engineer, you will be instrumental in designing and developing security solutions that fortify our cloud infrastructure. We foster a culture of innovation, encouraging you to advocate for defense-in-depth strategies, adhere to industry security standards, and implement the principle of least privilege to elevate our security posture. Our Infrastructure Security team is distinguished by its unique blend of security expertise and the ability to design, implement, and deploy infrastructure across various cloud environments without compromising product performance. We are dedicated to enhancing our customers' safety and privacy by integrating security services with core Okta products. This role is critical in a dynamic, security-focused organization poised for substantial growth.
You will serve as a liaison between the Security and Engineering teams, leveraging technical expertise to influence the security roadmap and focus on engineering security aspects across our services. Join us in revolutionizing the industry and making a significant impact!
About EarnIn At EarnIn, we are trailblazers in earned wage access, dedicated to creating solutions that provide immediate financial flexibility for individuals facing the challenge of living paycheck to paycheck. Our community members enjoy the freedom to access their earnings as they are earned, with opportunities to spend, save, and enhance their financial future without the burden of fees, interest, or credit checks. Our leadership team boasts a wealth of experience, backed by prestigious investors such as A16Z, Matrix Partners, DST, and Ribbit Capital, alongside a robust core business poised for significant growth. We are on an exciting trajectory and are eager to welcome world-class talent to join us in shaping our future. POSITION SUMMARY We are committed to delivering an exceptional product experience for our community members. Collaborating closely with all teams, we share the responsibility of rapidly delivering production-ready features. Our focus includes building and contributing to infrastructure, reliability tools, and best practices that enable swift and safe deployments. We emphasize aspects such as effective alert management, comprehensive runbooks, clear Service Level Objectives (SLOs), and ensuring that deployments are seamless and uneventful. As a Senior Site Reliability Engineer, you will serve as a technical leader, designing, monitoring, and operating our production systems. Your attention will be on the overall service behavior, including reliability, performance, failure modes, and enhancing the engineering experience. This role is hybrid, based in our Bengaluru office, as part of our expanding operations. EarnIn offers a comprehensive benefits package, including healthcare, internet and cell phone reimbursements, a learning and development stipend, and opportunities for collaboration and travel to our Palo Alto HQ and Bangkok site. Our salary ranges are determined based on role, level, and location.
Join our team as a Staff Site Reliability Engineer dedicated to improving the reliability and performance of our systems. Collaborate with cross-functional teams to develop and implement innovative solutions that enhance our infrastructure and service offerings. Your expertise will help us create a robust platform that meets the needs of our users while ensuring scalable and efficient operations.
Join Saviynt as a Senior Site Reliability Engineer, where you will play a crucial role in ensuring the reliability, availability, and performance of our innovative solutions. You will collaborate with cross-functional teams to implement and maintain scalable infrastructure, automate deployments, and develop monitoring systems that enhance our operational resilience.
Join Saviynt as a Site Reliability Engineer and play a pivotal role in ensuring the reliability and performance of our cloud services. You will collaborate with development and operations teams to build and maintain scalable systems while implementing monitoring solutions to enhance system resilience. This position offers you the opportunity to contribute to a dynamic environment focused on continuous improvement and operational excellence.
Mar 11, 2026