Team Lead Site Reliability Engineering Storage Layer Service jobs in Boston – Browse 1,126 openings on RoboApply Jobs

Team Lead Site Reliability Engineering Storage Layer Service jobs in Boston

Open roles matching “Team Lead Site Reliability Engineering Storage Layer Service” with location signals for Boston. 1,126 active listings on RoboApply Jobs.

1 - 20 of 1,126 Jobs

MongoDB, Inc.
Full-time|On-site|Boston; Charlotte; New York City; Philadelphia; Pittsburgh; Washington DC

Join MongoDB as a Team Lead for our Site Reliability Engineering (SRE) team focused on the Storage Layer Service. In this pivotal role, you will drive the reliability, availability, and performance of MongoDB's storage systems, collaborating closely with cross-functional teams to enhance our infrastructure and ensure optimal service delivery.

Mar 25, 2026

InterSystems
Full-time|$87K/yr - $116K/yr|On-site|Boston, MA

Join our dynamic Managed Services team as a Major Incident Lead – Site Reliability. In this pivotal role, you will spearhead the management of high-severity incidents that impact our customers across InterSystems' managed services platforms. As the Incident Commander, you will be responsible for ensuring swift service restoration, effective communication with stakeholders, and coordinated efforts across Site Reliability Engineering (SRE), engineering, support, cloud, and service delivery teams. Working within an SRE-aligned service model, your primary focus will be on preserving service reliability by utilizing service level indicators and objectives. You will prioritize minimizing customer impact over root cause analysis during live incidents. In addition to incident management, you will lead post-incident reviews, transforming operational setbacks into quantifiable reliability enhancements and preventing future occurrences. This role is essential for upholding customer trust, platform resilience, and operational excellence in a 24/7, mission-critical, and highly regulated environment.

Feb 11, 2026

dev2
Full-time|On-site|Boston

As a Site Reliability Engineer at dev2, you will play a crucial role in ensuring the reliability and performance of our services. You will work closely with development and operations teams to build and maintain scalable systems, troubleshoot issues, and implement best practices in reliability engineering. Your expertise will help us deliver exceptional service and maintain our commitment to quality.

Dec 11, 2023

Site Reliability Engineer

Beacon Biosignals

Full-time|Remote|Boston, MA - Remote

Role Overview
Beacon Biosignals is hiring a Site Reliability Engineer. This role focuses on improving the reliability and performance of the company’s systems. The position is open to candidates in Boston, MA or remote locations.

What You Will Do
- Work with teams across engineering, product, and operations to support scalable infrastructure.
- Design, implement, and maintain systems that prioritize uptime and smooth user experiences.
- Help ensure high availability for Beacon Biosignals’ platforms and services.

Apr 17, 2026

MongoDB
Full-time|$127K/yr - $249K/yr|Remote|Boston; Miami; New York City; Pittsburgh; Raleigh; United States

Join MongoDB’s innovative Storage Layer Services (SLS) team as we redefine the MongoDB cloud storage layer. This dynamic team is at the forefront of developing high-performance, multi-tenant distributed storage solutions that not only enhance our existing Atlas storage framework but also empower our customers' workloads to operate with remarkable efficiency. In this pivotal role, you will collaborate closely with teams dedicated to building these storage services, defining Service Level Objectives (SLOs), shaping capacity plans, and ensuring the reliability, durability, and operational safety of the foundational storage layer that supports Atlas. As one of the founding members of this small but experienced team of Site Reliability Engineers (SREs), you will play a vital role in executing a multi-year vision for MongoDB’s cloud storage architecture. This position offers flexibility in location, allowing you to work from our offices in Boston, New York City, Raleigh, Miami, or Pittsburgh, or remotely from anywhere in the United States, provided you are based in the Eastern or Central time zones.

Apr 8, 2026

Anduril Industries
Full-time|$166K/yr - $220K/yr|On-site|Boston, Massachusetts, United States

Anduril Industries is at the forefront of defense technology, dedicated to revolutionizing military capabilities for the U.S. and its allies through cutting-edge innovations. By integrating the expertise, technology, and business models from the most pioneering companies of the 21st century into the defense sector, Anduril is transforming the design, construction, and sale of military systems. Our advanced family of systems is driven by Lattice OS, an AI-enhanced operating system that synthesizes vast data streams into real-time, 3D command and control environments. In this era of strategic competition, we are committed to delivering state-of-the-art autonomy, AI, computer vision, sensor fusion, and networking technologies to the military in a matter of months rather than years.

ABOUT THE TEAM
The Corporate Technology Engineering team plays a crucial role in developing and enhancing the various systems that empower Anduril to achieve its mission. Our technology solutions are vital for the supply chain, accounting, sales and growth, engineering, modeling and simulation, field maintenance, manufacturing, and more. We collaborate across the organization to ensure that our teams have the necessary tools and capabilities for mission success.

ABOUT THE JOB:
We are in search of an experienced Senior Site Reliability Engineer to join our dynamic team. In this role, you will be responsible for the design, deployment, scaling, and maintenance of the pivotal infrastructure that supports our systems. You will engage with a diverse array of stakeholder teams to facilitate swift and secure progress on their respective technology roadmaps.

WHAT YOU'LL DO:
- Provision, manage, and scale intricate infrastructure for all Business Systems.
- Continuously optimize and refine CI/CD pipelines to improve the efficiency, reliability, and speed of software delivery.
- Promote a culture of observability and reliability, advocating for best practices and tools that enhance system visibility and resilience.
- Collaborate with cross-functional engineering teams to understand their needs and translate them into effective cloud solutions using industry best practices.
- Possess a deep understanding of the company’s business goals and objectives to design and implement infrastructure solutions that align with them.
- Strengthen systems and evaluate workload demands, planning resource capacity to guarantee optimal performance and cost-effectiveness.

Mar 31, 2026

Axon Enterprise, Inc.
Full-time|$134.3K/yr - $214.8K/yr|Hybrid|Boston, Massachusetts, United States

Become a Catalyst for Positive Change at Axon.

At Axon, our mission is to Protect Life. We are innovators dedicated to addressing society's most pressing safety and justice challenges through our suite of devices and cloud software solutions. Collaboration is at the heart of our success; we engage with transparency and empathy, valuing diverse perspectives from our customers, communities, and each other.

Working at Axon is dynamic, rewarding, and impactful. Here, you will take the lead and create substantial change while continually evolving in your role at a company that values your contributions.

Your Contribution
As a Senior Site Reliability Engineer in the APX SRE organization, you will be instrumental in implementing efficient, scalable solutions that enhance the reliability and performance of our global cloud-native Kubernetes platform and its services. You are passionate about maintaining system stability, producing clear documentation, and developing tools that enrich the developer experience.

Location: This position is located in our Boston, MA office, with a hybrid working model. We encourage in-person collaboration from Tuesday to Friday, allowing for remote work on Mondays unless otherwise accommodated. We believe that strong connections drive innovation, and our office culture is designed to promote meaningful teamwork, mentorship, and collective achievement.

Mar 27, 2026

Axon
Full-time|$134.3K/yr - $214.8K/yr|Hybrid|Boston, Massachusetts, United States

Become a Force for Good at Axon.

At Axon, we are dedicated to our mission of protecting life. We tackle society's most pressing safety and justice challenges through our innovative ecosystem of devices and cloud software. Collaboration is at the heart of what we do; we connect with transparency and empathy, valuing diverse perspectives from our customers, communities, and team members.

Life at Axon is dynamic, challenging, and impactful. Here, you will take initiative and make a real difference. Continuously evolve as you contribute to a mission that matters at a company where your contributions are valued.

Your Impact
As a Senior Site Reliability Engineer within the APX SRE CloudOps team, you will architect and build the cloud infrastructure and automation platforms critical to Axon's product engineering teams. You will design solutions for multi-cloud environments (Azure, AWS), ensure FedRAMP compliance, and oversee large-scale Kubernetes platforms managing production workloads across various regions. A significant aspect of your role will involve coding: developing services, APIs, and internal tools using languages like Go and Python. Additionally, you will participate in on-call rotations and incident response, leveraging operational insights to enhance reliability and guide platform investments. This position merges software engineering expertise with cloud architecture at scale and production ownership.

Location: This role is based in our Atlanta, Seattle, or Boston office and operates on a hybrid schedule. We prioritize in-person collaboration, requiring team members to work on-site from Tuesday to Friday, with the option to work remotely on Mondays, unless a workplace accommodation is approved. We believe that connection fosters innovation, and our in-office culture is designed to promote meaningful teamwork, mentorship, and shared success.

Apr 10, 2026

Axon
Full-time|$180K/yr - $225K/yr|Hybrid|Boston, Massachusetts, United States

Become a Force for Good at Axon.

At Axon, our mission is to protect life through innovative solutions that address society's most pressing safety and justice challenges. We are a team of explorers, working collaboratively to develop a comprehensive ecosystem of devices and cloud-based software. We value connection, transparency, and diverse perspectives from our customers, communities, and each other.

Life at Axon is both fast-paced and rewarding. Here, you will take charge and make a meaningful impact while continuously growing in a mission-driven environment that values your contributions.

Your Impact
As a Senior Site Reliability Engineer, you will play a pivotal role in shaping how Axon constructs and manages its core platforms, specifically focusing on Zero Touch (a compliant execution framework) and the surrounding identity and security infrastructures. Instead of manually provisioning infrastructure or managing tickets, you will design and develop the platforms, tools, and policies that empower hundreds of engineers to operate safely and efficiently at scale.

Your expertise in infrastructure and platform engineering, along with your extensive experience in distributed systems, will guide your efforts toward automation, self-service, and enforcing best practices. You will prioritize APIs, workflows, and standardized processes over manual tasks, ensuring security, identity, and compliance are fundamental to your work, especially in regulated environments where precision and traceability are paramount.

This position is highly collaborative, requiring you to work alongside senior engineers across product and platform teams to enhance the building, deployment, security, and operation of Axon’s cloud systems within a modern, AI-driven landscape.

Location - This role is situated in our Boston office on a hybrid schedule. We emphasize in-person collaboration, with team members expected to work onsite from Tuesday to Friday, while enjoying the flexibility to work remotely on Mondays, unless a workplace accommodation has been approved. We believe that connections inspire innovation, and our office culture is designed to promote teamwork, mentorship, and collective success.

Apr 10, 2026

Xometry
Full-time|On-site|Boston, MA

Join Xometry as a Site Reliability Engineer II (SRE) and be part of a dynamic team that drives innovation in the realm of automated manufacturing solutions. In this role, you will ensure the reliability, availability, and performance of our systems while collaborating closely with other engineering teams.

Mar 21, 2026

Veeva Systems Inc.
Full-time|Remote|Massachusetts - Boston

Join Veeva Systems, a groundbreaking organization at the forefront of the industry cloud, dedicated to accelerating the delivery of therapies to patients worldwide. As one of the fastest-growing SaaS companies in history, we achieved over $2 billion in revenue last fiscal year, with abundant growth opportunities on the horizon.

At Veeva, we operate based on our core values: Do the Right Thing, Customer Success, Employee Success, and Speed. In 2021, we made history by becoming a public benefit corporation (PBC), committed to balancing the needs of our customers, employees, society, and investors.

As a Work Anywhere company, we empower you to choose your ideal work environment, whether from home or in the office, to help you thrive.

Be a part of our mission to transform the life sciences industry and positively impact our customers, employees, and communities.

The Role
We are seeking a talented Senior Site Reliability Engineer to join our Vault Platform team. In this role, you will be responsible for ensuring the scalability and reliability of our enterprise applications. You will face complex challenges on a global scale, leveraging your deep knowledge of Java and modern open-source technologies to make a significant impact on our production systems.

Ideal candidates will have extensive experience working with Java applications and the latest open-source technologies, preferably gained in enterprise software development or a rapidly growing tech environment. As a Senior SRE, you will need to be innately curious and possess strong problem-solving skills. Additionally, you will bring a unique engineering perspective, understanding how systems integrate in production to function at a global scale for hundreds of customers across North America, Europe, and Asia.

Oct 7, 2025

Veeva Systems Inc.
Full-time|Hybrid|Massachusetts - Boston

At Veeva Systems, we are dedicated to our mission and are recognized as trailblazers in the industry cloud, empowering life sciences companies to expedite the delivery of therapies to patients. As one of the fastest-growing SaaS companies in history, we achieved over $2 billion in revenue in our previous fiscal year, with immense growth opportunities on the horizon.

Our core values (Do the Right Thing, Customer Success, Employee Success, and Speed) are the foundation of our culture. Distinctively, we made history in 2021 by becoming a public benefit corporation (PBC), committed to balancing the interests of our customers, employees, society, and investors.

As a Work Anywhere company, we offer the flexibility to choose between working from home or in the office, allowing you to thrive in your preferred environment.

Join us in our mission to transform the life sciences industry and make a positive impact on our customers, employees, and communities.

The Role
We are seeking a talented Senior Site Reliability Engineer to join our Vault Platform team. In this role, you will be pivotal in ensuring the scalability and reliability of our enterprise applications. You will face complex challenges on a global scale, leveraging your extensive knowledge of Java and modern open-source technologies to significantly enhance our production systems.

The ideal candidate will possess substantial experience with Java applications and the latest open-source technologies, particularly from enterprise software development or high-growth technology firms. As a Senior SRE, you should be naturally inquisitive and possess exceptional problem-solving skills. You will bring a unique engineering mindset, comprehending how systems integrate in production to function seamlessly for hundreds of customers across North America, Europe, and Asia.

Oct 7, 2025

PathAI
Full-time|$165.8K/yr - $224.4K/yr|Hybrid|Boston, MA or Remote

Who We Are
At PathAI, we are dedicated to revolutionizing patient outcomes through the power of AI-driven pathology. Our commitment to advancing traditional pathology methodologies into innovative technologies is at the forefront of our mission. By leveraging these advancements, we aim to expedite drug development, enhance diagnostic accuracy, and deliver life-saving treatments to patients with urgency. Join our diverse and talented team, united in solving intricate challenges and making a substantial impact in healthcare.

Where You Fit
We are seeking a highly skilled Senior Staff Site Reliability Engineer who will play a pivotal role in designing, constructing, and managing our hybrid cloud and on-premises environment.

What You’ll Do
In this role, you will harness your extensive skills and develop new ones as you:
- Elevate our operational practices by implementing Site Reliability Engineering (SRE) best practices focused on user satisfaction, monitoring, and automation.
- Engineer robust infrastructure patterns for our cloud environments using Amazon Web Services, emphasizing security, reliability, and scalability.
- Design, construct, and manage our data center to support our rapidly expanding Machine Learning team.
- Integrate on-premises datacenter environments with our existing cloud infrastructure to create a seamless hybrid cloud solution.
- Enhance the reliability and resilience of our infrastructure through thorough root-cause analysis and identifying design gaps.
- Engage in platform on-call rotations and provide assistance during critical incident responses.

Jan 20, 2026

CGS Federal
Full-time|Hybrid|Boston, MA

Cloud and Storage Engineer
Employment Type: Full-Time, Experienced
Department: Information Technology

Join CGS Federal as a Cloud and Storage Engineer, where you will take the lead in designing and implementing comprehensive Storage Area Network (SAN) architecture for a robust EMC-based SAN infrastructure serving a major federal agency. We are looking for a dynamic individual who thrives in a fast-paced setting and can establish standards and processes for storage configuration, optimization, replication, and security.

At CGS Federal, we unite dedicated, skilled, and innovative minds to tackle the government’s most pressing challenges with advanced technology. We are in search of candidates who are eager to enhance government innovation, value teamwork, and can foresee the needs of others. Our environment promotes support and encourages professional development through various learning opportunities.

Key Responsibilities:
- Design and implement a full-scale Storage Area Network (SAN) architecture for a complex, multi-enterprise EMC-based solution involving multiple servers and platforms connected via fiber channel to numerous EMC Storage Arrays with petabytes of capacity.
- Ensure the SAN architecture addresses performance, capacity, replication, disaster recovery, and backup storage requirements.
- Define strategic directions for the SAN and backup environment.
- Translate customer storage needs into a comprehensive enterprise solution, overseeing development, installation, configuration, and deployment.
- Manage overall storage and backup capacity planning and utilization.

Mar 3, 2026

Motional
Full-time|$240K/yr - $330K/yr|On-site|Boston, Massachusetts, United States

Mission Overview: As a Principal Engineer and Technical Lead at Motional, you will spearhead a dynamic team dedicated to creating and implementing advanced trajectory planning algorithms through numerical optimization for autonomous vehicles. You will play a pivotal role in designing the motion planning and control systems that empower our fleet to navigate through intricate environments with safety, efficiency, and comfort. As a vital leader, you will not only enhance the development and performance of innovative trajectory generation algorithms but also collaborate with multidisciplinary teams including Planning, Perception, Prediction, Localization, Machine Learning Products, and Systems Engineering. Additionally, you will mentor a skilled group of engineers to deliver cutting-edge solutions. If you are enthusiastic about autonomous driving and wish to contribute to our commercial launch this year, enjoy tackling challenging real-world problems, and are eager to make a profound impact in a rapidly evolving domain, we invite you to connect with us!

Your Responsibilities:
- Lead and expand a high-impact trajectory planning team, defining its technical vision, execution strategy, and long-term organizational role within the Motion organization.
- Prepare for commercial launch readiness by enhancing on-road driving behavior (safety, comfort, assertiveness) across the fleet through feature expansion, continuous performance improvements, and innovative methodologies.
- Utilize a systematic, metrics-driven approach to measure and enhance on-road performance.
- Advocate for the use of existing training and evaluation pipelines to ensure scalable, production-ready delivery.
- Oversee the computational efficiency and real-time performance of trajectory generation algorithms, ensuring compliance with latency budgets.
- Direct the swift triage and root-cause analysis of critical fleet incidents, translating real-world edge cases into immediate algorithmic enhancements and validation sets.
- Manage the successful execution of complex, multi-team projects, ensuring technical alignment from design to deployment while upholding a rigorous standard for testing and validation to meet the high safety benchmarks required for our commercial launch.
- Collaborate with peer leads to provide technical leadership, making crucial decisions regarding architectural direction, strategic investments, tactical execution, and technical debt reduction.
- Ensure technical excellence through thorough design and code reviews, while actively fostering the professional growth and mentorship of your engineering team.
- Set and pursue ambitious goals that inspire the team to deliver industry-leading trajectory solutions.

Mar 31, 2026

Public Storage
Full-time|On-site|Boston

Public Storage is hiring a Customer Service Manager to oversee operations at self-storage facilities in Boston. This role shapes the customer experience and ensures the company’s standards for service and facility care are met every day.

Key responsibilities
- Supervise the on-site customer service team and guide daily operations
- Handle customer inquiries and resolve concerns quickly
- Maintain a welcoming, safe, and secure environment across the facility
- Monitor property condition and appearance to meet company expectations

What matters here
Success in this position means taking pride in a clean, secure, and friendly atmosphere. Public Storage values team members who help deliver reliable service and a positive experience for every customer.

Apr 21, 2026

Motional
Full-time|$240K/yr - $330K/yr|Remote|Boston, Massachusetts, United States; Remote U.S.

Principal Engineer and Team Lead - Motion Planning

Join our innovative Planning team at Motional, where your enthusiasm for robotics and software development will contribute to the advancement of production-ready autonomous vehicles. As the Principal Engineer and Team Lead, you will oversee a talented team dedicated to designing and implementing state-of-the-art motion planning software for autonomous vehicles. This pivotal role requires you to lead the definition of motion planning and control systems that ensure our fleet operates safely, efficiently, and comfortably in complex environments. Your leadership will be instrumental in developing cutting-edge motion planning algorithms, working in collaboration with cross-functional teams (Localization, Perception, Prediction, Controls, ML Products, and Systems Engineering), and mentoring a diverse group of engineers to deliver groundbreaking solutions. If you are driven by a passion for autonomous driving, enjoy tackling challenging real-world problems, and are eager to make a significant impact in a rapidly evolving industry, we would love to hear from you.

Mar 31, 2026

tsmg
Contract|On-site|Boston, MA

The Operations Team Lead at tsmg plays a crucial role in managing daily operational activities and leading the team of Drivers to ensure optimal performance. Reporting directly to the Country Manager, the Ops Lead is responsible for providing guidance, resolving issues, and facilitating communication within the team to enhance productivity and efficiency.

May 7, 2024

AMCS Group
Full-time|Hybrid|Boston

IT Operations Team Lead (Hybrid – Boston)

Are you ready to elevate your career?

We are seeking a dynamic and hands-on IT Operations Team Lead to guide a team of engineers while remaining deeply involved in technical aspects. You will serve as the escalation point for intricate issues, spearhead operational excellence, and oversee infrastructure and security projects across the EMEA region.

This position is ideal for a technical leader who relishes building, automating, troubleshooting, and enhancing platforms, while also mentoring others and elevating standards.

Why Choose AMCS?
At AMCS, we don't just provide jobs; we offer purpose-driven careers. Join a rapidly growing, pioneering company that is transforming resource management, waste reduction, and operational optimization across various industries. Our culture is centered on connection: connecting to our mission, our customers, and one another. We nurture a collaborative and entrepreneurial environment where your ideas and contributions truly matter.

Your Responsibilities
- Serve as the senior escalation point for complex incidents, focusing on root cause analysis, permanent fixes, and post-incident enhancements.
- Lead thorough root-cause analyses, define permanent solutions, and ensure that lessons learned are documented and integrated into process improvements.
- Take ownership of technically challenging issues, guiding them through the entire lifecycle from investigation to resolution and follow-up.
- Oversee technical workstreams for initiatives such as office relocations, network and firewall upgrades, cloud and hybrid enhancements, and integration of acquired companies.
- Collaborate closely with vendors, internal stakeholders, and other IT teams to implement changes safely and effectively.
- Enhance platform reliability and performance through proactive monitoring, automation, patching, and standardized operational processes.
- Work in partnership with the Information Systems team to support audit readiness and evidence capture in line with ISO27001, SOC, and internal control frameworks.
- Collaborate with the Cyber Security team to ensure security best practices are integrated across identity, endpoint, network, and infrastructure platforms.
- Contribute to the ongoing enhancement of security posture, resilience, and risk management.
- Provide daily technical direction and mentorship to engineers, helping them develop deeper expertise and confidence.
- Establish clear expectations regarding ownership, accountability, quality, and follow-through.
- Support individual development, succession planning, and preparedness for increased responsibility.
- Partner with IT Operations leadership on workload planning, risk management, and delivery priorities.

Qualifications We Seek
- Proficient in Microsoft 365 administration (Exchange Online, Teams, SharePoint/OneDrive).
- Experience with Azure fundamentals.

Apr 8, 2026

Tagup
Full-time|On-site|Boston, MA

Join Tagup, a pioneering defense technology firm established at MIT, as we revolutionize logistics superiority through cutting-edge AI solutions. We are rapidly expanding and seeking innovators who are driven to implement transformative technologies to tackle some of the most pressing challenges in high-stakes environments. This is a unique opportunity to contribute to vital work that enhances national security and supports the success of U.S. and allied forces. Be part of shaping the future of defense logistics for a safer world.

At Tagup, curiosity is an integral part of our culture. If you thrive on understanding complex systems, sharing knowledge, and learning from brilliant colleagues, you will feel right at home. Our team of engineers and data scientists is on a mission to enhance the safety, reliability, and efficiency of the machines and processes that drive the world. Our AI technology directly optimizes large-scale industrial equipment and logistics systems, ensuring top-tier performance for our clients.

We are looking for a Cloud/DevOps Engineer who can adopt a Site Reliability Engineering (SRE) approach to our platform: automate using Infrastructure as Code (IaC), orchestrate with Kubernetes, and optimize PostgreSQL-backed services for heightened performance and availability. You will develop secure, auditable CI/CD pipelines, enforce a least-privilege access model by default, and maintain compliance across diverse, multi-region environments.

Oct 29, 2025
