Qualifications
Proficient in programming languages such as Python, Go, or Java. Experience with cloud platforms like AWS, Azure, or GCP. Strong understanding of CI/CD pipelines and DevOps principles. Ability to monitor system performance and troubleshoot complex issues. Excellent communication and collaboration skills.
About the job
As a Site Reliability Engineer at dev2, you will play a crucial role in ensuring the reliability and performance of our services. You will work closely with development and operations teams to build and maintain scalable systems, troubleshoot issues, and implement best practices in reliability engineering. Your expertise will help us deliver exceptional service and maintain our commitment to quality.
About dev2
dev2 is a leading technology company headquartered in Boston, specializing in innovative software solutions that enhance operational efficiency. Our commitment to reliability and performance drives our success, and we are looking for talented individuals to join our dynamic team.
Full-time|$134.3K/yr - $214.8K/yr|Hybrid|Boston, Massachusetts, United States
Become a Catalyst for Positive Change at Axon. At Axon, our mission is to Protect Life. We are innovators dedicated to addressing society's most pressing safety and justice challenges through our suite of devices and cloud software solutions. Collaboration is at the heart of our success; we engage with transparency and empathy, valuing diverse perspectives from our customers, communities, and each other. Working at Axon is dynamic, rewarding, and impactful. Here, you will take the lead and create substantial change while continually evolving in your role at a company that values your contributions. Your Contribution: As a Senior Site Reliability Engineer in the APX SRE organization, you will be instrumental in implementing efficient, scalable solutions that enhance the reliability and performance of our global cloud-native Kubernetes platform and its services. You are passionate about maintaining system stability, producing clear documentation, and developing tools that enrich the developer experience. Location: This position is located in our Boston, MA office, with a hybrid working model. We encourage in-person collaboration from Tuesday to Friday, allowing for remote work on Mondays unless otherwise accommodated. We believe that strong connections drive innovation, and our office culture is designed to promote meaningful teamwork, mentorship, and collective achievement.
Full-time|$134.3K/yr - $214.8K/yr|Hybrid|Boston, Massachusetts, United States
Become a Force for Good at Axon. At Axon, we are dedicated to our mission of protecting life. We tackle society's most pressing safety and justice challenges through our innovative ecosystem of devices and cloud software. Collaboration is at the heart of what we do; we connect with transparency and empathy, valuing diverse perspectives from our customers, communities, and team members. Life at Axon is dynamic, challenging, and impactful. Here, you will take initiative and make a real difference. Continuously evolve as you contribute to a mission that matters at a company where your contributions are valued. Your Impact: As a Senior Site Reliability Engineer within the APX SRE CloudOps team, you will architect and build the cloud infrastructure and automation platforms critical to Axon's product engineering teams. You will design solutions for multi-cloud environments (Azure, AWS), ensure FedRAMP compliance, and oversee large-scale Kubernetes platforms managing production workloads across various regions. A significant aspect of your role will involve coding: developing services, APIs, and internal tools using languages like Go and Python. Additionally, you will participate in on-call rotations and incident response, leveraging operational insights to enhance reliability and guide platform investments. This position merges software engineering expertise with cloud architecture at scale and production ownership. Location: This role is based in our Atlanta, Seattle, or Boston office and operates on a hybrid schedule. We prioritize in-person collaboration, requiring team members to work on-site from Tuesday to Friday, with the option to work remotely on Mondays, unless a workplace accommodation is approved. We believe that connection fosters innovation, and our in-office culture is designed to promote meaningful teamwork, mentorship, and shared success.
Full-time|$166K/yr - $220K/yr|On-site|Boston, Massachusetts, United States
Anduril Industries is at the forefront of defense technology, dedicated to revolutionizing military capabilities for the U.S. and its allies through cutting-edge innovations. By integrating the expertise, technology, and business models from the most pioneering companies of the 21st century into the defense sector, Anduril is transforming the design, construction, and sale of military systems. Our advanced family of systems is driven by Lattice OS, an AI-enhanced operating system that synthesizes vast data streams into real-time, 3D command and control environments. In this era of strategic competition, we are committed to delivering state-of-the-art autonomy, AI, computer vision, sensor fusion, and networking technologies to the military in a matter of months rather than years. ABOUT THE TEAM: The Corporate Technology Engineering team plays a crucial role in developing and enhancing the various systems that empower Anduril to achieve its mission. Our technology solutions are vital for the supply chain, accounting, sales and growth, engineering, modeling and simulation, field maintenance, manufacturing, and more. We collaborate across the organization to ensure that our teams have the necessary tools and capabilities for mission success. ABOUT THE JOB: We are in search of an experienced Senior Site Reliability Engineer to join our dynamic team. In this role, you will be responsible for the design, deployment, scaling, and maintenance of the pivotal infrastructure that supports our systems. You will engage with a diverse array of stakeholder teams to facilitate swift and secure progress on their respective technology roadmaps. WHAT YOU'LL DO: Provision, manage, and scale intricate infrastructure for all Business Systems. Continuously optimize and refine CI/CD pipelines to improve the efficiency, reliability, and speed of software delivery. Promote a culture of observability and reliability, advocating for best practices and tools that enhance system visibility and resilience. Collaborate with cross-functional engineering teams to understand their needs and translate them into effective cloud solutions using industry best practices. Possess a deep understanding of the company’s business goals and objectives to design and implement infrastructure solutions that align with them. Strengthen systems and evaluate workload demands, planning resource capacity to guarantee optimal performance and cost-effectiveness.
Role Overview Beacon Biosignals is hiring a Site Reliability Engineer. This role focuses on improving the reliability and performance of the company’s systems. The position is open to candidates in Boston, MA or remote locations. What You Will Do Work with teams across engineering, product, and operations to support scalable infrastructure. Design, implement, and maintain systems that prioritize uptime and smooth user experiences. Help ensure high availability for Beacon Biosignals’ platforms and services.
Join Veeva Systems, a groundbreaking organization at the forefront of the industry cloud, dedicated to accelerating the delivery of therapies to patients worldwide. As one of the fastest-growing SaaS companies in history, we achieved over $2 billion in revenue last fiscal year, with abundant growth opportunities on the horizon. At Veeva, we operate based on our core values: Do the Right Thing, Customer Success, Employee Success, and Speed. In 2021, we made history by becoming a public benefit corporation (PBC), committed to balancing the needs of our customers, employees, society, and investors. As a Work Anywhere company, we empower you to choose your ideal work environment, whether from home or in the office, to help you thrive. Be a part of our mission to transform the life sciences industry and positively impact our customers, employees, and communities. The Role: We are seeking a talented Senior Site Reliability Engineer to join our Vault Platform team. In this role, you will be responsible for ensuring the scalability and reliability of our enterprise applications. You will face complex challenges on a global scale, leveraging your deep knowledge of Java and modern open-source technologies to make a significant impact on our production systems. Ideal candidates will have extensive experience working with Java applications and the latest open-source technologies, preferably gained in enterprise software development or a rapidly growing tech environment. As a Senior SRE, you will need to be innately curious and possess strong problem-solving skills. Additionally, you will bring a unique engineering perspective, understanding how systems integrate in production to function at a global scale for hundreds of customers across North America, Europe, and Asia.
Full-time|$180K/yr - $225K/yr|Hybrid|Boston, Massachusetts, United States
Become a Force for Good at Axon. At Axon, our mission is to protect life through innovative solutions that address society's most pressing safety and justice challenges. We are a team of explorers, working collaboratively to develop a comprehensive ecosystem of devices and cloud-based software. We value connection, transparency, and diverse perspectives from our customers, communities, and each other. Life at Axon is both fast-paced and rewarding. Here, you will take charge and make a meaningful impact while continuously growing in a mission-driven environment that values your contributions. Your Impact: As a Senior Site Reliability Engineer, you will play a pivotal role in shaping how Axon constructs and manages its core platforms, specifically focusing on Zero Touch (a compliant execution framework) and the surrounding identity and security infrastructures. Instead of manually provisioning infrastructure or managing tickets, you will design and develop the platforms, tools, and policies that empower hundreds of engineers to operate safely and efficiently at scale. Your expertise in infrastructure and platform engineering, along with your extensive experience in distributed systems, will guide your efforts toward automation, self-service, and enforcing best practices. You will prioritize APIs, workflows, and standardized processes over manual tasks, ensuring security, identity, and compliance are fundamental to your work, especially in regulated environments where precision and traceability are paramount. This position is highly collaborative, requiring you to work alongside senior engineers across product and platform teams to enhance the building, deployment, security, and operation of Axon’s cloud systems within a modern, AI-driven landscape. Location: This role is situated in our Boston office on a hybrid schedule. We emphasize in-person collaboration, with team members expected to work onsite from Tuesday to Friday, while enjoying the flexibility to work remotely on Mondays, unless a workplace accommodation has been approved. We believe that connections inspire innovation, and our office culture is designed to promote teamwork, mentorship, and collective success.
At Veeva Systems, we are dedicated to our mission and are recognized as trailblazers in the industry cloud, empowering life sciences companies to expedite the delivery of therapies to patients. As one of the fastest-growing SaaS companies in history, we achieved over $2 billion in revenue in our previous fiscal year, with immense growth opportunities on the horizon. Our core values (Do the Right Thing, Customer Success, Employee Success, and Speed) are the foundation of our culture. Distinctively, we made history in 2021 by becoming a public benefit corporation (PBC), committed to balancing the interests of our customers, employees, society, and investors. As a Work Anywhere company, we offer the flexibility to choose between working from home or in the office, allowing you to thrive in your preferred environment. Join us in our mission to transform the life sciences industry and make a positive impact on our customers, employees, and communities. The Role: We are seeking a talented Senior Site Reliability Engineer to join our Vault Platform team. In this role, you will be pivotal in ensuring the scalability and reliability of our enterprise applications. You will face complex challenges on a global scale, leveraging your extensive knowledge of Java and modern open-source technologies to significantly enhance our production systems. The ideal candidate will possess substantial experience with Java applications and the latest open-source technologies, particularly from enterprise software development or high-growth technology firms. As a Senior SRE, you should be naturally inquisitive and possess exceptional problem-solving skills. You will bring a unique engineering mindset, comprehending how systems integrate in production to function seamlessly for hundreds of customers across North America, Europe, and Asia.
Full-time|$127K/yr - $249K/yr|Remote|Boston; Miami; New York City; Pittsburgh; Raleigh; United States
Join MongoDB’s innovative Storage Layer Services (SLS) team as we redefine the MongoDB cloud storage layer. This dynamic team is at the forefront of developing high-performance, multi-tenant distributed storage solutions that not only enhance our existing Atlas storage framework but also empower our customers' workloads to operate with remarkable efficiency. In this pivotal role, you will collaborate closely with teams dedicated to building these storage services, defining Service Level Objectives (SLOs), shaping capacity plans, and ensuring the reliability, durability, and operational safety of the foundational storage layer that supports Atlas. As one of the founding members of this small but experienced team of Site Reliability Engineers (SREs), you will play a vital role in executing a multi-year vision for MongoDB’s cloud storage architecture. This position offers flexibility in location, allowing you to work from our offices in Boston, New York City, Raleigh, Miami, or Pittsburgh, or remotely from anywhere in the United States, provided you are based in the Eastern or Central time zones.
Full-time|$165.8K/yr - $224.4K/yr|Hybrid|Boston, MA or Remote
Who We Are: At PathAI, we are dedicated to revolutionizing patient outcomes through the power of AI-driven pathology. Our commitment to advancing traditional pathology methodologies into innovative technologies is at the forefront of our mission. By leveraging these advancements, we aim to expedite drug development, enhance diagnostic accuracy, and deliver life-saving treatments to patients with urgency. Join our diverse and talented team, united in solving intricate challenges and making a substantial impact in healthcare. Where You Fit: We are seeking a highly skilled Senior Staff Site Reliability Engineer who will play a pivotal role in designing, constructing, and managing our hybrid cloud and on-premises environment. What You’ll Do: In this role, you will harness your extensive skills and develop new ones as you: Elevate our operational practices by implementing Site Reliability Engineering (SRE) best practices focused on user satisfaction, monitoring, and automation. Engineer robust infrastructure patterns for our cloud environments using Amazon Web Services, emphasizing security, reliability, and scalability. Design, construct, and manage our data center to support our rapidly expanding Machine Learning team. Integrate on-premises datacenter environments with our existing cloud infrastructure to create a seamless hybrid cloud solution. Enhance the reliability and resilience of our infrastructure through thorough root-cause analysis and identifying design gaps. Engage in platform on-call rotations and provide assistance during critical incident responses.
Join Xometry as a Site Reliability Engineer II (SRE) and be part of a dynamic team that drives innovation in the realm of automated manufacturing solutions. In this role, you will ensure the reliability, availability, and performance of our systems while collaborating closely with other engineering teams.
Full-time|On-site|Boston; Charlotte; New York City; Philadelphia; Pittsburgh; Washington DC
Join MongoDB as a Team Lead for our Site Reliability Engineering (SRE) team focused on the Storage Layer Service. In this pivotal role, you will drive the reliability, availability, and performance of MongoDB's storage systems, collaborating closely with cross-functional teams to enhance our infrastructure and ensure optimal service delivery.
Join our dynamic Managed Services team as a Major Incident Lead – Site Reliability. In this pivotal role, you will spearhead the management of high-severity incidents that impact our customers across InterSystems' managed services platforms. As the Incident Commander, you will be responsible for ensuring swift service restoration, effective communication with stakeholders, and coordinated efforts across Site Reliability Engineering (SRE), engineering, support, cloud, and service delivery teams. Working within an SRE-aligned service model, your primary focus will be on preserving service reliability by utilizing service level indicators and objectives. You will prioritize minimizing customer impact over root cause analysis during live incidents. In addition to incident management, you will lead post-incident reviews, transforming operational setbacks into quantifiable reliability enhancements and preventing future occurrences. This role is essential for upholding customer trust, platform resilience, and operational excellence in a 24/7, mission-critical, and highly regulated environment.
Join Tagup, a pioneering defense technology firm established at MIT, as we revolutionize logistics superiority through cutting-edge AI solutions. We are rapidly expanding and seeking innovators who are driven to implement transformative technologies to tackle some of the most pressing challenges in high-stakes environments. This is a unique opportunity to contribute to vital work that enhances national security and supports the success of U.S. and allied forces. Be part of shaping the future of defense logistics for a safer world. At Tagup, curiosity is an integral part of our culture. If you thrive on understanding complex systems, sharing knowledge, and learning from brilliant colleagues, you will feel right at home. Our team of engineers and data scientists is on a mission to enhance the safety, reliability, and efficiency of the machines and processes that drive the world. Our AI technology directly optimizes large-scale industrial equipment and logistics systems, ensuring top-tier performance for our clients. We are looking for a Cloud/DevOps Engineer who can adopt a Site Reliability Engineering (SRE) approach to our platform: automate using Infrastructure as Code (IaC), orchestrate with Kubernetes, and optimize PostgreSQL-backed services for heightened performance and availability. You will develop secure, auditable CI/CD pipelines, enforce a least-privilege access model by default, and maintain compliance across diverse, multi-region environments.
Join DigitalOcean as a Senior Engineer I in Observability, where you will play a crucial role in enhancing our platform's visibility and performance. You will work closely with cross-functional teams to build and optimize observability tools, ensuring that our infrastructure runs smoothly and efficiently. This is an opportunity to leverage your technical expertise to drive innovation and improve our customer experience.
Formlabs creates advanced 3D printing tools used by over 50,000 companies worldwide, including well-known names in technology, automotive, and aerospace. The team brings together builders, engineers, and innovators who aim to transform how physical objects are made. Role overview The R&D Reliability/Test Engineer joins the Reliability Engineering Team in Boston, MA. This group focuses on making Formlabs hardware dependable, allowing customers to print without unnecessary troubleshooting. The role centers on developing test strategies, building and deploying electro-mechanical test rigs, and analyzing data to strengthen product reliability. Success in this position depends on both a system-level perspective and attention to detail, along with the ability to communicate complex data clearly. Curiosity and a broad engineering view are highly valued. What you will do Design, implement, and manage validation test plans at system, subsystem, and component levels to ensure products meet design and reliability standards. Conduct hands-on, data-driven root cause analysis of electromechanical subsystems and field failures, using established problem-solving methods and a physics-of-failure mindset. Create and deploy automated testing equipment for subsystem characterization as well as reliability and environmental testing. Collaborate with R&D design teams throughout the product lifecycle: perform risk assessments, oversee ongoing testing, and participate in design reviews to embed reliability from the beginning. Analyze field failure data, identify trends, prioritize investigations by business impact, and develop clear, actionable corrective action plans. Recent team impact LCD & Light Engine: Built custom rigs to simulate UV light and heat, monitoring performance over thousands of hours.
Flywire is seeking a Senior QA Engineer I to help maintain the quality and reliability of its payment solutions. This position is based in Boston and centers on testing and quality assurance for the company’s products. Role overview This role focuses on designing, implementing, and executing test plans. The Senior QA Engineer I works closely with teams across the organization to make sure products meet quality standards before release. What you will do Develop and maintain comprehensive test plans Collaborate with cross-functional teams to identify and resolve issues Contribute to the overall quality and performance of Flywire’s payment solutions Requirements Detail-oriented approach to quality assurance Experience working in QA or software testing Ability to work effectively with multiple teams
Full-time|$148.5K/yr - $237.6K/yr|Hybrid|Boston, Massachusetts, United States
Become a Force for Good at Axon. At Axon, our mission is to Protect Life. We are innovators tackling society's most pressing safety and justice challenges through our suite of devices and cloud-based software. Our collaborative culture thrives on open communication and diverse perspectives, whether from our customers, communities, or each other. Life at Axon is dynamic, challenging, and impactful. Here, you will take charge and initiate meaningful change while continuously growing in a mission-driven environment where your contributions are valued. Your Impact: Currently, public safety officers dedicate up to two-thirds of their work hours to administrative tasks, often drowning in paperwork. This not only hampers their efficiency but also limits their time in the field, impacting community safety. Axon Records alleviates this burden, enabling officers to focus on what truly matters. As part of the Axon Records Discovery Squad, you will help build a robust, scalable search and insights platform that empowers law enforcement to access pertinent information effortlessly. Envision the year 2030, where advanced technology has automated routine tasks, allowing officers to spend more time in communities, enhancing the quality of life and safety. With the right tools at their fingertips, officers can connect the dots and find essential answers to protect life effectively. As a Senior Engineer on our Discovery Squad, you will collaborate closely with fellow software engineers, product managers, and designers to ensure that Records data is user-friendly and easily analyzable. You will lead significant technical projects that manage large data volumes, spearhead the creation of new features, and contribute to shaping our team culture and engineering practices. Join our passionate team, dedicated to making a positive difference in the lives of first responders and the communities they serve.
Join Our Team Across the Northeastern United States! We are seeking highly skilled and principled Site Supervisors to spearhead the deployment of Cisco 2900 routers within enterprise environments throughout the Northeast. In this role, you will oversee the installation of Cisco 2900 routers and Aruba access points. You will manage a team of two installers, liaise with the end client, coordinate logistics, and supervise the entire deployment process. Each job site will involve 1-2 nights of work before transitioning to the next location. This position also entails acting as the regional manager to ensure that all maintenance tickets are promptly addressed. Please note, this position will require night work when business operations are closed. The project is expected to last approximately one year, with possibilities for extension.
Join DigitalOcean as a Senior Software Engineer I specializing in storage solutions. In this role, you will be instrumental in designing, developing, and optimizing scalable storage systems that power our cloud infrastructure. You will collaborate with cross-functional teams to enhance our platform, ensuring that our users experience top-notch performance and reliability. Your contributions will directly impact the efficiency of our services, making it essential to have a deep understanding of storage technologies and cloud computing principles.
Mar 3, 2026