Senior Hardware Reliability Engineer At Samsara San Francisco Ca jobs in San Francisco – Browse 12,019 openings on RoboApply Jobs


Open roles matching “Senior Hardware Reliability Engineer At Samsara San Francisco Ca” with location signals for San Francisco. 12,019 active listings on RoboApply Jobs.

Samsara
Full-time|$204K/yr - $240K/yr|Hybrid|San Francisco, CA, United States

Who We Are
Samsara (NYSE: IOT) is a trailblazer in the Connected Operations™ Cloud, a platform that empowers organizations reliant on physical operations to leverage Internet of Things (IoT) data for actionable insights and operational improvements. Our mission at Samsara is to enhance the safety, efficiency, and sustainability of the physical operations that underpin the global economy. These sectors, which account for over 40% of global GDP, include agriculture, construction, field services, transportation, and manufacturing. We are dedicated to digitally transforming their operations at scale.

Joining Samsara means you'll be part of a team that's defining the future of physical operations. You will contribute to a dynamic range of product solutions, including Video-Based Safety, Vehicle Telematics, Apps and Driver Workflows, and Equipment Monitoring. As part of a recently public company, you will enjoy the autonomy and support to make a significant impact as we build for the future.

About the Role:
Samsara's Hardware Reliability team plays a crucial role in ensuring an outstanding customer experience through reliable hardware. As a Senior Hardware Reliability Engineer, you will establish quality processes that uphold the high standards of Samsara's hardware.

In this role, you will implement and execute comprehensive reliability strategies that cover the entire product development lifecycle, from concept to warranty repair. You will rapidly gather and analyze test, field performance, and manufacturing data to drive necessary actions both internally and with our suppliers, ensuring the production of top-quality products. Collaboration with hardware, firmware, and operations teams is a fundamental aspect of this role.

This is a hybrid position open to candidates residing in the US, requiring you to visit our office in San Francisco three times a week.

You Should Apply If:
You want to impact the industries that run our world: your efforts will lead to tangible real-world benefits, helping to maintain essential services and support vital industries.

Feb 14, 2026
Samsara
Full-time|$204K/yr - $240K/yr|Hybrid|San Francisco - SF9

About Us
Samsara (NYSE: IOT) is at the forefront of the Connected Operations™ Cloud, a groundbreaking platform that empowers organizations reliant on physical operations to leverage Internet of Things (IoT) data for actionable insights and operational improvements. We are committed to enhancing the safety, efficiency, and sustainability of the vital physical operations that drive the global economy. Our focus spans critical sectors such as agriculture, construction, field services, transportation, and manufacturing, which together account for over 40% of global GDP. We are eager to facilitate the digital transformation of these industries at scale.

Joining Samsara means you will play a pivotal role in shaping the future of physical operations, contributing to a diverse range of product solutions including Video-Based Safety, Vehicle Telematics, Apps and Driver Workflows, and Equipment Monitoring. As part of a recently public company, you will enjoy the freedom and support necessary to make a significant impact as we lay the groundwork for long-term success.

Position Overview:
We are seeking a Senior Product Design Engineer II to join our dynamic team. In this role, you will oversee the industrial design, architecture, and engineering of one or more Samsara products from initial concept to mass production. Your design process will be driven by insights derived from our telemetry data cloud, combined with a deep, hands-on understanding of our customers gained through direct engagement at their sites and with their equipment.

Throughout the product development lifecycle, you will collaborate closely with Product Management, Electrical Engineering, Firmware, Engineering Project Management, and Hardware leadership to make informed decisions that balance functionality, cost, installation, usability, and aesthetics. Samsara’s Hardware Product Design teams work hand-in-hand with Operations and Supply Chain teams, external laboratories, JDM design resources, and an expanding global supply network. Together, you will deliver outstanding products on schedule and within budget, ensuring that Samsara continues to lead the industry in innovative product design.

This position is available to candidates residing in the US. It is a hybrid role requiring 3 days per week in our San Francisco office and 2 days of remote work.

Feb 14, 2026
Gridware
Full-time|On-site|San Francisco, CA

About Gridware
Gridware is an innovative technology firm headquartered in San Francisco, committed to safeguarding and enhancing the reliability of the electrical grid. We have pioneered an approach to grid management known as Active Grid Response (AGR), which monitors the electrical, physical, and environmental factors that influence grid safety and reliability. Our AGR platform uses high-precision sensors to identify potential issues early, enabling proactive maintenance and fault resolution. This holistic strategy is designed to improve safety, minimize outages, and ensure optimal grid performance. We are proud to be backed by prominent climate-tech and Silicon Valley investors. To learn more, visit www.Gridware.io.

About the Role
We are seeking a skilled Senior Hardware Reliability Engineer to lead reliability testing, analysis, and lifetime modeling of various outdoor electronic assemblies. This pivotal role will concentrate on the electronic components of our products, working closely with our mechanically focused Reliability Engineer and with the broader hardware and cross-functional teams.

Feb 21, 2026
Samsara
Internship|$76.6K/yr - $128.8K/yr|On-site|San Francisco - SF9

About Us
Samsara (NYSE: IOT) is revolutionizing the Connected Operations™ Cloud, a transformative platform that empowers organizations reliant on physical operations to leverage Internet of Things (IoT) data for actionable insights and operational improvements. Our mission is to enhance the safety, efficiency, and sustainability of the physical operations that drive our global economy. These essential industries, which account for over 40% of global GDP, include agriculture, construction, field services, transportation, and manufacturing. We are thrilled to digitally transform their operations at scale.

Joining Samsara means you will play a pivotal role in shaping the future of physical operations, contributing to a diverse range of innovative product solutions such as Video-Based Safety, Vehicle Telematics, Driver Workflow Applications, and Equipment Monitoring. As part of a publicly traded company, you will enjoy the autonomy and support needed to make a significant impact while we build for the long term.

Feb 11, 2026
Multiply Labs
Full-time|On-site|San Francisco

About Multiply Labs
Multiply Labs is an innovative startup located in San Francisco, California, backed by renowned technology and life-sciences investors such as Casdin Capital, Lux Capital, and Y Combinator. Our goal is to develop the world's leading robotic systems and use them to make groundbreaking, life-saving therapies accessible to everyone.

We are transforming how cell therapies are manufactured by building advanced robotic systems that automate and scale the production of these crucial treatments. Our robots enable biopharma companies to produce cell therapies efficiently without overhauling their existing processes, minimizing regulatory hurdles and risks. Traditional manufacturing methods are labor-intensive and costly (often exceeding $1M per patient); our robotic solutions aim to make these vital treatments more affordable and accessible to those who need them.

To learn more and see our robots in action, please visit www.multiplylabs.com and follow us on LinkedIn.

Position Overview
We are looking for a dedicated Hardware Reliability Engineer to become an essential part of Multiply Labs’ Reliability Engineering team. As a founding member of the team, you will collaborate closely with the Hardware Product and Systems Integration teams to improve our designs throughout the entire development lifecycle, from initial prototypes to fully deployed GMP production systems. Your contributions will directly support the delivery of life-saving therapies by ensuring our robots operate seamlessly in a high-stakes biotech environment.

Jan 28, 2026
Samsara
Full-time|$124.1K/yr - $208.5K/yr|Hybrid|San Francisco - SF9

Who We Are
Samsara (NYSE: IOT) is at the forefront of the Connected Operations™ Cloud, a transformative platform that empowers businesses reliant on physical operations to tap into Internet of Things (IoT) data. Our aim is to provide actionable insights that enhance safety, efficiency, and sustainability across vital industries such as agriculture, construction, transportation, and manufacturing. By digitally transforming these sectors, which represent over 40% of global GDP, we are contributing to a more efficient and sustainable economy.

Joining Samsara means being part of a team that is defining the future of physical operations. You will work on cutting-edge solutions, including Video-Based Safety, Vehicle Telematics, and Equipment Monitoring, within a supportive environment that fosters innovation and long-term impact.

About the Role:
We are seeking a Senior Hardware Systems Engineer to strengthen our rapidly expanding product line. Your primary responsibility will be leading the electrical engineering aspects of product architecture and design, grounded in comprehensive feasibility, design, and cost analyses. This encompasses critical areas such as component selection, thermal management, and antenna design. You will draw on extensive telemetry and direct customer insights to inform and refine our product designs. Collaborating closely with Product Management, Firmware, and Hardware leadership, you will influence key engineering decisions while mentoring fellow engineers. The role also requires working with our US and Taiwan EE teams, as well as our Supply Chain and laboratory resources, to achieve our project goals.

This role is hybrid, requiring you to be in our San Francisco, CA office three days a week, with the flexibility to work remotely two days a week. Travel of up to 25% may be necessary, and proximity to an international airport is essential.
We offer relocation assistance for this position and welcome candidates from across the U.S. who are willing to relocate to the Bay Area.

Feb 11, 2026
Carta
Full-time|On-site|San Francisco, California; Santa Clara, California; Seattle, WA

Join Carta as a Senior Site Reliability Engineer, where you will play a pivotal role in enhancing our infrastructure and ensuring the reliability of our platforms. You will work collaboratively with cross-functional teams to implement innovative solutions that drive operational excellence and scalability.

Apr 3, 2026
Drata
Full-time|$166.9K/yr - $225.9K/yr|Hybrid|Hybrid - San Francisco

Drata helps organizations demonstrate their commitment to security and integrity. The platform supports companies as they build and maintain trust with users, customers, partners, and prospects.

Values
Built on Trust: Consistency shapes decisions and actions.
Integrity: Choosing to do what is right, every time.
Customer-Obsessed: Prioritizing customer needs above all else.
Competitive Fire: Striving for higher standards and greater achievements.
Diversity: Welcoming different perspectives to encourage creative solutions.
Automation First: Pursuing efficiency by saving time and resources wherever possible.

How the Team Works
Drata blends high standards with a supportive environment focused on growth. Team members are encouraged to own their work, improve continuously, and deliver meaningful results. The company values quick, informed decisions that drive immediate impact, while always keeping the mission and customer needs at the center. The San Francisco-based team uses a hybrid work model: colleagues collaborate in the office Tuesday through Thursday, focusing on alignment and innovation, while Mondays and Fridays offer flexibility for deep work or personal needs.

Growth and Culture
Drata has grown to over 600 professionals worldwide and is recognized for a culture that values trust, speed, and continuous learning. The environment supports both personal and professional development.
See the Speed: CEO Adam Markowitz discusses Drata’s rapid journey to $100M ARR in four years.
Hear the Voice of the Team: Employee stories highlight collaboration and growth at Drata.

Apr 27, 2026
Hyperbolic Labs
Full-time|On-site|San Francisco, CA

Who We Are
At Hyperbolic Labs, we are committed to democratizing AI by removing barriers to computing power with our Open-Access AI Cloud. By aggregating global computing resources, we provide a GPU marketplace and AI inference service that is both affordable and accessible. As trailblazers at the convergence of AI and open-source technology, we envision a future where AI innovation is limited only by creativity, not by resource availability. We invite forward-thinking individuals who share our dedication to making AI universally accessible, secure, and affordable. Join us in building a platform that empowers innovators worldwide to realize their visionary AI projects.

As we prepare for growth following our Series A funding, our team, guided by co-founders with advanced degrees in AI, Mathematics, and Computer Science, is set to transform the computing landscape.

About the Role
We are looking for a skilled Site Reliability Engineer to ensure that Hyperbolic's GPU marketplace and AI infrastructure operate with outstanding reliability, performance, and security. Because we aggregate computational resources from numerous global providers, our service level objectives (SLOs), customer trust, and economic efficiency are critical to our product. Your key responsibilities will include defining and maintaining SLOs, developing resilient incident response protocols, managing capacity across our extensive GPU network, and implementing secure rollout and rollback mechanisms to keep the platform running around the clock.

In this influential role, you will set the reliability benchmarks that build customer trust in our platform, design comprehensive monitoring and alerting systems for better infrastructure visibility, automate capacity management and resource allocation, lead incident response and post-mortem reviews, and collaborate closely with engineering teams to strengthen system resilience. Security and infrastructure hardening will be paramount, requiring strong isolation between tenants and suppliers, effective key management systems, and established compliance frameworks. This high-impact position will directly affect our ability to deliver on our commitment to affordable, accessible AI compute at scale.

Mar 26, 2026
Crusoe
Full-time|$172K/yr - $209K/yr|On-site|San Francisco, CA - US

At Crusoe, our mission is to propel the availability of energy and intelligence. We are designing the engine that fuels a future where people can innovate ambitiously with AI while upholding standards of scale, speed, and sustainability.

Join us in the AI revolution powered by sustainable technology at Crusoe. Here, you will spearhead significant innovations, make a lasting impact, and collaborate with a team that is leading the charge in responsible, transformative cloud infrastructure.

About This Role:
We are looking for a Hardware Production / Sustaining Engineer to strengthen Crusoe’s Hardware Systems Engineering team and address critical skill gaps in debugging, validation, and production support of high-performance computing systems. In this role, you will oversee the entire hardware lifecycle, from prototype initiation to mass production, while driving automation, resolving intricate issues, and ensuring reliability across Crusoe Cloud’s GPU- and CPU-based infrastructure.

You will collaborate closely with cross-functional teams to support, debug, and optimize hardware platforms at scale, with a specific focus on PCIe, InfiniBand, and NVMe/storage, which have been identified as vital areas for expanded expertise. Your contributions will significantly influence Crusoe’s ability to deploy and manage sustainable, AI-first computing systems that deliver world-class performance and reliability.

What You’ll Be Working On:
Lead the entire hardware development and sustaining lifecycle, encompassing feasibility, bring-up, validation, deployment, and ongoing production support.
Create and maintain scripting and automation frameworks for hardware testing, diagnostics, and continuous reliability improvements.
Guide deep troubleshooting and debugging across:
  PCIe (link training, topology, performance issues)
  InfiniBand (fabric debugging, throughput, connectivity issues)
  NVMe/storage (performance bottlenecks, firmware interactions, failure analysis)
Perform thorough system validation and characterization for GPU, CPU, and high-performance computing platforms.
Assist in end-to-end integration and solution testing to ensure that Crusoe Cloud products meet performance, reliability, and scalability standards.
Work in tandem with mechanical, thermal, firmware, software, and manufacturing teams to resolve system-level challenges.

Feb 19, 2026
Crusoe
Full-time|$208K/yr - $253K/yr|On-site|San Francisco, CA - US

At Crusoe, our mission is to drive the evolution of energy and intelligence. We are developing the technology that fuels a future where people can ambitiously harness AI capabilities without compromising on scale, speed, or sustainability.

Join us in revolutionizing AI with sustainable solutions at Crusoe. In this role, you will be at the forefront of innovation, making a significant impact while collaborating with a team that is shaping the future of responsible and transformative cloud infrastructure.

About This Role:
We are looking for a dedicated Hardware Production/Sustaining Engineer to strengthen Crusoe's Hardware Systems Engineering team. This position is critical for closing essential skill gaps in debugging, validation, and production support for high-performance computing systems. You will manage the entire hardware lifecycle, from prototype initiation to large-scale production, focusing on automation, deep troubleshooting, and reliability within Crusoe Cloud’s GPU- and CPU-oriented infrastructure.

Your collaboration with cross-functional teams will be vital in supporting, debugging, and improving hardware platforms at scale, specifically PCIe, InfiniBand, and NVMe/storage, which have been highlighted as key areas for expanded expertise. Your contributions will directly influence Crusoe’s ability to deploy and maintain sustainable, AI-driven computing systems that deliver exceptional performance and reliability.

Your Responsibilities Will Include:
Leading the complete hardware development and sustaining lifecycle, encompassing feasibility studies, bring-up, validation, deployment, and ongoing production support.
Creating and maintaining automation frameworks and scripts for hardware testing, diagnostics, and continual reliability improvements.
Executing in-depth troubleshooting and debugging across:
  PCIe (including link training, topology, and performance issues)
  InfiniBand (focusing on fabric debugging, throughput, and connectivity challenges)
  NVMe/storage (addressing performance bottlenecks, firmware interactions, and failure analyses)
Performing extensive system validation and characterization for GPU, CPU, and high-performance computing platforms.
Assisting in end-to-end integration and solution testing to ensure that Crusoe Cloud products meet performance, reliability, and scalability standards.
Collaborating with teams across mechanical, thermal, firmware, software, and manufacturing domains to troubleshoot and improve system performance.

Feb 19, 2026
Echo Neurotechnologies
Full-time|On-site|San Francisco

Company Overview
Echo Neurotechnologies is an innovative startup at the forefront of Brain-Computer Interface (BCI) technology. We are committed to creating advanced AI-powered hardware solutions aimed at restoring autonomy for individuals with disabilities and enhancing their quality of life.

Team Culture
Become part of our dynamic team of passionate and skilled professionals. We thrive in a collaborative environment where you will have the opportunity to take charge of pivotal decisions that shape our future. We prioritize continuous learning and development, encouraging contributions that drive our collective success.

Position Summary
We are looking for a Senior Hardware Engineer with expertise in Mechanical Engineering to validate our cutting-edge Echo hardware systems. You will evaluate custom hardware devices and subsystems while leading the development and execution of specialized test systems for design verification.

Key Responsibilities
Design and prototype mechanical components and assemblies, including rapid prototyping, machining, and injection molding.
Develop electromechanical test systems to characterize and assess hardware devices.
Create test protocols, implement design verification testing, and manage vendor testing processes.
Analyze test data, produce technical reports, and oversee vendor test reports.
Generate component and assembly drawings, including tolerance stack-up analyses.
Plan and conduct design verification activities.

Qualifications
Bachelor's degree in Mechanical Engineering or a related field.
A minimum of 7 years of professional experience engineering electromechanical hardware devices.
Proficiency in hands-on machining and rapid prototyping techniques.
Experience analyzing data from physical systems.
Familiarity with quality systems and standards.

Preferred Qualifications
Master’s degree in Mechanical Engineering or a related field.
Strong analytical skills and attention to detail.
Ability to work collaboratively in a fast-paced environment.

Mar 6, 2026
Vizcom
Full-time|$200K/yr - $250K/yr|On-site|San Francisco

Agency Notice: We are not currently working with recruiting agencies for this role. We kindly ask that you refrain from contacting Vizcom employees regarding this position. Any resumes submitted without a prior agreement will be considered unsolicited.

About Vizcom
Vizcom is a cutting-edge visual creation platform that merges advanced web tooling with AI-driven workflows. Our technology stack includes React/TypeScript on the front end, Node/Koa + PostGraphile for API services, PostgreSQL, Redis, BullMQ for queuing, and a Kubernetes-based production infrastructure.

We are seeking a seasoned expert to oversee platform stability and infrastructure, ensuring our system remains reliable, efficient, and resilient as we scale.

Role Mission
Take full ownership of service reliability: proactively prevent incidents, minimize impact during failures, and lead swift, high-quality recovery during production outages.

This role involves hands-on technical leadership and carries the authority to establish reliability standards and enforce production protocols.

Compensation
Base salary between $200,000 and $250,000, plus significant equity.

Your Responsibilities
Reliability Standards: Define and uphold SLIs/SLOs/error budgets for key user interactions.
Resilience of Production Architecture: Implement failure isolation across APIs, workers, queues, and dependencies so that one subsystem's failure does not disrupt core access.
Kubernetes Runtime Reliability: Establish probe contracts, deployment standards, graceful shutdown protocols, scaling/resource policies, and startup safety measures.
Queue & Job Safety (BullMQ/Redis): Manage poison-pill containment and workload segregation.
Incident Command Quality: Lead Sev1/Sev2 incident response from containment through corrective actions.
Reliability Operating System: Oversee observability quality (prioritizing signal over noise), on-call efficiency, runbook maintenance, and postmortem discipline.
Deployment Safety Authority: Gate risky deployments and enforce reliability protocols whenever production health is compromised.

Feb 24, 2026
Plaud Inc.
Full-time|On-site|San Francisco, CA

About Plaud Inc.
Plaud is revolutionizing the way professionals enhance productivity and performance with our trusted AI work companion. Since our founding in 2023, our note-taking solutions have won over more than 1,500,000 users globally. We are on a mission to amplify human intelligence by developing next-generation intelligence infrastructure and interfaces that seamlessly capture, extract, and leverage what you say, hear, see, and think.

Based in San Francisco, Plaud Inc. is a Delaware-incorporated company that is redefining the boundaries of human-AI collaboration through a unique combination of hardware and software solutions. We adhere to the highest standards of data security and privacy protection, with ISO 27001, ISO 27701, and SOC 2 certifications as well as GDPR, HIPAA, and EN 18031 compliance.

Discover more about our solutions by visiting https://www.plaud.ai and follow us on Instagram, X, Facebook, LinkedIn, and YouTube.

Why You Should Join Us
At Plaud, you will play a pivotal role in shaping the future of human-AI interaction. Here’s what we offer:
A thriving, bootstrapped company with a remarkable $250M revenue run rate achieved in just three years.
An opportunity to define the next-generation paradigm for human-AI interaction.
Direct exposure to cutting-edge AI tools for professionals and a chance to contribute to our global expansion.
Collaboration with a passionate team that values innovation, teamwork, and customer success.
A culture that promotes continuous learning and rapid career growth.

Feb 24, 2026
prosper
Full-time|On-site|San Francisco, CA

Role overview
The Senior Site Reliability Engineer at prosper plays a key role in maintaining and improving the reliability and performance of the company’s core systems. Collaboration with teams across the organization is essential to keep services stable and efficient.

What you will do
Design and set up monitoring tools to track the health and performance of systems
Automate routine operational tasks to minimize manual intervention and boost efficiency
Diagnose and resolve complex technical problems that affect infrastructure or services
Support projects aimed at strengthening infrastructure stability and preparing for future growth

Location
This role is located in San Francisco, CA.

Apr 27, 2026
Sieve
Full-time|On-site|San Francisco

About Sieve
Sieve is a pioneering AI research lab dedicated solely to video data. Our approach combines exabyte-scale video infrastructure with state-of-the-art video understanding techniques and a wide range of data sources, creating unparalleled datasets that redefine video modeling. With video accounting for 80% of global internet traffic, it has become the vital digital medium fueling creativity, communication, gaming, AR/VR, and robotics. At Sieve, we aim to eliminate the most significant bottleneck hindering the growth of these applications: access to high-quality training data.

Through strategic partnerships with leading AI labs, our team of just 12 has achieved remarkable financial success, generating $XXM last quarter alone. Earlier this year, we secured Series A funding from elite firms including Matrix Partners, Swift Ventures, Y Combinator, and AI Grant.

About the Role
As we process petabytes of video across numerous nodes and cloud environments, ensuring reliability, observability, and security is essential to our growth.

We are seeking our first Reliability Engineer, who will focus entirely on hardening the infrastructure that underpins Sieve. This role demands high ownership and a deep understanding of:
System throughput and stability
Monitoring and incident management
Security principles, including least-privilege design
Minimizing operational burden for the entire engineering team

You will collaborate closely with our CTO and founding engineers to develop the foundational tools that power our engineering efforts.

This position is ideal for an engineer who is passionate about reliability, throughput, observability, and security. You are proactive in anticipating potential failure modes, reducing operational risk, and designing resilient systems. If a system fails, you take it personally and thrive under the weight of that responsibility.

What You'll Be Doing
Collaborate with engineering to design and validate infrastructure supporting PB-scale workloads
Develop and manage Terraform-based multi-cloud deployments
Enhance cloud and data security (SSO, IAM, least-privilege access, auditability)
Lead incident response efforts and harden systems against failures
Build CI/CD systems that minimize user errors and maximize safety
Establish monitoring and alerting frameworks (Prometheus, OpenTelemetry, VictoriaMetrics)

Feb 5, 2026
Braze
Full-time|$144K/yr - $258K/yr|On-site|San Francisco

At Braze, we pride ourselves on cultivating a team that is genuinely approachable, exceptionally kind, and intensely passionate about what we do.

We aim to fuel this passion by setting high standards, promoting teamwork, and fostering a healthy work-life balance as we navigate rapid global growth together, all while striving for greater equity and opportunity both within and outside our organization.

To thrive in our environment, you should be prepared to hold yourself and those around you to high standards. There are always opportunities to contribute: acting with autonomy, taking accountability, and being open to new perspectives are fundamental to our ongoing success.

Our deep curiosity and eagerness to share diverse passions with one another give our culture a unique vibrancy.

If you are motivated to tackle exciting challenges and keep a proactive mindset amid change, you will be empowered to make a significant impact here, backed by a sharp and passionate team. If Braze sounds like the right fit for you, we look forward to meeting you!

WHAT YOU'LL DO
As a Site Reliability Engineer (SRE), you will be responsible for ensuring the smooth operation of all internal-facing services and platforms, ultimately guaranteeing site uptime. SREs combine the roles of system administrator and software engineer, applying sound engineering principles, operational discipline, and mature automation techniques to the infrastructure services we deliver. Our expertise spans areas such as networking and the Linux kernel, along with specialized interests in scaling algorithms and distributed systems.

Our team plays a crucial role in enhancing automation and infrastructure reliability, and in enabling Braze’s engineering teams to easily use the infrastructure products and platforms we develop. Braze operates at massive scale, supporting over 3.3 billion monthly active users across our customers, processing hundreds of billions of data points each month, and delivering billions of messages to end users daily. Our diverse technology stack includes Ruby on Rails, MongoDB, Redis, Kafka, Kubernetes, and more. As a Senior Site Reliability Engineer at Braze, you will collaborate with your team and consumer engineering groups to continually improve the infrastructure, automation, and tooling that power our internal products built on these technologies.

Main responsibilities:
Collaborate with Braze’s engineering teams to:
  Design products that use infrastructure platforms in a scalable and reliable manner
  Troubleshoot reliability and scalability issues across all layers of the stack, including products built on our infrastructure platforms
  Implement monitoring solutions and improve overall system performance...

Feb 10, 2026
Flux
Full-time|On-site|San Francisco Office

Why Choose Flux?
At Flux, we are transforming the hardware landscape by creating the world's first AI Hardware Engineer. Our mission is to democratize access to cutting-edge hardware development and revolutionize global electronics design and manufacturing.
About the Opportunity
As a DevOps Engineer at Flux, you will be integral in ensuring the smooth operation of our innovative platform. Your work will encompass a wide range of full-stack systems, impacting various aspects of our service, including billing, authentication, onboarding, and seamless integrations. Your contributions will directly influence user experience, and your role will be crucial in maintaining operational efficiency as Flux continues to scale.
Key Responsibilities
Enhance the reliability, availability, and operational health of our production systems.
Establish observability standards across services, including metrics, logs, and error tracking.
Define Service Level Objectives (SLOs) and Service Level Indicators (SLIs) while implementing effective alerting strategies.
Collaborate with engineering teams to design robust systems and proactively mitigate operational risks.
Develop internal tools to enhance system safety, debugging capabilities, and developer productivity.
Manage infrastructure using Pulumi across GCP, AWS, and Firebase.

Jan 19, 2026
Lumafield
Full-time|On-site|San Francisco, CA

Lumafield develops X-Ray CT scanners designed to make advanced imaging more accessible and affordable. The company’s cloud software provides engineers with detailed visualization tools, helping them analyze complex products and make informed decisions.
Role overview
This full-time, on-site Hardware Systems Engineer position is based in San Francisco. The role centers on leading hardware development for industrial CT scanners. Collaboration with researchers and designers is a key part of the job, with a focus on improving product development for a range of industries.
What you will do
Lead hardware systems development for industrial CT scanners
Design and manage electrical architecture
Develop firmware and oversee system integration
Work hands-on to transform concepts into working products
Collaborate with cross-functional teams to address customer needs
Team and collaboration
The engineering team includes experienced researchers and designers who value curiosity and rigor. The group is impact-driven and backed by leading venture capital firms.
Location
This role requires working on-site at Lumafield’s San Francisco office.

Apr 21, 2026
Mercor
Full-time|On-site|San Francisco

Join the Mercor Team
At Mercor, we stand at the dynamic intersection of labor markets and AI research. Collaborating with premier AI labs and enterprises, we empower the human intelligence that is crucial to AI's evolution. Our expansive talent network plays a vital role in training cutting-edge AI models, much as educators teach their students: by sharing insights, experiences, and contextual understanding that code alone cannot convey. Currently, our network of over 30,000 experts generates more than $2 million daily.
We are pioneering a new category of work where expertise fuels AI progress. Achieving this vision requires an ambitious, fast-paced, and deeply dedicated team. You will collaborate with researchers, operators, and AI firms at the forefront of transforming societal structures.
Mercor is a thriving Series C company with a valuation of $10 billion. We work in person five days a week at our new headquarters in San Francisco.
About the Role
As a Site Reliability Engineer (SRE) at Mercor, you will take ownership of production reliability for our critical systems, working closely with our infrastructure leadership. You will play a pivotal role in establishing our SRE function and defining how Mercor manages large-scale, high-availability systems.
Your Responsibilities
Ensure the reliability and safety of production for key shared services and customer-facing systems.
Collaborate directly with infrastructure leadership to outline SRE priorities, reliability benchmarks, and the production safety roadmap.
Improve the structure of our production systems to ensure stability, resource efficiency, isolation, and observability.
Advocate for and implement modern SRE methodologies (e.g., incident management, postmortems, SLIs/SLOs) across engineering teams.
Work alongside engineering and applied AI teams to facilitate sustainable growth.
Promote SRE best practices internally, supporting teams through a safe, scalable, and consistent production onboarding process.
Who We Seek
The ideal candidate will have:
Extensive experience in genuine SRE roles (not merely operations) across various positions or organizations.
A deep understanding of SRE methodologies popularized by Google (e.g., error budgets, reliability vs. risk trade-offs, large-scale distributed systems).
5+ years of SRE experience; ideally, 15+ years of total experience for this inaugural SRE position.
A proven track record of managing systems at scale, with a strong grasp of the complexities involved.

Dec 27, 2025
