Senior Electrical Reliability Engineer jobs in San Francisco – Browse 6,720 openings on RoboApply Jobs

Senior Electrical Reliability Engineer

Astranis, San Francisco

On-site Full-time $135K/yr - $235K/yr

Experience Level

Senior

Qualifications

Role Responsibilities

Collaborate with a diverse engineering team to ensure spacecraft reliability throughout their mission life.
Conduct reliability and availability analyses utilizing Reliability Block Diagrams and DFMEA methodologies.
Provide guidance on optimal redundancy architectures to engineering teams.
Assist engineering and production teams with design evaluations, failure assessments, and root cause investigations.
Plan and oversee EEE component testing, establishing reliability targets, sample sizes, and stress conditions for both internal and external testing facilities.

Requirements

Bachelor’s degree in electrical engineering, physics, materials science, or a related discipline.
5+ years of experience in reliability, component, or electrical engineering roles.
4+ years in areas such as physics of failure, semiconductor physics, design for reliability, functional safety, or accelerated testing.
Expertise in electrical design, including components, circuits, PCBs, or electronic assemblies.
Familiarity with reliability statistics and data analysis.
Strong analytical capabilities and a collaborative mindset.

About the job

We are the go-to satellite communications partner for clients demanding exceptional uptime, data security, network visibility, and tailored solutions. Backed by over $750 million from industry-leading investors such as Andreessen Horowitz, Blackrock, and Fidelity, our team of 450 engineers and entrepreneurs thrives in our 153,000 sq. ft. headquarters in Northern California.

Senior Electrical Reliability Engineer

As a Senior Reliability Engineer at Astranis, you will be pivotal in ensuring that our spacecraft electronics and systems fulfill our reliability and availability requirements. Collaborating with a multidisciplinary engineering team, you will push the boundaries of geo-synchronous spacecraft design and achieve previously unattainable performance in space. Your expertise will ensure that Design for Reliability remains central to our engineering efforts.

About Astranis

Astranis is at the forefront of satellite technology, dedicated to building advanced satellites for high orbits that enhance global communications and expand humanity's reach into outer space. With a vision to provide secure and reliable networks to a diverse range of clients, Astranis stands as a leader in the satellite communications industry.

Similar jobs

Senior Electrical Reliability Engineer

Astranis

Full-time|$135K/yr - $235K/yr|On-site|San Francisco

Astranis is revolutionizing satellite technology by creating advanced spacecraft designed for high orbits, thereby extending humanity's presence in the solar system. Our satellites deliver dedicated and secure networks to an elite clientele, including large corporations, government entities, and the U.S. military. With five satellites successfully launched and a robust pipeline of over $1 billion in commercial contracts, Astranis is set for growth as we prepare for numerous upcoming launches.

We are the go-to satellite communications partner for clients demanding exceptional uptime, data security, network visibility, and tailored solutions. Backed by over $750 million from industry-leading investors such as Andreessen Horowitz, Blackrock, and Fidelity, our team of 450 engineers and entrepreneurs thrives in our 153,000 sq. ft. headquarters in Northern California.

Senior Electrical Reliability Engineer

As a Senior Reliability Engineer at Astranis, you will be pivotal in ensuring that our spacecraft electronics and systems fulfill our reliability and availability requirements. Collaborating with a multidisciplinary engineering team, you will push the boundaries of geo-synchronous spacecraft design and achieve previously unattainable performance in space. Your expertise will ensure that Design for Reliability remains central to our engineering efforts.

Mar 18, 2026

Senior Hardware Reliability Engineer

Gridware

Full-time|On-site|San Francisco, CA

About Gridware

Gridware is an innovative technology firm headquartered in San Francisco, committed to safeguarding and enhancing the reliability of the electrical grid. We have pioneered a revolutionary approach to grid management known as Active Grid Response (AGR), which meticulously monitors the electrical, physical, and environmental factors influencing grid safety and reliability. Our state-of-the-art AGR platform leverages high-precision sensors to identify potential issues at an early stage, facilitating proactive maintenance and fault resolution. This holistic strategy is designed to bolster safety, minimize outages, and ensure optimal grid performance. We are proud to be supported by prominent climate-tech and Silicon Valley investors. To learn more, visit www.Gridware.io.

About the Role

We are seeking a skilled Senior Hardware Reliability Engineer to lead reliability testing, analysis, and lifetime modeling of various outdoor electronic assemblies. This pivotal role will concentrate on the electronic components of our products, collaborating closely with our mechanical-focused Reliability Engineer and engaging with the broader hardware and cross-functional teams.

Feb 21, 2026

Senior Site Reliability Engineer at Drata | San Francisco

Drata

Full-time|$166.9K/yr - $225.9K/yr|Hybrid|Hybrid - San Francisco

Drata helps organizations demonstrate their commitment to security and integrity. The platform supports companies as they build and maintain trust with users, customers, partners, and prospects.

Values

Built on Trust: Consistency shapes decisions and actions.
Integrity: Choosing to do what is right, every time.
Customer-Obsessed: Prioritizing customer needs above all else.
Competitive Fire: Striving for higher standards and greater achievements.
Diversity: Welcoming different perspectives to encourage creative solutions.
Automation First: Pursuing efficiency by saving time and resources wherever possible.

How the Team Works

Drata blends high standards with a supportive environment focused on growth. Team members are encouraged to own their work, improve continuously, and deliver meaningful results. The company values quick, informed decisions that drive immediate impact, while always keeping the mission and customer needs at the center. The San Francisco-based team uses a hybrid work model. Colleagues collaborate in the office Tuesday through Thursday, focusing on alignment and innovation. Mondays and Fridays offer flexibility for deep work or personal needs.

Growth and Culture

Drata has expanded to over 600 professionals worldwide, recognized for a culture that values trust, speed, and continuous learning. The environment supports both personal and professional development.

See the Speed: CEO Adam Markowitz discusses Drata’s rapid journey to $100M ARR in four years.
Hear the Voice of the Team: Employee stories highlight collaboration and growth at Drata.

Apr 27, 2026

Senior Electrical Research Engineer

Gridware

Full-time|On-site|San Francisco, CA

Join Gridware as a Senior Electrical Research Engineer, where you'll play a pivotal role in advancing innovative electrical solutions. We are looking for an expert who is passionate about research and development in electrical engineering. In this role, you'll collaborate with a dynamic team to design, analyze, and optimize electrical systems that will drive our technology forward.

Mar 20, 2026

Senior Software Engineer - Data Reliability

Fastly, Inc.

Full-time|$181.2K/yr - $217.5K/yr|On-site|Denver, CO; San Francisco, CA

At Fastly, we empower individuals to connect more effectively with the things they cherish. Our cutting-edge edge cloud platform enables customers to swiftly, securely, and reliably craft exceptional digital experiences by processing, serving, and safeguarding their applications as close to their end-users as possible, right at the edge of the Internet. Tailored for modern internet demands, our platform is programmable and supports agile software development. We proudly serve many of the world's leading companies, including GitHub, Yelp, Paramount, and JetBlue.

Join us in our mission to build a more trustworthy Internet.

Posting Open Date: Feb. 25, 2026
Anticipated Posting Close Date*: March 25, 2026
*Please note that this job posting may close early depending on the volume of applications.

Role Overview:

The Data Reliability team is seeking an experienced Senior Software Engineer to contribute to the development and support of next-generation data storage solutions at Fastly. The ideal candidate will possess expertise in backend and data services within cloud environments, proficiency with configuration and orchestration tools such as Terraform and Kubernetes, and the ability to create internal administration tools using Go and related technologies. Our team plays a vital role in ensuring the infrastructure, orchestration, and reliability of Fastly's most data-intensive applications, utilizing technologies like Terraform, Elasticsearch, ClickHouse, Prometheus, MySQL, and Redis across both cloud and hardware platforms. Your contributions will directly enhance our customers' success by providing product teams with a robust platform for efficient and consistent delivery of high-quality, high-throughput, globally distributed data systems and products. We embrace a distributed work model and value both collaborative and asynchronous communication styles.

Key Responsibilities:

Deploy, support, and maintain various critical data storage systems, scaling from gigabytes to petabytes.
Develop statistics and dashboards to track service-level objectives for these systems.
Create and manage tools for configuration, backup, and authenticated access to data systems employing peer review, CI/CD, and both daemon- and container-based deployment strategies.
Write high-performance, maintainable, and concise code, actively participating in code reviews to enhance the codebase.

Mar 20, 2026

Senior Site Reliability Engineer

alembic

Full-time|On-site|San Francisco HQ

About the Role

Join alembic as a Senior Site Reliability Engineer (SRE) and become an integral part of our mission to enhance platform reliability, observability, and operational excellence. In this pivotal role, you will collaborate with engineers and data scientists to architect, automate, and maintain the robust infrastructure that drives our platform, including data pipelines, machine learning workloads, and real-time analytics systems.

This hands-on position offers significant visibility across the technology stack and provides you with the opportunity to shape the future of our infrastructure and operations.

Dec 22, 2025

Senior Manager, Site Reliability Engineering

Tubi TV

Full-time|$227.2K/yr - $324.5K/yr|Hybrid|San Francisco, CA (Hybrid)

About the Role: At Tubi, our Site Reliability Engineering (SRE) team transcends traditional operations. We embody a software engineering ethos, leveraging a developer's toolkit to tackle the complexities of large-scale, distributed systems. Our core mission focuses on building resilience from the ground up, empowering our product teams to innovate swiftly while delivering an exceptional user experience. We oversee the availability, latency, performance, and capacity of our platform, driven by a culture of data-informed decision-making, blameless learning, and relentless automation. We are on the lookout for a seasoned and visionary Senior Manager of SRE to lead and expand our newly formed Site Reliability Engineering team. You will be more than just a people manager or tech lead; you will be the strategic architect behind our reliability roadmap. Your role will involve building and mentoring a team of skilled engineers, cultivating an environment of blameless learning and continuous improvement, while advocating for the engineering practices that balance rapid innovation with unwavering stability. You will play a pivotal role within our engineering leadership, collaborating with peers across the organization to embed reliability as a shared responsibility and a fundamental principle of our engineering culture.

Mar 17, 2026

Senior Software Engineer - Observability and Reliability

Sigma Computing

Full-time|$170K/yr - $240K/yr|On-site|San Francisco, CA

About the Role

Sigma Computing is growing its engineering team in San Francisco, CA. The company builds technology to help users access data with ease. As a Senior Software Engineer focused on Observability and Reliability, you will work alongside engineers who value high standards and collaboration.

What You Will Do

Design and build observability platforms and tools, including metrics collection, logging, distributed tracing, dashboards, alerting, and application performance monitoring.
Work with technologies such as Go, OpenTelemetry, and Kubernetes to solve reliability challenges.
Take part in on-call rotations to help maintain strong uptime for Sigma’s services.
Create tools and processes to improve cloud incident triage and reduce downtime.
Define and promote practices that make systems and services measurable and observable.
Join design and code reviews with peers and stakeholders to reinforce quality and effective collaboration.

Apr 25, 2026

Senior Site Reliability Engineer at Hyperbolic | San Francisco

Hyperbolic Labs

Full-time|On-site|San Francisco, CA

Who We Are

At Hyperbolic Labs, we are committed to democratizing AI by removing barriers to computing power with our Open-Access AI Cloud. By aggregating global computing resources, we provide an innovative GPU marketplace and AI inference service that ensures both affordability and accessibility. As trailblazers at the convergence of AI and open-source technology, we envision a future where AI innovation is only limited by creativity, not by resource availability. We invite forward-thinking individuals who share our dedication to making AI universally accessible, secure, and affordable. Join us in crafting a platform that empowers innovators worldwide to realize their visionary AI projects.

In anticipation of our growth following our Series A funding, our team, guided by co-founders with advanced degrees in AI, Mathematics, and Computer Science, is set to transform the computing landscape.

About the Role

We are in search of a skilled Site Reliability Engineer to guarantee that Hyperbolic's GPU marketplace and AI infrastructure function with outstanding reliability, performance, and security. As an aggregator of computational resources from numerous global providers, our service level objectives (SLOs), trust, and economic efficiency are critical to our product. Your key responsibilities will include defining and maintaining service level objectives, developing resilient incident response protocols, managing capacity across our extensive GPU network, and implementing secure rollout and rollback mechanisms to ensure uninterrupted platform operation around the clock.

In this influential role, you'll set the reliability benchmarks that foster customer trust in our platform, design comprehensive monitoring and alerting systems for enhanced infrastructure visibility, automate capacity management and resource allocation processes, lead incident response and post-mortem evaluations, and collaborate closely with engineering teams to bolster system resilience. Security and infrastructure hardening will be paramount, necessitating strong isolation protocols between tenants and suppliers, the implementation of effective key management systems, and the establishment of compliance frameworks. This high-impact position will directly affect our ability to deliver on our commitment to providing affordable, accessible AI compute at scale.

Mar 26, 2026

Senior Reliability Test Engineer

Astranis Space Technologies Corp.

Full-time|$130K/yr - $180K/yr|On-site|San Francisco

Astranis is at the forefront of satellite technology, crafting advanced satellites designed for high orbits to broaden humanity's exploration of the solar system. Our satellites deliver dedicated, secure networks to a diverse range of esteemed clients worldwide, including large enterprises, government entities, and the US military. With five satellites currently operational and several more set to launch, we are addressing a robust backlog of over $1 billion in commercial contracts.

We take pride in being the leading choice for satellite communications among clients with demanding standards for uptime, data security, network visibility, and customization. Having secured over $750 million from top-tier investors such as Andreessen Horowitz, Blackrock, and Fidelity, our team of 450 engineers and entrepreneurs operates from our expansive 153,000 sq. ft. headquarters in Northern California, USA.

Senior Reliability Test Engineer

As a Senior Reliability Test Engineer, you will play a pivotal role in collaborating across all engineering disciplines to ensure our hardware achieves exceptional quality and reliability standards. With Astranis ramping up satellite production, your expertise will be essential in establishing a comprehensive reliability test program that supports the development of new product designs, monitors manufacturing processes, and identifies long-term reliability issues. The ideal candidate will possess extensive engineering experience with high-reliability products, demonstrate autonomy, and have the capability to design a reliability test program from the ground up.

Mar 9, 2026

Senior Electrical Engineer - Autonomous Robotics

DoorDash, Inc.

Full-time|$170K/yr - $250K/yr|On-site|San Francisco, CA

DoorDash Labs is an innovative team at DoorDash, dedicated to developing autonomous delivery robots and cutting-edge autonomy solutions that power DoorDash's delivery platform, utilized by millions worldwide. If you are passionate about the intersection of robotics and service technology, we would love to connect with you!

About the Role

We are on the lookout for a highly skilled and hands-on Senior Electrical Engineer to spearhead the design and development of electrical systems for our four-wheeled autonomous delivery robot. This pivotal role involves delivering high-quality, production-ready hardware that encompasses high-speed digital design, PCB assembly, RF integration, harnessing, and board-level packaging.

This position is highly technical, ideal for individuals who can navigate both system-level concepts and intricate circuit designs with ease.

What You’ll Do

Design and innovate electrical systems for a complex mobile robotic platform, from initial concept to production.
Lead high-speed digital design initiatives, including PCIe, MIPI, Ethernet, DDR, and multi-gigabit SERDES.
Oversee PCBA development, including schematic capture, stack-up definition, layout guidance, impedance control, and design for manufacturability.
Drive RF integration efforts including Wi-Fi, LTE/5G, GNSS, Bluetooth, RADAR, and LiDAR, focusing on antenna placement, layout constraints, and signal integrity.
Develop and assess cable harness designs, connector selection, grounding strategies, and environmental robustness.
Collaborate closely with mechanical engineers on board packaging, thermal management, vibration resistance, and environmental sealing.
Define and implement power distribution architectures, battery interfaces, protection circuits, and system monitoring.
Ensure adherence to EMI/EMC and regulatory standards (FCC/CE) including CISPR-25 and FCC Part 15 Class A.
Support prototype builds and troubleshoot hardware/firmware interface issues, quickly identifying root causes and driving corrective actions.
Establish design standards and conduct comprehensive design reviews within the electrical engineering team.
Engage cross-functionally with firmware, robotics, systems, manufacturing, and reliability teams.

Feb 23, 2026

Senior Site Reliability Engineer at Carta | San Francisco, CA

Carta

Full-time|On-site|San Francisco, California; Santa Clara, California; Seattle, WA

Join Carta as a Senior Site Reliability Engineer, where you will play a pivotal role in enhancing our infrastructure and ensuring the reliability of our platforms. You will work collaboratively with cross-functional teams to implement innovative solutions that drive operational excellence and scalability.

Apr 3, 2026

Senior Staff Site Reliability Engineer - Tech Lead

Unify

Full-time|On-site|San Francisco Office

Join Unify as a Senior Staff Site Reliability Engineer and take the lead in transforming our technology landscape. In this pivotal role, you will spearhead initiatives to enhance our system reliability and performance, ensuring seamless operations across our platforms. Your expertise will guide a dynamic team, driving innovation and implementing best practices in site reliability engineering.

Mar 24, 2026

Senior Site Reliability Engineer

Hive

Full-time|On-site|San Francisco

About Hive

Hive stands at the forefront of cloud-based AI innovation, providing cutting-edge solutions that enable organizations to understand, search, and generate content. Our platform is relied upon by some of the world's most prestigious and forward-thinking companies. We empower developers with an extensive suite of state-of-the-art, pre-trained AI models that handle billions of API requests each month. In addition to our robust model offerings, we deliver comprehensive software applications backed by proprietary AI models and datasets, unlocking transformative applications in various sectors such as content moderation, brand protection, sponsorship measurement, and context-based advertising.

With over $120 million in funding from esteemed investors like General Catalyst, 8VC, Glynn Capital, Bain & Company, and Visa Ventures, Hive has cultivated a vibrant global team of over 250 employees across our San Francisco, Seattle, and Delhi offices. If you’re passionate about shaping the future of AI, we invite you to join our dynamic team!

DevOps and Systems Team

In response to our distinctive machine learning demands, we have developed our own data centers focusing on distributed high-performance computing with GPU integration. While we harness the power of these data centers, our infrastructure remains hybrid, leveraging public cloud solutions when advantageous. As we scale our machine learning models for commercial use, we are expanding our DevOps and Site Reliability team to ensure the reliability of our enterprise SaaS offerings. Our ideal candidate thrives in dynamic environments, embraces automation, and believes that every task can be automated and every server can scale. You take pride in enhancing performance across all layers of our stack and are committed to never performing the same task manually twice.

Apr 20, 2022

Senior Electrical Engineer at Tacit | San Francisco

Tacit

Full-time|$150K/yr - $200K/yr|On-site|San Francisco

About Tacit

Tacit is an innovative, early-stage deep tech startup located in San Francisco, focused on developing groundbreaking hardware to redefine human-computer interactions. With support from prominent investors such as General Catalyst, Khosla Ventures, and Greylock Partners, our founding team brings expertise from renowned institutions including Stanford, BrainGate, Oculus, and Tesla. While we are not ready to disclose our projects fully, we are committed to solving complex engineering challenges to launch revolutionary products.

Position Overview

We are seeking a talented Senior Electrical Engineer to lead the architecture, design, and deployment of next-generation neurotechnology hardware products. As a pivotal member of our expanding hardware team, you will take ownership of significant components of our electrical systems from initial design stages to mass production, playing a crucial role in the realization of sophisticated consumer electronics.

Key Responsibilities

Electrical System Design & Development

Lead the electrical architecture and board-level design for mixed-signal consumer hardware systems from concept to production.
Oversee schematic design and guide the development process through EVT, DVT, PVT, and mass production stages.
Define system partitioning, interfaces, and component selection across computing, sensors, connectivity, and power subsystems.
Conduct rigorous design reviews and establish robust electrical design and manufacturing standards.
Collaborate closely with mechanical, firmware, and software teams to enhance performance, size, thermal behavior, and reliability.
Effectively navigate a startup environment where engineers manage substantial parts of the hardware stack.

High-Speed and High-Density Hardware Design

Design high-density, multi-layer PCBAs employing HDI techniques, fine-pitch BGAs, and controlled-impedance routing.
Implement high-speed digital interfaces such as USB, Quad-SPI, MIPI, CSI/DSI, DDR, or similar buses.
Create analog and mixed-signal circuits, including amplifiers, filters, sensor interfaces, and ADC/DAC connections.
Optimize PCB layout for signal integrity, power integrity, and low-noise performance.

Power Electronics & Battery Systems

Design and implement power electronics and battery management systems for our devices.
Ensure compliance with safety and regulatory standards while optimizing performance and efficiency.

Mar 24, 2026

Senior/Staff Site Reliability Engineer

fal

Full-time|On-site|San Francisco

Join our dynamic team at fal as a Senior/Staff Site Reliability Engineer. In this key role, you will leverage your expertise to enhance our systems' reliability and performance. If you are passionate about building scalable systems and enjoy working in a collaborative environment, we want to hear from you!

Feb 23, 2026

Senior Platform & Reliability Engineer at Vizcom | San Francisco

Vizcom

Full-time|$200K/yr - $250K/yr|On-site|San Francisco

Agency Notice: We are not currently collaborating with recruiting agencies for this role. We kindly ask that you refrain from contacting Vizcom employees regarding this position. Any resumes submitted without prior agreement will be considered unsolicited.

About Vizcom

Vizcom is a cutting-edge visual creation platform that merges advanced web tooling with AI-driven workflows. Our technology stack incorporates React/TypeScript for the front end, Node/Koa + PostGraphile for API services, PostgreSQL, Redis, BullMQ for queuing, and a Kubernetes-based production infrastructure.

We are seeking a seasoned expert to oversee platform stability and infrastructure, ensuring our system remains reliable, efficient, and resilient as we scale.

Role Mission

Take full ownership of service reliability: proactively prevent incidents, minimize impact during failures, and guide swift, high-quality recovery during production downtimes.

This role involves hands-on technical leadership, granting you the authority to establish reliability standards and enforce production protocols.

Compensation

Base salary between $200,000 and $250,000, plus significant equity.

Your Responsibilities

Reliability Standards: Define and uphold SLIs/SLOs/error budgets for key user interactions.
Resilience of Production Architecture: Implement failure isolation across APIs, workers, queues, and interdependencies to ensure one subsystem's failure does not disrupt core access.
Kubernetes Runtime Reliability: Establish probe contracts, deployment standards, graceful shutdown protocols, scaling/resource policies, and startup safety measures.
Queue & Job Safety (BullMQ/Redis): Manage poison pill containment and workload segregation.
Incident Command Quality: Lead Sev1/Sev2 incident responses from containment to corrective actions.
Reliability Operating System: Oversee observability quality (prioritizing signal over noise), on-call efficiency, runbook maintenance, and postmortem discipline.
Deployment Safety Authority: Gate risky deployments and enforce reliability protocols whenever production health is compromised.

Feb 24, 2026

Senior Site Reliability Engineer - Future Opportunities

Twitter Inc.

Full-time|On-site|San Francisco

Join our innovative technology team at Twitter Inc. as a Senior Site Reliability Engineer. In this role, you will be pivotal in enhancing system reliability and performance, ensuring our services run smoothly and efficiently. We are seeking passionate engineers who thrive in a fast-paced environment and are eager to tackle challenging problems.

Jan 3, 2023

Senior Site Reliability Engineer at Plaud | San Francisco

Plaud Inc.

Full-time|On-site|San Francisco, CA

About Plaud Inc.

Plaud is revolutionizing the way professionals enhance productivity and performance with our trusted AI work companion. Our innovative note-taking solutions have gained the admiration of over 1,500,000 users globally since our inception in 2023. We are on a mission to amplify human intelligence by developing next-generation intelligence infrastructure and interfaces that seamlessly capture, extract, and leverage what you say, hear, see, and think.

Based in San Francisco, Plaud Inc. is a Delaware-incorporated company that is redefining the boundaries of human-AI collaboration through a unique combination of hardware and software solutions. We adhere to the highest standards of data security and privacy protection, with certifications including ISO 27001, ISO 27701, GDPR, SOC 2, HIPAA, and EN 18031 compliance.

Discover more about our innovative solutions by visiting https://www.plaud.ai and follow us on Instagram, X, Facebook, LinkedIn, and YouTube.

Why You Should Join Us

At Plaud, you will play a pivotal role in shaping the future of human-AI interaction. Here’s what we offer:

A thriving, bootstrapped company with a remarkable $250M revenue run rate achieved in just three years.
An opportunity to define the next-generation paradigm for human-AI interaction.
Direct exposure to cutting-edge AI tools for professionals and a chance to contribute to our global expansion.
Collaborate with a passionate team that values innovation, teamwork, and customer success.
Advance your career in a culture that promotes continuous learning and rapid career growth.

Feb 24, 2026

Software Engineer, Reliability

OpenAI

Full-time|On-site|San Francisco

Become a vital part of the engineering teams that responsibly bring OpenAI’s transformative technologies to the world!

At OpenAI, our Applied Engineering team collaborates across research, engineering, product management, and design to deliver AI solutions to both consumers and businesses. We are committed to learning from our deployments, maximizing the benefits of AI, and ensuring that this powerful technology is utilized both safely and ethically. Our priority is safety over unchecked growth.

About the Role

As OpenAI continues to expand, we are seeking seasoned engineers who excel in problem-solving to enhance the scalability of our systems. Our achievements hinge on our ability to rapidly iterate on product development while ensuring optimal performance and reliability. You will thrive in a collaborative, fast-paced environment, playing a key role in delivering our technology to millions globally, with a focus on safety and reliability. As a reliability engineer, you will lead efforts to maintain and improve the stability, scalability, and performance of our dynamic infrastructure. You will collaborate closely with cross-functional teams, including software engineers, product managers, and data scientists, to construct and sustain robust systems capable of accommodating our growing user base and workload.

Your Responsibilities Include:

Designing and implementing solutions to scale our infrastructure to meet increasing demands effectively.
Developing and maintaining load, chaos, and synthetic testing software that enhances the reliability of systems designed by development teams.
Creating and managing automation tools to streamline repetitive tasks and bolster system reliability.
Overseeing the lifecycle management platform for CPU/storage, GPU, and network resources to foster efficiency and support dynamic optimization.
Implementing fault-tolerant and resilient design patterns to minimize service interruptions.
Establishing and maintaining service level objectives (SLOs) and service level indicators (SLIs) to ensure system reliability.
Collaborating with researchers, engineers, product managers, and designers to introduce new features and research advancements to the world.
Participating in an on-call rotation to address critical incidents and ensure 24/7 system availability.

Your Impact: Your contributions will be essential in guaranteeing the reliability and performance of our platforms as we continue to scale our operations.

Oct 17, 2025
