Senior Hardware Reliability Engineer jobs in San Francisco – Browse 6,676 openings on RoboApply Jobs

Open roles matching “Senior Hardware Reliability Engineer” with location signals for San Francisco. 6,676 active listings on RoboApply Jobs.

1 - 20 of 6,676 Jobs
Gridware
Full-time|On-site|San Francisco, CA

About Gridware: Gridware is an innovative technology firm headquartered in San Francisco, committed to safeguarding and enhancing the reliability of the electrical grid. We have pioneered a revolutionary approach to grid management known as Active Grid Response (AGR), which meticulously monitors the electrical, physical, and environmental factors influencing grid safety and reliability. Our state-of-the-art AGR platform leverages high-precision sensors to identify potential issues at an early stage, facilitating proactive maintenance and fault resolution. This holistic strategy is designed to bolster safety, minimize outages, and ensure optimal grid performance. We are proud to be supported by prominent climate-tech and Silicon Valley investors. To learn more, visit www.Gridware.io.
About the Role: We are seeking a skilled Senior Hardware Reliability Engineer to lead reliability testing, analysis, and lifetime modeling of various outdoor electronic assemblies. This pivotal role will concentrate on the electronic components of our products, collaborating closely with our mechanical-focused Reliability Engineer and engaging with the broader hardware and cross-functional teams.

Feb 21, 2026
Multiply Labs
Full-time|On-site|San Francisco

About Multiply Labs: Multiply Labs is an innovative startup located in San Francisco, California, backed by renowned investors in technology and life sciences such as Casdin Capital, Lux Capital, and Y Combinator. Our goal is to develop the world's leading robotic systems and utilize them to make groundbreaking life-saving therapies accessible to everyone.
We are transforming the manufacturing process of cell therapies through the creation of advanced robotic systems that automate and scale the production of these crucial treatments. Our cutting-edge robots enable biopharma companies to produce cell therapies efficiently without overhauling their existing processes, thus minimizing regulatory hurdles and risks. Unlike traditional methods that are labor-intensive and costly (often exceeding $1M per patient), our robotic solutions aim to make these vital treatments more affordable and reachable for those who need them.
To discover more and view our robots in action, please visit www.multiplylabs.com and follow us on LinkedIn.
Position Overview: We are looking for a dedicated Hardware Reliability Engineer to become an essential part of Multiply Labs’ Reliability Engineering team. As a founding member, you will collaborate closely with the Hardware Product and Systems Integration teams to enhance our designs throughout the entire development lifecycle, from initial prototypes to fully deployed GMP production systems. Your contributions will directly support the delivery of life-saving therapies by ensuring our robots operate seamlessly within the high-stakes biotech environment.

Jan 28, 2026
Samsara
Full-time|$204K/yr - $240K/yr|Hybrid|San Francisco, CA, United States

Who We Are: Samsara (NYSE: IOT) is a trailblazer in the Connected Operations™ Cloud, a platform that empowers organizations reliant on physical operations to leverage Internet of Things (IoT) data for actionable insights and operational improvements. Our mission at Samsara is to enhance the safety, efficiency, and sustainability of the physical operations that underpin the global economy. Covering over 40% of global GDP, these sectors include agriculture, construction, field services, transportation, and manufacturing. We are dedicated to digitally transforming their operations on a large scale.
Joining Samsara means you'll be part of a team that's defining the future of physical operations. You will contribute to a dynamic range of product solutions, including Video-Based Safety, Vehicle Telematics, Apps and Driver Workflows, and Equipment Monitoring. As a company that has recently gone public, you will enjoy the autonomy and support to make a significant impact as we build for the future.
About the Role: Samsara's Hardware Reliability team plays a crucial role in ensuring an outstanding customer experience through reliable hardware. As a Senior Hardware Reliability Engineer, you will establish quality processes that uphold the high standards of Samsara's hardware.
In this role, you will implement and execute comprehensive reliability strategies that cover the entire product development lifecycle, from concept to warranty repair. You will rapidly gather and analyze test, field performance, and manufacturing data to drive necessary actions both internally and with our suppliers, ensuring the production of top-quality products. Collaboration with hardware, firmware, and operations teams is a fundamental aspect of this role.
This is a hybrid position open to candidates residing in the US, requiring you to visit our office in San Francisco three times a week.
You Should Apply If: You want to impact the industries that run our world. Your efforts will lead to tangible real-world benefits, helping to maintain essential services and support vital industries.

Feb 14, 2026
Samsara
Full-time|$124.1K/yr - $208.5K/yr|Hybrid|San Francisco - SF9

Who We Are: Samsara (NYSE: IOT) is at the forefront of the Connected Operations™ Cloud, a transformative platform that empowers businesses reliant on physical operations to tap into Internet of Things (IoT) data. Our aim is to provide actionable insights that enhance safety, efficiency, and sustainability across vital industries such as agriculture, construction, transportation, and manufacturing. By digitally transforming these sectors, which represent over 40% of global GDP, we are contributing to a more efficient and sustainable economy.
Joining Samsara means being part of a team that is defining the future of physical operations. You will engage in cutting-edge solutions, including Video-Based Safety, Vehicle Telematics, and Equipment Monitoring, within a supportive environment that fosters innovation and long-term impact.
About the Role: We are seeking a Senior Hardware Systems Engineer to enhance our rapidly expanding product line. Your primary responsibility will involve leading the electrical engineering components of product architecture and design, grounded in comprehensive feasibility, design, and cost analyses. This encompasses critical aspects such as component selection, thermal management, and antenna design. You will leverage extensive telemetry and direct customer insights to inform and refine our product designs. Collaborating closely with Product Management, Firmware, and Hardware leadership, you will influence key engineering decisions while mentoring fellow engineers. The role will also require interaction with our US and Taiwan EE teams, as well as our Supply Chain and laboratory resources, to achieve our project goals effectively.
This role is hybrid, requiring you to be in our San Francisco, CA office three days a week, with the flexibility to work remotely for two days. Travel may be necessary up to 25% of the time, and proximity to an international airport is essential. We offer relocation assistance for this position and welcome candidates from across the U.S. who are willing to relocate to the Bay Area.

Feb 11, 2026
OpenAI
Full-time|On-site|San Francisco

Join Our Innovative Team: At OpenAI, our Hardware organization is pioneering cutting-edge silicon and system-level solutions tailored to meet the demands of advanced AI workloads. We pride ourselves on developing next-generation AI-native silicon while collaborating with software and research partners to create hardware that is intricately integrated with AI models. Our mission includes delivering high-performance silicon for OpenAI’s supercomputing infrastructure and designing custom tools and methodologies that accelerate innovations, specifically optimized for AI applications.
Your Role in Our Mission: We are on the lookout for a dynamic and experienced Reliability/DFX Engineer who possesses extensive knowledge in scaling machine learning systems. As an integral member of our hardware team, you will collaborate with chip design, platform design, hardware health, and the wider industry ecosystem to architect, implement, and deploy dependable next-generation AI accelerator systems. You will take a holistic approach to evaluate system and chip architecture, pinpointing high-ROI opportunities that enhance reliability and availability throughout the stack while translating these insights into actionable strategies and silicon features.
Key Responsibilities:
- Lead the architecture, implementation, and execution of DFX strategies in silicon from concept to high-volume deployment, proposing impactful features to boost reliability and fault tolerance. Your focus will encompass design for testability, reliability, availability, and serviceability of high-performance AI hardware.
- Develop system-level reliability models based on empirical data to guide the organization’s DFX and reliability strategy, necessitating a deep understanding of chip and system architecture, design, implementation, and component-level reliability.
- Collaborate with chip and platform architecture/design teams to explore and implement DFX features, including the specification and integration of digital/mixed-signal IP, firmware/system software, and DFX methodologies.
- Work alongside hardware health and platform design teams to enhance reliability and fault tolerance in New Product Introduction (NPI) and High-Volume Manufacturing (HVM), driving continuous, data-driven improvements across the stack through optimized operating conditions and data analysis.
- Act as the DFX/reliability advocate, aligning the broader industry ecosystem with OpenAI’s strategic objectives and roadmap.
Qualifications:
- Bachelor’s degree in Engineering or related field with 15+ years of experience, or a Master’s degree with 10+ years of relevant experience.
- Proven expertise in DFX methodologies and reliability engineering for high-performance hardware.
- Strong analytical and problem-solving skills, with a track record of improving system reliability and performance.
- Excellent collaboration and communication abilities, capable of working effectively in a cross-functional team environment.
- Familiarity with AI workloads and associated hardware requirements is highly desirable.

Sep 17, 2025
Senior Hardware Test Engineer

Echo Neurotechnologies

Full-time|On-site|San Francisco

Company Overview: Echo Neurotechnologies is a pioneering startup specializing in Brain-Computer Interface (BCI) technology. We are committed to pushing the boundaries of innovation through state-of-the-art hardware engineering and artificial intelligence solutions. Our goal is to create transformative technologies that empower individuals with disabilities, enhancing their autonomy and overall quality of life.
Team Culture: Become a part of our dedicated team of passionate and skilled professionals. In our dynamic early-stage environment, you will have the chance to influence key decisions that will have lasting impacts. We prioritize continuous learning and development, promoting cross-functional collaboration where your input is essential to our collective success.
Role Overview: We are on the lookout for a seasoned Senior Hardware Test Engineer to validate our custom Echo hardware systems. In this role, you will lead the testing processes for our specialized hardware devices and subsystems while developing and implementing custom test systems.
Primary Responsibilities:
- Conduct in-house design verification tests
- Coordinate testing with external laboratories
- Collaborate with the engineering team to create tailored testing solutions
- Work alongside design engineers to characterize unique hardware
- Prepare tests for vendor transfer

Mar 4, 2026
Block, Inc.
Full-time|On-site|Bay Area, CA, United States of America

Role Overview: Block, Inc. is looking for a Reliability Program Manager focused on hardware solutions in the Bay Area, CA. This role guides efforts to improve the reliability and performance of hardware products. The position works closely with teams across the company to spot and address risks, helping Block deliver products that meet high standards for quality and durability.

Apr 15, 2026
Drata
Full-time|$166.9K/yr - $225.9K/yr|Hybrid|Hybrid - San Francisco

Drata helps organizations demonstrate their commitment to security and integrity. The platform supports companies as they build and maintain trust with users, customers, partners, and prospects.
Values:
- Built on Trust: Consistency shapes decisions and actions.
- Integrity: Choosing to do what is right, every time.
- Customer-Obsessed: Prioritizing customer needs above all else.
- Competitive Fire: Striving for higher standards and greater achievements.
- Diversity: Welcoming different perspectives to encourage creative solutions.
- Automation First: Pursuing efficiency by saving time and resources wherever possible.
How the Team Works: Drata blends high standards with a supportive environment focused on growth. Team members are encouraged to own their work, improve continuously, and deliver meaningful results. The company values quick, informed decisions that drive immediate impact, while always keeping the mission and customer needs at the center. The San Francisco-based team uses a hybrid work model. Colleagues collaborate in the office Tuesday through Thursday, focusing on alignment and innovation. Mondays and Fridays offer flexibility for deep work or personal needs.
Growth and Culture: Drata has expanded to over 600 professionals worldwide, recognized for a culture that values trust, speed, and continuous learning. The environment supports both personal and professional development.
See the Speed: CEO Adam Markowitz discusses Drata’s rapid journey to $100M ARR in four years.
Hear the Voice of the Team: Employee stories highlight collaboration and growth at Drata.

Apr 27, 2026
Fastly, Inc.
Full-time|$181.2K/yr - $217.5K/yr|On-site|Denver, CO; San Francisco, CA

At Fastly, we empower individuals to connect more effectively with the things they cherish. Our cutting-edge edge cloud platform enables customers to swiftly, securely, and reliably craft exceptional digital experiences by processing, serving, and safeguarding their applications as close to their end-users as possible, right at the edge of the Internet. Tailored for modern internet demands, our platform is programmable and supports agile software development. We proudly serve many of the world's leading companies, including GitHub, Yelp, Paramount, and JetBlue.
Join us in our mission to build a more trustworthy Internet.
Posting Open Date: Feb. 25, 2026
Anticipated Posting Close Date*: March 25, 2026
*Please note that this job posting may close early depending on the volume of applications.
Role Overview: The Data Reliability team is seeking an experienced Senior Software Engineer to contribute to the development and support of next-generation data storage solutions at Fastly. The ideal candidate will possess expertise in backend and data services within cloud environments, proficiency with configuration and orchestration tools such as Terraform and Kubernetes, and the ability to create internal administration tools using Go and related technologies. Our team plays a vital role in ensuring the infrastructure, orchestration, and reliability of Fastly's most data-intensive applications, utilizing technologies like Terraform, Elasticsearch, ClickHouse, Prometheus, MySQL, and Redis across both cloud and hardware platforms. Your contributions will directly enhance our customers' success by providing product teams with a robust platform for efficient and consistent delivery of high-quality, high-throughput, globally distributed data systems and products. We embrace a distributed work model and value both collaborative and asynchronous communication styles.
Key Responsibilities:
- Deploy, support, and maintain various critical data storage systems, scaling from gigabytes to petabytes.
- Develop statistics and dashboards to track service-level objectives for these systems.
- Create and manage tools for configuration, backup, and authenticated access to data systems employing peer review, CI/CD, and both daemon- and container-based deployment strategies.
- Write high-performance, maintainable, and concise code, actively participating in code reviews to enhance the codebase.

Mar 20, 2026
alembic
Full-time|On-site|San Francisco HQ

About the Role: Join alembic as a Senior Site Reliability Engineer (SRE) and become an integral part of our mission to enhance platform reliability, observability, and operational excellence. In this pivotal role, you will collaborate with engineers and data scientists to architect, automate, and maintain the robust infrastructure that drives our platform, including data pipelines, machine learning workloads, and real-time analytics systems.
This hands-on position offers significant visibility across the technology stack and provides you with the opportunity to shape the future of our infrastructure and operations.

Dec 22, 2025
Tubi TV
Full-time|$227.2K/yr - $324.5K/yr|Hybrid|San Francisco, CA (Hybrid)

About the Role: At Tubi, our Site Reliability Engineering (SRE) team transcends traditional operations. We embody a software engineering ethos, leveraging a developer's toolkit to tackle the complexities of large-scale, distributed systems. Our core mission focuses on building resilience from the ground up, empowering our product teams to innovate swiftly while delivering an exceptional user experience. We oversee the availability, latency, performance, and capacity of our platform, driven by a culture of data-informed decision-making, blameless learning, and relentless automation. We are on the lookout for a seasoned and visionary Senior Manager of SRE to lead and expand our newly formed Site Reliability Engineering team. You will be more than just a people manager or tech lead; you will be the strategic architect behind our reliability roadmap. Your role will involve building and mentoring a team of skilled engineers, cultivating an environment of blameless learning and continuous improvement, while advocating for the engineering practices that balance rapid innovation with unwavering stability. You will play a pivotal role within our engineering leadership, collaborating with peers across the organization to embed reliability as a shared responsibility and a fundamental principle of our engineering culture.

Mar 17, 2026
Sigma Computing
Full-time|$170K/yr - $240K/yr|On-site|San Francisco, CA

About the Role: Sigma Computing is growing its engineering team in San Francisco, CA. The company builds technology to help users access data with ease. As a Senior Software Engineer focused on Observability and Reliability, you will work alongside engineers who value high standards and collaboration.
What You Will Do:
- Design and build observability platforms and tools, including metrics collection, logging, distributed tracing, dashboards, alerting, and application performance monitoring.
- Work with technologies such as Go, OpenTelemetry, and Kubernetes to solve reliability challenges.
- Take part in on-call rotations to help maintain strong uptime for Sigma’s services.
- Create tools and processes to improve cloud incident triage and reduce downtime.
- Define and promote practices that make systems and services measurable and observable.
- Join design and code reviews with peers and stakeholders to reinforce quality and effective collaboration.

Apr 25, 2026
Hyperbolic Labs
Full-time|On-site|San Francisco, CA

Who We Are: At Hyperbolic Labs, we are committed to democratizing AI by removing barriers to computing power with our Open-Access AI Cloud. By aggregating global computing resources, we provide an innovative GPU marketplace and AI inference service that ensures both affordability and accessibility. As trailblazers at the convergence of AI and open-source technology, we envision a future where AI innovation is only limited by creativity, not by resource availability. We invite forward-thinking individuals who share our dedication to making AI universally accessible, secure, and affordable. Join us in crafting a platform that empowers innovators worldwide to realize their visionary AI projects.
In anticipation of our growth following our Series A funding, our team, guided by co-founders with advanced degrees in AI, Mathematics, and Computer Science, is set to transform the computing landscape.
About the Role: We are in search of a skilled Site Reliability Engineer to guarantee that Hyperbolic's GPU marketplace and AI infrastructure function with outstanding reliability, performance, and security. As an aggregator of computational resources from numerous global providers, our service level objectives (SLOs), trust, and economic efficiency are critical to our product. Your key responsibilities will include defining and maintaining service level objectives, developing resilient incident response protocols, managing capacity across our extensive GPU network, and implementing secure rollout and rollback mechanisms to ensure uninterrupted platform operation around the clock.
In this influential role, you'll set the reliability benchmarks that foster customer trust in our platform, design comprehensive monitoring and alerting systems for enhanced infrastructure visibility, automate capacity management and resource allocation processes, lead incident response and post-mortem evaluations, and collaborate closely with engineering teams to bolster system resilience. Security and infrastructure hardening will be paramount, necessitating strong isolation protocols between tenants and suppliers, the implementation of effective key management systems, and the establishment of compliance frameworks. This high-impact position will directly affect our ability to deliver on our commitment to providing affordable, accessible AI compute at scale.

Mar 26, 2026
OpenAI
Full-time|Hybrid|San Francisco

About Our Team: At OpenAI, our Hardware organization is at the forefront of developing cutting-edge silicon and system-level solutions tailored for the specific demands of advanced AI workloads. Our team is dedicated to creating the next generation of AI-native silicon, collaborating closely with software and research partners to co-design hardware that is seamlessly integrated with AI models. We not only deliver production-grade silicon for OpenAI’s supercomputing infrastructure but also innovate custom design tools and methodologies that drive acceleration and optimization specific to AI.
About This Role: As a member of our hardware optimization and co-design team, you will play a crucial role in co-designing future hardware from various vendors, focusing on programmability and high performance. You will partner with our kernel, compiler, and machine learning engineers to comprehend their distinct requirements concerning ML techniques, algorithms, numerical approximations, programming expressivity, and compiler optimizations. Your advocacy for these constraints will help shape and influence future hardware architectures aimed at efficient training and inference for our models. If you are passionate about efficiently distributing large language models across devices, optimizing system-wide networking bottlenecks, and customizing the compute pipeline and memory hierarchy of hardware platforms while simulating workloads at various abstraction levels, then this opportunity is perfect for you!
This position is based in San Francisco, CA, utilizing a hybrid work model of three days in the office each week, with relocation assistance available for new hires.
Key Responsibilities:
- Collaborate on the co-design of future hardware focusing on programmability and performance with hardware vendors.
- Support hardware vendors in developing optimal kernels and integrating support within our compiler.
- Generate performance estimates for critical kernels across diverse hardware configurations, influencing decisions regarding compute core and memory hierarchy features.
- Create system performance models at various abstraction levels and conduct analyses to guide decisions on scaling and front-end networking.
- Engage with machine learning engineers, kernel engineers, and compiler developers to align on high-performance accelerator needs.
- Facilitate communication and coordination with internal and external partners.
- Shape the roadmap for hardware partners to optimize their products for our AI capabilities.

Feb 11, 2026
Senior Reliability Test Engineer

Astranis Space Technologies Corp.

Full-time|$130K/yr - $180K/yr|On-site|San Francisco

Astranis is at the forefront of satellite technology, crafting advanced satellites designed for high orbits to broaden humanity's exploration of the solar system. Our satellites deliver dedicated, secure networks to a diverse range of esteemed clients worldwide, including large enterprises, government entities, and the US military. With five satellites currently operational and several more set to launch, we are addressing a robust backlog of over $1 billion in commercial contracts.
We take pride in being the leading choice for satellite communications among clients with demanding standards for uptime, data security, network visibility, and customization. Having secured over $750 million from top-tier investors such as Andreessen Horowitz, BlackRock, and Fidelity, our team of 450 engineers and entrepreneurs operates from our expansive 153,000 sq. ft. headquarters in Northern California, USA.
Senior Reliability Test Engineer: As a Senior Reliability Test Engineer, you will play a pivotal role in collaborating across all engineering disciplines to ensure our hardware achieves exceptional quality and reliability standards. With Astranis ramping up satellite production, your expertise will be essential in establishing a comprehensive reliability test program that supports the development of new product designs, monitors manufacturing processes, and identifies long-term reliability issues. The ideal candidate will possess extensive engineering experience with high-reliability products, demonstrate autonomy, and have the capability to design a reliability test program from the ground up.

Mar 9, 2026
Astranis
Full-time|$165K/yr - $220K/yr|On-site|San Francisco

Astranis is at the forefront of satellite technology, dedicated to expanding humanity's presence in the solar system. Our advanced satellites operate in high orbits, delivering dedicated, secure networks to a diverse clientele that includes large enterprises, sovereign governments, and the US military. With five satellites currently in orbit and several more set to launch, we are excited to work through a robust backlog of over $1 billion in commercial contracts.
Astranis is recognized as the premier satellite communications partner for clients with demanding standards for uptime, data security, network visibility, and customization. Having secured over $750 million from top-tier investors like Andreessen Horowitz, BlackRock, and Fidelity, our team of 450 engineers and entrepreneurs designs, builds, and operates our satellites from our expansive 153,000 sq. ft. headquarters in Northern California.
Position: Senior Electrical Engineer - SDR Hardware Design
As a Senior Electrical Engineer specializing in Software Defined Radio (SDR) Hardware at Astranis, you will spearhead the design, qualification, and testing of our bespoke software-defined radios. We seek innovative professionals skilled in creating cutting-edge high-speed hardware. If you think in terms of volts, bits, and dB's, this position is tailored for you! Collaborating with the RTL, Communications, and RF design teams, you will be instrumental in crafting superior radio systems for our payload and TTC systems.

Mar 9, 2026
Astranis
Full-time|$135K/yr - $235K/yr|On-site|San Francisco

Astranis is revolutionizing satellite technology by creating advanced spacecraft designed for high orbits, thereby extending humanity's presence in the solar system. Our satellites deliver dedicated and secure networks to an elite clientele, including large corporations, government entities, and the U.S. military. With five satellites successfully launched and a robust pipeline of over $1 billion in commercial contracts, Astranis is set for growth as we prepare for numerous upcoming launches.
We are the go-to satellite communications partner for clients demanding exceptional uptime, data security, network visibility, and tailored solutions. Backed by over $750 million from industry-leading investors such as Andreessen Horowitz, BlackRock, and Fidelity, our team of 450 engineers and entrepreneurs thrives in our 153,000 sq. ft. headquarters in Northern California.
Senior Electrical Reliability Engineer: As a Senior Reliability Engineer at Astranis, you will be pivotal in ensuring that our spacecraft electronics and systems fulfill our reliability and availability requirements. Collaborating with a multidisciplinary engineering team, you will push the boundaries of geosynchronous spacecraft design and achieve previously unattainable performance in space. Your expertise will ensure that Design for Reliability remains central to our engineering efforts.

Mar 18, 2026
Planet Labs Inc.
Full-time|$182.9K/yr - $228.6K/yr|Hybrid|San Francisco, CA

Welcome to Planet. We believe in leveraging space technology to enhance life on Earth.
Planet is at the forefront of satellite imaging, designing, constructing, and operating the largest constellation of imaging satellites in history. Our innovative cloud-based platform delivers a wealth of empirical data to key players in commercial, environmental, and humanitarian fields. We integrate space and data solutions to create meaningful impact.
Our clients and users around the globe utilize Planet's datasets to innovate technologies, generate revenue, advance research, and tackle some of the world's most pressing challenges.
Emphasizing a people-first culture, we focus on fostering a collaborative environment that prioritizes the growth and well-being of our team members. Join Planet to help transform perceptions of our world.
We are a global company, with team members working remotely and from offices in San Francisco, Washington DC, Germany, Austria, Slovenia, and The Netherlands.
About the Role: We are searching for a Senior Camera Hardware Engineer to lead the electrical architecture of our satellite imaging payloads within the Electrical Engineering team. This position covers the entire electrical lifecycle of our camera systems, from recommending the best image sensors to designing ultra-low-noise power rails and ensuring high-speed data integrity from the sensor to the FPGA. The engineer will collaborate closely with Electrical, Mechanical, Thermal, and Optical engineers to refine the entire optical payload (camera and telescope) to fulfill our mission objectives.
This is a full-time, hybrid position requiring three days in our San Francisco office each week.

Mar 16, 2026
OpenAI
Full-time|Hybrid|San Francisco

About Our Team: At OpenAI, our Hardware team is at the forefront of developing cutting-edge silicon and comprehensive system solutions tailored to the specific needs of advanced AI workloads. We pride ourselves on crafting the next generation of AI-native silicon, collaborating closely with software engineers and research teams to ensure our hardware is seamlessly integrated with AI models. Our mission extends beyond creating production-grade silicon for OpenAI’s supercomputing infrastructure; we also innovate custom design tools and methodologies that spark innovation and enable hardware specifically optimized for AI.
About the Role: As a Software Engineer on the Scaling team, you will play a pivotal role in designing and optimizing the foundational stack that manages computation and data flow across OpenAI’s supercomputing clusters. Your responsibilities will include crafting high-performance runtimes, developing custom kernels, enhancing compiler infrastructure, and building scalable simulation systems to validate and optimize distributed training workloads.
This position requires you to work at the intersection of systems programming, machine learning infrastructure, and high-performance computing, where you will create intuitive developer APIs alongside highly efficient runtime systems. You will balance usability and introspection with the imperative for stability and performance across our dynamic hardware landscape.
This role is based in San Francisco, CA, featuring a hybrid work model (three days in-office per week). Relocation assistance is provided.
Key Responsibilities:
- Design and implement APIs and runtime components to efficiently manage computation and data movement for diverse ML workloads.
- Enhance compiler infrastructure by developing optimizations and compiler passes to accommodate evolving hardware advancements.
- Engineer and refine compute and data kernels, ensuring precision, high performance, and compatibility across simulation and production settings.
- Analyze and optimize system bottlenecks, focusing on I/O, memory hierarchy, and interconnects at both local and distributed scales.
- Create simulation infrastructure to validate runtime behaviors, test modifications to the training stack, and support the early development of hardware and systems.
- Quickly deploy updates to runtime and compiler across new supercomputing builds in close collaboration with hardware and research teams.
- Work across a varied tech stack, primarily utilizing Rust and Python, with a chance to influence architectural decisions within the training framework.

Oct 31, 2025
Carta
Full-time|On-site|San Francisco, California; Santa Clara, California; Seattle, WA

Join Carta as a Senior Site Reliability Engineer, where you will play a pivotal role in enhancing our infrastructure and ensuring the reliability of our platforms. You will work collaboratively with cross-functional teams to implement innovative solutions that drive operational excellence and scalability.

Apr 3, 2026
