Software Engineer Generalist In Systems jobs in San Francisco – Browse 5,703 openings on RoboApply Jobs

Software Engineer Generalist In Systems jobs in San Francisco

Open roles matching “Software Engineer Generalist In Systems” with location signals for San Francisco. 5,703 active listings on RoboApply Jobs.

1 - 20 of 5,703 Jobs
Thinking Machines Lab
Full-time|$350K/yr - $475K/yr|On-site|San Francisco

At Thinking Machines Lab, we are dedicated to empowering humanity by advancing collaborative general intelligence. Our vision is to create a future where everyone can access the knowledge and tools necessary to harness AI for their unique needs.

Our diverse team of scientists, engineers, and builders has developed some of the most recognized AI products, including ChatGPT and Character.ai, as well as notable open-weight models like Mistral, and popular open-source projects such as PyTorch, OpenAI Gym, Fairseq, and Segment Anything.

About the Role

We are currently seeking versatile infrastructure and systems engineers to help construct the foundational systems that support our models and facilitate research and product development. Your contributions will enable teams to create and deliver groundbreaking AI products.

As a member of a small, high-impact team, you will be responsible for architecting and scaling the core infrastructure that underpins our operations. This role involves working across the entire technical stack, addressing complex distributed systems challenges, and developing robust, scalable platforms.

Infrastructure is vital to our success; it serves as the foundation for every innovation. You will collaborate directly with researchers to expedite experiments, enhance infrastructure efficiency, and derive critical insights from our models, products, and data assets.

Note: This is an evergreen role, meaning we are continuously accepting expressions of interest. Due to the volume of applications, there may not always be an immediate match for your skills and experience. However, we encourage you to apply. Applications are reviewed regularly, and we reach out to candidates as new opportunities arise. You may reapply if you gain additional experience, but please wait at least six months between applications. Additionally, we occasionally post specific roles for particular projects or teams, and you are welcome to apply for those as well.

What You’ll Do

Interviews will be conducted in a general manner, but project selection will consider your interests and experience alongside the needs of the organization. This flexible approach allows us to align talented engineers with the infrastructure teams where they will have the greatest influence and opportunities for growth.

Depending on your expertise and interests, you may contribute to various areas such as:

Core Infrastructure: Supporting teams that train, research, and ultimately serve AI models by building the infrastructure required for reliable and secure training of frontier models. This may include developing systems and managing large Kubernetes clusters with GPU workloads.

Nov 27, 2025
Generalist
Full-time|On-site|San Francisco Bay Area (San Mateo) or Boston (Somerville)

About the Role

As a Software Engineer specializing in Robotics Controls, you will take charge of the software that drives our robotic systems. Your role will involve close collaboration with machine learning teams to transform intended actions into smooth, efficient, and safe movements across various robotic embodiments. You will work alongside hardware engineers and partners to ensure optimal performance of systems, dynamics, and components.

Key Responsibilities:
Develop, implement, and rigorously test the complete real-time control system, ranging from low-level firmware to specifications for each robotic embodiment.
Manage various robotic embodiments, from standard industrial collaborative robot arms to custom-designed hardware.

Ideal Candidate Profile:
Extensive knowledge in rigid body kinematics and robotic control systems.
Proven experience in deploying robots in production environments with high reliability standards.
Proficiency in high-performance, production-ready programming using Python and C++.
Familiarity with kinematics and simulation libraries.
Experience with robotic middleware frameworks such as ROS/ROS2/ZCM.

Feb 12, 2026
Generalist
Full-time|On-site|San Francisco Bay Area (San Mateo) or Boston (Somerville)

About the Role

As a Software Engineer, you will play a pivotal role in developing applications utilized by customers, deployment partners, researchers, and various internal teams to diagnose, educate, and perform quality assurance on our robotic systems.

This position involves cutting-edge full-stack web and API engineering in a rapidly evolving AI-driven landscape.

Your Responsibilities Will Include:
Designing and implementing external and internal API interfaces for seamless interaction with Generalist systems.
Creating tools and workflows that support and expedite the core objectives of researchers, engineers, and operations teams across the organization.
Developing the primary web application used by researchers, operations, and deployment teams to assess and train robotic models.

You May Excel in This Position If You:
Possess a strong intuition for crafting exceptional user interfaces and experiences.
Have a solid understanding of modern web stack technologies, including TypeScript, React, FastAPI, Postgres, and AWS.
Know how to effectively manage and utilize contemporary AI coding agents to enhance development speed while ensuring high production quality.

Feb 12, 2026
Software Engineer - Generalist

mergeapiintegrationsandbox

Full-time|On-site|San Francisco

We are looking for a passionate and skilled Software Engineer - Generalist to join our dynamic team. In this role, you will collaborate with cross-functional teams to design, develop, and implement innovative software solutions that drive our mission forward. If you thrive in a fast-paced environment and are eager to tackle complex challenges, this is the perfect opportunity for you!

Apr 24, 2025
Generalist
Full-time|Remote|San Francisco Bay Area (San Mateo) or Boston (Somerville)

Are you passionate about the intersection of robotics and artificial intelligence? We are seeking a Robot Learning Generalist to join our innovative team. In this role, you will leverage your expertise to develop and implement advanced learning algorithms that enhance robotic capabilities. You will collaborate with a diverse group of engineers and researchers to push the boundaries of what robots can achieve in real-world scenarios.

As a part of our team, you will have the opportunity to work on cutting-edge projects that have a tangible impact on the future of robotics. Your contributions will help shape the development of intelligent systems that can learn from their environment and adapt to new challenges.

Mar 24, 2026
Generalist
Full-time|On-site|San Francisco Bay Area (San Mateo) or Boston (Somerville)

About the Role

In this exciting position, you will address comprehensive challenges to enhance the performance of our AI models deployed on robotic systems. Your responsibilities will include adding new features to our video processing data pipeline, updating our machine learning data loaders, training models to validate your modifications, and testing these changes in real-world robotic applications. This role requires the integration of numerous distributed Python services to achieve specific data processing and application tasks, alongside managing substantial cloud infrastructure for efficient business logic processing at scale.

Your responsibilities will include:
Conceptualizing and implementing innovative solutions to enhance system robustness, scalability, and speed.
Revamping existing systems and services to accommodate significant future growth.
Developing business logic to ensure our robots access the necessary data and that customers receive appropriate access to our robotic solutions.

You may excel in this role if you:
Possess extensive experience in building complex distributed applications or data pipelines at scale.
Have a background in processing petabytes of data, especially video data.
Demonstrate expertise in Python, with foundational knowledge in distributed infrastructure and solid understanding of modern machine learning principles.
Have a robust foundation in contemporary ML techniques with experience in large-scale ML training and production deployments.
Have familiarity with distributed cloud infrastructure and a deep understanding of cloud networking, permissions, and container orchestration (Kubernetes).

About Generalist

At Generalist, our mission is to realize the potential of general-purpose robots. We envision a future where industries and homes thrive on the collaboration between humans and machines. Our robots are designed to enhance productivity and efficiency.

We focus on developing embodied foundation models, starting with dexterity, which necessitates pushing the boundaries of data, models, and hardware to enable robots to intelligently interact with their environments.

Our company is deeply rooted in large-scale AI and robotics, with a team drawn from leading organizations like OpenAI, Boston Dynamics, and Google DeepMind, all committed to delivering groundbreaking advancements in AI technology.

Feb 12, 2026
Hover
Full-time|$194K/yr - $239K/yr|On-site|San Francisco, New York

Hover is revolutionizing property design and protection through innovative technology. Our proprietary AI, built on over a decade of real property data, empowers users to visualize and estimate property transformations effortlessly. Homeowners, contractors, and insurance professionals depend on Hover for accurate, fully measured interactive 3D models—achieved in minutes with just a smartphone scan.

At Hover, we are fueled by curiosity and purpose, committed to serving our customers, communities, and each other. We believe that diverse perspectives foster the best ideas, and we are dedicated to nurturing an inclusive, high-performance culture that promotes growth, accountability, and excellence. Supported by top investors like Google Ventures and Menlo Ventures, and trusted by industry leaders including Travelers, State Farm, and Nationwide, we are reshaping how people engage with their environments.

Why We Want You

At Hover, 3D data is at the heart of everything we do—from enhancing home visualization to transforming the construction and insurance sectors. We are seeking a Senior Software Engineer with extensive experience in 3D geometry, graphics, and visualization to help us expand the frontiers of immersive, impactful user experiences.

As a key member of our 3D & Measurements team, you will play a pivotal role in developing and enhancing the core systems that power our leading products, including our PDF measurements report, Workspace for exteriors, and Virtual Walkthrough for interiors. You will collaborate closely with talented designers and engineers across frontend, backend, graphics, and DevOps to create state-of-the-art 3D solutions that meet our customers' needs.

This is an exceptional opportunity to join a world-class team at the intersection of technology, design, and practical application—where your contributions will be recognized and valued daily.

Mar 20, 2026
Generalist
Full-time|On-site|San Francisco Bay Area (San Mateo) or Boston (Somerville)

About the Role

Join Generalist as the inaugural full-time member of our talent team, where you will play a pivotal role in shaping our hiring strategy. This multifaceted position encompasses full-cycle technical recruiting, while also addressing various other hiring needs as they arise. Your contributions will directly influence the growth and composition of our highly skilled team.

Your Responsibilities Include:
Collaborating closely with founders and executive leadership to identify and recruit key talent across all company functions.
Managing the entire recruitment process from sourcing candidates to successfully closing offers.
Designing and implementing efficient hiring processes and resources tailored to our evolving needs.

You May Excel in This Role If You:
Possess extensive experience in full-cycle recruiting.
Demonstrate a quick aptitude for understanding complex technical positions.
Have a strong intuition for assessing talent and interpersonal dynamics.
Can adapt recruitment processes to align with the current goals of the team and company.

We utilize Ashby as our Applicant Tracking System (ATS).

Feb 13, 2026
Spellbrush
Full-time|On-site|San Francisco

Join Our Team as a Generalist Software Engineer

About Spellbrush

At Spellbrush, we are dedicated to creating exceptional anime games. As the world's foremost generative AI studio, we proudly developed niji・journey.

We are exploring how AI can empower artists to create masterpieces in the complex realm of video games. Our games exemplify a perfect blend of balanced gameplay and a deep appreciation for visual arts. If you have a passion for turn-based strategy games, we encourage you to apply!

Your Role

As we expand rapidly, we are in search of talented generalist engineers to contribute to the development of advanced data pipelines, foundational datasets, and the infrastructure that supports them. If you find excitement in running batch jobs across thousands of GPUs to process petabytes of anime images and videos, this position is for you!

Your Impact

You will shape the processes and infrastructure necessary to transform and make data readily accessible across the organization, whether it involves embeddings, video, bounding boxes, or more. Collaborating with our elite AI researchers will be key to understanding their requirements and designing, building, and monitoring pipelines that not only meet current demands but also scale seamlessly with our growing data volume.

Oct 30, 2024
Momentic
Full-time|On-site|San Francisco

Join Momentic and Shape the Future of Quality.

At Momentic, we are at the forefront of innovation with our comprehensive quality platform, leveraging cutting-edge AI agents to empower our clients in maintaining quality throughout every phase of the Software Development Life Cycle (SDLC).

Leading engineering teams at renowned companies such as Notion, Bilt, Quora, and Xero rely on Momentic to deliver exceptional products. Our platform executes millions of tests daily, ensuring the highest standards of quality.

With a vast problem space to explore, we are eager to bring on board several talented software engineers across various tech stacks. Your expertise will help us build and own significant components of our solution.

About Our Product
We are developing an AI-native automated testing platform.
Our product is highly integrated into our clients' workflows, serving as a quality gate for every pull request and deployment.
We stand out in the market, far surpassing traditional tools like Selenium, Cypress, and Playwright.
Experience our platform in action by watching the demo video on our website.

About Us
We are a dynamic team of 12 professionals with backgrounds from companies like Robinhood, Retool, WeWork, Qualtrics, and Assembled.
Located in San Francisco, we work in a collaborative office environment.
Recently, we secured a Series A funding round, raising $15M led by Standard Capital, with participation from our esteemed investors, including Y Combinator, FCVC, and Transpose Platform.

Ideal Candidate Profile
You thrive on tackling complex challenges that involve both product development and technical uncertainties.
You possess robust engineering fundamentals, write efficient code, and are aware of your strengths and areas for growth.
You prefer hands-on work over meetings and are focused on productivity and delivering results.
You excel in environments that grant you autonomy and allow you to manage your own projects.
You are located in San Francisco or are willing to relocate, and you value in-person collaboration as part of our company culture.

Jun 23, 2025
Gigaml
Full-time|On-site|San Francisco

Role Overview

Gigaml is hiring an AI Talent Systems and Recruiting Generalist based in San Francisco. This position focuses on using AI technologies to improve recruiting workflows and talent management systems. The role works closely with multiple teams to shape and carry out recruitment strategies that help bring in strong candidates.

Apr 17, 2026
Crusoe
Full-time|On-site|San Francisco, CA - US

Join Crusoe as a Principal Systems Software Engineer and play a vital role in revolutionizing the tech industry. You will lead the development of innovative software solutions that enhance our systems and platforms, contributing to the overall mission of providing efficient and sustainable computing resources. Your expertise will help shape the future of our software architecture and ensure seamless integration across various applications.

Feb 25, 2026
OpenAI
Full-time|On-site|San Francisco

About Our Team:

Join the innovative Database Systems team at OpenAI, where we specialize in high-performance distributed databases. We are the architects behind Rockset, a cutting-edge real-time search, analytics, and vector database that powers all vector search and retrieval augmented generation (RAG) at OpenAI. Rockset underpins core functionalities across all OpenAI product lines and supports various critical internal applications.

About the Role:

We are in search of engineers who are passionate about distributed systems, performance optimization at a low level (with our core engine developed in C++), and constructing scalable database infrastructures from scratch. As a member of the Database Systems team, you will play a key role in enhancing the core database engine, making significant contributions to ingestion, query execution, indexing, and storage improvements. You will collaborate with multiple teams across OpenAI to unlock new product capabilities and ensure the reliability and scalability of our online database as usage expands exponentially.

Your Responsibilities Will Include:
Design, develop, and maintain high-performance distributed systems.
Identify and address performance bottlenecks to elevate infrastructure capabilities.
Define and guide the long-term technical vision and evolution of the system.
Collaborate with product, engineering, and research teams to deliver robust and scalable infrastructure.
Investigate complex production issues across the entire technology stack.
Contribute to incident response, retrospective analyses, and establishing best practices for system reliability.

You Will Excel In This Role If You:
Possess substantial experience in building, scaling, and optimizing distributed systems.
Exhibit a keen interest in database internals, storage engines, or low-latency query systems.
Enjoy tackling complex performance challenges in high-throughput systems.
Have experience managing and operating production clusters at scale (e.g., Kubernetes or similar orchestration tools).
Approach scalability, correctness, and reliability with a rigorous mindset.
Thrive in a fast-paced environment where you can make a significant impact.

Qualifications:
4+ years of relevant industry experience with a focus on distributed systems.
Proficiency in C++ or similar low-level programming languages.
Strong problem-solving skills and attention to detail.
Experience with performance monitoring and optimization tools.
Excellent collaboration and communication skills.

Jul 29, 2025
sfcompute
Full-time|On-site|San Francisco, CA

Join us at sfcompute, where we are revolutionizing the future by mitigating risks associated with the largest infrastructure development in history.

As the demand for GPU clusters surges, financing these data centers and their supporting infrastructure has never been more critical. Our innovative approach ensures that financing is secured through long-term contracts, providing peace of mind to both lenders and developers.

In the fast-paced world of AI and compute resources, we are creating a liquid market for GPU offtake, allowing even small startups to access high-end computing power without the burdens of traditional financing.

About the Role

As a Systems Software Engineer at sfcompute, you will be instrumental in developing a GPU market that brings the advanced software capabilities of hyperscalers to our innovative GPU neoclouds. Your responsibilities will encompass provisioning and monitoring bare metal servers with our virtualization orchestration software, as well as collaborating with our GPU marketplace to facilitate user configurations of VMs, networks, and storage.

Key tasks include creating and maintaining a Linux OS image tailored for our tools, ensuring consistent deployment across nodes with specific data-center adjustments, and designing the API protocols and servers for user interaction.

Our primary programming language is Rust, which enables us to write efficient code across all system layers, from web servers to kernel coordination. If you are familiar with manually memory-managed languages like C and possess experience in higher-level programming, we encourage you to apply.

Feb 27, 2026
Lumafield
Full-time|On-site|San Francisco, CA

About Lumafield: Established in 2019, Lumafield has pioneered the development of the world's first accessible X-Ray CT scanner specifically designed for engineers. Our intuitive scanner, combined with cloud-based software, empowers engineers to gain unparalleled insights into their projects at a remarkably affordable cost. Engineers face high-stakes decisions daily, necessitating tools that provide maximum visibility into their designs. By delivering exceptional product clarity and AI-enhanced tools that identify issues and produce quantitative insights, Lumafield is set to transform the creation, manufacturing, and application of complex products across various sectors. Our company thrives on impact and is dedicated to delivering the utmost value to our customers, ensuring their needs drive our development. Our talented team consists of leading researchers, industrial designers, PhD holders, innovators, and startup founders, all working collaboratively without egos. We proudly receive backing from prestigious venture capital firms, including Kleiner Perkins, Lux Capital, DCVC, and Spark Capital.

Headquartered in Cambridge, MA, with an additional office in San Francisco, CA, we are excited to grow our team.

About the Role: As a Senior Systems Software Engineer at Lumafield, you will be instrumental in developing the software that drives our cutting-edge, in-line manufacturing CT scanning products. You will engage with state-of-the-art X-ray physics, high-speed detectors, image processing, and embedded systems. Collaborating within a small team focused on our latest hardware, you will harness your expertise to maximize system performance and achieve outstanding results for our clients. This position is perfect for those eager to take ownership of embedded systems, firmware, and software design in an early-stage product environment. This role is based in our San Francisco, CA office, with occasional travel required to our Cambridge, MA office.

Mar 18, 2026
Aurelius Systems
Full-time|On-site|San Francisco

About Us:

Aurelius Systems is a venture capital-backed startup at the forefront of defense technology, specializing in the development of autonomous, edge-deployed robotic systems utilizing directed energy for counter-unmanned aerial systems (UAS).

Our innovative approach involves creating laser systems designed to neutralize drones.

With a dedicated team of approximately 10 engineers, former U.S. military personnel, and industry experts, we are committed to advancing America's capabilities in directed energy technology, delivering the first cost-effective and reliable laser weapon systems.

Inspired by the philosophy of Marcus Aurelius, we emphasize consistent effort and accountability in our work, embodying a culture of high output without excuses. Following in the footsteps of pioneers like Henry Ford, we embrace innovation and action within our small but impactful team.

In addition to our San Francisco headquarters, we are proud to operate a manufacturing hub in Detroit and conduct field tests weekly on our expansive private range.

If you thrive on seeing your engineering contributions directly in action rather than being confined to a lab, we encourage you to explore this opportunity.

The Position & Your Contribution:

As a Robotics Software Systems Engineer, your primary responsibility will be to ensure that all subsystems function seamlessly and efficiently together.

Our system comprises a complex array of subsystems including sensing, computer vision, machine learning inference, control systems, power management, and mechanical actuation. Achieving minimal processing time and inter-process latency is crucial for successfully targeting our nimble and evasive UAS.

The key area we are looking to fill is real-time systems performance at the hardware interface. You should possess a deep understanding of how software execution impacts physical system behavior, how latency accumulates across CPU, GPU, memory, and I/O, and how bandwidth limitations influence sensor data processing. We need an engineer who is detail-oriented, considering microseconds, memory bandwidth, cache behavior, and system determinism.

In our tight-knit team of around 10 engineers, you will have the opportunity to take ownership of systems that are field-tested. The success of our tests is binary—it's either effective or it isn't—and your role will involve iterative improvement based on real-world outcomes.

Your Responsibilities:
Manage the latency budget for the entire platform, from data sensing to actuation.
Profile and mitigate latency across CPU, GPU, memory, and I/O interfaces.
Develop and optimize kernels for high-throughput, low-latency operations.
Adjust memory access patterns for optimal performance.

Mar 2, 2026
OpenAI
Full-time|On-site|San Francisco

About Our Team

The Platform Systems team at OpenAI is at the forefront of innovation, merging advanced AI technologies with large-scale distributed systems. We are tasked with creating the engineering and research infrastructure essential for training OpenAI's premier models on some of the most powerful, custom-built supercomputers globally.

Our team is dedicated to developing the core software for model training, delving deep into the technological stack. This encompasses collective communication, compute efficiency, parallelism strategies, fault tolerance, failure detection, and observability. The systems we design are pivotal to enhancing OpenAI's research capabilities, facilitating reliable and efficient training at the leading edge of technology.

We work in close partnership with researchers across the organization, continuously integrating insights from various OpenAI projects to advance our training platform.

About the Role

As a Software Engineer specializing in Platform Systems, you will architect and develop distributed systems that enhance visibility into large-scale training operations, ensuring their dependable operation at scale.

Your responsibilities will include designing systems for failure detection, tracing, and observability that pinpoint slow or malfunctioning nodes, identify performance bottlenecks, and assist engineers in optimizing extensive distributed training tasks. This infrastructure is integral to the functionality of OpenAI's training stack and is continuously evolving to accommodate new use cases and increasingly intricate workloads.

This position is central to our training infrastructure, merging systems engineering, performance analysis, and large-scale debugging.

Key Responsibilities
Design and develop distributed failure detection, tracing, and profiling systems tailored for large-scale AI training jobs.
Create tools to identify slow, faulty, or errant nodes and deliver actionable insights into system behavior.
Enhance observability, reliability, and performance across OpenAI's training platform.
Troubleshoot and resolve issues within complex, high-throughput distributed systems.
Collaborate effectively with systems, infrastructure, and research teams to advance platform capabilities.
Adapt and expand failure detection and tracing systems to support new training paradigms and workloads.

Ideal Candidate Profile
Possesses a deep passion for performance, stability, and observability in distributed systems.
Demonstrates proficiency in systems engineering and performance analysis.
Has experience in debugging high-throughput distributed systems.
Exhibits strong collaboration skills with a track record of working with cross-functional teams.
Shows adaptability and eagerness to embrace new technologies and methodologies.

Jan 23, 2026
OpenAI
Full-time|Hybrid|San Francisco

Location: San Francisco, CA (Hybrid: 4 days onsite/week). Relocation assistance available.

About Our Team:

At OpenAI, we are at the forefront of technology, creating foundational platform software that ensures our consumer products are reliable, secure, and high-performing. Our team collaborates across various system layers, working closely with engineering partners to deliver exceptional capabilities from initial concept to final launch.

Role Overview:

We are looking for a passionate Systems Software Engineer to lead the design, implementation, and debugging of critical platform components and the pipelines that build and update system images. Your focus will span across operating system layers, emphasizing performance optimization, security enhancements, and in-depth system debugging to deliver production-grade systems that exceed expectations.

Key Responsibilities:
Design and develop robust system-level components and services within both kernel and user spaces.
Configure and maintain essential OS platform services (init, services, networking, security policies) and related tools.
Build and manage image and update pipelines, ensuring their reliability, reproducibility, and rollback safety.
Instrument system performance through profiling and tracing; enhance CPU, memory, I/O, and energy efficiency.
Oversee platform observability and reliability, including logging, crash capture, watchdogs, and diagnostics.
Collaborate with cross-functional teams to define interfaces and deliver comprehensive end-to-end features.
Establish and promote strong engineering practices such as code reviews, continuous integration, reproducible builds, and effective release management.
Work alongside external vendors to support builds and deployments.

You Will Excel in This Role If You:
Have successfully launched production systems software on modern operating systems.
Possess proficiency in C/C++ and a scripting language, with a strong understanding of OS internals including concurrency, memory management, filesystems, networking, and power management.
Demonstrate exceptional systems debugging skills utilizing debuggers, tracers, profilers, and logs across kernel/user-space boundaries.
Comprehend the configuration of platform services and interfaces, effectively translating requirements into stable, well-documented APIs.
Are knowledgeable about user-space foundations including service management, IPC, networking, packaging, and automation.
Have experience collaborating with external partners to deliver high-quality software solutions.

Dec 16, 2025
Achira
Full-time|On-site|San Francisco Office

Why Join Achira?
Become part of an exceptional team comprised of scientists, ML researchers, and engineers dedicated to transforming the landscape of drug discovery.
Engage with cutting-edge machine learning infrastructure at an unprecedented scale, leveraging extensive computing resources, vast datasets, and ambitious goals.
Take ownership of significant projects from conception through to architecture and deployment on large-scale infrastructures.
Thrive in a culture that values thoroughness, speed, and a proactive, builder-oriented mindset.

About the Role

At Achira, we are developing state-of-the-art foundation models that address the most complex challenges in simulation for drug discovery and beyond. Our atomistic foundation simulation models (FSMs) serve as comprehensive representations of the physical microcosm, encompassing machine learning interaction potentials (MLIPs), neural network potentials (NNPs), and various generative model classes.

We are looking for a Software Engineer who is enthusiastic about distributed computing and its applications in machine learning. You will play a pivotal role in designing and constructing the infrastructure for our ML data generation pipelines, model training, and fine-tuning workflows across large-scale distributed systems.

Your expertise will be crucial in ensuring our compute clusters are efficient, observable, cost-effective, and dependable, enabling us to advance the frontiers of ML development. If you are passionate about distributed systems, performance optimization, and cloud cost efficiency, we encourage you to apply.

You will be empowered to conceptualize and manage complex workloads across multiple vendors worldwide. Achira's mission revolves around computation, and providing seamless access to our uniquely tailored workloads at the lowest possible cost is critical to our success.

Oct 7, 2025
OpenAI
Full-time|On-site|San Francisco

About Our Team
The Frontier Systems team at OpenAI is at the forefront of technology, responsible for creating, deploying, and maintaining some of the world's largest supercomputers. These supercomputers are pivotal for training our most advanced AI models, pushing the boundaries of innovation.
We transform sophisticated data center designs into operational systems and develop the software infrastructure necessary for extensive frontier model training. Our goal is to ensure these hyperscale supercomputers operate reliably and efficiently, supporting groundbreaking AI research.

About the Role
As a key member of the Frontier Systems team, you will be instrumental in designing the critical infrastructure that ensures our supercomputers function seamlessly for pioneering AI research. In this role, you'll address system-level challenges and implement automation solutions that minimize disruptions during large-scale training processes.
Your responsibilities will encompass end-to-end ownership of your projects, allowing you to make significant contributions to our mission. This position is ideal for individuals who excel in diagnosing complex system issues and crafting automation strategies to proactively resolve problems across a vast network of machines.

Your Responsibilities Include:
Enhancing system health checks to maintain the stability of our hyperscale supercomputers during model training.
Conducting in-depth investigations into hardware failures and system-level bugs to uncover root causes.
Developing automation tools that monitor and resolve issues across thousands of systems, enabling uninterrupted research progress.

You May Be a Great Fit If You Possess:
7+ years of hands-on experience in software engineering.
Strong proficiency in Python and shell scripting.
Expertise in analyzing complex data sets using SQL, PromQL, Pandas, or other relevant tools.
Experience in creating reproducible analyses.
A solid balance of skills in both building and operationalizing systems.
Prior experience with hardware is not a prerequisite for this role.

Preferred Qualifications:
Familiarity with the intricacies of hardware components, protocols, and Linux tools (e.g., PCIe, InfiniBand, networking, power management, kernel performance tuning).
Experience with system optimization and performance tuning.

May 9, 2025
