Staff Software Engineer At Crusoe San Francisco jobs in San Francisco – Browse 11,624 openings on RoboApply Jobs
Staff Software Engineer At Crusoe San Francisco jobs in San Francisco
Open roles matching “Staff Software Engineer At Crusoe San Francisco” with location signals for San Francisco. 11,624 active listings on RoboApply Jobs.
Experience Level
Entry Level
Qualifications
Ideal candidates will possess proficiency in modern programming languages such as Python, Java, or C++; experience with cloud computing and containerization technologies; and strong problem-solving skills with an ability to work in a collaborative environment. A degree in Computer Science or a related field is preferred.
About the job
About the Role
Crusoe is hiring a Staff Software Engineer in San Francisco, CA. This role focuses on building high-performance software applications that support the company’s technology initiatives.
What You Will Do
Design and develop software applications with an emphasis on performance and reliability
Collaborate with engineering teams to deliver solutions that meet business needs
Contribute technical expertise to key projects and code reviews
Location
This position is based in San Francisco, CA.
About Crusoe
Crusoe is a forward-thinking technology company based in San Francisco, dedicated to leveraging the power of computing to create sustainable solutions. We pride ourselves on fostering a culture of innovation, teamwork, and professional growth.
Join Crusoe as a Staff Software Engineer specializing in Networking. In this critical role, you will design and implement innovative software solutions that enhance our networking infrastructure. You will collaborate with cross-functional teams to optimize performance and reliability, ensuring that our services run efficiently and securely.
Role Overview
Crusoe is seeking a Senior Staff Software Engineer focused on CAPE for its San Francisco office. This role centers on designing and building software solutions that support Crusoe's mission to advance technology in the energy sector.
What You Will Do
Design and implement software systems for CAPE projects
Collaborate with cross-functional teams to deliver solutions that align with user needs and company goals
Tackle complex technical challenges in support of Crusoe's energy initiatives
What We Look For
Strong technical background in software engineering
Experience solving complex problems and delivering reliable software
Ability to work effectively with colleagues across multiple disciplines
Location
San Francisco, CA - US
Full-time|$208.7K/yr - $253K/yr|On-site|San Francisco, CA - US
At Crusoe, our mission is to propel the availability of energy and intelligence. We are developing the engine that empowers individuals to pursue ambitious projects with AI, all while upholding standards of scale, speed, and sustainability.
Join us in revolutionizing the AI landscape with sustainable technology. Here, you will spearhead significant innovations, create real-world impact, and collaborate with a team that is defining the future of responsible cloud infrastructure.
Position Overview
As a Staff Software Engineer on the Model LifeCycle team, you will be instrumental in developing a robust managed platform that oversees the entire application development lifecycle, specifically focusing on the integration of Machine Learning models, including Large Language Models (LLMs).
Your Responsibilities
Enhance systems for large foundation models through fine-tuning (SFT, PEFT, LoRA, adapters), including multi-node orchestration, checkpointing, failure recovery, and efficient scaling.
Design and sustain comprehensive training pipelines for Large Language Models.
Contribute to the development of distillation and reinforcement learning pipelines (e.g., preference optimization, policy optimization, reward modeling).
Create and uphold the infrastructure for agent execution.
Implement features for dataset, model, and experiment management: ensuring versioning, lineage tracking, evaluation, and reproducible fine-tuning at scale.
Collaboration and Impact
Collaborate closely with Principal Engineers, product teams, and platform teams to implement core abstractions and APIs.
Participate in architectural decisions regarding training runtimes, scheduling, storage, and model lifecycle management.
Engage actively with the open-source LLM community.
This role offers considerable ownership: you will be pivotal in designing and implementing core systems.
Join our innovative team at Crusoe as a Senior Software Engineer specializing in Networking. In this critical role, you will develop cutting-edge software solutions that enhance our networking capabilities and support our mission of delivering efficient computing resources. Your expertise will contribute to building scalable and reliable network architectures, enabling us to serve our clients better. Collaborate with cross-functional teams and leverage your knowledge in software engineering to push the boundaries of technology.
As a Senior Staff Software Engineer specializing in Model LifeCycle at Crusoe, you will play a vital role in shaping the future of software solutions that optimize and enhance our innovative operations. You will lead complex projects, mentor junior engineers, and collaborate with cross-functional teams to deliver high-impact results.
Join Crusoe as a Senior Software Engineer in our Streaming division, where you'll be at the forefront of innovative streaming technology solutions. You will collaborate with cross-functional teams to design, develop, and implement high-performance streaming applications that enhance user experience. As a vital member of our engineering team, you will leverage your expertise in software development to contribute to cutting-edge projects that push the boundaries of streaming technology.
Join Crusoe as a Senior Software Engineer specializing in Observability, where you will play a pivotal role in enhancing our systems and ensuring robust performance across our platforms. You will collaborate with cross-functional teams to develop innovative solutions that improve the visibility and reliability of our software applications.
Full-time|$208K/yr - $253K/yr|On-site|San Francisco, CA - US
At Crusoe, our mission is to drive the evolution of energy and intelligence. We are developing the technology that fuels a future where individuals can ambitiously harness AI capabilities without compromising on scale, speed, or sustainability.
Join us in revolutionizing AI with sustainable solutions at Crusoe. In this role, you will be at the forefront of innovation, making a significant impact while collaborating with a team that is shaping the future of responsible and transformative cloud infrastructure.
About This Role
We are looking for a dedicated Hardware Production/Sustaining Engineer to enhance Crusoe's Hardware Systems Engineering team. This position is critical for bridging essential skill gaps in debugging, validation, and production support for high-performance computing systems. You will manage the entire hardware lifecycle, from prototype initiation to large-scale production, focusing on automation, deep troubleshooting, and reliability within Crusoe Cloud’s GPU- and CPU-oriented infrastructure.
Your collaboration with cross-functional teams will be vital in supporting, debugging, and enhancing hardware platforms on a large scale, specifically targeting PCIe, InfiniBand, and NVMe/storage, which have been highlighted as key areas for expanded expertise. Your contributions will directly influence Crusoe’s capability to deploy and maintain sustainable, AI-driven computing systems that deliver exceptional performance and reliability.
Your Responsibilities Will Include
Leading the complete hardware development and sustaining lifecycle, encompassing feasibility studies, bring-up, validation, deployment, and ongoing production support.
Creating and sustaining automation frameworks and scripts for hardware testing, diagnostics, and continual reliability enhancements.
Executing in-depth troubleshooting and debugging across:
PCIe (including link training, topology, and performance issues)
InfiniBand (focusing on fabric debugging, throughput, and connectivity challenges)
NVMe/storage (addressing performance bottlenecks, firmware interactions, and failure analyses)
Performing extensive system validation and characterization for GPU, CPU, and high-performance computing platforms.
Assisting in end-to-end integration and solution testing to guarantee that Crusoe Cloud products fulfill performance, reliability, and scalability standards.
Collaborating with teams across mechanical, thermal, firmware, software, and manufacturing domains to troubleshoot and enhance system performance.
Full-time|$180K/yr - $220K/yr|On-site|San Francisco, CA - US
At Crusoe, we are on a mission to revolutionize the future by accelerating the abundance of energy and intelligence. We are building the foundational engine that empowers individuals to create bold innovations with AI while ensuring sustainability, speed, and scalability.
Join us at the forefront of the AI revolution with cutting-edge sustainable technology. You will play a pivotal role in driving meaningful innovation, making a significant impact, and collaborating with a team that is leading the way in responsible, transformative cloud infrastructure.
About the Role
As a Senior Staff Cloud Support Engineer, you will serve as a technical expert within Crusoe Cloud and significantly enhance the efforts of our Customer Experience, SRE, Networking, Fleet, and Product teams. Your role transcends basic ticket resolution; you will design reliability frameworks, influence architectural decisions, mentor senior engineers, and safeguard revenue by averting large-scale incidents. With profound expertise in Linux systems, Kubernetes, networking, and AI/ML infrastructure, you will apply your knowledge with a strong focus on customer satisfaction. You will be comfortable navigating uncertainty, leading incident responses, and shaping the global scaling of high-performance AI infrastructure.
Key Responsibilities
Act as the top escalation point for complex P1/P0 incidents.
Lead cross-functional investigations into root causes involving compute, networking (IB/RDMA/RoCE), storage, and orchestration layers.
Collaborate with SRE and Software teams (Storage, Networking, Compute, K8) to devise systemic solutions rather than temporary fixes.
Reliability Architecture
Design and enhance node validation, burn-in processes, performance baselining, and release readiness.
Influence Kubernetes architecture, workload orchestration (Slurm, Terraform), and AI/ML cluster stability.
Minimize MTTR and prevent incident recurrence through structural enhancements.
AI/ML Infrastructure Expertise
Troubleshoot NCCL, IB, GPU driver/firmware issues, and distributed training failures.
Support complex AI workloads (training + inference) through performance tuning and observability enhancements.
Customer-Facing Authority
Act as a senior technical advisor during high-stakes customer incidents.
Join our innovative team at Crusoe as a Staff Software Engineer. In this pivotal role, you will leverage your advanced software engineering skills to design, develop, and optimize cutting-edge solutions that enhance our technology stack. Collaborate with cross-functional teams to drive projects from concept to completion, ensuring high-quality deliverables that meet user needs and business objectives.
Full-time|$175K/yr - $250K/yr|On-site|San Francisco, CA - US
At Crusoe, we are on a mission to drive the proliferation of energy and intelligence in the digital age. We are developing an innovative platform that enables individuals to harness the power of AI for ambitious projects, all while ensuring unparalleled scale, speed, and sustainability.
Join us at the forefront of the AI revolution, where sustainable technology meets transformative cloud infrastructure. At Crusoe, you will be part of a team that is committed to meaningful innovation and making a significant impact.
About the Role
We are looking for a Senior to Senior Staff level Solutions Engineer to collaborate closely with our key enterprise clients as they deploy AI and machine learning workloads on Crusoe's cutting-edge GPU infrastructure. This role is hands-on and customer-centric, requiring extensive technical knowledge in Kubernetes, MLOps, and cloud infrastructure.
You will lead clients through the entire deployment journey, overseeing the proof of concept (PoC) process, optimizing workloads after the sale, and serving as an essential technical liaison between our clients and engineering teams. Successful candidates will possess a strong passion for AI infrastructure, be proficient in containerized environments, and have the ability to effectively translate workloads across various cloud platforms.
What You'll Be Working On
Customer Enablement: Spearhead the technical onboarding and deployment of sophisticated AI/ML workloads with strategic enterprise customers, taking ownership of the PoC through to post-sales optimization.
Kubernetes + MLOps Focus: Design and implement ML workloads utilizing Kubernetes-based technologies (e.g., Ray, Kubeflow) while ensuring optimal performance, scalability, and efficiency.
Infrastructure-Centric Thinking: Engage directly with Crusoe infrastructure to deploy and fine-tune AI/ML workloads, guaranteeing performance at both the container and hardware levels.
Cross-Cloud Translation: Assist clients in migrating and adapting workloads across AWS, Azure, and GCP, while clearly articulating the trade-offs between cloud-native and Crusoe-native strategies.
Technical Storytelling: Facilitate workshops, live demonstrations, and solution reviews. Contribute to case studies, solution briefs, and blog articles that showcase real-world customer success stories.
Voice of the Customer: Provide feedback to internal engineering and product teams to continuously enhance Crusoe’s platform based on practical implementation experiences.
What You'll Bring to the Team
Deep Kubernetes Expertise: 7+ years of experience in building and deploying containerized applications.
Join Crusoe as an Enterprise IT Architect, where you will play a pivotal role in shaping our technology landscape. This position offers the opportunity to design and implement innovative IT architecture solutions that meet our business needs and drive operational excellence.
Join Crusoe as a Senior Data Engineer, where you will play a critical role in enhancing our data infrastructure and analytics capabilities. You will be responsible for designing, developing, and maintaining robust data pipelines to support our cutting-edge applications. As a key member of our engineering team, you will work closely with data scientists and analysts to ensure that data is accessible, accurate, and actionable.
Join our innovative team at Crusoe as a Staff Software Engineer, where you will leverage your expertise in systems engineering to develop cutting-edge software solutions. In this dynamic role, you will collaborate with cross-functional teams to design, implement, and optimize systems that drive our mission forward. Your contributions will be pivotal in enhancing our technology stack and ensuring the seamless operation of our systems.
Full-time|$165K/yr - $200K/yr|On-site|San Francisco, CA - US
At Crusoe, we are on a mission to accelerate the abundance of energy and intelligence, creating an environment where innovation thrives. As we build the infrastructure that empowers ambitious AI-driven projects, we prioritize sustainability without compromising on scale or speed.
Join us in being part of the AI revolution with cutting-edge technology at Crusoe, where you will spearhead impactful innovations and collaborate with a team committed to transforming cloud infrastructure responsibly.
About This Role
We are looking for a Senior API Integration Engineer who will act as a vital technical partner in our enterprise-wide digital transformation efforts. This role is pivotal in driving intelligent automation and scalable system integrations, particularly within our People Tech ecosystem, with a strong focus on Workday HCM.
What You’ll Be Working On
Designing and developing enterprise-grade integrations utilizing the Workato ONE platform to facilitate intelligent workflow automation.
Creating and maintaining robust API integrations across Workday HCM and the wider People Tech landscape, which includes payroll, ATS, LMS, compensation, benefits, performance, and analytics.
Employing AI-driven automation within Workato to enhance efficiency, reliability, and process optimization.
Establishing reusable integration architecture patterns, frameworks, and governance standards that can scale across both automated and human-led workflows.
Collaborating with business and IT stakeholders to gather requirements, lead discovery sessions, assess ROI, and translate complex needs into scalable tech solutions.
Overseeing integration initiatives from concept through deployment, including sprint execution, technical reviews, and delivery accountability.
Providing senior-level escalation support to ensure the reliability and monitoring of critical integrations.
What You’ll Bring to the Team
A minimum of 7 years of experience as an API Developer or Integration Engineer in enterprise-level environments.
At least 3 years of hands-on production experience with Workato, especially with Workato ONE.
Proven expertise in building and maintaining complex Workato integrations.
About the Senior Staff Software Engineer Role
Crusoe Technologies is hiring a Senior Staff Software Engineer for the CAPE project in San Francisco, CA. This role focuses on building and maintaining scalable software that strengthens our infrastructure and streamlines operations.
What You Will Do
Design and develop software solutions for the CAPE project
Maintain and improve existing systems to support infrastructure growth
Work closely with teams across disciplines to deliver reliable, efficient software
What We Look For
Advanced programming skills
Deep understanding of software architecture
Experience collaborating with cross-functional groups
Full-time|$172K/yr - $209K/yr|On-site|San Francisco, CA - US
At Crusoe, our mission is to propel the availability of energy and intelligence. We are designing the engine that fuels a future where individuals can ambitiously innovate with AI, all while upholding standards of scale, speed, and sustainability.
Join us in the AI revolution powered by sustainable technology at Crusoe. Here, you will spearhead significant innovations, make a lasting impact, and collaborate with a team that is leading the charge in responsible, transformative cloud infrastructure.
About This Role
We are on the lookout for a Hardware Production / Sustaining Engineer to enhance Crusoe’s Hardware Systems Engineering team and address critical skill gaps in debugging, validation, and production support of high-performance computing systems. In this role, you will oversee the entire hardware lifecycle, from prototype initiation to mass production, while driving automation, resolving intricate issues, and ensuring reliability across Crusoe Cloud’s GPU- and CPU-based infrastructure.
You will collaborate closely with cross-functional teams to support, debug, and optimize hardware platforms at scale, with a specific focus on PCIe, InfiniBand, and NVMe/storage, which are recognized as vital areas for enhanced expertise. Your contributions will significantly influence Crusoe’s capability to deploy and manage sustainable, AI-first computing systems that deliver world-class performance and reliability.
What You’ll Be Working On
Lead the entire hardware development and sustaining lifecycle, encompassing feasibility, bring-up, validation, deployment, and ongoing production support.
Create and maintain scripting and automation frameworks for hardware testing, diagnostics, and continuous reliability enhancements.
Guide deep troubleshooting and debugging across:
PCIe (link training, topology, performance issues)
InfiniBand (fabric debugging, throughput, connectivity issues)
NVMe/storage (performance bottlenecks, firmware interactions, failure analysis)
Perform thorough system validation and characterization for GPU, CPU, and high-performance computing platforms.
Assist in end-to-end integration and solution testing to guarantee that Crusoe Cloud products fulfill performance, reliability, and scalability standards.
Work in tandem with mechanical, thermal, firmware, software, and manufacturing teams to resolve system-level challenges.
Full-time|$210K/yr - $265K/yr|On-site|San Francisco, CA - US
At Crusoe, we are committed to accelerating the abundance of energy and intelligence. Our mission is to develop the technology that empowers individuals to innovate boldly with AI, all while ensuring scalability, speed, and sustainability.
Join the AI revolution with sustainable technology at Crusoe. In this role, you will spearhead significant innovations, have a direct impact, and collaborate with a team that is leading the charge in responsible and transformative cloud infrastructure.
About the Position
We are in search of a Senior Infrastructure Security Engineer to fortify the core of Crusoe Cloud, our specialized computing platform designed for AI and high-performance tasks. This role is dedicated to designing and integrating robust security measures into our global infrastructure, allowing clients to develop advanced models in a secure and trusted environment.
You will work at the convergence of infrastructure, security, and reliability, crafting identity, network, and cloud security systems that can grow alongside a rapidly expanding cloud service provider.
Key Responsibilities
Design and implement security controls across the compute, networking, and storage layers of a global cloud platform.
Promote Infrastructure-as-Code (IaC) standards (e.g., Terraform) to establish secure defaults, enforce immutability, and implement drift detection.
Develop automated security guardrails integrated within CI/CD and deployment pipelines.
Collaborate on a centralized Vault-as-a-Platform service for managing secrets, encryption keys, and internal PKI.
Oversee certificate lifecycles (X.509, SSH) to facilitate secure machine-to-machine trust.
Advocate for the adoption of short-lived, Just-In-Time (JIT) access models to minimize standing privileges and enhance auditability.
Secure foundational network components, including global DNS architecture, service discovery, and network authentication systems.
Design and uphold authentication controls for network infrastructure to ensure secure and monitored access.
Collaborate closely with infrastructure, platform, and SRE teams to pinpoint and address security vulnerabilities in foundational systems.
What You Bring
8+ years of hands-on experience in infrastructure engineering, with a strong focus on security.
Proficiency in cloud security principles and practices.
Strong understanding of compliance frameworks and regulations.
Full-time|$136.1K/yr - $165K/yr|On-site|San Francisco, CA - US
At Crusoe, we're on a mission to transform the landscape of energy and intelligence. Our goal is to create an ecosystem where individuals can harness the power of AI to their fullest potential, all while prioritizing sustainability and scalability.
Join us in pioneering the AI revolution with innovative, sustainable technology. Your contributions will drive significant advancements and shape the future of responsible cloud infrastructure.
About the Role
As an Incident Manager, you will play a pivotal role in ensuring service reliability and maintaining customer confidence. Your efforts will directly influence our success by minimizing downtime and efficiently addressing critical incidents. You will oversee high-visibility incidents and customer escalations, guaranteeing quick and effective responses to intricate technical challenges.
In addition to immediate incident resolution, we aim to refine our incident management strategies to enhance customer experiences during crises and implement robust preventive measures thereafter. By utilizing data analytics, you will foster increased resiliency and reliability, ensuring that every incident serves as an opportunity for improvement in both our products and processes.
What You’ll Be Working On
Crisis Management & Data-Driven Resiliency
Lead incident responses for high-impact situations, ensuring minimal disruption to customer operations. You will be the steady force during crises, managing communications and strategies to uphold customer trust during outages or critical failures.
Leverage data analytics to identify incident trends, converting insights into actionable strategies that enhance system resiliency and reliability.
Formulate comprehensive incident response strategies. Emphasize prevention by conducting thorough post-incident reviews to address root causes and eliminate recurrences.
Technical Execution & Customer Support
Diagnose and resolve complex technical issues related to InfiniBand, containerization, and distributed training.
Assist customers in implementing and optimizing their HPC infrastructure for maximum performance and efficiency.
Create and present training materials, including internal sessions, documentation, and knowledge base articles, to empower customers.
About the Role Crusoe is hiring a Staff Software Engineer in San Francisco, CA. This role focuses on building high-performance software applications that support the company’s technology initiatives. What You Will Do Design and develop software applications with an emphasis on performance and reliability Collaborate with engineering teams to deliver solutions that meet business needs Contribute technical expertise to key projects and code reviews Location This position is based in San Francisco, CA.
Join Crusoe as a Staff Software Engineer specializing in Networking. In this critical role, you will design and implement innovative software solutions that enhance our networking infrastructure. You will collaborate with cross-functional teams to optimize performance and reliability, ensuring that our services run efficiently and securely.
Role Overview Crusoe is seeking a Senior Staff Software Engineer focused on CAPE for its San Francisco office. This role centers on designing and building software solutions that support Crusoe's mission to advance technology in the energy sector. What You Will Do Design and implement software systems for CAPE projects Collaborate with cross-functional teams to deliver solutions that align with user needs and company goals Tackle complex technical challenges in support of Crusoe's energy initiatives What We Look For Strong technical background in software engineering Experience solving complex problems and delivering reliable software Ability to work effectively with colleagues across multiple disciplines Location San Francisco, CA - US
Full-time|$208.7K/yr - $253K/yr|On-site|San Francisco, CA - US
At Crusoe, our mission is to propel the availability of energy and intelligence. We are developing the engine that empowers individuals to pursue ambitious projects with AI, all while upholding standards of scale, speed, and sustainability.Join us in revolutionizing the AI landscape with sustainable technology. Here, you will spearhead significant innovations, create real-world impact, and collaborate with a team that is defining the future of responsible cloud infrastructure.Position Overview:As a Staff Software Engineer on the Model LifeCycle team, you will be instrumental in developing a robust managed platform that oversees the entire application development lifecycle, specifically focusing on the integration of Machine Learning models, including Large Language Models (LLMs).Your Responsibilities:Enhance systems for large foundation models through fine-tuning (SFT, PEFT, LoRA, adapters), including multi-node orchestration, checkpointing, failure recovery, and efficient scaling.Design and sustain comprehensive training pipelines for Large Language Models.Contribute to the development of distillation and reinforcement learning pipelines (e.g., preference optimization, policy optimization, reward modeling).Create and uphold the infrastructure for agent execution.Implement features for dataset, model, and experiment management: ensuring versioning, lineage tracking, evaluation, and reproducible fine-tuning at scale.Collaboration and Impact:Collaborate closely with Principal Engineers, product teams, and platform teams to implement core abstractions and APIs.Participate in architectural decisions regarding training runtimes, scheduling, storage, and model lifecycle management.Engage actively with the open-source LLM community.This role offers considerable ownership — you will be pivotal in designing and implementing core systems.
Join our innovative team at Crusoe as a Senior Software Engineer specializing in Networking. In this critical role, you will develop cutting-edge software solutions that enhance our networking capabilities and support our mission of delivering efficient computing resources.Your expertise will contribute to building scalable and reliable network architectures, enabling us to serve our clients better. Collaborate with cross-functional teams and leverage your knowledge in software engineering to push the boundaries of technology.
As a Senior Staff Software Engineer specializing in Model LifeCycle at Crusoe, you will play a vital role in shaping the future of software solutions that optimize and enhance our innovative operations. You will lead complex projects, mentor junior engineers, and collaborate with cross-functional teams to deliver high-impact results.
Join Crusoe as a Senior Software Engineer in our Streaming division, where you'll be at the forefront of innovative streaming technology solutions. You will collaborate with cross-functional teams to design, develop, and implement high-performance streaming applications that enhance user experience.As a vital member of our engineering team, you will leverage your expertise in software development to contribute to cutting-edge projects that push the boundaries of streaming technology.
Join Crusoe as a Senior Software Engineer specializing in Observability, where you will play a pivotal role in enhancing our systems and ensuring robust performance across our platforms. You will collaborate with cross-functional teams to develop innovative solutions that improve the visibility and reliability of our software applications.
Full-time|$208K/yr - $253K/yr|On-site|San Francisco, CA - US
At Crusoe, our mission is to drive the evolution of energy and intelligence. We are developing the technology that fuels a future where individuals can ambitiously harness AI capabilities without compromising on scale, speed, or sustainability.
Join us in revolutionizing AI with sustainable solutions at Crusoe. In this role, you will be at the forefront of innovation, making a significant impact while collaborating with a team that is shaping the future of responsible and transformative cloud infrastructure.
About This Role:
We are looking for a dedicated Hardware Production/Sustaining Engineer to enhance Crusoe's Hardware Systems Engineering team. This position is critical for bridging essential skill gaps in debugging, validation, and production support for high-performance computing systems. You will manage the entire hardware lifecycle, from prototype initiation to large-scale production, focusing on automation, deep troubleshooting, and reliability within Crusoe Cloud’s GPU- and CPU-oriented infrastructure.
Your collaboration with cross-functional teams will be vital in supporting, debugging, and enhancing hardware platforms on a large scale, specifically targeting PCIe, InfiniBand, and NVMe/storage, which have been highlighted as key areas for expanded expertise. Your contributions will directly influence Crusoe’s capability to deploy and maintain sustainable, AI-driven computing systems that deliver exceptional performance and reliability.
Your Responsibilities Will Include:
Leading the complete hardware development and sustaining lifecycle, encompassing feasibility studies, bring-up, validation, deployment, and ongoing production support.
Creating and sustaining automation frameworks and scripts for hardware testing, diagnostics, and continual reliability enhancements.
Executing in-depth troubleshooting and debugging across: PCIe (including link training, topology, and performance issues); InfiniBand (focusing on fabric debugging, throughput, and connectivity challenges); NVMe/storage (addressing performance bottlenecks, firmware interactions, and failure analyses).
Performing extensive system validation and characterization for GPU, CPU, and high-performance computing platforms.
Assisting in end-to-end integration and solution testing to guarantee that Crusoe Cloud products fulfill performance, reliability, and scalability standards.
Collaborating with teams across mechanical, thermal, firmware, software, and manufacturing domains to troubleshoot and enhance system performance.
Full-time|$180K/yr - $220K/yr|On-site|San Francisco, CA - US
At Crusoe, we are on a mission to revolutionize the future by accelerating the abundance of energy and intelligence. We are building the foundational engine that empowers individuals to create bold innovations with AI while ensuring sustainability, speed, and scalability.
Join us at the forefront of the AI revolution with cutting-edge sustainable technology. You will play a pivotal role in driving meaningful innovation, making a significant impact, and collaborating with a team that is leading the way in responsible, transformative cloud infrastructure.
About the Role
As a Senior Staff Cloud Support Engineer, you will serve as a technical expert within Crusoe Cloud and significantly enhance the efforts of our Customer Experience, SRE, Networking, Fleet, and Product teams. Your role transcends basic ticket resolution; you will design reliability frameworks, influence architectural decisions, mentor senior engineers, and safeguard revenue by averting large-scale incidents. With profound expertise in Linux systems, Kubernetes, networking, and AI/ML infrastructure, you will apply your knowledge with a strong focus on customer satisfaction. You will be comfortable navigating uncertainty, leading incident responses, and shaping the global scaling of high-performance AI infrastructure.
Key Responsibilities
Act as the top escalation point for complex P1/P0 incidents.
Lead cross-functional investigations into root causes involving compute, networking (IB/RDMA/RoCE), storage, and orchestration layers.
Collaborate with SRE and Software teams (Storage, Networking, Compute, K8) to devise systemic solutions rather than temporary fixes.
Reliability Architecture
Design and enhance node validation, burn-in processes, performance baselining, and release readiness.
Influence Kubernetes architecture, workload orchestration (Slurm, Terraform), and AI/ML cluster stability.
Minimize MTTR and prevent incident recurrence through structural enhancements.
AI/ML Infrastructure Expertise
Troubleshoot NCCL, IB, GPU driver/firmware issues, and distributed training failures.
Support complex AI workloads (training + inference) through performance tuning and observability enhancements.
Customer-Facing Authority
Act as a senior technical advisor during high-stakes customer incidents.
Join our innovative team at Crusoe as a Staff Software Engineer. In this pivotal role, you will leverage your advanced software engineering skills to design, develop, and optimize cutting-edge solutions that enhance our technology stack. Collaborate with cross-functional teams to drive projects from concept to completion, ensuring high-quality deliverables that meet user needs and business objectives.
Full-time|$175K/yr - $250K/yr|On-site|San Francisco, CA - US
At Crusoe, we are on a mission to drive the proliferation of energy and intelligence in the digital age. We are developing an innovative platform that enables individuals to harness the power of AI for ambitious projects, all while ensuring unparalleled scale, speed, and sustainability.
Join us at the forefront of the AI revolution, where sustainable technology meets transformative cloud infrastructure. At Crusoe, you will be part of a team that is committed to meaningful innovation and making a significant impact.
About the Role:
We are looking for a Senior to Senior Staff level Solutions Engineer to collaborate closely with our key enterprise clients as they deploy AI and machine learning workloads on Crusoe's cutting-edge GPU infrastructure. This role is hands-on and customer-centric, requiring extensive technical knowledge in Kubernetes, MLOps, and cloud infrastructure.
You will lead clients through the entire deployment journey, overseeing the proof of concept (PoC) process, optimizing workloads after the sale, and serving as an essential technical liaison between our clients and engineering teams. Successful candidates will possess a strong passion for AI infrastructure, be proficient in containerized environments, and have the ability to effectively translate workloads across various cloud platforms.
What You'll Be Working On:
Customer Enablement: Spearhead the technical onboarding and deployment of sophisticated AI/ML workloads with strategic enterprise customers, taking ownership of the PoC through to post-sales optimization.
Kubernetes + MLOps Focus: Design and implement ML workloads utilizing Kubernetes-based technologies (e.g., Ray, Kubeflow) while ensuring optimal performance, scalability, and efficiency.
Infrastructure-Centric Thinking: Engage directly with Crusoe infrastructure to deploy and fine-tune AI/ML workloads, guaranteeing performance at both the container and hardware levels.
Cross-Cloud Translation: Assist clients in migrating and adapting workloads across AWS, Azure, and GCP, while clearly articulating the trade-offs between cloud-native and Crusoe-native strategies.
Technical Storytelling: Facilitate workshops, live demonstrations, and solution reviews. Contribute to case studies, solution briefs, and blog articles that showcase real-world customer success stories.
Voice of the Customer: Provide feedback to internal engineering and product teams to continuously enhance Crusoe’s platform based on practical implementation experiences.
What You'll Bring to the Team:
Deep Kubernetes Expertise: 7+ years of experience in building and deploying containerized applications.
Join Crusoe as an Enterprise IT Architect, where you will play a pivotal role in shaping our technology landscape. This position offers the opportunity to design and implement innovative IT architecture solutions that meet our business needs and drive operational excellence.
Join Crusoe as a Senior Data Engineer, where you will play a critical role in enhancing our data infrastructure and analytics capabilities. You will be responsible for designing, developing, and maintaining robust data pipelines to support our cutting-edge applications. As a key member of our engineering team, you will work closely with data scientists and analysts to ensure that data is accessible, accurate, and actionable.
Join our innovative team at Crusoe as a Staff Software Engineer, where you will leverage your expertise in systems engineering to develop cutting-edge software solutions. In this dynamic role, you will collaborate with cross-functional teams to design, implement, and optimize systems that drive our mission forward. Your contributions will be pivotal in enhancing our technology stack and ensuring the seamless operation of our systems.
Full-time|$165K/yr - $200K/yr|On-site|San Francisco, CA - US
At Crusoe, we are on a mission to accelerate the abundance of energy and intelligence, creating an environment where innovation thrives. As we build the infrastructure that empowers ambitious AI-driven projects, we prioritize sustainability without compromising on scale or speed.
Join us in being part of the AI revolution with cutting-edge technology at Crusoe, where you will spearhead impactful innovations and collaborate with a team committed to transforming cloud infrastructure responsibly.
About This Role
We are looking for a Senior API Integration Engineer who will act as a vital technical partner in our enterprise-wide digital transformation efforts. This role is pivotal in driving intelligent automation and scalable system integrations, particularly within our People Tech ecosystem, with a strong focus on Workday HCM.
What You’ll Be Working On
Designing and developing enterprise-grade integrations utilizing the Workato ONE platform to facilitate intelligent workflow automation.
Creating and maintaining robust API integrations across Workday HCM and the wider People Tech landscape, which includes payroll, ATS, LMS, compensation, benefits, performance, and analytics.
Employing AI-driven automation within Workato to enhance efficiency, reliability, and process optimization.
Establishing reusable integration architecture patterns, frameworks, and governance standards that can scale across both automated and human-led workflows.
Collaborating with business and IT stakeholders to gather requirements, lead discovery sessions, assess ROI, and translate complex needs into scalable tech solutions.
Overseeing integration initiatives from concept through deployment, including sprint execution, technical reviews, and delivery accountability.
Providing senior-level escalation support to ensure the reliability and monitoring of critical integrations.
What You’ll Bring to the Team
A minimum of 7 years of experience as an API Developer or Integration Engineer in enterprise-level environments.
At least 3 years of hands-on production experience with Workato, especially with Workato ONE.
Proven expertise in building and maintaining complex Workato integrations.
About the Senior Staff Software Engineer Role
Crusoe Technologies is hiring a Senior Staff Software Engineer for the CAPE project in San Francisco, CA. This role focuses on building and maintaining scalable software that strengthens our infrastructure and streamlines operations.
What You Will Do
Design and develop software solutions for the CAPE project
Maintain and improve existing systems to support infrastructure growth
Work closely with teams across disciplines to deliver reliable, efficient software
What We Look For
Advanced programming skills
Deep understanding of software architecture
Experience collaborating with cross-functional groups
Full-time|$172K/yr - $209K/yr|On-site|San Francisco, CA - US
At Crusoe, our mission is to propel the availability of energy and intelligence. We are designing the engine that fuels a future where individuals can ambitiously innovate with AI, all while upholding standards of scale, speed, and sustainability.
Join us in the AI revolution powered by sustainable technology at Crusoe. Here, you will spearhead significant innovations, make a lasting impact, and collaborate with a team that is leading the charge in responsible, transformative cloud infrastructure.
About This Role:
We are on the lookout for a Hardware Production / Sustaining Engineer to enhance Crusoe’s Hardware Systems Engineering team and address critical skill gaps in debugging, validation, and production support of high-performance computing systems. In this role, you will oversee the entire hardware lifecycle, from prototype initiation to mass production, while driving automation, resolving intricate issues, and ensuring reliability across Crusoe Cloud’s GPU- and CPU-based infrastructure.
You will collaborate closely with cross-functional teams to support, debug, and optimize hardware platforms at scale, with a specific focus on PCIe, InfiniBand, and NVMe/storage, which are recognized as vital areas for enhanced expertise. Your contributions will significantly influence Crusoe’s capability to deploy and manage sustainable, AI-first computing systems that deliver world-class performance and reliability.
What You’ll Be Working On:
Lead the entire hardware development and sustaining lifecycle, encompassing feasibility, bring-up, validation, deployment, and ongoing production support.
Create and maintain scripting and automation frameworks for hardware testing, diagnostics, and continuous reliability enhancements.
Guide deep troubleshooting and debugging across: PCIe (link training, topology, performance issues); InfiniBand (fabric debugging, throughput, connectivity issues); NVMe/storage (performance bottlenecks, firmware interactions, failure analysis).
Perform thorough system validation and characterization for GPU, CPU, and high-performance computing platforms.
Assist in end-to-end integration and solution testing to guarantee that Crusoe Cloud products fulfill performance, reliability, and scalability standards.
Work in tandem with mechanical, thermal, firmware, software, and manufacturing teams to resolve system-level challenges.
Full-time|$210K/yr - $265K/yr|On-site|San Francisco, CA - US
At Crusoe, we are committed to accelerating the abundance of energy and intelligence. Our mission is to develop the technology that empowers individuals to innovate boldly with AI, all while ensuring scalability, speed, and sustainability.
Join the AI revolution with sustainable technology at Crusoe. In this role, you will spearhead significant innovations, have a direct impact, and collaborate with a team that is leading the charge in responsible and transformative cloud infrastructure.
About the Position
We are in search of a Senior Infrastructure Security Engineer to fortify the core of Crusoe Cloud, our specialized computing platform designed for AI and high-performance tasks. This role is dedicated to designing and integrating robust security measures into our global infrastructure, allowing clients to develop advanced models in a secure and trusted environment.
You will work at the convergence of infrastructure, security, and reliability, crafting identity, network, and cloud security systems that can grow alongside a rapidly expanding cloud service provider.
Key Responsibilities
Design and implement security controls across the compute, networking, and storage layers of a global cloud platform.
Promote Infrastructure-as-Code (IaC) standards (e.g., Terraform) to establish secure defaults, enforce immutability, and implement drift detection.
Develop automated security guardrails integrated within CI/CD and deployment pipelines.
Collaborate on a centralized Vault-as-a-Platform service for managing secrets, encryption keys, and internal PKI.
Oversee certificate lifecycles (X.509, SSH) to facilitate secure machine-to-machine trust.
Advocate for the adoption of short-lived, Just-In-Time (JIT) access models to minimize standing privileges and enhance auditability.
Secure foundational network components, including global DNS architecture, service discovery, and network authentication systems.
Design and uphold authentication controls for network infrastructure to ensure secure and monitored access.
Collaborate closely with infrastructure, platform, and SRE teams to pinpoint and address security vulnerabilities in foundational systems.
What You Bring
8+ years of hands-on experience in infrastructure engineering, with a strong focus on security.
Proficiency in cloud security principles and practices.
Strong understanding of compliance frameworks and regulations.
Full-time|$136.1K/yr - $165K/yr|On-site|San Francisco, CA - US
At Crusoe, we're on a mission to transform the landscape of energy and intelligence. Our goal is to create an ecosystem where individuals can harness the power of AI to their fullest potential, all while prioritizing sustainability and scalability.
Join us in pioneering the AI revolution with innovative, sustainable technology. Your contributions will drive significant advancements and shape the future of responsible cloud infrastructure.
About the Role
As an Incident Manager, you will play a pivotal role in ensuring service reliability and maintaining customer confidence. Your efforts will directly influence our success by minimizing downtime and efficiently addressing critical incidents. You will oversee high-visibility incidents and customer escalations, guaranteeing quick and effective responses to intricate technical challenges.
In addition to immediate incident resolution, we aim to refine our incident management strategies to enhance customer experiences during crises and implement robust preventive measures thereafter. By utilizing data analytics, you will foster increased resiliency and reliability, ensuring that every incident serves as an opportunity for improvement in both our products and processes.
What You’ll Be Working On
Crisis Management & Data-Driven Resiliency
Lead incident responses for high-impact situations, ensuring minimal disruption to customer operations. You will be the steady force during crises, managing communications and strategies to uphold customer trust during outages or critical failures.
Leverage data analytics to identify incident trends, converting insights into actionable strategies that enhance system resiliency and reliability.
Formulate comprehensive incident response strategies. Emphasize prevention by conducting thorough post-incident reviews to address root causes and eliminate recurrences.
Technical Execution & Customer Support
Diagnose and resolve complex technical issues related to InfiniBand, containerization, and distributed training.
Assist customers in implementing and optimizing their HPC infrastructure for maximum performance and efficiency.
Create and present training materials, including internal sessions, documentation, and knowledge base articles, to empower customers.
Feb 11, 2026