Staff Hardware Systems Engineer at Crusoe | San Francisco, CA

Crusoe | San Francisco, CA - US
On-site | Full-time | $208K/yr - $253K/yr

Qualifications

  • Bachelor's Degree in Electrical Engineering, Computer Engineering, or a related field.

  • Proven experience with high-performance computing systems and hardware lifecycle management.

  • Strong scripting and automation skills for hardware testing and diagnostics.

  • In-depth knowledge of PCIe, InfiniBand, and NVMe/storage technologies.

  • Excellent problem-solving skills and a proactive approach to troubleshooting.

About the job

At Crusoe, our mission is to drive the evolution of energy and intelligence. We are developing the technology that fuels a future where individuals can ambitiously harness AI capabilities without compromising on scale, speed, or sustainability.

Join us in revolutionizing AI with sustainable solutions at Crusoe. In this role, you will be at the forefront of innovation, making a significant impact while collaborating with a team that is shaping the future of responsible and transformative cloud infrastructure.

About This Role:

We are looking for a dedicated Hardware Production/Sustaining Engineer to enhance Crusoe's Hardware Systems Engineering team. This position is critical for bridging essential skill gaps in debugging, validation, and production support for high-performance computing systems. You will manage the entire hardware lifecycle—from prototype initiation to large-scale production—focusing on automation, deep troubleshooting, and reliability within Crusoe Cloud’s GPU- and CPU-oriented infrastructure.

Your collaboration with cross-functional teams will be vital in supporting, debugging, and enhancing hardware platforms on a large scale, specifically targeting PCIe, InfiniBand, and NVMe/storage, which have been highlighted as key areas for expanded expertise. Your contributions will directly influence Crusoe’s capability to deploy and maintain sustainable, AI-driven computing systems that deliver exceptional performance and reliability.

Your Responsibilities Will Include:

  • Leading the complete hardware development and sustaining lifecycle, encompassing feasibility studies, bring-up, validation, deployment, and ongoing production support.

  • Creating and sustaining automation frameworks and scripts for hardware testing, diagnostics, and continual reliability enhancements.

  • Executing in-depth troubleshooting and debugging across:

    • PCIe (including link training, topology, and performance issues)

    • InfiniBand (focusing on fabric debugging, throughput, and connectivity challenges)

    • NVMe/storage (addressing performance bottlenecks, firmware interactions, and failure analyses)

  • Performing extensive system validation and characterization for GPU, CPU, and high-performance computing platforms.

  • Assisting in end-to-end integration and solution testing to guarantee that Crusoe Cloud products fulfill performance, reliability, and scalability standards.

  • Collaborating with teams across mechanical, thermal, firmware, software, and manufacturing domains to troubleshoot and enhance system performance.

About Crusoe

Crusoe is at the forefront of the AI revolution, dedicated to creating sustainable technology solutions. Our focus is on developing robust cloud infrastructure that empowers innovation while maintaining a commitment to environmental responsibility and efficiency.
