
Senior Hardware Systems Engineer at Crusoe | San Francisco, CA

Crusoe, San Francisco, CA - US
On-site Full-time $172K/yr - $209K/yr


Experience Level

Senior

Qualifications

Candidates should possess a robust background in hardware systems engineering, with proficiency in debugging and validation of high-performance computing systems. Experience with PCIe, InfiniBand, and NVMe/storage technologies is highly desirable. A strong focus on automation and problem resolution is essential.

About the job

At Crusoe, our mission is to propel the availability of energy and intelligence. We are designing the engine that fuels a future where individuals can ambitiously innovate with AI, all while upholding standards of scale, speed, and sustainability.

Join us in the AI revolution powered by sustainable technology at Crusoe. Here, you will spearhead significant innovations, make a lasting impact, and collaborate with a team that is leading the charge in responsible, transformative cloud infrastructure.

About This Role:

We are looking for a Hardware Production / Sustaining Engineer to strengthen Crusoe’s Hardware Systems Engineering team and close critical skill gaps in debugging, validation, and production support of high-performance computing systems. In this role, you will oversee the entire hardware lifecycle, from prototype to mass production, while driving automation, resolving complex issues, and ensuring reliability across Crusoe Cloud’s GPU- and CPU-based infrastructure.

You will collaborate closely with cross-functional teams to support, debug, and optimize hardware platforms at scale, with a specific focus on PCIe, InfiniBand, and NVMe/storage, the areas where the team needs deeper expertise. Your contributions will directly shape Crusoe’s ability to deploy and manage sustainable, AI-first computing systems that deliver world-class performance and reliability.

What You’ll Be Working On:

  • Lead the entire hardware development and sustaining lifecycle, encompassing feasibility, bring-up, validation, deployment, and ongoing production support.

  • Create and maintain scripting and automation frameworks for hardware testing, diagnostics, and continuous reliability enhancements.

  • Guide deep troubleshooting and debugging across:

    • PCIe (link training, topology, performance issues)

    • InfiniBand (fabric debugging, throughput, connectivity issues)

    • NVMe/storage (performance bottlenecks, firmware interactions, failure analysis)

  • Perform thorough system validation and characterization for GPU, CPU, and high-performance computing platforms.

  • Assist in end-to-end integration and solution testing to guarantee that Crusoe Cloud products fulfill performance, reliability, and scalability standards.

  • Work in tandem with mechanical, thermal, firmware, software, and manufacturing teams to resolve system-level challenges.

About Crusoe

Crusoe is dedicated to accelerating the abundance of energy and intelligence through innovative technology. Our commitment to sustainability and responsible AI solutions positions us at the forefront of transformative cloud infrastructure.
