High-Performance Computing (HPC) Hardware Engineer

sfcompute, San Francisco, CA
On-site, Full-time

Qualifications

A minimum of 5 years of experience in operating, supporting, and scaling at least one HPC or GPU compute cluster in a production environment (preferably with more than 1,000 GPUs). Strong understanding of server hardware fundamentals and architecture.

About the job

At sfcompute, we are on a mission to revolutionize the infrastructure landscape by minimizing the risks associated with the largest build-outs in history.

When financing GPU clusters and the data centers that support them, having a contract in place—what we call an "offtake"—is crucial. This ensures that customers have signed on to lease the cluster even before it’s constructed.

The financing process for GPU clusters carries inherent risks due to thin margins and large volumes. Lenders often hesitate to take on the risk that developers may default on their loans, while developers are wary of being unable to sell their clusters. This dynamic leads to the necessity of transferring risk to customers via fixed-price, long-term contracts.

If customer risk isn't effectively mitigated, a market bubble can form. Unlike traditional SaaS companies, application-layer companies sign multi-year contracts for compute and inference while selling their own customers monthly subscriptions. A miscalculation in purchasing can spell disaster: a small change in revenue growth can be the difference between profit and bankruptcy. Imagine a world where companies could exit their contracts by selling them back to the market.

As AI technology scales, compute power will increasingly be available only to those who can manage the associated risks. A small startup in a San Francisco Victorian house cannot feasibly commit to a 5-year, take-or-pay contract for a $100 million supercomputer, but it might be able to purchase a month of liquidity that someone else has sold back.

That’s the market we’re building: a liquid marketplace for GPU offtake.

About the Role

As part of our infrastructure team, you will help design and deploy some of the most powerful GPU clusters in existence; even today's smaller clusters would have ranked in the TOP500 five years ago. Your responsibilities will include participating in on-call rotations, deploying new environments, troubleshooting issues, and embracing automation to facilitate large-scale deployments. As a member of a small but dynamic team, you'll have the opportunity to significantly influence our company culture, mentor junior engineers, and engage directly with our customers.

About sfcompute

sfcompute is pioneering a transformative approach to infrastructure development, focusing on risk mitigation in the financing of GPU clusters and data centers. Our vision is to create accessible solutions for a diverse range of customers, enabling them to harness the power of high-performance computing without the associated financial burdens.
