
Cloud Orchestration Engineer at Inferact | San Francisco

Remote | Full-time | $200K/yr - $400K/yr



Qualifications

Minimum Qualifications:

- Bachelor's degree or equivalent experience in computer science, engineering, or a related field.
- Extensive experience with Kubernetes and large-scale container orchestration.
- Proficiency in designing and implementing custom Kubernetes operators.
- Strong programming skills in Python, Rust, or Go, along with experience in infrastructure-as-code tools such as Terraform and Helm.
- Experience managing GPU clusters and troubleshooting hardware issues.
- Ability to work across various cloud platforms (AWS, GCP, Azure) as well as on-premise infrastructure.

Preferred Qualifications:

- Familiarity with ML-specific orchestration tools such as Ray or Slurm.
- Understanding of GPU scheduling, multi-tenancy, and resource optimization.
- Knowledge of vLLM deployment patterns and configurations.
- Proven track record of improving operational reliability for machine learning systems.

Bonus Points:

- Experience deploying inference systems on large-scale GPU clusters (1,000+ nodes).

About the job

At Inferact, we are on a mission to establish vLLM as the premier AI inference engine, aiming to propel AI advancements by making inference processes more efficient and cost-effective. Our company is founded by the original creators and core maintainers of vLLM, placing us at a unique intersection of models and hardware, a position we have cultivated over many years.

About the Role

We are seeking a talented Cloud Orchestration Engineer to build and maintain the operational framework that keeps vLLM running reliably at massive scale. In this role, you will design systems for cluster management, deployment automation, and production monitoring, enabling teams around the world to deploy AI models effortlessly. Your work will ensure that vLLM deployments are observable, debuggable, and recoverable, turning operational complexity into reliable infrastructure that runs smoothly.

About Inferact

Inferact is committed to transforming AI inference through vLLM, making it accessible and efficient for everyone. Our focus is to innovate at the confluence of AI models and hardware, fostering an environment where technology meets practicality to drive the future of AI.
