Infrastructure Engineer, Performance and Scale

Inferact · San Francisco
Remote · Full-time · $200K/yr - $400K/yr

Qualifications

Minimum Qualifications:
- Bachelor's degree or equivalent experience in Computer Science, Engineering, or a related field.
- Proficiency in systems programming languages such as Rust, Go, or C++.
- Demonstrated experience designing and building high-performance distributed systems at scale.
- Solid understanding of network protocols and high-performance I/O.
- Strong problem-solving skills with the ability to debug complex distributed systems issues.

Preferred Qualifications:
- Experience with ML serving infrastructure and disaggregated inference architectures.
- Familiarity with GPU programming models and memory hierarchies.
- Knowledge of GPU interconnect technologies (NVLink, InfiniBand, RoCE) and their performance characteristics.
- Proven track record of enhancing system reliability and performance at scale.

Bonus Points:
- Prior experience supporting large-scale model training or inference environments.

About the job

At Inferact, we are dedicated to establishing vLLM as the premier AI inference engine, advancing AI by making inference both cost-effective and fast. Founded by the original creators and key maintainers of vLLM, we occupy a unique position at the intersection of models and hardware, one that has taken years to build.

Role Overview

We are seeking a talented Infrastructure Engineer to build the distributed systems that make inference possible at global scale. In this role, you will design and implement the core layers that allow vLLM to serve models across thousands of accelerators with minimal latency and maximum reliability. Our vision is to make deploying cutting-edge models at scale as simple as launching a serverless database; the complexity behind that simplicity lives in the infrastructure you will build.

About Inferact

Inferact is on a mission to revolutionize AI inference with vLLM, accelerating AI development by driving down the cost and latency of serving models. Our team includes the original creators and maintainers of vLLM, and we pride ourselves on being at the forefront of innovation in AI and hardware integration.
