Qualifications
Required Qualifications
- Bachelor's degree or equivalent experience in computer science, engineering, or a related field.
- In-depth understanding of transformer architectures and their derivatives.
- Proficient programming skills in Python, with a strong background in PyTorch internals.
- Experience with LLM inference systems (e.g., vLLM, TensorRT-LLM, SGLang, TGI).
- Ability to interpret and implement model architectures and inference techniques as presented in academic papers.
- Proven capability to produce high-performance, maintainable code and troubleshoot complex machine learning codebases.

Preferred Qualifications
- Comprehensive knowledge of KV-cache memory management, prefix caching, and hybrid model serving.
- Familiarity with reinforcement learning frameworks and algorithms for large language models.
- Experience in multimodal inference across various media types (audio, image, video, text).
- Previous contributions to open-source machine learning or systems infrastructure projects.

Additionally, bonus points if you have:
- Successfully implemented core features in vLLM or other inference engine projects.
- Contributed to vLLM integrations (e.g., verl, OpenRLHF, Unsloth, LlamaFactory).
- Authored widely shared technical blogs or side projects focusing on vLLM or LLM inference.
About the job
At Inferact, we are on a mission to establish vLLM as the premier AI inference engine, revolutionizing AI progress by making inference both more accessible and efficient. Our founding team consists of the original creators and key maintainers of vLLM, positioning us uniquely at the nexus of cutting-edge models and advanced hardware.
Role Overview
We are seeking a passionate inference runtime engineer eager to explore and expand the frontiers of LLM and diffusion model serving. As models evolve and grow in complexity with new architectures like mixture-of-experts and multimodal designs, the demand for innovative solutions in our inference engine intensifies. This role places you at the heart of vLLM, where you will enhance model execution across a variety of hardware platforms and architectures. Your contributions will have a direct influence on the future of AI inference.
About Inferact
Inferact is dedicated to advancing the field of artificial intelligence through innovative solutions in inference technology. Our team, composed of the original architects of vLLM, is committed to shaping the future of AI by creating tools that make inference faster and more cost-effective.