
Software Engineer - Inference Platform at Fluidstack | San Francisco

Fluidstack · San Francisco, CA
On-site · Full-time · $165K/yr - $500K/yr


Experience Level

Entry Level


Key Responsibilities

- Manage end-to-end inference deployments, covering initial configuration, performance optimization, and ongoing production maintenance.
- Improve throughput, time-to-first-token (TTFT), and cost-per-token across model families and workload patterns.
- Develop and maintain KV cache and scheduling systems to optimize resource utilization across concurrent requests.
- Design and implement scalable prefill/decode pipelines and the necessary Kubernetes orchestration.
- Analyze and troubleshoot performance bottlenecks across compute, memory, and communication layers; ensure comprehensive observability of deployments.
- Collaborate with clients to translate their model architectures and latency requirements into effective deployment configurations and platform enhancements.
- Contribute to the strategic direction and architectural planning of the inference platform.
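The metrics named above have standard definitions, and candidates may find it useful to see them made concrete. Below is a minimal sketch of how TTFT, decode throughput, and cost-per-token can be computed from per-request timestamps; the class and function names are illustrative, and the single-GPU cost simplification is an assumption for clarity, not a description of Fluidstack's actual tooling (production serving batches many requests concurrently):

```python
from dataclasses import dataclass

@dataclass
class RequestTrace:
    """Timestamps (seconds) recorded for one inference request."""
    submitted_at: float    # request accepted by the server
    first_token_at: float  # first output token emitted
    finished_at: float     # last output token emitted
    output_tokens: int     # number of tokens generated

def ttft(trace: RequestTrace) -> float:
    """Time-to-first-token: latency before any output appears
    (dominated by queueing plus the prefill pass)."""
    return trace.first_token_at - trace.submitted_at

def decode_throughput(trace: RequestTrace) -> float:
    """Tokens per second during the decode phase (after the first token)."""
    decode_time = trace.finished_at - trace.first_token_at
    if decode_time <= 0:
        return float("inf")
    return (trace.output_tokens - 1) / decode_time

def cost_per_token(gpu_hour_rate_usd: float, traces: list[RequestTrace]) -> float:
    """Rough cost-per-token given a GPU-hour price and measured wall time.
    Assumes one GPU serving these requests back to back; real deployments
    amortize GPU time over concurrent batched requests."""
    total_tokens = sum(t.output_tokens for t in traces)
    wall_seconds = sum(t.finished_at - t.submitted_at for t in traces)
    return gpu_hour_rate_usd * (wall_seconds / 3600) / total_tokens
```

For example, a request that waits 0.5 s for its first token and then streams 40 more tokens over 2.0 s has a TTFT of 0.5 s and a decode throughput of 20 tokens/s.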

About the job

Join the Fluidstack Team

At Fluidstack, we’re pioneering the infrastructure for advanced intelligence. We collaborate with leading AI laboratories, governmental entities, and major corporations—including Mistral, Poolside, and Meta—to deliver computing solutions at unprecedented speeds.

Our mission is to transform the vision of Artificial General Intelligence (AGI) into a reality. Driven by our purpose, our dedicated team is committed to building state-of-the-art infrastructure that prioritizes our customers' success. If you share our passion for excellence and are eager to contribute to the future of intelligence, we invite you to be part of our journey.

Role Overview

The Inference Platform team at Fluidstack is at the forefront of addressing the cost and latency challenges of frontier AI. You will play a crucial role in managing the serving layer that connects our global accelerator supply with our clients' production workloads: a layer spanning LLM serving frameworks, KV cache infrastructure, and Kubernetes orchestration across multiple data centers.

This hands-on individual contributor role combines elements of distributed systems, model optimization, and serving infrastructure. You will oversee the entire lifecycle of inference deployments for leading AI labs, striving for enhancements in throughput, cost-efficiency, and response times, while also influencing the architectural decisions that guide Fluidstack’s deployment strategies.

About Fluidstack

Fluidstack is a cutting-edge technology company dedicated to building the infrastructure for the next generation of artificial intelligence. By partnering with top-tier AI labs and enterprises, we are committed to delivering exceptional computing capabilities that drive innovation and accelerate the development of AGI.
