
Technical Staff Member - Machine Learning Systems & Inference

Gimlet Labs · San Francisco
On-site · Full-time




Experience Level

Entry Level

Responsibilities

- Design and optimize comprehensive inference pipelines, from request ingestion through execution to response delivery.
- Develop and enhance inference runtimes that manage latency, throughput, and concurrency under realistic load conditions.
- Analyze batching, queuing, and scheduling trade-offs, including their effects on tail latency and fairness.
- Oversee KV cache allocation, placement, reuse, and eviction strategies across models and requests.
- Optimize prefill and decode paths, focusing on attention mechanisms and memory management.
- Profile and troubleshoot inference performance issues across model, runtime, and system boundaries.
- Collaborate closely with teams specializing in compilers, kernels, networking, and distributed systems to achieve end-to-end performance gains.

Qualifications

- Proficiency in machine learning frameworks and inference optimization techniques.
- Experience with performance profiling and debugging tools.
- Strong understanding of system architecture and hardware-software integration.
- Ability to work collaboratively in a fast-paced, innovative environment.
- Excellent problem-solving skills and attention to detail.
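To give a flavor of the KV cache management work described above, here is a minimal illustrative sketch of an LRU-style eviction policy over a fixed pool of cache blocks. This is not Gimlet's system: the class name, method names, and the one-count-per-request block model are all hypothetical simplifications (a production runtime would typically track paged blocks per token and consider reuse across requests).

```python
from collections import OrderedDict


class KVCachePool:
    """Toy model of per-request KV cache blocks with LRU eviction.

    Hypothetical illustration only; names and structure are not any
    real inference runtime's API.
    """

    def __init__(self, capacity_blocks: int):
        self.capacity = capacity_blocks
        # Insertion order doubles as recency order: oldest entry first.
        self.blocks: OrderedDict[str, int] = OrderedDict()

    def allocate(self, request_id: str, num_blocks: int) -> list[str]:
        """Reserve blocks for a request, evicting least-recently-used
        entries until the pool has room. Returns evicted request ids."""
        evicted = []
        while self._used() + num_blocks > self.capacity and self.blocks:
            victim, _ = self.blocks.popitem(last=False)  # drop oldest
            evicted.append(victim)
        self.blocks[request_id] = num_blocks
        return evicted

    def touch(self, request_id: str) -> None:
        """Mark a request's cache as recently used (e.g. on a decode step)."""
        self.blocks.move_to_end(request_id)

    def _used(self) -> int:
        return sum(self.blocks.values())
```

For example, with an 8-block pool holding requests "a" (4 blocks) and "b" (3 blocks), touching "a" and then allocating 4 blocks for "c" evicts "b" rather than "a", since "b" is now the least recently used. The trade-offs the role mentions show up even in this toy: eviction frees memory for new requests at the cost of re-running prefill for the evicted ones.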

About the job

At Gimlet Labs, we are pioneering the development of the first heterogeneous neocloud designed specifically for AI workloads. As the demand for AI systems surges, traditional homogeneous infrastructures face critical limits in power, capacity, and cost. Our innovative platform effectively decouples AI workloads from their hardware foundations, intelligently partitioning tasks and orchestrating them to the most suitable hardware for optimal performance and efficiency. This strategy fosters heterogeneous systems that span multiple vendors and generations, including cutting-edge accelerators, enabling significant enhancements in performance and cost-effectiveness at scale.

In addition to this foundational work, Gimlet is establishing a robust neocloud for agentic workloads. Our clients benefit from deploying and managing their workloads via stable, production-ready APIs, without the need to navigate hardware selection or performance optimization intricacies.

We collaborate with foundation labs, hyperscalers, and AI-native companies to drive real production workloads capable of scaling to gigawatt-class AI datacenters.

We are currently seeking a Member of Technical Staff specializing in ML systems and inference. In this role, you will design and build inference systems that run complete models in real production environments, working at the intersection of model architecture and system performance to keep inference fast, predictable, and scalable.

This position is perfect for engineers with a deep understanding of modern model execution and a passion for optimizing latency, throughput, and memory utilization across the entire inference lifecycle.

About Gimlet Labs

Gimlet Labs is at the forefront of AI infrastructure innovation, creating solutions that redefine how AI workloads are managed and executed. Our cutting-edge technologies empower businesses to leverage AI effectively while addressing scalability and efficiency challenges, ensuring they remain competitive in a rapidly evolving landscape.
