Technical Staff Member - Distributed Systems

Gimlet Labs, San Francisco
On-site, Full-time

Responsibilities

Design and develop distributed systems that orchestrate and operate AI workloads at large scale. Create scheduling, routing, and resource management components that coordinate execution across multiple nodes and services. Develop production-grade APIs and control planes for workload deployment and management. Implement strategies for reliability, availability, and fault tolerance within distributed environments. Instrument systems for enhanced observability and debugging capabilities at scale. Collaborate closely with compilers, runtimes, and hardware to ensure comprehensive system correctness and optimized performance.

Qualifications

Strong foundation in software engineering principles. Experience in building or managing distributed systems within production settings.

About the job

At Gimlet Labs, we are pioneering the first heterogeneous neocloud tailored for AI workloads. As AI technology evolves, the industry confronts critical limitations in power, capacity, and cost linked to traditional homogeneous, vertically integrated infrastructure. Gimlet addresses these challenges by decoupling AI workloads from the underlying hardware, intelligently partitioning them into components and mapping each component to the hardware that best meets its performance and efficiency needs. This approach supports heterogeneous systems across diverse vendors and generations of hardware, including the latest emerging accelerators, resulting in significant improvements in performance and cost efficiency at scale.

Building upon this platform, Gimlet is developing a production-grade neocloud for agentic workloads. Our customers can deploy and manage their workloads through stable, production-ready APIs without the complexities of hardware selection, placement, or low-level performance optimization.

Gimlet collaborates with foundational labs, hyperscalers, and AI-native companies to enable real production workloads designed to scale to gigawatt-class AI datacenters.

We are currently in search of a Technical Staff Member specializing in distributed systems. In this role, you will be instrumental in developing the core platform responsible for scheduling, routing, and managing AI workloads reliably at production scale. You will engage with systems that coordinate execution across thousands of nodes, provide stable production APIs, and guarantee predictable workload performance under real-world conditions of load and failure.

This position is ideal for engineers passionate about building foundational infrastructure, understanding systems end to end, and operating at scale.

About Gimlet Labs

Gimlet Labs is at the forefront of AI infrastructure innovation, creating a unique neocloud platform that redefines how AI workloads are managed. Our mission is to empower organizations to harness the full potential of AI by providing scalable, efficient, and reliable solutions that transcend traditional limitations.
