About the Role
Join the Cluster Orchestration team at CoreWeave, where you will help improve our orchestration platform, including SUNK (Slurm on Kubernetes). This is an opportunity to shape a vital component of the AI cloud ecosystem and keep operations seamless, reliable, and efficient across large GPU clusters. By building systems that remove infrastructure constraints and introduce new orchestration capabilities, you will help our clients accelerate their AI work.
Your Responsibilities
As a Staff Engineer, you will lead the technical vision for CoreWeave’s orchestration platform. You will set architectural guidelines, oversee critical orchestration components and managed services, and lead cross-organizational initiatives in scheduling, quota management, and scaling across large clusters. You will also mentor senior engineers, establish best practices for reliability and observability, and ensure that CoreWeave’s orchestration layer evolves to meet the requirements of next-generation AI workloads.
Who You Are
- 8+ years of software engineering experience.
- Demonstrated success in designing and managing large-scale distributed systems in production environments.
- In-depth knowledge of Slurm/Kubernetes internals and cloud-native development.
- Advanced skills in Go and distributed systems architecture.
- Proven ability to mentor teams and promote best practices across the organization.

