About the job
At Crusoe, we are dedicated to accelerating the abundance of energy and intelligence. Our innovative solutions are designed to empower a world where ambitious AI creations thrive, seamlessly merging scale, speed, and sustainability.
Join us in the AI revolution by leveraging sustainable technology at Crusoe. Here, you will spearhead significant innovations, make a real difference, and collaborate with a team that is leading the charge in responsible and transformative cloud infrastructure.
About This Role:
We are seeking a dynamic Engineering Manager to lead our Managed Services and Production Engineering team. In this role, you will split your time between hands-on reliability work and managing a team of Site Reliability Engineers (SREs) focused on Crusoe's Managed AI and Managed Services offerings. You will own the production health of services delivered to enterprise customers, including Managed Kubernetes, Managed Inference, and AutoClusters, and you will work closely with embedded engineering teams to raise the bar on operational excellence, automation, and customer satisfaction.
What You'll Be Working On:
Leading and developing a team of SREs across Crusoe's Managed AI and Managed Services organizations, setting technical direction and fostering a culture of ownership and continuous improvement.
Contributing hands-on as an individual contributor: reviewing code, building tools, and driving automation to reduce manual work and improve the reliability and scalability of managed services.
Owning SLA/SLO performance, incident response, and on-call health for managed services; running blameless post-mortems and driving systemic remediation.
Collaborating with product and platform engineering teams to influence infrastructure design, observability strategies, and operational readiness for both new and existing managed services.
Defining and tracking reliability, performance, and operational maturity metrics across the team, and turning that data into prioritized roadmap investments.
Acting as a technical escalation point for high-severity production incidents impacting enterprise customers, and coordinating with Cloud Support and Customer Success teams to ensure effective resolution and communication.
What You'll Bring to the Team:
5+ years of experience in software or infrastructure engineering, including at least 1–2 years in an engineering management or technical lead role.
Proven track record in leading teams and driving technical initiatives.

