Incident Manager at Crusoe | San Francisco, CA

Crusoe | San Francisco, CA - US
On-site Full-time $136.1K/yr - $165K/yr

Experience Level

Manager

Qualifications

We are looking for individuals with a strong background in incident management, technical troubleshooting, and customer support. You should possess excellent communication skills and a proactive approach to problem-solving. Experience with data analytics and a solid understanding of cloud infrastructure will be advantageous.

About the job

At Crusoe, we're on a mission to transform the landscape of energy and intelligence. Our goal is to create an ecosystem where individuals can harness the power of AI to their fullest potential, all while prioritizing sustainability and scalability.

Join us in pioneering the AI revolution with innovative, sustainable technology. Your contributions will drive significant advancements and shape the future of responsible cloud infrastructure.

About the Role

As an Incident Manager, you will play a pivotal role in ensuring service reliability and maintaining customer confidence. Your efforts will directly influence our success by minimizing downtime and efficiently addressing critical incidents. You will oversee high-visibility incidents and customer escalations, guaranteeing quick and effective responses to intricate technical challenges.

In addition to immediate incident resolution, we aim to refine our incident management strategies to enhance customer experiences during crises and implement robust preventive measures thereafter. By utilizing data analytics, you will foster increased resiliency and reliability, ensuring that every incident serves as an opportunity for improvement in both our products and processes.

What You’ll Be Working On

Crisis Management & Data-Driven Resiliency

  • Lead incident responses for high-impact situations, ensuring minimal disruption to customer operations. You will be the steady force during crises, managing communications and strategies to uphold customer trust during outages or critical failures.

  • Leverage data analytics to identify incident trends, converting insights into actionable strategies that enhance system resiliency and reliability.

  • Formulate comprehensive incident response strategies. Emphasize prevention by conducting thorough post-incident reviews to address root causes and eliminate recurrences.

Technical Execution & Customer Support

  • Diagnose and resolve complex technical issues related to InfiniBand, containerization, and distributed training.

  • Assist customers in implementing and optimizing their HPC infrastructure for maximum performance and efficiency.

  • Create and present training materials, including internal sessions, documentation, and knowledge base articles, to empower customers.

About Crusoe

At Crusoe, we are dedicated to revolutionizing the use of energy and intelligence through innovative and sustainable solutions. Our cutting-edge technology empowers individuals to pursue ambitious projects with AI while ensuring scalability and environmental responsibility.
