Senior Software Engineer, Managed AI - AI Model Lifecycle

Crusoe, San Francisco, CA - US
On-site, Full-time, $172.4K/yr - $209K/yr

Experience Level

Senior

Qualifications

Engineering Fundamentals: Bachelor's degree in Computer Science, Engineering, or a related field. 4-5+ years of relevant industry experience. Strong background in software engineering principles, particularly with AI and machine learning technologies.

About the job

At Crusoe, we are on a mission to accelerate the convergence of energy and intelligence. We are building a powerful engine that enables individuals to innovate boldly with AI, all while upholding principles of scalability, speed, and sustainability.

Join us in spearheading the AI revolution through sustainable technology. At Crusoe, you will be at the forefront of meaningful innovation, making a significant impact while collaborating with a team dedicated to shaping the future of responsible, transformative cloud infrastructure.

About the Role:

As a Senior Software Engineer on the Model Lifecycle team, you will play a pivotal role in developing a managed platform that supports the entire application development lifecycle, with an emphasis on harnessing the power of Machine Learning models, particularly Large Language Models (LLMs).

Your Responsibilities:

  • Design and maintain systems for fine-tuning large foundational models (SFT, PEFT, LoRA, adapters), ensuring multi-node orchestration, checkpointing, failure recovery, and cost-effective scaling.

  • Create and manage end-to-end training pipelines for Large Language Models.

  • Implement components for distillation and reinforcement learning pipelines, focusing on preference optimization, policy optimization, and reward modeling.

  • Develop and sustain the core agent execution infrastructure.

  • Implement features for dataset, model, and experiment management, emphasizing versioning, lineage, evaluation, and reproducible fine-tuning.

Collaboration and Impact:

  • Collaborate closely with Senior Engineers, Principal Engineers, and various product and platform teams to implement systems abstractions and APIs.

  • Engage in technical discussions surrounding training runtimes, scheduling, storage, and overall model lifecycle management.

  • Bring 4-5+ years of industry experience, demonstrating a strong track record of successfully leading a diverse portfolio of initiatives.

  • Participate in and contribute to the open-source LLM ecosystem.

  • This position involves taking significant ownership of core system components.

Your Qualifications:

  • Engineering Fundamentals:

    • Bachelor's degree in Computer Science, Engineering, or a related discipline.

    • Proven experience in software engineering with a focus on AI models and machine learning.

About Crusoe

Crusoe is dedicated to accelerating the abundance of energy and intelligence. We are shaping the future of technology by creating innovative solutions that empower individuals to unleash their creativity with AI while maintaining a focus on sustainability and efficiency.
