About the job
At Crusoe, we are on a mission to accelerate the convergence of energy and intelligence. We are building a powerful engine that enables individuals to innovate boldly with AI, all while upholding principles of scalability, speed, and sustainability.
Join us in spearheading the AI revolution through sustainable technology. At Crusoe, you will be at the forefront of meaningful innovation, making a significant impact while collaborating with a team dedicated to shaping the future of responsible, transformative cloud infrastructure.
About the Role:
As a Senior Software Engineer on the Model Lifecycle team, you will play a pivotal role in developing a managed platform that supports the entire application development lifecycle, with an emphasis on harnessing the power of Machine Learning models, particularly Large Language Models (LLMs).
Your Responsibilities:
Design and maintain systems for fine-tuning large foundation models (SFT, PEFT, LoRA, adapters), with multi-node orchestration, checkpointing, failure recovery, and cost-effective scaling.
Create and manage end-to-end training pipelines for Large Language Models.
Implement components for distillation and reinforcement learning pipelines, focusing on preference optimization, policy optimization, and reward modeling.
Develop and sustain the core agent execution infrastructure.
Implement features for dataset, model, and experiment management, emphasizing versioning, lineage, evaluation, and reproducible fine-tuning.
Collaboration and Impact:
Collaborate closely with Senior Engineers, Principal Engineers, and various product and platform teams to implement systems abstractions and APIs.
Engage in technical discussions surrounding training runtimes, scheduling, storage, and overall model lifecycle management.
Bring at least 4-5 years of industry experience, with a strong track record of leading a diverse portfolio of initiatives.
Participate in and contribute to the open-source LLM ecosystem.
This position involves taking significant ownership of core system components.
Your Qualifications:
Engineering Fundamentals:
Bachelor's degree in Computer Science, Engineering, or a related discipline.
Proven experience in software engineering with a focus on AI models and machine learning.

