ABOUT BASETEN
At Baseten, we are at the forefront of enabling transformative AI solutions for some of the world's leading companies, including Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma, and Writer. Our platform combines cutting-edge AI research, adaptable infrastructure, and developer-friendly tools to help these companies bring advanced models into production. We recently closed a $300M Series E funding round, backed by investors including BOND, IVP, Spark Capital, Greylock, and Conviction, to support our rapid growth. We invite you to join our team and help shape how AI products are deployed.
THE ROLE
As a Senior Software Engineer specializing in Model Training at Baseten, you will play a pivotal role in building the infrastructure for large-scale training and fine-tuning of foundation models. You will design and implement distributed training systems, optimize GPU utilization, and build scalable pipelines that enable Baseten and our customers to adapt models efficiently and reliably. This is a deeply technical, hands-on role: you will own critical components of our training stack, work with product and infrastructure teams to understand customer needs, and drive improvements in scalable training infrastructure.
EXAMPLE WORK
Training open-source models that surpass GPT-5 capabilities for a leading digital insurer
Exploring specialized, continuously learning models as the future of AI
RESPONSIBILITIES
Design, build, and maintain distributed training infrastructure for large foundation models
Develop scalable pipelines for fine-tuning and training across diverse GPU/accelerator clusters
Improve training performance by optimizing both algorithms and infrastructure
Collaborate closely with cross-functional teams to align technical solutions with business objectives
Keep up with advances in machine learning and AI to continually improve our training processes

