Qualifications
You will:
Design, profile, and optimize our training and inference framework.
Collaborate with ML and research teams to expedite their research and development efforts, enabling them to create the next generation of models and data curation systems.
Research and integrate cutting-edge technologies to enhance our ML systems.

Ideally, you’d have:
A strong passion for system optimization.
Experience in multi-node LLM training and inference.
Background in developing large-scale distributed ML systems.
Familiarity with post-training methods such as RLHF/RLVR and algorithms including PPO/GRPO.
Robust software engineering skills with proficiency in frameworks and tools like CUDA, PyTorch, transformers, and flash attention.
Excellent written and verbal communication skills for effective collaboration in a cross-functional team environment.

Nice to haves:
Proven expertise in post-training methods and/or next-generation applications for large language models, including instruction tuning, RLHF, tool use, reasoning, agents, and multimodal.
About the job
Join Scale's innovative Large Language Model (LLM) post-training platform team, where you will contribute to the development of our internal distributed framework designed specifically for LLM training. This sophisticated platform empowers Machine Learning Engineers (MLEs), researchers, data scientists, and operators to perform rapid and automated training and evaluation of LLMs. Additionally, it underpins the training framework for our data quality evaluation pipeline.
Scale is at the forefront of the Artificial Intelligence sector, acting as a vital provider of training and evaluation data, as well as comprehensive solutions for the entire machine learning lifecycle. In this role, you will collaborate closely with Scale’s ML teams and researchers to construct the foundational platform that supports all our ML research and development initiatives. Your work will involve building and optimizing this platform to facilitate the training, inference, and data curation of next-generation LLMs.
If you are passionate about driving the future of AI through groundbreaking innovations, we invite you to connect with us!
About Scale AI
Scale AI is a leading provider of AI training and evaluation solutions, uniquely positioned to deliver end-to-end services throughout the machine learning lifecycle. Our commitment to innovation places us at the center of the AI field, making us an essential partner for researchers and organizations aiming to leverage AI technologies effectively.