Qualifications
Key Responsibilities:
Design, profile, and enhance our training and inference framework.
Work collaboratively with ML teams to expedite their research and development processes, empowering them to create next-generation models and data curation strategies.
Investigate and incorporate cutting-edge technologies to refine our ML system.

Preferred Qualifications:
A strong enthusiasm for system optimization.
Hands-on experience with multi-node LLM training and inference.
Proven experience in developing large-scale distributed ML systems.
Robust software engineering capabilities, with proficiency in frameworks and tools like CUDA, PyTorch, Transformers, Flash Attention, etc.
Excellent written and verbal communication skills with the ability to thrive in a cross-functional team environment.

Desirable Skills:
Demonstrated expertise in post-training methodologies and/or innovative use cases for large language models, including instruction tuning, RLHF (Reinforcement Learning from Human Feedback), tool usage, reasoning, agents, and multimodal applications.
About the job
Join Scale AI's ML platform team (RLXF) as a Machine Learning Research Engineer, where you will play a pivotal role in developing our advanced distributed framework for training and inference of large language models. This platform is vital for enabling machine learning engineers, researchers, data scientists, and operators to conduct rapid and automated training, as well as evaluation of LLMs and data quality.
At Scale, we occupy a unique position in the AI landscape, serving as an essential provider of training and evaluation data along with comprehensive solutions for the entire ML lifecycle. You will collaborate closely with Scale's ML teams and researchers to enhance the foundational platform that underpins our ML research and development initiatives. Your contributions will be crucial in optimizing the platform to support the next generation of LLM training, inference, and data curation.
If you are passionate about driving the future of AI through groundbreaking innovations, we want to hear from you!
About Scale AI, Inc.
Scale AI is a leader in the AI sector, providing indispensable training and evaluation data as well as comprehensive end-to-end solutions for the machine learning lifecycle. Our platform empowers researchers and engineers to push the boundaries of AI technology.