Qualifications
Minimum Qualifications:
- Proficiency in Python programming with a focus on engineering-quality code.
- Experience with Docker and a production-oriented mindset, including debugging and ensuring reliability and speed of iteration.
- Solid understanding of LLMs and their current limitations.
- Ability to meet throughput expectations and respond efficiently to feedback.

You May Be a Good Fit If:
- You possess a deep understanding of transformer architecture and of training and inference processes for modern LLMs, and have experience with inference libraries (e.g., vLLM, SGLang).
- You have expertise in CUDA or Pallas kernel development, with a track record of optimizing complex neural modules for specific hardware.
- You have published research or contributed public code in an active deep learning or machine learning research area.
- You have strong foundational knowledge and diverse research interests, regularly read and deeply understand academic papers, and possess the creativity to apply them to RLVR challenges.
- You have experience building complex interactive RL environments and insights into open-ended RL-based learning systems.
About the job
About Us
At Preference Model, we are pioneering the next generation of training data to unlock the full potential of artificial intelligence. While today's models show remarkable capabilities, they often fall short on out-of-distribution tasks across diverse applications. We create Reinforcement Learning environments that allow models to tackle real-world research and engineering challenges, iterating and learning through realistic feedback loops.
Our founding team comprises seasoned professionals from Anthropic’s data team, where we developed data infrastructure, tokenizers, and datasets for Claude. We collaborate with leading AI laboratories to drive AI closer to its transformative potential and are backed by a16z.
About the Role
We are seeking talented Reinforcement Learning Environments Engineers to design and implement MLE environments. Your primary mission will be to enable Large Language Models (LLMs) to acquire improved reasoning and an advanced understanding of modern machine learning concepts. This role is fully remote and requires at least 4 hours of overlap with PST as well as English proficiency at a C1/C2 level.
About Preference Model
Preference Model is at the forefront of AI innovation, dedicated to enhancing the capabilities of machine learning systems through advanced training data methodologies. Our experienced team is committed to collaborating with leading AI labs to push the boundaries of technology.