About Us
At Preference Model, we are pioneering next-generation training data that drives the future of artificial intelligence. While current models demonstrate significant capabilities, they often fall short in diverse applications because many tasks lie outside their training distribution. We create reinforcement learning environments where models tackle real-world research and engineering challenges, iterating continuously and learning from authentic feedback loops.
Our founding team, which brings experience from Anthropic’s data division, has built the infrastructure and datasets that support Claude. We collaborate with top-tier AI laboratories to help AI reach its transformative potential faster, and we are proudly backed by a16z.
About the Role
We envision a future where models can autonomously train on their own weaknesses. We are looking for innovative thinkers eager to explore the limits of self-directed learning. In this role, you will combine research with engineering, implementing cutting-edge methods and helping shape our research direction.
Representative Projects:
Architect and enhance our core reinforcement learning infrastructure, developing clean training abstractions and distributed experiment management systems to accommodate increasingly complex research workflows.
Design, implement, and validate training environments, evaluations, and methodologies for reinforcement learning agents.
Improve performance through profiling, optimization, and benchmarking. Implement efficient caching and debug distributed systems to speed up training and evaluation.
Collaborate with cross-functional teams in research and engineering to establish automated testing frameworks, design user-friendly APIs, and build scalable infrastructure that propels AI research forward.
You May Be a Good Fit If You:
Are proficient in Python and familiar with frameworks such as PyTorch or JAX.
Have hands-on experience training large language models and conducting machine learning research on them.
Can effectively balance research exploration with practical engineering implementation.
Enjoy pair programming and prioritize code quality, testing, and performance.
Possess strong systems design and communication skills.
Have a solid understanding of reinforcement learning algorithms and keep up with current publications in the field.

