About the job
Preference Model creates new types of training data to help artificial intelligence systems improve beyond their current limits. The team specializes in building reinforcement learning environments that test both research and engineering abilities, giving models the chance to learn from realistic feedback. Founded by former members of Anthropic’s data division, Preference Model draws on experience building data infrastructure, tokenizers, and datasets for Claude. The company partners with top AI labs and is backed by a16z.
Role overview
This entry-level machine learning engineer position is based in San Francisco and is intended for recent graduates. The focus is on building and maintaining the infrastructure that powers Preference Model’s reinforcement learning training pipeline. The team is small, so each engineer takes end-to-end ownership of their projects. Deep production experience is not required; strong technical fundamentals, curiosity about reinforcement learning, and the ability to learn quickly are essential.
What you will do
- Develop and scale distributed training systems with PyTorch
- Design automation for monitoring, debugging, and recovery during large-scale training runs
- Collaborate with researchers to turn RL training experiments into dependable infrastructure
- Enhance performance and reliability for GPU and TPU workloads
Requirements
- Recent graduate (BS, MS, or PhD) in Computer Science, Machine Learning, or a related field
- Interest in reinforcement learning and AI infrastructure