About the job
Preference Model develops reinforcement learning (RL) environments that mirror the complexity of real-world tasks. The company focuses on building diverse RL tasks and detailed reward structures, aiming to push the boundaries of artificial intelligence. The founding team brings experience from developing data infrastructure and datasets for Claude at Anthropic, and Preference Model works closely with top AI research labs.
Role overview
The Senior Software Engineer - Reinforcement Learning Environments position centers on designing and delivering RL environments that challenge and improve current AI models. This role involves leading complex projects, including multi-step workflows and realistic stakeholder interactions, within a large codebase. Engineers work directly with the founders and a small, collaborative team, delivering environments used for training advanced models at partner labs. The position provides significant autonomy, regular feedback, and support for professional development.
What you will do
- Design, build, and iterate on reinforcement learning tasks, taking them from concept through evaluation.
- Lead the development of sophisticated environments, with a focus on complex workflows and coding standards.
- Interact with coding agents, review their outputs, and identify subtle failures.
- Analyze whether issues stem from model limitations or environment design, then redesign tasks to reveal deeper failure modes.
- Contribute to building and maintaining the core infrastructure and tools for the environments team.
- Mentor junior engineers as the team expands.
Location
This role is based in San Francisco.

