Senior Software Engineer - Reinforcement Learning Environments

Preference Model, San Francisco
On-site, Full-time

Experience Level

Senior

Qualifications

Who We Seek

We are looking for an individual with extensive software engineering experience across various domains, strong problem-solving skills, and a passion for advancing AI technologies. You should be adept at navigating complex codebases and system design challenges.

About the job

Preference Model develops reinforcement learning environments that mirror the complexity of real-world tasks. The company focuses on building diverse RL tasks and detailed reward structures, aiming to push the boundaries of artificial intelligence. The founding team brings experience from developing data infrastructure and datasets for Claude at Anthropic, and Preference Model works closely with top AI research labs.

Role overview

The Senior Software Engineer - Reinforcement Learning Environments position centers on designing and delivering RL environments that challenge and improve current AI models. This role involves leading complex projects, including multi-step workflows and realistic stakeholder interactions, within a large codebase. Engineers work directly with the founders and a small, collaborative team, delivering environments used for training advanced models at partner labs. The position provides significant autonomy, regular feedback, and support for professional development.

What you will do

  • Design, build, and iterate on reinforcement learning tasks, taking them from concept through evaluation.
  • Lead the development of sophisticated environments, focusing on complex workflows and coding standards.
  • Interact with coding agents, review their outputs, and identify subtle failures.
  • Analyze whether issues stem from model limitations or environment design, then redesign tasks to reveal deeper failure modes.
  • Contribute to building and maintaining the core infrastructure and tools for the environments team.
  • Mentor junior engineers as the team expands.

Location

This role is based in San Francisco.

About Preference Model

Preference Model is at the forefront of automated ML research engineering and is dedicated to creating RL environments that simulate real-world complexity. Our team combines expertise from leading tech backgrounds to redefine AI capabilities.
