About the job
Location: Preference for San Francisco, but remote candidates are welcome to apply.
Duration: This internship will last for 10-12 weeks during Summer 2026.
Compensation: This is a paid internship opportunity.
About Us
At Preference Model, we are pioneering the next era of training data to fuel the advancement of AI technologies. While current models are impressive, they often struggle on tasks that fall outside their training distribution. Our focus is on developing reinforcement learning (RL) environments where models can engage with complex research and engineering challenges, iterating and learning from realistic feedback mechanisms.
Our founding team boasts extensive experience from Anthropic's data division, where we built data infrastructure, tokenizers, and datasets that powered Claude. We collaborate with top AI labs to accelerate AI's journey toward its transformative potential and are proudly supported by a16z.
About the Role
We are seeking talented PhD students and exceptional undergraduate candidates to join us this summer in developing RL training environments tailored for large language models.
What You'll Do
Design and implement RL environments to assess LLM reasoning across various ML, systems, and research problems.
Produce clean, production-quality Python code (not just notebooks).
Use Docker to build reproducible environments and debug issues as they arise.
Translate ML research papers and concepts into actionable training tasks.
Who We're Looking For
You are either an undergraduate or a PhD student in Computer Science, Machine Learning, Mathematics, Physics, or a related discipline. You have a knack for writing real code beyond mere research prototypes and you enjoy reading ML literature in your spare time.
Must-Have Qualifications:
Proficient in Python programming.
Understanding of large language models (LLMs), their strengths, and limitations.
Self-motivated and capable of taking feedback to iterate quickly.
Preferred Qualifications:
Familiarity with transformer architecture and experience with training or inference code.
Experience in writing CUDA kernels or engaging in low-level GPU programming.
Deep knowledge in a particular research area (demonstrated by publications, public code, or strong coursework).
A passion for continuous learning and research in the field of AI.