Reinforcement Learning Environments Engineer Summer Intern

Preference Model · San Francisco
Remote Internship


Experience Level

Entry Level

Qualifications

• Bachelor’s Degree or higher in Computer Science, Machine Learning, Mathematics, Physics, or a related field.

• Strong coding skills in Python.

• Familiarity with the workings of large language models and their applications.

• Ability to work independently and adapt quickly to feedback.

About the job

Location: Preference for San Francisco, but remote candidates are welcome to apply.

Duration: 10–12 weeks during Summer 2026.

Compensation: This is a paid internship opportunity.

About Us

At Preference Model, we are pioneering the next era of training data to fuel the advancement of AI. While current models are impressive, they often struggle on out-of-distribution tasks. Our focus is on building reinforcement learning (RL) environments where models engage with complex research and engineering challenges, iterating and learning from realistic feedback mechanisms.

Our founding team brings extensive experience from Anthropic's data division, where we built the data infrastructure, tokenizers, and datasets that powered Claude. We collaborate with top AI labs to accelerate AI's journey toward its transformative potential and are proudly backed by a16z.

About the Role

We are seeking talented PhD students and exceptional undergraduate candidates to join us this summer in developing RL training environments tailored for large language models.

What You'll Do

  • Design and implement RL environments to assess LLM reasoning across various ML, systems, and research problems.

  • Produce clean, production-quality Python code (not just notebooks).

  • Utilize Docker to create reproducible environments and troubleshoot issues as they arise.

  • Translate ML research papers and concepts into actionable training tasks.

Who We're Looking For

You are an undergraduate or PhD student in Computer Science, Machine Learning, Mathematics, Physics, or a related discipline. You have a knack for writing real code, not just research prototypes, and you enjoy reading ML literature in your spare time.

Must-Have Qualifications:

  • Proficient in Python programming.

  • Understanding of large language models (LLMs), their strengths, and limitations.

  • Self-motivated and capable of taking feedback to iterate quickly.

Preferred Qualifications:

  • Familiarity with transformer architecture and experience with training or inference code.

  • Experience in writing CUDA kernels or engaging in low-level GPU programming.

  • Deep knowledge in a particular research area (demonstrated by publications, public code, or strong coursework).

  • A passion for continuous learning and research in the field of AI.

About Preference Model

Preference Model is at the forefront of revolutionizing AI training data. Our team, with experience from Anthropic, partners with leading AI labs to push the boundaries of what AI can achieve, supported by investment from a16z.
