Reflection AI

Technical Staff Member - Model Evaluations

On-site Full-time

Qualifications

  • Strong statistical analysis and experimental design skills to rigorously measure model improvements.

  • Familiarity with LLM evaluation methodologies, including static benchmarks, human preference evaluations, and agentic tasks.

  • High agency and ability to thrive in a fast-paced startup environment with a focus on impact over process.

  • Excited to work in a pioneering lab, defining how we measure and accelerate progress toward more capable models.

  • Collaborative, detail-oriented, and motivated to build feedback loops that lead to significant model improvement.

About the job

Our Mission

At Reflection AI, we are dedicated to creating accessible open superintelligence for everyone.

Our team is composed of top-tier AI researchers and innovators from prestigious organizations like DeepMind, OpenAI, Google Brain, Meta, Character.AI, Anthropic, and more. We are committed to building open-weight models for individuals, enterprises, and even nation states.

About the Role

  • Perform essential comparative analyses to deepen our insights into model capabilities.

  • Design and enhance evaluation systems and processes that establish robust feedback loops between data, evaluations, and model behavior.

  • Create generalizable evaluation frameworks that effectively capture reasoning, alignment, and practical usefulness.

  • Collaborate closely with pre-training, post-training, and applied teams to translate insights into tangible model improvements.

  • Expand the boundaries of measurable metrics, utilizing synthetic evaluations, human feedback, and real-world interaction data.

About You

  • Proficient in statistical analysis and experimental design, with the ability to rigorously measure model advancements.

  • Knowledgeable in LLM evaluation methodologies, including static benchmarks, human preference evaluations, and agentic tasks.

  • Highly self-directed and able to thrive in a fast-paced startup atmosphere, prioritizing impact over rigid processes.

  • Eager to work in a pioneering lab, shaping how we measure and accelerate the development of more capable models.

  • Collaborative, detail-oriented, and driven by the desire to create effective feedback loops that enhance model performance.

What We Offer

We believe in building superintelligence that is genuinely open, starting from the ground up. Joining Reflection means you will be part of a small, talent-dense team where you will help shape our future and push the boundaries of open foundational models.

You will have the opportunity to engage in the most impactful work of your career, knowing that you and your loved ones are well-supported.

  • Competitive Compensation: Salary and equity structured to attract and retain top global talent.

  • Health & Wellness: Comprehensive medical, dental, vision, life, and disability insurance.

About Reflection AI

Reflection AI is at the forefront of developing open superintelligence technology that is accessible to all. With a team of experts from top-tier organizations, we aim to create innovative AI solutions that empower individuals and institutions alike.
