Machine Learning Engineer - Post-Training and Evaluation

Reflection AI · San Francisco
On-site · Full-time


About the job

Reflection AI develops open-weight models with the goal of making superintelligence broadly accessible. The team draws on backgrounds from DeepMind, OpenAI, Google Brain, Meta, and Anthropic, and serves a wide range of users including individuals, enterprises, and government organizations.

Role overview

This Machine Learning Engineer position focuses on post-training and evaluation within the Applied AI group in San Francisco. The main responsibility is to fine-tune and evaluate Reflection AI’s open-weight models for enterprise customers, adapting them to specific domains and tasks using real customer data. The work covers the entire process: preparing and cleaning datasets, running fine-tuning workflows, building evaluation systems, and deploying models into production. Collaboration is central, both with clients to understand their needs and with research colleagues to advance model capabilities.

What you will do

  • Fine-tune open-weight models for customer use cases, including dataset preparation, configuring training (such as SFT, preference optimization, and reinforcement fine-tuning), and iterating based on evaluation feedback.
  • Design and maintain evaluation infrastructure: create evaluation suites, curate test sets, set baselines, and measure improvements on key customer tasks.
  • Prepare training data from raw customer sources by assessing data quality, cleaning and formatting, identifying noisy or adversarial samples, and building reproducible data pipelines.
  • Troubleshoot training and inference by analyzing loss curves, diagnosing data issues, and identifying problematic training dynamics.
  • Deploy fine-tuned models in hybrid environments (public cloud, VPC, on-premises) to ensure reliable, high-performance inference in production.
  • Contribute to developing playbooks, evaluation benchmarks, and best practices for fine-tuning and evaluation as the team’s approach evolves.
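To illustrate the evaluation side of the responsibilities above, here is a minimal sketch of an exact-match evaluation harness: a curated test set, a scoring function, and a baseline measurement. This is an illustrative example only, not Reflection AI's actual tooling; all names (`EvalExample`, `exact_match_rate`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalExample:
    """One item in a curated test set: an input prompt and its reference answer."""
    prompt: str
    expected: str


def exact_match_rate(examples: List[EvalExample],
                     generate: Callable[[str], str]) -> float:
    """Score a model (any prompt -> completion function) by exact match.

    Returns the fraction of examples whose generated output, after
    whitespace stripping, equals the reference answer.
    """
    if not examples:
        return 0.0
    hits = sum(
        1 for ex in examples
        if generate(ex.prompt).strip() == ex.expected.strip()
    )
    return hits / len(examples)


# A toy curated test set standing in for a customer task.
suite = [
    EvalExample(prompt="2+2=", expected="4"),
    EvalExample(prompt="Capital of France?", expected="Paris"),
]

# Baseline: a trivial model that never answers correctly.
baseline = exact_match_rate(suite, lambda prompt: "unknown")
```

In practice the same harness would be run before and after each fine-tuning iteration, so that improvements on key customer tasks are measured against a fixed baseline rather than judged anecdotally.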

Requirements

  • Hands-on experience in applied machine learning, especially fine-tuning language models. This includes preparing datasets, running training loops, evaluating results, and deploying models. Familiarity with SFT, DPO, RLHF, or related techniques is required.
  • Strong understanding of evaluation methods, with the ability to design evaluations, interpret training metrics, and accurately assess model performance.

Location

San Francisco

About Reflection AI

Reflection AI is dedicated to pioneering open superintelligence that everyone can access. Our team is composed of AI experts and innovators from top organizations, striving to deliver adaptable AI solutions for diverse sectors.
