Reflection AI

Data Quality Engineer - Member of Technical Staff (Pre-training)

On-site, Full-time

Experience Level

Entry Level

Qualifications

Bachelor's degree in a relevant field preferred. Experience in data quality assurance and data management.

About the job

Our Mission

At Reflection AI, we are dedicated to creating open superintelligence and making it universally accessible.

We are pioneering open-weight models designed for individuals, agents, enterprises, and even nations. Our talented team consists of AI researchers and innovators from leading organizations such as DeepMind, OpenAI, Google Brain, Meta, Character.AI, and Anthropic.

Role Overview

Data is increasingly central to progress in AI. Many recent breakthroughs have come from better data rather than new architectures.

As a member of the Data Team, your primary role will be to ensure that the data used to train our models meets the highest standards of quality, reliability, and impact. You will directly influence how our models perform on essential capabilities.

Working with researchers on our pre-training teams, you will help turn abstract notions of "good data" into specific, quantifiable standards that apply across large data campaigns. We are seeking engineers who combine strong engineering skills with deep curiosity about data quality and how it affects model performance.

In close partnership with our pre-training teams, you will:

  • Take ownership of upstream data quality for LLM pre-training, functioning as either a specialist or generalist across various languages and modalities.

  • Collaborate with research and pre-training teams to convert requirements into measurable quality signals, providing actionable feedback to external data vendors.

  • Incorporate human-in-the-loop processes while designing, validating, and scaling automated QA methods to consistently measure data quality across large-scale campaigns.

  • Create reusable QA pipelines that ensure the delivery of high-quality data to pre-training teams for model training.

  • Continuously monitor and report on data quality, driving ongoing improvements in quality standards, processes, and acceptance criteria.

Candidate Profile

  • Strong engineering background with experience in building data pipelines, QA systems, or evaluation workflows for pre-training data.

  • Detail-oriented with an analytical mindset, capable of identifying failure modes, inconsistencies, and nuanced issues affecting data quality.

  • Solid understanding of the influence of data quality on pre-training, with the capacity to translate quality concerns into tangible signals, decisions, and feedback.

About Reflection AI

Reflection AI is at the forefront of AI innovation, aiming to democratize access to superintelligence. Our team is composed of pioneers from the most prestigious organizations in the AI field, committed to advancing technology for the benefit of all.
