About the job
About AfterQuery
AfterQuery partners with leading AI labs to advance training data and evaluation frameworks. The team builds high-signal datasets and runs thorough evaluations that go beyond standard benchmarks. As a post-Series A, early-stage company in San Francisco, AfterQuery gives each team member room to shape the future of AI models.
Role Overview: Research Scientist - Frontier Data
This role focuses on designing datasets and developing evaluation systems that influence how top AI models are trained and assessed. Working closely with research teams at major AI labs, the scientist explores new data collection techniques, investigates where models fall short, and sets up metrics to track progress. The work is hands-on and experimental, moving quickly from hypothesis to live testing and directly impacting large-scale model training.
Key Responsibilities
- Design data slices and analyze data structures to uncover model weaknesses in areas like finance, software development, and enterprise operations.
- Build and refine evaluation rubrics and reward signals for RLHF and RLVR training approaches.
- Study annotator behavior and run experiments to improve model capabilities across different domains.
- Develop quantitative frameworks to measure dataset quality and diversity, and their effect on model alignment and performance.
- Work with research teams to turn training objectives into concrete data and evaluation needs.
What We Look For
- Research experience at the undergraduate or master’s level (PhD not required).
- Experience or internships with RL environments or AI safety and benchmarking organizations (e.g., METR, Artificial Analysis) are a strong plus.
- Genuine interest in how data structure, selection, and quality affect model outcomes.
- Demonstrated skill in designing experiments, acting quickly, and extracting insights from complex data.
- Comfort working across sectors such as finance, software engineering, and policy.
- Strong quantitative background and familiarity with LLM training pipelines, RLHF/RLVR methods, or evaluation frameworks.
- A hands-on mindset focused on building practical solutions.

