Perplexity

Data Scientist, Evals

On-site Full-time

Experience Level

Mid to Senior

About the job

Join Perplexity, a company serving millions of users each day with high-quality answers powered by an LLM-first search engine and specialized data sources. We adopt the latest models as soon as they become available, which puts us at the intelligence frontier, where traditional benchmarks may fall short. In this role, you will build specialized evaluations aimed at improving answer quality across Perplexity, with a focus on search-based LLM responses and other scenarios our users rely on most.

Responsibilities

  • Design and manage automated evaluation pipelines that measure answer quality across Perplexity's products, holding answers to high standards of accuracy and usefulness.

  • Create tailored evaluation datasets and methodologies that measure how tool calls, particularly web search retrieval, affect the quality of final answers.

  • Develop vision-language model (VLM) based solutions that programmatically analyze how final answers render visually across platforms and devices.

  • Continuously review public benchmarks and academic evaluations for their relevance to Perplexity's products, adapting and integrating the useful ones into our ongoing performance evaluations.

  • Collaborate within a small, high-impact team where your evaluation metrics will directly influence product enhancements, working closely with technical leadership to measure and elevate Answer Quality.

Qualifications

  • PhD or MS in a technical discipline or equivalent practical experience.

  • A minimum of 4 years of experience in data science or machine learning.

  • Proficient in Python and SQL, with the ability to write production-quality code.

  • Experience with modern cloud data stacks, particularly AWS and Databricks.

  • Familiarity with agentic coding workflows and with using AI-assisted development tools to iterate quickly.

Preferred Qualifications

  • At least 1 year of experience working with LLMs at scale, especially in LLM-as-a-judge configurations.

  • Experience developing customer-facing web products or consumer applications with significant user traffic.

  • A strong research background, with a record of applying research methods to real-world machine learning problems.

  • Experience in defining evaluation metrics, such as factual consistency, hallucination rate, and retrieval precision, along with creating ground truth datasets.

About Perplexity

Perplexity is a technology company whose LLM-first search engine powers millions of interactions every day. Our mission is to deliver accurate, high-quality answers by combining the latest models with specialized data sources.
