
Senior Software Engineer, AI Evals

Sentry · San Francisco, California
Hybrid · Full-time · $240K/yr - $280K/yr


Experience Level

Senior

About the job

About Sentry

At Sentry, we are committed to transforming the way developers build software. Our mission is to eliminate poor software experiences by empowering developers to create better applications more efficiently.

Backed by over $217 million in funding and trusted by more than 100,000 organizations, including industry giants like Disney, Microsoft, and Atlassian, we are at the forefront of performance monitoring and error tracking solutions. Our innovative tools enable companies to focus on product development rather than bug fixes.

We embrace a hybrid work environment across our global offices, designating Mondays, Tuesdays, and Thursdays as in-office collaboration days to foster meaningful team interactions. If you are passionate about creating solutions that enhance the digital experience, join us in developing the next wave of software monitoring tools.

About the Role

As a Senior Software Engineer on Sentry’s AI/ML team, you will play a pivotal role in building the evaluation infrastructure that assesses the accuracy, reliability, and performance of our AI systems in real-world scenarios. This position is essential for ensuring that our debugging agents and AI-driven features operate correctly, safely, and predictably as they scale. You will design datasets, benchmarks, and test harnesses that turn open-ended AI behavior into quantifiable metrics, enabling the team to ship AI features with confidence.

In This Role You Will

  • Develop and implement robust evaluation frameworks to assess accuracy, reliability, regressions, and edge cases within AI systems.

  • Generate and manage high-quality datasets, golden test cases, and benchmarks based on real production data.

  • Create automated test harnesses and metrics pipelines to continuously evaluate models, prompts, and workflows.

  • Collaborate closely with applied AI engineers and product leaders to establish clear definitions of success and translate them into measurable criteria.

  • Oversee the evaluation lifecycle for significant AI projects, from initial experimentation to ongoing production monitoring.
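The responsibilities above center on turning model behavior into measurable scores. As an illustration only (none of these names or APIs come from the posting; the "model" is a stand-in), a minimal harness that runs golden test cases through a model and aggregates a metric might look like:

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical golden test case: an input prompt and its expected answer.
@dataclass
class GoldenCase:
    prompt: str
    expected: str

def exact_match(output: str, expected: str) -> float:
    """Score 1.0 if the normalized output equals the expected answer."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0

def run_eval(model: Callable[[str], str],
             cases: list[GoldenCase],
             metric: Callable[[str, str], float] = exact_match) -> dict:
    """Run every golden case through the model and aggregate metric scores."""
    scores = [metric(model(c.prompt), c.expected) for c in cases]
    return {"n": len(scores), "mean_score": sum(scores) / len(scores)}

# Stand-in "model" for demonstration: returns canned answers.
def toy_model(prompt: str) -> str:
    return "42" if "answer" in prompt else "unknown"

golden = [
    GoldenCase("What is the answer?", "42"),
    GoldenCase("Name the capital of France.", "Paris"),
]

report = run_eval(toy_model, golden)
print(report)  # one case passes, one fails -> mean_score 0.5
```

A real harness of this shape would swap in an actual model client, task-specific metrics (e.g. semantic similarity instead of exact match), and per-case result logging for regression tracking.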

You'll Love This Job If You

  • Have a strong commitment to accuracy, rigor, and measurement in AI systems.

  • Enjoy transforming ambiguous product objectives and model behaviors into precise tests and metrics.

  • Take pleasure in building foundational infrastructure that facilitates rapid iteration and boosts team confidence.

  • Thrive in collaborative environments and relish the opportunity to influence model design through effective evaluation.
