Research Engineer at Judgment Labs | San Francisco

On-site | Full-time

Experience Level

Entry Level

Qualifications

Candidates should hold a bachelor's degree in Computer Science, Engineering, Data Science, or a related field.

About the job

Judgment Labs builds infrastructure for Agent Behavior Monitoring (ABM). Traditional observability tooling focuses on logging exceptions and tracking latency; our approach detects behavioral anomalies, such as instruction drift and loss of retrieved context, in production environments at scale.

Many teams building autonomous agents rely on Judgment Labs to understand how their systems perform after deployment. Instead of only reacting to incidents, they can cluster patterns across conversations and workflows, tie regressions to specific interaction types, and pinpoint where reliability breaks down in their operating environments.

We have raised over $30 million across two funding rounds in the last five months. Our investors include Lightspeed, SV Angel, Valor Equity Partners, Nova Global, and individuals such as Chris Manning and Michael Ovitz.

The Role:

We are looking for Research Engineers to build AI systems that use agent interaction data to deepen our understanding of agent behavior, run evaluations at scale, and improve agents through iterative learning and feedback.

Your research will have a tangible impact. You will work directly with real-world agent data, put state-of-the-art methods into production, and see your contributions deployed quickly. By making agent behavior easier to measure and debug, your work will support teams in finance, legal, operations, and other critical domains. You will own projects from inception to completion, with substantial autonomy, while collaborating closely with the team to build self-improving agent systems.

What You'll Do:

  • Build systems that aggregate, index, and analyze large volumes of agent interaction data to derive useful evaluation metrics.

  • Create agent-based systems that analyze and evaluate complex, long-running behaviors.

  • Design and run post-training and optimization workflows that improve agent performance.

  • Build internal tools and infrastructure that support rapid experimentation, analysis, and training.

What We're Looking For:

At least one of the following should describe you:

  • A strong focus on data quality, evaluation, and benchmarking, and comfort working hands-on with complex datasets.

  • Experience building agent systems and deploying them in real-world or production environments.

  • A strong background in machine learning or a related field, and eagerness to advance agent technology.

About Judgment Labs

Judgment Labs builds agent behavior monitoring tools for teams developing autonomous systems. Our products help organizations understand and improve their agents' performance and reliability in high-stakes applications.