About the job
At Judgment Labs, we are revolutionizing the monitoring of agent behavior through our infrastructure for Agent Behavior Monitoring (ABM). Unlike traditional observability, which focuses solely on logging exceptions and latency, our approach identifies behavioral anomalies, including instruction drift and context retrieval loss, in production environments at scale.
Numerous teams developing autonomous agents depend on Judgment Labs to gain insights into their systems' performance after deployment. Rather than merely reacting to incidents, they can cluster patterns across conversations and workflows, correlate regressions with specific interaction types, and accurately identify where reliability falters in their operational contexts.
We are proud to announce that we have raised over $30 million in two funding rounds over the last five months. Our esteemed investors include Lightspeed, SV Angel, Valor Equity Partners, Nova Global, and notable individuals like Chris Manning and Michael Ovitz.
The Role:
We seek passionate Research Engineers to help us develop AI systems that utilize agent interaction data to enhance our understanding of agent behavior, facilitate large-scale evaluations, and drive improvements through iterative learning and feedback.
Your research will have a tangible impact. You will engage directly with real-world agent data, implement cutting-edge methodologies in production, and see your contributions deployed in real time. By making agent behavior easier to measure and debug, your work will empower teams across finance, legal, operations, and other critical domains. You will lead projects from inception to completion, enjoying substantial autonomy while collaborating closely with our team to create self-improving agent systems.
What You'll Do:
Develop systems that aggregate, index, and analyze extensive agent interaction data to derive valuable evaluation metrics.
Create agent-based systems for the analysis and evaluation of complex, long-term behaviors.
Design and execute post-training and optimization workflows aimed at enhancing agent performance.
Build internal tools and infrastructure that promote rapid experimentation, analysis, and training.
What We're Looking For:
You should identify with at least one of the following:
A strong focus on data quality, evaluation, and benchmarking, with a hands-on approach to working with complex datasets.
Experience in developing agent systems and applying them in real-world or production environments.
A robust background in machine learning or related fields, with an eagerness to advance agent technology.

