About the job
Join Fluency in Revolutionizing the Autonomous Enterprise
At Fluency, we push the limits of what our models can do. This is not traditional chatbot prompt engineering: we build advanced evaluation frameworks and research systems that measure, improve, and validate enterprise intelligence at unprecedented scale.
We are seeking a Research Engineer to play an instrumental role in designing experiments, building evaluation infrastructure, and improving model quality for our process conformance, productivity measurement, and AI impact analysis initiatives across Fortune 500 companies.
Understanding the Problem Space
You will be at the forefront of developing methodologies and systems that assess how well our models perform. Inputs range from screenshots, OCR text, and application metadata to behavioral signals; the data can be complex and the ground truth often ambiguous. Your challenge will be to build robust evaluation frameworks that quantify model performance and pinpoint areas for improvement.
Key responsibilities include:
Crafting evaluation pipelines to measure accuracy, precision, and recall across various classification tasks.
Creating ground-truth datasets from ambiguous, real-world enterprise data.
Conducting systematic prompt engineering experiments to enhance LLM performance.
Developing A/B testing frameworks for model comparisons.
Investigating innovative approaches to process understanding, activity classification, and intent extraction.
Evaluating cost-accuracy trade-offs across diverse model architectures and prompting strategies.
Building automated world-model training infrastructure grounded in our ontology.
You will be pioneering methodologies that currently do not exist; you will define the playbook.
Backed by top-tier VCs such as Accel and research institutions such as Princeton, we are at a pivotal moment in our journey with enterprises worldwide.
In this role, you will collaborate closely with founders and our engineering team to tackle technical challenges related to LLM evaluation, experimental design, and applied research.