ABOUT THE ROLE:
As an AI Engineer at Varick, you will own the design and optimization of the intelligence layer in our enterprise operations. This means building agent systems that process thousands of transactions, make classification decisions, and learn from human interactions.
This role suits engineers with deep experience in LLMs, agent architectures, and evaluation systems. You have built agent workflows that run in production, not just in demos. Your expertise spans prompt engineering, retrieval, tool calling, multi-agent orchestration, and the evaluation frameworks needed to deploy trustworthy AI systems in enterprise settings.
WHAT YOU'LL DO:
• Design and implement agent architectures that tackle complex enterprise workflows, focusing on multi-step reasoning, tool calling, and exception handling.
• Develop and sustain evaluation systems that ensure agent quality, accuracy, safety, and groundedness.
• Create robust prompt systems, retrieval pipelines, and context engineering strategies to ensure reliable agent performance.
• Build feedback loops that let agents learn from human corrections and improve over time.
• Optimize inference cost and latency for production workloads.
• Define and uphold best practices for agent reliability, observability, and governance.
• Stay current with the latest models, frameworks, and research, and turn what you learn into impactful production deployments.
WHAT WE'RE LOOKING FOR:
• 3+ years of software engineering experience, including 1–2+ years building LLM applications or AI systems in production.
• Practical experience in constructing agent workflows featuring tool calling, retrieval, and multi-step reasoning.
• A deep understanding of prompt engineering, context engineering, and techniques for eliciting reliable behavior from LLMs.
• Experience in developing evaluation systems to assess AI output quality.
• Proficiency in Python and a solid foundation in backend engineering principles.
• You have delivered AI features to actual users and have navigated challenges such as hallucinations, edge cases, accuracy drops, and cost management.
• Must be based in San Francisco.

