About the job
Become a Part of Cleric
At Cleric, we envision a future where engineers dedicate their talents to building groundbreaking products, free from the burden of operational work. Our current focus is an AI-driven Site Reliability Engineering (SRE) agent that autonomously resolves issues in production environments. It uses a large language model (LLM) reasoning engine to understand and interpret production problems and to implement solutions to them, even when faced with new challenges.
Now is an incredible moment to join our team. We are a small group of seasoned professionals in AI, software development, and infrastructure, backed by a prominent AI venture capital firm and Silicon Valley investors. Our product is used by leading technology companies in sectors such as fintech, ride-hailing, and autonomous vehicles.
About the Role
We are looking for a Staff Software Engineer specializing in AI. In this role, you will spearhead the development of our SRE agent, focusing on its reasoning capabilities, learning processes, and integration. You will also work on the foundational runtime and technical architecture of our comprehensive product. Your contributions will be crucial in maintaining high engineering standards, crafting exceptional customer experiences, and advancing the agent's ability to tackle increasingly complex SRE challenges. The environment is dynamic and non-deterministic, so success is evaluated through outcomes rather than code inspection.
What You'll Do:
Design and implement frameworks for evaluating agent performance and reliability
Analyze agent behavior beyond mere code: trace reasoning, tool selections, and contextual factors to comprehend decision-making processes
Develop autonomous agents capable of executing intricate, multi-step tasks
Create self-improving systems that adapt based on feedback and outcomes
Enhance agent observability to address routine issues related to tool usage and agent orchestration
You Bring:
Extensive experience building and managing production systems, plus a solid understanding of machine learning fundamentals such as precision, recall, and F1 score, including why accuracy alone can be misleading.
Experience in building at least one agent from the ground up.
Regular use of coding agents and AI assistants for practical tasks, with insights into their strengths and weaknesses.
Strong foundational skills in software engineering, with informed perspectives on developing reliable, scalable, and secure software.
Experience operating high-scale distributed infrastructures, with an emphasis on performance and efficiency.

