About the job
At Palona, we are pioneering the integration of cutting-edge generative and multimodal AI into the hospitality sector. Our dynamic engineering team drives innovation at a rapid pace, utilizing generative AI models to create products that continually adapt and improve. In this fast-evolving landscape, traditional notions of software excellence need to evolve to account for the unique nature of AI outputs, whose failures differ significantly from conventional software failures. This position is crucial in establishing an engineering discipline that identifies potential issues before they impact our customers.
In this role, you will engage with evaluation pipelines, observability, cloud infrastructure, and CI/CD processes to enhance Palona's AI agent platform. You will blend DevOps and AI reliability, overseeing production infrastructure while developing tools that ensure optimal AI agent performance.
Responsibilities
As an AI Reliability Engineer, your key responsibilities will include:
- Creating and implementing observability systems to identify quality degradation, latency issues, and system anomalies in production, including the development of instrumentation, dashboards, and alerting mechanisms.
- Writing and maintaining automated tests to assess agent output quality, incorporating deterministic checks and LLM-as-judge evaluations.
- Developing automated release and validation systems to streamline deployments across different environments and enforce quality gates for AI-driven products.
- Building and refining platform infrastructure using infrastructure as code, with a strong emphasis on reliability, scalability, and cost efficiency.
- Enhancing evaluation pipelines that gauge AI agent conversation quality, accuracy, and safety, collaborating with product and engineering teams to refine evaluation criteria.
- Designing and developing internal tools and services that bolster AI reliability, evaluation, and operational workflows.
- Architecting new systems to tackle emerging reliability and quality challenges within the AI agent platform.
- Writing production-grade code for reliability and evaluation infrastructure, contributing as a software engineer rather than merely an operator.

