About Tavily
At Tavily, we are building the infrastructure layer for intelligent web interactions at scale. Our API is purpose-built to power Retrieval-Augmented Generation (RAG) and real-time reasoning in AI systems. By connecting large language models (LLMs) with high-quality, reliable web content, we enable developers to build agents that are not just smart, but well-informed.
We work with some of the most innovative teams in AI, from early-stage startups to large enterprises, serving as the bridge between LLMs and the real world: powering sales assistants, research copilots, and internal knowledge tools.
The Role: DevOps Engineer
Manage Kubernetes clusters across multiple environments and regions.
Take ownership of infrastructure as code for all resources.
Enhance and maintain CI/CD pipelines and GitOps-based deployments.
Operate and optimize real-time data pipelines that process billions of events daily through distributed queues and stream processors.
Develop comprehensive monitoring, alerting, and observability frameworks.
Troubleshoot production issues across various services.
Manage cloud costs and lead capacity planning.
Work closely with a small engineering team, owning the entire infrastructure end to end.
What We're Looking For
3+ years in a DevOps or platform engineering role in production environments.
Demonstrated experience designing and managing large-scale distributed systems, with a strong understanding of API design, reliability, and performance at scale.
Extensive Kubernetes experience in a managed cloud environment.
Proficient in infrastructure as code (Terraform or similar technologies).
Experience with GitOps-based deployment workflows.
Established background in building or maintaining observability stacks (logging, metrics, alerting).
Ability to calmly and methodically manage production incidents.
Preferred Qualifications
Experience with multi-region deployments.
Familiarity with search infrastructure.
Knowledge of data pipeline technologies (streaming, warehousing).
Expertise in proxy/networking infrastructure at scale.