About the job
About xAI
At xAI, our mission is to develop advanced AI systems that can comprehend the universe and assist humanity in its quest for knowledge. Our dedicated team is small, highly motivated, and committed to engineering excellence, making it an ideal environment for individuals who thrive on challenges and curiosity. We foster a flat organizational structure where every employee plays a crucial role in driving our mission forward. We value initiative and excellence, rewarding those who consistently demonstrate a strong work ethic and prioritization skills. Effective communication is essential, and all team members are expected to share their insights clearly and concisely.
About the Team
You will join a team responsible for the backend services that power our innovative products, including grok.com and our API. Our focus is on developing and maintaining highly scalable and reliable services capable of efficiently processing tens of thousands of queries per second, hosted across multiple Kubernetes clusters in both on-premises and cloud environments.
About the Role
We are looking for a candidate who meets the following criteria:
- In-depth expertise in Kubernetes.
- Proficiency with continuous deployment systems, including Buildkite and ArgoCD.
- Extensive experience with monitoring tools such as Prometheus, Grafana, and PagerDuty.
- Strong knowledge of infrastructure-as-code practices using tools such as Pulumi or Terraform.
- Familiarity with systems programming languages such as Rust, C++, or Go.
- Experience with traffic management and HTTP proxies, such as NGINX and Envoy.
Location
This position requires in-person attendance in London, UK. While we typically work from the office five days a week, we do provide flexibility for remote work when necessary. Candidates should be prepared to attend late meetings at least once a week to coordinate with our global teams.

