About the job
Atlas is building an operating system for restaurants that streamlines starting, managing, and expanding a restaurant, whether online or offline. The Atlas team previously founded Grain, a venture-backed online restaurant that achieved millions in revenue. Today, Atlas provides restaurants with online storefronts, POS systems, third-party logistics, and integrations with food platforms and AI technologies.
Our customers include SaladStop, Killiney, and Haidilao. We add new restaurants every week, including brands such as Casa Vostra, Artichoke, and Wewa.
Our team and investors come from organizations such as Y Combinator, Global Founders Capital, Grain, Accenture, Microsoft, Udacity, McKinsey, and Salesforce.
Explore our hiring memo here.
Role Overview
Product Infrastructure Engineers at Atlas underpin every engineering effort. You will build the systems that make shipping software safer, faster, and more predictable.
You will work at the intersection of infrastructure and product. The systems you design will power core experiences across compute, databases, APIs, deployment pipelines, and measurement frameworks. Your work will not only support scale; it will shape how Atlas evolves as a product.
Key Responsibilities
- Design and build robust infrastructure for multi-tenant compute, databases, queuing systems, and observability tools.
- Improve deployment pipelines, implement feature gating, and enable canary rollouts so that we ship safely and quickly.
- Scale shared services and core platform components utilized across the Atlas ecosystem.
- Develop internal tools for monitoring, metrics, and experimentation to foster learning and reliability.
- Collaborate with product engineers so that scalability, performance, and fault tolerance are built in from the outset.
- Reassess abstractions and defaults that could hinder speed or resilience.
Required Skills and Experience
- 6+ years of experience in software engineering, site reliability engineering, or infrastructure engineering.
- Proficiency with containerization and container orchestration tools such as Docker and Kubernetes.
- Experience with infrastructure as code and configuration management tools.
- Strong incident management skills, including experience leading incident response.
- Familiarity with Google Cloud Platform services and tools.
- Knowledge of modern observability platforms like Prometheus, Grafana, and ScoutAPM.
- Experience with Ruby on Rails and PostgreSQL is a plus.
Ideal Candidate Attributes
- You value speed and craftsmanship equally.
- You have built solutions that improve both the product and the development process.
- You think in systems and understand how code, data, and infrastructure influence product development.

