About the job
GetYourGuide connects travelers with memorable experiences in over 12,000 cities. Since 2009, the company has helped millions discover new destinations. The Berlin headquarters leads a global team, with offices in cities such as New York and Bangkok. More than 850 employees collaborate to reshape how people find and book travel adventures.
The Staff Site Reliability Engineer joins the Operational Excellence team, which works to minimize disruptions, boost productivity, and build user trust. As GetYourGuide expands its AI-powered travel solutions, this role ensures engineering speed and reliability remain strong so customers enjoy seamless experiences.
What you will do
- Collaborate with product teams to improve system reliability, performance, and trust across the platform.
Incident management and reliability
- Reduce the number of incidents and shorten both Mean Time to Detect (MTTD) and Mean Time to Recovery (MTTR).
- Lead post-incident reviews and turn findings into lasting improvements.
- Create tools and runbooks that speed up diagnosis and resolution of production issues.
- Foster a culture that treats incidents as learning opportunities rather than occasions for blame.
- Take part in the infrastructure on-call rotation.
Observability and production confidence
- Advance the Datadog-based observability stack, including metrics, logs, traces, dashboards, and alerts.
- Help teams define meaningful Service Level Objectives (SLOs) and prevent alert fatigue.
- Strengthen production debugging tools so engineers can solve issues independently.
Change confidence and release quality
- Lower change failure rates by guiding teams on effective testing and deployment practices.
Learn more about GetYourGuide’s team and mission at getyourguide.careers.

