About the job
Location Details:
At GoDaddy, our approach to the future of work is diverse and flexible. While some teams operate fully in-office, others enjoy a hybrid model, working both remotely and in-office. There are also teams that work entirely remotely.
This position allows you to work from home, with occasional visits to a GoDaddy office for team events or meetings.
Join Our Innovative Team
GoDaddy is looking for a skilled Site Reliability Engineer to join our Monitoring and Observability team. In this role, you will focus on the reliability, performance, and availability of the infrastructure that supports millions of customers around the globe. You will work at the intersection of development and operations, building and maintaining observability solutions that enable proactive monitoring and fast incident response across both cloud-based and on-premises systems.
Your Responsibilities Include:
Observability & Monitoring
- Design and maintain comprehensive monitoring solutions that encompass metrics, logs, and traces.
- Deploy and manage monitoring infrastructure using tools such as the Grafana Labs ecosystem, Icinga 2, Site24x7, SNMP, and various API integrations.
Reliability & Incident Response
- Respond promptly to automated alerts and production incidents.
- Participate in on-call rotations to support global operations.
- Collaborate with engineering teams to address availability, performance, and security challenges.
Automation & Tooling
- Develop automation to reduce operational overhead and improve reliability.
- Create self-service observability tools for engineering teams.
- Support CI/CD pipelines related to monitoring infrastructure.

