About the job
About the Team and Our Vision
As the Lead Observability Engineer, you will lead the strategy, implementation, and advancement of observability practices across all production and delivery environments. You will ensure system reliability, improve performance visibility, and enable proactive issue resolution across our platforms.
Operating at the intersection of Engineering, DevOps, and Production Support, you will bring structure, standardization, and intelligence to our monitoring and system management practices. You will lead the transition from reactive operations to a proactive, AI-driven observability model focused on automated reliability.
In this exciting role, your responsibilities will include:
- Owning and enhancing the observability platform (e.g., New Relic) to deliver comprehensive visibility across applications and infrastructure.
- Setting standards for monitoring, alerting, dashboards, and telemetry (logs, metrics, traces).
- Using AIOps capabilities to improve anomaly detection, reduce alert noise, and speed up root cause analysis.
- Driving automation and self-healing workflows to reduce manual intervention and enhance system resilience.
- Collaborating across teams to ensure systems are designed with observability in mind and meet reliability objectives.
- Continuously evaluating system behavior and incident patterns to optimize performance, scalability, and uptime.
Join us in building a highly reliable, data-driven, and scalable operational ecosystem, where observability serves as a fundamental pillar for engineering excellence.

