About the job
Join our partner, a leading financial services institution that runs a large enterprise IT ecosystem serving millions of users. Their mission is to build stable, scalable, modern digital platforms while continuously improving reliability and user satisfaction. As part of an ongoing technology transformation, they are strengthening their observability and reliability engineering capabilities.
Key Responsibilities:
- Contribute to the design and implementation of observability solutions, including high-level and low-level designs.
- Create and manage logging, metrics, and distributed tracing systems.
- Develop and oversee monitoring dashboards and alerting mechanisms.
- Assist in incident analysis and root cause investigations.
- Improve system reliability by applying SRE practices.
- Establish and enforce observability standards and best practices.
- Automate monitoring and operational processes.
- Work collaboratively with infrastructure and application teams to enhance system visibility and operability.

