About the job
As the DevOps Engineering Manager for Site Reliability Engineering (SRE) and Cloud Services, you will spearhead initiatives that guarantee the reliability, scalability, and performance of IntegriChain’s cloud platforms and production systems. Your leadership will encompass the management of DevOps and SRE functions, emphasizing cloud infrastructure, automation, and operational excellence.
In this role, you will collaborate closely with application engineering, platform security, and IT teams to facilitate product delivery while upholding the highest standards for availability, resilience, and security. This position balances people leadership with hands-on technical involvement and is crucial to the success of our SaaS platforms within the healthcare and life sciences sector.
Typical Day in This Role
Your day begins with a standup or operational check-in with your team. You will assess system health, review ongoing work, manage incidents, and prioritize tasks so that the team stays focused on critical objectives and potential risks are mitigated promptly. You will maintain close oversight of production systems through dashboards, alerts, and direct discussions with engineers.
Throughout the day, you will actively engage with DevOps, SRE, and application engineering teams to eliminate obstacles and facilitate progress. This may involve troubleshooting issues, guiding technical decisions, or coordinating inter-team efforts to address dependencies. You will also play a pivotal role in design and architecture conversations, helping teams consider aspects of reliability, scalability, performance, and operational readiness.
Because the team operates across multiple time zones, you will dedicate time to coordinating efforts and ensuring clear communication across regions. You will establish shared processes, clear handoffs, and consistent expectations so that work continues seamlessly around the clock.
In the event of incidents or operational challenges, you will support response efforts, coordinate resolutions, and ensure follow-up actions are executed. Over time, you will drive lasting improvements by enhancing automation, cloud practices, and reliability standards.
Key Responsibilities
Leadership in DevOps and SRE
- Lead and nurture a team of DevOps and SRE engineers to support cloud infrastructure and production systems.
- Establish clear priorities, objectives, and standards for reliability, performance, and operational readiness.
- Promote a culture of ownership, continuous improvement, and learning within the team.
Cloud and Platform Operations
- Oversee cloud infrastructure across diverse environments, ensuring scalability, resilience, and cost-effectiveness.
- Champion the adoption of infrastructure as code, automation practices, and standardized tools.
- Collaborate with engineering teams to address platform needs and support production deployments.

