About Us

At Heidi Health, we believe that healthcare deserves a more harmonious approach, one that ensures continuous and deeply human care. Our mission is to develop an AI Care Partner that collaborates with clinicians to achieve this goal.

Our diverse team comprises doctors, engineers, designers, researchers, and creatives dedicated to creating tools that empower clinicians to concentrate on what really counts: their patients.

In just 18 months, we've reclaimed over 18 million hours for healthcare professionals, facilitating 73 million patient visits across 116 countries. Currently, over two million patient visits weekly are powered by Heidi around the globe.

Supported by nearly $100 million in funding, we are expanding into the US, UK, Canada, and Europe. We collaborate with premier health systems, including the NHS, Beth Israel Lahey Health, and Monash Health.

The Position

The Senior Site Reliability Engineer will join our core Platform/SRE team responsible for production. You will directly engage in incident response, on-call duties, system reliability, and the daily operations of Heidi’s platform.

We welcome strong mid-level SRE candidates eager to take on more responsibility, as well as seasoned SREs who thrive in hands-on operational roles.
This position is purposefully operations-focused, with an emphasis on maintaining the health of real systems in production.

Your Responsibilities

Engage in on-call and incident response: Address production incidents, assist in service restoration, and ensure clear communication during incidents, gradually taking on more leadership in managing incidents.
Enhance operational reliability: Identify recurring issues and reliability risks, driving improvements through better alerting, automation, system adjustments, or process enhancements.
Oversee components of the production environment: Manage and enhance Kubernetes clusters, cloud infrastructure, and core platform services, with increasing ownership as you gain experience.
Bolster observability: Improve dashboards, alerts, logs, and traces to ensure quicker detection and diagnosis of issues, focusing on actionable insights.
Minimize operational toil: Automate repetitive tasks, streamline runbooks, and enhance tooling to make on-call and daily operations more efficient and secure.
Feb 10, 2026