About Us

At Heidi Health, we believe that healthcare deserves a more harmonious approach, one that ensures care remains continuous and deeply personalized. Our innovative AI Care Partner collaborates with healthcare providers to enhance the care experience for patients and clinicians alike.

Our diverse team includes doctors, engineers, designers, researchers, and creatives, all dedicated to creating tools that empower clinicians to focus on what matters most: their patients.

In just 18 months, we've reclaimed over 18 million hours for healthcare professionals, facilitating 73 million patient visits across 116 countries. Currently, our technology supports more than two million patient visits weekly worldwide.

With nearly $100 million in funding, we are expanding our presence in the US, UK, Canada, and Europe, partnering with prestigious health systems such as the NHS, Beth Israel Lahey Health, and Monash Health.

The Opportunity

Join our core Platform/SRE team, where you will take charge of production reliability. This role involves active incident response, on-call duties, system reliability, and daily operational oversight of Heidi’s platform.

We welcome applications from mid-level SREs eager to embrace greater responsibility, as well as senior SREs who relish hands-on operational roles.
This position emphasizes operational involvement and aims to maintain the health of real systems in production.

Your Responsibilities

Engage in on-call and incident response: Address production incidents, assist in service restoration, and facilitate clear communication during incidents, progressing to leading incidents over time.

Enhance operational reliability: Identify recurring issues and reliability risks, driving improvements through better alerting, automation, system enhancements, and process refinements.

Manage production environment components: Operate and enhance Kubernetes clusters, cloud infrastructure, and core platform services, increasing responsibility as expertise grows.

Boost observability: Refine dashboards, alerts, logs, and traces to enable earlier detection and faster diagnosis of issues, concentrating on actionable insights.

Minimize operational toil: Automate repetitive tasks, streamline runbooks, and enhance tooling to facilitate smoother and safer on-call and daily operations.
Feb 26, 2026