About the job
Role Overview
As the SRE Tech Lead at Obsidian, you will play a key role in building the reliability framework for our complex, multi-tenant SaaS platform, which serves enterprise and financial clients. You will partner with the DevOps and Platform Engineering leads to drive a cohesive reliability strategy across the organization.
Your primary objective is to ensure that Obsidian detects every system failure before our customers do, and communicates proactively when issues arise.
This is a hands-on technical leadership position with significant ownership and visibility, reporting directly to the CTO. You will design and implement systems that handle real-world complexity, including upstream SaaS dependencies, sparse and noisy data, and critical enterprise workloads.
Key Responsibilities:
- Map and instrument critical system paths for our top-tier enterprise customers
- Develop connector health models to classify issues:
  - Internal defects (“our bug”)
  - Upstream SaaS outages
  - Anticipated sparse/low-signal scenarios
- Establish tiered incident communication:
  - Public signals
  - Current issues and updates

