About Chalkboard:
Chalkboard is pioneering the next generation of sports gaming. Our mission is to seamlessly merge watching and playing by transforming real-money sports gaming into a dynamic, social experience designed for fans eager to win. We are redefining how sports enthusiasts connect with the games they cherish.
At our core, we are a team of passionate, sports-loving innovators who prioritize transparency and equity, and who are excited to empower fans to turn their insights into actionable strategies.
The Role:
We are looking for a Principal Site Reliability Engineer to join Chalkboard and help build a platform that is reliable, scalable, and easy for our development teams to use.
In this pivotal role, you will collaborate with Engineering, Product, and Data teams, shaping how millions of fans engage with sports in real time. If you thrive in a fast-paced environment, love building robust solutions from the ground up, and care more about team success than individual accolades, we want to hear from you!
Your Game Plan:
Take ownership of platform reliability from start to finish, proactively identifying and mitigating risks before they affect users.
Develop and enhance observability (metrics, logs, tracing) to facilitate rapid issue detection, diagnosis, and resolution.
Anticipate infrastructure needs by identifying bottlenecks and implementing sustainable architectural improvements.
Minimize developer friction by refining CI/CD pipelines, deployment workflows, and internal tools.
Lead incident response and root cause analysis, focusing on systemic solutions rather than temporary fixes.
Establish and uphold best practices for infrastructure, deployments, and system reliability.
Create reusable, self-service infrastructure that empowers teams to deploy quickly and securely.
Continuously improve systems through automation and Infrastructure as Code.
What You Bring to the Team:
Experience with cloud infrastructure (GCP preferred), including networking, IAM, databases, and storage.
Proficiency in Kubernetes, including cluster operations and workload management.
Skill with Infrastructure as Code tools such as Terraform and Helm.
Familiarity with CI/CD practices using GitHub Actions or similar tools.
Knowledge of observability practices: metrics, logging, tracing, and alerting.

