Site Reliability Engineering Tech Lead

DataHub, Palo Alto, California, United States
On-site, Full-time

Experience Level

Manager

Qualifications

To be successful in this role, candidates should possess a strong background in site reliability engineering, with proven experience in designing and implementing scalable infrastructure solutions. Familiarity with cloud services, distributed systems, and automation tools is essential. Additionally, candidates should have excellent analytical skills, a proactive mindset, and the ability to work collaboratively in a dynamic team environment.

About the job

DataHub is an AI & Data Context Platform utilized by over 3,000 enterprises, including industry leaders like Apple, CVS Health, Netflix, and Visa. Developed in collaboration with a vibrant open-source community of more than 13,000 members, DataHub's metadata graph offers profound insights into AI and data assets, ensuring unparalleled scalability and extensibility.

Our flagship enterprise SaaS offering, DataHub Cloud, provides a fully managed solution featuring AI-driven discovery, observability, and governance tools. Organizations leverage DataHub's innovative solutions to enhance the value of their data investments, guarantee the reliability of AI systems, and establish unified governance, bringing order to the complexities of data management.

About the Role

We are in search of a seasoned Site Reliability Engineering (SRE) Tech Lead to join our team at DataHub. This pivotal role will involve spearheading initiatives that enhance the reliability, scalability, and operational excellence of our platform offerings. You will oversee technical projects across DataHub Cloud and our evolving enterprise deployment solutions, empowering customers with greater control and flexibility in managing DataHub within their preferred environments.

Key Responsibilities

Technical Leadership & Architecture

  • Design and develop robust, scalable infrastructure solutions for DataHub Cloud and enterprise deployments.
  • Lead the technical vision for multi-cloud deployment strategies and distributed system integrations.
  • Architect monitoring, observability, and alerting systems across various environments.
  • Promote best practices for infrastructure as code, configuration management, and deployment automation.

Enterprise Platform Development

  • Collaborate with product and engineering teams to shape the development of advanced deployment capabilities.
  • Work alongside cross-functional teams to create systems for seamless installation, upgrades, and rollback processes across diverse environments.
  • Contribute to the design and implementation of comprehensive monitoring and health check systems for distributed deployments.
  • Collaborate with engineering teams to develop self-healing and automated remediation capabilities.

Platform Reliability & Operations

  • Establish and uphold SLAs/SLOs for both cloud and enterprise offerings.
  • Lead incident response and conduct post-mortem analyses to drive continuous improvement.
  • Implement chaos engineering practices to enhance system resilience and reliability.

About DataHub

Acryldata is a pioneering technology company that focuses on delivering innovative AI and data solutions to enterprises, ensuring they can maximize their data investments and enhance operational efficiency.
