About the job
Join the Founding Team as a DevOps Engineer!
Location: San Francisco Bay Area
Type: Full-Time
Compensation: Competitive salary plus meaningful equity at the founding tier.
Backed by 8VC, we are assembling an exceptional team to address one of the industry's most pressing infrastructure challenges.
About the Role
We are developing an AI-native, multi-tenant enterprise platform designed for complex industrial domains. In this cutting-edge architecture, the role of DevOps extends beyond just feature delivery; it encompasses operationalizing intelligent agents, ensuring traceability across AI systems, and maintaining mission-critical machine learning infrastructure at scale.
We are seeking a passionate DevOps engineer who will take ownership of our infrastructure from Day 1, automating everything from CI/CD processes to observability, cloud governance, and security protocols. You will collaborate with a highly skilled team focused on building real-time AI pipelines and multi-agent systems. If you aspire to be the driving force behind a fast, secure, reliable, and explainable platform, this position is for you.
Key Responsibilities
Develop and maintain scalable cloud infrastructure across AWS, GCP, and Azure, prioritizing secure, tenant-isolated deployments.
Own and enhance CI/CD systems (e.g., GitHub Actions, ArgoCD) with progressive rollout, testing, and rollback strategies.
Establish observability across services, agents, and pipelines (using OpenTelemetry, Prometheus, Grafana, and Sentry).
Implement policy-as-code (OPA, Rego) to enforce deployment safety, RBAC, audit logging, and approval workflows.
Define and uphold SLAs and uptime targets (99.99%+), and own incident response and remediation workflows.
Secure infrastructure through IAM, VPC, encryption, key management, image scanning, and secrets rotation.
Automate deployments, infrastructure provisioning (using Terraform, Helm), and environment replication.
Qualifications
Essential Experience:
4–10+ years of experience in DevOps, platform engineering, or Site Reliability Engineering (SRE) on production-grade systems.
Extensive experience with Docker, Kubernetes (EKS/GKE), and either Terraform or Pulumi.
Proven track record in deploying and monitoring distributed cloud applications.

