About the job
Join us at Kineto, an innovative platform that empowers creators, educators, and small businesses to generate, deploy, and manage fully functional AI-powered web applications instantly and at scale. Our platform integrates LLM-driven code generation, multi-tenant Postgres (Neon), dynamic hosting (GKE and Knative), automated deployments (Flux), analytics, billing, and a user-friendly chat-based experience, making software creation accessible to everyone. As our team grows rapidly, we are looking for a skilled Cloud Infrastructure Engineer to design, build, and maintain our cloud-native platform, with a focus on scalability, reliability, and automated operations.
Key Responsibilities:
Cloud and Platform Engineering (DevOps):
- Architect, implement, and manage the core infrastructure that powers Kineto's platform on Google Cloud Platform (GCP), with a focus on networking, security, and identity management.
- Build and operate resilient, highly available distributed systems using Kubernetes (GKE), Knative, Istio, and other cloud-native technologies.
- Automate the full infrastructure lifecycle as infrastructure as code (IaC) with Terraform and Terragrunt to keep environments secure, reproducible, and auditable.
- Build and maintain CI/CD pipelines (e.g., GitHub Actions, TeamCity) and use deployment tools such as Flux and Helm for GitOps-driven application delivery.
- Optimize and manage the multi-tenant data layer built on Postgres and Neon, ensuring robust tenant isolation, performance, backups, and safe schema management.
Operational Excellence and Reliability:
- Drive site reliability engineering (SRE) practices, including monitoring and alerting (Prometheus, Grafana), logging (Loki), and incident response.
- Solve complex operational challenges, such as tuning scale-to-zero for cost efficiency, minimizing cold starts, improving autoscaling behavior, and managing queue backpressure.
- Lead platform-wide performance tuning (e.g., container resource limits, distributed locking, caching, and garbage collection settings).
- Ensure platform security and compliance by applying best practices for secrets management, network segmentation, and vulnerability scanning.
Technical Leadership:
- Own major infrastructure roadmap items, including multi-region deployments, disaster recovery planning, stronger tenant isolation, and ephemeral preview environments.
- Promote DevOps and SRE principles within the engineering team, mentoring engineers on cloud-native best practices, operational readiness, and debugging complex distributed systems.
- Collaborate with product and engineering teams to define the long-term vision for Kineto's infrastructure.

