About the job
Our Mission
At Reflection AI, our mission is to develop open superintelligence accessible to everyone.
We are creating open-weight models for individuals, enterprises, agents, and nation-states. Our diverse team includes AI researchers and industry veterans from leading organizations such as DeepMind, OpenAI, Google Brain, Meta, Character.AI, and Anthropic.
Foundations
Vision:
We aim to establish and maintain a company-wide foundational platform that supports every team with dependable, scalable development infrastructure, Site Reliability Engineering (SRE) capabilities, and high-throughput data ingestion tools. This platform lets Reflection move faster as the company scales.
What This Team Does
Our team builds and manages the essential platform layer that every engineering team relies on. We craft clear, opinionated golden paths for cloud infrastructure, networking, and access patterns, ensuring that the safest and most scalable routes are also the easiest to navigate. We work across various domains including cloud infrastructure, network primitives, security protocols, and developer tools to empower engineers to deploy rapidly while ensuring reliability, security, and cost predictability.
Define and uphold golden paths for cloud projects and access patterns.
Implement secure defaults and guardrails, making the safe path the most straightforward option.
Construct and manage core networking components such as routing, DNS, load balancing, and connectivity.
Establish baseline observability and operational best practices across services and infrastructure.
Oversee developer infrastructure including build systems, CI/CD, caching, and testing infrastructure to facilitate rapid shipping.
Improve cloud cost visibility and reduce spend through effective tooling, guardrails, and optimization strategies, without hindering team progress.
What You'll Work With
Cloud & Multi-Cloud Architecture: GCP, AWS, multi-cloud configurations, VPC architecture, private DNS, load balancers, advanced routing (BGP), and large-scale Kubernetes clusters.
Identity & Security Foundations: IAM, OIDC, RBAC, KMS, secrets management, policy-as-code (OPA, Gatekeeper), secure defaults, and cloud guardrails.
Infrastructure & Automation: Terraform, Pulumi, foundational Kubernetes, CI/CD systems, Bazel, build caching, and hermetic builds.
Observability & Cost Governance: Prometheus, Grafana, OpenTelemetry, and more.

