About Bretton AI
Bretton AI is at the forefront of artificial intelligence solutions tailored for the financial services sector. Trusted by industry leaders such as Robinhood, Mercury, and Gusto, we specialize in automating crucial operations, with a focus on anti-money laundering and counter-terrorism investigations.
We have raised over $95 million from prominent investors including Greylock, Y Combinator, and Thomson Reuters Ventures, and we are headquartered in downtown San Francisco. Our team includes people from organizations such as SpaceX, Google, Netflix, Stripe, and Plaid.
The Role
As a Senior Infrastructure Engineer, you will be pivotal in creating the robust infrastructure that allows for the deployment of secure and compliant AI systems within major financial institutions, addressing financial crime on a grand scale. Our infrastructure is based on a state-of-the-art, container-native architecture utilizing Docker and Kubernetes to provide reliable, auditable deployments across varied customer environments.
You will work directly with our largest clients (institutions that serve billions) to architect, automate, and fortify both our on-premises and cloud environments so that they meet stringent regulatory and performance standards, including SOC 2 compliance. The work is driven by genuine customer needs and has wide-reaching impact. It calls for building enterprise-grade systems, cooperating closely with engineering and customer teams, understanding financial services compliance, and adapting swiftly to changing requirements.
What You’ll Do
- Oversee and enhance our Kubernetes infrastructure, encompassing cluster management, service mesh configurations, and container security protocols.
- Design and implement advanced delivery pipelines featuring canary deployments, automated rollbacks, and deployment health validations.
- Develop and sustain our observability infrastructure in Datadog, including dashboards, monitors, SLOs, and distributed tracing capabilities.
- Lead incident response for critical outages and proactively assess capacity requirements for low-latency AI inference.
- Architect and automate secure infrastructure leveraging Infrastructure-as-Code for VPCs, IAM policies, Kubernetes manifests, and private cloud deployments.
- Maintain and enhance the infrastructure controls that ensure our SOC 2 compliance.
- Guide customer engagements for enterprise rollouts and mentor mid-level engineers on best practices in infrastructure.
What We’re Looking For
Must-Haves:
We are seeking candidates who are not only technically proficient but also passionate about working in a dynamic environment that directly impacts the financial sector.

