About the job
At Braze, our team is not just a workforce; we are a community of approachable, kind, and passionate individuals. We are united by a shared commitment to excellence, teamwork, and a harmonious work-life balance as we navigate rapid global growth while promoting equity and opportunity within and beyond our organization.
To succeed in our environment, you must be ready to set high standards for yourself and inspire those around you. We believe there is always a way to contribute; autonomy, accountability, and openness to new perspectives are critical to our success.
Our insatiable curiosity and eagerness to share diverse interests create a unique vibrancy in our culture. If you are motivated to tackle exciting challenges and embrace change with action, you will have the opportunity to make a significant impact alongside a dedicated and passionate team. If Braze resonates with you, we are eager to meet you!
WHAT YOU'LL DO
Join the AI Platform team at Braze to develop and expand BrazeAI Decisioning Studio, a cutting-edge reinforcement learning platform at the forefront of AI Decisioning. The platform runs continuous experimentation and personalizes customer engagement at the individual level, enabling brands to move from traditional rule-based campaigns to autonomous, self-optimizing interactions. You will work at the intersection of cloud-native infrastructure, data-intensive systems, and machine learning in production.
Key Responsibilities:
- Design and maintain essential services and subsystems on our AI platform, balancing performance with cost efficiency.
- Implement cloud-native solutions that guarantee reliability, scalability, and fault tolerance.
- Resolve production incidents end to end, digging into root causes and applying sustainable fixes.
- Contribute to observability practices using Sentry and Datadog to proactively identify issues and reduce downtime.
- Collaborate with data scientists, ML engineers, and product teams to transform real-world use cases into platform capabilities.
- Enhance the developer experience by optimizing workflows, improving tools, and advocating for MLOps best practices.
Tech Stack:
- Core Data & ML: Python, Ibis, FastAPI, Dataproc (Spark), SQL, BigQuery, MLflow, Streamlit
- Platform & Infrastructure: Google Cloud Platform, AWS, Kubernetes, Helm, Terraform
- Workflows & Orchestration: Airflow, RabbitMQ, Celery
- CI/CD: GitHub Actions, Jenkins

