Site Reliability Engineer (SRE)

Baseten · San Francisco Office
On-site · Full-time

Qualifications

  • Proven experience as a Site Reliability Engineer or in a similar role in a cloud environment.
  • Strong understanding of cloud services (AWS, Azure, GCP) and containerization and orchestration (Docker, Kubernetes).
  • Proficiency in scripting languages such as Python, Go, or Bash.
  • Experience with CI/CD tools and practices.
  • In-depth knowledge of monitoring and logging tools (Prometheus, Grafana, ELK stack).
  • Excellent problem-solving skills and a proactive approach to overcoming challenges.
  • Strong communication skills for working collaboratively with diverse teams.

About the job

ABOUT BASETEN

Baseten powers mission-critical AI inference for some of the most innovative companies in the world, including Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma, and Writer. We combine cutting-edge applied AI research with flexible infrastructure and intuitive developer tools so that companies at the leading edge of AI can deploy sophisticated models effectively. Following our recent $300M Series E funding round, backed by investors including BOND, IVP, Spark Capital, Greylock, and Conviction, we are expanding rapidly. Join our team and help build an essential platform that lets engineers launch AI products with ease.

THE ROLE

As a Site Reliability Engineer, you will design and implement resilient systems and processes that ensure our infrastructure is scalable, reliable, and efficient. Your responsibilities will encompass everything from automating deployments and monitoring systems to enhancing performance and managing incidents effectively.

Collaboration is key: you will work closely with our users to understand the challenges they face in operationalizing machine learning, help onboard them onto our platform, and use what you learn to drive improvements to Baseten.

EXAMPLE INITIATIVES

As part of our Infrastructure team, you will engage in exciting projects such as:

RESPONSIBILITIES

  • Design and maintain scalable infrastructure to support the deployment and operational needs of machine learning models.
  • Establish standards and best practices to enhance reliability and performance across the infrastructure.
  • Proactively identify and resolve reliability issues using monitoring and alerting systems.
  • Collaborate with cross-functional teams to apply best practices in infrastructure management and incident response.
  • Create automation scripts to streamline processes and reduce manual intervention.

About Baseten

Baseten is a pioneering technology company that enables the world's leading AI firms to operationalize their models efficiently. By combining advanced AI research with a flexible infrastructure and user-friendly developer tools, we help our clients implement cutting-edge AI solutions. Our rapid growth and substantial investment from top-tier venture capitalists highlight our commitment to innovation and excellence in the AI space.