IT Site Reliability Engineering (SRE) Team Lead

Cerebras Systems
Bengaluru, Karnataka, India
On-site, Full-time

Experience Level

Manager

Qualifications

Key Qualifications: Proven experience in Site Reliability Engineering or a similar role. Strong background in software engineering and IT operations. Experience with infrastructure as code, SLOs, and automated systems. Excellent problem-solving skills and a proactive approach to system reliability. Experience leading teams and managing cross-functional projects.

About the job

Cerebras Systems is at the forefront of AI innovation, engineering the world's largest AI chip, which is 56 times larger than traditional GPUs. Our revolutionary wafer-scale architecture delivers the computational power of dozens of GPUs on a single chip, simplifying programming and enabling users to run extensive ML applications seamlessly without managing multiple GPUs or TPUs.

We serve a diverse range of customers, including leading model laboratories, global corporations, and pioneering AI startups. Recently, we established a multi-year collaboration with OpenAI, aiming to scale to 750 megawatts of capacity and revolutionize workloads with ultra-fast inference.

Leveraging our innovative wafer-scale architecture, Cerebras Inference offers the fastest Generative AI solution globally, boasting speeds over 10 times quicker than conventional GPU-based hyperscale cloud inference services. This significant speed enhancement is transforming how users experience AI applications, facilitating real-time iterations and boosting intelligence through advanced computation.

About The Role

We are looking for a seasoned IT SRE Team Lead to establish and manage the reliability function for Cerebras' internal technology infrastructure.

As the IT SRE Team Lead, you will oversee the availability, performance, and operational quality of the systems that Cerebras employees depend on daily, which include identity management, endpoint management, collaboration tools, SaaS applications, and internal networking. The ideal candidate will adopt a software engineering perspective in IT operations, treating corporate infrastructure as code, defining measurable SLOs, automating remediation processes, and relentlessly minimizing toil.

You will build and lead a small, high-impact team of engineers responsible for developing tools, writing automation scripts, and troubleshooting issues as they arise. You will work closely with our security, networking, and infrastructure teams to ensure seamless operations.

About Cerebras Systems

Cerebras Systems is a trailblazer in AI technology, recognized for creating the largest AI chip on the market. Our unique approach allows for unparalleled computational power, driving the future of machine learning and AI applications. We collaborate with top-tier organizations and research labs to push the boundaries of technology and innovation.
