Distributed Software Engineer

Cerebras Systems
Bengaluru, Karnataka, India; Sunnyvale, CA; or Toronto, Canada
On-site Full-time

Qualifications

  • Proven experience in software engineering, particularly in distributed systems and large-scale computing.
  • Strong proficiency in programming languages such as Python, C++, or similar.
  • Experience with automation frameworks and tools.
  • Familiarity with cloud computing platforms and container orchestration.
  • Ability to troubleshoot complex systems and optimize performance.

About the job

Cerebras Systems is at the forefront of AI technology, having developed the world's largest AI chip—56 times the size of traditional GPUs. Our revolutionary wafer-scale architecture enables the processing power of multiple GPUs on a single chip, simplifying programming and enhancing efficiency. This innovation allows our clients to experience unparalleled training and inference speeds, facilitating the seamless execution of large-scale machine learning applications without the complexity of managing numerous GPUs or TPUs.

Cerebras serves a diverse clientele, including leading model labs, global corporations, and pioneering AI startups. Notably, OpenAI has formed a multi-year partnership with Cerebras to deploy 750 megawatts of compute capacity, accelerating key workloads through ultra-high-speed inference.

Our cutting-edge wafer-scale architecture enables Cerebras Inference to provide the fastest Generative AI inference solution globally, achieving speeds over 10 times faster than GPU-based cloud inference services. This significant acceleration is transforming how users interact with AI applications, promoting real-time iterations and enhancing intelligence through advanced computational capabilities.

About The Role

As a leader in large-scale AI supercomputers, Cerebras Systems deploys multi-exaflop supercomputers in some of the largest data centers worldwide. Our supercomputers leverage Wafer-Scale Cluster technology, consisting of multiple Wafer Scale Engine (WSE) chips. The Cluster engineering team is tasked with delivering comprehensive software solutions for our clusters.

Responsibilities

  • Automate the bare-metal configuration of networking, operating systems, and application software across extensive clusters of Cerebras WSE, servers, and switches.
  • Implement additional streamlined workflows for cluster upgrades, downgrades, and security patching, with key performance metrics designed to minimize cluster downtime.
  • Develop an orchestration and scheduling system for resource allocation and job submissions in a multi-user cluster environment.
  • Provide seamless support for both on-premise and cloud-based deployment and operations.
  • Create a robust monitoring system capable of detecting and addressing failures across various cluster resources, ensuring high availability.

About Cerebras Systems

Cerebras Systems is revolutionizing AI computing with the world's largest AI chip, enabling unprecedented integration and processing capabilities. Our technology is pivotal in transforming the landscape of machine learning, allowing users to harness the power of AI without the typical complexities associated with traditional hardware.
