AI/HPC Network Development Engineer - Networking

xAI, Palo Alto, CA
On-site, Full-time

Qualifications

Proven expertise in RoCEv2 and high-performance computing networks. Strong knowledge of NCCL and experience in developing metric dashboards. Exceptional analytical skills combined with a proactive approach to problem-solving. Ability to work collaboratively in a fast-paced environment while managing multiple priorities. Excellent communication skills for sharing insights and knowledge within the team.

About the job

About xAI

At xAI, we are on a mission to develop advanced AI systems that can profoundly comprehend the universe and support humanity in its quest for knowledge. Our team is compact yet highly driven, emphasizing engineering excellence and innovation. We seek individuals who relish challenges and thrive on curiosity, contributing directly to our mission in a collaborative, flat organizational structure. Initiative and a commitment to delivering outstanding results are paramount. Strong communication skills are essential, enabling team members to convey knowledge effectively and precisely.

 

About the Role

xAI has pioneered the creation of a 100k GPU cluster on an Ethernet network, achieving this remarkable feat twice in just 92 days. We are currently seeking an experienced engineer proficient in RoCEv2 to scale our operations while enhancing performance and reliability.

Our rapid development pace with cutting-edge hardware is crucial to deepening our understanding of the universe. To achieve our next major breakthrough, we must take charge of our network performance and availability, optimizing them for our training models and customer inference queries. Your role will predominantly involve diving deep into NCCL, creating metric dashboards, and fine-tuning configurations to maximize performance. You will play a pivotal role in designing the next generation of our back-end and front-end networks, enabling seamless expansion of our GPU infrastructure with minimal engineering intervention.

Expect considerable travel to Memphis for capacity expansion, participation in team on-call rotations, and assistance with scaling and maintenance initiatives. This position promises to be both dynamic and rewarding.

About xAI

xAI is at the forefront of AI innovation, striving to create systems that not only understand but also interpret the complexities of the universe. Our culture is centered around engineering excellence and collaborative teamwork, inviting those who are curious and driven to join us in making impactful contributions.
