Software Engineer, Collective Communication

OpenAI, San Francisco
Hybrid, Full-time

Qualifications

Candidates should have a strong background in low-level, performance-critical software development, with experience in distributed algorithms and RDMA being a plus. Proficiency in C++ and CUDA is essential, along with familiarity with network simulation techniques.

About the job

About the Team

Join the innovative Workload Networking team at OpenAI, where we develop cutting-edge collective communication frameworks that power our most extensive training initiatives. Our team leverages advanced C++ and CUDA technologies to explore pioneering collective communication strategies, maximizing the efficiency of our flagship models on custom-built supercomputers.

The models we create are vital to the advancement of AI research at OpenAI and across the industry. We continuously integrate insights from our diverse research community into our training architecture.

About the Role

As a Software Engineer specializing in Networking, you will play a crucial role in designing and implementing tailored networking collectives that are seamlessly integrated into our training pipeline.

We seek candidates with a solid foundation in performance-critical low-level software development. Familiarity with collective communication principles is a valuable asset.

This position is located in San Francisco, CA, and follows a hybrid work model, requiring three days in the office per week, with relocation support available for new hires.

Your Responsibilities Include:

  • Collaborating with ML researchers to design and implement high-performance collective operations in C++ and CUDA.

  • Optimizing our large-scale training jobs to make full use of the network transport technologies in our supercomputers.

  • Conducting simulations to guide the design of future supercomputer networks.

Ideal Candidates Will:

  • Have experience developing distributed algorithms utilizing RDMA.

  • Be proficient in writing low-level, performance-sensitive code for CPUs and/or GPUs.

  • Possess knowledge of network simulation methodologies.

About OpenAI

OpenAI is at the forefront of AI research and deployment, dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the limits of AI capabilities and strive to deploy these technologies safely through our products. Our mission is rooted in creating AI responsibly, with a focus on safety and the diverse needs of humanity. We value varied perspectives and experiences to enrich our endeavors.

We are proud to be an equal opportunity employer, committed to inclusivity and diversity in the workplace.
