
Senior GenAI Research Engineer - Optimization and Kernels

Databricks · San Francisco, California
On-site · Full-time · $166K/yr - $225K/yr


Experience Level

Senior

Qualifications

We are looking for candidates with the following qualifications:

  • A BS, MS, or PhD in Computer Science or a related field.
  • Practical experience writing and optimizing CUDA kernels for machine learning training, or experience with distributed training frameworks such as PyTorch DDP, DeepSpeed, Megatron-LM, or FSDP.
  • Proven expertise in deep learning methods and a strong understanding of GPU architectures.
  • A collaborative mindset, excellent problem-solving abilities, and a passion for advancing AI technologies.

About the job

At Databricks, we are dedicated to empowering data teams to tackle the world's most challenging problems, from detecting security threats to advancing cancer drug development. We achieve this by offering the premier data and AI platform, allowing our customers to concentrate on their mission-critical challenges.

The Mosaic AI organization helps companies develop AI models and systems using their own data, with technologies ranging from training large language models (LLMs) from scratch to retrieval-augmented generation. We pride ourselves on pushing the boundaries of science and operationalizing our innovations. Mosaic AI believes that a company's AI models hold intrinsic value, akin to any other core intellectual property, and that superior AI models should be accessible to all.

Job Overview

As a research engineer in the Scaling team, you will stay abreast of the latest advancements in deep learning and pioneer new methodologies that surpass the current state of the art. You will collaborate with a diverse team of researchers and engineers, sharing insights and expertise. Most importantly, you will be passionate about our customers, striving to ensure their success in implementing cutting-edge LLMs and AI systems by translating our scientific knowledge into practical applications.

Your Impact

  • Enhance performance through innovative optimization techniques, including kernel fusion, mixed precision, memory layout optimization, tiling strategies, and tensorization tailored for training-specific patterns.
  • Design, implement, and optimize high-performance GPU kernels for training workloads, including attention mechanisms, custom layers, gradient computations, and activation functions, specifically for NVIDIA architectures.
  • Create and implement distributed training frameworks for large language models, incorporating parallelism strategies (data, tensor, pipeline, ZeRO-based) and optimized communication patterns for gradient synchronization and collective operations.
  • Profile, debug, and optimize comprehensive training workflows to pinpoint and resolve performance bottlenecks, utilizing memory optimization techniques such as activation checkpointing, gradient sharding, and mixed precision training.
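Much of the optimization work listed above comes down to memory locality. As a language-agnostic illustration (not part of the posting itself), here is a minimal NumPy sketch of the tiling strategy the first bullet mentions: a blocked matrix multiply that processes fixed-size sub-blocks, the same idea a CUDA kernel applies with shared-memory tiles. The function name and tile size are illustrative, not taken from any Databricks codebase.

```python
import numpy as np

def tiled_matmul(a, b, tile=32):
    """Blocked (tiled) matrix multiply.

    Accumulates the product one tile x tile sub-block at a time so each
    block of `a` and `b` is reused while it is hot in cache -- the CPU
    analogue of staging tiles in GPU shared memory before computing.
    """
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((m, n), dtype=np.result_type(a, b))
    for i in range(0, m, tile):          # rows of the output tile
        for j in range(0, n, tile):      # columns of the output tile
            for p in range(0, k, tile):  # reduction dimension, tile by tile
                out[i:i + tile, j:j + tile] += (
                    a[i:i + tile, p:p + tile] @ b[p:p + tile, j:j + tile]
                )
    return out
```

Slicing past the array edge is safe in NumPy, so shapes need not be multiples of the tile size; a production GPU kernel would instead handle the boundary tiles with explicit predication.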

About Databricks

Databricks is at the forefront of the data and AI landscape, enabling organizations to harness their data for groundbreaking advancements across various sectors. Our commitment to innovation and excellence empowers companies to develop AI solutions that redefine the possibilities of technology.
