Lead Staff Software Engineer - GenAI Inference

Databricks, San Francisco, California
On-site, Full-time, $190.9K/yr - $232.8K/yr

Experience Level

Senior

Qualifications

To succeed in this role, you should possess a robust background in software engineering, with extensive experience in high-performance systems and a deep understanding of machine learning inference. Your technical expertise should be complemented by excellent collaboration skills, allowing you to work effectively with cross-functional teams and researchers. You should also have a track record of leading complex projects and driving architectural decisions to successful outcomes.

About the job

P-1285

About This Role

Join Databricks as a Staff Software Engineer specializing in GenAI inference, where you will spearhead the architecture, development, and optimization of the inference engine that powers the Databricks Foundation Model API. Your role will be crucial in bridging cutting-edge research with real-world production requirements, ensuring exceptional throughput, minimal latency, and scalable solutions. You will work across the entire GenAI inference stack, including kernels, runtimes, orchestration, memory management, and integration with various frameworks and orchestration systems.

What You Will Do

  • Take full ownership of the architecture, design, and implementation of the inference engine, collaborating on a model-serving stack optimized for large-scale LLM inference.
  • Work closely with researchers to integrate new model architectures and features, such as sparsity, activation compression, and mixture-of-experts, into the engine.
  • Lead comprehensive optimization efforts focused on latency, throughput, memory efficiency, and hardware utilization across GPUs and other accelerators.
  • Establish and uphold standards for building and maintaining instrumentation, profiling, and tracing tools to identify performance bottlenecks and drive optimizations.
  • Design scalable solutions for routing, batching, scheduling, memory management, and dynamic loading tailored to inference workloads.
  • Guarantee reliability, reproducibility, and fault tolerance in inference pipelines, including capabilities for A/B testing, rollbacks, and model versioning.
  • Collaborate cross-functionally to integrate with federated and distributed inference infrastructure, ensuring effective orchestration across nodes, load balancing, and minimizing communication overhead.
  • Foster collaboration with cross-functional teams, including platform engineers, cloud infrastructure, and security/compliance professionals.
  • Represent the team externally through benchmarks, whitepapers, and contributions to open-source projects.

What We Look For

  • A BS/MS/PhD in Computer Science or a related discipline.
  • A solid software engineering background with 6+ years of experience in performance-critical systems.
  • A proven ability to own complex system components and influence architectural decisions from conception to execution.
  • A deep understanding of ML inference internals, including attention mechanisms, MLPs, recurrent modules, quantization, and sparse operations.
  • Hands-on experience with CUDA, GPU programming, and essential libraries (cuBLAS, cuDNN, NCCL, etc.).
  • A strong foundation in distributed systems design, including RPC frameworks, queuing, RPC batching, sharding, and memory partitioning.
  • Demonstrated proficiency in diagnosing and resolving performance bottlenecks across multiple layers (kernel, memory, networking, scheduler).

About Databricks

Databricks is a leading data and AI company that empowers organizations to unlock the potential of their data. Our platform combines data engineering, data science, and machine learning, enabling teams to collaborate and innovate seamlessly. We are committed to fostering a culture of excellence, creativity, and inclusivity, making us a great place to work and grow your career.
