ABOUT BASETEN
At Baseten, we empower AI innovators by providing mission-critical inference solutions for some of the most dynamic companies in the field, including Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma, and Writer. Our unique blend of applied AI research, adaptable infrastructure, and intuitive developer tools allows organizations at the forefront of AI to deploy state-of-the-art models efficiently. With rapid growth and a recent $300M Series E funding round led by prominent investors like BOND, IVP, Spark Capital, Greylock, and Conviction, we are on an exciting journey. Join us in shaping the platform that engineers rely on to launch AI products successfully.
THE OPPORTUNITY
We are seeking early-career Software Engineers to join our team in Vancouver, BC. This role sits at the intersection of high-performance computing (HPC) and Large Language Model (LLM) engineering. You will own an automated suite of tools for diagnosing and improving our next-generation AI infrastructure.
In this role, you will delve deep into model performance, breaking down systems to analyze their efficiency at the hardware level. You will develop tools for measuring GPU FLOPS, stress-testing InfiniBand clusters, and establishing the benchmarks necessary for production readiness.
RESPONSIBILITIES
Performance Benchmarking: Automate and execute standard LLM quality benchmarks (GSM8K, MMLU) alongside tailored performance suites for specific workloads, including long-context windows and KV cache reuse.
Infrastructure Validation: Design and implement automated acceptance tests for new GPU clusters on both x86 and ARM systems, evaluating GPU memory bandwidth, intra-node interconnect throughput, and multi-node networking performance.
Model Development Experience: Create and maintain internal GPU-enabled development environments akin to GitHub Codespaces, ensuring the team has access to high-performance "dev machines" optimized for model experimentation.
Tool Development: Contribute to and enhance tools such as InferenceMAX and genai-bench to automate model evaluation and optimization processes.
Deep Hardware Profiling: Utilize PyTorch Profiler and NVIDIA Nsight Systems to gather performance profiles, pinpoint bottlenecks, and debug NVIDIA compute/networking issues.
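To give a flavor of the benchmarking work described above, here is a minimal sketch of a FLOPS microbenchmark. It times matrix multiplications and converts elapsed time into achieved FLOPS. This illustration runs on CPU with NumPy for portability; the function name and parameters are hypothetical, and the actual role would target GPUs with tools such as PyTorch Profiler and NVIDIA Nsight Systems.

```python
import time
import numpy as np

def measure_matmul_flops(n=512, iters=10):
    """Estimate achieved FLOPS from timed n x n matmuls.

    A dense n x n matmul performs roughly 2 * n^3 floating-point
    operations (n multiplies and n-1 adds per output element).
    """
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)

    a @ b  # warm-up to exclude one-time allocation/dispatch costs

    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    elapsed = time.perf_counter() - start

    return 2 * n**3 * iters / elapsed

if __name__ == "__main__":
    gflops = measure_matmul_flops() / 1e9
    print(f"Achieved throughput: {gflops:.1f} GFLOPS")
```

A production version of this idea would swap NumPy for device tensors, synchronize the GPU before and after timing, and sweep matrix sizes to map the hardware's roofline.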