Software Engineer - Model Performance

Baseten, San Francisco
On-site, Full-time

Qualifications

  • Proven experience in backend software development, particularly in Python or similar programming languages.

  • Strong understanding of machine learning principles and algorithms, with a focus on model optimization techniques.

  • Experience with frameworks such as TensorFlow, PyTorch, or similar ML libraries.

  • Familiarity with performance optimization techniques, particularly for LLMs.

  • Excellent problem-solving skills and a collaborative mindset.

  • A passion for AI and a commitment to continuous learning in the field.

About the job

ABOUT BASETEN

Baseten is at the forefront of AI technology, empowering leading-edge companies like Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma, and Writer to seamlessly integrate advanced AI models into their operations. Our unique blend of applied AI research, adaptable infrastructure, and intuitive developer tools enables innovators to bring their most ambitious AI products to life. With our recent $300M Series E funding from top-tier investors such as BOND, IVP, Spark Capital, Greylock, and Conviction, we are poised for rapid growth. Join us in shaping the platform that engineers rely on to deploy transformative AI solutions.

THE ROLE

Are you driven by a passion for enhancing artificial intelligence applications? We are seeking a proactive Software Engineer specializing in ML performance to join our energetic team. This position is perfect for backend engineers who thrive in a fast-paced startup environment and are eager to make substantial contributions to the realm of Large Language Model (LLM) Inference. If you're enthusiastic about optimizing open-source ML models, we can't wait to hear from you!

EXAMPLE INITIATIVES

As a member of our Model Performance team, you will have the opportunity to work on projects such as those listed under Responsibilities.

RESPONSIBILITIES

  • Develop, refine, and implement advanced techniques (quantization, speculative decoding, KV cache reuse, chunked prefill, and LoRA) for ML model inference and infrastructure.

  • Conduct thorough investigations into the codebases of TensorRT, PyTorch, TensorRT-LLM, vLLM, SGLang, CUDA, and other libraries to troubleshoot and resolve ML performance issues.

  • Scale and apply optimization techniques across a diverse array of ML models, with a focus on large language models.

About Baseten

Baseten is a pioneering company dedicated to revolutionizing the AI landscape. We provide essential infrastructure and tools that empower top-tier companies to harness the full potential of AI technology. Our commitment to innovation and excellence has attracted significant investment and a talented workforce, making us a leader in the AI inference domain.
