GPU Performance Engineer

Genmo, San Francisco HQ
On-site, Full-time

Experience Level

Mid to Senior

Qualifications

  • Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related field.

  • A minimum of 5 years of systems programming experience, with at least 3 years specializing in GPU optimization.

  • Expert-level proficiency with GPU profiling tools including Nsight Systems and nvprof.

  • Strong skills in CUDA programming and experience in developing production kernels.

  • In-depth knowledge of GPU architecture including memory hierarchy, streaming multiprocessors (SMs), and warps.

  • Proven track record of achieving substantial performance improvements (5-10x).

  • Experience with Python and C++ in production environments.

About the job

At Genmo, we are at the forefront of advancing artificial intelligence through innovative research in video generation. Our mission is to construct open, cutting-edge models that will ultimately contribute to the realization of Artificial General Intelligence (AGI). As part of our dynamic team, you will play a pivotal role in redefining the future of AI and expanding the horizons of video creation.

We are looking for a skilled GPU Performance Engineer who can extract maximum performance from our H100 infrastructure and fine-tune our model serving stack to achieve unparalleled efficiency. If you are passionate about optimizing performance, particularly at the microsecond level, and thrive on pushing hardware to its limits, this is the perfect opportunity for you.

Key Responsibilities

  • Utilize advanced profiling tools such as Nsight Systems and nvprof to analyze and enhance GPU workloads.

  • Develop high-performance CUDA and Triton kernels to optimize essential model functions.

  • Reduce cold start latency from seconds to mere milliseconds in our serving infrastructure.

  • Optimize memory access patterns, implement kernel fusion, and maximize GPU utilization.

  • Collaborate closely with machine learning engineers to optimize model implementations.

  • Diagnose and resolve performance issues throughout the application and hardware stack.

  • Implement custom memory pooling and allocation strategies to enhance performance.

  • Promote performance optimization techniques and foster a culture of excellence across teams.

About Genmo

Genmo is a pioneering research lab focused on developing open and advanced models for video generation. Our commitment to innovation aims to unlock new possibilities in artificial intelligence, particularly in the realm of AGI.
