About the job
At Genmo, we are at the forefront of advancing artificial intelligence through innovative research in video generation. Our mission is to build open, cutting-edge models that will ultimately contribute to the realization of Artificial General Intelligence (AGI). As part of our team, you will play a pivotal role in redefining the future of AI and expanding the horizons of video creation.
We are looking for a skilled GPU Performance Engineer who can extract maximum performance from our H100 infrastructure and fine-tune our model serving stack for efficiency. If you are passionate about optimizing performance down to the microsecond level and thrive on pushing hardware to its limits, this is the perfect opportunity for you.
Key Responsibilities
Use advanced profiling tools such as Nsight Systems and nvprof to analyze and optimize GPU workloads.
Develop high-performance CUDA and Triton kernels to optimize essential model functions.
Reduce cold-start latency in our serving infrastructure from seconds to milliseconds.
Optimize memory access patterns, implement kernel fusion, and maximize GPU utilization.
Collaborate closely with machine learning engineers to optimize model implementations.
Diagnose and resolve performance issues throughout the application and hardware stack.
Implement custom memory pooling and allocation strategies to enhance performance.
Promote performance optimization techniques and foster a culture of excellence across teams.

