Machine Learning Performance Engineer

Pinely · Amsterdam, North Holland, Netherlands
On-site · Full-time

Qualifications

To thrive in this role, you should possess:

  • Proven experience optimizing machine learning workflows, with a focus on speed and efficiency.
  • A deep understanding of performance-tuning principles, particularly in high-performance computing environments.
  • Hands-on experience with key ML frameworks and technologies.
  • A spirit of collaboration and mentorship, with a commitment to fostering team growth and innovation.

About the job

Join our innovative team at Pinely as a Machine Learning Performance Engineer. We are on a mission to accelerate large-scale model training by optimizing our internal infrastructure and computing stack. In this pivotal role, you will work across the entire training pipeline, from GPU kernels to system-wide throughput, using profiling, CUDA-level tuning, and distributed-systems techniques. Your work will be vital to shortening training runs, speeding up iteration, and maximizing computational efficiency.

As a key member of our growing team, you will help cultivate deep technical expertise in ML training systems.

Responsibilities:

  • Enhance our model training pipeline to increase speed and reliability, facilitating quicker and more effective experimentation.
  • Apply GPU optimization techniques using tools such as JAX, Triton, and low-level CUDA to improve training performance and efficiency at scale.
  • Diagnose and fix performance bottlenecks throughout the ML pipeline, from data loading and preprocessing to CUDA kernels.
  • Develop tools and expand our internal infrastructure to enable scalable, reproducible, and high-performance training workflows.
  • Guide and mentor engineers and researchers in implementing performance best practices across the team.
  • Help grow the team's GPU and systems-level expertise, contributing to a culture of engineering excellence and rapid experimentation.

Requirements:

  • Proven experience optimizing neural network training in production or large-scale research environments, such as reducing training time, improving hardware utilization, or shortening feedback cycles for ML researchers.
  • Extensive hands-on experience with ML frameworks like PyTorch or JAX.
  • Practical experience training and optimizing deep learning architectures, including LSTM and Transformer-based models with various attention mechanisms.
  • Familiarity with CUDA, Triton, or other low-level GPU technologies for performance tuning.
  • Expertise in profiling and debugging training pipelines using tools such as Nsight, cProfile, CUDA debugging tools, gdb, or the PyTorch profiler.
  • Understanding of distributed training concepts, including data, model, tensor, sequence, pipeline, and context parallelism, as well as memory-compute trade-offs.
  • A collaborative and proactive approach, coupled with strong communication skills and the ability to mentor team members effectively.
  • Strong proficiency in Python for developing infrastructure-level tools, debugging training systems, and integrating with ML frameworks and profiling tools.


What We Offer:

  • Competitive salary and comprehensive social benefits.
  • Attractive bonus structure; we are flexible in discussions regarding salary and employment conditions.
  • Access to state-of-the-art hardware and software in production, alongside a highly skilled technical team.

About Pinely

Pinely is at the forefront of machine learning innovation, dedicated to enhancing the efficiency and performance of model training. Our team is composed of experts in the field, committed to pushing the boundaries of what's possible in machine learning.
