Training Infrastructure Engineer

Mirelo AI, Berlin
On-site, Full-time

Experience Level

Mid to Senior

Qualifications

Key Responsibilities

Identify and implement optimal training strategies, including parallelism approaches and precision trade-offs for diverse model sizes and computational loads.
Profile, debug, and optimize single and multi-GPU operations utilizing tools like Nsight and stack trace viewers to gain insights at the hardware level.
Conduct comprehensive analysis and enhancement of the entire training pipeline, focusing on efficient data storage, loading, distributed training, checkpoint saving, and logging.
Establish scalable systems for experiment tracking, data/model versioning, and deriving experiment insights.
Design, deploy, and maintain large-scale ML training clusters utilizing SLURM for distributed workload orchestration.

Ideal Candidate Profile

Proven experience in optimizing training and inference workloads through hands-on implementation of the latest techniques.
Strong understanding of GPU memory hierarchy and computational capabilities, with insights into hardware limitations.
Experience in optimizing both memory-bound and compute-bound operations, with clarity on when each constraint is critical.
Expertise in efficient attention algorithms and their performance characteristics.

About the job

Join Mirelo AI, where we are pioneering the future of creative tools by transforming silent video content into immersive sound, speech, and music.

Our team develops advanced generative AI models that bring video content to life, enabling creators across gaming and video platforms to enhance their storytelling. We recently closed a $41 million Seed round led by investors including Andreessen Horowitz and Index Ventures, which is funding our rapid expansion in Product, Engineering, Go-to-Market, and Growth.

About the Role

As a Training Infrastructure Engineer, you will play a crucial role in optimizing our training stack. Your responsibilities will include profiling GPU behavior, debugging training pipelines, enhancing throughput, selecting optimal parallelism strategies, and building robust infrastructure for efficient model training at scale. You will collaborate on cluster management, model training, and the development of efficient data pipelines for video and audio processing.

About Mirelo AI

Mirelo AI is a trailblazer in the field of generative AI, dedicated to enhancing creative expression through innovative technology. Our mission is to empower storytellers worldwide by providing tools that turn silent videos into vibrant audio experiences, creating captivating narratives.
