Mirelo AI is at the forefront of innovation, crafting the future of creative tools through advanced technology that generates realistic sound, speech, and music from video. Our pioneering foundational generative AI models breathe life into silent video content, producing custom, hyper-realistic audio tailored for gaming, video platforms, and content creators. We empower storytellers worldwide to transform their narratives.

After successfully securing a $41 million Seed round co-led by Andreessen Horowitz and Index Ventures, with additional support from Atlantic, we are rapidly scaling our teams across Product, Engineering, Go-to-Market, and Growth.

Role Overview

As a Research Scientist at Mirelo, you will be central to the development of next-generation multimodal video-to-audio models. This is a hands-on research and development position, supported by an impressive H100/H200-per-engineer ratio, allowing you to explore and innovate on new multimodal models and extend the frontiers of music, sound, and speech generation. Collaborating across research and engineering, you will conduct focused ablation studies, derive actionable insights, and guide the team with clear next steps.
From data curation to deployment, you will actively contribute to shaping the entire lifecycle of the models that drive our products and partnerships.

Key Responsibilities

- Design, implement, and train large-scale multimodal generative models for audio generation (including diffusion and autoregressive models).
- Investigate innovative modeling concepts for audio generation across music, sound, and speech, drawing inspiration from the language and image domains.
- Develop and test new capabilities through post-training methods (e.g., fine-grained control, in/out-painting, editing).
- Conduct thorough ablation studies, derive actionable insights, and effectively communicate findings to guide future research directions.
- Engage in all stages of model development, including data curation, experimentation, evaluation, and deployment.

Ideal Candidate Profile

- Proven hands-on experience training large-scale generative models in a dynamic research environment.
- In-depth understanding of advanced methods and machine learning research in at least one domain: image, language, video, or audio (specific audio expertise is a plus).
- Strong proficiency in PyTorch and transformer architectures, alongside comprehensive knowledge of the modern deep learning ecosystem.
- Solid grasp of statistical methods, data analysis, and model evaluation techniques.
Dec 3, 2025