About the job
Zyphra is an artificial intelligence company based in San Francisco, California.
The Opportunity:
Join Zyphra’s Audio Team as a Research Engineer - Audio & Speech Models. In this role, you will help develop cutting-edge open-source text-to-speech and audio models. Your work will span the full model training process, from data collection and processing to the design of new architectures and training approaches.
Your Responsibilities:
Run large-scale training of audio models
Optimize the performance of our training infrastructure
Collect, process, and evaluate audio datasets
Implement architectural and methodological improvements and validate them through rigorous testing
What We Seek:
A strong research mindset with the ability to navigate projects from ideation to implementation and documentation.
Proficiency in rapid prototyping and implementation, allowing for swift experimentation.
Effective collaboration skills in a fast-paced research environment.
A quick learner who is eager to embrace and implement new concepts.
Excellent communication abilities, enabling you to contribute to both research and engineering tasks at scale.
Preferred Qualifications:
Expertise in training audio models, such as text-to-speech, ASR, speech-to-speech, or emotion recognition.
Experience with training audio autoencoders.
Solid understanding of signal processing, particularly in audio.
Familiarity with diffusion models, consistency models, or GANs.
Experience with large-scale (multi-node) GPU training environments.
Strong understanding of experimental methodologies for conducting rigorous tests and ablations.
Interest in large-scale, parallel data processing pipelines.
Competence in PyTorch and Python programming.
Experience contributing to large, established codebases and getting up to speed on them quickly.

