Tavus – Research Engineer, Multimodal AI Model Optimization
At Tavus, we are pioneering the human aspect of AI technology. Our objective is to make human-AI interactions as seamless and natural as in-person conversations, allowing for a human touch in areas that were once considered unscalable.
We accomplish this through groundbreaking research in multimodal AI, focusing on human-to-human communication modeling (encompassing language, audio, and video) and the development of audio-visual avatar behaviors. Our innovative models drive applications ranging from text-to-video AI avatars to real-time conversational video experiences across sectors such as healthcare, recruitment, sales, and education.
By empowering AI to perceive, listen, and engage with an authentic human-like presence, we are laying the groundwork for the next generation of AI workers, assistants, and companions.
As a Series B company, we are supported by renowned investors, including Sequoia, Y Combinator, and Scale VC. Join us as we shape the future of human-AI interaction.
The Role
We are seeking an accomplished Research Scientist/Engineer with expertise in model optimization to be a vital part of our core AI team.
The ideal candidate thrives in a fast-moving startup environment, sets priorities independently, and is comfortable making calculated decisions under uncertainty. We move quickly and need people who can help chart the path forward.
Your Mission
Transform state-of-the-art research models into fast, efficient, and production-ready systems through techniques such as sparsification, distillation, and quantization.
Oversee the optimization lifecycle for critical models: establish metrics, conduct experiments, and evaluate trade-offs among latency, cost, and quality.
Collaborate closely with researchers and engineers to convert innovative concepts into deployable solutions.
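To make one of the techniques above concrete, here is a toy sketch of post-training affine quantization in plain Python. It is an illustration of the idea only, not Tavus's pipeline; the function names and bit width are invented for this example.

```python
def quantize(weights, num_bits=8):
    """Affine (asymmetric) quantization: map floats onto an integer grid."""
    qmin, qmax = 0, 2 ** num_bits - 1
    wmin, wmax = min(weights), max(weights)
    scale = (wmax - wmin) / (qmax - qmin)
    zero_point = round(qmin - wmin / scale)
    # Round each weight to the nearest grid point, clamped to the int range.
    q = [max(qmin, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float weights from the integer representation."""
    return [(qi - zero_point) * scale for qi in q]

weights = [-1.5, -0.3, 0.0, 0.7, 2.1]
q, scale, zp = quantize(weights)
recovered = dequantize(q, scale, zp)
max_err = max(abs(w - r) for w, r in zip(weights, recovered))
# Round-trip error is bounded by half the quantization step (scale / 2).
assert max_err <= scale / 2 + 1e-9
```

The trade-off the role evaluates is exactly the one visible here: storing 8-bit integers plus a scale and zero point cuts memory and bandwidth roughly 4x versus float32, at the cost of a bounded rounding error in every weight.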
Requirements
Extensive experience in deep learning with PyTorch.
Practical experience in model optimization and compression, including knowledge distillation, pruning/sparsification, quantization, and mixed precision.
Familiarity with parameter-efficient techniques such as low-rank adapters (LoRA).
Strong grasp of inference performance and GPU/accelerator fundamentals.
Proficiency in Python and adherence to research-engineering best practices.
Experience with large models and datasets in cloud environments.
Ability to read ML literature, reproduce results, and adapt ideas to new settings.
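The low-rank adapters mentioned in the requirements are parameter-efficient because a full weight update is replaced by a product of two thin matrices. A toy arithmetic sketch (the layer dimensions and rank below are illustrative, not from the posting):

```python
# Instead of learning a full d_out x d_in update to a weight matrix W,
# a low-rank adapter (LoRA) learns W + B @ A, where B is d_out x r and
# A is r x d_in, with a small rank r.
d_in, d_out, r = 4096, 4096, 8  # illustrative transformer layer sizes

full_update_params = d_out * d_in    # parameters in a dense update
lora_params = d_out * r + r * d_in   # parameters in the low-rank factors

reduction = full_update_params / lora_params
assert reduction == 256.0  # 512x fewer per factor pair at rank 8, net 256x
```

At rank 8 on a 4096x4096 layer, the adapter trains roughly 0.4% of the parameters a dense update would, which is why the technique pairs naturally with the distillation and quantization work described above.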

