About the job
Baseten develops infrastructure and tools that help AI companies deploy and scale inference. Teams at organizations like Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma, and Writer rely on Baseten to bring advanced machine learning models into production. The company recently secured a $300M Series E from investors including BOND, IVP, Spark Capital, Greylock, and Conviction.
Role overview
This Software Engineer - GPU Inference position joins the founding team for Baseten Voice AI in San Francisco. The team focuses on building production-ready Voice AI systems, bringing open-source voice models into real-world use for clients in productivity, customer service, healthcare conversations, and education. The work shapes how people interact with technology through voice, creating broad impact across industries.
In this role, the engineer leads the internal inference stack that powers Voice AI models. Responsibilities include guiding the product roadmap and driving engineering execution. Collaboration is a key part of the job: the engineer works closely with Forward Deployed Engineers, Model Performance Engineers, and other technical groups to advance Voice AI capabilities.
Sample projects and initiatives
- The world's fastest Whisper, with streaming and diarization
- Canopy Labs selects Baseten for Orpheus TTS inference
- Partnering with the Core Product team to build an orchestration framework for a multi-model voice agent
- Working with the Training Platform team to support continuous training of voice models
- Designing a developer-friendly API and SDK for self-service adoption of Baseten Voice AI products

