About the job
About Sesame
At Sesame, we envision a world where technology seamlessly integrates into our daily lives, creating lifelike computers that can see, hear, and interact with us naturally. Our mission focuses on developing innovative voice agents that become essential companions in our everyday activities. Our diverse team comprises visionary founders from Oculus and Ubiquity6, alongside seasoned professionals from Meta, Google, and Apple, all bringing extensive expertise in both hardware and software. Join us as we revolutionize the way we engage with technology and bring computers to life.
Key Responsibilities:
Lead the development of evaluation pipelines — architect, construct, and automate both offline and live evaluations to ensure our speech and multimodal models perform reliably in real-world applications.
Leverage data effectively — develop tools that facilitate safe, versioned, and privacy-compliant dataset curation and exploration.
Deliver impactful models — collaborate with research and infrastructure teams to prototype, train, and deploy cutting-edge voice models that enhance Sesame’s real-time companion experience.
Optimize performance — improve training and inference efficiency for LLM-class workloads; minimize latency, maximize throughput, and control costs.
Implement monitoring and live evaluations — proactively catch quality regressions before they reach users.
Act with agility — transition concepts from whiteboard to production within days, not quarters; maintain thorough documentation and testing practices.
Required Qualifications:
Expert-level proficiency in PyTorch.
Demonstrated software engineering skills with a passion for machine learning; adept at writing production-ready code across various technology stacks.
Substantial experience training and fine-tuning large language models and other large-scale models using diverse methodologies.
Evaluation specialist — you have crafted metrics and tools that accurately predict user satisfaction.
In-depth understanding of the machine learning lifecycle, including dataset operations, training pipelines, evaluation frameworks, deployment, and monitoring.
Proven track record of delivering complex projects into production, particularly user-facing online machine learning systems, while navigating changing requirements and unforeseen challenges.
High level of autonomy with sound judgment on when to work independently versus collaborating with team members.
Experience in guiding technical direction, fostering consensus, and collaborating effectively with cross-functional teams.