About Us
At Cohere, our mission is to amplify intelligence to benefit humanity. We specialize in training and deploying cutting-edge models for developers and enterprises, enabling them to create extraordinary AI experiences such as content generation, semantic search, retrieval-augmented generation (RAG), and intelligent agents. Our work is pivotal in driving the widespread adoption of artificial intelligence.
We are deeply passionate about our creations. Each team member plays a crucial role in enhancing our models and maximizing the value they deliver to our clients. We thrive on hard work and agility, always prioritizing the needs of our customers.
Cohere is made up of a diverse team of leading researchers, engineers, designers, and more, all dedicated to their craft. We value unique perspectives as essential for developing exceptional products.
Join us in our journey to shape the future of AI!
Role Overview
As Large Language Models (LLMs) redefine the capabilities of AI, inference remains a critical bottleneck. Our Model Efficiency team is at the forefront of enhancing LLM inference efficiency across our foundational models. We focus on groundbreaking advancements in the model execution stack, encompassing:
Optimization of model architecture and mixture-of-experts (MoE) routing
Innovations in decoding and inference-time algorithms
Co-design of software and hardware for GPU acceleration
Performance enhancements without sacrificing model quality
Note: We have offices in Toronto, Montreal, San Francisco, New York, Paris, Seoul, and London. We embrace a remote-friendly culture, strategically distributing teams based on interests, expertise, and time zones to foster collaboration and flexibility. Our Model Efficiency team primarily operates in the EST and PST time zones.
As a Staff Research Engineer, you'll be instrumental in developing, prototyping, and deploying methodologies that significantly enhance the speed and efficiency of our models in production.
Ideal Candidate Profile
You may be an excellent fit for our Model Efficiency team if you:
Hold a PhD in Machine Learning or a closely related discipline
Possess a deep understanding of LLM architecture and optimization techniques for inference under resource constraints
Bring substantial experience in model optimization and performance enhancement strategies