Staff Research Engineer, Model Efficiency

Cohere, New York
On-site Full-Time

Qualifications

PhD in Machine Learning or related field, extensive knowledge of LLM architecture and inference optimization, significant experience in model performance enhancement.

About the job

About Us

At Cohere, our mission is to amplify intelligence to benefit humanity. We specialize in training and deploying cutting-edge models for developers and enterprises, enabling them to create extraordinary AI experiences such as content generation, semantic search, retrieval-augmented generation (RAG), and intelligent agents. Our work is pivotal in driving the widespread adoption of artificial intelligence.

We are deeply passionate about our creations. Each team member plays a crucial role in enhancing our models and maximizing the value they deliver to our clients. We thrive on hard work and agility, always prioritizing the needs of our customers.

Cohere is made up of a diverse team of leading researchers, engineers, designers, and more, all dedicated to their craft. We value unique perspectives as essential for developing exceptional products.

Join us in our journey to shape the future of AI!

Role Overview

As Large Language Models (LLMs) redefine the capabilities of AI, inference remains a critical bottleneck. Our Model Efficiency team works to improve LLM inference efficiency across our foundation models. We focus on advances throughout the model execution stack, including:

  • Optimization of model architecture and mixture of experts (MoE) routing

  • Innovations in decoding and inference-time algorithms

  • Co-design of software and hardware for GPU acceleration

  • Performance enhancements without sacrificing model quality

Note: We have offices in Toronto, Montreal, San Francisco, New York, Paris, Seoul, and London. We embrace a remote-friendly culture, strategically distributing teams based on interests, expertise, and time zones to foster collaboration and flexibility. Our Model Efficiency team primarily operates in the Eastern and Pacific time zones.

As a Staff Research Engineer, you'll be instrumental in developing, prototyping, and deploying methodologies that significantly enhance the speed and efficiency of our models in production.

Ideal Candidate Profile

You may be an excellent fit for our Model Efficiency team if you:

  • Hold a PhD in Machine Learning or a closely related discipline

  • Possess a deep understanding of LLM architecture and optimization techniques for inference under resource constraints

  • Bring substantial experience in model optimization and performance enhancement strategies

About Cohere

Cohere is on a mission to elevate intelligence for the benefit of humanity through the development of innovative AI models. Our passionate team of experts is committed to shaping the future of AI technology, fostering a collaborative and diverse environment that drives excellence.
