Staff Software Engineer, Inference Infrastructure

Cohere, San Francisco
On-site, Full-Time

Qualifications

You may be a strong candidate if you possess:
- A minimum of 5 years of engineering experience managing production infrastructure at scale.
- Proven expertise in designing large, highly available distributed systems utilizing Kubernetes, along with GPU workload management on these clusters.
- Hands-on experience with Kubernetes development and operational support.
- Familiarity with cloud platforms like GCP, Azure, AWS, OCI, and multi-cloud or hybrid serving environments.
- Experience in designing, deploying, supporting, and troubleshooting within complex Linux-based computing environments.
- Background in resource and cost management related to compute, storage, and networking.
- Outstanding collaboration and problem-solving skills essential for building mission-critical systems and ensuring smooth operations and teamwork.
- A proactive and adaptable attitude towards resolving complex technical challenges.

About the job

Who are we?

At Cohere, our mission is to elevate intelligence to benefit humanity. We specialize in training and deploying cutting-edge models for developers and enterprises focused on creating AI systems that deliver extraordinary experiences such as content generation, semantic search, retrieval-augmented generation, and intelligent agents. We view our work as pivotal to the broad acceptance of AI technologies.

We are passionate about our creations. Every team member plays a vital role in enhancing our models' capabilities and the value they provide to our customers. We thrive on hard work and speed, always prioritizing our clients' needs.

Cohere is a diverse team of researchers, engineers, designers, and more, all dedicated to their craft. Each individual is a leading expert in their field, and we recognize that a variety of perspectives is essential to developing exceptional products.

Join us in our mission and help shape the future of AI!

Why this role?

Are you excited about architecting high-performance, scalable, and reliable machine learning systems? Do you aspire to shape and construct the next generation of AI platforms that enhance advanced NLP applications? We are seeking talented Members of Technical Staff to join our Model Serving team at Cohere. This team is responsible for the development, deployment, and operation of our AI platform, which delivers Cohere's large language models via user-friendly API endpoints. In this role, you will collaborate with multiple teams to deploy optimized NLP models in production settings characterized by low latency, high throughput, and robust availability. Additionally, you will have the opportunity to work directly with customers to create tailored deployments that fulfill their unique requirements.

About Cohere

Cohere is at the forefront of AI innovation, driven by a passionate team of experts dedicated to creating groundbreaking technologies that empower humanity. Our collaborative environment fosters diverse perspectives, ensuring our products are not only effective but also transformative. Join us in making a difference in the AI landscape.
