AI Inference Engineer at Perplexity | London

On-site | Full-time

Qualifications

Required Qualifications

  • Proficient in ML systems and deep learning frameworks such as PyTorch, TensorFlow, and ONNX.

  • Familiar with prevalent LLM architectures and inference optimization techniques, including continuous batching and quantization.

  • Knowledgeable about GPU architectures, with experience in GPU kernel programming using CUDA.

About the job

Join our innovative team at Perplexity as an AI Inference Engineer, where you'll be at the forefront of deploying machine learning models for real-time inference. Our technology stack includes Python, Rust, C++, PyTorch, Triton, CUDA, and Kubernetes. This is a fantastic opportunity to contribute to large-scale ML applications.

Key Responsibilities

  • Develop robust AI inference APIs for both internal and external clients.

  • Conduct benchmarking and resolve performance bottlenecks in our inference stack.

  • Enhance system reliability and observability, responding effectively to outages.

  • Investigate cutting-edge research and implement optimizations for LLM inference.

About Perplexity

Perplexity is a pioneering technology company dedicated to advancing artificial intelligence. Our team thrives on innovation, collaboration, and the pursuit of cutting-edge solutions in AI. We empower our engineers to take ownership of their work and drive impactful results.
