AI Inference Engineer at Perplexity | San Francisco

Perplexity, San Francisco
On-site, Full-time

Experience Level

Entry Level

Qualifications

  • Proficiency in machine learning systems and deep learning frameworks such as PyTorch, TensorFlow, and ONNX.

  • Familiarity with prevalent LLM architectures and inference optimization methods, including continuous batching and quantization.

  • Solid understanding of GPU architectures, along with experience in GPU kernel programming using CUDA.

About the job

Join Perplexity as an AI Inference Engineer and deploy machine learning models for real-time inference. Our tech stack includes Python, Rust, C++, PyTorch, Triton, CUDA, and Kubernetes, and you will work on large-scale applications.

Key Responsibilities

  • Design and develop APIs for AI inference that cater to both internal and external stakeholders.

  • Conduct benchmarking and identify bottlenecks within our inference stack to enhance performance.

  • Ensure the reliability and observability of our systems while promptly addressing any outages.

  • Explore new research and implement optimizations for LLM inference.

About Perplexity

Perplexity is a forward-thinking technology company based in San Francisco, dedicated to harnessing the power of artificial intelligence to transform industries. Our innovative team thrives on collaboration and creativity, driving advancements in AI and machine learning.
