Staff Software Engineer, Inference
Qualifications
About Anthropic
At Anthropic, we are dedicated to creating AI systems that are not only reliable but also interpretable and steerable. Our mission is to ensure that AI remains safe and beneficial for our users and society at large. Our rapidly expanding team comprises passionate researchers, engineers, policy experts, and business leaders collaborating to develop AI solutions that make a positive impact. Join us in our pursuit of building the future of beneficial AI.
Similar jobs
Join our innovative team at Perplexity as an AI Inference Engineer, where you'll be at the forefront of deploying machine learning models for real-time inference. Our technology stack includes Python, Rust, C++, PyTorch, Triton, CUDA, and Kubernetes. This is a fantastic opportunity to contribute to large-scale ML applications.
Key Responsibilities
Develop robust APIs for AI inference catering to both internal and external clients.
Conduct benchmarking and resolve performance bottlenecks in our inference stack.
Enhance system reliability and observability, responding effectively to outages.
Investigate cutting-edge research and implement optimizations for LLM inference.
Join Perplexity, a pioneering firm serving millions of users daily by providing reliable and high-quality answers through an innovative LLM-first search engine combined with specialized data sources. Our Answer Quality team plays a crucial role in refining prompts, tools, searches, and specialized datasets, ensuring that our evaluations are fast, accurate, and actionable. In this exciting role, you will be instrumental in building the data flywheel that supports various teams across Perplexity.
Responsibilities
Develop and maintain systems and pipelines that empower Search, Product, and other teams to independently access reliable evaluation verdicts without delays.
Take charge of the "evals-to-product" loop, autonomously transforming raw signals into robust datasets that drive decision-making throughout the organization.
Construct a sophisticated simulator pipeline capable of replaying user interactions with the product in formats comprehensible to LLMs and VLMs, reflecting real-time product updates.
Ensure data integrity by implementing monitoring, lineage tracking, and quality checks, providing downstream consumers with reliable results.
Collaborate within a small, impactful team where your contributions will directly influence how Perplexity measures and enhances Answer Quality.
Join Perplexity, a pioneering company based in London, as we transform the way users search and engage with the internet. We are seeking seasoned Infrastructure Engineers to become integral members of our dynamic team. In this role, you will lead the design, implementation, and scaling of innovative tools, systems, and platforms that empower web, mobile, and browser engineers to develop cutting-edge products. Your contributions will be vital in enabling our product, AI, and development teams to innovate swiftly while ensuring utmost reliability, security, and performance at scale.
Our Tech Stack
Python | Go | TypeScript | PostgreSQL | DynamoDB | Redis | FastAPI | React | Bazel | GitHub | AWS
Teams Hiring
Senior/Staff Platform
The Platform team is fundamental to ensuring product reliability, scalability, and performance at Perplexity. This elite team is responsible for developing and maintaining the critical infrastructure, from backend systems such as authentication, real-time data flows, and service orchestration to frontend frameworks that guarantee fast, reliable, and secure user experiences. By upholding stringent standards for code quality, uptime, and developer productivity, the Platform team enables all of Perplexity to innovate rapidly on a solid, well-designed foundation.
Senior/Staff DevX
The Developer Experience (DevX) team is dedicated to empowering engineers at Perplexity to build, ship, and iterate faster than ever before. They manage internal platforms for source control, build, test, and deployment, effectively removing bottlenecks from each phase of product development. This team designs seamless onboarding, ultra-fast CI/CD pipelines, and developer tools that enhance creativity and safety at scale. Through collaborative efforts and significant autonomy, the DevX team amplifies the contributions of every engineer, facilitating rapid, dependable innovation across the organization.
Senior/Staff Cloud Infra
The Cloud Infrastructure team architects and manages the essential cloud infrastructure that powers Perplexity’s global platform. This team designs and scales computing, storage, and networking systems to support high-throughput, low-latency workloads, balancing proactive project initiatives with operational support. Collaboration with product, AI research, security, and other teams ensures that our services remain consistently available, reliable, and responsive to user demands.
At Perplexity, we empower millions of users every day with accurate, high-quality answers through our innovative LLM-first search engine and specialized data sources. Join our Answer Quality team, where we focus on enhancing user experience by ensuring our prompts, tools, search capabilities, and specialized datasets, combined with both cutting-edge and in-house models, provide the best results. As a Data Scientist/Engineer, you will analyze online signals derived from user interactions to connect shifts in answer quality with actual user behavior.
Key Responsibilities
Identify and validate online signals from user interactions that act as dependable indicators of true answer quality.
Develop and implement innovative online metrics for tracking in A/B testing and product health dashboards, ensuring alignment with ground-truth evaluations.
Evaluate experimental results to validate these metrics, ensuring they accurately reflect user satisfaction and guide product decisions.
Construct and sustain data pipelines that compute these metrics at scale, providing actionable quality signals to Search, Product, and model training teams.
Share insights and foster collaboration with Product and Search teams to enhance clarity and understanding.
Contribute to a small, high-impact team where your work is instrumental in shaping how Perplexity measures and enhances Answer Quality.
Qualifications
Master's degree in a technical field or equivalent professional experience.
4+ years of experience in roles such as Data Scientist, Analytics Engineer, or similar positions.
Proven experience with search, recommendation, or LLM-based products, particularly in designing online metrics and analyzing A/B tests.
Strong coding skills in Python and SQL, with the ability to produce production-grade code.
In-depth knowledge of statistical analysis methodologies.
Experience using Business Intelligence (BI) tools for data visualization and reporting.
Comfortable with coding workflows and using AI-assisted development tools for rapid iteration.
Preferred Qualifications
Familiarity with Apache Spark and Databricks.
Experience in developing or validating LLM-as-a-judge systems.
Previous experience supporting large-scale customer-facing products.
Perplexity is thrilled to introduce our Internship Program, designed for outstanding Master’s or PhD students specializing in Computer Science or Engineering in the UK for the 2025-2026 academic year. This immersive program offers a direct collaboration with our AI Inference team, providing a distinctive opportunity to acquire invaluable experience within a rapidly expanding AI startup. Exceptional interns may receive an offer for a full-time position upon completion of the program.
Our AI Inference team is integral to the performance of Perplexity's products, overseeing the inference engine and deployments for models ranging from single-node embeddings to advanced distributed sparse Mixture-of-Experts models, all while managing extensive GPU clusters. With a focus on optimizing latency and throughput, the Inference team encompasses the entire serving stack, from GPU kernels to networking and monitoring infrastructure.
Perplexity AI
Join the innovative team at Perplexity AI as an AI Infrastructure Engineer. We harness cutting-edge technologies, including Kubernetes, Slurm, Python, C++, and PyTorch, primarily within the AWS ecosystem. In this role, you will collaborate intimately with our Inference and Research teams to design, deploy, and enhance our extensive AI training and inference clusters.
Perplexity
At Perplexity, we are transforming the way individuals discover and engage with information through cutting-edge AI-powered search and knowledge tools. As we broaden our global reach, we are excited to establish a significant presence in London, a key hub for innovation and growth across Europe.
The Role:
We are in search of an outstanding Site Lead to launch and expand our London office. This is a remarkable opportunity to define Perplexity's footprint in one of the world's foremost technology centers, where you will cultivate teams and a vibrant culture from the ground up, all while propelling technical brilliance in infrastructure and AI systems.
As the Site Lead, you will be the representative of Perplexity in London, tasked with constructing our technical organization, nurturing a premier engineering culture, and directly overseeing one or more infrastructure teams. Your role will involve reporting to senior leadership and collaborating across various teams within our global network.
Perplexity AI
Join Perplexity AI as a Search Machine Learning Engineer Intern, where you will have the opportunity to work alongside industry experts in the field of machine learning and artificial intelligence. This internship is designed to provide hands-on experience with cutting-edge technologies and methodologies in search algorithms and data analysis.
Anthropic
Join Anthropic as a Senior Software Engineer specializing in Inference, where you will develop cutting-edge machine learning systems and inference algorithms. You will play a crucial role in enhancing our AI products and ensuring they are reliable and efficient.
Role Overview
Perplexity is seeking a People Operations Manager for a maternity cover based in London. This role centers on supporting the team through a range of HR responsibilities while contributing to a positive workplace culture.
What You Will Do
Oversee daily HR operations, including employee relations and performance management
Collaborate with leadership to develop and implement workplace policies
Manage talent acquisition and onboarding processes
Support initiatives that help employees feel engaged and valued
What Matters Here
Experience with HR functions such as employee relations, recruitment, and onboarding
Ability to work closely with leaders to shape workplace culture
Strong organizational skills and a people-first approach
This is a fixed-term position covering maternity leave.
At Coram AI, we are revolutionizing video security in today's digital landscape. Our innovative cloud-native platform leverages cutting-edge computer vision and artificial intelligence to empower businesses with enhanced safety, informed decision-making, and rapid response capabilities. Experience real-time alerts, effortless clip sharing, and comprehensive visibility across multiple sites.
Joining our dynamic, agile team means embracing clarity, excellence, and impactful contributions. Every team member has a voice, delivers significant work, and plays a vital role in shaping how AI can foster a safer and more interconnected world.
In this role, you will be part of a pioneering team responsible for managing a sophisticated infrastructure that transcends conventional cloud setups. Beyond our robust AWS and Kubernetes configurations, we also oversee a vast array of IoT devices. We are in search of a skilled engineer who will play a crucial role in developing and maintaining our edge and cloud stack that supports our IoT product offerings, focusing on both infrastructure and the bespoke software we utilize.
You will tackle intriguing challenges at the confluence of user experience, machine learning, and infrastructure, while committing to excellence, continuous learning, and delivering exceptional products in a fast-paced startup environment.
StackOne
Join StackOne:
At StackOne, we are pioneering the future of SaaS and AI integrations. Supported by GV and Workday Ventures with $24M in funding, we empower developers of SaaS platforms and AI Agents to seamlessly integrate with a multitude of tools, enhancing functionality with scalable and precise solutions. Our platform boasts 25,000 pre-mapped actions across 200 connectors, alongside an AI-driven integration development toolkit that emphasizes security: featuring real-time architecture, managed authentication, and comprehensive observability.
Join us on our ambitious journey towards revolutionizing agentic integrations.
With an AI-native integration toolkit that ensures real-time execution and robust security measures, we are heavily investing in AI R&D. Our new lab focuses on developing specialized LLMs that excel in precision, reliability, and safety - crucial components for effective agentic execution.
Your Role:
As an AI Engineer, you will play a pivotal role in enabling users to integrate their preferred tools effortlessly with just one click, thanks to StackOne's innovative solutions.
We seek a skilled Software Engineer with a solid foundation in AI production-grade applications, particularly with experience in working with LLMs (e.g., GPT, Anthropic, etc.).
In this position, you will be responsible for delivering AI features and products throughout their lifecycle, collaborating closely with our dedicated AI team, which includes both AI researchers and engineers. You will report directly to the CTO.
Amari AI
About Wexler
Wexler is at the forefront of developing a revolutionary AI system tailored for litigation. We collaborate with some of the largest law firms globally, assisting them in navigating their most intricate legal battles, uncovering winning strategies, and transforming the overwhelming influx of documents and facts into actionable insights. Located in London, Wexler is a rapidly expanding legal AI company trusted by renowned firms such as Clifford Chance, HSFKramer, Goodwin, and Addleshaw Goddard to pinpoint crucial facts that can sway case outcomes.
Backed by a $5.3M Seed round from Pear VC, Seedcamp, The LegalTechFund, and a host of esteemed industry angels, we are developing a comprehensive AI platform aimed at managing, resolving, and preventing legal disputes for Enterprise law firms and Fortune 500 companies. Our growth trajectory is remarkable, boasting a 10x year-over-year increase and onboarding more distinguished firms each month. Our innovative system efficiently extracts and cross-references facts from millions of documents, empowering litigators to secure more victories while significantly reducing the time spent on case preparation. Ultimately, we ensure that every client receives the legal representation they rightfully deserve.
As AI reshapes the legal landscape, most existing tools have concentrated on contract law or serve as generic copilots across various legal tasks. In contrast, Wexler stands out as the premier generative AI platform meticulously designed for the intricacies of litigation, evidenced by our rapid growth and industry acceptance.
About the Role
We are in search of a talented AI Engineer to help redefine the future of legal practice. You will play a crucial role in building autonomous systems that analyze vast amounts of documents to extract pivotal factual information essential for winning legal cases. As an integral member of our team, you will have direct influence over our AI strategy, leveraging cutting-edge technology to address the real challenges faced by elite litigators.
Join Edra as an AI Engineer
At Edra, we are tackling one of the most challenging issues in enterprise AI: the mismatch between generic AI models and specific company processes. We are developing AI agents that learn to navigate and execute processes as they truly operate.
As a Series A startup, supported by Sequoia and other notable venture capital firms, we are expanding our teams in New York and London. Our workforce is composed of exceptionally skilled engineers, AI researchers, and strategists who believe that outstanding talent is the cornerstone of our success.
Your Role
You will be instrumental in creating a learning system that equips AI agents with the knowledge of how enterprises function. This system will assimilate knowledge bases, dialogues, tickets, and system logs, generating written directives that agents can confidently execute, including a scoring mechanism to determine when to automate tasks or request human intervention.
We seek AI Engineers with a track record of building intricate, production-ready LLM-based systems. If you've scaled LLM workflows to manage millions of requests, developed multi-agent systems in live environments, or created evaluation frameworks for enterprise applications, we want you. Your contributions will directly impact our core learning library, and you will ship features that serve actual enterprise customers while enhancing the platform. This role covers continuous learning systems, agentic features, human-in-the-loop feedback mechanisms, and orchestration of AI agents.
Your Responsibilities
Contribute to our core context learning library by implementing new learning capabilities and generalizing them for broader platform use.
Design and execute LLM-powered systems and workflows from initial concept to production.
Collaborate with enterprise customers to identify issues, prototype solutions, and bring them to production.
Develop agentic features for knowledge management, enabling agents to autonomously edit, update, and maintain extensive knowledge bases.
Establish reliability and confidence systems, including evaluation frameworks and logic for determining when to automate versus when to escalate to human oversight.
Architect asynchronous, scalable systems for complex AI orchestration.
Influence the direction of our AI strategy and product capabilities.
Distyl AI
Join Distyl AI
At Distyl AI, we are revolutionizing the landscape of applied artificial intelligence. As a pioneering technology company, we collaborate with leading institutions across sectors such as telecommunications, healthcare, insurance, manufacturing, and consumer goods. Our mission is to redefine operational processes through innovative AI-driven solutions.
Our extensive research and deployment initiatives focus on creating AI-native systems that enhance efficiency and effectiveness. Our groundbreaking technologies support massive operational frameworks, impacting millions of consumer interactions and vital workflows globally.
With backing from top-tier investors including Lightspeed Venture Partners and Khosla Ventures, Distyl AI boasts a remarkable track record of 100% success in production deployments, making us one of the few profitable enterprises in the AI domain.
Your Role
We are excited to announce the opening of our London office and are in search of talented AI Engineers to architect and implement production-grade AI systems harnessing the power of Large Language Models (LLMs).
As an AI Engineer, you will collaborate with Fortune 500 clients to streamline intricate workflows using state-of-the-art AI technologies. You will design and launch AI applications that operate on a large scale in critical environments, from intelligent agents to comprehensive AI solutions.
This position is highly hands-on, requiring close collaboration with clients to define system architectures and develop reliable, impactful AI systems from initial prototypes to full-scale production. You will also play a pivotal role in shaping technical strategies across key customer projects, guiding enterprise teams through the intricacies of AI adoption.
Anthropic
Join Anthropic as a Staff Software Engineer on our Inference team, where you will play a pivotal role in building and maintaining the essential systems that power Claude for millions of users globally. This position involves overseeing the complete stack—from intelligent request routing to fleet-wide orchestration across various AI accelerators. You will focus on enhancing compute efficiency to support our rapid user growth while enabling groundbreaking research through high-performance inference infrastructure. As a key player in tackling complex distributed systems challenges, you will help shape the future of AI technology.
Lightning AI
Who We Are
Lightning AI is the innovative force behind PyTorch Lightning, established in 2019. We provide a comprehensive platform designed to streamline the development, training, and deployment of AI systems, effectively bridging the gap between research and production.
Our merger with Voltage Park, a neocloud and AI Factory, empowers us to combine developer-centric software solutions with economical, large-scale computing capabilities. We equip teams with essential tools for experimentation, training, and production inference, ensuring built-in security, observability, and control.
We cater to solo researchers, startups, and large enterprises alike, operating globally with offices in cities such as New York City, San Francisco, Seattle, and London. Our efforts are supported by notable investors including Coatue, Index Ventures, Bain Capital Ventures, and Firstminute.
What We’re Looking For
We are seeking a skilled Backend Engineer to develop and enhance the Lightning AI platform across various components, from the frontend and CLI to API, billing, security, and integrations. You will take full ownership of key features, driving development from concept to production while collaborating with a talented team of engineers, product managers, and designers. This role offers the chance to create groundbreaking technology that will revolutionize the machine-learning ecosystem!
This hybrid position is based in our London office with a requirement of 2 in-office days per week. The salary range for this role is $120,000 - $250,000.
Lightning AI
About Us
Lightning AI, the driving force behind PyTorch Lightning, was established in 2019. Our mission is to create a seamless, end-to-end platform that simplifies the development, training, and deployment of AI systems, allowing innovative ideas to transition smoothly from research to production.
Following our merger with Voltage Park, an AI Factory and neocloud, Lightning AI merges developer-centric software with efficient, large-scale computation. Our platform provides teams with the essential tools for experimentation, training, and production inference, all while ensuring security, observability, and control.
We cater to a diverse clientele, including individual researchers, startups, and major enterprises. With a global presence and offices in New York City, San Francisco, Seattle, and London, Lightning AI is supported by esteemed investors such as Coatue, Index Ventures, Bain Capital Ventures, and Firstminute.
Our Core Values
Move Fast: We prioritize speed and precision, breaking down complex challenges into manageable tasks.
Focus: We dedicate ourselves to completing one goal at a time, collaborating effectively to deliver precise features.
Balance: We believe sustained performance results from adequate rest and recovery, promoting a healthy work-life balance.
Craftsmanship: Innovation through excellence: every detail counts, and we take pride in honing our craft.
Minimal: We embrace simplicity as a driver of innovation, focusing on what truly matters.
Your Role
We are seeking a skilled Frontend Engineer to enhance and scale the UI and frontend infrastructure of the Lightning AI platform.
In this role, you will take ownership of crucial features, driving development from concept to completion while collaborating with a talented team of engineers, product managers, and designers. Your work will focus on ensuring speed, quality, and rapid iteration, from initial proof of concept to final release.
With over 10,000 organizations utilizing Lightning AI, this position offers you a unique opportunity to influence how AI solutions are built and deployed in production environments.
You will be part of the Growth Squad and report directly to our Director of Engineering. This hybrid role is based in our London office, requiring in-office attendance for 2 days each week.
At Wayve, we are dedicated to fostering a diverse, equitable, and inclusive culture that values the unique skills and perspectives of every individual, regardless of sex, race, religion or belief, ethnic or national origin, disability, age, citizenship, marital, domestic or civil partnership status, sexual orientation, gender identity, veteran status, pregnancy or related conditions (including breastfeeding), or any other status protected by applicable law.
About Us
Established in 2017, Wayve stands at the forefront of Embodied AI technology. Our cutting-edge AI software and foundational models empower vehicles to perceive, comprehend, and navigate complex environments, significantly enhancing the safety and functionality of automated driving systems.
Our mission is to drive the evolution of autonomy that moves the world forward. Our innovative, mapless, and hardware-agnostic AI solutions are tailored for automakers, expediting the transition from assisted to fully automated driving. In our dynamic environment, we thrive on tackling significant challenges, embracing uncertainty to unlock pioneering solutions. We aim high while remaining humble in our pursuit of excellence, continuously learning and adapting as we pave the path toward a smarter, safer future.
Your contributions at Wayve are impactful. We celebrate diversity, welcome fresh perspectives, and cultivate an inclusive workplace where we support one another in creating meaningful results.
Make Wayve the defining experience of your career!
The Role
In this position, you will have the opportunity to work with Wayve’s advanced on-vehicle computing and sensor platform while contributing to all stages of the software development lifecycle. As a vital member of the Robot Software, Inference & Accelerators team, you will collaborate with colleagues to develop software for edge devices, enabling data-driven autonomy across a vast fleet of vehicles. You will closely partner with our Autonomy and Science teams to ensure they have the necessary data and interfaces to train models, conduct experiments, and gather insights on driving performance. Your responsibilities will include delivering software layers that extend our inference abstractions to new hardware and accelerators, ensuring timely delivery of data, and creating a dynamically configurable system. A key aspect of your role will be to ensure that the software you develop operates reliably at scale.
Join Lendable’s Mission
Lendable is dedicated to revolutionizing the credit landscape by creating cutting-edge technology that empowers individuals to access credit and save money effectively. As one of the UK’s newest unicorns, we are rapidly establishing ourselves among the fastest-growing fintech companies.
With a dynamic team of over 700 professionals, we have been profitable since 2017 and are supported by esteemed investors including Balderton Capital and Goldman Sachs. Our customer satisfaction speaks volumes, boasting a remarkable rating of 4.9 from tens of thousands of reviews on Trustpilot.
Having successfully reconstructed the Big Three consumer finance products (loans, credit cards, and car finance), we are committed to providing our customers with quick access to funds. Our vision extends to expanding into major markets like the UK and US, targeting outdated banking systems and improving user experiences.
Why You Should Join Us
Ownership: Take charge from day one and influence Lendable’s trajectory.
Exceptional Teams: Collaborate with a small group of innovative individuals dedicated to solving complex challenges and improving existing solutions.
In-House Technology: Develop top-tier technology using machine learning and AI to optimize processes and enhance efficiency.
Role Overview
We are in search of a hands-on AI Engineer to bolster our Internal Automation team. Your role will focus on enhancing the efficiency of various internal teams, including Finance, Compliance, Product, and QA, through the development of AI-driven tools and automated workflows. Working within a compact team of four engineers and one project manager, you will play a vital role in streamlining operations, minimizing friction, and allowing colleagues to devote more time to high-priority tasks.

