Inference Engineer at Cartesia | San Francisco, CA

Cartesia HQ - San Francisco, CA
On-site Full-time


Qualifications

Candidates should have strong engineering skills, a commitment to clean code, experience building distributed systems, technical leadership ability, and familiarity with machine learning inference pipelines.

About the job

Join Cartesia as an Inference Engineer

At Cartesia, our vision is to create the next evolution of AI: an interactive, omnipresent intelligence that operates seamlessly across all environments. Currently, even the most advanced models struggle to continuously analyze a year's worth of audio, video, and text data—comprising 1 billion text tokens, 10 billion audio tokens, and 1 trillion video tokens—much less perform these tasks on-device.

We are at the forefront of developing the model architectures that will make this a reality. Our founding team, who met as PhD candidates at the Stanford AI Lab, pioneered State Space Models (SSMs), a groundbreaking framework for training efficient, large-scale foundation models. Our talented team merges deep expertise in model innovation and systems engineering with a design-focused product engineering approach, enabling us to build and launch state-of-the-art models and user experiences.

We are backed by leading investors including Index Ventures and Lightspeed Venture Partners, with participation from Factory, Conviction, A Star, General Catalyst, SV Angel, Databricks, and others. We are also fortunate to be guided by many exceptional advisors and over 90 angel investors from diverse industries, including some of the world's foremost experts in AI.

About the Role

We are actively seeking an Inference Engineer to propel our mission of creating real-time multimodal intelligence.

Your Impact

  • Design and implement a low-latency, scalable, and reliable inference and serving stack for our foundation models, spanning Transformers, SSMs, and hybrid architectures.

  • Collaborate closely with our research team and product engineers to efficiently deliver our product suite in a fast, cost-effective, and reliable manner.

  • Construct robust inference infrastructure and monitoring systems for our product offerings.

  • Enjoy substantial autonomy in shaping our products and directly influencing how cutting-edge AI is integrated across diverse devices and applications.

What You Bring

At Cartesia, we prioritize strong engineering skills due to the complexity and scale of the challenges we tackle.

  • Strong engineering skills, comfort navigating complex codebases, and a commitment to producing clean, maintainable code.

  • Experience in developing large-scale distributed systems with strict performance, reliability, and observability requirements.

  • Proven technical leadership, capable of executing and delivering results from zero to one amidst uncertainty.

  • A background in or experience with inference pipelines, machine learning, and generative models.

About Cartesia

Cartesia is committed to pioneering the future of AI through innovative model architectures, guided by a team of experts with a strong foundation in academic research and practical application.
