Experience Level
Entry Level
Qualifications
The ideal candidate will possess a strong background in computer science or a related field, with a focus on artificial intelligence and optimization techniques. Experience with machine learning frameworks and proficiency in programming languages such as Python or C++ are essential. A passion for research and a problem-solving mindset will be key to your success.
About the job
Join Zyphra as a Research Engineer specializing in AI Performance and Kernel Optimization. In this role, you will work at the forefront of AI technologies, developing and optimizing kernel solutions that enhance the performance of our systems. You will collaborate with cross-functional teams, leveraging your expertise to drive innovation and efficiency.
About Zyphra
Zyphra is a leader in AI technology, dedicated to pushing the boundaries of what's possible. Our innovative solutions empower businesses to harness the power of artificial intelligence, driving efficiency and growth. We foster a collaborative and dynamic work environment where creativity and innovation thrive.
Similar jobs
Team Overview
The infrastructure team at OpenAI manages the core systems that support AI workloads worldwide. As OpenAI expands its compute capabilities across company-owned data centers, cloud environments, and strategic partnerships, the need for careful planning and resource management grows. Reliable and cost-effective compute operations depend on this foundation. The Compute Optimization group operates at the intersection of engineering, operations, finance, and infrastructure strategy. This team develops models, decision tools, and planning systems to improve how compute resources are scheduled, deployed, and scaled as global needs shift.

Role Overview
OpenAI is hiring a Compute Optimization Researcher/Engineer to help maximize the use of compute capacity across the organization. This role addresses complex optimization challenges related to capacity allocation, demand forecasting, cluster planning, workload placement, and infrastructure utilization. Work includes building mathematical models, developing software systems, and collaborating with other teams to improve planning and use of compute resources. Areas of focus span GPU clusters, networking, storage, and data center infrastructure. Candidates with experience in operations research, optimization, applied mathematics, infrastructure systems, or large-scale capacity planning will be well suited for this position.

Location and Work Model
This position is based in San Francisco, CA. OpenAI follows a hybrid schedule with three days per week in the office. Relocation assistance is offered.
Full-time|$166K/yr - $225K/yr|On-site|San Francisco, California
At Databricks, we are dedicated to empowering data teams to tackle the world's most challenging problems, from detecting security threats to advancing cancer drug development. We achieve this by offering the premier data and AI platform, allowing our customers to concentrate on their mission-critical challenges. The Mosaic AI organization assists companies in developing AI models and systems utilizing their own data, employing technologies that range from training large language models (LLMs) from the ground up to employing advanced retrieval methods for enhanced generation. We pride ourselves on pushing the boundaries of science and operationalizing our innovations. Mosaic AI believes that a company's AI models hold intrinsic value, akin to any other core intellectual property, and that superior AI models should be accessible to all.

Job Overview
As a research engineer on the Scaling team, you will stay abreast of the latest advancements in deep learning and pioneer new methodologies that surpass the current state of the art. You will collaborate with a diverse team of researchers and engineers, sharing insights and expertise. Most importantly, you will be passionate about our customers, striving to ensure their success in implementing cutting-edge LLMs and AI systems by translating our scientific knowledge into practical applications.

Your Impact
Enhance performance through innovative optimization techniques, including kernel fusion, mixed precision, memory layout optimization, tiling strategies, and tensorization tailored for training-specific patterns.
Design, implement, and optimize high-performance GPU kernels for training workloads, including attention mechanisms, custom layers, gradient computations, and activation functions, specifically for NVIDIA architectures.
Create and implement distributed training frameworks for large language models, incorporating parallelism strategies (data, tensor, pipeline, ZeRO-based) and optimized communication patterns for gradient synchronization and collective operations.
Profile, debug, and optimize comprehensive training workflows to pinpoint and resolve performance bottlenecks, utilizing memory optimization techniques such as activation checkpointing, gradient sharding, and mixed precision training.
Full-time|On-site|San Francisco (London/Europe - OK)
Tavus – Multimodal AI Model Optimization Research Engineer

At Tavus, we are pioneering the human aspect of AI technology. Our objective is to make human-AI interactions as seamless and natural as in-person conversations, allowing for a human touch in areas that were once considered unscalable. We accomplish this through groundbreaking research in multimodal AI, focusing on human-to-human communication modeling (encompassing language, audio, and video) and the development of audio-visual avatar behaviors. Our innovative models drive applications ranging from text-to-video AI avatars to real-time conversational video experiences across sectors such as healthcare, recruitment, sales, and education. By empowering AI to perceive, listen, and engage with an authentic human-like presence, we are laying the groundwork for the next generation of AI workers, assistants, and companions. As a Series B company, we are supported by renowned investors, including Sequoia, Y Combinator, and Scale VC. Join us as we shape the future of human-AI interaction.

The Role
We are seeking an accomplished Research Scientist/Engineer with expertise in model optimization to be a vital part of our core AI team. The ideal candidate thrives in dynamic startup environments, is adept at setting priorities independently, and is open to making calculated decisions. We are moving swiftly and need individuals who can help navigate our path forward.

Your Mission
Transform state-of-the-art research models into fast, efficient, and production-ready systems through techniques such as sparsification, distillation, and quantization.
Oversee the optimization lifecycle for critical models: establish metrics, conduct experiments, and evaluate trade-offs among latency, cost, and quality.
Collaborate closely with researchers and engineers to convert innovative concepts into deployable solutions.

Requirements
Extensive experience in deep learning with PyTorch.
Practical experience in model optimization and compression, including knowledge distillation, pruning/sparsification, quantization, and mixed precision.
Familiarity with efficient architectures such as low-rank adapters.
Strong grasp of inference performance and GPU/accelerator fundamentals.
Proficient in Python coding and adherence to best practices in research engineering.
Experience with large models and datasets in cloud environments.
Capability to read ML literature, reproduce results, and modify ideas accordingly.
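The quantization technique named in the requirements above can be illustrated with a minimal plain-Python sketch of symmetric int8 quantization. This is a toy model of the idea only, not Tavus's method or any library's API; the weight values and helper names are invented for the example.

```python
# Toy symmetric int8 quantization sketch (hypothetical, not a real library API).
def quantize(values, num_bits=8):
    qmax = 2 ** (num_bits - 1) - 1           # 127 for int8
    scale = max(abs(v) for v in values) / qmax
    if scale == 0:
        scale = 1.0                          # all-zero input: any scale works
    q = [round(v / scale) for v in values]   # integers in [-qmax, qmax]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

weights = [0.51, -1.27, 0.003, 0.89]         # made-up "weights"
q, scale = quantize(weights)
restored = dequantize(q, scale)
# Rounding error per value is at most scale / 2 (here about 0.005).
```

The storage win is the point: each float becomes a small integer plus one shared scale, at the cost of bounded rounding error.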
Join our innovative team at Zyphra as a Research Engineer specializing in Brain-Computer Interface (BCI) Models. In this pivotal role, you will contribute to groundbreaking research and development initiatives in the field of neuroscience and artificial intelligence. Your expertise will help shape the future of communication between humans and machines, enhancing the quality of life for countless individuals.

As a Research Engineer, you will be responsible for designing, implementing, and testing advanced BCI models, collaborating closely with a diverse team of scientists and engineers. Your work will play a crucial role in advancing our understanding of neural dynamics and their applications in technology.
Join DigitalOcean as a Senior Engineer focused on Inference Optimizations, where you will play a pivotal role in enhancing our AI and machine learning capabilities. Collaborate with a talented team to develop cutting-edge solutions that optimize inference processes across various applications.
Position Overview
At Bobyard, we tackle complex computer vision challenges to streamline takeoff processes for contractors, saving them extensive hours on each project. Our innovative systems deliver results that surpass human capabilities.

As a Computer Vision Research Engineer, you will enjoy significant autonomy in the design, execution, and refinement of groundbreaking products. In our fast-paced startup environment, you will become a foremost expert in the solutions you develop. We seek exceptional engineers who not only possess a visionary outlook for the product but can also focus deeply to produce outstanding code.
At Merge Labs, we are at the forefront of research, dedicated to uniting biological and artificial intelligence to enhance human capability, autonomy, and overall experience. Our innovative approach focuses on developing revolutionary brain-computer interfaces that offer high-bandwidth interaction with the brain, seamlessly integrate advanced AI, and are designed to be safe and accessible for everyone.

About the Team:
Our Bio team is responsible for designing, constructing, and characterizing the biotechnologies that underpin the next generation of brain-computer interfaces. By integrating molecular engineering, synthetic biology, neuroscience, and cutting-edge physical methods such as ultrasound, we aim to establish less invasive, high-bandwidth connections with neurons. The Bio team is dedicated to developing our core molecular technologies, validating their performance both in vitro and in vivo, and showcasing their advanced capabilities in animal models. We create custom experimental setups and pipelines while collaborating closely with engineers and data scientists to tackle some of the most challenging problems in biotechnology.

About the Role:
We are seeking a Senior/Principal Machine Learning Biophysicist to spearhead the creation of scalable molecular dynamics pipelines, integrating physics-based models with machine learning frameworks. You will build the molecular modeling foundations of the company from first principles, establishing tools and workflows for simulating, analyzing, and interpreting biomolecular dynamics to elucidate structure-function relationships. Over time, your contributions will help translate these frameworks into predictive models that expedite molecular engineering, guide experimental campaigns, and facilitate the discovery of highly functional molecules.

Key Responsibilities:
Develop the scientific and engineering framework for protein structure modeling and molecular dynamics, along with integrations into downstream ML frameworks.
Collaborate with wet-lab scientists to establish realistic optimization objectives and encode domain-specific priors and constraints.
Prototype modeling frameworks utilizing internal and public datasets; benchmark and validate performance.
Make complex analyses accessible to non-domain experts through democratization of first-principles analysis.
Lead the development of ML frameworks that explicitly incorporate first-principles priors.
Stay abreast of the latest advancements in deep learning and molecular dynamics.
Join Merge Labs, a pioneering research facility dedicated to merging biological and artificial intelligence to enhance human capabilities, agency, and experience. We aim to achieve this by crafting innovative brain-computer interfaces that communicate with the brain at high bandwidth, seamlessly integrate with cutting-edge AI, and prioritize safety and accessibility for all users.

About the Team:
At Merge Labs, we are on a mission to revolutionize brain-computer interfaces by leveraging advancements in synthetic biology, neuroscience, AI, and non-invasive imaging technologies. Our cross-functional data science team is situated at the convergence of computational modeling, neuroscience, and biomolecular engineering. This collaborative unit works closely with wet-lab scientists, automation specialists, and data engineers to develop machine learning frameworks that facilitate rapid molecule discovery and device enhancement.

About the Role:
We are seeking a talented Senior/Principal ML Scientist to architect and scale Bayesian optimization and reinforcement learning frameworks that guide molecular engineering initiatives through iterative design-build-test-learn (DBTL) cycles. You will start with a fresh approach to construct the company's closed-loop optimization infrastructure, establishing the data and modeling foundations that link experiments with these ML frameworks. Over time, you will transition prototypes into operational pipelines, significantly enhancing experimental throughput and discovery success across various biomolecular and neuroengineering sectors.

Key Responsibilities:
Develop the scientific and engineering framework for active learning and closed-loop optimization, encompassing data ingestion, ML modeling, and library design.
Collaborate with wet-lab scientists to establish feasible optimization objectives while incorporating domain-specific priors and constraints.
Create prototypes for representation learning and acquisition strategies utilizing both internal and public datasets; benchmark and validate the performance of models.
Integrate machine learning models with experimental data streams, making them accessible to non-domain experts for broader utilization.
Extend machine learning frameworks to accommodate multi-objective or constrained optimization challenges.
Stay abreast of the latest advancements in Bayesian optimization, active learning, and reinforcement learning, and prototype innovative algorithms to enhance the company's capabilities.
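The closed-loop DBTL idea described above can be sketched in a few lines of plain Python. Everything here is invented for illustration (the candidate designs, the noisy "measurement", and the crude confidence bonus); real systems would use a proper surrogate model and acquisition function rather than this toy rule.

```python
import random

random.seed(0)  # deterministic toy run

# Hypothetical stand-in for a wet-lab measurement of design parameter x;
# the true optimum is at x = 0.7, plus small measurement noise.
def noisy_yield(x):
    return -(x - 0.7) ** 2 + random.gauss(0, 0.01)

candidates = [i / 20 for i in range(21)]  # candidate designs 0.0 ... 1.0
observed = {}                             # design -> list of measurements

for _ in range(30):                       # 30 design-build-test-learn cycles
    def score(x):                         # crude upper-confidence estimate
        runs = observed.get(x, [])
        if not runs:
            return float("inf")           # measure every design at least once
        return sum(runs) / len(runs) + 0.1 / len(runs)
    x = max(candidates, key=score)        # "design": pick the most promising
    observed.setdefault(x, []).append(noisy_yield(x))  # "build" and "test"

# "learn": current best estimate of the optimal design
best = max(observed, key=lambda x: sum(observed[x]) / len(observed[x]))
```

The loop spends its first cycles covering every candidate once, then reallocates the remaining budget toward designs whose averages look best, which is the essential shape of an active-learning DBTL campaign.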
Team and Platform Focus
The Compute Infrastructure team at OpenAI designs, builds, and maintains the systems that support AI research at scale. This work brings together accelerators, CPUs, networking, storage, data centers, orchestration software, agent infrastructure, developer tools, and observability. The aim is to create a reliable, unified experience for researchers and product teams across the company. Projects span the full stack: capacity planning, cluster lifecycle management, bare-metal automation, and distributed systems. The team manages Kubernetes scheduling, system optimization, high-performance networking, storage, fleet health, reliability, workload profiling, benchmarking, and improvements to the developer experience. Even small improvements in communication, scheduling, hardware efficiency, or debugging can significantly accelerate research. OpenAI matches engineers to areas within Compute Infrastructure that align with their skills and interests.

Role Overview
This Software Engineer role centers on building and evolving the compute platform that supports OpenAI's research and products. Candidates may bring expertise in low-level systems, high-performance computing, distributed infrastructure, reliability, CaaS, agent infrastructure, developer platforms, tooling, or infrastructure user experience. The most important qualities are strong analytical skills, the ability to write resilient code, and a collaborative approach that helps colleagues move faster and with more confidence.

What You Will Work On
Working close to hardware or at the user interaction layer
Developing CaaS and agent infrastructure
Managing control and data planes that connect the system
Bringing new supercomputing capabilities online
Optimizing training workloads through profiler traces and benchmarks
Improving NCCL and collective communication
Analyzing GPUs, NICs, topology, firmware, thermal dynamics, and failure modes
Designing abstractions to unify diverse clusters into a single platform

Areas of Expertise
No one is expected to cover every area listed. Some engineers focus on system performance, kernel or runtime behavior, large-scale networking protocols, RDMA, NCCL, GPU hardware, benchmarking, scheduling, or hardware reliability. Others improve the platform's usability through APIs, tools, workflows, and developer experience. The team values strong engineering judgment and a drive to advance the field.
Full-time|$200K/yr - $250K/yr|On-site|San Francisco, California, United States
Join fuku as an Applied Research Engineer in San Francisco, CA, where you will be at the forefront of AI video data research. As a crucial member of our team, your mission will involve building robust, high-performance frameworks and extensive pipelines to process and decode video data with exceptional accuracy. You will tackle complex research challenges, refine machine learning models and APIs, and deliver comprehensive solutions across computer vision, audio, and text processing domains. This role is designed for engineers who thrive in both research and production environments and are eager to spearhead the evolution of video understanding from research to deployment.
WHO WE ARE
At Applied Compute, we specialize in creating Specific Intelligence for enterprises: agents that continually learn from a company's processes, data, expertise, and goals. Our mission is to develop a continual learning layer and platform that captures context, memory, and decision traces across organizations, fostering an environment where specialized agents perform real work effectively.

Why Join Us
We operate at a unique intersection of product development and advanced research. Our product team is building the platform for a new generation of digital coworkers, while our research team is pioneering advancements in post-training and reinforcement learning to enrich product experiences. Our applied research engineers collaborate closely with customers, deploying agents into production seamlessly. This blend of robust product focus, in-depth research, and real-world application is our approach to integrating AI into enterprises. We pride ourselves on being product-led, research-enabled, and forward-deployed.

Our Team
We are a diverse group of engineers, researchers, and operators, many of whom are former founders with experience in RL infrastructure at OpenAI, data foundations at Scale AI, and various systems across renowned firms like Two Sigma and Watershed. We collaborate with Fortune 50 clients and are proudly backed by reputable investors including Kleiner Perkins, Benchmark, Sequoia, Lux, and Greenoaks.

Who Thrives Here
We seek individuals passionate about applying innovative research and complex systems to solve real-world challenges. You should be adept at navigating new environments swiftly, whether it's a fresh codebase, a customer's data architecture, or an unfamiliar problem domain. Our team values collaboration with customers, emphasizing active listening and understanding their workflows. We find that former founders, individuals with extensive side projects, and those who demonstrate end-to-end ownership excel in our culture.

THE ROLE
In the role of Research Systems Engineer, you will train frontier-scale models and devise methodologies to implement continual learning in enterprise settings. Your responsibilities will include designing and executing large-scale experiments, investigating cutting-edge reinforcement learning techniques, and developing tools to gain insights into training processes. This position lies at the crossroads of research and systems engineering, where you will innovate algorithms alongside researchers and collaborate with infrastructure engineers to implement them on GPUs.
Join Our Pioneering Team
At Sieve, we are trailblazers in the realm of AI research, specifically dedicated to harnessing the power of video data. Our cutting-edge infrastructure processes exabyte-scale video, utilizing innovative video understanding methodologies, and integrating diverse data sources to create groundbreaking datasets that redefine video modeling. With video accounting for a staggering 80% of global internet traffic, it stands as the cornerstone of digital creativity, communication, gaming, AR/VR, and robotics. Our mission is to eliminate the primary barrier to the growth of these technologies: the scarcity of high-quality training data. Having collaborated with leading AI laboratories, we achieved $XXM in revenue last quarter alone with a compact team of just 15 talented individuals. Our successful Series A funding round last year, backed by prestigious firms such as Matrix Partners, Swift Ventures, Y Combinator, and AI Grant, underscores our potential for exponential growth.

The Role You'll Play
As an Applied Research Engineer at Sieve, you will be instrumental in constructing high-performance building blocks and expansive pipelines to achieve high-precision video comprehension at internet scale. Your role will often involve tackling ambiguous research challenges and devising ingenious solutions. You will engage with domains including computer vision, audio processing, and text processing. The ideal candidate will possess a strong command of models and APIs, leveraging innovative pre/post-processing techniques, parallelism, pipelining, inference optimization, and occasional fine-tuning to maximize performance.
About Alljoined
At Alljoined, we are pioneering solutions to eliminate communication barriers between humans and technology by non-invasively interpreting brain activity. Through advanced deep learning techniques applied to extensive EEG datasets gathered from cost-effective hardware, we decode various stimuli, including images, emotions, and even thoughts. Initially focusing on business applications, our long-term vision includes creating a consumer-facing interface to enhance productivity, entertainment, and AI applications. Our dedicated team of researchers is expanding, and we are committed to developing a high-performance and accessible interface that significantly enhances individual lives and contributes positively to societal well-being.
Eventual Computing builds tools that help AI teams work with large, complex datasets. Based in San Francisco, the company supports projects in robotics, autonomous vehicles, and advanced video generation. Its open-source engine, Daft, is already in use at organizations with demanding data needs. The team focuses on making data curation and model training more efficient, so the right datasets are always within reach. The office is located in the Mission district, where collaboration with leading AI labs and infrastructure companies is part of daily work.

Role overview
The Research Engineer - Multimodal Data will join the Visual Understanding team. This position centers on building solutions to make vast amounts of video and sensor data accessible and easy to query. The work directly supports researchers who need to find and use specific datasets quickly.

What you will do
Develop and refine systems that process petabytes of multimodal data, including video and sensor streams.
Apply vision-language models to improve how data is discovered and retrieved.
Define and influence the roadmap for visual understanding features.
Train models to streamline large-scale data annotation and improve efficiency for research teams.
About Us
At Applied Compute, we are pioneering Specific Intelligence for enterprises through advanced AI agents that learn continuously from organizational processes, data, and objectives. We recognize the significant gap between what AI models can achieve in isolation and their performance within actual business contexts, where they often fail to adapt to feedback. Our mission is to build a continual learning layer that captures context, memory, and decision traces across enterprises, creating environments where specialized agents excel at real tasks.

Why Join Us?
We operate at a unique intersection of product development and research. Our product team is developing the platform that empowers a new generation of digital coworkers, while our research team is advancing post-training and reinforcement learning to enhance product experiences. As applied research engineers, we work closely with customers to deploy models into production effectively. This blend of robust product focus, deep research, and customer engagement is our strategy for successfully integrating AI into enterprise operations. We are product-led, research-enabled, and strategically deployed.

Meet Our Team
Our team consists of engineers, researchers, and operators, many of whom are former founders. We have established RL infrastructure at OpenAI, developed data foundations at Scale AI, and built systems at Together, Two Sigma, and Watershed. We collaborate with Fortune 50 clients, including DoorDash, Mercor, and Cognition, and are backed by esteemed investors such as Benchmark, Sequoia, and Lux.

Who Excels Here
We seek individuals passionate about applying innovative research and complex systems to overcome real-world challenges. Candidates should thrive in unfamiliar environments, whether it involves navigating new codebases, understanding new customer data architectures, or tackling unfamiliar problem domains. A genuine enjoyment of customer interactions (listening, empathizing, and comprehending how work is accomplished within organizations) is essential. Those with prior entrepreneurial experience, extensive side projects, or a proven ability to manage initiatives from start to finish will thrive in our culture.

Your Role
As a Research Systems Engineer, you will be responsible for training cutting-edge models and developing methodologies that facilitate continual learning within enterprise settings. You will design and execute large-scale experiments, delve into advanced reinforcement learning techniques, and create tools that enhance our understanding of the training process. This role uniquely positions you at the crossroads of research and systems engineering, where you will innovate new algorithms in collaboration with researchers and work alongside infrastructure engineers to deploy them on GPUs.
On-site|New York City, NY; San Francisco, CA; Seattle, WA
About Anthropic
At Anthropic, we are dedicated to developing AI systems that are reliable, interpretable, and steerable. We aim to ensure that AI is safe, beneficial, and aligned with the needs of both our users and society. Our expanding team consists of passionate researchers, engineers, policy experts, and business leaders collaborating to create groundbreaking AI solutions.

About the Role
We are seeking a talented Research Engineer with a solid foundation in computer vision, who shares our belief that visual and spatial reasoning are essential for unleashing the full potential of large language models (LLMs). In this collaborative role, you will engage in research, development, and evaluation of cutting-edge Claude models, with a specific emphasis on enhancing visual and spatial capabilities. You will contribute across multiple facets of our research initiatives, employing a full-stack approach that encompasses pretraining, reinforcement learning (RL), and runtime techniques such as agentic harnesses. Additionally, you will work closely with our product team to ensure that your vision enhancements positively influence Claude's performance in real-world applications.
Company Overview
Echo Neurotechnologies is a pioneering startup in the Brain-Computer Interface (BCI) sector, dedicated to revolutionizing the lives of individuals with disabilities through advanced hardware engineering and artificial intelligence solutions. Our vision is to develop innovative technologies that empower users, restoring autonomy and enhancing their quality of life.

Team Culture
We pride ourselves on cultivating an inclusive and dynamic team of skilled professionals who are passionate about their work. Our startup environment encourages ownership of impactful decisions and fosters continuous learning and collaboration, where every contribution is essential to our collective success.

Job Summary
We are on the lookout for a talented Machine Learning Research Engineer specializing in speech modeling to join our innovative team. The successful candidate will leverage ML/AI methodologies to create and refine adaptable speech models aimed at brain-computer interface applications, ultimately making a difference in the lives of patients facing severe disabilities. Candidates should possess significant expertise in speech modeling, feature engineering, time-series analysis, and the development of custom ML models.

Key Responsibilities
Design and evaluate diverse model architectures and strategies to enhance the accuracy and resilience of models for interpreting speech from brain activity.
Investigate and implement cutting-edge speech features and representations within neural-decoding frameworks, informed by speech science and functional neurophysiology.
Create pipelines for generating personalized and naturalistic speech from both text and brain activity inputs.
Develop algorithms to analyze both intact and compromised speech signals, identifying biomarkers linked to various diseases and disabilities.
Collaborate within a tight-knit team to build models, define R&D workflows, and translate scientific discoveries into practical applications.
Contribute to best practices ensuring reliability, observability, reproducibility, and scientific rigor across the R&D landscape.
Maintain well-documented, versioned code, analysis pipelines, and results for maximum interpretability and reproducibility.
About Our Team
Join our innovative Robotics team at OpenAI, where we are dedicated to pioneering general-purpose robotics and advancing towards AGI-level intelligence within dynamic, real-world environments. Our collaborative efforts span the entire model stack as we integrate state-of-the-art hardware and software to explore a diverse array of robotic form factors. We aim to harmonize high-level AI capabilities with the constraints of physical systems to enhance the quality of life for people worldwide.

About the Role
As a Research Engineer specializing in SLAM and Multi-View Geometry, you will be instrumental in developing systems that empower robots to perceive, track, and reconstruct their environment in 3D using multi-camera and multimodal sensor data. Your work will focus on creating real-time and offline SLAM pipelines for teleoperation and data collection while also building scalable systems for 3D structure reconstruction from extensive datasets. We seek individuals who possess a strong foundation in computer vision and hands-on experience in constructing robust perception systems. The ideal candidate is adept in both classical geometry-based techniques and contemporary machine learning methods, thriving in close collaboration with AI researchers and engineers. This position is based in San Francisco, CA, utilizing a hybrid work model of 4 days in the office each week, and we provide relocation assistance for new hires.

Key Responsibilities:
Develop and implement online SLAM systems for robotic data collection utilizing multi-camera sensor arrays and teleoperation platforms.
Create systems for large-scale 3D reconstruction and point tracking across extensive datasets, facilitating innovative approaches to world modeling and perception.
Collaborate with research and engineering teams to enhance multi-view geometry pipelines for large datasets.
Elevate the accuracy, robustness, and scalability of perception systems utilized in robotics data collection and training pipelines.
Engage in cross-disciplinary collaboration with robotics, perception, and ML teams to integrate geometry-based methods with learned models.
About Our Team
At OpenAI, our Foundations team is dedicated to examining how model behavior evolves as we scale up models, data, and computing resources. We meticulously analyze the relationships between model architecture, optimization strategies, and training datasets to inform the design and training of next-generation models.

About the Position
As a Team Lead in Research Inference, you will be instrumental in constructing systems that empower advanced AI models to operate efficiently at scale. Your role lies at the crossroads of model research and systems engineering, where you will translate innovative architectural concepts into high-performance inference systems, clearly illustrating the trade-offs in performance, memory usage, and scalability. Your contributions will significantly shape model design, evaluation, and iteration processes across our research organization. By developing and refining high-performance inference infrastructures, you will provide researchers with the tools necessary to explore new ideas while understanding their computational and systems implications. This position does not involve serving products; instead, it supports research through a focus on performance, accuracy, and realism, ensuring that our AI research is firmly rooted in scalable solutions.

Responsibilities
Design and develop optimized inference runtimes for large-scale AI models, emphasizing efficiency, reliability, and scalability.
Take ownership of optimizing core execution processes, including model execution, memory management, batching, and scheduling.
Enhance and expand distributed inference across multiple GPUs, focusing on parallelism, communication patterns, and runtime coordination.
Implement and refine critical inference operators and kernels based on real-world workloads.
Collaborate closely with research teams to ensure accurate and efficient support for new model architectures within inference systems.
Identify and resolve performance bottlenecks through comprehensive profiling, benchmarking, and low-level debugging.
Contribute to the observability, correctness, and reliability of large-scale AI systems.

Ideal Candidate Profile
Experience in developing production-level inference systems, beyond just training and executing models.
Proficient in GPU-centric performance engineering, including managing memory behavior and understanding latency/throughput trade-offs.
Strong analytical skills and familiarity with performance profiling tools.
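The batching responsibility mentioned in inference listings like the one above comes down to amortizing fixed per-launch overhead across requests. A toy cost model makes the arithmetic concrete; the constants and function names here are invented for illustration, not measurements from any real system.

```python
# Toy cost model for request batching: each batch pays a fixed "launch"
# overhead plus a per-request cost. All numbers are invented for illustration.
FIXED_OVERHEAD_MS = 5.0   # hypothetical per-launch cost (scheduling, kernel launch)
PER_REQUEST_MS = 1.0      # hypothetical marginal compute per request

def serve(requests, batch_size):
    total_ms = 0.0
    for i in range(0, len(requests), batch_size):
        batch = requests[i:i + batch_size]
        total_ms += FIXED_OVERHEAD_MS + PER_REQUEST_MS * len(batch)
    return total_ms

reqs = list(range(32))
unbatched = serve(reqs, 1)   # 32 launches, each paying the full overhead
batched = serve(reqs, 8)     # 4 launches amortize the fixed overhead
```

Under this model, serving 32 requests one at a time costs 32 × (5 + 1) = 192 ms, while batches of 8 cost 4 × (5 + 8) = 52 ms; the trade-off in real systems is that larger batches also increase per-request latency, which is why batching and scheduling are tuned together.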
Mar 19, 2026