Are You a Great Fit?
You possess a genuine enthusiasm for customer engagement and a knack for addressing complex, often ambiguous real-world issues. You have experience as a machine learning engineer, preferably with a focus on AI products for external clients. Your background includes being a technical founder or holding a similar role. You have hands-on experience with transformer models and large language models (LLMs). You excel in communication and thrive in cross-functional team settings. You have a proven track record of owning projects from start to finish, and you're quick to learn and adapt to new challenges. Your collaborative and kind working style enhances team dynamics.
About the job
Join Reka as a Member of the Technical Staff in Applied AI!
Leverage cutting-edge AI models to tackle intricate real-world challenges.
Engage in close collaboration with researchers and fellow team members to explore the latest developments in AI and ML.
Partner with our customers to seamlessly integrate our innovative models into their existing technology frameworks.
Drive business success with a strong sense of product ownership and accountability.
Be part of a pioneering team in a rapidly growing environment, taking on diverse roles.
About Reka
Reka is dedicated to developing impactful multimodal artificial intelligence that empowers organizations and businesses. As a globally distributed startup focused on foundational AI models, we are headquartered in the vibrant San Francisco Bay Area, California. Our remote-first approach allows us to gather top talent from across the world, with our founding team having contributed significantly to recent AI breakthroughs.
Similar jobs
At Gimlet Labs, we are pioneering the first heterogeneous neocloud designed specifically for AI workloads. As demand for AI systems surges, traditional homogeneous infrastructures face critical limits in power, capacity, and cost. Our platform decouples AI workloads from their hardware foundations, intelligently partitioning tasks and orchestrating them to the most suitable hardware for optimal performance and efficiency. This strategy enables heterogeneous systems that span multiple vendors and generations, including cutting-edge accelerators, delivering significant gains in performance and cost-effectiveness at scale.

In addition to this foundational work, Gimlet is building a robust neocloud for agentic workloads. Our clients deploy and manage their workloads via stable, production-ready APIs, without needing to navigate the intricacies of hardware selection or performance optimization.

We collaborate with foundation labs, hyperscalers, and AI-native companies to drive real production workloads capable of scaling to gigawatt-class AI datacenters.

We are currently seeking a Member of Technical Staff specializing in ML systems and inference. In this role, you will design and build inference systems that run complete models in real production environments, operating at the intersection of model architecture and system performance to ensure that inference is fast, predictable, and scalable.

This position is ideal for engineers with a deep understanding of modern model execution and a passion for optimizing latency, throughput, and memory utilization across the entire inference lifecycle.
At Magic, we are driven by our mission to develop safe Artificial General Intelligence (AGI) that propels humanity forward in addressing the most critical challenges. We believe the future of safe AGI lies in automating research and code generation, allowing us to improve models and tackle alignment issues more effectively than humans alone can. Our approach combines cutting-edge pre-training, domain-specific reinforcement learning (RL), ultra-long context, and efficient inference-time computation to realize this vision.

Position Overview
As a Software Engineer within the Inference & RL Systems team, you will play a pivotal role in designing and managing the distributed systems that enable our models to function seamlessly in production, supporting extensive post-training workflows. This position operates at the intersection of model execution and distributed infrastructure, focusing on systems that influence inference latency, throughput, stability, and the reliability of RL and post-training loops. Our long-context models impose significant execution demands, including KV-cache scaling, memory constraints for lengthy sequences, batching strategies, long-horizon trajectory rollouts, and consistent throughput under real-world workloads. You will be responsible for the infrastructure that ensures both production inference and large-scale RL iterations are efficient and dependable.

Key Responsibilities
- Craft and scale high-performance inference serving systems.
- Optimize KV-cache management, batching methods, and scheduling processes.
- Enhance throughput and latency for long-context tasks.
- Develop and sustain distributed RL and post-training infrastructure.
- Boost reliability across rollout, evaluation, and reward pipelines.
- Automate fault detection and recovery mechanisms for serving and RL systems.
- Analyze and eliminate performance bottlenecks across GPU, networking, and storage components.
- Collaborate with Kernel and Research teams to ensure alignment between execution systems and model architecture.

Qualifications
- Solid foundation in software engineering and distributed systems.
- Proven experience building or managing large-scale inference or training systems.
- In-depth understanding of GPU execution constraints and memory trade-offs.
- Experience troubleshooting performance issues in production machine learning systems.
- Ability to analyze system-level trade-offs between latency, throughput, and cost.
As a Technical Staff Member specializing in Machine Learning, you will:
- Engage in the complete development lifecycle of innovative large-scale deep learning models.
- Curate datasets, architect solutions, implement algorithms, and train and assess models to enhance our offerings.
- Work collaboratively with engineers and researchers to convert groundbreaking research into real-world applications.
- Join us at a pivotal time, take on diverse roles, and contribute to building transformative products from the ground up!
Full-time|$200K/yr - $400K/yr|Remote|San Francisco
At Inferact, we are on a mission to establish vLLM as the premier AI inference engine, revolutionizing AI progress by making inference both more accessible and efficient. Our founding team consists of the original creators and key maintainers of vLLM, positioning us uniquely at the nexus of cutting-edge models and advanced hardware.

Role Overview
We are seeking a passionate inference runtime engineer eager to explore and expand the frontiers of LLM and diffusion model serving. As models evolve and grow in complexity with new architectures like mixture-of-experts and multimodal designs, the demand for innovative solutions in our inference engine intensifies. This role places you at the heart of vLLM, where you will enhance model execution across a variety of hardware platforms and architectures. Your contributions will have a direct influence on the future of AI inference.
Join fal as we revolutionize the generative-media infrastructure landscape. Our mission is to enhance model inference performance, enabling creative experiences on an unprecedented scale. We are seeking a Staff Technical Lead for Inference & ML Performance, an individual who possesses a unique blend of deep technical knowledge and strategic foresight. In this pivotal role, you will lead a talented team dedicated to building and optimizing cutting-edge inference systems. If you're ready to influence the future of inference performance in a fast-paced and rapidly growing environment, we want to hear from you.

Why This Role Matters
In this role, you will play a crucial part in shaping the future of fal's inference engine, ensuring that our generative models consistently deliver outstanding performance. Your contributions will directly affect our capacity to swiftly provide innovative creative solutions to a diverse clientele, from individual creators to global brands.

Your Responsibilities
Define and steer the technical direction, guiding your team across various domains including kernels, applied performance, ML compilers, and distributed inference to develop high-performance solutions.
About Liquid AI
Born from the innovation of MIT CSAIL, Liquid AI is at the forefront of developing general-purpose AI systems that operate seamlessly across various deployment platforms, including data center accelerators and on-device hardware. Our solutions prioritize low latency, minimal memory consumption, privacy, and reliability. We collaborate with leading enterprises in sectors such as consumer electronics, automotive, life sciences, and financial services. As we experience rapid growth, we seek extraordinary talent to join our mission.

The Opportunity
Join our Edge Inference team, where we transform Liquid Foundation Models into highly optimized machine code for resource-limited devices such as smartphones, laptops, Raspberry Pis, and smartwatches. As key contributors to llama.cpp, we establish the infrastructure necessary for efficient on-device AI. You will collaborate closely with our technical lead to tackle complex challenges that demand a profound understanding of machine learning architectures and hardware constraints. This role offers high ownership, allowing your code to be deployed in production environments and directly influence model performance on real devices.

While San Francisco and Boston are preferred, we welcome applicants from other locations.
Join the team at Mirendil as a Member of Technical Staff specializing in Machine Learning Systems. In this role, you will leverage your expertise to develop innovative solutions that enhance our ML frameworks and contribute to groundbreaking projects in the AI space. Collaborate with top talent in a dynamic environment that promotes creativity and technical excellence.
Cohere builds and deploys advanced AI models used by developers and enterprises. These models support applications like content generation, semantic search, retrieval-augmented generation (RAG), and intelligent agents. The team’s work aims to make AI more accessible and practical for real-world use. Each person at Cohere plays a direct role in strengthening the models and increasing their value for clients. The company values practical outcomes and continuous improvement, focusing on delivering reliable technology to users. The team includes researchers, engineers, designers, and professionals from a wide range of backgrounds. Cohere believes that diverse perspectives help create better products. The company welcomes those interested in shaping the future of AI to join its mission.
Join our dynamic team at Liquid AI as a Member of Technical Staff where you will leverage your expertise in applied machine learning and recommendation systems to drive innovative solutions. You will collaborate with a talented group of professionals in a fast-paced environment, contributing to the development of advanced algorithms that enhance user experience and operational efficiency.
Shape the Future with AI Innovation
At asari.ai, we are on a mission to empower individuals to create intricate systems and tackle the world's most challenging problems with the aid of advanced AI agents that are both scalable and reliable.

Our team boasts a history of publishing award-winning AI research and is supported by elite investors such as Eric Schmidt, Caltech, Jeff Dean, and JP Millon.

We thrive in a fast-paced environment, leveraging first-principles thinking and purposeful building. We believe that exceptional outcomes arise when individuals take ownership, grow collectively, and share both the hurdles and victories.

Your Role
- Design, train, and assess hybrid AI systems that excel at scale while making optimal trade-offs.
- Develop scalable data processing and machine learning frameworks.
- Address min-max challenges: maximizing output while minimizing resources.
- Enhance our productivity by eliminating operational and tooling constraints.

Ideal Candidate Profile
- You find motivation in resolving complex real-world challenges.
- You possess a proven background in a technical field such as machine learning, computer science, physics, or mathematics.
- You demonstrate strong programming skills (in Python, C++) and mathematical proficiency.
- You exhibit robust conceptual and structured thinking abilities.
- You communicate clearly and effectively, both verbally and in writing.
- You are eager and capable of rapid learning.
- You embody a team-oriented spirit.
- You can independently organize, plan, prioritize, and execute tasks.
- You are driven by excellence, ownership, and a proactive mindset.

Preferred Qualifications
- Experience with open-source projects, published research, or demonstrated expertise in machine learning.
- Familiarity with applying deep learning, reinforcement learning, unsupervised learning, and related techniques to large-scale problems.
- Experience with distributed computing and managing large datasets.

Compensation and Benefits
- Attractive salary
- Equity options
- Comprehensive health insurance (100% covered), dental (90% covered), and vision (90% covered); dependent coverage is at 50%.
Join Our Team as a Member of Technical Staff
At Eragon, we are at the forefront of innovation, seeking a dedicated Member of Technical Staff to spearhead the development and deployment of state-of-the-art AI systems. This role invites you to collaborate across modeling, systems, and product teams to transform innovative concepts into impactful real-world applications.

Your Key Responsibilities
- System Development & Deployment: Design, integrate, and launch AI-driven systems into production environments.
- Model Development: Refine, assess, and implement machine learning models in practical scenarios.
- Systems Engineering: Architect scalable pipelines for training, inference, and data processing workflows.
- Performance Optimization: Enhance system latency, throughput, cost-effectiveness, and reliability.
- Data & Infrastructure: Manage large-scale datasets and ensure seamless integration with internal tools and APIs.
- Cross-Functional Collaboration: Work with product, research, and design teams to deliver complete features.
- Evaluation & Monitoring: Establish evaluation frameworks, monitoring tools, and feedback mechanisms.

What You Bring to the Table
- Education: A Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- Technical Skills: Proficiency in Python and familiarity with modern engineering or machine learning frameworks.
- Production Experience: Proven experience building and deploying systems in production settings.
- Systems Knowledge: Knowledge of data pipelines, APIs, and cloud infrastructures (AWS, GCP).
- Practical ML Experience: Hands-on experience with machine learning models or data-centric systems.

Preferred Qualifications
- Experience deploying or scaling machine learning systems in production environments.
- Familiarity with large language models, agents, or workflow automation systems.
- Experience in distributed systems or large-scale infrastructure.
- A background in fast-paced or early-stage environments.
Full-time|On-site|San Francisco / Tel Aviv / Zurich
Tzafon is at the forefront of machine intelligence, operating as a cutting-edge foundation model lab dedicated to building scalable computing systems. With offices in San Francisco, Zurich, and Tel Aviv, we have secured over $12 million in funding to propel our mission of expanding the boundaries of machine intelligence.

Our talented team comprises engineers and scientists with extensive expertise in ML infrastructure and research, founded by distinguished IOI and IMO medalists, PhD holders, and alumni from top tech firms such as Google DeepMind, Character, and NVIDIA. We specialize in training models and constructing infrastructure for swarms of agents to automate tasks across real-world environments.

In this role, you'll collaborate between our product and post-training teams to deploy Large Action Models that deliver results. Your responsibilities will include building evaluations, benchmarks, and fine-tuning pipelines, as well as defining optimal model behavior and achieving it at scale.
Full-time|$190.9K/yr - $232.8K/yr|On-site|San Francisco, California
P-1285

About This Role
Join Databricks as a Staff Software Engineer specializing in GenAI inference, where you will spearhead the architecture, development, and optimization of the inference engine that powers the Databricks Foundation Model API. Your role will be crucial in bridging cutting-edge research with real-world production requirements, ensuring exceptional throughput, minimal latency, and scalable solutions. You will work across the entire GenAI inference stack, including kernels, runtimes, orchestration, memory management, and integration with various frameworks and orchestration systems.

What You Will Do
- Take full ownership of the architecture, design, and implementation of the inference engine, collaborating on a model-serving stack optimized for large-scale LLM inference.
- Work closely with researchers to integrate new model architectures or features, such as sparsity, activation compression, and mixture-of-experts, into the engine.
- Lead comprehensive optimization efforts focused on latency, throughput, memory efficiency, and hardware utilization across GPUs and other accelerators.
- Build and maintain instrumentation, profiling, and tracing tools to identify performance bottlenecks and drive optimizations.
- Design scalable solutions for routing, batching, scheduling, memory management, and dynamic loading tailored to inference workloads.
- Guarantee reliability, reproducibility, and fault tolerance in inference pipelines, including capabilities for A/B testing, rollbacks, and model versioning.
- Integrate with federated and distributed inference infrastructure, ensuring effective orchestration across nodes, load balancing, and minimal communication overhead.
- Collaborate with cross-functional teams, including platform engineers, cloud infrastructure, and security/compliance professionals.
- Represent the team externally through benchmarks, whitepapers, and contributions to open-source projects.

What We Look For
- A BS/MS/PhD in Computer Science or a related discipline.
- A solid software engineering background with 6+ years of experience in performance-critical systems.
- A proven ability to own complex system components and influence architectural decisions from conception to execution.
- A deep understanding of ML inference internals, including attention mechanisms, MLPs, recurrent modules, quantization, and sparse operations.
- Hands-on experience with CUDA, GPU programming, and essential libraries (cuBLAS, cuDNN, NCCL, etc.).
- A strong foundation in distributed systems design, including RPC frameworks, queuing, batching, sharding, and memory partitioning.
- Demonstrated proficiency in diagnosing and resolving performance bottlenecks across multiple layers (kernel, memory, networking, scheduler).
The Opportunity
Join us at ComfyOrg as a Senior/Staff Applied Machine Learning Engineer! We are on the hunt for a passionate innovator who is enthusiastic about optimizing model inference. You will play a pivotal role in developing the heart of ComfyUI, our cutting-edge visual AI platform. Your expertise will help us push the limits of AI model performance, making models run faster and more efficiently than ever before.

Are You a Match?
- You are fascinated by model inference, memory management, and torch optimizations.
- You have experience writing production-level PyTorch code that challenges performance standards.
- You have a passion for understanding the inner workings of AI models.
- You thrive on developing highly optimized code that consistently delivers results.
- You believe that the current landscape of ML deployment holds significant room for improvement.

Your Responsibilities
- Develop and enhance the core inference engine that drives ComfyUI.
- Optimize large models for speed and memory efficiency.
- Collaborate with our core team to architect new features.
- Tackle complex technical challenges within the visual AI domain.
- Contribute to the future direction of our technology.

Experience with diffusion or LLM models, as well as creating custom nodes for ComfyUI, is highly beneficial.
TierZero builds tools that help engineering teams deliver and manage code efficiently. The platform enables quicker incident response, clearer operational visibility, and shared knowledge among engineers. Backed by $7 million from investors like Accel and SV Angel, TierZero supports clients such as Discord, Drata, and Framer as they strengthen infrastructure for AI-driven work.

This in-person role is based at TierZero's San Francisco headquarters, with a hybrid schedule requiring three days onsite each week. As a founding member of the technical staff, you will work directly with the CEO, CTO, and customers to influence the direction of TierZero's core products and systems. The position calls for flexibility as priorities shift and close collaboration across the company.

What you will do
- Design and develop AI systems that handle large volumes of unstructured data.
- Build full-stack product features, informed by direct feedback from users.
- Enhance the product so agents are intelligent, reliable, and easy for engineers to use.
- Create systems to automatically evaluate outputs from large language models and improve agentic reasoning through self-play and feedback.
- Construct machine learning pipelines, including data ingestion, feature creation, embedding stores, retrieval-augmented generation (RAG) pipelines, vector search, and graph databases.
- Experiment with open-source and emerging large language models to compare different approaches.
- Develop scalable infrastructure for long-running, multi-step agents, including memory, state management, and asynchronous workflows.

Requirements
- Interest in working with large language models, managed cloud platforms, cloud infrastructure, and observability tools.
- At least 5 years of professional experience or significant open-source contributions.
- Comfort with shifting priorities and tackling new technical problems.
- Strong product focus and commitment to customer outcomes.
- Openness to learning from a team with a track record of delivering over $10 billion in value.
- Ability to work onsite in San Francisco three days per week.
- Bonus: Experience in a startup setting and familiarity with startup dynamics.
Who are we?
At Cohere, our mission is to elevate intelligence to benefit humanity. We specialize in training and deploying cutting-edge models for developers and enterprises focused on creating AI systems that deliver extraordinary experiences such as content generation, semantic search, retrieval-augmented generation, and intelligent agents. We view our work as pivotal to the broad acceptance of AI technologies.

We are passionate about our creations. Every team member plays a vital role in enhancing our models' capabilities and the value they provide to our customers. We thrive on hard work and speed, always prioritizing our clients' needs.

Cohere is a diverse team of researchers, engineers, designers, and more, all dedicated to their craft. Each individual is a leading expert in their field, and we recognize that a variety of perspectives is essential to developing exceptional products.

Join us in our mission and help shape the future of AI!

Why this role?
Are you excited about architecting high-performance, scalable, and reliable machine learning systems? Do you aspire to shape and construct the next generation of AI platforms that enhance advanced NLP applications? We are seeking talented Members of Technical Staff to join our Model Serving team at Cohere. This team is responsible for the development, deployment, and operation of our AI platform, which delivers Cohere's large language models via user-friendly API endpoints. In this role, you will collaborate with multiple teams to deploy optimized NLP models in production settings characterized by low latency, high throughput, and robust availability. Additionally, you will have the opportunity to work directly with customers to create tailored deployments that fulfill their unique requirements.
At Tzafon, we are pioneering the development of scalable computing systems and pushing the boundaries of machine intelligence with our foundation model lab. Located in San Francisco, Zurich, and Tel Aviv, we have secured over $12 million in funding to fuel our mission of expanding the horizons of AI technology.

Our dynamic team comprises engineers and scientists with extensive expertise in machine learning infrastructure and research. Founded by IOI and IMO medalists, PhDs, and seasoned professionals from top tech firms, we specialize in training advanced models and constructing robust infrastructures to automate tasks across various real-world scenarios.

In this role, you will collaborate closely with our product and post-training teams to deploy Large Action Models that drive impactful results. Your responsibilities will include building evaluation frameworks, establishing benchmarks, and creating fine-tuning pipelines to ensure optimal model performance.
Our Vision
At ReflectionAI, we strive to create open superintelligence and ensure its accessibility for everyone. Our team is dedicated to developing open weight models for individuals, organizations, and even nations. Our collective expertise comes from leading AI institutions such as DeepMind, OpenAI, Google Brain, Meta, Character.AI, Anthropic, and more.

Role Overview
- Conduct research and develop solutions focusing on algorithms, scaling laws, data processing, optimizers, and model architecture.
- Design and execute scientific experiments to enhance our understanding of scaling large language models and improving data efficiency.
- Apply cutting-edge techniques from the deep learning literature to our projects.
- Independently lead small research initiatives while collaborating on larger projects.
- Enhance our training infrastructure for optimal scaling efficiency.
- Contribute across the entire technology stack, from low-level optimizations to high-level model design.

Your Profile
- Possess a graduate degree (MS or PhD) in Computer Science, Machine Learning, or a related field.
- Demonstrate strong software engineering skills with experience in large-scale systems development.
- Have prior experience with large-scale ETL processes and preparing training data.
- Possess a deep understanding of large-scale machine learning, specifically regarding language models, distributed training, and scaling.
- Be proficient in Python and familiar with deep learning frameworks, preferably PyTorch.
- Effectively navigate the trade-offs between research goals and practical engineering challenges.
- Excel in a fast-paced, high-agency startup culture with a proactive approach.
- Exhibit strong communication skills and a collaborative mindset.
- Show a passion for pushing the boundaries of intelligence.

What We Provide
We believe that building truly open superintelligence starts with a solid foundation. Joining ReflectionAI means being part of a tightly-knit, highly talented team, where you will help shape our future and redefine the landscape of open foundational models.
Our Mission
At Reflection AI, our goal is to develop open superintelligence and make it universally accessible. We are pioneering open weight models tailored for individuals, agents, enterprises, and even entire nations. Our diverse team comprises talented AI researchers and industry veterans from prestigious organizations such as DeepMind, OpenAI, Google Brain, Meta, Character.AI, Anthropic, and many more.

Role Overview
- Construct and enhance distributed training systems that drive the pre-training of cutting-edge models.
- Collaborate with research teams to design and execute extensive training runs for foundational models.
- Create infrastructure that facilitates efficient training across thousands of GPUs leveraging contemporary distributed training frameworks.
- Enhance training throughput, stability, and efficiency for extensive model training tasks.
- Work closely with pre-training researchers to convert experimental concepts into scalable, production-ready training systems.
- Boost performance of distributed training tasks through optimization of communication, memory management, and GPU utilization.
- Develop and maintain training pipelines that accommodate large-scale datasets, checkpointing, and iterative experiments.
- Identify and resolve performance bottlenecks within distributed training systems, including model parallelism, GPU communication, and training runtime environments.
- Contribute to the creation of systems that promote swift experimentation and iteration on novel training methods.
Mar 24, 2026