Technical Staff Member Pre Training At Reflectionai San Francisco jobs in San Francisco – Browse 10,479 openings on RoboApply Jobs

Open roles matching “Technical Staff Member Pre Training At Reflectionai San Francisco” with location signals for San Francisco. 10,479 active listings on RoboApply Jobs.

ReflectionAI
Full-time|On-site|SF

Our Vision
At ReflectionAI, we strive to create open superintelligence and ensure its accessibility for everyone. Our team is dedicated to developing open weight models for individuals, organizations, and even nations. Our collective expertise comes from leading AI institutions such as DeepMind, OpenAI, Google Brain, Meta, Character.AI, Anthropic, and more.

Role Overview
- Conduct research and develop solutions focusing on algorithms, scaling laws, data processing, optimizers, and model architecture.
- Design and execute scientific experiments to enhance our understanding of scaling large language models and improving data efficiency.
- Apply cutting-edge techniques from the deep learning literature to our projects.
- Independently lead small research initiatives while collaborating on larger projects.
- Enhance our training infrastructure for optimal scaling efficiency.
- Contribute across the entire technology stack, from low-level optimizations to high-level model design.

Your Profile
- Possess a graduate degree (MS or PhD) in Computer Science, Machine Learning, or a related field.
- Demonstrate strong software engineering skills with experience in large-scale systems development.
- Have prior experience with large-scale ETL processes and preparing training data.
- Possess a deep understanding of large-scale machine learning, specifically regarding language models, distributed training, and scaling.
- Be proficient in Python and familiar with deep learning frameworks, preferably PyTorch.
- Effectively navigate the trade-offs between research goals and practical engineering challenges.
- Excel in a fast-paced, high-agency startup culture with a proactive approach.
- Exhibit strong communication skills and a collaborative mindset.
- Show a passion for pushing the boundaries of intelligence.

What We Provide:
We believe that building truly open superintelligence starts with a solid foundation. Joining ReflectionAI means being part of a tightly-knit, highly talented team, where you will help shape our future and redefine the landscape of open foundational models.

Oct 7, 2025
Reflection AI
Full-time|On-site|San Francisco

Our Mission
At Reflection AI, our goal is to develop open superintelligence and make it universally accessible. We are pioneering open weight models tailored for individuals, agents, enterprises, and even entire nations. Our diverse team comprises talented AI researchers and industry veterans from prestigious organizations such as DeepMind, OpenAI, Google Brain, Meta, Character.AI, Anthropic, and many more.

Role Overview
- Construct and enhance distributed training systems that drive the pre-training of cutting-edge models.
- Collaborate with research teams to design and execute extensive training runs for foundational models.
- Create infrastructure that facilitates efficient training across thousands of GPUs leveraging contemporary distributed training frameworks.
- Enhance training throughput, stability, and efficiency for extensive model training tasks.
- Work closely with pre-training researchers to convert experimental concepts into scalable, production-ready training systems.
- Boost performance of distributed training tasks through optimization of communication, memory management, and GPU utilization.
- Develop and maintain training pipelines that accommodate large-scale datasets, checkpointing, and iterative experiments.
- Identify and resolve performance bottlenecks within distributed training systems, including model parallelism, GPU communication, and training runtime environments.
- Contribute to the creation of systems that promote swift experimentation and iteration on novel training methods.

Mar 24, 2026
ReflectionAI
Full-time|On-site|SF

Our Vision
At ReflectionAI, we aim to create open superintelligence and ensure its availability to everyone. Our team, comprised of leading AI researchers and industry innovators from renowned organizations like DeepMind, OpenAI, Google Brain, Meta, and Character.AI, is on a mission to develop open-weight models that cater to individuals, agents, enterprises, and even nations.

Role Overview
The HR Business Partner will act as a strategic consultant to both technical and non-technical leaders. This critical position focuses on enhancing organizational effectiveness, fostering high-performance cultures, and guiding management through intricate people-related decisions.

This pivotal role will lay the groundwork for the HRBP function, creating essential management toolkits, refining organizational design, advising on performance metrics, and facilitating connections between leadership and the broader People team. You will play a key role in ensuring our growth is intentional, high-performing, and well-judged, while establishing a robust foundation for future organizational expansion.

Your Responsibilities

Strategic Leadership Partnership
- Collaborate with leaders to enhance team effectiveness and organizational vitality.
- Provide insights on organizational design, role definition, and team structuring.
- Assist in workforce planning in alignment with business goals.
- Coach managers on performance feedback, accountability, and effective leadership practices.
- Support leaders in navigating change and scalability challenges.

Performance & Talent Management
- Guide managers through performance assessments, calibration processes, and promotion considerations.
- Maintain high performance standards and consistency across all teams.
- Facilitate talent planning and succession discussions.
- Assist in performance improvement strategies and complex personnel decisions.
- Ensure that decisions are made thoughtfully, consistently, and are well-documented.
- Develop scalable frameworks for performance management and separation practices.

Building for Growth
- Identify deficiencies in people practices and implement practical, scalable solutions.
- Partner with leaders to drive organizational health and performance.

Mar 4, 2026
Reflection AI
Full-time|On-site|SF

Our Mission
At Reflection AI, our goal is to create open superintelligence and ensure its accessibility for everyone. We are pioneering open weight models for various users, including individuals, enterprises, and even nation-states. Our talented team comprises AI researchers and industry veterans from leading organizations such as DeepMind, OpenAI, Google Brain, Meta, Character.AI, and Anthropic.

Role Overview
- Develop systems that convert robust pre-trained models into aligned and versatile agents.
- Lead research and engineering efforts to advance post-training practices, focusing on data curation and large-scale optimization.
- Create data generation frameworks, reward models, reinforcement learning algorithms, and techniques for inference-time scaling.
- Collaborate with both pre-training and post-training teams to achieve significant enhancements in model capabilities.
- Help refine our understanding of how large models learn to reason, follow instructions, and evolve through reinforcement learning.

Your Profile
- Solid grasp of machine learning principles with hands-on experience in large-scale LLM training.
- Proficient engineering skills, with the ability to navigate intricate ML codebases and distributed systems.
- Experience in enhancing model performance through data, reward modeling, or reinforcement learning techniques.
- Track record of leading ambitious research or engineering projects resulting in measurable improvements.
- Thrives in a dynamic, high-agency startup atmosphere; oriented towards action and clarity in execution.
- Ability to work seamlessly across research and infrastructure boundaries.
- Excellent communication skills and a collaborative mindset.
- Driven by a passion for pushing the boundaries of intelligence.

What We Provide:
At Reflection AI, we believe that to truly build open superintelligence, it must be rooted in a strong foundation. By joining us, you will contribute to building from the ground up within a compact, highly skilled team. Together, we will shape the future of our company and the landscape of open foundational models.

We aim for you to accomplish the most impactful work of your career, with the assurance that you and your loved ones are well-supported.

Oct 7, 2025
ReflectionAI
Full-time|On-site|SF

Our Vision
At ReflectionAI, we are on a mission to develop open superintelligence and ensure its accessibility for everyone. We are creating open weight models tailored for individuals, agents, enterprises, and even nation-states. Our dedicated team comprises AI researchers and innovators from renowned organizations like DeepMind, OpenAI, Google Brain, Meta, Character.AI, Anthropic, and beyond.

We are always eager to connect with exceptional individuals. If you resonate with our mission but do not find a suitable position listed, we encourage you to reach out to us.

What We Provide:
We believe that building truly open superintelligence requires a strong foundation. Joining ReflectionAI means being part of a small, highly skilled team where you will shape the future of the company and the landscape of open foundational models.

We want you to engage in the most impactful work of your career, knowing that you and your loved ones are well-supported.
- Competitive Compensation: Attractive salary and equity designed to retain top talent globally.
- Comprehensive Health & Wellness: Extensive medical, dental, vision, life, and disability insurance.
- Family Support: Fully paid parental leave for all new parents, including those on adoptive and surrogate journeys. Financial assistance for family planning.
- Work-Life Balance: Generous paid time off when needed, relocation assistance, and various perks to enhance your time.
- Team Engagement Opportunities: Daily lunches and dinners provided, along with regular team off-sites and celebrations.

Oct 7, 2025
tierzero
Full-time|Hybrid|SF HQ

About tierzero
tierzero builds tools that help engineering teams manage production code with stronger incident response, better operational visibility, and collaborative knowledge sharing. Companies like Discord, Drata, and Framer use tierzero to support their infrastructure in an AI-driven landscape. Backed by $7 million from investors including Accel and SV Angel, tierzero is growing quickly from its San Francisco headquarters.

Role Overview: Founding Member of Technical Staff
This is a hands-on role shaping tierzero’s core product and systems from the ground up. The founding technical team works closely with the CEO, CTO, and early customers to solve real engineering challenges. The position is based in San Francisco, with a hybrid schedule: three days each week in the office.

What You’ll Do
- Design and build intelligent AI systems that process large volumes of unstructured data
- Deliver full-stack features informed by real-time user feedback
- Improve usability so AI agents are both effective and trustworthy for engineers
- Develop systems for automated evaluation of LLM outputs, including feedback loops and self-play
- Construct machine learning pipelines for data ingestion, feature generation, embedding storage, retrieval-augmented generation (RAG), vector search, and graph databases
- Prototype with open-source LLMs to understand their strengths and weaknesses
- Create scalable infrastructure for complex, multi-step agents, focusing on memory, state management, and asynchronous workflows

Who We’re Looking For
- 5+ years of professional experience or significant open-source contributions
- Interest in LLMs, MCPs, cloud infrastructure, and observability tools
- Comfort working in changing, ambiguous situations
- Product-focused and customer-first mindset
- Experience learning from and collaborating with engineers from diverse backgrounds
- Bonus: Previous experience in a startup setting

Work Location
Hybrid schedule: three days per week in-person at the San Francisco HQ.

Apr 16, 2026
tierzero
Full-time|Hybrid|SF HQ

About tierzero
tierzero helps engineering teams build and deploy code with greater speed and operational clarity in an AI-driven world. The company focuses on improving incident response, operational visibility, and knowledge sharing for engineers. Backed by $7 million in funding from investors like Accel and SV Angel, tierzero supports large-scale systems for clients such as Discord, Drata, and Framer.

Role Overview: Founding Member of Technical Staff
This role is based at tierzero's San Francisco headquarters. In-person work is required three days a week. As a founding member of the technical team, you will help design and build core products and systems from the ground up. Collaboration is central: expect to work closely with the CEO, CTO, and customers. Projects span a wide range of technical challenges and product areas.

What You Will Do
- Design and implement intelligent AI systems that process and reason over large volumes of unstructured data.
- Develop full-stack features, incorporating direct feedback from users.
- Improve the product experience so intelligent agents are practical and reliable for engineers.
- Create systems that automatically evaluate LLM outputs and refine agent reasoning using self-play and feedback loops.
- Build machine learning pipelines covering data ingestion, feature generation, embedding stores, RAG pipelines, vector search, and graph databases.
- Prototype and experiment with open-source and advanced LLMs to weigh different approaches.
- Set up scalable infrastructure for long-running, multi-step agents, including memory management, state handling, and asynchronous workflows.

What We Look For
- At least 5 years of professional or open-source experience in a relevant technical field.
- Comfort working in a setting that changes and evolves quickly.
- Strong product focus and an understanding of customer needs.
- Interest in LLMs, MCPs, cloud infrastructure, and observability tools.
- Ability to learn from and collaborate with engineers who have delivered over $10 billion in value.
- Commitment to working onsite in San Francisco three days per week.
- Startup experience is a plus.

Apr 20, 2026
tierzero
Full-time|On-site|SF HQ

Role overview
As a Founding Member of Technical Staff at tierzero in San Francisco, this early hire will play a key role in shaping both the company’s technology and its products from day one. The position calls for hands-on work, building new solutions and collaborating with the team to set technical direction.

What you will do
- Work directly with the founding team on core product development
- Help establish engineering practices and technical standards
- Take on projects designed to set new benchmarks in the industry
- Influence decisions that guide the company’s technology roadmap

Location
This role is based at tierzero’s headquarters in San Francisco.

Apr 28, 2026
tierzero
Full-time|Hybrid|SF HQ

TierZero seeks a Founding Member of Technical Staff to join the team in San Francisco. This in-person position requires working from the SF headquarters at least three days per week.

Role overview
This role centers on close collaboration with a group of engineers who have collectively delivered over $10 billion in value during their careers. Expect to work side by side with teammates, sharing ideas and building strong connections in the office. The environment often shifts, so adaptability and comfort with changing priorities are important.

Key responsibilities
- Work directly with experienced engineers to design and build new products
- Prioritize customer needs and satisfaction in product decisions
- Develop solutions using large language models (LLMs), multi-cloud platforms (MCPs), cloud infrastructure, and observability tools

Requirements
- Minimum 5 years of professional engineering experience or a strong record of open-source contributions
- Experience in startups and familiarity with their unique challenges is a plus

Location
This position is based in San Francisco. In-office presence is required three days each week for collaboration.

Apr 23, 2026
Reflection AI
Full-time|On-site|San Francisco

About the Role
Reflection AI is hiring a Member of Technical Staff focused on Infrastructure Security in San Francisco. This position plays a key part in protecting the company’s infrastructure from security threats.

What You Will Do
- Work with teams across the company to design, implement, and monitor security protocols and systems
- Help safeguard digital assets by maintaining the integrity and security of infrastructure

Apr 16, 2026
Full-time|On-site|San Francisco

About Liquid AI
Founded as a spin-off from MIT CSAIL, Liquid AI specializes in the development of versatile artificial intelligence systems optimized for performance across various deployment environments, ranging from data center accelerators to on-device hardware. Our focus on low latency, minimal memory consumption, privacy, and reliability allows us to partner effectively with enterprises in sectors such as consumer electronics, automotive, life sciences, and financial services. As we experience rapid growth, we are eager to welcome talented individuals who can contribute to our mission.

The Opportunity
This unique position places you at the forefront of advanced foundation models and their practical applications. You will oversee post-training projects from start to finish for some of the world’s leading enterprises, while also playing a vital role in the ongoing development of Liquid’s core models.

In this role, you will not have to choose between impactful customer work and foundational development; instead, you will enjoy deep involvement in both. You will have significant influence over how models are adapted, assessed, and deployed, directly contributing to the enhancement of Liquid’s post-training capabilities.

If you are passionate about data integrity, evaluation processes, and ensuring that models perform effectively in real-world scenarios, this is your chance to redefine the standards of applied AI at a foundation-model company.

What We're Looking For
We seek an individual who:
- Takes ownership: You will lead post-training initiatives from customer requirements to delivery and evaluation.
- Thinks end-to-end: You will connect the dots across data generation, training, alignment, and evaluation as a cohesive system.
- Is pragmatic: You prioritize model quality and customer satisfaction over theoretical publications.
- Communicates clearly: You can interpret customer needs and effectively communicate with internal technical teams, providing constructive feedback when necessary.

The Work
- Serve as the technical lead for post-training engagements with enterprise clients.
- Translate client requirements into actionable post-training specifications and workflows.
- Design and implement data generation, filtering, and quality assessment methodologies.
- Conduct supervised fine-tuning, preference alignment, and reinforcement learning processes.
- Create task-specific evaluations, analyze outcomes, and integrate insights back into core post-training workflows.

Jan 23, 2026
Reflection AI
Full-time|On-site|SF

Our Mission
At Reflection AI, we are dedicated to creating accessible open superintelligence for everyone. Our team is composed of top-tier AI researchers and innovators from prestigious organizations like DeepMind, OpenAI, Google Brain, Meta, Character.AI, Anthropic, and more. We are committed to building open weight models for individuals, enterprises, and even nation states.

About the Role
- Perform essential comparative analyses to deepen our insights into model capabilities.
- Design and enhance evaluation systems and processes that establish robust feedback loops between data, evaluations, and model behavior.
- Create generalizable evaluation frameworks that effectively capture reasoning, alignment, and practical usefulness.
- Collaborate closely with pre-training, post-training, and applied teams to translate insights into tangible model improvements.
- Expand the boundaries of measurable metrics, utilizing synthetic evaluations, human feedback, and real-world interaction data.

About You
- Proficient in statistical analysis and experimental design, with the ability to rigorously measure model advancements.
- Knowledgeable in LLM evaluation methodologies, including static benchmarks, human preference evaluations, and agentic tasks.
- Possess a high degree of agency and thrive in a fast-paced startup atmosphere, prioritizing impact over rigid processes.
- Eager to work in a pioneering lab, shaping how we measure and accelerate the development of more capable models.
- Collaborative, detail-oriented, and driven by the desire to create effective feedback loops that enhance model performance.

What We Offer:
We believe in building superintelligence that is genuinely open, starting from the ground up. Joining Reflection means you will be part of a small, talent-dense team where you will help shape our future and push the boundaries of open foundational models.

You will have the opportunity to engage in the most impactful work of your career, knowing that you and your loved ones are well-supported.
- Competitive Compensation: Salary and equity structured to attract and retain top global talent.
- Health & Wellness: Comprehensive medical, dental, vision, life, and disability insurance.

Dec 17, 2025
Listen Labs
Full-time|On-site|San Francisco, CA

Overview:
Join Listen Labs as we embark on an exciting journey to revolutionize decision-making for companies through cutting-edge AI technology. With a robust product roadmap planned for the next six months, we are expanding our engineering team. We are in search of a highly technical individual who thrives on solving complex problems and is eager to contribute to our mission. If you are passionate about innovation and want to be part of a team that includes several IOI medalists, we want to hear from you!

About Listen Labs:
Listen Labs is at the forefront of AI-powered research, enabling teams to extract valuable insights from customer interviews in a matter of hours rather than months. Our platform assists users in analyzing conversations, identifying key themes, and making informed product decisions swiftly.

Why Join Us?
- Exceptional Team: Our founding team consists of seasoned entrepreneurs with a proven track record in AI, alongside top talents from renowned organizations such as Jane Street, Twitter, Stripe, and Goldman Sachs.
- Rapid Growth: Backed by Sequoia Capital, we have grown from zero to a $14M run-rate in under a year, with a dedicated team of 40.
- Impressive Clientele: We are witnessing significant traction across various sectors, securing enterprise clients like Google, Microsoft, and Nestlé.
- Product Excellence: Our differentiated product offers an industry-leading win rate, which is a testament to our commitment to quality.
- Market Success: Our customer base is expanding rapidly, with numerous six-figure contracts leading to further growth.
- Viral Impact: Our product's interviews reach tens of thousands of viewers, driving organic growth and interest from Fortune 500 companies.

Feb 25, 2026
Reflection AI
Full-time|On-site|SF

Our Mission
At Reflection AI, we are dedicated to creating open superintelligence and making it universally accessible. We are pioneering open weight models designed for individuals, agents, enterprises, and even nations. Our talented team consists of AI researchers and innovators from leading organizations such as DeepMind, OpenAI, Google Brain, Meta, Character.AI, and Anthropic.

Role Overview
Data is becoming increasingly vital in the realm of AI advancements. Recent significant breakthroughs have frequently stemmed from enhanced data rather than new architectures.

As a vital member of the Data Team, your primary role will be to guarantee that the data utilized for training our models adheres to the highest standards of quality, reliability, and impact. You will have a direct influence on our models' performance in essential capabilities.

Collaborating with exceptional researchers on our pre-training teams, you will help transform abstract concepts of "good data" into specific, quantifiable standards applicable across extensive data campaigns. We are seeking engineers who possess robust engineering skills combined with a profound curiosity about data quality and its relevance to model performance.

In close partnership with our pre-training teams, you will:
- Take ownership of upstream data quality for LLM pre-training, functioning as either a specialist or generalist across various languages and modalities.
- Collaborate with research and pre-training teams to convert requirements into measurable quality signals, providing actionable feedback to external data vendors.
- Incorporate human-in-the-loop processes while designing, validating, and scaling automated QA methods to consistently measure data quality across large-scale campaigns.
- Create reusable QA pipelines that ensure the delivery of high-quality data to pre-training teams for model training.
- Continuously monitor and report on data quality, driving ongoing improvements in quality standards, processes, and acceptance criteria.

Candidate Profile
- Strong engineering background with experience in building data pipelines, QA systems, or evaluation workflows for pre-training data.
- Detail-oriented with an analytical mindset, capable of identifying failure modes, inconsistencies, and nuanced issues affecting data quality.
- Solid understanding of the influence of data quality on pre-training, with the capacity to translate quality concerns into tangible signals, decisions, and feedback.

Jan 8, 2026
Reflection AI
Full-time|On-site|San Francisco

Our Vision
At Reflection AI, we are on a mission to develop open superintelligence and democratize its access for everyone. Our team, hailing from renowned organizations like DeepMind, OpenAI, Google Brain, Meta, Character.AI, and Anthropic, is dedicated to creating open weight models that cater to individuals, enterprises, and even nations.

Role Overview
- Design, construct, and manage state-of-the-art GPU infrastructure for high-throughput model inference and mid-training processes.
- Develop systems that facilitate synthetic data generation and reinforcement learning pipelines at scale.
- Create high-performance inference platforms capable of serving and evaluating models across thousands of GPUs.
- Optimize throughput, latency, and GPU utilization for large language model inference and deployment tasks.
- Construct infrastructure that enhances reinforcement learning pipelines, including large-scale rollout generation, evaluation, and policy enhancement loops.
- Collaborate closely with research teams to support distributed reinforcement learning workloads and extensive model evaluation infrastructure.
- Enhance model execution performance through kernel-level optimization, model parallelism strategies, and GPU runtime improvements.
- Develop distributed systems that enable large-scale synthetic data generation and reinforcement learning-driven training workflows.
- Identify and address performance bottlenecks across inference runtimes, GPU kernels, networking, and distributed computing systems.

Mar 24, 2026
Full-time|On-site|San Francisco

Technical Staff Member
At humans&, we are dedicated to pioneering a human-centric approach to artificial intelligence. Our mission is to redefine AI by placing individuals and their interpersonal connections at the heart of our innovations.

We invite talented researchers and engineers who have made significant contributions to the cutting-edge of AI to join our dynamic team. If you excel in your field and are driven to innovate, we want to hear from you!

Jan 20, 2026
Catalog
Full-time|On-site|San Francisco

At Catalog, we are pioneering the commerce infrastructure for AI, creating the essential framework that enables digital agents to not only explore the web but also comprehend, analyze, and engage with products. Our innovations drive the future of AI-driven shopping experiences, fundamentally transforming how consumers discover and purchase items online.

Role Overview
As a Technical Staff Member, you will be instrumental in developing core systems, shaping our engineering culture, and transitioning our vision from prototype to a robust platform. This role requires full-stack expertise and a commitment to owning and resolving challenges from start to finish.

Who You Are
- You have experience creating beloved and trusted products from the ground up.
- You combine technical proficiency with a keen product sense and data-driven intuition.
- You are well-versed in AI technologies.
- You prioritize speed, write clean code, and ensure thorough instrumentation.
- You seek a high level of ownership within a small, talent-rich team based in San Francisco.

Challenges You Will Tackle
- Develop and deploy agentic-search APIs that deliver structured and real-time product data in milliseconds.
- Build checkout systems enabling agents to conduct transactions with any merchant.
- Create an embeddings and retrieval layer that optimizes recall, precision, and cost efficiency.
- Establish a product graph and ranking pipeline that adapts based on actual user outcomes.

Preferred Qualifications
- Proven experience shipping data-centric products in a live environment.
- Experience with recommendation systems or information retrieval methodologies.
- Familiarity with API development, search indexing, and data pipeline construction.

Our Work Culture
We operate with a small, high-trust, and highly motivated team, fostering an environment of in-person collaboration in North Beach, San Francisco. Our process involves debate, decision-making, and execution.

If your profile aligns with our needs, we will contact you to arrange 2-3 brief technical interviews, followed by an onsite meeting in our office where you will collaborate on a small project, exchange ideas, and meet the team.

Oct 15, 2025
Full-time|On-site|San Francisco

At Magic, we are dedicated to creating safe artificial general intelligence (AGI) that propels humanity forward in tackling the most pressing global challenges. We believe that the most effective route to achieving safe AGI involves automating the research and code generation processes to enhance models and resolve alignment issues more reliably than humans can achieve independently. Our methodology incorporates cutting-edge pre-training at scale, domain-specific reinforcement learning (RL), ultra-long context capabilities, and optimized inference-time computations.

Role Overview
In your role as a Software Engineer on the Pre-training Systems team, you will be responsible for designing and managing the distributed infrastructure necessary for training Magic’s long-context models at scale.

This position emphasizes large-scale model training utilizing extensive GPU clusters. You will operate at the intersection of deep learning and distributed systems, ensuring that training processes are efficient, reliable, and reproducible under extreme conditions.

Magic’s long-context models present complex systems challenges, such as sustained memory usage, communication overhead across thousands of devices, long-duration jobs requiring fault tolerance, and efficient sequence packing within hardware limitations. You will take ownership of the systems that ensure large-scale pre-training is both stable and rapid.

Your Contributions
- Scale distributed training across large GPU clusters, implementing data, tensor, and pipeline parallelism.
- Optimize communication strategies and gradient synchronization.
- Enhance checkpointing, fault tolerance, and job recovery mechanisms.
- Profile and resolve performance bottlenecks across computing, networking, and storage.
- Advance experiment reproducibility and orchestration workflows.
- Boost hardware utilization and overall training throughput.
- Collaborate with Kernel and Research teams to align model architecture with system capabilities.

Qualifications We Seek
- Solid foundation in software engineering and distributed systems.
- Experience with training large models in multi-node GPU environments.
- In-depth understanding of parallelism techniques and performance trade-offs.
- Experience in debugging cross-layer issues within production ML systems.
- Demonstrated ownership mentality and capability to manage critical infrastructure.
- Proven track record in enhancing the performance or reliability of large-scale systems.

Feb 28, 2026
Adyen
Full-time|On-site|San Francisco

Join our dynamic team at Adyen as a Technical Staff Member in San Francisco! We are seeking innovative minds passionate about technology and problem-solving. In this role, you will collaborate with cross-functional teams to craft solutions that enhance our services and improve customer experiences.

Mar 6, 2026
Full-time|On-site|San Francisco

About Liquid AI
Originating from MIT CSAIL, Liquid AI specializes in the development of general-purpose AI systems designed to operate seamlessly across various platforms, including data center accelerators and on-device hardware. Our focus is on delivering low latency, efficient memory usage, privacy, and reliability. We collaborate with organizations in diverse sectors such as consumer electronics, automotive, life sciences, and financial services. As we experience rapid growth, we seek outstanding talent to join our mission.

The Opportunity
The Training Infrastructure team is at the forefront of building the distributed systems that empower our next-generation Liquid Foundation Models. As our operations expand, we aim to innovate, implement, and enhance the infrastructure crucial for large-scale training.

This role is centered around high ownership of training systems, emphasizing runtime, performance, and reliability rather than a typical platform or SRE function. You will collaborate within a small, agile team, creating vital systems from the ground up instead of working with pre-existing infrastructure.

While San Francisco and Boston are preferred, we are open to other locations.

What We're Looking For
We are seeking an individual who:
- Embraces the complexity of distributed systems: Our team is dedicated to maintaining stability during extensive training runs, troubleshooting training failures across GPU clusters, and enhancing overall performance.
- Is passionate about building: We value team members who take pride in developing robust, efficient, and reliable infrastructure.
- Excels in uncertain environments: Our systems are designed to support evolving model architectures. You will be making decisions based on incomplete information and rapidly iterating.
- Aligns with team goals and delivers results: The best engineers on our team align with collective priorities while providing data-driven feedback when challenges arise.

The Work
- Design and develop core systems that ensure quick and reliable large training runs.
- Create scalable distributed training infrastructure for GPU clusters.
- Implement and refine parallelism and sharding strategies for evolving architectures.
- Optimize distributed efficiency through topology-aware collectives, communication/compute overlap, and straggler mitigation.
- Develop data loading systems to eliminate I/O bottlenecks for multimodal datasets.

Jul 29, 2025
