Infrastructure Research Engineer Numerics At Thinking Machines San Francisco jobs in San Francisco – Browse 11,544 openings on RoboApply Jobs


Open roles matching “Infrastructure Research Engineer Numerics At Thinking Machines San Francisco” with location signals for San Francisco. 11,544 active listings on RoboApply Jobs.

Thinking Machines Lab
Full-time|$350K/yr - $475K/yr|On-site|San Francisco

At Thinking Machines Lab, our mission is to empower humanity by advancing collaborative general intelligence. We envision a future where everyone has access to the knowledge and tools necessary to make AI work for their individual needs and goals. Our team comprises scientists, engineers, and innovators who have developed some of the most widely adopted AI products, including ChatGPT and Character.ai, alongside open-weight models like Mistral, as well as popular open-source initiatives such as PyTorch, OpenAI Gym, Fairseq, and Segment Anything.

About the Role
We are seeking a highly skilled infrastructure research engineer to architect and develop core systems that facilitate efficient large-scale model training, with a strong emphasis on numerics. You will enhance the numerical foundations of our distributed training stack, focusing on precision formats, kernel optimizations, and communication frameworks to ensure that training trillion-parameter models is stable, scalable, and fast. This position is perfect for an individual who excels at the intersection of research and systems engineering: a creator who comprehends both the mathematics of optimization and the practicalities of distributed computing.

Note: This is an "evergreen role" that remains open for ongoing expressions of interest. While we receive numerous applications and there may not always be an immediate opening that perfectly matches your skills and experience, we encourage you to apply. We continuously review applications and will contact applicants as new opportunities arise. You are welcome to reapply if you gain additional experience, but please refrain from applying more than once every six months. You may also notice postings for specific roles related to particular projects or teams; in those instances, you are welcome to apply for those positions in addition to the evergreen role.

What You'll Do
- Design and optimize distributed training infrastructure for large-scale LLMs, ensuring performance, stability, and reproducibility in multi-GPU and multi-node environments.
- Implement and assess low-precision numerics (e.g., BF16, MXFP8, NVFP4) to enhance efficiency while maintaining model quality.
- Develop kernels and communication primitives that leverage hardware-level support for mixed and low-precision arithmetic.
- Collaborate with research teams to co-design model architectures and training methodologies that align with new numeric formats and stability requirements.
- Prototype and benchmark scaling strategies, including data, tensor, and pipeline parallelism that integrate precision-adaptive computation and quantized communication.
- Contribute to the design of our internal orchestration and monitoring frameworks.
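For readers unfamiliar with the precision formats named above: BF16 keeps float32's 8-bit exponent but only 7 explicit mantissa bits, so values land on a much coarser grid. A minimal pure-Python sketch of that rounding (the function name and the round-to-nearest-even choice are this sketch's assumptions, not details from the posting):

```python
import struct

def to_bfloat16(x: float) -> float:
    """Round a value to BF16 precision (round-to-nearest-even) and
    return it as a regular float. NaN/inf handling is omitted for brevity."""
    # Reinterpret the float32 encoding of x as a 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add half of the discarded low 16 bits, plus 1 when the kept LSB is
    # set: this implements round-to-nearest-even before truncation.
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFFFFFF
    # Keep only the high 16 bits (sign, exponent, 7 mantissa bits).
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]
```

Because the exponent range matches float32, BF16 rarely overflows during training; the cost is precision: `to_bfloat16(3.14159)` returns 3.140625.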

Nov 27, 2025
Numeral
Full-time|On-site|HQ - San Francisco, CA

About Numeral
Numeral is revolutionizing the automation framework for online commerce, beginning with the often tedious task of sales tax compliance. We take care of everything from registration to remittance, providing an exceptional service that allows e-commerce businesses to concentrate on their core mission: expanding their products, customer base, and teams.

As one of the rapidly growing companies from Y Combinator's Winter 2023 cohort, we are backed by prestigious investors such as Benchmark Capital. Our team boasts extensive experience from the pioneering days at Stripe, Airbnb, Notion, and other leading firms, and we are poised to bring that same level of expertise, speed, and ambition to an industry ripe for transformation. Numeral may be small, but our impact is significant. Our growth is already outpacing our headcount, meaning each new hire will play a crucial role in shaping our company's future. If you're eager to join as an early team member and desire the kind of ownership that can define your career, we would love to connect with you.

Mission
Operating an online business today often requires juggling numerous responsibilities, many of which are not why founders launched their ventures. Our mission is to alleviate the administrative and accounting pressures that divert businesses from their passions. We have already assisted hundreds of merchants in sidestepping the headache of establishing large finance teams solely to manage tax compliance. Looking ahead, we aim to broaden our positive impact by developing the automation layer that enables online businesses to remain agile, compliant, and prepared for the future.

About the Role
We are seeking a foundational Software Engineer (Infrastructure) who excels at tackling complex distributed systems challenges at scale. In this role, you will design and implement core infrastructure, enhance service reliability and observability, and guarantee the platform's scalability as we accommodate growing transaction volumes and integrations. This is a pivotal role with high leverage: your contributions will shape the architecture and technical direction of our infrastructure platform, directly influencing our customers' experience and the trajectory of the company.

Responsibilities
- Design and develop highly scalable, secure, and reliable infrastructure to support critical APIs, services, and data pipelines.
- Lead infrastructure architecture decisions focusing on performance, observability, and fault tolerance.

Sep 9, 2025
Thinking Machines Lab
Full-time|$350K/yr - $475K/yr|On-site|San Francisco

At Thinking Machines Lab, we are on a mission to empower humanity by advancing collaborative general intelligence. Our vision is to create a future where everyone has access to the knowledge and tools necessary to harness AI for their unique needs and objectives. We are a diverse team of scientists, engineers, and builders responsible for developing some of the most influential AI products on the market, such as ChatGPT and Character.ai. Our contributions extend to open-weight models like Mistral and popular open-source projects including PyTorch, OpenAI Gym, Fairseq, and Segment Anything.

About the Role
We are seeking talented engineers to join our team and develop the libraries and tools that will accelerate research efforts at Thinking Machines. You will take charge of our internal infrastructure, creating evaluation libraries, reinforcement learning training libraries, and experiment tracking platforms, while building systems that enhance research velocity over time. This position emphasizes collaboration. You will work closely with researchers to identify bottlenecks and pain points, ensuring that they trust your systems to function seamlessly and find them enjoyable to use.

What You'll Do
- Design, build, and manage research infrastructure, including evaluation frameworks, RL training systems, experiment tracking platforms, visualization tools, and shared utilities.
- Develop high-throughput, scalable pipelines for distributed evaluation, reward modeling, and multimodal assessment.
- Establish systems for reproducibility, traceability, and robust quality control across research experiments and model training runs, implementing effective monitoring and observability.
- Collaborate directly with researchers to identify bottlenecks and unlock new capabilities, managing research tools like a product manager by proactively seeking feedback and tracking adoption.
- Work alongside infrastructure, data, and product teams to integrate tools across the technical stack.

Feb 3, 2026
Thinking Machines Lab
Full-time|$350K/yr - $475K/yr|On-site|San Francisco

At Thinking Machines Lab, our ambition is to enhance human potential by advancing collaborative general intelligence. We envision a future where individuals have the tools and knowledge to harness AI for their distinct requirements and aspirations. Our team comprises dedicated scientists, engineers, and innovators who have contributed to some of the most renowned AI products, including ChatGPT and Character.ai, along with open-weight models like Mistral, and influential open-source projects such as PyTorch, OpenAI Gym, Fairseq, and Segment Anything.

About the Role
We are seeking an Infrastructure Research Engineer to architect, optimize, and sustain the computational frameworks that facilitate large-scale language model training. You will create high-performance machine learning kernels (e.g., CUDA, CuTe, Triton), enable effective low-precision arithmetic operations, and enhance the distributed computing infrastructure essential for training expansive models. This position is ideal for an engineer who thrives in close collaboration with hardware and research disciplines. You will partner with researchers and systems architects to merge algorithmic design with hardware efficiency. Your responsibilities will include prototyping new kernel implementations, evaluating performance across various hardware generations, and helping to establish the numerical and parallelism strategies crucial for scaling next-generation AI systems.

Note: This is an evergreen role that remains open continuously for expressions of interest. We receive numerous applications, and there may not always be an immediate opportunity that aligns with your qualifications. However, we encourage you to apply, as we regularly assess applications and will reach out as new positions become available. You are also welcome to reapply after gaining additional experience, but please refrain from applying more than once every six months. Additionally, you may notice postings for specific roles catering to particular projects or team needs. In such cases, you are encouraged to apply directly alongside this evergreen listing.

What You'll Do
- Design and develop custom ML kernels (e.g., CUDA, CuTe, Triton) for key LLM operations such as attention, matrix multiplication, gating, and normalization, optimized for contemporary GPU and accelerator architectures.
- Conceptualize compute primitives aimed at alleviating memory bandwidth bottlenecks and enhancing kernel compute efficiency.
- Collaborate with research teams to synchronize kernel-level optimizations with model architecture and algorithmic objectives.
- Create and maintain a library of reusable kernels and performance benchmarks that serve as the foundation for internal model training.
- Contribute to the stability and scalability of our infrastructure, ensuring it meets the growing demands of AI development.
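The memory-bandwidth bottlenecks mentioned above are typically attacked by tiling: computing the output in small blocks so each operand is reused while it is "hot" (in registers or shared memory on a real GPU). A pure-Python sketch of the loop structure only; the tile size and function name are illustrative, and a real CUDA/Triton kernel would map tiles to thread blocks rather than nested loops:

```python
def matmul_tiled(A, B, tile=2):
    """Blocked matrix multiply: C = A @ B, computed tile by tile."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):          # block row of C
        for j0 in range(0, m, tile):      # block column of C
            for k0 in range(0, k, tile):  # accumulate over K in tiles
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = C[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += A[i][kk] * B[kk][j]
                        C[i][j] = acc
    return C
```

The payoff of this ordering is locality, not fewer arithmetic operations: each tile of A and B is touched repeatedly in quick succession instead of being streamed from memory once per output element.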

Nov 27, 2025
Numeric
Full-time|On-site|San Francisco

Join our dynamic team at Numeric as a Sales Engineer, where you will play a pivotal role in driving our technology solutions forward. You will work closely with clients to understand their needs and provide innovative solutions that align with their business objectives. Your expertise will empower our sales team and ensure the successful implementation of our products.

Apr 3, 2026
Thinking Machines Lab
Full-time|$175K/yr - $475K/yr|On-site|San Francisco

At Thinking Machines Lab, we strive to empower humanity by advancing collaborative general intelligence. Our vision is to create a future where everyone can access the knowledge and tools necessary to harness AI for their specific needs and aspirations. Our team comprises scientists, engineers, and innovators who have developed some of the most widely utilized AI products, such as ChatGPT and Character.ai, along with notable open-weight models like Mistral, as well as prominent open-source projects including PyTorch, OpenAI Gym, Fairseq, and Segment Anything.

About the Role
As a Research Product Manager (RPM) at Thinking Machines Lab, you will play a pivotal role in driving complex, high-impact technical products and programs that encompass research, infrastructure, and applied initiatives. You will facilitate the transformation of ambitious concepts into reality by propelling cross-functional collaboration, ensuring projects maintain momentum, and fostering clarity in fast-paced, ambiguous settings. Your contributions will connect people, ideas, and systems, guaranteeing that our critical research initiatives remain aligned, well-defined, and progressing efficiently. This position is ideal for someone who excels in technical discussions, comprehends the intricacies of research, and can conceptualize at a high level while also delving into detailed aspects, ultimately aiming to assist the company in executing at scale.

Note: This is an "evergreen role" that we keep open on an ongoing basis for expressions of interest. We receive numerous applications, and there may not always be an immediate role that aligns perfectly with your experience and skills. Nevertheless, we encourage you to apply. We continuously review applications and reach out to applicants as new opportunities arise. You are welcome to reapply if you gain more experience, but please refrain from applying more than once every six months. You may also find that we post job openings for specific roles related to separate projects or team needs. In those cases, you are welcome to apply directly in addition to this evergreen role.

What You'll Do
- Drive and coordinate large-scale research products and programs, ensuring that complex projects are executed efficiently, transparently, and with scientific rigor.
- Translate technical ideas into actionable, well-scoped plans, defining milestones and ensuring team alignment across model development, data campaigns, infrastructure, and product integration.
- Collaborate across disciplines, from research and ML infrastructure to legal and business development, quickly ramping up on new domains as necessary.
- Create and maintain compute and resource roadmaps, identifying bottlenecks and solutions to optimize project flow.

Nov 28, 2025
Numeral
Full-time|Remote|HQ - San Francisco, CA

About Numeral
Numeral is revolutionizing the tax landscape for digital businesses, helping them navigate the complexities of regulations across 47 states and over 50 countries. Our mission is to lift this burden so that teams can concentrate on their primary objectives. As the largest and fastest-growing AI-driven tax solution, Numeral was established in 2023 and has successfully raised over $57 million from notable investors such as Benchmark, Mayfield, and Y-Combinator. We proudly serve more than 2,500 paying customers and have consistently tripled our revenue year over year since our inception. While our main hiring hubs are in San Francisco and New York, we do offer remote opportunities based on specific circumstances.

About the Role
We are in search of a foundational Full-Stack Engineer who thrives in dynamic environments, enjoys constructing intricate distributed systems, and is eager to establish core architecture within a rapidly expanding early-stage startup. You will spearhead the design and implementation of APIs, services, and infrastructure that form the backbone of our platform. Additionally, you will collaborate with leadership to drive engineering strategy, product execution, and cultural development.

Your Responsibilities
- Design and develop scalable and secure backend services and APIs from the ground up.
- Make architectural choices that prioritize speed, data integrity, and long-term sustainability.
- Work closely with product, design, and data teams to create efficient systems that enhance crucial customer workflows.
- Take ownership of infrastructure, observability, and performance for the services you create.
- Contribute to the establishment of engineering best practices, code quality, and a robust backend culture.

What You Bring
- Over 7 years of experience in developing backend systems in high-growth settings.
- Proficiency in backend technologies such as Node.js, PostgreSQL, Redis, and cloud services (AWS preferred).
- Experience in designing resilient APIs and distributed systems at scale.
- A strong product sensibility with the ability to balance performance, flexibility, and speed.
- A startup mindset: comfortable with ambiguity and eager for swift growth.
- Intensity and ownership: this role goes beyond a standard 9-5; we are rapidly scaling and face exciting opportunities ahead.
- Customer obsession: a genuine commitment to enhancing user experience and resolving their challenges.

Jul 16, 2025
Numeral
Full-time|Remote|HQ - San Francisco, CA

About Numeral
Numeral is revolutionizing the tax compliance landscape for digital businesses, alleviating the complexities of navigating regulations across 47 states and over 50 countries. Our mission is to empower teams to concentrate on their core objectives by eliminating these burdens. As the largest and fastest-growing AI-driven tax solution, Numeral launched in 2023 and has successfully raised over $57 million from esteemed investors including Benchmark, Mayfield, and Y-Combinator. Currently, we proudly serve over 2,500 paying customers while consistently tripling our revenue year over year. While our primary hiring locations are in our San Francisco and New York offices, we also offer remote positions in select cases.

About the Role
As a Data Analyst at Numeral, you'll play an integral role within our operations team, focusing on reconciling customer data, enhancing data quality, and advancing our platform's intelligence and scalability. Collaborating closely with Engineering, Product, and Customer Experience teams, your contributions will ensure that customers have access to clean, actionable data and seamless onboarding experiences. This is a pivotal role with significant influence, allowing you to shape our data utilization strategies across the organization. Free from bureaucratic constraints, you will experience true ownership and tangible results.

What You'll Do
- Reconcile financial and transactional data from various sources to facilitate accurate and repeatable sales tax filings.
- Standardize customer data across platforms (e.g., Stripe, Shopify, Numeral API) to maintain integrity in tax workflows.
- Collaborate with Sales, Engineering, Product, and Solutions teams to identify and resolve onboarding data issues.
- Assist customer-facing teams with ad hoc data inquiries, investigations, and insights.
- Develop internal tools and processes to enhance reconciliation and improve onboarding efficiency.
- Enhance the fidelity and performance of our DBT pipelines and contribute to the evolution of our broader data architecture.
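As a flavor of the reconciliation work described above, here is a small sketch using Python's built-in sqlite3: compare transactions reported by a payment source against an internal ledger and flag disagreements. All table names, column names, and figures are invented for illustration; a production pipeline would run against a real warehouse, not an in-memory database.

```python
import sqlite3

# Miniature reconciliation: find transactions that are missing from the
# ledger or whose amounts disagree with the payment processor's record.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE processor (txn_id TEXT PRIMARY KEY, amount_cents INTEGER);
    CREATE TABLE ledger    (txn_id TEXT PRIMARY KEY, amount_cents INTEGER);
    INSERT INTO processor VALUES ('t1', 1000), ('t2', 2500), ('t3', 400);
    INSERT INTO ledger    VALUES ('t1', 1000), ('t2', 2600);
""")
mismatches = conn.execute("""
    SELECT p.txn_id,
           p.amount_cents AS processor_cents,
           l.amount_cents AS ledger_cents
    FROM processor p
    LEFT JOIN ledger l ON l.txn_id = p.txn_id
    WHERE l.txn_id IS NULL OR l.amount_cents != p.amount_cents
    ORDER BY p.txn_id
""").fetchall()
# mismatches -> amount disagreement on t2, missing ledger row for t3
```

The LEFT JOIN keeps every processor row, so missing ledger entries surface as NULLs alongside amount disagreements in a single query.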

Sep 25, 2025
Thinking Machines Lab
Full-time|$350K/yr - $475K/yr|On-site|San Francisco

Thinking Machines Lab brings together scientists, engineers, and innovators behind widely recognized AI products such as ChatGPT and Character.ai, as well as open-source frameworks like PyTorch, OpenAI Gym, Fairseq, and Segment Anything. The team is driven by a mission to enhance humanity through collaborative general intelligence, aiming for a future where AI adapts to individual needs and goals. Tinker, the lab's fine-tuning API, empowers researchers and developers to customize advanced AI models for their own use cases. Tinker manages the infrastructure, allowing users to train open-weight models with their chosen datasets, algorithms, and objectives. As Tinker grows its user base and features, the team is expanding to better support the community.

Role overview
The Forward Deployed Engineer acts as the main point of contact for a broad range of clients, from solo developers to large organizations. This role identifies customer challenges and requirements, then translates those insights into actionable product improvements. Both customer interaction and product development responsibilities are central to this position.

What you will do
- Triage and resolve customer issues across the full stack, including analyzing logs, reproducing failures, and tracing job executions.
- Develop tools, integrations, and automation to address recurring problems and speed up user support.
- Create and update clear documentation and practical guides based on real user experiences and implementations.
- Work closely with research and infrastructure teams to turn customer feedback into prioritized engineering tasks.
- Help shape Tinker's product roadmap by sharing insights from daily customer interactions.

Apr 27, 2026
Thinking Machines Lab
Full-time|$350K/yr - $475K/yr|On-site|San Francisco

Thinking Machines Lab aims to advance collaborative general intelligence, making AI accessible and adaptable for individuals and organizations. The team brings together scientists, engineers, and innovators behind well-known AI solutions, including ChatGPT, Character.ai, Mistral, and open-source projects like PyTorch, OpenAI Gym, Fairseq, and Segment Anything. Tinker, the lab's fine-tuning API, helps researchers and developers customize AI models using their own data and algorithms. By handling the infrastructure, Tinker allows users to focus on training and deploying models that suit their needs. With a growing customer base and expanding features, the team is looking for a Software Engineer, Platform to support Tinker's continued development.

Role overview
This position centers on building and maintaining the core platform systems that power Tinker. The engineer will manage billing and usage metering, permissions and access control, organizational structures, data exports, audit logging, and the administrative tools that tie these systems together. Collaboration with product and legal teams is essential, as changes to features, pricing, and enterprise agreements will involve this role.

What you will do
- Design the authorization layer for all products, including RBAC, API key scoping, organizational hierarchies, and permission boundaries.
- Oversee billing infrastructure, covering usage metering, plan management, payment processing, invoicing, and revenue recognition support.
- Develop and improve models for organizations and teams, such as seat management, SSO/SAML, workspace isolation, and invitation flows.
- Implement data export and deletion processes that align with enterprise standards and data residency requirements.
- Create audit logging systems to track user actions and decisions.

This role is based in San Francisco.
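The authorization responsibilities above (RBAC, API key scoping, organizational boundaries) can be pictured with a small sketch. The scope names, the "admin" superuser scope, and the org-boundary-first rule are illustrative assumptions of this sketch, not Tinker's actual model:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ApiKey:
    org_id: str
    scopes: frozenset  # e.g. {"jobs:read", "jobs:write"}

def authorize(key: ApiKey, required_scope: str, resource_org: str) -> bool:
    """Scoped API-key check: deny across organizational boundaries first,
    then require the scope (or a blanket 'admin' grant)."""
    if key.org_id != resource_org:
        return False
    return required_scope in key.scopes or "admin" in key.scopes

key = ApiKey(org_id="org_1", scopes=frozenset({"jobs:read"}))
```

Checking the org boundary before the scope grant means even an "admin" key cannot reach into another organization's resources, which mirrors the workspace-isolation requirement in the posting.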

Apr 27, 2026
Thinking Machines Lab
Full-time|$350K/yr - $475K/yr|On-site|San Francisco

At Thinking Machines Lab, our mission is to empower humanity by advancing collaborative general intelligence. We envision a future where everyone has access to the knowledge and tools necessary to harness AI for their unique needs and goals. Our team comprises scientists, engineers, and builders who have developed some of the most widely utilized AI products, such as ChatGPT and Character.ai, alongside open-weight models like Mistral, and popular open-source initiatives like PyTorch, OpenAI Gym, Fairseq, and Segment Anything.

About the Position
We are seeking an Infrastructure Research Engineer to design and construct the foundational systems that facilitate the scalable and efficient training of large models for both deployment and research purposes. Your primary objective will be to streamline experimentation and training at Thinking Machines, enabling our research teams to concentrate on scientific advancements rather than system limitations. This role is a perfect match for an individual who possesses a strong blend of deep systems expertise and a keen interest in machine learning at scale. You will take full ownership of the training stack, ensuring that every GPU cycle contributes to scientific progress.

Note: This is an evergreen role that we keep open continuously for expressions of interest. We receive numerous applications, and there may not always be an immediate role that aligns perfectly with your experience and skills. However, we encourage you to apply. We regularly review applications and reach out to candidates as new opportunities arise. Feel free to reapply as you gain more experience, but please avoid applying more than once every six months. We may also post specific roles for individual projects or team needs, in which case you are welcome to apply directly alongside this evergreen role.

Key Responsibilities
- Design, implement, and optimize distributed training systems that scale across thousands of GPUs and nodes for extensive training workloads.
- Develop high-performance optimizations to maximize throughput and efficiency.
- Create reusable frameworks and libraries that enhance training reproducibility, reliability, and scalability for new model architectures.
- Establish standards for reliability, maintainability, and security, ensuring systems remain robust under rapid iterations.
- Collaborate with researchers and engineers to construct scalable infrastructure.
- Publish and disseminate findings through internal documentation, open-source libraries, or technical reports that contribute to the advancement of scalable AI infrastructure.
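Distributed training systems of the kind described above commonly build on data parallelism: each worker computes a gradient on its own data shard, and an all-reduce averages gradients so model replicas stay in sync. A single-process toy illustration; real systems use collective communication (e.g., NCCL) across GPUs, and the one-parameter least-squares model here is purely a stand-in:

```python
def local_gradient(weights, shard):
    """Gradient of mean squared error for a 1-parameter model y = w * x,
    computed on one worker's data shard."""
    w = weights[0]
    g = sum(2 * (w * x - y) * x for x, y in shard) / len(shard)
    return [g]

def all_reduce_mean(grads):
    """Element-wise average of per-worker gradients (stand-in for an
    all-reduce collective)."""
    n = len(grads)
    return [sum(g[i] for g in grads) / n for i in range(len(grads[0]))]

# Two "workers", each holding a shard of data generated by y = 2 * x.
shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
weights = [0.0]
for _ in range(200):
    grads = [local_gradient(weights, s) for s in shards]
    (g,) = all_reduce_mean(grads)
    weights[0] -= 0.05 * g  # synchronized SGD step on every replica
# weights[0] converges to 2.0
```

Because every replica applies the same averaged gradient, all copies of the model remain bit-identical after each step, which is what makes the scheme reproducible across worker counts.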

Nov 27, 2025
Thinking Machines Lab
Full-time|$350K/yr - $475K/yr|On-site|San Francisco

At Thinking Machines Lab, we are dedicated to empowering humanity by advancing collaborative general intelligence. Our vision is to create a future where everyone can leverage AI to meet their unique needs and aspirations. Our talented team comprises scientists, engineers, and innovators who have developed some of the most widely recognized AI products, including ChatGPT and Character.ai, alongside open-weight models like Mistral and popular open-source projects such as PyTorch, OpenAI Gym, Fairseq, and Segment Anything.

About the Position
We are seeking a motivated Infrastructure Research Engineer to design, enhance, and scale the systems that underpin large AI models. Your contributions will significantly improve inference speed, cost-effectiveness, reliability, and reproducibility, allowing our teams to concentrate on enhancing model capabilities rather than dealing with bottlenecks. Our mission centers on delivering high-performance and efficient model inference to support real-world applications and accelerate research efforts. In this role, you will be responsible for the infrastructure that guarantees smooth operation for every experiment, evaluation, and deployment at scale.

Note: This is an evergreen role, kept open continuously for expressions of interest. We receive numerous applications and may not always have an immediate opening that aligns perfectly with your skills and experience. However, we encourage you to apply. We regularly review applications and reach out to candidates as new opportunities arise. Feel free to reapply as you gain more experience, but we kindly ask that you avoid applying more than once every six months. You may also notice postings for specific roles related to particular projects or teams, in which case you are welcome to apply directly in addition to this evergreen role.

What You Will Do
- Collaborate with researchers and engineers to transition cutting-edge AI models into production.
- Partner with research teams to ensure high-performance inference for innovative architectures.
- Design and implement new techniques, tools, and architectures that enhance performance, latency, throughput, and efficiency.
- Optimize our codebase and computing resources (e.g., GPUs) to maximize hardware FLOPs, bandwidth, and memory usage.
- Extend orchestration frameworks (e.g., Kubernetes, Ray, SLURM) for distributed inference, evaluation, and large-batch serving.
- Establish standards for reliability, observability, and reproducibility throughout the inference stack.
- Publish and share insights through internal documentation, open-source libraries, or technical reports that further the field of scalable AI infrastructure.
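One lever behind the throughput and large-batch serving work described above is batching: amortizing fixed per-launch overhead across many requests. A toy cost model makes the effect concrete; the overhead and per-request figures are invented for illustration and stand in for kernel-launch and scheduling costs on real hardware:

```python
import math

def time_to_serve(n_requests: int, batch_size: int,
                  overhead_ms: float = 10.0, per_request_ms: float = 1.0) -> float:
    """Total serving time under a simple model: each batch pays a fixed
    overhead, and each request pays a per-request compute cost."""
    batches = math.ceil(n_requests / batch_size)
    return batches * overhead_ms + n_requests * per_request_ms

unbatched = time_to_serve(100, 1)    # 100 batches: 100*10 + 100*1 ms
batched   = time_to_serve(100, 25)   # 4 batches:   4*10  + 100*1 ms
```

Under this model, batching 25 requests at a time cuts total time from 1100 ms to 140 ms; the per-request compute is unchanged, so the entire win comes from paying the fixed overhead 4 times instead of 100.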

Nov 27, 2025
Numeral
Full-time|On-site|HQ - San Francisco, CA

About Numeral
At Numeral, we are revolutionizing the tax compliance landscape for digital businesses. Navigating a complex web of regulations across 47 states and over 50 countries can be overwhelming, and we are here to lift that burden, allowing teams to concentrate on their core missions. As the largest and fastest-growing AI-native tax solution, Numeral was founded in 2023 and has successfully secured over $57M in funding from notable investors such as Benchmark, Mayfield, and Y-Combinator. With more than 2,500 paying customers, our revenue has more than tripled each year since inception.

Your Role
- Advanced Tax Filings: Own and manage complex and specialty tax filings, translating intricate requirements into effective operational processes.
- Product Operations: Serve as the vital link between our operations team and engineering, identifying opportunities to automate and optimize filing operations within our product.
- Data & Analytics: Craft SQL queries to derive insights, manipulate datasets, conduct data audits, and ensure exceptional accuracy in data pipelines.
- AI-Enhanced Work: Leverage tools like Claude Code to enhance problem-solving, automate workflows, and boost your operational efficiency.
- General Problem-Solving: Tackle any challenges that arise, addressing unoptimized processes today so that our software can automate them tomorrow.

What We're Looking For
- Experience: 1 to 3 years in an analytical, operations, data, or product-related role.
- Technical Skills: Proficient in SQL, with a strong ability to manage and interpret complex datasets.
- AI Enthusiast: Experience with (or a keen interest in using) AI tools like Claude Code to enhance productivity.
- Resourceful Innovator: Thrive in a high-autonomy, early-stage startup environment and be ready to advance our goals every single day.
- Location: Must be based in or willing to relocate to San Francisco.

Benefits
- Competitive salary and equity: share in the success of Numeral.
- Flexible working arrangements and comprehensive health benefits.

Mar 17, 2026
Thinking Machines Lab
Full-time|$350K/yr - $475K/yr|On-site|San Francisco, California

Thinking Machines Lab brings together scientists, engineers, and innovators who have contributed to well-known AI products such as ChatGPT, Character.ai, and open-weight models like Mistral. The team's open-source projects include PyTorch, OpenAI Gym, Fairseq, and Segment Anything. Their mission centers on advancing collaborative general intelligence and making AI tools accessible for a wide range of users and goals. The Tinker platform offers a fine-tuning API that lets researchers and developers tailor advanced AI models to their needs. By handling the underlying infrastructure, Tinker enables users to train open-weight models with custom data, algorithms, and objectives. As demand grows, the team is adding new features and supporting an expanding community.

Role overview
The Full Stack Software Engineer will play a key part in building and maintaining the products and services that Tinker users depend on. This position involves working closely with frontend, backend, and infrastructure teams to deliver the Tinker console, developer tools, and essential features.

What you will do
- Develop and enhance Tinker's APIs and backend services using Python and Rust, focusing on areas like job submission, orchestration, billing, and usage tracking.
- Design and launch user interfaces, including the Tinker console and upcoming developer tools, using React and TypeScript.
- Refine the developer experience by improving SDK usability, error messages, API design, and onboarding processes.
- Work to increase system reliability, observability, and security in production, and participate in on-call rotations.
- Create internal tools that help research and infrastructure teams work more efficiently.

Location
This role is based in San Francisco, California.

Apr 28, 2026
Apply
Thinking Machines Lab
Full-time|$350K/yr - $475K/yr|On-site|San Francisco

Thinking Machines Lab brings together scientists, engineers, and innovators who have shaped well-known AI products like ChatGPT and Character.ai, as well as open-weight models such as Mistral. The team also contributes to open-source projects including PyTorch, OpenAI Gym, Fairseq, and Segment Anything. The company’s mission centers on advancing collaborative general intelligence, aiming to make AI accessible and adaptable to individual needs. Tinker, the company’s fine-tuning API, enables researchers and developers to customize advanced AI models using their own data and algorithms. Thinking Machines manages the infrastructure, giving users the flexibility to train open-weight models while focusing on their unique requirements. As Tinker expands, the platform continues to evolve alongside its growing community.

Role overview

The Site Reliability Engineer will focus on improving the reliability and resilience of the Tinker platform. This role involves close collaboration with platform engineers and research teams to strengthen every layer of the system, from infrastructure to user-facing services.

What you will do

Define and own end-to-end reliability, including CI/CD workflows, production observability, and incident response processes.
Set Service Level Objectives for distributed training systems, balancing reliability, scheduling latency, and development speed.
Design and implement monitoring and observability across the training pipeline.
Manage incident response for Tinker, ensuring prompt recovery, thorough incident analysis, and systematic improvements to prevent recurrence.
Enhance multi-tenant isolation and resource scheduling to support LoRA-based workload co-scheduling, maintaining both reliability and data separation.
Collaborate with security teams to identify and address production vulnerabilities.

This position is based in San Francisco.

Apr 28, 2026
Apply
Thinking Machines Lab
Full-time|$350K/yr - $475K/yr|On-site|San Francisco

Thinking Machines Lab brings together scientists, engineers, and innovators who have contributed to well-known AI products such as ChatGPT and Character.ai, and to open-source frameworks like PyTorch, OpenAI Gym, Fairseq, and Segment Anything. The team's mission centers on advancing collaborative general intelligence, aiming to make AI accessible for people to address their own needs and ambitions. The Tinker platform offers a fine-tuning API that lets researchers and developers tailor advanced AI models to their specific requirements. Tinker provides the infrastructure, while users maintain the flexibility to train open-weight models with their own data and algorithms. As Tinker grows its features and user base, the team is expanding to support the platform's evolution.

Role overview

This Full Stack Software Engineer role focuses on designing, building, and maintaining the products and services that Tinker users rely on. The work covers frontend, backend, and infrastructure, with an emphasis on the Tinker console, developer tools, and meeting the changing needs of the Tinker community.

What you will do

Develop and improve Tinker’s APIs and backend services using Python and Rust, including systems for job submission, orchestration, billing, and usage tracking.
Build user-facing interfaces such as the Tinker console and future developer tools with React and TypeScript.
Enhance the developer experience by refining SDK usability, error messages, API design, and onboarding workflows.
Increase system reliability, observability, and security in Tinker’s production environment, and participate in on-call rotations.
Create internal tools to support the research and infrastructure teams working on Tinker.

This position is based in San Francisco.

Apr 27, 2026
Apply
Thinking Machines Lab
Full-time|$200K/yr - $250K/yr|On-site|San Francisco, CA

At Thinking Machines Lab, we are on a mission to enhance humanity through the advancement of collaborative general intelligence. Our vision is a future where everyone has the opportunity to leverage AI tailored to their individual needs and aspirations. Our team comprises scientists, engineers, and innovators who have developed some of the most renowned AI products in the industry, such as ChatGPT and Character.ai, as well as open-weight models like Mistral and popular open-source projects including PyTorch, OpenAI Gym, Fairseq, and Segment Anything.

About the Role

We are seeking an Executive Business Partner to provide vital support to several technical leaders from our San Francisco office. Your role will be crucial in keeping our team focused and organized by managing personal logistics and handling tasks that might otherwise be overlooked. This position is unique, requiring creativity and flexibility to adapt to varied work styles and the dynamic challenges of a fast-paced startup environment. You will enjoy significant autonomy in decision-making without extensive supervision.

What You’ll Do

Manage calendars, schedule meetings, and coordinate travel for 3-4 technical leaders.
Act as the primary liaison between your supported leaders and other departments within the company.
Assist with recruiting coordination efforts.
Monitor projects and commitments to ensure nothing is overlooked.

Mar 19, 2026
Apply
Thinking Machines Lab
Full-time|$190K/yr - $300K/yr|On-site|San Francisco, California

At Thinking Machines Lab, our mission is to empower humanity by advancing collaborative general intelligence. We envision a future where everyone has access to the knowledge and tools necessary to leverage AI for their unique goals. Our team consists of scientists, engineers, and builders who have developed some of the most widely used AI products, such as ChatGPT and Character.ai, alongside open-weight models like Mistral and popular open-source projects including PyTorch, OpenAI Gym, Fairseq, and Segment Anything.

HR Business Partner

The HR Business Partner role is essential in empowering our team to thrive as we scale. You will be pivotal in coaching our leaders and designing people systems that align with our mission. As the HR Business Partner, you will lead leadership coaching and the design of performance management systems that foster growth and collaboration. You will support managers in enhancing team dynamics and personal development while building a scalable people infrastructure that includes performance feedback systems, compensation structures, and career frameworks.

What You’ll Do

Coach managers by observing their leadership styles, identifying strengths and areas for growth, and promoting continuous improvement.
Advise leadership on organizational strategies, including team structure, succession planning, and strategic people decisions that influence our operational effectiveness.
Develop compensation frameworks that attract top-tier machine learning talent while ensuring alignment with our core values and principles.
Create career progression frameworks tailored to a research environment, where growth often transcends traditional management roles and contributions such as mentorship and expertise are valued.
Establish feedback and evaluation mechanisms that prioritize personal improvement over mere assessment.

Feb 2, 2026
Apply
Thinking Machines Lab
Full-time|$350K/yr - $475K/yr|On-site|San Francisco

At Thinking Machines Lab, our mission is to empower humanity by advancing collaborative general intelligence. We're dedicated to crafting a future where everyone can harness the power of AI to meet their unique needs and aspirations. Our team comprises scientists, engineers, and innovators who have developed some of the most widely used AI products, including ChatGPT and Character.ai, as well as open-weight models like Mistral and renowned open-source projects such as PyTorch, OpenAI Gym, Fairseq, and Segment Anything.

About the Role

We are seeking a talented Infrastructure Research Engineer to architect and develop the foundational systems that enable scalable, efficient training of large models with reinforcement learning. This position sits at the crossroads of research and large-scale systems engineering, requiring someone who understands both the algorithms behind reinforcement learning and the practicalities of distributed training and inference at scale. Your responsibilities will range from optimizing rollout and reward pipelines to improving the reliability, observability, and orchestration of systems. You will collaborate closely with researchers and infrastructure teams to keep reinforcement learning stable, fast, and production-ready.

Note: This is an evergreen role that remains open for ongoing expressions of interest. Due to the high volume of applications we receive, there may not always be an immediate opening that aligns with your skills and experience, but we encourage you to apply; we continuously review applications and reach out to candidates as new opportunities arise. You may reapply after gaining more experience, but please refrain from applying more than once every six months. You may also notice postings for specific roles that address unique project or team needs; in those cases, you are welcome to apply to them directly alongside this evergreen role.

What You’ll Do

Design, implement, and optimize the infrastructure that supports large-scale reinforcement learning and post-training workloads.
Enhance the reliability and scalability of the RL training pipeline, including distributed RL workloads and training throughput.
Create shared monitoring and observability tools to ensure high uptime, debuggability, and reproducibility of RL systems.
Work closely with researchers to translate algorithmic concepts into production-quality training pipelines.
Develop evaluation and benchmarking infrastructure to assess model performance on helpfulness, safety, and factual accuracy.
Publish and share insights through internal documentation, open-source libraries, or technical reports that advance scalable AI infrastructure.

Nov 27, 2025
Apply
Thinking Machines Lab
Full-time|$175K/yr - $300K/yr|On-site|San Francisco, California

Thinking Machines Lab brings together scientists, engineers, and innovators with a track record of developing widely used AI products and open-source projects. The team has contributed to tools like ChatGPT, Character.ai, Mistral, PyTorch, OpenAI Gym, Fairseq, and Segment Anything. The company’s mission centers on advancing collaborative general intelligence to help people achieve more with AI tailored to their needs. Tinker, the company’s fine-tuning API, enables researchers and developers to adapt advanced AI models to their own data and algorithms. By handling the infrastructure, Tinker allows users to focus on customization, opening up capabilities that were once limited to a few specialized labs. As Tinker’s customer base and feature set grow, the team is focused on building a scalable platform and supporting an expanding community.

Role overview

The GTM Strategy & Operations Lead will build and refine the commercial structure for Tinker, designing strategies and processes that turn organic product adoption into a consistent, scalable revenue stream. The role involves shaping how Tinker’s fine-tuning capabilities are packaged, priced, launched, and sold across different customer segments, in close collaboration with the product, engineering, and research teams. Tinker is designed for technically sophisticated users, so the GTM lead must be comfortable discussing training infrastructure and understand how developers evaluate and adopt new tools.

What you will do

Develop and execute commercialization strategies for Tinker, including pricing, packaging, and launch plans based on market and competitor analysis.
Create go-to-market approaches tailored to different types of customers.
Manage partnerships to expand Tinker’s reach and open new channels for demand.
Design and oversee customer pilots, onboarding, and expansion playbooks to move accounts from testing to production use.
Produce commercial playbooks that help customer-facing engineers and FDEs position and sell Tinker effectively.
Set and track success metrics for launches and GTM projects, running experiments to test assumptions about pricing and product packaging.

Apr 27, 2026
