Engineering Leader For Platform And Infrastructure jobs in San Francisco – Browse 5,602 openings on RoboApply Jobs

Open roles matching “Engineering Leader For Platform And Infrastructure” with location signals for San Francisco. 5,602 active listings on RoboApply Jobs.

Peregrine Technologies
Full-time|$282K/yr - $363K/yr|On-site|San Francisco, CA

Supported by premier investors from Silicon Valley, Peregrine Technologies empowers public safety organizations, government entities, federal agencies, and private institutions to tackle societal challenges with unmatched speed and precision. Our AI-driven platform transforms isolated and unconnected data into actionable operational intelligence, swiftly surfacing critical information that enables better, faster decision-making, thereby enhancing outcomes at every interaction. Currently, Peregrine serves hundreds of clients across more than 30 states and two countries, impacting over 125 million individuals, and we are poised to extend our influence into enterprise sectors and globally.

Our Team

As a cohesive engineering unit, we firmly believe that empathy enhances our solutions. Observing how users interact with our products is pivotal in guiding us toward the right solutions. Engineers will have the opportunity to collaborate closely with our onsite team to grasp the diverse use cases that Peregrine addresses.

We are on the lookout for an Engineering Manager to join our core engineering teams. You will collaborate cross-functionally with design and product management to develop robust, scalable, and user-centered systems. Our teams face a range of challenges, from enabling real-time collaboration on detailed maps to constructing high-scale backend architectures capable of processing billions of data points.

We value both ownership and collaboration: you will take full responsibility for significant features while working closely with fellow engineers to drive projects to fruition. We hold that humility and empathy are vital for crafting the right solutions, so you will engage directly with our deployment team and users as we iterate to tackle their challenges. Creativity and perseverance are essential in realizing our vision.

Role

This position is central to the strategic execution of Peregrine's platform. You will define how our core systems scale, perform, and evolve as Peregrine continues its rapid growth and strengthens its impact across public safety, government, and enterprise sectors.

As a senior platform leader, your role transcends mere system management; you will establish the technical direction, build your team, and create the operational framework that empowers every product team at Peregrine to progress with speed, safety, and assurance. Your contributions will directly influence system reliability and performance.

Mar 18, 2026
mlabs
Full-time|$185K/yr - $400K/yr|On-site|San Francisco, California, United States

Join Our Team as an Infrastructure & Platform Engineer

We are seeking a talented Infrastructure & Platform Engineer to join our dynamic team at mlabs in San Francisco. As a rapidly growing technology company, we are at the cutting edge of the crypto derivatives market, an industry that generates tens of billions in annual revenue. Our exchange is one of the fastest-growing platforms for crypto derivatives, and we are committed to enhancing our offerings to meet the evolving needs of our users.

Your mission will be to develop the next critical feature: Multi-Asset Margin, which will streamline how users post collateral directly on-chain, thus improving trading efficiency. You will work alongside our Infrastructure & Platform team, focusing on designing and managing our high-performance systems that deliver exceptional speed and reliability.

Key Responsibilities:
- Design and implement robust scripts and services that ensure optimal performance in real-time environments.
- Manage and deploy computing resources and containers for tailored services and integrations.
- Automate scaling, load balancing, and congestion control for both compute and database layers.
- Establish and maintain CI/CD pipelines for streamlined deployments and continuous delivery.
- Monitor and optimize system performance across multiple metrics to enhance throughput and resilience.
- Develop and maintain indexing and explorer services for fast, real-time data access.
- Provision and optimize diverse database systems, including time-series, relational, key-value, and in-memory databases.

Mar 5, 2026
Benchling
Full-time|On-site|San Francisco, CA

Join Benchling as an Infrastructure Engineering Leader and play a pivotal role in shaping our engineering practices. We are seeking an innovative leader to oversee our infrastructure development, driving efficiency and scalability. This position offers an exciting opportunity to work with cutting-edge technologies and lead a talented team in a dynamic environment.

Mar 2, 2026
Fable Security
Full-time|$160K/yr - $225K/yr|Hybrid|San Francisco, CA (Hybrid)

About Fable Security

In today’s digital landscape, AI-driven threats and human errors represent the most significant risks to enterprise security. Cybercriminals exploit human behavior, contributing to 70% of security breaches. At Fable, we empower individuals to transform from potential targets to active defenders with innovative tools.

Fable is at the forefront of human risk management, offering a platform that effectively influences employee behavior. Our user-friendly, scalable solution analyzes complex employee data, identifies high-risk behaviors, and delivers timely interventions directly to users in their work environment.

Supported by notable investors like Redpoint Ventures and Greylock Partners, and founded by former members of the Abnormal Security team, Fable is tackling one of cybersecurity's greatest challenges in a rapidly expanding market. Our team comprises alumni from esteemed organizations such as Meta, Twitter, and Flexport, as well as top universities including Waterloo, Columbia, and Stanford. This is an exceptional opportunity for you to join us at a time of rapid growth and help shape the future of security.

Why Join Us
- Build and scale the foundational data infrastructure that drives a groundbreaking product.
- Collaborate closely with engineering, data science, and product teams to operationalize data at scale.
- Become part of a small, high-caliber team where your contributions will have a significant impact.

As part of an early-stage company, every engineer plays a crucial role in shaping the evolution of our products and the company's approach to data management.

Your Role

As a Platform and Infrastructure Engineer, you will be instrumental in developing and scaling the core systems that underpin Fable’s product and data operations.

Your responsibilities will span backend systems including real-time services and data pipelines. You will ensure reliability, scalability, and optimal performance across all layers. This highly collaborative role involves working closely with data and ML teams, contributing to systems that effectively manage data ingestion, processing, and delivery.

This role demands cross-functional collaboration with engineering, data, and product teams to create robust, production-grade systems that grow alongside the company.

Responsibilities
- Design, develop, and maintain scalable backend and infrastructure systems.
- Collaborate with cross-functional teams to deliver high-quality software solutions.
- Ensure system reliability, performance, and security through rigorous testing and monitoring.

Apr 6, 2026
Decagon
Full-time|On-site|San Francisco

Decagon seeks an Engineering Manager to lead its Platform Infrastructure team in San Francisco. This position shapes the technical foundation behind Decagon’s scalable applications, focusing on both performance and reliability. The role involves hands-on leadership and a commitment to building infrastructure that supports the company’s growth.

Role overview

This manager will oversee a group of engineers dedicated to platform infrastructure. The team’s work underpins the systems that allow Decagon’s products to scale smoothly and operate dependably.

What you will do
- Guide and support engineers working on key infrastructure projects
- Direct the development and maintenance of systems that power Decagon’s applications
- Encourage solutions that boost platform performance and reliability

Apr 23, 2026
Full-time|On-site|San Francisco

About the Role

Join the innovative team at Known as an Infrastructure and Platform Engineer, where you will take the lead in managing and enhancing our core infrastructure and platform systems. Your work will be crucial in powering AI-driven matching, voice, and scheduling functionalities. You will be responsible for everything from cloud infrastructure and data orchestration to performance monitoring and model deployment support, designing and scaling systems that ensure Known operates swiftly, reliably, and securely.

In this pivotal role, you will collaborate closely with the founding team, comprising experts in AI/ML, product development, and design, to establish Known’s technical foundation. You will play a key role in shaping our architecture, engineering culture, and best practices right from the start. This position is perfect for a practical builder who thrives in early-stage environments and is passionate about taking projects from concept to production.

Oct 9, 2025
worldlabs
Full-time|Remote|San Francisco

Join our innovative team at worldlabs as a Platform Engineer specializing in Developer Infrastructure. In this pivotal role, you will enhance and maintain the tools and platforms that empower our developers to build efficient and scalable applications. You will work collaboratively across teams to ensure that our infrastructure meets the needs of our growing organization.

Apr 10, 2026
Baton
Full-time|$162K/yr - $216K/yr|Hybrid|San Francisco, California, United States

Who We Are

Baton is Ryder’s innovative product development division dedicated to leveraging cutting-edge technologies to transform the transportation and logistics landscape. Managing over $10 billion in freight, our technology has a significant impact across the U.S. economy.

We are committed to creating and delivering software that not only meets but exceeds the needs of Ryder and its 50,000+ clients, which include some of the most recognized brands globally. Our projects range from user-centric applications to the robust data platform that will drive the future of Ryder’s innovations.

Baton’s mission: To enable a supply chain that operates on autopilot.

Since Ryder’s acquisition of Baton in 2022, we have been operating with the agility of a startup while benefiting from the extensive reach of a Fortune 500 company. If you're passionate about tackling intricate challenges and making a real impact in the backbone of the American economy, you’ll thrive with us.

Role: Software Engineer - Infrastructure
Department: Data Platform
Location: Hayes Valley, San Francisco, CA

Dec 2, 2025
Hilbert
Full-time|On-site|San Francisco

Join Hilbert, a pioneering scalable data science platform driving growth for B2C teams with predictive insights into user behavior and revenue sources. Our innovative technology condenses decision-making timelines from months to mere minutes.

Trusted by Fortune 10 companies and cherished brands like FreshDirect, Blank Street, and Levain Bakery, Hilbert empowers operators to enhance their growth strategies. We are also collaborating with leading AI firms to co-develop advanced solutions.

We seek an innovative Head of Engineering to spearhead our technical division. You will manage two key departments: Data Engineering and Software/Platform Engineering, growing a team of 15-20 exceptional engineers by the end of the year. This is not a maintenance position; you will cultivate a high-performance system capable of handling 40+ simultaneous mid-market integrations while deploying tailored enterprise-grade AI environments.

While you won't be writing all the pull requests, your expertise must shine during high-stakes enterprise architecture discussions, positioning you as the technological ambassador for Hilbert to our major clients.

Feb 26, 2026
Discord Inc.
Full-time|$196K/yr - $220.5K/yr|On-site|San Francisco Bay Area

At Discord, we connect over 200 million users monthly for diverse experiences, with gaming being the predominant activity. Our platform supports more than 90% of our users in enjoying games, collectively logging 1.5 billion hours each month across various titles. As we shape the future of gaming, our mission is to enhance interactions before, during, and after gaming sessions.

The Platform Infrastructure teams are pivotal in constructing and upholding the essential systems that energize Discord's core functionalities. We manage systems that process hundreds of thousands of requests per second and handle tens of billions of transactions daily, enabling seamless connections for millions of users. By developing foundational platform components, we empower internal developers to deploy new features swiftly and securely, ensuring Discord remains reliable, efficient, and scalable.

As a Senior Software Engineer on our team, you will play a crucial role in continuously refining our codebase, processes, and infrastructure, directly impacting user interactions on Discord!

Feb 3, 2026
Plasmidsaurus
Full-time|On-site|San Francisco

Plasmidsaurus helps scientists worldwide by streamlining sequencing. Researchers from leading institutions and companies rely on this platform daily. With a global network of labs, the company delivers fast, affordable sequencing results, and has recently expanded into RNA-seq to broaden its genomics reach. The team is focused on building a universal sequencing platform designed for efficiency and global scale.

Role overview

The Lead Engineer for AI Infrastructure in Platform Engineering sets both technical direction and management strategy for the company’s compute, data, AI, and security infrastructure. This position oversees the entire sequencing operation, from laboratory devices to data delivery.

What you will do
- Oversee core services that coordinate laboratory devices, including robots, sequencers, and on-premises Linux servers, as well as the data ingestion pipeline.
- Develop cloud infrastructure and data pipelines for storing, processing, and delivering terabytes of sequencing data.
- Design systems to manage millions of bioinformatics tasks, handling queue management, workflow orchestration, and scheduling.
- Build AI infrastructure and internal tools to support autonomous systems, including:
  - Quality Scientist Agents: Monitor operations, detect anomalies, and escalate quality or reliability concerns.
  - Logistics Agents: Coordinate global transportation of samples to labs and carriers.
  - Bioinformatics Coding Agents: Run adaptive analyses on varied sample types with different data distributions.

Culture

The team values initiative and a strong sense of ownership. High agency and responsibility shape how work gets done at Plasmidsaurus.

Apr 28, 2026
OpenAI
Full-time|Hybrid|San Francisco

About Our Team

The Scaling team at OpenAI is dedicated to designing, constructing, and managing essential infrastructure that powers groundbreaking research.

Our mission is straightforward: to expedite the advancement of research towards Artificial General Intelligence (AGI). We achieve this by developing foundational systems that researchers depend on, spanning from core infrastructure elements to specialized applications tailored for research. Our systems are designed to scale efficiently with the growing complexity and size of our workloads while ensuring reliability and user-friendliness.

About the Position

We are seeking a Senior Software Engineer to take charge of critical production infrastructure from start to finish.

This role primarily focuses on backend and systems engineering, with a strong emphasis on low-level performance, distributed systems, and the hands-on management of vital services at scale. You will be responsible for transforming ambiguous challenges into actionable plans, delivering pragmatic solutions promptly, and refining them based on real-world feedback and iterations.

This position goes beyond a standard Python backend role; we are specifically on the lookout for candidates with robust systems experience in Rust or C++, particularly in performance-sensitive infrastructure.

This is an in-office role based in San Francisco, CA, following a hybrid model of three days in the office per week. We also provide relocation assistance for new hires.

Your Responsibilities
- Manage critical infrastructure throughout its lifecycle, including design, implementation, deployment, operation, and ongoing improvements.
- Develop and maintain high-performance backend systems in Rust or C++ that facilitate core research operations.
- Design and optimize distributed data and serving systems, considering partitioning, replication, consistency, retries, backpressure, and failure isolation.
- Identify and resolve production bottlenecks related to latency, throughput, contention, hot spots, and overload scenarios.
- Oversee mission-critical services, including on-call duties, incident management, postmortems, observability, deployment safety, and zero-downtime migrations.
- Enhance the reliability of services running on Kubernetes, focusing on resource tuning and failure management.
- Collaborate closely with engineers and researchers to deliver fast, dependable, and effective systems.
- Elevate standards through strong technical judgment, ownership, and commitment to quality.

You Will Excel in This Role If You Have:
- A proven track record of owning and delivering operationally critical systems end to end in ambiguous settings.
- Experience with systems programming in Rust or C++.
- Strong analytical skills and a problem-solving mindset.
- Excellent communication and collaboration skills.

Apr 11, 2026
OpenAI
Full-time|On-site|San Francisco

About the Team

The Scaling team at OpenAI is dedicated to designing, constructing, and managing essential infrastructure that propels research forward. Our mission is straightforward: to expedite the advancement of research toward Artificial General Intelligence (AGI). We achieve this by developing foundational systems that our researchers depend on, which range from fundamental infrastructure components to tailored applications for research. These systems are designed to scale with the growing complexity and volume of our workloads while maintaining reliability and user-friendliness.

About the Role

We are in search of a skilled Site Reliability Engineer to take ownership of our production-critical infrastructure from start to finish. This role focuses on managing data-intensive, low-latency workloads, particularly involving large-scale ClickHouse clusters, high-throughput Kafka pipelines, and dependable integrations with Snowflake. You will transform unclear operational challenges into actionable plans, deliver practical solutions swiftly, and refine them based on production feedback and iterations.

The ideal candidate will have the ability to independently establish and elevate operational standards across teams while remaining actively engaged with production systems.

Key Responsibilities
- Oversee the lifecycle management of infrastructure, including provisioning, upgrades, scaling, and decommissioning with an Infrastructure as Code (IaC) approach.
- Manage and scale ClickHouse clusters, focusing on sharding, replication, capacity planning, performance tuning, and maintenance.
- Operate Kafka as the data ingestion backbone, enhancing throughput, lag management, backpressure handling, and failure recovery.
- Enhance end-to-end latency and reliability for data-heavy serving and querying workloads.
- Develop and sustain robust monitoring and alerting systems: SLIs/SLOs, dashboards, alert policies, and actionable runbooks.
- Establish, implement, and continuously refine incident response protocols, on-call practices, and postmortem evaluations.
- Manage backup/restore and disaster recovery strategies, including regular recovery drills.
- Plan and execute safe rollouts across various environments (development, staging, production), including canary deployments and rollback strategies.
- Collaborate daily with software engineers to embed reliability within design, implementation, and release processes.
- Set the benchmark for operational readiness and runbook standards, driving their adoption across teams.
- Enhance CI/CD pipelines and developer experience for improved speed and safety.

Apr 28, 2026
Coframe
Full-time|On-site|SF Bay Area

Role Overview

Coframe is hiring a Platform Engineer in the SF Bay Area. This role focuses on building and improving the foundation of our engineering systems. The work ranges from applying AI to streamline productivity and deployment, to strengthening monitoring, security, and cost management.

Impact

This is a high-impact position with significant responsibility. The solutions built here will support every team at Coframe and help shape how we develop software in the future.

Apr 20, 2026
Full-time|On-site|San Francisco Office

Slash Financial develops business banking infrastructure tailored for real operational demands. Since its founding in 2021, the company has processed over $10 billion in annual business transactions across several industries. With $100M in Series C funding from investors including Ribbit Capital, Khosla Ventures, Goodwater Capital, NEA, and Y Combinator, Slash continues to grow its product offerings and market presence. The San Francisco headquarters fosters a collaborative, in-person work environment.

Role overview

The Senior Infrastructure/Platform Engineer will take a hands-on role in scaling and strengthening the core platform behind Slash’s banking products. This position involves designing, building, and maintaining infrastructure to support rapid company growth and high transaction volumes. The work spans AWS cloud operations, Terraform, Kubernetes, and related systems. This engineer will help shape infrastructure strategy and make key decisions on performance, observability, security, and deployment.

What you will do
- Lead the development of next-generation database, real-time workflow, and container orchestration infrastructure.
- Scale and enhance the Kubernetes (EKS) platform, CockroachDB clusters, Kafka (MSK), Temporal workflows, and ElastiCache Redis.
- Collaborate with engineering teams to establish and scale best practices using AWS ALB, WAF, Route 53, OpenSearch, S3, and Vercel.
- Create and maintain abstractions in Terraform and Pulumi to streamline architecture and assist product teams.
- Improve the speed and reliability of CI/CD pipelines.
- Tackle complex scaling, performance, and low-latency challenges within a monolithic architecture.

Location

This role is based in the San Francisco office and supports an in-person work culture.

Apr 22, 2026
Anything
Full-time|On-site|San Francisco

About Anything

Anything is a pioneering AI product engineering company, empowering the next generation of entrepreneurs. Our innovative AI agent transforms English into fully functional applications, encapsulating everything needed to monetize online ventures, including mobile solutions, web interfaces, design, AI capabilities, backend services, infrastructure, and payment systems. Since our launch on August 7th, we have achieved $5 million in revenue and are rapidly expanding. Discover more at anything.com.

Role Overview

What You Will Do

We are looking for individuals eager to accelerate their growth and make a significant impact. In this role, you will develop systems that support millions of applications and billions of users, addressing the challenges that arise in a high-demand environment. You will design and maintain the runtime, control plane, and isolation boundaries essential for safely executing user-generated applications at scale.

Your innovative solutions will utilize platform telemetry, execution data, and feedback loops to enhance code generation and application performance, all powered by our AI-centric platform.

You will take ownership of key components of the platform from architecture and implementation to operational production and iteration.

Operational Responsibilities
- Design and manage multi-tenant cloud infrastructure, focusing on isolation, deployment, observability, and cost control for customer applications.
- Ensure top-tier reliability and performance for our platform.
- Conduct research to inform decisions regarding technology choices and service providers.
- Collaborate closely with product teams to develop platform features that drive product innovation.
- Stay informed about the latest advancements in infrastructure research and development.

Successful platform management requires composure under pressure. We value self-assurance coupled with curiosity and a commitment to evidence-based decision-making.

Key Performance Metrics

Your effectiveness will be evaluated based on:

1. Runtime Infrastructure
Develop and oversee scalable, low-latency infrastructure for user applications.

2. Platform Reliability
You will ensure the platform's uptime and reliability, preventing failures from affecting multiple customers. Our users expect high availability and rapid issue resolution.

3. Platform Support for Product Features
You will create the platform features essential to support our product roadmap, ensuring seamless integration and performance.

Dec 15, 2025
Scale AI
Full-time|$179.4K/yr - $224.3K/yr|On-site|San Francisco, CA; New York, NY

In a world where software is rapidly evolving, artificial intelligence (AI) is at the forefront, transforming how we interact with technology. At Scale AI, we recognize the immense potential of AI to enhance human capabilities, offering personalized support across various aspects of life, from coaching and tutoring to shopping and travel guidance. As enterprises, startups, and governments rush to integrate large language models (LLMs) into their operations, it is crucial to ensure these systems are safe, aligned, and effective. This involves rigorous human evaluation and reinforcement learning from human feedback (RLHF) during all stages of model development.

Our innovative products, including the Generative AI Data Engine, SGP, and Donovan, are designed to empower the most advanced LLMs and generative models globally. By leveraging world-class RLHF, human data generation, model evaluation, safety, and alignment, we are shaping the future of human-AI interaction.

As a member of our Platform Engineering team, you will play a pivotal role in designing and developing the foundational platforms that support Scale's operations. Your responsibilities will include architecting our core cloud infrastructure, enhancing our data lifecycle, and transforming the software development process for engineers at Scale. You will gain invaluable insights into the AI landscape as it develops within diverse sectors.

Mar 26, 2026
Platform Engineer

Stitch Fix, Inc.

Full-time|$88.1K/yr - $147K/yr|Remote|Remote, USA

About Stitch Fix, Inc.

At Stitch Fix (NASDAQ: SFIX), we're transforming the way people shop for clothing. As the premier online personal styling service, we help clients uncover styles that fit them perfectly, ensuring they always look and feel their best. Our unique approach combines expert stylists with advanced AI and recommendation algorithms, offering a curated selection of exclusive and national brands tailored to individual tastes. Founded in 2011 and headquartered in San Francisco, we streamline the shopping experience, allowing clients to express their personal style without the hassle of traditional shopping.

About the Team

The Platform Engineering team is dedicated to developing and maintaining the tools that empower our Application and Service Developers at Stitch Fix. We prioritize creating low-friction tools and practices that enhance developer happiness, as we believe that satisfied developers produce superior code. Our practices include Infrastructure-as-Code, Immutable Infrastructure, and the use of Containers, alongside leveraging AWS and various Cloud SaaS solutions.

About the Role

As a Platform Engineer, you will play a vital role in building and enhancing Stitch Fix's cloud-native infrastructure and internal developer tools. Your focus will be on developing tools and automations that simplify the deployment, operation, and debugging of services, while you gain exposure to modern platform engineering practices alongside seasoned professionals. This position is ideal for engineers passionate about enhancing developer experience and eager to expand their expertise in cloud infrastructure and CI/CD systems.

You'll be excited about this opportunity because you will…
- Contribute to the design and evolution of our internal platform-as-a-service utilized by application and service developers.
- Develop and uphold tools that enhance developer workflows, improve deployment reliability, and boost daily productivity.
- Work collaboratively with platform and application engineers to pinpoint friction points and apply incremental improvements.
- Learn and implement best practices surrounding Infrastructure-as-Code, containerized workloads, and CI/CD pipelines.
- Utilize or be eager to adopt AI-assisted development tools to increase productivity, and assist in exploring and integrating LLM-powered solutions for automating internal support and operational tasks.

Apr 10, 2026
Full-time|On-site|San Francisco

ABOUT UNTO LABS

At Unto Labs, we are a team of visionary engineers committed to pushing the boundaries of distributed systems and developing cutting-edge blockchain technology on standard hardware. Our focus lies in creating core system primitives and high-performance architectures that will shape the future of digital assets and markets.

ABOUT THE ROLE

We are seeking a skilled Platform/DevOps Engineer to take ownership of the infrastructure utilized by our engineering team. You will be responsible for managing CI/CD pipelines, deployment environments, observability, and developer tools. Our infrastructure is rapidly evolving, encompassing distributed node architectures across multiple cloud providers, self-managed CI runners, and a variety of deployment targets. Your contributions will enhance the reliability, performance, and cost-effectiveness of our systems.

This is a high-impact position within a small, elite team. Your efforts will empower every engineer to work quickly and minimize disruptions. You can expect to collaborate directly with systems engineers and product teams to build and fortify the platform layer for next-generation blockchain systems.

RESPONSIBILITIES
- Take charge of optimizing our CI/CD pipelines to ensure reliability, cost efficiency, and effective spin-down strategies to manage compute costs.
- Consolidate and streamline deployment infrastructure across Cloudflare, Google Cloud Run, and legacy providers, minimizing redundancy and fragility.
- Implement and maintain observability through structured logging (OpenTelemetry/HyperDX), alerting (PagerDuty/Slack), and dashboards that monitor node health, transaction flow, and network performance.
- Facilitate smoother developer onboarding by creating reproducible local development setups, improving secrets management, and maintaining clear documentation.
- Support network operations, manage chain state, automate backups, and maintain high uptime standards.
- Assess and deploy infrastructure tooling, including Infrastructure as Code (IaC), container orchestration, and cloud cost monitoring.

REQUIREMENTS
- Minimum of 3 years of experience in platform engineering, DevOps, SRE, or similar infrastructure roles, preferably in high-growth or startup environments.
- Proficient in CI/CD systems, including self-hosted runners and cost tracking.
- Extensive knowledge of Google Cloud Platform (GCP) (including Cloud Run, Compute Engine, GCS, IAM) or equivalent AWS services.
- Experienced with Docker/containerization and related technologies.

Mar 30, 2026
Brain Co.
Full-time|On-site|San Francisco Bay Area

About Brain Co.

At Brain Co., we are at the forefront of artificial intelligence, developing innovative systems that facilitate mission-critical operations for some of the world's leading institutions. Our cutting-edge platform operates in high-security, high-stakes environments, where reliability, performance, and robust engineering practices are paramount.

As an AI Platform Engineer specializing in Infrastructure, you will be instrumental in building and scaling the foundational platform that supports AI systems used in essential sectors, including government, energy, and healthcare. You will work within dynamic environments that span both cloud and on-premises settings, directly influencing our platform's reliability and performance, ensuring we meet the high standards required by our clients.

This role is pivotal within our Infrastructure/Platform team. You will collaborate closely with engineering, AI/ML, and product teams to design scalable architectures, enhance our environments, optimize deployment processes, and guarantee the robustness necessary for enterprise and sovereign applications.

Sep 29, 2025
