Founding Cloud Infrastructure Engineer
Experience Level
Senior
Qualifications
About Zaimler
Zaimler is a pioneering technology company focused on creating a new infrastructure layer for AI agents, ensuring that enterprise data is comprehensible and actionable. Our founders bring extensive experience from leading roles at major tech companies, and we are committed to revolutionizing how businesses leverage their data.
Similar jobs
About Zaimler
Zaimler is at the forefront of transforming the way enterprise data is utilized in the era of AI. Our mission is to eliminate the fragmentation of data across disparate systems, providing AI agents with the contextual understanding necessary to operate efficiently and effectively. We are pioneering an innovative infrastructure layer that will redefine the capabilities of autonomous agents, enabling real-time inference through advanced knowledge graphs.
Founded by industry veterans Biswajit Das and Sofus Macskassy, Zaimler is a seed-stage startup focused on delivering cutting-edge solutions to major enterprises across various sectors, including insurance, travel, and technology. If you're passionate about building the foundational infrastructure for the next generation of AI, we invite you to join our small, experienced team.
Your Role
As our Founding Cloud Infrastructure Engineer, you will take the lead in designing, constructing, and managing the cloud infrastructure that underpins Zaimler’s semantic platform. This is not a maintenance position; it’s an opportunity to create a robust system from the ground up that will shape the future of our operations.
Join Notable as a Staff Software Engineer specializing in Cloud Infrastructure and Applications. In this pivotal role, you will lead the design, development, and implementation of scalable cloud solutions that drive our innovative projects forward. Collaborate with cross-functional teams to optimize application performance and enhance user experience.
Skydio builds autonomous drones for a wide range of users, from utility inspectors and first responders to military personnel in the field. Based in San Mateo, California, Skydio combines artificial intelligence expertise with advanced hardware and software development, always focused on customer needs.
About the Cloud Infrastructure Team
The Cloud infrastructure group keeps Skydio’s platform available whenever and wherever it’s needed, whether for routine inspections or urgent disaster response. With thousands of drones deployed worldwide, the team continually improves how infrastructure is delivered and updated.
Role Overview
The Infrastructure Software Engineer manages and evolves Skydio’s Kubernetes fleet, making key software changes to support new and changing requirements. This hybrid role spans both infrastructure and software, offering the chance to shape product architecture, security, and performance. The position suits someone who enjoys working across the stack and tackling a mix of challenges.
What You’ll Do
Redesign and maintain a growing Kubernetes fleet and its supporting systems.
Improve and expand the continuous delivery pipeline for Skydio’s products.
Work with teams from hardware to cloud to introduce new platform features.
Partner with security experts to strengthen data and drone protection measures.
Introduce cost-saving strategies early in the product lifecycle to support long-term growth.
What We’re Looking For
At least 2 years of experience in infrastructure or software engineering.
Hands-on knowledge of Kubernetes and cloud platforms.
Strong analytical and problem-solving skills, with a collaborative approach.
Drive for innovation and a high standard of quality in your work.
Location: San Mateo, California, United States
Skydio, a premier drone manufacturer based in the United States, stands at the forefront of autonomous flight technology, paving the way for the future of drones and aerial mobility. Our diverse team merges profound expertise in artificial intelligence with top-tier hardware and software development, operational excellence, and a relentless focus on customer satisfaction. We empower a wide array of drone users, from utility inspectors to first responders and military personnel, to leverage our cutting-edge technology in various scenarios.
About the Team:
The Skydio Cloud Infrastructure team is dedicated to ensuring the Skydio Cloud platform is consistently available to our users at critical moments, whether conducting routine inspections or supporting rescue missions during emergencies. With a global fleet of thousands of drones, we are committed to continuous improvement, emphasizing robust delivery and testing pipelines as vital components of our operations.
About the Role:
As a Senior Infrastructure Engineer focused on an innovative product, you will play a pivotal role in maintaining our Kubernetes fleet and enhancing the core product software to meet evolving use cases. This position blends software engineering and infrastructure management, allowing you to address product deficiencies directly rather than solely relying on automation. We seek a professional who thrives on the autonomy to influence architecture, security, and functionality across the entire stack.
Your Impact:
Re-engineer and sustain the expanding requirements of our Kubernetes fleet and its underlying infrastructure.
Enhance and broaden the continuous delivery processes for our product.
Collaborate across teams (hardware to cloud) to introduce new capabilities to the platform.
Engage directly with security teams to refine practices and controls that safeguard our customers' data and drones.
Lead cost-saving initiatives early in the product lifecycle to ensure scalability.
Generalist
About the Role
This position is pivotal in overseeing infrastructure across our entire tech stack. If it exists in the cloud, it falls under your purview. In the world of robotics, data is essential, and we require robust, scalable infrastructure to manage, store, and process vast amounts of this data. The APIs, services, and monitoring systems you will manage are critical to our operations.
Your Responsibilities Include:
Managing compute resources (both CPU and GPU) to efficiently process petabytes of data at high throughput.
Overseeing the infrastructure required for data processing and storage.
Ensuring the security and integrity of our infrastructure and data.
You Will Excel in This Role If You Have:
A minimum of 5 years of experience in managing large-scale cloud infrastructure using tools such as Kubernetes and Terraform, with a primary focus on Python services.
Deep understanding of AWS services (or their equivalents) and their permission models.
Strong perspectives on the effective use of coding agents within an infrastructure context.
Genesis Therapeutics
At Genesis Therapeutics, we are on a mission to revolutionize drug discovery by harnessing the power of machine learning, biophysical simulation, and computational chemistry. We are assembling a top-tier computational team and seek a passionate Infrastructure Engineer to contribute to the development of innovative medicines while playing a pivotal role in enhancing our AI platform.
Your Responsibilities
Collaborate with the infrastructure team to sustain and expand our multi-cloud compute infrastructure that underpins ML model training, computational chemistry research, and ongoing drug discovery initiatives.
Develop configuration and procedures for monitoring, resource allocation, and deployment automation to scale our autoscaling compute clusters for larger workloads.
Enhance our orchestration scheduling framework to boost execution throughput, reliability, and compute utilization across diverse pipelines.
Your Qualifications
A minimum of 5 years of experience in building and maintaining large-scale cloud infrastructure, preferably in AWS or GCP.
Strong proficiency in Python, Bash, Terraform, Ray, and Kubernetes.
Experience in constructing and maintaining compute clusters for distributed ML training jobs utilizing 1,000+ GPUs is highly desirable.
Hands-on experience with physical hardware and datacenter management is a plus.
What We Offer
An opportunity to work on impactful infrastructure that accelerates the discovery of new medicines.
Join a world-class, close-knit team of dedicated professionals across software, machine learning, computational chemistry, medicinal chemistry, and biology.
Competitive salary and equity, along with comprehensive medical, dental, and vision insurance, and a 401(k) program.
Genesis Therapeutics
At Genesis Therapeutics, we are at the forefront of revolutionizing drug discovery by harnessing the power of machine learning, biophysical simulations, and computational chemistry. We are actively seeking a passionate Infrastructure Engineer to join our elite computational team. In this role, you will contribute to the development of groundbreaking medicines and play a pivotal part in the expansion of our advanced AI platform.
Your Role:
Collaborate with our infrastructure team to enhance and maintain our extensive multi-cloud compute infrastructure, which is vital for ML model training, computational chemistry research, and drug discovery initiatives.
Develop and implement configurations and procedures for monitoring, resource allocation, and deployment automation to adapt to the growing demands of our autoscaling compute clusters.
Contribute to the orchestration scheduling framework to boost execution throughput, increase reliability, and optimize compute utilization across diverse pipelines.
Your Qualifications:
Minimum of 5 years of experience in building and maintaining scalable cloud infrastructure, particularly with AWS or GCP.
Proficient in Python, Bash, Terraform, Ray, and Kubernetes.
Experience with distributed ML training jobs on compute clusters with over 1,000 GPUs is highly desirable.
Hands-on experience with physical hardware and data center management is a plus.
What We Offer:
An opportunity to engage with impactful infrastructure that accelerates the discovery of new medicines.
Be part of a world-class, close-knit team of dedicated professionals across software, machine learning, computational chemistry, medicinal chemistry, and biology.
Competitive salary and equity options, alongside comprehensive medical, dental, and vision coverage, plus a 401(k) retirement plan.
About Us
At Verkada, we are revolutionizing the way organizations safeguard their people and property through an integrated, AI-powered platform. As a frontrunner in cloud physical security, Verkada empowers more than 30,000 organizations globally, including over 100 Fortune 500 companies, to enhance their safety and operational efficiency via a unified software platform that offers solutions for video surveillance, access management, air quality monitoring, alarms, intercoms, and visitor management.
Founded in 2016, Verkada has experienced rapid growth, boasting 15 offices and a dedicated team of over 2,200 employees.
The Role
Join our innovative cloud infrastructure team, where you will play a crucial role in designing, building, and maintaining highly scalable, reliable systems that power Verkada’s services. You will have the chance to work on exciting projects such as scaling microservice clusters, automating serverless deployments, adopting a full service mesh, and enhancing system observability. Take charge of a subdomain and lead collaborative efforts across teams.
This position requires your presence at our headquarters located in San Mateo, CA, as we are dedicated to fostering a vibrant in-office culture.
At Skydio, we pride ourselves on being the leading drone company in the US and the worldwide authority in autonomous flight technology, a critical element for the future of aerial mobility. Our dedicated team integrates extensive knowledge in artificial intelligence with premier hardware and software development practices, operational excellence, and a strong commitment to customer satisfaction. We empower a diverse range of drone users, including utility inspectors, first responders, and military personnel, delivering innovative solutions that cater to their unique needs.
About the Role:
In this role, you will be pivotal in transforming how we deliver essential information swiftly to our customers. Our clients require secure access to our cloud services every moment of every day. Our guiding principle is to ensure that enhanced security, comprehensive compliance, and adaptable deployment models do not compromise the user experience. We aim to provide our clients with robust security and flexibility (multi-cloud, multi-region, and government-grade compliance) without sacrificing speed or usability.
As a Senior Product Manager, you will take ownership of and drive the Cloud Platform & Infrastructure roadmap, enabling our multi-cloud and government-compliant platform capabilities to support Skydio's global expansion across federal, state, local, and commercial sectors. Additionally, you will manage Skydio's on-premise product offerings by transforming air-gapped and self-hosted deployments into scalable, secure, and impactful solutions.
This senior individual-contributor position focuses on converting field and sales challenges into a well-defined, customer-centric roadmap while leading cross-functional initiatives that bring these solutions to market. You will collaborate closely with Site Reliability Engineering (SRE), cloud engineering, security & compliance, legal, go-to-market (GTM) teams, and executive leadership to establish platform foundations that foster growth while ensuring the seamless and efficient experience our customers rely on.
Location
This position is based on-site five days a week at our headquarters in San Mateo, CA. Relocation assistance is available if needed.
How You'll Make an Impact:
Customer & Sales Engagement: Conduct discovery sessions with government and commercial clients to uncover their needs, build a strong business case for investment, validate proposed solutions in real-world scenarios, and enhance business growth by providing platform capabilities with clear delivery expectations.
Own the Platform Roadmap: Collaborate with sales, customer success, deployment engineers, and clients to transform recurring pain points into actionable platform advancements.
About Us
At Verkada, we are revolutionizing the way organizations safeguard their personnel and properties through an integrated, AI-driven platform. As a frontrunner in cloud-based physical security, we empower over 30,000 organizations worldwide, including more than 100 Fortune 500 companies, to enhance safety and operational efficiency via a single, connected software solution. Our offerings encompass video security, access control, air quality sensors, alarms, intercoms, and visitor management. Established in 2016, Verkada has experienced remarkable growth, now boasting 15 offices and a dedicated workforce of over 2,200 employees.
Role Overview
The Verkada Security Team, primarily composed of software engineers, is dedicated to establishing optimal software security practices. In this pivotal role, you will enhance Verkada’s security throughout the software development lifecycle (SDLC) by utilizing automation, libraries, tools, and frameworks. Your responsibilities will span various technology stacks and involve collaborating with engineering teams across Verkada’s Command platform.
Zaimler
About Zaimler
In a world where AI agents struggle to reason over fragmented data, Zaimler emerges as the solution. Our mission is to unify disparate enterprise data across countless systems, providing a shared context, meaning, and structure. This transformation is essential as we transition from traditional copilots to fully autonomous agents, necessitating a new infrastructure layer that we are dedicated to building.
At Zaimler, we are pioneering context infrastructure for the agentic era: a platform that autonomously discovers domain knowledge, maps intricate relationships, and equips AI agents with the semantic understanding required for precise and scalable operations. Envision knowledge graphs that facilitate real-time inference, tailored for systems that need to reason rather than merely retrieve data.
Founded by industry veterans Biswajit Das (former VP Engineering at Truera and Chief Architect at Visa) and Sofus Macskassy (ex-Director of Engineering at LinkedIn), who notably built one of the largest knowledge graphs in production, Zaimler is a small, senior team at the seed stage, collaborating with major enterprises in sectors like insurance, travel, and technology. If you are passionate about creating the infrastructure that will support the next decade of AI advancements, we are eager to connect with you.
The Role
We are in search of a talented Data Infrastructure Engineer to establish the foundational distributed data layer that will power our semantic platform. In this role, you will be responsible for designing, building, and scaling systems that enable high-throughput data ingestion, transformation, and real-time processing.
About Us
At Verkada, we are revolutionizing the way organizations ensure the safety of their people and spaces through a cohesive, AI-driven platform. As a frontrunner in cloud-based physical security, we empower over 30,000 organizations globally, including more than 100 Fortune 500 companies, to enhance their safety and operational efficiency with a single, integrated software solution encompassing video surveillance, access control, air quality monitoring, alarms, intercoms, and visitor management.
Established in 2016, Verkada has witnessed rapid growth, boasting 15 offices and over 2,200 dedicated employees.
Position Overview
We are on the lookout for an exceptional Site Reliability Engineer to become part of our Infrastructure team. In this role, you will oversee and enhance our infrastructure, striving to optimize monitoring and scalability through the integration of third-party tools and your own innovative solutions. Your projects could involve improving cluster cost efficiency, enforcing security protocols, enhancing monitoring and alerting systems, and implementing a service mesh.
Maxima
Join Maxima as a Founding Product Manager
At Maxima, we are revolutionizing enterprise accounting with innovative integrations, user-friendly design, and cutting-edge AI automation. Our platform consolidates essential processes into one streamlined solution, empowering accounting teams to concentrate on strategic initiatives and achieve greater outcomes with fewer resources.
Led by a team of top engineers and finance experts from prestigious companies such as Robinhood, EY, Facebook, and Amazon, we are leveraging our extensive experience to reshape financial management for businesses.
Backed by prominent Silicon Valley investors, we have secured the largest seed funding in our sector, enabling us to launch a fully operational product and onboard major clients.
As the Founding Product Manager, you will steer Maxima's product strategy, roadmap, and execution for specific modules. Collaborating with our founders, engineering, and go-to-market teams, you will help define the future of enterprise accounting automation. This role is ideal for a proactive visionary who thrives in a fast-paced startup atmosphere and is eager to build impactful products from the ground up.
Join Our Team at Maxima
At Maxima, we are pioneering an innovative AI platform designed to automate enterprise accounting processes. Our solution effectively manages vast amounts of financial data, intricate accounting workflows, and guarantees robust execution that is both precise and dependable.
We are addressing some of the most challenging issues in enterprise automation, and we have attracted a stellar engineering team comprised of industry experts from renowned companies like Robinhood, Glean, Google, Netflix, and Meta. Backed by prestigious investors such as Kleiner Perkins and Redpoint Ventures, we are proud to serve leading clients like Scale AI and Rippling.
Your Role
As a Software Engineer specializing in Data Infrastructure, you will be instrumental in developing and scaling systems that efficiently ingest and process large financial datasets. This foundational platform supports all accounting workflows and the agentic system. Your responsibilities will include designing our data lakes, constructing multi-tenant relational databases, and exploring optimal search engine and vector database configurations to enhance various workflows.
Genesis Therapeutics
Join Our Innovative Team
At Genesis Therapeutics, we are a dynamic and passionate group of drug discovery experts, deep learning researchers, and software engineers dedicated to revolutionizing biochemistry through AI. Our mission is clear: to uncover and develop transformative therapies for patients with severe medical conditions.
Our AI team is at the forefront of creating foundational models for small molecule drug discovery. We conduct cutting-edge research that bridges machine learning, physics, and computational chemistry, while building resilient software systems capable of executing large-scale simulations and training advanced generative and predictive AI models utilizing our powerful cluster of thousands of GPUs and tens of thousands of CPUs.
Your Role
We are on the lookout for skilled ML infrastructure engineers to propel our machine learning research initiatives, particularly in generative modeling of molecular systems, which is vital to our overarching goals.
In this position, you will spearhead the rapid advancement of our AI platform and infrastructure, enhancing performance, efficiency, and scalability to unprecedented levels. You will construct expansive distributed training and inference pipelines, essential MLOps tools and frameworks, and fine-tune GPU operations to accelerate ML model performance.
Genesis fosters a collaborative and interdisciplinary environment, allowing you to work closely with our talented engineers, researchers, and scientists.
Your Responsibilities
Drive engineering initiatives aimed at the continuous enhancement of our AI platform, focusing on the rapid development of scalable and robust distributed infrastructures for ML training, inference, and evaluation.
Facilitate model training and deployment across various clusters and cloud environments, optimizing for throughput and cost-effectiveness.
Maximize the efficiency of ML models and other workloads in terms of latency, throughput, and memory usage, particularly through GPU performance engineering, pushing the boundaries of current hardware capabilities.
Contribute to the long-term strategic vision for Genesis’ infrastructure platform.
Your Profile
A strong engineering background with a focus on machine learning infrastructure.
Fireworks AI
About Us:
At Fireworks AI, we are at the forefront of creating next-generation generative AI infrastructure. Our cutting-edge platform is recognized for delivering the highest-quality models with unparalleled speed and scalability in inference. Independently benchmarked as a leader in LLM inference speed, we drive significant advancements through innovative projects, including our proprietary function calling and multimodal models. As a Series C company valued at $4 billion and backed by leading investors such as Benchmark, Sequoia, Lightspeed, Index, and Evantic, we are a dynamic team of builders, comprised of veterans from Meta PyTorch and Google Vertex AI.
The Role:
We are seeking a talented Software Engineer to join our AI Infrastructure team. In this pivotal role, you will contribute to designing and developing the foundational systems that power Fireworks AI’s generative AI platform. Your focus will be on building robust infrastructure and tools that guarantee the reliability, performance, quality, and availability of our AI systems.
Our mission is to establish Fireworks AI as the most dependable and user-friendly generative AI platform globally. You will collaborate closely with our cloud infrastructure, product, and performance teams to create infrastructure solutions that connect our customers with the high-performance proprietary Fireworks inference engine.
Key Responsibilities:
Design and develop scalable backend infrastructure supporting distributed training, inference, and data pipelines.
Build and maintain essential backend services, including LLM CI/CD pipelines, control planes, and model serving systems.
Enhance performance optimization, cost efficiency, and reliability across compute, storage, and networking layers.
Create frameworks and safeguards to ensure Fireworks AI maintains the highest model quality in the industry.
Work alongside performance, training, and product teams to translate research and product requirements into effective infrastructure solutions.
Engage in code reviews, technical discussions, and continuous integration and deployment processes.
About the Role
At Generalist, we are at the forefront of training expansive robot foundation models, leveraging cutting-edge GPU hardware, primarily from Nvidia, to execute distributed training tasks and experimental research. Our operations demand exceptional storage solutions and optimized data loading processes, necessitating the full utilization of cloud infrastructure alongside custom-built solutions.
In this role, you will take charge of our inference infrastructure. Our robotic systems rely on a dedicated fleet of on-premises GPUs designed for demanding real-time computations and latency-sensitive applications within resource-constrained environments.
Your Responsibilities:
Manage and optimize our GPU compute fleets.
Facilitate user-friendly access to GPUs for researchers, ensuring optimal utilization.
Enhance ML data loading, transport, and storage systems in extensively utilized distributed environments.
Oversee the orchestration of our robot inference fleets.
You May Excel in This Position If You:
Have experience managing large GPU fleets for large-scale, distributed training or inference.
Possess significant expertise in using Slurm or Kubernetes for ML workload orchestration.
Have developed high-scale ML data loaders and preparation systems.
Understand the intricacies of ML hardware, storage, and networking systems.
Are familiar with the Nvidia GPU ecosystem.
Skydio stands at the forefront of the US drone industry and is recognized globally as the leader in autonomous flight technology, a pivotal advancement for the future of drones and aerial mobility. Our team merges profound expertise in artificial intelligence with exceptional hardware and software product development, operational excellence, and an unwavering focus on customer satisfaction. We aim to empower a diverse range of drone users, from utility inspectors to first responders and military personnel.
About the Role:
Our mission is to deliver critical, life-saving information to our customers swiftly and securely. They rely on consistent engagement with our cloud services, and we are committed to ensuring that enhanced security, comprehensive compliance, and versatile deployment models do not hinder their experience. Our vision is to provide our clients with the flexibility and security they need (multi-cloud, multi-region deployments, and government-grade compliance) without compromising on speed, reliability, or user-friendliness.
We are seeking a Staff Product Manager to lead the Cloud Platform & Infrastructure roadmap, facilitating the development of multi-cloud, multi-region, and government-compliant platform capabilities essential for Skydio's global scaling across federal, state, local, and commercial sectors. Additionally, you will oversee Skydio's on-prem product line, transforming air-gapped and self-hosted deployments into scalable, secure, and impactful solutions.
In this senior individual contributor position, you will convert field and sales challenges into a well-defined, customer-centric roadmap and spearhead cross-functional initiatives to bring these solutions to market. Collaboration with Site Reliability Engineering (SRE), cloud engineering, security, compliance, legal, go-to-market teams, and executive leadership will be essential in building the foundational platform that drives growth, all while preserving the efficient, seamless experience our customers expect.
Location
This role is based onsite 5 days per week at our headquarters in San Mateo, CA. Relocation assistance is available if required.
How You'll Make an Impact:
Customer & Sales Engagement: Conduct discovery sessions with government and commercial clients to uncover the underlying needs, construct business cases for investment, validate proposed solutions in real-world scenarios, and support business growth by delivering platform capabilities with clear timelines and expectations.
Own the Platform Roadmap: Collaborate with sales, customer success teams, deployment engineers, and clients to address recurring challenges and drive product innovation.
Join the dynamic world of Roblox, where millions engage daily in exploring, creating, learning, and connecting through immersive 3D experiences crafted by a global community of developers and creators.
At Roblox, we are committed to building innovative tools and platforms that empower our community to realize their creative visions. Our mission is to transform how individuals connect, regardless of geographical boundaries, and across any device. We strive to foster connections among a billion users with positivity and respect, and we are actively seeking exceptional talent to help us achieve this goal.
A career at Roblox is an opportunity to influence the future of human interaction, tackle unique technical challenges at scale, and contribute to creating safer, more respectful shared experiences for everyone.
As a vital member of our Infrastructure Foundation Hardware Engineering team, you will lead the charge in delivering a reliable, high-performance, and cost-effective infrastructure that supports the world’s play. In this specialized role, you will act as the technical lead for our GPU and AI accelerator ecosystem, managing the entire lifecycle of GPU hardware, from architectural evaluation and firmware qualification to large-scale fleet integration and performance optimization. Your expertise will ensure that Roblox's extensive rendering and machine learning workloads operate on the most efficient and stable hardware available.
Your Responsibilities Will Include:
Architect & Prototype: Develop next-generation GPU-accelerated hardware platforms, ensuring seamless integration between high-density compute nodes, high-speed interconnects (NVLink/PCIe Gen5/6), and system firmware.
GPU Optimization: Lead the integration, performance testing, and debugging of GPUs within our fleet, focusing on hardware-level optimizations, driver tuning, and thermal/power management.
Validation & Certification: Create and implement comprehensive evaluation and stress-testing strategies for GPU-centric server platforms to meet Roblox's unique requirements for real-time rendering and low-latency AI inference.
Firmware & Systems: Spearhead firmware qualification (BIOS/BMC) and troubleshooting, along with implementing automation systems to monitor GPU health and manage firmware updates.
Vendor Collaboration: Collaborate with technology partners to enhance our GPU and AI infrastructure.
Join Roblox, where millions of users engage daily to explore, create, play, learn, and connect within 3D immersive digital environments fueled by our global community of innovators.
At Roblox, we’re committed to developing tools and a platform that empower our community to turn their creative visions into reality. Our mission is to rethink how people connect from anywhere in the world, on any device. We aim to unite a billion people with a spirit of optimism and civility, and we are on the lookout for exceptional talent to help us achieve this goal.
A role at Roblox means you will contribute to shaping the future of human interaction, address unique technical challenges at scale, and play a part in fostering safer, more respectful shared experiences for all.
Roblox is redefining how individuals come together to connect, create, and express themselves. To support our extensive scale, we leverage microservices architecture. The Application Networking team is responsible for connecting and securing these services.
Within this team, the Gateway Team acts as the "Front Door" of Roblox, overseeing critical infrastructure that manages all traffic entering Roblox (Ingress) and facilitates traffic across significant architectural boundaries.
You Will:
Steer the Gateway team towards delivering top-tier traffic management infrastructure.
Lead the design and execution of ambitious "Moonshot" initiatives.
Streamline our ingress stack and "Platformize" the gateway to enhance extensibility for developer teams.
Oversee the reliability of Tier-0 systems.
Guide engineers and team leads, ensuring sustainable on-call rotations and promoting continuous career advancement.