Software Engineer Enhancing Network Performance Reliability jobs in San Francisco – Browse 5,607 openings on RoboApply Jobs

Open roles matching “Software Engineer Enhancing Network Performance Reliability” with location signals for San Francisco. 5,607 active listings on RoboApply Jobs.

1 - 20 of 5,607 Jobs
Cloudflare, Inc.
Full-time|Hybrid|Hybrid

Join Cloudflare as a Software Engineer dedicated to enhancing our network performance and reliability. In this dynamic role, you will collaborate with cross-functional teams to develop innovative software solutions that optimize our network infrastructure and ensure high availability and performance for our users. Your contributions will directly impact millions of users worldwide, making the internet a safer place for everyone.

Mar 11, 2026
Cloudflare, Inc.
Full-time|Hybrid|Hybrid

Join Cloudflare as a Senior Software Engineer specializing in Network Performance & Reliability! In this role, you'll be at the forefront of enhancing the performance and stability of our global network, ensuring our customers benefit from unparalleled speed and reliability. You'll collaborate with experts across various teams to design and implement innovative solutions that optimize network operations.

Mar 11, 2026
OpenAI
Full-time|On-site|San Francisco

Become a vital part of the engineering teams that responsibly bring OpenAI’s transformative technologies to the world!
At OpenAI, our Applied Engineering team collaborates across research, engineering, product management, and design to deliver AI solutions to both consumers and businesses. We are committed to learning from our deployments, maximizing the benefits of AI, and ensuring that this powerful technology is utilized both safely and ethically. Our priority is safety over unchecked growth.
About the Role
As OpenAI continues to expand, we are seeking seasoned engineers who excel in problem-solving to enhance the scalability of our systems. Our achievements hinge on our ability to rapidly iterate on product development while ensuring optimal performance and reliability. You will thrive in a collaborative, fast-paced environment, playing a key role in delivering our technology to millions globally, with a focus on safety and reliability. As a reliability engineer, you will lead efforts to maintain and improve the stability, scalability, and performance of our dynamic infrastructure. You will collaborate closely with cross-functional teams, including software engineers, product managers, and data scientists, to construct and sustain robust systems capable of accommodating our growing user base and workload.
Your Responsibilities Include:
Designing and implementing solutions to scale our infrastructure to meet increasing demands effectively.
Developing and maintaining load, chaos, and synthetic testing software that enhances the reliability of systems designed by development teams.
Creating and managing automation tools to streamline repetitive tasks and bolster system reliability.
Overseeing the lifecycle management platform for CPU/storage, GPU, and network resources to foster efficiency and support dynamic optimization.
Implementing fault-tolerant and resilient design patterns to minimize service interruptions.
Establishing and maintaining service level objectives (SLOs) and service level indicators (SLIs) to ensure system reliability.
Collaborating with researchers, engineers, product managers, and designers to introduce new features and research advancements to the world.
Participating in an on-call rotation to address critical incidents and ensure 24/7 system availability.
Your Impact: Your contributions will be essential in guaranteeing the reliability and performance of our platforms as we continue to scale our operations.

Oct 17, 2025
Checkr, Inc.
Full-time|Remote|Denver, Colorado, United States; San Francisco, California, United States

Join Checkr as a Software Engineer focusing on Reliability, where your contributions will enhance our platform's robustness and performance. You will be part of a dynamic team dedicated to building and scaling systems that support our growth and ensure outstanding service delivery to our clients.

Mar 13, 2026
Meter Inc.
Full-time|On-site|San Francisco

Role overview
Meter Inc. is developing tools to capture and preserve the expertise of network engineers. The team’s goal is to build systems that document how experts diagnose network issues, making it possible for future models to manage networks with less manual effort. This work will help Meter support many customer networks while reducing the need for direct engineer intervention.
What makes this work unique
Network engineering lacks the structured archives found in software development. While Git and GitHub record software decisions, the reasoning behind network troubleshooting often disappears once a problem is fixed. This role centers on building a structured, searchable system for network operations, a kind of GitHub for network engineering. The system will capture network state, expert observations, and the logic behind key decisions.
Your first 90 days
First 30 days: Meet with network engineers to learn their workflows. Study what effective diagnostic documentation looks like and identify the necessary data. Review telemetry (ClickHouse), configurations (Postgres), and support history (Salesforce).
By 60 days: Deliver a working annotation interface. Network engineers should be able to review past support tickets, view the network’s state during incidents, and record their reasoning. The tool should be practical and encourage regular use.
By 90 days: Network engineers will be able to create training data independently. Initial model benchmarks from your pipeline will be live, showing how your work improves the process.
Technical stack
TypeScript, React, Go, GraphQL, Kafka, Postgres
Collaboration
This role works closely with Meter’s co-founder and CEO, who will help guide the product roadmap and set priorities.
Location
This position is based in San Francisco.

Apr 22, 2026
Sierra
Full-time|On-site|San Francisco, CA

About Us
At Sierra, we are pioneering a transformative platform that empowers businesses to forge authentic customer experiences through AI technology. Headquartered in the vibrant city of San Francisco, we also boast a dynamic presence in Atlanta, New York, London, France, Singapore, and Japan.
Our operations are anchored in core values that shape our culture: Trust, Customer Obsession, Craftsmanship, Intensity, and Family. These principles guide our actions and are integral to our mission.
Our visionary founders, Bret Taylor and Clay Bavor, bring unparalleled expertise. Bret, currently the Board Chair of OpenAI, previously co-led Salesforce and served as CTO at Facebook, while Clay led numerous initiatives at Google, including AR/VR projects and Google Workspace.
Your Role
In your capacity as a Software Engineer on the Site Reliability team, you will play a crucial role in establishing and enhancing the reliability, observability, and scalability of Sierra’s AI-centric infrastructure. Collaborating closely with our engineering and product teams, your goal is to ensure our systems remain highly available, efficient, and primed for growth.
Lead the development of Sierra’s observability stack (monitoring, alerting, logging, and tracing) to provide engineers with critical insights into system health and performance.
Collaborate with product and platform engineers to architect systems that prioritize reliability and scalability from the outset, not as an afterthought.
Design and implement robust, scalable, and secure cloud infrastructure on AWS, employing Terraform and cutting-edge DevOps tools.
Enhance the reliability and scalability of our LLM deployments, ensuring they operate efficiently and cost-effectively.
Drive improvements in deployment pipelines, CI/CD tooling, and incident management processes to minimize downtime and accelerate response times.
Define and cultivate SRE practices within Sierra, shaping culture, tooling, and best practices across the engineering organization.
Qualifications
Bachelor's degree in Computer Science or a related field, or equivalent experience.
Proven experience in Site Reliability Engineering or a similar role, with a strong understanding of cloud infrastructure (AWS).
Proficiency in Terraform and modern DevOps practices.
Experience with observability tools and techniques: monitoring, alerting, logging, and tracing.
Strong problem-solving skills with a focus on scalability and performance optimization.
Excellent collaboration and communication skills, with the ability to work effectively in a team environment.

Oct 21, 2025
Fastly, Inc.
Full-time|$181.2K/yr - $217.5K/yr|On-site|Denver, CO; San Francisco, CA

At Fastly, we empower individuals to connect more effectively with the things they cherish. Our cutting-edge edge cloud platform enables customers to swiftly, securely, and reliably craft exceptional digital experiences by processing, serving, and safeguarding their applications as close to their end-users as possible, right at the edge of the Internet. Tailored for modern internet demands, our platform is programmable and supports agile software development. We proudly serve many of the world's leading companies, including GitHub, Yelp, Paramount, and JetBlue.
Join us in our mission to build a more trustworthy Internet.
Posting Open Date: Feb. 25, 2026
Anticipated Posting Close Date*: March 25, 2026
*Please note that this job posting may close early depending on the volume of applications.
Role Overview:
The Data Reliability team is seeking an experienced Senior Software Engineer to contribute to the development and support of next-generation data storage solutions at Fastly. The ideal candidate will possess expertise in backend and data services within cloud environments, proficiency with configuration and orchestration tools such as Terraform and Kubernetes, and the ability to create internal administration tools using Go and related technologies. Our team plays a vital role in ensuring the infrastructure, orchestration, and reliability of Fastly's most data-intensive applications, utilizing technologies like Terraform, Elasticsearch, ClickHouse, Prometheus, MySQL, and Redis across both cloud and hardware platforms. Your contributions will directly enhance our customers' success by providing product teams with a robust platform for efficient and consistent delivery of high-quality, high-throughput, globally distributed data systems and products. We embrace a distributed work model and value both collaborative and asynchronous communication styles.
Key Responsibilities:
Deploy, support, and maintain various critical data storage systems, scaling from gigabytes to petabytes.
Develop statistics and dashboards to track service-level objectives for these systems.
Create and manage tools for configuration, backup, and authenticated access to data systems employing peer review, CI/CD, and both daemon- and container-based deployment strategies.
Write high-performance, maintainable, and concise code, actively participating in code reviews to enhance the codebase.

Mar 20, 2026
OpenAI
Full-time|On-site|San Francisco

OpenAI’s networking teams design and manage high-performance systems that support the company’s training and inference infrastructure. As a Software Engineer focused on productivity, this position centers on making those teams more effective by improving the developer experience and streamlining complex workflows.
Role overview
This role supports engineers working on intricate infrastructure, with a focus on build systems, testing architecture, release pipelines, and overall development efficiency. The work involves optimizing how engineers build, test, validate, and deploy changes in environments that span multiple servers and interact closely with hardware.
What you will do
Improve development workflows for engineers building and operating networking systems at OpenAI.
Design and refine pipelines for continuous deployment, release, and validation.
Create and maintain test harnesses for multi-server, networked, and hardware-backed environments.
Increase iteration speed across codebases, particularly in C++, Python, and environments centered on build systems.
Work with engineers to identify and resolve pain points in CI, testing, debugging, and deployment processes.
Lead testing and reliability strategies for infrastructure components that support large-scale training and inference workloads.
Collaborate with both centralized developer experience teams and networking engineers who work directly with these systems.

Apr 28, 2026
Sigma Computing
Full-time|$170K/yr - $240K/yr|On-site|San Francisco, CA

About the Role
Sigma Computing is growing its engineering team in San Francisco, CA. The company builds technology to help users access data with ease. As a Senior Software Engineer focused on Observability and Reliability, you will work alongside engineers who value high standards and collaboration.
What You Will Do
Design and build observability platforms and tools, including metrics collection, logging, distributed tracing, dashboards, alerting, and application performance monitoring.
Work with technologies such as Go, OpenTelemetry, and Kubernetes to solve reliability challenges.
Take part in on-call rotations to help maintain strong uptime for Sigma’s services.
Create tools and processes to improve cloud incident triage and reduce downtime.
Define and promote practices that make systems and services measurable and observable.
Join design and code reviews with peers and stakeholders to reinforce quality and effective collaboration.

Apr 25, 2026
Databricks
Full-time|$166K/yr - $225K/yr|On-site|San Francisco, California

P-97 At Databricks, we are dedicated to empowering data teams to tackle some of the most challenging problems in the world. We achieve this by creating and managing a leading data and AI infrastructure platform that enables our clients to leverage deep data insights for business enhancement. Our commitment to pushing the limits of data and AI technology is matched by our focus on resilience, security, and scalability, which are essential for our customers' success on our platform. Databricks operates one of the largest-scale software platforms, comprising millions of virtual machines that generate terabytes of logs and process exabytes of data daily. Given our scale, we frequently encounter cloud hardware, network, and operating system faults, and our software must adeptly protect our customers from these issues. As a Senior Performance Engineer, you will collaborate with various teams throughout the organization to assess product and feature performance, pinpoint performance bottlenecks, and partner with engineers to address performance and scalability challenges. This includes setting performance goals for different software releases, guiding teams in developing performance benchmarks, conducting competitive benchmark analyses for various Databricks products, and performing in-depth analyses to identify and resolve performance issues.

Jan 30, 2026
Cloudflare, Inc.
Full-time|On-site|In-Office

Join Cloudflare as a Senior Software Engineer in our Network Development team, where you will design and implement innovative network solutions that enhance our global infrastructure. This role offers the opportunity to work with cutting-edge technologies and collaborate with talented engineers to drive significant improvements in network performance and reliability.

Mar 10, 2026
Canva
Full-time|On-site|San Francisco

Join our talented team at Canva as a Senior Software Engineer specializing in Video Performance. We are looking for an innovative and solutions-oriented engineer who is passionate about optimizing video experiences for our users. In this role, you will collaborate with cross-functional teams to enhance performance, develop new features, and implement best practices in video engineering.

Mar 16, 2026
Baseten
Full-time|On-site|San Francisco

ABOUT BASETEN
Baseten is at the forefront of AI technology, empowering leading-edge companies like Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma, and Writer to seamlessly integrate advanced AI models into their operations. Our unique blend of applied AI research, adaptable infrastructure, and intuitive developer tools enables innovators to bring their most ambitious AI products to life. With our recent $300M Series E funding from top-tier investors such as BOND, IVP, Spark Capital, Greylock, and Conviction, we are poised for rapid growth. Join us in shaping the platform that engineers rely on to deploy transformative AI solutions.
THE ROLE
Are you driven by a passion for enhancing artificial intelligence applications? We are seeking a proactive Software Engineer specializing in ML performance to join our energetic team. This position is perfect for backend engineers who thrive in a fast-paced startup environment and are eager to make substantial contributions to the realm of Large Language Model (LLM) Inference. If you're enthusiastic about optimizing open-source ML models, we can't wait to hear from you!
EXAMPLE INITIATIVES
As a member of our Model Performance team, you will have the opportunity to work on exciting projects, including:
Baseten Embeddings Inference: The quickest embeddings solution available
The Baseten Inference Stack
Driving model performance optimization
RESPONSIBILITIES
Develop, refine, and implement advanced techniques (quantization, speculative decoding, kv cache reuse, chunked prefill, and LoRA) for ML model inference and infrastructure.
Conduct thorough investigations into the codebases of TensorRT, PyTorch, TensorRT-LLM, vllm, sglang, CUDA, and other libraries to troubleshoot and resolve ML performance issues.
Scale and apply optimization techniques across a diverse array of ML models, with a focus on large language models.

Mar 28, 2024
Astranis
Full-time|$130K/yr - $175K/yr|On-site|San Francisco

Astranis is at the forefront of satellite technology, creating cutting-edge satellites for high orbits that extend humanity's reach into the solar system. Our satellites are dedicated to providing secure networks for sophisticated clients worldwide, including large corporations, government entities, and the US military. With five satellites already in orbit and a robust pipeline of upcoming launches, we are addressing a commercial contract backlog exceeding $1 billion.
Astranis stands out as the preferred satellite communication partner for clients with critical needs for uptime, data security, network visibility, and tailored solutions. We have successfully raised over $750 million from top-tier investors like Andreessen Horowitz, Blackrock, and Fidelity, and we proudly employ a talented team of 450 engineers and innovators at our expansive 153,000 sq. ft. headquarters in Northern California, USA.
Embedded Software Engineer - Network Software
As an Embedded Software Engineer on the Network Software team at Astranis, you will play a pivotal role in designing, developing, and implementing the flight software that commands and monitors our spacecraft. You will also have the opportunity to mentor junior engineers and collaborate cross-functionally to ensure the compatibility of the software with various hardware platforms.
This position supports both commercial and US Government projects.
Role
Develop software for the Flight Computer and Earth Stations Hub for our satellites.
Create firmware for control boards on the satellites and Earth Stations Hub.
Enhance software reliability through requirements generation, continuous integration, automated testing, and code reviews.
Collaborate with multiple teams including Payload, Communications Ground Control, and Telemetry.
Utilize C++ to interface with the Flight computer and engage with custom hardware on the vehicle.
Deliver complex projects through an incremental development approach.
Requirements
2-4+ years of experience in software development using modern C++ or C.
Familiarity with software development methodologies and design patterns.
Experience with communication protocols such as TCP/IP, I2C, and SPI.
Strong communication and writing skills.
A passion for space technology.
Experience in building high availability and high reliability systems.

Mar 17, 2026
OpenAI
Full-time|On-site|San Francisco

Role overview
This Software Engineer position at OpenAI focuses on inference and performance optimization. Based in San Francisco, the role centers on increasing the speed and efficiency of advanced AI systems. Collaboration with experienced engineers is a key part of the work, with an emphasis on refining AI performance.
What you will do
Work on optimizing the performance of AI inference systems
Collaborate with other engineers to improve efficiency and speed
Contribute to solutions that enhance AI system capabilities
Location
This role is based in San Francisco.

Apr 25, 2026
OpenAI
Full-time|On-site|San Francisco

OpenAI is seeking a Software Engineer in San Francisco to focus on improving productivity by optimizing model performance. This position centers on developing solutions that make machine learning models more efficient and effective.
Role overview
This role involves working closely with teams across different functions to identify and address areas where model performance can be improved. The aim is to deliver changes that have a measurable impact on both systems and workflows.
What you will do
Collaborate with engineers and other specialists to enhance model efficiency
Develop and implement solutions that improve the effectiveness of machine learning systems
Contribute to projects that streamline processes and drive productivity gains
Impact
Your work will help shape improvements in how models operate and how teams at OpenAI achieve their goals. The changes you help deliver will support more effective use of resources and better outcomes for the organization.

Apr 29, 2026
Crusoe Technologies
Full-time|On-site|San Francisco, CA - US

Join Crusoe Technologies as a Principal Software Engineer specializing in Software-Defined Networking (SDN). In this pivotal role, you will lead the design and development of innovative networking solutions that leverage SDN technologies. You will work closely with cross-functional teams to enhance our networking capabilities and drive the future of our products.
As a thought leader in SDN, you will be responsible for architecting scalable solutions, optimizing performance, and ensuring robust security across our network infrastructure. This is an exciting opportunity to impact the rapidly evolving tech landscape.

Mar 12, 2026
Zipline
Full-time|$170K/yr - $210K/yr|On-site|South San Francisco, California, USA

Software Engineer, Delivery Network Platform
Join Zipline, where we are revolutionizing logistics with an autonomous delivery network. As part of the Delivery Network Platform team, you will develop the foundational systems that enable aircraft, sites, and infrastructure to operate seamlessly in live delivery scenarios. Your work will involve creating software solutions that provide operators with real-time insights and control, designing orchestration systems that manage fleet movements, and developing validation platforms to ensure the network's reliability as it scales.
Your Responsibilities
You will be responsible for software systems that are pivotal to fleet operations, including:
Network Operating Center software for real-time visibility and interventions across aircraft, sites, missions, weather, and demand.
Fleet orchestration systems for assignment, routing, scheduling, and rebalancing tasks.
Maintenance and asset health systems linking issue detection to service readiness.
Simulation and validation platforms to assess topology, load, and policy changes prior to production.
Platform interfaces and configurable control planes that empower other teams to safely extend the network.
Tackling Complex Challenges
Unlike typical software roles focused on digital experiences, this position plays a critical role in managing a live autonomous logistics network. You'll address challenges such as:
Maintaining an accurate real-time view of aircraft and essential site assets across the network.
Ensuring the network remains operational amidst shifting demand, changing weather conditions, infrastructure issues, or capacity constraints.
Creating user-friendly operator control interfaces that facilitate quick and accurate decision-making under pressure.
Simulating potential future network behaviors to mitigate risks before they impact production.
These systems directly affect operational performance. You will own significant components of the platform, make critical technical and product decisions, and have a substantial impact on the network's effectiveness.
Team Dynamics
Our team operates with a strong emphasis on ownership, trust, and high technical standards. Engineers are expected to identify significant problems, develop a clear vision for system functionality, and drive solutions from conception to production. Additionally, we encourage engineers to leverage AI tools to enhance exploration, implementation, and debugging processes while upholding strong engineering principles, judgment, and accountability.

Mar 23, 2026
OpenAI
Full-time|On-site|San Francisco

About Our Team
Join our dynamic Infrastructure organization at OpenAI, where we are actively seeking talented software engineers to bolster our efforts across several high-impact teams. With a variety of focus areas available (including Core Distributed Systems, Databases, Observability, and Cloud Infrastructure), you'll have the opportunity to work on projects that fascinate you. Our teams operate with a high level of autonomy and foster a deeply collaborative environment, all dedicated to enhancing safety, reliability, and operational velocity across the organization.
About the Role
As a Software Engineer focused on Infrastructure Reliability, you will play a pivotal role in scaling and fortifying the infrastructure that supports some of the world’s most widely utilized AI systems. Your work will ensure that our systems maintain high reliability, observability, performance, and security, enabling researchers to iterate rapidly and allowing products like ChatGPT and the OpenAI API to effectively serve millions of users.
This hands-on, impactful role is perfect for engineers who enjoy ownership, excel at solving complex technical challenges across the entire stack, and wish to contribute to systems that facilitate cutting-edge research deployed on a global scale. You will significantly influence technical direction, enhance system resilience, and collaborate closely with infrastructure, product, and research teams to transform intricate infrastructure into dependable platforms.
Key Responsibilities
Design, construct, and maintain reliable, high-performance systems utilized across engineering.
Identify and resolve performance bottlenecks and inefficiencies, ensuring our infrastructure scales appropriately.
Investigate and troubleshoot complex issues thoroughly.
Enhance automation to minimize manual tasks and improve internal developer tools.
Participate in incident response, postmortem analysis, and the development of best practices surrounding system reliability and scalability.
Ideal Candidate Profile
Possess a deep understanding of distributed systems principles, with a proven track record in developing and managing scalable, reliable systems.
Demonstrate a strong focus on performance and optimization, with the ability to maximize efficiency in complex, globally distributed systems.
Have experience managing orchestration systems such as Kubernetes at scale and creating abstractions over cloud platforms.
Be comfortable working within Linux environments and possess strong problem-solving skills.

Mar 19, 2026
Crusoe
Full-time|On-site|San Francisco, CA - US

Join Crusoe as a Staff Software Engineer specializing in Networking. In this critical role, you will design and implement innovative software solutions that enhance our networking infrastructure. You will collaborate with cross-functional teams to optimize performance and reliability, ensuring that our services run efficiently and securely.

Mar 25, 2026
