Senior Platform Reliability Engineer At Vizcom San Francisco jobs in San Francisco – Browse 11,927 openings on RoboApply Jobs

Senior Platform & Reliability Engineer at Vizcom | San Francisco

Vizcom | San Francisco

On-site Full-time $200K/yr - $250K/yr

Experience Level

Senior

Qualifications

To thrive in this role, you should have a strong background in platform reliability engineering with a focus on incident management, Kubernetes, and API architecture. You should also possess excellent communication skills and the ability to work collaboratively across teams.

About the job

Agency Notice: We are not currently collaborating with recruiting agencies for this role. We kindly ask that you refrain from contacting Vizcom employees regarding this position. Any resumes submitted without prior agreement will be considered unsolicited.

About Vizcom

Vizcom is a cutting-edge visual creation platform that merges advanced web tooling with AI-driven workflows. Our technology stack incorporates React/TypeScript for the front end, Node/Koa + PostGraphile for API services, PostgreSQL, Redis, BullMQ for queuing, and a Kubernetes-based production infrastructure.

We are seeking a seasoned expert to oversee platform stability and infrastructure, ensuring our system remains reliable, efficient, and resilient as we scale.

Role Mission

Take full ownership of service reliability: proactively prevent incidents, minimize impact during failures, and guide swift, high-quality recovery during production outages.

This role involves hands-on technical leadership, granting you the authority to establish reliability standards and enforce production protocols.

Compensation

Base salary between $200,000 and $250,000, plus significant equity.

Your Responsibilities

Reliability Standards: Define and uphold SLIs/SLOs/error budgets for key user interactions.
Resilience of Production Architecture: Implement failure isolation across APIs, workers, queues, and interdependencies to ensure one subsystem's failure does not disrupt core access.
Kubernetes Runtime Reliability: Establish probe contracts, deployment standards, graceful shutdown protocols, scaling/resource policies, and startup safety measures.
Queue & Job Safety (BullMQ/Redis): Contain poison-pill jobs and segregate workloads.
Incident Command Quality: Lead Sev1/Sev2 incident responses from containment to corrective actions.
Reliability Operating System: Oversee observability quality (prioritizing signal over noise), on-call efficiency, runbook maintenance, and postmortem discipline.
Deployment Safety Authority: Gate risky deployments and enforce reliability protocols whenever production health is compromised.

About Vizcom

At Vizcom, we are revolutionizing visual creation through our sophisticated platform that leverages the latest advancements in AI. Our collaborative environment encourages innovation and creativity, making us a leader in the industry.

Similar jobs

Senior Platform & Reliability Engineer at Vizcom | San Francisco

Vizcom

Full-time|$200K/yr - $250K/yr|On-site|San Francisco

Feb 24, 2026

Senior Product Engineer at Vizcom | San Francisco

Vizcom

Full-time|$195K/yr - $225K/yr|On-site|San Francisco

Agency Notice: We are not collaborating with recruiting agencies for this position. Please refrain from contacting Vizcom employees regarding this role. Any unsolicited resumes will not be acknowledged.

Why Join Vizcom?

At Vizcom, we are on a mission to revolutionize design by simplifying the journey from concept to creation, transforming impactful ideas into tangible realities.

Backed by significant funding (Series B, with $51 million raised to date), we are experiencing robust month-over-month growth and are actively expanding our core engineering team in San Francisco. Our scope is broad: real-time collaboration, GPU inference at scale, a modern TypeScript stack, and real enterprise needs.

The Role

As a Senior Product Engineer, you will own key product components in Vizcom’s real-time design environment, including our canvas rendering engine and collaborative UX systems. Your expertise in React, WebGL, and GraphQL will be essential to improving performance, interaction design, and developer experience. You will work closely with designers, AI engineers, and backend leads to ensure that Vizcom’s interface is not only visually stunning but also exceptionally fast.

In your first 90 days, you will: deliver a new or re-architected user-facing feature that improves product performance or fidelity; establish a render or collaboration performance baseline (including metrics such as frame rate, latency, and bundle size); contribute to frontend technology direction (such as render pipeline abstraction and UI state modeling); and eliminate one persistent performance or UX issue across the frontend stack.

Compensation

$195,000 - $225,000 annually + equity opportunities

Your Responsibilities

Create Engaging Interactive Features: Develop functionality that integrates design, collaboration, and 3D visualization.
Enhance Performance: Optimize React and WebGL rendering so interactions stay instantaneous under load.
Design Developer Ergonomics: Develop component architecture, hooks, and state models that scale effectively.
Collaborate with Design Teams: Rapidly prototype, test user experiences, and bring high-fidelity UX to fruition.
Boost Observability: Implement and analyze performance metrics (FPS, React Profiler, network traces).
Mentor Colleagues: Review code for clarity, structure, and long-term maintainability.

Dec 4, 2025

Senior Software Engineer – Graphics at Vizcom | San Francisco

Vizcom

Full-time|$195K/yr - $225K/yr|On-site|San Francisco

Agency Notice: We are not currently collaborating with recruitment agencies for this position. Please refrain from contacting Vizcom employees about this role. Any resumes submitted without prior agreement will be considered unsolicited.

Why Join Vizcom?

At Vizcom, we are on a mission to transform the design process by simplifying the journey from concept to creation, allowing impactful ideas to materialize in the physical world. Backed by significant funding (Series B, with $51 million raised to date), we are experiencing robust growth and are expanding our core engineering team in San Francisco. Our projects span real-time collaboration, large-scale GPU inference, and a modern TypeScript stack that serves enterprise needs.

The Role

As a Senior Software Engineer – Graphics (Rendering & Real-Time Systems), you will own the rendering and interactivity layer that powers Vizcom’s creative canvas. Your work will range from brush simulation to real-time lighting and 3D previews, pushing the boundaries of what is achievable in the browser. You will work with technologies such as WebGL, WebGPU, and AI-assisted graphics, collaborating with design and AI teams to make creativity feel instantaneous.

In your first 90 days, you will: deliver a core rendering improvement (e.g., a new brush pipeline, shader cache, or culling system) that enhances frame stability or interactivity; create a graphics profiling toolkit or debugging overlay to visualize GPU performance in production; ship an experimental graphics feature to production or beta users (for example, stylized rendering, 3D lighting, or layered compositing); and write a graphics architecture RFC clarifying design decisions and performance budgets.

Compensation

$195,000 - $225,000 annually + equity options

What You’ll Be Doing

Build and Optimize the Rendering Engine: Design and refine Vizcom’s core WebGL/WebGPU pipeline for high performance and visual fidelity.
Enhance Real-Time UX: Develop seamless brush and material interactions, animation systems, and responsive 3D previews.
Integrate AI Solutions: Help incorporate GPU inference outputs (such as diffusion, segmentation, and line following) into interactive visual workflows.
Performance Optimization: Profile GPU and CPU paths, minimize frame latency, and optimize memory usage for real-world workloads.

Dec 4, 2025

Senior Full Stack Engineer at Vizcom | San Francisco

Vizcom

Full-time|$195K/yr - $225K/yr|On-site|San Francisco

Agency Notice: We are not currently collaborating with recruiting agencies for this position. Please refrain from contacting Vizcom employees regarding this role. Any resumes submitted without a prior agreement will be deemed unsolicited.

Why Join Vizcom?

Vizcom is an AI-driven design platform used by industry leaders such as GM, Nike, and Ford, along with more than 200 other companies, to transform sketches into photorealistic renders, 3D models, and videos. We believe AI should enhance human creativity rather than replace it. With a sketch-first approach, we let designers stay in control while speeding up iteration from 5 ideas to 500 in a single session. Our team of around 50 is growing rapidly, ships multiple updates daily, and is building the tool that sets the benchmark for professional industrial design.

Your Role

In this role, you will work across Vizcom's entire technology stack, from TypeScript-based design tools to Postgres data models and AI inference pipelines, delivering fully integrated features that designers rely on daily. While our frontend and backend specialists go deep in their areas, your expertise spans wide: you will be the engineer who can trace a bug from a React component through a GraphQL query to a Postgres function, fix it, and deploy the fix before lunch.

This position suits people who thrive on variety and ownership. In any given week, you might create a new canvas interaction in React Flow, refine a PostGraphile query, implement a BullMQ job for a new AI workflow, and review a teammate's migration. Your contributions will turn the philosophy of enhancing human creativity with AI into reality across our entire product.

Compensation

$195K – $225K + equity

Key Responsibilities

Deliver complete features: Own features end to end, from React component to Postgres schema, with no handoffs or delays.
Enhance product UI: Build fast, precise interactions in React, TypeScript, and React Flow.
Develop APIs and data models: Write Postgres migrations, PostGraphile mutations, and RLS policies for multi-tenant isolation.
Integrate AI workflows: Connect frontend experiences to GPU inference pipelines through BullMQ and external AI services.
Optimize the tech stack: Profile and improve performance across the board, from React renders to GraphQL resolvers and query plans.
Collaborate extensively: Partner with frontend specialists, graphics engineers, designers, and product teams to deliver cohesive user experiences.
Enhance reliability: Add monitoring to features to ensure robust performance.

Feb 17, 2026

Founding Platform & Reliability Engineer at OpenArt | San Francisco

OpenArt

Full-time|On-site|San Francisco Bay Area

About OpenArt

OpenArt is an AI-driven storytelling and visual creation platform used by millions of people around the globe. Our mission is to build the next generation of creative tools powered by advanced AI, letting users generate videos, visuals, characters, and narratives with unprecedented speed and creativity. We envision a future where creativity is inherently AI-native, and we are at the forefront of this transformation.

Why Join OpenArt?

Be part of a small, dynamic team where senior engineers own significant systems, not just fragments.
Contribute to large-scale projects, with your work reaching millions of users quickly.
Benefit from a founder-led engineering culture where both founders are technical and actively engaged in product and architecture decisions.
Work on an AI-native product, shaping how state-of-the-art AI models translate into tangible user experiences.
Experience high ownership with minimal bureaucracy, emphasizing judgment, clarity, and speed.
Join during a period of significant growth (a 7-10X revenue increase over the past two years) and play a pivotal role in scaling the company to new heights.

About the Role

We are seeking a Founding Platform & Reliability Engineer to own the design, scalability, and reliability of our entire infrastructure stack, from high-level architectural choices to hands-on implementation, observability, and cost management.

This role is not suited for traditional operators or narrow DevOps specialists. You should be comfortable across cloud infrastructure, distributed systems, backend services, and developer tooling, making practical decisions that balance product velocity, system reliability, and cost efficiency in a fast-paced, AI-centric environment.

You will work closely with the founders and product engineers to design and refine the platform that powers OpenArt, influencing key decisions such as serverless versus containerized architecture, multi-provider AI reliability, and scaling systems for millions of users, while acting as a force multiplier for the entire engineering team.

What You’ll Do

Establish and operationalize SLOs/SLIs across essential user journeys (generation, editing, payments/credits, uploads, etc.), and use them, including error budgets, to guide prioritization.
Lead the design and implementation of robust infrastructure that supports OpenArt's rapid growth and evolving needs.

Mar 26, 2026

Senior Staff Software Engineer

Vizcom

Full-time|$234K/yr - $285K/yr|On-site|San Francisco

Agency Notice: We are not currently partnering with recruitment agencies for this position. Please refrain from contacting Vizcom employees regarding this role. Any resumes submitted without prior agreement will be deemed unsolicited.

Why Join Vizcom?

At Vizcom, our mission is to revolutionize design by simplifying the journey from concept to realization, ensuring that impactful ideas become tangible products. We are well funded through a Series B, with $51 million raised to date, and we are growing by double digits month over month. Our core engineering team in San Francisco is expanding and tackling a wide range of challenges, including real-time collaboration, GPU inference at scale, and a modern TypeScript stack that serves enterprise needs.

Your Role

As a Senior Staff Software Engineer, you will take full ownership of essential product domains, guiding technical strategy from architecture to deployment. Your responsibilities will include conducting design reviews, delivering prototypes that advance our products, and establishing practical engineering standards that balance speed and quality. In addition to your own contributions, you will mentor colleagues to become future technical leaders, fostering a top-tier engineering culture as Vizcom scales.

Compensation

$234K – $285K + equity options

Key Responsibilities

Develop the architectural roadmap so that Vizcom remains reliable, scalable, and enjoyable to develop on as usage and data evolve.
Stay hands-on: dive into code for key functionality, rapid prototypes, and complex debugging (including latency, memory issues, race conditions, and WebGL/WASM quirks).
Implement quality practices that scale: testing strategies, code review processes, performance budgets, and observability standards.
Mentor and cultivate talent: guide senior engineers toward becoming tech leads, document clear design practices, and model effective decision-making.
Collaborate cross-functionally: articulate engineering trade-offs to Product, Design, and Go-To-Market teams, and represent the CTO when necessary.

First 90 Days Goals

Enhance reliability and performance: Establish an observability-driven strategy for our top 5 critical workflows, and aim to reduce p95 latency for at least one user-visible flow by approximately 30% through targeted optimizations (query planning, caching, batching, and GPU/CPU boundaries).
Architectural Contributions: Draft a significant RFC (e.g., real-time document models, rendering pipeline boundaries, or background job architecture) and lead it through implementation.
Team Development: Mentor two engineers, ensuring measurable progress in their scope, design quality, or other key performance metrics.

Aug 27, 2025

Site Reliability Engineer - Platform at CodeRabbit | San Francisco

CodeRabbit

Full-time|On-site|San Francisco

About CodeRabbit

CodeRabbit is a research and development firm dedicated to building highly efficient human-machine collaboration systems. Our mission is to develop the next generation of AI-driven code review tools, fostering a partnership between human creativity and advanced algorithms that far exceed the capabilities of individual engineers. By combining language models with human ingenuity, we aim to raise the standard of efficiency and quality in software development.

The Role

We are looking for a talented Site Reliability Engineer (SRE) to join our Platform Engineering team in the Bay Area. In this role, you will be crucial to maintaining the high availability, performance, and scalability of CodeRabbit's AI-enhanced code review platform. The position sits at the intersection of software engineering and systems operations: you will build the foundational platforms and automation that let our engineering teams deploy, monitor, and scale our services reliably.

As a Site Reliability Engineer at CodeRabbit, you will improve the reliability of essential services that handle millions of code reviews, develop sophisticated automation platforms, and manage the infrastructure that drives our AI analysis engine. You will work with cutting-edge technologies such as large language models, real-time processing systems, and distributed architectures that operate at scale.

Key Responsibilities

Infrastructure & Platform Ownership

Design, implement, and maintain scalable infrastructure on Google Cloud Platform to support CodeRabbit's growing user base and processing needs.
Own and operate essential platform services.
Develop and manage Infrastructure as Code using Terraform to ensure consistent, reproducible, and version-controlled infrastructure deployments.

Reliability & Performance Engineering

Establish and maintain SLI/SLO frameworks for all critical services, ensuring we meet our reliability commitments to users.
Implement comprehensive monitoring, alerting, and observability using Datadog and custom instrumentation.
Drive incident response, root cause analysis, and postmortem processes to continually improve system reliability.
Optimize application and infrastructure performance to handle millions of pull request analyses with minimal latency.

Jan 9, 2026

Platform Engineer at allinbits | San Francisco

All In Bits, Inc.

Full-time|On-site|San Francisco

About the Role

Join our team at allinbits as a Platform Engineer, where your expertise will be vital in designing and maintaining the robust infrastructure that supports our projects. The role combines technical skill with strategic insight, ensuring our development and operational environments are tuned for performance, reliability, and scalability.

We prioritize experience, especially if you have transitioned from a developer role into DevOps or Site Reliability Engineering (SRE). Your ability to innovate and build resilient systems will be invaluable.

In this position, you will use tools such as Ansible, Docker, and HashiCorp Nomad to improve our operations.

Feb 4, 2025

Senior Site Reliability Engineer at Drata | San Francisco

Drata

Full-time|$166.9K/yr - $225.9K/yr|Hybrid|Hybrid - San Francisco

Drata helps organizations demonstrate their commitment to security and integrity. The platform supports companies as they build and maintain trust with users, customers, partners, and prospects.

Values

Built on Trust: Consistency shapes decisions and actions.
Integrity: Choosing to do what is right, every time.
Customer-Obsessed: Prioritizing customer needs above all else.
Competitive Fire: Striving for higher standards and greater achievements.
Diversity: Welcoming different perspectives to encourage creative solutions.
Automation First: Pursuing efficiency by saving time and resources wherever possible.

How the Team Works

Drata blends high standards with a supportive environment focused on growth. Team members are encouraged to own their work, improve continuously, and deliver meaningful results. The company values quick, informed decisions that drive immediate impact, while always keeping the mission and customer needs at the center.

The San Francisco-based team uses a hybrid work model. Colleagues collaborate in the office Tuesday through Thursday, focusing on alignment and innovation. Mondays and Fridays offer flexibility for deep work or personal needs.

Growth and Culture

Drata has expanded to over 600 professionals worldwide, recognized for a culture that values trust, speed, and continuous learning. The environment supports both personal and professional development.

See the Speed: CEO Adam Markowitz discusses Drata’s rapid journey to $100M ARR in four years.
Hear the Voice of the Team: Employee stories highlight collaboration and growth at Drata.

Apr 27, 2026

Senior Site Reliability Engineer at Hyperbolic | San Francisco

Hyperbolic Labs

Full-time|On-site|San Francisco, CA

Who We Are

At Hyperbolic Labs, we are committed to democratizing AI by removing barriers to computing power with our Open-Access AI Cloud. By aggregating global computing resources, we provide a GPU marketplace and AI inference service that is both affordable and accessible. As trailblazers at the convergence of AI and open-source technology, we envision a future where AI innovation is limited only by creativity, not by resource availability. We invite forward-thinking individuals who share our dedication to making AI universally accessible, secure, and affordable. Join us in building a platform that empowers innovators worldwide to realize their AI projects.

As we grow following our Series A funding, our team, guided by co-founders with advanced degrees in AI, Mathematics, and Computer Science, is set to transform the computing landscape.

About the Role

We are looking for a skilled Site Reliability Engineer to ensure that Hyperbolic's GPU marketplace and AI infrastructure operate with outstanding reliability, performance, and security. Because we aggregate computational resources from numerous global providers, our service level objectives (SLOs), trust, and economic efficiency are central to our product. Your key responsibilities will include defining and maintaining SLOs, developing resilient incident response protocols, managing capacity across our extensive GPU network, and implementing secure rollout and rollback mechanisms to keep the platform running around the clock.

In this influential role, you will set the reliability benchmarks that build customer trust in our platform, design comprehensive monitoring and alerting for better infrastructure visibility, automate capacity management and resource allocation, lead incident response and postmortem reviews, and collaborate closely with engineering teams to strengthen system resilience. Security and infrastructure hardening will be paramount, requiring strong isolation between tenants and suppliers, effective key management, and compliance frameworks. This high-impact position will directly affect our ability to deliver on our commitment to affordable, accessible AI compute at scale.

Mar 26, 2026

Product Designer for Enterprise and Platform Solutions

Vizcom

Full-time|$180K/yr - $210K/yr|On-site|San Francisco

Location: San Francisco, CA

At Vizcom, we are building AI-driven tools that help industrial designers move from concept sketches to production with greater speed and confidence. We are looking for a talented Product Designer to shape the foundational product experiences that serve both first-time users and intricate workflows.

In this role, you will tackle challenges ranging from helping new users recognize the platform's value to designing scalable systems that support long-term professional engagement. You will work closely with Product and Engineering to distill complex requirements into seamless, intuitive experiences that evolve alongside our users and our product. Join a compact, agile team that values innovation, exploration, and a commitment to aesthetic excellence.

Your Responsibilities

Craft end-to-end product experiences spanning user onboarding, core workflows, and advanced use cases.
Simplify complex systems into user-friendly interfaces that serve the diverse needs of enterprise teams and individual users.
Collaborate with Product and Engineering to identify challenges, explore solutions, and deliver high-quality features.
Use user research and product insights to guide design choices that improve usability, retention, and engagement, with a focus on first-run experiences.
Help grow our user research practice by participating in interviews and usability testing and by improving our research methods.
Contribute to the evolution of our design system, helping scale components, patterns, and guidelines as the product and team expand.
Improve design team efficiency by exploring and adopting emerging tools and workflows that speed up prototyping, testing, and delivery.
Rapidly prototype and iterate in ambiguous problem spaces, using prototypes to explore and validate concepts.
Champion thoughtful UX decisions that balance speed, quality, and long-term usability within our evolving product development process.

Mar 12, 2026

Senior Site Reliability Engineer at Plaud | San Francisco

Plaud Inc.

Full-time|On-site|San Francisco, CA

About Plaud Inc.

Plaud is transforming how professionals boost productivity and performance with our trusted AI work companion. Our note-taking solutions have won over more than 1,500,000 users globally since our founding in 2023. Our mission is to amplify human intelligence by developing next-generation intelligence infrastructure and interfaces that capture, extract, and leverage what you say, hear, see, and think.

Based in San Francisco, Plaud Inc. is a Delaware-incorporated company redefining the boundaries of human-AI collaboration through a unique combination of hardware and software. We adhere to the highest standards of data security and privacy protection, with certifications and compliance including ISO 27001, ISO 27701, GDPR, SOC 2, HIPAA, and EN 18031.

Learn more at https://www.plaud.ai and follow us on Instagram, X, Facebook, LinkedIn, and YouTube.

Why You Should Join Us

At Plaud, you will play a pivotal role in shaping the future of human-AI interaction. Here’s what we offer:

A thriving, bootstrapped company with a $250M revenue run rate achieved in just three years.
An opportunity to define the next-generation paradigm for human-AI interaction.
Direct exposure to cutting-edge AI tools for professionals and a chance to contribute to our global expansion.
A passionate team that values innovation, teamwork, and customer success.
A culture that promotes continuous learning and rapid career growth.

Feb 24, 2026

Senior Site Reliability Engineer at Carta | San Francisco, CA

Carta

Full-time|On-site|San Francisco, California; Santa Clara, California; Seattle, WA

Join Carta as a Senior Site Reliability Engineer, where you will play a pivotal role in enhancing our infrastructure and ensuring the reliability of our platforms. You will work collaboratively with cross-functional teams to implement innovative solutions that drive operational excellence and scalability.

Apr 3, 2026

Senior Site Reliability Engineer at prosper | San Francisco

prosper

Full-time|On-site|San Francisco, CA

Role overview

The Senior Site Reliability Engineer at prosper plays a key role in maintaining and improving the reliability and performance of the company’s core systems. Collaboration with teams across the organization is essential to ensure services remain stable and efficient.

What you will do

Design and set up monitoring tools to track the health and performance of systems
Automate routine operational tasks to minimize manual intervention and boost efficiency
Diagnose and resolve complex technical problems that impact infrastructure or services
Support projects aimed at strengthening infrastructure stability and preparing for future growth

Location

This role is located in San Francisco, CA.

Apr 27, 2026

Reliability Engineer at Sieve | San Francisco

Sieve

Full-time|On-site|San Francisco

About Sieve

Sieve is a pioneering AI research lab dedicated solely to video data. We combine exabyte-scale video infrastructure with state-of-the-art video understanding techniques and a wide range of data sources to create datasets that redefine video modeling. Video accounts for 80% of global internet traffic and has become the core digital medium for creativity, communication, gaming, AR/VR, and robotics. At Sieve, we aim to remove the biggest bottleneck to scaling these applications: access to high-quality training data.

Through strategic partnerships with leading AI labs, our team of just 12 generated $XXM last quarter alone. Earlier this year, we raised a Series A from firms including Matrix Partners, Swift Ventures, Y Combinator, and AI Grant.

About the Role

As we process petabytes of video across numerous nodes and cloud environments, reliability, observability, and security are essential to our growth.

We are hiring our first Reliability Engineer, who will focus entirely on hardening the infrastructure that underpins Sieve. This role demands high ownership and a deep understanding of:
System throughput and stability
Monitoring and incident management
Security principles, including least-privilege design
Minimizing operational burden for the entire engineering team

You will work closely with our CTO and founding engineers to build the foundational tools that support our engineering efforts.

This position is ideal for an engineer who is passionate about reliability, throughput, observability, and security. You proactively anticipate failure modes, reduce operational risk, and design resilient systems. If a system fails, you take it personally and thrive under that responsibility.

What You'll Be Doing
Collaborate with engineering to design and validate infrastructure supporting PB-scale workloads
Develop and manage Terraform-based multi-cloud deployments
Strengthen cloud and data security (SSO, IAM, least-privilege access, auditability)
Lead incident response and harden systems against failures
Build CI/CD systems that minimize user error and maximize safety
Establish monitoring and alerting frameworks (Prometheus, OpenTelemetry, VictoriaMetrics)

Feb 5, 2026

Senior Platform Engineer at Flux | San Francisco

Flux

Full-time|On-site|San Francisco Office

Join Flux: Revolutionizing Hardware Engineering

At Flux, we are transforming hardware engineering with our pioneering AI Hardware Engineer. Our mission is to empower innovators worldwide to design and build cutting-edge hardware effortlessly.

Your Role

As a Senior Platform Engineer, you will build robust systems that strengthen the Flux platform. You will own crucial functionality beyond the editor, including billing, authentication, onboarding, and integrations. Your contributions will directly affect every user, keeping operations seamless and enabling Flux to scale efficiently.

Core Responsibilities
Develop and maintain user-facing platform features such as billing, authentication, onboarding, and settings.
Integrate and manage third-party APIs, including Stripe, Firebase, and various analytics tools.
Build backend services in TypeScript/Node.js and frontend components in React.
Improve the reliability and performance of essential platform workflows.
Establish standards for observability, logging, and error tracking.
Collaborate with cross-functional teams to design APIs that enable new products and integrations.
Document architecture and best practices within your domain.

Jan 7, 2026

Site Reliability Engineer at Mercor | San Francisco

Mercor

Full-time|On-site|San Francisco

Join the Mercor Team

At Mercor, we stand at the intersection of labor markets and AI research. Working with premier AI labs and enterprises, we supply the human intelligence that is crucial to AI's evolution.

Our talent network plays a vital role in training cutting-edge AI models, much as educators teach students: by sharing insights, experience, and contextual understanding that code alone cannot convey. Today, our network of more than 30,000 experts generates over $2 million per day.

We are pioneering a new category of work in which expertise drives AI progress. Achieving this vision requires an ambitious, fast-paced, and deeply dedicated team. You will work with researchers, operators, and AI companies at the forefront of transforming society.

Mercor is a Series C company valued at $10 billion. We work in person five days a week at our new headquarters in San Francisco.

About the Role

As a Site Reliability Engineer (SRE) at Mercor, you will own production reliability for our critical systems, working closely with our infrastructure leadership. You will play a pivotal role in establishing our SRE function and defining how Mercor runs large-scale, high-availability systems.

Your Responsibilities
Ensure the reliability and safety of production for key shared services and customer-facing systems.
Work directly with infrastructure leadership to define SRE priorities, reliability benchmarks, and the production safety roadmap.
Improve the structure of our production systems for stability, resource efficiency, isolation, and observability.
Advocate for and implement modern SRE practices (e.g., incident management, postmortems, SLIs/SLOs) across engineering teams.
Work alongside engineering and applied AI teams to support sustainable growth.
Promote SRE best practices internally, helping teams onboard to production in a safe, scalable, and consistent way.

Who We Seek

The ideal candidate will have:
Extensive experience in true SRE roles (not merely operations) across multiple positions or organizations.
A deep understanding of the SRE practices popularized by Google (e.g., error budgets, reliability vs. risk trade-offs, large-scale distributed systems).
5+ years of SRE experience, and ideally 15+ years of total experience, for this first SRE position.
A proven track record of running systems at scale, with a strong grasp of the complexities involved.

Dec 27, 2025

Site Reliability Engineer at Superhuman | San Francisco

Superhuman, Inc.

Full-time|$214K/yr - $260K/yr|Hybrid|Hub - San Francisco

At Superhuman, we embrace a hybrid work model that gives team members the right blend of focused individual work and in-person collaboration, fostering trust, innovation, and a strong team culture.

About Superhuman

Superhuman, the AI productivity platform, is on a mission to unlock the superhuman potential in everyone. By combining Grammarly's writing assistance with tools like Coda's collaborative workspaces and Go, our proactive AI assistant, we serve more than 40 million individuals and 50,000 organizations worldwide. Founded in 2009, we strive to eliminate busywork and boost productivity. Learn more at superhuman.com and explore our values here.

The Opportunity

To meet our ambitious goals, we are seeking a Site Reliability Engineer (SRE) to join our infrastructure team. This role focuses on building software that keeps our back-end systems reliable while working with engineering teams to plan for future growth. You will also engage with our production engineering teams in Europe as we transition from a "you build it, you own it" approach.

At Superhuman, our engineers and researchers have the autonomy to innovate and drive breakthroughs that directly shape our product roadmap. As we rapidly scale our interfaces, algorithms, and infrastructure, our technical challenges are growing more complex. Learn more about our work on our technical blog.

As an SRE, your responsibilities will include:
Scaling our Kubernetes-based control plane, which processes billions of events each day.
Improving our automation so it responds efficiently to workload demands.
Deploying machine learning systems across various departments.

Jun 18, 2025

Senior Software Engineer - Platform at Trunkio | San Francisco

Trunkio

Full-time|On-site|San Francisco

Join Trunkio, where our mission is to help teams ship high-quality software quickly. We have worked with engineering teams at top companies such as Google X, Zillow, and Brex to identify build failures, manage flaky tests, and speed up code deployment without compromising reliability. Although AI has accelerated code writing, delivery remains a challenge because of merge conflicts, inconsistent code quality, and other productivity drains. Our goal is to help engineering teams focus on designing, implementing, and delivering exceptional software, making their work more fulfilling. We are currently building a CI Reliability Platform that helps teams deliver code efficiently.

Founded in 2021 by industry veterans from Uber, Google, YouTube, and Microsoft, Trunkio has raised a $25M Series A led by Initialized Capital and a16z, with backing from notable investors including Haystack Ventures and the creators of GitHub, Apollo GraphQL, and Algolia.

We are seeking a passionate and skilled Senior Software Engineer to join our Platform/Data Engineering team. In this role, you will design and optimize data ingestion pipelines that handle large volumes of real-time and batch data from diverse sources. Your expertise will be vital in building scalable, reliable, and performant systems and in ensuring seamless data integration across our ecosystem.

Mar 24, 2022

Senior Hardware Reliability Engineer at Samsara | San Francisco, CA

Samsara

Full-time|$204K/yr - $240K/yr|Hybrid|San Francisco, CA, United States

Who We Are

Samsara (NYSE: IOT) is a pioneer of the Connected Operations™ Cloud, a platform that enables organizations that depend on physical operations to turn Internet of Things (IoT) data into actionable insights and operational improvements. Our mission is to improve the safety, efficiency, and sustainability of the physical operations that power the global economy. These sectors, which account for over 40% of global GDP, include agriculture, construction, field services, transportation, and manufacturing, and we are dedicated to digitally transforming their operations at scale.

Joining Samsara means becoming part of a team that is defining the future of physical operations. You will contribute to a broad range of product solutions, including Video-Based Safety, Vehicle Telematics, Apps and Driver Workflows, and Equipment Monitoring. As part of a recently public company, you will have the autonomy and support to make a significant impact as we build for the future.

About the Role:

Samsara's Hardware Reliability team plays a crucial role in delivering an outstanding customer experience through reliable hardware. As a Senior Hardware Reliability Engineer, you will establish quality processes that uphold the high standards of Samsara's hardware.

In this role, you will implement and execute comprehensive reliability strategies across the entire product development lifecycle, from concept to warranty repair. You will rapidly gather and analyze test, field performance, and manufacturing data to drive corrective actions both internally and with our suppliers, ensuring top-quality products. Collaboration with hardware, firmware, and operations teams is a fundamental part of this role.

This is a hybrid position open to candidates residing in the US and requires visiting our San Francisco office three times a week.

You Should Apply If:
You want to impact the industries that run our world: your work will deliver tangible real-world benefits, helping to maintain essential services and support vital industries.

Feb 14, 2026

Senior Platform & Reliability Engineer at Vizcom | San Francisco

Vizcom

Full-time|$200K/yr - $250K/yr|On-site|San Francisco

Feb 24, 2026

Senior Product Engineer at Vizcom | San Francisco

Vizcom