Experience Level
Entry Level
Qualifications
Proficient in programming languages such as JavaScript, Python, or Go. Strong understanding of observability principles, including logging, monitoring, and tracing. Experience with cloud platforms and serverless architectures. Excellent problem-solving abilities and a strong analytical mindset. Ability to work collaboratively in a fast-paced, agile environment.
About the job
Join our dynamic team at Cloudflare as a Software Engineer focused on Workers Observability. In this pivotal role, you'll be instrumental in enhancing the observability features of our Workers platform, ensuring optimal performance and reliability for our users. You will collaborate with cross-functional teams, tackle complex technical challenges, and contribute to the advancement of our innovative cloud solutions.
About Cloudflare, Inc.
Cloudflare is a leading web performance and security company dedicated to helping businesses build a better internet. With cutting-edge technology and a commitment to innovation, we empower organizations to enhance their online presence while safeguarding them against threats. Join us as we shape the future of the internet!
Similar jobs
Become part of the innovative engineering teams at OpenAI, where we create and deliver groundbreaking AI technologies responsibly and safely to the world! Our Applied Engineering team collaborates across research, engineering, product, and design disciplines to deploy OpenAI's cutting-edge technology for both consumers and businesses. We are committed to learning from our deployments and ensuring that AI is used ethically while maximizing its benefits. To us, safety takes precedence over unchecked growth.
About the Role
We are developing OpenAI's observability product, which spans everything from scalable infrastructure to an intuitive, AI-enhanced user interface. Our systems process petabytes of logs and billions of time series metrics across our infrastructure. We are now integrating intelligence to create features such as agents that summarize service events, auto-generate dashboards, and help engineers debug through user-friendly, notebook-like interfaces.
We are looking to hire software engineers at all levels of our stack, whether infrastructure, backend, or product. You will be part of a dynamic, resourceful team that develops both foundational infrastructure and innovative internal tools, ensuring the reliability, performance, and observability of OpenAI's production systems.
What You’ll Do
Lead the development of core observability infrastructure, focusing on distributed logging, time series, and trace storage.
Create AI-integrated tools that empower engineers to autonomously identify, understand, and resolve issues.
Enhance user interface experiences, including dashboards, notebooks, and interactive debugging.
Work collaboratively with engineers, researchers, user operations, and other teams to craft the next generation of the observability product.
You Might Be a Fit If You:
Have experience operating large-scale distributed systems in production, particularly logging systems or time series databases.
Excel in ambiguous environments and tackle unscoped challenges head-on.
Possess full-stack development skills or a strong product sensibility, and are eager to build practical tools that people will use.
Demonstrate robust knowledge of systems, networking, and cloud infrastructure (Kubernetes, AWS, etc.).
Bonus: Have built or contributed to observability systems (e.g., Prometheus, OpenTelemetry).
Why This Team?
We combine infrastructure and product development to create real AI applications for in-house use.
Your contributions will directly enhance the reliability of GPT-based products at OpenAI.
Join Gusto as a Staff Software Engineer specializing in Observability, where you will play a pivotal role in enhancing our software's performance and reliability. Utilize your expertise to develop and implement monitoring solutions that provide insights into application behavior, ensuring a seamless experience for our users.
Your contributions will directly impact our engineering processes and product quality. Collaborate with cross-functional teams to identify and resolve issues proactively, while also driving initiatives to improve system observability.
Full-time|On-site|San Francisco, CA • New York, NY • United States
Join Figma as a Software Engineering Manager specializing in Observability. In this pivotal role, you will lead a dynamic team of engineers in developing cutting-edge solutions that enhance visibility and performance across our platform. Your expertise will drive the design and implementation of observability tools that empower our engineering teams to optimize their workflows, ensuring the robustness and reliability of our applications.
Full-time|On-site|San Francisco, CA | New York City, NY | Seattle, WA
Join Anthropic as a Staff+ Software Engineer specializing in Observability, where you will play a crucial role in enhancing our systems to ensure high performance and reliability. Collaborate with cross-functional teams to develop innovative solutions, implement observability metrics, and drive improvements that enable better decision-making and user experiences.
Full-time|$170K/yr - $240K/yr|On-site|San Francisco, CA
About the Role
Sigma Computing is growing its engineering team in San Francisco, CA. The company builds technology to help users access data with ease. As a Senior Software Engineer focused on Observability and Reliability, you will work alongside engineers who value high standards and collaboration.
What You Will Do
Design and build observability platforms and tools, including metrics collection, logging, distributed tracing, dashboards, alerting, and application performance monitoring.
Work with technologies such as Go, OpenTelemetry, and Kubernetes to solve reliability challenges.
Take part in on-call rotations to help maintain strong uptime for Sigma’s services.
Create tools and processes to improve cloud incident triage and reduce downtime.
Define and promote practices that make systems and services measurable and observable.
Join design and code reviews with peers and stakeholders to reinforce quality and effective collaboration.
Role overview
Adyen seeks a Senior Software Engineer in San Francisco to focus on Customer Developer Observability. This position aims to enhance the tools and systems that let clients monitor and analyze their performance across the Adyen platform.
What you will do
Collaborate with cross-functional teams to design and build observability solutions.
Create and implement features that provide customers with deeper insights into their systems and data.
Help improve the customer experience by making monitoring and analysis more effective and accessible.
Full-time|$166K/yr - $201K/yr|On-site|San Francisco, CA - US
At Crusoe, we are on a mission to accelerate the availability of energy and intelligence. We are building the foundational technology that empowers individuals to innovate boldly with AI while maintaining speed, scale, and sustainability.
Join us in the AI revolution with sustainable technology at Crusoe, where you will lead significant innovations, make a real impact, and collaborate with a team that is pioneering responsible and transformative cloud infrastructure.
About the Role:
We are seeking a highly proficient engineer with extensive experience in designing and managing observability platforms at scale. You will be responsible for architecting, developing, and operating Crusoe’s next-generation observability stack, which will allow engineers to gain insight into the internal state of distributed systems through metrics, logs, and traces. Your contributions will ensure reliability, performance, and actionable insights across Crusoe’s global infrastructure and cloud platform.
Key Responsibilities:
Design and manage scalable observability systems (metrics, logging, tracing) in multi-datacenter Kubernetes environments.
Architect comprehensive telemetry pipelines, covering ingestion, storage, querying, and visualization.
Enhance monitoring and alerting with Prometheus, Alertmanager, Thanos/Cortex, Grafana, and OpenTelemetry.
Develop scalable log collection and processing pipelines using Fluent Bit, Vector, Loki, or ELK/OpenSearch stacks.
Implement distributed tracing platforms (Tempo, Jaeger, OpenTelemetry) and integrate them with service meshes, load balancers, and APIs.
Establish and promote the adoption of SLOs, SLIs, and error budgets across services and teams.
Automate the provisioning and scaling of observability infrastructure using Kubernetes, Terraform, and custom tools (Go, Python).
Ensure the reliability and cost-effectiveness of telemetry pipelines while supporting high-volume workloads (AI/ML, HPC clusters, GPU infrastructure).
Integrate security best practices into observability platforms, including RBAC, TLS, secret management, and multi-tenant access controls.
Collaborate with engineering teams to embed observability into applications, services, and infrastructure.
Mentor engineers and influence Crusoe’s observability strategy and technical roadmap.
Full-time|Remote|Remote with offices in San Francisco, CA / New York, NY / Minneapolis, MN
Join Dagster Labs as a Software Engineer specializing in our Observability Product. In this fully remote role, you will play a crucial part in enhancing the visibility and performance of our software solutions. Collaborate with cross-functional teams to develop and implement innovative observability features that empower our users to monitor and optimize their applications effectively.
Join Crusoe as a Senior Software Engineer specializing in Observability, where you will play a pivotal role in enhancing our systems and ensuring robust performance across our platforms. You will collaborate with cross-functional teams to develop innovative solutions that improve the visibility and reliability of our software applications.
Join Adyen as an Engineering Manager for our Developer Observability team! In this pivotal role, you will lead a dynamic group of engineers dedicated to enhancing the observability of our developer platforms. You will be responsible for driving technical innovation, mentoring your team, and collaborating closely with cross-functional partners to deliver exceptional developer experiences.
As a leader, you will empower your team to excel in building tools and solutions that provide insights into system performance, ensuring our developers have everything they need to thrive. If you are passionate about technology, leadership, and fostering a culture of excellence, we want to hear from you!
Join Cloudflare as a Senior Software Engineer specializing in Workers Runtime. In this dynamic role, you will be instrumental in developing and enhancing our serverless platform, enabling developers worldwide to build and deploy applications seamlessly. You will leverage your expertise in programming languages, cloud technologies, and software architecture to craft innovative solutions that power the next generation of applications.
We are looking for a passionate engineer who thrives in a collaborative environment and is eager to tackle complex challenges. You will work closely with cross-functional teams to deliver high-quality software and contribute to Cloudflare's mission of building a better internet.
Join Cloudflare as a Software Engineer focusing on Workers Deployment and Configuration. In this role, you will be an integral part of our engineering team, working to enhance our deployment processes and improve configuration management for our serverless platform. You will collaborate with cross-functional teams to deliver high-quality software solutions that meet the needs of our clients.
Full-time|$200K/yr - $250K/yr|On-site|San Francisco, CA
About Us:
At LangChain, we are dedicated to making intelligent agents a standard part of everyday life. Our goal is to provide the essential framework for agent engineering, empowering developers to take their ideas from prototypes to production-ready AI agents that teams can trust. LangChain began as a widely embraced open-source initiative and has since grown into a robust platform for building, evaluating, deploying, and managing agents at scale.
Our platforms, including LangChain, LangGraph, LangSmith, and Agent Builder, are now instrumental for teams delivering innovative AI solutions across diverse sectors, from startups to major corporations. Industry leaders such as Replit, Clay, Coinbase, Workday, Lyft, Cloudflare, Harvey, Rippling, and Vanta, along with 35% of the Fortune 500, rely on LangChain for their AI initiatives.
Having secured $125M in Series B funding from prominent investors like IVP, Sequoia, Benchmark, CapitalG, and Sapphire Ventures, we are poised for continued growth and innovation. At LangChain, every team member plays a vital role in shaping our projects and collaborative work environment, making it a place where your input can significantly influence the future of technology.
About The Role:
We are seeking a dynamic Engineering Manager to lead the development of LangSmith, our observability and evaluation platform for LLM applications. In this role, you will set the technical vision, build and mentor a high-performing engineering team, and collaborate closely with product and design teams to deliver features that let developers build and deploy reliable AI systems with confidence.
You will:
Build, mentor, and expand a talented team of engineers, fostering a culture of collaboration, ownership, and accountability.
Strengthen LangChain’s engineering culture through mentorship, a commitment to high-quality code, and technical excellence.
Define long-term technical strategy and ensure the scalability and reliability of the LangSmith AI Observability Platform.
Work alongside product and design teams to outline project scope, sequencing, and success metrics for key initiatives.
Uphold a high standard of technical excellence while ensuring the team remains focused and operates with urgency.
Lead by example in producing clean, maintainable, and thoroughly tested code using Go/Python and TypeScript.
Engage directly with customers to understand their needs and translate those insights into actionable product enhancements.
Join DigitalOcean as a Senior Observability Engineer, where you will play a critical role in enhancing our monitoring and observability platforms. Your expertise will help us ensure that our systems are performant, reliable, and scalable, providing a seamless experience for our customers.
Full-time|$125K/yr - $145K/yr|On-site|San Francisco, CA
About Us:
At LangChain, we are dedicated to making intelligent agents a fundamental part of everyday technology. Our mission is to provide the essential tools for agent engineering in practical applications, enabling developers to move seamlessly from initial prototypes to production-ready AI agents that organizations can depend on. Starting as a suite of widely adopted open-source tools, we have expanded to offer a comprehensive platform for building, evaluating, deploying, and managing AI agents at scale.
Currently, our platforms, including LangChain, LangGraph, LangSmith, and Agent Builder, are trusted by teams developing real AI solutions in both startups and established enterprises. Our technology powers AI initiatives for renowned companies such as Replit, Clay, Coinbase, Workday, Lyft, Cloudflare, Harvey, Rippling, Vanta, and 35% of the Fortune 500.
With $125M raised in Series B funding from IVP, Sequoia, Benchmark, CapitalG, and Sapphire Ventures, we are at an exciting juncture: we continue to innovate and grow rapidly, and every team member can make a significant impact on our products and collaboration. Join us at LangChain, where your contributions can reshape the technology landscape.
About the Role:
In-person, 5 days a week in San Francisco
We are seeking a Fullstack Engineer to join our LangSmith product team, focusing on our commercial AI observability and evaluation platform. In this position, you will develop new features and capabilities for our platform while collaborating closely with enterprise clients, developer end users, and internal stakeholders.
Your Responsibilities:
Design and implement critical product features using our Go, Python, and TypeScript stack
Work in close partnership with product and design teams to refine features and enhance the product roadmap
Drive project timelines effectively while maintaining high engineering standards through clean, maintainable, and well-tested code
To Succeed in This Role:
2+ years of experience in software engineering, particularly with complex platform products
Fullstack engineering experience with Go or Python on the backend and React + TypeScript on the frontend
Strong understanding of database systems, especially Postgres and Redis
Experience in designing and scaling APIs, ideally in high-performance environments
Full-time|$320K/yr - $405K/yr|On-site|San Francisco, CA
About Anthropic
At Anthropic, we are dedicated to developing AI systems that are reliable, interpretable, and steerable. Our mission is to ensure that artificial intelligence remains safe and beneficial for individuals and society at large. Our rapidly expanding team comprises passionate researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.
About the Team
As the scale of AI training and deployment increases, so does the volume of data that must be monitored and understood. Our team uses Claude to interpret this data effectively. We manage an integrated suite of tools that lets Anthropic ask open-ended questions, identify unexpected patterns, and maintain meaningful human oversight over large datasets.
Our tools are widely used internally, supporting ongoing enforcement, threat intelligence investigations, model audits, and much more. We are seeking skilled engineers and researchers to enhance existing applications and build new ones from the ground up.
About the Role
As a Research Engineer on our team, you will design and develop systems that enable AI to analyze vast, unstructured datasets (tens or hundreds of thousands of conversations or documents) and generate structured, reliable insights. You will work across the entire technology stack, from foundational analysis frameworks to user-facing applications and interfaces.
This is a high-impact position. The tools you create will be used by many researchers and investigators, directly influencing our capacity to assess and counteract both misuse and misalignment.
Full-time|$194K/yr - $267K/yr|On-site|San Francisco, California
Discover Okta
Okta is recognized as The World’s Identity Company, empowering individuals to securely use any technology across devices and applications. Our versatile Okta Platform and Auth0 Platform provide reliable access, authentication, and automation, placing identity at the forefront of business security and growth.
At Okta, we value diverse perspectives and experiences. We seek continuous learners and individuals who can enhance our team with their distinct backgrounds.
Join us as we create a world where identity is truly yours.
We are seeking a highly skilled Observability Site Reliability Engineer specializing in Google Cloud to own and elevate our Observability ecosystem within GCP. In this position, you will move beyond basic monitoring to develop a world-class, comprehensive, and scalable Observability Platform that supports our SRE teams and business partners. You will implement infrastructure as code with Terraform and apply strong coding skills in Go, Python, or Ruby to automate the deployment of agents and collectors across complex distributed systems.
Key Responsibilities
Automated Infrastructure: Design, build, and maintain scalable observability infrastructure using tools such as Terraform.
GCP Observability Engineering: Enhance the collection, processing, and storage of Observability data to ensure high reliability and low latency for our Splunk and Grafana services.
Incident Response: Participate in on-call rotations and conduct post-incident reviews to drive systemic improvements and promote 'observability-driven development.'
Automation: Minimize 'toil' by automating the deployment and scaling of observability agents and collectors.
About Us
At Braintrust, we are pioneering the AI observability space. Our platform integrates evaluations and observability into a unified workflow, giving developers crucial insight into AI behavior in production and powerful tools for improving it.
Our clients, including renowned teams at Notion, Stripe, Zapier, Vercel, and Ramp, use Braintrust to benchmark models, optimize prompts, and identify regressions, turning production data into better AI functionality with each iteration.
The Opportunity
We are seeking a motivated Product Engineer who is enthusiastic about crafting tools that users love and rely on daily. In this role, you will work closely with our users (developers, product managers, and designers in the AI domain) and significantly influence our product roadmap.
Our platform is built on a high-performance, local-first architecture with a visualization-heavy UI, using modern TypeScript and React. Our clientele, which includes some of the leading technology firms, demands a product that is exceptionally fast, reliable, and user-friendly.
Your Responsibilities
Ensuring a seamless experience for customers enabling AI observability in their systems.
Contributing to the foundational UI architecture with a focus on performance and efficient data loading strategies.
You will also develop user-facing components for Braintrust, such as:
An exceptional prompt playground that accommodates multiple models and thousands of user inputs.
A robust system for managing prompts, configurations, and version comparisons.
A scalable multiplayer human review system.
LLM output analysis and comparison across extensive datasets.
Ideal Candidate Profile
Proficient in TypeScript, React, HTML, CSS, SQL, and Next.js.
Experience founding or working at startups is a plus.
Familiarity with writing prompts and experimenting with GPT models and applications.
What We Offer
Comprehensive medical, dental, and vision coverage.
Daily lunch, snacks, and beverages provided.
Flexible time-off policy.
Competitive salary and equity opportunities.
AI stipend for continued learning and tools.
Commitment to Diversity
Braintrust is a staunch advocate for equal opportunity in the workplace. We believe in fostering a diverse and inclusive environment.
Full-time|$175K/yr - $225K/yr|On-site|San Francisco, CA
About Us:
At LangChain, we are dedicated to making intelligent agents a common part of everyday technology. Our goal is to provide a robust foundation for agent engineering that empowers developers to move from prototypes to production-ready AI agents that teams can depend on. Having started as a widely embraced open-source toolset, we have expanded our offerings to include a comprehensive platform for building, evaluating, deploying, and managing agents at scale.
Currently, our tools (LangChain, LangGraph, LangSmith, and Agent Builder) are used by teams developing real AI products in both startups and large enterprises. Millions of developers rely on LangChain to power AI initiatives at notable companies such as Replit, Clay, Coinbase, Workday, Lyft, Cloudflare, Harvey, Rippling, and Vanta, as well as at 35% of the Fortune 500.
Having secured $125M in Series B funding from leading investors like IVP, Sequoia, Benchmark, CapitalG, and Sapphire Ventures, we are in an exciting phase of product development and rapid growth, where every team member has a substantial impact on our projects and collaborative efforts. At LangChain, your contributions will play a crucial role in shaping how this technology is used in the real world.
About the Role:
This position requires in-person attendance 5 days a week in San Francisco, CA, with New York and Boston also available as locations.
We are seeking a seasoned frontend engineer to build and improve features for LangSmith, our enterprise platform for LLM application observability, testing, and debugging.
What You Will Do:
Create new user-facing features using React and TypeScript.
Develop reusable components and front-end libraries for future projects.
Convert designs and wireframes into high-quality, maintainable code.
Optimize components for peak performance across diverse web-capable devices and browsers.
Collaborate with fullstack and backend developers as well as UX/UI designers to improve usability and the overall user experience.
You’re a Good Fit If You Have:
Extensive frontend engineering experience, with strong command of React, JavaScript, and TypeScript.
Practical experience with frontend development tools such as Babel, Vite, Webpack, NPM, and Yarn.
Familiarity with REST APIs and experience collaborating closely with fullstack and backend developers.
Jun 9, 2025