Principal Engineer - Observability

CoreWeave
New York, NY / Sunnyvale, CA
On-site, Full-time

Qualifications

To be successful in this role, you should have a strong background in software engineering, with extensive experience in observability tools and methodologies. Proven expertise in designing scalable systems and high-performance telemetry pipelines is essential. You should be adept at creating intuitive user interfaces and have a deep understanding of AI workloads. Excellent communication skills are a must for collaborating effectively with cross-functional teams and stakeholders. A strong focus on operational excellence and problem-solving is critical.

About the job

CoreWeave is redefining the cloud landscape for AI. Our platform, crafted by innovators for innovators, empowers businesses to build and scale AI solutions with unparalleled confidence. Trusted by leading AI laboratories, startups, and global enterprises, we combine exceptional infrastructure performance with profound technical expertise to drive innovation and transform complex computations into actionable capabilities. Founded in 2017, CoreWeave proudly became a publicly traded entity (Nasdaq: CRWV) in March 2025. Discover more at www.coreweave.com.

About the Role

We are in search of a highly skilled and strategic Principal Engineer, Observability to spearhead the architecture, development, and operational excellence of our Observability platform. In this pivotal role, you will shape how our customers monitor, troubleshoot, and manage their AI workloads at scale on the CoreWeave platform.

You will collaborate directly with customers and work closely with engineering leaders across various teams to ensure a cohesive Observability experience across all CoreWeave products, including metrics, logs, traces, and customer-facing insights.

What You’ll Do

  • Lead the Observability strategy and roadmap, aligning it with business objectives, product direction, and performance/SLA targets.
  • Design and deploy low-latency, high-scale telemetry pipelines and data storage solutions that enhance observability across all CoreWeave offerings.
  • Create customer-centric experiences—including dashboards, alerts, and workflows—that facilitate rapid troubleshooting and provide deep insights into AI workloads and platform health.
  • Promote reliability, durability, and self-healing within the Observability stack, taking ownership of key services in production to uphold high operational standards.
  • Enhance customer visibility by establishing benchmarks for metrics, SLOs, and dashboards that clearly communicate system performance and reliability.

About CoreWeave

CoreWeave is at the forefront of cloud technology, specifically tailored for AI applications. Our team is composed of industry pioneers dedicated to delivering cutting-edge solutions that empower organizations to leverage AI effectively. We provide a robust infrastructure combined with expert guidance, enabling our clients to achieve their AI ambitions confidently. As a publicly traded company, we continue to innovate and adapt to the fast-paced tech landscape, striving to support the needs of our diverse clientele.
