About the job
P-186
At Databricks, we are passionate about empowering data teams to tackle some of the world's most challenging problems, from detecting security threats to advancing cancer drug development. Our mission is to build and operate the premier data and AI infrastructure platform, allowing our customers to concentrate on the high-value challenges central to their missions.
Founded in 2013 by the original creators of Apache Spark™, Databricks has grown from a small office in Berkeley, California, into a global powerhouse with over 1,000 employees. Today, thousands of organizations, from startups to Fortune 100 companies, trust Databricks with their mission-critical workloads, making us one of the fastest-growing SaaS companies worldwide.
Our engineering teams are dedicated to developing highly technical products that address real-world needs. We continuously push the limits of data and AI technology while ensuring the resilience, security, and scale essential for our customers' success on our platform.
We manage one of the largest-scale software platforms, comprising millions of virtual machines that generate terabytes of logs and process exabytes of data daily. At this scale, we routinely encounter cloud hardware, network, and operating system faults, and our software must adeptly shield our customers from these challenges.
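Shielding customers from transient cloud faults usually comes down to retry logic. As a purely illustrative sketch (not Databricks' actual code), here is the classic pattern of retrying a flaky operation with exponential backoff and jitter; `TransientFault` and `flaky_read` are hypothetical stand-ins:

```python
import random
import time


class TransientFault(Exception):
    """Stands in for a recoverable cloud fault (network blip, VM hiccup)."""


def with_retries(op, max_attempts=5, base_delay=0.01):
    """Run op(), retrying transient faults with exponential backoff.

    The caller sees either the result or a final error, never the
    intermediate faults.
    """
    for attempt in range(max_attempts):
        try:
            return op()
        except TransientFault:
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff with jitter to avoid synchronized retries.
            time.sleep(base_delay * (2 ** attempt) * random.random())


# Example: a simulated operation that fails twice, then succeeds.
calls = {"n": 0}

def flaky_read():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientFault("simulated fault")
    return "data"
```

Real systems layer this with fault classification, circuit breaking, and failover across availability zones, but the retry loop is the basic building block.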
In modern data analysis, sophisticated techniques such as machine learning extend far beyond the traditional SQL query engine's roll-up and drill-down capabilities. As a software engineer on the Runtime team at Databricks, you will contribute to building next-generation distributed data storage and processing systems that not only surpass specialized SQL query engines in relational query performance but also offer the expressiveness and programming abstractions necessary to support varied workloads, ranging from ETL to data science.
Here are some example projects you might work on:
- Apache Spark™: Develop the de facto open-source standard framework for big data processing.
- Data Plane Storage: Provide reliable and high-performance services and client libraries for storing and accessing massive amounts of data on cloud storage backends like AWS S3 and Azure Blob Storage.
- Delta Lake: A storage management system that merges the scale and cost-efficiency of data lakes with the performance and reliability of data warehouses, along with the low latency of streaming. Its higher-level abstractions, including ACID transactions and time travel, significantly simplify real-world data engineering architecture.
- Delta Pipelines: Building and operating even a single data engineering pipeline end to end is challenging. The objective of the Delta Pipelines project is to make defining and running such pipelines simple and reliable.
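To give a feel for the abstractions mentioned above, here is a toy in-memory sketch of Delta Lake-style versioning. This is not the Delta Lake API; it only illustrates the idea that each committed write produces a new immutable snapshot, so readers can query the latest state or "time travel" to any earlier version:

```python
class VersionedTable:
    """Toy in-memory sketch of a versioned table (illustrative only).

    The real Delta Lake stores snapshots as Parquet files plus a
    transaction log on cloud storage; this sketch keeps everything
    in memory to show the abstraction.
    """

    def __init__(self):
        self._versions = [[]]  # version 0 is the empty table

    def append(self, rows):
        """Atomically commit new rows as the next table version."""
        self._versions.append(self._versions[-1] + list(rows))
        return len(self._versions) - 1  # the new version number

    def read(self, version=None):
        """Read the latest snapshot, or a past one by version number."""
        if version is None:
            version = len(self._versions) - 1
        return list(self._versions[version])


table = VersionedTable()
v1 = table.append([{"id": 1}])  # commit version 1
v2 = table.append([{"id": 2}])  # commit version 2
latest = table.read()           # sees both rows
as_of_v1 = table.read(v1)       # "time travel": sees only the first row
```

Because snapshots are immutable, readers never observe a half-committed write, which is the essence of the ACID guarantee the posting describes.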

