About the job
P-97
At Databricks, we are on a mission to fundamentally simplify the entire data lifecycle, from ingestion and ETL to BI and ultimately ML/AI, through a unified platform. We envision a future where the traditional data warehouse is transformed by a new architectural model known as the Lakehouse (CIDR 2021 paper). This open platform merges data warehousing with advanced analytics, addressing critical challenges such as data staleness, reliability, total cost of ownership, data lock-in, and limited use-case support.
A key component in realizing this vision is the development of a next-generation decoupled query engine and structured storage system. This system is designed to surpass the performance of specialized data warehouses while retaining the flexibility of general-purpose systems like Spark™, so that it can serve a wide range of workloads, from ETL processes to data science applications.
As a vital member of our team, you will design and implement these next-generation systems, which aim to leapfrog the current state of the art in the following areas:
- Query compilation and optimization
- Distributed query execution and scheduling
- Vectorized execution engine
- Data security
- Resource management
- Transaction coordination
- Efficient storage structures (encodings, indexes)
- Automatic physical data optimization

