About the job
P-1414
At Databricks, our mission is to revolutionize the entire data lifecycle, from ingestion to ETL, BI, and ML/AI, through a unified platform. We envision a future where the traditional data warehouse architecture gives way to a modern paradigm known as the Lakehouse (CIDR 2021 paper). This open platform unifies data warehousing with advanced analytics and addresses key challenges such as data staleness, reliability, total cost of ownership, data lock-in, and limited use-case support.
To realize this vision, Databricks is developing a cutting-edge, decoupled query engine and structured storage system that aims to surpass specialized data warehouses in relational query performance while preserving the flexibility and robustness of general-purpose systems like Apache Spark™. Your contributions will be vital to this multi-year initiative.
As a member of our team, you will design next-generation systems that set new benchmarks in the following areas:
- Query compilation and optimization
- Distributed query execution and scheduling
- Vectorized execution engine
- Data security
- Resource management
- Transaction coordination
- Efficient storage structures (encodings, indexes)
- Automatic physical data optimization

