About the job
P-1415
At Databricks, our mission is to transform the data lifecycle by simplifying everything from data ingestion to ETL, BI, and ML/AI on a unified platform. We envision a future where traditional data warehouse architectures are superseded by an innovative pattern known as the Lakehouse (CIDR 2021 paper), which merges data warehousing with advanced analytics. This paradigm shift addresses critical challenges like data staleness, reliability, total cost of ownership, data lock-in, and limited use-case support.
To realize this vision, we are developing a cutting-edge query engine and structured storage system designed to exceed the performance of specialized data warehouses on relational queries. We aim to preserve the expressiveness and robustness of general-purpose systems like Apache Spark™ to accommodate various workloads, including ETL and data science. Join us on this multi-year journey.
As a member of our team, you will be involved in designing next-generation systems set to redefine industry standards in several key areas:
- Query compilation and optimization
- Distributed query execution and scheduling
- Vectorized execution engine
- Data security
- Resource management
- Transaction coordination
- Efficient storage structures (encodings, indexes)
- Automatic physical data optimization
Your Responsibilities:
- Clarify requirements and drive design decisions for ambiguous problems
- Create technical design documents and project plans
- Develop new features
- Mentor junior engineers
- Test, deploy to production, and monitor outcomes

