About the job
About LanceDB
LanceDB is a developer-centric, open-source database designed for multimodal AI applications. We provide solutions ranging from highly scalable vector search to advanced retrieval for Retrieval-Augmented Generation (RAG). LanceDB lets developers work with large-scale AI datasets and powers cutting-edge applications across a range of industries.
About the Role
We are seeking a Senior Open Source Engineer to expand LanceDB's presence in the broader data infrastructure ecosystem. You will work on projects at the intersection of high-performance computing, big data, and open-source systems. Your contributions will drive integrations, optimize distributed operations, and support initiatives within the Apache and AI communities.
Key Responsibilities
Lead open-source community initiatives to integrate the Lance format with systems such as Spark, Hive Metastore, Presto, Trino, and Ray.
Design and maintain efficient distributed operations for Lance datasets.
Develop optimized indices to facilitate predicate pushdown and enhance query performance in Spark, Ray, or Trino.
Contribute to the development of table formats, data encodings, and other components of the Lance format in Rust.
Manage and enhance internal data processing infrastructure.
Advocate for the Lance format in open-source forums and at major Big Data conferences.
Requirements
More than 10 years of experience developing high-performance databases, big data systems, or large-scale data services.
In-depth knowledge of the internal workings of open-source Big Data or AI training systems such as Hadoop, Spark, Flink, Ray, Iceberg, Delta Lake, Hudi, ClickHouse, Trino, Presto, PyTorch, or JAX.
Extensive experience with high-performance computing using Java or Scala.
Familiarity with Rust, or a strong willingness to learn it.
Demonstrated ability to work efficiently, both independently and collaboratively, within a high-caliber team.
Preferred Qualifications
Active contributor, committer, or PMC member in Apache or other significant open-source projects.
Experience with Java, Rust, C++, Apache Arrow, DataFusion, Parquet, Iceberg, or Delta Lake is a plus.

