
Senior Open Source Engineer at LanceDB | San Francisco, CA

On-site | Full-time


Experience Level

Senior

Qualifications

Requirements

  • 10+ years of experience building high-performance databases, big data systems, or large-scale data services

  • Deep understanding of open-source Big Data or AI training systems (e.g., Hadoop, Spark, Flink, Ray, Iceberg, Delta Lake, Hudi, ClickHouse, Trino, Presto, PyTorch, or JAX)

  • Strong experience with high-performance computing in Java or Scala

  • Experience with Rust (or willingness to learn it)

  • Proven ability to work independently and as part of a high-caliber team

About the job

About LanceDB

LanceDB is an innovative, developer-centric, open-source database designed for multimodal AI applications. We provide robust solutions ranging from hyper-scalable vector search capabilities to advanced retrieval for Retrieval-Augmented Generation (RAG). LanceDB is your ideal partner for creating AI applications, enabling seamless interaction with large-scale AI datasets and powering some of the most cutting-edge applications across various industries.

About the Role

We are seeking a Senior Open Source Engineer to enhance the presence of LanceDB within the extensive data infrastructure ecosystem. You will engage in projects that sit at the convergence of high-performance computing, big data, and open-source systems. Your contributions will drive integrations, optimize distributed operations, and support initiatives within the Apache and AI communities.

Key Responsibilities

  • Lead open-source community initiatives to integrate the Lance format with systems such as Spark, Hive Metastore, Presto, Trino, and Ray.

  • Design and sustain efficient distributed operations for Lance datasets.

  • Develop optimized indices to facilitate predicate pushdown and enhance query performance in Spark, Ray, or Trino.

  • Engage in the development of table formats, data encodings, and various components of the Lance format using Rust.

  • Manage and enhance internal data processing infrastructure.

  • Advocate for the Lance format in open-source forums and at major Big Data conferences.

Requirements

  • Over 10 years of experience in developing high-performance databases, big data systems, or large-scale data services.

  • In-depth knowledge of the internal workings of open-source Big Data or AI training systems such as Hadoop, Spark, Flink, Ray, Iceberg, Delta Lake, Hudi, ClickHouse, Trino, Presto, PyTorch, or JAX.

  • Extensive experience with high-performance computing using Java or Scala.

  • Familiarity with Rust, or a strong willingness to learn it.

  • Demonstrated ability to work efficiently, independently, and collaboratively within a high-caliber team environment.

Preferred Qualifications

  • Active contributor, committer, or PMC member in Apache or other significant open-source projects.

  • Experience with Java, Rust, C++, Apache Arrow, DataFusion, Parquet, Iceberg, or Delta Lake is a plus.

About LanceDB

LanceDB is at the forefront of developing open-source solutions for multimodal AI, providing a robust database architecture that supports advanced AI applications and large-scale data processing. Our technology empowers developers with the tools they need to build innovative applications that push the boundaries of what's possible in AI.
