About the job
At Exa, we are on a mission to create a cutting-edge search engine from the ground up, tailored specifically for AI applications. Our team is dedicated to developing large-scale infrastructure that efficiently crawls the internet, trains advanced embedding models for indexing, and constructs high-performance vector databases in Rust for optimized searching. We also manage a state-of-the-art $5M H200 GPU cluster that activates thousands of machines simultaneously.
As a Software Engineer specializing in Distributed Data Systems, you will be responsible for designing and implementing the data infrastructure that drives our operations—from crawling billions of web pages to training sophisticated embedding models and delivering real-time search functionalities. You will enjoy significant autonomy in creating systems capable of scaling to hundreds of petabytes. This is your opportunity to work on data pipelines at an unprecedented scale.

