About the job
At Unstructured, we are setting the benchmark for enterprise data transformation in an era defined by large language models (LLMs) and generative AI. In just two years, we have raised over $65 million from investors including Menlo Ventures, Bain Capital, Databricks, NVIDIA, Microsoft, and IBM.
Our open-source toolkit has achieved over 52 million downloads and is utilized by more than 66,000 companies, including nearly half of the Fortune 500. We empower AI workflows across commercial and federal sectors by transforming unstructured data—such as PDFs, HTML, Word documents, images, and emails—into scalable, AI-ready data pipelines.
We’re not just creating tools; we’re constructing the fundamental infrastructure of generative AI that enables intelligence across various industries.
Unstructured is seeking Staff and Principal Software Engineers to establish the architectural foundation for processing and transforming unstructured data for LLM applications.
This high-impact, hands-on position is ideal for engineers who excel at complex systems work: people who could author a guide on Kubernetes, are passionate about Python, and understand Postgres down to the index and transaction-log levels. You will lead the technical direction for our core platform, overseeing the flow of data from ingestion through transformation to retrieval.
You will collaborate closely with the Head of Engineering, Head of Infrastructure, and product leaders to ensure Unstructured’s systems are efficient, resilient, and capable of supporting next-generation AI workloads.

