About the job
Speechify builds text-to-speech tools that help over 50 million people access content in new ways. Our products convert reading materials such as PDFs, books, Google Docs, news articles, and websites into audio, helping users read more efficiently and retain more. We offer mobile apps for iOS and Android, a Chrome Extension, and a Web App. Our work has earned recognition from Google (Chrome Extension of the Year) and Apple (2025 Design Award for Inclusivity).
Our fully remote team includes nearly 200 professionals worldwide, with backgrounds at companies such as Amazon, Microsoft, and Google and at top universities such as Stanford. We bring together frontend and backend engineers, AI researchers, and others who are passionate about accessible technology.
Role Overview
The Software Engineer - Data Infrastructure & Acquisition will join Speechify’s AI team, focusing on the Data division. This role centers on building and maintaining large-scale data collection systems that support model training. The work blends infrastructure, engineering, and research to deliver high-quality datasets at petabyte scale while controlling costs.
What You Will Do
- Identify and source new audio data, integrating it into Speechify’s ingestion pipeline.
- Manage and improve the cloud infrastructure behind the data ingestion pipeline, currently hosted on Google Cloud Platform and managed with Terraform.
- Partner with scientists to optimize for cost, throughput, and data quality, supporting next-generation model development.
- Work with the AI team and company leadership to shape the dataset roadmap for upcoming consumer and enterprise products.
Location
This position is based in San Diego, CA, USA, although the team operates fully remotely.

