About the job
Speechify builds tools that help remove reading barriers for millions. Our text-to-speech products turn PDFs, books, Google Docs, news articles, and websites into audio, making information more accessible and learning more enjoyable. Over 50 million people use our apps on iOS, Android, Mac, Chrome, and the web. Google named us Chrome Extension of the Year, and Apple recognized us with its 2025 Design Award for Inclusivity.
Our team of nearly 200 works fully remotely, with no physical office. Engineers, AI researchers, and specialists from companies like Amazon, Microsoft, and Google collaborate alongside startup founders and PhDs from top programs.
Role Overview
The Data Infrastructure & Acquisition team is looking for a Software Engineer to advance our AI data efforts. This position focuses on collecting and managing data to support model training. The team’s mission: build high-quality datasets at petabyte scale and low cost by combining infrastructure, engineering, and research.
What You Will Do
- Find and connect new audio data sources to our ingestion pipeline.
- Maintain and improve our cloud-based ingestion pipeline, which runs on GCP with infrastructure managed through Terraform.
- Work with scientists to optimize data cost, throughput, and quality for next-generation model development.
- Partner with the AI team and company leadership to plan a dataset roadmap that supports both consumer and enterprise products.
Location
This is a remote role. Speechify is headquartered in Austin, TX, but the team operates globally.

