About the job
Speechify’s mission is to remove barriers to learning caused by reading challenges. With a user base of more than 50 million, Speechify turns PDFs, books, Google Docs, news articles, and websites into audio, helping people read faster and retain more. Our text-to-speech technology powers apps for iOS, Android, Mac, Chrome, and the web. Google named our Chrome extension Extension of the Year, and Apple recognized us with a 2025 Design Award.
Speechify operates fully remotely, with no physical office. Nearly 200 team members, including frontend and backend engineers, AI researchers, and industry veterans from Amazon, Microsoft, Google, and other companies, work together from around the world. Our team also includes graduates of top PhD programs and founders of companies like Stripe, Vercel, and Bolt.
Role Overview
The Data team within our AI division is looking for a Software Engineer focused on data infrastructure and acquisition. This position plays a central role in collecting and managing the data that powers model training. The team’s goal: build and maintain high-quality datasets at petabyte scale while controlling costs through strong infrastructure and engineering practices.
What You’ll Do
- Find and integrate new sources of audio data into the ingestion pipeline.
- Manage and improve the cloud infrastructure for data ingestion, using Google Cloud Platform (GCP) and Terraform.
- Work with scientists to optimize for cost, throughput, and data quality, delivering better datasets to support advanced models.
- Collaborate with the AI team and company leadership to shape the dataset roadmap for future consumer and enterprise products.
Location
This role is based in Mexico City, Mexico.

