About the job
Speechify builds text-to-speech tools that help over 50 million people worldwide turn reading materials, like PDFs, books, Google Docs, news articles, and websites, into audio. Our platform spans iOS, Android, Mac, Chrome, and web. Recent recognition includes Chrome Extension of the Year by Google and Apple’s 2025 Design Award for Inclusivity.
The team brings together nearly 200 people from diverse backgrounds, including engineers, AI researchers, and alumni of companies such as Amazon, Microsoft, and Google. Many hold advanced degrees from places like Stanford. Speechify works fully remotely, without a physical office.
Role Overview
This Software Engineer role focuses on data infrastructure and acquisition within our AI group. The main mission: collect and manage large-scale, high-quality datasets to improve model training. The work blends infrastructure, engineering, and research to support petabyte-scale data growth while keeping costs in check.
What You Will Do
- Identify and integrate new audio data sources into the ingestion pipeline.
- Oversee and improve cloud infrastructure for data ingestion, using GCP and Terraform.
- Partner with scientists to refine cost, throughput, and data quality, enabling richer and larger datasets for future models.
- Work with the AI team and leadership to shape a dataset roadmap for upcoming consumer and enterprise products.
Location
This position is based in Hong Kong. The team operates fully remotely.

