About the job
Speechify helps over 50 million people turn reading materials, including PDFs, books, Google Docs, news articles, and websites, into audio. Our text-to-speech apps span iOS, Android, Mac, Chrome, and web. Google named us Chrome Extension of the Year, and Apple awarded us the 2025 Design Award for Inclusivity.
Our fully remote team of about 200 includes engineers, AI researchers, and alumni from Amazon, Microsoft, Google, Stripe, Vercel, Bolt, and top academic programs. We work across time zones to build tools that make reading more accessible for everyone.
Role overview
Speechify’s AI division is looking for a Software Engineer focused on Data Infrastructure & Acquisition. This role centers on collecting and managing large-scale audio data to support model training. The team builds and maintains systems that create high-quality datasets at petabyte scale, balancing efficiency and cost.
What you will do
- Find and source new audio data, then integrate it into our ingestion pipeline.
- Maintain and improve our cloud infrastructure for data ingestion, currently running on Google Cloud Platform and managed with Terraform.
- Partner with scientists to optimize for cost, throughput, and data quality, delivering large-scale datasets efficiently for new model development.
- Work with the AI team and company leadership to plan the dataset roadmap for future consumer and enterprise products.
Location
This position is based in Sydney, Australia.

