About the job
Speechify aims to remove reading barriers for learners everywhere. More than 50 million people use our text-to-speech tools to turn PDFs, books, Google Docs, news articles, and websites into audio. This helps users read faster, cover more material, and remember more.
Our products include apps for iOS, Android, and Mac, a Chrome extension, and a web app. Google named Speechify the Chrome Extension of the Year, and Apple awarded us the 2025 Design Award for Inclusivity.
The Speechify team is fully remote, with nearly 200 people across the globe. Our team includes frontend and backend engineers, AI research scientists, and people from companies like Amazon, Microsoft, and Google, as well as from top PhD programs such as Stanford's.
Role Overview
The Data team within Speechify’s AI division is looking for a Software Engineer focused on data infrastructure and acquisition. This position plays a key part in collecting and managing the large-scale datasets that power our model training. The team’s goal is to build high-quality datasets at petabyte scale and keep costs low by tightly integrating infrastructure, engineering, and research.
What You Will Do
- Find and bring in new audio data sources for the ingestion pipeline.
- Maintain and grow the cloud infrastructure for data ingestion, currently on Google Cloud Platform (GCP) and managed with Terraform.
- Work with scientists to improve data cost, throughput, and quality, supporting the creation of advanced models.
- Partner with the AI team and company leadership to shape the dataset roadmap for future consumer and enterprise products.
Location
This role is based in Cardiff, United Kingdom.

