About the job
Speechify’s mission is to remove reading barriers and open up learning opportunities for everyone. Our text-to-speech technology supports over 50 million users, turning PDFs, books, Google Docs, news articles, and websites into audio. This helps people read more efficiently and remember more. The product suite covers iOS, Android, Mac, a Chrome Extension, and a Web App. Speechify has earned recognition from Google as Chrome Extension of the Year and received Apple’s 2025 Design Award for Inclusivity.
The team is fully distributed, with nearly 200 professionals from companies such as Amazon, Microsoft, and Google, as well as graduates from Stanford and other top universities. Speechify operates without physical offices.
Role Overview
This Software Engineer position sits within the AI team, focusing on data infrastructure and acquisition. The work spans all aspects of data collection to support model training. The role blends infrastructure, engineering, and research to help build high-quality datasets at petabyte scale, efficiently and cost-effectively.
What You Will Do
- Identify and source new audio data for the data ingestion pipeline.
- Manage and expand cloud infrastructure for data ingestion, currently on Google Cloud Platform (GCP) and managed with Terraform.
- Work closely with scientists to improve data quality, throughput, and cost, supporting the development of future models.
- Collaborate with the AI team and leadership to plan and prioritize datasets for upcoming consumer and enterprise products.
What We Look For
- BS, MS, or PhD in Computer Science or a related field.
- At least 5 years of professional software development experience.
- Strong skills in bash and Python scripting in Linux environments.
- Familiarity with Docker and Infrastructure-as-Code, plus hands-on experience with a major cloud provider (GCP preferred).
- Experience with web crawlers and large-scale data processing is a plus.
- Comfort handling multiple priorities and shifting focus as needed.
- Clear verbal and written communication skills.
Location: Cambridge, United Kingdom (fully remote/distributed team).

