About the job
Speechify aims to remove reading as a barrier to learning. Over 50 million people use Speechify’s text-to-speech tools to turn PDFs, books, Google Docs, news articles, and websites into audio. Users can read faster, retain more, and access information in ways that suit their needs. The product lineup includes apps for iOS, Android, Mac, Chrome, and the web. Recent recognition includes Chrome Extension of the Year from Google and Apple’s 2025 Design Award for Inclusivity.
Speechify is a fully distributed company with nearly 200 team members. The team brings together frontend and backend engineers, AI research scientists, and professionals from companies such as Amazon, Microsoft, and Google. Team members also include PhD candidates from top programs such as Stanford and founders of high-growth startups including Stripe, Vercel, and Bolt.
Role Overview
The Data team within Speechify’s AI division is looking for a Software Engineer focused on Data Infrastructure & Acquisition. This position centers on data collection to support model training. The team combines infrastructure, engineering, and research to build high-quality, petabyte-scale datasets efficiently. This role offers the chance to contribute to projects that shape the future of Speechify’s products.
What You Will Do
- Find and connect new audio data sources to the ingestion pipeline.
- Maintain and improve cloud infrastructure for the ingestion pipeline, currently running on Google Cloud Platform (GCP) and managed with Terraform.
- Work closely with scientists to optimize for cost, throughput, and quality, enabling larger and richer datasets at lower costs for new models.
- Partner with the AI team and leadership to plan datasets that will support future consumer and enterprise offerings.
What We Look For
- BS, MS, or PhD in Computer Science or a related field.
- At least 5 years of professional software development experience.
- Skilled in bash and Python scripting in Linux environments.
- Comfortable with Docker and Infrastructure-as-Code practices, plus experience with at least one major cloud provider (GCP preferred).
- Experience with web crawlers and large-scale data processing is a plus.
- Strong organizational skills and ability to handle shifting priorities.
- Clear written and verbal communication skills.
Location
This is a remote role for candidates based in Ottawa, Canada.