About the job
Speechify aims to remove reading as a barrier to learning. More than 50 million people use Speechify to turn text from PDFs, books, Google Docs, and web articles into audio, helping them read faster and retain more. The product suite spans iOS, Android, Mac, a Chrome extension, and web. Speechify has earned recognition from Google as Chrome Extension of the Year and from Apple with the 2025 Design Award for Inclusivity.
The company operates fully remotely, with a team of nearly 200 professionals. Team members include frontend and backend engineers, AI research scientists, and specialists from organizations such as Amazon, Microsoft, Google, Stripe, and Vercel.
Role Overview
Speechify is hiring a Software Engineer focused on data infrastructure and acquisition for the AI team. This engineer will oversee data collection processes used to train models, working closely with both engineers and researchers. The work centers on building and maintaining high-quality datasets at petabyte scale, while keeping infrastructure costs low through thoughtful engineering.
Key Responsibilities
- Identify and source new audio data to feed the ingestion pipeline.
- Manage and expand cloud infrastructure for data ingestion, currently on Google Cloud Platform (GCP) with Terraform.
- Work with scientists to optimize data cost, throughput, and quality for next-generation model development.
- Partner with the AI team and leadership to plan dataset roadmaps for consumer and enterprise products.
Qualifications
- Bachelor’s, Master’s, or PhD in Computer Science or a related field.
- At least 5 years of professional experience in software development.
- Skilled in Bash and Python scripting, especially in Linux environments.
- Hands-on experience with Docker, Infrastructure-as-Code, and a major cloud provider (GCP preferred).
- Knowledge of web crawlers and large-scale data processing workflows is a plus.
- Comfortable managing multiple priorities and adapting to shifting requirements.
- Strong written and verbal communication skills.
Location
Remote. The team is distributed, but this role is listed for Ann Arbor, MI, USA.

