About the job
Speechify’s mission is to remove reading as a barrier to learning. Over 50 million people use our text-to-speech tools to turn books, PDFs, Google Docs, news, and websites into audio. Our products include iOS and Android apps, a Mac app, a Chrome extension, and a web app. Speechify has earned recognition from Google as Chrome Extension of the Year and received Apple’s 2025 Design Award for Inclusivity.
Our fully distributed team includes nearly 200 professionals, from engineers to AI researchers, with backgrounds at Amazon, Microsoft, Google, Stripe, Vercel, and Bolt. Collaboration happens across time zones and continents.
Role Overview
This Software Engineer, Data Infrastructure & Acquisition role sits within the AI team and focuses on overseeing data collection and the infrastructure that supports model training. The team builds large, high-quality datasets at petabyte scale while keeping costs low. The work blends infrastructure, engineering, and research. Candidates who care deeply about data engineering and AI will find meaningful challenges here.
What You Will Do
- Find and evaluate new audio data sources, then integrate them into the ingestion pipeline.
- Maintain and scale cloud infrastructure for data ingestion, using GCP and Terraform.
- Work with scientists to balance cost, throughput, and data quality, delivering better datasets for model development.
- Partner with the AI team and leadership to shape a dataset roadmap for future Speechify products.
What We Look For
- Bachelor’s, Master’s, or PhD in Computer Science or a related field.
- At least 5 years of software development experience.
- Strong skills in bash and Python scripting in Linux environments.
- Hands-on experience with Docker and Infrastructure-as-Code (such as Terraform), plus experience with a major cloud provider (GCP preferred).
- Familiarity with web crawlers and large-scale data processing is a plus.
- Comfort managing multiple priorities and shifting demands.
- Clear communication skills, both written and spoken.
Location: Columbus, OH, USA (remote team, distributed worldwide).

