About the job
Speechify’s mission is to remove reading barriers from learning. With a user base of over 50 million, our products convert materials like PDFs, books, Google Docs, news articles, and websites into audio. People use Speechify to read faster, understand more, and remember what matters. Our platform includes apps for iOS, Android, Mac, Chrome, and the web. Recent recognition includes Chrome Extension of the Year from Google and Apple’s 2025 Design Award for Inclusivity.
Our distributed team includes nearly 200 professionals: frontend and backend engineers, AI researchers, and others from companies such as Amazon, Microsoft, and Google, as well as alumni of top PhD programs and founders of startups like Stripe and Vercel.
Role Overview
Speechify is hiring a Software Engineer to join the AI team’s data division in Chennai, India. This role centers on data collection for model training. The team focuses on building high-quality datasets at petabyte scale while keeping costs low through a blend of infrastructure, engineering, and research.
What You Will Do
- Find and integrate new audio data sources into the ingestion pipeline.
- Manage and improve cloud infrastructure for data ingestion, using GCP and Terraform.
- Work with scientists to lower cost and raise throughput and data quality for next-generation models.
- Collaborate with AI team members and leadership to shape the dataset roadmap for future consumer and enterprise products.
Qualifications
- BS, MS, or PhD in Computer Science or a related discipline.
- At least 5 years of software development experience.
- Strong skills in bash and Python scripting on Linux.
- Expertise with Docker and Infrastructure-as-Code, plus hands-on work with at least one major cloud provider (GCP preferred).
- Background with web crawlers and large-scale data processing is a plus.
- Comfort with multitasking and shifting priorities.
- Clear written and verbal communication skills.

