Software Engineer - Data Infrastructure & Acquisition

Speechify · Chennai, India
Remote · Full-time

Experience Level

Mid to Senior

About the job

Speechify’s mission is to remove reading barriers from learning. With a user base of over 50 million, our products convert materials like PDFs, books, Google Docs, news articles, and websites into audio. People use Speechify to read faster, understand more, and remember what matters. Our platform includes apps for iOS, Android, Mac, Chrome, and the web. Recent recognition includes Chrome Extension of the Year from Google and Apple’s 2025 Design Award for Inclusivity.

Our distributed team of nearly 200 people includes frontend and backend engineers, AI researchers, and others who have worked at companies such as Amazon, Microsoft, Google, Stripe, and Vercel, as well as alumni of top PhD programs and startup founders.

Role Overview

Speechify is hiring a Software Engineer to join the data division of its AI team in Chennai, India. The role centers on collecting data for model training. The team builds high-quality datasets at petabyte scale and keeps costs low by combining infrastructure, engineering, and research.

What You Will Do

  • Find and integrate new audio data sources into the ingestion pipeline.
  • Manage and improve cloud infrastructure for data ingestion, using GCP and Terraform.
  • Work with scientists to push the boundaries of cost, throughput, and data quality for next-generation models.
  • Collaborate with AI team members and leadership to shape the dataset roadmap for future consumer and enterprise products.

Qualifications

  • BS, MS, or PhD in Computer Science or a related discipline.
  • At least 5 years of software development experience.
  • Strong skills in bash and Python scripting on Linux.
  • Expertise with Docker and Infrastructure-as-Code, plus hands-on work with at least one major cloud provider (GCP preferred).
  • Background with web crawlers and large-scale data processing is a plus.
  • Comfort with multitasking and shifting priorities.
  • Clear written and verbal communication skills.

About Speechify

Speechify is dedicated to eliminating reading barriers in learning by providing innovative text-to-speech solutions. With millions of users, our products enhance reading efficiency and comprehension, making learning accessible for everyone.
