
Software Engineer - Data Infrastructure & Acquisition

Speechify, Cambridge, United Kingdom
Remote, Full-time

Qualifications

Ideal candidate qualifications: BS, MS, or PhD in Computer Science or a related discipline. 5+ years of professional experience in software development. Strong proficiency in Bash and Python scripting within Linux environments. Familiarity with Docker and Infrastructure-as-Code concepts, with hands-on experience using a major cloud provider (GCP preferred). Experience with web crawlers and large-scale data processing workflows is advantageous. Ability to multitask and adapt to evolving priorities. Excellent verbal and written communication skills.

About the job

Speechify’s mission is to remove reading barriers and open up learning opportunities for everyone. Our text-to-speech technology supports over 50 million users, turning PDFs, books, Google Docs, news articles, and websites into audio. This helps people read more efficiently and remember more. The product suite covers iOS, Android, Mac, a Chrome Extension, and a Web App. Speechify has earned recognition from Google as Chrome Extension of the Year and received Apple’s 2025 Design Award for Inclusivity.

The team is fully distributed, with nearly 200 professionals from companies such as Amazon, Microsoft, and Google, as well as graduates from Stanford and other top universities. Speechify operates without physical offices.

Role Overview

This Software Engineer position sits within the AI team, focusing on data infrastructure and acquisition. The work spans all aspects of data collection to support model training. The role blends infrastructure, engineering, and research to help build high-quality datasets at petabyte scale, efficiently and cost-effectively.

What You Will Do

  • Identify and source new audio data to improve the data ingestion pipeline.
  • Manage and expand cloud infrastructure for data ingestion, currently on Google Cloud Platform (GCP) and managed with Terraform.
  • Work closely with scientists to improve data quality, throughput, and cost, supporting the development of future models.
  • Collaborate with the AI team and leadership to plan and prioritize datasets for upcoming consumer and enterprise products.

What We Look For

  • BS, MS, or PhD in Computer Science or a related field.
  • At least 5 years of professional software development experience.
  • Strong skills in Bash and Python scripting in Linux environments.
  • Familiarity with Docker and Infrastructure-as-Code, with hands-on experience using a major cloud provider (GCP preferred).
  • Experience with web crawlers and large-scale data processing is a plus.
  • Comfort handling multiple priorities and shifting focus as needed.
  • Clear verbal and written communication skills.

Location: Cambridge, United Kingdom (fully remote/distributed team).

About Speechify

Speechify is dedicated to breaking down reading barriers and making learning more accessible. With text-to-speech products used by over 50 million people, we aim to improve reading efficiency and retention across iOS, Android, Mac, a Chrome Extension, and a Web App.
