
Technical Staff Member (Data) - World Models at Reka

Reka · US, Singapore, Remote
Remote Full-time


Qualifications

To excel in this role, candidates should possess strong data engineering skills, experience in ML data operations, foundational knowledge of machine learning, and the ability to create robust specifications for AI systems.

About the job

The Technical Staff Member (Data) - World Models position at Reka centers on building and maintaining large-scale data systems for AI model training. The role can be based in the US or Singapore, or performed remotely. The focus is on integrating and processing petabyte-scale multimodal datasets while keeping the data infrastructure sustainable and efficient.

What you will do

  • Lead the development of data pipelines and storage solutions to handle vast and varied datasets for model training.
  • Create automated, resource-efficient tools and systems that process diverse small datasets at scale.
  • Design, automate, and maintain Python ETL pipelines (using Spark or Ray) for large multimodal data processing.
  • Develop and maintain systems for data cataloging, lineage tracking, quality assurance, integrity checks, access management, and lifecycle oversight.
  • Support colleagues by providing internal tools, documentation, and guidance on data best practices.
  • Act as the primary steward of the organization’s datasets, ensuring their quality, accessibility, and overall health.

Key challenges

  • Build high-performance pipelines that process petabyte-scale datasets across thousands of CPUs and hundreds of GPUs.
  • Adapt data formats, storage, and processing methods to keep pace with AI advancements while maintaining backward compatibility.
  • Scale data infrastructure to support rapid organizational growth.
  • Ensure the platform remains flexible for handling heterogeneous datasets and ad-hoc analytics needs.

Requirements

  • Expertise in data engineering, especially Python ETL pipelines, along with familiarity with infrastructure, data formats, and large-scale storage systems.
  • Experience in managing datasets, annotations, and data versioning for machine learning model training.
  • Solid understanding of fundamental machine learning concepts to collaborate effectively with researchers and inform platform decisions.
  • Ability to draft clear specifications for AI agents and maintain strong human oversight of AI-generated outputs.
  • Demonstrated initiative, ownership, and effective communication in managing workload and priorities.

About Reka

Reka is at the forefront of technological innovation, focusing on developing advanced AI-driven solutions that leverage extensive datasets. Our mission is to push the boundaries of what's possible in artificial intelligence while fostering a collaborative and inclusive work environment.
