Qualifications
What You'll Be Doing
Design and manage large-scale data pipelines that generate datasets for machine learning training and experimentation.
Develop infrastructure that supports distributed training workflows using technologies such as PyTorch, Ray Data, and Ray Train.
Integrate ML pipelines with workflow orchestration systems like Flyte or Airflow to facilitate reliable multi-stage training workflows.
Enhance reproducibility and observability of ML pipelines through dataset validation, monitoring, and automated testing.
Optimize performance and resource utilization across distributed computing systems used for data processing and model training.
Collaborate closely with ML engineers to enable efficient large-scale experimentation and model iteration.
Lead architectural enhancements to ensure our offline ML pipelines remain scalable, reliable, and cost-effective.
What We're Looking For
Strong experience in building large-scale ML pipelines.
Proficiency with distributed computing frameworks such as Ray, Spark, or Flink, and familiarity with the Ray ecosystem (Ray Data, Ray Train) for distributed data processing and model training.
Experience developing infrastructure for training data generation and management.
About the job
Unity Technologies is building an offline machine learning platform to support analytics, experimentation, attribution, and AI-driven decisions across the organization. This platform processes both batch and streaming data at scale, serving as the backbone for product intelligence, machine learning pipelines, and business operations. As datasets grow in size and complexity, the platform enables the advanced model training, feature generation, and experimentation workflows that are critical for production ML systems.
What you will do
- Design and enhance Unity's offline machine learning infrastructure to support large-scale model training and experimentation
- Develop reliable systems for generating training datasets and managing machine learning workflows
- Support efficient, distributed model training across teams
- Collaborate with machine learning engineers and platform teams to ensure pipelines scale with data growth and evolving training requirements
- Shape processes for dataset preparation, model training, validation, and delivery to distributed systems
- Maintain high standards for reliability, scalability, and performance of the offline ML platform
Requirements
This role requires a senior technical leader with experience in large-scale machine learning infrastructure. A successful candidate has a strong background in building and maintaining reliable, scalable, and well-architected ML pipelines.
Location
This position is based in Mountain View, CA, USA.
About Unity Technologies
Unity Technologies is at the forefront of creating real-time 3D content and experiences. Our mission is to empower creators by providing the most powerful and flexible platform for real-time 3D development, fostering innovation and creativity across industries.