About the job
Who We Are
At Twelve Labs, we are on a mission to redefine the global standards for video understanding AI! We are creating world-class AI models tailored for video data, providing specialized search, analysis, summarization, and insight generation capabilities.
Our models are utilized by the largest sports leagues worldwide to swiftly and accurately highlight key moments in vast game footage, delivering an ultra-personalized viewing experience. In South Korea, our technology assists integrated command centers in efficiently navigating CCTV footage to respond to crises, while major global broadcasters and studios leverage our models for content creation aimed at billions of viewers.
As a Deep Tech startup with offices in San Francisco and Seoul, Twelve Labs has been recognized as one of the world's top 100 AI startups by CB Insights for four consecutive years. We have secured over $110 million in investments from leading VCs and companies including NVIDIA, NEA, Index Ventures, Databricks, and Snowflake. Our AI model, uniquely developed in Korea, is exclusively offered through Amazon Bedrock. We thrive on building innovative products alongside exceptional colleagues and growing with clients worldwide.
Our core values are foundational to our work:
Honesty and reflection about ourselves and our teams
Perseverance and humility in the face of failure and feedback
A commitment to continuously elevate team capabilities through ongoing learning
If you enjoy tackling challenging problems and growing through the process, the opportunity awaits you at Twelve Labs!
About the Team
The ML Data Team at Twelve Labs believes that 'data determines the performance of AI models.' We build, end to end, the high-quality data needed to train and evaluate multimodal AI models. Our team is responsible for collecting, filtering, processing, and labeling diverse multimodal data such as video, image, and audio. We collaborate with various teams to design training data that can unlock new model capabilities, while also developing evaluation datasets that reflect real user experiences. To run all of these processes efficiently, we develop and continuously improve internal tools.
Through a meticulously designed data pipeline, the ML Data Team plays a crucial role in developing Twelve Labs' world-class video understanding models.
About the Role
As the Engineering Manager for the ML Data Team, you will be a pivotal leader in building and guiding a high-performing team, while also developing the large-scale data infrastructure necessary for AI model training. This position encompasses team operations, technical architecture design, engineering recruitment and mentoring, and enhancing project execution, all aimed at fostering a high-quality data ecosystem that supports Twelve Labs' products and research.
In this Role, You Will
Team Building and Technical Culture Development
Lead the ML data engineering team by defining and executing the team's mission and technical strategy. Establish and execute recruitment plans to attract top talent, while fostering a high-level engineering culture through code reviews and technical mentoring.
Design and Operate Multimodal ML Data Engine
Design and operate pipelines for collecting, refining, and labeling petabyte-scale video, image, and audio data for AI model training. Additionally, build an automated dataset generation system for VLM/LLM training and a high-precision preprocessing engine.
Design and Operate Multimodal ML Data Infrastructure
Create and manage scalable data infrastructure capable of reliably processing large-scale data.
Establish and Execute Data Strategy
Collaborate closely with PM, research, infrastructure, product, and other organizations to establish and prioritize a data strategy aligned with the company's vision.

