About Us
At Twelve Labs, we are pioneering advanced multimodal foundation models that enable video comprehension akin to human understanding. Our models have set new benchmarks in video-language modeling and are transforming how people interact with and analyze media.
Backed by over $110 million in Seed and Series A funding from leading venture investors, including NVIDIA’s NVentures, NEA, Radical Ventures, and Index Ventures, we are guided by influential AI leaders and founders such as Fei-Fei Li, Silvio Savarese, and Alexandr Wang. With headquarters in San Francisco and a significant APAC presence in Seoul, we are committed to innovation on a global scale.
Our collaborations with NVIDIA and AWS give us access to state-of-the-art chips such as NVIDIA B300s, allowing us to push the boundaries of video AI.
We celebrate the diversity of each individual’s journey and believe that differences in our cultural, educational, and life experiences empower us to challenge conventional thinking. We are looking for passionate people who are inspired by our mission and want to make a meaningful impact in technology. Join us in revolutionizing video understanding and multimodal AI.
Team Overview
The Pegasus team is central to Twelve Labs’ video understanding efforts and is responsible for our flagship Video Analysis product. Our mission is to build multimodal video analysis systems that follow instructions reliably and generate complex, hierarchically structured outputs. We prioritize products with tangible real-world impact and operate as a goal-driven, cross-functional team of ML researchers and engineers.
Our projects tackle a wide range of challenges, including large-scale distributed training of multimodal LLMs, precise temporal segmentation, structured metadata extraction for practical applications, and extending temporal context to several hours of video. We also focus on data curation, building processes that support aligned evaluation and drive performance improvements through higher-quality training data.
Our team has access to the world’s most advanced chips, including NVIDIA B300s, which shortens our research-to-production cycle and drives rapid advances in video analysis systems.

