About Us
At TwelveLabs, we are at the forefront of innovation, developing advanced multimodal foundation models capable of understanding videos as humans do. Our groundbreaking work in video-language modeling has set new benchmarks, enhancing our ability to interact with and analyze diverse media forms.
With over $110 million secured in Seed and Series A funding, we are supported by prestigious venture capitalists, including NVIDIA’s NVentures, NEA, Radical Ventures, and Index Ventures, alongside notable AI leaders like Fei-Fei Li and Silvio Savarese. We are headquartered in San Francisco, and our significant presence in the APAC region, particularly in Seoul, illustrates our dedication to global innovation.
Our partnerships with NVIDIA and AWS give us access to state-of-the-art technology, enabling us to explore the full potential of video AI.
We celebrate the uniqueness of every individual's journey, believing that our diverse backgrounds foster an environment where we can challenge norms and push technological boundaries. We invite driven individuals who are passionate about our mission to join us as we transform the landscape of video understanding and multimodal AI.
The Team
The Pegasus team is central to TwelveLabs' video comprehension capabilities, focusing on our Video Analysis product. We are dedicated to developing multimodal video analysis systems that follow instructions closely and produce complex, structured outputs. We work as a cross-functional team of ML researchers and engineers, with an emphasis on delivering products that create real-world impact.
Our challenges include large-scale distributed training of multimodal LLMs from pre-training to reinforcement learning, precise temporal segmentation, and structured metadata extraction for practical applications. We also work on extending temporal context lengths and on improving training data to raise evaluation results and model performance.
Equipped with the latest NVIDIA B300 chips, our team accelerates the transition from research to production, striving for rapid advancements in video analysis systems.

