About the job
About Liquid AI
Founded as a spin-off from MIT CSAIL, Liquid AI specializes in creating versatile AI systems designed for optimal performance across various deployment platforms, including data center accelerators and on-device hardware. Our technology emphasizes low latency, minimal memory consumption, privacy, and dependability. We collaborate with leading enterprises in sectors such as consumer electronics, automotive, life sciences, and financial services. As we experience rapid growth, we are on the lookout for exceptional talent to join our team.
The Opportunity
The Data team at Liquid AI drives the development of our Liquid Foundation Models, focusing on pre-training, vision, audio, and emerging modalities. With the stagnation of public data sources, the effectiveness of our models increasingly relies on specially curated datasets. We are seeking engineers with a machine learning mindset who can efficiently gather, filter, and synthesize high-quality data at scale.
At Liquid AI, we regard data as a research challenge rather than an infrastructural issue. Our engineers conduct experiments, design ablations, and assess how data-related decisions impact model quality. We will align you with a team where you can experience rapid growth and make a significant impact, be it in pre-training, post-training reinforcement learning, vision-language, audio, or multimodal applications.
While we prefer candidates in San Francisco and Boston, we are open to considering other locations.
What We're Looking For
We are in search of a candidate who:
- Thinks like a researcher and executes like an engineer: You should be able to formulate hypotheses, conduct experiments, and evaluate results. Our engineers produce research-level code while our researchers implement production systems.
- Learns quickly and adapts: You will be working in rapidly evolving modalities, so the ability to quickly grasp new domains and thrive in ambiguity is essential.
- Prioritizes data quality: We hold data quality in high regard; tasks such as filtering, deduplication, augmentation, and evaluation are key responsibilities, not afterthoughts.
- Solves problems autonomously: Data engineers operate within training groups (pre-training and multimodal). While collaboration is crucial, we expect ownership and self-direction.
The Work
- Develop and maintain data processing, filtering, and selection pipelines at scale.
- Establish pipelines for pretraining, midtraining, supervised fine-tuning, and preference optimization datasets.
- Design synthetic data generation systems utilizing large language models (LLMs), structured prompting, and domain-specific generative techniques.

