Senior / Principal Research Engineer in LLM Synthetic Data

Lila Sciences
Cambridge, MA USA; San Francisco, CA USA
On-site, Full-time, $224K/yr - $336K/yr

Experience Level

Senior

Qualifications

We are looking for candidates who possess a robust background in machine learning and data science, particularly in synthetic data generation. A strong ability to work collaboratively in a team and a passion for innovation in the scientific domain are essential.

About the job

Your Impact at Lila

As a Senior or Principal Research Engineer specializing in Synthetic Data, you will play a pivotal role in shaping the vision, roadmap, and execution of our synthetic data initiatives. Your responsibilities will range from asset generation and simulation to integration with machine learning training and measurable improvements in model performance. Working closely with our Research Engineering team, you will design, generate, and deploy synthetic datasets for training, testing, and refining Lila’s platform in support of our strategic objectives.

What You Will Build

  • Define and refine the synthetic data strategy along with a comprehensive multi-quarter roadmap.
  • Create evaluation frameworks that effectively connect synthetic interventions with genuine model performance.
  • Establish high standards for asset quality, diversity, thorough documentation, and reproducibility while fostering a robust review culture.

What You Will Need to Succeed

  • Over 6 years of experience in applied ML/ML systems, with at least 3 years leading industry initiatives, showcasing a strong track record in advanced algorithms and frameworks designed for large-scale synthetic data generation.
  • More than 8 years of experience working with contemporary ML workflows, including Python, PyTorch, dataset tools, training loops, and evaluation frameworks; adept at profiling and optimizing GPU-intensive pipelines.

Bonus Points For

  • A proven history of constructing synthetic datasets from source data to significantly enhance model performance in specific domains.
  • Experience with instruction fine-tuning and hill-climbing techniques.
  • Ability to translate product requirements and feedback into a scalable synthetic data generation pipeline.
  • Knowledge of quantization, distillation, routing, mixture-of-experts, and cost optimization at scale.
  • Experience in compliance-heavy settings (HIPAA, PCI, FedRAMP) and with on-premises/VPC deployments.

About Lila

Lila Sciences stands at the forefront of innovation as the world’s first scientific superintelligence platform and autonomous lab, dedicated to life sciences, chemistry, and materials science. We are ushering in an era of limitless discovery by harnessing AI to enhance every facet of the scientific method. Our mission is to empower scientists to tackle humanity's most pressing challenges in health, climate, and sustainability at an unprecedented pace and scale. Discover more about our mission at www.lila.ai.

If this sounds like an environment in which you would thrive, we encourage you to apply even if your experience doesn't perfectly align with every requirement listed.

About Lila Sciences

Lila Sciences is pioneering the future of scientific exploration by developing the first scientific superintelligence platform that integrates AI into life sciences, chemistry, and materials science, enabling rapid advancements to solve global challenges.
