Data Engineer - Scientific Data Ingestion

Mithrl, San Francisco
On-site, full-time

About the job

ABOUT MITHRL

We envision a world where innovative drugs and therapies reach patients in months rather than years, expediting breakthroughs that save lives.

Mithrl is building the world's first commercially available AI Co-Scientist: a discovery engine that lets life science teams turn chaotic biological data into insights in minutes. Scientists ask questions in natural language, and Mithrl responds with real analysis, novel targets, and patent-ready reports.

Our success is evident:

  • 12x year-over-year revenue growth

  • Trusted by leading biotech firms and major pharmaceutical companies across three continents

  • Driving significant breakthroughs from target discovery to patient outcomes

WHAT YOU WILL DO

Lead the design and operation of an AI-driven data ingestion and normalization pipeline that brings in data from diverse sources, from raw Excel/CSV uploads and lab and instrument exports to processed outputs from internal systems.

Build comprehensive schema-mapping, type-coercion, and conversion logic covering unit normalization, metadata standardization, and variable-name harmonization, as well as vendor- and instrument-specific quirks, plate-reader formats, reference-genome or annotation updates, and batch-effect correction.

Use LLM-driven and classical data-engineering tools to structure messy tabular and semi-structured data: extract metadata, infer column roles and types, clean free-text headers, resolve inconsistencies, and produce clean final datasets.

Make sure that transformations which must happen exactly once, such as normalization, coercion, and batch correction, run at ingestion time, so that downstream analytics and the AI Co-Scientist always work with clean, canonical data.

Establish validation, verification, and quality-control checks that catch ambiguous, inconsistent, or corrupted data before it enters the platform.

Collaborate with product teams, data science/bioinformatics colleagues, and infrastructure engineers to define and uphold data standards, ensuring that pipeline outputs integrate smoothly into downstream analysis and storage systems.

WHAT YOU BRING

Must-have:

  • 5+ years of experience in data engineering or data wrangling with real-world tabular or semi-structured data.

  • Strong proficiency in Python.

About Mithrl

Mithrl is pioneering the development of AI-driven tools to accelerate drug discovery and enhance patient outcomes, with a vision of transforming the life sciences landscape.