Data Research Engineer at Cartesia | San Francisco, CA

Cartesia HQ - San Francisco, CA
On-site, Full-time

Qualifications

  • Experience in building or working with large multilingual datasets.

  • Experience with generative models (speech, text, or multimodal).

  • Ability to guide human annotation and evaluation across multiple languages.

  • Strong analytical skills and a passion for data-driven decision-making.

About the job

About Cartesia

At Cartesia, our vision is to create the future of artificial intelligence: intelligent systems that are seamlessly integrated into daily life. We aim to overcome current limitations by enabling models to continuously understand and analyze vast streams of audio, video, and text data, ranging from 1 billion text tokens to 1 trillion video tokens, right on your device.

Our pioneering team, composed of PhDs from the Stanford AI Lab, has developed State Space Models (SSMs), a groundbreaking approach to training efficient, large-scale foundation models. With a rich blend of expertise in model innovation and systems engineering, alongside a product-focused engineering team, we are committed to developing and delivering cutting-edge AI models and user experiences.

Supported by prominent investors including Index Ventures and Lightspeed Venture Partners, as well as many esteemed advisors and over 90 angel investors from diverse industries, we are at the forefront of AI advancements.

About The Role

In our quest to create truly global AI, we must train our models using datasets that represent the vast diversity of languages and cultures around the world. We are looking for a Research Engineer to take charge of the quality and comprehensiveness of the data that drives our models. As our in-house expert in global data, you will ensure that our models excel across multiple languages, leveraging your keen understanding of linguistic subtleties and your enthusiasm for building inclusive, large-scale datasets.

Your Impact

  • Design and construct extensive datasets for model training, conducting controlled experiments to evaluate their effect on model performance.

  • Develop assessments for speech models through both manual annotation and automated evaluation metrics.

  • Utilize data generation techniques to enhance model intelligence and reduce biases.

  • Create automated quality control systems to validate and filter the generated data.

  • Collaborate with product teams to ensure optimal support for key languages and markets.

What You Bring

  • Proven experience in developing or working with extensive multilingual datasets.

  • Familiarity with generative models, including speech, text, or multimodal systems.

  • Ability to guide human annotation and evaluation across various languages.

  • Strong analytical skills and a passion for data-driven decision-making.

About Cartesia

Cartesia is at the cutting edge of artificial intelligence, focused on creating interactive intelligence that enhances our daily lives. With a strong foundation in innovative model development and a commitment to inclusivity, we are redefining the possibilities of AI technology.
