Data Engineer - Multimodal Systems

Zyphra, San Francisco
On-site, Full-time

Experience Level

Entry Level

Qualifications

To excel as a Data Engineer - Multimodal Systems at Zyphra, candidates should possess strong implementation skills, a collaborative spirit, and the ability to learn rapidly in a fast-paced environment. Candidates should be adept at transforming concepts into experiments, have meticulous attention to detail, and demonstrate effective communication skills.

About the job

Zyphra is a cutting-edge artificial intelligence firm located in the heart of San Francisco, California, dedicated to advancing technology across various modalities.

About the Position:

We are seeking a Data Engineer - Multimodal Systems to play a pivotal role in the enhancement and expansion of Zyphra's datasets and data pipelines. This position offers a unique opportunity to collaborate with diverse teams and contribute to innovative data solutions. You will engage in the collection of extensive datasets and the development and optimization of high-performance parallel data pipelines.

Your Responsibilities Will Include:

  • Executing large-scale data collection across multiple modalities, including text, audio, and image.

  • Designing and implementing highly efficient, parallelized data processing pipelines that integrate various modalities.

  • Conducting rigorous experimental ablations to evaluate the effectiveness of new data enhancements.

Candidate Requirements:

  • Proven ability in implementation and prototyping.

  • Capability to transform ideas into experimental frameworks swiftly.

  • Strong collaborative skills, thriving in a dynamic research environment.

  • Eagerness to learn and apply new concepts effectively.

  • Exceptional communication and teamwork skills, capable of contributing to both research and large-scale engineering projects.

Preferred Qualifications:

  • Experience in the collection, management, and processing of large datasets.

  • Familiarity with parallel programming frameworks in Python, such as Dask.

  • In-depth understanding of state-of-the-art dataset curation practices.

  • A detail-oriented mindset with a passion for data integrity and verification.

  • Strong foundation in experimental methodologies for conducting thorough ablation studies and hypothesis testing.

  • Knowledge and interest in large-scale, highly parallel data processing systems.

  • Proficiency in PyTorch and Python.

  • Experience with large, complex codebases and the ability to quickly become productive within them.

  • Published research in respected machine learning venues.

  • Postgraduate degree in a relevant field is a plus.

About Zyphra

Zyphra is at the forefront of artificial intelligence innovation, striving to harness the power of data to drive advancements across various industries. Our San Francisco headquarters serves as a hub for creativity and technological development, where passionate individuals come together to push the boundaries of what’s possible.
