Signal Engineer at maincode | Melbourne

maincode, Melbourne
On-site, full-time

Essential Skills

Proficient in Python and data processing frameworks. Detail-oriented with the ability to spot and rectify errors in large datasets. Ability to assess and curate high-quality training data. Experience with large and complex datasets. Interest in data's impact on machine learning models. Quick learning and adaptability.

About the job

Maincode is seeking a Signal Engineer in Melbourne to help shape the future of Matilda, a leading Australian language model. This role focuses on transforming massive amounts of raw data into high-quality training sets, directly influencing model performance. Both engineering and editorial skills are essential, as the work blends technical pipeline development with careful judgment about data quality.

Key Responsibilities

  • Design and implement large-scale data pipelines for ingesting, cleaning, deduplicating, filtering, and scoring training data (handling volumes from terabytes up to petabytes).
  • Develop classifiers and heuristics to distinguish valuable data from irrelevant or low-quality input.
  • Experiment with different dataset combinations to identify what best improves the language model.
  • Build tools for exploring, sampling, and auditing the data corpus.
  • Collaborate closely with researchers and training engineers to align data strategies with model objectives.
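
For a sense of the work, the cleaning, deduplicating, filtering, and scoring steps listed above can be sketched in miniature. This is an illustrative toy, not a description of maincode's actual pipeline: the function names, the thresholds, and the letter-ratio heuristic are all assumptions made for the example. A real pipeline at terabyte scale would likely run on a distributed framework and use near-duplicate detection rather than exact hashing.

```python
import hashlib
import re
from typing import Iterable, Iterator, Set, Tuple


def clean(text: str) -> str:
    """Collapse runs of whitespace and strip leading/trailing space."""
    return re.sub(r"\s+", " ", text).strip()


def quality_score(text: str) -> float:
    """Toy heuristic (an assumption): share of characters that are letters or spaces."""
    if not text:
        return 0.0
    good = sum(ch.isalpha() or ch == " " for ch in text)
    return good / len(text)


def pipeline(
    docs: Iterable[str],
    min_chars: int = 20,      # hypothetical length threshold
    min_score: float = 0.8,   # hypothetical quality threshold
) -> Iterator[Tuple[str, float]]:
    """Clean each document, drop exact duplicates, then keep only documents
    that pass the length and quality thresholds."""
    seen: Set[str] = set()
    for raw in docs:
        text = clean(raw)
        # Exact deduplication on the case-folded, cleaned text.
        digest = hashlib.sha256(text.lower().encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        score = quality_score(text)
        if len(text) >= min_chars and score >= min_score:
            yield text, score


if __name__ == "__main__":
    sample = [
        "The quick  brown fox jumps over the lazy dog.",
        "the quick brown fox jumps over the lazy dog.",   # duplicate after normalisation
        "$$$ 1234 %%% 5678 ### 0000 !!!",                 # mostly symbols and digits
        "too short",
    ]
    for text, score in pipeline(sample):
        print(f"{score:.2f}  {text}")
```

Writing the pipeline as a generator keeps memory use flat apart from the set of seen hashes, and that set is the first thing that would need to be replaced or sharded as the corpus grows.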

Requirements

  • Strong engineering background, particularly with Python, data tools, distributed processing, and building reliable data pipelines.
  • High attention to detail, recognizing that small errors can quickly scale in this environment.
  • Ability to assess the quality of training data accurately.
  • Comfort working with extremely large and complex datasets.
  • Interest in how data decisions influence model behavior.
  • Quick to pick up new concepts; prior experience with large language models is not required.

Preferred Experience

  • Experience with web-scale corpora or pre-training data pipelines.
  • Familiarity with unstructured text data.
  • Knowledge of distributed data frameworks such as Spark or Ray.
  • Background in deduplication, quality classification, or tokenization.

Additional Details

This is a full-time, in-person position based in Melbourne. Maincode cannot provide visa sponsorship, so candidates must already hold valid and unrestricted work rights in Australia.

About maincode

maincode is at the forefront of language modeling in Australia, dedicated to harnessing cutting-edge technology to create sophisticated and intelligent systems. Our flagship model, Matilda, is a testament to our commitment to quality and innovation in AI.
