AI Evaluation Engineer - Data Analysis & Multi-Agent Systems

Gramian Consultancy
Remote, Brazil
Remote Contract


Qualifications

  • 5+ years of experience in data analysis or analytics-heavy roles
  • Strong proficiency in Python (pandas, NumPy) and SQL
  • Experience handling real-world, messy datasets (CSV, JSON, logs, reports)
  • Ability to design analytical problems with clear, verifiable answers
  • Solid understanding of statistics (distributions, correlations, outliers)
  • Familiarity with AI benchmarks or evaluation environments (e.g., SWE-bench or similar)
  • Hands-on experience with Docker (Dockerfiles, image builds, debugging)

About the job

Gramian Consultancy seeks an AI Evaluation Engineer specializing in data analysis and multi-agent systems. This remote contractor position is open to candidates based in Brazil, as well as Bangladesh, Colombia, Egypt, Ghana, India, Indonesia, Kenya, Nigeria, Turkey, and Vietnam.

Role overview

This role centers on designing and implementing benchmark tasks that reflect real-world analytical challenges for AI systems. The focus is on building scenarios where multi-agent systems analyze large, complex datasets from various sources, assign tasks to specialized agents, and generate clear, verifiable results.

The contract requires a full-time commitment of 8 hours per day, with at least 4 hours overlapping with Pacific Standard Time. The minimum contract length is 4 weeks. No medical or paid leave is provided, as this is a contractor role. The interview process includes a 60-minute take-home assessment.

What you will do

  • Design and implement benchmark tasks for multi-agent systems, emphasizing complex data analysis workflows
  • Create or select realistic datasets in formats such as CSV, JSON, logs, reports, and financial or operational data
  • Develop tasks that require cross-referencing multiple data sources, identifying anomalies and contradictions, and performing statistical analysis
  • Define strategies for distributing tasks among specialized sub-agents (for example, financial, technical, or operational analysis)
  • Develop verification logic to ensure analytical outputs are precise and not generic
  • Build evaluation pipelines using Python and SQL
  • Create reproducible environments with Docker
  • Review and refine tasks for clarity, complexity, and scoring accuracy

Requirements

  • Minimum 5 years of experience in data analysis or analytics-focused positions
  • Advanced skills in Python (including pandas and NumPy) and SQL
  • Experience working with real-world, messy datasets (CSV, JSON, logs, reports)
  • Ability to design analytical problems with verifiable answers
  • Strong understanding of statistics, including distributions, correlations, and outlier detection
  • Familiarity with AI benchmarks or evaluation tools (such as SWE-bench or similar)
  • Hands-on experience with Docker, including writing Dockerfiles, building images, and troubleshooting

About Gramian Consultancy

Gramian Consultancy is a leading consultancy firm that specializes in IT professional services and engineering talent solutions. With our extensive experience in software engineering and leadership, we empower companies to construct exceptional teams by aligning them with the right professionals.
