About the job
Please submit your resume in English and include your English proficiency level.
Toloka AI, through the Mindrift platform, offers freelance opportunities for specialists to contribute to leading AI projects. This Senior Python Systems Developer - Functional Testing Specialist role is remote and open to candidates based in Wisconsin or anywhere in the United States. The position is project-based and not a permanent staff role.
Role overview
This position centers on designing and implementing functional black box tests for large, multi-language codebases. The work involves building and maintaining Docker environments to ensure reproducible builds and test executions across platforms. Monitoring code coverage and configuring automated scoring systems to align with industry standards are also key parts of the job. The role includes using large language models (LLMs) such as Roo Code and Claude Code to automate repetitive tasks, streamline workflows, and improve code quality.
What you will do
- Design and implement functional black box tests for codebases written in various programming languages.
- Set up and maintain Docker environments for reproducible builds and test runs.
- Monitor code coverage and configure automated scoring systems to meet benchmarks.
- Use LLMs (Roo Code, Claude Code) to automate tasks and enhance code quality.
Requirements
- At least 5 years of experience as a Software Engineer, primarily working with Python.
- Extensive experience with pytest, including fixtures, session-scoped tests, timeouts, and building black-box functional tests for CLI tools.
- Expertise in Docker: writing reproducible Dockerfiles, managing user contexts, and maintaining secure workspaces.
- Strong knowledge of Linux and Bash scripting, including debugging within containers.
- Familiarity with modern Python tools such as uv, pyproject.toml, and packaging.
- Ability to read and understand code in C, C++, Rust, and Go, with support from LLMs.
- Experience using LLMs (Claude Code, Roo Code, Cursor) to accelerate development and generate test cases.
- English proficiency at B2 level or higher.
Preferred qualifications
- Experience with agent evaluation platforms and MCP CLI.
Tools and technologies
Python (pytest, uv, Pillow), Docker, Bash, Git Submodules, C/C++/Rust/Go (reading), Dagger, GitHub Codespaces, LLMs (Claude Code, Roo Code, Cursor), coverage.py, gcov, kcov.
Project details and compensation
- Freelance, project-based work through the Mindrift platform (Toloka AI).
- Fully remote and flexible scheduling. Expected commitment is 20-30 hours per week.
- Compensation is project-based and depends on scope and expertise. AI trainers can earn up to $80 per hour.
