About the job
Join us at datacurve as we build a gamified developer platform that empowers thousands of engineers to create high-fidelity datasets, advancing the frontiers of large language models (LLMs). In this pivotal role, you will oversee the entire technical lifecycle of our data pipelines, from collaborating with partner labs on new data formats to delivering the tools, environments, documentation, and quality assurance processes that bring these formats to life at scale.
Key Responsibilities
Lead Projects End-to-End: Take ownership of projects from initial prototyping through to ongoing maintenance and iterative improvements based on user feedback.
Oversee Developer Experience Pipelines: Prototype tools for capturing new data formats, move them into a production workflow, and refine the developer experience along the way.
Champion Developer Experience: Produce clear and concise guidelines and documentation to empower our contributors and ensure the quality of project inputs.
Quality Assurance & Governance: Establish and maintain quality standards for your projects, including training content reviewers to keep data consistent and accurate. Implement automated checks, evaluation harnesses, and workflows to meet data quality benchmarks.
Continuous Improvement: Monitor systems, troubleshoot issues, and enhance reliability, latency, and contributor success rates.
Occasional Responsibilities
Define Innovative Data Formats: Collaborate with frontier lab researchers to create specifications and design schemas, metadata, and versioning for new formats.
Develop Tools and Environments: Deliver tools, sandboxes, command-line interfaces (CLIs), and instrumentation to streamline contribution processes.

