About the job
Skydio stands at the forefront of the drone industry, recognized as the leading American company and a global innovator in autonomous flight technology. Our mission is to revolutionize aerial mobility through cutting-edge artificial intelligence, advanced hardware, and exceptional software development. At Skydio, we are dedicated to creating solutions that empower a diverse array of users—from utility inspectors and first responders to soldiers in challenging environments.
About the Role:
As a Senior Engineer within the Autonomy Data Curation team, you will play a pivotal role in developing our data flywheel. Your expertise will be crucial in gathering essential data from various drone fleets and transforming it into high-quality, model-ready datasets for our Autonomy teams. You will collaborate closely with the Deep Learning and Computer Vision teams to ensure swift and confident data utilization for training and model development.
This position is an individual contributor role, reporting directly to the Director of Autonomy Data Curation.
How You’ll Make an Impact:
Create and manage efficient pipelines that convert raw autonomy logs and media into well-curated datasets, with strong observability and clear ownership so the data can be broadly reused.
Develop tools for rapid data discovery and self-service slicing for Autonomy teams, including media search and hard-example mining loops, with infrastructure to automatically route data to annotation.
Enhance dataset quality and repeatability through effective versioning, provenance tracking, and automated validations.
Implement privacy and security measures throughout our processes, including access controls, data retention policies, and redaction/anonymization techniques.
Adopt a data-driven and impact-oriented approach, utilizing dashboards to highlight costs, dataset balance, and audit details.
What Makes You a Good Fit:
Minimum of 5 years of professional software engineering experience (or equivalent), with significant responsibility for production systems.
Proficient in programming, with expertise in at least one of our primary languages (Python/C++).
Hands-on experience in building data pipelines for large-scale datasets (ETL/ELT, streaming or batch processing, orchestration).
Experience in data modeling, schema evolution, and dataset/version management.
Strong understanding of reliability engineering principles, including monitoring, observability, and incident management.