About the job
About Gapstars
Gapstars partners with some of Europe’s most innovative tech companies, from disruptive startups to fast-growing scaleups. Headquartered in the Netherlands, with talent hubs in Sri Lanka and Portugal, we specialize in building high-performing remote engineering teams. Our 275+ engineers solve real-world challenges with modern technologies across domains including networking, marketplaces, SaaS, and AI, delivering scalable solutions with measurable impact. If you are looking for an organization that fosters technical excellence, a strong culture, and personal and professional growth, you will find your place at Gapstars.
Your Role
As a Senior Data Engineer, you will design, build, and maintain scalable, reliable data solutions on Google Cloud Platform (GCP). Your responsibilities include developing robust pipelines into BigQuery, implementing transformations (preferably with dbt), and upholding high standards of data quality and operational stability. You will also support monitoring and observability, using BI tools such as Power BI for monitoring and validation rather than working as a BI developer.
Key Responsibilities
1) Data Engineering & Architecture (GCP-first)
- Design and enhance data solutions on GCP, selecting services and architecture patterns with a focus on scalability, reliability, performance, and cost efficiency.
- Build and maintain end-to-end data pipelines (primarily batch processing, with streaming/event-driven solutions where applicable).
- Design and maintain the BigQuery warehouse structure: establish data modeling conventions, implement partitioning/clustering strategies, and optimize for performance and cost.
- Define and execute integration patterns for ingesting data from various sources (such as enterprise systems, SharePoint, external APIs, and cloud storage).
- Ensure datasets are reusable, well-structured, and analytics-ready to effectively support reporting, monitoring, and future modeling requirements.
2) Transformations (dbt as a priority)
- Implement transformation logic in SQL and Python, ideally using dbt for standardized modeling and testing.
- Establish standards for modular, maintainable transformations and comprehensive documentation.
- Contribute to best practices surrounding version control, code reviews, and deployment for transformation workflows.
3) Data Quality, Reliability & Governance
- Develop and maintain data quality controls, including validation, source system reconciliation, and consistency checks.
- Oversee pipeline health, data freshness, and reliability; proactively troubleshoot incidents to prevent recurrence.
- Support data governance practices such as naming conventions, access principles, and compliance standards.
