About the job
Position Summary:
The Data Integration Engineer will lead the design, construction, and maintenance of robust data pipelines and warehouse models that drive analytics and operational reporting. You will integrate data from a variety of sources, including databases, APIs, SaaS applications, and flat files, and develop efficient ELT/ETL processes in a cloud environment. You will collaborate closely with analytics teams, application developers, and business stakeholders to translate requirements into reliable, production-ready datasets.
Duties/Responsibilities:
Data Pipeline Development
- Create and implement scalable ETL/ELT pipelines (both batch and near-real-time) to source data from databases, APIs, SaaS, and flat files into platforms like Snowflake and Azure Synapse.
- Utilize integration tools such as Azure Data Factory (ADF), Fivetran, CData Sync, and Boomi, enhancing capabilities with custom code as necessary.
- Write clean, maintainable code, primarily in SQL, using Python or PHP as needed for custom connectors, transformations, or microservices.
- Optimize workflows for performance, reliability, and scalability through partitioning, parallel processing, incremental loads, CDC, idempotency, and robust retry/rollback strategies (see the incremental-load sketch after this list).
- Oversee data warehouse platforms, including Azure Synapse and Snowflake, ensuring their smooth operation.
- Identify and resolve data pipeline failures and errors promptly.
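As a rough sketch of the incremental-load and idempotency patterns named above (table names, columns, and the updated_at watermark are illustrative assumptions, not part of this posting), a MERGE keyed on the business key keeps reruns safe while only pulling rows newer than the current high-water mark:

  -- Hypothetical idempotent incremental load from a staging table into a
  -- warehouse table, keyed on order_id and driven by an updated_at watermark.
  MERGE INTO dw.orders AS tgt
  USING (
      SELECT order_id, customer_id, order_total, updated_at
      FROM stg.orders
      -- Only pick up rows changed since the last successful load.
      WHERE updated_at > (
          SELECT COALESCE(MAX(updated_at), CAST('1900-01-01' AS TIMESTAMP))
          FROM dw.orders
      )
  ) AS src
      ON tgt.order_id = src.order_id
  WHEN MATCHED THEN UPDATE SET
      customer_id = src.customer_id,
      order_total = src.order_total,
      updated_at  = src.updated_at
  WHEN NOT MATCHED THEN INSERT (order_id, customer_id, order_total, updated_at)
      VALUES (src.order_id, src.customer_id, src.order_total, src.updated_at);

Because the MERGE updates existing keys rather than re-inserting them, re-running the load after a partial failure does not create duplicates, which is the essence of the idempotency and retry/rollback strategies listed above.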
Data Warehouse Management
- Develop and maintain data models, schemas, views, and stored procedures; manage staging, core, and mart layers along with source-to-target mappings.
- Enforce data quality validation and monitoring practices (e.g., null/duplicate/range checks, schema drift detection, reconciliation; see the validation sketch after this list).
- Troubleshoot and resolve identified data errors efficiently.
- Adopt and apply best practices in data warehousing (e.g., clustering, partitioning, cost governance, role-based access, tagging/lineage).
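As a minimal sketch of the validation checks described above, assuming a hypothetical dw.orders table with order_id and order_total columns, the queries below flag null business keys, duplicate keys, and out-of-range amounts; results could feed a monitoring or reconciliation job:

  -- Null check: the business key must always be populated.
  SELECT COUNT(*) AS null_key_rows
  FROM dw.orders
  WHERE order_id IS NULL;

  -- Duplicate check: each order_id should appear exactly once.
  SELECT order_id, COUNT(*) AS row_count
  FROM dw.orders
  GROUP BY order_id
  HAVING COUNT(*) > 1;

  -- Range check: order totals should fall within an expected range.
  SELECT COUNT(*) AS out_of_range_rows
  FROM dw.orders
  WHERE order_total < 0 OR order_total > 1000000;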
SQL & Python Support
- Write and execute complex SQL queries, refactor existing queries using window functions and CTE chains, and tune performance with explain plans, pruning, and join strategies (a refactoring sketch follows this section).
- Create and troubleshoot Python notebooks, ensuring seamless integration into pipelines, including aspects like packaging, scheduling, and secret management.
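As an illustrative sketch of the window-function and CTE refactoring mentioned above (the dw.orders table and its columns are assumptions, not part of this posting), a correlated subquery that finds each customer's most recent order can be rewritten with a CTE and ROW_NUMBER(), which is usually easier to read and to tune from an explain plan:

  -- Hypothetical refactor: latest order per customer via a CTE and a window
  -- function, in place of a correlated subquery. Names are illustrative only.
  WITH ranked_orders AS (
      SELECT
          customer_id,
          order_id,
          order_total,
          order_date,
          ROW_NUMBER() OVER (
              PARTITION BY customer_id
              ORDER BY order_date DESC
          ) AS rn
      FROM dw.orders
  )
  SELECT customer_id, order_id, order_total, order_date
  FROM ranked_orders
  WHERE rn = 1;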
Collaboration & Documentation
- Work closely with data analysts and stakeholders to clarify requirements and acceptance criteria, translating them into detailed technical designs.
- Maintain technical specifications, data flow diagrams, and operational procedures; contribute to documentation standards and reusable patterns.

