About the job
* Design, implement, and maintain secure, scalable, and robust data ingestion pipelines, sourcing data from diverse systems such as SAP, Salesforce, SharePoint, APIs, and legacy manufacturing platforms.
* Develop and enhance metadata-driven services to facilitate discoverability, access governance, and operational transparency of enterprise data.
* Act as a technical authority and cross-functional facilitator for the acquisition, quality, and compliance of both structured and unstructured data.
* Establish and oversee comprehensive data quality management, including monitoring and reporting.
* Contribute to a global data engineering team that supports all major business domains.
* Spearhead the implementation of ingestion and metadata services for over 100 enterprise data sources.
* Collaborate with IT, cybersecurity, infrastructure, and architecture teams to ensure secure and sustainable data delivery.
Main Responsibilities:
▪ Construct and maintain extraction services using Python or Scala (e.g., Debezium Server, custom APIs, rclone).
• Implement Change Data Capture (CDC), delta, and event-based patterns.
• Facilitate push-based HTTP and Kerberos-authenticated DLT delivery.
• Establish, operate, and troubleshoot SAP data extraction using tools such as Theobald Xtract Universal.
• Integrate with systems such as Salesforce, SharePoint, and other API or file-based endpoints.
▪ Develop a user-friendly, web-accessible data catalog application, featuring dataset profiles, metadata, and usability enhancements.
▪ Integrate dataset discoverability, preview/exploration features, and lineage information using Unity Catalog as a backend metadata system.
▪ Design and implement structured access request workflows encompassing submission, approval chains, audit trails, and enablement triggers.
• Conduct design reviews with the Cybersecurity team.
• Ensure proper documentation and compliance for all interfaces and data ingress points.
• Manage audit and traceability requirements.
• Collaborate with IT and business users to translate requirements into scalable technical solutions.
• Serve as a technical escalation point for complex source integration challenges.
▪ Define and execute a multi-layered data quality framework, incorporating unit-level, integration-level, and cross-pipeline validation rules.
▪ Establish centralized, version-controlled storage of Data Quality (DQ) rules, integrating them into orchestration and CI/CD pipelines.
▪ Implement automated DQ monitoring with varying severity levels (Critical, High, Medium, Low) and enable flagging, filtering, and quarantining mechanisms at relevant pipeline stages.
▪ Work closely with source system owners and business stakeholders to define meaningful and actionable DQ thresholds.