Data Engineer (PySpark) - Expert in Cloudera Data Platform

gsstech-group
Dubai, Dubai, United Arab Emirates
On-site, Full-time

Experience Level

Mid to Senior

Qualifications

Education and Experience:

  • Bachelor’s or Master’s degree in Computer Science, Data Engineering, Information Systems, or a related field.
  • 3+ years of experience as a Data Engineer, with a substantial emphasis on PySpark and the Cloudera Data Platform.

About the job

Job Title: Data Engineer (PySpark)

About the Role

We are in search of a talented Data Engineer with significant expertise in PySpark and the Cloudera Data Platform (CDP) to bolster our data engineering team. In this role, you will be tasked with the design, development, and maintenance of scalable data pipelines that guarantee high data quality and availability across the organization. A robust background in big data ecosystems, cloud-native tools, and advanced data processing techniques is essential.

The right candidate will possess hands-on experience in data ingestion, transformation, and optimization within the Cloudera Data Platform, along with a solid history of implementing data engineering best practices. Collaboration with fellow data engineers will be key to creating solutions that yield impactful business insights.

Responsibilities

  • Design, develop, and maintain highly scalable, optimized ETL pipelines using PySpark on the Cloudera Data Platform, ensuring data integrity and accuracy.
  • Oversee data ingestion processes from diverse sources (e.g., relational databases, APIs, file systems) to the data lake or data warehouse on CDP.
  • Employ PySpark to process, cleanse, and transform extensive datasets into actionable formats that fulfill analytical needs and business objectives.
  • Optimize performance by tuning PySpark code and Cloudera components to enhance resource utilization and minimize ETL runtimes.
  • Establish data quality checks, monitoring, and validation protocols to uphold data accuracy and reliability throughout the pipeline.
  • Automate data workflows using Apache Oozie, Airflow, or comparable orchestration tools within the Cloudera ecosystem.
  • Monitor pipeline performance, troubleshoot issues, and perform routine maintenance on the Cloudera Data Platform and related data processes.
  • Collaborate closely with other data engineers, analysts, product managers, and other stakeholders to comprehend data requirements and support various data-driven initiatives.
  • Document data engineering processes, code, and pipeline configurations thoroughly.

About gsstech-group

At gsstech-group, we are committed to driving innovation and delivering exceptional data solutions. As one of the leading banks in the UAE, we harness cutting-edge technology to enhance our operations and provide top-notch services to our clients.
