Data Engineer (PySpark) - Cloudera Expert at Leading UAE Bank

gsstech-group, Bengaluru, Karnataka, India
On-site, Full-time

About the job

Job Title: Data Engineer (PySpark)

________________________________________

About the Role

We invite you to join our dynamic data engineering team as a proficient Data Engineer specializing in PySpark and the Cloudera Data Platform (CDP). In this pivotal role, you will be tasked with architecting, developing, and sustaining robust data pipelines that guarantee exceptional data quality and accessibility throughout the organization. Your expertise in big data ecosystems, cloud-native technologies, and sophisticated data processing methodologies is essential.

The ideal candidate will possess extensive hands-on experience in data ingestion, transformation, and optimization on the Cloudera Data Platform, complemented by a strong history of applying data engineering best practices. You will collaborate closely with fellow data engineers to devise solutions that foster significant business insights.

Key Responsibilities

  • Design and develop scalable ETL pipelines using PySpark on CDP, ensuring data integrity.
  • Manage data ingestion processes from diverse sources (e.g., relational databases, APIs, file systems) to the data lake or warehouse on CDP.
  • Utilize PySpark for processing, cleansing, and transforming vast datasets to meet analytical and business needs.
  • Optimize performance by fine-tuning PySpark code and Cloudera components to enhance resource utilization.
  • Establish data quality checks and validation routines to maintain data accuracy throughout the pipeline.
  • Automate workflows using orchestration tools like Apache Oozie or Airflow within the Cloudera ecosystem.
  • Monitor pipeline performance, troubleshoot issues, and maintain the Cloudera Data Platform and associated processes.
  • Collaborate with data engineers, analysts, product managers, and other stakeholders to understand data requirements.
  • Document data engineering processes, code, and pipeline configurations thoroughly.

Qualifications

  • Bachelor’s or Master’s degree in Computer Science, Data Engineering, Information Systems, or a related discipline.
  • 3+ years of experience as a Data Engineer, focusing on PySpark and the Cloudera Data Platform.

About gsstech-group

Join gsstech-group, a leading UAE bank, where innovation meets expertise. We are committed to leveraging cutting-edge technologies to provide exceptional financial services.
