
Data Engineer at Qloo | New York City

On-site · Full-time





About the job

About Qloo
At Qloo, we leverage extensive behavioral and catalog data to power recommendations across entertainment, dining, travel, and retail. Our platform is built on an AWS data architecture that supports analytics, APIs, and machine-learning models used by prominent brands. We are looking for a skilled Data Engineer to help enhance and scale this platform.

Position Summary
As a Data Engineer at Qloo, you will be instrumental in designing, constructing, and managing the data pipelines that aggregate information from external vendors, internal systems, and public sources into our S3-based data lake and related services. You will engage with AWS Glue, EMR (Spark), Athena/Hive, and Airflow (MWAA) to ensure that our data is precise, well-structured, and readily accessible for analytics, indexing, and machine-learning initiatives.

You should feel confident managing end-to-end data workflows, encompassing ingestion, transformation, quality assurance, monitoring, and performance optimization.

Key Responsibilities
- Design, develop, and maintain batch data pipelines utilizing Python, Spark (EMR), and AWS Glue, transferring data from S3, RDS, and external sources into Hive/Athena tables.
- Structure datasets in our S3/Hive data lake to facilitate analytics (Hex), API functionalities, Elasticsearch indexing, and machine learning models.
- Create and manage workflows in Airflow (MWAA), handling dependency management, scheduling, retries, and alert notifications via Slack.
- Establish robust data quality and validation mechanisms (schema validation, freshness/volume checks, anomaly detection) to ensure prompt issue identification through monitoring and alerts.
- Enhance job performance for cost and efficiency (partitioning, file formats, join strategies, appropriate utilization of EMR/Glue resources).
- Collaborate closely with data scientists, ML engineers, and application developers to comprehend data needs and design schemas and pipelines that address diverse use cases.
- Contribute to internal tools and shared libraries to streamline, secure, and standardize interactions with our data platform.
- Document pipelines, datasets, and best practices to ensure clear understanding and accessibility for the wider team.

Required Qualifications
- Bachelor's degree in Computer Science, Software Engineering, or a related discipline, or equivalent hands-on experience.
- Proficient in Python and distributed data processing using Spark (PySpark) in EMR or a comparable environment.
- Practical experience with key AWS data services, particularly:
- S3 (data lake, partitioning, lifecycle management)
- AWS Glue (jobs, crawlers, catalogs)
- EMR or other managed Spark environments

