
Data Engineer for AI Platform

Toss · Seoul
On-site · Full-time





Qualifications

We are looking for candidates who meet the following criteria:

Experience designing and developing Python-based backend services.
Hands-on experience designing and implementing data pipelines or data platforms.
A deep understanding of SQL, with experience optimizing large-scale data queries.
Familiarity with REST API design and microservices architecture.
An interest in automating repetitive work through systems.

Preferred qualifications:

Experience developing asynchronous APIs using FastAPI and Pydantic.
Experience deploying and operating services in a Kubernetes environment.
Hands-on experience designing and operating Airflow-based data pipelines.
Experience operating Elasticsearch and vector search (KNN/HNSW).
Familiarity with OLAP engines such as StarRocks, Impala, or ClickHouse.
Experience with semantic data modeling, including GraphRAG and knowledge graphs.
Experience developing data quality management platforms (DQ rule engines, health-check systems).
Experience integrating LLM APIs (OpenAI, Claude, etc.) or working with AI agent frameworks.
Familiarity with MCP (Model Context Protocol) or similar AI tool protocols.
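To picture the asynchronous-service qualification, here is a minimal standard-library sketch (not FastAPI itself; the table names and check logic are invented): fanning out concurrent health checks with asyncio.gather, the same concurrency pattern an async FastAPI endpoint relies on.

```python
import asyncio

async def check_table(name: str) -> tuple[str, bool]:
    # Stand-in for a real I/O-bound health query; here every
    # hypothetical table simply reports healthy.
    await asyncio.sleep(0)  # yield control, simulating awaited I/O
    return name, True

async def run_health_checks(tables: list[str]) -> dict[str, bool]:
    # Fan all checks out concurrently and collect the results.
    results = await asyncio.gather(*(check_table(t) for t in tables))
    return dict(results)

if __name__ == "__main__":
    report = asyncio.run(run_health_checks(["users", "orders", "events"]))
    print(report)
```

In a real service the body of check_table would await a database driver or HTTP client; the gather-based fan-out is what keeps many slow checks from running one after another.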

About the job

Join Us and Engage in Exciting Work!

After completing a comprehensive onboarding process to familiarize yourself with the Toss data environment, you will be part of the Data Warehouse Team, undertaking the following responsibilities:

Develop a data quality platform that enhances table consistency, advances DQ rules, and establishes health-check metrics. We aim to build a reliability management platform that lets every data user work without having to ask, "Can I trust this data?"
Enhance the GraphRAG pipeline. Build a knowledge graph construction pipeline that extracts entities by parsing ontology YAML, SQL, and code, followed by vector embedding for indexing in Elasticsearch, making Toss's data assets easily navigable for everyone.
Design and operate MSA architectures. Split necessary services for the ontology platform into microservices, ensuring each is designed, implemented, and operated reliably.
Develop AI agent infrastructure. Create a multi-agent workflow execution environment based on open-source agent frameworks like CrewAI. Establish an MCP Tool Registry and develop integration infrastructure with external MCP servers.
Build an early warning platform. Create a monitoring system that detects anomalies in data lineage, code, and trends, automatically performing alerts and analyses to identify issues before they escalate.
Develop a lineage tracking engine. Create a system that automatically analyzes the extent of impacts by parsing SQL to extract column-wise influence relationships, determining how far changes propagate.
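As a rough illustration of the lineage idea above, here is a toy, invented sketch that maps each output alias of a flat SELECT back to its source column. A production engine would use a full SQL parser and handle joins, subqueries, and expressions; regexes are only enough for this simplified case.

```python
import re

def column_lineage(sql: str) -> dict[str, str]:
    """Return {output_column: source_column} for simple SELECT statements.

    Toy sketch only: assumes a single flat SELECT ... FROM query with
    comma-separated column expressions and optional "AS alias" renames.
    """
    select_body = re.search(r"select\s+(.*?)\s+from\s", sql, re.I | re.S)
    if not select_body:
        return {}
    lineage: dict[str, str] = {}
    for expr in select_body.group(1).split(","):
        expr = expr.strip()
        m = re.match(r"(\S+)\s+as\s+(\w+)", expr, re.I)
        if m:
            # Renamed column: output alias derives from the source column.
            lineage[m.group(2)] = m.group(1)
        else:
            # Pass-through column: output name equals the source name.
            lineage[expr] = expr
    return lineage

if __name__ == "__main__":
    print(column_lineage("SELECT u.id AS user_id, u.name FROM users u"))
```

Chaining such per-query maps across a warehouse is what lets a lineage engine answer "if this column changes, how far does the impact propagate?"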

About Toss

Toss is a forward-thinking company focused on transforming the financial landscape through innovative solutions and a data-driven approach. We believe in creating a collaborative environment that fosters creativity and growth.
