About Us
At Abacus Insights, we are revolutionizing how health plans use data. Our mission is straightforward: make healthcare data actionable so decision-makers in care and cost management can respond swiftly and confidently. We help health plans dismantle data silos and build a unified, trustworthy data foundation that leads to better decisions, improved outcomes, less waste, and superior experiences for both members and providers.
With $100 million in backing from leading investors, we are addressing significant challenges in an industry poised for transformation. Our platform supports Generative AI applications by ensuring the delivery of clean, connected, and reliable healthcare data that facilitates automation, prioritization, and decision workflows—keeping us at the forefront of innovation.
Our progress is driven by our people. We embrace boldness, curiosity, and collaboration, as we believe the best ideas emerge from teamwork. Are you ready to make a difference? Join us in building the future.
About the Role:
We are seeking a skilled Data QA Engineer to join our TechOps team. If you are passionate about safeguarding critical feature delivery, this role is for you. The team implements and configures data pipelines that ingest, transform (ETL), and process diverse data sources, including large batch processing and streaming systems. You'll work across our platform and AWS services (Serverless – Lambda, EMR – Hadoop/Spark, EKS – Kubernetes, etc.) while contributing to the evolution of both the product and the team during a period of rapid growth. Your expertise will play a crucial role as our architecture evolves and we develop new platform components.
Your Daily Responsibilities:
- Evaluate mapping specifications to identify business requirements and estimate QA effort.
- Craft ETL test scenarios and test cases while validating input file layouts and parsers.
- Create source-to-target SQL queries for data validation and user acceptance criteria.
- Automate data validation processes using PySpark and Databricks.
- Conduct testing with synthetic data that simulates real-time production scenarios.
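To give a flavor of the source-to-target validation work described above, here is a minimal, hypothetical sketch. It uses Python's built-in sqlite3 as a stand-in for PySpark/Databricks, and the table schema, helper name, and sample rows are all invented for illustration; the same row-count and set-difference checks translate directly to Spark SQL against real source and target tables.

```python
import sqlite3

def validate_source_to_target(src_rows, tgt_rows):
    """Compare a source extract against its loaded target: row counts must
    match and the row-level set difference must be empty in both directions.
    (Hypothetical helper for illustration only.)"""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE src (claim_id TEXT, amount REAL)")
    cur.execute("CREATE TABLE tgt (claim_id TEXT, amount REAL)")
    cur.executemany("INSERT INTO src VALUES (?, ?)", src_rows)
    cur.executemany("INSERT INTO tgt VALUES (?, ?)", tgt_rows)

    # Check 1: source and target row counts agree.
    counts_match = (
        cur.execute("SELECT COUNT(*) FROM src").fetchone()[0]
        == cur.execute("SELECT COUNT(*) FROM tgt").fetchone()[0]
    )
    # Check 2: EXCEPT acts as a set difference, surfacing rows that were
    # dropped or mutated in either direction during the load.
    missing = cur.execute(
        "SELECT * FROM src EXCEPT SELECT * FROM tgt").fetchall()
    extra = cur.execute(
        "SELECT * FROM tgt EXCEPT SELECT * FROM src").fetchall()
    conn.close()
    return counts_match and not missing and not extra

# Matching data passes; a dropped row fails.
rows = [("C1", 100.0), ("C2", 250.5), ("C3", 75.0)]
print(validate_source_to_target(rows, rows))      # True
print(validate_source_to_target(rows, rows[:2]))  # False
```

In a Databricks environment the same checks would typically run via `DataFrame.count()` and `DataFrame.exceptAll()` on the source and target DataFrames, and would be wrapped into automated test cases alongside the synthetic-data scenarios.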

