About the Job
At Similarweb, we are dedicated to building some of the most comprehensive and distinctive views of how the digital world works.
The Data for AI team, a specialized unit within Similarweb, works closely with a select group of the world's leading AI organizations, primarily foundation model companies. Our mission is to help these companies improve their models and AI assistants by applying Similarweb's data to real-world AI applications.
This role demands deep collaboration with AI teams, emphasizing data quality, scalability, and relevance to contemporary machine learning systems. We operate in a nimble, high-ownership environment where team members significantly influence the application of Similarweb’s data in advanced AI products.
About the Role
We are in search of a Data Analyst to join our Data for AI team. This hands-on, customer-facing role focuses on partnering with leading AI companies to transform real-world data into valuable inputs that facilitate model development and evaluation.
You will work closely with external AI teams and internal engineering and product partners to deliver data-driven solutions tailored to specific AI use cases. The environment is fast-paced, technical, and often open-ended, so you should be comfortable working with large datasets, handling ambiguous requirements, and taking complete ownership of projects.
Your Day-to-Day Responsibilities:
- Oversee the entire delivery process of data solutions for AI applications, from understanding model and product requirements to analysis, implementation, quality assurance, and automation.
- Engage directly with large, raw datasets to produce high-quality data inputs that support model training, evaluation, and iteration.
- Apply strong quantitative analysis and data exploration skills to evaluate the coverage, quality, and behavior of data used in AI systems.
- Develop scripts, analyses, and reusable components in Python and SQL to ensure scalable and repeatable workflows.
- Collaborate closely with Engineering to guarantee that solutions are reliable, scalable, and ready for production.
- Partner directly with external AI teams and internal stakeholders to translate open-ended inquiries into specific data outputs.
This Role is Ideal for You if You Have:
- 4+ years of practical experience managing large-scale data using SQL and Spark or BigQuery.
- Robust Python skills for data analysis, scripting, and creating reusable workflows.
- Experience in handling raw, flawed data and transforming it into reliable, high-quality outputs.
- Strong analytical and problem-solving abilities, including the ability to break down open-ended or ambiguous challenges.

