Sr. Lead AI Engineer, Data - 11315
WFA Digital Insight
The demand for skilled AI engineers has skyrocketed, with 25% growth in job openings in the last year alone. As companies like Coupa pioneer innovative technologies, the need for experts who can design and implement cutting-edge data pipelines has become paramount.
Job Description
About the Role
As a Senior Lead AI Engineer, Data at Coupa, you will be at the forefront of designing and implementing the data pipelines that prepare high-quality training data for AI models. This critical role involves building data curation workflows, transforming raw enterprise data into labeled, validated datasets, and collaborating with ML engineers on training data format requirements. You will be part of a dynamic team that is passionate about leveraging technology to empower customers with greater efficiency and visibility in their spend.

You will lead the design and implementation of data pipelines, build data curation workflows, and design data quality frameworks, working closely with cross-functional teams to establish a data catalog and metadata management for AI training artifacts. Your expertise in data engineering, combined with your passion for AI and machine learning, will be instrumental in driving the success of Coupa's data platform.
Coupa's data platform already handles anonymized data exports, commodity classification, supplier normalization, and benchmark metrics across 197+ enterprise tables. As a Senior Lead AI Engineer, you will expand this foundation, building the data curation and pipeline infrastructure that feeds the company's growing AI model training capabilities. This is a high-volume workstream processing trillions of dollars of enterprise spend data.
What You Will Do
- Lead the design and implementation of data pipelines that prepare high-quality training data for AI models
- Build data curation workflows that transform raw enterprise data into labeled, validated datasets
- Design data quality frameworks: validation, profiling, anomaly detection, lineage tracking
- Extend existing anonymized data export pipelines to support AI training workloads
- Implement synthetic data generation pipelines
- Design schema mappings across 197+ enterprise tables for feature extraction
- Collaborate with ML engineers on training data format requirements
- Establish data catalog and metadata management for AI training artifacts
- Work closely with cross-functional teams to ensure seamless integration of data pipelines
- Continuously monitor and improve data pipeline performance and scalability
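To make the data quality responsibilities above concrete, here is a minimal sketch of rule-based validation and profiling in plain Python. The field names (`supplier`, `amount`) and thresholds are illustrative assumptions, not Coupa's actual schema; a production pipeline would express these rules in a framework such as Spark.

```python
# Minimal data-quality sketch: profile a batch of spend records and
# split rows into valid and rejected sets. Field names and the
# amount threshold are hypothetical examples.

def profile(records):
    """Simple column-level profiling: count nulls per field."""
    nulls = {}
    for row in records:
        for field, value in row.items():
            if value is None:
                nulls[field] = nulls.get(field, 0) + 1
    return nulls

def validate(records, max_amount=1_000_000):
    """Apply basic rules: required fields present, amount in a sane range.
    Returns (valid_rows, rejected_rows_with_reason)."""
    valid, rejected = [], []
    for row in records:
        if row.get("supplier") is None or row.get("amount") is None:
            rejected.append((row, "missing required field"))
        elif not (0 <= row["amount"] <= max_amount):
            rejected.append((row, "amount out of range"))
        else:
            valid.append(row)
    return valid, rejected

batch = [
    {"supplier": "Acme Corp", "amount": 1250.00},
    {"supplier": None,        "amount": 99.50},
    {"supplier": "Globex",    "amount": -10.00},
]
ok, bad = validate(batch)
print(len(ok), len(bad))  # 1 valid row, 2 rejected
```

The same shape of rules (null checks, range checks, rejection reasons for lineage) carries over directly to PySpark column expressions at scale.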
What We Are Looking For
- 10+ years of software engineering experience, with 5+ years in data engineering
- Strong experience with Apache Spark / PySpark and large-scale data processing
- Experience building ETL/ELT pipelines on cloud infrastructure (managed Spark, object storage, managed ETL, or equivalent)
- Knowledge of data quality frameworks and data governance
- Experience with data anonymization and privacy-preserving data processing
- Understanding of ML training data requirements
- Proficiency in Python and SQL
- Experience with data catalog tools and metadata management
- BS/MS in Computer Science or equivalent experience
- Experience in B2B SaaS with multi-tenant data preferred
Nice to Have
- Experience with Excel for data analysis and visualization
- Familiarity with machine learning algorithms and models
- Experience with cloud-based data warehousing solutions
- Knowledge of data security and compliance regulations
Benefits and Perks
- Competitive salary and benefits package
- Opportunity to work with a pioneering technology company
- Collaborative and dynamic work environment
- Professional development and growth opportunities
- Flexible working hours and remote work options
- Access to cutting-edge technologies and tools
- Recognition and reward for outstanding performance
- Comprehensive health and wellness programs
How to Stand Out
- Showcase your expertise in Apache Spark, PySpark, and large-scale data processing by providing specific examples of previous projects or experiences.
- To stand out, highlight your understanding of data quality frameworks, data governance, and machine learning training data requirements.
- Familiarize yourself with Coupa's technology stack and be prepared to discuss how your skills align with the company's goals and objectives.
- When preparing for the interview, review common data engineering and machine learning interview questions and practice your responses.
- Consider creating a portfolio that demonstrates your proficiency in Python, SQL, and data catalog tools, as well as your experience with data anonymization and privacy-preserving data processing.
- Be prepared to negotiate your salary based on your experience and qualifications, and don't be afraid to ask about benefits, perks, and opportunities for growth and development.
This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.