Sr. Lead AI Engineer, Data - 11315
WFA Digital Insight
The demand for skilled AI engineers has skyrocketed, with 25% growth in job openings in the last year alone. As companies like Coupa pioneer innovative technologies, the need for experts who can design and implement cutting-edge data pipelines has become paramount.
Job Description
About the Role
As a Senior Lead AI Engineer, Data at Coupa, you will be at the forefront of designing and implementing the data pipelines that prepare high-quality training data for AI models. This critical role involves building data curation workflows, transforming raw enterprise data into labeled, validated datasets, and collaborating with ML engineers on training data format requirements. You will be part of a dynamic team that is passionate about leveraging technology to empower customers with greater efficiency and visibility in their spend.

You will lead the design and implementation of data pipelines, build data curation workflows, and design data quality frameworks, working closely with cross-functional teams to establish a data catalog and metadata management for AI training artifacts. Your expertise in data engineering, combined with your passion for AI and machine learning, will be instrumental in driving the success of Coupa's data platform.
Coupa's data platform already handles anonymized data exports, commodity classification, supplier normalization, and benchmark metrics across 197+ enterprise tables. As a Senior Lead AI Engineer, you will expand this foundation, building the data curation and pipeline infrastructure that feeds the company's growing AI model training capabilities. This is a high-volume workstream processing trillions of dollars of enterprise spend data.
What You Will Do
- Lead the design and implementation of data pipelines that prepare high-quality training data for AI models
- Build data curation workflows that transform raw enterprise data into labeled, validated datasets
- Design data quality frameworks: validation, profiling, anomaly detection, lineage tracking
- Extend existing anonymized data export pipelines to support AI training workloads
- Implement synthetic data generation pipelines
- Design schema mappings across 197+ enterprise tables for feature extraction
- Collaborate with ML engineers on training data format requirements
- Establish data catalog and metadata management for AI training artifacts
- Work closely with cross-functional teams to ensure seamless integration of data pipelines
- Continuously monitor and improve data pipeline performance and scalability
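To make the data quality responsibilities above concrete, here is a minimal sketch of rule-based validation and profiling in plain Python. The field names (`supplier`, `amount`) and thresholds are illustrative assumptions, not Coupa's actual schema; a production pipeline would express these rules in a framework such as Spark.

```python
# Minimal data-quality sketch: profile a batch of spend records and
# split rows into valid and rejected sets. Field names and the
# amount threshold are hypothetical examples.

def profile(records):
    """Simple column-level profiling: count nulls per field."""
    nulls = {}
    for row in records:
        for field, value in row.items():
            if value is None:
                nulls[field] = nulls.get(field, 0) + 1
    return nulls

def validate(records, max_amount=1_000_000):
    """Apply basic rules: required fields present, amount in a sane range.
    Returns (valid_rows, rejected_rows_with_reason)."""
    valid, rejected = [], []
    for row in records:
        if row.get("supplier") is None or row.get("amount") is None:
            rejected.append((row, "missing required field"))
        elif not (0 <= row["amount"] <= max_amount):
            rejected.append((row, "amount out of range"))
        else:
            valid.append(row)
    return valid, rejected

batch = [
    {"supplier": "Acme Corp", "amount": 1250.00},
    {"supplier": None,        "amount": 99.50},
    {"supplier": "Globex",    "amount": -10.00},
]
ok, bad = validate(batch)
print(len(ok), len(bad))  # 1 valid row, 2 rejected
```

The same shape of rules (null checks, range checks, rejection reasons for lineage) carries over directly to PySpark column expressions at scale.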
What We Are Looking For
- 10+ years of software engineering experience, with 5+ years in data engineering
- Strong experience with Apache Spark / PySpark and large-scale data processing
- Experience building ETL/ELT pipelines on cloud infrastructure (managed Spark, object storage, managed ETL, or equivalent)
- Knowledge of data quality frameworks and data governance
- Experience with data anonymization and privacy-preserving data processing
- Understanding of ML training data requirements
- Proficiency in Python and SQL
- Experience with data catalog tools and metadata management
- BS/MS in Computer Science or equivalent experience
- Experience in B2B SaaS with multi-tenant data preferred
Nice to Have
- Experience with Excel for data analysis and visualization
- Familiarity with machine learning algorithms and models
- Experience with cloud-based data warehousing solutions
- Knowledge of data security and compliance regulations
Benefits and Perks
- Competitive salary and benefits package
- Opportunity to work with a pioneering technology company
- Collaborative and dynamic work environment
- Professional development and growth opportunities
- Flexible working hours and remote work options
- Access to cutting-edge technologies and tools
- Recognition and reward for outstanding performance
- Comprehensive health and wellness programs
How to Stand Out
- Showcase your expertise in Apache Spark, PySpark, and large-scale data processing by providing specific examples of previous projects or experiences.
- To stand out, highlight your understanding of data quality frameworks, data governance, and machine learning training data requirements.
- Familiarize yourself with Coupa's technology stack and be prepared to discuss how your skills align with the company's goals and objectives.
- When preparing for the interview, review common data engineering and machine learning interview questions and practice your responses.
- Consider creating a portfolio that demonstrates your proficiency in Python, SQL, and data catalog tools, as well as your experience with data anonymization and privacy-preserving data processing.
- Be prepared to negotiate your salary based on your experience and qualifications, and don't be afraid to ask about benefits, perks, and opportunities for growth and development.
This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.