AI Data Engineer

Bright Vision Technologies · Remote (United States)

Software Development

Excel

WFA Digital Insight

As demand for AI solutions grows, the need for skilled data engineers who can design and operate complex data pipelines is skyrocketing. With the AI market expected to reach $90 billion by 2025, professionals with expertise in data engineering and AI workloads are in high demand. Bright Vision Technologies, a pioneer in innovative software development, is seeking an AI Data Engineer to join its dynamic team. With its strong focus on scalability, security, and user experience, this role stands out in the current remote job market. Candidates should be prepared to showcase their technical skills and their experience building petabyte-scale data systems.

Job Description

About the Role

The AI Data Engineer role at Bright Vision Technologies is an opportunity to build and operate the large-scale data systems that power modern AI training and evaluation. As a key member of the team, you will design and implement the data pipelines behind AI training, evaluation, and continual improvement workflows. Your expertise in data engineering and AI workloads will be crucial to developing scalable, secure, and user-friendly applications.

The ideal candidate will have a deep understanding of how data infrastructure choices impact model quality and training efficiency. You will be working closely with ML researchers and engineers to align data systems with model development needs, ensuring seamless integration and optimal performance.

What You Will Do

Design and operate large-scale data pipelines supporting AI training, evaluation, and continual improvement workflows
Build ingestion systems for diverse modalities, including text, image, audio, video, and structured signals
Implement data cleaning, deduplication, filtering, and quality assurance at petabyte scale
Develop dataset versioning, lineage, and provenance tracking systems suitable for reproducible training
Build high-throughput data loading systems that maximize GPU utilization during training
Implement labeling workflows, active learning pipelines, and human-in-the-loop data improvement systems
Design storage architectures balancing cost, throughput, and latency across data tiers
Build pipelines for constructing evaluation datasets, with strict integrity and contamination controls
Implement data privacy, redaction, and consent enforcement throughout the pipeline
Collaborate with ML researchers and engineers to align data systems with model development needs
Drive observability of data quality, drift, and pipeline health across the AI data ecosystem

What We Are Looking For

6+ years of experience in data engineering, with a strong focus on AI workloads and large-scale data systems
Expertise in building and operating petabyte-scale data systems, with a deep understanding of data infrastructure and its impact on model quality
Strong software engineering fundamentals, with experience in designing and implementing scalable, secure, and user-friendly applications
Experience with data pipeline tools and technologies, such as Apache Beam, Apache Spark, or similar
Strong understanding of data quality, data governance, and data privacy principles
Experience working with ML researchers and engineers to integrate data systems with model development needs
Excellent communication and collaboration skills, with the ability to work effectively in a remote team environment

Nice to Have

Experience with cloud-based data platforms, such as AWS, GCP, or Azure
Familiarity with machine learning frameworks, such as TensorFlow, PyTorch, or similar
Experience with data visualization tools, such as Tableau, Power BI, or similar
Certification in data engineering, data science, or a related field

Benefits and Perks

Competitive base salary commensurate with experience
Comprehensive benefits package, including health, dental, and vision insurance
Generous PTO policy, with flexible working hours and remote work options
Opportunity to work on cutting-edge AI projects, with a strong focus on innovation and growth
Collaborative and dynamic team environment, with regular team-building activities and social events

How to Stand Out

To stand out as a candidate, showcase your expertise in building and operating large-scale data systems, with a strong focus on AI workloads and data engineering.
Be prepared to discuss your experience with data pipeline tools and technologies, such as Apache Beam or Apache Spark.
Highlight your understanding of data quality, data governance, and data privacy principles, and how you have implemented these in previous roles.
Emphasize your ability to collaborate effectively with ML researchers and engineers, and your experience in integrating data systems with model development needs.
Be prepared to provide examples of your experience with petabyte-scale data systems, and your ability to design and implement scalable, secure, and user-friendly applications.
Consider including a portfolio of your work, or examples of your projects, to demonstrate your skills and experience.

This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.