AI Data Engineer

Bright Vision Technologies · Remote (United States)
Software Development

WFA Digital Insight

As demand for AI solutions grows, the need for skilled data engineers who can design and operate complex data pipelines is skyrocketing. With the AI market expected to reach 90 billion by 2025, professionals with expertise in data engineering and AI workloads are in high demand. Bright Vision Technologies, a pioneer in innovative software development, is seeking an AI Data Engineer to join its team. With a strong focus on scalability, security, and user experience, this role stands out in the current remote job market. Candidates should be prepared to showcase their technical skills and experience in building petabyte-scale data systems.

Job Description

About the Role

The AI Data Engineer role at Bright Vision Technologies is a unique opportunity to build and operate large-scale data systems that power modern AI training and evaluation pipelines. As a key member of the team, you will be responsible for designing and implementing data pipelines that support AI training, evaluation, and continual improvement workflows. Your expertise in data engineering and AI workloads will be crucial in driving the development of scalable, secure, and user-friendly applications.

The ideal candidate will have a deep understanding of how data infrastructure choices impact model quality and training efficiency. You will be working closely with ML researchers and engineers to align data systems with model development needs, ensuring seamless integration and optimal performance.

What You Will Do

  • Design and operate large-scale data pipelines supporting AI training, evaluation, and continual improvement workflows
  • Build ingestion systems for diverse modalities, including text, image, audio, video, and structured signals
  • Implement data cleaning, deduplication, filtering, and quality assurance at petabyte scale
  • Develop dataset versioning, lineage, and provenance tracking systems suitable for reproducible training
  • Build high-throughput data loading systems that maximize GPU utilization during training
  • Implement labeling workflows, active learning pipelines, and human-in-the-loop data improvement systems
  • Design storage architectures balancing cost, throughput, and latency across data tiers
  • Build evaluation dataset construction pipelines with strict integrity and contamination controls
  • Implement data privacy, redaction, and consent enforcement throughout the pipeline
  • Collaborate with ML researchers and engineers to align data systems with model development needs
  • Drive observability of data quality, drift, and pipeline health across the AI data ecosystem

What We Are Looking For

  • 6+ years of experience in data engineering, with a strong focus on AI workloads and large-scale data systems
  • Expertise in building and operating petabyte-scale data systems, with a deep understanding of data infrastructure and its impact on model quality
  • Strong software engineering fundamentals, with experience in designing and implementing scalable, secure, and user-friendly applications
  • Experience with data pipeline tools and technologies, such as Apache Beam, Apache Spark, or similar
  • Strong understanding of data quality, data governance, and data privacy principles
  • Experience working with ML researchers and engineers to integrate data systems with model development needs
  • Excellent communication and collaboration skills, with the ability to work effectively in a remote team environment

Nice to Have

  • Experience with cloud-based data platforms, such as AWS, GCP, or Azure
  • Familiarity with machine learning frameworks, such as TensorFlow, PyTorch, or similar
  • Experience with data visualization tools, such as Tableau, Power BI, or similar
  • Certification in data engineering, data science, or a related field

Benefits and Perks

  • Competitive base salary commensurate with experience
  • Comprehensive benefits package, including health, dental, and vision insurance
  • Generous PTO policy, with flexible working hours and remote work options
  • Opportunity to work on cutting-edge AI projects, with a strong focus on innovation and growth
  • Collaborative and dynamic team environment, with regular team-building activities and social events

How to Stand Out

  • Showcase your expertise in building and operating large-scale data systems for AI workloads.
  • Be prepared to discuss your experience with data pipeline tools and technologies, such as Apache Beam or Apache Spark.
  • Highlight your understanding of data quality, data governance, and data privacy principles, and how you have implemented these in previous roles.
  • Emphasize your ability to collaborate effectively with ML researchers and engineers, and your experience in integrating data systems with model development needs.
  • Be prepared to provide examples of your experience with petabyte-scale data systems, and your ability to design and implement scalable, secure, and user-friendly applications.
  • Consider including a portfolio of your work, or examples of your projects, to demonstrate your skills and experience.

This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.