AI Data Infrastructure Engineer

Bright Vision Technologies · Remote (United States)
Software Development
Excel

WFA Digital Insight

Demand for skilled AI Data Infrastructure Engineers has risen sharply, with a 25% increase in job postings over the past year. The role requires a deep understanding of both data engineering and AI workloads. With the rise of big data and machine learning, companies like Bright Vision Technologies are looking for experts who can design and operate large-scale data pipelines. A strong background in software engineering and data infrastructure is essential, as is experience with petabyte-scale data systems. Before applying, candidates should be prepared to showcase their technical expertise and hands-on experience.

Job Description

About the Role

The AI Data Infrastructure Engineer role at Bright Vision Technologies is a critical position that requires both data engineering expertise and an understanding of AI workloads. As a key member of the team, you will be responsible for designing and operating large-scale data pipelines that support AI training, evaluation, and continual improvement workflows. You will work closely with ML researchers and engineers to align data systems with model development needs, driving the company's mission to transform business processes through technology.

The ideal candidate has experience operating petabyte-scale data systems, strong software engineering fundamentals, and a clear understanding of how data infrastructure choices propagate into model quality and training efficiency. You will be part of a dynamic team that is dedicated to building innovative solutions that help businesses automate and optimize their operations.

What You Will Do

  • Design and operate large-scale data pipelines supporting AI training, evaluation, and continual improvement workflows
  • Build ingestion systems for diverse modalities, including text, image, audio, video, and structured signals
  • Implement data cleaning, deduplication, filtering, and quality assurance at petabyte scale
  • Develop dataset versioning, lineage, and provenance tracking systems suitable for reproducible training
  • Build high-throughput data loading systems that maximize GPU utilization during training
  • Implement labeling workflows, active learning pipelines, and human-in-the-loop data improvement systems
  • Design storage architectures balancing cost, throughput, and latency across data tiers
  • Build evaluation dataset construction pipelines with strict integrity and contamination controls
  • Implement data privacy, redaction, and consent enforcement throughout the pipeline
  • Collaborate with ML researchers and engineers to align data systems with model development needs
  • Drive observability of data quality, drift, and pipeline health across teams

What We Are Looking For

  • 6+ years of experience in data engineering, with a strong focus on AI workloads and large-scale data systems
  • Strong software engineering fundamentals, including proficiency in languages such as Python, Java, or C++
  • Experience with petabyte-scale data systems, including data ingestion, processing, and storage
  • Clear understanding of how data infrastructure choices propagate into model quality and training efficiency
  • Experience with data pipeline tools, such as Apache Beam, Apache Spark, or AWS Glue
  • Strong understanding of data quality, data governance, and data privacy principles
  • Excellent collaboration and communication skills, with the ability to work with cross-functional teams
  • Experience with Agile development methodologies and version control systems, such as Git

Nice to Have

  • Experience with machine learning frameworks, such as TensorFlow or PyTorch
  • Knowledge of cloud-based data platforms, such as AWS, GCP, or Azure
  • Experience with data visualization tools, such as Tableau or Power BI
  • Certification in data engineering or a related field
  • Experience with containers and orchestration tools, such as Docker or Kubernetes

Benefits and Perks

  • Competitive base salary commensurate with experience
  • Comprehensive benefits package, including health, dental, and vision insurance
  • 401(k) retirement plan with company match
  • Generous PTO policy, including vacation, sick leave, and holidays
  • Remote work stipend, including equipment and software reimbursement
  • Opportunities for professional growth and development, including training and education reimbursement
  • Collaborative and dynamic work environment, with a team of experienced professionals

How to Stand Out

  • Be prepared to showcase your technical expertise and hands-on experience with large-scale data systems and AI workloads.
  • Highlight your understanding of data infrastructure choices and how they impact model quality and training efficiency.
  • Emphasize your experience with data pipeline tools, such as Apache Beam or Apache Spark, and your ability to design and operate large-scale data pipelines.
  • Demonstrate your ability to collaborate with cross-functional teams, including ML researchers and engineers.
  • Be prepared to discuss your approach to data quality, data governance, and data privacy principles.
  • Showcase your experience with Agile development methodologies and version control systems, such as Git.
  • Highlight any certifications or training you have received in data engineering or a related field.

This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.