AI Data Infrastructure Engineer

Bright Vision Technologies · Remote (United States)

Software Development

Excel

WFA Digital Insight

Demand for skilled AI Data Infrastructure Engineers has risen sharply, with a 25% increase in job postings over the past year. The role requires a deep understanding of both data engineering and AI workloads. With the rise of big data and machine learning, companies like Bright Vision Technologies are looking for experts who can design and operate large-scale data pipelines. A strong background in software engineering and data infrastructure is essential, as is experience with petabyte-scale data systems. Before applying, candidates should be prepared to showcase their technical expertise and hands-on experience.

Job Description

About the Role

The AI Data Infrastructure Engineer role at Bright Vision Technologies is a critical position that requires a unique blend of data engineering expertise and AI workload understanding. As a key member of the team, you will be responsible for designing and operating large-scale data pipelines that support AI training, evaluation, and continual improvement workflows. You will work closely with ML researchers and engineers to align data systems with model development needs, driving the company's mission to transform business processes through technology.

The ideal candidate has experience operating petabyte-scale data systems, strong software engineering fundamentals, and a clear understanding of how data infrastructure choices propagate into model quality and training efficiency. You will be part of a dynamic team that is dedicated to building innovative solutions that help businesses automate and optimize their operations.

What You Will Do

Design and operate large-scale data pipelines supporting AI training, evaluation, and continual improvement workflows
Build ingestion systems for diverse modalities, including text, image, audio, video, and structured signals
Implement data cleaning, deduplication, filtering, and quality assurance at petabyte scale
Develop dataset versioning, lineage, and provenance tracking systems suitable for reproducible training
Build high-throughput data loading systems that maximize GPU utilization during training
Implement labeling workflows, active learning pipelines, and human-in-the-loop data improvement systems
Design storage architectures balancing cost, throughput, and latency across data tiers
Build evaluation dataset construction pipelines with strict integrity and contamination controls
Implement data privacy, redaction, and consent enforcement throughout the pipeline
Collaborate with ML researchers and engineers to align data systems with model development needs
Drive observability of data quality, drift, and pipeline health across teams

What We Are Looking For

6+ years of experience in data engineering, with a strong focus on AI workloads and large-scale data systems
Strong software engineering fundamentals, including proficiency in languages such as Python, Java, or C++
Experience with petabyte-scale data systems, including data ingestion, processing, and storage
Clear understanding of how data infrastructure choices propagate into model quality and training efficiency
Experience with data pipeline tools, such as Apache Beam, Apache Spark, or AWS Glue
Strong understanding of data quality, data governance, and data privacy principles
Excellent collaboration and communication skills, with the ability to work with cross-functional teams
Experience with Agile development methodologies and version control systems, such as Git

Nice to Have

Experience with machine learning frameworks, such as TensorFlow or PyTorch
Knowledge of cloud-based data platforms, such as AWS, GCP, or Azure
Experience with data visualization tools, such as Tableau or Power BI
Certification in data engineering or a related field
Experience with containers and orchestration tools, such as Docker or Kubernetes

Benefits and Perks

Competitive base salary commensurate with experience
Comprehensive benefits package, including health, dental, and vision insurance
401(k) retirement plan with company match
Generous PTO policy, including vacation, sick leave, and holidays
Remote work stipend, including equipment and software reimbursement
Opportunities for professional growth and development, including training and education reimbursement
Collaborative and dynamic work environment, with a team of experienced professionals

How to Stand Out

Be prepared to showcase your technical expertise and hands-on experience with large-scale data systems and AI workloads.
Highlight your understanding of how data infrastructure choices affect model quality and training efficiency.
Emphasize your experience with data pipeline tools, such as Apache Beam or Apache Spark, and your ability to design and operate large-scale data pipelines.
Demonstrate your ability to collaborate with cross-functional teams, including ML researchers and engineers.
Be prepared to discuss your approach to data quality, data governance, and data privacy.
Showcase your experience with Agile development methodologies and version control systems, such as Git.
Highlight any certifications or training you have received in data engineering or a related field.

This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.