AI Performance Optimization Engineer

Bright Vision Technologies · Remote (United States)

Software Development

WFA Digital Insight

Demand for AI performance optimization experts is rising, with job postings up 45% over the past year. As companies like Bright Vision Technologies continue to invest in AI-driven solutions, they need skilled professionals who can maximize throughput and minimize latency. The company's commitment to innovation and employee growth makes this role stand out in the remote job market. Candidates should be prepared to showcase their technical expertise in areas like compiler-level optimization and deep learning, as well as their ability to collaborate with cross-functional teams. Before applying, make sure you understand the company's focus on in-house SOW engagements and direct W2 employment.

Job Description

About the Role

The AI Performance Optimization Engineer role at Bright Vision Technologies is a unique opportunity to work on cutting-edge AI solutions, focusing on optimizing the performance of large neural network systems. As a key member of the team, you will be responsible for extracting maximum throughput, minimizing latency, and reducing costs across training and inference workloads. The role requires a deep understanding of GPU architecture, model parallelism, and compiler-level optimization, as well as the ability to collaborate with cross-functional partners to translate ambiguous requirements into well-engineered solutions.

The ideal candidate will have strong engineering discipline, a clear communication style, and a track record of shipping meaningful work that holds up in production. The role involves working closely with product, design, engineering, operations, and business stakeholders to drive the development of scalable, secure, and user-friendly applications.

Bright Vision Technologies is a forward-thinking software development company dedicated to building innovative solutions that help businesses automate and optimize their operations. The company leverages cutting-edge technologies to create applications that make a real impact on the industry.

What You Will Do

Profile and optimize end-to-end AI training and inference pipelines for throughput, latency, and cost
Identify and eliminate bottlenecks across data loading, model compute, communication, and memory
Implement and tune quantization, sparsity, and pruning strategies to reduce model footprint and accelerate inference
Optimize distributed training using tensor parallelism, pipeline parallelism, FSDP, and ZeRO-style sharding
Tune attention implementations using FlashAttention, paged attention, and related techniques
Implement KV cache optimization, continuous batching, and other techniques to reduce memory usage and improve performance
Collaborate with cross-functional teams to translate ambiguous requirements into well-engineered solutions
Participate in code review, design review, and mentorship of more junior engineers to raise the bar for the team
Develop and maintain technical documentation and a knowledge base for AI performance optimization techniques and tools

What We Are Looking For

6+ years of experience in AI performance optimization, with a focus on deep learning and neural networks
Strong understanding of GPU architecture, model parallelism, and compiler-level optimization
Experience with distributed training and inference, including tensor parallelism and pipeline parallelism
Proficiency in programming languages such as Python, C++, and CUDA
Strong knowledge of data structures, algorithms, and software design patterns
Experience with AI frameworks such as TensorFlow, PyTorch, or MXNet
Strong engineering discipline, with a focus on code quality, testing, and validation
Excellent communication and collaboration skills, with the ability to work with cross-functional teams

Nice to Have

Experience with cloud-based AI platforms such as AWS SageMaker, Google Cloud AI Platform, or Azure Machine Learning
Knowledge of containerization and orchestration using Docker and Kubernetes
Familiarity with Agile development methodologies and version control systems such as Git
Experience with data visualization tools such as Tableau, Power BI, or D3.js

Benefits and Perks

Competitive base salary commensurate with experience
Comprehensive benefits package, including health, dental, and vision insurance
401(k) retirement plan with company match
Generous PTO policy, with flexible working hours and remote work options
Access to cutting-edge technologies and tools, with opportunities for professional growth and development
Collaborative and dynamic work environment, with a team of experienced professionals
Recognition and reward programs, with bonuses and incentives for outstanding performance

How to Stand Out

To stand out in this role, be prepared to showcase your technical expertise in areas like compiler-level optimization and deep learning, as well as your ability to collaborate with cross-functional teams.
When applying, make sure to highlight your experience with AI frameworks such as TensorFlow, PyTorch, or MXNet, and your understanding of GPU architecture and model parallelism.
In the interview process, be prepared to walk through your approach to optimizing AI training and inference pipelines, and to discuss your experience with distributed training and inference.
As you negotiate your salary, be sure to consider the company's compensation package and benefits, as well as the current market rate for AI performance optimization engineers.
When evaluating the company culture, pay attention to the team's dynamics and values, as well as the company's commitment to innovation and employee growth.

This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.