AI Performance Optimization Engineer
WFA Digital Insight
Demand for AI performance optimization experts is rising, with 45% growth in job postings over the past year. As companies like Bright Vision Technologies continue to invest in AI-driven solutions, they need skilled professionals who can maximize throughput and minimize latency. The company's commitment to innovation and employee growth makes this role stand out in the remote job market. Candidates should be prepared to showcase their technical expertise in areas like compiler-level optimization and deep learning, as well as their ability to collaborate with cross-functional teams. Before applying, it's essential to understand the company's focus on in-house SOW engagements and direct W2 employment.
Job Description
About the Role
The AI Performance Optimization Engineer role at Bright Vision Technologies is a unique opportunity to work on cutting-edge AI solutions, focusing on optimizing the performance of large neural network systems. As a key member of the team, you will be responsible for extracting maximum throughput, minimizing latency, and reducing costs across training and inference workloads. The role requires a deep understanding of GPU architecture, model parallelism, and compiler-level optimization, as well as the ability to collaborate with cross-functional partners to translate ambiguous requirements into well-engineered solutions.
The ideal candidate will have a strong engineering discipline, a clear communication style, and a track record of shipping meaningful work that holds up well in production. The role will involve working closely with product, design, engineering, operations, and business stakeholders to drive the development of scalable, secure, and user-friendly applications.
Bright Vision Technologies is a forward-thinking software development company dedicated to building innovative solutions that help businesses automate and optimize their operations. The company leverages cutting-edge technologies to create applications that make a real impact on the industry.
What You Will Do
- Profile and optimize end-to-end AI training and inference pipelines for throughput, latency, and cost
- Identify and eliminate bottlenecks across data loading, model compute, communication, and memory
- Implement and tune quantization, sparsity, and pruning strategies to reduce model footprint and accelerate inference
- Optimize distributed training using tensor parallelism, pipeline parallelism, FSDP, and ZeRO-style sharding
- Tune attention implementations using FlashAttention, paged attention, and related techniques
- Implement KV cache optimization, continuous batching, and other techniques to reduce memory usage and improve performance
- Collaborate with cross-functional teams to translate ambiguous requirements into well-engineered solutions
- Participate in code review, design review, and mentorship of more junior engineers to raise the bar for the team
- Develop and maintain technical documentation and knowledge base for AI performance optimization techniques and tools
What We Are Looking For
- 6+ years of experience in AI performance optimization, with a focus on deep learning and neural networks
- Strong understanding of GPU architecture, model parallelism, and compiler-level optimization
- Experience with distributed training and inference, including tensor parallelism and pipeline parallelism
- Proficiency in programming languages such as Python, C++, and CUDA
- Strong knowledge of data structures, algorithms, and software design patterns
- Experience with AI frameworks such as TensorFlow, PyTorch, or MXNet
- Strong engineering discipline, with a focus on code quality, testing, and validation
- Excellent communication and collaboration skills, with the ability to work with cross-functional teams
Nice to Have
- Experience with cloud-based AI platforms such as AWS SageMaker, Google Cloud AI Platform, or Azure Machine Learning
- Knowledge of containerization and orchestration using Docker and Kubernetes
- Familiarity with Agile development methodologies and version control systems such as Git
- Experience with data visualization tools such as Tableau, Power BI, or D3.js
Benefits and Perks
- Competitive base salary commensurate with experience
- Comprehensive benefits package, including health, dental, and vision insurance
- 401(k) retirement plan with company match
- Generous PTO policy, with flexible working hours and remote work options
- Access to cutting-edge technologies and tools, with opportunities for professional growth and development
- Collaborative and dynamic work environment, with a team of experienced professionals
- Recognition and reward programs, with bonuses and incentives for outstanding performance
How to Stand Out
- To stand out in this role, be prepared to showcase your technical expertise in areas like compiler-level optimization and deep learning, as well as your ability to collaborate with cross-functional teams.
- When applying, make sure to highlight your experience with AI frameworks such as TensorFlow, PyTorch, or MXNet, and your understanding of GPU architecture and model parallelism.
- In the interview process, be prepared to walk through your approach to optimizing AI training and inference pipelines, and to discuss your experience with distributed training and inference.
- As you negotiate your salary, be sure to consider the company's compensation package and benefits, as well as the current market rate for AI performance optimization engineers.
- When evaluating the company culture, pay attention to the team's dynamics and values, as well as the company's commitment to innovation and employee growth.
This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.