AI Performance Engineer

Bright Vision Technologies · Remote (United States)

Software Development

Excel

WFA Digital Insight

As demand for AI solutions continues to rise, companies are looking for experts who can optimize and improve the performance of their AI systems. With the AI market projected to grow to 90 billion by 2025, professionals with skills in AI performance engineering are in high demand. Bright Vision Technologies, a forward-thinking software development company, is no exception. With a focus on innovation and scalability, they're looking for an AI Performance Engineer to join their team. Candidates should be prepared to showcase their expertise in GPU architecture, model parallelism, and compiler-level optimization, as well as their ability to work collaboratively with cross-functional teams.

Job Description

About the Role

The AI Performance Engineer role at Bright Vision Technologies is a key position that requires a deep understanding of AI systems and their optimization. The successful candidate will be responsible for extracting maximum throughput, minimizing latency, and reducing cost across training and inference workloads for large neural network systems. This role spans the full stack, from low-level kernel optimization to distributed system tuning, and requires a strong understanding of GPU architecture, model parallelism, memory management, and compiler-level optimization.

As an AI Performance Engineer, you will work closely with cross-functional partners, including product, design, engineering, operations, and business stakeholders, to translate ambiguous requirements into well-engineered solutions. You will be expected to raise the bar through code review, design review, and mentorship of more junior engineers. The ideal candidate will have strong engineering discipline, a clear communication style, and a track record of shipping meaningful work that holds up in production.

Bright Vision Technologies is a forward-thinking software development company dedicated to building innovative solutions that help businesses automate and optimize their operations. As a member of their team, you will be part of a dynamic and collaborative environment that values innovation, scalability, and user-friendly design.

What You Will Do

Profile and optimize end-to-end AI training and inference pipelines for throughput, latency, and cost
Identify and eliminate bottlenecks across data loading, model compute, communication, and memory
Implement and tune quantization, sparsity, and pruning strategies to reduce model footprint and accelerate inference
Optimize distributed training using tensor parallelism, pipeline parallelism, FSDP, and ZeRO-style sharding
Tune attention implementations using FlashAttention, paged attention, and related techniques
Implement KV cache optimization, continuous batching, and other inference-serving optimizations
Collaborate with cross-functional teams to translate ambiguous requirements into well-engineered solutions
Participate in code review, design review, and mentorship of more junior engineers
Develop and maintain technical documentation and guides for AI performance optimization

What We Are Looking For

6+ years of experience in AI performance engineering or a related field
Strong understanding of GPU architecture, model parallelism, memory management, and compiler-level optimization
Expertise in AI frameworks, including TensorFlow, PyTorch, or similar
Experience with distributed training and inference on large neural network systems
Strong programming skills in languages such as Python, C++, or Java
Experience with containerization using Docker and Kubernetes
Strong understanding of cloud computing platforms, including AWS, GCP, or Azure
Excellent communication and collaboration skills

Nice to Have

Experience with automated testing and validation frameworks
Familiarity with agile development methodologies and version control systems such as Git
Knowledge of data loading and processing pipelines, including data ingestion and preprocessing
Experience with model serving and deployment on cloud platforms

Benefits and Perks

Competitive base salary commensurate with experience
Comprehensive benefits package, including health, dental, and vision insurance
401(k) or other retirement plan with company match
Flexible PTO and vacation policy
Remote work stipend and support for home office setup
Opportunities for professional growth and development, including training and education programs
Access to cutting-edge technologies and tools, including AI and machine learning frameworks
Collaborative and dynamic work environment with a team of experienced professionals

How to Stand Out

Develop a strong portfolio that showcases your expertise in AI performance optimization, including examples of optimized pipelines and frameworks.
Highlight your experience with distributed training and inference on large neural network systems, and be prepared to discuss optimization strategies and techniques.
Familiarize yourself with Bright Vision Technologies' products and services, and be prepared to discuss how your skills and experience align with their mission and goals.
Be prepared to participate in a technical coding assessment, and make sure your programming skills are up to date.
Research the company culture and values, and be prepared to discuss how you would contribute to a collaborative and dynamic work environment.
Prepare to discuss your experience with cloud computing platforms, containerization, and automated testing and validation frameworks.
Be ready to provide specific examples of how you have optimized AI systems in the past, and what strategies you would use to improve performance in this role.

This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.