AI Performance Engineer

Bright Vision Technologies · Remote (United States)
Software Development
Excel

WFA Digital Insight

As demand for AI solutions continues to rise, companies are looking for experts who can optimize the performance of their AI systems. With the AI market projected to grow to $90 billion by 2025, professionals with skills in AI performance engineering are in high demand. Bright Vision Technologies, a forward-thinking software development company, is among the companies hiring: with a focus on innovation and scalability, it is looking for an AI Performance Engineer to join its team. Candidates should be prepared to showcase their expertise in GPU architecture, model parallelism, and compiler-level optimization, as well as their ability to work collaboratively with cross-functional teams.

Job Description

About the Role

The AI Performance Engineer role at Bright Vision Technologies is a key position that requires a deep understanding of AI systems and their optimization. The successful candidate will be responsible for extracting maximum throughput, minimizing latency, and reducing cost across training and inference workloads for large neural network systems. This role spans the full stack, from low-level kernel optimization to distributed system tuning, and requires a strong understanding of GPU architecture, model parallelism, memory management, and compiler-level optimization.

As an AI Performance Engineer, you will work closely with cross-functional partners, including product, design, engineering, operations, and business stakeholders, to translate ambiguous requirements into well-engineered solutions. You will be expected to raise the bar through code review, design review, and mentorship of more junior engineers. The ideal candidate will have strong engineering discipline, a clear communication style, and a track record of shipping meaningful work that holds up well in production.

Bright Vision Technologies is a forward-thinking software development company dedicated to building innovative solutions that help businesses automate and optimize their operations. As a member of their team, you will be part of a dynamic and collaborative environment that values innovation, scalability, and user-friendly design.

What You Will Do

  • Profile and optimize end-to-end AI training and inference pipelines for throughput, latency, and cost
  • Identify and eliminate bottlenecks across data loading, model compute, communication, and memory
  • Implement and tune quantization, sparsity, and pruning strategies to reduce model footprint and accelerate inference
  • Optimize distributed training using tensor parallelism, pipeline parallelism, FSDP, and ZeRO-style sharding
  • Tune attention implementations using FlashAttention, paged attention, and related techniques
  • Implement KV cache optimizations, continuous batching, and related inference-serving strategies
  • Collaborate with cross-functional teams to translate ambiguous requirements into well-engineered solutions
  • Participate in code review, design review, and mentorship of more junior engineers
  • Develop and maintain technical documentation and guides for AI performance optimization

What We Are Looking For

  • 6+ years of experience in AI performance engineering or a related field
  • Strong understanding of GPU architecture, model parallelism, memory management, and compiler-level optimization
  • Expertise in AI frameworks such as TensorFlow or PyTorch
  • Experience with distributed training and inference on large neural network systems
  • Strong programming skills in languages such as Python, C++, or Java
  • Experience with containerization using Docker and Kubernetes
  • Strong understanding of cloud computing platforms such as AWS, GCP, or Azure
  • Excellent communication and collaboration skills

Nice to Have

  • Experience with automated testing and validation frameworks
  • Familiarity with agile development methodologies and version control systems such as Git
  • Knowledge of data loading and processing pipelines, including data ingestion and preprocessing
  • Experience with model serving and deployment on cloud platforms

Benefits and Perks

  • Competitive base salary commensurate with experience
  • Comprehensive benefits package, including health, dental, and vision insurance
  • 401(k) or other retirement plan with company match
  • Flexible paid time off (PTO) policy
  • Remote work stipend and support for home office setup
  • Opportunities for professional growth and development, including training and education programs
  • Access to cutting-edge technologies and tools, including AI and machine learning frameworks
  • Collaborative and dynamic work environment with a team of experienced professionals

How to Stand Out

  • Develop a strong portfolio that showcases your expertise in AI performance optimization, including examples of optimized pipelines and frameworks.
  • Highlight your experience with distributed training and inference on large neural network systems, and be prepared to discuss optimization strategies and techniques.
  • Familiarize yourself with Bright Vision Technologies' products and services, and be prepared to discuss how your skills and experience align with their mission and goals.
  • Be prepared to participate in a technical coding assessment, and make sure your programming skills are up to date.
  • Research the company culture and values, and be prepared to discuss how you would contribute to a collaborative and dynamic work environment.
  • Prepare to discuss your experience with cloud computing platforms, containerization, and automated testing and validation frameworks.
  • Be ready to provide specific examples of how you have optimized AI systems in the past, and what strategies you would use to improve performance in this role.

This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.