MLOps Engineer

Bright Vision Technologies · Remote (United States)

Data & Analytics

WFA Digital Insight

As demand for AI and machine learning specialists grows, roles like MLOps Engineer are becoming increasingly crucial. With a 25% increase in AI adoption in 2025, companies like Bright Vision Technologies are looking for experts to design and operate high-performance inference platforms. To stand out, candidates need strong distributed systems and performance engineering expertise, as well as experience with machine learning serving. Before applying, consider the company's focus on innovation and scalability, and be prepared to showcase your technical abilities.

Job Description

About the Role

The MLOps Engineer role at Bright Vision Technologies is a unique opportunity to work on designing, building, and operating high-performance inference platforms for serving large machine learning models in production. This role focuses on the systems engineering side of AI deployment and requires strong distributed systems and performance engineering expertise. As an MLOps Engineer, you will be part of a dynamic team that is dedicated to building innovative solutions to help businesses automate and optimize their operations.

Day to day, you will work closely with cross-functional teams to design and operate model serving platforms that support diverse workloads, including LLMs, vision models, and recommendation systems. You will optimize inference performance, implement multi-tenant routing, and build autoscaling and capacity management systems that balance latency, throughput, and cost.
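For candidates who want a concrete feel for the latency, throughput, and cost balance mentioned above, here is a minimal capacity-sizing sketch based on Little's law (requests in flight = arrival rate × mean latency). The function name, parameters, and numbers are hypothetical illustrations and are not taken from the employer's platform.

```python
import math


def replicas_needed(arrival_rate_rps: float,
                    mean_latency_s: float,
                    concurrency_per_replica: int,
                    target_utilization: float = 0.7) -> int:
    """Estimate how many serving replicas a workload needs.

    Hypothetical helper: assumes each replica can handle a fixed number
    of concurrent requests and should run below full utilization so that
    latency stays stable under bursts.
    """
    if concurrency_per_replica <= 0:
        raise ValueError("concurrency_per_replica must be positive")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")

    # Little's law: average number of requests in flight at steady state.
    in_flight = arrival_rate_rps * mean_latency_s

    # Usable concurrency per replica after leaving headroom.
    usable = concurrency_per_replica * target_utilization

    # Always keep at least one replica so the endpoint stays available.
    return max(1, math.ceil(in_flight / usable))


# Example with assumed numbers: 120 req/s, 0.5 s mean latency,
# 8 concurrent requests per replica, 75% target utilization.
print(replicas_needed(120, 0.5, 8, 0.75))
```

Lowering `target_utilization` buys latency headroom at the price of more replicas, which is the cost side of the trade-off.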

Bright Vision Technologies is a forward-thinking software development company that leverages cutting-edge technologies to create scalable, secure, and user-friendly applications. As a member of this team, you will have the opportunity to contribute to the company's mission of transforming business processes through technology and work on projects that have a real impact on the industry.

What You Will Do

Design and operate model serving platforms supporting diverse workloads, including LLMs, vision models, and recommendation systems
Optimize inference performance using continuous batching, paged attention, speculative decoding, and request multiplexing
Implement multi-tenant routing, rate limiting, and quality-of-service policies across model endpoints
Build autoscaling and capacity management systems that balance latency, throughput, and cost
Tune GPU utilization, memory management, and KV cache strategies for LLM serving workloads
Integrate model serving with API gateways, identity systems, and observability platforms
Implement caching, prompt deduplication, and response reuse strategies where appropriate
Drive end-to-end observability, including latency histograms, queue dynamics, GPU utilization, and error tracking
Develop deployment workflows, including canary releases, shadow testing, and automated rollback
Operate incident response for high-availability AI services and drive durable reliability improvements
Collaborate with ML and product teams to support new model releases and capability rollouts
Implement security controls, including request signing, content filtering, and abuse detection at the serving layer
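
As one illustration of the multi-tenant rate limiting named in the list above, the sketch below implements a per-tenant token bucket. It is a minimal example under stated assumptions: the class name `TenantRateLimiter`, its parameters, and the injectable clock are hypothetical and do not describe the employer's actual system.

```python
import time
from typing import Callable, Dict, Tuple


class TenantRateLimiter:
    """Per-tenant token bucket (hypothetical helper, not a library API)."""

    def __init__(self, rate_per_s: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate_per_s = rate_per_s  # tokens refilled per second
        self.burst = burst            # maximum tokens a tenant can hold
        self.clock = clock            # injectable so tests can control time
        # tenant -> (tokens remaining, time of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, tenant: str, cost: float = 1.0) -> bool:
        """Return True and spend tokens if the tenant is within its limit."""
        now = self.clock()
        # A tenant seen for the first time starts with a full bucket.
        tokens, last = self._buckets.get(tenant, (self.burst, now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate_per_s)
        allowed = tokens >= cost
        if allowed:
            tokens -= cost
        self._buckets[tenant] = (tokens, now)
        return allowed


# Usage with a fake clock so the behavior is deterministic.
fake_now = [0.0]
limiter = TenantRateLimiter(rate_per_s=1.0, burst=2.0,
                            clock=lambda: fake_now[0])
results = [limiter.allow("tenant-a") for _ in range(3)]
print(results)
```

Passing a larger `cost` for expensive requests (for example, long LLM prompts) is one simple way to approximate a quality-of-service policy with the same structure.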

What We Are Looking For

Bachelor's or Master's degree in Computer Science or a related field
Six or more years of experience in a related field, with a strong focus on distributed systems and performance engineering
Experience with machine learning serving and model deployment
Strong expertise in designing and operating high-performance inference platforms
Experience with cloud-based technologies, such as AWS or Azure
Strong understanding of trade-offs between latency, throughput, cost, and quality in ML serving
Experience with containerization, orchestration, and automation tools
Strong communication and collaboration skills

Nice to Have

Experience with natural language processing and computer vision
Knowledge of DevOps practices and tools
Experience with agile development methodologies
Certification in a related field, such as machine learning or cloud computing

Benefits and Perks

Competitive salary and benefits package
Opportunity to work with a dynamic and innovative company
Collaborative and supportive team environment
Professional development and growth opportunities
Flexible working hours and remote work options
Access to cutting-edge technologies and tools
Recognition and reward for outstanding performance
Comprehensive health insurance and retirement plan

How to Stand Out

Make sure you have a strong understanding of distributed systems and performance engineering, as well as experience with machine learning serving and model deployment.
Be prepared to showcase your technical abilities through a coding assessment or other evaluation methods.
Highlight your experience with cloud-based technologies and containerization, orchestration, and automation tools.
Emphasize your ability to collaborate and communicate effectively with cross-functional teams.
Consider learning more about the company's focus on innovation and scalability, and be prepared to discuss how your skills and experience align with these goals.
Be prepared to discuss your experience implementing security controls such as request signing, content filtering, and abuse detection.
Research the company's products and services, and be prepared to discuss how you can contribute to the company's mission and goals.

This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.