AI Infrastructure & Platform Operations Engineer

name · Remote (Poland)

AI & Machine Learning

Excel

WFA Digital Insight

Demand for skilled AI infrastructure engineers is rising, with a 25% increase in job openings in the past year alone. As companies invest heavily in AI infrastructure, professionals with expertise in NVIDIA GPUs, Kubernetes, and high-performance networking are especially sought after. With the global AI market expected to reach 90 billion by 2025, this role offers an opportunity to be at the forefront of AI innovation. Candidates should be prepared to showcase their technical expertise and experience working in complex infrastructure environments. Before applying, it's essential to understand the company's vision for AI infrastructure and how this role contributes to its growth.

Job Description

About the Role

As an AI Infrastructure & Platform Operations Engineer, you will play a crucial role in the European AI Infrastructure & Platform Operations team. Your primary responsibility will be to monitor, operate, and support large-scale AI infrastructure environments powered by cutting-edge technologies such as NVIDIA GPUs, high-performance networking, and Kubernetes. The team is responsible for ensuring the smooth operation of these environments, and your expertise will be instrumental in resolving infrastructure-related incidents and improving overall system efficiency.

The AI Infrastructure & Platform Operations team is at the forefront of AI innovation, working with the latest technologies to drive business growth. As a key member of this team, you will have the opportunity to gain exposure to next-generation AI infrastructure and contribute to shaping the future of AI-powered operations. Your work will have a direct impact on the company's ability to deliver high-quality AI solutions, making this a highly rewarding role for those passionate about AI and infrastructure.

What You Will Do

Monitor and operate production AI infrastructure platforms to ensure high availability and performance
Investigate and resolve infrastructure, networking, hardware, and platform-related incidents
Collaborate with cross-functional teams to implement new technologies and improve existing infrastructure
Develop and maintain operational documentation and runbooks for AI infrastructure environments
Work in a shift-based operational environment that provides 24/7 support for critical systems
Work closely with the development team to ensure seamless integration of new features and technologies
Analyze system performance and provide recommendations for optimization
Develop and implement automation scripts to improve efficiency and reduce manual errors
Stay up to date with the latest advancements in AI infrastructure and platform technologies

What We Are Looking For

At least 3 years of experience in infrastructure operations, platform operations, network operations, site reliability engineering, cloud operations, or related technical roles
Strong Linux administration and troubleshooting skills
Good understanding of networking concepts and experience diagnosing infrastructure-related issues
Working knowledge of Kubernetes in production environments
Experience supporting production infrastructure and services
Strong analytical and problem-solving skills
Experience working within structured operational and incident management processes
Excellent communication and collaboration skills

Nice to Have

Experience with NVIDIA GPU technologies and high-performance computing environments
Knowledge of cloud platforms such as AWS or Azure
Familiarity with containerization technologies like Docker
Experience with automation tools like Ansible or Terraform

Benefits and Perks

Competitive salary and benefits package
Opportunity to work with cutting-edge AI technologies and contribute to the development of next-generation AI infrastructure
Collaborative and dynamic work environment with a team of experienced professionals
Professional development opportunities, including training and conference attendance
Flexible working hours and remote work options
Access to the latest tools and technologies
Recognition and reward for outstanding performance
Comprehensive health insurance and retirement plan

How to Stand Out

Ensure your resume highlights specific experience with Linux administration, Kubernetes, and high-performance networking.
Be prepared to provide examples of complex infrastructure issues you've resolved in the past.
Familiarize yourself with NVIDIA GPU technologies and their applications in AI infrastructure.
Showcase your ability to work collaboratively in a team environment and effectively communicate technical concepts.
Consider creating a personal project or contributing to open-source projects to demonstrate your skills in AI infrastructure and platform operations.
Prepare to discuss your experience with automation tools and scripting languages.
Research the company's approach to AI infrastructure and be ready to discuss how your skills align with their vision.

This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.