Engineering Tech Lead (AI Infrastructure)

vCluster Labs · Remote (Germany, United States)
Software Development
Salesforce, Adjust

WFA Digital Insight

As demand for AI infrastructure specialists surges, with job postings up 25% over the past year, roles like this Engineering Tech Lead position at vCluster Labs are becoming increasingly crucial. With the global AI market projected to exceed $90 billion by 2027, companies are scrambling to build robust AI infrastructure. vCluster Labs, which pioneers Kubernetes virtualization for the AI era, stands out for its innovative approach and remote-first work culture. Candidates with hands-on experience in GPU infrastructure, particularly NVIDIA HGX systems, are in high demand. Before applying, make sure you understand the complexities of AI infrastructure and the importance of collaboration in a remote setup.

Job Description

About the Role

The Engineering Tech Lead (AI Infrastructure) at vCluster Labs is a pivotal role that involves building the foundation for the company's AI infrastructure product story. This position requires a unique blend of technical expertise, particularly in NVIDIA HGX systems, and leadership skills to guide the team in developing and deploying AI infrastructure. The role is deeply hands-on, involving the setup of HGX-based GPU labs, advising on equipment procurement, and commissioning systems.

Day-to-day, the Engineering Tech Lead will work closely with the engineering and product teams to inform design decisions, ensuring that the software meets real production requirements. This role is not just about technical proficiency but also about knowledge sharing, as the Engineering Tech Lead will be responsible for documenting reference architectures and teaching others about GPU infrastructure.

vCluster Labs is in a hyper-growth phase, so the team context for this role is dynamic. The company has raised over $30 million from top-tier VCs and is looking for motivated professionals to join its team. Its headquarters are in San Francisco, but the team is distributed globally and works remote-first.

What You Will Do

  • Own the end-to-end setup of vCluster Labs' HGX-based test and lab environment, including equipment procurement and commissioning of NVIDIA HGX systems.
  • Advise on data center colocation, specify cabling and switch configurations, and physically commission HGX systems.
  • Develop and document reference architectures for AI Cloud operators and enterprise AI factories to deploy their own infrastructure alongside the vCluster Platform.
  • Make critical decisions on network topology, switch selection, and hardware configuration for the HGX infrastructure.
  • Work closely with the engineering and product teams to inform design decisions, ensuring the software meets real production requirements.
  • Teach and share knowledge with the team, including documentation, hands-on sessions, and day-to-day collaboration.
  • Build and maintain relationships with key stakeholders, including NVIDIA and the broader AI infrastructure community.
  • Stay updated on the latest developments in AI infrastructure and contribute to the company's thought leadership in this area.
  • Collaborate with the sales and marketing teams to build and present reference architectures directly to enterprise customers.
  • Participate in the development of the company's product roadmap, particularly in relation to AI infrastructure.

What We Are Looking For

  • Experience in physically provisioning NVIDIA HGX systems, including racking, cabling, and commissioning.
  • Hands-on data center expertise, including specifying cabling, connecting servers to top-of-rack switches, and selecting colocation facilities.
  • Networking depth, particularly in InfiniBand, high-speed Ethernet, and switch configuration.
  • Background in AI Cloud, HPC, or enterprise AI factories, with experience owning infrastructure at the hardware level.
  • Ability to document and explain technical reasoning and a passion for teaching others.
  • Familiarity with Adjust and Salesforce, particularly in how they integrate with AI infrastructure.
  • Software depth, including experience with Kubernetes, GPU scheduling (MIG, time-slicing), and the software stack that runs on top of HGX systems.
  • Customer-facing experience, particularly in building or presenting reference architectures directly to enterprise customers.

Nice to Have

  • Existing connections at NVIDIA or within the AI infrastructure community.
  • Experience with NVIDIA ecosystem relationships and how to leverage them for business growth.
  • Knowledge of emerging trends in AI infrastructure and how they might impact the company's product roadmap.

Benefits and Perks

  • Competitive salary and equity package, reflective of the company's stage and growth potential.
  • Comprehensive health insurance, including dental and vision, for employees and their families.
  • Generous PTO policy, including vacation days, sick leave, and holidays, to ensure a healthy work-life balance.
  • Remote stipend to support home office setup and productivity.
  • Opportunities for professional growth and development, including training and conference attendance.
  • Access to the latest technology and tools, including NVIDIA HGX systems and cutting-edge software.
  • Collaborative and dynamic work environment with a team of motivated professionals.
  • Flexible working hours and asynchronous communication to accommodate different time zones and work styles.

How to Stand Out

  • Highlight hands-on experience: When applying, make sure to emphasize any direct experience with NVIDIA HGX systems and AI infrastructure deployment.
  • Showcase knowledge-sharing skills: Since teaching and sharing knowledge are crucial aspects of this role, prepare examples of how you've documented complex technical information or taught others about AI infrastructure.
  • Research vCluster Labs: Understand the company's mission, products, and values to demonstrate your interest and how you can contribute to its growth.
  • Prepare for deep technical questions: Be ready to dive into the specifics of GPU infrastructure, networking, and AI cloud operations during the interview process.
  • Emphasize collaboration and communication skills: Given the remote nature of the job, highlight your ability to work effectively in distributed teams and communicate complex ideas simply.
  • Review NVIDIA's ecosystem and partnerships: Familiarize yourself with NVIDIA's role in the AI infrastructure space and how vCluster Labs' products fit into this ecosystem.
  • Be ready to discuss future trends: Show that you're not just knowledgeable about current AI infrastructure but also aware of emerging trends and technologies that could impact the field.

This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.