Infrastructure Engineer

Orcrist Technologies · Remote (Germany)
Software Development

WFA Digital Insight

Demand for digital infrastructure specialists is growing, with a 25% increase in 2025, and engineers with bare-metal and Kubernetes expertise are especially sought after. With the rise of remote work, companies like Orcrist Technologies are looking for skilled engineers to manage their infrastructure. Orcrist's approach to data intelligence platforms and its commitment to remote work make this role stand out. Before applying, candidates should check the required skills, which include experience with NVIDIA GPU stacks and bare-metal Linux. For engineers with those skills, the role offers a direct impact on critical missions.

Job Description

About the Role

The Infrastructure Engineer role at Orcrist Technologies is a critical position that involves designing, building, and operating bare-metal GPU server fleets across on-prem and air-gapped environments. This role is essential to the company's mission to provide a next-generation data intelligence platform using cutting-edge technologies. The successful candidate will be responsible for managing the infrastructure that powers every deployment, from the metal up.

As an Infrastructure Engineer, you will work closely with the SRE, ML, and MLOps teams to deliver fast, reliable on-prem inference serving, covering model deployment, GPU scheduling, and performance tuning. Your work will have a direct impact on critical missions across private and public-sector customers.

The role is based in Germany, and the company offers a remote-first approach with occasional team events in Berlin. This provides a great opportunity for those looking to work remotely while still being part of a collaborative team.

What You Will Do

  • Design, size, provision, and operate bare-metal GPU server fleets across on-prem and air-gapped environments
  • Own the NVIDIA GPU stack end to end, including drivers, CUDA, GPU Operator, Container Toolkit, MIG, and DCGM
  • Build the bare-metal substrate Kubernetes runs on, including node lifecycle, container runtime, GPU device plugins, node feature discovery, and kernel/NUMA tuning
  • Engineer data-center networking and resilient storage, including VLANs/switching, RDMA, and Ceph/ZFS/NVMe
  • Partner with ML and MLOps on on-prem inference serving, including model deployment, GPU scheduling, and performance tuning
  • Plan and run on-site build-outs, including rack integration, power/UPS and cooling sizing, commissioning, capacity planning, runbooks, and operator handover
  • Operate in air-gapped or on-prem environments and travel to customer sites for builds and deployments
  • Document your work, and stay methodical and calm during hardware incidents
  • Collaborate with the SRE and ML teams to deliver fast, reliable on-prem inference

What We Are Looking For

  • 5+ years of experience in bare-metal, HPC/GPU, data-center, or systems infrastructure engineering
  • Hands-on ownership of physical and compute infrastructure, including firmware, BMC, PXE, kernel, and storage tuning
  • Strong bare-metal Linux skills, including RHEL/Rocky/Ubuntu, and solid networking and storage fundamentals
  • Real experience with the NVIDIA GPU stack, including drivers, CUDA, GPU Operator, MIG, and DCGM
  • Experience serving GPU models in production, including model deployment, GPU scheduling, and performance tuning
  • Comfortable operating in air-gapped or on-prem environments and traveling to customer sites for builds and deployments
  • Eligible to work in Germany

Nice to Have

  • German language skills (B1+)
  • NVIDIA DGX/HGX or Slurm experience
  • Experience with InfiniBand/RDMA fabrics
  • Inference optimization experience, including TensorRT-LLM, vLLM, and quantization
  • Certifications such as NVIDIA NCP-AIO, Red Hat RHCSA/RHCE, or CKA/CKS
  • Field-engineering experience and familiarity with secure or regulated deployment environments

Benefits and Perks

  • Modern architecture and stack
  • Remote-first approach in Germany with occasional team events in Berlin
  • Home office budget and great equipment
  • 30 days vacation
  • Direct impact on critical missions across private and public-sector customers
  • Opportunity to work on a next-generation data intelligence platform using cutting-edge technologies
  • Collaborative team environment with a focus on remote work
  • Professional development opportunities, including training and certifications
  • Access to the latest technologies and tools, including NVIDIA GPU stacks and Kubernetes

How to Stand Out

  • Tip: Make sure you have hands-on experience with the NVIDIA GPU stack, including drivers, CUDA, GPU Operator, MIG, and DCGM, as this is a crucial part of the role.
  • Tip: Highlight your experience with bare-metal Linux, including RHEL/Rocky/Ubuntu, and your understanding of networking and storage fundamentals.
  • Tip: Be prepared to discuss your experience with on-prem inference serving, including model deployment, GPU scheduling, and performance tuning.
  • Tip: Emphasize your ability to work in air-gapped or on-prem environments and your willingness to travel to customer sites for builds and deployments.
  • Tip: Show your understanding of the company's technology stack and your enthusiasm for working on a next-generation data intelligence platform.
  • Tip: Be prepared to give examples of how you document your work and how you stay methodical and calm during hardware incidents.
  • Tip: Research the company culture and values, and be prepared to discuss how you can contribute to the team's collaborative environment.

This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.