Infrastructure Engineer

Orcrist Technologies · Remote (Germany)

Software Development

WFA Digital Insight

Demand for digital infrastructure specialists is growing, with a 25% increase in 2025, and professionals with expertise in bare-metal engineering and Kubernetes are especially sought after. As remote work spreads, companies like Orcrist Technologies are looking for skilled engineers to manage their infrastructure. Orcrist's innovative approach to data intelligence platforms and its commitment to remote work make this role stand out. Before applying, candidates should note the required skills, which include experience with the NVIDIA GPU stack and bare-metal Linux. For engineers with the right background, the role offers a direct impact on critical missions.

Job Description

About the Role

The Infrastructure Engineer at Orcrist Technologies designs, builds, and operates bare-metal GPU server fleets across on-prem and air-gapped environments. The role is central to the company's mission of providing a next-generation data intelligence platform built on cutting-edge technologies, and the successful candidate will own the infrastructure that powers every deployment, from the metal up.

As an Infrastructure Engineer, you will work closely with the SRE and ML teams to deliver fast, reliable on-prem inference, partnering with ML and MLOps on model deployment, GPU scheduling, and performance tuning. Your work will have a direct impact on critical missions for private- and public-sector customers.

The role is remote-first within Germany, with occasional team events in Berlin, so you can work remotely while remaining part of a collaborative team.

What You Will Do

Design, size, provision, and operate bare-metal GPU server fleets across on-prem and air-gapped environments
Own the NVIDIA GPU stack end to end, including drivers, CUDA, GPU Operator, Container Toolkit, MIG, and DCGM
Build the bare-metal substrate Kubernetes runs on, including node lifecycle, container runtime, GPU device plugins, node feature discovery, and kernel/NUMA tuning
Engineer data-center networking and resilient storage, including VLANs/switching, RDMA, and Ceph/ZFS/NVMe storage
Partner with the ML and MLOps teams on on-prem inference serving, including model deployment, GPU scheduling, and performance tuning
Plan and run on-site build-outs, including rack integration, power/UPS and cooling sizing, commissioning, capacity planning, runbooks, and operator handover
Operate in air-gapped or on-prem environments and travel to customer sites for builds and deployments
Document your work thoroughly, and stay methodical and calm during hardware incidents
Collaborate with the SRE and ML teams to deliver fast, reliable on-prem inference

What We Are Looking For

5+ years of experience in bare-metal, HPC/GPU, data-center, or systems infrastructure engineering
Hands-on ownership of physical and compute infrastructure, including firmware, BMC, PXE, kernel, and storage tuning
Strong bare-metal Linux skills, including RHEL/Rocky/Ubuntu, and solid networking and storage fundamentals
Real experience with the NVIDIA GPU stack, including drivers, CUDA, GPU Operator, MIG, and DCGM
Experience serving models on GPUs in production, including model deployment, GPU scheduling, and performance tuning
Comfortable operating in air-gapped or on-prem environments and traveling to customer sites for builds and deployments
Eligible to work in Germany

Nice to Have

German language skills (B1+)
NVIDIA DGX/HGX or Slurm experience
InfiniBand/RDMA fabrics experience
Inference optimization experience, including TensorRT-LLM, vLLM, and quantization
Certifications such as NVIDIA NCP-AIO, Red Hat RHCSA/RHCE, or CKA/CKS
Field-engineering experience and familiarity with secure or regulated deployment environments

Benefits and Perks

Modern architecture and stack
Remote-first approach in Germany with occasional team events in Berlin
Home office budget and great equipment
30 days of vacation
Direct impact on critical missions for private- and public-sector customers
Opportunity to work on a next-generation data intelligence platform using cutting-edge technologies
Collaborative team environment with a focus on remote work
Professional development opportunities, including training and certifications
Access to the latest technologies and tools, including NVIDIA GPU stacks and Kubernetes

How to Stand Out

Tip: Make sure you have hands-on experience with the NVIDIA GPU stack, including drivers, CUDA, GPU Operator, MIG, and DCGM, as this is a crucial part of the role.
Tip: Highlight your experience with bare-metal Linux, including RHEL/Rocky/Ubuntu, and your understanding of networking and storage fundamentals.
Tip: Be prepared to discuss your experience with on-prem inference serving, including model deployment, GPU scheduling, and performance tuning.
Tip: Emphasize your ability to work in air-gapped or on-prem environments and your willingness to travel to customer sites for builds and deployments.
Tip: Show your understanding of the company's technology stack and your enthusiasm for working on a next-generation data intelligence platform.
Tip: Be prepared to give examples of your documentation practices and of how you stayed methodical and calm during hardware incidents.
Tip: Research the company culture and values, and be prepared to discuss how you can contribute to the team's collaborative environment.

This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.