Principal Infrastructure Engineer

Voxel51 · Remote (United States)
Software Development

WFA Digital Insight

The demand for skilled infrastructure engineers has skyrocketed, with the market expected to grow by 25% in the next two years. As companies like Voxel51 continue to push the boundaries of AI-driven innovation, the need for experts who can design and implement scalable, secure infrastructure has never been more pressing. With the rise of remote work, candidates can now access these opportunities from anywhere in the United States. Before applying, it's essential to understand the unique challenges of serving unstructured data at scale and the importance of community-driven, open-source solutions.

Job Description

About the Role

As a Principal Infrastructure Engineer at Voxel51, you will play a crucial role in shaping the architecture and strategy of the systems that power the company's platform. Your day-to-day responsibilities will involve designing, building, and scaling deployment systems across cloud and on-premises environments, ensuring reliability, security, and repeatability. You will be part of a team that values autonomy, community, and human-first principles, and you will have the opportunity to collaborate with enterprise customers, guiding and troubleshooting their production deployments.

The role is fully remote, with the requirement to attend at least two in-person retreats per year. As a Principal Infrastructure Engineer, you will report to the engineering leadership team and will be responsible for mentoring peers and setting technical direction.

Voxel51's platform, FiftyOne, is a mission-critical linchpin for managing unstructured data, model development, and AI systems at the world's largest companies. The company believes in the power of open source and has already seen significant adoption, with over 4 million downloads to date.

What You Will Do

  • Shape the architecture and evolution of Voxel51's infrastructure to support deployments ranging from individual researchers to Fortune 500 enterprises
  • Design, build, and scale deployment systems across cloud (GCP, AWS, Azure) and on-premises environments
  • Partner with enterprise customers to deliver and support production-grade deployments in their environments
  • Lead infrastructure initiatives across engineering teams, enabling peers to develop, test, and ship features faster
  • Drive best practices in CI/CD, evolving pipelines and introducing new approaches where they add value
  • Develop and maintain deployment solutions for Voxel51-hosted environments (GKE) and customer on-prem installations (K8s or Docker Compose)
  • Champion developer productivity, improving workflows for development and automated cloud deployments
  • Troubleshoot and resolve complex infrastructure issues, spanning build failures, runtime failures, and customer deployment challenges
  • Anticipate and prevent failures by designing monitoring, alerting, and predictive solutions

What We Are Looking For

  • Deep experience with containerized environments, including building, packaging, and debugging container images
  • Experience with Kubernetes (and Docker Compose) for orchestration, including building, maintaining, and deploying Helm charts
  • Infrastructure as Code expertise (Terraform, Ansible, or equivalent)
  • Scripting and automation skills (Bash or similar)
  • Python expertise, including build and environment management, packaging/distribution, release management, and dependency debugging
  • CI/CD systems experience, with a focus on GitHub Actions and Google Cloud Build
  • Strong understanding of cloud and on-premises environments, with experience in GCP, AWS, and Azure

Nice to Have

  • Experience with agile development methodologies and version control systems like Git
  • Familiarity with monitoring and logging tools like Prometheus, Grafana, and ELK Stack
  • Knowledge of security best practices and compliance frameworks like HIPAA and PCI-DSS

Benefits and Perks

  • Competitive salary and equity package
  • Opportunity to work with a cutting-edge, open-source platform
  • Collaborative, remote-first work environment with a strong emphasis on community and autonomy
  • Professional development opportunities, including conference attendance and training programs
  • Access to the latest tools and technologies, including cloud credits and software subscriptions
  • Flexible working hours and unlimited paid time off

How to Stand Out

  • Make sure to highlight your experience with containerized environments and Kubernetes in your resume and cover letter.
  • Be prepared to discuss your approach to CI/CD and infrastructure as code during the interview process.
  • Demonstrate a strong understanding of cloud and on-premises environments, along with hands-on experience in Python and scripting languages.
  • Showcase your ability to troubleshoot complex infrastructure issues and anticipate failures.
  • Don't be afraid to ask about the company culture and values during the interview, as Voxel51 prioritizes a human-first approach.
  • Keep an eye out for red flags, such as unclear expectations or a lack of emphasis on employee well-being.
  • When negotiating salary, be sure to highlight your unique skills and experience, and don't be afraid to ask about benefits and perks.

This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.