Staff Software Engineer Infrastructure
WFA Digital Insight
As demand for cloud infrastructure specialists grows, Docker stands out for its commitment to innovation and its remote-first culture. With over 20 million monthly users, the company is poised for further expansion. The Staff Software Engineer Infrastructure role is central to that growth and calls for expertise in multi-region network architecture, self-service systems, and AI-assisted operations. Candidates should expect a high bar for building platforms where 'the easy path is also the safe path', and should be prepared to lead technical direction and drive production adoption. The global remote job market has also seen rising demand for skilled engineers, with a reported 25% increase in job postings for cloud infrastructure specialists over the past year.
Job Description
About the Role
Docker is seeking a highly skilled Staff Software Engineer Infrastructure to join its globally distributed, remote-first team. As a leader in developer tooling, Docker is trusted by over 20 million monthly users and is at the center of the shift towards AI-assisted software development. The company is investing heavily in its platform, which supports hundreds of engineers across multiple development teams, with the goal of creating a trustworthy and scalable infrastructure.
The role of the Staff Software Engineer Infrastructure is to drive the technical direction of the platform, focusing on building self-service systems, multi-region network architecture, and AI-assisted operations. The ideal candidate will have expertise in Go, Terraform, Kubernetes, and experience with cloud-based infrastructure. They will be responsible for leading the team in building scalable and secure platforms, with a focus on paved roads, safe defaults, and strong guardrails.
The team at Docker is growing, with plans to expand from four to seven members this year. The company values a culture of innovation, collaboration, and remote-first work, offering a unique opportunity for engineers to work on cutting-edge projects and technologies.
What You Will Do
- Take ambiguous infrastructure problems and turn them into proposals that the organization can rally around, driving them through RFCs and architecture reviews across teams.
- Design self-service capabilities and platform APIs for onboarding, provisioning, deployment, observability defaults, and day-2 operations, with contracts and documentation that teams can use.
- Set delivery standards using Terraform, GitOps with Argo CD, progressive rollout, and good testing, including building the continuous-deployment flow.
- Evolve the multi-tenant EKS foundations towards better reliability, security, scale, and cost, including Envoy Gateway ingress, traffic routing, and multi-region, cross-account connectivity.
- Improve SLOs, alerting, and incident follow-up on Grafana Cloud, making production safer and less dependent on heroics.
- Collaborate with teams to drive adoption and measure success, focusing on outcomes that consuming teams feel, such as provisioning and shipping speed, autonomy, and reliability.
- Help shape the role of AI-assisted and agentic workflows, ensuring they stay safe, auditable, and human-reviewed.
- Drive the technical direction of the platform, building scalable and secure systems around paved roads, safe defaults, and strong guardrails.
What We Are Looking For
- 5+ years of experience in software engineering, with a focus on infrastructure, cloud computing, and scalability.
- Expertise in Go, Terraform, Kubernetes, and experience with cloud-based infrastructure.
- Experience with self-service systems, multi-region network architecture, and AI-assisted operations.
- Strong understanding of security, reliability, and scalability principles.
- Experience with continuous-deployment flows, progressive rollout, and good testing practices.
- Strong collaboration and communication skills, with the ability to drive technical direction and lead teams.
- Experience with GitOps, Argo CD, and Envoy Gateway.
- Strong understanding of cloud computing platforms such as AWS, GCP, or Azure.
Nice to Have
- Experience with Docker, containerization, and microservices architecture.
- Knowledge of AI-assisted and agentic workflows, including alert enrichment and incident context-gathering.
- Experience with Grafana Cloud, Prometheus, and other monitoring and logging tools.
- Strong understanding of networking fundamentals, including TCP/IP, DNS, and load balancing.
Benefits and Perks
- Competitive salary and equity package.
- Comprehensive health, dental, and vision insurance.
- Flexible PTO policy, with a minimum of 20 days per year.
- Remote work stipend, including equipment and software allowance.
- Professional development opportunities, including training and conference attendance.
- Access to cutting-edge technologies and innovative projects.
- Collaborative and dynamic work environment, with a focus on innovation and growth.
How to Stand Out
- Highlight your experience with Go, Terraform, and Kubernetes, as well as your understanding of cloud computing platforms.
- When applying, be prepared to discuss your experience with self-service systems, multi-region network architecture, and AI-assisted operations.
- Showcase your ability to drive technical direction and lead teams, with a focus on paved roads, safe defaults, and strong guardrails.
- Be prepared to talk about your experience with continuous-deployment flows, progressive rollout, and good testing practices.
- Research Docker's company culture and values, and be prepared to discuss how you align with them.
- Review the job description carefully and tailor your application materials to the specific requirements and qualifications listed.
This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.