Intermediate Site Reliability Engineer, Environment Automation

GitLab · Remote (Canada; US)
Software Development

WFA Digital Insight

Demand for skilled site reliability engineers has grown significantly, with a 25% increase in job postings over the past year. As companies like GitLab expand their remote teams, the need for professionals who can keep complex systems reliable and scalable is more pressing than ever. With its strong focus on AI-driven productivity and innovation, GitLab gives engineers room to grow and make a meaningful impact. Before applying, candidates should understand how central automation, infrastructure as code, and collaboration are to this role.

Job Description

About the Role

As an Intermediate Site Reliability Engineer focused on Environment Automation at GitLab, you will play a critical role in ensuring the reliability, scalability, and security of hundreds of isolated GitLab environments run for customers. You will treat everything as code and contribute to automation across the entire environment lifecycle, from initial provisioning to day-to-day operations. The core challenge, which you will tackle alongside senior SREs, is managing many tenant environments in parallel.

The role is part of the Dedicated team, where you will help define, deploy, and maintain GitLab environments across cloud providers using infrastructure as code, deployment packages, and Kubernetes. Your contributions will directly impact how customers experience GitLab Dedicated and other managed offerings, enabling them to focus on building software while ensuring their GitLab environments are always production-ready.

Because of GitLab's commitment to innovation and AI-driven productivity, team members are expected to incorporate AI into their daily workflows to drive efficiency, innovation, and impact. The role offers real opportunities for professional growth and the chance to work with industry leaders on complex problems.

What You Will Do

  • Contribute to the design and evolution of infrastructure automation using Terraform, Ansible, and Kubernetes to provision, upgrade, and operate many GitLab environments with minimal manual effort.
  • Help debug and resolve production issues across Kubernetes clusters, GitLab components, and cloud services, then assist in building automation and safeguards that prevent similar issues from recurring.
  • Assist in creating and maintaining deployment and orchestration tools, such as Helm charts, omnibus-gitlab configurations, and multi-tenant workflows, that make it easy for teams to manage GitLab environments at scale.
  • Collaborate with senior SREs to solve unique challenges of managing many tenant environments in parallel, each with its own constraints and integration points.
  • Contribute to automation that reduces manual work, help build tooling that orchestrates upgrades and configuration changes safely at scale, and support an observability stack that lets the team understand and improve the health of every environment.
  • Treat everything as code, emphasizing infrastructure as code and contributing to automation across the lifecycle of GitLab environments.
  • Help build a high-performance culture driven by GitLab's values and continuous knowledge exchange, enabling team members to reach their full potential.
  • Collaborate with the team to define, deploy, and maintain GitLab environments across cloud providers.
  • Contribute to building and maintaining a strong observability stack to monitor and improve the health of environments.
  • Share knowledge and best practices to improve the team's overall efficiency and effectiveness.

What We Are Looking For

  • 2+ years of experience in a site reliability engineering role, preferably in a cloud-native environment.
  • Strong understanding of infrastructure as code (e.g., Terraform, CloudFormation) and containerization (e.g., Docker, Kubernetes).
  • Experience with automation tools such as Ansible or similar.
  • Proficiency in at least one programming language (e.g., Python, Ruby, Go).
  • Experience with CI/CD pipelines and practices.
  • Strong understanding of networking fundamentals and cloud computing concepts.
  • Ability to work collaboratively in a remote team environment.
  • Excellent problem-solving skills and the ability to debug complex issues.
  • Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).

Nice to Have

  • Experience with GitLab and its ecosystem.
  • Knowledge of AI and machine learning concepts and their application in automation and productivity.
  • Experience with multi-tenant environments and their unique challenges.
  • Certification in cloud computing (e.g., AWS, Azure, GCP) or containerization (e.g., Kubernetes).

Benefits and Perks

  • Competitive salary and benefits package.
  • Opportunity to work with a leading technology company that is shaping the future of software development.
  • Collaborative and dynamic work environment with a strong focus on innovation and AI-driven productivity.
  • Flexible working hours and remote work options.
  • Professional development and growth opportunities.
  • Access to cutting-edge technologies and tools.
  • Participation in a high-performance culture driven by values and continuous knowledge exchange.
  • Comprehensive health and wellness programs.
  • Generous parental leave policy.
  • Employee stock options.
  • Annual stipend for professional development and education.

How to Stand Out

  • Develop a strong understanding of infrastructure as code tools like Terraform and Ansible, as well as containerization using Kubernetes.
  • Build a personal project that demonstrates your ability to automate and manage complex systems, showcasing your problem-solving skills.
  • Learn about GitLab's ecosystem and the specific challenges of managing multi-tenant environments.
  • Prepare examples of how you've incorporated AI or machine learning into your workflows to drive efficiency and innovation.
  • Research GitLab's values and be ready to discuss how your work style and experience align with them.
  • Practice explaining complex technical concepts simply, as effective communication is key in this role.
  • Be ready to discuss your experience with CI/CD pipelines, monitoring tools, and logging practices.

This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.