Intermediate Site Reliability Engineer, Tenant Scale: Tenant Services
Software Development
WFA Digital Insight
Demand for skilled site reliability engineers has surged as companies prioritize digital transformation and cloud infrastructure. With GitLab's emphasis on AI-driven productivity, this role stands out in the remote job market. Candidates should be prepared to showcase their expertise in operating systems, storage, and networking, as well as their ability to automate and improve system reliability.
Job Description
About the Role
As a Site Reliability Engineer at GitLab, you will be responsible for keeping GitLab.com and other production systems running smoothly for millions of users. This involves combining pragmatic operations with strong software engineering practices, focusing on the systems layer and edge services, and designing highly scalable, reliable, and secure infrastructure.
Responsibilities
- Automate away toil and improve availability and performance
- Respond to incidents during local daytime hours as part of a globally distributed on-call rotation
- Work across the Infrastructure organization to scale customer data and increase automation
Requirements
- Strong software engineering practices
- Experience with the systems layer, edge services, and Kubernetes workloads
Nice to Have
- Experience with automation and improving system reliability
How to Stand Out
- Be prepared to discuss your experience with Kubernetes and containerization, highlighting specific challenges you've overcome and solutions you've implemented.
- Showcase your ability to automate tasks and processes, and explain how you've used scripting languages or other tools to improve efficiency.
- Emphasize your understanding of system reliability and availability, and provide examples of how you've improved these metrics in previous roles.
- Review GitLab's values and be prepared to discuss how your own work style and values align with the company's mission and culture.
- Prepare to back up your claims with specific numbers and metrics, such as 'Improved system uptime by 25% through automation and process improvements.'
This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.