Site Reliability Engineer (USA Only - 100% Remote)

Close · Remote (USA - Remote)

Software Development

GTM

WFA Digital Insight

As demand for skilled site reliability engineers continues to grow, with a 25% increase in job postings over the past year, Close is at the forefront of innovation in the field. With its commitment to a 100% remote workforce, the company is leading the way in flexible, dynamic work environments. This role is particularly exciting for those who want to work with a diverse array of infrastructure tools and systems as part of a team that values simplicity, resilience, and maintainability. Candidates should be prepared to showcase their expertise in cloud computing, CI/CD pipelines, and configuration management, and to demonstrate that they can work effectively in a fast-paced, distributed team.

Job Description

About the Role

The Site Reliability Engineer role at Close is a critical part of the company's infrastructure team, which builds and maintains the platforms that run all of Close's systems. This is a unique opportunity to work with a talented team of engineers who prioritize taking ownership and making a meaningful impact. As a Site Reliability Engineer, you will work with a wide range of technologies, including multi-terabyte MongoDB, PostgreSQL, and Elasticsearch clusters, as well as telemetry systems built on Grafana's LGTM stack and ClickHouse.

The team at Close is passionate about creating a stable, up-to-date system that can handle the demands of a rapidly growing company. With a focus on simplicity, resilience, and maintainability, the infrastructure team is committed to building composable and maintainable tools that can support the company's mission to supercharge sales productivity. As a Site Reliability Engineer, you will be responsible for ensuring the smooth operation of these systems, and working with the team to identify and resolve any issues that may arise.

What You Will Do

Design, build, and maintain the infrastructure that supports Close's systems, including MongoDB, PostgreSQL, and Elasticsearch clusters
Work with the team to develop and implement CI/CD pipelines using GitHub Actions and ArgoCD
Collaborate with the engineering team to identify and resolve issues with the system, and develop solutions to improve performance and reliability
Develop and maintain telemetry systems using Grafana's LGTM stack and ClickHouse
Work with the team to implement and maintain configuration management systems using Ansible and Terraform
Participate in on-call rotations to ensure 24/7 coverage of the system
Collaborate with the team to develop and maintain documentation of the system, including architecture diagrams and technical guides
Work with the team to identify and prioritize areas for improvement, and develop solutions to address these issues
Participate in code reviews and contribute to the development of the company's open-source projects

What We Are Looking For

5+ years of experience working as a site reliability engineer, with a focus on infrastructure and cloud computing
Experience working with a diverse array of infrastructure tools and systems, including CI/CD pipelines, configuration management, and telemetry systems
Strong knowledge of cloud computing platforms, particularly AWS, and of running workloads on Kubernetes
Experience working with databases, including MongoDB, PostgreSQL, and Elasticsearch
Strong understanding of networking fundamentals, including TCP/IP and DNS
Experience working with containerization using Docker, and orchestration using Kubernetes
Strong knowledge of security principles and practices, including encryption and access control
Excellent communication and collaboration skills, with the ability to work effectively in a distributed team

Nice to Have

Experience working with other cloud computing platforms, including GCP and Azure
Knowledge of programming languages such as Python and Java
Experience working with other CI/CD tools, including CircleCI and Jenkins
Knowledge of other configuration management systems, including Puppet and Chef
Experience working with telemetry tools such as Prometheus and Loki (the log-aggregation component of Grafana's LGTM stack)

Benefits and Perks

Competitive salary and equity package
Flexible and dynamic work environment, with the opportunity to work remotely from anywhere in the US
Comprehensive benefits package, including health, dental, and vision insurance
Generous PTO policy, with unlimited paid time off
Opportunities for professional growth and development, including training and education programs
Access to the latest technologies and tools, including a MacBook and software subscriptions
Participation in the company's open-source projects, and the opportunity to contribute to the development of new technologies
Collaborative and supportive team environment, with regular team-building activities and social events

Company Culture

At Close, we prioritize taking ownership and making a meaningful impact. We're a team of talented and passionate individuals who are committed to creating a product that our customers love. We value simplicity, resilience, and maintainability, and we're always looking for ways to improve our systems and processes. As a Site Reliability Engineer, you will be an integral part of our team, and will have the opportunity to make a real impact on the company's mission and vision.

How to Stand Out

Develop a strong understanding of cloud computing platforms such as AWS, as well as Kubernetes, and be prepared to discuss your experience working with these technologies in an interview.
Make sure to highlight your experience working with CI/CD pipelines, configuration management, and telemetry systems, as these are critical components of the Site Reliability Engineer role.
Be prepared to discuss your approach to solving complex technical problems, and provide examples of times when you've had to troubleshoot and resolve issues in a fast-paced environment.
Emphasize your ability to work effectively in a distributed team, and provide examples of times when you've collaborated with others to achieve a common goal.
Be prepared to discuss your experience working with databases, including MongoDB, PostgreSQL, and Elasticsearch, and be able to explain your approach to database design and optimization.
Consider creating a personal project or contributing to an open-source project to demonstrate your skills and experience working with infrastructure and cloud computing technologies.

This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.