Senior Database Reliability Engineer
WFA Digital Insight
The demand for skilled database engineers has skyrocketed, with a 27% increase in remote job postings in the last year. As companies shift to remote work, the need for experts who can manage and optimize database services has become crucial. Cloudlinux, a pioneer in remote-first infrastructure and security, is seeking a seasoned Senior Database Reliability Engineer to join their team. With a strong focus on PostgreSQL and ClickHouse, this role requires a unique blend of technical expertise and problem-solving skills. Before applying, candidates should be aware of the company's emphasis on automation, documentation, and collaboration.
Job Description
About the Role
The Senior Database Reliability Engineer will play a critical role in ensuring the reliability and performance of Cloudlinux's database services. As a key member of the Infrastructure Department, you will be responsible for designing, implementing, and maintaining highly available database systems. Your expertise in PostgreSQL and ClickHouse will be essential in supporting the company's products and services. You will work closely with cross-functional teams to identify and resolve database-related issues, improving overall system efficiency and user experience.
The ideal candidate will have a deep understanding of database internals, replication, and high-availability designs. You will be expected to automate repetitive tasks, implement monitoring and alerting systems, and develop scripts to improve database operations. Your ability to communicate complex technical concepts to non-technical stakeholders will be invaluable in this role.
What You Will Do
- Own production PostgreSQL reliability: design, implement, and maintain highly available database systems
- Improve disaster recovery and operational evidence: develop and test restore procedures, create documented recovery paths, and establish measurable RTO/RPO targets
- Support the wider database estate: troubleshoot incidents, review access and data-safety changes, and improve monitoring for ClickHouse, MongoDB, and Redis
- Automate DBA workflows using Ansible, Terraform/OpenTofu, GitLab CI/CD, and scripts
- Develop reproducible runbooks for provisioning, grants, backups, restores, health checks, and ownership metadata
- Collaborate with engineering teams to build DBaaS-style self-service capabilities
- Improve observability and incident response through Grafana, metrics, logs, SLOs, alert rules, and clear communication
- Participate in on-call rotations and provide 24/7 support for critical database issues
- Stay up to date with industry trends and emerging technologies in database management
What We Are Looking For
- 5+ years of experience in database administration, with a focus on PostgreSQL and ClickHouse
- Strong understanding of database internals, replication, and high-availability designs
- Proficiency in Linux and infrastructure fundamentals: systemd, networking, storage, filesystems, CPU/memory/disk bottlenecks, TLS, DNS, firewalls
- Experience with automation tools: Ansible, Terraform/OpenTofu, GitLab CI/CD
- Excellent problem-solving skills and ability to communicate complex technical concepts
- Strong understanding of security principles and data safety practices
- Ability to work in a fast-paced environment and adapt to changing priorities
Nice to Have
- Experience with MongoDB and Redis database systems
- Knowledge of cloud-based infrastructure and containerization technologies
- Familiarity with machine learning and AI-assisted engineering practices
Benefits and Perks
- Competitive salary and benefits package
- Opportunity to work with a remote-first company and collaborate with a global team
- Professional development and growth opportunities
- Access to cutting-edge technologies and tools
- Flexible working hours and remote work arrangements
- Comprehensive health insurance and wellness programs
- Generous paid time off and holiday policy
How to Stand Out
- Be prepared to discuss your experience with PostgreSQL and ClickHouse, including specific use cases and challenges you've overcome
- Showcase your automation skills by sharing examples of scripts or tools you've developed to improve database operations
- Highlight your ability to communicate complex technical concepts to non-technical stakeholders
- Emphasize your experience with Linux and infrastructure fundamentals, as well as your understanding of security principles and data safety practices
- Be ready to discuss your approach to problem-solving and incident response, including examples of times when you've had to think critically and act quickly
- Don't underestimate the importance of soft skills: demonstrate your ability to collaborate with cross-functional teams and work in a fast-paced environment
This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.