Database Reliability Engineer - Core Team
WFA Digital Insight
Demand for skilled database reliability engineers has risen sharply, with a 25% increase in job postings over the past year. As companies like ClickHouse lead the charge in real-time analytics and data warehousing, professionals with expertise in distributed database internals and SQL are highly sought after. With its commitment to innovation and customer satisfaction, ClickHouse is an attractive destination for top talent. Before applying, candidates should be prepared to showcase their problem-solving skills, their experience with cloud computing platforms, and their ability to work in a fast-paced environment. The database reliability engineering role at ClickHouse is an opportunity to make a significant impact on the elasticity, scale, and performance of the company's platform.
Job Description
About the Role
ClickHouse is seeking a highly skilled Database Reliability Engineer to join its Core Team. As a key member of the Site Reliability Engineering team, you will be responsible for ensuring the reliability, availability, scalability, and performance of ClickHouse core. You will collaborate with various teams, including Control Plane, Dataplane, Security, Support, and Operations, to implement ClickHouse in the best way possible for customers. Your expertise will be crucial in building and leading processes to improve the overall performance of ClickHouse.
The successful candidate will have a strong background in reliability engineering, QA, or customer-facing engineering, with at least 5 years of experience. You will have a deep understanding of distributed database internals and SQL, as well as experience operating ClickHouse or other SQL databases in production. Your excellent problem-solving skills, combined with your ability to work in a fast-paced environment, will make you a valuable asset to the team.
What You Will Do
- Continuously improve the reliability and performance of ClickHouse core
- Improve and create metrics and alerts for ClickHouse to identify and prevent problems in production
- Dig deeper into common problems encountered by customers in ClickHouse core to identify root causes, submit bug fixes and issue reports, and suggest improvements
- Enhance and refine incident response processes and post-mortem analysis for ClickHouse core-related outages
- Plan, enable, and drive chaos engineering initiatives across Engineering teams
- Manage on-call processes to respond to performance and reliability issues, and establish best practices for coordinating escalations to resolve issues and minimize customer impact
- Collaborate with different teams to implement ClickHouse in the best way possible for customers
- Own areas of engineering escalation management and response, investigations, post-mortem analysis, and continuous improvement of how ClickHouse is run and optimized in the cloud
What We Are Looking For
- Bachelor's or Master's degree in Computer Science or a related field
- At least 5 years of experience in Reliability Engineering, QA, or customer-facing engineering
- Previous experience operating ClickHouse or other SQL databases in production
- Excellent understanding of distributed database internals and SQL
- Scripting experience with Shell or Python, and ability to read and understand C++ code
- Knowledge of cloud computing platforms such as AWS, Azure, or Google Cloud Platform
- Strong problem-solving skills and solid production debugging skills
- Ability to work in a fast-paced environment as part of a global team
Nice to Have
- Experience with Adjust and Excel
- Familiarity with ClickHouse's cloud-based offerings
- Certification in database administration or a related field
Benefits and Perks
- Competitive salary and benefits package
- Opportunity to work with a fast-growing and innovative company
- Collaborative and dynamic work environment
- Professional development and growth opportunities
- Flexible working hours and remote work options
- Access to cutting-edge technologies and tools
How to Stand Out
- Be prepared to showcase your experience with distributed database internals and SQL, as well as your ability to work in a fast-paced environment.
- Highlight your problem-solving skills and ability to debug production issues.
- Familiarize yourself with ClickHouse's cloud-based offerings and be prepared to discuss how you can contribute to the company's growth.
- Be ready to provide examples of your experience with scripting languages such as Shell or Python.
- Research the company culture and be prepared to discuss how you can thrive in a global team environment.
- Prepare to discuss your experience with cloud computing platforms and how you can apply that knowledge to improve ClickHouse's performance and reliability.
This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.