Senior Site Reliability Engineer, CCIP

Chainlink Labs · Remote (Argentina, Brazil, Canada, Colombia, Mexico, United States)
Software Development

WFA Digital Insight

The demand for skilled site reliability engineers has surged, with a 25% increase in job postings over the past year. Chainlink Labs, a pioneer in decentralized oracle networks, is seeking a seasoned professional to drive production resilience and operational excellence. With the blockchain market expected to reach $23.3 billion by 2027, this role offers a unique chance to contribute to the growth of a cutting-edge company. Before applying, candidates should be aware of the importance of scalability, reliability, and security in distributed systems, as well as the need for collaboration with cross-functional teams.

Job Description

About the Role

As a Senior Site Reliability Engineer on the CCIP Platform team at Chainlink Labs, you will play a critical role in ensuring the reliability, scalability, and operational excellence of the systems powering Chainlink's Cross-Chain Interoperability Protocol (CCIP). Your primary focus will be on strengthening production resilience, reducing operational toil, and enabling engineering teams to ship safely while maintaining high service availability. You will collaborate with cross-functional teams to establish operational standards that scale with the business and influence reliability practices across the platform.

The CCIP Platform team is responsible for developing and maintaining the infrastructure that enables secure and efficient communication between different blockchain networks. As a Senior Site Reliability Engineer, you will work closely with the team to identify areas for improvement, implement new technologies and processes, and drive adoption of best practices.

Chainlink Labs is a global, remote-first company, and this role is fully remote for candidates based in the locations listed above. The company values flexibility, autonomy, and open communication, and is committed to creating a culture of innovation and collaboration.

What You Will Do

  • Improve deployment safety and increase delivery velocity by advancing production engineering practices
  • Establish distributed tracing across the platform to improve observability and accelerate incident investigation
  • Eliminate operational toil through automation that increases engineering efficiency and platform reliability
  • Drive adoption of meaningful SLOs, SLIs, and error budgets that guide engineering decisions and improve service health
  • Increase platform scalability and operational readiness as CCIP continues to grow
  • Strengthen Chainlink's reputation through highly available production systems while reducing operational overhead
  • Collaborate with engineering teams to conduct production-readiness reviews before service launches
  • Partner with software engineering teams to define and implement operational standards and best practices
  • Lead on-call operations, including defining rotations, escalation policies, and improving alert quality
  • Apply chaos engineering or fault-injection techniques to improve production resilience
  • Perform capacity planning and performance tuning for high-throughput distributed services

What We Are Looking For

  • Demonstrated experience in Site Reliability Engineering, Production Engineering, or a similar role operating large-scale distributed systems
  • Deep expertise defining, implementing, and driving adoption of SLOs, SLIs, and error budgets across engineering organizations
  • Experience building and operating production Kubernetes environments supporting critical services
  • Experience applying OpenTelemetry to improve observability across distributed systems
  • Experience improving the reliability, scalability, and operability of production infrastructure
  • Strong understanding of cloud-based infrastructure and containerization
  • Experience with scripting languages such as Python or Bash
  • Excellent problem-solving skills and ability to work in a fast-paced environment
  • Strong communication and collaboration skills
  • Experience working with agile development methodologies

Nice to Have

  • Demonstrated technical leadership influencing reliability practices across engineering teams
  • Experience performing capacity planning and performance tuning for high-throughput distributed services
  • Previous experience working on Web3 infrastructure or within a crypto-native engineering organization
  • Experience applying chaos engineering or fault-injection techniques to improve production resilience
  • Experience leading on-call operations, including defining rotations, escalation policies, and improving alert quality

Benefits and Perks

  • Competitive salary and benefits package
  • Opportunity to work with a cutting-edge technology company
  • Flexible working hours and remote work arrangement
  • Professional development and growth opportunities
  • Access to a global network of professionals
  • Health insurance and wellness programs
  • Retirement savings plan and matching program
  • Generous paid time off and holidays
  • Remote work stipend and equipment allowance

How to Stand Out

  • Highlight your experience with large-scale distributed systems and your ability to drive adoption of reliability practices across engineering teams.
  • Demonstrate a strong understanding of cloud-based infrastructure, containerization, and scripting languages such as Python or Bash.
  • Be prepared to discuss your experience with SLOs, SLIs, and error budgets, and how you have applied them in previous roles.
  • Chainlink Labs values innovation and collaboration, so be ready to discuss your experience with agile development methodologies and with cross-functional teams.
  • Before negotiating salary, research the market rate for Senior Site Reliability Engineers in your area and be ready to discuss your expectations.
  • The company is remote-first, so be prepared to discuss your experience working remotely and your ability to work independently.

This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.