Staff Site Reliability Engineer, Production Engineering

Dropbox · Remote (Remote - Canada: Select locations)
Software Development

WFA Digital Insight

As remote work reshapes the tech landscape and AI tools increasingly influence how software is built, engineers who can keep systems stable and well operated are in high demand. Dropbox, a pioneer in cloud storage, is seeking a Staff Site Reliability Engineer to drive its company-wide reliability strategy. To succeed in this role, candidates will need strong technical skills and the ability to collaborate across teams. With cloud storage continuing to grow, this is an opportunity for those looking to make a meaningful impact.

Job Description

About the Role

The Staff Site Reliability Engineer position at Dropbox is a critical role focused on advancing the company's stability, observability, incident response, and operational excellence. As AI technologies continue to reshape how software is built and operated, the successful candidate will play a key role in defining and evolving Dropbox's technical reliability strategy. This involves preparing the company for higher pull request volume, greater service complexity, shifting incident patterns, and growing demand for debugging and monitoring tools.

Day-to-day, the Staff Site Reliability Engineer will work closely with cross-functional teams, including Engineering, Product, and leadership, to raise the bar for reliability and guide long-term platform investments. This is a strategic role that requires strong technical expertise, excellent communication skills, and the ability to drive change across the organization.

What You Will Do

  • Define and evolve Dropbox's company-wide technical reliability strategy to support the changing engineering environment created by AI-assisted and agentic software development.
  • Set multi-year reliability goals, standards, and roadmaps across observability, debugging, incident management, service health, and operational readiness.
  • Lead cross-team initiatives that reduce reliability risk as software delivery velocity, pull request volume, service complexity, and incident volume increase.
  • Partner with engineering leaders and platform teams to improve monitoring, alerting, debugging, SLOs, SLAs, and incident response systems at company scale.
  • Identify emerging reliability trends and technologies and assess their potential impact on Dropbox's systems and services.
  • Develop and maintain a deep understanding of Dropbox's systems, services, and technology stack.
  • Collaborate with engineers to design and implement reliable systems and services.
  • Participate in on-call rotations to ensure 24/7 coverage of Dropbox's systems and services.
  • Analyze and resolve complex technical issues, including those related to system reliability and performance.
  • Develop and maintain technical documentation to support knowledge sharing and collaboration across teams.

What We Are Looking For

  • Strong technical skills, with a deep understanding of software development, system design, and reliability engineering principles.
  • Experience with cloud-based systems and services, including those related to monitoring, alerting, and incident management.
  • Excellent communication and collaboration skills, with the ability to work effectively with cross-functional teams.
  • Strong problem-solving skills, with the ability to analyze complex technical issues and develop effective solutions.
  • Experience with programming languages such as Python, Java, or C++.
  • Familiarity with Agile development methodologies and version control systems such as Git.
  • Strong understanding of system reliability and performance, including metrics such as uptime, latency, and throughput.
  • Experience with incident management and response, including the development of playbooks and runbooks.
  • Strong analytical skills, with the ability to analyze complex data sets and develop insights.

Nice to Have

  • Experience with AI-assisted and agentic software development, including the use of machine learning and automation technologies.
  • Familiarity with DevOps practices and tools, including those related to continuous integration and delivery.
  • Experience with cloud-based platforms such as AWS or Azure.
  • Strong understanding of security principles and practices, including those related to system hardening and vulnerability management.
  • Experience with technical leadership, including the development of technical roadmaps and strategic plans.

Benefits and Perks

  • Competitive salary and benefits package.
  • Opportunity to work with a talented team of engineers and technicians.
  • Collaborative and dynamic work environment.
  • Flexible working hours and remote work options.
  • Professional development opportunities, including training and education programs.
  • Access to cutting-edge technologies and tools.
  • Recognition and reward programs, including bonuses and stock options.
  • Comprehensive health and wellness programs, including medical, dental, and vision coverage.

How to Stand Out

  • Develop a strong understanding of system reliability and performance, including metrics such as uptime, latency, and throughput.
  • Familiarize yourself with cloud-based systems and services, including those related to monitoring, alerting, and incident management.
  • Highlight your problem-solving skills, including your ability to analyze complex technical issues and develop effective solutions.
  • Showcase your experience with programming languages such as Python, Java, or C++, and your familiarity with Agile development methodologies and version control systems such as Git.
  • Be prepared to discuss your experience with incident management and response, including the development of playbooks and runbooks.
  • Emphasize your ability to work effectively with cross-functional teams, including your excellent communication and collaboration skills.
  • Prepare to discuss your understanding of AI-assisted and agentic software development, including the use of machine learning and automation technologies.

This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.