Senior Site Reliability Engineer, Observability
WFA Digital Insight
As demand for digital experiences continues to skyrocket, the need for reliable and scalable infrastructure has never been more pressing. With Webflow's commitment to remote work and AI-native digital experiences, this Senior Site Reliability Engineer role offers a unique chance to shape the future of web development. Candidates should be aware that the market for skilled SREs is highly competitive, with a 25% increase in job postings over the past year, driven by the growing importance of cloud infrastructure and cybersecurity.
Job Description
About the Role
The Senior Site Reliability Engineer, Observability role is a critical part of Webflow's engineering team, responsible for ensuring the reliability, stability, and scalability of the company's customer-facing infrastructure. This includes the Webflow application and hosting services, which serve millions of page views per hour. As a remote-first company, Webflow is committed to providing a collaborative and autonomous work environment.
The Observability team is newly formed and focused on providing engineers with the tools, data, and practices needed to understand the health and performance of Webflow's services. As a Senior Site Reliability Engineer, you will be instrumental in shaping the future of this team and driving the adoption of best practices in observability.
What You Will Do
- Own and evolve Webflow's observability stack, including OpenTelemetry and Datadog, to provide reliable and actionable metrics, traces, and logs across services.
- Regularly dive into the main Webflow application code in TypeScript, Node, or Go to debug, and sometimes fix, behavior in production.
- Continuously raise the bar on observability practices by driving adoption of SLOs, distributed tracing, and structured logging throughout engineering.
- Build and maintain AI-powered agents and automation that help engineers surface insights faster, reduce alert fatigue, and accelerate incident resolution.
- Guide and empower engineers on other teams to instrument their services effectively and introduce new features into production with confidence.
- Participate in and continuously improve on-call and incident response processes, with a focus on making observability data the foundation of faster, more effective responses.
- Reduce toil by automating common observability workflows to keep the rest of engineering working smoothly with fewer interruptions.
- Partner effectively with engineering teams to define, implement, and improve observability practices, enabling them to confidently ship and operate services in production.
- Help define the culture of this growing team as it expands its international presence.
What We Are Looking For
- BS/BA degree in a related field, or equivalent relevant experience.
- Business-level fluency in reading, writing, and speaking English.
- 5+ years of experience building, maintaining, and debugging distributed systems in a customer-facing environment that tolerates little to no downtime.
- Hands-on experience with observability platforms such as OpenTelemetry and Datadog.
- Strong understanding of software development principles, including testing, continuous integration, and continuous deployment.
- Experience with AI-powered tools and automation.
- Ability to work collaboratively in a remote-first environment.
- Strong problem-solving skills and attention to detail.
Nice to Have
- Experience with TypeScript, Node, or Go.
- Knowledge of cloud infrastructure and cybersecurity best practices.
- Experience with Agile development methodologies.
- Familiarity with AI-native digital experience platforms.
Benefits and Perks
- Competitive salary and benefits package.
- Opportunity to work with a leading AI-native digital experience platform.
- Collaborative and autonomous remote-first work environment.
- Professional development opportunities, including training and conference sponsorships.
- Access to the latest tools and technologies.
- Flexible PTO policy and remote work stipend.
- Health insurance and wellness programs.
- Equity opportunities for eligible employees.
How to Stand Out
- When applying, make sure to highlight your experience with observability platforms and distributed systems.
- Be prepared to discuss your approach to debugging and troubleshooting complex issues in a production environment.
- Showcase your ability to work collaboratively in a remote-first environment and your experience with AI-powered tools.
- Tailor your resume and cover letter to emphasize your skills in software development, testing, and continuous integration.
- Research Webflow's technology stack and be prepared to discuss how you can contribute to the company's mission and values.
- Be ready to provide specific examples of your experience with automation, incident response, and observability practices.
This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.