Staff Site Reliability Engineer, Release Engineering
WFA Digital Insight
As the demand for reliable and scalable digital infrastructure grows, so does the need for skilled Site Reliability Engineers. With the rise of remote work, companies like Plaid are looking for experts who can ensure seamless deployment and operation of complex systems. The market context is clear: the shift to online financial services has led to a 25% increase in demand for SREs in the past year. Plaid stands out for its commitment to empowering financial transformation, and this role is a unique chance to shape the company's reliability practices. Before applying, candidates should be aware of the high expectations for technical leadership and the need to drive organizational change.
Job Description
About the Role
The Staff Site Reliability Engineer role at Plaid is a technical leadership position that entails defining and scaling the company's reliability practices across product engineering. As a key member of the Infrastructure team, you will architect the SLO and error-budget framework, drive the adoption of progressive delivery, and ensure new products are production-ready. You will partner with product and platform teams to translate complex production needs into intuitive, self-service tooling.
The Release Engineering team at Plaid owns the path from merge to production, including zero-touch deployment, progressive rollouts, metric-gated analysis, and automatic rollback. The goal is to make safe shipping the default for every product team. As a Staff Site Reliability Engineer, you will lead the expansion of reliability standards across product engineering, converting foundational infrastructure into lasting operational habits and tooling.
Plaid's culture is rooted in impact and collective growth, and the company seeks technical leaders who resonate with its principles of inventing tomorrow and embracing openness. The mission at Plaid is to unlock financial freedom for everyone, and to support this mission, the company seeks to build a diverse team of driven individuals who care deeply about making the financial ecosystem more equitable.
What You Will Do
- Lead the expansion of reliability standards across product engineering
- Architect and manage the SLO and error-budget framework
- Promote widespread use of progressive delivery and automated safety gates
- Guide emerging product teams toward production readiness
- Collaborate with SRE, Platform, and Infrastructure teams to transform complex production requirements into intuitive, self-service platform features
- Direct the response to critical incidents and ensure the resulting post-mortem actions yield permanent improvements to the platform
- Prepare for an AI-driven development landscape by scaling safety nets to handle an increased volume and frequency of code changes
What We Are Looking For
- Over 8 years of professional experience in backend systems, SRE, or platform engineering roles
- Proven track record of designing reliability programs that achieved cross-team adoption
- Direct experience building or operating canary rollout systems, metric-gated analysis, or automated rollback infrastructure
- Technical proficiency in software development, with a preference for Go or similar systems languages
- Ability to drive organizational change and influence engineering culture without formal authority
- Sound technical judgment in high-stakes production scenarios, balancing user impact with developer velocity
- Prior exposure to Kubernetes, service mesh technologies, Prometheus, or ArgoCD
Nice to Have
- Experience with AI-assisted development and its implications for reliability and safety
- Knowledge of cloud-native technologies and their application in production environments
- Familiarity with Agile development methodologies and their implementation in engineering teams
Benefits and Perks
- Competitive salary and equity package
- Comprehensive health, dental, and vision insurance
- Generous PTO policy and paid holidays
- Remote work stipend and flexible working hours
- Professional development opportunities and conference sponsorships
- Access to cutting-edge technologies and tools
- Collaborative and dynamic work environment
How to Stand Out
- Highlight your experience with reliability programs and progressive delivery, and give specific examples of how you drove adoption and improvement in previous roles.
- Make sure to showcase your technical proficiency in software development, particularly with Go or similar systems languages.
- Emphasize your ability to drive organizational change and influence engineering culture without formal authority, as this is a key aspect of the Staff Site Reliability Engineer role.
- Prepare to discuss your experience with high-stakes production scenarios and how you balanced user impact with developer velocity.
- Be ready to talk about your vision for the future of reliability and safety in software development, and how you plan to contribute to Plaid's mission.
- Don't be afraid to ask about the company culture and values, and how they impact the way the team works and collaborates.
- Pay attention to the company's commitment to diversity and inclusion, and be prepared to discuss how you can contribute to these efforts.
This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.