Senior Site Reliability Engineer

Airbyte · Remote (San Francisco)

Software Development

WFA Digital Insight

The demand for skilled site reliability engineers has skyrocketed, with the industry experiencing a 27% growth in the last year alone. Airbyte, a pioneer in open-source data movement, is at the forefront of this trend. With its innovative approach to data integration, Airbyte is poised to revolutionize the way companies handle data. As a senior site reliability engineer, you'll play a crucial role in shaping the future of data infrastructure. With the rise of AI and machine learning, the ability to manage and optimize complex systems is more critical than ever. Before applying, candidates should be prepared to demonstrate their expertise in Kubernetes, Terraform, and AI-powered tooling.

Job Description

About the Role

The Senior Site Reliability Engineer will be responsible for building and maintaining the infrastructure underpinning Airbyte's Data Replication platform. This is a critical role that requires a deep understanding of cloud-based infrastructure, Kubernetes, and Terraform. As a member of the Data Replication team, you will work closely with product engineers to ensure seamless integration of product features with infrastructure.

The successful candidate will have a strong background in infrastructure engineering, with a focus on reliability, scalability, and security. You will be responsible for setting the infrastructure bar for the team, building self-serve tooling, and coaching engineers to own more of their stack.

Airbyte is committed to innovation and experimentation, and we're looking for someone who shares this vision and embodies our culture's values of trust, directness, and craftsmanship.

What You Will Do

Own the infrastructure underpinning the Data Replication platform, including Kubernetes clusters, CI/CD pipelines, secrets management, networking, and cloud resource configuration across AWS and GCP
Partner with product engineers to reliably integrate product features with infrastructure
Maintain and enhance observability, alerting, and anomaly detection with an eye towards LLM automation
Maintain and enhance AI-augmented release and internal tooling, including canary deployments, progressive rollouts, automated release qualification, and rollback automation
Set the infrastructure bar for the team, building self-serve tooling, writing runbooks, and coaching engineers to own more of their stack
Collaborate with cross-functional teams to identify and prioritize infrastructure needs
Develop and maintain technical documentation for infrastructure components
Participate in on-call rotations and respond to incidents as needed
Identify areas for improvement and implement changes to increase efficiency and reliability

What We Are Looking For

7+ years of experience in infrastructure, platform engineering, SRE, or DevOps
Hands-on ownership of Kubernetes, Helm, and Terraform in production environments
Deep experience with observability stacks, including Prometheus, Grafana, and Datadog
Experience with CI/CD pipeline ownership and developer tooling
Ability and willingness to read backend code to understand how systems break and instrument them correctly
Fluency with AI tools, including LLMs and agentic frameworks
A startup-ready mindset, comfortable with ambiguity, moving fast, and owning problems end-to-end
Strong communication and collaboration skills
Experience with cloud-based infrastructure, including AWS and GCP
Strong understanding of security and compliance principles

Nice to Have

Experience with data pipelines, replication systems, or ETL/ELT platforms
Experience with control plane/data plane architectures or internal developer platforms
Experience with Airbyte, CDKs, or connector-based architectures
Experience with agile development methodologies

Benefits and Perks

Competitive salary and equity package
Comprehensive health insurance, including medical, dental, and vision
Flexible PTO policy and paid holidays
Remote work stipend and equipment budget
Professional development opportunities, including conferences and training
Access to cutting-edge technology and tools
Collaborative and dynamic work environment

How to Stand Out

Tip: Make sure to highlight your experience with Kubernetes, Terraform, and AI-powered tooling in your resume and cover letter.
Tip: Be prepared to discuss your approach to infrastructure engineering, including your experience with cloud-based infrastructure and CI/CD pipelines.
Tip: Showcase your ability to write clean, efficient code and your experience with observability stacks.
Tip: Demonstrate your understanding of security and compliance principles, and be prepared to discuss your experience with security audits and compliance frameworks.
Tip: Airbyte values innovation and experimentation, so be prepared to discuss your experience with new technologies and your willingness to take calculated risks.
Tip: Show enthusiasm for Airbyte's mission and values, and be prepared to discuss how your skills and experience align with the company's goals.
Tip: Be prepared to provide specific examples of your experience with infrastructure engineering, and be ready to discuss your approach to problem-solving and collaboration.

This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.