Senior Site Reliability Engineer, Core AI Infrastructure
WFA Digital Insight
Demand for AI and machine learning specialists continues to rise, with 25% growth in 2025, and expertise in site reliability engineering is highly sought after. Coinbase, a leader in the cryptocurrency space, is looking for a skilled Senior Site Reliability Engineer to drive its AI transformation. With the shift to remote work, companies need professionals who can ensure the reliability and security of their infrastructure. This role offers a unique opportunity to work on critical AI infrastructure and collaborate with senior leadership. Before applying, candidates should be prepared to showcase their experience in automating and supporting cloud infrastructure, as well as their proficiency in a programming language such as Python or Go.
Job Description
About the Role
The Senior Site Reliability Engineer will join a high-performing team of engineers driving AI transformation at Coinbase. This team builds and scales the infrastructure powering Coinbase's AI products, with direct exposure to senior leadership in a fast-paced, incubator-style environment. As a Senior Site Reliability Engineer, you will own the reliability and automation of critical AI infrastructure, ensuring our systems are resilient, observable, and secure at scale. The ideal candidate will have experience in automating and supporting cloud infrastructure, as well as proficiency in a programming language such as Python or Go. You will work closely with the Coinbase Infrastructure team to extend CI/CD frameworks supporting IT services and enterprise network platforms, and with Security and Compliance to integrate surveillance tooling into deployment pipelines.
What You Will Do
- Own the reliability, monitoring, and incident response lifecycle for AI infrastructure services, including on-call support for AWS deployment pipelines, root cause analysis, and blameless retros.
- Build automation and tooling to streamline operational IT workflows, eliminate manual tasks, and improve deployment velocity across CI/CD frameworks and Kubernetes environments.
- Partner with the Coinbase Infrastructure team to extend CI/CD frameworks supporting IT services and enterprise network platforms, and with Security and Compliance to integrate surveillance tooling into deployment pipelines.
- Strengthen observability and documentation standards across IT engineering by defining metrics, implementing monitoring solutions, and maintaining technical documentation that sets a standard of excellence.
- Develop full-stack applications that power internal AI products and infrastructure with Go or Python.
- Collaborate with cross-functional teams to ensure the successful deployment of AI products and infrastructure.
- Participate in on-call rotations to provide 24/7 support for AI infrastructure services.
- Continuously monitor and improve the performance, security, and reliability of AI infrastructure services.
What We Are Looking For
- 5+ years of experience automating and supporting cloud infrastructure (AWS) and network environments, with hands-on use of infrastructure-as-code tools (Terraform, Ansible, Chef, Puppet, or Salt).
- Proven experience deploying, managing, and troubleshooting containerized workloads using Docker and Kubernetes in production environments.
- Proficiency in at least one scripting or programming language (Python, Bash, Ruby, or Go) and experience with Git-based version control workflows and CI/CD pipelines.
- Track record of leading incident response in environments with strict SLAs, including root cause analysis, blameless retros, and measurable reliability improvements.
- Responsible use of generative AI, maintaining human oversight to deliver business-ready outputs and drive measurable improvements in workflow efficiency, cost, and quality.
- Excellent communication and collaboration skills, with the ability to work effectively with cross-functional teams.
Nice to Have
- Experience with CI/CD pipelines and automation tools such as Jenkins, GitLab CI/CD, or CircleCI.
- Knowledge of security and compliance frameworks, such as HIPAA, PCI-DSS, or SOC 2.
- Experience with cloud-based security solutions, such as AWS IAM or Google Cloud Security Command Center.
Benefits and Perks
- Competitive salary and benefits package, including medical, dental, and vision insurance, 401(k) matching, and flexible PTO.
- Opportunity to work on critical AI infrastructure and collaborate with senior leadership.
- Professional development opportunities, including training, mentorship, and conference sponsorship.
- Access to the latest technologies and tools, including cloud-based infrastructure and AI platforms.
- Flexible remote work arrangements, with quarterly in-person working sessions.
How to Stand Out
- Be prepared to showcase your experience in automating and supporting cloud infrastructure, and your proficiency in programming languages like Python or Go.
- Highlight your ability to work effectively with cross-functional teams, including experience with collaboration tools like Slack or Microsoft Teams.
- Emphasize your understanding of security and compliance frameworks, and your experience with cloud-based security solutions.
- Prepare examples of your experience with incident response, including root cause analysis and blameless retros.
- Research the company culture and values, and be prepared to discuss how you align with them.
- Be prepared to discuss your experience with generative AI, and how you utilize it responsibly in your work.
This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.