Site Reliability Engineer

Akamai Technologies · Remote (United States)

Software Development

WFA Digital Insight

As the demand for reliable digital infrastructure grows, so does the need for skilled Site Reliability Engineers. With a 25% increase in cloud computing adoption in 2025, companies like Akamai are at the forefront of this shift. Akamai's commitment to innovation and employee flexibility makes this role stand out in the current remote job market. Candidates should be prepared to showcase their expertise in automation, scalability, and collaboration, as well as their ability to work effectively in a distributed team environment. Before applying, consider the company's emphasis on work-life balance and its FlexBase program, which prioritizes employee autonomy and adaptability.

Job Description

About the Role

The Site Reliability Engineer position at Akamai Technologies is a critical role that focuses on ensuring the reliability, scalability, and performance of the company's digital infrastructure. As a key member of the Site Reliability Engineering team, you will collaborate with cross-functional teams to design, develop, and manage applications and infrastructure that support Akamai Cloud's products and services. Your primary goal will be to drive reliability improvements through automation, reduce operational toil, and increase the resilience of engineering processes.

The team's mission is to make life better for billions of people, billions of times a day, by providing a highly reliable and scalable digital platform. You will partner with software engineering, infrastructure, and platform teams to investigate complex production issues, identify root causes, and implement long-term corrective actions. Your expertise in configuration management, Infrastructure as Code (IaC), and CI/CD will be essential to the team's success.

Akamai's Site Reliability Engineering team is a highly skilled and collaborative group that values innovation, creativity, and technical excellence. As a member of this team, you will have the opportunity to work on complex and challenging projects, share knowledge and best practices with your peers, and contribute to the company's mission to make the internet a better experience for everyone.

What You Will Do

Design, develop, test, and operate critical services that support the reliability, scalability, and performance of Akamai's infrastructure
Develop and implement observability solutions, including monitoring, logging, alerting, and telemetry capabilities, to proactively detect and resolve issues
Drive reliability improvements through automation, reducing operational toil and increasing the resilience of engineering processes
Collaborate with software engineering, infrastructure, and platform teams to investigate complex production issues, identify root causes, and implement long-term corrective actions
Participate in an on-call rotation and provide leadership during incident response, driving timely service restoration, effective communication, and post-incident improvement efforts
Develop technical expertise in IaC systems and serve as a trusted technical resource, mentoring engineers and sharing best practices
Design, develop, and deploy software and infrastructure at scale in a Linux environment
Configure and manage Infrastructure as Code solutions using tools such as Terraform, SaltStack, Ansible, Chef, or Puppet
Ensure compliance with security and regulatory requirements, adhering to industry standards and best practices
Continuously monitor and evaluate the performance of Akamai's infrastructure, identifying areas for improvement and implementing optimizations

What We Are Looking For

Relevant experience in a Site Reliability or Software Engineering role, working with large-scale distributed systems
Bachelor's degree in Computer Engineering, Computer Science, or equivalent
Experience with Terraform, including module development, state management, workspace design, policy enforcement, and enterprise-scale Infrastructure as Code implementations
Experience managing Infrastructure as Code solutions using tools such as Terraform, SaltStack, Ansible, Chef, or Puppet
Experience designing, developing, and deploying software and infrastructure at scale in a Linux environment
Strong communication and interpersonal skills, with the ability to collaborate effectively with cross-functional teams
Experience with configuration management, Infrastructure as Code (IaC), and CI/CD
Strong understanding of security and regulatory requirements, with experience adhering to industry standards and best practices
Experience with monitoring, logging, and alerting tools, such as Prometheus, Grafana, or ELK
Experience with cloud-based infrastructure, such as AWS or Azure

Nice to Have

Experience with containerization and orchestration using Docker or Kubernetes
Experience with serverless computing using AWS Lambda or Azure Functions
Experience with machine learning or artificial intelligence
Experience with agile development methodologies, such as Scrum or Kanban
Experience with IT service management frameworks, such as ITIL

Benefits and Perks

Competitive salary and benefits package
Opportunity to work with a highly skilled and collaborative team
Flexible work arrangements, including remote work options
Professional development opportunities, including training and education programs
Access to the latest technologies and tools
Comprehensive health and wellness programs
Generous paid time off and holiday schedule
Retirement savings plan with company match
Employee stock purchase plan
Flexible spending accounts for health and child care expenses
Employee assistance programs, including mental health support and financial counseling

How to Stand Out

Tip: Highlight your experience with automation tools, such as Terraform or Ansible, and provide specific examples of how you have used these tools to drive reliability improvements.
Tip: Be prepared to discuss your understanding of security and regulatory requirements, and provide examples of how you have ensured compliance in previous roles.
Tip: Emphasize your ability to collaborate effectively with cross-functional teams, and provide examples of how you have worked with software engineering, infrastructure, and platform teams to resolve complex issues.
Tip: Showcase your technical expertise in IaC systems and your experience with monitoring, logging, and alerting tools.
Tip: Research Akamai's company culture and values, and be prepared to discuss how your own values and work style align with the company's mission and vision.
Tip: Consider creating a portfolio of your work, including examples of your coding projects or infrastructure deployments, to demonstrate your technical skills and experience.
Tip: Be prepared to negotiate your salary and benefits package, and do not be afraid to ask about opportunities for professional development and growth within the company.

This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.