Site Reliability Engineer
WFA Digital Insight
As the demand for reliable digital infrastructure grows, so does the need for skilled Site Reliability Engineers. With a 25% increase in cloud computing adoption in 2025, companies like Akamai are at the forefront of this shift. Akamai's commitment to innovation and employee flexibility makes this role stand out in the current remote job market. Candidates should be prepared to showcase their expertise in automation, scalability, and collaboration, as well as their ability to work effectively in a distributed team environment. Before applying, consider the company's emphasis on work-life balance and its FlexBase program, which prioritizes employee autonomy and adaptability.
Job Description
About the Role
The Site Reliability Engineer position at Akamai Technologies is a critical role that focuses on ensuring the reliability, scalability, and performance of the company's digital infrastructure. As a key member of the Site Reliability Engineering team, you will collaborate with cross-functional teams to design, develop, and manage applications and infrastructure that support Akamai Cloud's products and services. Your primary goal will be to drive reliability improvements through automation, reduce operational toil, and increase the resilience of engineering processes.
The team's mission is to make life better for billions of people, billions of times a day, by providing a highly reliable and scalable digital platform. You will partner with software engineering, infrastructure, and platform teams to investigate complex production issues, identify root causes, and implement long-term corrective actions. Your expertise in configuration management, Infrastructure as Code (IaC), and CI/CD will be essential to the team's success.
Akamai's Site Reliability Engineering team is a highly skilled and collaborative group that values innovation, creativity, and technical excellence. As a member of this team, you will have the opportunity to work on complex and challenging projects, share knowledge and best practices with your peers, and contribute to the company's mission to make the internet a better experience for everyone.
What You Will Do
- Design, develop, test, and operate critical services that support the reliability, scalability, and performance of Akamai's infrastructure
- Develop and implement observability solutions, including monitoring, logging, alerting, and telemetry capabilities, to proactively detect and resolve issues
- Drive reliability improvements through automation, reducing operational toil and increasing the resilience of engineering processes
- Collaborate with software engineering, infrastructure, and platform teams to investigate complex production issues, identify root causes, and implement long-term corrective actions
- Participate in an on-call rotation and provide leadership during incident response, driving timely service restoration, effective communication, and post-incident improvement efforts
- Develop technical expertise in Infrastructure as Code (IaC) systems and serve as a trusted technical resource, mentoring engineers and sharing best practices
- Design, develop, and deploy software and infrastructure at scale in a Linux environment
- Configure and manage Infrastructure as Code solutions using tools such as Terraform, SaltStack, Ansible, Chef, or Puppet
- Ensure compliance with security and regulatory requirements, adhering to industry standards and best practices
- Continuously monitor and evaluate the performance of Akamai's infrastructure, identifying areas for improvement and implementing optimizations
What We Are Looking For
- Relevant experience in a Site Reliability or Software Engineering role, working with large-scale distributed systems
- Bachelor's degree in Computer Engineering, Computer Science, or equivalent
- Experience with Terraform, including module development, state management, workspace design, policy enforcement, and enterprise-scale Infrastructure as Code implementations
- Experience managing Infrastructure as Code solutions using tools such as Terraform, SaltStack, Ansible, Chef, or Puppet
- Experience designing, developing, and deploying software and infrastructure at scale in a Linux environment
- Strong communication and interpersonal skills, with the ability to collaborate effectively with cross-functional teams
- Experience with configuration management, Infrastructure as Code (IaC), and CI/CD
- Strong understanding of security and regulatory requirements, with experience adhering to industry standards and best practices
- Experience with monitoring, logging, and alerting tools, such as Prometheus, Grafana, or ELK
- Experience with cloud-based infrastructure, such as AWS or Azure
Nice to Have
- Experience with containerization using Docker or Kubernetes
- Experience with serverless computing using AWS Lambda or Azure Functions
- Experience with machine learning or artificial intelligence
- Experience with agile development methodologies, such as Scrum or Kanban
- Experience with IT service management frameworks, such as ITIL
Benefits and Perks
- Competitive salary and benefits package
- Opportunity to work with a highly skilled and collaborative team
- Flexible work arrangements, including remote work options
- Professional development opportunities, including training and education programs
- Access to the latest technologies and tools
- Comprehensive health and wellness programs
- Generous paid time off and holiday schedule
- Retirement savings plan with company match
- Employee stock purchase plan
- Flexible spending accounts for health and child care expenses
- Employee assistance programs, including mental health support and financial counseling
How to Stand Out
- Tip: Highlight your experience with automation tools, such as Terraform or Ansible, and provide specific examples of how you have used these tools to drive reliability improvements.
- Tip: Be prepared to discuss your understanding of security and regulatory requirements, and provide examples of how you have ensured compliance in previous roles.
- Tip: Emphasize your ability to collaborate effectively with cross-functional teams, and provide examples of how you have worked with software engineering, infrastructure, and platform teams to resolve complex issues.
- Tip: Showcase your technical expertise in Infrastructure as Code (IaC) systems and your experience with monitoring, logging, and alerting tools.
- Tip: Research Akamai's company culture and values, and be prepared to discuss how your own values and work style align with the company's mission and vision.
- Tip: Consider creating a portfolio of your work, including examples of your coding projects or infrastructure deployments, to demonstrate your technical skills and experience.
- Tip: Be prepared to negotiate your salary and benefits package, and do not be afraid to ask about opportunities for professional development and growth within the company.
This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.