Operations Engineer Kuala Lumpur
WFA Digital Insight
Demand for skilled operations engineers continues to rise, with a 25% increase in job postings over the past year, and professionals with expertise in troubleshooting and incident management are especially sought after. Xsolla, a leading global commerce company, is seeking an Operations Engineer to join its team in Kuala Lumpur. With the company's commitment to innovation and customer satisfaction, this role offers an opportunity to work with a dynamic team and contribute to the success of game developers worldwide. Before applying, candidates should note that the role calls for strong communication skills, attention to detail, and the ability to work in a fast-paced environment.
Job Description
About the Role
The Operations Engineer will play a critical role in ensuring the smooth operation of Xsolla's global platform. As a key member of the Global Technical Operations (GTO) team, you will be responsible for monitoring and investigating production issues, improving incident response, and contributing to the development of better communication with partners and stakeholders. Your day-to-day tasks will involve collaborating with cross-functional teams, analyzing production data, and identifying trends and patterns to inform process improvements.
As an Operations Engineer at Xsolla, you will have the opportunity to work with a talented team of professionals who are passionate about delivering exceptional results. You will be part of a dynamic and fast-paced environment that requires strong technical skills, attention to detail, and excellent communication skills. Your contributions will have a direct impact on the success of game developers and players worldwide, making this role an exciting and rewarding opportunity for the right candidate.
The GTO team is responsible for ensuring the reliability and performance of Xsolla's platform, and as an Operations Engineer, you will be at the forefront of this effort. You will work closely with the Technical Support Operations (TSO) Lead to resolve major incidents, provide technical expertise, and contribute to the development of incident management processes.
What You Will Do
- Serve as the primary dashboard monitor during your shift, detecting anomalies and determining whether alerts warrant an incident ticket or can be resolved through immediate investigation
- Triage and investigate production incidents, creating incident tickets and performing initial technical investigations using Datadog
- Own lower-severity incidents end-to-end from detection through resolution, diagnosing and executing runbook procedures to resolve incidents without escalation
- Escalate promptly when an incident is unresolved within defined thresholds or requires a code-level fix
- Support the TSO Lead during major incidents, surfacing real-time data, maintaining the incident ticket, and executing mitigation actions as directed
- Draft incident communications under TSO Lead direction, including internal Slack updates, stakeholder notifications, and customer-facing status page updates
- Analyze incident trends, recurring issues, and production bugs, compiling data from Datadog, JIRA, and Slack to identify patterns and contribute findings to regular reports
- Publish health reports of critical apps periodically, providing insights into platform performance and reliability
- Collaborate with cross-functional teams to develop and implement process improvements, contributing to the development of better communication with partners and stakeholders
What We Are Looking For
- Strong technical skills, including experience with observability platforms, scripting, and troubleshooting
- Excellent communication skills, with the ability to communicate clearly and effectively in English, both written and verbal
- Experience working in a fast-paced, dynamic environment, with the ability to prioritize tasks and manage multiple projects simultaneously
- Strong attention to detail, with the ability to analyze complex data sets and identify trends and patterns
- Experience working in SRE, DevOps, production operations, or NOC environments, supporting high-availability platforms
- Strong problem-solving skills, with the ability to diagnose and resolve complex technical issues
- Experience with incident management processes and tools, including JIRA Service Management and Datadog
- Ability to work collaboratively as part of a global team, with a strong emphasis on teamwork and open communication
Nice to Have
- Experience working in the gaming industry, with a strong understanding of the unique challenges and opportunities facing game developers
- Familiarity with cloud-based platforms, including AWS or Azure
- Experience with automation tools, including Ansible or Puppet
- Strong understanding of ITIL processes and principles
- Certification in ITIL, DevOps, or a related field
Benefits and Perks
- Competitive salary and benefits package
- Opportunity to work with a talented team of professionals who are passionate about delivering exceptional results
- Collaborative and dynamic work environment, with a strong emphasis on teamwork and open communication
- Professional development opportunities, including training and certification programs
- Flexible working hours and remote work options, with the ability to work from anywhere in the world
- Access to the latest technologies and tools, including Datadog, JIRA, and Slack
- Opportunity to contribute to the development of innovative products and services, with a focus on customer satisfaction and success
- Comprehensive health and wellness program, including mental health support and employee assistance programs
- Generous paid time off policy, including vacation days, sick leave, and holidays
- Employee recognition and rewards program, including bonuses and incentives for outstanding performance
How to Stand Out
- Be prepared to provide examples of your experience with observability platforms, scripting, and troubleshooting, and highlight your ability to explain complex technical issues clearly and effectively.
- When applying, make sure to tailor your resume and cover letter to the specific requirements of the role, and be prepared to provide examples of your experience working in a fast-paced, dynamic environment.
- During the interview process, be prepared to answer behavioral questions that assess your problem-solving skills, attention to detail, and ability to work collaboratively as part of a global team.
- If you're new to the gaming industry, be prepared to learn about the unique challenges and opportunities facing game developers, and be open to feedback and guidance from more experienced colleagues.
- When negotiating salary, be sure to research the market rate for Operations Engineers in Kuala Lumpur, and be prepared to make a strong case for your worth based on your skills, experience, and qualifications.
- Be aware of the importance of strong communication skills, and be prepared to provide examples of your ability to communicate clearly and effectively in English, both written and verbal.
- Be prepared to ask insightful questions during the interview process, such as what the biggest challenges facing the team are, and how the company approaches professional development and growth opportunities for employees.
This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.