Manager II, Engineering - Site Reliability Engineering

Datadog · Remote (France, Remote; Ireland, Remote)

Software Development


WFA Digital Insight

The demand for skilled site reliability engineers has grown 25% in the past year, driven by the need for robust and secure systems in the AI era. With the global shift to remote work, companies like Datadog are looking for leaders who can drive reliability and security initiatives across distributed teams. As a Manager II, Engineering at Datadog, you will have the opportunity to shape the strategic vision for reliability and security, working with a talented team of engineers and contributing to the company's mission to democratize access to observability and security. With a strong focus on employee growth and development, Datadog offers a unique opportunity for career advancement and skills growth in a rapidly evolving field.

Job Description

About the Role

The Manager II, Engineering - Site Reliability Engineering role at Datadog is a unique opportunity to lead a team of talented engineers and drive the strategic vision for reliability and security. As a leader in the site reliability engineering organization, you will be responsible for shaping the direction of reliability and security initiatives, working closely with cross-functional teams to drive organization-wide outcomes. With a focus on innovation and continuous improvement, you will have the opportunity to contribute to the development of new technologies and approaches that will help Datadog stay ahead of the curve in the rapidly evolving field of observability and security.

The site reliability engineering team at Datadog is responsible for ensuring the reliability and security of the company's systems and infrastructure. This includes developing and implementing strategies for resilience validation, chaos engineering, and penetration testing, as well as working closely with other teams to drive a culture of reliability and security across the organization. As a Manager II, Engineering, you will be responsible for leading a team of engineers and contributing to the development of the company's overall strategy for reliability and security.

What You Will Do

Lead and mentor engineering managers, fostering their growth and development and building strong, high-performing teams across the organization
Contribute to and advance the vision for reliability at Datadog, including adapting practices in response to the evolving impact of AI on software development
Guide teams in defining and executing roadmaps that balance immediate impact with sustainable, long-term improvements
Build strong cross-functional partnerships across engineering, security, and product teams to drive organization-wide reliability outcomes
Champion a solution- and outcome-oriented approach by driving risk mitigation efforts and leading initiatives that result in measurable reliability improvements
Collaborate with other leaders and teams to drive a culture of reliability and security across the organization
Develop and implement strategies for resilience validation, chaos engineering, and penetration testing
Stay up to date with industry trends and developments in site reliability engineering and contribute to the company's thought leadership in this area

What We Are Looking For

Experience managing managers in dynamic, fast-evolving environments and a passion for developing leadership at multiple levels
Deep expertise in site reliability engineering and experience working with large-scale distributed systems
Ability to define and communicate a clear vision, aligning diverse stakeholders across a complex organization
Exposure to security practices such as penetration testing or adversarial validation
Ability to drive meaningful change and adjust approach based on context and organizational needs
Strong leadership and management skills, with experience building and leading high-performing teams
Excellent communication and collaboration skills, with the ability to work effectively with cross-functional teams
Strong technical skills, with experience working with a range of technologies and systems

Nice to Have

Experience working in a cloud-based environment and familiarity with cloud-based technologies
Knowledge of machine learning and artificial intelligence and their applications in site reliability engineering
Experience working in a rapid-growth company and familiarity with the challenges and opportunities of scaling quickly
Familiarity with DevOps practices and tools, such as continuous integration and continuous deployment
Experience working in a remote or distributed team environment and familiarity with the challenges and opportunities of remote work

Benefits and Perks

Generous and competitive benefits package, including health insurance, retirement savings, and paid time off
New hire stock equity (RSUs) and employee stock purchase plan, providing opportunities for long-term financial growth and success
Continuous career development and pathing opportunities, including training and education programs, mentorship, and coaching
Employee-focused best-in-class onboarding, providing a smooth transition into the company and role
Internal mentor and cross-departmental buddy program, providing opportunities for networking and career growth
Friendly and inclusive workplace culture, with a focus on diversity, equity, and inclusion
Flexible and remote work arrangements, including the ability to work from home and to set flexible hours and schedules
Access to cutting-edge technologies and tools, including the latest software and hardware
Opportunities for professional growth and development, including conferences, training, and education programs
Collaborative and dynamic work environment, with a focus on teamwork, communication, and innovation

How to Stand Out

Develop a strong understanding of site reliability engineering principles and practices, including resilience validation, chaos engineering, and penetration testing.
Build a portfolio of your work, including examples of your experience with large-scale distributed systems and your approach to driving reliability and security initiatives.
Prepare to talk about your experience working with cross-functional teams and driving organization-wide outcomes, including examples of your ability to collaborate and communicate effectively.
Research the company and role, including the company's mission, values, and culture, and be prepared to talk about why you are a good fit for the role and company.
Develop a strong understanding of the company's products and services, including the Datadog platform and its applications in observability and security.
Be prepared to talk about your approach to leadership and management, including your experience building and leading high-performing teams and your approach to developing leadership at multiple levels.
Highlight your ability to drive meaningful change and adjust your approach based on context and organizational needs, including examples of your experience working in dynamic, fast-evolving environments.

This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.