Senior AI Engineer - APM Integrations

Datadog · Remote (Portugal, Remote)
Software Development
Excel

WFA Digital Insight

As demand for AI and ML specialists continues to grow, with a 28% increase in job postings in 2025, companies like Datadog are looking for experts to drive innovation in their APM integrations. With its emphasis on simplicity, correctness, and performance, this Senior AI Engineer role stands out in the current remote job market. Candidates should note that the role requires not only technical expertise but also the ability to navigate ambiguity and iterate from prototype to production, making it a challenging yet rewarding opportunity for the right candidate.

Job Description

About the Role

The Senior AI Engineer - APM Integrations role at Datadog is a unique opportunity for a product-minded engineer to ship AI to production by building tools that make it easier for engineers to build and maintain integrations over time. As part of the IDM team, you will work closely with other teams to understand their workflows, build solutions that fit their needs, define what 'good' means for these tools, and set up evaluation and testing processes. This is a senior role with high ownership, from early prototypes through production and ongoing support, and it requires strong ML fundamentals, experience with distributed systems, and a production operations mindset.

About the Team

The IDM team is responsible for connecting Datadog to the tools and services customers use and for keeping the quality and reliability of those connections high. You will be part of a hybrid workplace that values office culture and collaboration while supporting the work-life harmony that best fits each team member.

What You Will Do

  • Build agent workflows that take an integration need from plan to implementation and validation with humans approving at the right checkpoints
  • Create systems that synthesize context from codebases, docs, specs, telemetry, and historical incidents to make changes that match Datadog conventions and customer expectations
  • Generate and evolve integration code and tests, including end-to-end scenarios that reflect real customer workloads and product features
  • Design evaluation harnesses that prevent silent regressions: golden sets, scenario baselines, semantic checks, performance thresholds, and release gating
  • Build portfolio-level automation: proactive updates for upstream breaking changes, tracer feature rollouts across the catalog, migrations to new schemas/semantics, and targeted coverage expansion
  • Partner closely with PM, support engineers, and integration-owning teams to make the system adoptable, trustworthy, and embedded in daily engineering workflows
  • Define and track key metrics to measure the success of the AI-assisted tools and identify areas for improvement
  • Work closely with cross-functional teams to ensure seamless integration of the AI-assisted tools with existing systems and workflows

What We Are Looking For

  • 6+ years building backend systems (Go, Java, or .NET) with a strong focus on simplicity, correctness, and performance
  • Proven experience delivering LLM/agent features to production (prompting, tooling, evals, safety/guardrails)
  • Comfortable navigating ambiguity, iterating from prototype to production, and measuring impact with clear metrics
  • Demonstrated ability to use AI coding tools in day-to-day workflows and validate, critique, and refine AI-generated output
  • Strong ML fundamentals, including a solid grasp of the ML lifecycle and statistics for experiments
  • Experience with microservices performance: tracing, latency breakdowns, concurrency, resiliency patterns
  • Production operations mindset: monitoring, alerting, and participating in on-call rotations where applicable
  • Ability to communicate technical ideas and results to both technical and non-technical audiences

Nice to Have

  • Hands-on experience with distributed tracing stacks (OpenTelemetry/Datadog APM), profilers, and logs/metrics pipelines
  • Experience with planning/agent frameworks, tool-use orchestration, RAG, and retrieval/indexing over large context
  • Experience building developer tools (IDEs, static analysis, compilers, code transformation)
  • Familiarity with Excel for data analysis and reporting

Benefits and Perks

  • Competitive salary and equity package
  • Comprehensive health insurance and wellness programs
  • Generous PTO policy and flexible working hours
  • Remote work stipend and home office setup support
  • Professional development opportunities and conference sponsorship
  • Access to cutting-edge technologies and tools
  • Collaborative and dynamic work environment with a team of experienced professionals

How to Stand Out

  • Ensure your portfolio showcases your experience with AI-assisted tool development and ML lifecycle management.
  • Practice explaining complex technical concepts to non-technical audiences, as this will be a key part of your role.
  • Familiarize yourself with Datadog's APM integrations and be prepared to discuss how you can contribute to their development.
  • Highlight your experience with distributed systems and microservices performance in your resume and during interviews.
  • Be prepared to discuss your approach to measuring the success of AI-assisted tools and how you stay up to date with the latest developments in ML and AI.

This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.