Researcher, Artifacts - Agent Post-Training

OpenAI · Remote (San Francisco)
Other

WFA Digital Insight

The demand for skilled AI researchers has surged, with the market growing 25% in the past year. OpenAI is at the forefront, and this role offers a chance to work on cutting-edge models. A strong background in machine learning and software engineering will serve you well here. Before applying, consider the importance of product impact and model behavior in the AI landscape, and be prepared to demonstrate your skills in areas like RL and data pipelines. The ability to work remotely and collaborate with cross-functional teams is also crucial.

Job Description

About the Role

The Agent Post-Training team at OpenAI is responsible for creating the next generation of agents that can operate computers, collaborate with people, and expand what is possible. As a Researcher, Artifacts, you will play a key role in training frontier models to create polished, useful work products. Your day-to-day work will involve designing and running experiments, owning improvements to the post-training stack, and partnering with product teams to understand user needs.

The team's work spans coding, tool use, computer use, multi-agent coordination, and long-horizon execution. You will work closely with researchers, engineers, and product teams to decide what should go into major model runs and to measure the success of those runs. Your work will have a direct impact on the development of OpenAI's next agents.

OpenAI is committed to pushing the boundaries of what is possible with AI, and this role offers the opportunity to be at the forefront of that effort. You will work on complex problems with a talented team of researchers and engineers who are passionate about creating the next generation of AI models.

What You Will Do

  • Design and run experiments that improve agentic model behavior for complex software and plugins
  • Own end-to-end improvements to the post-training stack, including RL, data pipelines, graders, reward signals, evals, diagnostics, and model-behavior analysis
  • Build evals and environments that expose the next set of model failures, then turn those failures into training data, product fixes, or new research directions
  • Partner with Codex and ChatGPT product teams to understand what users need and translate product signal into model improvements
  • Work on early-training and alignment interventions, including data mixtures, objectives, synthetic data, and eval loops that shape downstream agent behavior
  • Help decide which integrations, capabilities, and fixes are ready for inclusion in major model runs
  • Improve the machinery for large-scale training and launch: experiment velocity, reliability, observability, reproducibility, cost, latency, and production readiness
  • Take on cross-functional projects that touch model training, product infrastructure, and the production agent harness
  • Debug hard failures in shipped or near-shipped models and turn messy qualitative behavior into concrete hypotheses, experiments, and fixes

What We Are Looking For

  • Strong technical fundamentals in machine learning, software engineering, systems, statistics, or a related field
  • Hands-on experience with LLMs, RL, RLHF/RLAIF, post-training, evals, graders, synthetic data, model training, coding agents, or production ML systems
  • Ability to learn quickly across parts of the field you have not worked in before
  • Experience working with complex software systems and plugins
  • Strong understanding of model behavior, product impact, and user needs
  • Ability to work in a remote setting and collaborate with cross-functional teams
  • Strong communication and problem-solving skills

Nice to Have

  • Experience with multi-agent systems or training directly against production-like environments
  • Knowledge of data mixtures, objectives, synthetic data, and eval loops
  • Familiarity with the production agent harness and model training infrastructure
  • Experience with debugging hard failures in shipped or near-shipped models

Benefits and Perks

  • Opportunity to work on cutting-edge AI models and contribute to the development of the next generation of agents
  • Collaborative and dynamic work environment with a talented team of researchers and engineers
  • Flexible remote work arrangements and stipend for remote work setup
  • Comprehensive health insurance and wellness programs
  • Generous PTO and paid holidays
  • Access to the latest tools, technologies, and training opportunities
  • Opportunity to work on cross-functional projects and collaborate with product teams

How to Stand Out

  • Be prepared to demonstrate your skills in areas like RL, data pipelines, and graders, and show how you can apply them to real-world problems.
  • Highlight your experience working with complex software systems and plugins, and your ability to learn quickly across different parts of the field.
  • Show a strong understanding of model behavior, product impact, and user needs, and be able to articulate how you can contribute to the development of the next generation of agents.
  • Emphasize your ability to work in a remote setting and collaborate with cross-functional teams, and highlight your strong communication and problem-solving skills.
  • Be prepared to discuss your experience with debugging hard failures in shipped or near-shipped models, and how you can turn messy qualitative behavior into concrete hypotheses, experiments, and fixes.
  • Consider creating a portfolio that showcases your work in machine learning and software engineering, and be prepared to discuss your projects and experience in detail.

This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.