Researcher, Artifacts - Agent Post-Training

OpenAI · Remote (San Francisco)

Other

WFA Digital Insight

The demand for skilled AI researchers has surged, with the market growing 25% in the past year. OpenAI is at the forefront, and this role offers a chance to work on cutting-edge models. Candidates with a strong background in machine learning and software engineering are well suited to it. Before applying, consider how much weight the role places on product impact and model behavior, and be prepared to demonstrate your skills in areas like RL and data pipelines. The ability to work remotely and collaborate with cross-functional teams is also crucial.

Job Description

About the Role

The Agent Post-Training team at OpenAI is responsible for creating the next generation of agents that can operate computers, collaborate with people, and expand what is possible. As a Researcher, Artifacts, you will play a key role in training frontier models to create polished, useful work products. Your day-to-day work will involve designing and running experiments, owning improvements to the post-training stack, and partnering with product teams to understand user needs.

The team's work spans coding, tool use, computer use, multi-agent coordination, and long-horizon execution. You will work closely with researchers, engineers, and product teams to decide what should go into major model runs and to measure the success of those runs. Your work will have a direct impact on the development of OpenAI's next agents.

OpenAI is committed to pushing the boundaries of what is possible with AI, and this role offers the opportunity to be at the forefront of that effort. You will work on complex problems with a talented team of researchers and engineers who are passionate about creating the next generation of AI models.

What You Will Do

Design and run experiments that improve agentic model behavior for complex software and plugins
Own end-to-end improvements to the post-training stack, including RL, data pipelines, graders, reward signals, evals, diagnostics, and model-behavior analysis
Build evals and environments that expose the next set of model failures, then turn those failures into training data, product fixes, or new research directions
Partner with Codex and ChatGPT product teams to understand what users need and translate product signal into model improvements
Work on early-training and alignment interventions, including data mixtures, objectives, synthetic data, and eval loops that shape downstream agent behavior
Help decide which integrations, capabilities, and fixes are ready for inclusion in major model runs
Improve the machinery for large-scale training and launch: experiment velocity, reliability, observability, reproducibility, cost, latency, and production readiness
Take on cross-functional projects that touch model training, product infrastructure, and the production agent harness
Debug hard failures in shipped or near-shipped models and turn messy qualitative behavior into concrete hypotheses, experiments, and fixes

What We Are Looking For

Strong technical fundamentals in machine learning, software engineering, systems, statistics, or a related field
Hands-on experience with LLMs, RL, RLHF/RLAIF, post-training, evals, graders, synthetic data, model training, coding agents, or production ML systems
Ability to learn quickly in parts of the field you have not worked in before
Experience working with complex software systems and plugins
Strong understanding of model behavior, product impact, and user needs
Ability to work in a remote setting and collaborate with cross-functional teams
Strong communication and problem-solving skills

Nice to Have

Experience with multi-agent systems or training directly against production-like environments
Knowledge of data mixtures, objectives, synthetic data, and eval loops
Familiarity with the production agent harness and model training infrastructure
Experience with debugging hard failures in shipped or near-shipped models

Benefits and Perks

Opportunity to work on cutting-edge AI models and contribute to the development of the next generation of agents
Collaborative and dynamic work environment with a talented team of researchers and engineers
Flexible remote work arrangements and stipend for remote work setup
Comprehensive health insurance and wellness programs
Generous PTO and paid holidays
Access to the latest tools, technologies, and training opportunities
Opportunity to work on cross-functional projects and collaborate with product teams

How to Stand Out

Be prepared to demonstrate your skills in areas like RL, data pipelines, and graders, and show how you can apply them to real-world problems.
Highlight your experience working with complex software systems and plugins, and your ability to learn quickly across different parts of the field.
Show a strong understanding of model behavior, product impact, and user needs, and be able to articulate how you can contribute to the development of the next generation of agents.
Emphasize your ability to work in a remote setting and collaborate with cross-functional teams, and highlight your strong communication and problem-solving skills.
Be prepared to discuss your experience with debugging hard failures in shipped or near-shipped models, and how you can turn messy qualitative behavior into concrete hypotheses, experiments, and fixes.
Consider creating a portfolio that showcases your work in machine learning and software engineering, and be prepared to discuss your projects and experience in detail.

This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.