Senior Research Scientist, Reward Models

Anthropic · Remote (Remote-Friendly (Travel Required) | San Francisco, CA)

WFA Digital Insight

As demand for AI specialists grows, Anthropic stands out for its focus on reliable and interpretable AI systems. AI-related job postings increased 34% in 2025, and professionals with expertise in reward models and large language models are in high demand. This role combines research skills with practical application, making it a strong opportunity for those looking to drive innovation in AI.

Job Description

About the Role

The Senior Research Scientist on our Reward Models team will lead research efforts to improve how we specify and learn human preferences at scale. This role focuses on pushing the frontier of reward modeling for large language models.

Responsibilities

Lead research on novel reward model architectures and training approaches for RLHF
Develop and evaluate LLM-based grading and evaluation methods
Research techniques to detect, characterize, and mitigate reward hacking and specification gaming
Collaborate with cross-functional teams to ensure research translates into concrete improvements

How to Stand Out

Be prepared to discuss your experience with large language models and reward modeling in the context of AI alignment.
Showcase your ability to drive ambitious research agendas while also shipping practical improvements to production systems.
Highlight your understanding of the importance of interpretability and safety in AI systems.
Prepare examples of how you've collaborated with cross-functional teams to advance research goals.
Emphasize your proficiency in Python, as all interviews for this role are conducted in Python.

This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.