Software Engineer, RL Data

Anthropic · Remote (London, UK; Remote-Friendly (Travel-Required) | San Francisco, CA | Seattle, WA | New York City, NY)

Software Development

WFA Digital Insight

As demand for AI and machine learning specialists continues to grow, companies like Anthropic are leading the charge in developing reliable and interpretable AI systems. With the job market for software engineers projected to increase by 21% in the next 5 years, professionals with expertise in reinforcement learning and data engineering are in high demand. Anthropic stands out for its commitment to AI safety research, making this role an exciting opportunity for those passionate about the societal impact of their work. Before applying, candidates should be prepared to demonstrate their technical skills, adaptability, and ability to iterate quickly in a fast-changing environment.

Job Description

About the Role

The Software Engineer, RL Data role at Anthropic is a unique opportunity to work on the development of reliable, interpretable, and steerable AI systems. As a member of the RL Data team, you will be responsible for building the systems that produce high-quality reinforcement learning data for Claude, Anthropic's AI model. This includes designing and implementing data collection pipelines, human feedback tooling, and quality assurance processes to ensure the data is trustworthy and effective.

The RL Data team is a fast-growing group of committed researchers, engineers, and experts working together to build beneficial AI systems. As a software engineer on this team, you will play a critical role in shaping the technical direction and development of the company's AI capabilities. Your work will have a direct impact on the success of the company and the advancement of AI safety research.

In this role, you will work closely with research, product, and engineering teams to design and develop systems that support the collection and analysis of reinforcement learning data. You will also collaborate with external partners and vendors to integrate their technologies and expertise into Anthropic's AI systems.

What You Will Do

Design, build, and maintain the pipelines and infrastructure that support reinforcement learning data collection
Develop and implement human feedback tooling and quality assurance processes to ensure data quality and accuracy
Collaborate with research teams to design and develop evals, graders, and other tools to support reinforcement learning data collection
Build and maintain interfaces for data collection, including user-facing tools and APIs
Work with operations, security, and compliance partners to roll out systems to new users and manage technical relationships with external data vendors
Embed with domain experts and teams who use Anthropic's systems day-to-day to design pipelines, support users, and ship improvements
Own significant parts of the stack end-to-end, from technical architecture through operational work
Build data collection pipelines, read transcripts, and iterate on prompts, evals, and graders until the output is good
Develop and improve QA frameworks to catch reward hacking and ensure environment quality
Harden execution environments, including sandboxing, snapshotting, and tool coverage, to support task execution at training scale

What We Are Looking For

Strong software engineering skills and proficiency in at least one modern programming language (Python and TypeScript experience a plus)
Experience designing, building, and running backend systems or infrastructure
Effective use of AI tools in your own day-to-day work
Willingness to own problems end-to-end, including non-engineering aspects
Proactive, open communication style, with the ability to run a workstream and escalate issues early
Comfort iterating quickly in ambiguous, fast-changing situations
Concern for the societal impacts of your work and a passion for AI safety research
Experience working with reinforcement learning, machine learning, or natural language processing
Strong understanding of software development principles, including testing, validation, and deployment

Nice to Have

Experience building LLM-powered systems, including prompt pipelines, evals, or products with models in the loop
Experience with reinforcement learning on LLMs, including creating environments, rewards, graders, or training data
Time as a forward-deployed engineer, founder, or early startup engineer, with experience owning and driving technical projects

Benefits and Perks

Competitive salary and equity package
Opportunity to work on cutting-edge AI research and development projects
Collaborative and dynamic work environment with a team of experienced researchers and engineers
Flexible working hours and remote work options
Professional development opportunities, including training and conference sponsorship
Access to state-of-the-art tools and technologies, including cloud infrastructure and machine learning frameworks
Comprehensive health and wellness benefits, including medical, dental, and vision insurance
Generous paid time off and holiday policy
Annual stipend for professional development and education

How to Stand Out

Tips for applying: Be prepared to demonstrate your technical skills in software engineering, AI, and machine learning, as well as your ability to work in a fast-paced, dynamic environment.
To stand out, highlight your experience working with reinforcement learning, LLMs, or other AI technologies, and be prepared to discuss your passion for AI safety research.
When preparing for the interview, review the company's mission and values, and be ready to discuss how your skills and experience align with Anthropic's goals.
In your portfolio, include examples of your work in software engineering, AI, or machine learning, and be prepared to walk the interviewer through your design and development process.
When negotiating salary, be prepared to discuss your expectations and requirements, and stay open to creative solutions.
Red flags to watch for: Be cautious of companies that prioritize profit over people or the environment, and be sure to research the company's values and mission before applying.

This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.