Software Engineer, RL Data

Anthropic · Remote (London, UK; Remote-Friendly (Travel-Required) | San Francisco, CA | Seattle, WA | New York City, NY)
Software Development

WFA Digital Insight

As demand for AI and machine learning specialists continues to grow, companies like Anthropic are leading the charge in developing reliable and interpretable AI systems. With the job market for software engineers projected to increase by 21% in the next 5 years, professionals with expertise in reinforcement learning and data engineering are in high demand. Anthropic stands out for its commitment to AI safety research, making this role an exciting opportunity for those passionate about the societal impact of their work. Before applying, candidates should be prepared to demonstrate their technical skills, adaptability, and ability to iterate quickly in a fast-changing environment.

Job Description

About the Role

The Software Engineer, RL Data role at Anthropic is a unique opportunity to work on the development of reliable, interpretable, and steerable AI systems. As a member of the RL Data team, you will be responsible for building the systems that produce high-quality reinforcement learning data for Claude, Anthropic's AI model. This includes designing and implementing data collection pipelines, human feedback tooling, and quality assurance processes to ensure the data is trustworthy and effective.

The RL Data team is a quickly growing group of committed researchers, engineers, and experts working together to build beneficial AI systems. As a software engineer on this team, you will play a critical role in shaping the technical direction and development of the company's AI capabilities. Your work will have a direct impact on the success of the company and the advancement of AI safety research.

In this role, you will work closely with research, product, and engineering teams to design and develop systems that support the collection and analysis of reinforcement learning data. You will also collaborate with external partners and vendors to integrate their technologies and expertise into Anthropic's AI systems.

What You Will Do

  • Design, build, and maintain pipelines and infrastructure for reinforcement learning data collection
  • Develop and implement human feedback tooling and quality assurance processes to ensure data quality and accuracy
  • Collaborate with research teams to design and develop evals, graders, and other tools to support reinforcement learning data collection
  • Build and maintain interfaces for data collection, including user-facing tools and APIs
  • Work with operations, security, and compliance partners to roll out systems to new users and manage technical relationships with external data vendors
  • Embed with domain experts and teams who use Anthropic's systems day-to-day to design pipelines, support users, and ship improvements
  • Own significant parts of the stack end-to-end, from technical architecture through operational work
  • Build data collection pipelines, read transcripts, and iterate on prompts, evals, and graders until the output is good
  • Develop and improve QA frameworks to catch reward hacking and ensure environment quality
  • Harden execution environments, including sandboxing, snapshotting, and tool coverage, to support task execution at training scale

What We Are Looking For

  • Strong software engineering skills and proficiency in at least one modern programming language (Python and TypeScript experience a plus)
  • Experience designing, building, and running backend systems or infrastructure
  • Effective use of AI tools in your own day-to-day work
  • Willingness to own problems end-to-end, including non-engineering aspects
  • Proactive, open communication style, with the ability to run a workstream and escalate issues early
  • Comfort iterating quickly in ambiguous, fast-changing situations
  • Concern for the societal impacts of your work and a passion for AI safety research
  • Experience working with reinforcement learning, machine learning, or natural language processing
  • Strong understanding of software development principles, including testing, validation, and deployment

Nice to Have

  • Experience building LLM-powered systems, including prompt pipelines, evals, or products with models in the loop
  • Experience with reinforcement learning on LLMs, including creating environments, rewards, graders, or training data
  • Time as a forward-deployed engineer, founder, or early startup engineer, with experience owning and driving technical projects

Benefits and Perks

  • Competitive salary and equity package
  • Opportunity to work on cutting-edge AI research and development projects
  • Collaborative and dynamic work environment with a team of experienced researchers and engineers
  • Flexible working hours and remote work options
  • Professional development opportunities, including training and conference sponsorship
  • Access to state-of-the-art tools and technologies, including cloud infrastructure and machine learning frameworks
  • Comprehensive health and wellness benefits, including medical, dental, and vision insurance
  • Generous paid time off and holiday policy
  • Annual stipend for professional development and education

How to Stand Out

  • Tips for applying: Be prepared to demonstrate your technical skills in software engineering, AI, and machine learning, as well as your ability to work in a fast-paced, dynamic environment.
  • To stand out, highlight your experience working with reinforcement learning, LLMs, or other AI technologies, and be prepared to discuss your passion for AI safety research.
  • When preparing for the interview, review the company's mission and values, and be ready to discuss how your skills and experience align with Anthropic's goals.
  • In your portfolio, include examples of your work in software engineering, AI, or machine learning, and be prepared to walk the interviewer through your design and development process.
  • When negotiating salary, be prepared to discuss your expectations and requirements, and be open to negotiation and creative solutions.
  • Red flags to watch for: Be cautious of companies that prioritize profit over people or the environment, and be sure to research the company's values and mission before applying.

This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.