Member of Technical Staff (Data Scientist, Evals)

Perplexity · Remote (London)

Data & Analytics

Programmatic

WFA Digital Insight

Demand for skilled data scientists continues to grow, especially for those with expertise in programmatic (automated) LLM evaluation. As companies like Perplexity push the boundaries of AI-driven search, they increasingly need professionals who can build and maintain high-quality answer evaluation systems. Candidates with a strong background in data science, machine learning, and Python are in high demand. Perplexity is a leader in this field, and applicants for this role should be ready to show how their expertise can measurably improve answer quality. Before applying, candidates should understand recent developments in LLM evaluation and how they could contribute to Perplexity's mission.

Job Description

About the Role

The Data Scientist, Evals position at Perplexity is a critical role that involves building and maintaining specialized evaluation systems to assess the quality of answers provided by the company's AI-driven search engine. This role is at the heart of Perplexity's efforts to deliver high-quality, reliable answers to its tens of millions of daily users. The successful candidate will work within a small, high-impact team, collaborating closely with technical leadership to measure and improve answer quality.

Day to day, the Data Scientist will be responsible for architecting and maintaining automated evaluation pipelines, designing evaluation sets and methods, and developing solutions based on vision-language models (VLMs) to assess how final answers render visually across different platforms and devices. The role requires a deep understanding of data science, machine learning, and the latest developments in LLM evaluation.

As part of the Perplexity team, the Data Scientist will have the opportunity to tackle challenging problems, apply the latest research methods to real-world product questions, and contribute to the development of cutting-edge technology.

What You Will Do

Architect and maintain automated evaluation pipelines to assess answer quality across Perplexity's products
Design evaluation sets and methods specifically to measure the impact of tool calls on the final answer's quality
Develop VLM-based solutions to programmatically evaluate how final answers render visually across different platforms and devices
Continuously review public benchmarks and academic evaluations for their applicability to the Perplexity product
Adapt and incorporate public benchmarks into regular performance measurements
Collaborate closely with technical leadership to measure and improve answer quality
Operate within a small, high-impact team where evaluation metrics directly shape product changes
Stay up to date with the latest developments in LLM evaluation and apply this knowledge to improve Perplexity's answer quality
Develop and maintain large-scale data pipelines and architectures
Work with cross-functional teams to identify areas for improvement and implement data-driven solutions

What We Are Looking For

PhD or MS in a technical field or equivalent experience
4+ years of experience in data science or machine learning
Strong proficiency in Python and SQL
Experience building within a modern cloud data stack, specifically AWS and Databricks
Comfortable with agentic coding workflows and using AI-assisted development tools
Experience with data visualization tools and technologies
Strong understanding of data structures and algorithms
Experience with machine learning frameworks and libraries
Excellent problem-solving skills and attention to detail

Nice to Have

1+ years of experience working with LLMs at scale, specifically with LLM-as-a-judge setups
Prior experience working on customer-facing web products or consumer apps, with real user traffic at scale
A strong research background, with experience applying research methods to real-world ML problems
Experience defining evaluation metrics and building ground truth datasets

Benefits and Perks

Competitive compensation package
Opportunities for professional growth, development, and training
Collaborative and dynamic work environment
Flexible working hours and remote work options
Access to cutting-edge technologies and tools
Comprehensive health insurance package
Generous paid time off
Employee recognition and reward programs
Stock options or equity participation

How to Stand Out

Develop a strong portfolio showcasing your experience with data science, machine learning, and LLM evaluation, including examples of evaluation pipelines and VLM-based solutions you've developed.
Familiarize yourself with the latest developments in LLM evaluation and be prepared to discuss how you can apply this knowledge to improve Perplexity's answer quality.
Highlight your experience with Python, SQL, and cloud data stacks, such as AWS and Databricks, and be prepared to provide examples of how you've used these technologies in previous roles.
Emphasize your understanding of data structures and algorithms, as well as your experience with machine learning frameworks and libraries.
Prepare to discuss your approach to collaboration and how you have worked with cross-functional teams in the past to deliver data-driven solutions.
Be ready to provide specific examples of how you've defined evaluation metrics and built ground truth datasets in previous roles.
Research Perplexity's products and services, and be prepared to discuss how your skills and experience align with the company's mission and goals.

This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.