Member of Technical Staff (Data Scientist, Evals)
WFA Digital Insight
Demand for skilled data scientists in the digital marketing space continues to grow, with a particular emphasis on expertise in programmatic and LLM evaluation. As companies like Perplexity push the boundaries of AI-driven search, the need for professionals who can develop and maintain high-quality answer evaluation systems is pressing. With the market for specialized data scientists expected to grow significantly, candidates with a strong background in data science, machine learning, and Python are in high demand. Perplexity stands out as a leader in this field, and candidates applying for this role should be prepared to show how their expertise can drive improvements in answer quality. Before applying, candidates should understand the latest developments in LLM evaluation and how they could contribute to Perplexity's mission.
Job Description
About the Role
The Data Scientist, Evals position at Perplexity is a critical role that involves building and maintaining specialized evaluation systems to assess the quality of answers provided by the company's AI-driven search engine. This role is at the heart of Perplexity's efforts to deliver high-quality, reliable answers to its tens of millions of users daily. The successful candidate will work within a small, high-impact team, collaborating closely with technical leadership to measure and improve answer quality.
Day to day, the Data Scientist will be responsible for architecting and maintaining automated evaluation pipelines, designing evaluation sets and methods, and developing VLM-based solutions to assess how final answers render visually across different platforms and devices. The role requires a deep understanding of data science, machine learning, and the latest developments in LLM evaluation.
As part of the Perplexity team, the Data Scientist will have the opportunity to work on challenging problems, apply the latest research methods to real-world problems, and contribute to the development of cutting-edge technologies.
What You Will Do
- Architect and maintain automated evaluation pipelines to assess answer quality across Perplexity's products
- Design evaluation sets and methods specifically to measure the impact of tool calls on the final answer's quality
- Develop VLM-based solutions to programmatically evaluate how final answers render visually across different platforms and devices
- Continuously review public benchmarks and academic evaluations for their applicability to the Perplexity product
- Adapt and incorporate public benchmarks into regular performance measurements
- Collaborate closely with technical leadership to measure and improve answer quality
- Operate within a small, high-impact team where evaluation metrics directly shape product changes
- Stay up to date with the latest developments in LLM evaluation and apply this knowledge to improve Perplexity's answer quality
- Develop and maintain large-scale data pipelines and architectures
- Work with cross-functional teams to identify areas for improvement and implement data-driven solutions
What We Are Looking For
- PhD or MS in a technical field or equivalent experience
- 4+ years of experience in data science or machine learning
- Strong proficiency in Python and SQL
- Experience building within a modern cloud data stack, specifically AWS and Databricks
- Comfortable with agentic coding workflows and using AI-assisted development tools
- Experience with data visualization tools and technologies
- Strong understanding of data structures and algorithms
- Experience with machine learning frameworks and libraries
- Excellent problem-solving skills and attention to detail
Nice to Have
- 1+ years of experience working with LLMs at scale, specifically with LLM-as-a-judge setups
- Prior experience working on customer-facing web products or consumer apps, with real user traffic at scale
- A strong research background, with experience applying research methods to real-world ML problems
- Experience defining evaluation metrics and building ground truth datasets
Benefits and Perks
- Competitive compensation package
- Opportunities for professional growth and development
- Collaborative and dynamic work environment
- Flexible working hours and remote work options
- Access to cutting-edge technologies and tools
- Comprehensive health insurance package
- Generous PTO and vacation days
- Employee recognition and reward programs
- Professional development and training opportunities
- Stock options or equity participation
How to Stand Out
- Develop a strong portfolio showcasing your experience with data science, machine learning, and LLM evaluation, including examples of evaluation pipelines and VLM-based solutions you've developed.
- Familiarize yourself with the latest developments in LLM evaluation and be prepared to discuss how you can apply this knowledge to improve Perplexity's answer quality.
- Highlight your experience with Python, SQL, and cloud data stacks, such as AWS and Databricks, and be prepared to provide examples of how you've used these technologies in previous roles.
- Emphasize your understanding of data structures and algorithms, as well as your experience with machine learning frameworks and libraries.
- Prepare to discuss your approach to collaboration and how you've worked with cross-functional teams in the past to drive data-driven solutions.
- Be ready to provide specific examples of how you've defined evaluation metrics and built ground truth datasets in previous roles.
- Research Perplexity's products and services, and be prepared to discuss how your skills and experience align with the company's mission and goals.
This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.