Senior Research Engineer, Model Evaluation

Cohere · Remote (Toronto)
Software Development

WFA Digital Insight

As demand for AI solutions surges, companies like Cohere are driving innovation in model evaluation. With a 25% growth in AI-related job postings in the last year, skilled professionals in AI research are in high demand. This role stands out for its focus on scalable evaluation methods, offering a chance to work with top researchers and engineers. Candidates should be prepared to showcase their expertise in developing evaluation benchmarks and datasets, as well as their experience with large language models. With the AI market projected to reach $90 billion by 2025, this is an exciting time to join a company at the forefront of AI research.

Job Description

About the Role

The Senior Research Engineer, Model Evaluation role at Cohere is a unique opportunity to contribute to the development of cutting-edge AI models. As a member of the research team, you will be responsible for creating next-generation evaluation methods and scalable infrastructure to measure the performance of large language models. This role matters because reliable evaluation underpins the widespread adoption of AI solutions, driving innovation and progress across industries.

Day to day, you will collaborate with cross-functional teams of researchers, engineers, and designers, working closely with some of the best researchers and engineers in the field to push the state of the art in LLM evaluation methods. The team is fast-paced and dynamic, with a focus on delivering high-quality results and driving progress in AI research.

What You Will Do

  • Develop evaluation benchmarks, datasets, and environments for measuring the bleeding edge of model capabilities
  • Conduct research to push the state-of-the-art in LLM evaluation methods, including training LLM judges and improving evaluation efficiency
  • Build scalable tools for investigating and understanding evaluation results that are used by all members of technical staff at Cohere, as well as leadership and the CEO
  • Learn from and work with the best researchers and engineers in the field to develop new methods and techniques for evaluating AI models
  • Collaborate with cross-functional teams to integrate evaluation methods into the development pipeline
  • Develop and maintain high-quality datasets and evaluation environments for AI models
  • Investigate and analyze the performance of AI models using various evaluation methods
  • Identify and mitigate biases in AI models using robust evaluation techniques
  • Develop and implement automated testing and evaluation frameworks for AI models
  • Stay up-to-date with the latest developments in AI research and evaluation methods

What We Are Looking For

  • A strong background in computer science, AI, or a related field, with a focus on AI research and evaluation methods
  • Experience developing evaluation benchmarks, datasets, and environments for AI models
  • A track record of developing new methods and/or data to evaluate AI models, including publications at top-tier conferences
  • Deep experience building with and around large language models, including developing tools for analyzing and understanding their performance
  • Strong software engineering skills, with proficiency in languages such as Python and Java
  • Experience working with cloud-based infrastructure and scalable computing environments
  • Strong collaboration and communication skills, with the ability to work with cross-functional teams
  • A strong understanding of AI ethics and the ability to identify and mitigate biases in AI models

Nice to Have

  • Experience working with deep learning frameworks such as TensorFlow or PyTorch
  • Knowledge of natural language processing and computer vision techniques
  • Experience working with large datasets and developing data pipelines
  • Familiarity with agile development methodologies and version control systems
  • Experience working in a fast-paced and dynamic environment, with a focus on delivering high-quality results

Benefits and Perks

  • Competitive compensation package, including salary and equity
  • Comprehensive health and dental benefits, including a separate budget for mental health
  • 100% parental leave top-up for up to 6 months
  • Personal enrichment benefits, including arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible work arrangement, with offices in Toronto, New York, San Francisco, London, and Paris
  • Co-working stipend and allowance for remote work expenses
  • 6 weeks of vacation (30 working days) and flexible PTO policy
  • Access to cutting-edge technology and tools, including cloud-based infrastructure and scalable computing environments
  • Opportunities for professional development and growth, including training and education programs

How to Stand Out

  • Showcase your expertise in developing evaluation benchmarks and datasets for AI models.
  • Highlight your experience working with large language models and building tools for analyzing and understanding their performance.
  • Stay current on the latest developments in AI research and evaluation methods, and be ready to share your thoughts on the future of AI evaluation.
  • Prepare specific examples of complex AI projects you have worked on and your approach to evaluating AI models.
  • Consider assembling a portfolio of publications, presentations, and code samples to demonstrate your skills and experience.
  • Be ready to discuss how you identify and mitigate biases in AI models, and your understanding of AI ethics.
  • Research Cohere's culture and values, and be prepared to explain how your skills and experience align with its mission and goals.

This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere. Browse more remote jobs across all categories.