Staff Research Engineer, Model Efficiency
WFA Digital Insight
Demand for AI and machine learning experts is skyrocketing, with job postings up 25% over the past year. As a Staff Research Engineer, Model Efficiency at Cohere, you'll be at the forefront of this trend, working on large language models that push the boundaries of what AI systems can do. With a strong foundation in machine learning and a passion for innovation, you'll thrive in this role. Cohere's commitment to diversity and inclusion, along with its remote-friendly environment, makes it an attractive option for top talent. Before applying, consider highlighting your experience with model architecture optimization and decoding algorithm improvements.
Job Description
About the Role
As a Staff Research Engineer, Model Efficiency at Cohere, you will develop, prototype, and deploy techniques that improve the efficiency of large language models. You will join the Model Efficiency team, which is focused on pushing the limits of LLM inference efficiency across Cohere's foundation models. Your day-to-day work will involve exploring and shipping breakthroughs across the model execution stack, including model architecture and MoE routing optimization, decoding and inference-time algorithm improvements, and software/hardware co-design for GPU acceleration. The Model Efficiency team is concentrated in the EST and PST time zones, and you will work closely with researchers, engineers, and designers who are passionate about their craft. You will be responsible for increasing the capabilities of Cohere's models and the value they drive for customers.
Cohere's mission is to scale intelligence to serve humanity, and as a Staff Research Engineer, Model Efficiency, you will be instrumental in achieving this goal. You will be working on cutting-edge models that are being used by developers and enterprises to build AI systems that power magical experiences like content generation, semantic search, and agents.
What You Will Do
- Develop and deploy techniques to improve LLM inference efficiency across Cohere's foundation models
- Explore and ship breakthroughs across the model execution stack, including model architecture and MoE routing optimization
- Improve decoding and inference-time algorithm efficiency
- Work on software/hardware co-design for GPU acceleration
- Collaborate with the Model Efficiency team to identify and prioritize areas for improvement
- Develop and maintain large-scale models and systems
- Work closely with cross-functional teams, including research, engineering, and design
- Stay up-to-date with the latest advancements in machine learning and AI
- Participate in code reviews and ensure high-quality code
- Mentor junior engineers and contribute to the growth of the team
What We Are Looking For
- PhD in Machine Learning or a related field
- Strong understanding of LLM architecture and optimization techniques
- Experience with model efficiency techniques, including model pruning, knowledge distillation, and quantization
- Strong software engineering skills, including proficiency in languages like Python and C++
- Experience with deep learning frameworks, including TensorFlow and PyTorch
- Strong understanding of computer architecture and GPU acceleration
- Experience with agile development methodologies and version control systems like Git
- Strong communication and collaboration skills
- Ability to work in a fast-paced, high-ambiguity environment
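For candidates brushing up on the efficiency techniques named above, here is a minimal Python sketch of symmetric int8 weight quantization. It is a toy illustration of the general idea only, not Cohere's implementation, and all names in it are hypothetical:

```python
# Toy symmetric int8 quantization: map float weights onto the integer
# range [-127, 127] with a single shared scale, then recover approximate
# floats. Illustrative only -- not Cohere's production method.

def quantize_int8(weights):
    """Quantize a list of float weights to int8 codes plus a scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round each weight to the nearest code and clamp to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 codes."""
    return [x * scale for x in q]

weights = [0.42, -1.30, 0.07, 0.95]
codes, scale = quantize_int8(weights)
approx = dequantize(codes, scale)
# Rounding keeps the per-weight reconstruction error within scale / 2.
```

Storing int8 codes instead of 32-bit floats cuts weight memory roughly 4x, at the cost of the small reconstruction error bounded by half a quantization step.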
Nice to Have
- Experience with natural language processing and speech recognition
- Familiarity with cloud-based infrastructure and containerization
- Experience with DevOps and continuous integration/continuous deployment
- Strong understanding of data structures and algorithms
Benefits and Perks
- Competitive salary and benefits package
- Opportunity to work on cutting-edge models and technologies
- Collaborative and dynamic work environment
- Flexible work arrangements, including remote work options
- Professional development opportunities, including conferences and training
- Access to state-of-the-art equipment and tools
- Comprehensive health and dental benefits
- Parental leave top-up and mental health support
- Wellness and self-care programs, including fitness and meditation classes
- Social events and team-building activities
How to Stand Out
- Highlight your experience with model efficiency techniques, including model pruning, knowledge distillation, and quantization.
- Showcase your understanding of LLM architecture and optimization techniques.
- Emphasize your strong software engineering skills, including proficiency in languages like Python and C++.
- Prepare to discuss your experience with deep learning frameworks, including TensorFlow and PyTorch.
- Be ready to talk about your ability to work in a fast-paced, high-ambiguity environment and your experience with agile development methodologies.
- Research Cohere's mission and values and be prepared to discuss how your skills and experience align with them.
- Practice your communication and collaboration skills, as you will be working closely with cross-functional teams.
This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.