Staff Research Engineer, Model Efficiency
WFA Digital Insight
Demand for AI and machine learning experts is skyrocketing, with job postings up 25% over the past year. As a Staff Research Engineer, Model Efficiency at Cohere, you'll be at the forefront of this trend, working on large language models that push the boundaries of what AI systems can do. With a strong foundation in machine learning and a passion for innovation, you'll thrive in this role. Cohere's commitment to diversity and inclusion, along with its remote-friendly environment, makes it an attractive option for top talent. Before applying, consider highlighting your experience with model architecture optimization and decoding algorithm improvements.
Job Description
About the Role
As a Staff Research Engineer, Model Efficiency at Cohere, you will develop, prototype, and deploy techniques that improve the efficiency of large language models. You will join the Model Efficiency team, which is focused on pushing the limits of LLM inference efficiency across Cohere's foundation models. Your day-to-day work will involve exploring and shipping breakthroughs across the model execution stack, including model architecture and MoE routing optimization, decoding and inference-time algorithm improvements, and software/hardware co-design for GPU acceleration. The Model Efficiency team is concentrated in the EST and PST time zones, and you will work closely with researchers, engineers, and designers who are passionate about their craft. You will be responsible for increasing the capabilities of Cohere's models and the value they drive for customers.
Cohere's mission is to scale intelligence to serve humanity, and as a Staff Research Engineer, Model Efficiency, you will be instrumental in achieving this goal. You will be working on cutting-edge models that are being used by developers and enterprises to build AI systems that power magical experiences like content generation, semantic search, and agents.
What You Will Do
- Develop and deploy techniques to improve LLM inference efficiency across Cohere's foundation models
- Explore and ship breakthroughs across the model execution stack, including model architecture and MoE routing optimization
- Improve decoding and inference-time algorithm efficiency
- Work on software/hardware co-design for GPU acceleration
- Collaborate with the Model Efficiency team to identify and prioritize areas for improvement
- Develop and maintain large-scale models and systems
- Work closely with cross-functional teams, including research, engineering, and design
- Stay up-to-date with the latest advancements in machine learning and AI
- Participate in code reviews and ensure high-quality code
- Mentor junior engineers and contribute to the growth of the team
What We Are Looking For
- PhD in Machine Learning or a related field
- Strong understanding of LLM architecture and optimization techniques
- Experience with model efficiency techniques, including model pruning, knowledge distillation, and quantization
- Strong software engineering skills, including proficiency in languages like Python and C++
- Experience with deep learning frameworks, including TensorFlow and PyTorch
- Strong understanding of computer architecture and GPU acceleration
- Experience with agile development methodologies and version control systems like Git
- Strong communication and collaboration skills
- Ability to work in a fast-paced, high-ambiguity environment
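For candidates brushing up on the efficiency techniques named above, here is a minimal Python sketch of symmetric int8 weight quantization. It is a toy illustration of the general idea only, not Cohere's implementation, and all names in it are hypothetical:

```python
# Toy symmetric int8 quantization: map float weights onto the integer
# range [-127, 127] with a single shared scale, then recover approximate
# floats. Illustrative only -- not Cohere's production method.

def quantize_int8(weights):
    """Quantize a list of float weights to int8 codes plus a scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round each weight to the nearest code and clamp to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 codes."""
    return [x * scale for x in q]

weights = [0.42, -1.30, 0.07, 0.95]
codes, scale = quantize_int8(weights)
approx = dequantize(codes, scale)
# Rounding keeps the per-weight reconstruction error within scale / 2.
```

Storing int8 codes instead of 32-bit floats cuts weight memory roughly 4x, at the cost of the small reconstruction error bounded by half a quantization step.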
Nice to Have
- Experience with natural language processing and speech recognition
- Familiarity with cloud-based infrastructure and containerization
- Experience with DevOps and continuous integration/continuous deployment
- Strong understanding of data structures and algorithms
Benefits and Perks
- Competitive salary and benefits package
- Opportunity to work on cutting-edge models and technologies
- Collaborative and dynamic work environment
- Flexible work arrangements, including remote work options
- Professional development opportunities, including conferences and training
- Access to state-of-the-art equipment and tools
- Comprehensive health and dental benefits
- Parental leave top-up and mental health support
- Wellness and self-care programs, including fitness and meditation classes
- Social events and team-building activities
How to Stand Out
- Highlight your experience with model efficiency techniques, including model pruning, knowledge distillation, and quantization.
- Showcase your understanding of LLM architecture and optimization techniques.
- Emphasize your strong software engineering skills, including proficiency in languages like Python and C++.
- Prepare to discuss your experience with deep learning frameworks, including TensorFlow and PyTorch.
- Be ready to talk about your ability to work in a fast-paced, high-ambiguity environment and your experience with agile development methodologies.
- Research Cohere's mission and values and be prepared to discuss how your skills and experience align with them.
- Practice your communication and collaboration skills, as you will be working closely with cross-functional teams.
This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.