Member of Technical Staff, Training Performance Engineer
WFA Digital Insight
Demand for AI-focused performance engineers is skyrocketing, with 25% growth in job postings over the past year. As companies like Cohere push the boundaries of natural language processing, professionals skilled in software engineering, machine learning, and low-level kernel design are in high demand. With remote work on the rise, candidates can access these cutting-edge roles from anywhere. Before applying, consider highlighting your experience with large-scale distributed training strategies and autoregressive sequence models. Cohere stands out for its commitment to diversity and innovation, making this an exciting opportunity for anyone passionate about AI research and development.
Job Description
About the Role
The Member of Technical Staff, Training Performance Engineer role at Cohere is a unique opportunity to join a team of talented researchers, engineers, and designers passionate about advancing the field of natural language processing. As a performance engineer, you will play a crucial role in optimizing the performance of Cohere's advanced language models and systems. Your expertise in software engineering, machine learning, and low-level kernel design will be essential to improving key model training metrics, such as training throughput, and to ensuring high accelerator utilization.

The Pre-Training team at Cohere combines these disciplines to build robust systems and enhance model performance. As a member of this team, you will work closely with other experts to identify and remove performance bottlenecks, develop cutting-edge training and profiling tools, and drive innovation in natural language processing.
Cohere is a dynamic and innovative company that values diversity and inclusivity. With offices in London, Toronto, New York, and San Francisco, the company also offers remote-friendly work arrangements, allowing you to collaborate with colleagues across different time zones.
What You Will Do
- Design and write high-performance, scalable software for model training
- Understand architectural modifications and design choices and their effects on training throughput and quality
- Write low-level CUDA and Triton kernels to squeeze every last bit of performance out of accelerators
- Research, implement, and experiment with ideas on supercomputing and data infrastructure
- Learn from and work with the best researchers in the field
- Collaborate with cross-functional teams to drive innovation and improvement
- Develop and maintain large-scale distributed training strategies
- Optimize and improve model performance and training efficiency
- Stay up-to-date with the latest developments in natural language processing and machine learning
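The first responsibilities above revolve around training throughput and accelerator utilization. As a rough illustrative sketch (not Cohere's actual methodology), these are commonly estimated as tokens per second and model FLOPs utilization (MFU), using the approximate 6 × parameters FLOPs-per-token rule of thumb for a dense Transformer's forward-plus-backward pass; every number below is an assumption for illustration only:

```python
def training_throughput(tokens_per_batch: int, step_time_s: float) -> float:
    """Tokens processed per second of wall-clock time."""
    return tokens_per_batch / step_time_s

def mfu(params: float, tokens_per_s: float, peak_flops_per_s: float) -> float:
    """Model FLOPs utilization: achieved FLOPs as a fraction of hardware peak.

    Uses the common ~6 * params FLOPs-per-token approximation for a dense
    Transformer forward + backward pass (an approximation, not exact).
    """
    achieved_flops_per_s = 6.0 * params * tokens_per_s
    return achieved_flops_per_s / peak_flops_per_s

# Illustrative numbers (assumptions, not Cohere figures): a 7B-parameter
# model, 2M-token batches, 4 s per step, on a cluster with an aggregate
# peak of 1e17 FLOP/s.
tps = training_throughput(2_000_000, 4.0)   # 500,000 tokens/s
print(f"throughput: {tps:,.0f} tokens/s")
print(f"MFU: {mfu(7e9, tps, 1e17):.1%}")    # 6 * 7e9 * 5e5 / 1e17 = 21.0%
```

Raising either number is the job: a kernel or pipeline change that shortens step time shows up directly as higher throughput and MFU.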
What We Are Looking For
- Extremely strong software engineering skills
- Proficiency in Python and related ML frameworks such as JAX, PyTorch, and XLA/MLIR
- Experience writing kernels for GPUs using CUDA, Triton, etc.
- Experience using large-scale distributed training strategies
- Familiarity with autoregressive sequence models, such as Transformers
- Strong understanding of computer architecture and systems programming
- Experience with machine learning and deep learning frameworks
- Excellent problem-solving skills and attention to detail
- Ability to work effectively in a team and collaborate with others
Nice to Have
- Papers published at top-tier venues such as NeurIPS, ICML, ICLR, AISTATS, MLSys, JMLR, AAAI, Nature, COLING, ACL, or EMNLP
- Experience with natural language processing and language models
- Knowledge of cloud computing platforms and infrastructure
- Familiarity with agile development methodologies and version control systems
Benefits and Perks
- Competitive salary and benefits package
- Opportunity to work on cutting-edge AI research and development projects
- Collaborative and dynamic work environment with a team of experts
- Flexible work arrangements, including remote work options
- Professional development and growth opportunities
- Access to the latest technologies and tools
- Comprehensive health and dental benefits
- Parental leave top-up for up to 6 months
- Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
- Weekly lunch stipend and in-office lunches and snacks
- Co-working and office spaces in London, Toronto, New York, San Francisco, and Paris
How to Stand Out
- When applying for this role, make sure to highlight your experience with large-scale distributed training strategies and autoregressive sequence models.
- Showcase your proficiency in Python and related ML frameworks, and demonstrate your ability to write low-level CUDA and Triton kernels.
- Be prepared to discuss your understanding of computer architecture and systems programming, as well as your experience with machine learning and deep learning frameworks.
- Emphasize your problem-solving skills and attention to detail, and provide examples of how you have optimized and improved model performance and training efficiency in previous roles.
- Consider creating a portfolio or repository of your work, including any published papers or projects, to demonstrate your expertise and showcase your skills to potential employers.
- Research the company culture and values, and be prepared to discuss how you align with them and how you can contribute to the team's mission and goals.
- Don't be afraid to ask about the company's approach to diversity and inclusion, and how it supports the growth and development of its employees.
This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere. Browse more remote jobs across all categories.