Member of Technical Staff, Training Performance Engineer
WFA Digital Insight
Demand for AI-focused performance engineers is skyrocketing, with 25% growth in job postings over the past year. As companies like Cohere push the boundaries of natural language processing, professionals skilled in software engineering, machine learning, and low-level kernel design are in high demand. With remote work on the rise, candidates can access these cutting-edge roles from anywhere. Before applying, consider highlighting your experience with large-scale distributed training strategies and autoregressive sequence models. Cohere stands out for its commitment to diversity and innovation, making this an exciting opportunity for anyone passionate about AI research and development.
Job Description
About the Role
The Member of Technical Staff, Training Performance Engineer role at Cohere is a unique opportunity to join a team of talented researchers, engineers, and designers passionate about advancing the field of natural language processing. As a performance engineer, you will play a crucial role in optimizing the performance of Cohere's advanced language models and systems. Your expertise in software engineering, machine learning, and low-level kernel design will be essential to improving key model training metrics, such as training throughput, and to ensuring high accelerator utilization.

The Pre-Training team at Cohere combines these disciplines to build robust systems and enhance model performance. As a member of this team, you will work closely with other experts to identify and remove performance bottlenecks, develop cutting-edge training and profiling tools, and drive innovation in natural language processing.
Cohere is a dynamic and innovative company that values diversity and inclusivity. With offices in London, Toronto, New York, and San Francisco, the company also offers remote-friendly work arrangements, allowing you to collaborate with colleagues across different time zones.
What You Will Do
- Design and write high-performance, scalable software for model training
- Understand architectural modifications and design choices and their effects on training throughput and quality
- Write low-level CUDA and Triton kernels to squeeze every last bit of performance out of accelerators
- Research, implement, and experiment with ideas on supercomputing and data infrastructure
- Learn from and work with the best researchers in the field
- Collaborate with cross-functional teams to drive innovation and improvement
- Develop and maintain large-scale distributed training strategies
- Optimize and improve model performance and training efficiency
- Stay up-to-date with the latest developments in natural language processing and machine learning
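The first responsibilities above revolve around training throughput and accelerator utilization. As a rough illustrative sketch (not Cohere's actual methodology), these are commonly estimated as tokens per second and model FLOPs utilization (MFU), using the approximate 6 × parameters FLOPs-per-token rule of thumb for a dense Transformer's forward-plus-backward pass; every number below is an assumption for illustration only:

```python
def training_throughput(tokens_per_batch: int, step_time_s: float) -> float:
    """Tokens processed per second of wall-clock time."""
    return tokens_per_batch / step_time_s

def mfu(params: float, tokens_per_s: float, peak_flops_per_s: float) -> float:
    """Model FLOPs utilization: achieved FLOPs as a fraction of hardware peak.

    Uses the common ~6 * params FLOPs-per-token approximation for a dense
    Transformer forward + backward pass (an approximation, not exact).
    """
    achieved_flops_per_s = 6.0 * params * tokens_per_s
    return achieved_flops_per_s / peak_flops_per_s

# Illustrative numbers (assumptions, not Cohere figures): a 7B-parameter
# model, 2M-token batches, 4 s per step, on a cluster with an aggregate
# peak of 1e17 FLOP/s.
tps = training_throughput(2_000_000, 4.0)   # 500,000 tokens/s
print(f"throughput: {tps:,.0f} tokens/s")
print(f"MFU: {mfu(7e9, tps, 1e17):.1%}")    # 6 * 7e9 * 5e5 / 1e17 = 21.0%
```

Raising either number is the job: a kernel or pipeline change that shortens step time shows up directly as higher throughput and MFU.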
What We Are Looking For
- Extremely strong software engineering skills
- Proficiency in Python and related ML frameworks such as JAX, PyTorch, and XLA/MLIR
- Experience writing kernels for GPUs using CUDA, Triton, etc.
- Experience using large-scale distributed training strategies
- Familiarity with autoregressive sequence models, such as Transformers
- Strong understanding of computer architecture and systems programming
- Experience with machine learning and deep learning frameworks
- Excellent problem-solving skills and attention to detail
- Ability to work effectively in a team and collaborate with others
Nice to Have
- Papers published at top-tier venues such as NeurIPS, ICML, ICLR, AISTATS, MLSys, JMLR, AAAI, Nature, COLING, ACL, or EMNLP
- Experience with natural language processing and language models
- Knowledge of cloud computing platforms and infrastructure
- Familiarity with agile development methodologies and version control systems
Benefits and Perks
- Competitive salary and benefits package
- Opportunity to work on cutting-edge AI research and development projects
- Collaborative and dynamic work environment with a team of experts
- Flexible work arrangements, including remote work options
- Professional development and growth opportunities
- Access to the latest technologies and tools
- Comprehensive health and dental benefits
- Parental leave top-up for up to 6 months
- Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
- Weekly lunch stipend and in-office lunches and snacks
- Co-working and office spaces in London, Toronto, New York, San Francisco, and Paris
How to Stand Out
- When applying for this role, make sure to highlight your experience with large-scale distributed training strategies and autoregressive sequence models.
- Showcase your proficiency in Python and related ML frameworks, and demonstrate your ability to write low-level CUDA and Triton kernels.
- Be prepared to discuss your understanding of computer architecture and systems programming, as well as your experience with machine learning and deep learning frameworks.
- Emphasize your problem-solving skills and attention to detail, and provide examples of how you have optimized and improved model performance and training efficiency in previous roles.
- Consider creating a portfolio or repository of your work, including any published papers or projects, to demonstrate your expertise and showcase your skills to potential employers.
- Research the company culture and values, and be prepared to discuss how you align with them and how you can contribute to the team's mission and goals.
- Don't be afraid to ask about the company's approach to diversity and inclusion, and how it supports the growth and development of its employees.
This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere. Browse more remote jobs across all categories.