Member of Technical Staff, Training Infra Engineer
WFA Digital Insight
Demand for skilled AI engineers is skyrocketing, with the global AI market expected to reach
Job Description
## About the Role
As a Member of Technical Staff at Cohere, you will be part of a dynamic team pushing the boundaries of AI research and development. Day-to-day, you will design and write high-performance, scalable software for training, working closely with researchers and engineers to bridge the gap between research and production. Your expertise will contribute significantly to the development of model training pipelines and the deployment of state-of-the-art models. You will also have the opportunity to improve the training setup from an infrastructure and codebase performance standpoint, crafting and implementing tools to speed up training cycles.
Cohere is a team of exceptional individuals passionate about their craft, and each member is considered one of the best in the world at what they do. The company believes that a diverse range of perspectives is crucial for building great products and is committed to creating an inclusive work environment. You will be working in a fast-paced, results-driven environment where your contributions will directly impact the company's mission to scale intelligence to serve humanity.
The role is remote-friendly, with offices in several major cities around the world, including Paris, where you can choose to be based. This flexibility, combined with the cutting-edge nature of the work, makes this an attractive opportunity for professionals seeking a challenge that aligns with their passion for AI.
## What You Will Do
- Design and write high-performance, scalable software for training
- Improve the training setup from an infrastructure and codebase performance standpoint
- Craft and implement tools to speed up training cycles and improve the overall efficacy of the training infrastructure
- Research, implement, and experiment with ideas on supercompute and data infrastructure
- Learn from and work with the best researchers in the field
- Contribute to writing production code and supporting the research effort depending on individual interest and organizational needs
- Participate in the development of model training pipelines and the deployment of state-of-the-art models
- Collaborate with cross-functional teams to identify and prioritize project requirements
- Develop and maintain technical documentation of the training infrastructure
- Stay updated with the latest developments in AI and related technologies
## What We Are Looking For
- Extremely strong software engineering skills
- Proficiency in Python and related ML frameworks such as JAX, PyTorch, and XLA/MLIR
- Experience with distributed training infrastructures (Kubernetes, Slurm) and associated frameworks (Ray)
- Experience using large-scale distributed training strategies
- Hands-on experience training large models at scale and contributing to the tooling and/or setup of the training infrastructure
- Strong understanding of AI principles and their applications
- Ability to work collaboratively in a team environment
- Excellent problem-solving skills and attention to detail
- Strong communication skills
## Nice to Have
- Papers at top-tier venues (such as NeurIPS, ICML, ICLR, AISTATS, MLSys, JMLR, AAAI, Nature, COLING, ACL, EMNLP)
- Experience with cloud-based services (AWS, Azure, Google Cloud)
- Knowledge of containerization (Docker) and orchestration (Kubernetes)
- Familiarity with agile development methodologies
## Benefits and Perks
- An open and inclusive culture and work environment
- Opportunity to work closely with a team on the cutting edge of AI research
- Weekly lunch stipend, in-office lunches & snacks
- Full health and dental benefits, including a separate budget to take care of your mental health
- 100% Parental Leave top-up for up to 6 months
- Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
- Remote-flexible work arrangement with offices in Toronto, New York, San Francisco, London, and Paris, as well as a co-working stipend
- 6 weeks of vacation (30 working days)
How to Stand Out
- Highlight your technical expertise: Emphasize your experience with Python, ML frameworks, and distributed training infrastructures in your resume and cover letter.
- Showcase your passion for AI: Demonstrate your interest in AI research and development through personal projects, publications, or contributions to open-source projects.
- Prepare for technical interviews: Review common AI and software engineering interview questions and practice coding challenges to improve your problem-solving skills.
- Research the company: Understand Cohere's mission, values, and current projects to show your enthusiasm and commitment to the company's goals.
- Be ready to discuss your experience with large-scale distributed training: Share specific examples of your experience with training large models at scale and your contributions to the tooling and/or setup of the training infrastructure.
- Ask about the team and company culture: Show your interest in the company's inclusive work environment and your willingness to collaborate with cross-functional teams.
- Negotiate your compensation package: Consider discussing your salary, benefits, and perks to ensure they align with your expectations and industry standards.
This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.