Lead Member of Technical Staff, Inference Infrastructure

Cohere · Remote (San Francisco)

WFA Digital Insight

The demand for skilled technical leaders in AI and machine learning has surged, with a 25% increase in job postings over the past year. Cohere, a pioneer in AI platform development, is seeking a seasoned expert to drive technical direction and strategy. As a leader in this field, you'll need to navigate complex distributed systems, Kubernetes, and GPU workloads. With the global AI market projected to reach $90 billion by 2025, this role offers a chance to shape the future of AI adoption. Before applying, consider your experience in high-performance computing, NLP, and technical leadership.

Job Description

About the Role

The Lead Member of Technical Staff, Inference Infrastructure, is a critical role at Cohere, where you will provide technical leadership across multiple teams. You will drive the architecture and strategy for deploying optimized NLP models to production in low-latency, high-throughput, high-availability environments. As a key point of contact for customers, you will lead the design of customized deployments to meet their specific needs and mentor engineers to raise the technical bar across the team.

The Model Serving team at Cohere is responsible for developing, deploying, and operating the AI platform that delivers Cohere's large language models through easy-to-use API endpoints. You will work closely with this team to ensure seamless integration and deployment of NLP models.

As a technical leader, you will be responsible for driving technical direction and strategy, as well as mentoring and guiding engineers to achieve their full potential.
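To make the "low latency, high throughput" trade-off mentioned above concrete: one common pattern in inference serving is dynamic batching, where incoming requests are grouped together up to a size limit or a latency deadline before being sent to the model. The sketch below is purely illustrative (all names are hypothetical, and this is not Cohere's actual serving code):

```python
# Illustrative sketch of dynamic request batching, a common pattern in
# low-latency / high-throughput model serving. Names and parameters are
# hypothetical, not taken from any real serving stack.
import time
from collections import deque


class DynamicBatcher:
    """Collects incoming requests and releases them as a batch when
    either the batch is full or a latency deadline has expired."""

    def __init__(self, max_batch_size=8, max_wait_ms=10):
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_ms / 1000.0
        self.queue = deque()
        self.oldest_arrival = None  # arrival time of the oldest queued request

    def submit(self, request):
        # Record when the current batch window opened.
        if self.oldest_arrival is None:
            self.oldest_arrival = time.monotonic()
        self.queue.append(request)

    def maybe_flush(self):
        """Return a batch if the size or deadline trigger fires, else None."""
        if not self.queue:
            return None
        full = len(self.queue) >= self.max_batch_size
        expired = time.monotonic() - self.oldest_arrival >= self.max_wait_s
        if full or expired:
            n = min(self.max_batch_size, len(self.queue))
            batch = [self.queue.popleft() for _ in range(n)]
            # Reset the window for whatever is still queued.
            self.oldest_arrival = time.monotonic() if self.queue else None
            return batch
        return None


batcher = DynamicBatcher(max_batch_size=4, max_wait_ms=5)
for i in range(4):
    batcher.submit({"prompt": f"request-{i}"})
batch = batcher.maybe_flush()  # a full batch flushes immediately
```

Tuning `max_batch_size` against `max_wait_ms` is exactly the latency-versus-throughput balance this role describes: larger batches keep GPUs saturated, while shorter deadlines bound per-request latency.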

What You Will Do

  • Provide technical leadership across multiple teams, driving the architecture and strategy for deploying optimized NLP models
  • Lead the design of customized deployments to meet customer-specific needs
  • Mentor engineers to raise the technical bar across the team
  • Drive the development, deployment, and operation of the AI platform
  • Collaborate with the Model Serving team to ensure seamless integration and deployment of NLP models
  • Develop and maintain technical roadmaps and strategies for the inference infrastructure
  • Work closely with customers to understand their needs and provide technical guidance
  • Participate in the design and development of new features and capabilities
  • Collaborate with cross-functional teams to build mission-critical systems

What We Are Looking For

  • 8+ years of engineering experience running production infrastructure at a large scale
  • Demonstrated experience leading the architecture and design of large, highly available distributed systems
  • Deep expertise with Kubernetes in both development and production, including hands-on coding and operational support
  • Extensive experience across GCP, Azure, AWS, and OCI, as well as multi-cloud, on-prem, and hybrid serving environments
  • Proven ability to lead the design, deployment, support, and troubleshooting of complex Linux-based computing environments
  • Experience owning compute/storage/network resource and cost management at an organizational level
  • Exceptional collaboration and communication skills, with experience mentoring engineers
  • Strong expertise in the computational characteristics of accelerators (GPUs, TPUs, and/or custom accelerators)

Nice to Have

  • Experience with Go, C++, or other languages designed for high-performance, scalable servers
  • Knowledge of distributed systems, with experience establishing patterns and practices across engineering teams
  • Proficiency in setting team-wide standards and best practices

Benefits and Perks

  • Competitive compensation and equity package
  • Comprehensive health insurance and benefits
  • Flexible working hours and remote work arrangements
  • Professional development opportunities and training
  • Access to cutting-edge technology and tools
  • Collaborative and dynamic work environment
  • Recognition and rewards for outstanding performance

How to Stand Out

  • Be prepared to showcase your experience with Kubernetes and GPU workloads, as well as your ability to lead technical direction and strategy.
  • Highlight your understanding of distributed systems and NLP applications, and be ready to discuss your approach to complex technical challenges.
  • Emphasize your collaboration and communication skills, and provide examples of your experience mentoring engineers.
  • Familiarize yourself with Cohere's technology stack and be prepared to discuss how you can contribute to the company's mission.
  • Be prepared to negotiate your salary and benefits package, and consider factors such as flexible working hours and professional development opportunities.
  • Research the company culture and values, and be ready to discuss how you align with them.
  • Prepare examples of your experience with high-performance computing and technical leadership, and be ready to discuss your approach to driving technical innovation.

This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.