Lead Member of Technical Staff, Inference Infrastructure

Cohere · Remote (San Francisco)

WFA Digital Insight

The demand for skilled technical leaders in AI and machine learning has surged, with a 25% increase in job postings over the past year. Cohere, a pioneer in AI platform development, is seeking a seasoned expert to drive technical direction and strategy. As a leader in this field, you'll need to navigate complex distributed systems, Kubernetes, and GPU workloads. With the global AI market projected to reach $90 billion by 2025, this role offers a chance to shape the future of AI adoption. Before applying, consider your experience in high-performance computing, NLP, and technical leadership.

Job Description

About the Role

The Lead Member of Technical Staff, Inference Infrastructure, is a critical role at Cohere, where you will provide technical leadership across multiple teams. You will drive the architecture and strategy for deploying optimized NLP models to production in low-latency, high-throughput, high-availability environments. As a key point of contact for customers, you will lead the design of customized deployments to meet their specific needs and mentor engineers to raise the technical bar across the team.

The Model Serving team at Cohere is responsible for developing, deploying, and operating the AI platform that delivers Cohere's large language models through easy-to-use API endpoints. You will work closely with this team to ensure seamless integration and deployment of NLP models.

As a technical leader, you will be responsible for driving technical direction and strategy, as well as mentoring and guiding engineers to achieve their full potential.
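To make the "low latency, high throughput" trade-off mentioned above concrete: one common pattern in inference serving is dynamic batching, where incoming requests are grouped together up to a size limit or a latency deadline before being sent to the model. The sketch below is purely illustrative (all names are hypothetical, and this is not Cohere's actual serving code):

```python
# Illustrative sketch of dynamic request batching, a common pattern in
# low-latency / high-throughput model serving. Names and parameters are
# hypothetical, not taken from any real serving stack.
import time
from collections import deque


class DynamicBatcher:
    """Collects incoming requests and releases them as a batch when
    either the batch is full or a latency deadline has expired."""

    def __init__(self, max_batch_size=8, max_wait_ms=10):
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_ms / 1000.0
        self.queue = deque()
        self.oldest_arrival = None  # arrival time of the oldest queued request

    def submit(self, request):
        # Record when the current batch window opened.
        if self.oldest_arrival is None:
            self.oldest_arrival = time.monotonic()
        self.queue.append(request)

    def maybe_flush(self):
        """Return a batch if the size or deadline trigger fires, else None."""
        if not self.queue:
            return None
        full = len(self.queue) >= self.max_batch_size
        expired = time.monotonic() - self.oldest_arrival >= self.max_wait_s
        if full or expired:
            n = min(self.max_batch_size, len(self.queue))
            batch = [self.queue.popleft() for _ in range(n)]
            # Reset the window for whatever is still queued.
            self.oldest_arrival = time.monotonic() if self.queue else None
            return batch
        return None


batcher = DynamicBatcher(max_batch_size=4, max_wait_ms=5)
for i in range(4):
    batcher.submit({"prompt": f"request-{i}"})
batch = batcher.maybe_flush()  # a full batch flushes immediately
```

Tuning `max_batch_size` against `max_wait_ms` is exactly the latency-versus-throughput balance this role describes: larger batches keep GPUs saturated, while shorter deadlines bound per-request latency.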

What You Will Do

  • Provide technical leadership across multiple teams, driving the architecture and strategy for deploying optimized NLP models
  • Lead the design of customized deployments to meet customer-specific needs
  • Mentor engineers to raise the technical bar across the team
  • Drive the development, deployment, and operation of the AI platform
  • Collaborate with the Model Serving team to ensure seamless integration and deployment of NLP models
  • Develop and maintain technical roadmaps and strategies for the inference infrastructure
  • Work closely with customers to understand their needs and provide technical guidance
  • Participate in the design and development of new features and capabilities
  • Collaborate with cross-functional teams to build mission-critical systems

What We Are Looking For

  • 8+ years of engineering experience running production infrastructure at a large scale
  • Demonstrated experience leading the architecture and design of large, highly available distributed systems
  • Deep expertise with Kubernetes in both development and production, including hands-on coding and operational support
  • Extensive experience across GCP, Azure, AWS, and OCI, as well as multi-cloud, on-prem, and hybrid serving environments
  • Proven ability to lead the design, deployment, support, and troubleshooting of complex Linux-based computing environments
  • Experience owning compute/storage/network resource and cost management at an organizational level
  • Exceptional collaboration and communication skills, with experience mentoring engineers
  • Strong expertise in the computational characteristics of accelerators (GPUs, TPUs, and/or custom accelerators)

Nice to Have

  • Experience with Go, C++, or other languages designed for high-performance, scalable servers
  • Knowledge of distributed systems, with experience establishing patterns and practices across engineering teams
  • Proficiency in setting team-wide standards and best practices

Benefits and Perks

  • Competitive compensation and equity package
  • Comprehensive health insurance and benefits
  • Flexible working hours and remote work arrangements
  • Professional development opportunities and training
  • Access to cutting-edge technology and tools
  • Collaborative and dynamic work environment
  • Recognition and rewards for outstanding performance

How to Stand Out

  • Be prepared to showcase your experience with Kubernetes and GPU workloads, as well as your ability to lead technical direction and strategy.
  • Highlight your understanding of distributed systems and NLP applications, and be ready to discuss your approach to complex technical challenges.
  • Emphasize your collaboration and communication skills, and provide examples of your experience mentoring engineers.
  • Familiarize yourself with Cohere's technology stack and be prepared to discuss how you can contribute to the company's mission.
  • Be prepared to negotiate your salary and benefits package, and consider factors such as flexible working hours and professional development opportunities.
  • Research the company culture and values, and be ready to discuss how you align with them.
  • Prepare examples of your experience with high-performance computing and technical leadership, and be ready to discuss your approach to driving technical innovation.

This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.