Audio Inference Engineer, Model Efficiency

Cohere · Remote (New York)
Software Development

WFA Digital Insight

The demand for skilled audio inference engineers has skyrocketed, with the global AI market poised to reach 90 billion by 2027. As companies like Cohere push the boundaries of machine learning, professionals with expertise in high-performance audio systems are in high demand. With a focus on innovation and a commitment to diversity, Cohere stands out as a leader in the industry. Before applying, candidates should be prepared to showcase their proficiency in programming languages like C++ and Python, as well as their experience with deep learning models for audio applications.

Job Description

About the Role

Audio inference engineers play a critical role in advancing the capabilities of machine learning systems, and Cohere is at the forefront of this innovation. As an Audio Inference Engineer on the Model Efficiency team, you will be responsible for optimizing audio inference serving efficiency using cutting-edge techniques. Your day-to-day work will involve diving deep into systems, identifying bottlenecks, and delivering creative solutions for audio processing and streaming workloads.

The Model Efficiency team is a fast-growing group of committed researchers and engineers, and you will collaborate closely with both the training and serving infrastructure teams to ensure seamless integration between model development and deployment. With a focus on real-time and streaming audio inference, you will be working on the latest technologies and contributing to the development of reliable machine learning systems.

Cohere's mission is to scale intelligence to serve humanity, and the company is dedicated to creating a diverse and inclusive work environment. As a member of the team, you will be expected to contribute to this mission and work towards advancing the capabilities of the company's models.

What You Will Do

  • Advance core audio model serving metrics, including latency, throughput, and quality
  • Dive deep into systems to identify bottlenecks and deliver creative solutions for audio processing and streaming workloads
  • Collaborate closely with the training and serving infrastructure teams to ensure seamless integration between model development and deployment
  • Work on optimizing audio inference serving efficiency using innovative techniques
  • Contribute to the development of reliable machine learning systems
  • Participate in the design and implementation of new audio processing and streaming architectures
  • Stay up to date with the latest advancements in machine learning and audio processing technologies
  • Collaborate with cross-functional teams to ensure the successful deployment of models
  • Contribute to the development of internal tools and systems to support the work of the Model Efficiency team

What We Are Looking For

  • Significant experience developing high-performance audio or machine learning inference systems
  • Proficiency with programming languages such as C++ and Python
  • Hands-on experience with deep learning models for audio, speech, or language applications
  • A bias for action and a strong results-oriented mindset
  • Experience with GPU programming, low-level system optimization, and model parallelization techniques
  • Knowledge of duplex real-time streaming architectures
  • Experience with inference frameworks like vLLM, SGLang, TensorRT-LLM, or custom distributed inference systems
  • Sequence modeling experience, including transformers for audio/speech
  • End-to-end audio pipeline optimization experience

Nice to Have

  • Experience with PyTorch, TensorFlow, or specialized audio libraries
  • Familiarity with machine learning frameworks for audio
  • Experience with distributed inference systems
  • Knowledge of cloud-based infrastructure and containerization

Benefits and Perks

  • Competitive salary and benefits package
  • Opportunity to work with a talented team of researchers and engineers
  • Collaborative and dynamic work environment
  • Flexible working hours and remote work options
  • Professional development opportunities, including training and education
  • Access to the latest technologies and tools
  • Recognition and reward for outstanding performance
  • Comprehensive health and dental benefits
  • Parental leave top-up and mental health support

How to Stand Out

  • Highlight your experience with deep learning models for audio applications and programming languages like C++ and Python.
  • Showcase your ability to work collaboratively with cross-functional teams and contribute to the development of reliable machine learning systems.
  • Be prepared to discuss your experience with GPU programming, low-level system optimization, and model parallelization techniques.
  • Emphasize your strong results-oriented mindset and bias for action.
  • Consider creating a portfolio that demonstrates your expertise in audio processing and streaming workloads.
  • Prepare to discuss your experience with inference frameworks and distributed inference systems.
  • Research the company culture and values to demonstrate your alignment with Cohere's mission and vision.

This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.