Audio Inference Engineer, Model Efficiency

Cohere · Remote (New York)

Software Development

WFA Digital Insight

The demand for skilled audio inference engineers has grown sharply, with the global AI market poised to reach $90 billion by 2027. As companies like Cohere push the boundaries of machine learning, professionals with expertise in high-performance audio systems are highly sought after. With a focus on innovation and a commitment to diversity, Cohere stands out as a leader in the industry. Before applying, candidates should be prepared to showcase their proficiency in programming languages such as C++ and Python, as well as their experience with deep learning models for audio applications.

Job Description

About the Role

Audio inference engineers play a critical role in advancing the capabilities of machine learning systems, and Cohere is at the forefront of this innovation. As an Audio Inference Engineer on the Model Efficiency team, you will be responsible for optimizing audio inference serving efficiency using cutting-edge techniques. Your day-to-day work will involve diving deep into systems, identifying bottlenecks, and delivering creative solutions for audio processing and streaming workloads.

The Model Efficiency team is a fast-growing group of committed researchers and engineers. You will collaborate closely with both the training and serving infrastructure teams to ensure seamless integration between model development and deployment. With a focus on real-time and streaming audio inference, you will work with the latest technologies and help build reliable machine learning systems.

Cohere's mission is to scale intelligence to serve humanity, and the company is dedicated to creating a diverse and inclusive work environment. As a member of the team, you will be expected to contribute to this mission and work towards advancing the capabilities of the company's models.

What You Will Do

Advance core audio model serving metrics, including latency, throughput, and quality
Dive deep into systems to identify bottlenecks and deliver creative solutions for audio processing and streaming workloads
Collaborate closely with the training and serving infrastructure teams to ensure seamless integration between model development and deployment
Work on optimizing audio inference serving efficiency using innovative techniques
Contribute to the development of reliable machine learning systems
Participate in the design and implementation of new audio processing and streaming architectures
Stay up-to-date with the latest advancements in machine learning and audio processing technologies
Collaborate with cross-functional teams to ensure the successful deployment of models
Contribute to the development of internal tools and systems to support the work of the Model Efficiency team

What We Are Looking For

Significant experience developing high-performance audio or machine learning inference systems
Proficiency with programming languages such as C++ and Python
Hands-on experience with deep learning models for audio, speech, or language applications
A bias for action and a strong results-oriented mindset
Experience with GPU programming, low-level system optimization, and model parallelization techniques
Knowledge of duplex real-time streaming architectures
Experience with inference frameworks such as vLLM, SGLang, or TensorRT-LLM, or with custom distributed inference systems
Sequence modeling experience, including transformers for audio/speech
End-to-end audio pipeline optimization experience

Nice to Have

Experience with PyTorch, TensorFlow, or specialized audio libraries
Familiarity with machine learning frameworks for audio
Experience with distributed inference systems
Knowledge of cloud-based infrastructure and containerization

Benefits and Perks

Competitive salary and benefits package
Opportunity to work with a talented team of researchers and engineers
Collaborative and dynamic work environment
Flexible working hours and remote work options
Professional development opportunities, including training and education
Access to the latest technologies and tools
Recognition and reward for outstanding performance
Comprehensive health and dental benefits
Parental leave top-up and mental health support

How to Stand Out

Highlight your experience with deep learning models for audio applications and programming languages like C++ and Python.
Showcase your ability to work collaboratively with cross-functional teams and contribute to the development of reliable machine learning systems.
Be prepared to discuss your experience with GPU programming, low-level system optimization, and model parallelization techniques.
Emphasize your strong results-oriented mindset and bias for action.
Consider creating a portfolio that demonstrates your expertise in audio processing and streaming workloads.
Prepare to discuss your experience with inference frameworks and distributed inference systems.
Research the company culture and values to demonstrate your alignment with Cohere's mission and vision.

This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.