AI Researcher – Multilingual Data

Featherless AI · Remote · Work From Anywhere

AI & Machine Learning

WFA Digital Insight

Demand for AI researchers with multilingual data expertise has grown sharply, with a 25% increase in job postings over the past year. As a remote AI researcher at Featherless AI, you'll work on cutting-edge language models alongside a collaborative team. As businesses expand globally, companies are looking for professionals who can develop and implement AI solutions that serve diverse languages and cultures. Featherless AI stands out for its commitment to publishing high-quality research and translating it into production systems, making it an attractive choice for those who want to make a real impact.

Job Description

About the Role

As a remote AI researcher at Featherless AI, you will play a crucial role in developing and scaling next-generation language models across diverse languages and domains. Your primary focus will be multilingual data: you will design and execute research on datasets, covering data collection, filtering, deduplication, and quality measurement. You will also develop strategies for low-resource and long-tail languages, and research and improve cross-lingual transfer, alignment, and robustness in large language models.

The role requires a strong background in NLP/ML research, with a focus on multilingual or cross-lingual modeling. You will work closely with engineers and researchers on training pipelines and model architecture decisions, publish research at top venues, and contribute to open-source projects.

Featherless AI values innovation and collaboration, providing a dynamic and supportive work environment that encourages creativity and growth. As a remote team member, you will have the flexibility to work from anywhere, with access to modern infrastructure and large datasets.

What You Will Do

Design and execute research on multilingual datasets, including data collection, filtering, deduplication, and quality measurement
Develop strategies for low-resource and long-tail languages, including sampling, augmentation, and curriculum design
Research and improve cross-lingual transfer, alignment, and robustness in large language models
Build and maintain evaluation benchmarks for multilingual performance
Collaborate with engineers and researchers on training pipelines and model architecture decisions
Publish research at top venues, such as ACL, EMNLP, NeurIPS, ICML, and ICLR
Contribute to open-source projects and translate research insights into practical improvements in production models
Develop and implement data quality metrics, filtering, and dataset bias detection
Work with large-scale text datasets across multiple languages, applying tokenization and vocabulary design for multilingual models
Utilize transfer learning and multilingual representation learning to enhance model performance

What We Are Looking For

Strong background in NLP/ML research, with a focus on multilingual or cross-lingual modeling
Publication record at respected conferences or journals, such as ACL, EMNLP, NeurIPS, ICML, and ICLR
Experience working with large-scale text datasets across multiple languages
Solid understanding of tokenization and vocabulary design for multilingual models
Familiarity with data quality metrics, filtering, and dataset bias detection
Comfortable prototyping in Python with modern ML frameworks, such as PyTorch and JAX
Ability to operate independently and ship research at a startup pace
Strong communication and collaboration skills, with the ability to work effectively with engineers and researchers

Nice to Have

Experience with low-resource languages or non-Latin scripts
Open-source contributions in NLP or data tooling
Experience training or evaluating large language models
Familiarity with multilingual benchmarks, such as XTREME, FLORES, and TyDi QA

Benefits and Perks

Competitive compensation and meaningful equity at an early stage
Access to meaningful scale, including large datasets, modern infrastructure, and fast iteration
Flexible remote work arrangement, with the ability to work from anywhere
Opportunity to work on cutting-edge language models and collaborate with a team of innovators
Professional development opportunities, including conference attendance and training
Comprehensive health insurance and retirement plan
Generous PTO and holiday policy, with a focus on work-life balance

How to Stand Out

Develop a strong portfolio showcasing your research experience and publications in NLP/ML, particularly in multilingual or cross-lingual modeling.
Familiarize yourself with popular ML frameworks, such as PyTorch and JAX, and practice prototyping in Python.
Prepare to discuss your experience working with large-scale text datasets and your understanding of data quality metrics and dataset bias detection.
Highlight your ability to operate independently and ship research in a fast-paced environment, and be prepared to provide examples of your work.
Research Featherless AI's current projects and be prepared to discuss how your skills and experience align with its goals and values.
Be prepared to discuss your experience with open-source contributions and your willingness to collaborate with engineers and researchers.
Practice your communication skills, as you will be working remotely and collaborating with a team of innovators.

This is a remote position listed on WFA Digital, the platform for professionals who work from anywhere.