Featherlessai via Ashby

AI Researcher – Multilingual Data

Remote · Posted 3mo ago
Research · Mid Level · Full-time

ABOUT THE ROLE

We’re looking for an AI Researcher focused on multilingual data to help us build and scale next-generation language models across diverse languages and domains. You’ll own research and execution around data sourcing, curation, evaluation, and training strategies for multilingual and low-resource languages, with a strong emphasis on publishing high-quality research and translating it into production systems.

This role is ideal for someone who enjoys working close to the frontier: balancing papers, prototypes, and real-world impact in a fast-moving startup environment.


WHAT YOU’LL DO

- Design and execute research on multilingual datasets, including data collection, filtering, deduplication, and quality measurement

- Develop strategies for low-resource and long-tail languages (sampling, augmentation, curriculum design)

- Research and improve cross-lingual transfer, alignment, and robustness in large language models

- Build and maintain evaluation benchmarks for multilingual performance

- Collaborate with engineers and researchers on training pipelines and model architecture decisions

- Publish research at top venues (e.g., ACL, EMNLP, NeurIPS, ICML, ICLR) and contribute to open-source when appropriate

- Translate research insights into practical improvements in production models


WHAT WE’RE LOOKING FOR

- Strong background in NLP / ML research, with a focus on multilingual or cross-lingual modeling

- Publication record at respected conferences or journals (ACL, EMNLP, NeurIPS, ICML, ICLR, etc.)

- Experience working with large-scale text datasets across multiple languages

- Solid understanding of:

  - Tokenization and vocabulary design for multilingual models

  - Data quality metrics, filtering, and dataset bias

  - Transfer learning and multilingual representation learning

- Comfortable prototyping in Python with modern ML frameworks (PyTorch, JAX, etc.)

- Ability to operate independently and ship research in a startup-paced environment


NICE TO HAVE

- Experience with low-resource languages or non-Latin scripts

- Open-source contributions in NLP or data tooling

- Experience training or evaluating large language models

- Familiarity with multilingual benchmarks (e.g., XTREME, FLORES, TyDi QA)


WHY JOIN US

- Real ownership over research direction and impact

- A team that values papers and production

- Access to meaningful scale: large datasets, modern infrastructure, and fast iteration

- Competitive compensation and meaningful equity at an early stage