Rosita: Polyglot Contextual Representations

Description

Rosita is a project for training polyglot (multilingual) language models. The outputs of LMs trained this way serve as contextual representations shared across languages, and are useful as inputs to polyglot models for tasks such as semantic role labeling (SRL), named entity recognition (NER), and Universal Dependencies parsing. Because it is a multilingual version of ELMo (Peters et al., 2018), it is named after a bilingual character from Sesame Street. See github.com/pmulcaire/rosita for code and more details.

Pretrained Model Downloads

Monolingual: Arabic, English, Simplified Chinese, Traditional Chinese

Bilingual (character-based): Arabic and English, Simplified Chinese and English, Traditional Chinese and English

Bilingual (word- and character-based): Arabic and English, Simplified Chinese and English, Traditional Chinese and English
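
Assuming the downloads follow the standard ELMo packaging (an options JSON plus an HDF5 weights file), a downloaded model can be loaded as a frozen contextual encoder with AllenNLP's Elmo module. The sketch below is illustrative only; the file names are hypothetical placeholders for whichever model you download, and the Rosita repository may provide its own loading code.

```python
# Minimal sketch: embedding sentences with a downloaded Rosita biLM,
# assuming ELMo-format files (options JSON + HDF5 weights).
# File names below are hypothetical placeholders.
from allennlp.modules.elmo import Elmo, batch_to_ids

options_file = "rosita_ar_en_options.json"  # hypothetical options file
weight_file = "rosita_ar_en_weights.hdf5"   # hypothetical weights file

# One output representation (a learned mixture of biLM layers), no dropout.
elmo = Elmo(options_file, weight_file, num_output_representations=1, dropout=0.0)

# Sentences from different languages are embedded into the same space.
sentences = [["The", "cat", "sat", "."], ["القطة", "جلست", "."]]
character_ids = batch_to_ids(sentences)  # (batch, max_len, 50) character ids

outputs = elmo(character_ids)
embeddings = outputs["elmo_representations"][0]  # (batch, max_len, dim) contextual vectors
mask = outputs["mask"]                           # (batch, max_len) padding mask
```

The resulting vectors can then be fed to a task model (e.g. an SRL, NER, or dependency parser) in place of, or alongside, static word embeddings.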