Rosita is a project for training polyglot language models (LMs). The outputs of LMs trained this way can serve as contextual representations shared across languages, which are useful as inputs to polyglot models for tasks including semantic role labeling (SRL), named entity recognition (NER), and Universal Dependencies parsing. Because it is a multilingual version of ELMo (Peters et al., 2018), it is named after a bilingual character from Sesame Street. See github.com/pmulcaire/rosita for code and more details.
Monolingual: Arabic, English, Simplified Chinese, Traditional Chinese
Bilingual (character-based): Arabic and English, Simplified Chinese and English, Traditional Chinese and English
Bilingual (word- and character-based): Arabic and English, Simplified Chinese and English, Traditional Chinese and English