Jais highlights:
- State-of-the-art 13-billion-parameter bilingual Arabic-English model
- Trained on a new dataset that includes 116 billion Arabic tokens (drawn from sources including books, Wikipedia, and text machine-translated from English) and 279 billion English and code tokens
- Bidirectional transfer learning: Arabic performance improved because of the English tokens, and English performance improved because of the Arabic tokens
- Open source and available for download on Hugging Face
To learn more, check out the following:
- Press release: https://www.cerebras.net/press-release/meet-jais-the-worlds-most-advanced-arabic-large-language-model-open-sourced-by-g42s-inception
- Technical paper: https://www.inceptioniai.org/jais/docs/Technicalpaper.pdf
- Models on Hugging Face: https://huggingface.co/inception-mbzuai
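
Since the model is available for download on Hugging Face, trying it out might look roughly like the sketch below. It uses the standard `transformers` loading calls. The repository id `inception-mbzuai/jais-13b` is an assumption built from the organization link above, and `trust_remote_code=True` is assumed to be needed for a custom architecture, so check the model card for the exact id and recommended settings. The helper names `load_jais` and `generate` are illustrative only.

```python
# Assumed repository id: the organization comes from the Hugging Face link above,
# the model name is a guess. Check the model card before use.
MODEL_ID = "inception-mbzuai/jais-13b"


def load_jais(model_id: str = MODEL_ID):
    """Download the tokenizer and model weights from the Hugging Face Hub."""
    # Imported lazily so this module can be read without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Assumption: the model ships custom modeling code, hence trust_remote_code.
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    return tokenizer, model


def generate(tokenizer, model, prompt: str, max_new_tokens: int = 50) -> str:
    """Generate a continuation for an Arabic or English prompt."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `tokenizer, model = load_jais()` and then `generate(tokenizer, model, "عاصمة دولة الإمارات العربية المتحدة هي")` would download the weights on first use. A 13-billion-parameter model needs tens of gigabytes of memory, so a GPU with reduced-precision loading is likely necessary in practice.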