GEITje is a large open Dutch language model with 7 billion parameters, based on Mistral 7B. I've further pretrained Mistral 7B on 10 billion tokens of Dutch text, which has improved its Dutch language skills and increased its knowledge of Dutch topics. There's an experimental chat-finetuned model too, called GEITje-chat.
I have to say the experience of gathering a dataset and training a model has been very educational for me. Being forced to deal with every detail yourself really deepens your understanding of a subject.
It's all in the hands of the community now. Can't wait to see what they do with it!
Want to try it out? There's a live demo on Hugging Face Spaces right now: https://huggingface.co/spaces/Rijgersberg/GEITje-7B-chat