Predictive text for more languages

Also, unrelated to the above, I’d like to add a way of getting a corpus from Wikipedia in case you have trouble finding one.

Download WikiExtractor (GitHub: attardi/wikiextractor, a tool for extracting plain text from Wikipedia dumps).

Download a _locale_wiki-latest-pages-articles.xml file from:

https://dumps.wikimedia.org/_locale_wiki/latest/

and run: python3 WikiExtractor.py --infn _locale_wiki-latest-pages-articles.xml

You will get a large .txt file to use as a corpus.

In the above, substitute locale with the code of your preferred language, e.g. for Czech use cs, and so on. (Index of /cswiki/latest/)
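If you'd rather script the download step, here is a rough Python sketch of the substitution described above. The function names (`dump_filename`, `dump_url`, `download_dump`) are made up for illustration. It also assumes the dump is published compressed with a `.xml.bz2` suffix, so check the index page for your language and adjust `suffix` if the file name differs.

```python
import shutil
import urllib.request


def dump_filename(locale: str, suffix: str = ".xml.bz2") -> str:
    """Build the dump file name for a language code such as 'cs'.

    The '.xml.bz2' suffix is an assumption; check the index page.
    """
    return f"{locale}wiki-latest-pages-articles{suffix}"


def dump_url(locale: str, suffix: str = ".xml.bz2") -> str:
    """Build the full download URL under dumps.wikimedia.org."""
    name = dump_filename(locale, suffix)
    return f"https://dumps.wikimedia.org/{locale}wiki/latest/{name}"


def download_dump(locale: str, suffix: str = ".xml.bz2") -> str:
    """Download the dump into the current directory and return its file name.

    Dumps can be very large, so the response is streamed to disk
    instead of being read into memory.
    """
    dest = dump_filename(locale, suffix)
    with urllib.request.urlopen(dump_url(locale, suffix)) as response:
        with open(dest, "wb") as out:
            shutil.copyfileobj(response, out)
    return dest


if __name__ == "__main__":
    # Only prints the URL; call download_dump("cs") to actually fetch it.
    print(dump_url("cs"))
```

Calling `download_dump("cs")` would fetch the Czech dump, which you can then pass to the extractor command above.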
