This is a demonstration of a language guesser based on the method proposed in
Cavnar and Trenkle, "N-Gram-Based Text Categorization".
It is implemented in Perl. The Perl script is available here, free of
charge, under the terms of the GPL. No commercial version is
available. See also: the competitors.
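The n-gram approach named above can be illustrated with a minimal sketch. It is written in Python rather than Perl and is not the script offered on this page. The function names, the underscore padding, the n-gram lengths 1 to 5, and the profile size of 400 are all assumptions chosen for illustration. Each text is reduced to a ranked list of its most frequent character n-grams, and a document is assigned to the language whose ranked list differs least from its own, using an "out-of-place" rank distance.

```python
import re
from collections import Counter


def ngram_profile(text, max_n=5, top_k=400):
    """Return the top_k most frequent character n-grams, most frequent first.

    max_n and top_k are illustrative defaults, not values taken from the script.
    """
    counts = Counter()
    # Keep runs of letters only; digits and punctuation act as separators.
    for word in re.findall(r"[^\W\d_]+", text.lower()):
        padded = "_" + word + "_"  # mark word boundaries (assumed convention)
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count; break ties alphabetically so results are stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from lang_profile get a fixed penalty."""
    rank = {gram: r for r, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(r - rank[gram]) if gram in rank else max_penalty
        for r, gram in enumerate(doc_profile)
    )


def guess_language(text, profiles):
    """Return the key of the profile with the smallest out-of-place distance."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))


if __name__ == "__main__":
    # Toy training texts (hypothetical); real profiles need far more text.
    samples = {
        "english": "the quick brown fox jumps over the lazy dog and the cat sleeps in the house",
        "dutch": "de snelle bruine vos springt over de luie hond en de kat slaapt in het huis",
    }
    profiles = {lang: ngram_profile(text) for lang, text in samples.items()}
    print(guess_language("the dog is in the house", profiles))
```

With profiles built from such short samples the guesses are unreliable; the sketch only shows the shape of the computation.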
LIST OF LANGUAGES currently supported.
Note that some languages are supported only in certain encodings, in which case the encoding is appended to the language name:
- afrikaans
- albanian
- amharic-utf
- arabic-iso8859_6
- arabic-windows1256
- armenian
- basque
- belarus-windows1251
- bosnian
- breton
- bulgarian-iso8859_5
- catalan
- chinese-big5
- chinese-gb2312
- croatian-ascii
- czech-iso8859_2
- danish
- dutch
- english
- esperanto
- estonian
- finnish
- french
- frisian
- georgian
- german
- greek-iso8859-7
- hawaian
- hebrew-iso8859_8
- hindi
- hungarian
- icelandic
- indonesian
- irish
- italian
- japanese-euc_jp
- japanese-shift_jis
- korean
- latin
- latvian
- lithuanian
- malay
- marathi
- mf
- middle_frisian
- mingo
- nepali
- norwegian
- persian
- polish
- portuguese
- quechua
- romanian
- russian-iso8859_5
- russian-koi8_r
- russian-windows1251
- sanskrit
- scots
- scots_gaelic
- serbian-ascii
- slovak-ascii
- slovak-windows1250
- slovenian-ascii
- slovenian-iso8859_2
- spanish
- swahili
- swedish
- tagalog
- tamil
- thai
- turkish
- ukrainian-koi8_r
- vietnamese
- welsh
- yiddish-utf
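Several entries in the list above carry an encoding suffix after the language name (for example `russian-koi8_r` or `greek-iso8859-7`). As a small sketch of how such labels might be taken apart, assuming the language name never contains a hyphen and everything after the first hyphen is the encoding, the following Python helper (a hypothetical name, not part of the Perl script) splits a label and groups encodings by language.

```python
from collections import defaultdict


def split_label(label):
    """Split 'russian-koi8_r' into ('russian', 'koi8_r').

    The encoding is None when the label has no hyphen. Splitting happens at
    the first hyphen only, so 'greek-iso8859-7' keeps 'iso8859-7' intact.
    """
    language, sep, encoding = label.partition("-")
    return language, (encoding if sep else None)


def encodings_by_language(labels):
    """Map each language to the list of encodings it is listed with."""
    grouped = defaultdict(list)
    for label in labels:
        language, encoding = split_label(label)
        if encoding is not None:
            grouped[language].append(encoding)
    return dict(grouped)


if __name__ == "__main__":
    labels = ["english", "russian-iso8859_5", "russian-koi8_r", "greek-iso8859-7"]
    print(encodings_by_language(labels))
```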
Gertjan van Noord
Last modified: Tue Jan 5 15:07:42 MET 1999