Hi. Taking advantage of a word list compiled from one million words of a Portuguese newspaper, I filtered, and filtered, and filtered out the garbage (and I strongly suspect it still has a lot of garbage) to generate a Portuguese dictionary for OpenMoko’s Illume keyboard.
Since UTF-8 support is still borked, I have replaced special characters like é with a plain e. Yeah, it’s like using a US keyboard to write Portuguese, but one’s gotta work with the eggs one has in order to make an omelet.
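For anyone wanting to do the same replacement on another word list, here is a minimal sketch in Python of one way to do it. It is not necessarily how the file above was produced. It assumes the input is simply a list of words; the function names `strip_accents` and `ascii_word_list` are made up for illustration, and the actual dictionary file format is not handled here.

```python
import unicodedata


def strip_accents(word: str) -> str:
    """Replace accented letters with their plain base letter (é -> e, ç -> c)."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", word)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def ascii_word_list(words):
    """Strip accents from every word, dropping duplicates that appear as a
    result (e.g. 'país' and 'pais' both become 'pais'). Order is preserved."""
    seen = set()
    result = []
    for word in words:
        plain = strip_accents(word)
        if plain not in seen:
            seen.add(plain)
            result.append(plain)
    return result


if __name__ == "__main__":
    print(ascii_word_list(["país", "pais", "coração", "é"]))
```

Note that stripping accents can merge distinct words into one entry, which is why the sketch removes duplicates afterwards.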
Enjoy: Portuguese (ASCII).dic
Just do:
bunzip2 "Portuguese (ASCII)-0.1.0.dic.bz2"
scp "Portuguese (ASCII)-0.1.0.dic" root@192.168.0.202:Portuguese\ \(ASCII\).dic
ssh root@192.168.0.202 mv Portuguese\ \(ASCII\).dic \
/usr/lib/enlightenment/modules/illume/dicts/Portuguese\ \(ASCII\).dic
With a lot of thanks to Alberto Simões for pointing me to http://www.linguateca.pt/ACDC/ and Rasterman for the hints about the (quite simple) file format.
Hi, where did you find the name (Portuguese\ \(ASCII\).dic) to use for the dict?
That file name is the name I gave to the word list I link to at the beginning, after 5 steps of filtering…
You could also make a file where all characters like ‘é’, ‘è’, etc. are simply converted to ‘e’ for use in SMS; this allows for more characters in an SMS.
Ha, ignore my previous post. I’m working on a similar list for Dutch and, based on http://en.wikipedia.org/wiki/Short_message_service, I’ve decided to generate three versions: 7-bit, 8-bit and 16-bit. I ran into the same problem and am also considering a 4th version like the one you did for a US keyboard.
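To sort words into the 7-bit and 16-bit versions, one option is to check each word against the GSM 7-bit default alphabet. The sketch below is only an illustration: the names `GSM7_BASIC`, `fits_gsm7` and `max_single_sms_chars` are made up, the extension table (characters such as € or [ that cost two septets) is ignored, and the 8-bit encoding is left out.

```python
# Basic character set of the GSM 7-bit default alphabet (ESC omitted).
# Assumption: the extension table is deliberately not covered here.
GSM7_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ"
    "ÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)


def fits_gsm7(text: str) -> bool:
    """True if every character is in the GSM 7-bit basic character set."""
    return all(ch in GSM7_BASIC for ch in text)


def max_single_sms_chars(text: str) -> int:
    """Character limit of a single SMS for this text: 160 with the 7-bit
    alphabet, 70 once any character forces the 16-bit (UCS-2) encoding."""
    return 160 if fits_gsm7(text) else 70


if __name__ == "__main__":
    for word in ["cafe", "café", "coração", "België"]:
        print(word, fits_gsm7(word), max_single_sms_chars(word))
```

Some accented letters, such as é and è, are already in the 7-bit alphabet. Others, such as ã, ç and ë, are not, and a single one of them switches the whole message to the 70-character limit.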
In the dictionary file, I replaced all the accented characters with unaccented ones (e.g., ‘é’ with ‘e’) because I read somewhere that these kinds of characters weren’t yet well supported.
If I missed any, please let me know.
I also made a new Default.kbd with a few more characters commonly used (at least by me) in SMS/texting.