[mkgmap-dev] [PATCH] Alpha code for Highway Symbols
From Marko Mäkelä marko.makela at iki.fi on Mon Apr 6 13:57:57 BST 2009
On Mon, Apr 06, 2009 at 02:38:15PM +0200, Johann Gail wrote:
> \u Syntax is java Syntax, and is *NOT* UTF8-Encoding!

Correct. For example, \u2020 (the dagger symbol, †) would be \xe2\x80\xa0
or \342\200\240 in the UTF-8 encoding, and \x20\x20 or \40\40 in UTF-16
(regardless of whether big or little endian, in this case). The octal and
hex notations denote 8-bit byte values. I think it is much more readable
to write \u2020 for U+2020 than \xe2\x80\xa0. The \u notation will
apparently also be in the next C and C++ syntax.

> Both of them are unicode, but the encoding scheme is different. At the
> moment it works fine, if you use an editor, which can handle unicode
> properly.

I'm not sure I understand your comment. My understanding is that
java.lang.String uses something like UTF-16 internally. I have never seen
a text file containing Unicode characters that was encoded in anything
other than UTF-8. As far as I understand, the MySQL database (which I
develop for a living) accepts UTF-16 string literals (called "ucs2"), but
the bug reports I've seen have always been in ASCII, ISO 8859-1, or UTF-8.

> But it is good idea, instead of introducing a new proprietary ~[xx]
> style, use an existing standard, as e.g. the \u4 notation.

That was exactly my point. It should be trivial to implement all three
notations (\x hex bytes, \ octal bytes, \u hex Unicode).

Marko
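The dagger example and the claim that all three notations are easy to support can be checked with a short sketch. It is written in Python for brevity (mkgmap itself is Java). The helper name `unescape_label`, the regular expression, and the policy of expanding `\u` to UTF-8 bytes and then decoding the whole label as UTF-8 are assumptions for illustration only, not mkgmap's actual style-file handling.

```python
import re

# Hypothetical helper, not mkgmap's API: expands three escape notations
# in a label string.
#   \xHH    one byte given as two hex digits
#   \OOO    one byte given as one to three octal digits (at most \377)
#   \uHHHH  one Unicode code point given as four hex digits,
#           expanded to its UTF-8 bytes
# Assumption: the collected bytes are decoded as UTF-8 at the end, so
# \u2020, \xe2\x80\xa0 and \342\200\240 all yield the same character.
# This sketch has no escape for a literal backslash.
_ESCAPE = re.compile(
    r"\\(?:x([0-9A-Fa-f]{2})|u([0-9A-Fa-f]{4})|([0-3]?[0-7]{1,2}))"
)


def unescape_label(text: str) -> str:
    """Expand hex-byte, octal-byte and Unicode escapes in text."""
    out = bytearray()
    pos = 0
    for m in _ESCAPE.finditer(text):
        # Copy the literal text before this escape as UTF-8 bytes.
        out += text[pos:m.start()].encode("utf-8")
        hex_byte, code_point, octal_byte = m.groups()
        if hex_byte is not None:
            out.append(int(hex_byte, 16))
        elif code_point is not None:
            # A surrogate code point (D800-DFFF) raises UnicodeEncodeError here.
            out += chr(int(code_point, 16)).encode("utf-8")
        else:
            out.append(int(octal_byte, 8))
        pos = m.end()
    out += text[pos:].encode("utf-8")
    # Byte escapes that do not form valid UTF-8 raise UnicodeDecodeError.
    return out.decode("utf-8")


if __name__ == "__main__":
    dagger = "\u2020"  # Python, like Java, accepts \u escapes in string literals
    # The byte sequences quoted in the message:
    assert dagger.encode("utf-8") == b"\xe2\x80\xa0" == b"\342\200\240"
    assert dagger.encode("utf-16-be") == dagger.encode("utf-16-le") == b"\x20\x20"
    # All three notations decode to the same character.
    for escaped in (r"\u2020", r"\xe2\x80\xa0", r"\342\200\240"):
        assert unescape_label(escaped) == dagger
    ch = unescape_label(r"\342\200\240")
    print(f"U+{ord(ch):04X}")  # prints: U+2020
```

Treating every escape as bytes and decoding once at the end is what lets \x and octal escapes spell out a multi-byte UTF-8 character, as in the dagger example. A parser could instead read \xHH as the code point U+00HH, as Python string literals do, which would turn \xe2\x80\xa0 into three separate characters rather than a dagger.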