Rev 1519
Last modification
- Rev 3319 2014-07-31 17:47:44
- Author: steve
- Log message:
- Give a warning for probable cases of files with wrong encoding.
The default reading routines in Java replace bad characters in Unicode input
with the Unicode replacement character U+FFFD. We detect this in the
TokenScanner and print a message with the file name and line number. This will
apply to all files that are read using TokenScanner, so probably everything
other than the maps themselves.
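
As an illustration of the idea only, here is a minimal sketch of this kind of check. It is not the actual TokenScanner code: the class name ReplacementCharCheck, the method findWarnings, the message format and the file name example.txt are all assumptions made up for this example. It reads the input through a plain InputStreamReader, which substitutes U+FFFD for malformed bytes, and reports each line where that character shows up.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the real code base.
class ReplacementCharCheck {
    // The Unicode replacement character that the decoder substitutes
    // for byte sequences that are not valid in the expected encoding.
    static final char REPLACEMENT = '\uFFFD';

    // Decodes the bytes as UTF-8 and returns one warning per line that
    // contains U+FFFD. Line numbers are 1-based.
    static List<String> findWarnings(String fileName, byte[] data) {
        List<String> warnings = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8))) {
            String line;
            int lineNo = 0;
            while ((line = reader.readLine()) != null) {
                lineNo++;
                if (line.indexOf(REPLACEMENT) >= 0) {
                    warnings.add(fileName + ":" + lineNo
                            + ": bad character, file is probably in the wrong encoding");
                }
            }
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return warnings;
    }

    public static void main(String[] args) {
        // Line 2 holds an e-acute written as ISO-8859-1, which is not valid UTF-8.
        byte[] data = "ok\ncaf\u00e9\n".getBytes(StandardCharsets.ISO_8859_1);
        for (String warning : findWarnings("example.txt", data)) {
            System.out.println(warning);
        }
    }
}
```

The sketch relies only on the decoder's default replacement behaviour, so it cannot tell a decoding failure apart from a U+FFFD that was really present in the file.
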
It is possible for a file that is in the wrong encoding to still be valid UTF-8,
although this would be rare. You can always add a comment containing some
accented characters, which is likely to fail to decode when the file is in the
incorrect encoding.
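
To illustrate why such a comment is a useful canary, here is a small sketch. The class name CanaryCommentDemo, the method decodesCleanly and the sample comment text are hypothetical. It decodes three byte arrays as UTF-8: a pure ASCII file, which decodes cleanly whatever single-byte encoding it was saved in, and the same accented comment saved once as ISO-8859-1 and once as UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration only.
class CanaryCommentDemo {
    // True if decoding the bytes as UTF-8 produces no U+FFFD.
    // new String(bytes, charset) always replaces malformed input
    // with the charset's default replacement string.
    static boolean decodesCleanly(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8).indexOf('\uFFFD') < 0;
    }

    public static void main(String[] args) {
        String comment = "# canary: \u00e9\u00e0\u00fc\n"; // accented letters

        // ASCII-only content is identical in ISO-8859-1 and UTF-8,
        // so a wrong encoding goes unnoticed.
        byte[] ascii = "# plain comment\n".getBytes(StandardCharsets.ISO_8859_1);
        // The accented comment saved in the wrong encoding.
        byte[] wrong = comment.getBytes(StandardCharsets.ISO_8859_1);
        // The same comment saved correctly.
        byte[] right = comment.getBytes(StandardCharsets.UTF_8);

        System.out.println("ascii: " + decodesCleanly(ascii));
        System.out.println("wrong: " + decodesCleanly(wrong));
        System.out.println("right: " + decodesCleanly(right));
    }
}
```

Only the accented comment saved as ISO-8859-1 produces a replacement character, which is what makes the wrong encoding visible.
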
This does mean that you can't have a genuine replacement character in your input.
I don't think that this is likely to be a real problem, but we did have a few
ourselves already, which have been removed by this patch.