Repairing XML files from Windows to Mac

wilkolak (joined Sep 10, 2009)

Hi,
I recently bought CivIV. I'm from Poland, and I like to play games in my own language. The Polish community made a translation of CivIV and I wanted to apply it, but it's a Windows-centric community... After I applied it, the game started showing error messages on first launch (see the attachments). Once I confirm all the messages (6 of them, related to 3 files in vanilla, and many more in Warlords), the game loads and... some texts are missing. It is playable, but I don't want to sometimes see 'TXT_SOMETHING_TEXT' where I should be reading what a unit does...

My question is: how can I fix those files? I know a little about programming, so I'm not afraid of any terminal-related stuff or Xcode work.

Hoping for a quick answer. :)
 

Attachments: error1.png, error2.png

The problem is that the Windows-created text files include characters that the Mac considers illegal. The XML files used in Civ4 typically specify in the first line:

<?xml version="1.0" encoding="ISO-8859-1"?>

This encoding standard (also known as 'Latin1') uses 8 bits for each character code, so code values run from 0 to 255. The ranges 0-31 and 128-159 are non-printable control characters (160 is the non-breaking space). XML forbids the 0-31 controls apart from tab, line feed and carriage return, and the 128-159 controls are discouraged in XML and invalid in HTML. However, Polish Windows users may be editing the files with a text editor that inserts Windows Code Page 1250 characters, and CP1250 uses the 128-159 range for printable characters such as smart quotes and dashes. They seem to get away with this on Windows, but the Mac throws errors.
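A minimal Python sketch for locating the offending bytes before editing anything: it reads each file as raw bytes and reports the line, column and value of every C0 control that XML forbids and every byte in 128-159 (the C1 control range under Latin-1). The function name `find_suspect_bytes` is my own, and the sketch assumes one byte per character, which holds for Latin-1 and CP1250 files.

```python
import sys
from pathlib import Path

# C0 controls that XML 1.0 forbids outright: everything below 32 except tab, LF, CR.
FORBIDDEN = set(range(0x00, 0x20)) - {0x09, 0x0A, 0x0D}
# 128-159: control codes under ISO-8859-1, but punctuation (and a few
# letters) when a CP1250 editor writes them.
C1_CONTROLS = set(range(0x80, 0xA0))


def find_suspect_bytes(data: bytes) -> list[tuple[int, int, int]]:
    """Return (line, column, byte value) for every suspect byte, 1-based."""
    hits = []
    line, col = 1, 0
    for b in data:
        col += 1
        if b in FORBIDDEN or b in C1_CONTROLS:
            hits.append((line, col, b))
        if b == 0x0A:  # line feed: start counting a new line
            line, col = line + 1, 0
    return hits


if __name__ == "__main__":
    # Pass one or more XML file names on the command line.
    for name in sys.argv[1:]:
        for line, col, b in find_suspect_bytes(Path(name).read_bytes()):
            print(f"{name}:{line}:{col}: byte 0x{b:02X}")
```

Save it under any name and run it with the XML files as arguments; each reported position can then be fixed by hand in TextWrangler or BBEdit.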

I use a brute-force approach to get files working for English-speaking users: I replace every character code above 127 with a near-equivalent below 128, or sometimes simply delete it! Both are fairly easy to do in my BBEdit text editor. However, this is not going to be satisfactory for non-English use.
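For anyone who would rather script the brute-force approach than do it in BBEdit, here is a minimal Python sketch. The `NEAR_ASCII` table and the name `fold_to_ascii` are my own assumptions, not a description of what BBEdit does internally; accented letters fall back to their base letter via Unicode NFKD decomposition, and anything with no ASCII equivalent is deleted.

```python
import unicodedata

# Hand-picked near-equivalents for common Windows punctuation
# (an assumed, non-exhaustive table; extend it as needed).
NEAR_ASCII = {
    "\u2018": "'", "\u2019": "'",                  # single smart quotes
    "\u201A": ",",                                 # single low-9 quote
    "\u201C": '"', "\u201D": '"', "\u201E": '"',   # double smart quotes
    "\u2013": "-", "\u2014": "-",                  # en and em dash
    "\u2026": "...",                               # ellipsis
    "\u00A0": " ",                                 # non-breaking ("hard") space
}


def fold_to_ascii(text: str) -> str:
    """Replace every non-ASCII character with a near-equivalent, or drop it."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        elif ch in NEAR_ASCII:
            out.append(NEAR_ASCII[ch])
        else:
            # NFKD splits an accented letter into base letter + combining mark;
            # keeping only the ASCII part turns "é" into "e". Characters with
            # no ASCII decomposition end up as an empty string (deleted).
            decomposed = unicodedata.normalize("NFKD", ch)
            out.append(decomposed.encode("ascii", "ignore").decode("ascii"))
    return "".join(out)
```

A file saved by a CP1250 editor could be read with `Path(name).read_bytes().decode("cp1250", errors="replace")`, folded, and written back with `.encode("ascii")`. Note that a letter such as "ł" has no decomposition and is simply deleted, which is exactly why this is unsuitable for Polish text.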

You may be able to find and replace the codes in the 128-159 range of code page 1250. But if you need to support the full Polish alphabet, you probably need to change the declared encoding of the files to UTF-8 and use a Unicode-compliant text editor to re-encode the text as UTF-8.
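The re-encoding step can be sketched in Python as below, assuming the source file really is CP1250 and starts with an XML declaration like the one quoted above. The function name `cp1250_to_utf8` is hypothetical, and I have not verified that the Mac version of Civ4 reads UTF-8 XML, so try it on a copy first.

```python
import re

# Matches the encoding pseudo-attribute in the XML declaration.
DECL_ENCODING = re.compile(r'(<\?xml[^>]*encoding=["\'])[^"\']*(["\'])')


def cp1250_to_utf8(data: bytes) -> bytes:
    """Re-encode Windows-1250 XML bytes as UTF-8 and update the declaration.

    Bytes that CP1250 leaves undefined (0x81, 0x83, 0x88, 0x90, 0x98) make
    decode() raise UnicodeDecodeError instead of being silently guessed at.
    """
    text = data.decode("cp1250")
    # Only the first match is replaced, i.e. the declaration on line 1.
    text = DECL_ENCODING.sub(r"\1UTF-8\2", text, count=1)
    return text.encode("utf-8")
```

Usage: `Path(out).write_bytes(cp1250_to_utf8(Path(src).read_bytes()))`. Reading and writing raw bytes keeps the original Windows line endings intact.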
 
Thanks for the quick response. The Polish-specific characters weren't a problem: the translators wrote them as HTML entities, so UTF-8 wasn't needed.
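Writing non-ASCII letters as character references keeps the file pure ASCII, so the declared encoding no longer matters for them. A minimal Python sketch of that conversion, using the standard `xmlcharrefreplace` error handler; I don't know whether the translators used numeric or named entities, but XML itself predefines only five named entities (`&amp;`, `&lt;`, `&gt;`, `&quot;`, `&apos;`), so numeric references are the portable choice unless the file's DTD declares others. The function name is my own.

```python
def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with an XML numeric character reference.

    For example, "ą" (U+0105) becomes "&#261;"; ASCII text passes through unchanged.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

Because the result contains only ASCII, tools like Zap Gremlins have nothing left to strip from the translated strings.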
I think I've found a solution by playing with the options in TextWrangler. One of them has a funny name, 'Zap Gremlins', and it turned out to be exactly what I was looking for. I clicked it for fun... and it found all the non-ASCII characters, such as hard (non-breaking) spaces, that had ended up in the files God only knows how.
I've repaired the vanilla files that way; now it's time for Warlords...
 
Zap Gremlins is what I use; that's the BBEdit feature I was referring to (BBEdit is in the same family as TextWrangler).

Zap Gremlins does find those odd punctuation characters like smart quotes and long dashes. I think those are in the problematic 128-159 range. However, it changes *all* the characters above 127, including valid ISO-8859-1 accented characters. I assumed that would have a bad effect on the Polish characters as well.
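A more selective alternative, sketched in Python under the assumption that the file is otherwise valid ISO-8859-1: only bytes 128-159 are rewritten, each read as CP1250 and turned into a numeric character reference, so accented Latin-1 characters in 160-255 are left alone. Reading those bytes as CP1250 also preserves the Polish letters CP1250 keeps in that range (Ś, ś, Ź, ź). The function name `fix_c1_bytes` is hypothetical.

```python
def fix_c1_bytes(data: bytes) -> bytes:
    """Rewrite bytes 128-159 as numeric character references.

    Each such byte is decoded as CP1250 (where it is punctuation or a letter
    such as 'Ś'); all other bytes, including 160-255, are copied unchanged.
    Bytes that CP1250 leaves undefined are dropped.
    """
    out = bytearray()
    for b in data:
        if 0x80 <= b <= 0x9F:
            try:
                ch = bytes([b]).decode("cp1250")
            except UnicodeDecodeError:
                continue  # undefined in CP1250: delete it
            out += f"&#{ord(ch)};".encode("ascii")
        else:
            out.append(b)
    return bytes(out)
```

For example, a CP1250 opening smart quote (byte 0x93) becomes `&#8220;`, while a Latin-1 "é" (byte 0xE9) is kept as is.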
 
Yes, it would. Deadly bad, I think. :P But fortunately, for some Vista 64-related reason, there aren't any of them in the files, just plain HTML entities.
Thanks again for your support.
 