I hope y'all will pardon my posting in this forum. I'm not a modder (yet), and this is not exactly a tutorial, but it does contain information that may be of use to modders, and I hope you will all refer to it. If I've missed my mark, then I pray a kind moderator will move it to a more appropriate location.
BACKGROUND:
I'm running Civ4 on a computer with a Japanese version of Windows XP. This is pretty much unavoidable: I bought my computer in Japan, and don't really feel like shelling out the extra bucks for an English OS. And as a native speaker of English, I'd just as soon not play Civ4 in Japanese.
As some of you may know, Japanese versions of Windows (like those for certain other languages) encode text using two bytes for each Japanese character; on Windows XP this is code page 932, Microsoft's variant of Shift_JIS. This doesn't affect common ASCII characters, such as the ones you're reading now. But certain byte values that Western-language systems use for accented letters are treated by the Japanese OS as the first half of a two-byte Japanese character, so they can't be used the way they are on most Western-language systems.
Why should you care about this? Well, among the co-opted combinations are a number of strings commonly found in XML files, particularly those which include foreign language translations. For example, the TAM_Eras.xml file included with The Ancient Mediterranean modpack contains the string
<Text>Antiquité</Text>
as its French translation of the English phrase "Stone Age." And it just so happens that an acute-accented "e" followed by a less-than sign is read as one of those two-byte combinations, representing some obscure Japanese character or another. The parser no longer sees the "<" that should open the closing tag, so when it gets to that point, it chokes and stops loading the file. Civ4 gamely tries to run the mod, but text errors abound, making the mod unplayable.
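For the curious, the clash can be reproduced in a few lines of Python. This is a minimal sketch under two assumptions: that the mod's XML is saved as ISO-8859-1 (Latin-1), where é is the single byte 0xE9, and that Python's built-in "cp932" codec is a fair stand-in for how a Japanese system reads those bytes. Civ4's own parser may differ in the details; the helper name is_cp932_lead_byte is made up for illustration.

```python
# Sketch of why "é<" trips up a Japanese-locale reader.
# Assumptions: the XML file is saved as Latin-1 (ISO-8859-1), and Python's
# "cp932" codec stands in for the Japanese Windows decoder.

def is_cp932_lead_byte(b: int) -> bool:
    """True if byte b starts a two-byte character in cp932 (Shift_JIS)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC

line = "<Text>Antiquité</Text>"
raw = line.encode("latin-1")   # é is stored as the single byte 0xE9
pos = raw.index(b"\xe9")

print(raw[pos:pos + 2])              # b'\xe9<'
print(is_cp932_lead_byte(raw[pos]))  # True: 0xE9 opens a two-byte character

# cp932 allows only 0x40-0x7E and 0x80-0xFC as the second byte, and "<" is
# 0x3C, so a strict decoder refuses the line as written.
try:
    raw.decode("cp932")
except UnicodeDecodeError as err:
    print("cp932 decode failed at byte offset", err.start)
```

Plain ASCII letters never trigger this, because every ASCII byte is below 0x80 and so can never be a lead byte.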
I've found one surefire way around this problem: replace the problematic characters with the appropriate character codes (numeric character references, in XML terms). For example, the accented e in the above string could (and indeed should) be represented by the following string of characters:
<Text>Antiquit&#233;</Text>
By going into the XML files and making these changes manually, I can fix the mod so the XML parser will load it without problems -- and because the parser turns &#233; back into "é", the French translation for "Stone Age" should still display correctly if I ever wished to run my copy of Civ4 in French.
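Looking up each code by hand gets old fast. Python's standard "xmlcharrefreplace" error handler produces exactly this kind of decimal code for every non-ASCII character. A minimal sketch (escape_non_ascii is just an illustrative name, and "Frühzeit" is a made-up German sample):

```python
def escape_non_ascii(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric
    character reference such as &#233;, leaving ASCII untouched."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

print(escape_non_ascii("<Text>Antiquité</Text>"))  # <Text>Antiquit&#233;</Text>
print(escape_non_ascii("<Text>Frühzeit</Text>"))   # <Text>Fr&#252;hzeit</Text>
```

The result is pure ASCII, so it reads the same no matter which code page the player's OS uses.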
WHY SHOULD YOU CARE?
By now you may be wondering why you've bothered reading this far. Well, you should care if you really want people everywhere to play your mods.
I can state from experience (having manually repaired all the mods that shipped with Civ4 Complete) that fixing a mod this way involves a fair bit of slog-work. Often several different files need to be fixed; fortunately the XML parser tells us which ones contain errors.
But since the parser only reports the first error it encounters in any given file, there's no way of knowing going in how many errors, or of what type, I'll need to fix. I may know about the acute-accented "e" in the French translation on line 16, but what about the umlauted "u" in the German translation on line 1256? So I have two choices: either (A) eyeball the whole file looking for all possible errors, or (B) fix the one I know about and load the mod again to see if the XML parser chokes on another error.
As I'm sure you can imagine, this takes a bit of time. It's not uncommon to have to spend an hour fixing accented characters before I can start playing a new mod. Not quite the entertainment I was hoping for! And it's not just once: every time you release an update to your mod, I'll have to go through and fix the same errors again, and again, and again.
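Option (A) can at least be automated. The sketch below lists every non-ASCII character in a file at once, with line, column, and the numeric code to use, and flags the ones sitting directly before a "<". It assumes the file is Latin-1 encoded; find_non_ascii and report are hypothetical names, not part of any Civ4 tool.

```python
from pathlib import Path

def find_non_ascii(text: str):
    """Yield (line_no, column, char, before_lt) for each non-ASCII character.

    before_lt is True when the character sits directly before a "<",
    the case this post identifies as the one that breaks parsing.
    """
    line_no, col = 1, 0
    for i, ch in enumerate(text):
        if ch == "\n":
            line_no, col = line_no + 1, 0
            continue
        col += 1
        if ord(ch) > 0x7F:
            yield line_no, col, ch, text[i + 1:i + 2] == "<"

def report(path: str, encoding: str = "latin-1") -> None:
    """Print every non-ASCII character in one file, with its numeric code."""
    # Assumption: the file is Latin-1 encoded; adjust `encoding` if not.
    text = Path(path).read_text(encoding=encoding)
    for line_no, col, ch, before_lt in find_non_ascii(text):
        flag = "  <-- directly before '<'" if before_lt else ""
        print(f"{path}:{line_no}:{col}: {ch!r} -> &#{ord(ch)};{flag}")
```

Usage would be something like report("TAM_Eras.xml"), with the path adjusted to wherever the mod's XML actually lives.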
Yes, I can probably cut down on the slog by creating a macro that will scan a batch of files and fix all accented characters automatically. But while that will fix the problem for me, what about all the other folks who would love to play your mod and don't possess my degree of kung fu? Most likely they'll download and install it, say "Bah, it's broken!" and never give it the chance it deserves. Are you prepared to write them off?
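For what it's worth, such a macro needn't be elaborate. Here is a rough sketch that walks a mod folder, escapes every non-ASCII character in each .xml file, and keeps a .bak copy of anything it changes. It assumes the files are stored as ISO-8859-1 (Latin-1) with no byte-order mark, which is worth checking against each file's encoding declaration first; fix_mod_folder is a made-up name.

```python
from pathlib import Path

def escape_non_ascii(text: str) -> str:
    """Turn every non-ASCII character into a numeric reference like &#233;."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

def fix_mod_folder(root: str, encoding: str = "latin-1") -> list:
    """Escape non-ASCII characters in every .xml file under root.

    Assumption: the files are stored in `encoding` (Latin-1 by default).
    Bytes are read and written directly, so line endings are preserved.
    Returns the files that were changed; each one gets a .bak copy first.
    """
    changed = []
    # sorted() collects the file list up front, so the .bak files written
    # below never show up mid-iteration.
    for path in sorted(Path(root).rglob("*.xml")):
        raw = path.read_bytes()
        fixed = escape_non_ascii(raw.decode(encoding)).encode("ascii")
        if fixed != raw:
            path.with_name(path.name + ".bak").write_bytes(raw)
            path.write_bytes(fixed)
            changed.append(path)
    return changed
```

Running it a second time is harmless: the escaped files are already pure ASCII, so nothing changes and the .bak copies are left alone. Since ASCII is a subset of Latin-1, an existing encoding="ISO-8859-1" declaration stays accurate.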
WHAT NEEDS TO BE DONE?
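If the XML files in your mod contain foreign-language translations (French, Spanish, German, and Italian are the most common culprits), then they need to be fixed so they can be parsed by systems that use two-byte encodings. Accented characters should be replaced by the appropriate character codes, which are the same codes HTML browsers use to display them correctly. Use the numeric form (such as &#233;) rather than the named form (such as &eacute;): XML predefines only five named entities (&lt;, &gt;, &amp;, &quot;, and &apos;), so a named one like &eacute; causes a parse error of its own. The list I usually refer to is this one right here: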
http://www.w3.org/MarkUp/html-spec/html-spec_13.html
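Python's built-in XML parser makes it easy to confirm that a numeric reference is decoded back to the accented letter, while the HTML name &eacute; is rejected as an undefined entity. This uses Python's standard xml.etree.ElementTree purely as a stand-in; it is not the parser Civ4 uses.

```python
import xml.etree.ElementTree as ET

# Numeric reference: decoded back to "é" by any conforming XML parser.
ok = ET.fromstring("<Text>Antiquit&#233;</Text>")
print(ok.text)  # Antiquité

# Named HTML entity: XML predefines only &lt; &gt; &amp; &quot; &apos;,
# so without a DTD declaring it, &eacute; is a well-formedness error.
try:
    ET.fromstring("<Text>Antiquit&eacute;</Text>")
except ET.ParseError as err:
    print("rejected:", err)
```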
Ideally these codes should be used to represent all accented characters (and special characters) in the XML files. In practice, however, it's generally only necessary to replace characters that come directly before a less-than sign. (At least, that's what works for me. I can't be certain how systems running in other languages are affected.)
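For anyone who prefers the minimal approach and wants to touch only the characters that directly precede a less-than sign, a regular expression with a lookahead does it. This is a sketch, not something tested against Civ4 itself; escape_before_lt is an illustrative name, and "Préhistoire" is just a sample string.

```python
import re

# Any single non-ASCII character that is immediately followed by "<".
# The lookahead (?=<) checks for the "<" without consuming it.
_BEFORE_LT = re.compile(r"[^\x00-\x7F](?=<)")

def escape_before_lt(text: str) -> str:
    """Escape only non-ASCII characters that sit directly before '<'."""
    return _BEFORE_LT.sub(lambda m: f"&#{ord(m.group())};", text)

print(escape_before_lt("<Text>Antiquité</Text>"))    # <Text>Antiquit&#233;</Text>
print(escape_before_lt("<Text>Préhistoire</Text>"))  # unchanged: é is not before '<'
```

Escaping everything is the safer default; this narrower version just keeps the diff small for translators who want to read the accented text as-is.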
I hope you can agree it would be a good idea to fix this problem at the source. That way the fix only needs to be applied once, and it will then be fixed for everybody, in future versions as well.
To maximize the audience for your mod, please consider adopting this fix and making it part of your mod creation workflow.
And keep up the great work!