[Dev tool] Civilization 4 XML translation tool

dbkblk · Jan 24, 2015

Ok. I hope you'll fix it.

EDIT: Gamebryo lists "UTC-2" as its 2-byte encoding. I think that's a mistake; they mean UCS-2.

EDIT2: Continuing the brainstorming: Sylfaen.ttf seems limited to Latin-1 to Latin-9 plus Georgian, Armenian and Hebrew. WHAT IF... we change the default game font to one that covers full Unicode? I don't really know where to find a font with full coverage, as even Arial covers Arabic but not Chinese.

EDIT3: I discovered that the font in use is defined in "Civ4\Beyond the Sword\Resource\Themes\Civ4\Civ4Theme_Common.thm"

Code:

			with SF_CtrlTheme_Civ4_Control_Font
			{
				GFont	.Size0_Normal			=	GFont("Tahoma",		"Regular",		11, GFlags(GFontFeature, GFC_FONT_ALPHA));
				GFont	.Size0_Bold				=	GFont("Tahoma",		"Bold",			11, GFlags(GFontFeature, GFC_FONT_BOLD,	GFC_FONT_ALPHA));
				GFont	.Size0_Italic			=	GFont("Tahoma",		"Italic",		11, GFlags(GFontFeature, GFC_FONT_ITALIC, GFC_FONT_ALPHA), 0, GRectMargin(1));
				GFont	.Size0_BoldItalic		=	GFont("Tahoma",		"Bold Italic",	11, GFlags(GFontFeature, GFC_FONT_BOLD,	GFC_FONT_ITALIC, GFC_FONT_ALPHA), 0, GRectMargin(1));

				GFont	.Size1_Normal			=	GFont("Tahoma",		"Regular",		12, GFlags(GFontFeature, GFC_FONT_ALPHA));
				GFont	.Size1_Bold				=	GFont("Tahoma",		"Bold",			12, GFlags(GFontFeature, GFC_FONT_BOLD,	GFC_FONT_ALPHA));
				GFont	.Size1_Italic			=	GFont("Tahoma",		"Italic",		12, GFlags(GFontFeature, GFC_FONT_ITALIC, GFC_FONT_ALPHA), 0, GRectMargin(1));
				GFont	.Size1_BoldItalic		=	GFont("Tahoma",		"Bold Italic",	12, GFlags(GFontFeature, GFC_FONT_BOLD,	GFC_FONT_ITALIC, GFC_FONT_ALPHA), 0, GRectMargin(1));

				GFont	.Size2_Normal			=	GFont("Tahoma",		"Regular",		14, GFlags(GFontFeature, GFC_FONT_ALPHA));
				GFont	.Size2_Bold				=	GFont("Tahoma",		"Bold",			14, GFlags(GFontFeature, GFC_FONT_BOLD,	GFC_FONT_ALPHA));
				GFont	.Size2_Italic			=	GFont("Tahoma",		"Italic",		14, GFlags(GFontFeature, GFC_FONT_ITALIC, GFC_FONT_ALPHA), 0, GRectMargin(1));
//				GFont	.Size2_BoldItalic		=	GFont("Tahoma",		"Bold Italic",	14, GFlags(GFontFeature, GFC_FONT_BOLD,	GFC_FONT_ITALIC, GFC_FONT_ALPHA), 0, GRectMargin(1));

				GFont	.Size3_Normal			=	GFont("Tahoma",		"Regular",		15, GFlags(GFontFeature, GFC_FONT_ALPHA));
				GFont	.Size3_Bold				=	GFont("Tahoma",		"Bold",			15, GFlags(GFontFeature, GFC_FONT_BOLD,	GFC_FONT_ALPHA));
//				GFont	.Size3_Italic			=	GFont("Tahoma",		"Italic",		15, GFlags(GFontFeature, GFC_FONT_ITALIC, GFC_FONT_ALPHA), 0, GRectMargin(1));
//				GFont	.Size3_BoldItalic		=	GFont("Tahoma",		"Bold Italic",	15, GFlags(GFontFeature, GFC_FONT_BOLD,	GFC_FONT_ITALIC, GFC_FONT_ALPHA), 0, GRectMargin(1));

				GFont	.Size4_Normal			=	GFont("Tahoma",		"Regular",		18, GFlags(GFontFeature, GFC_FONT_ALPHA));
				GFont	.Size4_Bold				=	GFont("Tahoma",		"Bold",			18, GFlags(GFontFeature, GFC_FONT_BOLD,	GFC_FONT_ALPHA));
//				GFont	.Size4_Italic			=	GFont("Tahoma",		"Italic",		18, GFlags(GFontFeature, GFC_FONT_ITALIC, GFC_FONT_ALPHA), 0, GRectMargin(1));
//				GFont	.Size4_BoldItalic		=	GFont("Tahoma",		"Bold Italic",	18, GFlags(GFontFeature, GFC_FONT_BOLD,	GFC_FONT_ITALIC, GFC_FONT_ALPHA), 0, GRectMargin(1));
			}
			
			.Normal								=	SF_CtrlTheme_Civ4_Control_Font_Size1_Normal;
			.Bold								=	SF_CtrlTheme_Civ4_Control_Font_Size1_Bold;
			.Italic								=	SF_CtrlTheme_Civ4_Control_Font_Size1_Italic;
			.BoldItalic							=	SF_CtrlTheme_Civ4_Control_Font_Size1_BoldItalic;

Arakhor · Jan 24, 2015

The Rise of Mankind text could be a brighter colour, especially as the WordArt-style shadowing reduces the contrast even further. The background is very busy and probably unrealistic too, because I doubt you'd be able to see any stars when that close to the sun.

45°38'N-13°47'E · Jan 24, 2015

Talking about realism, the earth is a bit further from the sun... so realism shouldn't be an issue. But I think I agree on the text color.

dbkblk · Jan 24, 2015

I'll decrease the number of stars in the next version. However, did you actually try the animation, or is your comment based only on the image? The animation makes the color change. I do think the rendering of the logo could be improved, though: it is not easy to read in some circumstances.
Indeed, the earth is close to the sun.

Should I change that?

The thread to talk about the menu is here.

Nightinggale · Jan 24, 2015

dbkblk said:
EDIT3: I discovered that the font in use is defined by "Civ4\Beyond the Sword\Resource\Themes\Civ4\Civ4Theme_Common.thm"

That's odd. I would have assumed it used sylfaen.ttf :confused:

I have some bad news regarding switching the DLL to Unicode. I couldn't get CP65001 to work, so I went online to look for a solution. I found a thread from 2005 saying that it is buggy and that MS quickly "fixed" the bug by ignoring requests to switch to this code page. Even more interesting, it was updated in 2014 to say that nothing has changed and that 65001 is still broken.

I also learned that the OS-level code pages are hardcoded to use one or two bytes for each character, but not more than that. Take for instance CP932 (Japanese). Apart from the first 128 standard ASCII characters, nearly all characters use two bytes (the half-width katakana are the single-byte exception). The kana and kanji among them use 3 bytes in UTF-8. I think it's the same for Korean and Chinese.
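To make the byte counts concrete, here is a minimal C++ sketch. The helper name utf8Length is made up for illustration and is not part of the game DLL. It only computes how many bytes UTF-8 needs for one Unicode code point. For example, あ (U+3042) needs 3 bytes in UTF-8, while CP932 stores it as the two bytes 0x82 0xA0.

```cpp
#include <cstdint>

// Number of bytes UTF-8 needs to encode one Unicode code point.
// Returns 0 for surrogates and for values above U+10FFFF, which are
// not valid code points to encode.
// NOTE: utf8Length is a hypothetical helper, not something from the Civ4 SDK.
int utf8Length(std::uint32_t cp)
{
    if (cp < 0x80)    return 1;                 // ASCII
    if (cp < 0x800)   return 2;                 // Latin supplements, Greek, Cyrillic, Arabic, ...
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0; // UTF-16 surrogates
    if (cp < 0x10000) return 3;                 // rest of the BMP: kana, kanji, hangul, ...
    if (cp <= 0x10FFFF) return 4;               // supplementary planes
    return 0;
}
```

So utf8Length(0x3042) gives 3 and utf8Length(0xAC00) (Korean 가) also gives 3, whereas the corresponding legacy code pages spend two bytes on each.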

The big question now is: now what? The Asian mods likely switch to their local code page, or even just assume the computer uses that locale. It wouldn't bother those people to hardcode their game to just one language.

dbkblk · Jan 24, 2015

Nightinggale said:
That's odd. I would have assumed it used sylfaen.ttf

In fact, the default is Sylfaen, but when I used CivScale for some tests, it switched to Tahoma. I found the rendering much better with Calibri (at smaller sizes), but this font has only been included since Vista (so it needs a fallback for XP).

Nightinggale said:
I also learned that the OS-level code pages are hardcoded to use one or two bytes for each character, but not more than that. Take for instance CP932 (Japanese). Apart from the first 128 standard ASCII characters, nearly all characters use two bytes (the half-width katakana are the single-byte exception). The kana and kanji among them use 3 bytes in UTF-8. I think it's the same for Korean and Chinese. The big question now is: now what? The Asian mods likely switch to their local code page, or even just assume the computer uses that locale. It wouldn't bother those people to hardcode their game to just one language.

This explains why the Gamebryo engine is said to be limited to 2-byte characters. So we "only" need to expand the FontConversion trick to convert UTF-8 chars to the local code pages 932, 936, 949 and 950. If we manage this, I think we can both agree that we've greatly extended the font possibilities of the game.
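As a rough illustration of what such a conversion involves (not the mod's actual FontConversion code), the sketch below decodes UTF-8 into code points and looks each one up in a table to get a two-byte code. The function names and the three-entry kCp949Sample table are hypothetical; a real table for CP949 has thousands of entries, and 932, 936 and 950 would each need their own. It assumes a C++11 compiler.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Decode one UTF-8 sequence starting at s[pos] and advance pos past it.
// Returns U+FFFD on malformed input (minimal validation only).
std::uint32_t decodeUtf8(const std::string& s, std::size_t& pos)
{
    const unsigned char b0 = static_cast<unsigned char>(s[pos]);
    std::uint32_t cp = 0;
    int extra = 0;
    if (b0 < 0x80)                { cp = b0;        extra = 0; }
    else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; extra = 1; }
    else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; extra = 2; }
    else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; extra = 3; }
    else { ++pos; return 0xFFFD; }
    ++pos;
    for (int i = 0; i < extra; ++i) {
        if (pos >= s.size()) return 0xFFFD;
        const unsigned char b = static_cast<unsigned char>(s[pos]);
        if ((b & 0xC0) != 0x80) return 0xFFFD;
        cp = (cp << 6) | (b & 0x3F);
        ++pos;
    }
    return cp;
}

// HYPOTHETICAL three-entry excerpt of a Unicode -> CP949 table.
struct Mapping { std::uint32_t unicode; unsigned char lead; unsigned char trail; };
static const Mapping kCp949Sample[] = {
    { 0xAC00, 0xB0, 0xA1 }, // 가
    { 0xAC01, 0xB0, 0xA2 }, // 각
    { 0xD55C, 0xC7, 0xD1 }, // 한
};

// ASCII passes through, known characters become two bytes, the rest '?'.
std::string utf8ToCp949Sample(const std::string& utf8)
{
    std::string out;
    std::size_t pos = 0;
    while (pos < utf8.size()) {
        const std::uint32_t cp = decodeUtf8(utf8, pos);
        if (cp < 0x80) { out.push_back(static_cast<char>(cp)); continue; }
        bool found = false;
        for (const Mapping& m : kCp949Sample) {
            if (m.unicode == cp) {
                out.push_back(static_cast<char>(m.lead));
                out.push_back(static_cast<char>(m.trail));
                found = true;
                break;
            }
        }
        if (!found) out.push_back('?');
    }
    return out;
}
```

The linear search is only acceptable for a toy table; a full-size table would want a sorted array with binary search or the paged lookup tables that iconv uses.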

EDIT: Here is an Arabic screenshot! Nice, isn't it? Note that I had to set a font that supports Arabic characters in the theme file ("Simplified Arabic", which has been available since XP).

EDIT2: In fact, except for Hindi & Tamil (Unicode only), we've made it possible to play the game in all of the most influential languages in the world. Here's a list of new languages supported by the game.

dbkblk · Jan 25, 2015

@Nightinggale: While trying to adapt the code for Korean (Hangul), I've run into a little problem. As this is a two-byte encoding, the first pass detects the first byte and then dispatches the processing of the second byte based on the value of the first.
Then, to process the second byte, the iconv function is this one:

Code:

static int uhc_1_mbtowc(conv_t conv, ucs4_t *pwc, const unsigned char *s, int n)
{
	unsigned char c1 = s[0];
	if ((c1 >= 0x81 && c1 <= 0xa0)) {
		if (n >= 2) {
			unsigned char c2 = s[1];
			if ((c2 >= 0x41 && c2 < 0x5b) || (c2 >= 0x61 && c2 < 0x7b) || (c2 >= 0x81 && c2 < 0xff)) {
				unsigned int row = c1 - 0x81;
				unsigned int col = c2 - (c2 >= 0x81 ? 0x4d : c2 >= 0x61 ? 0x47 : 0x41);
				unsigned int i = 178 * row + col;
				if (i < 5696) {
					*pwc = (ucs4_t)(uhc_1_2uni_main_page81[2 * row + (col >= 89 ? 1 : 0)] + uhc_1_2uni_page81[i]);
					return 2;
				}
			}
			return RET_ILSEQ;
		}
		return RET_TOOFEW(0);
	}
	return RET_ILSEQ;
}

The problem I've run into is pretty basic.

Code:

unsigned char c2 = s[1];

In the code, we pass:

Code:

char cOld = szTextVal.data()[i];

So, we try to extract a char value from position "i" of the string. But for 2-byte chars, the function also expects the second byte of the char. I don't know how to get this one programmatically, as we already extracted the char from an array (szTextVal.data()). Should I use "cOld[1]"? Are both bytes stored in the char, or should I extract two values from "szTextVal.data()" (which would be problematic for untranslated ASCII sentences in Korean)?
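For what it's worth, a char holds exactly one byte, so "cOld[1]" cannot work; the second byte is simply the next element of the array, szTextVal.data()[i + 1], and the loop index then has to advance by two. A minimal sketch of that walk, assuming the string is already CP949-encoded (the names DbcsChar, isCp949LeadByte and splitCp949 are made up for illustration, and C++11 is assumed):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One character as stored in a CP949 byte string: 1 byte (ASCII) or 2 bytes.
struct DbcsChar { unsigned char first; unsigned char second; bool isDouble; };

// CP949 (UHC) lead bytes are 0x81..0xFE; bytes below 0x80 are plain ASCII.
inline bool isCp949LeadByte(unsigned char c) { return c >= 0x81 && c <= 0xFE; }

// Walk the byte string from the start, consuming one byte for ASCII and
// two bytes whenever a lead byte is seen.
std::vector<DbcsChar> splitCp949(const std::string& text)
{
    std::vector<DbcsChar> out;
    const unsigned char* s = reinterpret_cast<const unsigned char*>(text.data());
    const std::size_t n = text.size();
    for (std::size_t i = 0; i < n; ) {
        if (isCp949LeadByte(s[i]) && i + 1 < n) {
            out.push_back({ s[i], s[i + 1], true });  // pass &s[i] and n - i to the iconv-style function here
            i += 2;
        } else {
            out.push_back({ s[i], 0, false });        // ASCII, or a truncated lead byte at the very end
            i += 1;
        }
    }
    return out;
}
```

Untranslated ASCII text is not a problem with this approach, because ASCII bytes are below 0x80 and are never lead bytes. The walk does have to start at the beginning of the string, though: as the range checks on c2 in the iconv function show, a trail byte can fall in the ASCII letter range, so a byte taken in isolation is ambiguous.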

dbkblk · Jan 25, 2015

I'm having a very hard time figuring out how to use Korean. I would like to encode the Korean text directly in CP949 in the file, but iconv refuses to convert UTF-8 to CP949 and Qt tells me that some chars are not supported in Unicode. I don't really understand how to encode this language.

Nightinggale · Jan 25, 2015

If you encode the XML files in CP949 and you set the game locale to CP949, then you should be able to skip the conversion code entirely and just forward the XML input to the exe.

If you want to encode the file in CP949, then you could try to open the file in Notepad++ (or similar), change the encoding and save (backup first!). That will likely result in unsupported characters being converted to ?, but the Korean should do just fine.

dbkblk · Jan 25, 2015

No, it's not that easy. Just try translating a word in Google Translate. It outputs UTF-8. Then save it in Notepad++ as UTF-8, then convert to CP949. You will see many ?. My current locale is CP949 and I cannot see all chars. Even stranger, the converter I quickly made in Qt tells me that my "current locale (949) doesn't allow me to see some chars".
iconv tells me that UTF-8 -> CP949 is unsupported. Strange.

dbkblk · Feb 18, 2015

@Afforess: Some weeks ago, I started to rewrite the XML parser with a GUI, but the more I work on it, the more I think it's a waste of time.

The most tedious part of linking Transifex and our mod is exporting the XML files and then importing them back. I don't want to automate the import part, as it needs review, but I do want to automate the export part.

Now, the plan is to break the XML parser into dead-simple tools. I'm telling you this because one of these tools, which I'll call "exporter", will convert our mod text files into the XML format for Transifex. Basically, I want to compile it for Linux so you can run it as a hook after an update, so it exports the files to civ.afforess.com, say, into a "translations/" folder.
Then I'll set Transifex to watch for file updates in this folder, and it will automatically update the English strings on the platform.

The other tools would be (not on your server):
- merger.exe (into "Assets/XML/Text"): double-click on it and it divides the files into categories (so it merges the files that have a different name).
- importer.exe: will merge translations into the mod files

I'll look at the other features of my parser later, as these are the most important ones.

dbkblk · Feb 22, 2015

@Nightinggale: I've found a gem !!!!!! Japanese version source code v3.19. We just have to compare with stock 3.19.

Nightinggale · Feb 22, 2015

dbkblk said:
@Nightinggale: I've found a gem !!!!!! Japanese version source code v3.19. We just have to compare with stock 3.19.

Great, but where is it?

dbkblk · Feb 22, 2015

I've isolated source code and text files from the patch (one... by one... stupid installshield): https://dl.dropboxusercontent.com/u/369241/jap_patch3.19.7z

Nightinggale · Feb 22, 2015

dbkblk said:
I've isolated source code and text files from the patch (one... by one... stupid installshield): https://dl.dropboxusercontent.com/u/369241/jap_patch3.19.7z

ありがとうございます (Thank you very much)

dbkblk · Feb 22, 2015

Now I've merged their method and will try to launch the game in Korean. Fingers crossed.
Keep me informed if you manage to make it work!

dbkblk · Feb 22, 2015

The game crashes at startup with their code.

EDIT: What is the purpose of their Convert function? It doesn't seem to be used anywhere.

Afforess · Feb 22, 2015

Can someone post the diff between the two, or the original English BTS source code, so I can do a comparison? I don't have the original game source code anymore, Steam doesn't install it, and I don't feel like re-installing from DVD to get it.

Nightinggale · Feb 22, 2015

dbkblk said:
The game crashes at startup with their code.

The code seems to rely on CP_ACP, which is the code page currently used by Windows. For all we know, it only works if you switch to the Japanese locale.
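A quick way to see which code page CP_ACP resolves to on a given machine is the Win32 call GetACP(). The sketch below wraps it and names the code pages mentioned in this thread; describeCodePage and activeAnsiCodePage are made-up helper names, and the non-Windows branch returning 0 is just a placeholder so the snippet compiles elsewhere.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Human-readable label for the code pages discussed in this thread.
std::string describeCodePage(unsigned int cp)
{
    switch (cp) {
        case 932:   return "Japanese (Shift-JIS)";
        case 936:   return "Simplified Chinese (GBK)";
        case 949:   return "Korean (UHC)";
        case 950:   return "Traditional Chinese (Big5)";
        case 1252:  return "Western European";
        case 65001: return "UTF-8";
        default:    return "other";
    }
}

// The code page that CP_ACP stands for: GetACP() on Windows.
// On other platforms this returns 0 as a placeholder (assumption for portability).
unsigned int activeAnsiCodePage()
{
#ifdef _WIN32
    return GetACP();
#else
    return 0;
#endif
}
```

If the Japanese code only behaves when activeAnsiCodePage() returns 932, that would support the idea that it depends on the system locale rather than on anything in the DLL itself.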

dbkblk said:
EDIT: What is the purpose of their Convert function? It doesn't seem to be used anywhere.

Good question. In fact I find all the modifications a bit odd.

CvGameTextMgr.cpp: minor corrections, such as year before date. Nothing important.
exitingToMainMenu() takes a wide char pointer as argument. From what I can tell it is only used with NULL pointers.
CvDefines: gives it a unique ID for finding other players on GameSpy
CvInitCore: some wide strings are used as wide strings instead of converting to strings
CvString.h: the convert function is added, but appears not to be called

That's about it. Where is the magic that enables the in-game non-Latin characters? :confused:

Nightinggale · Feb 22, 2015

Afforess said:
Can someone post the diff between the two, or the original English BTS source code, so I can do a comparison? I don't have the original game source code anymore, Steam doesn't install it, and I don't feel like re-installing from DVD to get it.

I would if I could. The compare program refuses to make a diff.
