[Dev tool] Civilization 4 XML translation tool

<Nexus> · Jan 22, 2015

dbkblk said:
In fact, the picture is there to show Russian support! The main screen is downloadable in the creative thread.

I've got that. Good job :goodjob:

It was just shockingly new to me. Is it animated? (No, I can't wait to try it.)

dbkblk · Jan 22, 2015

Is it animated?

I've totally reworked the default animation, using the sun as the reference instead of the earth. It gives a deep-space feeling! I also added new backgrounds for the launcher and a random background choice for loading screens.

dbkblk · Jan 22, 2015

Czech support (I have written nonsense):

Full Hungarian support (just for Sogroon):

However, the work is still not finished as characters are not displayed in city billboards.

<Nexus> · Jan 22, 2015

Man, you are crazy! :lol:

BTW:

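Je parle français. I speak English. Yo hablo un poco español. Ich lerne Deutsch. Yeg studerer bokmål. [I speak French. I speak English. I speak a little Spanish. I am learning German. I am studying Bokmål.]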

Duolingo?
Ich lerne Deutsch auch [I am learning German too]

私は日本語を習っているだけです (watashi wa nihongo wo naratteiru dake desu) [I am only learning Japanese]

dbkblk · Jan 22, 2015

Here are the current problems:

Some sentences are rendered using GameFont. I understand that the city billboard isn't rendered correctly because it is hardcoded, but it is surprising to find GameFont-rendered strings in the description.
Parts of the GUI don't seem to work (BUG does not seem to support it all). I still have a lot of debugging work to do.

@Sogroon: Ich habe kein gespräch jetzt noch. Ich gebrauche Assimil (viel gut) für lernen Deutsch und Norweger. [I can't hold a conversation yet. I use Assimil (very good) to learn German and Norwegian.]

@Nightinggale:
I've added this modified function in the hope that I can bypass GameFont rendering, but it doesn't work. I don't think I understand the relation between a wchar and a &wchar.

Code:

CvWString convertWString(CvWString wszTextVal)
{
	wchar caNewChar[2];
	caNewChar[1] = 0;

	CvWString wszNewString; // a plain local object; a reference (&) would have to be bound to an existing variable
	wszNewString.clear();

	unsigned int iCharset = GetACP();

	// The input is a string as it is written in XML, where it is UTF-8 encoded.
	// The output is the same content, but the character values have been changed to match those used in GameFont.

	for (unsigned i = 0; i < wszTextVal.length(); ++i)
	{
		char cOld = wszTextVal.data()[i];
		wchar cNew = 0;
		unsigned int iID = cOld;
		iID &= 0xFF; // ensure we only get the low 8 bits and no sign extension

		if (cOld >= 0 && cOld <= 0x7F)
		{
			// single byte char. No conversion needed
			cNew = cOld;
		}
		else
		{
			unsigned char cMask = 0x40;
			iID &= 0x7F;

			while (iID & cMask)
			{
				iID &= ~cMask;
				cMask >>= 1;
			}

			cMask = 0x40;

			// the number of bytes in an UTF-8 character is the same as the number of 1s before the first 0 in the first byte (1 byte starts with 0 for backward compatibility)
			// read all bytes for the current character using this info
			while (cMask & cOld)
			{
				cMask >>= 1;
				i++;
				iID <<= 6;
				int iChar = wszTextVal.data()[i] & 0xFF;
				iID |= (iChar & 0x3F);
				FAssertMsg((iChar & 0xC0) == 0x80, CvString::format("Not UTF-8 encoded: %s %s", GC.getCurrentXMLFile().GetCString(), wszTextVal.data()));
			}

			// at this point iID contains the unicode value of the character.
			// cNew is the index in GameFont, which should be used for the character in question.

			for (int iIndex = 0; iIndex < 0x80; iIndex++)
			{
				if (iID == conversion_table_unicode[iIndex])
				{
					cNew = iIndex + 0x80;
					break;
				}
			}
		}

		if (cNew == 0)
		{
			//FAssertMsg(cNew != 0, CvString::format("UTF-8 fail: %X %s %s", iID, GC.getCurrentXMLFile().GetCString(), (szTextVal.data() + i)));
			cNew = '?';
		}
		caNewChar[0] = cNew;
		wszNewString.append(caNewChar);
	}
	return wszNewString;
}

Of course, it doesn't work
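
For reference, the byte-level decoding that the function above attempts can be sketched in isolation. This is only a minimal sketch in standard C++, not the mod's code: decodeUtf8 is a made-up name, it works on a plain std::string of UTF-8 bytes rather than on a CvWString, it returns Unicode code points without any GameFont remapping, and it does not reject overlong encodings.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decode UTF-8 bytes into Unicode code points.
// Malformed or truncated sequences are replaced with '?'.
std::vector<unsigned int> decodeUtf8(const std::string& bytes)
{
    std::vector<unsigned int> out;
    std::size_t i = 0;
    while (i < bytes.size())
    {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        unsigned int cp = 0;
        int extra = 0; // number of continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back('?'); ++i; continue; } // stray continuation or invalid lead byte

        ++i;
        bool ok = true;
        for (int k = 0; k < extra; ++k)
        {
            // Every continuation byte must look like 10xxxxxx.
            if (i >= bytes.size() || (static_cast<unsigned char>(bytes[i]) & 0xC0) != 0x80)
            {
                ok = false;
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i]) & 0x3F);
            ++i;
        }
        out.push_back(ok ? cp : static_cast<unsigned int>('?'));
    }
    return out;
}
```

One thing worth checking against this sketch: the posted function reads char values out of a wide string. If the text has already been widened before it reaches the function, it may no longer be a UTF-8 byte stream, in which case this kind of decoding would not apply.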

Nightinggale · Jan 22, 2015

dbkblk said:
Do you have any tips? Might I be using the wrong codepage, since you mentioned earlier in this thread that your computer forces 1250?

It doesn't anymore and I have no idea why.

dbkblk said:
EDIT: I've managed to change the locale to Russian while still using my native language, but the codepage in use is 866 and not 1251.

That is just plain stupid. It would appear that Windows has at least 3 versions of Russian (1251, 866, 855). Which one you get appears to be decided by your hardware and/or software. There are 11 single byte CodePages and I guess it would be good to add iconv code for all of them. That way we have the best chance of people being able to play regardless of their local settings.
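
To illustrate why the active CodePage matters: the same byte decodes to a different Cyrillic letter under 1251 and under 866. The sketch below covers only the basic Russian alphabet (without Ё/ё), the function names are made up for this example, and the ranges follow the published Windows-1251 and CP866 layouts.

```cpp
// Hypothetical helpers: map one byte to a Unicode code point for the
// basic Russian letters only. Both return 0 for bytes outside those ranges.
unsigned int cp1251ToUnicode(unsigned char b)
{
    // 0xC0..0xFF hold U+0410 (А) .. U+044F (я) in one contiguous run.
    if (b >= 0xC0) return 0x0410 + (b - 0xC0);
    return 0;
}

unsigned int cp866ToUnicode(unsigned char b)
{
    // 0x80..0xAF hold U+0410 (А) .. U+043F (п).
    if (b >= 0x80 && b <= 0xAF) return 0x0410 + (b - 0x80);
    // 0xE0..0xEF hold U+0440 (р) .. U+044F (я).
    if (b >= 0xE0 && b <= 0xEF) return 0x0440 + (b - 0xE0);
    return 0;
}
```

For instance, byte 0xE0 comes out as U+0430 (а) through the 1251 mapping but as U+0440 (р) through the 866 mapping, so text stored for one CodePage turns into different letters when read under the other.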

Things would have been so much simpler if civ4 had just used the full unicode set :sad:

dbkblk · Jan 22, 2015

Nightinggale said:
That is just plain stupid. It would appear that Windows has at least 3 versions of Russian (1251, 866, 855). Which one you get appears to be decided by your hardware and/or software. There are 11 single byte CodePages and I guess it would be good to add iconv code for all of them. That way we have the best chance of people being able to play regardless of their local settings.

In fact, it is not that difficult. When Russian is set as the "language for non-Unicode programs" (in Windows 8 > Config panel > Language > Change formats of date... > Administration tab > Change regional parameters), the console answers 866 to "chcp", but after a reboot the game uses cp1251, so that works!

The thing that stops me for now is that parts of the code use szBuffer to build sentences (I don't yet understand how that buffer thing works), and this seems hardcoded into the executable (as declared by "virtual CvWString getText(CvWString szIDTag, ...) = 0;" in "CvDLLUtilityIFaceBase.h").
I need to figure out whether there is a way to bypass getText and send Unicode sentences to the game. Do you have any ideas?

EDIT: Another thought: isn't it possible to read the cached XML file directly and extract the sentences in UTF to replace "gDll->getText"?

EDIT2: I'm convinced that bypassing gDll->getText() is the key, but I don't know if it's possible.

EDIT3: What is the purpose of this function?

Code:

void CvGameText::setText(const wchar* szText)				
{
	m_szText = szText; 
}

<Nexus> · Jan 22, 2015

OFF

Spoiler :

Nightinggale · Jan 22, 2015

dbkblk said:
I've added this modified function in the hope that I can bypass GameFont rendering, but it doesn't work. I don't think I understand the relation between a wchar and a &wchar.

The & means it's a reference to the variable. For instance if we have

Code:

int FuncA(int& A)
{
    A++;
    return A*A;
}

void FuncB()
{
   int myInt = 4;
   FuncA(myInt); // a reference parameter is passed like a normal variable; no & at the call site
   printf("%d\n", myInt);
}

The question is what is printed. FuncA gets a reference to myInt, meaning that when it uses ++, myInt in FuncB is also incremented and 5 will be printed. If the & is removed from the parameter, FuncA will get a copy of myInt, the ++ will only affect that copy, and 4 will be printed.

I have no idea about language support in BUG as I never used it. In fact, when I read about the problems people have, I feel perfectly fine not using it.
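
A compilable pair of functions showing the same idea side by side. At the call site a reference parameter is passed like any ordinary variable; writing &myInt there would produce a pointer instead. The function names are only for illustration.

```cpp
// Takes its argument by reference: changes are visible to the caller.
int squareAfterIncrementRef(int& a)
{
    a++;
    return a * a;
}

// Takes its argument by value: works on a private copy.
int squareAfterIncrementCopy(int a)
{
    a++;
    return a * a;
}
```

Starting from int myInt = 4, squareAfterIncrementRef(myInt) returns 25 and leaves myInt at 5. Starting again from 4, squareAfterIncrementCopy(myInt) also returns 25 but leaves myInt at 4.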

One thing I have run into though is lazy programmers writing the English string in the code rather than using the TXT_KEY_ system. That results in hardcoded English, which the translators can't do anything about. I don't know if that is the case with the English strings in the screenshot.

Non-coding language stuff.

Spoiler :

Sogroon said:
Man, you are crazy!

Quite possibly, but I haven't seen any evidence to support that claim. If it is language learning or coding skills, which makes you say that, then I must be a complete lunatic :crazyeye:

Sogroon said:
Ich lerne Deutsch auch

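Einmal musste ich nach Deutschland fahren (kein Uhrlaub) und entdeckte das viele Leute in Deutschland kein Englisch sprechen. Das was kein große Problem und heute brauchen ich nickt Deutsch zu lernen [Once I had to travel to Germany (not a holiday) and discovered that many people in Germany don't speak English. That was no big problem, and today I don't need to learn German]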

Sogroon said:
私は日本語を習っているだけです (watashi wa nihongo wo naratteiru dake desu)

Sadly that's a two byte CodePage and I haven't managed to get it to work. I have tried though.

Btw I found an interesting tool for reading Japanese: Translation Aggregator.
It can do a whole lot of stuff, which makes the guide long and possibly hard to follow, but the really good part is fairly easy. Download it, add EDICT2 (a fairly decent free Japanese-English dictionary) and that's it for setup. When you run it, close all translators except for JParser. I assume you don't want the machine translations.

What it does is monitor the clipboard and paste everything you copy into its own window. It splits the words with decent accuracy (but not perfect) and if you hover over a word, you will get a popup with the dictionary lookup, making it really quick to look up the words you need to look up. Also it can add furigana in hiragana, katakana and romaji, which for the most part eliminates getting stuck on kanji. I still wonder who wants furigana in katakana, but it can provide it if you want.

Its intended purpose is reading visual novels, but we can forget about those (unless you want to read those). Due to the clipboard monitoring it is great for any Japanese text, be it a document or a web page. So far it has handled everything I have thrown at it with good results.

Naturally you still need to know Japanese to be able to use it. It just makes the whole character reading and dictionary lookup much easier and much faster. If you don't know any Japanese, you will not really gain anything from the dictionary telling that the word is in potential negative form (or whatever it writes).

dbkblk · Jan 22, 2015

Nightinggale said:
The & means it's a reference to the variable. For instance if we have...

Thank you. That is much clearer now.

Nightinggale said:
One thing I have run into though is lazy programmers writing the English string in the code rather than using the TXT_KEY_ system. That results in hardcoded English, which the translators can't do anything about. I don't know if that is the case with the English strings in the screenshot.

No. None of these strings are hardcoded, but they are referenced directly in the code:

Code:

if (kBuilding.getMaxDomesticConnectedCommerce() <= 0)
{
	szBuffer.append(gDLL->getText("TXT_KEY_BUILDING_DOMESTIC_CONNECTED_COMMERCE_EACH_CITY", kBuilding.getDomesticConnectedCommerce()));
}

which prints:

Code:

   <TEXT>
      <Tag>TXT_KEY_BUILDING_DOMESTIC_CONNECTED_COMMERCE_EACH_CITY</Tag>
      <English>[ICON_BULLET]+%d1[ICON_COMMERCE] from each connected domestic city</English>
      <French>[ICON_BULLET]+%d1[ICON_COMMERCE] de chaque ville connectée.</French>
   </TEXT>

I'm trying to work out the difference between these direct calls and the way other sentences are written, like "Double culture in 1000 years" (the last one in the Russian description).

<Nexus> · Jan 22, 2015

Quite possibly, but I haven't seen any evidence to support that claim. If it is language learning or coding skills, which makes you say that, then I must be a complete lunatic

@Nightinggale
I was referring to dbkblk's screenshot on Hungarian support (I am Hungarian)
He changed the main menu to say:

Hello Sogroon
I don't speak Hungarian
But Google does
Every Hungarian character too

When I saw it I had to LOL - which may be dangerous in a workplace :crazyeye:

Nightinggale · Jan 22, 2015

dbkblk said:
I need to figure out whether there is a way to bypass getText and send Unicode sentences to the game. Do you have any ideas?

EDIT: Another thought: isn't it possible to read the cached XML file directly and extract the sentences in UTF to replace "gDll->getText"?

EDIT2: I'm convinced that bypassing gDll->getText() is the key, but I don't know if it's possible.


I'm going to try to inject unicode into the game instead of using getText() and see what happens. It sounds like a great idea.

There are some strings, which are hardcoded in the exe if I recall correctly. Let's ignore that for a moment and see how far we can get by totally ignoring the exe in the string system.

I think we can "hijack" gDll->getText(). If I recall correctly, vanilla declares it virtual. If we simply remove virtual and write our own function, our code will link to that function regardless of what is inside the exe. Doing that saves us from renaming every line using getText().

dbkblk said:
EDIT3: What is the purpose of this function ?

Code:

void CvGameText::setText(const wchar* szText) { m_szText = szText; }

m_szText is a member variable in CvGameText and szText is the argument. It simply copies the argument into the member variable, effectively telling the text what it is.

Generally speaking I don't like classes where there is a get and set function as it makes the protected state of the variable a bit pointless, but that's a debate for code design. It does work, which is the main point.

Sogroon said:
@Nightinggale
I was referring to dbkblk's screenshot on Hungarian support (I am Hungarian)
He changed the main menu to say:

Hello Sogroon
I don't speak Hungarian
But Google does
Every Hungarian character too

When I saw it I had to LOL - which may be dangerous in a workplace

Oh man, I missed that. I just read that it was Hungarian and assumed it to be just a regular translation. It's brilliant and looks like something I could have come up with :lol:

Actually I have to check if I did that with Russian. I did make some Russian screenshots at some point where I inserted some test text. I copy pasted the alphabet and stuff like that, but I might have pranked too.

dbkblk · Jan 22, 2015

After messing with convertWString, I ****ed up the first chars, but the sentence is not rendered by GameFont anymore (before -> after):

...so there HAS TO be a way to bypass GameFont rendering in any circumstance. I'm SURE of that!

EDIT: I missed the point: getGameFontString(getGameFontString(pCity->getName())) gets the correct name in Russian. No need to rewrite it.

Nightinggale · Jan 22, 2015

dbkblk said:
EDIT: I missed the point: getGameFontString(getGameFontString(pCity->getName())) gets the correct name in Russian. No need to rewrite it.

The point is that whenever the game writes a regular string using a normal font, it uses standardized numbers for each character. GameFont.tga, however, gives us little choice of numbers, meaning we can't place the characters where they would get the correct ID. As a result, the same character has one ID in the string (font) and another in GameFont. I wrote a function to look up the "right" ID and then figure out what the GameFont ID is for the same character.

It should work as long as you call it correctly as well as provide the correct input (like where the Russian chars are located in GameFont).
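
The lookup described above can be sketched as a small table search. Everything here is assumed for illustration: the table contents and the name toGameFontId are made up, and the 0x80 starting offset is simply taken from the function posted earlier in the thread. It is not the mod's actual data.

```cpp
#include <cstddef>

// Hypothetical table: entry N is the Unicode value of the glyph stored at
// GameFont slot 0x80 + N. Only three Cyrillic letters, for illustration.
static const unsigned int kGameFontTable[] = { 0x0410, 0x0411, 0x0412 }; // А, Б, В

// Return the GameFont ID for a Unicode code point, or '?' if the
// character has no slot in the table.
unsigned int toGameFontId(unsigned int codePoint)
{
    if (codePoint < 0x80)
        return codePoint; // ASCII keeps its value

    const std::size_t count = sizeof(kGameFontTable) / sizeof(kGameFontTable[0]);
    for (std::size_t i = 0; i < count; ++i)
    {
        if (kGameFontTable[i] == codePoint)
            return 0x80 + static_cast<unsigned int>(i);
    }
    return '?';
}
```

With this table, U+0411 (Б) maps to slot 0x81, while a character that was never added to the table comes back as '?'. That is why the input has to match where the characters really sit in GameFont.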

I tested bypassing the exe in getText. I ended up with blank text. That didn't look very promising.

dbkblk · Jan 22, 2015

OK. I've tried to remove as much information as possible from the city billboard to see if it can render using just a string, but there are still graphical elements.

Even if we can't bypass getText, we could write a new function and replace the calls in the code; it's not that much work.

I haven't yet included Russian in GameFont, as I want to try to avoid that (to be able, theoretically, to add any language out there!).

EDIT: Question: in your code, you substitute the "?" char for unknown chars. Is your function the only one that does that? I mean, when the billboard gets unknown chars, it puts only "????" in names.

EDIT2: Another, less good solution could be to implement iconv transliteration to fall back on known characters for city billboards, and to improve getText for all the other popups (like the one in the previous screenshot).

Nightinggale · Jan 22, 2015

dbkblk said:
Even if we can't bypass getText, we could write a new function and replace the calls in the code; it's not that much work.

Bypassing getText isn't the issue. The issue is that we gain nothing from doing so. It looks like the limitation lies in the font-writing part of the exe, not the getText part, meaning that even if we write our own code, we still call something in the exe to actually put the text on the screen, and that part is broken with regard to supporting multiple languages.

dbkblk said:
I haven't yet included Russian in GameFont, as I want to try to avoid that (to be able, theoretically, to add any language out there!).

And you managed to get the city billboards to write Russian without GameFont? I haven't been able to write anything other than GameFont icons on billboards.

dbkblk said:
EDIT: Question: in your code, you substitute the "?" char for unknown chars. Is your function the only one that does that? I mean, when the billboard gets unknown chars, it puts only "????" in names.

No, the game uses ? too. I actually started using ¿, but then I realized that only some of the CodePages have that character. The problem is that if there is a Unicode character which isn't present in the CodePage, something still needs to be written in its place, and it just has to be some character. I picked ? as it sort of makes sense, but we could use any ASCII character, such as _ or :

dbkblk · Jan 22, 2015

Nightinggale said:
Bypassing getText isn't the issue. The issue is that we gain nothing from doing so. It looks like the limitation lies in the font-writing part of the exe, not the getText part, meaning that even if we write our own code, we still call something in the exe to actually put the text on the screen, and that part is broken with regard to supporting multiple languages.

I don't think so. We need to distinguish two different factors:
- The game seems to use GameFont to render the city billboard, no matter what we define. I've tried changing ? to . in your code and I still get ? on the billboard. Maybe that part isn't "fixable" and we need to use transliteration or something smarter.
- The popup part, however, is totally fixable, as some sentences with icons render correctly and others do not. We still need to figure out which code gets the correct result and which to avoid.

EDIT: Another solution for the city billboard is to fall back to English strings. Do you think that's possible?

EDIT2: Just a small question: I've tried to insert a FAssertMsg which shows the codepage in use, but I can't get it to insert the value. Any idea?

Code:

unsigned int iCharset = GetACP();
	FAssertMsg(false, "Codepage in use is %d\n", iCharset);

With %d, %u or %i, I still get the exact message "Codepage in use is %d" instead of the value.
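
One possible explanation, judging from the FAssertMsg(..., CvString::format(...)) call in the function posted earlier in the thread: the macro seems to take a finished message rather than a printf-style format plus arguments, so the text has to be formatted first. Below is a standalone sketch of that formatting step using only the standard library. formatCodepageMessage is a made-up helper; inside the DLL the equivalent would presumably be CvString::format.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: build the final message text up front, so that a
// macro accepting only a ready-made string can display the value.
std::string formatCodepageMessage(unsigned int codepage)
{
    char buffer[64];
    std::snprintf(buffer, sizeof(buffer), "Codepage in use is %u", codepage);
    return std::string(buffer);
}
```

In the DLL this would look roughly like FAssertMsg(false, CvString::format("Codepage in use is %u", iCharset)); which mirrors the earlier call but is untested here.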

Nightinggale · Jan 22, 2015

dbkblk said:
- The game seems to use GameFont to render the city billboard, no matter what we define. I've tried changing ? to . in your code and I still get ? on the billboard. Maybe that part isn't "fixable" and we need to use transliteration or something smarter.

"my" ? is when a character isn't present in the CodePage. The game ? is when you try to write an invalid GameFont ID. Those two are totally different.

dbkblk said:
EDIT: Another solution for the city billboard is to fall back to English strings. Do you think that's possible?

We always have English characters available for the billboards. We could make a conversion table from Russian to English, which would result in Russian being written with Latin characters. I'm not sure I would advise that, as we have the Russian characters to add.
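
Such a conversion table could look roughly like the sketch below. It covers only a handful of letters, the romanization chosen is arbitrary, and all names are invented for the example. Characters without an entry fall back to '?'.

```cpp
#include <cstddef>
#include <string>

struct TranslitEntry
{
    unsigned int codePoint; // Unicode value of the Cyrillic letter
    const char*  latin;     // replacement text in plain ASCII
};

// Hypothetical, deliberately incomplete table.
static const TranslitEntry kTranslit[] = {
    { 0x041C, "M"  }, // М
    { 0x043E, "o"  }, // о
    { 0x0441, "s"  }, // с
    { 0x043A, "k"  }, // к
    { 0x0432, "v"  }, // в
    { 0x0430, "a"  }, // а
    { 0x0436, "zh" }, // ж
};

// Transliterate a sequence of Unicode code points into ASCII.
std::string transliterate(const unsigned int* codePoints, std::size_t length)
{
    std::string out;
    for (std::size_t i = 0; i < length; ++i)
    {
        const unsigned int cp = codePoints[i];
        if (cp < 0x80) { out += static_cast<char>(cp); continue; } // ASCII passes through

        const char* replacement = "?";
        for (std::size_t k = 0; k < sizeof(kTranslit) / sizeof(kTranslit[0]); ++k)
        {
            if (kTranslit[k].codePoint == cp) { replacement = kTranslit[k].latin; break; }
        }
        out += replacement;
    }
    return out;
}
```

Feeding it the six code points of Москва gives "Moskva". A complete table would need every letter of the alphabet in both cases.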

I don't think we can make a generic "English" approach to billboards. The problem is that once Russian is selected, the Russian string names are stored, but not the English. Changing that would mean some serious recoding of the string reading code. This is not just city names, but also the unit/building it is producing meaning it will be a whole lot of strings.

Also remember that players can enter a custom name. How do you find the English string to a custom Russian cityname?

I don't think it will be realistic to get proper billboards without actually adding the characters to GameFont.tga. Note that the billboards only use that file, not the GameFont with the small icons.

dbkblk · Jan 22, 2015

OK. Another crazy idea: remove the city name and production and show a popup in the interface which sticks to the city bar?

EDIT: Better get no text than "?" for the moment. I will focus on the bad popup rendering which seems far more important.

EDIT2: Some food for thought:
Good rendering:
In CIV4GameTextInfos:

Code:

	<TEXT>
		<Tag>TXT_KEY_BUILDING_FREE_IN_CITY</Tag>
		<!-- %s1_Name = TXT_KEY_BUILDING_ -->
		<English>[ICON_BULLET]Free [COLOR_BUILDING_TEXT][LINK=literal]%s1_Name[\LINK][COLOR_REVERT] in Every City</English>
		<French>[ICON_BULLET]1 [COLOR_BUILDING_TEXT][LINK=literal]%s1_Name[\LINK][COLOR_REVERT] dans chaque ville</French>
		<German>[ICON_BULLET]Freies Gebäude [COLOR_BUILDING_TEXT][LINK=literal]%s1_Name[\LINK][COLOR_REVERT] in jeder Stadt</German>
		<Italian>[ICON_BULLET][COLOR_BUILDING_TEXT][LINK=literal]%s1_Name[\LINK][COLOR_REVERT] [NUM1:gratuito:gratuita:gratuiti:gratuite] in ogni città</Italian>
		<Spanish>[ICON_BULLET][NUM1:Un:Una:Unos:Unas] [COLOR_BUILDING_TEXT][LINK=literal]%s1:1_Name[\LINK][COLOR_REVERT] gratis en todas las ciudades.</Spanish>
	</TEXT>

Called by, in CvGameTextMgr.cpp:

Code:

szBuffer.append(gDLL->getText("TXT_KEY_BUILDING_FREE_IN_CITY", GC.getBuildingInfo(eFreeBuilding).getTextKeyWide()));

Bad rendering:
In BUILDINGS.xml:

Code:

   <TEXT>
      <Tag>TXT_KEY_BUILDING_DOMESTIC_CONNECTED_COMMERCE_EACH_CITY</Tag>
      <English>[ICON_BULLET]+%d1[ICON_COMMERCE] from each connected domestic city</English>
      <French>[ICON_BULLET]+%d1[ICON_COMMERCE] de chaque ville connectée.</French>
   </TEXT>

Called by, in CvGameTextMgr.cpp:

Code:

szBuffer.append(convertWString(gDLL->getText("TXT_KEY_BUILDING_FOREIGN_CONNECTED_COMMERCE_CITY", kBuilding.getForeignConnectedCommerce(), iCities)));

dbkblk · Jan 22, 2015

PFFIOU... The culprit was the GameFont offset code. As I haven't defined new fonts, it was randomly selecting some unused ID. Now all the popup text seems to render properly. I will look for errors.

EDIT: For some weird reason, the food and commerce icons ([ICON_...]) don't show up on the screen. All the others do.

EDIT2: Fixed. I commented out every single piece of GameFont code. Now I still have to fix the BUG screen and find a smart city billboard solution.
