[Dev tool] Civilization 4 XML translation tool

Discussion in 'Rise of Mankind: A New Dawn' started by dbkblk, Jun 2, 2014.

  1. dbkblk

    dbkblk Emperor

    Joined:
    Oct 28, 2005
    Messages:
    1,781
    Location:
    France
    Maybe I'm a bit utopian but, billboards aside, shouldn't there be a better method to support Unicode?!

    A summary of what we know:

    Text import process.
    - At the start of the game, the executable first lists all the text related to the tags (CvXMLLoadUtility::read()).
    - It imports all the text for the chosen language, or English by default, into a cached file (in memory?).
    - By hijacking the import, we can write the files in UTF8 and convert the Unicode strings to the local codepage during reading, so the game stores local codepage strings.
    - All the standard functions render these local codepage strings without the need to change anything else in the code.
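
A minimal sketch of that conversion step, written in modern Python 3 rather than in the DLL, just to show the idea. The function name `to_local_codepage` and the choice of cp1251 (Russian) are assumptions made for this example, not anything taken from the mod's code.

```python
def to_local_codepage(utf8_bytes, codepage="cp1251"):
    """Decode UTF-8 bytes read from an XML file and re-encode them in a
    local codepage. Characters the codepage cannot represent become '?'."""
    text = utf8_bytes.decode("utf-8")
    return text.encode(codepage, errors="replace")


raw = "Москва".encode("utf-8")   # what would be stored in the XML file
local = to_local_codepage(raw)   # what the game would keep in memory
print(len(raw), "bytes in UTF-8,", len(local), "bytes in cp1251")
```

Anything outside the target codepage is lost at this point, which is why this approach only helps one language family at a time.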

    Other info:
    - The game renders the CityBillboard (name and current production) graphically, so no hope or magic for this.
    - Standard text IS NOT rendered by the graphical engine, as we can write any character from the local codepage.
    - Base files are written in ISO8859-1, which is pretty much limited to US and Western European languages.

    Now, let's push this a little further.
    WHAT IF... *drum roll* we rewrote the way it works, within the known limits of the game?
    - Store EVERY character as a numeric code (as in the old ISO8859-1 method), for example & #1049;, in the XML (we just need to modify the XML parser a bit to do this easily).
    - The game starts and imports these number sequences (we remove the conversion for now).
    - Then we create new functions to convert & #1049; into a Unicode char, rewrite getText to getUnicodeText and replace all the calls in the DLL code (and in Python, as 2.4 supports Unicode). If the game outputs ?, that does not necessarily mean that it doesn't know which char it is; maybe it just doesn't know which one to render. "& # 1 0 4 9" is just a sequence of ASCII chars.
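
The round trip in that list can be sketched as follows. This is modern Python 3 for illustration only, not the game's Python 2.4 or the DLL. The names `encode_numeric_refs` and `decode_numeric_refs` are hypothetical, and the codes are written without the space (&#1049; rather than & #1049;).

```python
import re

# Matches decimal numeric codes such as "&#1049;".
_NUMERIC_REF = re.compile(r"&#(\d+);")


def encode_numeric_refs(text):
    """Replace every non-ASCII character with its decimal numeric code,
    so the result is plain ASCII."""
    return "".join(ch if ord(ch) < 128 else "&#%d;" % ord(ch) for ch in text)


def decode_numeric_refs(text):
    """Turn numeric codes back into the Unicode characters they stand for."""
    return _NUMERIC_REF.sub(lambda m: chr(int(m.group(1))), text)


stored = encode_numeric_refs("Йод")        # ASCII-only text for the XML
print(stored)
print(ascii(decode_numeric_refs(stored)))  # escaped form of the original text
```

The stored form survives any single-byte codepage untouched, so the open question is only whether the game can render the decoded characters.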

    It shouldn't be impossible (if it works at all lol).

    Now look at this (Chinese) or this (Japanese):


    How can Chinese, Japanese and Korean patches exist using 2 bytes? We are missing something here.
     
  2. Nightinggale

    Nightinggale Deity

    Joined:
    Feb 2, 2009
    Messages:
    4,271
    Wrong, though that is what you get from reading the forum. It is actually CP1252, which for most characters is the same as iso8859-1. They are not 100% identical though.
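
A quick way to see where the two encodings differ, using Python 3's built-in codecs (named "cp1252" and "latin-1" there). This is only a sketch. Note that Python's cp1252 codec leaves a few bytes undefined, which the loop counts as a difference.

```python
differing = []
for value in range(256):
    raw = bytes([value])
    as_latin1 = raw.decode("latin-1")
    try:
        as_cp1252 = raw.decode("cp1252")
    except UnicodeDecodeError:
        as_cp1252 = None  # byte has no character in Python's cp1252 codec
    if as_cp1252 != as_latin1:
        differing.append(value)

print("first differing byte: 0x%02X" % differing[0])
print("last differing byte:  0x%02X" % differing[-1])
print("number of differing bytes:", len(differing))
```

Every difference falls in the 0x80 to 0x9F range, where ISO8859-1 has control codes and CP1252 has printable characters such as the euro sign.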

    Do you have a URL to those patches? I have been looking around, but I never actually found a download link for them. Maybe we can figure out how to do it ourselves, but if we figure out how they did it, it could be much faster.
     
  3. dbkblk

    dbkblk Emperor

    Joined:
    Oct 28, 2005
    Messages:
    1,781
    Location:
    France
    You're right, the XML heading is misleading.

    I have the Korean patch and a Japanese patch for a really old version. I have not found a 3C Chinese patch (I don't know why 3C, but the Russians also called their patch 3C).
    None of these came with source code, of course.
    Do you think you can grab the DLL output on getText calls with your wonderful C++ skills ? :D

    Links:
    Japanese patch for RoM 2.71
    Korean patch for AND rev618.
    The same original files from the same revision as the Korean patch (to compare).

    The full source code of our mod at revision 618 is here.
     
  4. dbkblk

    dbkblk Emperor

    Joined:
    Oct 28, 2005
    Messages:
    1,781
    Location:
    France
    Korean patch, CvAppInterface.py, line 58
    Code:
    	# ENABLE_CJK
    	sys.setdefaultencoding('utf-8')
    EDIT: Wooh! This plus a pinch of brain sauce fixed the last Python problem I encountered. Thank you, Koreans!
     
  5. Nightinggale

    Nightinggale Deity

    Joined:
    Feb 2, 2009
    Messages:
    4,271
    :eek:

    From looking up this function, I gather that it should not be used. Instead people should use utf-8 (isn't that what we want?).

    The question is if we need to do something similar in the DLL. I once tried to do that, but.... that didn't go well. The main problem is that the compiler is from 2003 and I tried using a function, which was introduced in 2005. For some reason finding online documentation on how to code prior to functions introduced 10 years ago is a near impossible task.

    Also I tried looking into the Korean patch, but sourceforge had problems at the time and I couldn't reach the svn server. It appears to be working well now and I might give it another go really soon.
     
  6. dbkblk

    dbkblk Emperor

    Joined:
    Oct 28, 2005
    Messages:
    1,781
    Location:
    France
    If Python is directly UTF8 enabled, we can try to do something similar in the DLL. That could be hard, but there is hope!

    For the moment, that fix goes way beyond my expectations, as it also enables accents in Dynamic Civ Names in French :D
     
  7. dbkblk

    dbkblk Emperor

    Joined:
    Oct 28, 2005
    Messages:
    1,781
    Location:
    France
    BREAKING NEWS:
    CIV4ArtDefines_Misc.xml
    Code:
    		<MiscArtInfo>
    			<Type>CITY_BILLBOARDS</Type>
    			<Path>None</Path>
    			<!-- positive scale: city billboards use fonts from GameFont.tga -->
    			<!-- negative scale: GFC billboards (uses the interface font) -->
    			<fScale>-1.0</fScale>
    			<NIF>None</NIF>
    			<KFM>None</KFM>
    		</MiscArtInfo>
    EDIT: I forced a text to render and here it is:


    We still need to implement Unicode support in the game, but we've never gotten this far before!

    EDIT2: Tralalaioula :) (just remove GetGameFontString() before writing production and city name).


    We need to keep pushing !!
     
  8. dbkblk

    dbkblk Emperor

    Joined:
    Oct 28, 2005
    Messages:
    1,781
    Location:
    France
    My intuition tells me that this is the key:
    "DllExport bool GetChildXmlValByName(wchar* pszVal, const TCHAR* szName, wchar* pszDefault = NULL);"

    If we manage to convert UTF8 to wchar or something like that...?
     
  9. Nightinggale

    Nightinggale Deity

    Joined:
    Feb 2, 2009
    Messages:
    4,271
    That looks awesome. Totally close to getting something useful. I assume you had the Russian locale when you made that screenshot.

    I did some research and made an interesting discovery. I have been using the function setlocale(), yet the documentation says:
    Back to the idea of having an XML file providing info on available translations. My idea is to have an int telling which CodePage the chosen translation needs.

    However I have been having problems with setlocale(). I can't remember the details, but it wasn't working correctly.

    I just discovered that _setmbcp() sets two byte CodePages, while setlocale() only works with one byte CodePages.

    Maybe we can switch CodePage with code like this: (pseudo code. Arguments are somewhat complex)
    Code:
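    // Needs <locale.h>, <mbctype.h> and <stdio.h>.
    int iCodePage = 1251; // placeholder, the real value would come from the XML
    char szLocale[16];
    sprintf(szLocale, ".%d", iCodePage); // setlocale() takes a string such as ".1251"
    if (setlocale(LC_ALL, szLocale) == NULL)
    {
        // Not accepted as a locale: try the same number as a multibyte CodePage.
        _setmbcp(iCodePage);
    }
    setlocale returns NULL if it fails, which will make the code try to use the same CodePage, but in two byte mode. If that one fails too (_setmbcp returns -1 on failure), then we have a problem.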
     
  10. Nightinggale

    Nightinggale Deity

    Joined:
    Feb 2, 2009
    Messages:
    4,271
    I already have that in UTF8, but the exe expects to get text delivered in the locale. In other words that function is pointless.

    Besides, szName is the name of the language, meaning it will look for "<English>", "<Russian>" or whatever. All of those are in ASCII, which means it will work in every locale.
     
  11. dbkblk

    dbkblk Emperor

    Joined:
    Oct 28, 2005
    Messages:
    1,781
    Location:
    France
    That is a lead indeed. The other one I've found is this (you might have a better understanding of the question than me):
    - CvWString is a wide string, so it should be able to encode more than 2 bytes (theoretically unlimited).
    - UTF8 is also encoded on 1 to 4 bytes (4 for some special chars).
    - If we manage to read UTF8 directly in the XML and convert it to a wide string, I think we should get results.
    Unfortunately, I'm still trying to convert it.
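
To make the byte counts concrete, here is a small Python 3 sketch. It is not the DLL code: `utf8_to_wide` is a hypothetical helper that only mimics what a UTF8 to wide string conversion produces on Windows, where a wide char is 16 bits.

```python
def utf8_to_wide(utf8_bytes):
    """Decode UTF-8 and return the list of 16-bit units that a Windows
    wide string would hold for the same text (UTF-16, little endian)."""
    data = utf8_bytes.decode("utf-8").encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


# Latin letter, accented letter, Cyrillic, Hangul, emoji (outside the 2-byte range)
samples = ["A", "\u00e9", "\u0419", "\ud55c", "\U0001F600"]
for ch in samples:
    raw = ch.encode("utf-8")
    units = utf8_to_wide(raw)
    print(ascii(ch), len(raw), "UTF-8 byte(s) ->", len(units), "wide unit(s)")
```

Only the last sample needs two wide units. Everything used by Russian, Korean or Japanese text fits in one.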

    EDIT: Why do you assume that the exe expects text in the locale? If Python can use UTF8 directly from the executable, maybe there is a way to just inject UTF8 into the exe. Maybe the executable doesn't care about text encoding.

    The fact is that when the XML is encoded in ISO8859-1/Windows-1252, the code is made to understand that encoding and it works. UTF8 is not native, so we need to convert it to something usable by the code, or adapt the code.

    EDIT2: THAT ?
    Code:
    #ifndef USE_RAPID_XML
    			wchar buf[2048];
    			int iNumWritten = MultiByteToWideChar(CP_UTF8, 0, szTextVal.c_str(), -1, buf, 1000);
    			FAssertMsg(iNumWritten < 2048, "UTF8 text too long, increase buffer size");
    
    			// if the conversion fails, fall back to using the read wide string directly
    			if (iNumWritten <= 0)
    			{
    #endif
     
  12. Nightinggale

    Nightinggale Deity

    Joined:
    Feb 2, 2009
    Messages:
    4,271
    :dance::dance::dance:
    If we get that to work and we only store XML in UTF-8, then we will not have to convert. However MultiByteToWideChar() might be needed. I hope our antique compiler can handle that function.
     
  13. dbkblk

    dbkblk Emperor

    Joined:
    Oct 28, 2005
    Messages:
    1,781
    Location:
    France
    I'm sure it does as it is already included in the default code (in my previous post).
     
  14. dbkblk

    dbkblk Emperor

    Joined:
    Oct 28, 2005
    Messages:
    1,781
    Location:
    France
  15. Nightinggale

    Nightinggale Deity

    Joined:
    Feb 2, 2009
    Messages:
    4,271
    We have a partial Boost 1.32. There is only one (or was it two?) Boost DLL file. However, full Boost has a whole lot more (I think it was over 20). This means we might not have access to a function even if it is in Boost. I tried compiling boost-thread.dll at some point and it is quite hard to do. Even worse, it turned out that the only place the game will look for it is next to the exe file. There is code to make it look elsewhere, like next to the mod DLL file, but for some reason it wants to locate the Boost DLL used by our DLL even before it executes the first line of DLL code. You could compile the library as a static version, which would put the Boost code inside our DLL file. However, I never managed to get that to work. I didn't try that hard though, as I discarded the whole threading idea when I learned that I could only save 0.4 seconds when waiting for the next turn :wallbash:
    (not precisely what I wanted for a near one minute wait)

    In short: boost is worth looking into, but it might not be easy to get to work.

    The first line is an if statement for the preprocessor. If USE_RAPID_XML is defined, then everything until #endif will be ignored by the compiler, meaning it will accept code that can't be compiled.

    Interestingly enough, I can't find anything about USE_RAPID_XML in the code. However, I do have MultiByteToWideChar() in code which is compiled. That should answer the question of whether it is available.
     
  16. dbkblk

    dbkblk Emperor

    Joined:
    Oct 28, 2005
    Messages:
    1,781
    Location:
    France
    I've pasted the RAPIDXML thing, but I have removed it from my tests. So far, I have managed to get only blank lines, and my debugger doesn't want to stop at breakpoints anymore (I don't know why), so I can't check if the string is broken before or after being eaten by the executable.
    I managed to get small blank lines and also blank lines with the correct size :D
     
  17. dbkblk

    dbkblk Emperor

    Joined:
    Oct 28, 2005
    Messages:
    1,781
    Location:
    France
    After some research, it seems the GameBryo engine 2.0 (which is used by Civ4) is limited to UCS-2 (Unicode on 2 bytes) and ASCII. That means we can encode strings to handle Korean, Japanese, etc., but getting UTF8 does not seem so realistic.
    The documentation here is for the newest version, which I assume uses a more full-featured engine.
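
As a rough check of what a 2-byte limit allows, here is a Python 3 sketch. `fits_in_ucs2` is a hypothetical helper, and it only tests code points. It says nothing about what the engine's fonts can actually draw.

```python
def fits_in_ucs2(text):
    """True if every character can be stored in a single 16-bit unit,
    i.e. its code point is not above 0xFFFF."""
    return all(ord(ch) <= 0xFFFF for ch in text)


print(fits_in_ucs2("한국어"))        # Korean sample
print(fits_in_ucs2("日本語"))        # Japanese sample
print(fits_in_ucs2("Русский"))       # Russian sample
print(fits_in_ucs2("\U00020000"))    # a rare CJK ideograph above 0xFFFF
```

So the limit would only bite for rare characters, not for everyday Korean, Japanese or Russian text.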

    I think our goal for the moment should be to support Asian languages using our current method.

    Python handles UTF8, so all we have to do is inject 2-byte chars into the executable. Maybe we should take inspiration from the Python method the Koreans used!
     
  18. Nightinggale

    Nightinggale Deity

    Joined:
    Feb 2, 2009
    Messages:
    4,271
    I can't get this to work. If I add it to init():, all I get is
    :cry:

    Did you manage to add this? If so, what did you do?
     
  19. dbkblk

    dbkblk Emperor

    Joined:
    Oct 28, 2005
    Messages:
    1,781
    Location:
    France
  20. Nightinggale

    Nightinggale Deity

    Joined:
    Feb 2, 2009
    Messages:
    4,271
    Problem solved.... sort of. Moving to a different computer made the game accept the new code with no problems whatsoever. The problem-causing computer has been acting up lately and, to be honest, reinstalling Windows is on the todo list. I just never had any problems with Colonization before.
     
